Ollama num_ctx and Context Length for BrainSoup
Use Ollama num_ctx when a BrainSoup agent needs a larger context length for documents, tools, code, or longer conversations. The context length is the number of tokens the model can keep in memory while generating an answer; if it is too small, the agent may lose relevant instructions, retrieved knowledge, or previous messages.
For most BrainSoup agents, PARAMETER num_ctx 8192 is a reasonable starting point. Use 16384 or 32768 only if your model and hardware can handle the extra memory and latency.
This article is about optimization. If you are setting up Ollama for the first time, start with the local LLM setup guide for Ollama in BrainSoup before tuning num_ctx and context length. If you are using LM Studio instead of Ollama, follow the LM Studio local inference server setup for BrainSoup.
Below is a step-by-step guide on how to modify an Ollama model's context window with the Ollama Modelfile parameter num_ctx .
Step 1: Retrieve the Model Configuration
First, we need to obtain the configuration file of the model you wish to modify. For example, to modify mistral , use the following command:
ollama show mistral --modelfile > conf.txt
This command exports the current configuration of mistral into a file named conf.txt .
Step 2: Modify the Configuration File
Open conf.txt in your preferred text editor and make the following changes:
- Add a new line with the Ollama Modelfile parameter for the context window size:
PARAMETER num_ctx 8192
- Find the line that starts with
FROMand replace it with:
FROM mistral:latest
This ensures that the new model will be based on mistral:latest . After making these changes, save and close the file.
Your edited Modelfile should contain lines similar to this:
FROM mistral:latest PARAMETER num_ctx 8192
Choosing the Right num_ctx Value
Use these values as practical starting points:
| num_ctx | Best for | Notes |
|---|---|---|
8192 |
General BrainSoup agents, light document use, tool use | Recommended starting point |
16384 |
Larger documents, longer conversations, coding assistance | Needs more memory |
32768 or more |
Heavy research, large codebases, long multi-step tasks | Use only with suitable models and hardware |
A larger context length is not always better. It can increase memory usage and slow down responses. If responses become too slow or Ollama fails to load the model, reduce num_ctx or choose a smaller quantized model.
Step 3: Create a New Model with Updated Configuration
With your modified configuration file ready, create a new local model named mistral-8K . Run the following command:
ollama create mistral-8K -f conf.txt
This command creates a new model based on your updated configuration.
Step 4: Select the Optimized Model in BrainSoup
After ollama create completes, BrainSoup can use the new model like any other Ollama model.
- Open BrainSoup.
- Open the agent settings by double-clicking the agent name in the left pane.
- In the AI settings section, select the new Ollama model, for example
mistral-8K. - Test the agent with the type of documents, tools, or conversations you expect to use.
Troubleshooting
The model is too slow
Lower num_ctx , choose a smaller model, or use a more compact quantization. Context length directly affects memory pressure and can affect speed.
The agent still forgets information
Check whether the needed information is actually available to the agent through BrainSoup knowledge, working memory, chat history, or tool output. num_ctx increases the available window, but it does not automatically add missing documents or context.
Ollama does not show the new model
Run ollama list to confirm that the created model exists. If needed, restart BrainSoup so it refreshes the available Ollama models.
Conclusion
You have successfully created a new local model named mistral-8K with an increased context window. This model is now available for use in Ollama and BrainSoup for enhanced performance during complex tasks.
Remember, adjusting the context window allows for more detailed interactions and processing but may also increase computational requirements. Always consider your specific needs and system capabilities when making such modifications.