Ollama num_ctx and Context Length for BrainSoup

Raise Ollama's num_ctx parameter when a BrainSoup agent needs a larger context length for documents, tools, code, or longer conversations. The context length is the number of tokens the model can take into account at once, covering both the prompt and the answer being generated; if it is too small, the agent may lose relevant instructions, retrieved knowledge, or earlier messages.

For most BrainSoup agents, PARAMETER num_ctx 8192 is a reasonable starting point. Use 16384 or 32768 only if your model and hardware can handle the extra memory and latency.

This article is about optimization. If you are setting up Ollama for the first time, start with the local LLM setup guide for Ollama in BrainSoup before tuning num_ctx and context length. If you are using LM Studio instead of Ollama, follow the LM Studio local inference server setup for BrainSoup.

The steps below show how to change an Ollama model's context window with the Ollama Modelfile parameter num_ctx.

Step 1: Retrieve the Model Configuration

First, export the configuration file (Modelfile) of the model you want to modify. For example, to modify mistral, use the following command:

ollama show mistral --modelfile > conf.txt

This command exports the current configuration of mistral into a file named conf.txt.

Step 2: Modify the Configuration File

Open conf.txt in your preferred text editor and make the following changes:

  1. Add a new line with the Ollama Modelfile parameter for the context window size:
PARAMETER num_ctx 8192
  2. Find the line that starts with FROM and replace it with:
FROM mistral:latest

This ensures that the new model will be based on mistral:latest. After making these changes, save and close the file.

Your edited Modelfile should contain lines similar to this:

FROM mistral:latest
PARAMETER num_ctx 8192
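
If you prefer to script the edit rather than use a text editor, the two changes above can be sketched as a small shell helper. The function name set_num_ctx is hypothetical and not part of Ollama or BrainSoup. The sketch assumes the exported file has a single line starting with FROM, and it drops any existing num_ctx line so the value is not defined twice.

```shell
#!/bin/sh
# Hypothetical helper (not an Ollama command): apply the Step 2 edits to an
# exported Modelfile.
# Usage: set_num_ctx <modelfile> <base-model> <num_ctx>
set_num_ctx() {
    file=$1
    base=$2
    ctx=$3
    tmp="$file.tmp"
    # Replace the FROM line, drop any existing num_ctx line, keep everything else.
    awk -v base="$base" '
        /^FROM /                  { print "FROM " base; next }
        /^PARAMETER[ \t]+num_ctx/ { next }
        { print }
    ' "$file" > "$tmp" || return 1
    # Append the new context window setting as the last line.
    printf 'PARAMETER num_ctx %s\n' "$ctx" >> "$tmp"
    mv "$tmp" "$file"
}
```

For the example in this article, the call would be: set_num_ctx conf.txt mistral:latest 8192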

Choosing the Right num_ctx Value

Use these values as practical starting points:

| num_ctx | Best for | Notes |
|---|---|---|
| 8192 | General BrainSoup agents, light document use, tool use | Recommended starting point |
| 16384 | Larger documents, longer conversations, coding assistance | Needs more memory |
| 32768 or more | Heavy research, large codebases, long multi-step tasks | Use only with suitable models and hardware |

A larger context length is not always better. It can increase memory usage and slow down responses. If responses become too slow or Ollama fails to load the model, reduce num_ctx or choose a smaller quantized model.
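
To get a feel for the memory cost, here is a rough estimate of the key/value cache that grows with num_ctx. This is a back-of-the-envelope sketch, not something Ollama reports. It assumes a 16-bit cache and the Mistral 7B architecture (32 layers, 8 key/value heads, head dimension 128), and it ignores the model weights and any runtime optimizations. The function name kv_cache_mib is hypothetical.

```shell
#!/bin/sh
# Rough KV cache size in MiB for a given num_ctx.
# Assumed architecture: Mistral 7B (32 layers, 8 KV heads, head dim 128),
# 2 bytes per cached value. Adjust these numbers for other models.
kv_cache_mib() {
    ctx=$1
    layers=32
    kv_heads=8
    head_dim=128
    bytes_per_value=2
    # Factor 2 covers keys and values; result is KiB per token (128 here).
    per_token_kib=$((2 * layers * kv_heads * head_dim * bytes_per_value / 1024))
    echo $((ctx * per_token_kib / 1024))
}

for n in 8192 16384 32768; do
    echo "num_ctx=$n -> about $(kv_cache_mib "$n") MiB of KV cache"
done
```

Under these assumptions the cache grows linearly: about 1 GiB at 8192, 2 GiB at 16384, and 4 GiB at 32768, on top of the memory needed for the model itself.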

Step 3: Create a New Model with Updated Configuration

With your modified configuration file ready, create a new local model named mistral-8K. Run the following command:

ollama create mistral-8K -f conf.txt

This command creates a new model based on your updated configuration.
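
To confirm that the setting was picked up, you can export the new model's Modelfile again and look for the num_ctx line. The sketch below reuses the ollama show --modelfile command from Step 1. The helper name modelfile_num_ctx is hypothetical, and the Ollama call is skipped when the CLI is not installed.

```shell
#!/bin/sh
# Hypothetical helper: print the num_ctx value found in Modelfile text on stdin.
modelfile_num_ctx() {
    awk '$1 == "PARAMETER" && $2 == "num_ctx" { print $3 }'
}

# Only query Ollama when the CLI is available on this machine.
if command -v ollama >/dev/null 2>&1; then
    ollama show mistral-8K --modelfile | modelfile_num_ctx
fi
```

If the command prints nothing, the PARAMETER line is missing from the model and Step 2 should be repeated.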

Step 4: Select the Optimized Model in BrainSoup

After ollama create completes, BrainSoup can use the new model like any other Ollama model.

  1. Open BrainSoup.
  2. Open the agent settings by double-clicking the agent name in the left pane.
  3. In the AI settings section, select the new Ollama model, for example mistral-8K.
  4. Test the agent with the type of documents, tools, or conversations you expect to use.

Troubleshooting

The model is too slow

Lower num_ctx, choose a smaller model, or use a more compact quantization. Context length directly affects memory pressure and can affect speed.

The agent still forgets information

Check whether the needed information is actually available to the agent through BrainSoup knowledge, working memory, chat history, or tool output. num_ctx increases the available window, but it does not automatically add missing documents or context.

Ollama does not show the new model

Run ollama list to confirm that the created model exists. If needed, restart BrainSoup so it refreshes the available Ollama models.
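
The check can also be scripted. The sketch below assumes that ollama list prints a header row followed by one model per line, with the name in the first column and an optional tag such as :latest after it. The helper name model_listed is hypothetical, and names are compared without regard to case.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the model name ($1) appears in the
# `ollama list` output read from stdin.
model_listed() {
    awk -v name="$1" '
        NR > 1 {
            split($1, parts, ":")            # strip the tag, e.g. ":latest"
            if (tolower(parts[1]) == tolower(name)) found = 1
        }
        END { exit !found }
    '
}

# Only query Ollama when the CLI is available on this machine.
if command -v ollama >/dev/null 2>&1; then
    if ollama list | model_listed mistral-8K; then
        echo "mistral-8K is available"
    else
        echo "mistral-8K not found; run ollama create again"
    fi
fi
```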

Conclusion

You now have a local model named mistral-8K with an 8192-token context window. It is available in both Ollama and BrainSoup for tasks that need more context, such as longer documents, longer conversations, or multi-step tool use.

A larger context window lets the agent work with more material at once, but it also raises memory use and can slow responses. Choose num_ctx based on your actual workload and what your hardware can handle.