Use Ollama Local LLMs with BrainSoup
Introduction
BrainSoup can use Ollama models as local LLMs, so your agents can run on your own computer instead of sending every prompt to a cloud provider. Install Ollama, download a model, then select that model in BrainSoup's AI settings.
After Ollama is connected, use the Ollama model optimization guide for BrainSoup agents to tune the context length (num_ctx). If you prefer a desktop interface with a local inference server instead of the Ollama CLI, follow the LM Studio local inference server setup for BrainSoup.
What is Ollama?
Ollama is a tool for running, creating, and sharing LLMs locally. It supports many model families, including Llama, Mistral, Phi, Qwen, DeepSeek, and others available from the Ollama library. Ollama manages models through a command-line interface and exposes a local API that BrainSoup can use.
Step 1: Downloading and Installing Ollama
- Visit the official Ollama website to download the latest version of the application for your operating system.
- Follow the installation instructions provided on the website to set up Ollama on your machine.
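To confirm the installation worked, you can check from a terminal that the ollama command is available. The following is a minimal POSIX shell sketch. The function name check_ollama_installed is hypothetical (it is not part of Ollama or BrainSoup), and the sketch relies only on the standard ollama --version flag.

```shell
# Hypothetical helper: report whether the Ollama CLI is installed and on PATH.
check_ollama_installed() {
  if command -v ollama >/dev/null 2>&1; then
    # Print the installed version and pass through its exit status.
    ollama --version
    return $?
  fi
  echo "Ollama CLI not found on PATH; install it from the official website first." >&2
  return 1
}

# Example: check_ollama_installed
```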
Step 2: Downloading a Model with the Ollama CLI
- Open a terminal window on your machine.
- Use the following command to download a model from the Ollama repository (replace <model_name> with the name of the model you wish to download):
ollama pull <model_name>
Tips:
- To see the models already downloaded to your machine, run the command ollama list. To browse the models available for download, visit the Ollama library.
- Some models are specifically optimized for certain domains or tasks, such as mathematics, programming, medical applications, or role-playing. By combining agents that use different models, you can build your own team of experts.
- Start with a model that fits your hardware. Larger models and longer context windows need more memory and can be slower.
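If you script your setup, you can skip the download when a model is already installed by checking ollama list first. This sketch assumes that the first column of ollama list output is the model name with its tag (for example llama3.2:latest) and that an untagged name refers to the latest tag. The helper pull_if_missing and the example model name are illustrative only.

```shell
# Hypothetical helper: run `ollama pull` only if `ollama list` does not
# already show the model.
pull_if_missing() {
  local model="$1" tagged
  # Assumption: an untagged name such as "llama3.2" means "llama3.2:latest".
  case $model in
    *:*) tagged=$model ;;
    *)   tagged="$model:latest" ;;
  esac
  # Skip the header row of `ollama list` and compare the NAME column exactly.
  if ollama list | awk 'NR > 1 { print $1 }' | grep -qxF "$tagged"; then
    echo "$tagged is already installed"
    return 0
  fi
  ollama pull "$model"
}

# Example: pull_if_missing llama3.2
```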
Step 3: Integrating Ollama with BrainSoup
Once Ollama is installed, BrainSoup can detect it automatically if both applications are on the same machine, and all installed Ollama models then become available within BrainSoup.
For Local Installation:
- BrainSoup detects Ollama automatically.
- All models managed by Ollama will be listed in BrainSoup under the AI providers section in settings.
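BrainSoup works with Ollama through Ollama's local API, so if a model is missing from BrainSoup's list, it can help to ask that API directly which models it serves. By default, Ollama listens on http://localhost:11434, and a GET request to /api/tags returns the installed models as JSON. The sketch below assumes that default address and extracts the names with grep and sed instead of a real JSON parser, so treat it as a quick diagnostic. The name list_ollama_models is hypothetical.

```shell
# Hypothetical helper: print the model names the Ollama API reports.
# Uses Ollama's default address unless a base URL is passed as $1.
list_ollama_models() {
  local base_url="${1:-http://localhost:11434}"
  # GET /api/tags returns {"models":[{"name":"llama3.2:latest",...},...]}.
  # Pull out each "name" value without needing jq.
  curl -fsS "$base_url/api/tags" |
    grep -o '"name": *"[^"]*"' |
    sed 's/^"name": *"\(.*\)"$/\1/'
}

# Example: list_ollama_models
```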
For Remote Installation or Docker:
If you have Ollama installed on a different machine or within a Docker container, you need to specify the URL of your Ollama server:
- Navigate to the Settings screen in BrainSoup.
- Go to the AI providers section.
- Enter the URL of your remote Ollama server.
- Click on the Connect button to establish the connection.
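For a remote or Docker setup, the Ollama server must be reachable from the machine running BrainSoup. Ollama's default port is 11434, and a native Ollama install listens only on 127.0.0.1 unless the OLLAMA_HOST environment variable is set (for example to 0.0.0.0); with the official Docker image, the port is published with -p 11434:11434. The hypothetical helpers below build a base URL from a bare host name (assuming http and the default port) and check that the server answers. A running Ollama server replies to its root URL with the text "Ollama is running". Whether BrainSoup expects exactly this http://host:port form is an assumption.

```shell
# Start Ollama in Docker and publish its default port (official image):
#   docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# For a native install on another machine, listen on all interfaces:
#   OLLAMA_HOST=0.0.0.0 ollama serve

# Hypothetical helper: turn a host (or host:port, or full URL) into a base URL.
ollama_base_url() {
  local url="${1%/}"                  # drop one trailing slash
  case $url in
    http://*|https://*) ;;            # full URL given: keep it as is
    *:*) url="http://$url" ;;         # host:port given: add the scheme
    *)   url="http://$url:11434" ;;   # bare host: add scheme and default port
  esac
  printf '%s\n' "$url"
}

# Hypothetical helper: succeed only if the server answers like Ollama does.
check_ollama_server() {
  local base
  base=$(ollama_base_url "$1")
  if curl -fsS --max-time 5 "$base/" | grep -q "Ollama is running"; then
    echo "Ollama is reachable at $base"
  else
    echo "No Ollama server answered at $base" >&2
    return 1
  fi
}

# Example: check_ollama_server 192.168.1.20
```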

Step 4: Getting Started with Local LLMs in BrainSoup
All the models managed by Ollama are now accessible within BrainSoup. To choose the model an agent uses, follow these steps:
- Open the agent settings by double-clicking on the agent's name in the left pane.
- In the AI settings section, select the desired model from the dropdown list.

Context Length and num_ctx
The context length is the maximum number of tokens the model can take into account at once, covering both the prompt and the response it generates. Agent workflows often need a larger context length than a simple chat because BrainSoup may include instructions, tool results, documents, working memory, and conversation history.
In Ollama, the num_ctx parameter controls the context window used by a model. For BrainSoup agents that use documents or tools, a context length of at least 8192 tokens is a practical starting point when your hardware and model support it. More demanding coding, research, or document-heavy workflows may benefit from a larger value.
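Because the context window has to hold the whole request plus the model's reply, the total adds up quickly. The back-of-the-envelope sketch below uses made-up token counts (illustrative only, not measured from BrainSoup) for instructions, tool results, documents, working memory, conversation history, and the reply, and checks whether they fit in a given num_ctx. The helper fits_in_context is hypothetical.

```shell
# Hypothetical helper: check whether a set of token counts fits in num_ctx.
# Usage: fits_in_context NUM_CTX PART_TOKENS...
fits_in_context() {
  local num_ctx=$1 total=0 part
  shift
  for part in "$@"; do
    total=$((total + part))
  done
  if [ "$total" -le "$num_ctx" ]; then
    echo "fits: $total of $num_ctx tokens"
  else
    echo "too large: $total of $num_ctx tokens" >&2
    return 1
  fi
}

# Illustrative budget: instructions, tool results, documents,
# working memory, conversation history, reply.
# fits_in_context 8192 800 1500 3000 500 1200 1024
```

With these example numbers the total is 8024 tokens, which fits in 8192 but not in 4096; one more long document would already overflow 8192.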
To set num_ctx permanently, create a custom model from a Modelfile that contains a line such as:
PARAMETER num_ctx 8192
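A minimal sketch of that workflow, assuming the base model (here llama3.2) has already been pulled: write a two-line Modelfile containing a FROM line and the PARAMETER num_ctx line, then build a new model from it with ollama create. The helper create_ctx_model and the model name llama3.2-8k are hypothetical. Once created, the new model is managed by Ollama like any other, so it should appear in BrainSoup's model list.

```shell
# Hypothetical helper: create a copy of BASE with a fixed context length.
# Usage: create_ctx_model BASE NUM_CTX NEW_NAME
create_ctx_model() {
  local base=$1 num_ctx=$2 name=$3 dir
  dir=$(mktemp -d) || return 1
  # The Modelfile only sets num_ctx; everything else is inherited from BASE.
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base" "$num_ctx" > "$dir/Modelfile"
  ollama create "$name" -f "$dir/Modelfile"
  local status=$?
  rm -rf "$dir"   # the temporary Modelfile is no longer needed
  return "$status"
}

# Example: create_ctx_model llama3.2 8192 llama3.2-8k
```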
For the full command sequence, examples, and tradeoffs, follow Optimizing Ollama Models for BrainSoup.
Conclusion
Running local LLMs through Ollama gives you control over where your data goes and which hardware does the work. With this setup, you can use advanced language models while keeping full ownership of your data and infrastructure.
Note: Many Ollama models don't support function calling and are not multimodal, but your agent can still use tools, see images, and listen to audio thanks to BrainSoup's ability to delegate these tasks to a more capable LLM when needed. This multi-LLM cooperation is the cornerstone of BrainSoup: it lets you combine the strengths of different models without being limited by any single model's capabilities.
Warning: Ollama context length defaults can vary by model, version, and hardware. For advanced BrainSoup scenarios, especially when the agent needs to access documents and use tools, set num_ctx explicitly instead of relying on the default.
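To find out whether a model has num_ctx set explicitly or relies on the default, you can inspect its Modelfile with ollama show <model> --modelfile. This sketch assumes the setting appears there as a PARAMETER num_ctx line, which is how it is written in a custom Modelfile. The helper model_num_ctx is hypothetical.

```shell
# Hypothetical helper: print the num_ctx set in a model's Modelfile,
# or "not set" if the model relies on Ollama's default context length.
model_num_ctx() {
  local value
  value=$(ollama show "$1" --modelfile |
    awk '$1 == "PARAMETER" && $2 == "num_ctx" { print $3 }' | tail -n 1)
  if [ -n "$value" ]; then
    echo "$value"
  else
    echo "not set"
  fi
}

# Example: model_num_ctx llama3.2-8k
```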