Setting Up and Using Ollama: A Comprehensive Guide
I'll walk you through a complete tutorial on setting up and using Ollama for local LLM deployment, including installation, model management, command-line usage, and Python integration.
1. Installation
Download the installer for macOS, Windows, or Linux from https://ollama.com. After installation, Ollama runs as a background service that listens on http://localhost:11434 by default.
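To confirm the service is up, you can query the local API's version endpoint; a quick check, assuming the default port:

```python
import requests

# Ask the background service for its version to confirm it is reachable
try:
    version = requests.get("http://localhost:11434/api/version", timeout=5).json()
    print(f"Ollama is running, version {version['version']}")
except requests.exceptions.ConnectionError:
    print("Ollama is not running - start it with `ollama serve`")
```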
2. Downloading and Managing Models
Once Ollama is installed, you can download models using the pull command.
Basic Model Download
```bash
# Pull the DeepSeek R1 model
ollama pull deepseek-r1:7b
```
List Available Models
```bash
ollama list
```
Output will look similar to:
```
NAME                 ID            SIZE    MODIFIED
deepseek-r1:latest   0a8c26691023  4.7 GB  3 weeks ago
deepseek-r1:7b       0a8c26691023  4.7 GB  3 weeks ago
```
Remove a Model
```bash
ollama rm deepseek-r1:7b
```
Additional Popular Models
```bash
# Pull other useful models
ollama pull llama3:8b
ollama pull mistral:7b
ollama pull gemma:7b
```
3. Using Ollama via Command Line
Basic Generation
```bash
ollama run deepseek-r1:7b "What is a knowledge graph?"
```
Interactive Chat Session
```bash
ollama run deepseek-r1:7b
```
This opens an interactive chat session where you can type prompts and get responses. Use /bye or Ctrl+D to exit.
Advanced Parameters
The ollama run command does not accept sampling flags directly; set parameters inside an interactive session instead (or in a Modelfile, shown next):

```bash
# Inside a session started with `ollama run deepseek-r1:7b`
/set parameter temperature 0.7
/set parameter top_p 0.9
```
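The same parameters can also be set per request through the API's options field (the HTTP API is covered in section 4); a minimal sketch:

```python
import requests

# Sampling parameters go in the "options" field of an API request
payload = {
    "model": "deepseek-r1:7b",
    "prompt": "What is a knowledge graph?",
    "stream": False,
    "options": {"temperature": 0.7, "top_p": 0.9},
}
response = requests.post("http://localhost:11434/api/generate", json=payload)
print(response.json()['response'])
```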
Creating Custom Model Versions with Modelfiles
Create a file named Modelfile:
```
FROM deepseek-r1:7b
SYSTEM "You are a helpful AI assistant specialized in Python programming."
PARAMETER temperature 0.7
```
Build your custom model:
```bash
ollama create code-helper -f Modelfile
```
Run your custom model:
```bash
ollama run code-helper "Write a function to calculate Fibonacci numbers"
```
4. Using Ollama from Python
Basic API Usage
```python
import requests

# Define the API endpoint
api_url = "http://localhost:11434/api/generate"

# Configure the request
payload = {
    "model": "deepseek-r1:7b",
    "prompt": "What are the key features of a code agent?",
    "stream": False
}

# Make the API call
response = requests.post(api_url, json=payload)
result = response.json()

# Print the response
print(result['response'])
```
Streaming Responses
```python
import requests
import json

api_url = "http://localhost:11434/api/generate"

payload = {
    "model": "deepseek-r1:7b",
    "prompt": "Explain knowledge representation in AI agents",
    "stream": True
}

# Stream the response line by line
with requests.post(api_url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if line:
            json_response = json.loads(line)
            if 'response' in json_response:
                print(json_response['response'], end='', flush=True)
            # Check if this is the final response
            if json_response.get('done', False):
                print()  # Add a newline at the end
```
Using the Official Python Library
Alternatively, install the official client (pip install ollama) for a simpler interface:

```python
import ollama

# Simple generation
response = ollama.generate(
    model='deepseek-r1:7b',
    prompt='What is a knowledge graph in the context of AI?'
)
print(response['response'])

# Chat completion with history
messages = [
    {'role': 'user', 'content': 'What are the key components of a multi-agent system?'}
]
response = ollama.chat(model='deepseek-r1:7b', messages=messages)

# Add the response to the conversation
messages.append({'role': 'assistant', 'content': response['message']['content']})

# Continue the conversation
messages.append({'role': 'user', 'content': 'How can these components communicate with each other?'})
response = ollama.chat(model='deepseek-r1:7b', messages=messages)
print(response['message']['content'])
```
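The client also supports streaming; a minimal sketch using the same chat interface:

```python
import ollama

# With stream=True the client yields response chunks as they arrive
stream = ollama.chat(
    model='deepseek-r1:7b',
    messages=[{'role': 'user', 'content': 'Summarize the ReAct pattern in two sentences.'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()  # Final newline
```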
Integration with SmolAgents Framework
Based on your documents, here's a sketch of how to integrate Ollama with the SmolAgents framework. Recent smolagents versions reach local Ollama models through the LiteLLMModel wrapper, and CodeAgent supplies its own system prompt for code generation; exact class names may differ between versions:
```python
from smolagents import CodeAgent, LiteLLMModel, tool

# Create a model backed by the local Ollama endpoint (routed via LiteLLM)
ollama_model = LiteLLMModel(
    model_id="ollama_chat/deepseek-r1:7b",
    api_base="http://localhost:11434",
    temperature=0.2,
)

# Create a tool for the agent
@tool
def search_documentation(query: str) -> str:
    """Search documentation for the given query.

    Args:
        query: The search query.
    """
    # Implementation would go here
    return f"Documentation results for: {query}"

# Create a code agent with the model
agent = CodeAgent(model=ollama_model, tools=[search_documentation])

# Use the agent
response = agent.run("Create a function to parse CSV files using the pandas library")
print(response)
```
5. Performance Optimization
GPU Acceleration
Ollama automatically uses a GPU if one is available. To check whether the GPU is being utilized:
```bash
# For NVIDIA GPUs
nvidia-smi

# For AMD GPUs
rocm-smi
```
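Ollama can also report how a loaded model is placed: the ollama ps command shows the CPU/GPU split, and recent versions expose the same data over the API. A sketch, assuming the /api/ps endpoint is available:

```python
import requests

# /api/ps lists currently loaded models and how much of each sits in VRAM
ps = requests.get("http://localhost:11434/api/ps", timeout=5).json()
for model in ps.get('models', []):
    in_vram = model['size_vram'] / model['size'] if model['size'] else 0
    print(f"{model['name']}: {in_vram:.0%} of weights in VRAM")
```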
Memory Requirements
As indicated in your course materials, models like DeepSeek R1:7b work best with GPUs that have at least 12GB of VRAM. Here are some compatible GPU models:
- GeForce RTX 4090 (24 GB)
- GeForce RTX 4080 (16 GB)
- GeForce RTX 3090 Ti (24 GB)
- GeForce RTX 3090 (24 GB)
- GeForce RTX 3080 Ti (12 GB)
- GeForce RTX 3080 (12 GB version)
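A rough rule of thumb behind these numbers: a model's weight footprint is approximately its parameter count times the bytes per weight of the chosen quantization, plus overhead for the KV cache and runtime. The sketch below is an approximation for planning, not a measurement:

```python
# Approximate bytes per weight for common quantization levels
BYTES_PER_WEIGHT = {"f16": 2.0, "q8_0": 1.0, "q4_0": 0.5}

def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weights plus a flat allowance for KV cache/runtime."""
    return params_billions * BYTES_PER_WEIGHT[quant] + overhead_gb

for quant in ("f16", "q8_0", "q4_0"):
    print(f"7B at {quant}: ~{estimate_vram_gb(7, quant):.1f} GB")
```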
Model Quantization
Ollama supports different quantization levels to reduce memory requirements:
```bash
# Pull a quantized version (check the model's page on ollama.com for available tags)
ollama pull deepseek-r1:7b-q4_0

# List to see the size difference
ollama list
```
6. Building Agents with Ollama
Simple Code Agent
A minimal code-generation agent can wrap ollama.generate with an enhanced prompt:

```python
import ollama
from typing import Dict, Any

def code_agent(prompt: str) -> Dict[str, Any]:
    """
    A simple code agent that leverages Ollama to generate Python code.

    Args:
        prompt (str): The coding task description

    Returns:
        Dict[str, Any]: Results including code and explanation
    """
    # Enhance the prompt
    enhanced_prompt = f"""
    Write Python code for the following task:

    {prompt}

    Provide well-commented code with docstrings and explanations.
    """

    # Generate the code
    response = ollama.generate(
        model='deepseek-r1:7b',
        prompt=enhanced_prompt,
        system="You are an expert Python programmer. Generate clean, efficient, and well-documented code."
    )

    # Extract and format the code
    code_text = response['response']

    return {
        "code": code_text,
        "prompt": prompt,
        "model": "deepseek-r1:7b"
    }

# Example usage
result = code_agent("Create a function to calculate the Fibonacci sequence up to n terms")
print(result["code"])
```
ReAct Agent Implementation
The ReAct pattern alternates reasoning with tool use: the agent parses "Action:" and "Action Input:" lines from the model's reply, executes the matching tool, and feeds the result back into the conversation until the model answers directly:

```python
import ollama
from typing import Dict, Callable

class ReActAgent:
    def __init__(self, model_name: str, tools: Dict[str, Callable]):
        self.model_name = model_name
        self.tools = tools
        self.messages = []
        self.max_iterations = 5

    def add_system_message(self, content: str):
        self.messages.append({"role": "system", "content": content})

    def add_user_message(self, content: str):
        self.messages.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str):
        self.messages.append({"role": "assistant", "content": content})

    def run(self, query: str) -> str:
        self.add_user_message(query)

        for _ in range(self.max_iterations):
            # Get the next action from the model
            response = ollama.chat(model=self.model_name, messages=self.messages)
            response_content = response['message']['content']
            self.add_assistant_message(response_content)

            # Check if the response contains a tool call
            if "Action:" in response_content:
                try:
                    # Extract the action and arguments
                    action_line = [line for line in response_content.split('\n')
                                   if line.startswith('Action:')][0]
                    action_name = action_line.replace('Action:', '').strip()

                    args_line = [line for line in response_content.split('\n')
                                 if line.startswith('Action Input:')][0]
                    args_text = args_line.replace('Action Input:', '').strip()

                    # Execute the tool
                    if action_name in self.tools:
                        tool_result = self.tools[action_name](args_text)
                        self.add_user_message(f"Tool result: {tool_result}")
                    else:
                        self.add_user_message(f"Error: Tool '{action_name}' not found")
                except Exception as e:
                    self.add_user_message(f"Error executing action: {str(e)}")
            else:
                # If no action is requested, return the final answer
                return response_content

        return "Max iterations reached without final answer"

# Example tools
def search_web(query: str) -> str:
    return f"Search results for '{query}': [Sample results would appear here]"

def get_current_weather(location: str) -> str:
    return f"Weather in {location}: 72°F, Sunny"

# Create and use the agent
tools = {
    "search_web": search_web,
    "get_weather": get_current_weather
}

agent = ReActAgent(model_name="deepseek-r1:7b", tools=tools)
agent.add_system_message("""You are a helpful assistant that can use tools to get information.
When you need information, specify the tool to use with:
Action: tool_name
Action Input: input for the tool
When you have the final answer, provide it directly without using a tool.""")

result = agent.run("What's the weather like in New York?")
print(result)
```
7. Troubleshooting Common Issues
Connection Refused
If you see "Connection refused" errors when trying to use the API:
```bash
# Check if the Ollama service is running
ps aux | grep ollama

# Restart the service if needed
ollama serve
```
High Memory Usage
If you're experiencing memory issues:
- Use a more heavily quantized model (e.g., q4_0 instead of q8_0)
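You can also free memory by unloading a model once you're done with it. Sending a request with keep_alive set to 0 asks Ollama to evict the model immediately; a minimal sketch against the generate endpoint:

```python
import requests

# keep_alive: 0 tells Ollama to unload the model right after this request
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:7b", "keep_alive": 0},
)
```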
Conclusion
Ollama provides a flexible and powerful way to run LLMs locally. With the DeepSeek R1:7b model mentioned in your course materials, suitable hardware can sustain generation rates of 50+ tokens per second, making it viable for developing and testing intelligent software agents. Its Python integration makes it an excellent foundation for building code agents, implementing ReAct patterns, and developing other advanced agent systems.