Building Smarter AI: Introduction to SmolAgents and Agentic RAG
2025 is shaping up to be the year of AI agents. With advancements in multi-agent orchestration frameworks, we are witnessing a paradigm shift in how AI systems operate. Mark Zuckerberg has openly stated that Meta is moving towards AI agents taking on mid-level engineering roles. Microsoft now owns three multi-agent orchestration frameworks (AutoGen, Magentic-One, and TinyTroupe), while OpenAI has introduced Swarm, and AWS has launched Multi-Agent Orchestrator alongside standalone frameworks like LangGraph and CrewAI. Even Hugging Face has now entered the space with its latest offering, SmolAgents.
SmolAgents is another addition to the rapidly growing multi-agent framework ecosystem, but it brings its own unique approach. Unlike other frameworks, SmolAgents is designed to be lightweight and easy to use, making it a compelling choice for developers looking to build intelligent, autonomous AI applications without excessive complexity.
In this article, we will explore the fundamentals of SmolAgents and demonstrate how it can be leveraged to build agentic Retrieval-Augmented Generation (RAG) systems.
Getting Started
Table of contents
- What are Agents?
- What is SmolAgents?
- Key Features of SmolAgents
- Components of SmolAgents
- Experimenting with SmolAgents
- 1. Creating search agents
- 2. Accessing SmolAgents using Together
- 3. Creating task-oriented agents
- Building Agentic RAG with SmolAgents
- What is Agentic RAG?
- Installing dependencies
- Importing OpenAI Key
- Importing required libraries
- Loading and Splitting the PDF Document
- Generating and Searching Document Embeddings
- Defining a Retriever Tool
- Defining the agent
- Executing the agent
- Resources
What are Agents?
AI Agents are autonomous systems capable of executing tasks on behalf of users or other systems by designing workflows and utilizing external tools like web searches and coding utilities. At their core, AI Agents rely on large language models (LLMs) integrated with backend tools to provide real-time, actionable insights. They act as a bridge between LLMs and the external world, enabling decision-making and task execution based on LLM outputs. The level of agency an AI agent possesses depends on how much control the system grants to the LLM. This agency exists on a spectrum, ranging from minimal influence to near-complete autonomy, allowing for flexible implementation based on specific use cases.
What is SmolAgents?
SmolAgents, a newly launched framework by Hugging Face, simplifies the creation of intelligent AI agents powered by large language models (LLMs). This lightweight library enables developers to build and integrate agents with minimal code, focusing on practicality and ease of use. With around a thousand lines of core logic, SmolAgents provides a streamlined interface for tasks like web searches and data retrieval while maintaining simplicity and minimal abstraction.
Key Features of SmolAgents
- Code-Centric Agents: Code Agents execute actions directly as Python code, offering greater accuracy and efficiency compared to traditional tool-based agents. Using code for tool actions is superior to JSON snippets because programming languages are designed for efficient execution, while JSON is merely a data format. Code allows composability through nesting, abstraction, and reuse, making complex workflows more manageable. It also supports object management, enabling seamless handling of outputs like generated images — something JSON lacks. Additionally, code provides greater flexibility, as it can express any computational task, whereas JSON is restricted to predefined structures.
- Local Python Interpreter: The CodeAgent executes LLM-generated code within a secure, custom environment using a purpose-built local Python interpreter instead of the default one. This ensures safety through controlled imports (only user-authorized libraries are allowed), strict operation limits that prevent infinite loops or excessive resource usage, and execution restricted to predefined actions, so no unexpected or unsafe code is run.
- E2B Code Executor: To enhance security, SmolAgents leverages E2B, a remote execution service that runs code in a sandboxed environment. This ensures all code is executed within an isolated container, safeguarding the local environment and providing strong protection.
from smolagents import CodeAgent, VisitWebpageTool, HfApiModel

agent = CodeAgent(
    tools=[VisitWebpageTool()],
    model=HfApiModel(),
    additional_authorized_imports=["requests", "markdownify"],
    use_e2b_executor=True
)

agent.run("What was Abraham Lincoln's favourite sport?")
- LLM-Agnostic: Seamlessly integrates with any LLM on the Hugging Face Hub and other popular models via LiteLLM.
- Autonomous Code Execution: Specializes in “Code Agents” that generate and execute code securely in sandboxed environments like E2B.
- Structured Tool Interaction: Uses a thought-action format for tool calls, improving structured output and API integration (a JSON-tool-call sketch follows this list).
- Extensive Integrations: Supports multiple LLM providers and offers a shared tool repository on Hugging Face Hub for enhanced flexibility.
- Adaptive Workflows: Empowers LLMs to define and control workflows dynamically, enabling complex task automation.
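As a point of comparison for the code-centric approach described above, smolagents also provides a ToolCallingAgent that emits traditional JSON tool calls instead of executable Python. A minimal sketch using the built-in search tool (the query is just an example):

from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel

# A ToolCallingAgent emits JSON tool calls rather than executable Python code
agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("Who is the current CEO of Microsoft?")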
Components of SmolAgents
- LLM Core: Powers the agent’s decision-making and actions.
- Tool Repository: A predefined list of tools available for task execution (a custom-tool sketch follows this list).
- Parser: Extracts actionable information from the LLM’s outputs.
- System Prompt: Guides the agent with clear instructions, ensuring alignment with the parser.
- Memory: Maintains context across iterations, essential for multi-step agents.
- Error Logging and Retry Mechanisms: Improves system resilience and efficiency.
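To make the tool repository concrete, here is a minimal sketch of a custom tool defined with the @tool decorator from smolagents; the function name, type hints, and docstring are what the system prompt and parser expose to the LLM core. The get_word_count tool is a hypothetical example, not something from the article:

from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_word_count(text: str) -> int:
    """Counts the number of words in a piece of text.

    Args:
        text: The text whose words should be counted.
    """
    return len(text.split())

# The tool repository is simply the list of tools handed to the agent
agent = CodeAgent(tools=[get_word_count], model=HfApiModel())

Once registered, the agent can call get_word_count from its generated code whenever a query requires it.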
Experimenting with SmolAgents
In this section, we will demonstrate how to set up and use a lightweight AI agent for task automation. The agent utilizes a large language model (LLM) alongside a search tool to handle tasks that require both computational reasoning and external data retrieval. By configuring different models and tools, the agent becomes highly adaptable for various applications, including research, content generation, and question-answering.
Let’s begin our exploration of SmolAgents with hands-on demonstrations.
1. Creating search agents
In this example, the agent is tasked with retrieving information about Microsoft, highlighting its ability to integrate AI reasoning with real-time web data.
Installing dependencies
- Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate  # for Ubuntu/macOS
venv\Scripts\activate  # for Windows
- Install the smolagents and python-dotenv libraries using pip.
pip install smolagents python-dotenv
- Create a file named app.py and use the code below to access the SmolAgents search tool.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel
# Choose which LLM engine to use
model = LiteLLMModel(model_id="gpt-4o", api_key="YOUR_OPENAI_API_KEY")
# Create a code agent
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agent.run("Tell me about Microsoft")
- Running the app will generate the agent's response as output.
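Hardcoding the API key is fine for a quick test, but since python-dotenv was installed above, you can keep the key out of the source file. A minimal variation, assuming the key is stored in a .env file under the name OPENAI_API_KEY:

import os
from dotenv import load_dotenv
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

load_dotenv()  # reads OPENAI_API_KEY from a local .env file

model = LiteLLMModel(model_id="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agent.run("Tell me about Microsoft")

The Agentic RAG section later in this article uses this same pattern to load the key.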
2. Accessing SmolAgents using Together
SmolAgents supports model access from various providers, including OpenAI, Hugging Face, Together, and LiteLLM.
If you have a Together account and plan to use its API keys, follow these steps:
- Visit Together.AI and sign in to your account or create a new one if needed.
- Go to the Models section to explore different open-source models.
- Navigate to the Dashboard to collect your API key.
- Use the following code to access the DeepSeek-R1 model from Together.
from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel

# Use the Together-hosted DeepSeek-R1 model
model = OpenAIServerModel(
    model_id="deepseek-ai/DeepSeek-R1",
    api_base="https://api.together.xyz/v1/",  # Leave this blank to query OpenAI servers.
    api_key="YOUR_TOGETHER_API_KEY",  # Switch to the API key for the server you're targeting.
)

# Create a code agent
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("Tell me about Microsoft")
- Running the app will generate the agent's response as output.
3. Creating task-oriented agents
The agent utilizes an LLM backend to process natural language queries, identify the appropriate tools (such as DuckDuckGoSearchTool or yfinance), and execute tasks within a secure, controlled environment. This setup offers flexibility in switching between different LLMs and integrating external tools, making SmolAgents a powerful framework for automating diverse workflows.
- Install the yfinance library using pip.
pip install yfinance
- Create a file named app.py and add the following code to it.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel
import yfinance as yf

# Initialize the LLM model with your API key
model = LiteLLMModel(model_id="gpt-4o", api_key="YOUR_OPENAI_API_KEY")

# Define the agent with tools and authorized imports
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    additional_authorized_imports=["yfinance"],
    model=model
)

# Run the agent to fetch the stock price of Apple Inc.
response = agent.run(
    "Fetch the stock price of Apple Inc. (NASDAQ: AAPL). Use the yfinance library."
)

# Output the response
print(response)
- Running the app will generate the agent's response as output.
Building Agentic RAG with SmolAgents
Building an Agentic Retrieval-Augmented Generation (RAG) system with SmolAgents enables the creation of AI agents that can autonomously make decisions and execute tasks. In this section, we will walk through the step-by-step process to construct an Agentic RAG using SmolAgents.
What is Agentic RAG?
Agentic Retrieval-Augmented Generation (Agentic RAG) is an advanced AI framework that enhances traditional RAG systems by incorporating autonomous decision-making and dynamic task execution.
In a standard RAG system, an AI model retrieves relevant information from external sources (e.g., databases, documents, or web searches) and then generates responses based on that data. However, Agentic RAG goes a step further by leveraging AI agents that can:
- Autonomously plan multi-step retrieval and reasoning workflows.
- Adapt dynamically to new information and modify queries accordingly.
- Utilize multiple tools, such as search APIs, vector databases, and LLM-powered reasoning, to improve accuracy.
With SmolAgents, an Agentic RAG system can coordinate multiple agents to retrieve, analyze, and synthesize information more effectively, making it ideal for applications like research assistants, automated customer support, and intelligent knowledge management.
To build an Agentic RAG system with SmolAgents, we start by processing a PDF document, splitting it into manageable chunks, and generating embeddings for semantic search. These embeddings are stored in a vector database, enabling efficient retrieval of relevant information. Additionally, a search agent is integrated to fetch external data when needed, ensuring the system provides comprehensive and accurate responses to user queries.
Installing dependencies
- Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate  # for Ubuntu/macOS
venv\Scripts\activate  # for Windows
- Install the smolagents, litellm, pypdf, faiss-cpu, langchain-community, langchain-openai, and python-dotenv libraries using pip.
pip install smolagents litellm pypdf faiss-cpu langchain-community langchain-openai python-dotenv
Importing OpenAI Key
Let's follow the steps below to import the OpenAI key into the code.
- Create a .env file and add your OpenAI API key to it as follows:
OPENAI_API_KEY=sk-proj-xcQxBf5LslO62AtawFQum9wM2NDrkJmdZfaoNfQIw...
- Import OpenAI key using the following code.
# Importing OpenAI key
import os
from dotenv import load_dotenv
load_dotenv()
openai_api_key = os.getenv('OPENAI_API_KEY')
Importing required libraries
Add the import statements for the required modules:
- PyPDFLoader for loading PDF documents
- FAISS for storing and searching vectors
- OpenAIEmbeddings for generating embeddings
- RecursiveCharacterTextSplitter for splitting text into smaller chunks
# Importing required libraries
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from smolagents import Tool, LiteLLMModel, DuckDuckGoSearchTool
Loading and Splitting the PDF Document
In this section, we use the Phi-3 Technical Report (2404.14219v4.pdf) as the reference document for the Retrieval-Augmented Generation (RAG) system.
The following code loads a PDF file, extracts its content, and splits it into smaller, manageable chunks to facilitate efficient processing.
- PyPDFLoader loads the PDF and extracts its pages as a list of document objects.
- RecursiveCharacterTextSplitter splits the document into smaller segments while maintaining contextual coherence:
  - chunk_size=1000: each chunk contains up to 1,000 characters.
  - chunk_overlap=200: adjacent chunks overlap by 200 characters to preserve context.
This preprocessing step ensures that the text remains structured and searchable for downstream tasks like embedding generation and retrieval.
# Loading the PDF
loader = PyPDFLoader("2404.14219v4.pdf")
pages = loader.load()

for page in pages:
    print(page.page_content)

# Split the pages into overlapping chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

# split_documents accepts a list of documents
splitted_docs = splitter.split_documents(pages)

print(len(splitted_docs))
print(splitted_docs[0])
Generating and Searching Document Embeddings
The following code initializes the OpenAIEmbeddings model and generates numerical embeddings for each chunk of text. These embeddings capture the semantic meaning of the content, enabling efficient similarity searches.
- Embedding Generation: The OpenAIEmbeddings model converts each text chunk into a high-dimensional vector representation.
- FAISS Vector Database: The FAISS library stores these embeddings and facilitates fast similarity searches.
- Similarity Search: The system retrieves the top 5 most relevant document chunks based on a given query.
- Retrieval Output: The most relevant chunk is printed, providing meaningful context for the query.
# Generate embeddings for the documents
embed_model = OpenAIEmbeddings(openai_api_key=openai_api_key)
embeddings = embed_model.embed_documents([chunk.page_content for chunk in splitted_docs])
print(f"Embeddings shape: {len(embeddings), len(embeddings[0])}")

# Build a FAISS index over the chunks
vector_db = FAISS.from_documents(
    documents=splitted_docs,
    embedding=embed_model,
)

# Retrieve the five chunks most similar to the query
similar_docs = vector_db.similarity_search("What is Phi-3 training methodology", k=5)
print(similar_docs[0].page_content)
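Because FAISS.from_documents embeds every chunk, rebuilding the index on each run costs time and API calls. LangChain's FAISS wrapper can persist the index to disk and reload it later; a small sketch, where the phi3_index folder name is only an assumption:

# Persist the index so embeddings aren't recomputed on every run
vector_db.save_local("phi3_index")

# Later, reload the saved index with the same embedding model
vector_db = FAISS.load_local(
    "phi3_index", embed_model, allow_dangerous_deserialization=True
)

The allow_dangerous_deserialization flag is required because the index metadata is stored with pickle, so only reload indexes you created yourself.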
Defining a Retriever Tool
RetrieverTool is a custom tool that leverages the FAISS vector database to perform semantic searches. It accepts a query as input and retrieves the most relevant document chunks based on their embeddings.
class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the parts of the documentation that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, vector_db, **kwargs):  # Accept the vector database as an argument
        super().__init__(**kwargs)
        self.vector_db = vector_db  # Store the vector database

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        docs = self.vector_db.similarity_search(query, k=4)  # Perform the search here
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )
retriever_tool = RetrieverTool(vector_db=vector_db) # Pass vector_db during instantiation
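Before wiring the tool into an agent, it is worth sanity-checking it on its own. smolagents Tool instances are callable, so a direct invocation (the query below is just an example) should print the retrieved chunks:

# Call the tool directly to verify retrieval works as expected
print(retriever_tool(query="Phi-3 training data and methodology"))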
Defining the agent
The agent is initialized with a LiteLLMModel as its language model and a DuckDuckGoSearchTool for web-based queries. A CodeAgent is then created, integrating both the retriever tool and the search tool to effectively process and respond to user queries.
from smolagents import CodeAgent

model = LiteLLMModel(model_id="gpt-4o", api_key=openai_api_key)
search_tool = DuckDuckGoSearchTool()

agent = CodeAgent(
    tools=[retriever_tool, search_tool], model=model, max_steps=6
)
Executing the agent
In SmolAgents, agent.run() is the method used to execute an agent with a given input query. The agent receives the input and determines the necessary actions to fulfill the request.
agent_output = agent.run("Tell me about Microsoft")
print(agent_output)
agent_output = agent.run("What is Phi-3 training methodology")
print(agent_output)
agent_output = agent.run("Summarize technical specifications of Phi-3")
print(agent_output)
SmolAgents, combined with Agentic RAG, provides a powerful framework for building intelligent and autonomous AI systems. By leveraging SmolAgents’ lightweight design and Agentic RAG’s dynamic retrieval capabilities, developers can create adaptable, secure, and scalable AI agents. This synergy enables efficient handling of complex tasks, making it ideal for applications in research, decision-making, and automation.
Thanks for reading this article!
Thanks Gowri M Bhatt for reviewing the content.
If you enjoyed this article, please click on the clap button 👏 and share to help others find it!
The full source code for this tutorial can be found here.