Building Smarter AI: Introduction to SmolAgents and Agentic RAG
2025 is shaping up to be the year of AI agents. With advancements in multi-agent orchestration frameworks, we are witnessing a paradigm shift in how AI systems operate. Mark Zuckerberg has openly stated that Meta is moving towards AI agents taking on mid-level engineering roles. Microsoft now owns three multi-agent orchestration frameworks (AutoGen, Magentic-One, and TinyTroupe), while OpenAI has introduced Swarm, and AWS has launched Multi-Agent Orchestrator alongside standalone frameworks like LangGraph and CrewAI. Even Hugging Face has now entered the space with its latest offering, SmolAgents.
SmolAgents is another addition to the rapidly growing multi-agent framework ecosystem, but it brings its own unique approach. Unlike other frameworks, SmolAgents is designed to be lightweight and easy to use, making it a compelling choice for developers looking to build intelligent, autonomous AI applications without excessive complexity.
In this article, we will explore the fundamentals of SmolAgents and demonstrate how it can be leveraged to build agentic Retrieval-Augmented Generation (RAG) systems.
Getting Started
Table of contents
- What are Agents?
- What is SmolAgents?
- Key Features of SmolAgents
- Components of SmolAgents
- Experimenting with SmolAgents
- 1. Creating search agents
- 2. Accessing SmolAgents using Together
- 3. Creating task-oriented agents
- Building Agentic RAG with SmolAgents
- What is Agentic RAG?
- Installing dependencies
- Importing OpenAI Key
- Importing required libraries
- Loading and Splitting the PDF Document
- Generating and Searching Document Embeddings
- Defining a Retriever Tool
- Defining the agent
- Executing the agent
- Resources
What are Agents?
AI Agents are autonomous systems capable of executing tasks on behalf of users or other systems by designing workflows and utilizing external tools like web searches and coding utilities. At their core, AI Agents rely on large language models (LLMs) integrated with backend tools to provide real-time, actionable insights. They act as a bridge between LLMs and the external world, enabling decision-making and task execution based on LLM outputs. The level of agency an AI agent possesses depends on how much control the system grants to the LLM. This agency exists on a spectrum, ranging from minimal influence to near-complete autonomy, allowing for flexible implementation based on specific use cases.
What is SmolAgents?
SmolAgents, a newly launched framework by Hugging Face, simplifies the creation of intelligent AI agents powered by large language models (LLMs). This lightweight library enables developers to build and integrate agents with minimal code, focusing on practicality and ease of use. With around a thousand lines of core logic, SmolAgents provides a streamlined interface for tasks like web searches and data retrieval while maintaining simplicity and minimal abstraction.
Key Features of SmolAgents
- Code-Centric Agents: Code Agents execute actions directly as Python code, offering greater accuracy and efficiency compared to traditional tool-based agents. Using code for tool actions is superior to JSON snippets because programming languages are designed for efficient execution, while JSON is merely a data format. Code allows composability through nesting, abstraction, and reuse, making complex workflows more manageable. It also supports object management, enabling seamless handling of outputs like generated images — something JSON lacks. Additionally, code provides greater flexibility, as it can express any computational task, whereas JSON is restricted to predefined structures.
- Local Python Interpreter: The CodeAgent executes LLM-generated code within a secure, custom environment using a purpose-built local Python interpreter instead of the default one. This ensures safety through controlled imports (only user-authorized libraries are allowed), strict operation limits that prevent infinite loops or excessive resource usage, and execution restricted to predefined actions, so no unexpected or unsafe code is run.
- E2B Code Executor: To enhance security, SmolAgents leverages E2B, a remote execution service that runs code in a sandboxed environment. This ensures all code is executed within an isolated container, safeguarding the local environment and providing strong protection.
from smolagents import CodeAgent, VisitWebpageTool, HfApiModel

agent = CodeAgent(
    tools=[VisitWebpageTool()],
    model=HfApiModel(),
    additional_authorized_imports=["requests", "markdownify"],
    use_e2b_executor=True
)

agent.run("What was Abraham Lincoln's favourite sport?")
- LLM-Agnostic: Seamlessly integrates with any LLM on the Hugging Face Hub and other popular models via LiteLLM.
- Autonomous Code Execution: Specializes in “Code Agents” that generate and execute code securely in sandboxed environments like E2B.
- Structured Tool Interaction: Uses a thought-action format for tool calls, improving structured output and API integration (a JSON-tool-call sketch follows this list).
- Extensive Integrations: Supports multiple LLM providers and offers a shared tool repository on Hugging Face Hub for enhanced flexibility.
- Adaptive Workflows: Empowers LLMs to define and control workflows dynamically, enabling complex task automation.
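As a point of comparison for the code-centric approach described above, smolagents also provides a ToolCallingAgent that emits traditional JSON tool calls instead of executable Python. A minimal sketch using the built-in search tool (the query is just an example):

from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel

# A ToolCallingAgent emits JSON tool calls rather than executable Python code
agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("Who is the current CEO of Microsoft?")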
Components of SmolAgents
- LLM Core: Powers the agent’s decision-making and actions.
- Tool Repository: A predefined list of tools available for task execution (a custom-tool sketch follows this list).
- Parser: Extracts actionable information from the LLM’s outputs.
- System Prompt: Guides the agent with clear instructions, ensuring alignment with the parser.
- Memory: Maintains context across iterations, essential for multi-step agents.
- Error Logging and Retry Mechanisms: Improves system resilience and efficiency.
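To make the tool repository concrete, here is a minimal sketch of a custom tool defined with the @tool decorator from smolagents; the function name, type hints, and docstring are what the system prompt and parser expose to the LLM core. The get_word_count tool is a hypothetical example, not something from the article:

from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_word_count(text: str) -> int:
    """Counts the number of words in a piece of text.

    Args:
        text: The text whose words should be counted.
    """
    return len(text.split())

# The tool repository is simply the list of tools handed to the agent
agent = CodeAgent(tools=[get_word_count], model=HfApiModel())

Once registered, the agent can call get_word_count from its generated code whenever a query requires it.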
Experimenting with SmolAgents
In this section, we will demonstrate how to set up and use a lightweight AI agent for task automation. The agent utilizes a large language model (LLM) alongside a search tool to handle tasks that require both computational reasoning and external data retrieval. By configuring different models and tools, the agent becomes highly adaptable for various applications, including research, content generation, and question-answering.
Let’s begin our exploration of SmolAgents with hands-on demonstrations.
1. Creating search agents
In this example, the agent is tasked with retrieving information about Microsoft, highlighting its ability to integrate AI reasoning with real-time web data.
Installing dependencies
- Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate  # for Ubuntu/macOS
venv\Scripts\activate  # for Windows
- Install the smolagents and python-dotenv libraries using pip.
pip install smolagents python-dotenv
- Create a file named app.py and use the code below to access the SmolAgents search tool.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel
# Choose which LLM engine to use
model = LiteLLMModel(model_id="gpt-4o", api_key="YOUR_OPENAI_API_KEY")
# Create a code agent
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agent.run("Tell me about Microsoft")
- Running the app will generate the agent's response as output.
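Hardcoding the API key is fine for a quick test, but since python-dotenv was installed above, you can keep the key out of the source file. A minimal variation, assuming the key is stored in a .env file under the name OPENAI_API_KEY:

import os
from dotenv import load_dotenv
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

load_dotenv()  # reads OPENAI_API_KEY from a local .env file

model = LiteLLMModel(model_id="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agent.run("Tell me about Microsoft")

The Agentic RAG section later in this article uses this same pattern to load the key.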
2. Accessing SmolAgents using Together
SmolAgents supports model access from various providers, including OpenAI, Hugging Face, Together, and LiteLLM.
If you have a Together account and plan to use its API keys, follow these steps:
- Visit Together.AI and sign in to your account or create a new one if needed.
- Go to the Models section to explore different open-source models.
- Navigate to the Dashboard to collect your API key.
- Use the following code to access the DeepSeek-R1 model from Together.
from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel

# Use the Together-hosted DeepSeek-R1 model
model = OpenAIServerModel(
    model_id="deepseek-ai/DeepSeek-R1",
    api_base="https://api.together.xyz/v1/",  # Leave this blank to query OpenAI servers.
    api_key="YOUR_TOGETHER_API_KEY",  # Switch to the API key for the server you're targeting.
)

# Create a code agent
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("Tell me about Microsoft")
- Running the app will generate the agent's response as output.
3. Creating task-oriented agents
The agent utilizes an LLM backend to process natural language queries, identify the appropriate tools (such as DuckDuckGoSearchTool or yfinance), and execute tasks within a secure, controlled environment. This setup offers flexibility in switching between different LLMs and integrating external tools, making SmolAgents a powerful framework for automating diverse workflows.
- Install the yfinance library using pip.
pip install yfinance
- Create a file named app.py and add the following code to it.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel
import yfinance as yf

# Initialize the LLM model with your API key
model = LiteLLMModel(model_id="gpt-4o", api_key="YOUR_OPENAI_API_KEY")

# Define the agent with tools and authorized imports
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    additional_authorized_imports=["yfinance"],
    model=model
)

# Run the agent to fetch the stock price of Apple Inc.
response = agent.run(
    "Fetch the stock price of Apple Inc. (NASDAQ: AAPL). Use the yfinance library."
)

# Output the response
print(response)
- Running the app will generate the agent's response as output.
Building Agentic RAG with SmolAgents
Building an Agentic Retrieval-Augmented Generation (RAG) system with SmolAgents enables the creation of AI agents that can autonomously make decisions and execute tasks. In this section, we will walk through the step-by-step process to construct an Agentic RAG using SmolAgents.
What is Agentic RAG?
Agentic Retrieval-Augmented Generation (Agentic RAG) is an advanced AI framework that enhances traditional RAG systems by incorporating autonomous decision-making and dynamic task execution.
In a standard RAG system, an AI model retrieves relevant information from external sources (e.g., databases, documents, or web searches) and then generates responses based on that data. However, Agentic RAG goes a step further by leveraging AI agents that can:
- Autonomously plan multi-step retrieval and reasoning workflows.
- Adapt dynamically to new information and modify queries accordingly.
- Utilize multiple tools, such as search APIs, vector databases, and LLM-powered reasoning, to improve accuracy.
With SmolAgents, an Agentic RAG system can coordinate multiple agents to retrieve, analyze, and synthesize information more effectively, making it ideal for applications like research assistants, automated customer support, and intelligent knowledge management.
To build an Agentic RAG system with SmolAgents, we start by processing a PDF document, splitting it into manageable chunks, and generating embeddings for semantic search. These embeddings are stored in a vector database, enabling efficient retrieval of relevant information. Additionally, a search agent is integrated to fetch external data when needed, ensuring the system provides comprehensive and accurate responses to user queries.
Installing dependencies
- Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate  # for Ubuntu/macOS
venv\Scripts\activate  # for Windows
- Install the smolagents, litellm, pypdf, faiss-cpu, langchain-community, langchain-openai, and python-dotenv libraries using pip.
pip install smolagents litellm pypdf faiss-cpu langchain-community langchain-openai python-dotenv
Importing OpenAI Key
Let's follow the steps below to import the OpenAI key into the code.
- Create a .env file and add your OpenAI API key to it as follows:
OPENAI_API_KEY=sk-proj-xcQxBf5LslO62AtawFQum9wM2NDrkJmdZfaoNfQIw...
- Import OpenAI key using the following code.
# Importing OpenAI key
import os
from dotenv import load_dotenv
load_dotenv()
openai_api_key = os.getenv('OPENAI_API_KEY')
Importing required libraries
Add the import statements for the required modules:
- PyPDFLoader for loading PDF documents
- FAISS for storing and searching vectors
- OpenAIEmbeddings for generating embeddings
- RecursiveCharacterTextSplitter for splitting text into smaller chunks
# Importing required libraries
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from smolagents import Tool, LiteLLMModel, DuckDuckGoSearchTool
Loading and Splitting the PDF Document
In this section, we use the Phi-3 Technical Report (2404.14219v4.pdf) as the reference document for the Retrieval-Augmented Generation (RAG) system.
The following code loads a PDF file, extracts its content, and splits it into smaller, manageable chunks to facilitate efficient processing.
- PyPDFLoader loads the PDF and extracts its pages as a list of document objects.
- RecursiveCharacterTextSplitter splits the document into smaller segments while maintaining contextual coherence:
  - chunk_size=1000: each chunk contains up to 1,000 characters.
  - chunk_overlap=200: adjacent chunks overlap by 200 characters to preserve context.
This preprocessing step ensures that the text remains structured and searchable for downstream tasks like embedding generation and retrieval.
# Loading the PDF
loader = PyPDFLoader("2404.14219v4.pdf")
pages = loader.load()

for page in pages:
    print(page.page_content)

# Split the pages into overlapping chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

# split_documents accepts a list of documents
splitted_docs = splitter.split_documents(pages)

print(len(splitted_docs))
print(splitted_docs[0])
Generating and Searching Document Embeddings
The following code initializes the OpenAIEmbeddings model and generates numerical embeddings for each chunk of text. These embeddings capture the semantic meaning of the content, enabling efficient similarity searches.
- Embedding Generation: The OpenAIEmbeddings model converts each text chunk into a high-dimensional vector representation.
- FAISS Vector Database: The FAISS library stores these embeddings and facilitates fast similarity searches.
- Similarity Search: The system retrieves the top 5 most relevant document chunks based on a given query.
- Retrieval Output: The most relevant chunk is printed, providing meaningful context for the query.
# Generate embeddings for the documents
embed_model = OpenAIEmbeddings(openai_api_key=openai_api_key)
embeddings = embed_model.embed_documents([chunk.page_content for chunk in splitted_docs])
print(f"Embeddings shape: {len(embeddings), len(embeddings[0])}")

# Build a FAISS index over the chunks
vector_db = FAISS.from_documents(
    documents=splitted_docs,
    embedding=embed_model,
)

# Retrieve the five chunks most similar to the query
similar_docs = vector_db.similarity_search("What is Phi-3 training methodology", k=5)
print(similar_docs[0].page_content)
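Because FAISS.from_documents embeds every chunk, rebuilding the index on each run costs time and API calls. LangChain's FAISS wrapper can persist the index to disk and reload it later; a small sketch, where the phi3_index folder name is only an assumption:

# Persist the index so embeddings aren't recomputed on every run
vector_db.save_local("phi3_index")

# Later, reload the saved index with the same embedding model
vector_db = FAISS.load_local(
    "phi3_index", embed_model, allow_dangerous_deserialization=True
)

The allow_dangerous_deserialization flag is required because the index metadata is stored with pickle, so only reload indexes you created yourself.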
Defining a Retriever Tool
RetrieverTool is a custom tool that leverages the FAISS vector database to perform semantic searches. It accepts a query as input and retrieves the most relevant document chunks based on their embeddings.
class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the parts of the documentation that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, vector_db, **kwargs):  # Accept the vector database as an argument
        super().__init__(**kwargs)
        self.vector_db = vector_db  # Store the vector database

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        docs = self.vector_db.similarity_search(query, k=4)  # Perform the search here
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )
retriever_tool = RetrieverTool(vector_db=vector_db) # Pass vector_db during instantiation
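Before wiring the tool into an agent, it is worth sanity-checking it on its own. smolagents Tool instances are callable, so a direct invocation (the query below is just an example) should print the retrieved chunks:

# Call the tool directly to verify retrieval works as expected
print(retriever_tool(query="Phi-3 training data and methodology"))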
Defining the agent
The agent is initialized with a LiteLLMModel as its language model and a DuckDuckGoSearchTool for web-based queries. A CodeAgent is then created, integrating both the retriever tool and the search tool to effectively process and respond to user queries.
from smolagents import CodeAgent

model = LiteLLMModel(model_id="gpt-4o", api_key=openai_api_key)
search_tool = DuckDuckGoSearchTool()

agent = CodeAgent(
    tools=[retriever_tool, search_tool], model=model, max_steps=6
)
Executing the agent
In SmolAgents, agent.run() is the method used to execute an agent with a given input query. The agent receives the input and determines the necessary actions to fulfill the request.
agent_output = agent.run("Tell me about Microsoft")
print(agent_output)
agent_output = agent.run("What is Phi-3 training methodology")
print(agent_output)
agent_output = agent.run("Summarize technical specifications of Phi-3")
print(agent_output)
SmolAgents, combined with Agentic RAG, provides a powerful framework for building intelligent and autonomous AI systems. By leveraging SmolAgents’ lightweight design and Agentic RAG’s dynamic retrieval capabilities, developers can create adaptable, secure, and scalable AI agents. This synergy enables efficient handling of complex tasks, making it ideal for applications in research, decision-making, and automation.
Thanks for reading this article!
Thanks Gowri M Bhatt for reviewing the content.
If you enjoyed this article, please click on the clap button 👏 and share to help others find it!
The full source code for this tutorial can be found here.