Democratizing chatbot applications: Your ultimate guide to building open source chatbots with FalconAI, LangChain, and Chainlit

Vishnu Sivan
11 min read · Aug 13, 2023

In the ever-evolving landscape of Artificial Intelligence, Generative AI has emerged as a transformative force, revolutionizing various industries with its ability to create new content and generate creative solutions. Generative Large Language Models (LLMs) have especially been at the forefront of this AI revolution, demonstrating remarkable capabilities that range from producing programmable codes to fully managing Chat Support Systems. However, a significant barrier to widespread adoption has been the limited accessibility of many LLMs, predominantly being closed-source and proprietary.

The lack of open-source alternatives has left developers and researchers yearning for more accessible and inclusive solutions. FalconAI, the latest addition to the world of LLMs, has made headlines by not only topping the OpenLLM leaderboard but also by being fully open-sourced. This groundbreaking development marks a significant step towards democratizing Generative AI technology and fostering a collaborative ecosystem for innovation.

In this article, we embark on a journey to explore the limitless possibilities of FalconAI and its seamless integration with two other powerful open-source tools, LangChain and Chainlit.

Getting Started

Table of contents

  • What is Falcon AI
  • Experimenting on Falcon LLM
  • What is LangChain
  • Creating Prompts in LangChain
  • What is Chainlit
  • Building an open source chatbot

What is Falcon AI

The Falcon LLM model is a robust, 40-billion-parameter AI model designed to seamlessly generate both natural language text and code. Developed by the Technology Innovation Institute (TII) in Abu Dhabi, Falcon AI is open sourced under the Apache 2.0 license. Falcon LLM has showcased its superiority over other models on the Open LLM leaderboard, an impressive testament to its capabilities. Its architecture is optimized for inference, incorporating innovative elements such as FlashAttention and multiquery attention.

Falcon also comes in Instruct versions, Falcon-7B-Instruct and Falcon-40B-Instruct, which are fine-tuned on conversational data. These can be used directly from Python to create chat applications.

Experimenting on Falcon LLM

Accessing and utilizing the Falcon LLM model is conveniently facilitated through the Hugging Face platform. The model is readily available for exploration, enabling researchers and practitioners to harness its potential. While it is possible to download and run the model on a local machine, it is recommended to use the Hugging Face APIs because of the high GPU memory required to run the model locally.
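
If you do want to try a local run, it helps to first check how much GPU memory your machine actually has. The snippet below is a small sketch for doing that with PyTorch; it assumes a CUDA-capable GPU and a CUDA-enabled torch installation.

import torch

if torch.cuda.is_available():
    # report the total memory of the first GPU in gigabytes
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, total memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA GPU detected; prefer the Hugging Face Inference API instead.")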

HuggingFace has created a dedicated space for the Falcon-40B-Instruct model called the Falcon-Chat demo. If you would like to try out Falcon-Chat, use the following link:

In this section, we will try out Falcon LLM both on a local machine and through the HuggingFace API using LangChain.

Method 1: Using Falcon LLM on a local machine

Let’s start by installing the required dependencies to download and use Falcon LLM on our machines.

  • Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
  • Install transformers, accelerate, einops, xformers libraries using pip.
pip install transformers accelerate einops xformers

The transformers package is used to download and work with state-of-the-art pre-trained models like Falcon. The accelerate package enables us to run PyTorch models irrespective of the system. The einops and xformers packages are additional dependencies used by the Falcon model.

  • Create a new file named falcon_local_test.py and add the following code to it.
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

sequences = pipeline(
    "How many colors are there in a rainbow?"
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Understanding the code:

  • Choose the model path for testing, favoring the Falcon-7B-Instruct model due to its lower GPU space requirements.
  • Store the link to the Falcon-7B-Instruct Large Language Model within the “model” variable.
  • Utilize the “from_pretrained()” method from the AutoTokenizer class in transformers to download the appropriate tokenizer for the selected model.
  • Supply the LLM path to the method to obtain the suitable Tokenizer tailored for the model.
  • Establish a pipeline, specifying essential parameters during creation, including the model in use and its intended function (e.g., “text-generation”).
  • Provide the pipeline object with pertinent tokenizer details and additional parameters.
  • Explore Falcon’s 7B instruct model output by inputting a query and observing the generated response.
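
If even the Falcon-7B-Instruct model in bfloat16 does not fit in your GPU memory, a common fallback is to load the weights in 8-bit. The snippet below is a minimal sketch of that approach, assuming the optional bitsandbytes package is installed (pip install bitsandbytes).

# Optional: load Falcon-7B-Instruct with 8-bit weights to reduce GPU memory usage.
# Assumes the bitsandbytes package is installed.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
    load_in_8bit=True,  # 8-bit quantization via bitsandbytes
)

inputs = tokenizer("How many colors are there in a rainbow?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))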

Method 2: Using Falcon LLM with LangChain

Conventional usage of the Falcon model involves downloading it onto a local machine and using it directly. However, this method demands significant GPU memory for optimal functionality. As an alternative approach, one can leverage the inference API provided by HuggingFace, which grants access to a comprehensive array of transformer models within the HuggingFace ecosystem.

Let’s start by creating a HuggingFace Access Token to use Falcon LLM with HuggingFacePipeline.

  • To utilize the HuggingFacePipeline, it’s essential to establish an account on the official HuggingFace website. Once registered, log in using your credentials, proceed to your profile, and navigate to the Settings section. From there, access the Access Token section in order to generate a new access token.
  • Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
  • Install langchain, huggingface_hub, transformers and python-dotenv libraries using pip.
pip install langchain huggingface_hub transformers python-dotenv
  • Create a new file named falcon_langchain_test.py and add the following code to it.
import os
from dotenv import load_dotenv
from langchain import HuggingFaceHub, PromptTemplate, LLMChain

# load the Hugging Face access token from the .env file
load_dotenv()
huggingfacehub_api_token = os.environ['HUGGINGFACEHUB_API_TOKEN']

repo_id = "tiiuae/falcon-7b-instruct"
llm = HuggingFaceHub(huggingfacehub_api_token=huggingfacehub_api_token,
                     repo_id=repo_id,
                     model_kwargs={"temperature": 0.6, "max_new_tokens": 2000})

template = """
You are an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Question: {query}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["query"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "How many colors are there in a rainbow?"

print(llm_chain.run(question))
  • Create a file named .env and add the huggingface api token to it.
HUGGINGFACEHUB_API_TOKEN=your-huggingface-api-key

Understanding the code:

  • Import the os and dotenv modules to read the access token, along with HuggingFaceHub, PromptTemplate, and LLMChain from the langchain library.
  • Import HuggingFaceHub from the langchain library. This component enables the invocation of the Falcon-7B-Instruct model via the Inference API, facilitating the reception of model-generated responses.
  • Read the HuggingFace Inference API token from the environment variable os.environ['HUGGINGFACEHUB_API_TOKEN'], which is loaded from the .env file.
  • Integrate the PromptTemplate, a vital LangChain element, crucial for constructing applications reliant on Large Language Models. This template dictates how the model comprehends user queries and the context within which it formulates responses.
  • Import LLMChain from LangChain, a module designed to seamlessly link diverse LangChain components together.
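
For context, the HuggingFaceHub wrapper is essentially sending HTTP requests to the Hugging Face Inference API on your behalf. The snippet below is a rough, hand-rolled equivalent using the requests library, shown only for illustration; the endpoint URL and payload follow the public Inference API conventions.

# Illustration: a direct call to the Hugging Face Inference API with requests,
# roughly what the HuggingFaceHub wrapper does behind the scenes.
import os
import requests
from dotenv import load_dotenv

load_dotenv()

API_URL = "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}"}

payload = {
    "inputs": "How many colors are there in a rainbow?",
    "parameters": {"temperature": 0.6, "max_new_tokens": 200},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())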

Running the app

Execute the following command to run the app.

python falcon_langchain_test.py

What is LangChain

LangChain is a robust and open-source framework that serves as a valuable tool for creating applications driven by large language models. It offers more than just typical API interactions; it’s designed to understand and interact with data, fostering connections with different data sources to enhance personalized and enriched experiences.

LangChain simplifies the development process for a range of applications, including chatbots, Generative Question-Answering (GQA), and summarization. Through the integration of various modules, it facilitates the crafting of distinctive applications centered around a Large Language Model (LLM).

Components

The fundamental concept of the library revolves around the ability to “chain” together distinct components; these chains can encompass multiple components from various modules:

  • Prompt templates: These templates serve as foundational structures for diverse prompt types.
  • LLMs: Large Language Models, such as GPT-3 and BLOOM, constitute a vital component within the chain.
  • Agents: Utilizing LLMs, agents determine appropriate actions to undertake.
  • Memory: The concept of memory, encompassing both short-term and long-term facets, plays a pivotal role in the overall framework.
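
Memory is the one component above that the rest of this article does not demonstrate, so here is a minimal sketch of it. It reuses the Falcon-7B-Instruct model through HuggingFaceHub purely as an assumed example; any LangChain-compatible LLM would work.

# A minimal sketch of LangChain memory: the conversation history is carried between calls.
# Assumes HUGGINGFACEHUB_API_TOKEN is already set in the environment.
from langchain import HuggingFaceHub
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = HuggingFaceHub(repo_id="tiiuae/falcon-7b-instruct",
                     model_kwargs={"temperature": 0.6, "max_new_tokens": 200})

conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(conversation.run("Hi, my name is Alice."))
print(conversation.run("What is my name?"))  # answered using the stored history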

Creating Prompts in LangChain

Let’s start by installing the required dependencies to use LangChain.

  • Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
  • Install langchain library using pip.
pip install langchain
  • Create a file named langchain_demo.py.
  • Import the PromptTemplate class and initialize a template like below,
from langchain import PromptTemplate

template = """Question: {question}

Answer: """
prompt = PromptTemplate(
    template=template,
    input_variables=['question']
)

# user question
question = "How many colors are there in a rainbow?"
  • The Hugging Face Hub endpoint in LangChain connects to the Hugging Face Hub and runs the models via their free inference endpoints. It requires a Hugging Face access token. Refer to the “Method 2: Using Falcon LLM with LangChain” section above to see how to generate the access token.
  • Create a file named .env and add the huggingface api token to it.
HUGGINGFACEHUB_API_TOKEN=your-huggingface-api-key
  • Add the following code to langchain_demo.py to load the access token.
import os
from dotenv import load_dotenv
load_dotenv()

huggingfacehub_api_token = os.getenv('HUGGINGFACEHUB_API_TOKEN')
  • Add the following code to langchain_demo.py to explore LangChain using the Hugging Face APIs.
from langchain import HuggingFaceHub, LLMChain

# initialize Hub LLM
hub_llm = HuggingFaceHub(
    repo_id='tiiuae/falcon-7b-instruct',
    model_kwargs={'temperature': 0.6}
)

# create prompt template > LLM chain
llm_chain = LLMChain(
    prompt=prompt,
    llm=hub_llm
)

# ask the user question about the rainbow
print(llm_chain.run(question))
  • The entire code is summarized below.
from langchain import PromptTemplate
from langchain import HuggingFaceHub, LLMChain

# access huggingface token from .env file
import os
from dotenv import load_dotenv
load_dotenv()

huggingfacehub_api_token = os.getenv('HUGGINGFACEHUB_API_TOKEN')

template = """Question: {question}

Answer: """
prompt = PromptTemplate(
    template=template,
    input_variables=['question']
)

# user question
question = "How many colors are there in a rainbow?"

# initialize Hub LLM
hub_llm = HuggingFaceHub(
    repo_id='tiiuae/falcon-7b-instruct',
    model_kwargs={'temperature': 0.6}
)

# create prompt template > LLM chain
llm_chain = LLMChain(
    prompt=prompt,
    llm=hub_llm
)

# ask the user question about the rainbow
print(llm_chain.run(question))
  • Run the app using the following command.
python langchain_demo.py

What is Chainlit

Chainlit is an open-source toolkit that simplifies the process of crafting user interfaces for chatbots that use large language models (LLMs). It is built on the React framework and comes with a variety of tools that enable users to build dynamic and engaging chatbot interactions. With Chainlit, users can create interfaces similar to ChatGPT. Notable features include the ability to show intermediary steps visually, manage and display various elements like images and text, and even deploy your chatbot to the cloud.

Let’s start by installing Chainlit.

  • Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
  • Install chainlit using pip.
pip install chainlit
  • Run the hello world app using the following command.
chainlit hello
  • Create a file named chainlit_demo.py and add the following code to it.
import chainlit as cl

@cl.on_message  # this function will be called every time a user inputs a message in the UI
async def main(message: str):
    # this is an intermediate step
    await cl.Message(author="Tool 1", content="Response from tool1", indent=1).send()

    # send back the final answer
    await cl.Message(content="This is the final answer").send()
  • Run the app using the following command.
chainlit run chainlit_demo.py -w
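
As mentioned earlier, Chainlit can also attach elements such as images to a message. The snippet below is a small sketch of that feature; the file name ./image.png is just a placeholder for any local image on your machine.

import chainlit as cl

@cl.on_message
async def main(message: str):
    # attach a local image to the reply (./image.png is a placeholder path)
    image = cl.Image(name="example", display="inline", path="./image.png")
    await cl.Message(content="Here is an image element.", elements=[image]).send()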

Building an open source chatbot

In this section, we will create an open-source Falcon-7B model chatbot.

  • As an initial step, install the required libraries in a new virtual environment.
  • Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
  • Install langchain, chainlit, huggingface_hub, python-dotenv using pip.
pip install langchain chainlit huggingface_hub python-dotenv
  • Create a file named app.py and add the following code to it.
  • Import the libraries and access the HuggingFace Inference API from the .env file.
import chainlit as cl
import os
from dotenv import load_dotenv
load_dotenv()

huggingfacehub_api_token = os.getenv('HUGGINGFACEHUB_API_TOKEN')

from langchain import HuggingFaceHub, PromptTemplate, LLMChain
  • Provide the model id to infer with the Falcon Instruct model through the HuggingFaceHub module. The id of this model can be found directly on the HuggingFace website (‘tiiuae/falcon-7b-instruct’).
repo_id = "tiiuae/falcon-7b-instruct"
llm = HuggingFaceHub(huggingfacehub_api_token=huggingfacehub_api_token,
                     repo_id=repo_id,
                     model_kwargs={"temperature": 0.6, "max_new_tokens": 2000})
  • The PromptTemplate acts as a guide for the model. It helps the model understand how to react when you ask a question and decide what answer to return based on that question.
template = """
You are an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

{question}

"""
  • Add the decorators (@cl.on_chat_start and @cl.on_message) from Chainlit for LangChain.
  • Add the prompt template as per the model details.
@cl.on_chat_start
def main():
    # Instantiate the chain for that user session
    prompt = PromptTemplate(template=template, input_variables=["question"])
    llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)

    # Store the chain in the user session
    cl.user_session.set("llm_chain", llm_chain)


@cl.on_message
async def main(message: str):
    # Retrieve the chain from the user session
    llm_chain = cl.user_session.get("llm_chain")  # type: LLMChain

    # Call the chain asynchronously
    res = await llm_chain.acall(message, callbacks=[cl.AsyncLangchainCallbackHandler()])

    # Do any post processing here

    # Send the response
    await cl.Message(content=res["text"]).send()
  • The entire source code of the chatbot is given below.
import chainlit as cl
import os
from dotenv import load_dotenv
load_dotenv()

huggingfacehub_api_token = os.getenv('HUGGINGFACEHUB_API_TOKEN')

from langchain import HuggingFaceHub, PromptTemplate, LLMChain

repo_id = "tiiuae/falcon-7b-instruct"
llm = HuggingFaceHub(huggingfacehub_api_token=huggingfacehub_api_token,
                     repo_id=repo_id,
                     model_kwargs={"temperature": 0.6, "max_new_tokens": 2000})

template = """
You are an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

{question}

"""


@cl.on_chat_start
def main():
    # Instantiate the chain for that user session
    prompt = PromptTemplate(template=template, input_variables=["question"])
    llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)

    # Store the chain in the user session
    cl.user_session.set("llm_chain", llm_chain)


@cl.on_message
async def main(message: str):
    # Retrieve the chain from the user session
    llm_chain = cl.user_session.get("llm_chain")  # type: LLMChain

    # Call the chain asynchronously
    res = await llm_chain.acall(message, callbacks=[cl.AsyncLangchainCallbackHandler()])

    # Do any post processing here

    # Send the response
    await cl.Message(content=res["text"]).send()

Running the app

Run the chatbot using the following command.

chainlit run app.py -w

Thanks for reading this article.

Thanks Gowri M Bhatt for reviewing the content.

If you enjoyed this article, please click on the clap button 👏 and share to help others find it!

The full source code for this tutorial can be found here,

Here are some useful references:

https://www.pinecone.io/learn/series/langchain/langchain-intro
