A practical guide to building a Retrieval-Augmented Generation chatbot using LangChain, Ollama, and Chroma — from zero to working chatbot in half an hour.

Build a fully local RAG chatbot using LangChain, Ollama, and Chroma — no cloud APIs, no monthly bills, and no machine learning experience required.
Retrieval-Augmented Generation (RAG) is a technique that lets a language model answer questions about information it was never trained on. Instead of relying solely on memorized knowledge — which is frozen in time and prone to hallucination — RAG gives the model a searchable knowledge base. When you ask a question, the system retrieves the most relevant documents, then feeds them to the LLM alongside your query. The model reads those documents and crafts an answer grounded in facts.
Think of it as an open-book exam. The student is capable, but they do not have every fact memorized. Hand them the right textbook pages, and suddenly they answer with precision. That is RAG: bridging general intelligence with domain-specific knowledge.
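To make the retrieve-then-answer flow concrete, here is a toy sketch in plain Python. It is only an illustration: word overlap stands in for real embedding similarity, no LLM is called, and the function names and sample knowledge base are invented for this example.

```python
# Toy illustration of retrieve-then-generate. Word overlap stands in for
# real embedding similarity, and no language model is called.

def score(question: str, document: str) -> int:
    """Count how many distinct question words also appear in the document."""
    question_words = set(question.lower().split())
    document_words = set(document.lower().split())
    return len(question_words & document_words)


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the question."""
    ranked = sorted(documents, key=lambda doc: score(question, doc), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Place the retrieved text next to the question, as a RAG system does."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base, made up for the example.
knowledge_base = [
    "The office wifi password rotates every 90 days.",
    "Expense reports are due on the fifth of each month.",
    "The cafeteria is closed on public holidays.",
]

question = "When are expense reports due?"
prompt = build_prompt(question, retrieve(question, knowledge_base, k=1))
print(prompt)
```

In the real pipeline, the scoring step is replaced by vector similarity and the finished prompt is sent to the model.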
In 2026, running LLMs locally has become remarkably accessible. You no longer need a data center budget. With the right open-source tools, you can build a RAG chatbot on your laptop in under 30 minutes.
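You will also need Python 3.10 or newer and these pip packages: langchain, langchain-community, langchain-ollama, langchain-chroma, and chromadb. The langchain-community package provides the document loaders used when you load your files.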
Install Ollama from ollama.com, then pull the models you need. Run ollama pull llama3.2 for text generation and ollama pull nomic-embed-text for embeddings. Verify with ollama run llama3.2 and a test prompt.
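Before going further, it can help to confirm that the prerequisites are in place. The sketch below is one possible check using only the standard library. The helper names are hypothetical, the import names are assumed to follow the usual hyphen-to-underscore mapping of the pip packages, and it assumes the Ollama installer puts an ollama executable on your PATH.

```python
import importlib.util
import shutil
import sys

# Assumed import names for the pip packages used in this guide.
# langchain_community is included because the document loaders live there.
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_ollama",
    "langchain_chroma",
    "chromadb",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report() -> list[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    for name in missing_modules(REQUIRED_MODULES):
        problems.append(f"Python module not installed: {name}")
    if shutil.which("ollama") is None:
        problems.append("ollama executable not found on PATH")
    return problems


for problem in environment_report():
    print("Problem:", problem)
```

If nothing is printed, the Python side is ready. The check does not confirm that the models themselves have been pulled.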
Create a folder called documents and drop in your files: PDFs, text files, markdown notes, or HTML pages. LangChain supports dozens of loaders. Use DirectoryLoader and TextLoader from langchain_community.document_loaders to scan your folder for plain-text and markdown files; PDFs and HTML pages need their own loader classes.
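A minimal loading sketch might look like the following. It assumes plain-text files with a .txt extension encoded as UTF-8. The helper names are hypothetical, and the LangChain import sits inside the function so the preview helper still works if LangChain is not installed yet.

```python
from pathlib import Path

DOCS_DIR = "documents"  # folder name used in this guide


def list_source_files(folder: str, pattern: str = "**/*.txt") -> list[Path]:
    """Preview which files a glob pattern matches before loading them."""
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


def load_documents(folder: str = DOCS_DIR, pattern: str = "**/*.txt"):
    """Load every matching file as a LangChain Document."""
    # Imported lazily so list_source_files works without LangChain installed.
    from langchain_community.document_loaders import DirectoryLoader, TextLoader

    loader = DirectoryLoader(
        folder,
        glob=pattern,
        loader_cls=TextLoader,
        loader_kwargs={"encoding": "utf-8"},  # assumption: files are UTF-8
    )
    return loader.load()
```

Calling load_documents() returns a list of Document objects, each holding the file text in page_content and the file path in metadata.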
Once loaded, split documents into chunks using RecursiveCharacterTextSplitter. Set a chunk size of around 1000 characters with a 200-character overlap. The overlap prevents sentences from losing context when split across boundaries. After splitting, you will have hundreds of small, self-contained text chunks ready for embedding.
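The sketch below shows the splitting step with the settings from the text. It also includes a deliberately simplified fixed-width chunker to make the overlap visible. That helper is not how RecursiveCharacterTextSplitter chooses boundaries (the real splitter prefers paragraph and sentence breaks), and the function names are hypothetical.

```python
CHUNK_SIZE = 1000    # characters per chunk, as suggested in the text
CHUNK_OVERLAP = 200  # characters shared by neighbouring chunks


def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Simplified stand-in: fixed-width windows that advance by size - overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]


def split_documents(documents):
    """Split LangChain Documents with the settings above."""
    # Imported lazily so the toy chunker runs without LangChain installed.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
    )
    return splitter.split_documents(documents)


# Each toy chunk repeats the last two characters of the previous one.
print(sliding_window_chunks("abcdefghij", size=4, overlap=2))
```

With real documents, pass the loaded Document list to split_documents and check len() on the result to see how many chunks you ended up with.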
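Import OllamaEmbeddings from langchain_ollama and initialize it with the model name nomic-embed-text. Calling its embed_documents method with a list of chunk texts returns one high-dimensional vector per chunk: a numerical fingerprint that captures semantic meaning. You rarely need to call it by hand, because the vector store invokes it for you when it indexes your chunks. Embedding a whole collection may take a minute or two, and it only has to run once as long as the results are saved to disk.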
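As a rough sketch of this step, the code below wraps the embedding call and adds a small cosine-similarity helper to show how two vectors can be compared. The helper names are hypothetical, and embed_texts assumes the Ollama server is running with nomic-embed-text already pulled.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for empty or all-zero vectors
    return dot / (norm_a * norm_b)


def embed_texts(texts: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    """Embed a list of strings with a local Ollama embedding model."""
    # Imported lazily; requires langchain-ollama and a running Ollama server.
    from langchain_ollama import OllamaEmbeddings

    embeddings = OllamaEmbeddings(model=model)
    return embeddings.embed_documents(texts)
```

For example, embedding two paraphrases and one unrelated sentence, then comparing them with cosine_similarity, should typically score the paraphrases closer to each other than to the unrelated text.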
Import Chroma from langchain_chroma. Use its from_documents class method, passing in your document chunks, your embedding function, and a persistence path like ./chroma_db. Chroma embeds each chunk, builds an index for fast similarity searches, and saves everything to disk. On later runs, reopen the saved index by constructing Chroma with the same persistence path and embedding function instead of rebuilding it. Test it by calling vectorstore.similarity_search("your query", k=3) to see the three most relevant chunks.
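One way to arrange this step is sketched below: build the index on the first run and reopen it afterwards. The helper names are hypothetical, and checking for a non-empty folder is a simple heuristic chosen for this example, not something Chroma requires.

```python
from pathlib import Path

PERSIST_DIR = "./chroma_db"  # persistence path suggested in the text


def index_exists(persist_dir: str = PERSIST_DIR) -> bool:
    """True if the persistence folder exists and contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def get_vectorstore(chunks, persist_dir: str = PERSIST_DIR):
    """Reopen a saved Chroma index, or build one from chunks on the first run."""
    # Imported lazily; requires langchain-chroma, langchain-ollama and Ollama.
    from langchain_chroma import Chroma
    from langchain_ollama import OllamaEmbeddings

    embeddings = OllamaEmbeddings(model="nomic-embed-text")
    if index_exists(persist_dir):
        return Chroma(persist_directory=persist_dir, embedding_function=embeddings)
    return Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=persist_dir,
    )
```

After vectorstore = get_vectorstore(chunks), a call such as vectorstore.similarity_search("your query", k=3) returns the three closest chunks as Document objects.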
Import ChatOllama from langchain_ollama to wrap your llama3.2 model. Create a prompt template using ChatPromptTemplate from langchain_core.prompts with two variables: context and question. Write a system message instructing the model to answer only from the provided context and to say "I don't know" when the answer is not found. This reduces hallucination noticeably, although it does not eliminate it.
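The prompt and model setup could be sketched as follows. The exact wording of the system message is illustrative, the helper names are hypothetical, and temperature=0 is an assumption made here to keep answers more repeatable, not something the text prescribes.

```python
# Illustrative wording; adjust it to suit your documents.
SYSTEM_MESSAGE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, reply with: I don't know.\n\n"
    "Context:\n{context}"
)
HUMAN_MESSAGE = "{question}"


def build_prompt():
    """Create a chat prompt with the two variables: context and question."""
    # Imported lazily; requires langchain-core (installed with langchain).
    from langchain_core.prompts import ChatPromptTemplate

    return ChatPromptTemplate.from_messages(
        [("system", SYSTEM_MESSAGE), ("human", HUMAN_MESSAGE)]
    )


def build_llm(model: str = "llama3.2"):
    """Wrap a local Ollama chat model."""
    # Imported lazily; requires langchain-ollama and a running Ollama server.
    from langchain_ollama import ChatOllama

    return ChatOllama(model=model, temperature=0)  # temperature is an assumption
```

Keeping the message text in module-level constants makes it easy to tweak the instructions without touching the rest of the pipeline.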
Now compose the full chain: retrieve relevant documents from Chroma, format them into a context string, insert both context and question into your prompt template, and send the result to the LLM. Use LangChain's pipe syntax together with RunnablePassthrough, which forwards the question unchanged, so that the entire pipeline becomes a single callable object.
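A sketch of that composition is shown below. It assumes you already have a Chroma vector store, a chat prompt with context and question variables, and a chat model. The names format_docs and build_chain are hypothetical, and k=3 is just a starting point.

```python
def format_docs(docs) -> str:
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_chain(vectorstore, prompt, llm, k: int = 3):
    """Compose retriever -> prompt -> model -> plain string output."""
    # Imported lazily; requires langchain-core (installed with langchain).
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough

    retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    return (
        {
            # The question goes to the retriever, and the hits become the context.
            "context": retriever | format_docs,
            # The same question is passed through unchanged.
            "question": RunnablePassthrough(),
        }
        | prompt
        | llm
        | StrOutputParser()
    )
```

With chain = build_chain(vectorstore, prompt, llm), a call such as chain.invoke("your question") runs retrieval and generation in one step and returns the answer as a string.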
Wrap everything in a simple loop using Python's input() function. Feed each question into your chain and print the response. Add a quit command, an optional sources flag to show retrieved chunks, and basic error handling for empty queries. That is it — fire up your script and ask questions about your documents. Answers arrive in seconds, all generated locally.
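The loop might look like the sketch below. It assumes chain is any object with an invoke method that takes the question and returns a string. The read and write parameters are an addition made here so the loop can be exercised without a keyboard, and the optional sources flag is left out for brevity.

```python
def chat_loop(chain, read=input, write=print) -> None:
    """Ask questions until the user types quit or exit."""
    while True:
        question = read("You: ").strip()
        if question.lower() in {"quit", "exit"}:
            break
        if not question:
            # Skip empty queries instead of sending them to the model.
            write("Please type a question, or 'quit' to leave.")
            continue
        try:
            answer = chain.invoke(question)
        except Exception as error:  # for example, the Ollama server is not running
            write(f"Something went wrong: {error}")
            continue
        write(f"Bot: {answer}")
```

In your script, end with chat_loop(chain) so the prompt appears once the index and model are ready.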
Everything in this stack runs locally: no API keys, no usage limits, and no documents leaving your machine. As open-source models improve through 2026, you can upgrade your chatbot simply by swapping in a newer model. Take the next 30 minutes and build something that would have seemed like science fiction a few years ago.
Tharun Ramagiri is a web developer, security researcher, and AI enthusiast exploring the intersection of LLMs and everyday technology. He writes about practical AI tools, cybersecurity awareness, and developer workflows that actually work.