We were trying to build a RAG-based chatbot for an internal use case using LangChain, LangGraph, Ollama, and ChromaDB. The core idea was straightforward: users upload documents, then query them in natural language to get useful, context-rich responses. In theory, everything should have worked smoothly.
But as we started putting it into practice, something felt off. The answers we were getting back from the chatbot weren't really useful — often vague, sometimes irrelevant. My first instinct was that the prompt might not be well-crafted. So, like any of us would, I spent some time refining it, experimenting with different phrasings and prompt structures.
Still, the output wasn't improving.
At this point, I suspected that maybe the problem wasn't the language model at all — maybe it was the retrieval layer. So I went one level deeper and began inspecting the queries being made to ChromaDB. I started issuing retrieval queries directly to Chroma to better understand what documents were actually being returned for a given input.
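To make that concrete, here's roughly what that kind of direct inspection looks like. This is a hedged sketch rather than the project's actual code, and the host, port, collection name, embedding model, and query text are all placeholders:

```python
import chromadb
import ollama

# Connect to the locally running ChromaDB instance (host/port are assumed defaults).
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_collection("documents")  # hypothetical collection name

# Embed the query with the same Ollama model the ingestion pipeline used,
# otherwise the distances are meaningless.
query = "How do I reset my VPN credentials?"
embedding = ollama.embeddings(model="nomic-embed-text", prompt=query)["embedding"]

# Ask Chroma for the nearest documents and look at what actually comes back.
results = collection.query(
    query_embeddings=[embedding],
    n_results=5,
    include=["documents", "distances", "metadatas"],
)
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"{dist:.4f}  {doc[:80]}")
```

Seeing the raw distances next to the matched text makes it obvious very quickly whether retrieval is pulling back anything sensible.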
And sure enough, the quality of retrieved documents wasn't great.
I figured the issue might be related to the vector search configuration itself. After diving into ChromaDB's documentation, particularly the section on HNSW configuration, I started tweaking parameters like `ef`, `M`, and the distance metric. With some trial and error, I managed to get better results, but only when testing against specific hardcoded queries.
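For context, in Chroma these knobs live in the collection metadata and are fixed when the collection is created, so tuning them means rebuilding the collection. Here's a minimal sketch of what that looks like, with illustrative values rather than the ones I settled on (the exact metadata keys can also vary between Chroma versions):

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)

# HNSW settings are set at collection creation time, so each experiment
# meant recreating the collection with different metadata.
collection = client.create_collection(
    name="documents_cosine",  # hypothetical name
    metadata={
        "hnsw:space": "cosine",       # distance metric: "l2", "cosine", or "ip"
        "hnsw:construction_ef": 200,  # ef used while building the index
        "hnsw:search_ef": 100,        # ef used at query time
        "hnsw:M": 32,                 # max number of graph neighbors per node
    },
)
```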
The problem was now shifting: how could I iterate quickly and visually explore the effects of these changes?
There just wasn't a good way to see what was happening under the hood in real time. I wanted something that let me view collections, inspect the documents being stored, and run queries with different embedding models. Ideally, it would also show me things like the similarity distance, the raw embeddings, and the matched content.
So I went looking for a tool that could help.
I found a couple of open-source UIs on GitHub that seemed relevant — chromadb-admin and chroma-ui. Both were promising. One had a decent UI and visual features, but didn't support querying. The other didn't support connecting to a locally deployed ChromaDB instance, which was a must for my use case.
That's when I decided to build my own.
I didn't want to over-engineer it, so I reached for the fastest and simplest tool I could think of — Gradio. It's an intuitive Python library for building web interfaces, especially useful for demos, internal tools, or quick experimentation. With a bit of Python and help from ChatGPT whenever I got stuck, I started putting together a lightweight interface.
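If you haven't used Gradio before, the appeal is how little code a working interface takes. Something in the spirit of this toy sketch (not the viewer's actual code) is enough to get a query box on screen:

```python
import gradio as gr

def run_query(query_text, n_results):
    # Placeholder: the real tool embeds the query via Ollama and queries ChromaDB.
    return f"Would fetch the top {int(n_results)} matches for: {query_text}"

with gr.Blocks(title="ChromaDB Viewer") as demo:
    query = gr.Textbox(label="Query")
    n_results = gr.Slider(1, 20, value=5, step=1, label="Number of results")
    output = gr.Textbox(label="Matches")
    gr.Button("Search").click(run_query, inputs=[query, n_results], outputs=output)

demo.launch()
```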
The result is chromadb-viewer — a minimal yet powerful web UI for querying ChromaDB collections using embeddings generated by Ollama models.
The viewer lets you configure your ChromaDB connection parameters dynamically, point to any locally running Ollama instance, and pick the embedding model of your choice. Once connected, you can browse collections, run semantic queries, and view results including the matched document, similarity score, and raw embedding vectors, all in real time.
One of the nice touches is that it automatically detects and shows the distance metric (like `l2`, `cosine`, or `ip`) being used in the collection's underlying HNSW index, so you always know how similarity is being calculated.
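Under the hood that detection is straightforward: the metric is stored in the collection's metadata, so the viewer just reads it back, roughly like the sketch below (the `hnsw:space` key and the `l2` default match current Chroma behavior, but treat both as version-dependent):

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_collection("documents")  # hypothetical collection name

# The HNSW distance metric is recorded in the collection metadata;
# if it was never set explicitly, Chroma defaults to "l2".
metric = (collection.metadata or {}).get("hnsw:space", "l2")
print(f"Similarity for this collection is computed with '{metric}'")
```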
For now, it's focused on local Chroma instances and Ollama embeddings, but I've got plans to expand it. I want to add support for cloud-deployed ChromaDB setups and allow users to switch between different embedding providers — not just Ollama. That way, it becomes more flexible for various real-world use cases.
If you're experimenting with RAG pipelines, vector databases, or just want an easier way to poke around your Chroma collections, this might help you too.
Feel free to check it out, give it a spin, and contribute if you'd like: nobleknightt/chromadb-viewer
Thanks for reading — more updates to come!