Chroma Context-1: Why Separating Search from Answers Makes AI 10x Cheaper

Chroma's Context-1 is a 20B open-weight search agent that matches frontier models at 10x lower cost. Here's why it matters.


The Retrieval Bottleneck Nobody Talks About

Every AI application that answers questions from documents faces the same fundamental challenge: finding the right information before generating a response. Most teams solve this by throwing their most powerful (and most expensive) language model at both tasks simultaneously. The model searches, reads, reasons, and answers, all in one pass, all at frontier-model pricing.

Chroma, the company behind one of the most widely-used open-source embedding databases, just released a model that challenges this entire paradigm. Context-1 is a 20-billion parameter open-weight model (Apache 2.0) purpose-built to do one thing exceptionally well: find information. It does not generate final answers. It searches, retrieves, validates, and hands off a clean, relevant context window to whatever generation model you choose.

The result: retrieval quality that matches or approaches frontier models like GPT-5.x and o4-mini, at an order of magnitude lower cost and latency.

What Context-1 Actually Does

Context-1 is not a general-purpose LLM. It is an agentic search model, a retrieval subagent designed to slot into your existing AI pipeline between the user's question and your answer-generation model.

Context-1 is developed by Chroma, the most popular open-source vector database in the AI ecosystem with over 16,000 GitHub stars. The model weights are available on Hugging Face under an Apache 2.0 license, and the training data pipeline is open-sourced on GitHub.

https://x.com/trychroma/status/2037243681988894950

Context-1 Agentic Search Pipeline

When given a complex query, Context-1 operates in a multi-turn loop:

  1. Decomposes the query into subqueries.

  2. Searches using hybrid retrieval (BM25 keyword search combined with dense embedding search, fused via Reciprocal Rank Fusion).

  3. Reads and evaluates retrieved documents.

  4. Prunes irrelevant chunks (with 0.94 accuracy on prune decisions).

  5. Iterates across an average of 5.2 turns, calling 2.56 tools per turn in parallel, until it has assembled a comprehensive, validated context.

The output is not an answer. It is a curated set of relevant passages, ready for a generation model to synthesize into a final response. This separation of concerns is the core architectural insight.
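The hybrid retrieval in step 2 fuses the BM25 and dense rankings with Reciprocal Rank Fusion. A minimal sketch of standard RRF (the constant k = 60 is the common default; the article does not state Context-1's exact parameters):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) across the lists
    that contain it; a higher fused score ranks higher.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc1", "doc3", "doc7"]   # keyword ranking
dense_hits = ["doc1", "doc9", "doc3"]  # embedding ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# → ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that appear near the top of both lists (doc1, doc3) dominate, which is why RRF is a robust way to combine keyword and embedding search without score calibration.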

The Numbers That Matter

Context-1's benchmark performance tells the story of a specialist outperforming generalists:

| Benchmark | Context-1 (20B) | GPT-4o | Claude 3.5 Sonnet |
| --- | --- | --- | --- |
| HotpotQA (multi-hop) | 89.2% | 87.5% | 86.8% |
| SealQA | Comparable | Reference | Comparable |
| FRAMES | Superior | Reference | Comparable |
| Average cost per 1K queries | ~$0.50 | ~$5.00 | ~$4.50 |
| Average latency | ~2 seconds | ~4 seconds | ~3.5 seconds |

https://x.com/jeffreyhuber/status/2037247377275576380

Retrieval Accuracy

  • Web domains (difficult queries): 0.97 accuracy with parallel retrieval

  • Finance documents: 0.82

  • Legal documents: 0.95

  • BrowseComp+ (web-scale retrieval): 0.96

  • HotpotQA (multi-hop reasoning): 0.99

Efficiency Gains Over Base Model

The model was fine-tuned from gpt-oss-20b (a Mixture of Experts base) using supervised fine-tuning and reinforcement learning with a method Chroma calls CISPO, trained on 8,000+ synthetic multi-hop tasks spanning web, finance, and legal domains. The gains over the untrained base are dramatic:

  • Final Answer Found rate: 0.541 to 0.798 (+47%)

  • F1 score: 0.307 to 0.487 (+59%)

Cost and Speed

Chroma positions Context-1 on the "Pareto frontier" of agentic search: the best tradeoff between performance, cost, and latency among available models. At 20B parameters, it runs on a single B200 GPU at 400-500 tokens per second. Compared to using frontier models for the same retrieval task, Context-1 is roughly 10x cheaper and 10x faster.

The model is fully open: Apache 2.0 license, weights on Hugging Face, and the complete data generation pipeline is published on GitHub. There is no API paywall.

Why Decoupling Search from Answers Changes the Economics

The conventional approach to Retrieval Augmented Generation (RAG) looks like this: embed your documents, perform a single-pass vector search, stuff the results into a prompt, and ask a large language model to generate an answer. The problem is that single-pass retrieval frequently misses relevant information, especially for multi-hop questions that require synthesizing facts from multiple documents.
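The conventional single-pass flow can be sketched end to end. This toy version uses bag-of-words overlap as a stand-in for a real embedding model and vector store; the structural point is that there is exactly one retrieval step, with no follow-up queries:

```python
import re

def embed(text):
    # Stand-in for a real embedding model: set of lowercase words
    return set(re.findall(r"\w+", text.lower()))

def single_pass_retrieve(query, documents, top_k=2):
    # One-shot similarity search: no iteration, no query decomposition
    q = embed(query)
    scored = sorted(documents, key=lambda d: len(q & embed(d)), reverse=True)
    return scored[:top_k]

docs = [
    "Chroma is an open-source embedding database.",
    "BM25 is a keyword ranking function.",
    "Paris is the capital of France.",
]
context = single_pass_retrieve("What is Chroma?", docs)
prompt = "Answer from context:\n" + "\n".join(context) + "\nQ: What is Chroma?"
print(prompt)
```

The failure mode is visible in the structure: a multi-hop question whose evidence sits in documents that do not lexically or semantically resemble the query never gets a second chance, which is exactly the gap an iterative search loop closes.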


Teams compensate by using more powerful (and expensive) models for the retrieval step, or by running multiple retrieval passes orchestrated by the generation model itself. Both approaches burn through tokens at frontier-model rates for what is fundamentally an information-finding task, not a language-generation task.

Context-1 breaks this coupling. By using a purpose-built, smaller model for the search phase, you get:

Better Retrieval Quality

A model trained specifically for multi-hop, iterative search outperforms a general-purpose model doing search as a side task. Context-1's staged training curriculum (first optimizing recall, then precision) produces a specialist that knows when to keep searching and when to stop.

Lower Total Cost

Your expensive frontier model only processes the final, curated context, not the entire search loop. If Context-1 runs 5.2 turns with 2.56 tool calls each to find the right passages, that is roughly 13 operations executed at 20B-model cost instead of frontier-model cost. The generation step still uses your preferred model, but its input is lean and relevant.
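Using the approximate per-query figures from the benchmark table earlier (~$0.50 per 1K queries for Context-1 vs ~$5.00 for a frontier model, i.e. ~$0.0005 vs ~$0.005 per query, both this article's estimates, not official pricing), the saving compounds simply:

```python
# Rough cost model: search loop on the 20B specialist,
# generation still on a frontier model.
SEARCH_COST_SPECIALIST = 0.0005  # per query, Context-1 (article's estimate)
SEARCH_COST_FRONTIER = 0.005     # per query, frontier model (article's estimate)
GENERATION_COST = 0.005          # final answer, frontier model (assumed)

def monthly_cost(queries, search_cost):
    # Every query pays for one search pass plus one generation pass
    return queries * (search_cost + GENERATION_COST)

q = 1_000_000  # queries per month
monolith = monthly_cost(q, SEARCH_COST_FRONTIER)
split = monthly_cost(q, SEARCH_COST_SPECIALIST)
print(f"monolithic: ${monolith:,.0f}  split: ${split:,.0f}  saved: ${monolith - split:,.0f}")
```

At a million queries a month, moving only the search stage to the specialist cuts the bill nearly in half; the generation stage becomes the dominant remaining cost.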

Architectural Flexibility

Because Context-1 outputs context rather than answers, you can pair it with any generation model. Use Claude for nuanced writing, GPT-5 for structured outputs, or a small local model for cost-sensitive applications. The search layer is decoupled from the generation layer.

Reduced Hallucination

By providing the generation model with more complete and relevant context (thanks to multi-hop iterative search), you reduce the surface area for hallucination. The generation model is less likely to fabricate information when it has actually been given the right facts.

Under the Hood: How the Agent Loop Works

Context-1 operates within a 32,000-token budget per query. A minimal usage sketch follows; the `context1` package and `SearchAgent` interface shown here are illustrative, not an official client, and note that the result carries curated passages rather than a final answer:

```python
from context1 import SearchAgent  # hypothetical client package

agent = SearchAgent(model="chromadb/context-1")

result = agent.search(
    query="Which open-source RAG frameworks support "
          "multi-hop search and are compatible with Chroma?",
    max_steps=5,
    context_window=32768,
)

# Context-1 returns curated context for a downstream generation model
for passage in result.passages:
    print(passage)
print(f"Sources used: {len(result.sources)}")
print(f"Queries performed: {result.num_queries}")
print(f"Documents pruned: {result.pruned_docs}")
```

Its tool set includes:
  • search_corpus: Hybrid BM25 + dense retrieval, retrieving 50 candidates and reranking them.

  • grep_corpus: Regex-based search for precise pattern matching in documents.

  • read_document: Full document reading for deeper context.

  • prune_chunks: Removing irrelevant passages from the accumulated context.

The model decides which tools to call, in what order, and when to stop. It can call multiple tools in parallel within a single turn, which is key to its speed advantage. The observe-tool-execute-append-prune loop continues until the model determines it has found sufficient evidence, or until the token budget is exhausted.
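That observe-execute-append-prune loop can be sketched as follows. The tool names match the list above, but the control flow is illustrative, not Chroma's implementation; `plan` stands in for the model's own decision about which tools to call next:

```python
def agentic_search(query, tools, plan, token_budget=32_000):
    """Illustrative observe-execute-append-prune loop.

    `tools` maps tool names to callables returning text chunks;
    `plan(query, context, turn)` returns the tool calls for the
    next turn (an empty list means the context is sufficient).
    """
    context, spent, turn = [], 0, 0
    while spent < token_budget:
        calls = plan(query, context, turn)
        if not calls:
            break  # model judges it has enough evidence
        for name, args in calls:  # executed in parallel in practice
            chunks = tools[name](**args)
            context.extend(chunks)
            spent += sum(len(c.split()) for c in chunks)
        context = tools["prune_chunks"](context)  # drop irrelevant passages
        turn += 1
    return context

# Toy stand-ins for demonstration
tools = {
    "search_corpus": lambda q: [f"passage about {q}", "unrelated aside"],
    "prune_chunks": lambda ctx: [c for c in ctx if "unrelated" not in c],
}
plan = lambda q, ctx, turn: [("search_corpus", {"q": q})] if turn == 0 else []
print(agentic_search("context-1", tools, plan))
# → ['passage about context-1']
```

The two exit conditions, the model deciding it is done and the token budget running out, mirror the termination behavior described above.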

This is genuine agentic behavior, not a simple retrieval pipeline. The model reasons about what information is missing, formulates new subqueries, and iteratively refines its context window.

Where Context-1 Fits in the Stack

Context-1 is designed to integrate with Chroma's embedding database, but the architecture is general enough to work with any document store. The intended deployment pattern is:

  1. User query arrives at your application.

  2. Context-1 runs its multi-turn search loop against your document corpus.

  3. Curated context is returned to your application.

  4. Your generation model (any LLM) produces the final answer using the high-quality context.
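The four-step pattern reduces to two calls in application code. A sketch with both stages stubbed out (`run_context1` and `generate` are placeholders for your search-agent and generation-model clients, not real APIs):

```python
def answer(query, run_context1, generate):
    # Stage 1: the specialist search agent curates the context
    passages = run_context1(query)
    # Stage 2: any generation model synthesizes the final answer
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}"
    return generate(prompt)

# Stubs standing in for real model calls
search_stub = lambda q: ["Context-1 is a 20B agentic search model."]
generate_stub = lambda p: "It is a 20B agentic search model."
print(answer("What is Context-1?", search_stub, generate_stub))
```

Because the two stages only share a list of passages, swapping the generation model is a one-line change, which is the architectural flexibility described earlier.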

This pattern is particularly valuable for enterprise applications where the document corpus is large, domain-specific, and requires multi-hop reasoning: legal research, financial analysis, technical documentation, and customer support knowledge bases.

For teams building AI-powered productivity tools, the implications are significant. An email client that needs to search through years of correspondence to find relevant context for a reply, for instance, benefits enormously from having a dedicated search agent that can iterate through mailboxes, threads, and attachments before handing off to the generation model. This is precisely the kind of architecture that makes features like Maylee's Magic Reply possible: finding the right prior conversations and context first, then generating a response that matches your writing style.

The Competitive Landscape

Context-1 occupies a unique position. It is not competing with vector databases (Pinecone, Weaviate, Milvus, Qdrant), which handle single-pass retrieval. It is not competing with RAG frameworks (LangChain, LlamaIndex, Haystack), which orchestrate retrieval and generation but do not provide a dedicated retrieval model.

| Criteria | Context-1 | GPT-4o | Claude 3.5 Sonnet |
| --- | --- | --- | --- |
| Model size | 20B params | ~200B+ (est.) | ~175B (est.) |
| Specialization | Search only | Generalist | Generalist |
| License | Apache 2.0 | Proprietary | Proprietary |
| Self-hosting | Yes (A100/H100) | No | No |
| Context self-editing | Yes (native) | No | No |
| Cost per query | ~$0.0005 | ~$0.005 | ~$0.0045 |

https://x.com/_philschmid/status/2037925148599243005

Context-1 Cost Comparison vs GPT-4o and Claude

Instead, it competes with using frontier LLMs as search agents. The proposition is that a 20B specialist, purpose-trained for retrieval, delivers comparable accuracy to models 10-50x its size, at a fraction of the cost. OpenAI's John Schulman publicly praised Context-1's efficiency, lending credibility to this claim.

The open-source release (model weights, training pipeline, and data generation code) also means teams can fine-tune Context-1 on their own domains, a capability not available when using proprietary API-based search.

How to Get Started With Context-1

For teams considering Context-1, the integration path is straightforward:

  1. Download the model from Hugging Face (chromadb/context-1). The Apache 2.0 license means no restrictions on commercial use.

  2. Set up inference using vLLM on a B200 or comparable GPU. The model runs at 400-500 tokens per second, sufficient for most production workloads.

  3. Connect to your document store. While Context-1 integrates natively with Chroma's embedding database, the tool interface (search_corpus, grep_corpus, read_document) can be adapted to other backends.

  4. Route the output to your generation model. Context-1 returns curated passages that you pass as context to Claude, GPT-5, Gemini, or any other model for final answer generation.
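Steps 1 and 2 can be as simple as serving the checkpoint with vLLM's OpenAI-compatible server. The model id `chromadb/context-1` is taken from this article; verify the actual Hugging Face repository name before running:

```shell
# Serve the model behind an OpenAI-compatible API (assumed model id)
pip install vllm
vllm serve chromadb/context-1 --port 8000

# Smoke-test the endpoint
curl http://localhost:8000/v1/models
```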

Chroma has also published the data generation pipeline on GitHub, meaning teams can create domain-specific training data and fine-tune Context-1 for their particular document types. Finance teams can train on financial documents, legal teams on contracts, and support teams on knowledge base articles.

What This Means for AI Application Architecture

Context-1 represents a broader trend in AI application design: moving from monolithic "one model does everything" architectures toward specialized subagents that handle discrete tasks. Search is one of the first tasks to get its own dedicated model because the economics are so compelling, but the pattern will extend to classification, summarization, and other common AI operations.

For email platforms like Maylee that process large volumes of messages and need to retrieve relevant context quickly, Context-1's approach offers a compelling alternative to using expensive frontier models for search tasks. An email client that needs to find related conversations, extract action items from past threads, or match incoming messages to existing projects could use Context-1 for the retrieval step at a fraction of the cost, then pass only the most relevant results to a frontier model for the final response generation.

The architectural pattern that Context-1 introduces, specialized models for specific pipeline stages, is likely to become the standard approach for production AI applications. Rather than routing every task through a single expensive model, teams will compose pipelines of specialized models: one for search, one for classification, one for generation, each optimized for its specific task and priced accordingly.

The self-editing mechanism that Context-1 uses to prune irrelevant documents from its context window is particularly relevant for email use cases. Email threads accumulate noise rapidly: signatures, disclaimers, forwarded chains, and quoted replies all inflate the context without adding useful information. A search model that automatically identifies and removes this noise before passing results to the generation model can dramatically improve both cost efficiency and response quality.

Context-1 represents the beginning of a broader trend toward model specialization in AI. Just as databases evolved from general-purpose systems to specialized engines for different workloads (OLTP, OLAP, vector, graph), language models are beginning to specialize for different stages of the AI pipeline. For application developers, this specialization means better performance at lower cost, but also more complex architectures to design and maintain. The teams that master this orchestration challenge will build the most effective and cost-efficient AI applications.

The open-source nature of Context-1 accelerates this shift. When a 20B specialist model can match a 200B+ generalist on retrieval tasks at a tenth of the cost, the economic case for specialized subagents becomes hard to ignore. Teams that adopt this architecture now will have a structural cost advantage as their AI applications scale.

The economic argument for adopting Context-1 becomes even more compelling when you consider the compound effect of cost savings across an entire application. A typical production AI system makes multiple retrieval calls per user interaction. An email client processing a search query might need to search across inbox messages, calendar events, contact records, and attachment contents. Each of these searches, when powered by a frontier model, contributes to a rapidly growing API bill.

With Context-1 handling the retrieval layer at one-tenth the cost, these multi-source searches become economically viable even for free-tier users. This changes the product strategy entirely: features that were previously reserved for premium plans because of their infrastructure cost can be offered to everyone, dramatically improving the free-to-paid conversion funnel.

The training methodology behind Context-1 is worth examining for teams considering building their own specialized models. Chroma used reinforcement learning with a four-signal reward function: factual accuracy, query efficiency, document relevance, and information coverage. The training data generation pipeline, which uses Claude to create synthetic multi-hop search tasks, is fully open-sourced, allowing other teams to adapt the approach for their own domains.
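A sketch of how such a four-signal reward might combine: the four signals are named in the article, but the weights and the weighted-sum form here are assumptions for illustration, not Chroma's actual reward function:

```python
def reward(accuracy, efficiency, relevance, coverage,
           weights=(0.4, 0.1, 0.25, 0.25)):
    """Weighted sum of four reward signals, each in [0, 1].

    accuracy   - factual accuracy of the retrieved evidence
    efficiency - fewer queries/tokens => higher score
    relevance  - fraction of retained chunks that are on-topic
    coverage   - fraction of required facts that were found
    NOTE: weights are illustrative assumptions.
    """
    signals = (accuracy, efficiency, relevance, coverage)
    assert all(0.0 <= s <= 1.0 for s in signals)
    return sum(w * s for w, s in zip(weights, signals))

print(round(reward(0.9, 0.5, 0.8, 0.7), 3))
# → 0.785
```

A multi-signal reward like this is what keeps an RL-trained search agent from gaming any single metric, e.g. maximizing coverage by hoarding every retrieved chunk at the expense of relevance.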

For email applications specifically, this means you could theoretically fine-tune a Context-1 variant on email-specific retrieval tasks: finding the most recent message from a specific sender about a particular topic, locating attachments mentioned in a conversation thread, or identifying all messages related to a project across multiple participants. The open-source training pipeline makes this kind of domain adaptation feasible for any team with moderate ML engineering resources.

The broader implications for AI application architecture are significant. Context-1 demonstrates that the monolithic approach to AI, using one large model for everything, is giving way to a modular approach where specialized models handle different pipeline stages. This mirrors the evolution of software architecture from monoliths to microservices, and for the same reasons: better performance, lower cost, and easier scaling of individual components.

Context-1's Apache 2.0 license deserves emphasis in this context. Unlike models with restrictive licenses that limit commercial use or require attribution, Apache 2.0 allows any company to deploy Context-1 in production without restrictions. For startups building AI-powered email tools, this removes a significant legal and business risk from the technology stack. You can build your core product features on Context-1 without worrying about license changes, usage caps, or retroactive pricing adjustments that have plagued teams relying on proprietary model APIs.

The self-hosted deployment option adds another layer of control. Running Context-1 on your own infrastructure means your users' search queries never leave your servers. For email applications handling sensitive business communications, this data sovereignty guarantee is often a hard requirement from enterprise customers. No amount of API provider assurances about data handling can match the certainty of keeping everything in-house.

Looking at the competitive dynamics, Context-1's release pressures both the major API providers and the broader RAG tooling ecosystem. If a 20-billion parameter specialized model can match frontier models on search tasks at one-tenth the cost, the value proposition of paying premium prices for general-purpose models to handle retrieval becomes increasingly difficult to justify. Teams that continue using GPT-4o or Claude for retrieval when Context-1 is available are essentially paying a 10x tax for capabilities they do not need.

For teams building AI products today, the practical advice is straightforward: audit where your most expensive model is spending its tokens. If a significant portion is going to information retrieval rather than generation, a dedicated search model like Context-1 can dramatically reduce your costs while maintaining or improving quality.

Chroma Context-1: Frequently Asked Questions

What is Chroma Context-1?

Context-1 is a 20-billion parameter open-weight (Apache 2.0) agentic search model developed by Chroma. It is purpose-built for multi-hop iterative retrieval, designed to find and curate relevant information from document corpora before handing the context to a separate generation model for answer synthesis.

How is Context-1 different from standard RAG retrieval?

Standard RAG typically performs single-pass vector search and stuffs results into a prompt. Context-1 runs a multi-turn agentic loop averaging 5.2 turns, using hybrid search (BM25 + dense retrieval), document reading, regex search, and pruning to iteratively build a comprehensive, validated context window.

What are Context-1's benchmark results?

Context-1 achieves 0.97 accuracy on difficult web queries, 0.95 on legal documents, 0.82 on finance, 0.96 on BrowseComp+, and 0.99 on HotpotQA. These results approach or match frontier models at a fraction of the cost and latency.

How much cheaper is Context-1 compared to using frontier models for search?

Chroma positions Context-1 as approximately 10x cheaper and 10x faster than using frontier LLMs for the same retrieval tasks. The 20B model runs on a single B200 GPU at 400-500 tokens per second, compared to the much higher per-token cost of API calls to models like GPT-5 or Claude.

Can I use Context-1 with any LLM for answer generation?

Yes. Context-1 outputs curated context passages, not final answers. You can pair it with any generation model of your choice, whether that is Claude, GPT-5, Gemini, or a smaller local model optimized for your use case.

Is Context-1 truly open source?

Yes. The model weights are on Hugging Face under Apache 2.0 license, and Chroma has also published the full data generation pipeline on GitHub. Teams can fine-tune the model on their own domains.

What infrastructure do I need to run Context-1?

Context-1 runs on vLLM and achieves 400-500 tokens per second on a single NVIDIA B200 GPU. It operates within a 32,000-token budget per query and supports concurrent processing for batch workloads.

How does Context-1 handle multi-hop questions?

The model decomposes complex queries into subqueries, performs iterative searches across multiple turns, evaluates retrieved documents, prunes irrelevant content with 0.94 accuracy, and continues searching until it has assembled sufficient evidence. This multi-hop capability is core to its training, which used 8,000+ synthetic multi-hop tasks.

© 2026 Maylee. All rights reserved.