Vercel Knowledge Agents: Building AI That Understands Your Data Without Embeddings

Vercel Labs releases an open-source knowledge agent template that replaces RAG and vector databases with grep, find, and cat for simpler AI search.


The Case Against Embeddings


If you have tried to build an AI system that answers questions about your own data, you have almost certainly encountered RAG: Retrieval-Augmented Generation. The standard playbook involves splitting your documents into chunks, converting those chunks into vector embeddings, storing them in a vector database, and then retrieving relevant chunks to feed into an LLM when a user asks a question.

RAG works. But it also fails in ways that are maddeningly difficult to debug. When the system returns a wrong answer, is it because the embedding model misunderstood the content? Because the chunking strategy split a critical concept across two fragments? Because the similarity search retrieved the wrong chunks? Because the context window was not large enough to include all relevant information?

Vercel Labs has released an open-source template that sidesteps this entire complexity stack. The Vercel Knowledge Agent Template, announced in March 2026 by Ben Sabic, builds knowledge agents using nothing more than grep, find, and cat executed via bash in isolated Vercel Sandboxes. No embeddings. No vector databases. No chunking strategies. Just an LLM using the same filesystem tools that developers have relied on for decades.

How It Works: Files and Bash Commands


The architecture is disarmingly simple. Sources like GitHub repositories, YouTube transcripts, and documentation are added through an admin UI and stored in a Postgres database. These sources are synced to a snapshot repository via Vercel Workflow. When an agent needs to answer a question, it loads this snapshot into an isolated Vercel Sandbox and uses bash and bash_batch tools to perform filesystem operations.

The agent's toolset consists of three Unix commands:

  • grep -r to search file contents for patterns and keywords

  • find to locate files by name, type, or other attributes

  • cat to read file contents

That is it. The agent formulates search strategies, executes them against the filesystem, reads the results, and synthesizes answers. There is no embedding step, no vector similarity computation, and no retrieval pipeline to configure or maintain.
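Under stated assumptions (a local directory standing in for the synced snapshot; the wrapper functions and file names are illustrative, not the template's actual API), the entire retrieval loop can be sketched as thin wrappers over child processes:

```typescript
// Minimal sketch of the three-tool retrieval loop: grep to search,
// find to locate, cat to read. Paths and helpers are illustrative.
import { execFileSync } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the snapshot the agent browses at query time.
const snapshot = mkdtempSync(join(tmpdir(), "snapshot-"));
writeFileSync(join(snapshot, "billing.md"), "Refunds are issued within 14 days.\n");
writeFileSync(join(snapshot, "setup.md"), "Run the deploy command to ship.\n");

// grep -r: search file contents for a keyword or pattern.
function grep(pattern: string): string {
  return execFileSync("grep", ["-rn", pattern, snapshot], { encoding: "utf8" });
}

// find: locate files by name.
function find(name: string): string {
  return execFileSync("find", [snapshot, "-name", name], { encoding: "utf8" });
}

// cat: read a file the search surfaced.
function cat(path: string): string {
  return execFileSync("cat", [path], { encoding: "utf8" });
}

const hits = grep("Refunds");           // which files mention refunds?
const file = find("billing.md").trim(); // locate the file by name
console.log(hits.includes("billing.md")); // true: grep found the fact
console.log(cat(file));                 // read the full source for the answer
```

The LLM's role in the real system is to choose the patterns and files; the mechanics are no more exotic than the three calls above.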

This approach works because modern LLMs have extensive training data from code repositories and terminal interactions. They already know how to use filesystem tools effectively. Rather than building an entirely new retrieval infrastructure, the Vercel template leverages skills the model already has.

The Architecture in Detail

Source Management

Sources are managed through an admin UI connected to a Postgres database. Each source is synced to a snapshot that the agent can browse at query time. This design decouples source management from the query engine, making it straightforward to add, remove, or update data sources without touching the agent logic.

Sandboxed Execution

Every agent query runs in an isolated Vercel Sandbox. This isolation ensures that agents cannot interfere with each other, cannot access data outside their snapshot, and cannot persist changes between queries. The sandbox model also provides deterministic traces for debugging, meaning you can replay exactly what the agent did for any given query.

Complexity Router

Not every question needs the same model. The template includes a complexity router that classifies incoming queries and routes them to optimal models via Vercel's AI Gateway. Simple factual lookups can use cheaper, faster models, while complex analytical questions get routed to more capable (and expensive) models. This is a straightforward but effective cost optimization.
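As a sketch of the idea (the model IDs, heuristic, and function names below are invented for illustration; the template's actual router classifies queries with a model via the AI Gateway), a complexity router is just a classifier that maps query features to model tiers:

```typescript
// Hypothetical complexity router: classify a query, pick a model tier.
// Model names and the keyword heuristic are illustrative assumptions.
type Tier = "fast" | "capable";

const MODELS: Record<Tier, string> = {
  fast: "example/small-model",    // cheap and quick, for simple lookups
  capable: "example/large-model", // expensive, for analytical questions
};

// Crude stand-in for an LLM-based classifier: long or analytical
// questions route to the capable tier, everything else stays cheap.
function routeQuery(query: string): string {
  const analytical = /\b(why|compare|explain|analyze|trade-?offs?)\b/i.test(query);
  const tier: Tier =
    analytical || query.split(/\s+/).length > 20 ? "capable" : "fast";
  return MODELS[tier];
}

console.log(routeQuery("What port does the server use?"));
// -> example/small-model
console.log(routeQuery("Compare the failure modes of RAG and grep search"));
// -> example/large-model
```

The design point is that routing happens before any retrieval work, so the cheap path stays cheap end to end.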

Admin and Observability

The admin interface includes statistics, logs, and an AI admin agent with capabilities like query_stats and run_sql for monitoring and managing the system. This is not a black box; operators can see exactly what queries are being run, how the agent searches for information, and where answers come from.

Multi-Platform Deployment

The template uses Vercel's Chat SDK with adapters for multiple platforms. You can deploy the knowledge agent as a web chat interface, a GitHub bot, a Discord bot, or a Slack integration. This is built with the AI SDK and structured as a Next.js application, with @savoir/sdk providing the tool interface.


Why This Approach Beats RAG for Many Use Cases

The advantages of the filesystem approach over traditional RAG become clear when you consider the failure modes of each system.

Debugging is Transparent

When a RAG system gives a wrong answer, the debugging process is opaque. You need to check whether the right chunks were retrieved, whether the embeddings captured the semantic meaning correctly, and whether the similarity threshold was appropriate. With the filesystem approach, you can see exactly which grep commands the agent ran, which files it read, and how it synthesized the answer. If the answer is wrong, you fix it by editing the source files or adjusting the search strategy, not by retuning an embedding model.

No Silent Failures

One of the most insidious problems with embedding-based retrieval is silent failure. The system retrieves chunks that seem relevant based on vector similarity but miss the actual answer. The user gets a confident-sounding but incorrect response, and there is no easy way to detect this happened. Filesystem search is more explicit: either grep found the content or it did not.

Simpler Maintenance

RAG systems require ongoing maintenance of the embedding pipeline: rechunking when documents change, re-embedding when you switch embedding models, managing vector database capacity and performance, and tuning retrieval parameters. The filesystem approach requires none of this. Add a file, and it is searchable. Update a file, and the updates are immediately available.

Cost Efficiency

Vercel reports that one deployment reduced the cost of a sales call agent from $1.00 to $0.25 per call, a 75 percent reduction, while also improving answer quality. The cost savings come from eliminating the vector database infrastructure and the embedding computation pipeline, plus the complexity router's ability to use cheaper models for simple queries.

When RAG Still Makes Sense

To be fair, the filesystem approach is not universally superior. RAG has genuine advantages for certain use cases.

If your data corpus is massive (hundreds of millions of documents), filesystem search may not scale as efficiently as purpose-built vector indexes. If you need fuzzy semantic matching where exact keywords are not present in the source material, embeddings can surface related content that grep would miss. If your queries require understanding nuanced relationships between concepts that are not co-located in documents, vector similarity can outperform keyword search.

However, many real-world knowledge agent deployments do not have these requirements. Corporate documentation, product knowledge bases, code repositories, and support ticket archives are all well-suited to filesystem-based search. The documents are a manageable size, the relevant information usually contains recognizable keywords, and the structure of the content provides natural context.

Building Your Own Knowledge Agent

The template is open source and available in the vercel-labs/knowledge-agent-template GitHub repository. Deployment is a one-click process on Vercel, making it one of the fastest paths from zero to a functioning knowledge agent.

Step 1: Define Your Sources

Through the admin UI, add the repositories, documents, or content sources you want the agent to search. These are stored in Postgres and synced to the agent's filesystem snapshot.

Step 2: Configure the Complexity Router

Set up routing rules that direct queries to appropriate models. Simple questions can use smaller, cheaper models while complex research questions get premium model access. This configuration is done through the AI Gateway.

Step 3: Deploy and Test

Deploy the application on Vercel and test with representative queries. The deterministic trace logging makes it easy to see how the agent searches for answers and where it might need help.

Step 4: Iterate on Sources, Not Embeddings

When the agent gives a wrong answer, the fix is intuitive. Either the source material does not contain the answer (add it) or the agent's search strategy did not find it (adjust the content or structure). There is no embedding space to retune, no chunking strategy to rethink, and no similarity threshold to adjust.
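One way to make that iteration loop concrete (a sketch under assumptions: the "knowledge gap" check below is not part of the template, and the directory layout is invented) is a small regression test that greps the sources for the keywords an answer depends on, before blaming the agent:

```typescript
// Sketch of a "knowledge gap" check: verify the source snapshot
// actually contains the facts a query depends on. Layout and
// keywords are illustrative.
import { execFileSync } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const sources = mkdtempSync(join(tmpdir(), "sources-"));
writeFileSync(join(sources, "pricing.md"), "The Pro plan costs $20/month.\n");

// Returns true if any source file mentions the keyword.
function sourcesMention(keyword: string): boolean {
  try {
    execFileSync("grep", ["-riq", keyword, sources]);
    return true; // grep exits 0 on a match
  } catch {
    return false; // grep exits 1 when nothing matches
  }
}

console.log(sourcesMention("Pro plan"));  // true: the answer exists
console.log(sourcesMention("SLA terms")); // false: add a source file
```

If the keyword is missing, the fix is a content fix: add or restructure a source file, then the check (and the agent) immediately sees it.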


Practical Applications

Internal Knowledge Management

Companies accumulate documentation across wikis, Confluence pages, Notion databases, Slack threads, and email archives. A filesystem-based knowledge agent can search across all of these once they are synced to the source repository, providing a unified search experience without the complexity of maintaining separate embedding indexes for each source.

Developer Documentation Bots

Open-source projects and internal platforms can deploy a knowledge agent against their documentation repositories. As docs are updated in Git, the agent's knowledge updates automatically. This is significantly simpler than maintaining a RAG pipeline that needs to re-embed on every commit.

Customer Support

The 75 percent cost reduction Vercel reported for a sales call agent is particularly relevant for customer support applications. These agents handle high query volumes where even small per-query cost reductions compound into significant savings. The improved debugging transparency also means support teams can more easily identify and fix knowledge gaps.

Email and Communication Intelligence

The principle of searching existing data rather than transforming it into embeddings applies broadly to communication tools. Rather than embedding email histories to build context, AI systems can search message archives directly for relevant precedents and context. Maylee takes a similar philosophy with its AI Labels feature, which auto-classifies incoming emails based on user-defined rules rather than requiring users to maintain complex filtering systems, proving that simpler approaches often outperform over-engineered ones.

The Bigger Trend: Simplicity Over Complexity

The Vercel Knowledge Agent Template is part of a broader counter-movement in AI development. After years of increasingly complex infrastructure stacks, there is growing recognition that simpler architectures often deliver better results at lower cost.

This does not mean embeddings and vector databases are obsolete. They remain the right tool for certain problems. But the reflexive assumption that every knowledge-intensive AI application needs RAG is being challenged. The filesystem approach demonstrates that LLMs are capable search agents on their own, and sometimes the best retrieval strategy is to let the model search through files the same way a developer would.

For teams evaluating how to build AI systems that understand their data, the Vercel template offers a compelling starting point. It is open source, deploys in minutes, costs less to run than most RAG alternatives, and is dramatically easier to debug and maintain. That combination of simplicity, transparency, and economy makes it worth serious consideration, even if your eventual architecture evolves into something more complex.

The knowledge agent space is moving fast. What is not moving is the fundamental insight at the heart of this approach: the best AI system is often the simplest one that solves the problem.

Frequently Asked Questions

What is the Vercel Knowledge Agent Template?

It is an open-source template from Vercel Labs for building AI knowledge agents that search your data using grep, find, and cat commands instead of embeddings or vector databases. It runs in isolated Vercel Sandboxes and can be deployed as web chat, GitHub, Discord, or Slack bots.

How does it work without embeddings or a vector database?

Sources are synced to a filesystem snapshot. When a query arrives, the agent runs bash commands like grep -r, find, and cat to search through files, read content, and synthesize answers. Modern LLMs are already trained on code and terminal interactions, so they use these tools effectively.

What cost savings does this approach offer?

Vercel reports that one deployment reduced sales call agent costs from $1.00 to $0.25 per call, a 75 percent reduction, while also improving answer quality. Savings come from eliminating vector database infrastructure and using a complexity router to match query difficulty to model cost.

When should I still use RAG with embeddings?

RAG remains better for very large corpora of hundreds of millions of documents, queries requiring fuzzy semantic matching without exact keywords, and use cases needing nuanced understanding of relationships between concepts not co-located in source documents.

How do I fix wrong answers from the knowledge agent?

Debugging is transparent. You can see exactly which grep commands the agent ran and which files it read. Fix wrong answers by editing source files or improving content structure, rather than retuning embeddings or adjusting chunking strategies.

What platforms can I deploy the knowledge agent to?

The template supports deployment as a web chat interface, GitHub bot, Discord bot, or Slack integration through Vercel's Chat SDK adapters. It is built as a Next.js application using the AI SDK.

Is the template free to use?

The template itself is open source. Costs come from the underlying Vercel Sandbox and AI SDK usage, which are usage-based. There is no separate pricing for the template.

How does the complexity router work?

The complexity router classifies incoming queries by difficulty and routes them to appropriate AI models via Vercel's AI Gateway. Simple factual lookups use cheaper, faster models while complex analytical questions get routed to more capable models, optimizing cost without sacrificing quality.
