NVIDIA Agent Toolkit: The Enterprise Platform for Running AI Agents at Scale

NVIDIA launches Agent Toolkit with OpenShell, NeMo Agent Toolkit, and AI-Q Blueprint for building secure, observable enterprise AI agents.


From GPU Maker to Agent Platform Provider


NVIDIA announced Agent Toolkit at GTC 2026, marking the company's most significant move beyond hardware into the AI application layer. This is not a single product but a comprehensive open-source platform for building, running, evaluating, and optimizing autonomous AI agents with enterprise-grade safety, security, and cost controls.

The timing is deliberate. Enterprises have moved past the question of whether AI agents work. The urgent question now is whether AI agents can be deployed safely, monitored effectively, and scaled economically. NVIDIA Agent Toolkit addresses all three concerns through a collection of complementary components: OpenShell for secure agent runtime, NeMo Agent Toolkit (NAT) for observability and optimization, and AI-Q Blueprint as a reference architecture for deep research agents.

The adoption signals are strong. Adobe, Atlassian, Amdocs, Box, Cadence, Cisco, Cohesity, CrowdStrike, Dassault Systèmes, IQVIA, Red Hat, SAP, Salesforce, Siemens, ServiceNow, and Synopsys are among the enterprise platforms integrating components from the toolkit.

OpenShell: Where Autonomous Agents Meet Security Policy


OpenShell is the most consequential component of the toolkit. It is an open-source runtime layer that sits between the agent and the infrastructure, governing what the agent can do and where inference is routed. Its explicit goal is to make autonomous, long-running agents safer to deploy in enterprise contexts.

The Security Model

OpenShell operates on a deny-by-default principle. Every action an agent attempts, whether filesystem access, network calls, or process execution, must be explicitly permitted by policy. This is fundamentally different from the prompt-based guardrails that most agent frameworks rely on. Policies are enforced out-of-process, meaning the control logic lives outside the agent itself. Even if an agent is compromised or hallucinates dangerous actions, the enforcement layer cannot be overridden by the agent.

Policies can be updated live at sandbox scope without restarting the agent. Every allow or deny decision is logged to a complete audit trail. Agents can propose policy updates for human approval, creating a collaborative governance model rather than a purely restrictive one.

Enforcement Granularity

The policy engine operates across filesystem, network, and process layers with granular checks at the binary, destination, method, and path level. This means you can allow an agent to read files in a specific directory but not write to it, permit API calls to certain endpoints but block others, and restrict which processes the agent can spawn.
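The deny-by-default model described above can be illustrated with a short sketch. This is not OpenShell's actual policy format or API, which the source does not show; the rule shape, field names, and matching logic here are all hypothetical, chosen only to make the principle concrete: nothing is permitted unless an explicit rule matches the layer, action, and target.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    layer: str    # "filesystem", "network", or "process"
    action: str   # e.g. "read", "write", "connect", "spawn"
    target: str   # path prefix, endpoint prefix, or binary name

def is_allowed(rules, layer, action, target):
    """Deny-by-default: permit only if an explicit rule matches."""
    return any(
        r.layer == layer and r.action == action and target.startswith(r.target)
        for r in rules
    )

# Hypothetical policy: read-only access to one directory, one approved endpoint.
rules = [
    Rule("filesystem", "read", "/srv/data/"),
    Rule("network", "connect", "https://api.internal/"),
]

print(is_allowed(rules, "filesystem", "read", "/srv/data/report.csv"))   # True
print(is_allowed(rules, "filesystem", "write", "/srv/data/report.csv"))  # False
print(is_allowed(rules, "network", "connect", "https://example.com/"))   # False
```

Note how the same directory yields different answers for read and write: the default answer is always "deny", and the policy only carves out exceptions.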

Privacy Router

A particularly enterprise-relevant feature is the privacy router, which keeps sensitive context local with open models and routes to frontier models only when policy allows. The routing decisions are driven by cost and privacy policy rather than left to the agent's discretion. This means organizations can use powerful cloud-based models for general reasoning while ensuring sensitive data never leaves their infrastructure.
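The routing logic might look something like the following sketch. The request fields, policy shape, and budget mechanism are assumptions for illustration, not OpenShell's real interface; the point is that the decision is made by policy, outside the agent, with sensitivity checked before cost.

```python
def route(request, allow_frontier_for, frontier_budget_cents):
    """Policy-driven model routing: sensitive context stays on a local
    open model; the frontier model is used only when policy and budget
    both permit. (Hypothetical shape, not the OpenShell API.)"""
    if request["contains_pii"]:
        return "local-open-model"  # sensitive data never leaves the premises
    if (request["task"] in allow_frontier_for
            and request["est_cost_cents"] <= frontier_budget_cents):
        return "frontier-model"
    return "local-open-model"

policy = {"complex_reasoning", "planning"}
print(route({"task": "planning", "contains_pii": False, "est_cost_cents": 3},
            policy, 10))  # frontier-model
print(route({"task": "planning", "contains_pii": True, "est_cost_cents": 3},
            policy, 10))  # local-open-model
```

The privacy check deliberately comes first: no budget or task-type rule can override it, mirroring the article's claim that routing is not left to the agent's discretion.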

Zero-Code Integration

NVIDIA claims you can wrap existing coding agents with OpenShell's security layer using a single command, such as: `openshell sandbox create --remote spark --from openclaw`. This works with unmodified agents including OpenClaw, Claude Code, and OpenAI Codex, which dramatically lowers the adoption barrier.

NeMo Agent Toolkit (NAT): Making Agents Measurable

NeMo Agent Toolkit, installed via `pip install nvidia-nat`, is an open-source, framework-agnostic library focused on making agent systems observable, evaluable, and optimizable.

Cross-Framework Compatibility

NAT works alongside existing frameworks rather than replacing them. It explicitly supports LangChain, Google ADK, CrewAI, and custom frameworks, exporting telemetry via OpenTelemetry to observability backends like Phoenix, Langfuse, and Weave. This is a critical design choice: enterprises that have already invested in a specific agent framework do not need to migrate.

Core Capabilities

NAT provides a YAML configuration builder and universal descriptors for agents, tools, and workflows, enabling teams to prototype and tune without large refactors. Built-in evaluation commands test agents against datasets, score outputs with customizable metrics, and generate reports, treating agent testing similarly to software unit testing.
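The "agent testing as unit testing" idea can be sketched in a few lines. This is not NAT's actual evaluation API; the function names, dataset shape, and metric interface here are invented for illustration. The pattern is the same, though: run the agent over a labeled dataset, score each output with pluggable metrics, and aggregate into a report.

```python
def evaluate(agent, dataset, metrics):
    """Run an agent over a labeled dataset and score each output,
    the way a unit-test suite scores a codebase. (Illustrative only.)"""
    report = {name: 0.0 for name in metrics}
    for example in dataset:
        output = agent(example["input"])
        for name, metric in metrics.items():
            report[name] += metric(output, example["expected"])
    # Average each metric across the dataset.
    return {name: total / len(dataset) for name, total in report.items()}

# A trivial stand-in "agent" and an exact-match metric:
agent = lambda q: q.upper()
dataset = [{"input": "ok", "expected": "OK"},
           {"input": "no", "expected": "NO!"}]
metrics = {"exact_match": lambda out, exp: 1.0 if out == exp else 0.0}
print(evaluate(agent, dataset, metrics))  # {'exact_match': 0.5}
```

In practice the metrics would be LLM-judged or task-specific scores rather than exact match, but the report structure is what makes regressions visible across agent versions.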

The Agent Hyperparameter Optimizer is particularly valuable. It automatically selects optimal model types, temperature settings, max tokens, and prompts to balance accuracy, latency, and cost. Instead of manually tuning dozens of parameters, teams can define their objectives and let the optimizer find the best configuration.
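The core idea of such an optimizer can be sketched as a search over a configuration grid against a scalar objective that trades accuracy against latency and cost. Everything below is hypothetical: the grid values, the weights, and the measurements are made up, and a real optimizer would use smarter search than exhaustive enumeration.

```python
import itertools

def objective(accuracy, latency_s, cost_cents, w=(1.0, 0.1, 0.05)):
    """Scalar score: reward accuracy, penalize latency and cost."""
    return w[0] * accuracy - w[1] * latency_s - w[2] * cost_cents

def best_config(measure):
    """Exhaustive search over a small config grid; real optimizers
    would use Bayesian or bandit methods instead of brute force."""
    grid = itertools.product(
        ["open-small", "open-large", "frontier"],  # model
        [0.0, 0.3, 0.7],                           # temperature
        [512, 2048],                               # max tokens
    )
    return max(grid, key=lambda cfg: objective(*measure(cfg)))

# Hypothetical measurements: the frontier model is the most accurate
# but also the slowest and most expensive per call.
def measure(cfg):
    model, temp, max_tok = cfg
    acc = {"open-small": 0.70, "open-large": 0.82, "frontier": 0.90}[model]
    lat = {"open-small": 0.5, "open-large": 1.2, "frontier": 3.0}[model]
    cost = {"open-small": 0.1, "open-large": 0.4, "frontier": 2.0}[model]
    return acc - 0.05 * temp, lat, cost + max_tok / 10000

print(best_config(measure))  # ('open-large', 0.0, 512)
```

Under these toy numbers the winner is not the most accurate model: once latency and cost are in the objective, the mid-sized open model comes out ahead, which is exactly the trade-off the optimizer is meant to surface.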

NAT also includes "intelligent request routing" using telemetry hints with NVIDIA Dynamo, and safety and security middleware for red-teaming workflows covering prompt injection, jailbreak attempts, and tool poisoning.

Model Context Protocol (MCP) Support

NAT emphasizes MCP compatibility, allowing agents to connect to tools served by remote MCP servers and publish their own tools via MCP. This interoperability standard is increasingly important as the agent ecosystem fragments across frameworks and providers. Products that support MCP can participate in a broader agent ecosystem. For example, Maylee's MCP integration allows AI assistants to control email workflows programmatically, demonstrating how MCP is becoming a universal connector between AI-powered tools.


AI-Q Blueprint: A Reference Architecture for Research Agents

AI-Q is NVIDIA's reference implementation for building customizable research agents that produce both quick answers with citations and deeper report-style research. It is built on LangGraph (state machine architecture) with modular agents including an orchestration node that classifies intent and sets research depth.

Architecture

The system uses a multi-agent decomposition with distinct planner, researcher, and orchestrator roles. The orchestration node classifies incoming queries as requiring either shallow or deep research, routing them to specialized agents accordingly. Configuration is handled through YAML files for routing, tools, and LLM selection, and deployment uses Docker Compose or Helm with interfaces for CLI, web UI, or asynchronous jobs.
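The orchestration step can be sketched as follows. The classifier heuristic and the agent callables are stand-ins: in AI-Q the depth classification would be an LLM call inside a LangGraph state machine, not a keyword check, and the downstream agents would run real retrieval and synthesis.

```python
def classify_depth(query):
    """Toy intent classifier: long or multi-part questions get deep
    research. A real orchestration node would use an LLM call here."""
    deep_markers = ("compare", "analyze", "report", "survey")
    if len(query.split()) > 15 or any(m in query.lower() for m in deep_markers):
        return "deep"
    return "shallow"

def orchestrate(query, agents):
    """Route the query to the specialized agent for its research depth."""
    return agents[classify_depth(query)](query)

# Stand-in agents for the two research paths:
agents = {
    "shallow": lambda q: f"quick answer with citations for: {q}",
    "deep":    lambda q: f"plan -> research -> synthesize report for: {q}",
}
print(orchestrate("What is MCP?", agents))
print(orchestrate("Compare agent frameworks for enterprise use", agents))
```

The value of the split is economic as much as architectural: shallow queries skip the expensive multi-step research path entirely.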

Benchmark Results

AI-Q achieved the number one ranking on both DeepResearch Bench with a score of 55.95 and DeepResearch Bench II with a score of 54.50. These results were achieved using a hybrid model strategy and a documented training process: approximately 80,000 generated trajectories were filtered to 67,000 for training, run for a single epoch over 25 hours on 128 NVIDIA H100 GPUs.

The Hybrid Model Strategy

A key insight from AI-Q's implementation is the hybrid approach: using frontier models for orchestration while delegating bulk research and reasoning to NVIDIA's open Nemotron models. NVIDIA claims this strategy reduces query costs by more than 50 percent compared to using frontier models exclusively. This is not just a cost optimization; it demonstrates that agent systems do not need frontier-class intelligence at every step to produce frontier-class results.
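Back-of-the-envelope arithmetic shows how the savings arise. The per-step prices and step counts below are invented for illustration (the source gives only the "more than 50 percent" figure); the mechanism is simply that frontier pricing applies to a small fraction of steps instead of all of them.

```python
def query_cost(steps, frontier_cents, open_cents, frontier_fraction):
    """Expected cost per query when only a fraction of steps
    (e.g. orchestration and planning) run on the frontier model."""
    frontier_steps = steps * frontier_fraction
    return frontier_steps * frontier_cents + (steps - frontier_steps) * open_cents

# Hypothetical prices: frontier 2.0c/step, open model 0.2c/step, 20 steps/query.
all_frontier = query_cost(20, 2.0, 0.2, 1.0)   # 40.0 cents
hybrid       = query_cost(20, 2.0, 0.2, 0.15)  # ~9.4 cents
print(f"savings: {1 - hybrid / all_frontier:.0%}")
```

With these assumed numbers the hybrid runs at roughly a quarter of the all-frontier cost, comfortably clearing the 50 percent savings the article reports; the exact figure depends entirely on the price gap and the fraction of steps that genuinely need frontier-class reasoning.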

The Economic Argument for Agent Infrastructure

The cost narrative is central to NVIDIA's pitch. As enterprises scale from experimental agent deployments to production workloads, inference costs become a primary concern. Indiscriminate, unoptimized use of frontier models for every agent action quickly becomes unsustainable.

The Agent Toolkit addresses this through multiple mechanisms. The privacy router in OpenShell can direct routine tasks to cheaper open models while reserving frontier models for complex reasoning. NAT's hyperparameter optimizer reduces waste by finding configurations that maintain quality at lower cost. The hybrid model strategy demonstrated in AI-Q provides a concrete template for cost-efficient agent architectures.

NAT's telemetry capabilities also enable what NVIDIA calls agent "FinOps," the ability to track granular metrics on tool usage efficiency, computational costs, and cross-agent coordination. This transforms LLM costs from opaque overhead into measurable, allocatable expenses that can be budgeted and charged back to business units.
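The chargeback idea reduces to a simple aggregation over telemetry events. The event schema and prices below are hypothetical, not NAT's actual telemetry format; the point is that once token counts are tagged with a business unit, LLM spend becomes an ordinary allocatable line item.

```python
from collections import defaultdict

def chargeback(events):
    """Aggregate per-business-unit LLM spend (in cents) from telemetry
    events, turning opaque inference cost into an allocatable line item.
    (Illustrative event schema, not NAT's real telemetry format.)"""
    totals = defaultdict(float)
    for e in events:
        totals[e["unit"]] += e["tokens"] / 1000 * e["cents_per_1k_tokens"]
    return dict(totals)

# Hypothetical events: cheap open-model calls and pricier frontier calls.
events = [
    {"unit": "support",  "tokens": 120_000, "cents_per_1k_tokens": 0.25},
    {"unit": "research", "tokens": 40_000,  "cents_per_1k_tokens": 2.0},
    {"unit": "support",  "tokens": 30_000,  "cents_per_1k_tokens": 2.0},
]
print(chargeback(events))  # {'support': 90.0, 'research': 80.0}
```

The same aggregation, keyed by tool or agent instead of business unit, yields the tool-usage-efficiency and cross-agent-coordination metrics the article mentions.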

Why This Matters for Enterprise Adoption

The fundamental challenge with enterprise AI agents is not capability but governance. Most organizations can build agents that work in demos. The hard part is deploying agents that work reliably, securely, and economically in production, especially when those agents can execute code, access databases, make API calls, and interact with external systems.

NVIDIA Agent Toolkit tackles this governance gap head-on. OpenShell ensures agents operate within defined boundaries, even when running autonomously for extended periods. NAT provides the visibility needed to understand what agents are doing, how well they are performing, and what they cost. The blueprint architecture provides a proven starting point rather than forcing teams to design agent systems from scratch.

The partner list reinforces this narrative. When Adobe, Salesforce, SAP, and ServiceNow integrate these components, the toolkit's security and observability features become available to millions of enterprise users through tools they already use. This reduces the "last mile" friction that often prevents AI capabilities from reaching production.


For Developers Getting Started

The entry point is straightforward. Install NAT via pip, define your agent configuration in YAML, and start instrumenting your existing agent code. If you are using LangChain, Google ADK, or CrewAI, NAT integrates without requiring you to change frameworks.

For security, OpenShell can wrap existing coding agents with a single command. Start with restrictive policies and gradually expand permissions as you gain confidence in the agent's behavior. The audit trail provides a safety net for reviewing what the agent attempted and what was blocked.

AI-Q serves as a reference architecture for teams building research or search-oriented agents. Clone the blueprint, customize the YAML configuration for your data sources and model preferences, and deploy via Docker Compose for development or Helm for production.

The entire toolkit is open source, with OpenShell explicitly under the Apache 2.0 license. There are no licensing fees for the software itself, though production deployments will involve costs for inference compute, observability backends, and GPU infrastructure.

The Bigger Picture

NVIDIA's move into agent infrastructure is strategically significant. By providing the runtime, observability, and optimization layer for AI agents, NVIDIA creates demand for its GPUs at every level of the stack, from model training to production inference. But the open-source approach and framework-agnostic design mean that the toolkit's value is genuine, not just a hardware sales vehicle.

For enterprise IT leaders, the message is clear: the tools to deploy AI agents safely and at scale now exist. The question is no longer whether it is possible but whether your organization has the agent strategy and governance framework to take advantage of them.

Frequently Asked Questions

What is NVIDIA Agent Toolkit?

NVIDIA Agent Toolkit is an open-source platform announced at GTC 2026 for building, running, evaluating, and optimizing autonomous AI agents in enterprise environments. It includes OpenShell for secure runtime, NeMo Agent Toolkit for observability, and AI-Q Blueprint as a reference research agent architecture.

Is NVIDIA Agent Toolkit free to use?

The software is open source, with OpenShell under the Apache 2.0 license. There are no licensing fees, but production deployments will involve costs for inference compute, observability backends, and GPU infrastructure.

What agent frameworks does NVIDIA Agent Toolkit support?

NeMo Agent Toolkit is framework-agnostic and explicitly supports LangChain, Google ADK, CrewAI, and custom frameworks. It integrates alongside existing frameworks rather than replacing them.

How does OpenShell protect against unsafe agent actions?

OpenShell uses deny-by-default policies enforced out-of-process, meaning the security control lives outside the agent and cannot be overridden even if the agent is compromised. It provides granular enforcement across filesystem, network, and process layers with a complete audit trail.

What is the hybrid model strategy in AI-Q?

AI-Q uses frontier models for orchestration and planning while delegating bulk research and reasoning to NVIDIA's open Nemotron models. This approach reportedly reduces query costs by more than 50 percent compared to using frontier models for everything.

Which companies are integrating NVIDIA Agent Toolkit?

Adobe, Atlassian, Amdocs, Box, Cadence, Cisco, Cohesity, CrowdStrike, Dassault Systèmes, IQVIA, Red Hat, SAP, Salesforce, Siemens, ServiceNow, and Synopsys are among the reported early adopters and integration partners.

What benchmark results has AI-Q achieved?

AI-Q ranks first on DeepResearch Bench with a score of 55.95 and first on DeepResearch Bench II with a score of 54.50, demonstrating the effectiveness of the hybrid orchestration approach.

How do I get started with NVIDIA Agent Toolkit?

Install NeMo Agent Toolkit via pip with `pip install nvidia-nat`. Define your agent configuration in YAML, and start instrumenting your existing agent code. OpenShell can wrap existing coding agents with a single command.
