


Leanstral is Mistral's open-source AI for Lean 4 formal proofs. 15x cheaper than Claude, Apache 2.0 license. Full benchmark and setup guide.
Head of Growth & Customer Success
Software testing tells you that your code works for the inputs you tried. Formal verification tells you that your code works for every possible input. That distinction between "probably correct" and "mathematically proven correct" has always been the dividing line between software that is good enough and software where failure is not an option. (Mistral AI)
Aerospace control systems, financial trading algorithms, blockchain smart contracts, medical device firmware: these are domains where "it passed the tests" is not sufficient. A single unhandled edge case can cost millions of dollars, destroy a satellite, or endanger lives. (Lean4 theorem prover)
The problem has always been accessibility. Formal verification required deep expertise in proof assistants like Lean 4 or Coq, weeks of manual proof construction, and specialists who are expensive and scarce. It remained a tool for academic researchers and a handful of high-stakes engineering teams. (Leanstral documentation)
On March 16, 2026, Mistral AI released Leanstral, the first open-source AI agent specifically designed for formal verification in Lean 4. At pass@2 it generates proofs at roughly 1/15th the cost of Claude Sonnet 4.6 while scoring higher on the FLTEval benchmark, though Claude Opus 4.6 still leads on raw accuracy. This is the tool that could make formal verification practical for mainstream software engineering.
Before understanding Leanstral, you need to understand Lean 4. Developed by Leonardo de Moura (formerly Microsoft Research), Lean 4 is both a proof assistant and a functional programming language. You write code, then you write a mathematical proof that the code behaves as specified. The Lean compiler acts as a binary verifier: either the proof compiles (the code is proven correct) or it does not. No ambiguity.
The tooling ecosystem around Lean 4 has been maturing rapidly, with IDE support in VS Code, a growing library of mathematical proofs called Mathlib, and an active community of mathematicians and computer scientists contributing formalized theorems. Leanstral slots into this ecosystem as the AI acceleration layer, handling the tedious parts of proof construction while human experts focus on the high-level proof strategy and verification.
For development teams considering formal verification, the practical workflow with Leanstral looks like this: you write a specification of what your code should do in Lean 4's type system, then ask Leanstral to generate a proof that your implementation satisfies that specification. The model handles the mechanical proof steps while flagging cases where it cannot find a valid proof, which often indicates genuine bugs in the implementation. This workflow transforms formal verification from a specialized academic exercise into something closer to an advanced type-checking tool that any senior developer can use.
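To make the specification-then-proof workflow concrete, here is a minimal Lean 4 sketch. The function `clamp` and the theorem name `clamp_le_limit` are hypothetical, chosen only for illustration; they do not come from Mistral's materials or from Leanstral output. The example is intended to check against core Lean 4 without Mathlib.

```lean
/-- Implementation: cap a requested value at an upper limit. -/
def clamp (limit x : Nat) : Nat :=
  if x ≤ limit then x else limit

/-- Specification: the result never exceeds the limit, for every input. -/
theorem clamp_le_limit (limit x : Nat) : clamp limit x ≤ limit := by
  unfold clamp
  split
  · assumption                 -- branch where x ≤ limit holds
  · exact Nat.le_refl limit    -- branch where the limit itself is returned
```

The theorem statement is the part a human still has to write. The tactic block after `by` is the kind of mechanical work the article describes delegating to the model.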
The cost comparison is particularly striking when you factor in the alternative. Before Leanstral, achieving formal verification for a medium-complexity software module typically required hiring a specialized formal methods consultant at rates of $200 to $500 per hour, with engagements lasting weeks or months. Leanstral's ability to generate proof attempts at API call prices fundamentally disrupts this model, though human experts remain essential for reviewing and validating the generated proofs.
Here is how to call Leanstral through the Mistral API for automated proof generation:
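import requests

# Model identifier as given in this article. The free endpoint is named
# labs-leanstral-2603 elsewhere in the text, so check which one your account exposes.
response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_MISTRAL_API_KEY"},
    json={
        "model": "leanstral-2026",
        "messages": [{
            "role": "user",
            "content": "Prove that for all natural numbers n, "
                       "n + 0 = n in Lean4"
        }],
        "temperature": 0.1,  # low temperature to reduce variation between runs
    },
    timeout=60,
)
response.raise_for_status()  # fail loudly on HTTP errors (bad key, unknown model)
proof = response.json()["choices"][0]["message"]["content"]
print(proof)
# Example of the kind of proof returned:
# theorem add_zero (n : Nat) : n + 0 = n := by
#   induction n with
#   | zero => rfl
#   | succ n ih => simp [Nat.succ_add, ih]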
Lean 4 is not academic obscurity. It is used by:
Google DeepMind: AlphaProof, which earned an IMO silver medal in 2024, uses Lean
Amazon: Cedar policy verification system
Mathlib community: 20,000+ contributions formalizing mathematics, backed by $15 million in 2025 funding
10,000+ members on the Lean Zulip community
Leanstral is built on a Sparse Mixture-of-Experts (MoE) architecture:
| Specification | Value |
|---|---|
| Total Parameters | ~119 billion |
| Active Parameters per Token | ~6.5 billion |
| Expert Modules | 128 |
| Experts Active per Token | 4 |
| Architecture | Sparse MoE |
| License | Apache 2.0 |
The MoE design is critical to understanding Leanstral's cost advantage. Each token activates only 4 of 128 expert modules, giving the model the knowledge capacity of a 119B parameter model at the inference cost of a 6.5B one. That 18x efficiency ratio is what enables the dramatic price difference versus generalist models.
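The sparse activation described above can be illustrated with a generic top-k routing sketch. This is not Leanstral's actual router, which the article does not describe; the random scores stand in for hypothetical router logits, and only the 128-expert and 4-active figures come from the specification table.

```python
import math
import random

NUM_EXPERTS = 128  # expert modules, from the specification table
TOP_K = 4          # experts activated per token, from the specification table


def route_token(router_scores, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    peak = max(router_scores[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_scores[i] - peak) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # hypothetical logits
weights = route_token(scores)

print(len(weights), "of", NUM_EXPERTS, "experts used for this token")
print("total/active parameter ratio:", round(119 / 6.5, 1))
```

Only the selected experts run for a given token, which is why inference cost tracks the roughly 6.5B active parameters and not the full 119B.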
This is Leanstral's decisive technical advantage over generalist models. Instead of generating text that looks like Lean code and hoping it compiles, Leanstral interacts directly with the Lean 4 compiler through the Model Context Protocol (MCP).
In practice, the agent can:
Check types in the Lean compiler in real time
Execute proof tactics and observe results
Read and interpret error messages from the compiler
Iteratively refine proofs in a live interactive loop
The model does not guess at proofs. It builds them in dialogue with the verifier, adjusting its approach based on compiler feedback. This is fundamentally more reliable than a generalist model generating proof code from pattern matching.
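The control flow of that dialogue can be sketched in a few lines. This is only the propose-check-refine pattern, not an MCP client: `refine_proof`, `toy_propose`, and `toy_check` are hypothetical names, and the toy functions stand in for the model and the Lean compiler so the sketch runs without either.

```python
from typing import Callable, Optional, Tuple


def refine_proof(
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    goal: str,
    max_rounds: int = 4,
) -> Optional[str]:
    """Ask for a candidate, check it, and feed any error into the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(goal, feedback)
        ok, message = check(candidate)
        if ok:
            return candidate
        feedback = message  # the verifier's error text drives the next proposal
    return None  # no accepted proof within the round budget


# Toy stand-ins (assumptions for illustration only).
def toy_propose(goal: str, feedback: Optional[str]) -> str:
    tactic = "rfl" if feedback else "sorry"
    return f"theorem t : {goal} := by {tactic}"


def toy_check(candidate: str) -> Tuple[bool, str]:
    if "sorry" in candidate:
        return False, "declaration uses 'sorry'"
    return True, ""


result = refine_proof(toy_propose, toy_check, "2 + 2 = 4")
print(result)
```

In a real setup, `check` would call the Lean toolchain and `propose` would call the model. The loop itself stays the same: the verifier's verdict, not the model's confidence, decides when to stop.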
Mistral introduced FLTEval, a new benchmark designed to evaluate proof engineering under realistic conditions. It is based on the Fermat's Last Theorem (FLT) formalization project at Imperial College London, led by Professor Kevin Buzzard, with 55 contributors and EPSRC funding through 2029.
Unlike MiniF2F (which tests isolated competition-math problems), FLTEval measures the ability to complete proofs in a realistic environment with imports, library dependencies, and multi-file proof structures. This is a much harder test than synthetic benchmarks.
| Model | FLTEval Score (pass@N) | Cost per Run | Active Params |
|---|---|---|---|
| Claude Opus 4.6 (pass@16) | 39.6 | ~$1,200+ | Undisclosed |
| Leanstral pass@16 | 31.9 | ~$288 | 6.5B |
| Leanstral pass@8 | 31.0 | ~$144 | 6.5B |
| Leanstral pass@2 | 28.2 | ~$36 | 6.5B |
| Sonnet 4.6 pass@2 | 25.6 | ~$549 | Undisclosed |
| Haiku 4.5 pass@2 | 24.9 | ~$184 | Undisclosed |
| Qwen3.5 pass@4 | 25.4 | N/A | 17B active |
The critical comparison: Leanstral at pass@2 ($36) beats Sonnet 4.6 ($549) by 2.6 points at 1/15th the cost. It also beats Haiku 4.5 ($184) by 3.3 points at roughly 1/5th the cost.
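The arithmetic behind that comparison can be reproduced directly from the table. The short script below is only a check on the published figures; the dictionary layout and variable names are illustrative.

```python
# (FLTEval score, approximate cost per run in USD), copied from the table above.
results = {
    "Leanstral pass@2": (28.2, 36),
    "Sonnet 4.6 pass@2": (25.6, 549),
    "Haiku 4.5 pass@2": (24.9, 184),
}

base_score, base_cost = results["Leanstral pass@2"]
comparison = {}
for name, (score, cost) in results.items():
    if name == "Leanstral pass@2":
        continue
    lead = base_score - score      # score advantage in points
    cost_ratio = cost / base_cost  # how many times more the other model costs
    comparison[name] = (lead, cost_ratio)
    print(f"{name}: Leanstral leads by {lead:.1f} points; "
          f"the other model costs {cost_ratio:.1f}x as much")
```

The ratios come out at about 15x for Sonnet 4.6 and about 5x for Haiku 4.5, matching the "1/15th" and "roughly 1/5th" figures quoted above.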
Claude Opus 4.6 maintains a clear lead on raw quality: 39.6 vs 31.9 for Leanstral's best configuration. If your absolute priority is maximum proof accuracy and budget is not a constraint, Opus remains the top choice. The Hacker News community flagged this as notable: a model specifically trained for formal verification should, in theory, beat a generalist model.
Leanstral's performance gains appear to flatten beyond pass@8. The jump from pass@8 (31.0) to pass@16 (31.9) is only 0.9 points for a doubling in cost. For cost-conscious teams, pass@4 or pass@8 likely represents the optimal price-performance point.
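One way to see the flattening is to compute the extra cost per extra FLTEval point between consecutive Leanstral configurations. The figures are the table's; the per-point metric is just an illustrative way of reading them, not a measure Mistral reports.

```python
# (samples, FLTEval score, approximate cost in USD) for Leanstral, from the table above.
runs = [(2, 28.2, 36), (8, 31.0, 144), (16, 31.9, 288)]

marginal = []
for (n0, s0, c0), (n1, s1, c1) in zip(runs, runs[1:]):
    gain = s1 - s0   # additional points from the larger sample budget
    extra = c1 - c0  # additional dollars spent to get them
    marginal.append((n0, n1, gain, extra, extra / gain))
    print(f"pass@{n0} -> pass@{n1}: +{gain:.1f} points for +${extra} "
          f"(~${extra / gain:.0f} per point)")
```

Each additional point costs roughly $39 between pass@2 and pass@8, and roughly $160 between pass@8 and pass@16, which is the diminishing return described above.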
Formal verification has always been too expensive for most teams. Leanstral changes the economics enough to make it viable in new contexts.
DeFi bugs have cost billions in recent years. Formal verification is the gold standard for guaranteeing that a smart contract executes exactly as specified. Traditional formal audits cost tens of thousands of dollars and take weeks. With Leanstral, the cost of generating a formal correctness proof drops to the range of $36–$288, making it feasible to formally verify contracts that would never have justified the expense before.
In regulated industries, formal verification is not a luxury; it is a compliance requirement. Leanstral enables teams to:
Specify expected behavior in Lean 4
Automatically generate compliance proofs
Have the Lean compiler verify the proof is valid
The cost savings versus manual proof construction or expensive proprietary tools can be substantial, especially for teams that need to verify many components.
This may be Leanstral's most strategically important use case. AI coding assistants (Copilot, Cursor, Claude Code) generate functional code most of the time. But "most of the time" is not good enough for high-stakes applications.
Leanstral enables what Mistral calls "trustworthy vibe coding": humans specify what they want, AI generates the code, and Leanstral proves it is correct. The Lean 4 compiler serves as the final arbiter: either the proof compiles or it does not. No gray area, no "it looks right," no trusting the model's judgment.
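A minimal sketch of treating the compiler as the arbiter is a gate that accepts a generated proof only if the checker exits cleanly. The function name `proof_is_accepted` is hypothetical. The default command assumes a `lean` binary on the PATH, and the demo substitutes a Python one-liner so the sketch runs without Lean installed. The explicit `sorry` check is there because Lean reports that placeholder as a warning, not an error.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def proof_is_accepted(source: str, command: Sequence[str] = ("lean",)) -> bool:
    """Return True only if the checker exits with 0 and the source has no `sorry`."""
    if "sorry" in source:
        return False  # conservative: reject the placeholder even inside comments
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "Candidate.lean")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        completed = subprocess.run([*command, path], capture_output=True, text=True)
    return completed.returncode == 0


# Stand-in checker (assumption for the demo): a process that always exits 0.
always_ok = (sys.executable, "-c", "raise SystemExit(0)")
print(proof_is_accepted("theorem t : 2 + 2 = 4 := rfl", always_ok))
print(proof_is_accepted("theorem t : 1 = 2 := by sorry", always_ok))
```

Inside a Lake project the command would more likely be something like `("lake", "env", "lean")` so that library imports resolve. That detail depends on the project setup and is not covered in the article.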
The FLT project and Mathlib demonstrate Leanstral's potential to accelerate formalized mathematics. Researchers can delegate routine proof steps to the agent and focus their expertise on creative, novel proof strategies.
Mistral demonstrated Leanstral translating proofs from Rocq (formerly Coq) to Lean 4 while preserving semantics. For academic teams or companies with legacy proof codebases, this migration capability can save months of manual translation work.
The /leanstall command in Mistral Vibe CLI (version 2.5.0, released March 16, 2026) automatically configures the Leanstral agent. This is the fastest path from zero to working proofs.
The labs-leanstral-2603 endpoint is available for free during a limited period. Mistral is collecting real-world feedback to improve future versions. Ideal for evaluation and proof-of-concept work.
Model weights are published under the Apache 2.0 license on Hugging Face (mistralai/Leanstral-120B-A6B-2603). Recommended hardware:
| Configuration | Requirements |
|---|---|
| Recommended | 4x A100 80GB or 4x H100 GPUs |
| Framework | vLLM with Flash Attention |
| Quantization | Available for smaller setups |
Note: the Hugging Face page showed a temporary 404 error at launch, which may have been resolved.
Leanstral is designed exclusively for Lean 4. It does not replace general-purpose coding assistants. If you need help with Python, TypeScript, Rust, or SQL, this is not the right tool. It is a specialist, not a generalist.
The 39.6 vs 31.9 gap on FLTEval is significant. For teams where proof accuracy is the only metric that matters, Claude Opus 4.6 remains the better choice despite costing substantially more.
Self-hosting requires 4 high-end GPUs (A100 or H100), representing a significant hardware investment. For teams without this infrastructure, the free API and Mistral Vibe CLI are more practical entry points.
The diminishing returns beyond pass@8 suggest that throwing more compute at Leanstral does not scale linearly. Teams should benchmark their specific use case to find the optimal pass count rather than defaulting to pass@16.
Leanstral addresses a fundamental problem in AI: trust. When an AI system generates code, writes an email, or makes a recommendation, how do you know it is correct?
The implications of Leanstral for enterprise software development are substantial. Formal verification has traditionally been reserved for safety-critical systems like aviation software, medical devices, and nuclear power plant controllers. The cost barrier meant that even companies with regulatory requirements for verified software often relied on extensive testing rather than mathematical proof. With Leanstral reducing the cost by over 90%, formal verification becomes economically viable for a much broader range of applications.
For email platforms like Maylee that handle sensitive business communications, formal verification of critical code paths such as message delivery guarantees, encryption implementations, and data integrity checks could provide a level of assurance that traditional testing simply cannot match. A formally verified email routing algorithm can be mathematically proven to never lose a message, whereas even millions of test runs can only demonstrate that no message was lost during testing.
The Lean 4 theorem prover that Leanstral targets has been gaining significant traction in the academic and industrial mathematics communities. Its type system and tactic framework make it possible to express and prove complex mathematical properties about software behavior. What Mistral AI has done with Leanstral is essentially train a language model to be fluent in this formal language, dramatically lowering the barrier to entry for developers who want to use formal methods but lack the specialized expertise.
The competitive landscape for AI-assisted formal verification is still nascent. While Leanstral represents the first major commercial offering specifically targeting Lean 4 proof generation, several research groups are working on similar capabilities for other proof assistants like Coq, Isabelle, and Agda. The question is whether Mistral AI's first-mover advantage and Leanstral's integration into the broader Mistral API ecosystem will create a winner-takes-most dynamic in this emerging market.
Formal verification provides the strongest possible answer: mathematical proof. While Leanstral applies this specifically to Lean 4 code, the underlying principle, building systems that can prove their own correctness, is relevant across AI applications.
The same philosophy drives confidence scoring in AI-powered tools. When an email AI like Maylee auto-drafts a reply, it assigns a confidence score to each response. Above a user-defined threshold, the system sends automatically. Below it, the draft waits for human review. It is not formal verification, but it applies the same principle: the AI system quantifies its own certainty and routes decisions accordingly.
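The routing rule described here reduces to a single comparison. The sketch below is illustrative only: the function name, the 0-to-1 confidence scale, and the choice to send when the score exactly equals the threshold are all assumptions, not details of any product's implementation.

```python
def route_draft(confidence: float, threshold: float) -> str:
    """Return 'send' when confidence clears the user's threshold, else 'review'."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "send" if confidence >= threshold else "review"


print(route_draft(0.93, threshold=0.85))  # high confidence: sent automatically
print(route_draft(0.60, threshold=0.85))  # low confidence: held for human review
```

Unlike a compiled proof, the score here is the system's own estimate, so the threshold trades convenience against the risk of an unreviewed mistake.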
As AI systems take on more autonomous responsibilities, mechanisms for verifying correctness, whether mathematical proofs or confidence thresholds, become essential infrastructure.
Leanstral is worth evaluating if:
You develop smart contracts or DeFi protocols where bugs have direct financial consequences
You work in regulated industries (aerospace, finance, healthcare) with formal compliance requirements
You are generating code with AI tools and need to verify correctness beyond testing
You are a researcher working with Lean 4 or Mathlib and want to accelerate proof construction
You need to migrate proofs from Coq/Rocq to Lean 4
It is probably not the right fit if:
Your codebase is in languages Leanstral does not support (it is Lean 4 only)
Your software's risk profile does not justify formal verification costs
You lack the Lean 4 expertise to write specifications (the model generates proofs, but you still need to define what "correct" means)
You need enterprise-grade support and SLAs (Leanstral is new and community-supported)
Leanstral fills a genuine gap. Before its release, AI-assisted formal verification meant either paying premium prices for Claude Opus or using generalist models not optimized for proofs. Leanstral sits at the intersection of open source (Apache 2.0), proof-specialized training, and aggressive cost efficiency. No other model currently occupies that space.
Looking ahead, the convergence of AI-generated proofs and traditional software testing could reshape quality assurance practices across the industry. Leanstral represents the first step toward a future where critical business logic is not merely tested but mathematically proven correct. For companies building AI-powered email systems, financial tools, or healthcare applications, the ability to provide formal guarantees about software behavior at an accessible price point could become a competitive differentiator and, eventually, a regulatory requirement. The $36 price point that Leanstral offers today may well seem like a bargain within a few years as the demand for verified software accelerates.
Enterprise adoption of formal verification tools is also being driven by the increasing complexity of AI systems themselves. As companies deploy AI agents that make autonomous decisions, the ability to formally verify that these agents will behave within specified bounds becomes not just desirable but essential. Leanstral's combination of affordability and Lean 4's expressive type system makes it uniquely positioned to serve this emerging market.
Leanstral is best suited for teams that already use Lean 4 or are considering adopting formal methods for safety-critical components. If your organization builds financial trading systems, medical device software, autonomous vehicle controllers, or cryptographic libraries, Leanstral can deliver immediate value by accelerating the proof generation process that your formal methods engineers already perform manually.
The integration of Leanstral into CI/CD pipelines is another promising development path. Teams could configure their build process to automatically run Leanstral verification on critical modules whenever code changes are pushed, catching specification violations before they reach production.
The question is no longer whether AI-assisted formal verification is possible. It is how quickly teams will integrate it into their quality assurance workflows. At $36 per proof attempt, the barrier just dropped by an order of magnitude.
Leanstral is the first open-source AI agent specifically designed for Lean 4 formal verification. Released March 16, 2026, it generates mathematical proofs of code correctness using a 119B parameter Mixture-of-Experts architecture with only 6.5B active parameters per token. Licensed under Apache 2.0.
Leanstral at pass@2 costs approximately $36 per proof attempt versus $549 for Sonnet 4.6 and over $1,200 for Claude Opus 4.6 at pass@16. Leanstral achieves higher scores than Sonnet and Haiku at a fraction of the cost, making formal verification 15x cheaper than the closest alternative.
Claude Opus 4.6 still leads on raw quality with an FLTEval score of 39.6 versus 31.9 for Leanstral pass@16. However, Leanstral achieves competitive results at dramatically lower cost. For cost-sensitive teams, Leanstral offers the best price-performance ratio. For maximum accuracy regardless of cost, Claude Opus remains the top choice.
Leanstral is designed exclusively for Lean 4. It does not support Python, TypeScript, Rust, SQL, or any other language. It is a specialized formal verification tool, not a general-purpose coding assistant. Use it alongside your regular coding tools like Copilot or Claude Code.
Three options: (1) Use the /leanstall command in Mistral Vibe CLI v2.5.0 for zero-setup access. (2) Use the free labs-leanstral-2603 API endpoint during the limited free period. (3) Self-host the open-source weights from Hugging Face (mistralai/Leanstral-120B-A6B-2603), requiring 4x A100 or H100 GPUs.
Formal verification uses mathematical proofs to guarantee that software behaves exactly as specified for all possible inputs. Unlike testing (which checks specific cases), formal verification proves correctness universally. It is essential in high-stakes domains like aerospace, finance, smart contracts, and healthcare where bugs can cost millions or endanger lives.
Yes, this is one of its most strategic use cases. AI coding tools like Copilot and Cursor generate code that works "most of the time." Leanstral can formally prove that AI-generated code meets its specification. The workflow is: specify the expected behavior in Lean 4, let the AI generate code, then use Leanstral to produce a mathematical proof of correctness.
Mistral recommends 4x A100 80GB or 4x H100 GPUs with vLLM and Flash Attention. This represents a significant hardware investment. For teams without this infrastructure, the free API endpoint and Mistral Vibe CLI provide accessible alternatives without any hardware requirements.