Gemma 4: Google's Open Multimodal AI You Can Run on Your Own Hardware

Google DeepMind releases Gemma 4 under Apache 2.0 with text, image, and audio support across four model sizes built for local deployment.

A New Standard for Open Multimodal AI

Google DeepMind has released Gemma 4, a family of open-weight multimodal models that represents a significant shift in how developers and businesses can deploy AI locally. Released under the Apache 2.0 license, Gemma 4 is designed from the ground up to run on personal hardware, from Android phones to consumer-grade GPUs, without sending data to the cloud.

The announcement is authored by Clement Farabet, VP of Research at Google DeepMind, and Olivier Lacombe, Group Product Manager, and it arrives at a moment when demand for on-device AI has never been higher. With over 400 million downloads across previous Gemma generations and more than 100,000 community variants in the "Gemmaverse," Google is doubling down on its open model strategy.

What makes Gemma 4 noteworthy is not just performance. It is the combination of multimodal capabilities, permissive licensing, and genuine edge deployment feasibility packed into models small enough to run on a laptop.

The Model Lineup: Four Sizes, Two Architectures

Gemma 4 ships as four distinct models spanning two architecture families, each targeting different deployment scenarios.

Dense Models

The smallest models, E2B and E4B, are purpose-built for edge deployment. E2B carries 2.3 billion effective parameters (5.1 billion with embeddings) across 35 layers, while E4B scales to 4.5 billion effective parameters (8 billion with embeddings) across 42 layers. Both support text, image, and native audio input, making them the most versatile models in the family for on-device applications.

The 31B dense model is the heavyweight, with 30.7 billion parameters across 60 layers. It targets workstation-class inference and supports text and image input with a 256K token context window. This is the model that competes directly with much larger open-weight alternatives from other providers.

Mixture-of-Experts Model

The 26B A4B model introduces a Mixture-of-Experts (MoE) architecture with 25.2 billion total parameters but only 3.8 billion active parameters during inference. It uses a configuration of 128 total experts with 8 active per token plus 1 shared expert, and supports text and image input with a 256K context window.

This design offers a compelling throughput advantage: inference speed approaches that of a 4B model while quality approaches that of a 25B model. Developers should understand the central MoE tradeoff, however: because the router can dispatch any token to any expert, all 25.2 billion parameters must be resident in memory, even though only a fraction activates per token.
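The tradeoff can be made concrete with a back-of-envelope sketch: per-token compute tracks active parameters, while resident memory tracks total parameters. The 2-FLOPs-per-parameter rule is a generic approximation, not a Gemma-specific figure.

```python
def per_token_gflops(active_params_b: float) -> float:
    """Approximate forward-pass compute per token, in GFLOPs
    (~2 FLOPs per active parameter, a standard rule of thumb)."""
    return 2 * active_params_b

def resident_memory_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Weights that must stay loaded, regardless of expert routing."""
    return total_params_b * bytes_per_param

# Figures from the article: 26B A4B MoE (25.2B total / 3.8B active)
# vs. the 31B dense model (30.7B parameters), both at BF16 (2 bytes/param).
moe_compute = per_token_gflops(3.8)        # scales with ACTIVE parameters
dense_compute = per_token_gflops(30.7)
moe_memory = resident_memory_gb(25.2, 2)   # scales with TOTAL parameters
```

The sketch slightly overshoots the 48 GB BF16 figure Google reports for the MoE model (embedding handling and packing details differ), but the asymmetry is the point: compute drops roughly 8x versus the dense model while memory barely moves.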

Memory Requirements and Hardware Planning

Google provides transparent memory requirements for base weights at different precision levels, which is essential for deployment planning.

For the edge models, E2B requires as little as 3.2 GB at Q4_0 quantization, making it viable for mobile devices. E4B needs 5 GB at Q4_0, still comfortable on modern smartphones. The 31B dense model demands 17.4 GB at Q4_0 or 58.3 GB at BF16, putting it in workstation or high-end consumer GPU territory. The 26B MoE model needs 15.6 GB at Q4_0 or 48 GB at BF16.

These figures cover weights only. Runtime overhead, including the KV cache for long context windows, activation buffers, and framework memory, adds to the total. Developers building long-context applications should budget significantly more memory than the base weight numbers suggest.
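A rough KV-cache estimate shows why. The layer count and context length below come from the specs above; the KV head count (8) and head dimension (128) are illustrative assumptions, since the article does not state them.

```python
def kv_cache_gb(num_layers: int, context_tokens: int,
                kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Worst-case KV-cache size if every layer cached the full context
    (factor of 2 for keys and values; bytes_per_value=2 assumes BF16)."""
    per_token = 2 * num_layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens / 1e9

# 31B dense model: 60 layers at the full 256K-token context
worst_case = kv_cache_gb(60, 256_000, 8, 128)
```

Gemma 4's hybrid attention caches only a sliding window on local layers, so the real footprint sits well below this worst case, but the estimate makes the lesson clear: at full context, the cache can rival the weights themselves.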

Multimodal Capabilities Across the Family

Every Gemma 4 model processes text and images natively. The smaller E2B and E4B models add native audio input, which is uncommon among open-weight models at this scale.

The vision encoder parameters vary by model size: approximately 150 million parameters for E2B and E4B, and 550 million for the larger 26B and 31B models. The audio encoder on the edge models adds approximately 300 million parameters each.

Google's launch materials claim all models natively process video and images. In practice, video support works through frame extraction rather than native temporal understanding, treating video as a sequence of image inputs. This is still useful for applications like video summarization or visual search, but developers should not expect true video-native comprehension.
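Since video support is frame extraction in practice, the client-side pattern is straightforward: sample evenly spaced frames and submit them as image inputs. The sampling below is pure arithmetic; how the frames are then fed to the model depends on your inference stack.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices across a clip."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

# e.g. a 10-second clip at 30 fps, sampled down to 8 image inputs
indices = sample_frame_indices(300, 8)
```

Each selected frame is then passed to the model as an ordinary image, which is why summarization works but fine-grained temporal reasoning (motion, ordering of fast events) does not.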

[Figure: Gemma 4 model family]

Benchmark Performance: Punching Above Weight Class

Gemma 4 posts strong benchmark results that justify Google's "intelligence-per-parameter" positioning.

On MMLU Pro, the 31B model scores 85.2%, the 26B MoE hits 82.6%, and even the E4B manages 69.4%. For comparison, the previous-generation Gemma 3 27B scored 67.6%. On AIME 2026 (without tools), the 31B reaches 89.2% and the 26B MoE scores 88.3%, dramatic improvements over Gemma 3's 20.8%.

Coding performance is similarly strong. On LiveCodeBench v6, the 31B scores 80.0% and achieves a Codeforces ELO of 2150. The 26B MoE manages 77.1% and 1718 ELO respectively. Gemma 3 27B had an ELO of just 110, making this a generational leap.

For agentic and tool-use tasks, the Tau2 benchmark average shows the 31B at 76.9% and 26B MoE at 68.2%, compared to Gemma 3's 16.2%. This matters because native function calling and structured output support are baked into the architecture, not bolted on through prompting.

On third-party leaderboards, Google claims the 31B model ranks third and the 26B ranks sixth among open models on Arena AI's text leaderboard, with ELO ratings of 1452 and 1441 respectively. Google states the larger models outperform competitors "20 times their size" in Arena-style comparisons.

What Apache 2.0 Licensing Actually Means

Previous Gemma generations used more restrictive "community" licenses. The move to Apache 2.0 is a deliberate strategic shift that carries real practical consequences.

Apache 2.0 allows commercial use, modification, and redistribution with minimal restrictions. Businesses can embed Gemma 4 in proprietary products, fine-tune it for internal use, and deploy it in regulated environments without worrying about license compliance beyond standard attribution. This eliminates a significant barrier for enterprise adoption and on-premises deployment in industries like healthcare, finance, and government where licensing scrutiny is intense.

Google explicitly frames this as a response to developer demand for "digital sovereignty" over data, infrastructure, and deployment choices. In an era where data residency and AI governance are board-level concerns, permissive licensing is a competitive advantage rather than just a philosophical choice.

Architecture Details for Developers

Under the hood, Gemma 4 introduces several architectural features that matter for application development.

The hybrid attention mechanism interleaves local sliding-window attention with global full-context attention, with the final layer always using global attention. Global layers use unified keys and values with proportional RoPE (p-RoPE) for handling long contexts efficiently. Sliding window sizes are 512 tokens for E2B/E4B and 1024 tokens for the larger models.
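A sketch of what that layer schedule and its attention spans look like. The local-to-global interleave ratio is not stated above, so the 5:1 pattern here is purely illustrative; the final-layer-global rule and the window sizes follow the description.

```python
def layer_kinds(num_layers: int, locals_per_global: int = 5) -> list[str]:
    """Interleave local sliding-window layers with global layers.
    The 5:1 ratio is an assumption for illustration only."""
    kinds = ["global" if (i + 1) % (locals_per_global + 1) == 0 else "local"
             for i in range(num_layers)]
    kinds[-1] = "global"  # final layer always uses global attention
    return kinds

def visible_span(position: int, kind: str, window: int = 512) -> range:
    """Key positions a query at `position` may attend to (causal).
    window=512 matches E2B/E4B; the larger models use 1024."""
    if kind == "global":
        return range(0, position + 1)
    return range(max(0, position - window + 1), position + 1)
```

The memory payoff falls out of `visible_span`: local layers only ever need the last `window` tokens of KV cache, so only the global layers pay full-context cost.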

Gemma 4 introduces native support for the system role in conversations, native function calling, and a configurable "thinking mode" using a dedicated think token. These features reduce the prompt engineering fragility that has plagued tool-using agents built on earlier models.
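With native system-role and function-calling support, a request can carry structured tool schemas instead of prompt-engineered instructions. The example below uses the widely adopted OpenAI-style chat schema purely as an illustration; Gemma 4's concrete prompt template and tool-call wire format are not specified above, and `get_weather` is a hypothetical tool.

```python
import json

# Hypothetical tool definition in JSON-Schema style (illustrative only)
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Native system-role support means instructions live in a dedicated
# message rather than being prepended to the user turn.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What's the weather in Oslo?"},
]

payload = json.dumps({"messages": messages, "tools": [get_weather]})
```

Because role handling and tool schemas are part of the model's training rather than a prompting convention, malformed-call rates and template drift, the usual failure modes of bolted-on tool use, should be much lower.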

Distribution and Ecosystem

Google distributes Gemma 4 through multiple channels: Hugging Face, Kaggle, Ollama, Google AI Studio, and the AI Edge Gallery for mobile deployment. The Android AICore Developer Preview enables on-device inference for Android applications.

Framework support includes vLLM, llama.cpp, MLX, and other popular inference engines. This broad distribution reduces friction for developers who want to experiment quickly, and the existing Gemmaverse community of fine-tuners and integrators ensures rapid ecosystem growth.

What This Means for App Developers

For teams building AI-powered applications, Gemma 4 changes the calculus of build versus buy. Running a capable multimodal model locally means you can process sensitive documents, images, and audio without sending data to external APIs. The privacy implications are significant: user data never leaves the device or the company's infrastructure.

The edge models (E2B and E4B) open up entirely new categories of mobile and IoT applications where connectivity is unreliable or latency requirements are strict. A 3.2 GB model that understands text, images, and audio can power real-time translation, document scanning, accessibility features, and conversational interfaces without a network connection.

For developers building productivity tools, these advances translate into tangible user benefits. Email clients and communication platforms can leverage local multimodal models to understand attachments, classify messages, and draft contextual responses without cloud round-trips. Tools like Maylee already demonstrate this approach, using AI to auto-classify incoming emails and draft replies that match the sender's writing style, and open models like Gemma 4 make it possible to bring similar intelligence to entirely new categories of applications.

The MoE architecture of the 26B model offers a particularly interesting sweet spot for developers who need strong reasoning capabilities but want to serve multiple concurrent users. With only 3.8 billion active parameters per inference pass, throughput per GPU can be significantly higher than with a comparably accurate dense model.

[Figure: Gemma 4 cloud vs. on-device deployment]

The Competitive Landscape

Gemma 4 enters a crowded field of open-weight models. Alibaba's Qwen family, Meta's Llama, DeepSeek, and Mistral all compete for developer attention. Chinese open models from Zhipu/GLM, Moonshot/Kimi, and MiniMax often offer larger parameter counts.

Where Gemma 4 differentiates is in the combination of factors: Apache 2.0 licensing, genuine edge deployment capability with audio support, strong benchmarks relative to parameter count, and the backing of Google's distribution infrastructure. No single competitor currently matches all of these attributes simultaneously.

For teams evaluating local deployment options at the 30B parameter scale with permissive licensing, Gemma 4's 31B model is a top contender. For latency-sensitive applications where active parameter count matters more than total model size, the 26B MoE offers a unique proposition. And for on-device applications requiring audio understanding, the E2B and E4B models occupy a niche that few open-weight competitors address.

Looking Ahead

Gemma 4 is not just a model release. It is a signal that Google views the open-weight ecosystem as a strategic priority, not a side project. The Apache 2.0 licensing, the edge-native design, and the multimodal breadth all point toward a future where capable AI runs everywhere, from data centers to phones to embedded devices.

For app developers, the practical takeaway is clear: the barrier to building AI-native applications has dropped again. The models are free, the license is permissive, and the hardware requirements are increasingly reasonable. The question is no longer whether you can run multimodal AI locally, but what you will build with it.

Frequently Asked Questions

What license does Gemma 4 use and can I use it commercially?

Gemma 4 is released under the Apache 2.0 license, which allows commercial use, modification, and redistribution with minimal restrictions. This is a significant change from previous Gemma generations that used more restrictive community licenses.

How much memory do I need to run Gemma 4 locally?

Memory requirements vary by model and precision. The E2B model needs as little as 3.2 GB at Q4_0 quantization, E4B needs 5 GB, the 26B MoE needs 15.6 GB, and the 31B dense model needs 17.4 GB. These figures cover weights only and exclude runtime overhead like KV cache.

What modalities does Gemma 4 support?

All Gemma 4 models support text and image input with text output. The smaller E2B and E4B edge models also support native audio input. Video is processed as a sequence of image frames rather than through native temporal understanding.

What is the difference between the MoE and dense models?

The 26B A4B MoE model has 25.2 billion total parameters but only activates 3.8 billion per inference pass, offering faster throughput. The 31B dense model activates all 30.7 billion parameters every time, delivering higher quality but requiring more compute per token.

Can Gemma 4 run on a smartphone?

Yes. The E2B model at Q4_0 quantization requires only 3.2 GB of memory, making it feasible on modern smartphones. Google distributes it through the AI Edge Gallery and Android AICore Developer Preview for mobile deployment.

How does Gemma 4 compare to previous Gemma models?

Gemma 4 shows dramatic improvements over Gemma 3 27B across all benchmarks. For example, Codeforces ELO jumped from 110 to 2150 for the 31B model, AIME 2026 scores went from 20.8% to 89.2%, and agentic task performance on Tau2 increased from 16.2% to 76.9%.

What context length does Gemma 4 support?

The E2B and E4B edge models support 128K token contexts. The larger 26B MoE and 31B dense models support 256K token contexts.

Where can I download Gemma 4?

Gemma 4 weights are available through Hugging Face, Kaggle, Ollama, Google AI Studio, and the Google AI Edge Gallery. Framework support includes vLLM, llama.cpp, MLX, and other popular inference engines.

© 2026 Maylee. All rights reserved.