Gemma 4's small model lineup has two options: E2B (2 billion parameters) and E4B (4 billion parameters). They're both designed to run on constrained hardware, but the gap between them is bigger than the parameter count suggests. Here's how they compare.
What Are E2B and E4B?
Both are lightweight, dense models optimized for on-device inference. No MoE routing, no experts — just compact networks designed to fit in tight memory budgets.
E2B is the smallest model in the Gemma 4 family. At 2 billion parameters, it's built for scenarios where every megabyte of RAM matters — think phones, Raspberry Pi, IoT devices, and embedded systems.
E4B doubles the parameter count to 4 billion. It's still small enough for local use on a laptop or decent phone, but it punches well above its weight on reasoning, coding, and multimodal tasks.
Gemma 4 Small Models:
┌──────────────────────────────────────┐
│ E2B (2B params) │
│ Ultra-compact · Phones · Edge │
│ ~250 MB RAM (CoreML) · 11 tok/s │
├──────────────────────────────────────┤
│ E4B (4B params) │
│ Compact · Laptops · Daily driver │
│ ~1.5 GB RAM (Q4) · 35 tok/s │
└──────────────────────────────────────┘

Head-to-Head Comparison
| Metric | E2B (2B) | E4B (4B) |
|---|---|---|
| Parameters | 2B | 4B |
| Model size (FP16) | ~4 GB | ~8 GB |
| Model size (Q4_K_M) | ~1.2 GB | ~2.5 GB |
| RAM (Q4_K_M) | ~1.5 GB | ~3 GB |
| RAM (CoreML, iPhone) | ~250 MB | ~800 MB |
| Context window | 8K tokens | 32K tokens |
| Multimodal | Text only | Text + Image |
The file size and RAM differences are roughly 2x, which makes sense given the parameter count. But the real story is in context length and multimodal support — E4B gets 4x the context and can process images.
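The size numbers in the table follow from a simple rule of thumb: file size is roughly parameters times bits per weight. A rough sketch (the ~4.5 bits/weight figure for Q4_K_M is an assumption based on its mixed 4/6-bit block layout, not an official spec, which is why estimates land slightly under the table's values):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough model file size: parameter count x average bits per weight.

    Q4_K_M is assumed to average ~4.5 bits/weight (mixed 4- and 6-bit
    blocks), which is why a "4-bit" file is larger than params/2 bytes.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"E2B FP16:   ~{model_size_gb(2, 16):.1f} GB")   # ~4.0 GB
print(f"E2B Q4_K_M: ~{model_size_gb(2, 4.5):.1f} GB")  # ~1.1 GB
print(f"E4B FP16:   ~{model_size_gb(4, 16):.1f} GB")   # ~8.0 GB
print(f"E4B Q4_K_M: ~{model_size_gb(4, 4.5):.1f} GB")  # ~2.2 GB
```

Runtime RAM comes in a bit above file size because of the KV cache and activations, which is why the RAM rows exceed the file-size rows.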
Speed Comparison
E2B is faster on the same hardware, but E4B is still plenty fast for interactive use:
| Hardware | E2B (tok/s) | E4B (tok/s) | E2B Speedup |
|---|---|---|---|
| iPhone 15 Pro (CoreML) | ~11 | ~5 | 2.2x |
| iPhone 16 Pro (CoreML) | ~15 | ~7 | 2.1x |
| Raspberry Pi 5 (8GB) | ~8 | ~4 | 2x |
| M3 MacBook Air (Q4) | ~65 | ~35 | 1.9x |
| RTX 3060 12GB (Q4) | ~120 | ~70 | 1.7x |
On an iPhone with CoreML-LLM, E2B runs at about 11 tokens per second while using only 250 MB of RAM and drawing around 2W of power. That's genuinely usable for real-time chat on a phone without killing the battery.
E4B is about half the speed on mobile, but on a laptop or desktop it's still fast enough that you won't notice the difference in practice.
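To turn these tokens-per-second numbers into something you can feel, divide reply length by speed. A minimal sketch using the table's figures (it ignores prompt-processing time, so real first-token latency will add a bit on top):

```python
def response_seconds(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time to stream a reply, ignoring prompt-processing time."""
    return tokens / tok_per_s

# A typical ~150-token chat reply at the speeds measured above:
for name, speed in [("E2B on iPhone 15 Pro", 11),
                    ("E4B on iPhone 15 Pro", 5),
                    ("E4B on M3 MacBook Air", 35)]:
    print(f"{name}: ~{response_seconds(150, speed):.0f} s")
```

At ~14 s vs ~30 s for a phone reply, the E2B/E4B gap is noticeable on mobile; at ~4 s on a laptop, it isn't.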
Quality Comparison
This is where E4B pulls ahead significantly:
| Benchmark | E2B (2B) | E4B (4B) | Winner |
|---|---|---|---|
| MMLU | 52.1 | 61.8 | E4B (+9.7) |
| HumanEval | 38.4 | 52.6 | E4B (+14.2) |
| GSM8K | 45.2 | 62.1 | E4B (+16.9) |
| MATH | 18.3 | 28.7 | E4B (+10.4) |
| ARC-Challenge | 48.9 | 57.3 | E4B (+8.4) |
| Average | 40.6 | 52.5 | E4B (+11.9) |
Unlike the 26B vs 31B comparison where the quality gap was 1-2 points, here the gap is massive — nearly 12 points on average. E4B is meaningfully smarter, especially on math and code.
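The averages in the table are straightforward to verify from the five benchmark rows:

```python
e2b = [52.1, 38.4, 45.2, 18.3, 48.9]  # MMLU, HumanEval, GSM8K, MATH, ARC-C
e4b = [61.8, 52.6, 62.1, 28.7, 57.3]

def avg(xs):
    return sum(xs) / len(xs)

print(f"E2B avg: {avg(e2b):.1f}")          # 40.6
print(f"E4B avg: {avg(e4b):.1f}")          # 52.5
print(f"Gap: {avg(e4b) - avg(e2b):+.1f}")  # +11.9
```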
Where You'll Notice the Difference
- Simple Q&A and chat: Both handle basic conversation fine. E2B occasionally produces less coherent long responses.
- Reasoning and math: E4B is significantly better. E2B struggles with multi-step problems.
- Code generation: E4B writes usable code snippets. E2B can autocomplete but struggles with full function implementations.
- Multilingual: E4B handles Chinese, Japanese, Korean, and European languages much better. E2B is mostly English-capable.
- Image understanding: Only E4B supports this. If you need vision, the choice is made for you.
When to Choose E2B
E2B is the right pick when you're operating at the absolute edge of what hardware can support:
- Phones with limited RAM — older iPhones, budget Android devices where 250 MB is all you can spare
- Raspberry Pi and SBCs — runs well on a Pi 5 with 4GB RAM
- IoT and embedded — smart home devices, always-on assistants with minimal power budget
- Offline keyword extraction and classification — when you just need basic NLP, not full reasoning
- CoreML-LLM on iPhone — 11 tok/s at 250 MB RAM and 2W power is remarkable for on-device AI
- Batch processing at scale — when you need to process millions of items and cost per inference matters
If your use case is "respond to simple queries on a device with very little RAM," E2B does the job.
When to Choose E4B
E4B is the better choice for most people who want a small local model:
- Laptops for daily use — fast enough for interactive chat, smart enough for real work
- Better phones — iPhone 14 Pro and newer, flagship Android with 6GB+ RAM
- Coding assistant — actually useful for code completion and generation
- Multimodal tasks — image captioning, visual Q&A, document understanding
- Longer conversations — 32K context vs E2B's 8K means it can handle much longer threads
- Multilingual use — if you work in non-English languages, E4B is dramatically better
- Edge servers — small enough for a mini PC, smart enough to be useful
For a deeper look at running these on phones, see the Mobile Deployment Guide.
Quick Decision Table
| Your Situation | Pick |
|---|---|
| Phone with <1GB free RAM | E2B |
| Raspberry Pi / embedded | E2B |
| Always-on, ultra-low power | E2B |
| Laptop or desktop | E4B |
| Need image understanding | E4B |
| Coding assistance | E4B |
| Multilingual use | E4B |
| Long conversations (>8K tokens) | E4B |
| Simple text classification | E2B |
| General-purpose local AI | E4B |
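The decision table boils down to a few checks. Here's a toy picker that mirrors it — illustrative only, with the thresholds taken from the table rather than any official sizing guidance:

```python
def pick_model(free_ram_gb: float,
               need_vision: bool = False,
               need_long_context: bool = False) -> str:
    """Toy model picker mirroring the decision table above."""
    if need_vision or need_long_context:
        return "E4B"   # only E4B has image input and a 32K context window
    if free_ram_gb < 1.0:
        return "E2B"   # E2B's ~250 MB CoreML footprint is the only fit
    return "E4B"       # default: E4B is the better general-purpose pick

print(pick_model(0.5))                   # E2B
print(pick_model(8, need_vision=True))   # E4B
```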
E2B and E4B vs Larger Models
Where do these small models fit in the full Gemma 4 lineup?
| Model | Params | RAM (Q4) | Speed (M3 Air) | Quality (avg) |
|---|---|---|---|---|
| E2B | 2B | ~1.5 GB | ~65 tok/s | 40.6 |
| E4B | 4B | ~3 GB | ~35 tok/s | 52.5 |
| 12B | 12B | ~7 GB | ~20 tok/s | 67.8 |
| 26B MoE | 26B | ~15 GB | ~12 tok/s | 72.4 |
There's a clear quality staircase. Each step up roughly doubles the RAM and halves the speed. For the full picture, see Which Gemma 4 Model Should You Pick?
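The staircase shows diminishing returns, which you can make concrete by dividing quality by RAM footprint (numbers from the table above; "quality points per GB" is just an illustrative ratio, not a standard metric):

```python
# (RAM in GB at Q4, average benchmark score) from the lineup table
models = {"E2B": (1.5, 40.6), "E4B": (3.0, 52.5),
          "12B": (7.0, 67.8), "26B MoE": (15.0, 72.4)}

for name, (ram_gb, quality) in models.items():
    print(f"{name:8s} {quality / ram_gb:5.1f} quality points per GB of RAM")
```

E2B is the most efficient per gigabyte and the 26B MoE the least, which is exactly why the small models exist: each step up the staircase buys less quality per unit of hardware.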
Hardware Requirements
For detailed hardware recommendations, check the Hardware Guide. Here's the quick version for small models:
E2B Minimum Hardware
- iPhone: iPhone 12 or newer (CoreML)
- Android: 4GB+ RAM, Snapdragon 8 Gen 1+
- Raspberry Pi: Pi 5 with 4GB RAM
- PC/Mac: Anything from the last 5 years
E4B Minimum Hardware
- iPhone: iPhone 14 Pro or newer (CoreML)
- Android: 6GB+ RAM, Snapdragon 8 Gen 2+
- Raspberry Pi: Pi 5 with 8GB RAM
- PC/Mac: 8GB RAM, any recent CPU/GPU
Next Steps
- Want to run these on your phone? Read the Mobile Deployment Guide for CoreML and Android setup
- Need help picking across the full lineup? See Which Gemma 4 Model Should You Pick?
- Choosing hardware? Check the Hardware Guide for GPU/CPU recommendations
For most people, E4B is the sweet spot — it's small enough to run anywhere with a few GB of RAM, but smart enough to actually be useful for coding, conversation, and multimodal tasks. Save E2B for truly constrained environments where 250 MB of RAM is all you've got.
Stop reading. Start building.