Gemma 4 + Claude Code Router: Run Claude Code on a Local Model (2026)
Claude Code is Anthropic's terminal coding agent, widely loved for its context handling and code quality. But out of the box it only talks to Anthropic's cloud API — meaning every file you touch is uploaded, and every session bills against your Anthropic account.
If you care about code privacy, need to work offline, or live behind a locked-down corporate network, that's a real blocker. Claude Code Router (CCR) is a community-built open source proxy that sits between Claude Code and its upstream API, translating requests to an OpenAI-compatible backend — such as a local Ollama instance running Gemma 4.
This guide walks through the full setup: installing CCR, wiring Gemma 4 as the upstream, running real coding sessions, and — just as important — the compliance, quality, and support caveats you have to accept before going down this path. We are not cheerleading CCR. It's a gray-area workaround, and we want you to make an informed call.
Why would anyone want Claude Code on a local model?
Before the how, the why. Three legitimate scenarios:
Regulated codebases. Finance, defense, healthcare, and some government teams can't send source code to third-party clouds. Anthropic's API terms don't change that. A local model on a workstation keeps code inside the network boundary.
Flaky or air-gapped networks. Remote work from bad Wi-Fi, long flights, field research, secure labs — cloud APIs fail, local models keep working.
Custom or fine-tuned models. You may have a domain-tuned Gemma 4 (bioinformatics, EDA, internal DSL) and want the Claude Code UX on top of your own weights.
Those are engineering reasons. "Save money on Claude" is a different motivation and not one this article defends.
Legal & compliance note (read before you run anything)
Disclaimer: The following setup rewrites Claude Code's upstream API endpoint to point at a third-party proxy. Depending on how Anthropic's terms of service evolve, this may violate the Claude Code usage agreement and put your Anthropic account at risk. This article is technical documentation, not legal advice. If you are deploying inside a company, run it past your legal and security teams first.
You also need to accept these technical tradeoffs:
- Feature downgrade. Claude Code's extended thinking, prompt caching, and some tool-use formats are tuned for Anthropic models. Swap in Gemma 4 and those features may degrade, misbehave, or silently no-op.
- Output quality. Gemma 4 26B / 31B are strong open-weight models, but on complex multi-file reasoning they still trail Claude 3.5 / Claude 4. Calibrate expectations accordingly.
- Community project, no SLA. CCR is volunteer-maintained. If Anthropic changes its wire format, CCR can break overnight, and there is no support contract to fall back on.
If those tradeoffs are acceptable, keep reading.
What is Claude Code Router, exactly?
CCR is a small local HTTP proxy, typically written in Node.js. Its job is narrow:
- Listen on a local port.
- Accept requests shaped like Anthropic's Messages API.
- Translate those requests into an OpenAI-compatible format (Ollama, LiteLLM, OpenRouter, etc.).
- Translate the streamed response back into the shape Claude Code expects.
The translation layer is the interesting bit. Message roles, tool-call blocks, streaming deltas, and stop reasons all differ between the two API families — CCR normalizes them so Claude Code believes it is still talking to Anthropic.
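To make that translation concrete, here is a minimal sketch of the request-side mapping and the stop-reason table, assuming a simplified text-only subset of both APIs. The type names and field choices are illustrative only; this is not CCR's actual code, and real forks also handle tool-call blocks and streaming deltas.

```typescript
// Illustrative sketch: map an Anthropic-style Messages request onto an
// OpenAI-compatible chat.completions body. Field names follow the two
// public API shapes; this is NOT CCR's real implementation.

type AnthropicRequest = {
  model: string;
  system?: string; // Anthropic: system prompt is a top-level field
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
};

type OpenAIRequest = {
  model: string;
  max_tokens: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
};

function toOpenAI(req: AnthropicRequest, upstreamModel: string): OpenAIRequest {
  const messages: OpenAIRequest["messages"] = [];
  // OpenAI expects the system prompt as the first message instead.
  if (req.system) messages.push({ role: "system", content: req.system });
  messages.push(...req.messages);
  // The proxy also swaps the requested model for the configured upstream one.
  return { model: upstreamModel, max_tokens: req.max_tokens, messages };
}

// Stop reasons differ between the families and must be mapped back
// (OpenAI finish_reason -> Anthropic stop_reason) on the response side.
const STOP_REASON_MAP: Record<string, string> = {
  stop: "end_turn",       // normal completion
  length: "max_tokens",   // truncated by the token limit
  tool_calls: "tool_use", // model requested a tool invocation
};
```

On the response side the same mapping runs in reverse, and for streaming each OpenAI chunk has to be re-emitted as the event types Claude Code expects, which is where most fork-to-fork differences show up.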
Prerequisites
- Node.js 18+ (CCR is a Node project)
- Ollama installed and running (ollama.com)
- Gemma 4 26B or 31B pulled via Ollama
- Claude Code (npm install -g @anthropic-ai/claude-code)
- Hardware: 26B needs ~16 GB RAM, 31B needs ~24 GB. See our hardware sizing guide linked at the end.
Step 1 — Install Claude Code Router
Clone the repo and install dependencies:
git clone https://github.com/<org>/claude-code-router.git
cd claude-code-router
npm install

The exact repo URL varies — CCR is a community project and may be forked or renamed. Search GitHub for "claude code router" or "claude code proxy" and pick the actively maintained fork.
Step 2 — Point CCR at Ollama
Create a .env (or config.json, depending on the fork) inside the CCR directory:
UPSTREAM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:26b-a4b
PORT=8082

Start the proxy:
npm start

Expect log output like:
Claude Code Router started on port 8082
Upstream: ollama (gemma4:26b-a4b)
Ready to accept connections

Sanity-check the HTTP endpoint:
curl http://localhost:8082/v1/models

You should get back a JSON list that includes your Gemma 4 model.
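If you prefer a scripted check over curl, a small Node script can push an Anthropic-shaped request through the proxy end to end. The /v1/messages path, the x-api-key header, and the payload shape below mirror Anthropic's public Messages API; whether your particular CCR fork exposes exactly this path is an assumption worth verifying against its README.

```typescript
// Smoke test: POST an Anthropic-style Messages request to the local proxy.
// Assumes CCR listens on port 8082 and that the model name matches the
// OLLAMA_MODEL value in your CCR config.

const PROXY_URL = "http://localhost:8082";

function buildMessagesRequest(prompt: string) {
  return {
    model: "gemma4:26b-a4b",
    max_tokens: 64,
    messages: [{ role: "user", content: prompt }],
  };
}

async function smokeTest(): Promise<void> {
  const res = await fetch(`${PROXY_URL}/v1/messages`, {
    method: "POST",
    headers: { "content-type": "application/json", "x-api-key": "local" },
    body: JSON.stringify(buildMessagesRequest("Reply with the single word: pong")),
  });
  if (!res.ok) throw new Error(`proxy returned ${res.status}`);
  console.log(await res.json());
}

// Uncomment to run against a live proxy:
// smokeTest();
```

If this round-trips, the full chain (proxy, format translation, Ollama, Gemma 4) is working before you ever start Claude Code.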
Step 3 — Redirect Claude Code to CCR
Claude Code reads ANTHROPIC_BASE_URL to decide where to send requests. Point it at the local proxy:
export ANTHROPIC_BASE_URL="http://localhost:8082"
export ANTHROPIC_API_KEY="local" # must be non-empty, value is ignored
claude

Claude Code will boot normally — but every request now flows through CCR into your local Gemma 4.
To avoid touching env vars every time, put them in ~/.zshrc / ~/.bashrc, or gate it behind an alias so your default claude still uses the official API:
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:8082 ANTHROPIC_API_KEY=local claude'

Step 4 — Actually using it
Explain code:
> Explain what src/auth/middleware.ts is doing.

Gemma 4 will read the file and summarize. Explanations are decent on isolated files; deep cross-module reasoning is where quality drops.
Generate a new endpoint:
> Create src/api/health.ts with a GET /api/health handler that returns status and version.

Straightforward CRUD / HTTP scaffolding works well.
Debug a failure:
> npm test fails with "Cannot read properties of undefined". Look at src/utils/parser.ts.

With an explicit stack trace, Gemma 4 usually locates the bug. Subtle logic bugs without a clear error signal are noticeably harder.
Honest comparison: how does it actually feel?
| Dimension | Claude Code (official) | CCR + Gemma 4 26B | CCR + Gemma 4 31B |
|---|---|---|---|
| Code generation quality | Excellent | Decent (≈ GPT-3.5) | Good (≈ GPT-4) |
| Multi-file reasoning | Excellent | Weak | Moderate |
| Tool-use compatibility | Native | Partial | Partial |
| Extended thinking | Supported | Not supported | Not supported |
| Response speed | Fast (cloud) | 20–40 t/s local | 15–30 t/s local |
| Privacy | Code leaves the network | Fully local | Fully local |
| Offline | No | Yes | Yes |
| Monthly cost | Per-token billing | $0 | $0 |
Works for: isolated code generation, code explanation, simple bug fixes, offline / locked-down usage.
Doesn't work for: architecture refactors, long multi-file edits, anything that depends on extended thinking.
How CCR stacks up against other local AI coding setups
| Option | Native local support | Needs a proxy | Git integration | Multi-file edits | Maturity |
|---|---|---|---|---|---|
| Aider + Gemma 4 | Native | No | Auto commits | Strong | High (30K+ stars) |
| Codex CLI + Gemma 4 | Config needed | No | None | Single-file focus | Medium |
| CCR + Claude Code + Gemma 4 | No | Yes (CCR) | None | Strong (via Claude Code) | Low (experimental) |
| Cursor + Ollama | Plugin needed | No | None | Strong | Medium |
Honest take: if your goal is "local model + terminal coding", Aider is the more mature path — it was built for local backends. CCR makes sense only if you are already deeply invested in Claude Code's workflow and have a concrete privacy or offline constraint.
Troubleshooting
"Port already in use" — change PORT in .env, or kill the process holding 8082:
lsof -i :8082
kill -9 <PID>

"Authentication failed" from Claude Code — ANTHROPIC_API_KEY must be set to any non-empty value, and ANTHROPIC_BASE_URL must match the CCR port.
Garbled or truncated output — CCR's format translation isn't perfect. Update to the latest CCR (git pull && npm install), try the 31B model (better format adherence), or simplify your prompt.
Slow responses — confirm Ollama is on GPU, not CPU (ollama ps), and consider a GGUF quantization if you are RAM-bound.
FAQ
Will Anthropic ban my account? The risk is non-zero. Rewriting Claude Code's API endpoint may violate Anthropic's ToS. We can't give legal advice — treat it as a personal risk decision, and escalate to your legal team for any commercial use.
Can Gemma 4 fully replace Claude? No. Claude Code is tuned around Anthropic model capabilities (extended thinking, specific tool-use formats, deep context handling). Gemma 4 gets you a functional subset, not feature parity.
Does CCR work on Windows? Yes. Node.js and Ollama are cross-platform, and the install steps are identical.
Can I point CCR at GPT-4 or another hosted model? Yes — CCR supports multiple upstreams, including OpenAI. But that defeats the privacy story and you still pay the upstream's per-token cost.
Why not just use Aider or Codex CLI? Honestly, that's usually the better answer. Aider's repo map and automatic git commits are designed for local-model workflows. Only pick CCR if Claude Code's UX is non-negotiable for you — see our Aider + Gemma 4 guide.
Can I use Gemma 4 E2B or E4B? Technically yes, practically no. The 4B / 8B variants do not hold up under agentic coding workloads. Minimum recommendation is 26B.
Is this production-safe for a team? Not really. Between the ToS exposure and the fact that CCR is a single-maintainer proxy, we would not put it on a shared path for a team of engineers. Use it for personal research or tightly scoped pilots.