Gemma 4 + Claude Code Router: Run Claude Code on a Local Model (2026)
Claude Code is Anthropic's terminal coding agent, widely loved for its context handling and code quality. But out of the box it only talks to Anthropic's cloud API — meaning every file you touch is uploaded, and every session bills against your Anthropic account.
If you care about code privacy, need to work offline, or live behind a locked-down corporate network, that's a real blocker. Claude Code Router (CCR) is a community-built open source proxy that sits between Claude Code and its upstream API, translating requests to an OpenAI-compatible backend — such as a local Ollama instance running Gemma 4.
This guide walks through the full setup: installing CCR, wiring Gemma 4 as the upstream, running real coding sessions, and — just as important — the compliance, quality, and support caveats you have to accept before going down this path. We are not cheerleading CCR. It's a gray-area workaround, and we want you to make an informed call.
Why would anyone want Claude Code on a local model?
Before the how, the why. Three legitimate scenarios:
Regulated codebases. Finance, defense, healthcare, and some government teams can't send source code to third-party clouds. Anthropic's API terms don't change that. A local model on a workstation keeps code inside the network boundary.
Flaky or air-gapped networks. Remote work from bad Wi-Fi, long flights, field research, secure labs — cloud APIs fail, local models keep working.
Custom or fine-tuned models. You may have a domain-tuned Gemma 4 (bioinformatics, EDA, internal DSL) and want the Claude Code UX on top of your own weights.
Those are engineering reasons. "Save money on Claude" is a different motivation and not one this article defends.
Legal & compliance note (read before you run anything)
Disclaimer: The following setup rewrites Claude Code's upstream API endpoint to point at a third-party proxy. Depending on how Anthropic's terms of service evolve, this may violate the Claude Code usage agreement and put your Anthropic account at risk. This article is technical documentation, not legal advice. If you are deploying inside a company, run it past your legal and security teams first.
You also need to accept these technical tradeoffs:
- Feature downgrade. Claude Code's extended thinking, prompt caching, and some tool-use formats are tuned for Anthropic models. Swap in Gemma 4 and those features may degrade, misbehave, or silently no-op.
- Output quality. Gemma 4 26B / 31B are strong open-weight models, but on complex multi-file reasoning they still trail Claude 3.5 / Claude 4. Calibrate expectations accordingly.
- Community project, no SLA. CCR is volunteer-maintained. If Anthropic changes its wire format, CCR can break overnight, and there is no support contract to fall back on.
If those tradeoffs are acceptable, keep reading.
What is Claude Code Router, exactly?
CCR is a small local HTTP proxy, typically written in Node.js. Its job is narrow:
- Listen on a local port.
- Accept requests shaped like Anthropic's Messages API.
- Translate those requests into an OpenAI-compatible format (Ollama, LiteLLM, OpenRouter, etc.).
- Translate the streamed response back into the shape Claude Code expects.
The translation layer is the interesting bit. Message roles, tool-call blocks, streaming deltas, and stop reasons all differ between the two API families — CCR normalizes them so Claude Code believes it is still talking to Anthropic.
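To make that translation concrete, here is a minimal sketch of the request-side mapping and the stop-reason table, assuming a simplified text-only subset of both APIs. The type names and field choices are illustrative only; this is not CCR's actual code, and real forks also handle tool-call blocks and streaming deltas.

```typescript
// Illustrative sketch: map an Anthropic-style Messages request onto an
// OpenAI-compatible chat.completions body. Field names follow the two
// public API shapes; this is NOT CCR's real implementation.

type AnthropicRequest = {
  model: string;
  system?: string; // Anthropic: system prompt is a top-level field
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
};

type OpenAIRequest = {
  model: string;
  max_tokens: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
};

function toOpenAI(req: AnthropicRequest, upstreamModel: string): OpenAIRequest {
  const messages: OpenAIRequest["messages"] = [];
  // OpenAI expects the system prompt as the first message instead.
  if (req.system) messages.push({ role: "system", content: req.system });
  messages.push(...req.messages);
  // The proxy also swaps the requested model for the configured upstream one.
  return { model: upstreamModel, max_tokens: req.max_tokens, messages };
}

// Stop reasons differ between the families and must be mapped back
// (OpenAI finish_reason -> Anthropic stop_reason) on the response side.
const STOP_REASON_MAP: Record<string, string> = {
  stop: "end_turn",       // normal completion
  length: "max_tokens",   // truncated by the token limit
  tool_calls: "tool_use", // model requested a tool invocation
};
```

On the response side the same mapping runs in reverse, and for streaming each OpenAI chunk has to be re-emitted as the event types Claude Code expects, which is where most fork-to-fork differences show up.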
Prerequisites
- Node.js 18+ (CCR is a Node project)
- Ollama installed and running (ollama.com)
- Gemma 4 26B or 31B pulled via Ollama
- Claude Code (npm install -g @anthropic-ai/claude-code)
- Hardware: 26B needs ~16 GB RAM, 31B needs ~24 GB. See our hardware sizing guide linked at the end.
Step 1 — Install Claude Code Router
Clone the repo and install dependencies:
git clone https://github.com/<org>/claude-code-router.git
cd claude-code-router
npm install

The exact repo URL varies — CCR is a community project and may be forked or renamed. Search GitHub for "claude code router" or "claude code proxy" and pick the actively maintained fork.
Step 2 — Point CCR at Ollama
Create a .env (or config.json, depending on the fork) inside the CCR directory:
UPSTREAM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:26b-a4b
PORT=8082

Start the proxy:
npm start

Expect log output like:
Claude Code Router started on port 8082
Upstream: ollama (gemma4:26b-a4b)
Ready to accept connections

Sanity-check the HTTP endpoint:
curl http://localhost:8082/v1/models

You should get back a JSON list that includes your Gemma 4 model.
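If you prefer a scripted check over curl, a small Node script can push an Anthropic-shaped request through the proxy end to end. The /v1/messages path, the x-api-key header, and the payload shape below mirror Anthropic's public Messages API; whether your particular CCR fork exposes exactly this path is an assumption worth verifying against its README.

```typescript
// Smoke test: POST an Anthropic-style Messages request to the local proxy.
// Assumes CCR listens on port 8082 and that the model name matches the
// OLLAMA_MODEL value in your CCR config.

const PROXY_URL = "http://localhost:8082";

function buildMessagesRequest(prompt: string) {
  return {
    model: "gemma4:26b-a4b",
    max_tokens: 64,
    messages: [{ role: "user", content: prompt }],
  };
}

async function smokeTest(): Promise<void> {
  const res = await fetch(`${PROXY_URL}/v1/messages`, {
    method: "POST",
    headers: { "content-type": "application/json", "x-api-key": "local" },
    body: JSON.stringify(buildMessagesRequest("Reply with the single word: pong")),
  });
  if (!res.ok) throw new Error(`proxy returned ${res.status}`);
  console.log(await res.json());
}

// Uncomment to run against a live proxy:
// smokeTest();
```

If this round-trips, the full chain (proxy, format translation, Ollama, Gemma 4) is working before you ever start Claude Code.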
Step 3 — Redirect Claude Code to CCR
Claude Code reads ANTHROPIC_BASE_URL to decide where to send requests. Point it at the local proxy:
export ANTHROPIC_BASE_URL="http://localhost:8082"
export ANTHROPIC_API_KEY="local" # must be non-empty, value is ignored
claude

Claude Code will boot normally — but every request now flows through CCR into your local Gemma 4.
To avoid touching env vars every time, put them in ~/.zshrc / ~/.bashrc, or gate it behind an alias so your default claude still uses the official API:
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:8082 ANTHROPIC_API_KEY=local claude'

Step 4 — Actually using it
Explain code:
> Explain what src/auth/middleware.ts is doing.

Gemma 4 will read the file and summarize. Explanations are decent on isolated files; deep cross-module reasoning is where quality drops.
Generate a new endpoint:
> Create src/api/health.ts with a GET /api/health handler that returns status and version.

Straightforward CRUD / HTTP scaffolding works well.
Debug a failure:
> npm test fails with "Cannot read properties of undefined". Look at src/utils/parser.ts.

With an explicit stack trace, Gemma 4 usually locates the bug. Subtle logic bugs without a clear error signal are noticeably harder.
Honest comparison: how does it actually feel?
| Dimension | Claude Code (official) | CCR + Gemma 4 26B | CCR + Gemma 4 31B |
|---|---|---|---|
| Code generation quality | Excellent | Decent (≈ GPT-3.5) | Good (≈ GPT-4) |
| Multi-file reasoning | Excellent | Weak | Moderate |
| Tool-use compatibility | Native | Partial | Partial |
| Extended thinking | Supported | Not supported | Not supported |
| Response speed | Fast (cloud) | 20–40 t/s local | 15–30 t/s local |
| Privacy | Code leaves the network | Fully local | Fully local |
| Offline | No | Yes | Yes |
| Monthly cost | Per-token billing | $0 | $0 |
Works for: isolated code generation, code explanation, simple bug fixes, offline / locked-down usage.
Doesn't work for: architecture refactors, long multi-file edits, anything that depends on extended thinking.
How CCR stacks up against other local AI coding setups
| Option | Native local support | Needs a proxy | Git integration | Multi-file edits | Maturity |
|---|---|---|---|---|---|
| Aider + Gemma 4 | Native | No | Auto commits | Strong | High (30K+ stars) |
| Codex CLI + Gemma 4 | Config needed | No | None | Single-file focus | Medium |
| CCR + Claude Code + Gemma 4 | No | Yes (CCR) | None | Strong (via Claude Code) | Low (experimental) |
| Cursor + Ollama | Plugin needed | No | None | Strong | Medium |
Honest take: if your goal is "local model + terminal coding", Aider is the more mature path — it was built for local backends. CCR makes sense only if you are already deeply invested in Claude Code's workflow and have a concrete privacy or offline constraint.
Troubleshooting
"Port already in use" — change PORT in .env, or kill the process holding 8082:
lsof -i :8082
kill -9 <PID>

"Authentication failed" from Claude Code — ANTHROPIC_API_KEY must be set to any non-empty value, and ANTHROPIC_BASE_URL must match the CCR port.
Garbled or truncated output — CCR's format translation isn't perfect. Update to the latest CCR (git pull && npm install), try the 31B model (better format adherence), or simplify your prompt.
Slow responses — confirm Ollama is on GPU, not CPU (ollama ps), and consider a GGUF quantization if you are RAM-bound.
FAQ
Will Anthropic ban my account? The risk is non-zero. Rewriting Claude Code's API endpoint may violate Anthropic's ToS. We can't give legal advice — treat it as a personal risk decision, and escalate to your legal team for any commercial use.
Can Gemma 4 fully replace Claude? No. Claude Code is tuned around Anthropic model capabilities (extended thinking, specific tool-use formats, deep context handling). Gemma 4 gets you a functional subset, not feature parity.
Does CCR work on Windows? Yes. Node.js and Ollama are cross-platform, and the install steps are identical.
Can I point CCR at GPT-4 or another hosted model? Yes — CCR supports multiple upstreams, including OpenAI. But that defeats the privacy story and you still pay the upstream's per-token cost.
Why not just use Aider or Codex CLI? Honestly, that's usually the better answer. Aider's repo map and automatic git commits are designed for local-model workflows. Only pick CCR if Claude Code's UX is non-negotiable for you — see our Aider + Gemma 4 guide.
Can I use Gemma 4 E2B or E4B? Technically yes, practically no. The 4B / 8B variants do not hold up under agentic coding workloads. Minimum recommendation is 26B.
Is this production-safe for a team? Not really. Between the ToS exposure and the fact that CCR is a single-maintainer proxy, we would not put it on a shared path for a team of engineers. Use it for personal research or tightly scoped pilots.