How to Use the Gemma 4 API (Python, cURL & JavaScript)

Apr 7, 2026

So you've played with Gemma 4 in a chat window and now you want to build something with it. Good — that's where it gets fun. There are three main ways to call Gemma 4 via API, and each one makes sense for different situations.

Let's walk through all three with real code you can copy and run.

Option 1: Ollama Local API (Free, Private, No Limits)

If you've already set up Ollama locally, you've got an API server running at localhost:11434 right now. No API key needed, no rate limits, completely free.

Python (requests)

import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Explain async/await in Python like I'm 10",
    "stream": False
})

print(response.json()["response"])

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Explain async/await in Python like I am 10",
  "stream": false
}'

JavaScript (fetch)

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4",
    prompt: "Explain async/await in Python like I'm 10",
    stream: false,
  }),
});

const data = await response.json();
console.log(data.response);

Ollama Chat API (Multi-turn)

For conversations with message history, use the chat endpoint:

import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4",
    "messages": [
        {"role": "system", "content": "You are a helpful coding tutor."},
        {"role": "user", "content": "What's the difference between a list and a tuple?"}
    ],
    "stream": False
})

print(response.json()["message"]["content"])
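The chat endpoint is stateless, so a real conversation means resending the whole history each turn, with the model's reply appended before the next user message. A minimal sketch of that bookkeeping (the run_turn helper and its injectable send parameter are ours, not part of Ollama's API):

```python
import requests

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default Ollama endpoint

def ask(messages, model="gemma4"):
    """POST the full history to Ollama's chat endpoint, return the reply text."""
    resp = requests.post(
        OLLAMA_CHAT_URL,
        json={"model": model, "messages": messages, "stream": False},
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def run_turn(messages, user_text, send=ask):
    """Append the user turn, fetch a reply, append it to history, return it."""
    messages.append({"role": "user", "content": user_text})
    reply = send(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

# Usage (with a running Ollama server):
# history = [{"role": "system", "content": "You are a helpful coding tutor."}]
# print(run_turn(history, "What's a list comprehension?"))
# print(run_turn(history, "Show me a nested one."))  # model sees the first answer
```

Because the history grows every turn, long conversations eventually hit the context window; trimming old turns is up to you.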

Pros: Zero cost, total privacy, no rate limits, works offline. Cons: Speed depends on your hardware. No GPU = slow.

Option 2: Google AI Studio API (Free Tier Available)

Google offers Gemma 4 through their AI Studio API. You get a generous free tier and it's fast because it runs on Google's TPU infrastructure.

Get Your API Key

  1. Go to aistudio.google.com
  2. Click "Get API Key" in the top right
  3. Create a key (takes 10 seconds)

For a detailed walkthrough, check out our Google AI Studio guide.
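Before writing any real code, you can sanity-check the key by listing the models it can see. A hedged sketch against the v1beta models endpoint (the response field names are an assumption; check the API reference if they differ — the injectable get parameter is ours, added for testability):

```python
import requests

def list_models(api_key, get=requests.get):
    """Return the model names this key can access; raises on a bad key."""
    resp = get(
        "https://generativelanguage.googleapis.com/v1beta/models",
        params={"key": api_key},
    )
    resp.raise_for_status()  # 400/403 here usually means a wrong or restricted key
    return [m["name"] for m in resp.json().get("models", [])]

# Usage:
# print(list_models("YOUR_API_KEY"))
```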

Python (google-generativeai SDK)

First install the SDK:

pip install google-generativeai

Then:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content("Write a Python decorator for retry logic")

print(response.text)

cURL

curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-27b-it:generateContent?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Write a Python decorator for retry logic"}]
    }]
  }'

Free Tier Limits

Here's what you get without paying anything:

  • 15 requests per minute (RPM)
  • 1,500 requests per day (RPD)
  • 1 million tokens per minute

That's actually pretty generous for development and small projects. You'll only hit limits if you're building something with real user traffic.
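If a script loops over many prompts, a small client-side throttle keeps you under the 15 RPM ceiling without ever seeing a 429. A sketch (the Throttle class is ours, not part of any SDK; the injectable clock and sleep exist only to make it testable):

```python
import time

class Throttle:
    """Block just long enough to stay under `rpm` requests per minute."""

    def __init__(self, rpm=15, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # minimum seconds between requests
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()

# Usage: call throttle.wait() before each generate_content() call
# throttle = Throttle(rpm=15)
# for prompt in prompts:
#     throttle.wait()
#     print(model.generate_content(prompt).text)
```

This spaces requests evenly rather than bursting 15 and stalling, which tends to play nicer with server-side limiters.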

Error Handling

The API returns specific error codes you should handle:

import google.generativeai as genai
from google.api_core import exceptions

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

try:
    response = model.generate_content("Your prompt here")
    print(response.text)
except exceptions.ResourceExhausted:
    print("Rate limit hit. Wait a minute and try again.")
except exceptions.InvalidArgument as e:
    print(f"Bad request: {e}")
except exceptions.NotFound:
    print("Model not found. Check the model name.")
except Exception as e:
    print(f"Unexpected error: {e}")

Option 3: OpenRouter API (OpenAI-Compatible)

OpenRouter is great if you want to swap between models easily. It uses the same format as OpenAI's API, so if you've built anything with GPT, you can switch to Gemma 4 by changing one line.

Get Your API Key

  1. Go to openrouter.ai
  2. Sign up and add credits ($5 minimum)
  3. Generate an API key from the dashboard

Python

import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/gemma-4-27b-it",
        "messages": [
            {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
        ],
    },
)

print(response.json()["choices"][0]["message"]["content"])

cURL

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_OPENROUTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-27b-it",
    "messages": [
      {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
  }'

Using the OpenAI Python SDK

Since OpenRouter is OpenAI-compatible, you can use the official OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[
        {"role": "user", "content": "Explain monads in plain English"}
    ],
)

print(response.choices[0].message.content)

This is especially nice because you can switch between Gemma 4, Claude, GPT, Llama, and others just by changing the model string. Want to let the model call external tools and APIs? Check out our function calling guide.
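That one-string switch also makes fallback easy: try Gemma 4 first, and if the call errors out, retry with the next model in your list. A sketch (complete_with_fallback is our helper, and the fallback model ID is an example — check OpenRouter's model list for current names; in production you'd catch the SDK's specific error classes instead of bare Exception):

```python
def complete_with_fallback(client, messages, models):
    """Try each model in order; return (model, reply) from the first that succeeds."""
    last_err = None
    for model in models:
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
            return model, resp.choices[0].message.content
        except Exception as err:  # hedge: narrow this to the SDK's APIError in real code
            last_err = err
    raise last_err  # every model failed; surface the final error

# Usage with the OpenAI SDK pointed at OpenRouter:
# client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_OPENROUTER_KEY")
# model, text = complete_with_fallback(
#     client,
#     [{"role": "user", "content": "Explain monads in plain English"}],
#     ["google/gemma-4-27b-it", "meta-llama/llama-3.1-70b-instruct"],  # example IDs
# )
```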

Streaming Responses

Nobody wants to wait 30 seconds for a wall of text. Streaming shows tokens as they're generated — way better UX. Here's how to do it with each method.

Ollama Streaming (Python)

import requests
import json

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Write a short story about a debugging session at 3am",
    "stream": True
}, stream=True)

for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)

Google AI Studio Streaming (Python)

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

response = model.generate_content(
    "Write a short story about a debugging session at 3am",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

OpenRouter Streaming (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

stream = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[{"role": "user", "content": "Write a short story about a debugging session at 3am"}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Comparison: Which API Should You Use?

| Feature           | Ollama (Local)         | Google AI Studio     | OpenRouter             |
|-------------------|------------------------|----------------------|------------------------|
| Cost              | Free                   | Free tier (15 RPM)   | Pay per token          |
| Speed             | Depends on hardware    | Fast (Google TPUs)   | Fast                   |
| Privacy           | Complete (offline)     | Data sent to Google  | Data sent to provider  |
| Rate Limits       | None                   | 15 RPM / 1,500 RPD   | Based on credits       |
| Setup             | Install Ollama + model | Get API key          | Sign up + add credits  |
| OpenAI Compatible | Partial                | No (own SDK)         | Yes                    |
| Best For          | Privacy, development   | Free prototyping     | Production, multi-model|

My recommendation:

  • Building a side project? Start with Google AI Studio's free tier. It's fast and free.
  • Privacy matters? Run Ollama locally. Your data stays on your machine.
  • Production app? OpenRouter gives you the most flexibility and the ability to fall back to other models.
  • Just learning? Ollama. No API keys, no limits, just code.

Common Gotchas

"Connection refused" on Ollama: Make sure the Ollama server is running. On Mac, check if the Ollama icon is in the menu bar. On Linux, run ollama serve first.

"Model not found" on Google AI Studio: Model names change. Check the AI Studio docs for current model IDs.

Slow responses on Ollama: You're probably running on CPU. That's fine — it works, just slower. See our hardware guide for what to expect.

Timeouts: For long-running generations, increase your HTTP client timeout. Gemma 4's 27B model can take a while on complex prompts, especially on CPU.
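With requests, that's the timeout parameter, which accepts a (connect, read) tuple; the 300-second read timeout below is an arbitrary starting point, not a recommendation:

```python
import requests

def generate(prompt, model="gemma4", connect_timeout=5, read_timeout=300):
    """Call Ollama's generate endpoint with an explicit (connect, read) timeout."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=(connect_timeout, read_timeout),  # requests accepts a 2-tuple here
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

For very long generations, streaming (above) is often the better fix: each chunk resets the read timer, so you only time out when the model actually stalls.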
