How to Use the Gemma 4 API (Python, cURL & JavaScript)

So you've played with Gemma 4 in a chat window and now you want to build something with it. Good. That's where it gets fun. There are three main ways to call Gemma 4 through an API, and each one suits a different situation.

Let's walk through all three with real code you can copy and run.

Option 1: Ollama Local API (Free, Private, No Limits)

If you've already set up Ollama locally, it serves an HTTP API at localhost:11434 whenever it's running. No API key needed, no rate limits, completely free.

Python (requests)

import requests

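# Ollama's local server needs no API key; "stream": False returns one JSON object.
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Explain async/await in Python like I'm 10",
    "stream": False
})
response.raise_for_status()  # fail loudly on HTTP errors (e.g. an unknown model)

print(response.json()["response"])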

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Explain async/await in Python like I am 10",
  "stream": false
}'

JavaScript (fetch)

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4",
    prompt: "Explain async/await in Python like I'm 10",
    stream: false,
  }),
});

const data = await response.json();
console.log(data.response);

Ollama Chat API (Multi-turn)

For conversations with message history, use the chat endpoint:

import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4",
    "messages": [
        {"role": "system", "content": "You are a helpful coding tutor."},
        {"role": "user", "content": "What's the difference between a list and a tuple?"}
    ],
    "stream": False
})

print(response.json()["message"]["content"])
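The chat endpoint keeps no memory between calls, so a multi-turn conversation means resending the whole message list each time. Here is a minimal sketch of that bookkeeping. The helper names `send_to_ollama` and `chat_turn` are made up for this example, and the sender is injectable so you can try the logic without a running server.

```python
def send_to_ollama(messages, model="gemma4"):
    """POST the full message list to /api/chat and return the reply text."""
    import requests  # imported here so chat_turn can be exercised without it

    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
    )
    response.raise_for_status()
    return response.json()["message"]["content"]


def chat_turn(history, user_text, send=send_to_ollama):
    """Append the user message, call the model, and record its answer."""
    history.append({"role": "user", "content": user_text})
    answer = send(history)
    # Store the reply so the next turn sees the whole conversation.
    history.append({"role": "assistant", "content": answer})
    return answer
```

Start with `history = [{"role": "system", "content": "You are a helpful coding tutor."}]` and call `chat_turn(history, "...")` once per user message. The list grows with every turn, so long sessions will eventually need trimming.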

Pros: Zero cost, total privacy, no rate limits, works offline.

Cons: Speed depends on your hardware. Without a GPU, expect slow responses.

Option 2: Google AI Studio API (Free Tier Available)

Google offers Gemma 4 through their AI Studio API. You get a generous free tier and it's fast because it runs on Google's TPU infrastructure.

Get Your API Key

Go to aistudio.google.com
Click "Get API Key" in the top right
Create a key (takes 10 seconds)

For a detailed walkthrough, check out our Google AI Studio guide.

Python (google-generativeai SDK)

pip install google-generativeai

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content("Write a Python decorator for retry logic")

print(response.text)
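Pasting the key into the source as above is fine for a quick test, but it is easy to commit by accident. One common alternative is to read it from an environment variable. This is only a sketch: the variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for illustration, not something the SDK requires.

```python
import os


def load_api_key(var_name="GEMINI_API_KEY", environ=os.environ):
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; use whatever
    name your own setup defines.
    """
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

You would then call `genai.configure(api_key=load_api_key())` instead of passing a string literal.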

cURL

curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-27b-it:generateContent?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Write a Python decorator for retry logic"}]
    }]
  }'

Free Tier Limits

Here's what you get without paying anything:

15 requests per minute (RPM)
1,500 requests per day (RPD)
1 million tokens per minute

That's actually pretty generous for development and small projects. You'll only hit limits if you're building something with real user traffic.
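If a batch script risks going over the per-minute limit, you can pace requests on the client side. The sketch below is one possible approach, a small sliding-window limiter. The class name `RequestThrottle` is hypothetical, and the default of 15 calls per 60 seconds simply mirrors the RPM figure listed above.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait(self):
        """Block until another request fits in the window, then record it."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._sent and now - self._sent[0] >= self.period:
            self._sent.popleft()
        if len(self._sent) >= self.max_calls:
            # Window is full: sleep until the oldest request ages out.
            self._sleep(self.period - (now - self._sent[0]))
            now = self._clock()
            self._sent.popleft()
        self._sent.append(now)
```

Call `throttle.wait()` right before each API request. This only paces a single process, so it does not replace handling the rate-limit error itself.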

Error Handling

The SDK raises specific exceptions for common API errors, and you should handle them:

import google.generativeai as genai
from google.api_core import exceptions

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

try:
    response = model.generate_content("Your prompt here")
    print(response.text)
except exceptions.ResourceExhausted:
    print("Rate limit hit. Wait a minute and try again.")
except exceptions.InvalidArgument as e:
    print(f"Bad request: {e}")
except exceptions.NotFound:
    print("Model not found. Check the model name.")
except Exception as e:
    print(f"Unexpected error: {e}")
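Printing "wait a minute and try again" can also be automated. The following is a minimal sketch of retrying with exponential backoff. `call_with_backoff` is a hypothetical helper, and the delay values are illustrative assumptions rather than numbers recommended by Google.

```python
import random
import time


def call_with_backoff(call, retry_on, max_attempts=5, base_delay=2.0,
                      sleep=time.sleep):
    """Run `call()`, retrying with exponential backoff when `retry_on` is raised."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # 2s, 4s, 8s, ... plus up to 1s of jitter so that several
            # clients do not all retry at the same moment.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

With the SDK code above, usage might look like `call_with_backoff(lambda: model.generate_content("Your prompt here"), retry_on=exceptions.ResourceExhausted)`. Only the rate-limit error is retried here, since a bad request or a wrong model name will fail the same way every time.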

Option 3: OpenRouter API (OpenAI-Compatible)

OpenRouter is great if you want to swap between models easily. It uses the same format as OpenAI's API, so if you've built anything with GPT, you can switch to Gemma 4 by changing one line.

Get Your API Key

Go to openrouter.ai
Sign up and add credits ($5 minimum)
Generate an API key from the dashboard

Python

import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/gemma-4-27b-it",
        "messages": [
            {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
        ],
    },
)

print(response.json()["choices"][0]["message"]["content"])

cURL

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_OPENROUTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-27b-it",
    "messages": [
      {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
  }'

Using the OpenAI Python SDK

Since OpenRouter is OpenAI-compatible, you can use the official OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[
        {"role": "user", "content": "Explain monads in plain English"}
    ],
)

print(response.choices[0].message.content)

This is especially nice because you can switch between Gemma 4, Claude, GPT, Llama, and others just by changing the model string. Want to let the model call external tools and APIs? Check out our function calling guide.
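Because only the model string changes, a fallback chain is easy to sketch: try a list of model IDs in order and return the first answer. `complete_with_fallback` and the `ask` callback are hypothetical names for this example, and which model IDs you list is up to you.

```python
def complete_with_fallback(models, ask):
    """Try each model ID in order; return (model, answer) from the first that works.

    `ask` is any function that takes a model ID and returns the reply text.
    """
    errors = {}
    for model in models:
        try:
            return model, ask(model)
        except Exception as exc:  # broad on purpose: any failure moves on
            errors[model] = exc
    raise RuntimeError(f"All models failed: {errors}")
```

With the OpenAI client above, `ask` could be `lambda m: client.chat.completions.create(model=m, messages=messages).choices[0].message.content`. Returning the model ID alongside the answer lets you log which model actually served the request.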

Streaming Responses

Nobody wants to wait 30 seconds for a wall of text. Streaming shows tokens as they're generated, which makes for a much better user experience. Here's how to do it with each method.

Ollama Streaming (Python)

import requests
import json

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Write a short story about a debugging session at 3am",
    "stream": True
}, stream=True)

for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
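Often you want both the live output and the full text at the end, for example to save it. The sketch below wraps the loop above in a hypothetical helper, `collect_stream`. It assumes each streamed line is a JSON object with a `response` field, that the last one has `done` set to true, and that a failure shows up as an object with an `error` field.

```python
import json


def collect_stream(lines, on_token=None):
    """Join the text from Ollama's newline-delimited JSON stream.

    `lines` is any iterable of bytes or str lines, such as
    response.iter_lines() from the requests call above.
    """
    parts = []
    for line in lines:
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])
        token = chunk.get("response", "")
        parts.append(token)
        if on_token:
            on_token(token)  # e.g. print it as it arrives
        if chunk.get("done"):
            break  # the final chunk is marked done
    return "".join(parts)
```

For live printing, pass `on_token=lambda t: print(t, end="", flush=True)` and keep the return value as the complete story.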

Google AI Studio Streaming (Python)

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

response = model.generate_content(
    "Write a short story about a debugging session at 3am",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

OpenRouter Streaming (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

stream = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[{"role": "user", "content": "Write a short story about a debugging session at 3am"}],
    stream=True,
)

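for chunk in stream:
    # Some chunks (such as a final usage report) can arrive with no choices; skip them.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)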

Comparison: Which API Should You Use?

Feature	Ollama (Local)	Google AI Studio	OpenRouter
Cost	Free	Free tier (15 RPM)	Pay per token
Speed	Depends on hardware	Fast (Google TPUs)	Fast
Privacy	Complete (offline)	Data sent to Google	Data sent to provider
Rate Limits	None	15 RPM / 1,500 RPD	Based on credits
Setup	Install Ollama + model	Get API key	Sign up + add credits
OpenAI Compatible	Partial	No (own SDK)	Yes
Best For	Privacy, development	Free prototyping	Production, multi-model

My recommendation:

Building a side project? Start with Google AI Studio's free tier. It's fast and free.
Privacy matters? Run Ollama locally. Your data stays on your machine.
Production app? OpenRouter gives you the most flexibility and the ability to fall back to other models.
Just learning? Ollama. No API keys, no limits, just code.

Common Gotchas

"Connection refused" on Ollama: Make sure the Ollama server is running. On Mac, check if the Ollama icon is in the menu bar. On Linux, run ollama serve first.

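A quick way to tell "server not running" apart from "model not installed" is to ask the server which models it has. This sketch uses Ollama's `/api/tags` endpoint with only the standard library. The function name `list_local_models` and the 3-second timeout are assumptions for illustration.

```python
import json
import urllib.request


def list_local_models(base_url="http://localhost:11434", timeout=3):
    """Return installed model names, or None if no Ollama server answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as reply:
            return [m["name"] for m in json.load(reply).get("models", [])]
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None
```

A result of `None` points to the server not running, while a list that lacks your model name suggests the model still needs to be pulled.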
"Model not found" on Google AI Studio: Model names change. Check the AI Studio docs for current model IDs.

Slow responses on Ollama: You're probably running on CPU. That's fine. It works, just slower. See our hardware guide for what to expect.

Timeouts: For long-running generations, increase your HTTP client timeout. Gemma 4's larger models can take a while on complex prompts.
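With `requests`, the timeout can be given as a `(connect, read)` pair, so you can fail fast when the server is down and still allow a long generation to finish. A minimal sketch follows. The `generate` wrapper, the injectable `post` argument, and the 5 and 300 second values are assumptions to tune for your own hardware.

```python
def generate(prompt, model="gemma4", connect_timeout=5, read_timeout=300,
             post=None):
    """Call Ollama's generate endpoint with explicit timeouts."""
    if post is None:
        import requests  # imported lazily; pass `post` to test offline
        post = requests.post
    response = post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        # (connect, read): give up quickly if nothing is listening,
        # but wait much longer for the model to produce its answer.
        timeout=(connect_timeout, read_timeout),
    )
    response.raise_for_status()
    return response.json()["response"]
```

Streaming is the other way around this problem, since the read timeout then applies to the gap between chunks rather than to the whole response.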

Next Steps

New to Ollama? Start with our complete Ollama setup guide
Want to send images to the API? Check out our Gemma 4 multimodal guide
Need better prompts? Browse our 50 best Gemma 4 prompts
Not sure which model size to pick? Read Gemma 4: Which Model Should You Use?
