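Gemma 4 API Tutorial (Python / curl / JS Code Examples)

Had your fun with Gemma 4 in the chat interface and want to build something real with it? Then you need the API. There are currently three mainstream ways to call Gemma 4, each suited to different situations.

This guide covers all three in depth, with code you can copy and run.

Method 1: Ollama Local API (Free, Private, Unlimited)

If you are already running Gemma 4 with Ollama, you already have an API server listening on localhost:11434. No API key, no rate limits, completely free.

Python (requests)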

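import requests

# POST a single prompt to the local Ollama server (no API key needed).
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Explain async/await in Python in plain language",
    "stream": False  # return one JSON object instead of a stream
})

print(response.json()["response"])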

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Explain async/await in Python in plain language",
  "stream": false
}'

JavaScript (fetch)

// Top-level await assumes an ES module or another async context.
const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4",
    prompt: "Explain async/await in Python in plain language",
    stream: false,
  }),
});

const data = await response.json();
console.log(data.response);

Ollama Chat API (Multi-turn Conversations)

For multi-turn conversations, use the chat endpoint:

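import requests

# The chat endpoint takes a list of role-tagged messages instead of one prompt.
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4",
    "messages": [
        {"role": "system", "content": "You are a programming tutor."},
        {"role": "user", "content": "What is the actual difference between a list and a tuple?"}
    ],
    "stream": False
})

print(response.json()["message"]["content"])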
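The request above sends a single user turn. Keeping a conversation going generally means resending the whole message history with each request and appending the model's reply to it. Below is a minimal sketch of that bookkeeping. `ChatSession` and its `send` callable are hypothetical names introduced here for illustration; the transport is injected so the history logic can be tried without a running Ollama server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ChatSession:
    """Keeps the running message list for a chat-style API (hypothetical helper)."""

    def __init__(self, model: str, system_prompt: str, send: Callable[[dict], str]):
        self.model = model
        self.send = send  # posts a payload and returns the reply text
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str) -> str:
        # Record the user turn, send the full history, then record the reply.
        self.messages.append({"role": "user", "content": user_text})
        payload = {"model": self.model, "messages": list(self.messages), "stream": False}
        reply = self.send(payload)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With `requests`, `send` could be a small function that posts the payload to http://localhost:11434/api/chat and returns `response.json()["message"]["content"]`, matching the example above.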

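Pros: zero cost, fully private, unlimited calls, works offline. Cons: speed depends on your hardware, and it will be slow without a dedicated GPU.

Method 2: Google AI Studio API (Free Tier Available)

Google offers a cloud API for Gemma 4 through AI Studio. It runs on Google's TPUs, so it is fast, and it comes with a decent free tier.

Getting an API Key

Open aistudio.google.com
Click "Get API Key" in the top-right corner
Create a key (it takes about 10 seconds)

For detailed steps, see our Google AI Studio guide.

Python (google-generativeai SDK)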

pip install google-generativeai

import google.generativeai as genai

# Replace the placeholder with the key created in AI Studio.
genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content("Write a Python retry decorator")

print(response.text)

cURL

curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-27b-it:generateContent?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Write a Python retry decorator"}]
    }]
  }'

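Free Tier Limits

What you get without paying:

15 requests per minute (RPM)
1,500 requests per day (RPD)
1 million tokens per minute

Honestly, this quota is quite generous and is plenty for development and small projects. Unless you have real user traffic, you are unlikely to hit the limits.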
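If a script does loop over many prompts, one way to stay under the per-minute request limit is to throttle on the client side. The following is a rough sketch of a sliding-window limiter, assuming the 15 RPM figure quoted above. `SlidingWindowLimiter` is a hypothetical helper, not part of any SDK, and it only counts requests; it does not track the daily or token limits.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allows at most max_calls calls in any window_s-second window."""

    def __init__(self, max_calls: int = 15, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.calls.append(self.clock())
```

Calling `limiter.acquire()` right before each `model.generate_content(...)` call would space requests out; the clock and sleep functions are injectable mainly so the logic can be tested without actually waiting.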

Error Handling

The API returns specific error codes, and it is worth handling them:

import google.generativeai as genai
from google.api_core import exceptions

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

try:
    response = model.generate_content("Your prompt")
    print(response.text)
except exceptions.ResourceExhausted:
    # Rate limit or quota exceeded.
    print("Rate limit hit. Wait a minute and try again.")
except exceptions.InvalidArgument as e:
    print(f"Problem with the request parameters: {e}")
except exceptions.NotFound:
    print("Model not found. Check the model name.")
except Exception as e:
    print(f"Unknown error: {e}")
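Rather than only printing a message when the rate limit is hit, a script can wait and retry automatically. Here is a minimal sketch of retrying with exponential backoff. `call_with_backoff` is a hypothetical helper written for this guide, and the delays (2 s, 4 s, 8 s plus jitter) are illustrative defaults, not values recommended by Google.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      retry_on: Tuple[Type[BaseException], ...],
                      max_attempts: int = 4,
                      base_delay_s: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Double the delay each time and add up to 1 s of random jitter.
            delay = base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)
    raise AssertionError("loop always returns or raises")
```

With the SDK from the example above, usage might look like `call_with_backoff(lambda: model.generate_content("Your prompt"), retry_on=(exceptions.ResourceExhausted,))`. Only the rate-limit error is retried here, since retrying an invalid request or a wrong model name would fail the same way every time.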

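Method 3: OpenRouter API (OpenAI-Compatible)

The advantage of OpenRouter is that it is compatible with the OpenAI API format. If you have built on GPT before, switching to Gemma 4 only takes changing the base URL, the API key, and the model name.

Getting an API Key

Open openrouter.ai
Sign up and add credits (minimum $5)
Generate an API key in the dashboard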

Python

import requests

# Same request shape as the OpenAI chat completions API.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/gemma-4-27b-it",
        "messages": [
            {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
        ],
    },
)

print(response.json()["choices"][0]["message"]["content"])

cURL

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_OPENROUTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-27b-it",
    "messages": [
      {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
  }'

Using the OpenAI Python SDK Directly

Because the API is OpenAI-compatible, you can use the official SDK directly:

from openai import OpenAI

# Point the official OpenAI client at OpenRouter instead of api.openai.com.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[
        {"role": "user", "content": "Explain monads in plain language"}
    ],
)

print(response.choices[0].message.content)

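Best of all, you can switch freely between Gemma 4, Claude, GPT, and Llama just by changing the model string. Want the model to call external tools and APIs? See the function calling tutorial.

Streaming Output

Nobody wants to wait 30 seconds and then get a wall of text all at once. Streaming displays tokens as they are generated, which makes for a much better user experience.

Ollama Streaming (Python)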

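import requests
import json

# stream=True in the JSON body asks Ollama to stream;
# stream=True on requests.post keeps the HTTP response open for iteration.
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Write a short story about debugging at 3 a.m.",
    "stream": True
}, stream=True)

# Each non-empty line is one JSON object holding the next text fragment.
for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)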
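The loop above prints fragments as they arrive. If the full text is also needed afterwards, or the stream should stop cleanly, the parsing can be pulled into a small generator. The sketch below assumes that each line is a JSON object, that the last one has `"done": true`, and that a failure shows up as an object with an `"error"` field; `iter_ollama_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_ollama_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream."""
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if "error" in chunk:
            # Assumption: errors arrive as {"error": "..."} objects.
            raise RuntimeError(chunk["error"])
        text = chunk.get("response", "")
        if text:
            yield text
        if chunk.get("done"):
            break  # the final object marks the end of the stream
```

It can be fed `response.iter_lines()` from the example above, for instance `full_text = "".join(iter_ollama_text(response.iter_lines()))`.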

Google AI Studio Streaming (Python)

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

response = model.generate_content(
    "Write a short story about debugging at 3 a.m.",
    stream=True
)

# Iterating the response yields chunks as they are generated.
for chunk in response:
    print(chunk.text, end="", flush=True)

OpenRouter Streaming (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

stream = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[{"role": "user", "content": "Write a short story about debugging at 3 a.m."}],
    stream=True,
)

for chunk in stream:
    # Skip chunks that carry no choices, so indexing [0] cannot fail.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

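Comparison of the Three Methods

Feature	Ollama (local)	Google AI Studio	OpenRouter
Cost	Free	Free tier (15 RPM)	Pay per token
Speed	Depends on hardware	Fast (Google TPU)	Fast
Privacy	Fully private (offline)	Data sent to Google	Data sent to the provider
Rate limits	None	15 RPM / 1,500 RPD	Depends on balance
Setup	Install Ollama + pull the model	Request an API key	Sign up + add credits
OpenAI compatibility	Partial	No (own SDK)	Full
Best for	Privacy-sensitive development, learning	Free prototyping	Production, switching between models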

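My recommendations:

Personal project? Start with the Google AI Studio free tier. It is fast and costs nothing.
Care about privacy? Run Ollama locally so your data never leaves the machine.
Going to production? OpenRouter is the most flexible and lets you fall back to other models.
Just learning? Ollama is the least hassle: no sign-up, no key, just write code.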
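The fallback mentioned in the production recommendation can also be done on the client side by trying a list of models in order. The following is a small sketch of that idea, not OpenRouter's own fallback mechanism. `complete_with_fallback` and `call_model` are hypothetical names, and any model ID other than the Gemma one used in this guide would be a placeholder to fill in.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(prompt: str,
                           models: Sequence[str],
                           call_model: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose in this sketch; narrow it in real code
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

With the OpenAI SDK client from Method 3, `call_model` could wrap `client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])` and return `response.choices[0].message.content`. Because only the model string changes between attempts, the same wrapper works for every model in the list.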

Common Pitfalls

Ollama reports "Connection refused": Make sure the Ollama service is running. On Mac, check for the Ollama icon in the menu bar; on Linux, run ollama serve first.
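A script can check for this condition up front instead of failing on the first real request. Below is a minimal sketch using only the standard library. It assumes the default address used throughout this guide and probes the `/api/tags` endpoint, which lists locally installed models; `ollama_is_up` is a hypothetical helper name.

```python
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout_s: float = 2.0) -> bool:
    """Return True if a server answers GET base_url/api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError subclasses OSError).
        return False
```

Calling `ollama_is_up()` before the first request makes it possible to print a hint such as "start Ollama first" rather than a raw connection traceback.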

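Google AI Studio reports "Model not found": The model name may have changed. Check the AI Studio documentation for the current model ID.

Ollama responses are very slow: You are most likely running inference on the CPU. It works, it is just slow. See our hardware guide for what speeds to expect.

Requests time out: Complex prompts take longer to generate, so remember to raise the timeout on your HTTP client. The 31B model can take a while on complex tasks.

Next Steps

Haven't installed Ollama yet? See the complete Ollama installation guide
Want to send images through the API? See the Gemma 4 multimodal tutorial
Need better prompts? Browse the 50 best Gemma 4 prompts
Not sure which model to use? See the Gemma 4 model selection guide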
