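How to Use the Gemma 4 API (Python, cURL, and JavaScript)

You've already played with Gemma 4 in a chat window, and now you want to build something with it. Good: that's where the fun is. There are three main ways to call Gemma 4 through an API, and each suits a different situation.

Let's walk through all three, with real code you can copy and run.

Method 1: Ollama Local API (Free, Private, Unlimited)

If you've already set up Ollama locally, you have an API server running at localhost:11434 right now. No API key, no rate limits, completely free.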

Python (requests)

import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Explain async/await in Python like I'm 10",
    "stream": False
})

print(response.json()["response"])

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Explain async/await in Python like I am 10",
  "stream": false
}'

JavaScript (fetch)

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4",
    prompt: "Explain async/await in Python like I'm 10",
    stream: false,
  }),
});

const data = await response.json();
console.log(data.response);

Ollama Chat API (Multi-Turn Conversations)

For a conversation with message history, use the chat endpoint:

import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4",
    "messages": [
        {"role": "system", "content": "You are a helpful coding tutor."},
        {"role": "user", "content": "What's the difference between a list and a tuple?"}
    ],
    "stream": False
})

print(response.json()["message"]["content"])
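The chat endpoint does not remember earlier turns for you, so a multi-turn conversation means sending the whole message list each time. Below is a minimal sketch of that loop. The helper names send_chat and ask are hypothetical (they are not part of Ollama), and the 300-second timeout is an arbitrary assumption.

```python
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def send_chat(messages, model="gemma4"):
    """POST the full message history to Ollama and return the reply text."""
    import requests  # third-party; imported here so ask() can be tried with a stub

    response = requests.post(
        OLLAMA_CHAT_URL,
        json={"model": model, "messages": messages, "stream": False},
        timeout=300,  # assumed value; tune for your hardware
    )
    response.raise_for_status()
    return response.json()["message"]["content"]


def ask(history, user_text, send=send_chat):
    """Append a user turn, fetch the reply, and record it for the next turn."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Start with history = [{"role": "system", "content": "You are a helpful coding tutor."}] and call ask(history, "...") once per turn; each call sees every earlier question and answer.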

Pros: Zero cost, full privacy, no rate limits, works offline.
Cons: Speed depends on your hardware. Without a GPU it will be slow.

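Method 2: Google AI Studio API (Free Tier Available)

Google offers Gemma 4 through the AI Studio API. The free tier is generous, and responses are fast because the model runs on Google's TPU infrastructure.

Getting an API Key

Go to aistudio.google.com
Click "Get API Key" in the top-right corner
Create a key (it takes about 10 seconds)

For a detailed walkthrough, see our Google AI Studio guide.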

Python (google-generativeai SDK)

pip install google-generativeai

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content("Write a Python decorator for retry logic")

print(response.text)
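The example above hard-codes the key for brevity. If the script will be shared or committed, one common alternative is to read the key from an environment variable. This is only a sketch: the variable name GOOGLE_API_KEY and the helper load_api_key are assumptions made for illustration, not names the SDK requires.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage with the SDK call shown above:
#   genai.configure(api_key=load_api_key())
```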

cURL

curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-27b-it:generateContent?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Write a Python decorator for retry logic"}]
    }]
  }'

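Free Tier Limits

Without spending anything, you get:

15 requests per minute (RPM)
1,500 requests per day (RPD)
1 million tokens per minute

For development and small projects this is quite generous. You'll only hit the limits once you're building something with real user traffic.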
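If a script sends requests in a loop, it can stay under the per-minute limit by pacing itself on the client side. The sketch below is one possible approach, a rolling-window limiter. The RateLimiter class is hypothetical, it only covers the 15 RPM figure (not the daily or token limits), and the clock and sleep parameters exist only so it can be tried without real waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Create one limiter = RateLimiter() and call limiter.wait() immediately before each API request.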

Error Handling

The API returns specific error codes, and you should handle them:

import google.generativeai as genai
from google.api_core import exceptions

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

try:
    response = model.generate_content("Your prompt here")
    print(response.text)
except exceptions.ResourceExhausted:
    print("Rate limit hit. Wait a minute and try again.")
except exceptions.InvalidArgument as e:
    print(f"Bad request: {e}")
except exceptions.NotFound:
    print("Model not found. Check the model name.")
except Exception as e:
    print(f"Unexpected error: {e}")
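Printing "wait a minute and try again" can be automated. Here is a minimal sketch of retrying with exponential backoff. The helper call_with_retry is hypothetical, the attempt count and delays are assumptions, and the exception type is passed in so the same function works with exceptions.ResourceExhausted from the example above or with any other error class.

```python
import random
import time


def call_with_retry(fn, retry_on, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on a retry_on exception, back off exponentially and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the error
            # 2s, 4s, 8s, ... plus up to 1s of jitter so clients don't retry in lockstep
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)


# Usage with the SDK objects shown above:
#   response = call_with_retry(
#       lambda: model.generate_content("Your prompt here"),
#       exceptions.ResourceExhausted,
#   )
```

Only rate-limit errors are worth retrying this way; a bad request or a wrong model name will fail the same way every time.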

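Method 3: OpenRouter API (OpenAI-Compatible)

If you want to switch between models easily, OpenRouter is a great option. It uses the same format as the OpenAI API, so if you've already built something with GPT, switching to Gemma 4 mostly means pointing your client at OpenRouter and changing the model string.

Getting an API Key

Go to openrouter.ai
Sign up and add credits ($5 minimum)
Generate an API key from the dashboard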

Python

import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/gemma-4-27b-it",
        "messages": [
            {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
        ],
    },
)

print(response.json()["choices"][0]["message"]["content"])

cURL

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_OPENROUTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-27b-it",
    "messages": [
      {"role": "user", "content": "Compare React and Vue in 5 bullet points"}
    ]
  }'

Using the OpenAI Python SDK

Because OpenRouter is OpenAI-compatible, you can use the official OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[
        {"role": "user", "content": "Explain monads in plain English"}
    ],
)

print(response.choices[0].message.content)

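This is especially handy because you can switch between Gemma 4, Claude, GPT, Llama, and others just by changing the model string. Want the model to call external tools and APIs? See our function calling guide.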
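To illustrate the model-string point, here is a small sketch that sends one prompt to several models and collects the answers. The function compare_models is hypothetical; it expects a client built as in the OpenAI SDK example above, and any model ID other than google/gemma-4-27b-it is a placeholder you would replace with a real ID from OpenRouter's model list.

```python
def compare_models(client, model_ids, prompt):
    """Send the same prompt to each model ID and return {model_id: reply_text}."""
    replies = {}
    for model_id in model_ids:
        response = client.chat.completions.create(
            model=model_id,  # the only thing that changes between calls
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model_id] = response.choices[0].message.content
    return replies


# Usage (client is the OpenAI client configured for OpenRouter):
#   answers = compare_models(client, ["google/gemma-4-27b-it"], "Explain monads in plain English")
```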

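Streaming Responses

Nobody wants to wait 30 seconds and then get a wall of text. Streaming displays tokens as they're generated, which makes for a much better experience. Here's how to do it with each method.

Ollama Streaming (Python)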

import requests
import json

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma4",
    "prompt": "Write a short story about a debugging session at 3am",
    "stream": True
}, stream=True)

for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
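Ollama streams newline-delimited JSON, one object per line, and the last object carries "done": true. If you need the full text as well as the live output, the fragments can be joined as they arrive. A minimal sketch, where collect_stream is a hypothetical helper that accepts any iterable of lines, such as response.iter_lines():

```python
import json


def collect_stream(lines):
    """Join the "response" fragments from a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip empty lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk is marked "done": true
    return "".join(parts)


# Usage with the streaming request shown above:
#   full_text = collect_stream(response.iter_lines())
```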

Google AI Studio Streaming (Python)

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")

response = model.generate_content(
    "Write a short story about a debugging session at 3am",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

OpenRouter Streaming (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

stream = client.chat.completions.create(
    model="google/gemma-4-27b-it",
    messages=[{"role": "user", "content": "Write a short story about a debugging session at 3am"}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

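Comparison: Which API Should You Use?

Feature	Ollama (local)	Google AI Studio	OpenRouter
Cost	Free	Free tier (15 RPM)	Pay per token
Speed	Depends on hardware	Fast (Google TPU)	Fast
Privacy	Full (offline)	Data goes to Google	Data goes to the provider
Rate limits	None	15 RPM / 1,500 RPD	Depends on balance
Setup	Install Ollama + model	Get an API key	Sign up + add credits
OpenAI-compatible	Partial	No (own SDK)	Yes
Best for	Privacy, development	Free prototyping	Production, multi-model

My recommendations:

Personal project? Start with Google AI Studio's free tier. It's fast and free.
Privacy matters? Run Ollama locally. Your data stays on your machine.
Production app? OpenRouter gives you the most flexibility and the ability to switch to other models.
Just learning? Ollama. No API key, no limits, just write code.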

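Common Pitfalls

"Connection refused" from Ollama: Make sure the Ollama server is running. On Mac, check for the Ollama icon in the menu bar. On Linux, run ollama serve first.

"Model not found" from Google AI Studio: Model names change. Check the AI Studio documentation for the current model ID.

Slow responses from Ollama: You're probably running on CPU. That's fine; it works, just more slowly. See our hardware guide for expected performance.

Timeouts: For long generations, increase your HTTP client's timeout. Gemma 4's 31B model can take a while on complex prompts.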
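Two of these pitfalls, the refused connection and the timeout, can be handled in code. The sketch below first checks whether anything is listening on Ollama's port, then makes the request with explicit connect and read timeouts. The helper names and the 5-second and 300-second values are assumptions to adjust for your setup.

```python
import socket

OLLAMA_HOST = "localhost"
OLLAMA_PORT = 11434  # the port used throughout this guide


def server_is_listening(host=OLLAMA_HOST, port=OLLAMA_PORT, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def generate(prompt, model="gemma4", connect_timeout=5, read_timeout=300):
    """Call /api/generate with explicit timeouts instead of waiting forever."""
    import requests  # third-party; imported here so the port check runs without it

    if not server_is_listening():
        raise RuntimeError("Nothing is listening on localhost:11434. Is Ollama running?")
    response = requests.post(
        f"http://{OLLAMA_HOST}:{OLLAMA_PORT}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=(connect_timeout, read_timeout),  # (connect, read) in seconds
    )
    response.raise_for_status()
    return response.json()["response"]
```

The short connect timeout fails fast when the server is down, while the long read timeout leaves room for slow generations on CPU.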

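Next Steps

New to Ollama? Start with our complete Ollama setup guide
Want to send images through the API? See the Gemma 4 multimodal guide
Need better prompts? Browse our 50 best Gemma 4 prompts
Not sure which model size to pick? Read Gemma 4: Which Model Should You Use?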


