How to Add Streaming Responses with Gemini API Step by Step
The Gemini API has rapidly gained attention among developers for its solid language model capabilities and flexible integration options. One of the most exciting features the Gemini API offers is streaming responses. Instead of waiting for the entire response to arrive, streaming allows you to receive tokens or partial content incrementally, dramatically improving user experience, especially in interactive applications like chatbots or real-time assistants.
In this detailed guide, we will walk you through how to implement Gemini API streaming responses step by step, complete with practical code examples, a comparison table highlighting streaming versus non-streaming, and tips for optimizing your implementation.
Understanding Streaming Responses in Gemini API
Traditional API calls to language models follow a request-response pattern: you send a prompt, and you wait for the full completion before you can display or process it. Streaming changes this by breaking the response into smaller chunks delivered sequentially. This is analogous to how video streaming delivers parts of a video as you watch rather than downloading the entire video beforehand.
Benefits of streaming responses include:
- Lower Latency: Start processing or displaying tokens instantly.
- Improved User Experience: Users see output generating in real time.
- Better Resource Management: Your app can react dynamically, potentially cancel early, and handle tokens as they arrive.
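The difference is easiest to see with a toy sketch, with no real API involved: a generator simulates tokens arriving one at a time, and the consumer acts on each token the moment it appears instead of waiting for the whole string.

```javascript
// Purely illustrative: simulate tokens arriving incrementally,
// the way a streaming endpoint would deliver them over the network.
function* fakeTokenStream() {
  const tokens = ['The ', 'ocean ', 'is ', 'vast.'];
  for (const token of tokens) {
    yield token; // in a real stream, each chunk arrives asynchronously
  }
}

// Streaming consumer: handle every token as soon as it appears.
function consumeStream(stream, onToken) {
  let full = '';
  for (const token of stream) {
    onToken(token); // e.g. append to the UI immediately
    full += token;
  }
  return full;
}

consumeStream(fakeTokenStream(), (t) => process.stdout.write(t));
```

A non-streaming call is equivalent to receiving only the final `full` string, after the last token has been generated.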
Step 1: Setting Up Your Gemini API Environment
Before we explore streaming, ensure you have access to the Gemini API with appropriate credentials, and your development environment is set up with necessary libraries for making HTTPS requests and handling streams.
For demonstration, we will use Node.js with the popular axios HTTP client, which supports streaming responses. However, the concepts apply similarly in Python, Go, or other languages.
Prerequisites
- Node.js installed (v14+ recommended)
- An API key for Gemini API
- Install axios with
npm install axios
Step 2: Making a Standard Completion Request (Non-Streaming)
First, let’s consider a simple example where you send a prompt and wait for the full response:
const axios = require('axios');

async function getCompletion() {
  const API_KEY = 'YOUR_GEMINI_API_KEY';
  const url = 'https://api.gemini.com/v1/completions';
  const data = {
    model: 'gemini-1',
    prompt: 'Write a poem about the ocean',
    max_tokens: 100
  };

  const response = await axios.post(url, data, {
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    }
  });

  console.log('Completion:', response.data.choices[0].text);
}

getCompletion();
This will return the entire completion only after the model finishes generating it. While straightforward, this can cause noticeable delay in applications requiring real-time responsiveness.
Step 3: Enabling Streaming Responses with Gemini API
The Gemini API supports a streaming mode through its Completions endpoint. To enable streaming, you need to set a specific request parameter and handle the HTTP response as a stream rather than waiting for the full body.
Key points to enable streaming:
- Set stream: true in your request payload.
- Use an HTTP request method that supports handling streamed chunks.
- Listen for data events on the response stream.
Example: Streaming with Axios and Node.js
const axios = require('axios');

async function streamCompletion() {
  const API_KEY = 'YOUR_GEMINI_API_KEY';
  const url = 'https://api.gemini.com/v1/completions';
  const data = {
    model: 'gemini-1',
    prompt: 'Write a story about a brave knight',
    max_tokens: 150,
    stream: true // Enable streaming responses
  };

  const response = await axios({
    method: 'post',
    url: url,
    data: data,
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    responseType: 'stream'
  });

  response.data.on('data', (chunk) => {
    // Each chunk is a Buffer; events are separated by blank lines
    const payloads = chunk.toString().split('\n\n');
    for (const payload of payloads) {
      if (payload.includes('[DONE]')) return; // End of stream
      if (payload.trim() === '') continue;
      try {
        // Strip the optional SSE "data: " prefix before parsing
        const parsed = JSON.parse(payload.replace(/^data: /, ''));
        const token = parsed.choices[0].delta?.content;
        if (token) {
          process.stdout.write(token);
        }
      } catch (err) {
        // Handle parse errors if any
        console.error('Error parsing chunk:', err);
      }
    }
  });

  response.data.on('end', () => {
    console.log('\n\n[Stream ended]');
  });
}

streamCompletion();
In this example, the tokens arrive as chunks encoded in JSON, and your code parses and immediately outputs each token.
Step 4: Parsing Streaming Data Format
Gemini API’s streaming response format typically follows a Server-Sent Events (SSE) style or chunked JSON payloads where each chunk contains updates about the new tokens generated.
A typical chunk looks like:
{
  "id": "completion-123",
  "object": "text_completion",
  "created": 1688749214,
  "model": "gemini-1",
  "choices": [
    {
      "delta": {
        "content": "Hello"
      },
      "index": 0,
      "finish_reason": null
    }
  ]
}
The delta.content field holds the new piece of text for this chunk. Your code should accumulate or stream this content to your application interface.
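Assuming the SSE-style layout described above (events separated by blank lines, each optionally prefixed with "data: ", ending with a [DONE] sentinel), the extraction logic can be factored into one small, pure helper that you can unit-test without any network calls:

```javascript
// Extract token strings from a raw SSE-style chunk. Assumes the
// payload layout shown above: events separated by blank lines, each
// optionally prefixed with "data: ", with a "[DONE]" end sentinel.
function extractTokens(rawChunk) {
  const tokens = [];
  for (const payload of rawChunk.split('\n\n')) {
    const line = payload.replace(/^data: /, '').trim();
    if (line === '' || line === '[DONE]') continue;
    try {
      const parsed = JSON.parse(line);
      const content = parsed.choices?.[0]?.delta?.content;
      if (content) tokens.push(content);
    } catch (err) {
      // Ignore partial JSON; a production parser would buffer an
      // incomplete event until the rest arrives in the next chunk.
    }
  }
  return tokens;
}
```

Keeping parsing separate from transport also makes it easy to swap in a different HTTP client later.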
Step 5: Handling End of Stream and Errors
When the stream finishes, the server sends a special token or message such as [DONE], indicating no more content will be sent. Your stream handler should listen for this token and close the connection gracefully.
Also, be prepared to handle intermittent network errors or parse exceptions. Implement retry logic or user-friendly error displays if streaming data is interrupted.
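One common retry policy is exponential backoff with a cap. The sketch below is illustrative: the base delay and cap are assumptions, not values mandated by the Gemini API, and streamOnce stands in for whatever function performs one streaming request in your app.

```javascript
// Exponential backoff with a cap: wait 500ms, 1s, 2s, ... up to 10s.
// The base delay and cap are illustrative defaults.
function backoffDelayMs(attempt, baseMs = 500, capMs = 10000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry loop around a streaming call. `streamOnce` is a hypothetical
// function that performs one streaming request and rejects on failure.
async function streamWithRetries(streamOnce, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await streamOnce();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // give up
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Note that retrying a partially delivered completion may replay tokens you already displayed, so decide whether to clear the partial output or resume from a fresh prompt.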
Comparison Table: Streaming vs Non-Streaming Responses in Gemini API
| Feature | Non-Streaming Response | Streaming Response |
|---|---|---|
| Response Delivery | Batch delivery after full generation | Incremental delivery of tokens/chunks as generated |
| Latency | Higher latency, wait for whole response | Lower latency, partial output available quickly |
| User Experience | Delayed, static display | Dynamic, real-time output |
| Complexity of Implementation | Simple to implement | Moderate complexity due to streaming handling |
| Error Handling | Easier, single response | More thorough, handle stream interruptions |
| Use Cases | Batch processing, non-real-time tasks | Chatbots, interactive assistants, live data generation |
Practical Tips for Implementing Gemini API Streaming Responses
1. Buffer Tokens Appropriately
Depending on your UI or backend needs, you might want to collect tokens and output them in batches (e.g., per word or sentence) instead of raw token-by-token to avoid janky or overwhelming updates.
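As a minimal sketch of this idea, the buffer below flushes at word boundaries so the UI updates per word rather than per token. The boundary test is an assumption you can adjust (per sentence, per N characters, per animation frame):

```javascript
// Buffer incoming tokens and flush only at word boundaries, so a
// consumer receives whole words instead of raw token fragments.
class TokenBuffer {
  constructor(onFlush) {
    this.pending = '';
    this.onFlush = onFlush;
  }
  push(token) {
    this.pending += token;
    const lastSpace = this.pending.lastIndexOf(' ');
    if (lastSpace >= 0) {
      // Flush everything up to and including the last space
      this.onFlush(this.pending.slice(0, lastSpace + 1));
      this.pending = this.pending.slice(lastSpace + 1);
    }
  }
  end() {
    // Flush any trailing partial word when the stream closes
    if (this.pending) this.onFlush(this.pending);
    this.pending = '';
  }
}
```

Wire push into your data handler and end into the stream's end event.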
2. Implement Backpressure Handling
If your front-end or other systems cannot handle rapid token bursts, implement backpressure or throttling mechanisms to regulate flow and avoid overwhelming users or system resources.
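A very simple form of throttling is to group a burst of tokens into fixed-size batches before forwarding them. This sketch only regulates the consumer side; true backpressure would also pause the source stream (in Node, response.data.pause() and resume()):

```javascript
// Group a burst of tokens into batches of at most `size` tokens,
// so downstream consumers receive fewer, larger updates.
function batchTokens(tokens, size) {
  const batches = [];
  for (let i = 0; i < tokens.length; i += size) {
    batches.push(tokens.slice(i, i + size).join(''));
  }
  return batches;
}
```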
3. Use Abort Signals or Cancel Tokens
Streaming allows early termination if a user cancels an operation. Integrate abort signals into your HTTP requests to stop streaming and free resources immediately.
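In Node, the standard mechanism is AbortController; axios (v0.22+) accepts the controller's signal via its signal request option. The wiring below is a sketch using the same request shape as the earlier examples:

```javascript
// Cancel an in-flight streaming request with AbortController.
const controller = new AbortController();

// Hypothetical wiring: pass the signal alongside the streaming request.
// axios({ method: 'post', url, data, responseType: 'stream',
//         signal: controller.signal });

// Later, e.g. when the user clicks "Stop", abort the request.
// axios will reject the pending promise and close the stream.
controller.abort();
```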
4. Detailed Logging and Monitoring
Streaming is stateful and more complex, so add detailed logs to monitor data flow, errors, and stream completions, aiding debugging and operational insights.
5. Security Considerations
Always secure your API key and do not expose it publicly. For frontend streaming scenarios, proxy streaming through backend to avoid key exposure.
Real-World Example: Creating a Live Chatbot Interface Using Gemini Streaming
Imagine a chat window where user messages are sent to Gemini API and responses appear token by token:
const readline = require('readline');
const axios = require('axios');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat() {
  const API_KEY = 'YOUR_GEMINI_API_KEY';

  rl.question('You: ', async (prompt) => {
    console.log('Gemini:');
    const url = 'https://api.gemini.com/v1/completions';
    const data = {
      model: 'gemini-1',
      prompt,
      max_tokens: 200,
      stream: true
    };

    try {
      const response = await axios({
        method: 'post',
        url: url,
        data: data,
        headers: {
          'Authorization': `Bearer ${API_KEY}`,
          'Content-Type': 'application/json'
        },
        responseType: 'stream'
      });

      response.data.on('data', (chunk) => {
        // Events are separated by blank lines
        const lines = chunk.toString().split('\n\n');
        for (const line of lines) {
          if (line.trim() === '') continue;
          if (line.includes('[DONE]')) {
            rl.close();
            return;
          }
          try {
            // Strip the optional SSE "data: " prefix before parsing
            const parsed = JSON.parse(line.replace(/^data: /, ''));
            const content = parsed.choices[0].delta?.content;
            if (content) {
              process.stdout.write(content);
            }
          } catch (e) {
            // Ignore malformed or partial JSON chunks
          }
        }
      });

      response.data.on('end', () => {
        console.log('\n[End of response]');
        rl.close();
      });

      response.data.on('error', (err) => {
        console.error('Stream error:', err.message);
        rl.close();
      });
    } catch (err) {
      console.error('Request failed:', err.message);
      rl.close();
    }
  });
}

chat();
This script lets users type messages and see Gemini’s streaming responses live on the terminal.
Summary
Integrating Gemini API streaming responses can drastically improve the interactivity and responsiveness of your AI-powered apps. By enabling streaming, handling incremental data, and managing edge cases like errors and stream termination, you can build interfaces that feel smoother and more dynamic.
Remember the key steps:
- Set the stream: true parameter in your request payload
- Make a request that supports streaming (handle the response as a stream)
- Parse incremental data chunks, extracting tokens from JSON payloads
- Update your application UI or backend consumer progressively
- Handle stream completion and errors gracefully
With the sample code and best practices shared in this article, you are well-equipped to begin adding streaming functionality to your Gemini API projects. Happy coding!
Originally published: March 19, 2026