Streaming responses let users see AI replies in real time as they are generated, rather than waiting for the complete response. This makes them ideal for chat applications and interactive UIs.

Python Streaming with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://gateway.iotex.ai/v1"
)

def stream_with_callback(prompt, on_content=None):
    """Stream response handling with callback function support"""
    stream = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    full_content = ""
    for chunk in stream:
        # Guard: some providers emit chunks with an empty choices list
        if chunk.choices and chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            full_content += content

            if on_content:
                on_content(content)

    return full_content

# Usage example
def print_content(content):
    print(content, end="", flush=True)

result = stream_with_callback("Write a Python quicksort algorithm", print_content)

JavaScript Streaming with OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "your-api-key",
    baseURL: "https://gateway.iotex.ai/v1"
});

async function streamChat(prompt) {
    const stream = await client.chat.completions.create({
        model: "gemini-2.5-flash",
        messages: [{ role: "user", content: prompt }],
        stream: true,
    });

    for await (const chunk of stream) {
        // Optional chaining guards against chunks missing choices or delta
        const content = chunk.choices[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
        }
    }
}

// Top-level await requires an ES module ("type": "module" in package.json)
await streamChat("Explain what machine learning is");

For a browser-based streaming example using the Fetch API, see JavaScript/Node.js Examples.

Benefits of Streaming

  • Faster perceived response time — users see content immediately as it’s generated
  • Better user experience — reduces waiting time and provides real-time feedback
  • Interruptible — you can stop generation early if needed (see the sketch after this list)
  • Ideal for chat applications — perfect for conversational interfaces
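
A minimal sketch of that early-stop behavior, reusing the client from the Python example above. The max_chars budget is an arbitrary illustrative parameter, and stream.close() assumes a recent openai-python release where the returned stream object exposes it; verify against your SDK version.

def stream_with_limit(prompt, max_chars=200):
    """Stop reading the stream early once a character budget is reached."""
    stream = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    collected = ""
    for chunk in stream:
        if not chunk.choices or chunk.choices[0].delta.content is None:
            continue
        collected += chunk.choices[0].delta.content
        if len(collected) >= max_chars:
            # Close the underlying connection so the server stops sending
            stream.close()
            break
    return collected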

Implementation Tips

  1. Error handling: Wrap streaming code in try/catch (try/except in Python) to handle network interruptions; the sketch after this list combines this with tip 3
  2. Cancellation: Implement the ability to cancel ongoing streams (e.g., using AbortController in JavaScript, or by closing the stream object in Python as shown above)
  3. Buffering: For UI rendering, consider batching small chunks to avoid excessive DOM updates
  4. Memory: For very long responses, process chunks as they arrive instead of accumulating the full string
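
The sketch below combines tips 1 and 3, again reusing the client from the first Python example. The render callback and the flush_every interval are hypothetical names chosen for illustration; substitute whatever update mechanism your UI uses.

import time

def stream_buffered(prompt, render, flush_every=0.1):
    """Stream with error handling and simple time-based chunk batching."""
    buffer = ""
    last_flush = time.monotonic()
    try:
        stream = client.chat.completions.create(
            model="gemini-2.5-flash",
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for chunk in stream:
            if not chunk.choices or chunk.choices[0].delta.content is None:
                continue
            buffer += chunk.choices[0].delta.content
            # Batch small chunks: flush at most once per flush_every seconds
            if time.monotonic() - last_flush >= flush_every:
                render(buffer)
                buffer = ""
                last_flush = time.monotonic()
    except Exception as exc:
        # Network interruptions and API errors surface here mid-stream
        print(f"\n[stream interrupted: {exc}]")
    finally:
        if buffer:
            render(buffer)  # flush whatever remains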