Streaming responses let users see AI replies in real time as they are generated, rather than waiting for the complete response. This makes them ideal for chat applications and interactive UIs.

Python Streaming with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://gateway.iotex.ai/v1"
)

def stream_with_callback(prompt, on_content=None):
    """Stream response handling with callback function support"""
    stream = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    full_content = ""
    for chunk in stream:
        # Guard: some providers emit chunks with an empty choices list
        if chunk.choices and chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            full_content += content

            if on_content:
                on_content(content)

    return full_content

# Usage example
def print_content(content):
    print(content, end="", flush=True)

result = stream_with_callback("Write a Python quicksort algorithm", print_content)

JavaScript Streaming with OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "your-api-key",
    baseURL: "https://gateway.iotex.ai/v1"
});

async function streamChat(prompt) {
    const stream = await client.chat.completions.create({
        model: "gemini-2.5-flash",
        messages: [{ role: "user", content: prompt }],
        stream: true,
    });

    for await (const chunk of stream) {
        // Optional chaining guards against chunks missing choices or delta
        const content = chunk.choices[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
        }
    }
}

// Top-level await requires an ES module ("type": "module" in package.json)
await streamChat("Explain what machine learning is");

For a browser-based streaming example using the Fetch API, see JavaScript/Node.js Examples.

Benefits of Streaming

  • Faster perceived response time — users see content immediately as it’s generated
  • Better user experience — reduces waiting time and provides real-time feedback
  • Interruptible — you can stop generation early if needed (see the sketch after this list)
  • Ideal for chat applications — perfect for conversational interfaces
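
A minimal sketch of that early-stop behavior, reusing the client from the Python example above. The max_chars budget is an arbitrary illustrative parameter, and stream.close() assumes a recent openai-python release where the returned stream object exposes it; verify against your SDK version.

def stream_with_limit(prompt, max_chars=200):
    """Stop reading the stream early once a character budget is reached."""
    stream = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    collected = ""
    for chunk in stream:
        if not chunk.choices or chunk.choices[0].delta.content is None:
            continue
        collected += chunk.choices[0].delta.content
        if len(collected) >= max_chars:
            # Close the underlying connection so the server stops sending
            stream.close()
            break
    return collected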

Implementation Tips

  1. Error handling: Wrap streaming code in try/catch (try/except in Python) to handle network interruptions; the sketch after this list combines this with tip 3
  2. Cancellation: Implement the ability to cancel ongoing streams (e.g., using AbortController in JavaScript, or by closing the stream object in Python as shown above)
  3. Buffering: For UI rendering, consider batching small chunks to avoid excessive DOM updates
  4. Memory: For very long responses, process chunks as they arrive instead of accumulating the full string
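
The sketch below combines tips 1 and 3, again reusing the client from the first Python example. The render callback and the flush_every interval are hypothetical names chosen for illustration; substitute whatever update mechanism your UI uses.

import time

def stream_buffered(prompt, render, flush_every=0.1):
    """Stream with error handling and simple time-based chunk batching."""
    buffer = ""
    last_flush = time.monotonic()
    try:
        stream = client.chat.completions.create(
            model="gemini-2.5-flash",
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for chunk in stream:
            if not chunk.choices or chunk.choices[0].delta.content is None:
                continue
            buffer += chunk.choices[0].delta.content
            # Batch small chunks: flush at most once per flush_every seconds
            if time.monotonic() - last_flush >= flush_every:
                render(buffer)
                buffer = ""
                last_flush = time.monotonic()
    except Exception as exc:
        # Network interruptions and API errors surface here mid-stream
        print(f"\n[stream interrupted: {exc}]")
    finally:
        if buffer:
            render(buffer)  # flush whatever remains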