Ganesh Joshi

Streaming AI responses in web apps

February 18, 2026 · 5 min read
Tutorials
Streaming data and real-time output on screen

AI responses take time to generate. Waiting for a complete response creates a poor user experience—users stare at a spinner for seconds. Streaming shows output token by token as it's generated, making responses feel instant and engaging.

Why stream?

Approach          | User experience
------------------|-----------------------------------------------------
Wait for complete | Spinner for 2-5 seconds, then the full response
Stream            | First word appears in ~200 ms, text flows naturally

Streaming reduces perceived latency dramatically. Users start reading immediately.

How streaming works

AI APIs generate tokens sequentially. With streaming:

  1. Client sends request with stream: true
  2. Server starts generating and sends tokens as they're ready
  3. Each token arrives as a chunk in the response
  4. Client appends chunks to display text
Client Request → Server → AI API
                    ↓
              Token 1 → Client (display)
              Token 2 → Client (append)
              Token 3 → Client (append)
              ...
              [DONE] → Client (complete)

Basic streaming with fetch

async function streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();

  if (!reader) throw new Error('No reader');

  let result = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    const chunk = decoder.decode(value, { stream: true });
    result += chunk;

    // Update UI with each chunk
    updateDisplay(result);
  }

  return result;
}
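Calling updateDisplay on every chunk triggers a re-render per token, which can get expensive on fast streams. One option is to coalesce chunks and flush them on a timer or requestAnimationFrame. A minimal sketch — ChunkBuffer is a hypothetical helper, not part of any library:

```typescript
// Hypothetical helper: accumulate chunks, flush them in batches.
// push() is cheap enough to call per token; flush() runs on a timer
// (or requestAnimationFrame) and returns everything gathered since
// the last flush.
class ChunkBuffer {
  private pending = '';

  push(chunk: string): void {
    this.pending += chunk;
  }

  // Returns the accumulated text and clears the buffer.
  flush(): string {
    const out = this.pending;
    this.pending = '';
    return out;
  }
}
```

Inside the read loop, call push(chunk); in a setInterval or requestAnimationFrame callback, append flush() to the displayed text.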

Next.js Route Handler

Create an API route that proxies and streams AI responses:

// app/api/chat/route.ts
import { OpenAI } from 'openai';

const openai = new OpenAI();

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  // Wrap the token stream in a ReadableStream of encoded text
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of response) {
        const content = chunk.choices[0]?.delta?.content || '';
        controller.enqueue(encoder.encode(content));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
    },
  });
}
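The route above assumes the OpenAI call succeeds. If it throws (invalid key, rate limit), returning a JSON error body lets the client surface a message instead of a dead stream. A minimal sketch — errorResponse is a hypothetical helper, not part of Next.js:

```typescript
// Hypothetical helper: turn an upstream failure into a JSON error
// Response whose shape matches what the client-side error handling
// in this post expects ({ message }).
function errorResponse(message: string, status = 500): Response {
  return new Response(JSON.stringify({ message }), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}
```

In the route, wrap the openai call in try/catch and `return errorResponse('Upstream error', 502)` on failure.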

React component for streaming

'use client';

import { useState, useRef } from 'react';

export function Chat() {
  const [input, setInput] = useState('');
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const abortControllerRef = useRef<AbortController | null>(null);

  async function handleSubmit(e: React.FormEvent) {
    e.preventDefault();
    if (!input.trim() || isStreaming) return;

    setResponse('');
    setIsStreaming(true);
    abortControllerRef.current = new AbortController();

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt: input }),
        signal: abortControllerRef.current.signal,
      });

      const reader = res.body?.getReader();
      const decoder = new TextDecoder();

      if (!reader) throw new Error('No reader');

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value, { stream: true });
        setResponse((prev) => prev + chunk);
      }
    } catch (error) {
      if (error instanceof Error && error.name === 'AbortError') {
        console.log('Request cancelled');
      } else {
        console.error('Stream error:', error);
      }
    } finally {
      setIsStreaming(false);
    }
  }

  function handleCancel() {
    abortControllerRef.current?.abort();
  }

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask something..."
        />
        <button type="submit" disabled={isStreaming}>
          {isStreaming ? 'Generating...' : 'Send'}
        </button>
        {isStreaming && (
          <button type="button" onClick={handleCancel}>
            Cancel
          </button>
        )}
      </form>
      <div className="whitespace-pre-wrap">{response}</div>
    </div>
  );
}

Vercel AI SDK

The Vercel AI SDK simplifies streaming with built-in hooks:

npm install ai @ai-sdk/openai
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(request: Request) {
  const { messages } = await request.json();

  const result = streamText({
    model: openai('gpt-4'),
    messages,
  });

  return result.toDataStreamResponse();
}
// components/Chat.tsx
'use client';

import { useChat } from 'ai/react';

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Say something..."
        />
        <button type="submit" disabled={isLoading}>
          Send
        </button>
      </form>
    </div>
  );
}

The SDK handles:

  • Stream parsing
  • Message state management
  • Abort handling
  • Error recovery

Server-Sent Events format

Many AI APIs use SSE format:

data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]
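Producing this wire format is straightforward, which is handy when mocking an AI API in tests. A sketch — toSSE is a hypothetical helper, assuming the OpenAI-style delta shape shown above:

```typescript
// Hypothetical helper: encode content chunks into the SSE wire format
// above — one "data:" line per chunk, blank-line separated, terminated
// by a [DONE] sentinel.
function toSSE(chunks: string[]): string {
  const events = chunks.map(
    (content) =>
      `data: ${JSON.stringify({ choices: [{ delta: { content } }] })}\n\n`
  );
  return events.join('') + 'data: [DONE]\n\n';
}
```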

Parse SSE in the stream:

// Async generator: yields content tokens as they arrive
async function* parseSSE(response: Response) {
  const reader = response.body?.getReader();
  if (!reader) throw new Error('Response has no body');

  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;

        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0]?.delta?.content;
          if (content) yield content;
        } catch {
          // Skip invalid JSON
        }
      }
    }
  }
}

Auto-scroll while streaming

Keep the latest content visible:

import { useEffect, useRef } from 'react';

function StreamingContent({ content }: { content: string }) {
  const containerRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    if (containerRef.current) {
      containerRef.current.scrollTop = containerRef.current.scrollHeight;
    }
  }, [content]);

  return (
    <div ref={containerRef} className="overflow-auto max-h-96">
      {content}
    </div>
  );
}
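Unconditional auto-scroll fights the user if they scroll up to re-read earlier output. A common refinement is to pin to the bottom only when the user is already near it. shouldAutoScroll is a hypothetical helper sketching that check:

```typescript
// Hypothetical helper: decide whether to stick to the bottom. Only
// auto-scroll if the user was already within `threshold` px of the
// bottom before the content update, so manual scrolling isn't fought.
function shouldAutoScroll(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  threshold = 40
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= threshold;
}
```

In the effect above, measure scrollTop, clientHeight, and scrollHeight before updating, and only set scrollTop when this returns true.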

Rendering markdown while streaming

For markdown content, wait for complete blocks:

import ReactMarkdown from 'react-markdown';

function StreamingMarkdown({ content }: { content: string }) {
  // Simple approach: re-render the markdown on every chunk.
  // Better: buffer incomplete code blocks to avoid flickering.
  return <ReactMarkdown>{content}</ReactMarkdown>;
}

For code blocks, consider buffering until the closing fence:

function bufferCodeBlocks(content: string): string {
  // An odd number of ``` fences means a code block is still open
  const fences = (content.match(/```/g) || []).length;

  if (fences % 2 === 1) {
    // Hide the incomplete block until its closing fence arrives
    return content.slice(0, content.lastIndexOf('```'));
  }

  return content;
}

Error handling

Handle network errors and API failures:

try {
  const response = await fetch('/api/chat', { ... });

  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.message || 'Request failed');
  }

  // Stream processing...
} catch (error) {
  if (error instanceof Error) {
    if (error.name === 'AbortError') {
      // User cancelled - not an error
      return;
    }
    setError(error.message);
  }
}

Rate limiting and retries

async function streamWithRetry(prompt: string, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await streamCompletion(prompt);
    } catch (error) {
      if (i === retries - 1) throw error;
      // Exponential backoff
      await new Promise((r) => setTimeout(r, 1000 * Math.pow(2, i)));
    }
  }
}
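Plain exponential backoff ignores the Retry-After header that many AI APIs send with 429 responses. A hedged refinement — retryDelayMs is a hypothetical helper, and it assumes Retry-After carries seconds rather than an HTTP date:

```typescript
// Hypothetical helper: compute the delay before retry `attempt`
// (0-based), preferring the server's Retry-After header when present
// and falling back to exponential backoff (1s, 2s, 4s, ...).
function retryDelayMs(
  attempt: number,
  retryAfterHeader?: string | null
): number {
  if (retryAfterHeader) {
    const seconds = Number(retryAfterHeader);
    if (!Number.isNaN(seconds)) return seconds * 1000;
  }
  return 1000 * Math.pow(2, attempt);
}
```

In the catch block above, read the header from the failed response when available and pass it in before sleeping.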

Summary

Streaming AI responses improves perceived performance by showing output as it's generated. Use fetch with ReadableStream for basic streaming. The Vercel AI SDK simplifies implementation with useChat and useCompletion hooks. Handle SSE format for most AI APIs. Implement auto-scroll, markdown rendering, and proper error handling for a polished experience.

Frequently Asked Questions

Why stream AI responses?

Streaming shows output as it's generated, making responses feel faster. Users see the first words in milliseconds instead of waiting seconds for the complete response.

Which AI APIs support streaming?

Most AI APIs (OpenAI, Anthropic, Google) support streaming via Server-Sent Events. Set stream: true in your request and read the response as a ReadableStream.

How do I stream AI responses in Next.js?

Create a Route Handler that calls the AI API with streaming enabled. Return the stream directly or transform it. On the client, use fetch with getReader() to read chunks.

What is the Vercel AI SDK?

The Vercel AI SDK simplifies streaming AI responses in Next.js. It provides hooks like useChat and useCompletion that handle streaming, state management, and UI updates automatically.

How do I handle errors during streaming?

Wrap the stream reading in try-catch. Handle connection drops by showing an error message and retry button. Parse error events from the AI API stream format.
