Incremental streaming lets clients receive assistant output chunks before the final full agent.message event. Enable it when you create the Session:
{
  "incremental_streaming_enabled": true
}
The setting is stored on the Session. It is not controlled by a stream request query parameter or header.

Endpoints

The paths do not change:
| API | Incremental behavior |
| --- | --- |
| GET /api/v1/cloud/sessions/{session_id}/events/stream | Live Session SSE stream |
| GET /api/v1/cloud/sessions/{session_id}/threads/{thread_id}/stream | Live thread-scoped SSE stream |
| GET /api/v1/cloud/sessions/{session_id}/events | Historical Session event replay |
| GET /api/v1/cloud/sessions/{session_id}/threads/{thread_id}/events | Historical thread-scoped replay |
When incremental_streaming_enabled is false or omitted, these APIs keep the original behavior and hide incremental events.
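As a rough illustration of the table above, the four paths can be derived from a Session ID and an optional thread ID. The helper name `event_urls` is hypothetical and not part of any SDK; the base URL is the production base used later in this guide.

```python
from typing import Dict, Optional
from urllib.parse import quote

BASE_URL = "https://api.qoder.com/api/v1/cloud"  # production base from this guide


def event_urls(session_id: str, thread_id: Optional[str] = None,
               base_url: str = BASE_URL) -> Dict[str, str]:
    """Return the live-stream and replay URLs for a Session or one of its threads.

    Hypothetical helper: it only assembles the documented paths.
    """
    session = f"{base_url}/sessions/{quote(session_id, safe='')}"
    if thread_id is None:
        return {"stream": f"{session}/events/stream", "replay": f"{session}/events"}
    thread = f"{session}/threads/{quote(thread_id, safe='')}"
    return {"stream": f"{thread}/stream", "replay": f"{thread}/events"}
```

The same URLs serve both modes; only the Session setting decides whether incremental events appear in the responses.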

Event Model

Public events align with qodercli/Anthropic raw stream events. CAW emits raw stream event types such as message_start and content_block_start; CAS exposes them publicly by adding the agent. prefix to type and by adding CAS metadata such as id, session_id, session_thread_id, turn_id, message_id, parent_tool_use_id, and processed_at.
Top-level incremental event types:
agent.message_start
agent.content_block_start
agent.content_block_delta
agent.content_block_stop
agent.message_delta
agent.message_stop
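The mapping from raw to public events can be pictured with a small sketch. This is not CAS code: `to_public_event` and its `metadata` argument are hypothetical names, and the real service may order or validate fields differently.

```python
from typing import Any, Dict

# Raw stream event types that have a public agent.* counterpart
INCREMENTAL_RAW_TYPES = {
    "message_start", "content_block_start", "content_block_delta",
    "content_block_stop", "message_delta", "message_stop",
}


def to_public_event(raw: Dict[str, Any], metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Illustration only: prefix the raw type with 'agent.' and merge metadata."""
    raw_type = raw["type"]
    if raw_type not in INCREMENTAL_RAW_TYPES:
        raise ValueError(f"not a raw stream event type: {raw_type!r}")
    public = dict(raw)           # shallow copy so the raw event is left untouched
    public.update(metadata)      # e.g. session_id, turn_id, message_id
    public["type"] = f"agent.{raw_type}"
    return public
```

For example, a raw `content_block_delta` combined with `{"message_id": "asst_..."}` would come out as `agent.content_block_delta` with the same `index` and `delta`.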
In the current implementation, agent.content_block_start.content_block.type can be:
| content_block.type | Meaning |
| --- | --- |
| thinking | Thinking block with initial empty thinking |
| text | Text block with initial empty text |
| tool_use | Tool-use block with id, name, and initial empty input |
Chunk kinds are nested inside agent.content_block_delta.delta.type. They are not top-level event types:
| delta.type | Meaning |
| --- | --- |
| text_delta | Text output chunk |
| thinking_delta | Thinking chunk, when emitted by the model/provider |
| signature_delta | Signature chunk for thinking blocks, when available |
| input_json_delta | Tool input JSON chunk. The product-level tool_input_delta concept uses this wire shape, with partial_json |
| tool_output_delta | Reserved for future tool output streaming. The current implementation does not emit this delta; tool results still return as full agent.tool_result events |
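Because the chunk kind lives in `delta.type`, a client usually dispatches on that field rather than on the event type. A minimal sketch follows; `chunk_of` is a hypothetical helper, and the `signature` field name is an assumption taken from Anthropic's raw stream format, since this page does not spell it out.

```python
from typing import Any, Dict, Optional, Tuple

# delta.type -> name of the field that carries the chunk
DELTA_FIELDS = {
    "text_delta": "text",
    "thinking_delta": "thinking",
    "signature_delta": "signature",   # assumed field name (Anthropic wire format)
    "input_json_delta": "partial_json",
}


def chunk_of(event: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (delta_type, chunk) for a content_block_delta event, else None."""
    if event.get("type") != "agent.content_block_delta":
        return None
    delta = event.get("delta") or {}
    field = DELTA_FIELDS.get(delta.get("type"))
    if field is None:
        return None  # unknown or reserved kinds such as tool_output_delta
    return delta["type"], delta.get(field, "")
```

Returning `None` for unrecognized kinds keeps the client tolerant of delta types added later.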
Common event shapes:
{
  "type": "agent.message_start",
  "message_id": "asst_...",
  "message": {
    "id": "asst_...",
    "type": "message",
    "role": "assistant",
    "model": "qwen3-coder-plus",
    "content": [],
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {
      "input_tokens": 0,
      "output_tokens": 0
    }
  }
}
{
  "type": "agent.content_block_delta",
  "message_id": "asst_...",
  "index": 0,
  "delta": {
    "type": "text_delta",
    "text": "Hello"
  }
}
{
  "type": "agent.message_delta",
  "message_id": "asst_...",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null,
    "container": null
  },
  "usage": {
    "input_tokens": 123,
    "output_tokens": 45
  }
}
The full agent.message is still returned after the incremental sequence and remains the authoritative final result. For the same text block, concatenating the text_delta.text values in order should equal the final agent.message.content[index].text. If the provider omits a final suffix from stream chunks, CAW emits the remaining suffix as a final text_delta before agent.content_block_stop.
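The equality between streamed chunks and the final message can be checked client-side. The sketch below assumes the final agent.message event carries `message_id` and a top-level `content` list, which this page implies but does not show; `collect_text` and `mismatches` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def collect_text(events: Iterable[Dict[str, Any]]) -> Dict[Tuple[Any, int], str]:
    """Join text_delta chunks per (message_id, index), in arrival order."""
    parts = defaultdict(list)
    for ev in events:
        if ev.get("type") != "agent.content_block_delta":
            continue
        delta = ev.get("delta") or {}
        if delta.get("type") == "text_delta":
            parts[(ev.get("message_id"), ev["index"])].append(delta["text"])
    return {key: "".join(chunks) for key, chunks in parts.items()}


def mismatches(events: Iterable[Dict[str, Any]], final_message: Dict[str, Any]) -> List[int]:
    """Return indexes of text blocks whose streamed text differs from the final message."""
    streamed = collect_text(events)
    message_id = final_message.get("message_id")
    bad = []
    for index, block in enumerate(final_message.get("content", [])):
        if block.get("type") != "text":
            continue  # only text blocks are compared here
        if streamed.get((message_id, index), "") != block.get("text", ""):
            bad.append(index)
    return bad
```

An empty list from `mismatches` means every text block matched; when it is not empty, the final agent.message is the value to trust.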

Quick Verification

Prerequisites:
  • A valid PAT in QODER_PAT
  • An existing Agent ID in AGENT_ID
  • An existing Environment ID in ENVIRONMENT_ID
  • jq installed locally
Set the API base. Use the production base by default, or replace it with the Global Test base when validating pre-release deployments.
export BASE_URL="https://api.qoder.com/api/v1/cloud"
# export BASE_URL="https://test-api.qoder.ai/api/v1/cloud"

export QODER_PAT="pat_..."
export AGENT_ID="agent_..."
export AGENT_VERSION="1"
export ENVIRONMENT_ID="env_..."
Create a Session with incremental streaming enabled:
SESSION_JSON=$(
  jq -n \
    --arg agent_id "$AGENT_ID" \
    --argjson agent_version "$AGENT_VERSION" \
    --arg environment_id "$ENVIRONMENT_ID" \
    '{
      agent: {id: $agent_id, type: "agent", version: $agent_version},
      environment_id: $environment_id,
      title: "incremental streaming verification",
      incremental_streaming_enabled: true
    }' |
  curl -s -X POST "$BASE_URL/sessions" \
    -H "Authorization: Bearer $QODER_PAT" \
    -H "Content-Type: application/json" \
    --data-binary @-
)

export SESSION_ID=$(echo "$SESSION_JSON" | jq -r '.id')
echo "$SESSION_JSON" | jq '{id, incremental_streaming_enabled, status}'
Expected response:
{
  "id": "sess_...",
  "incremental_streaming_enabled": true,
  "status": "idle"
}
Open the Session SSE stream in one terminal:
curl -sN "$BASE_URL/sessions/$SESSION_ID/events/stream" \
  -H "Authorization: Bearer $QODER_PAT" \
  -H "Accept: text/event-stream" |
while IFS= read -r line; do
  case "$line" in
    event:*) echo "$line" ;;
    data:*) echo "${line#data:}" | jq -cr '{type, message_id, index, block_type: .content_block.type, delta_type: .delta.type, text: .delta.text, thinking: .delta.thinking, partial_json: .delta.partial_json}' ;;
  esac
done
Send a user message from another terminal:
curl -s -X POST "$BASE_URL/sessions/$SESSION_ID/events" \
  -H "Authorization: Bearer $QODER_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {
        "type": "user.message",
        "content": [
          {"type": "text", "text": "Reply with exactly one short sentence."}
        ]
      }
    ]
  }' | jq
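The stream should show the incremental sequence before the final full events. The listing below is abbreviated: the parser prints one JSON line for every data: line, and fields that an event lacks appear as null: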
event: agent.message_start
event: agent.content_block_start
event: agent.content_block_delta
{"type":"agent.content_block_delta","delta_type":"text_delta","text":"..."}
event: agent.content_block_stop
event: agent.message_delta
event: agent.message_stop
event: agent.message
event: session.status_idle
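For clients that are not shell scripts, the same stream can be parsed with a small line-based reader. This is a sketch of generic SSE framing (`event:`, `data:`, `id:`, blank-line dispatch), assuming every `data` payload is a JSON object; `parse_sse` is a hypothetical name, and the presence of `id:` lines is inferred from the Last-Event-ID guidance rather than shown on this page.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per SSE event; a blank line ends an event."""
    event_name, data_lines, last_id = None, [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                # Fall back to the JSON type when no event: line was sent
                yield {"event": event_name or payload.get("type"),
                       "id": last_id, "data": payload}
            event_name, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE allows one optional space after the colon
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            last_id = value  # persists across events, usable for Last-Event-ID
```

Any iterable of text lines works as input, for example the decoded lines of a streaming HTTP response body.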
Verify historical replay:
curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
  -H "Authorization: Bearer $QODER_PAT" |
jq -r '.data[].type' |
grep -E 'agent\.message_start|agent\.content_block_delta|agent\.message_stop'
Verify thread-scoped replay and stream:
export THREAD_ID=$(
  curl -s "$BASE_URL/sessions/$SESSION_ID/threads?limit=20" \
    -H "Authorization: Bearer $QODER_PAT" |
  jq -r '.data[0].id'
)

curl -s "$BASE_URL/sessions/$SESSION_ID/threads/$THREAD_ID/events?limit=200" \
  -H "Authorization: Bearer $QODER_PAT" |
jq -r '.data[].type' |
grep -E 'agent\.message_start|agent\.content_block_delta|agent\.message_stop'
To watch the thread-scoped live stream, use the same parser loop against the thread stream endpoint:
curl -sN "$BASE_URL/sessions/$SESSION_ID/threads/$THREAD_ID/stream" \
  -H "Authorization: Bearer $QODER_PAT" \
  -H "Accept: text/event-stream" |
while IFS= read -r line; do
  case "$line" in
    event:*) echo "$line" ;;
    data:*) echo "${line#data:}" | jq -cr '{type, message_id, index, block_type: .content_block.type, delta_type: .delta.type, text: .delta.text, thinking: .delta.thinking, partial_json: .delta.partial_json}' ;;
  esac
done

Disabled Control

Create another Session without incremental_streaming_enabled, or set it to false. The response should include:
{
  "incremental_streaming_enabled": false
}
After sending the same user.message, stream and history should contain full events such as agent.message and session.status_idle, but not the incremental event types listed above.
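The disabled-control check can be automated against a list of event types taken from the stream or from replay. A minimal sketch, with `incremental_types_seen` as a hypothetical helper name:

```python
from typing import Iterable, List

INCREMENTAL_TYPES = frozenset({
    "agent.message_start",
    "agent.content_block_start",
    "agent.content_block_delta",
    "agent.content_block_stop",
    "agent.message_delta",
    "agent.message_stop",
})


def incremental_types_seen(event_types: Iterable[str]) -> List[str]:
    """Return the incremental event types present in the input, sorted."""
    return sorted(set(event_types) & INCREMENTAL_TYPES)
```

For a Session created with the setting disabled, the result should be an empty list; for an enabled Session it should at least contain `agent.message_start` and `agent.message_stop` after a completed turn.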

Parser Checklist

  • Treat the SSE event: field and the type field of the JSON data payload as the public event type.
  • Reconstruct text by appending agent.content_block_delta.delta.text for events whose delta.type is text_delta.
  • Reconstruct thinking by appending agent.content_block_delta.delta.thinking for events whose delta.type is thinking_delta; a full compatibility agent.thinking event may still arrive later.
  • For tool input increments, check delta.type == "input_json_delta" and append delta.partial_json; do not expect a top-level tool_input_delta event.
  • tool_output_delta is not emitted yet; tool execution results return as full agent.tool_result events.
  • Track index to keep multiple content blocks separate.
  • Treat processed_at as optional on agent-generated events.
  • Keep listening after session.status_idle if the client supports multiple turns on the same connection.
  • Reconnect with Last-Event-ID after network drops.
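Several checklist items (tracking index, appending text and thinking chunks, joining partial_json) can be combined into one small state machine. The class below is a sketch under the assumption that content_block_start, content_block_delta, and content_block_stop events all carry `message_id` and `index`; `BlockAssembler` is a hypothetical name, and signature chunks are deliberately ignored.

```python
import json
from typing import Any, Dict, Optional, Tuple

# delta.type -> field holding the chunk; signature_delta is not accumulated here
CHUNK_FIELDS = {
    "text_delta": "text",
    "thinking_delta": "thinking",
    "input_json_delta": "partial_json",
}


class BlockAssembler:
    """Hypothetical helper: accumulates chunks per (message_id, index)."""

    def __init__(self) -> None:
        self._open: Dict[Tuple[Any, Any], Dict[str, Any]] = {}

    def feed(self, event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Consume one public event; return a finished block on content_block_stop."""
        kind = event.get("type")
        key = (event.get("message_id"), event.get("index"))
        if kind == "agent.content_block_start":
            self._open[key] = {"type": event["content_block"]["type"], "chunks": []}
        elif kind == "agent.content_block_delta":
            delta = event.get("delta") or {}
            field = CHUNK_FIELDS.get(delta.get("type"))
            if field and key in self._open:
                self._open[key]["chunks"].append(delta.get(field, ""))
        elif kind == "agent.content_block_stop" and key in self._open:
            block = self._open.pop(key)
            joined = "".join(block["chunks"])
            if block["type"] == "tool_use":
                # partial_json fragments only form valid JSON once the block is complete
                value: Any = json.loads(joined) if joined else {}
            else:
                value = joined
            return {"message_id": key[0], "index": key[1],
                    "type": block["type"], "value": value}
        return None
```

Parsing tool input only at content_block_stop avoids feeding incomplete JSON to the decoder; a client that wants to render tool input while it streams would display the raw joined string instead.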