# Incremental Streaming

> Enable and verify incremental streaming events for Cloud Agent Sessions.

Incremental streaming lets clients receive assistant output chunks before the final full `agent.message` event. Enable it when you create the Session:

```json theme={null}
{
  "incremental_streaming_enabled": true
}
```

The setting is stored on the Session, so it governs every stream and replay request for that Session. It cannot be set per request with a query parameter or header on the stream endpoints.

## Endpoints

Enabling incremental streaming does not change any endpoint paths. The setting affects these APIs:

| Endpoint                                                             | Purpose                         |
| -------------------------------------------------------------------- | ------------------------------- |
| `GET /api/v1/cloud/sessions/{session_id}/events/stream`              | Live Session SSE stream         |
| `GET /api/v1/cloud/sessions/{session_id}/threads/{thread_id}/stream` | Live thread-scoped SSE stream   |
| `GET /api/v1/cloud/sessions/{session_id}/events`                     | Historical Session event replay |
| `GET /api/v1/cloud/sessions/{session_id}/threads/{thread_id}/events` | Historical thread-scoped replay |

When `incremental_streaming_enabled` is `false` or omitted, these APIs keep their original behavior: incremental events are neither streamed live nor returned in replay.

## Event Model

Public events follow the shape of qodercli/Anthropic raw stream events. CAW emits raw stream event types such as `message_start` and `content_block_start`; CAS exposes them publicly by prefixing `type` with `agent.` and adding CAS metadata such as `id`, `session_id`, `session_thread_id`, `turn_id`, `message_id`, `parent_tool_use_id`, and `processed_at`.

Top-level incremental event types:

```text theme={null}
agent.message_start
agent.content_block_start
agent.content_block_delta
agent.content_block_stop
agent.message_delta
agent.message_stop
```
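Since each public incremental event is a raw stream event with an `agent.` prefix plus CAS metadata, a client that already parses Anthropic-style raw stream events might reuse that parser by reversing the mapping. The sketch below assumes exactly that mapping and uses `jq` from the prerequisites; `to_raw_event` is a hypothetical helper name, and it removes only the metadata fields named above.

```bash
# Hypothetical helper (not part of the CAS API): convert public incremental
# events (one JSON object per line on stdin) back to raw stream event shapes.
# It keeps only the six top-level incremental types, strips the "agent."
# prefix, and deletes the CAS metadata fields. Other events such as
# agent.message are dropped.
to_raw_event() {
  jq -c '
    select((.type // "") | test("^agent\\.(message_start|content_block_(start|delta|stop)|message_delta|message_stop)$"))
    | .type |= sub("^agent\\."; "")
    | del(.id, .session_id, .session_thread_id, .turn_id, .message_id, .parent_tool_use_id, .processed_at)'
}

# Usage, on historical replay:
# curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
#   -H "Authorization: Bearer $QODER_PAT" |
#   jq -c '.data[]' | to_raw_event
```

Whether the stripped events satisfy a particular SDK's parser depends on that parser; treat this as a starting point.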

Chunk kinds such as `text_delta` appear only in the nested field `agent.content_block_delta.delta.type`. They are not top-level event types.

In the current implementation, `agent.content_block_start.content_block.type` can be:

| `content_block.type` | Meaning                                                     |
| -------------------- | ----------------------------------------------------------- |
| `thinking`           | Thinking block with initial empty `thinking`                |
| `text`               | Text block with initial empty `text`                        |
| `tool_use`           | Tool-use block with `id`, `name`, and initial empty `input` |

`agent.content_block_delta.delta.type` can be:

| `delta.type`        | Meaning                                                                                                                                                      |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `text_delta`        | Text output chunk                                                                                                                                            |
| `thinking_delta`    | Thinking chunk, when emitted by the model/provider                                                                                                           |
| `signature_delta`   | Signature chunk for thinking blocks, when available                                                                                                          |
| `input_json_delta`  | Tool input JSON chunk, carried in `partial_json`. The product-level `tool_input_delta` concept uses this wire shape                                         |
| `tool_output_delta` | Reserved for future tool output streaming. The current implementation does not emit this delta; tool results still return as full `agent.tool_result` events |
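Because the chunk kind lives in `delta.type` rather than in the top-level `type`, tallying chunk kinds means looking inside each `agent.content_block_delta`. As a rough verification aid, the hypothetical `summarize_deltas` helper below counts delta kinds in a list of events, which makes it easy to confirm, for example, that no `tool_output_delta` appears.

```bash
# Hypothetical helper: count agent.content_block_delta events by delta.type.
# stdin: public JSON events, one per line (for example from jq -c '.data[]'
# on a replay response). stdout: a JSON object such as {"text_delta":12}.
summarize_deltas() {
  jq -sc '
    map(select(.type == "agent.content_block_delta") | (.delta.type // "unknown"))
    | reduce .[] as $kind ({}; .[$kind] += 1)'
}

# Usage:
# curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
#   -H "Authorization: Bearer $QODER_PAT" |
#   jq -c '.data[]' | summarize_deltas
```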

Common event shapes:

```json theme={null}
{
  "type": "agent.message_start",
  "message_id": "asst_...",
  "message": {
    "id": "asst_...",
    "type": "message",
    "role": "assistant",
    "model": "qwen3-coder-plus",
    "content": [],
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {
      "input_tokens": 0,
      "output_tokens": 0
    }
  }
}
```

```json theme={null}
{
  "type": "agent.content_block_delta",
  "message_id": "asst_...",
  "index": 0,
  "delta": {
    "type": "text_delta",
    "text": "Hello"
  }
}
```

```json theme={null}
{
  "type": "agent.message_delta",
  "message_id": "asst_...",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null,
    "container": null
  },
  "usage": {
    "input_tokens": 123,
    "output_tokens": 45
  }
}
```

The full `agent.message` event is still emitted after the incremental sequence and remains the authoritative final result. For a given text block, concatenating its `text_delta.text` chunks in order should yield the final `agent.message.content[index].text`. If the provider's stream chunks omit a trailing part of the text, CAW emits the missing suffix as one last `text_delta` before `agent.content_block_stop`.
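The concatenation rule can be checked mechanically. The sketch below uses a hypothetical helper, `check_final_text`, that rebuilds each text block from the events after the last `agent.message_start` and compares it with the same index in the final `agent.message`. It assumes that stream block indexes line up with `agent.message.content`, as stated above, and that the last assistant message of the turn holds the text you want to compare.

```bash
# Hypothetical helper: rebuild text blocks from text_delta chunks and compare
# them with the last agent.message. stdin: public JSON events, one per line.
# Prints true or false; the exit status is 0 only when at least one text
# block was streamed and every streamed block matches the final message.
check_final_text() {
  jq -se '
    .[(map(.type) | rindex("agent.message_start")):] as $events
    | ($events
       | map(select(.type == "agent.content_block_delta" and .delta.type == "text_delta"))
       | reduce .[] as $e ({}; .[($e.index | tostring)] += $e.delta.text)) as $streamed
    | ($events | map(select(.type == "agent.message")) | last | .content) as $final
    | ($streamed | length) > 0
      and all($streamed | to_entries[]; .value == $final[(.key | tonumber)].text)'
}

# Usage, after a single short turn:
# curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
#   -H "Authorization: Bearer $QODER_PAT" |
#   jq -c '.data[]' | check_final_text
```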

## Quick Verification

Prerequisites:

* A valid PAT in `QODER_PAT`
* An existing Agent ID in `AGENT_ID` and its version in `AGENT_VERSION`
* An existing Environment ID in `ENVIRONMENT_ID`
* `jq` installed locally

Set the API base. Use the production base by default, or replace it with the Global Test base when validating pre-release deployments.

```bash theme={null}
export BASE_URL="https://api.qoder.com/api/v1/cloud"
# export BASE_URL="https://test-api.qoder.ai/api/v1/cloud"

export QODER_PAT="pat_..."
export AGENT_ID="agent_..."
export AGENT_VERSION="1"
export ENVIRONMENT_ID="env_..."
```

Create a Session with incremental streaming enabled:

```bash theme={null}
SESSION_JSON=$(
  jq -n \
    --arg agent_id "$AGENT_ID" \
    --argjson agent_version "$AGENT_VERSION" \
    --arg environment_id "$ENVIRONMENT_ID" \
    '{
      agent: {id: $agent_id, type: "agent", version: $agent_version},
      environment_id: $environment_id,
      title: "incremental streaming verification",
      incremental_streaming_enabled: true
    }' |
  curl -s -X POST "$BASE_URL/sessions" \
    -H "Authorization: Bearer $QODER_PAT" \
    -H "Content-Type: application/json" \
    --data-binary @-
)

export SESSION_ID=$(echo "$SESSION_JSON" | jq -r '.id')
echo "$SESSION_JSON" | jq '{id, incremental_streaming_enabled, status}'
```

Expected response:

```json theme={null}
{
  "id": "sess_...",
  "incremental_streaming_enabled": true,
  "status": "idle"
}
```
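In a script, you may prefer to stop early if the flag was not stored on the Session. A minimal check, using a hypothetical helper name:

```bash
# Hypothetical helper: succeed only if the Session JSON on stdin reports
# incremental_streaming_enabled == true (jq -e sets the exit status).
require_incremental_streaming() {
  if jq -e '.incremental_streaming_enabled == true' >/dev/null; then
    echo "incremental streaming enabled"
  else
    echo "incremental_streaming_enabled is not true" >&2
    return 1
  fi
}

# Usage:
# echo "$SESSION_JSON" | require_incremental_streaming || exit 1
```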

Open the Session SSE stream in one terminal:

```bash theme={null}
curl -sN "$BASE_URL/sessions/$SESSION_ID/events/stream" \
  -H "Authorization: Bearer $QODER_PAT" \
  -H "Accept: text/event-stream" |
while IFS= read -r line; do
  case "$line" in
    event:*) echo "$line" ;;
    data:*) printf '%s\n' "${line#data:}" | jq -cr '{type, message_id, index, block_type: .content_block.type, delta_type: .delta.type, text: .delta.text, thinking: .delta.thinking, partial_json: .delta.partial_json}' ;;
  esac
done
```

Send a user message from another terminal:

```bash theme={null}
curl -s -X POST "$BASE_URL/sessions/$SESSION_ID/events" \
  -H "Authorization: Bearer $QODER_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {
        "type": "user.message",
        "content": [
          {"type": "text", "text": "Reply with exactly one short sentence."}
        ]
      }
    ]
  }' | jq
```

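The stream should show the incremental sequence before the final full events. The output below is abbreviated: the parser prints one JSON line for every `data:` payload, including fields whose value is `null`.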

```text theme={null}
event: agent.message_start
event: agent.content_block_start
event: agent.content_block_delta
{"type":"agent.content_block_delta","delta_type":"text_delta","text":"..."}
event: agent.content_block_stop
event: agent.message_delta
event: agent.message_stop
event: agent.message
event: session.status_idle
```
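To check this ordering without reading the output by eye, the hypothetical `check_event_order` helper below reads event types, one per line, and confirms that the first `agent.message_start`, `agent.content_block_delta`, `agent.message_stop`, and `agent.message` appear in that order. It only compares first occurrences, so it is a coarse check for turns that produce several assistant messages.

```bash
# Hypothetical helper: verify the order of the first occurrence of each key
# event type. stdin: one event type per line (trailing \r is tolerated).
check_event_order() {
  awk '
    { sub(/\r$/, "") }
    $0 == "agent.message_start"       && !start { start = NR }
    $0 == "agent.content_block_delta" && !delta { delta = NR }
    $0 == "agent.message_stop"        && !stop  { stop = NR }
    $0 == "agent.message"             && !full  { full = NR }
    END {
      if (start && start < delta && delta < stop && stop < full) {
        print "order ok"
        exit 0
      }
      printf "unexpected order: start=%s delta=%s stop=%s full=%s\n", start, delta, stop, full
      exit 1
    }'
}

# Usage on replay (assumes replay returns events in chronological order):
# curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
#   -H "Authorization: Bearer $QODER_PAT" |
#   jq -r '.data[].type' | check_event_order
```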

Verify historical replay:

```bash theme={null}
curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
  -H "Authorization: Bearer $QODER_PAT" |
jq -r '.data[].type' |
grep -E 'agent\.message_start|agent\.content_block_delta|agent\.message_stop'
```

Verify thread-scoped replay and stream:

```bash theme={null}
export THREAD_ID=$(
  curl -s "$BASE_URL/sessions/$SESSION_ID/threads?limit=20" \
    -H "Authorization: Bearer $QODER_PAT" |
  jq -r '.data[0].id'
)

curl -s "$BASE_URL/sessions/$SESSION_ID/threads/$THREAD_ID/events?limit=200" \
  -H "Authorization: Bearer $QODER_PAT" |
jq -r '.data[].type' |
grep -E 'agent\.message_start|agent\.content_block_delta|agent\.message_stop'
```

To watch the thread-scoped live stream, use the same parser loop against the thread stream endpoint:

```bash theme={null}
curl -sN "$BASE_URL/sessions/$SESSION_ID/threads/$THREAD_ID/stream" \
  -H "Authorization: Bearer $QODER_PAT" \
  -H "Accept: text/event-stream" |
while IFS= read -r line; do
  case "$line" in
    event:*) echo "$line" ;;
    data:*) printf '%s\n' "${line#data:}" | jq -cr '{type, message_id, index, block_type: .content_block.type, delta_type: .delta.type, text: .delta.text, thinking: .delta.thinking, partial_json: .delta.partial_json}' ;;
  esac
done
```

## Disabled Control

Create another Session without `incremental_streaming_enabled`, or set it to `false`. The response should include:

```json theme={null}
{
  "incremental_streaming_enabled": false
}
```

After you send the same `user.message`, both the live stream and the historical replay should contain full events such as `agent.message` and `session.status_idle`, but none of the incremental event types listed above.
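For the disabled control, the absence of incremental events can be asserted in a script. This hypothetical helper reads event types, one per line, and fails if any of the six top-level incremental types appears.

```bash
# Hypothetical helper: fail if any top-level incremental event type is present.
# stdin: one event type per line.
assert_no_incremental_events() {
  if grep -Eq '^agent\.(message_start|content_block_(start|delta|stop)|message_delta|message_stop)$'; then
    echo "unexpected incremental events found" >&2
    return 1
  fi
  echo "ok: no incremental events"
}

# Usage, against a Session created without incremental streaming:
# curl -s "$BASE_URL/sessions/$SESSION_ID/events?limit=200" \
#   -H "Authorization: Bearer $QODER_PAT" |
#   jq -r '.data[].type' | assert_no_incremental_events
```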

## Parser Checklist

* Treat the SSE `event:` field, or equivalently the `type` field of the JSON `data:` payload, as the public event type.
* Reconstruct text by appending `agent.content_block_delta.delta.text` for events whose `delta.type` is `text_delta`.
* Reconstruct thinking by appending `agent.content_block_delta.delta.thinking` for events whose `delta.type` is `thinking_delta`; a full compatibility `agent.thinking` event may still arrive later.
* For tool input increments, check `delta.type == "input_json_delta"` and append `delta.partial_json`; do not expect a top-level `tool_input_delta` event.
* `tool_output_delta` is not emitted yet; tool execution results return as full `agent.tool_result` events.
* Track `index` to keep multiple content blocks separate.
* Treat `processed_at` as optional on agent-generated events.
* Keep listening after `session.status_idle` if the client supports multiple turns on the same connection.
* Reconnect with `Last-Event-ID` after network drops.
