VideoToolbox Remote Protocol

Status: Stable v1
Wire Format: Annex B (mandatory)
Endianness: Big Endian (Network Byte Order)

1. Overview

The protocol uses a single TCP connection per session. It is stateful: a session starts with a handshake (HELLO), continues with configuration (CONFIGURE), and then carries a stream of frames and/or packets until the client sends FLUSH and the server answers DONE.

Communication Modes

| Mode | Input | Output | Description |
|------|-------|--------|-------------|
| Encode | FRAME | PACKET | Raw frames in, compressed NALs out. |
| Decode | PACKET | FRAME | Compressed NALs in, raw frames out. |
| Transcode | PACKET | PACKET | Compressed NALs in, compressed NALs out. |

Sequence Flow (Encode)

sequenceDiagram
    participant C as FFmpeg (Client)
    participant S as vtremoted (Server)
    participant VT as VideoToolbox (Hardware)

    Note over C,S: Handshake
    C->>S: HELLO (token, codec, client_info)
    S-->>C: HELLO_ACK (status, caps)

    Note over C,S: Configuration
    C->>S: CONFIGURE (width, height, fmt)
    S->>VT: VTCompressionSessionCreate
    VT-->>S: Session Ready
    S-->>C: CONFIGURE_ACK (extradata)

    Note over C,S: Streaming (Encode Mode)
    loop Frames
        C->>S: FRAME (raw NV12 planes)
        S->>VT: VTCompressionSessionEncodeFrame
        VT-->>S: Callback (CMSampleBuffer)
        S-->>C: PACKET (Annex B encoded)
    end

    Note over C,S: Teardown
    C->>S: FLUSH
    S-->>C: DONE
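
The message order in the diagram can be checked with a small client-side state machine. The following is a minimal sketch, not part of the protocol: the phase names are illustrative, and it assumes that PACKET messages may still arrive between FLUSH and DONE while the pipeline drains.

```python
# Expected client-side message order for an encode session, following the
# sequence diagram. Phase names are illustrative, not protocol fields.
TRANSITIONS = {
    ("start", "send:HELLO"): "hello_sent",
    ("hello_sent", "recv:HELLO_ACK"): "ready",
    ("ready", "send:CONFIGURE"): "configure_sent",
    ("configure_sent", "recv:CONFIGURE_ACK"): "streaming",
    ("streaming", "send:FRAME"): "streaming",
    ("streaming", "recv:PACKET"): "streaming",
    ("streaming", "send:FLUSH"): "draining",
    # Assumption: remaining packets are delivered before DONE.
    ("draining", "recv:PACKET"): "draining",
    ("draining", "recv:DONE"): "closed",
}


def advance(state: str, event: str) -> str:
    """Return the next phase, or raise ValueError if the event is out of order."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event} in phase {state}") from None
```

A client could call advance() on every message it sends or receives and treat a ValueError as a protocol violation.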

2. Transport & Framing

Header Structure

| Offset | Type | Name | Value |
|--------|------|------|-------|
| 0 | uint32 | magic | 0x56545231 (“VTR1”) |
| 4 | uint16 | version | 1 |
| 6 | uint16 | type | Enum ID (see below) |
| 8 | uint32 | length | Payload size in bytes (excluding header) |
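
The header layout above adds up to 12 bytes in network byte order. As a minimal sketch, it could be packed and parsed with Python's struct module; the function names pack_message and unpack_header are illustrative and not part of the protocol.

```python
import struct

# Layout from the header table: magic (uint32), version (uint16),
# type (uint16), length (uint32), all big-endian.
HEADER = struct.Struct(">IHHI")
MAGIC = 0x56545231  # ASCII "VTR1"
VERSION = 1


def pack_message(msg_type: int, payload: bytes = b"") -> bytes:
    """Prefix a payload with the 12-byte header."""
    return HEADER.pack(MAGIC, VERSION, msg_type, len(payload)) + payload


def unpack_header(data: bytes) -> tuple[int, int]:
    """Return (type, length) from the first 12 bytes after validating magic and version."""
    magic, version, msg_type, length = HEADER.unpack(data[:HEADER.size])
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic:#010x}")
    if version != VERSION:
        raise ValueError(f"unsupported version: {version}")
    return msg_type, length
```

Because TCP is a byte stream, a reader would first collect exactly HEADER.size bytes, call unpack_header, and then read exactly length more bytes for the payload.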

3. Message Types

| ID | Name | Direction | Payload Description |
|----|------|-----------|---------------------|
| 1 | HELLO | C → S | Initial handshake with auth token. |
| 2 | HELLO_ACK | S → C | Server acceptance/rejection. |
| 3 | CONFIGURE | C → S | Stream parameters. |
| 4 | CONFIGURE_ACK | S → C | Finalized config & codec extradata. |
| 5 | FRAME | Bidirectional | Raw image data (planes). |
| 6 | PACKET | Bidirectional | Encoded bitstream (Annex B). |
| 7 | FLUSH | C → S | Request to drain pipeline. |
| 8 | DONE | S → C | Pipeline drained signal. |
| 9 | ERROR | Bidirectional | Fatal error info. |
| 10 | PING | Bidirectional | Keepalive. |
| 11 | PONG | Bidirectional | Keepalive response. |
| 12 | PACKET_ACK | S → C | Transcode-mode input packet credit. |
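
The IDs and directions in this table map naturally onto an enum plus two direction checks. This is a sketch only: the class and helper names are assumptions, and the checks cover just the Direction column, not the per-mode rules for FRAME and PACKET.

```python
from enum import IntEnum


class MessageType(IntEnum):
    HELLO = 1
    HELLO_ACK = 2
    CONFIGURE = 3
    CONFIGURE_ACK = 4
    FRAME = 5
    PACKET = 6
    FLUSH = 7
    DONE = 8
    ERROR = 9
    PING = 10
    PONG = 11
    PACKET_ACK = 12


# Types the table marks as C -> S only and S -> C only; the rest are bidirectional.
CLIENT_ONLY = {MessageType.HELLO, MessageType.CONFIGURE, MessageType.FLUSH}
SERVER_ONLY = {MessageType.HELLO_ACK, MessageType.CONFIGURE_ACK,
               MessageType.DONE, MessageType.PACKET_ACK}


def allowed_from_client(msg_type: int) -> bool:
    """True if the Direction column lets a client send this type."""
    return MessageType(msg_type) not in SERVER_ONLY


def allowed_from_server(msg_type: int) -> bool:
    """True if the Direction column lets a server send this type."""
    return MessageType(msg_type) not in CLIENT_ONLY
```

MessageType(value) raises ValueError for an ID outside the table, which gives a receiver a simple way to reject unknown types.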

4. Message Payloads

Handshake

HELLO (Type 1)

HELLO_ACK (Type 2)

Configuration

CONFIGURE (Type 3)

> [!NOTE]
> options is a map of codec settings; unknown keys are ignored. For mode=transcode, the keys out_codec, out_width, out_height, and scale_mode are passed here, along with optional client feature requests such as packet_ack.v1=1.
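
As an illustration of the note above, a client might assemble the transcode options as follows. This is a hedged sketch: the helper name, the choice to stringify values, and the sample values "hevc" and "fit" are assumptions, and the wire encoding of the map is not shown.

```python
def transcode_options(out_codec: str, out_width: int, out_height: int,
                      scale_mode: str, request_packet_ack: bool = False) -> dict[str, str]:
    """Build the CONFIGURE options map for mode=transcode.

    Values are stringified here as an assumption; how the map is
    serialized on the wire is outside this sketch.
    """
    options = {
        "out_codec": out_codec,
        "out_width": str(out_width),
        "out_height": str(out_height),
        "scale_mode": scale_mode,
    }
    if request_packet_ack:
        # Optional client feature request named in the note.
        options["packet_ack.v1"] = "1"
    return options


# Hypothetical values, for illustration only.
opts = transcode_options("hevc", 1280, 720, "fit", request_packet_ack=True)
```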

Streaming

FRAME (Type 5)

Raw frame planes.

PACKET (Type 6)

Encoded Annex B NAL units.

PACKET_ACK (Type 12)

Empty payload. A server that advertises packet_ack.v1 may send PACKET_ACK only if the client, in transcode mode, also requests packet_ack.v1=1 in CONFIGURE. The server emits the ACK after it accepts an input PACKET. Clients use these ACKs to track in-flight input credit rather than assuming that every input packet produces exactly one output packet.
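
One way a transcode client could track this credit is a simple in-flight counter. The sketch below is an assumption about client design, not something the protocol prescribes; in particular, the window limit is a hypothetical client-side setting.

```python
class PacketCredit:
    """Tracks in-flight input PACKETs for a transcode client.

    `window` is a hypothetical client-side limit; the protocol text
    does not define one.
    """

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.in_flight = 0

    def can_send(self) -> bool:
        """True while fewer than `window` packets are unacknowledged."""
        return self.in_flight < self.window

    def on_packet_sent(self) -> None:
        if not self.can_send():
            raise RuntimeError("no input credit available")
        self.in_flight += 1

    def on_packet_ack(self) -> None:
        # Each PACKET_ACK returns one credit, independent of output PACKETs.
        if self.in_flight == 0:
            raise RuntimeError("PACKET_ACK without an outstanding PACKET")
        self.in_flight -= 1
```

The client would call on_packet_sent() before writing each input PACKET and on_packet_ack() whenever a PACKET_ACK arrives, pausing input while can_send() is false.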

5. Security & Error Handling

Error Codes

| Code | Meaning |
|------|---------|
| 1 | Auth Failure |
| 2 | Server Busy |
| 3 | Unsupported Config |
| 4 | Bad Request |
| 5 | Internal Error |
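
For logging or diagnostics, the codes above can be mirrored in an enum. This is a minimal sketch with assumed names; how a client should react to each code is not specified here.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    AUTH_FAILURE = 1
    SERVER_BUSY = 2
    UNSUPPORTED_CONFIG = 3
    BAD_REQUEST = 4
    INTERNAL_ERROR = 5


def describe_error(code: int) -> str:
    """Return the table's label for a code, or a generic label for unknown codes."""
    try:
        return ErrorCode(code).name.replace("_", " ").title()
    except ValueError:
        return f"Unknown error ({code})"
```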