1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246
|
## JackTrip network protocol (as implemented)
This document describes JackTrip’s **on-the-wire protocol** as implemented in the current source tree. It is intended for developers debugging or interoperating with JackTrip at the packet level.
### Scope and non-goals
- **In scope**: the real-time **UDP audio stream**, its headers and payload layout, the optional **UDP redundancy** framing, the small **UDP “stop” control packet**, and the **TCP handshake** used by hub/ping-server style deployments (including the authentication variant).
- **Out of scope**: local-only IPC (e.g. `QLocalSocket` “AudioSocket”), OSC control, and any higher-level application semantics outside packet exchange.
### Transports at a glance
- **UDP (audio)**: real-time audio is sent as UDP datagrams containing `PacketHeader` + raw audio payload.
- **UDP (control)**: a small fixed-size “stop” datagram is used to signal shutdown.
- **TCP (hub/ping-server handshake)**: a short-lived TCP connection is used to exchange ephemeral UDP port information (and optionally do TLS + credentials). The client sends 4 bytes representing the port number it is binding to, and the server responds by sending 4 bytes representing its own port number.
---
## UDP audio datagrams
### High-level framing
Each UDP datagram carries one of:
- **Audio datagram**: one or more **full packets** (header + audio payload). When redundancy is disabled, there is exactly one full packet per UDP datagram. When redundancy is enabled, multiple full packets are concatenated into a single UDP datagram to provide forward error correction (FEC) (see “UDP redundancy”).
- **Stop/control datagram**: exactly 63 bytes of `0xFF` (see “UDP stop/control datagram”).
### Packet header types
The header is selected by `DataProtocol::packetHeaderTypeT`:
- **DEFAULT**: `DefaultHeaderStruct` (the standard JackTrip header).
- **JAMLINK**: `JamLinkHeaderStuct` (JamLink compatibility).
- **EMPTY**: no header (payload only).
See `src/PacketHeader.h` and `src/PacketHeader.cpp`.
### Default header (`DEFAULT`)
On-wire layout is the in-memory `DefaultHeaderStruct` copied with `memcpy()` (no explicit endian conversions).
Fields (in order):
| Field | Type | Meaning |
|------:|------|---------|
| `TimeStamp` | `uint64_t` | Timestamp in microseconds since Unix epoch (see `PacketHeader::usecTime()`). |
| `SeqNumber` | `uint16_t` | Sequence number; increments once per audio period and wraps at 16 bits. |
| `BufferSize` | `uint16_t` | Audio period size \(N\) in **samples per channel**. |
| `SamplingRate` | `uint8_t` | Encoded sample-rate enum value (`AudioInterface::samplingRateT`), **not** Hz. |
| `BitResolution` | `uint8_t` | Bits per sample (8/16/24/32). |
| `NumIncomingChannelsFromNet` | `uint8_t` | Channel count expected from the peer “from network” direction (see notes below). |
| `NumOutgoingChannelsToNet` | `uint8_t` | Channel count the sender is placing into the payload (see notes below). |
#### Important interoperability notes
- **Endianness / ABI**: this header is serialized by raw `memcpy()` of a C struct. In practice this assumes:
- both sides are using compatible ABI/layout for the struct, and
- both sides are on the same endianness (typically **little-endian** on modern desktop platforms).
- **Channel fields are asymmetric**: the implementation uses these fields to convey “incoming vs outgoing” channel counts, including a couple of sentinel behaviors:
- `NumIncomingChannelsFromNet` is populated from local *audio interface output* channel count.
- `NumOutgoingChannelsToNet` may be set to `0` when in/out channel counts match, or to `0xFF` when there are zero audio interface input channels.
These behaviors come from `DefaultHeader::fillHeaderCommonFromAudio()` in `src/PacketHeader.cpp`.
### JamLink header (`JAMLINK`)
Please note that JamLink is an obsolete device.
JamLink uses a compact header:
| Field | Type | Meaning |
|------:|------|---------|
| `Common` | `uint16_t` | Bitfield describing mono/stereo, bit depth, sample rate, and samples-per-packet (JamLink “streamType”). |
| `SeqNumber` | `uint16_t` | Sequence number. |
| `TimeStamp` | `uint32_t` | Timestamp. |
The current implementation primarily fills this for JamLink constraints (mono, 48kHz, 64-sample buffers). See `JamLinkHeader::fillHeaderCommonFromAudio()` in `src/PacketHeader.cpp`.
### Empty header (`EMPTY`)
No header; the UDP payload is raw audio data only.
---
## UDP audio payload
### Size
For a single full packet (no redundancy), the UDP payload length is:
$$\text{headerBytes} + (N \times C \times \text{bytesPerSample})$$
Where:
- \(N\) is `BufferSize` (samples per channel)
- \(C\) is the number of channels present in the payload
- `bytesPerSample` is `BitResolution / 8`
### Channel/sample ordering (planar / non-interleaved)
On the wire, the payload is **planar** (non-interleaved) by channel:
- First \(N\) samples for channel 0
- Then \(N\) samples for channel 1
- …
This is explicit in `UdpDataProtocol` which converts between:
- **Internal**: interleaved layout \([n][c]\)
- **Network**: planar layout \([c][n]\)
See `UdpDataProtocol::sendPacketRedundancy()` and `UdpDataProtocol::receivePacketRedundancy()` in `src/UdpDataProtocol.cpp`.
### Sample encoding (bit resolution)
JackTrip processes audio internally as `float` (`sample_t`), but the network payload uses the selected bit resolution via `AudioInterface::fromSampleToBitConversion()` / `fromBitToSampleConversion()`.
Behavior by bit resolution (`AudioInterface::audioBitResolutionT`):
- **8-bit (`BIT8`)**: signed 8-bit integer, scaled from float in \([-1, 1]\).
- **16-bit (`BIT16`)**: signed 16-bit integer, written **little-endian**.
- **24-bit (`BIT24`)**: a **non-standard 3-byte format**: a 16-bit signed integer plus an 8-bit unsigned “remainder” byte.
- **32-bit (`BIT32`)**: raw 32-bit float bytes (`memcpy` of `float`), which implicitly assumes IEEE-754 and matching endianness.
See `src/AudioInterface.cpp`.
---
## UDP redundancy (optional)
JackTrip can send redundant audio packets to reduce audible artifacts from packet loss.
### Framing
With redundancy factor \(R\), each UDP datagram contains **R full packets** concatenated:
- The newest packet is first (`UDP[n]`), followed by older packets (`UDP[n-1]`, …).
- Total UDP payload length becomes `R * full_packet_size`.
The sender implements this by shifting a buffer and prepending the newest full packet each period.
See `UdpDataProtocol::sendPacketRedundancy()` and the explanatory comment block in `src/UdpDataProtocol.cpp`.
### Receiver behavior
Upon receiving a redundant datagram, the receiver:
- Reads the first packet’s `SeqNumber`.
- If it is not the next expected sequence, scans forward through the concatenated packets looking for the expected next one.
- May “revive” and deliver multiple packets from the redundant datagram in order.
- Treats large negative or implausibly large sequence jumps as **out-of-order** and ignores them.
See `UdpDataProtocol::receivePacketRedundancy()` in `src/UdpDataProtocol.cpp`.
---
## UDP stop/control datagram
JackTrip uses a special fixed-size UDP datagram to signal shutdown:
- **Length**: 63 bytes
- **Contents**: every byte is `0xFF`
The receiver checks for this exact pattern and treats it as “Peer Stopped”.
See `UdpDataProtocol::processControlPacket()` and the shutdown path in `UdpDataProtocol::run()` in `src/UdpDataProtocol.cpp`.
---
## Connection setup and “handshake”
JackTrip supports multiple deployment styles. The relevant “protocol” differs depending on mode.
### P2P server mode (UDP-only)
In P2P server mode, there is **no TCP handshake**. Instead:
- The server binds a UDP socket on its configured receive port.
- It waits for the first UDP datagram.
- It uses the datagram’s source address/port as the peer endpoint for subsequent UDP send/receive.
This supports basic NAT traversal by responding to the client’s observed source port.
See `JackTrip::serverStart()` and `JackTrip::receivedDataUDP()` in `src/JackTrip.cpp`.
### Hub / ping-server mode (TCP handshake + UDP audio)
When connecting to a hub/ping-server style endpoint, JackTrip uses a short-lived TCP connection to exchange UDP port information.
#### Unauthenticated handshake (no TLS)
Client → server (TCP):
- `int32` little-endian: the client’s UDP receive/bind port
- `gMaxRemoteNameLength` bytes: optional UTF-8 “remote client name” (null-terminated, padded with zeros)
Server → client (TCP):
- `int32` little-endian: the server-assigned UDP port the client should use as its peer port
The TCP connection is then closed.
Client-side send/receive logic: `JackTrip::receivedConnectionTCP()` and `JackTrip::receivedDataTCP()` in `src/JackTrip.cpp`
Server-side receive/send logic: `UdpHubListener::readClientUdpPort()` and `UdpHubListener::sendUdpPort()` in `src/UdpHubListener.cpp`
#### Authentication / TLS handshake (optional)
This is an extension of the same TCP handshake using values above 65535 as “auth response” codes.
High-level flow:
1. Client connects TCP and sends an `int32` little-endian value of `Auth::OK` to request authentication.
2. Server replies with an `int32` auth response (e.g. `Auth::OK`, `Auth::NOTREQUIRED`, `Auth::REQUIRED`, …).
3. If both sides proceed, TLS is established on the same TCP socket.
4. Client then sends:
- `int32` LE: UDP receive/bind port
- `gMaxRemoteNameLength` bytes: client name
- `int32` LE: username length (excluding null terminator)
- `int32` LE: password length (excluding null terminator)
- `username` bytes + `\0`
- `password` bytes + `\0`
5. Server validates credentials and replies with either:
- `int32` LE UDP port (<= 65535) on success, or
- `int32` LE auth error code (> 65535) on failure
Client-side: `JackTrip::receivedConnectionTCP()`, `JackTrip::connectionSecured()`, and `JackTrip::receivedDataTCP()` in `src/JackTrip.cpp`
Server-side: `UdpHubListener::receivedClientInfo()`, `UdpHubListener::checkAuthAndReadPort()`, and `UdpHubListener::sendUdpPort()` in `src/UdpHubListener.cpp`
---
## QoS marking (best-effort)
On supported platforms, JackTrip attempts to mark UDP packets as “voice” traffic:
- Linux/Unix: sets DSCP to 56 (`IP_TOS` / `IPV6_TCLASS` set to `0xE0`), and sets `SO_PRIORITY` to 6.
- Windows: uses QOS APIs with `QOSTrafficTypeVoice`.
- macOS: uses `SO_NET_SERVICE_TYPE` with `NET_SERVICE_TYPE_VO` (best-effort).
See `src/UdpDataProtocol.cpp`.
---
## References
For additional context on JackTrip's network behavior and interpretation of debug output (`-V` flag):
Chafe, C. (2018). I am Streaming in a Room. *Frontiers in Digital Humanities*, Volume 5. https://doi.org/10.3389/fdigh.2018.00027
|