Partially reliable and/or unordered WebRTC data channels

16 Mar 2023

Shelikhoo brought up an interesting point at today's meeting. Snowflake
uses WebRTC data channels on the client–proxy link. Data channels are
SCTP in DTLS. They are by default fully reliable and in-order, like TCP,
and that is the way we currently use them. But SCTP and in turn WebRTC
data channels also have "partial reliability" and "unordered" options
that might be a better fit for our uses.

http://meetbot.debian.net/tor-meeting/2023/tor-meeting.2023-03-16-15.57.log....
16:25:29 <shelikhoo> webrtc(SCTP to be specific) can be set to not retransmit packet
16:25:34 <shelikhoo> and deliver packet out of order
16:25:43 <shelikhoo> to application
16:26:14 <dcf1> I did not know about that. Do we use it that way in Snowflake?
16:27:28 <shelikhoo> it is called "unreliable mode" https://pkg.go.dev/github.com/pion/webrtc/v3#DataChannel.MaxRetransmits
16:29:28 <dcf1> Ok. I might have missed an issue, is that mode used in Snowflake? Or could it be?
16:30:27 <shelikhoo> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
16:30:54 <shelikhoo> https://pkg.go.dev/github.com/pion/webrtc/v3#DataChannelInit
16:31:18 <shelikhoo> Maybe no...? but we can enable "unreliable mode"

## Background

The specification of WebRTC data channels, RFC 8831, requires the SCTP
implementation to support limiting transmissions by either time or
number.

https://www.rfc-editor.org/rfc/rfc8831
...
SCTP, as specified in [RFC4960] with the partial reliability extension
(PR-SCTP) defined in [RFC3758] and the additional policies defined in
[RFC7496], provides multiple streams natively with reliable, and the
relevant partially reliable, delivery modes for user messages.
...
The partial reliability extension defined in [RFC3758] MUST be
supported. In addition to the timed reliability PR-SCTP policy defined
in [RFC3758], the limited retransmission policy defined in [RFC7496]
MUST be supported.

Orthogonal to reliability, the SCTP implementation must support ordered
or unordered delivery of messages.

https://www.rfc-editor.org/rfc/rfc8831#name-sctp-protocol-consideration
...
This SCTP stack and its upper layer MUST support the usage of multiple
SCTP streams. A user message can be sent ordered or unordered and with
partial or full reliability.

Removing retransmissions and ordering restrictions makes a data channel
into a datagram delivery service that preserves message boundaries, but
not ordering, and does not guarantee delivery.

https://www.rfc-editor.org/rfc/rfc8831#name-sctp-protocol-consideration
...
Limiting the number of retransmissions to zero, combined with
unordered delivery, provides a UDP-like service where each user
message is sent exactly once and delivered in the order received.

The WebRTC Data Channel Establishment Protocol's DATA_CHANNEL_OPEN
message is where the ordering and retransmission limits are specified.
(Note that partial reliability and ordering are two separate concepts,
and that you can limit retransmissions by time or by number, but not
both.)

https://www.rfc-editor.org/rfc/rfc8832.html#name-data_channel_open-message
...
DATA_CHANNEL_RELIABLE (0x00):
  The data channel provides a reliable in-order bidirectional
  communication.
DATA_CHANNEL_RELIABLE_UNORDERED (0x80):
  The data channel provides a reliable unordered bidirectional
  communication.
DATA_CHANNEL_PARTIAL_RELIABLE_REXMIT (0x01):
  The data channel provides a partially reliable in-order
  bidirectional communication. User messages will not be retransmitted
  more times than specified in the Reliability Parameter.
DATA_CHANNEL_PARTIAL_RELIABLE_REXMIT_UNORDERED (0x81):
  The data channel provides a partially reliable unordered
  bidirectional communication. User messages will not be retransmitted
  more times than specified in the Reliability Parameter.
DATA_CHANNEL_PARTIAL_RELIABLE_TIMED (0x02):
  The data channel provides a partially reliable in-order
  bidirectional communication. User messages might not be transmitted
  or retransmitted after a specified lifetime given in milliseconds in
  the Reliability Parameter. This lifetime starts when providing the
  user message to the protocol stack.
DATA_CHANNEL_PARTIAL_RELIABLE_TIMED_UNORDERED (0x82):
  The data channel provides a partially reliable unordered
  bidirectional communication. User messages might not be transmitted
  or retransmitted after a specified lifetime given in milliseconds in
  the Reliability Parameter. This lifetime starts when providing the
  user message to the protocol stack.

Though in SCTP the unordered flag (U) may vary per DATA chunk, data
channels can only be entirely ordered or entirely unordered.
https://www.rfc-editor.org/rfc/rfc9260#name-ordered-and-unordered-deliv
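
To make the separation between ordering and reliability concrete, here
is a small Go sketch that splits a Channel Type value into those two
properties. It relies only on the six values quoted above, in which bit
0x80 distinguishes the unordered variants. The function name `describe`
and the policy strings are made up for illustration.

```go
package main

import "fmt"

// Reliability policies, taken from the low bits of the Channel Type
// values quoted from RFC 8832.
const (
	policyReliable byte = 0x00
	policyRexmit   byte = 0x01
	policyTimed    byte = 0x02

	// In the quoted values, this bit marks the unordered variants.
	unorderedBit byte = 0x80
)

// describe splits a Channel Type into its two independent properties:
// whether delivery is ordered, and which reliability policy applies.
// (Hypothetical helper, not part of any library.)
func describe(channelType byte) (ordered bool, policy string, err error) {
	ordered = channelType&unorderedBit == 0
	switch channelType &^ unorderedBit {
	case policyReliable:
		policy = "reliable"
	case policyRexmit:
		policy = "limited retransmissions"
	case policyTimed:
		policy = "limited lifetime"
	default:
		return false, "", fmt.Errorf("unknown channel type 0x%02x", channelType)
	}
	return ordered, policy, nil
}

func main() {
	for _, t := range []byte{0x00, 0x80, 0x01, 0x81, 0x02, 0x82} {
		ordered, policy, _ := describe(t)
		fmt.Printf("0x%02x ordered=%v policy=%s\n", t, ordered, policy)
	}
}
```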

The JavaScript API exposes these settings in the
RTCPeerConnection.createDataChannel method:

https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/createDat...
...
`ordered` (Optional)
  Indicates whether or not messages sent on the RTCDataChannel are
  required to arrive at their destination in the same order in which
  they were sent (`true`), or if they're allowed to arrive
  out-of-order (`false`). Default: `true`.
`maxPacketLifeTime` (Optional)
  The maximum number of milliseconds that attempts to transfer a
  message may take in unreliable mode. While this value is a 16-bit
  unsigned number, each user agent may clamp it to whatever maximum it
  deems appropriate. Default: `null`.
`maxRetransmits` (Optional)
  The maximum number of times the user agent should attempt to
  retransmit a message which fails the first time in unreliable mode.
  While this value is a 16-bit unsigned number, each user agent may
  clamp it to whatever maximum it deems appropriate. Default: `null`.

Similar options are exposed in Pion PeerConnection.CreateDataChannel and
the DataChannelInit struct:

https://pkg.go.dev/github.com/pion/webrtc/v3#DataChannelInit
...
// Ordered indicates if data is allowed to be delivered out of order.
// The default value of true, guarantees that data will be delivered
// in order.
Ordered *bool
// MaxPacketLifeTime limits the time (in milliseconds) during which the
// channel will transmit or retransmit data if not acknowledged. This
// value may be clamped if it exceeds the maximum value supported.
MaxPacketLifeTime *uint16
// MaxRetransmits limits the number of times a channel will retransmit
// data if not successfully delivered. This value may be clamped if it
// exceeds the maximum value supported.
MaxRetransmits *uint16

## Possibilities for Snowflake

Currently in Snowflake, we treat the data channel as if it were a TCP
stream: we ignore message boundaries and concatenate all messages in
order. What we transmit over this stream, though, is a sequence of
discrete KCP packets (and possible padding), delimited using length
prefixes:

https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...

The boundaries of encapsulated KCP packets do not necessarily correspond
to SCTP/data channel message boundaries.

We may be able to increase efficiency by using data channel messages
directly, one KCP packet per message. Because KCP has its own
retransmission and reordering facilities, we could completely turn off
reliability and ordering at the data channel layer.

One reason for the packet encapsulation we use now is to permit shaping
the size of network packets. But matching data channel messages 1:1 with
KCP packets does not necessarily harm traffic shaping capability. We
could put a simple length prefix at the
head of each message that indicates the effective payload length, with
the rest being padding. If the traffic schedule calls for a payload
smaller than an immediately available packet, an all-padding message can
be sent.
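
A minimal Go sketch of this per-message framing follows. The 2-byte
big-endian prefix, zero-byte padding, and the names `encodeMessage` and
`decodeMessage` are all assumptions for illustration; nothing here is an
existing Snowflake format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// prefixLen is the size of the hypothetical length prefix.
const prefixLen = 2

// encodeMessage builds one data channel message of exactly size bytes:
// a big-endian payload length, the payload, then zero padding.
// An empty payload yields an all-padding message.
func encodeMessage(payload []byte, size int) ([]byte, error) {
	if len(payload) > 0xffff {
		return nil, errors.New("payload too long for length prefix")
	}
	if size < prefixLen+len(payload) {
		return nil, errors.New("message size too small for payload")
	}
	msg := make([]byte, size) // zero-filled, so the tail is already padding
	binary.BigEndian.PutUint16(msg, uint16(len(payload)))
	copy(msg[prefixLen:], payload)
	return msg, nil
}

// decodeMessage returns the effective payload and discards the padding.
func decodeMessage(msg []byte) ([]byte, error) {
	if len(msg) < prefixLen {
		return nil, errors.New("message shorter than length prefix")
	}
	n := int(binary.BigEndian.Uint16(msg))
	if n > len(msg)-prefixLen {
		return nil, errors.New("length prefix exceeds message size")
	}
	return msg[prefixLen : prefixLen+n], nil
}

func main() {
	// A KCP packet padded out to the size the traffic schedule asks for.
	msg, _ := encodeMessage([]byte("kcp packet"), 64)
	payload, _ := decodeMessage(msg)
	fmt.Printf("%d-byte message carries %q\n", len(msg), payload)

	// An all-padding message carries no payload.
	padding, _ := encodeMessage(nil, 64)
	payload, _ = decodeMessage(padding)
	fmt.Printf("%d-byte message carries %d payload bytes\n", len(padding), len(payload))
}
```

Because each message is self-delimiting, the receiver needs no state
across messages, which is what allows ordering and reliability to be
turned off at the data channel layer.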

Such a change would require changes at the proxy. Proxies would need a
second mode to take message boundaries directly from the data channel,
rather than concatenating messages and decoding message boundaries from
the joined-up stream. Proxies could backward-compatibly forward the
messages to the bridge over WebSocket, applying the packet encapsulation
at the proxy rather than the client. Or (requiring a change at the
bridge too), we could use WebRTC on the proxy–bridge link and similarly
send packets with unordered and unreliable data channels.

David Fifield
