Summary of the discussion today:
- The JSON formats from https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... look suitable, but the 400 responses should also be somehow encapsulated as errors inside the message body (one possible shape is sketched below). Internal refactoring of the broker is not necessary at this point, as long as we're reasonably sure the message formats will work for other rendezvous methods.
- We talked about where to put message version numbers. cohosh likes the idea of having the version number at the start of the message, outside the JSON container. We talked about an algorithm like this (a code sketch follows after these notes):
  - Read up to 64 bytes or until the first newline.
  - If the first byte is '{', then this is a legacy-format message.
  - If there is no newline in the first 64 bytes, then error.
  - Interpret everything up to the first newline as a version number.
  - If the version number is not understood, then error.
  - Read the remainder of the body (up to some sane limit).
  - Parse the remainder according to the version number.
- Should there be a version number on the broker's response messages too? Or can the client assume that the broker returns a response in a format appropriate for whatever the client's registration message was? We can make that assumption now, because we tightly control the broker, but if there were more brokers, or ones that we don't directly control, it might be harder to guarantee. I don't remember what we decided on this point.
- We brainstormed ideas for splitting the broker into components (#26092). Because this is a matter internal to the broker that does not affect protocol messages, we don't have to decide it now. We discussed placing the broker behind a TLS terminator like Apache, with the broker being a separate localhost web server, as is done with the ProxyPass setup for BridgeDB. Separate TLS termination would help with a future AMP cache rendezvous, which needs to somehow share TCP port 443 with the broker's HTTPS rendezvous. The most expedient way to do this would be to add /amp routes to the broker's existing HTTP handlers. An intermediate way would be to have the broker act as a TLS terminator for /amp routes, proxying those requests to a separate localhost HTTP server that handles AMP cache rendezvous (sketched below). cohosh and meskio may proceed with starting to break the broker into components if they have a good vision of how to do it.
- Regardless of how the broker is factored, where should message parsing (and in the future, decryption/encryption) happen? Should the rendezvous receivers pass their messages to the broker matching module verbatim, without interpretation? Or should they parse the incoming messages into a uniform in-memory data structure and pass that to the broker matching module (sketched below)? With encrypted messages, there is no possibility of parsing/interpretation, unless the rendezvous modules are trusted with the broker's long-term decryption key.
- A single DNS query doesn't have enough room to contain a client registration. (A registration needs about 1500 bytes, or about 500 bytes if compressed; only about 140 bytes are available.) We wondered if some domain-aware compression could shrink the message enough: stripping out fields we know are implied or unnecessary, and reinserting them on receipt (sketched below). On a quick inspection, the a=fingerprint field is 32 bytes of non-compressible data, and a=ice-ufrag and a=ice-pwd are 16 and 24 bytes of high-entropy data respectively, so about 72 bytes cannot be compressed away no matter what.
- Currently, the broker may return both status code 503 (Service Unavailable) and 504 (Gateway Timeout): https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/broker... https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/broker... but the client only handles 503 explicitly, letting 504 fall through to a default case: https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/client... (a sketch of handling both codes is below.)
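
Some rough sketches of the points above, in the order they came up. None of these are decided formats or implementations.

On encapsulating 400-style errors inside the message body: one possible shape for the broker's client response. The type and field names here are illustrative, not the actual format:

    package messages

    // ClientPollResponse is an illustrative response type: instead of
    // signaling failure with a bare HTTP 400, the error travels inside
    // the message body. Exactly one of Answer or Error would be set.
    type ClientPollResponse struct {
        Answer string `json:"answer,omitempty"` // proxy's SDP answer on success
        Error  string `json:"error,omitempty"`  // description of the failure
    }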
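
To make the version-prefix algorithm concrete, here is a rough Go sketch. The function name, the 100 KB body limit, and the check against a single known version "1.0" are placeholders, not decisions from the meeting:

    package rendezvous

    import (
        "bufio"
        "bytes"
        "errors"
        "io"
    )

    const (
        maxVersionLine = 64         // a version line must fit in the first 64 bytes
        maxBodySize    = 100 * 1024 // "some sane limit" on the whole message
    )

    // ReadMessage distinguishes legacy JSON messages from versioned ones
    // and returns the version ("" for legacy) plus the body to parse.
    func ReadMessage(r io.Reader) (version string, body []byte, err error) {
        br := bufio.NewReader(io.LimitReader(r, maxBodySize))
        first, err := br.Peek(1)
        if err != nil {
            return "", nil, err
        }
        if first[0] == '{' {
            // Legacy-format message: the body is a bare JSON container.
            body, err = io.ReadAll(br)
            return "", body, err
        }
        head, err := br.Peek(maxVersionLine)
        if err != nil && err != io.EOF {
            return "", nil, err
        }
        i := bytes.IndexByte(head, '\n')
        if i < 0 {
            return "", nil, errors.New("no newline in the first 64 bytes")
        }
        version = string(head[:i])
        if version != "1.0" { // stand-in for a table of understood versions
            return "", nil, errors.New("unrecognized version: " + version)
        }
        br.Discard(i + 1) // consume the version line
        body, err = io.ReadAll(br)
        return version, body, err
    }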
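
The "intermediate" AMP option could look roughly like this, with the broker terminating TLS itself and reverse-proxying /amp requests to a hypothetical localhost backend; the port 8081 and the route layout are assumptions:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        mux := http.NewServeMux()
        // The broker's existing handlers stay where they are, e.g.:
        // mux.Handle("/client", clientHandler)
        // mux.Handle("/proxy", proxyHandler)

        // Hypothetical localhost backend handling AMP cache rendezvous.
        ampBackend, err := url.Parse("http://127.0.0.1:8081")
        if err != nil {
            log.Fatal(err)
        }
        mux.Handle("/amp/", httputil.NewSingleHostReverseProxy(ampBackend))

        // The broker still terminates TLS for everything on port 443.
        log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", mux))
    }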
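
If the rendezvous receivers do the parsing, the uniform in-memory structure might look something like this; the type, field, and method names are invented for illustration:

    package broker

    // ClientRegistration is an invented uniform in-memory form of a client
    // registration, independent of which rendezvous method carried it.
    type ClientRegistration struct {
        Version string // message format version
        Offer   string // the client's SDP offer
        NAT     string // NAT type hint, if present
    }

    // RendezvousReceiver is what each rendezvous module (HTTPS, AMP cache,
    // DNS, ...) would implement under the "parse at the edges" option. With
    // end-to-end encrypted registrations this only works if the receivers
    // are trusted with the broker's long-term decryption key; otherwise
    // they must hand the ciphertext to the matching module verbatim.
    type RendezvousReceiver interface {
        Receive() (*ClientRegistration, error)
    }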
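
The domain-aware compression idea might look like the following sketch, assuming a whitelist of boilerplate SDP lines. The whitelist contents are guesses, and as noted above, the roughly 72 bytes of fingerprint/ufrag/pwd entropy remain either way:

    package sdpcompress

    import "strings"

    // impliedLines is a guessed set of SDP lines that are identical in
    // every Snowflake client offer and so never need to be transmitted.
    var impliedLines = map[string]bool{
        "v=0":                   true,
        "a=ice-options:trickle": true,
        // ... further boilerplate learned from real offers ...
    }

    // Strip removes the implied lines before sending over DNS.
    func Strip(offer string) string {
        var kept []string
        for _, line := range strings.Split(offer, "\r\n") {
            if !impliedLines[line] {
                kept = append(kept, line)
            }
        }
        return strings.Join(kept, "\r\n")
    }

    // The broker side would reinsert the same lines on receipt to rebuild
    // a complete offer.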
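
For the 503/504 mismatch, the client-side fix could be as small as giving 504 its own case instead of letting it hit the default; the surrounding function here is invented for illustration:

    package client

    import (
        "errors"
        "fmt"
        "net/http"
    )

    // checkBrokerResponse distinguishes the broker's two failure codes
    // rather than lumping 504 in with genuinely unexpected responses.
    func checkBrokerResponse(resp *http.Response) error {
        switch resp.StatusCode {
        case http.StatusOK:
            return nil // caller parses the body as usual
        case http.StatusServiceUnavailable: // 503: no proxies available
            return errors.New("no snowflake proxies currently available")
        case http.StatusGatewayTimeout: // 504: the matched proxy never answered
            return errors.New("timed out waiting for a proxy answer")
        default:
            return fmt.Errorf("unexpected broker response: %s", resp.Status)
        }
    }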