[tor-bugs] #9022 [Pluggable transport]: Create an XMPP pluggable transport

Tue Jun 18 16:04:29 UTC 2013

#9022: Create an XMPP pluggable transport
---------------------------------+------------------------------------------
 Reporter:  asn                  |          Owner:  feynman 
     Type:  task                 |         Status:  accepted
 Priority:  normal               |      Milestone:          
Component:  Pluggable transport  |        Version:          
 Keywords:                       |         Parent:          
   Points:                       |   Actualpoints:          
---------------------------------+------------------------------------------

Comment(by feynman):

 Replying to [comment:38 asn]:
 > Hey feynman,
 >
 > thanks for all the new features, and sorry for being less active on this
 lately.
 >
 > BTW, due to the encryption of TLS, I'm not sure how helpful the caching
 is, since all TLS records should look unique on the wire. For the same
 reason, zlib might not find much stuff to compress in your TLS traffic.
 >

 TLS encryption should be completely independent of caching. It is not
 caching the TLS packet, but the data it sends *before* it gets encrypted
 with TLS. The same goes for the zlib compression stuff.

 > Also, could you document your TCP-like functionality in the spec? That
 is, how you calculate sequence identifiers and do ACKs, etc.

 I will document all this functionality ASAP (probably over the next couple
 of days). For now, let me give you a rundown of what happens:

 1. There is data to be read from the socket.
    a. Data is read from a socket and added to a buffer, which is
 periodically checked.
    b. When data is found in the buffer or the cache, the buffered data is
 added to the cached data, the length of the buffered data (if greater than
 zero) is appended to a separate list of cache lengths, and the current
 time is appended to a list of timestamps.
    c. All cached data is compressed, base64-encoded, and put in a "data"
 stanza.
    d. The lengths of the individual caches are comma-separated and put in
 a "chunks" stanza.
    e. Local and remote IPs and ports are set in their respective stanzas.
    f. A comma-separated list of all the accounts that the computer
 controls and that are connected to the chat server is set in an "aliases"
 stanza.
    g. The socket's id variable is incremented by one (mod sys.maxsize).
    h. The iq message's id is set to the socket's id variable.
    i. The above stanzas are appended to the iq message in a 'packet'
 stanza
    j. The recipient of the message is selected from a list of potential
 addresses given during the connection phase (not mentioned here).
    k. The sender of the message is selected from a list of accounts
 connected to the chat server.
    l. The message is sent over the chat server
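
 As a rough illustration of steps (b), (c), (d) and (g) above, here is a
 minimal sketch. The class name, field names and the dict standing in for
 the 'packet' stanza are hypothetical and not taken from the actual code,
 and it assumes one cached chunk per message that carries new data.

```python
import base64
import sys
import zlib


class SenderState:
    """Hypothetical per-socket sender state; names are illustrative only."""

    def __init__(self):
        self.msg_id = 0       # the socket's id variable (step g)
        self.cache = []       # chunks of raw data not yet acknowledged
        self.timestamps = []  # one send time per cached chunk

    def build_packet(self, buffered, now):
        # Step b: move non-empty buffered data into the cache and
        # record when it was cached.
        if buffered:
            self.cache.append(buffered)
            self.timestamps.append(now)
        # Step c: compress all cached data, then base64-encode it.
        data = base64.b64encode(zlib.compress(b"".join(self.cache)))
        # Step d: comma-separated list of the cached chunk lengths.
        chunks = ",".join(str(len(c)) for c in self.cache)
        # Step g: increment the id modulo sys.maxsize.
        self.msg_id = (self.msg_id + 1) % sys.maxsize
        # Plain dict standing in for the 'packet' stanza (step i).
        return {"id": self.msg_id,
                "data": data.decode("ascii"),
                "chunks": chunks}
```

 Until a confirmation arrives, every message resends the whole cache, which
 is why the receiver needs the "chunks" list to tell old bytes from new.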

 2. A message containing data is received.
    a. The computer computes "id_diff"="id in the message" - "last id
 received with the same local and remote ip and ports and set of aliases"
    b. If id_diff<=0 and id_diff>=-"peer's sys.maxsize"/2 (the latter
 quantity is established during the connection phase) then the message is
 declared redundant and a confirmation is sent regarding the id containing
 the most recent data (i.e. *not* the id of the message that was just
 received).
    c. If the message is not completely redundant, reduce id_diff modulo
 "peer's sys.maxsize" to get the number of new chunks of data.
    d. Compute the number of bytes of data to ignore from the number of new
 chunks of data computed in (c) and the list of chunk sizes in the "chunks"
 stanza.
    e. Unzip the data, discarding the number of bytes computed in (d).
    f. Set the socket's "last id received" to the id of the current message
 and send a confirmation.
    g. Send the data to the socket.
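
 Steps (a) through (e) above could be sketched roughly as follows. The
 function name and argument layout are hypothetical, and the handling of a
 message that claims more new chunks than it carries (nothing is skipped)
 is an assumption, not something specified above.

```python
import base64
import zlib


def new_payload(msg_id, last_id, peer_maxsize, chunk_sizes, data_b64):
    """Return the bytes not seen before, or None if the message is redundant."""
    # Step a: distance between this id and the last id received.
    id_diff = msg_id - last_id
    # Step b: an id at, or a little behind, the last one carries nothing new.
    if -(peer_maxsize // 2) <= id_diff <= 0:
        return None
    # Step c: number of new chunks, allowing for id wraparound.
    new_chunks = id_diff % peer_maxsize
    # Step d: every chunk before the last `new_chunks` was already
    # delivered, so its bytes are skipped.
    already_seen = max(0, len(chunk_sizes) - new_chunks)
    skip = sum(chunk_sizes[:already_seen])
    # Step e: undo the base64 and zlib layers, then drop the old bytes.
    raw = zlib.decompress(base64.b64decode(data_b64))
    return raw[skip:]
```

 For example, with last id 1, a message with id 2 and chunks "5,6" yields
 only the final 6 bytes of the decompressed data.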

 3. A confirmation of data is received.
    a. Compute the difference between the id of the message acknowledged
 and the appropriate socket's current id variable, storing the result as
 id_diff.
    b. Mod the result of (a) with sys.maxsize.
    c. Subtract the result of (b) from the number of caches stored.
    d. If the result of (c) is positive, move on to (e).
    e. Set the new throttle rate (the period over which the socket waits
 before checking its buffer) to a complicated function, "F", of the
 difference between the current time stamp and the time stamp recorded
 "result of (c) - 1" records ago. "F" rescales the throttle rate so that it
 never goes above the maximum throttle rate divided by the number of
 accounts connected to the chat server (so each account never sends
 messages slower than a certain rate) and never goes below the minimum
 throttle rate divided by the number of accounts connected to the chat
 server (so each account never sends messages faster than a certain rate).
    f. The rate at which the socket reads data is adjusted based on the new
 throttle rate so that garbage collection need not happen for a certain
 minimum amount of time. This minimum amount of time is computed from the
 new throttle rate, together with a global constant "MAXIMUM_DATA" which
 contains the number of bytes that can be safely sent over the chat server,
 and another global constant "NUM_CACHES" which contains the minimum number
 of times the system should cache data before the cache size reaches
 MAXIMUM_DATA (and garbage collection takes place).
    g. The appropriate number of caches are cleared along with their
 recorded data lengths and time stamps (see 1b).
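
 The bookkeeping in steps (a) through (c), the bounds applied in (e), and
 the clearing in (g) might look something like the sketch below. All names
 are hypothetical. "F" itself is not specified above, so only its clamping
 behaviour is shown, and the direction of the subtraction in (a) (current
 id minus acknowledged id) is an assumption.

```python
import sys


def acked_cache_count(current_id, acked_id, num_caches, maxsize=sys.maxsize):
    # Steps a-b: messages sent after the acknowledged one, allowing for
    # id wraparound (assumes the difference is current minus acknowledged).
    outstanding = (current_id - acked_id) % maxsize
    # Step c: the remaining caches are covered by this confirmation.
    return num_caches - outstanding


def clamp_throttle(elapsed, min_throttle, max_throttle, num_accounts):
    # Step e (bounds only): keep the period between the per-account
    # minimum and maximum; the real "F" may rescale differently.
    low = min_throttle / num_accounts
    high = max_throttle / num_accounts
    return max(low, min(elapsed, high))


def clear_acked(cache, lengths, timestamps, count):
    # Step g: drop the oldest `count` caches with their lengths and
    # time stamps; do nothing unless the count is positive (step d).
    if count > 0:
        del cache[:count]
        del lengths[:count]
        del timestamps[:count]
```

 With a current id of 5, an acknowledged id of 3 and four caches stored,
 two messages are still outstanding, so the two oldest caches can be
 cleared.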

 I know that I could use a global constant to mod data rather than
 sys.maxsize (which varies from one architecture to another), but getting
 the system to run quickly and efficiently is more important at the moment.
 In the meantime, consider this an outline of the full protocol spec to
 come.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9022#comment:41>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online