[tor-bugs] #29206 [Circumvention/Snowflake]: New design for client -- server protocol for Snowflake

Fri Oct 25 23:28:37 UTC 2019

#29206: New design for client -- server protocol for Snowflake
-----------------------------------------------+---------------------------
 Reporter:  cohosh                             |          Owner:  cohosh
     Type:  task                               |         Status:
                                               |  needs_review
 Priority:  Medium                             |      Milestone:
Component:  Circumvention/Snowflake            |        Version:
 Severity:  Normal                             |     Resolution:
 Keywords:  anti-censorship-roadmap-september  |  Actual Points:
Parent ID:                                     |         Points:  6
 Reviewer:  dcf                                |        Sponsor:
                                               |  Sponsor28-must
-----------------------------------------------+---------------------------

Comment (by dcf):

 Replying to [comment:31 cohosh]:
 > I made several changes to the implementation.
 [https://github.com/cohosh/snowflake/tree/sequencing2 Here] are a series
 of commits on top of the old branch, and
 [https://github.com/cohosh/snowflake/tree/sequencing2_squashed here] is a
 newly squashed version.

 Branch sequencing2 is missing server/flurry.go. sequencing2_squashed has
 it.

 [https://github.com/cohosh/snowflake/blob/3a3bef35199d944464bf2e2bfa3e07d43ae7a2cc/common/proto/proto.go#L356-L361
 Here], it still looks like it should be `err2` inside the error handler,
 not `err`.
 {{{
         n, err2 := s.conn.Write(bytes)
         s.writeLock.Unlock()
         if err2 != nil {
                 log.Printf("Error writing to connection: %s", err.Error())
                 return len(b), err
         }
 }}}

 > > > Perhaps 10s is too long a timeout?
 > >
 > > I don't understand this. Do you mean too ''short'' a timeout? As a
 retransmission timer, 10 s doesn't seem short, but as a timer that
 terminates the whole end-to-end connection, it does seem short. Since in
 this design, there's no retransmission except when kicked off by a
 `NewSnowflake` transition, it might be worth increasing the timeout.
 >
 > You're right that 10s is short for a network connection timeout. This
 brings us to what remains to be a tricky engineering challenge here. The
 goal of the sequencing layer is to allow the client to recover from a
 faulty snowflake. However, it takes 30s for the connection to go stale in
 the client's `checkForStaleness` function. So it takes 30s for a client to
 request a new snowflake and start sending data through it. In all of my
 tests, the SOCKS connection timed out well before the client connected to
 a new snowflake. Since a new SOCKS connection means a new snowflake
 session and new OR connection anyway, this means the client never actually
 recovers and the browser reports a connection error.
 >
 > So my thought was that if we lowered it to 10s, we have a chance to
 recover before the SOCKS connection goes stale. However in practice this
 is still a bit too long and I'm still seeing SOCKS timeouts.
 >
 > So, I'm wondering if it's ok if we make this even shorter. All it means
 is that the client will abandon the proxy connection for a better one with
 less latency and that the threshold for that can be low (maybe 2-5
 seconds).

 I guess I really don't understand how this works. How is the SOCKS
 connection timing out? As far as I know, tor does not terminate
 connections to client PTs because of idleness. Is there a specific log
 message I can look
 for to see when this happens? snowflake-client should be hiding those
 temporary losses of connectivity.

 Is it the timeout when tor sends a SOCKS request and is waiting for the
 "Grant" or "Reject" response? With the Snowflake protocol in place, tor
 should not be requesting new SOCKS sessions anyway -- because snowflake-
 client won't be closing its existing SOCKS connection every time it loses
 proxy connectivity. Anyway, you can always send a fake "Grant" response
 without waiting for a proxy to be available; that's what meek does and the
 PT protocol kind of forces that behavior on non-connection-oriented
 protocols like this.

 In any case, 5 seconds strikes me as way way too short for a liveness
 timer. We'll be throwing away good proxies all the time. The fact that tor
 is requesting new SOCKS connections indicates that we're signaling some
 failure condition upward that we shouldn't be, I think; I would check that
 out first.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29206#comment:34>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online