[tor-bugs] #29206 [Circumvention/Snowflake]: New design for client -- server protocol for Snowflake

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Oct 17 07:34:45 UTC 2019


#29206: New design for client -- server protocol for Snowflake
-----------------------------------------------+---------------------------
 Reporter:  cohosh                             |          Owner:  cohosh
     Type:  task                               |         Status:
                                               |  needs_review
 Priority:  Medium                             |      Milestone:
Component:  Circumvention/Snowflake            |        Version:
 Severity:  Normal                             |     Resolution:
 Keywords:  anti-censorship-roadmap-september  |  Actual Points:
Parent ID:                                     |         Points:  6
 Reviewer:  dcf                                |        Sponsor:
                                               |  Sponsor28-must
-----------------------------------------------+---------------------------

Comment (by dcf):

 I think this design is looking pretty good. You and I converged on similar
 decisions here and in
 https://github.com/net4people/bbs/issues/14#issuecomment-542898991, such
 as maintaining a dynamic map of session ID to `net.Conn` on the server,
 and treating a session ID as a `net.Addr`.
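 For concreteness, the session-ID-as-address idea only needs a tiny
 wrapper type. This is just a sketch (the name `sessionAddr` is my
 invention, not something from the branch), assuming the session ID is
 a byte slice:
 {{{
 package main

 import (
 	"fmt"
 	"net"
 )

 // sessionAddr wraps a Snowflake session ID so it can be used anywhere
 // a net.Addr is expected (e.g. as the RemoteAddr of a conn).
 type sessionAddr []byte

 // Compile-time check that sessionAddr satisfies net.Addr.
 var _ net.Addr = sessionAddr(nil)

 // Network names the "network" this address belongs to.
 func (addr sessionAddr) Network() string { return "session" }

 // String renders the session ID itself as the address.
 func (addr sessionAddr) String() string { return string(addr) }

 func main() {
 	addr := sessionAddr([]byte("session_"))
 	fmt.Println(addr.Network(), addr.String())
 }
 }}}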

 I see that a map of the type
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/server/server.go#L44-L48
 Flurry] maintains the set of mappings from client session IDs to
 (Snowflake proxy, OR conn). The map definitely needs protection against
 concurrent modification—I wonder if that's the cause of the occasional
 server failures you saw. If you haven't done it yet, try building with `go
 build -race` and see whether it reports anything. Take a look at the
 `connMap` type in reconnecting-kcp/server/conn_map.go in
 https://github.com/net4people/bbs/files/3736441/turbo-tunnel-reconnection-
 demo.zip. It's meant to be a synchronized version of the same data
 structure.
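 As a sketch of what I mean by a synchronized version (the method names
 here are illustrative, not necessarily what `connMap` in the zip
 actually exposes): every access to the underlying map goes through a
 mutex, so concurrent goroutines can't race on it.
 {{{
 package main

 import (
 	"fmt"
 	"net"
 	"sync"
 )

 // connMap guards a map from session ID to net.Conn with a mutex so it
 // is safe for concurrent use.
 type connMap struct {
 	mu    sync.Mutex
 	conns map[string]net.Conn
 }

 func newConnMap() *connMap {
 	return &connMap{conns: make(map[string]net.Conn)}
 }

 // Get looks up the conn for a session ID under the lock.
 func (m *connMap) Get(sessionID string) (net.Conn, bool) {
 	m.mu.Lock()
 	defer m.mu.Unlock()
 	conn, ok := m.conns[sessionID]
 	return conn, ok
 }

 // Set stores or replaces the conn for a session ID under the lock.
 func (m *connMap) Set(sessionID string, conn net.Conn) {
 	m.mu.Lock()
 	defer m.mu.Unlock()
 	m.conns[sessionID] = conn
 }

 func main() {
 	m := newConnMap()
 	m.Set("session_", nil)
 	_, ok := m.Get("session_")
 	fmt.Println(ok)
 }
 }}}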

 Could `Flurry.conn` be a plain `net.Conn`, and `Flurry.or` also? Or do
 they need the specialization to `*proto.Snowflake.Conn` and
 `*net.TCPConn`?

 Could the directory common/snowflake-proto be called common/proto instead,
 so that the directory name matches the package name?

 > I see some motivation for another feature that allows us to set some
 kind of FIN or RST flag to notify the client that the OR port has been
 closed and the server that the client has closed the connection.

 Yes, but in the Tor model, it essentially never happens that a
 relay/bridge closes an ORPort connection, right? It's usually the client
 that decides when to disconnect.

 > Perhaps 10s is too long a timeout?

 I don't understand this. Do you mean too ''short'' a timeout? As a
 retransmission timer, 10 s doesn't seem short, but as a timer that
 terminates the whole end-to-end connection, it does seem short. Since in
 this design, there's no retransmission except when kicked off by a
 `NewSnowflake` transition, it might be worth increasing the timeout.

 I'm not sure about
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto.go#L341-L353 this design] of setting a timer on
 every `Write`, to wait for the ack to come back. Does the `timers` slice
 grow without bound? In the name of simplicity, could there be one timer,
 that gets reset whenever any amount of data is acked?
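 A sketch of the single-timer idea, under the assumption that a timeout
 means the whole connection has stalled (names here are invented): one
 `time.Timer` for the connection, pushed back by every ack, firing only
 if no data is acked for a full timeout period.
 {{{
 package main

 import (
 	"fmt"
 	"time"
 )

 // ackTimer is a single timer for the whole connection: every ack
 // pushes the deadline back, and onTimeout runs only if no data is
 // acked for a full timeout period.
 type ackTimer struct {
 	timer   *time.Timer
 	timeout time.Duration
 }

 func newAckTimer(timeout time.Duration, onTimeout func()) *ackTimer {
 	return &ackTimer{
 		timer:   time.AfterFunc(timeout, onTimeout),
 		timeout: timeout,
 	}
 }

 // Reset restarts the countdown; call whenever any amount of data is
 // acked.
 func (t *ackTimer) Reset() {
 	t.timer.Stop()
 	t.timer.Reset(t.timeout)
 }

 func main() {
 	timedOut := make(chan struct{})
 	t := newAckTimer(100*time.Millisecond, func() { close(timedOut) })
 	// Three acks, each arriving within the timeout: the timer never
 	// fires during this loop.
 	for i := 0; i < 3; i++ {
 		time.Sleep(50 * time.Millisecond)
 		t.Reset()
 	}
 	// Acks stop; the timer fires after one more timeout period.
 	<-timedOut
 	fmt.Println("connection timed out")
 }
 }}}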

 Speaking of timeouts, I think it's worth reducing the timeout in tests, to
 make them run faster.

 I don't understand the design where methods such as
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto.go#L168 NewSnowflake] and
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto.go#L242 readLoop] take an optional
 `*snowflakeHeader`. Can you explain that?

 What's the reason for the
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto.go#L76-L78 nil check in snowflakeHeader.Marshal]?
 It's not a typical pattern for a method like this. Is there a place where
 the error is anticipated, or would it always indicate a bug of some sort?

 It looks like
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto.go#L333-L339 this block] is meant to return `0,
 err2`, not `len(b), nil`? Also `err2.Error()` in the log statement, not
 `err.Error()`.
 {{{
         if err2 != nil {
                 log.Printf("Error writing to connection: %s", err.Error())
                 return len(b), nil
         }
 }}}
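 In other words, something like this (a stand-alone sketch of just the
 corrected error path, with `writeAll` as a made-up stand-in for the
 surrounding `Write` method):
 {{{
 package main

 import (
 	"errors"
 	"fmt"
 	"log"
 )

 // writeAll shows the corrected error path: log and return err2 with 0
 // bytes written, instead of swallowing the error and returning
 // (len(b), nil).
 func writeAll(b []byte, err2 error) (int, error) {
 	if err2 != nil {
 		log.Printf("Error writing to connection: %s", err2.Error())
 		return 0, err2
 	}
 	return len(b), nil
 }

 func main() {
 	n, err := writeAll([]byte("HELLO"), errors.New("broken pipe"))
 	fmt.Println(n, err) // 0 broken pipe
 }
 }}}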

 `SnowflakeConn` needs a test for duplicate and overlapping sequence
 numbers. That's the primary expected use case: the client sends 100 bytes
 with seq 1234, the server receives, the proxy dies before sending the
 server's ack, the client reconnects to a different proxy and retransmits
 the same 100 bytes with seq 1234, the server receives duplicate sequence
 numbers and should ignore the second packet. Here's a stab at writing a
 test:
 {{{
                 Convey("Overlapping sequence numbers", func(ctx C) {
                         sessionID := []byte("session_")
                         go func() {
                                 hb, _ := (&snowflakeHeader{
                                         seq:       0,
                                         ack:       0,
                                         length:    5,
                                         sessionID: sessionID,
                                 }).Marshal()
                                 // Write first 5 bytes.
                                 client.Write(hb)
                                 client.Write([]byte("HELLO"))
                                 // Exact duplicate packet.
                                 client.Write(hb)
                                 client.Write([]byte("HELLO"))

                                 hb, _ = (&snowflakeHeader{
                                         seq:       3,
                                         ack:       0,
                                         length:    5,
                                         sessionID: sessionID,
                                 }).Marshal()
                                 // Overlapping sequence number -- should be
                                 // rejected (not overwrite what was already
                                 // received) and bytes past the expected seq
                                 // ignored.
                                 client.Write(hb)
                                 client.Write([]byte("XXXXX"))

                                 // Now the expected sequence number.
                                 hb, _ = (&snowflakeHeader{
                                         seq:       5,
                                         ack:       0,
                                         length:    5,
                                         sessionID: sessionID,
                                 }).Marshal()
                                 client.Write(hb)
                                 client.Write([]byte("WORLD"))

                                 client.Close()
                         }()

                         received, err := ioutil.ReadAll(s)
                         ctx.So(err, ShouldEqual, nil)
                          ctx.So(received, ShouldResemble, []byte("HELLOWORLD"))
                 })
 }}}

 I think the
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto_test.go#L95 "Test reading multiple chunks" test] is
 more specific than necessary, in that it tests that `Write` boundaries are
 preserved on `Read`, when that behavior isn't required by `net.Conn` nor
 is it something we need. The first `s.Read` call would be within its
 rights to return 2, or 1, or even 0 bytes instead of 3, according to the
 contract of `io.Reader`. This would be better expressed, in my opinion, as
 some `io.ReadFull` calls on a 3-byte buffer, to ensure that all 3 bytes
 are read if available. Then the second `s.Read` should also return 3 bytes,
 not 2, notwithstanding that it spans what was originally a break between
 two `Write` calls.
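 Sketched out, with a `bytes.Reader` standing in for the
 `SnowflakeConn` under test: `io.ReadFull` keeps reading until the
 buffer is full, so the test stops depending on `Read` preserving
 `Write` boundaries.
 {{{
 package main

 import (
 	"bytes"
 	"fmt"
 	"io"
 )

 func main() {
 	// Stand-in for the conn under test: a reader with "HELLOWORLD"
 	// available, regardless of the original Write boundaries.
 	s := bytes.NewReader([]byte("HELLOWORLD"))

 	// First 3 bytes: io.ReadFull returns exactly 3 if available.
 	buf := make([]byte, 3)
 	n, err := io.ReadFull(s, buf)
 	fmt.Println(n, err, string(buf)) // 3 <nil> HEL

 	// The second read also returns 3 bytes, even though it spans
 	// what was originally a break between two Write calls.
 	n, err = io.ReadFull(s, buf)
 	fmt.Println(n, err, string(buf)) // 3 <nil> LOW
 }
 }}}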

 Should it be an EOF error to receive future sequence numbers, as is tested
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto_test.go#L207 here]? If there's a hole in the
 sequence stream, it seems like the future packet should just be dropped;
 it shouldn't cause an error. On the other hand, maybe we don't care about
 this case, because for us `SnowflakeConn` is always layered on top of a
 reliable transport, so while we could receive a duplicate ''past''
 sequence number, we could never receive a ''future'' sequence number. But
 if we're going to specify a behavior, I think it shouldn't be `(0,
 io.EOF)`, rather `(0, nil)` or block until at least 1 byte is available.
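 To illustrate the drop-don't-error idea (all names here are invented;
 this isn't the branch's actual receive path): a future packet is
 silently dropped, and a duplicate/overlapping packet has its
 already-received prefix trimmed before the rest is buffered.
 {{{
 package main

 import (
 	"fmt"
 )

 // handlePacket sketches the receive side: a packet whose seq is ahead
 // of the expected sequence number is dropped (wait for retransmission
 // to fill the hole), and an overlapping packet only contributes the
 // bytes past expectedSeq.
 func handlePacket(expectedSeq, seq uint32, payload []byte, buf *[]byte) {
 	if seq > expectedSeq {
 		// Hole in the sequence stream: drop the future packet.
 		return
 	}
 	// Trim any prefix we have already received (duplicate/overlap).
 	if overlap := expectedSeq - seq; overlap < uint32(len(payload)) {
 		*buf = append(*buf, payload[overlap:]...)
 	}
 }

 func main() {
 	var buf []byte
 	handlePacket(5, 10, []byte("FUTURE"), &buf) // dropped, no error
 	handlePacket(5, 3, []byte("XXHI"), &buf)    // keeps only "HI"
 	fmt.Println(string(buf)) // HI
 }
 }}}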

 The "webRTC header"
 [https://github.com/cohosh/snowflake/blob/1707fcc125180b73509688a510bfbe8fa697a99d/common
 /snowflake-proto/proto.go#L89 here] looks like a typo for
 "snowflakeHeader".
 {{{
 // Parses a webRTC header from bytes received on the
 // webRTC connection
 }}}

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29206#comment:28>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online