[tor-bugs] #25688 [Obfuscation/Snowflake]: proxy-go is still deadlocking occasionally

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Apr 5 14:43:09 UTC 2019


#25688: proxy-go is still deadlocking occasionally
--------------------------------------------+------------------------------
 Reporter:  dcf                             |          Owner:  cohosh
     Type:  defect                          |         Status:  needs_review
 Priority:  Low                             |      Milestone:
Component:  Obfuscation/Snowflake           |        Version:
 Severity:  Normal                          |     Resolution:
 Keywords:  network-team-roadmap-2019-Q1Q2  |  Actual Points:
Parent ID:                                  |         Points:  3
 Reviewer:                                  |        Sponsor:
--------------------------------------------+------------------------------

Comment (by cohosh):

 Replying to [comment:23 dcf]:
 > I was going to quibble about `makePeerConnectionFromOffer` becoming
 blocking, which through `runSession`, will block the main polling loop in
 `main` (if I understand right). But I think this is an architectural
 problem unrelated to the deadlock fix, so let's ignore it. (It seems like
 `makePeerConnectionFromOffer` and `sendAnswer` should run in their own
 goroutine, responsible for a single PeerConnection.)
 >
 I agree that these should ideally be their own goroutine, though this is
 unrelated to this bug. Note also that the blocking of
 makePeerConnectionFromOffer is accompanied by a 3 second timeout
 [https://github.com/keroserene/go-
 webrtc/blob/a1272c08ab1d5ca154c6794ddc5f73d2e576fe1b/peerconnection.cc#L350
 here] in the blocking library call. It's not great design to rely on this
 but perhaps this is better fixed in a separate refactoring ticket.
 > Replying to [comment:22 cohosh]:
 > > Replying to [comment:21 dcf]:
 > > I am ok with this as well, but we should probably be tearing down the
 peer connections properly after a timeout anyway (though maybe go handles
 this on its own eventually?)
 >
 > I'm looking at the code and I don't quite get how it's supposed to work.
 The error handlers
 ([https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L302 here]
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L313 here]
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L318 here]) call `pc.Destroy()`, and `retToken()`
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L355 in the caller]. The timeout checker
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L334 here] calls both `pc.Destroy()` and also
 `retToken()`, which makes sense because it doesn't have a caller to call
 `retToken()`. So that looks good.
 >
 > When a PeerConnection ends "naturally", I suppose what happens is that
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L334 dc.OnClose()] gets called, which calls
 `pc.DeleteDataChannel(dc)`, but not `pc.Destroy()` nor `retToken()`. I can
 understand why `pc.DeleteDataChannel(dc)` gets called here and not in the
 other cases--because in the other cases we don't have a DataChannel yet--
 but then I'm wondering why it doesn't call the other two functions. Are we
 losing a token in this case too?
 >
 > I was thinking, what we need is an OnError handler so we can get called
 back when a DataChannel fails to establish. I found
 [https://github.com/keroserene/go-
 webrtc/blob/0c5ebb10a5dd7990a4962b78de27c2a8c735dac0/datachannel.go#L47-L50
 this comment]:
 > {{{
 > OnError - is not implemented because the underlying Send
 > always returns true as specified for SCTP, there is no reasonable
 > exposure of other specific errors from the native code, and OnClose
 > already covers the bases.
 > }}}
 > I was curious about what happens in browser WebRTC so I hacked
 [https://developer.mozilla.org/en-
 US/docs/Web/API/WebRTC_API/Simple_RTCDataChannel_sample this demo] and
 [https://github.com/mdn/samples-
 server/tree/49c605fbda926a2dce9955b362233eef673e6090/s/webrtc-simple-
 datachannel code] to comment out [https://github.com/mdn/samples-
 server/blob/49c605fbda926a2dce9955b362233eef673e6090/s/webrtc-simple-
 datachannel/main.js#L60-L62 the onicecandidate callback] in the remote.
 What happens is you get a browser-produced line in the console:
 > {{{
 > ⚠️ ICE failed, add a STUN server and see about:webrtc for more details
 > }}}
 > but none of the error callbacks get called, so the application is never
 aware of the failure. (There is a [https://developer.mozilla.org/en-
 US/docs/Web/API/RTCPeerConnection/onicecandidateerror onicecandidateerror]
 callback but apparently nothing implements it.) So it looks like a browser
 is not doing anything fundamentally different, and checking after a
 timeout seems like a reasonable way to do it.
 Yeah I found the same unimplemented error callback. I'm still not sure
 that an onIceCandidateError is what we want, since even if ICE succeeds at
 the proxy side, it still doesn't guarantee that the client will use the
 SDP offer they receive through the signaling channel. What we'd need is
 for the library itself to also have a timeout between creating the SDP
 answer and firing the OnDataChannel callback (which is essentially what
 we're doing in this code).
 >
 > So I think your patch looks good, if you can
 > 1. answer whether the `OnClose` path needs to call `retToken()`
 > 2. move the `time.Minute` value into a commented constant

 To answer (1), the call to `pc.Destroy()` and `retToken()` is handled by
 the goroutine running `dataChannelHandler`
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L225 here]. This function calls both of these when
 it returns, which it should automatically do when `OnClose` is called
 because the deletion of the data channel in `OnClose` will cause
 `CopyLoopTimeout` to return.

 It might actually be better design for us to move these to `OnClose`, but
 right now this works as well.

 (2) has now been implemented in
 [https://github.com/cohosh/snowflake/commit/62fddab153019ac7e5d7efd1d327b20aede921c3
 this commit]

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25688#comment:24>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list