[tor-bugs] #25688 [Obfuscation/Snowflake]: proxy-go is still deadlocking occasionally

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Apr 5 02:04:04 UTC 2019


#25688: proxy-go is still deadlocking occasionally
--------------------------------------------+------------------------------
 Reporter:  dcf                             |          Owner:  cohosh
     Type:  defect                          |         Status:  needs_review
 Priority:  Low                             |      Milestone:
Component:  Obfuscation/Snowflake           |        Version:
 Severity:  Normal                          |     Resolution:
 Keywords:  network-team-roadmap-2019-Q1Q2  |  Actual Points:
Parent ID:                                  |         Points:  3
 Reviewer:                                  |        Sponsor:
--------------------------------------------+------------------------------

Comment (by dcf):

 I was going to quibble about `makePeerConnectionFromOffer` becoming
 blocking, which through `runSession`, will block the main polling loop in
 `main` (if I understand right). But I think this is an architectural
 problem unrelated to the deadlock fix, so let's ignore it. (It seems like
 `makePeerConnectionFromOffer` and `sendAnswer` should run in their own
 goroutine, responsible for a single PeerConnection.)

 Replying to [comment:22 cohosh]:
 > Replying to [comment:21 dcf]:
 > I am ok with this as well, but we should probably be tearing down the
 peer connections properly after a timeout anyway (though maybe go handles
 this on its own eventually?)

 I'm looking at the code and I don't quite get how it's supposed to work.
 The error handlers
 ([https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L302 here]
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L313 here]
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L318 here]) call `pc.Destroy()`, and `retToken()`
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L355 in the caller]. The timeout checker
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L334 here] calls both `pc.Destroy()` and also
 `retToken()`, which makes sense because it doesn't have a caller to call
 `retToken()`. So that looks good.

 When a PeerConnection ends "naturally", I suppose what happens is that
 [https://github.com/cohosh/snowflake/blob/08f5205461573bf8a6e8961540ac620865a3b45c
 /proxy-go/snowflake.go#L334 dc.OnClose()] gets called, which calls
 `pc.DeleteDataChannel(dc)`, but not `pc.Destroy()` nor `retToken()`. I can
 understand why `pc.DeleteDataChannel(dc)` gets called here and not in the
 other cases--because in the other cases we don't have a DataChannel yet--
 but then I'm wondering why it doesn't call the other two functions. Are we
 losing a token in this case too?

 I was thinking, what we need is an OnError handler so we can get called
 back when a DataChannel fails to establish. I found
 [https://github.com/keroserene/go-
 webrtc/blob/0c5ebb10a5dd7990a4962b78de27c2a8c735dac0/datachannel.go#L47-L50
 this comment]:
 {{{
 OnError - is not implemented because the underlying Send
 always returns true as specified for SCTP, there is no reasonable
 exposure of other specific errors from the native code, and OnClose
 already covers the bases.
 }}}
 I was curious about what happens in browser WebRTC so I hacked
 [https://developer.mozilla.org/en-
 US/docs/Web/API/WebRTC_API/Simple_RTCDataChannel_sample this demo] and
 [https://github.com/mdn/samples-
 server/tree/49c605fbda926a2dce9955b362233eef673e6090/s/webrtc-simple-
 datachannel code] to comment out [https://github.com/mdn/samples-
 server/blob/49c605fbda926a2dce9955b362233eef673e6090/s/webrtc-simple-
 datachannel/main.js#L60-L62 the onicecandidate callback] in the remote.
 What happens is you get a browser-produced line in the console:
 {{{
 ⚠️ ICE failed, add a STUN server and see about:webrtc for more details
 }}}
 but none of the error callbacks get called, so the application is never
 aware of the failure. (There is a [https://developer.mozilla.org/en-
 US/docs/Web/API/RTCPeerConnection/onicecandidateerror onicecandidateerror]
 callback but apparently nothing implements it.) So it looks like a browser
 is not doing anything fundamentally different, and checking after a
 timeout seems like a reasonable way to do it.

 So I think your patch looks good, if you can
 1. answer whether the `OnClose` path needs to call `retToken()`
 2. move the `time.Minute` value into a commented constant

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25688#comment:23>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list