[tor-bugs] #33211 [Circumvention/Snowflake]: proxy-go sometimes gets into a 100+% CPU state

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat Feb 22 22:56:36 UTC 2020


#33211: proxy-go sometimes gets into a 100+% CPU state
-------------------------------------+------------------------------
 Reporter:  dcf                      |          Owner:  cohosh
     Type:  defect                   |         Status:  needs_review
 Priority:  Medium                   |      Milestone:
Component:  Circumvention/Snowflake  |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:                           |  Actual Points:
Parent ID:                           |         Points:
 Reviewer:                           |        Sponsor:
-------------------------------------+------------------------------

Comment (by dcf):

 Good work with the CCM discovery. It's different from what I filed the
 ticket about, but it will help everything regardless.

 I reproduced the symptoms I was thinking of and got a profile. Files are
 at https://people.torproject.org/~dcf/bug33211/. I had to let proxy-go run
 for an hour (details below) until it happened. Then I let it run for 30
 minutes at high CPU so that whatever was going wrong would dominate the
 profile. The [https://people.torproject.org/~dcf/bug33211/proxy-
 go.profile.2020-02-22T20:32:32Z.png graph] tells the story.
 `pion/dtls/internal/crypto/ccm.(*ccm).cbcRound` and
 `crypto/aes.encryptBlockAsm` are major contributors, but I believe they
 are a separate issue that is fixed by your comment:16. The culprit for
 what I was describing of is
 `pion/sctp.(*payloadQueue).markAllToRetrasmit`, called by
 `sctp.(*Association).onRetransmissionTimeout`. It looks to me like a timer
 somewhere is becoming negative and becoming a busy-loop (similar to the
 bug in quic-go from #33401).

 During the time of high CPU usage, proxy-go wasn't producing an abnormal
 amount of log output (probably about 1 line per second on average), nor
 was it sending an abnormal amount of packets.

 {{{
 go tool pprof proxy-go.380b1331.profiled 'proxy-
 go.profile.2020-02-22T20:32:32Z'
 File: proxy-go.380b1331.profiled
 Type: cpu
 Time: Feb 22, 2020 at 1:32pm (MST)
 Duration: 1.59hrs, Total samples = 1.08hrs (67.68%)
 Entering interactive mode (type "help" for commands, "o" for options)
 (pprof) top5
 Showing nodes accounting for 23.95mins, 37.03% of 64.68mins total
 Dropped 999 nodes (cum <= 0.32mins)
 Showing top 5 nodes out of 139
       flat  flat%   sum%        cum   cum%
   8.07mins 12.48% 12.48%   8.07mins 12.48%  runtime.futex
   4.65mins  7.20% 19.67%  11.15mins 17.23%
 github.com/pion/sctp.(*payloadQueue).markAllToRetrasmit
   4.35mins  6.72% 26.39%   6.35mins  9.82%  runtime.mapiternext
   3.98mins  6.15% 32.55%   3.98mins  6.15%  crypto/aes.encryptBlockAsm
   2.90mins  4.48% 37.03%   5.03mins  7.78%  runtime.scanobject
 (pprof) evince
 }}}

 == Procedure

 Here is what I did to collect the profile, starting at commit
 380b133155ad725126bc418d0e66b3c550b4c555 patched with attachment:proxy-go-
 profile.patch.2. I started a local broker, proxy, and bridge, then set up
 a loop to have a client connect every 2.5 minutes. I was also
 intermittently downloading through the SOCKS port 10000, but I don't know
 if that matters.

 {{{
 broker/broker --disable-tls --addr 127.0.0.1:8000
 proxy-go/proxy-go --broker http://127.0.0.1:8000/ --relay
 ws://127.0.0.1:8080/
 tor -f torrc.server
 while true; do timeout 150 tor -f torrc.client SOCKSPort 10000; done
 }}}

 torrc.server:
 {{{
 DataDirectory datadir-server
 SocksPort 0
 ORPort 9001
 ExtORPort auto
 BridgeRelay 1
 AssumeReachable 1
 PublishServerDescriptor 0
 ServerTransportListenAddr snowflake 0.0.0.0:8080
 ServerTransportPlugin snowflake exec server/server --disable-tls --log
 snowflake-server.log
 }}}

 torrc.client:
 {{{
 UseBridges 1
 DataDirectory datadir-client
 ClientTransportPlugin snowflake exec client/client \
 -url http://127.0.0.1:8000/ \
 -ice stun:stun.l.google.com:19302 \
 -log snowflake-client.log \
 -max 2
 Bridge snowflake 0.0.3.0:1
 }}}

 * I started everything at 2020-02-22 20:32:55.
 * proxy-go jumped to 200% CPU at about 2020-02-22 21:36:40, roughly one
 hour later. (It happened suddenly, not a gradual increase. The fans start
 spinning.)
 * After 5 minutes it was up to 250% CPU.
 * After 8 minutes it was up to 280% CPU.
 * I killed the tor client loop after 15 minutes at 2020-02-22 21:51:55.
 The high CPU use of proxy-go continued, being in the range of about
 260–295%.
 * I killed the tor server after 30 minutes at 2020-02-22 22:05:46. proxy-
 go's CPU use immediately became negligible.
 * I killed proxy-go at 2020-02-22 22:08:02 and saved the profile.

 The last log output of proxy-go was:
 {{{
 2020/02/22 22:04:23 Write 536 bytes --> WebRTC
 2020/02/22 22:04:26 Write 536 bytes --> WebRTC
 2020/02/22 22:04:28 Write 536 bytes --> WebRTC
 2020/02/22 22:04:29 Write 536 bytes --> WebRTC
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
 close 1006 (abnormal closure): unexpected EOF
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 OnClose channel
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 datachannelHandler ends
 2020/02/22 22:05:46 datachannelHandler ends
 }}}

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33211#comment:17>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list