[tor-bugs] #33211 [Circumvention/Snowflake]: proxy-go sometimes gets into a 100+% CPU state

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Feb 26 04:00:37 UTC 2020


#33211: proxy-go sometimes gets into a 100+% CPU state
-------------------------------------+-----------------------------------
 Reporter:  dcf                      |          Owner:  cohosh
     Type:  defect                   |         Status:  needs_information
 Priority:  Medium                   |      Milestone:
Component:  Circumvention/Snowflake  |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:                           |  Actual Points:
Parent ID:                           |         Points:
 Reviewer:                           |        Sponsor:
-------------------------------------+-----------------------------------

Comment (by dcf):

 Replying to [comment:20 cohosh]:
 > I haven't been able to reproduce this yet, but I'm going to try running
 > it outside of a Docker container to see if that helps. I'm using
 > attachment:0001-Add-logging-for-pion-library-calls.patch to log debug
 > messages from the pion library.

 I got it to happen again, with the logging patch. Something went wrong
 with the profiling and the profile output was 0 bytes. I had built the
 executable with `-race`, but I didn't see any race-related messages in the
 log.
  * [https://people.torproject.org/~dcf/bug33211/proxy-go.2e9e8071.profiled+logging+race proxy-go.2e9e8071.profiled+logging+race] executable
  * [https://people.torproject.org/~dcf/bug33211/proxy-go.2e9e8071.profiled+logging+race.log.xz proxy-go.2e9e8071.profiled+logging+race.log.xz] log file, 6 MB, decompresses to 158 MB because of massive logging

 I started everything according to the procedure in comment:17 at about
 2020-02-26 00:50. It jumped to 250% CPU about one hour later at 2020-02-26
 01:58:07. (Note that the pion log lines use local time (UTC–7) instead of
 UTC, so the timestamp to look for is 18:58:07.) It's obvious in the log
 file where it happens, starting at line 48424:

 {{{
 sctp DEBUG: 18:58:07.240692 association.go:648: [0xc000094ea0] sending SACK: SACK cumTsnAck=4213437655 arwnd=1048576 dupTsn=[]
 sctp DEBUG: 18:58:07.295662 stream.go:118: [0:0xc000094ea0] reassemblyQueue readable=true
 sctp DEBUG: 18:58:07.295750 stream.go:120: [0:0xc000094ea0] readNotifier.signal()
 sctp DEBUG: 18:58:07.295798 stream.go:122: [0:0xc000094ea0] readNotifier.signal() done
 sctp DEBUG: 18:58:07.414562 stream.go:118: [0:0xc000094ea0] reassemblyQueue readable=true
 sctp DEBUG: 18:58:07.414629 stream.go:120: [0:0xc000094ea0] readNotifier.signal()
 sctp DEBUG: 18:58:07.414671 stream.go:122: [0:0xc000094ea0] readNotifier.signal() done
 sctp DEBUG: 18:58:07.414811 association.go:648: [0xc000094ea0] sending SACK: SACK cumTsnAck=4213437657 arwnd=1048040 dupTsn=[]
 sctp DEBUG: 18:58:07.561561 stream.go:118: [0:0xc000094ea0] reassemblyQueue readable=true
 sctp DEBUG: 18:58:07.561630 stream.go:120: [0:0xc000094ea0] readNotifier.signal()
 sctp DEBUG: 18:58:07.561671 stream.go:122: [0:0xc000094ea0] readNotifier.signal() done
 sctp DEBUG: 18:58:07.636247 stream.go:118: [0:0xc000094ea0] reassemblyQueue readable=true
 sctp DEBUG: 18:58:07.636309 stream.go:120: [0:0xc000094ea0] readNotifier.signal()
 sctp DEBUG: 18:58:07.636358 stream.go:122: [0:0xc000094ea0] readNotifier.signal() done
 sctp DEBUG: 18:58:07.636511 association.go:648: [0xc000094ea0] sending SACK: SACK cumTsnAck=4213437659 arwnd=1048576 dupTsn=[]
 sctp DEBUG: 18:58:07.744138 stream.go:118: [0:0xc000094ea0] reassemblyQueue readable=true
 sctp DEBUG: 18:58:07.744215 stream.go:120: [0:0xc000094ea0] readNotifier.signal()
 sctp DEBUG: 18:58:07.744249 stream.go:122: [0:0xc000094ea0] readNotifier.signal() done
 sctp DEBUG: 18:58:07.762724 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=63 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.763026 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=64 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.763203 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=65 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.763346 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=66 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.763518 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=67 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.763722 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=68 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.763895 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=69 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.764038 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=70 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.764202 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=71 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.764426 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=72 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.764601 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=73 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.764741 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=74 cwnd=1228 ssthresh=4912
 sctp DEBUG: 18:58:07.764950 association.go:2066: [0xc0000941a0] T3-rtx timed out: nRtos=75 cwnd=1228 ssthresh=4912
 }}}

 It goes on with those `T3-rtx timed out` messages, at a rate of about
 6500 log lines per second, for 3.5 minutes until line 1398659, where I
 stopped everything.
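
 Not pion's actual code, but one generic way a T3-rtx timer can collapse
 into a busy loop at high retransmission counts: if the backoff interval
 were computed by shifting the initial RTO left once per timeout with no
 clamp, the int64 nanosecond count would overflow to zero at nRtos >= 63
 (coincidentally the first value visible in the excerpt above), and a
 timer armed with a zero interval fires again immediately. A sketch of
 the broken calculation next to a clamped one:

 {{{
 package main

 import (
 	"fmt"
 	"time"
 )

 // backoffNaive is a hypothetical (not pion/sctp's) RTO calculation that
 // doubles the initial timeout once per retransmission by shifting. At
 // nRtos >= 63 the int64 nanosecond count silently overflows to zero, so
 // a timer armed with it fires back-to-back -- the same shape as the
 // thousands-of-lines-per-second T3-rtx spin in the log.
 func backoffNaive(initial time.Duration, nRtos uint) time.Duration {
 	return initial << nRtos // no cap, overflows silently
 }

 // backoffCapped clamps the result at RTO.Max, as RFC 4960 requires.
 func backoffCapped(initial, max time.Duration, nRtos uint) time.Duration {
 	d := initial
 	for i := uint(0); i < nRtos && d < max; i++ {
 		d *= 2
 	}
 	if d > max {
 		d = max
 	}
 	return d
 }

 func main() {
 	const rtoInitial = 3 * time.Second // RFC 4960 default RTO.Initial
 	const rtoMax = 60 * time.Second    // RFC 4960 default RTO.Max
 	fmt.Println(backoffNaive(rtoInitial, 63))          // overflowed to zero
 	fmt.Println(backoffCapped(rtoInitial, rtoMax, 63)) // clamped
 }
 }}}

 Whether anything like this is the actual cause is only a guess; the
 point is that the log is consistent with the retransmission timer
 rearming with an effectively zero interval rather than backing off.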

 This was snowflake commit 2e9e8071787398d9713395d94d93b5427ac8d452,
 pion/sctp commit 32f160c5ee1fe015c168a0fc8272633b73c5e4bf.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33211#comment:24>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

