[tor-bugs] #32108 [Core Tor/Tor]: tor can overrun its accountingmax if it enters soft hibernation first

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 16 09:52:32 UTC 2019


#32108: tor can overrun its accountingmax if it enters soft hibernation first
------------------------------+--------------------------------
     Reporter:  arma          |      Owner:  (none)
         Type:  defect        |     Status:  new
     Priority:  Medium        |  Milestone:
    Component:  Core Tor/Tor  |    Version:  Tor: 0.4.0.1-alpha
     Severity:  Normal        |   Keywords:  network-health
Actual Points:                |  Parent ID:
       Points:                |   Reviewer:
      Sponsor:                |
------------------------------+--------------------------------
 I'll put the punchline first: second_elapsed_callback(), which is where we
 check if it's time to go dormant for hibernation, is no longer called when
 we've entered soft hibernation.

 I assume this is because of the new "periodic event flag" feature, where
 we try to avoid calling callbacks when we're in a state that won't need
 them. See the "online and active" note here:
 {{{
   /* This is a legacy catch-all callback that runs once per second if
    * we are online and active. */
   CALLBACK(second_elapsed, NET_PARTICIPANT,
            FL(NEED_NET)|FL(RUN_ON_DISABLE)),
 }}}

 The impact is limited, since we stop accepting new connections and new
 circuits when we enter soft hibernation, but it can still be bad: existing
 connections and circuits can last for a long time and use a lot of
 bandwidth.

 A secondary impact is that accounting_run_housekeeping() never gets
 called, which means that the state file never gets updated after we've
 entered soft hibernation, which means these bandwidth overspends are never
 recorded to disk.

 I think the bug went in during commit 4bf79fa4f which is part of Tor
 0.4.0.1-alpha.

 The PERIODIC_EVENT_FLAG_NEED_NET flag (what FL(NEED_NET) expands into)
 checks net_is_disabled(), but there is another function right after
 net_is_disabled in netstatus.c called net_is_completely_disabled(), and
 the only difference is that it checks we_are_fully_hibernating() vs
 we_are_hibernating().

 I confirmed the overall bug happens in practice: there's a relay operator
 in #tor who hit soft hibernation, and then saw his tor proceed to use more
 bytes than expected. I had him do a 'gdb attach' to his tor and break on
 'second_elapsed_callback' and the function never got called.

 It seems like the immediate fix, and best backport plan, would be to
 resume calling second_elapsed_callback even when net_is_disabled(). The
 longer term plan can be to audit our calls to net_is_disabled() and
 net_is_completely_disabled() and we_are_hibernating(), with an eye towards
 "should we be doing this behavior while soft hibernating", and see what
 other bugs we find.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32108>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list