[tor-bugs] #24716 [Core Tor/DirAuth]: Try cranking up cbttestfreq consensus param, to see if it helps the current overload

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Dec 22 16:51:22 UTC 2017


#24716: Try cranking up cbttestfreq consensus param, to see if it helps the current
overload
----------------------------------+--------------------
     Reporter:  arma              |      Owner:  (none)
         Type:  task              |     Status:  new
     Priority:  Medium            |  Milestone:
    Component:  Core Tor/DirAuth  |    Version:
     Severity:  Normal            |   Keywords:
Actual Points:                    |  Parent ID:
       Points:                    |   Reviewer:
      Sponsor:                    |
----------------------------------+--------------------
 In Tor 0.3.1.1-alpha, commit d5a151a, we switched:
 {{{
 -#define CBT_DEFAULT_TEST_FREQUENCY 60
 +#define CBT_DEFAULT_TEST_FREQUENCY 10
 }}}

 And on May 20 2017 the dir auths set the cbttestfreq consensus param to 10
 as well.

 Right now the network is overloaded with create cells, from the millions
 of new clients that showed up in the past weeks.

 Hypothesis 1: most of these clients are in learning mode much of the time,
 so 5 million clients / 10 seconds per test circuit = 500k new create
 requests per second launched at the network, which contributes to the
 overload.

 Hypothesis 2: some of these clients have learned quite low timeouts,
 causing them to generate many circuits which they then almost immediately
 cancel, but not enough of their circuits fail that they back away from
 their learned value.

 Hypothesis 3: the clients are stuck in a sad loop: they learn a low cbt
 value, generate circuits for a while that mostly time out, and eventually
 give up on that cbt value. Then they generate a circuit every 10s until
 they re-learn a low cbt value, and the cycle repeats.

 The experiment here (set cbttestfreq to 600 seconds temporarily) should
 help us test these hypotheses. For 1, we will immediately reduce the load
 of new circuits. For 2, this will help more slowly, because we'll have to
 wait for each client to hit a situation where 90%+ of its circuit attempts
 are being timed out, but in theory clients will slowly shift from having a
 too-aggressive cbt, back into learning mode. And for 3, we'll push most
 clients to the "learning, but very slowly" phase of their sad loop.
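
 The "90%+ of circuit attempts timed out" condition mentioned for
 hypothesis 2 could look roughly like the check below. This is a sketch
 only: the class name, the window of 20 recent circuits, and the threshold
 of 18 timeouts are illustrative assumptions picked to give 90%, not values
 taken from this ticket.

```python
from collections import deque


class RecentTimeouts:
    """Sliding window of recent circuit outcomes (hypothetical helper)."""

    def __init__(self, window: int = 20, max_timeouts: int = 18):
        # Only the newest `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.max_timeouts = max_timeouts

    def record(self, timed_out: bool) -> None:
        self.outcomes.append(timed_out)

    def should_relearn(self) -> bool:
        # Judge only a full window, so a few early failures do not
        # throw away the learned timeout.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and sum(self.outcomes) >= self.max_timeouts


# A client whose learned timeout is too aggressive: 18 of 20 attempts fail.
w = RecentTimeouts()
for i in range(20):
    w.record(timed_out=(i < 18))
print(w.should_relearn())
```

 A client that stays just under the threshold never trips this check,
 which is the situation hypothesis 2 describes.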

 We can use the notice-level heartbeat messages in relay logs to discover
 whether the total number of create cells goes down dramatically. If it
 does, win: we confirmed one or more of these hypotheses, and we can make a
 plan from there. If it doesn't, also win: we know we need to look
 elsewhere.
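
 One way to pull those numbers out of a relay log is sketched below. The
 line format in the regex ("Circuit handshake stats since last time: a/b
 TAP, c/d NTor.") and the reading of each pair as handled/requested are
 assumptions to check against a real notice log. The function name is made
 up, and the sample lines are invented purely to exercise the parser.

```python
import re

# Assumed shape of the heartbeat line; verify against your own logs.
HANDSHAKE_RE = re.compile(
    r"Circuit handshake stats since last time: "
    r"(\d+)/(\d+) TAP, (\d+)/(\d+) NTor"
)


def requested_handshakes(log_lines):
    """Return TAP + NTor requested handshakes, one total per heartbeat."""
    totals = []
    for line in log_lines:
        m = HANDSHAKE_RE.search(line)
        if m:
            # Groups 2 and 4 are assumed to be the "requested" counts.
            totals.append(int(m.group(2)) + int(m.group(4)))
    return totals


# Invented sample lines, not real measurements.
sample = [
    "Dec 22 12:00:00.000 [notice] Circuit handshake stats since last "
    "time: 1000/1200 TAP, 5000/5100 NTor.",
    "Dec 22 12:00:00.000 [notice] Heartbeat: unrelated line.",
    "Dec 22 18:00:00.000 [notice] Circuit handshake stats since last "
    "time: 100/100 TAP, 900/900 NTor.",
]
print(requested_handshakes(sample))  # [6300, 1000]
```

 Comparing the totals from before and after the consensus change on the
 same relay would show whether the drop is dramatic.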

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24716>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

