[tor-bugs] #9321 [Tor]: Load balance right when we have higher guard rotation periods

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Aug 4 09:32:39 UTC 2014


#9321: Load balance right when we have higher guard rotation periods
-------------------------+-------------------------------------------------
     Reporter:  arma     |      Owner:
         Type:  project  |     Status:  new
     Priority:  normal   |  Milestone:  Tor: 0.2.6.x-final
    Component:  Tor      |    Version:
   Resolution:           |   Keywords:  needs-proposal, tor-auth,
Actual Points:           |  tor-client, 026-triaged-1
       Points:           |  Parent ID:  #11480
-------------------------+-------------------------------------------------

Comment (by asn):

 Replying to [comment:16 NickHopper]:
 > Replying to [comment:13 asn]:
 >
 > > Few questions that will need to be answered:
 > >
 > > - Will the script be called periodically and the authorities will have
 > > to parse the output file every once in a while? Or will the script
 > > be run once, and then it's the job of the authorities to internally
 > > update their state with new information?
 > >
 > > I'm currently aiming for the former behavior, to minimize the amount
 > > of code that needs to be written for little-t-tor. OTOH, this means
 > > that authorities will need to keep 9 months' worth of consensuses in
 > > their filesystem. As we move closer to completion of this task we
 > > will see if the former behavior is indeed better.
 >
 > FWIW, I agree this is probably the right design, though parsing 9 months'
 > worth of consensuses with stem is no mean feat.  An alternative would be
 > to have the script keep a summary file that is updated as new consensuses
 > are fetched; it might store, e.g., the number of consensuses a relay
 > appeared in for each day, and then could get batch updated.
 >
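
 A minimal sketch of the per-day summary file suggested in the quoted reply, assuming the caller has already extracted each consensus's guard fingerprints (e.g. with stem) and that "day" means the consensus valid-after date. The class name and the JSON layout are hypothetical. Expiry here works at day granularity, so the rolling window is only exact to within one day.

```python
import json
from collections import Counter, defaultdict
from datetime import date


class DailyGuardSummary:
    """Hypothetical per-day summary: day -> {fingerprint: consensuses that day listing it}."""

    def __init__(self):
        self.days = defaultdict(Counter)

    def add_consensus(self, valid_after_day, guard_fingerprints):
        # Batch update: one call per newly fetched consensus.
        self.days[valid_after_day].update(set(guard_fingerprints))

    def expire_before(self, cutoff_day):
        # Drop whole days that have fallen out of the rolling window.
        for day in [d for d in self.days if d < cutoff_day]:
            del self.days[day]

    def appearances(self, fingerprint, since_day):
        # Number of consensuses since `since_day` that listed this guard.
        return sum(c[fingerprint] for d, c in self.days.items() if d >= since_day)

    def save(self, path):
        with open(path, "w") as f:
            json.dump({d.isoformat(): dict(c) for d, c in self.days.items()}, f)

    @classmethod
    def load(cls, path):
        summary = cls()
        with open(path) as f:
            for day, counts in json.load(f).items():
                summary.days[date.fromisoformat(day)] = Counter(counts)
        return summary
```

 Each new consensus costs one `add_consensus()` call, and the file holds at most roughly 270 days × 2500 guards counters instead of thousands of full consensuses.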

 Some thoughts on how the script should be run by Tor.

 *Ideally*, the script should be run every hour: every time the Tor
 authorities make a vote. This means that the script should be
 *quick*: parsing 9 months' worth of consensuses with stem takes
 about 1.5 hours on a decent box, which makes it impossible to run
 every hour.
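
 The timing figures in this comment are consistent with each other. A quick check, assuming one consensus per hour and 30-day months (both simplifications):

```python
# Back-of-the-envelope numbers taken from this comment; the 30-day month
# and exactly one consensus per hour are simplifying assumptions.
CONSENSUSES_PER_DAY = 24
DAYS_PER_MONTH = 30
WINDOW_MONTHS = 9
FULL_PARSE_SECONDS = 1.5 * 3600   # "about 1.5 hours on a decent box"
VOTE_INTERVAL_SECONDS = 3600      # authorities vote every hour

consensuses_per_month = CONSENSUSES_PER_DAY * DAYS_PER_MONTH
window_consensuses = WINDOW_MONTHS * consensuses_per_month
seconds_per_consensus = FULL_PARSE_SECONDS / window_consensuses
one_month_minutes = consensuses_per_month * seconds_per_consensus / 60

print(f"{window_consensuses} consensuses in the window")
print(f"~{seconds_per_consensus:.2f} s to parse one consensus")
print(f"~{one_month_minutes:.0f} min to parse one month")
print("full parse fits in one vote interval:",
      FULL_PARSE_SECONDS <= VOTE_INTERVAL_SECONDS)
```

 At roughly 0.8 s per consensus, a full 9-month parse cannot finish inside a one-hour voting interval, while parsing a single month (about 10 minutes) can.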

 To work around this, we might try to use some sort of "summary file"
 that our script can keep updated with compact guard info about the
 past months, so that it doesn't need to parse all those consensuses
 every time.

 Unfortunately, summary files are not trivial in our case:

 a)
    Since we are considering 9 *rolling* months of consensuses and we run
    the script for every new consensus, in every run we need to subtract
    the data of the oldest consensus (the one from 9 months ago), since
    it has expired.

    Hence the "summary file" can't be a simple aggregate of the past 9
    months, because then we wouldn't know the exact values of the
    oldest consensus (which we need to subtract from our summary since
    newer observations took its place).

 b)
    Also, for every consensus (from the past 9 months), we need to keep
    track of _all_ guards, not only the ones that were referenced in
    recent consensuses. The reason is that a guard might take a break
    for 6 months and then suddenly reappear in the consensus. If a
    guard's identity key has made an appearance in the past 9 months,
    we should have its data.
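
 A minimal in-memory sketch of requirements a) and b), assuming each consensus has already been reduced to the set of guard identity fingerprints it lists (a hypothetical per-consensus summary). A running `Counter` is updated by adding the newest consensus and subtracting exactly the set that expires:

```python
from collections import Counter, deque


class RollingGuardCounts:
    """Guard appearance counts over the last `window` consensuses (sketch)."""

    def __init__(self, window):
        self.window = window
        self.history = deque()   # per-consensus fingerprint sets, oldest first
        self.counts = Counter()  # fingerprint -> appearances inside the window

    def add_consensus(self, guard_fingerprints):
        fps = frozenset(guard_fingerprints)
        self.history.append(fps)
        self.counts.update(fps)
        if len(self.history) > self.window:
            # a) The oldest consensus expired: subtract exactly what it added.
            expired = self.history.popleft()
            self.counts.subtract(expired)
            for fp in expired:
                if self.counts[fp] <= 0:
                    del self.counts[fp]

    def appearances(self, fingerprint):
        # b) A guard absent from recent consensuses keeps its count as long
        # as any of its appearances is still inside the window.
        return self.counts[fingerprint]
```

 The cost is that every per-consensus set in the window must be retained (6480 sets in the 9-month case), which is exactly what the per-consensus summary files in the first idea would hold on disk.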

 With that in mind, here are some ideas on summary files:

 - We keep a summary file for every consensus.

   The idea here is that our summary files will be much faster to parse
   than full-fledged consensuses. Ideally, the parsing time would be
   reduced from 1.5 hours to a few seconds or minutes, but I'm not sure
   if that's realistic.

   To get an idea of the data size: for a 9-month period, we are talking
   about 6480 summary files (one per hourly consensus, 9 × 30 × 24) and
   about 2500 guards per summary file.

   This seems like the easiest scheme to implement and understand, but
   I'm not sure if it will be efficient enough.

 - We try to be smarter with summary files.

   For example, we could keep an 8-month summary file, and for the
   oldest month (the one 9 months ago) actually parse the consensuses
   one by one, which tells us which consensus to ignore in every run.

   This idea seems like much harder engineering work: for example, at
   the beginning of each month, we will need to use a directory with
   consensuses for the new oldest month (since we no longer use a
   summary file for it).

   Still, the plan seems doable, and it might give us the results we
   want fast: an 8-month summary file should be easy to parse, and 1
   month of consensuses should take about 10 minutes.

   We can imagine optimizations where we keep weekly summary files for
   the oldest month, etc.
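
 A minimal in-memory sketch of the second idea: one aggregate summary for everything newer than the oldest month, plus per-consensus data for the oldest month only, so expired consensuses can be dropped one at a time. Assumptions (all hypothetical): consensuses arrive in order, a "month" is exactly `month_len` consensuses, and `load_month(m)` stands in for re-parsing month `m`'s consensuses from the directory mentioned above.

```python
from collections import Counter, deque


class HybridGuardWindow:
    """Aggregate summary for newer months + per-consensus sets for the oldest month."""

    def __init__(self, month_len, months, load_month):
        self.month_len = month_len
        self.window = month_len * months   # consensuses in the rolling window
        self.load_month = load_month       # hypothetical hook: month -> list of fingerprint sets
        self.summary = Counter()           # appearances in consensuses after the oldest month
        self.oldest = deque()              # oldest month's sets that have not expired yet
        self.oldest_month = None
        self.oldest_next = 0               # consensus index of self.oldest[0]
        self.n = 0                         # consensuses added so far

    def add_consensus(self, guard_fingerprints):
        self.summary.update(set(guard_fingerprints))
        self.n += 1
        start = self.n - self.window       # index of the first consensus still in the window
        if start <= 0:
            return                         # window not full yet: nothing has expired
        month = start // self.month_len
        if month != self.oldest_month:
            # Month boundary: move the new oldest month out of the summary and
            # into per-consensus storage, so it can expire one consensus at a time.
            raw = [frozenset(fps) for fps in self.load_month(month)]
            for fps in raw:
                self.summary.subtract(fps)
            self.summary += Counter()      # drop zero and negative entries
            self.oldest = deque(raw)
            self.oldest_month = month
            self.oldest_next = month * self.month_len
        while self.oldest_next < start:
            self.oldest.popleft()          # this consensus just left the window
            self.oldest_next += 1

    def appearances(self, fingerprint):
        return self.summary[fingerprint] + sum(fingerprint in fps for fps in self.oldest)
```

 The expensive step, `load_month()`, runs once per month boundary; every other vote only updates the summary and pops one expired consensus.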

 I also wonder how much of a speedup (or is it only space
 efficiency?) we could gain by using membership data structures
 like Bloom filters.
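
 A minimal Bloom filter sketch, one filter per consensus, to make the trade-off concrete. The bit-array size (32768 bits, i.e. 4 KiB) and hash count (4) are arbitrary example values: with about 2500 guards they give a false-positive rate near 0.5% by the usual (1 - e^(-kn/m))^k estimate, versus roughly 50 KB for 2500 raw 20-byte fingerprints.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over guard fingerprints (illustrative sketch).

    False positives are possible, false negatives are not.
    """

    def __init__(self, num_bits=32768, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

 In this form the gain is mostly space: counting a guard's appearances still means probing every filter in the window, and false positives can only inflate the counts, never deflate them.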

 Also, the above ideas assume that we need to run the script (i.e. get
 updated guardiness data) for every new vote, which is the ideal
 behavior. If we relax this requirement and, e.g., only run the script
 every week, we could potentially just suck it up and parse the full
 consensuses directly, even if it takes 2 hours.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9321#comment:19>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

