[tor-dev] Guardiness: Yet another external dirauth script

Mon Sep 22 10:49:18 UTC 2014

George Kadianakis <desnacked at riseup.net> writes:

> ==Guardiness: Yet another external dirauth script==
>
> ====Introduction====
>
> One well-known problem with Tor relays is that Guards will suffer a
> big loss of traffic as soon as they get the Guard flag. This happens
> because clients pick guards every 2-3 months, so young guards will not
> get picked by old clients and mainly attract new clients. This is
> documented in 'phase three' of Roger's blog post:
> https://blog.torproject.org/blog/lifecycle-of-a-new-relay
>
> The problem gets even worse if we extend the guard lifetime to 8-9 months.
>
> The plan to solve this problem is to make client load balancing a bit
> smarter by prioritizing guards that suffer this traffic loss as middle
> relays.
>
> The reason I'm sending this email is because this feature is by far
> the trickiest part of prop236 (guard node security) and I wanted to
> inform all dirauths of our plan and ask for feedback on the deployment
> procedure.
>
> <snip>
>
> ====How the guardiness script works====
>
> The guardiness script is supposed to parse 2-3 months worth of
> consensuses (but should also be able to do the same for 9 months
> worth of consensuses), calculate the guard fraction of each guard,
> save it to a file, and have the dirauth read it to update its
> routerstatuses.
>
> One problem I encountered early on is that stem takes about
> 30mins to parse 3 months of consensuses (~2000 consensuses). Since this
> script should ideally be run every hour before each authority votes,
> such long parsing time is unacceptable.
>
> I mentioned this problem at 
> https://trac.torproject.org/projects/tor/ticket/9321#comment:19
> and stated a few possible solutions.
>
> I received some feedback from Nick, and the solution I decided to take
> in the end is to have another script that is called first and
> summarizes consensuses to summary files. Summary files are then saved
> to disk, and parsed by the guardiness script to produce an output file
> that is read by dirauths.
>
> Summary files are designed to be quick to parse (even with Python) and
> contain all the necessary information for guardiness. For example,
> parsing 2000 summary files on my laptop takes about 10 seconds.
>
> FWIW, the guardiness scripts are ready for review and can be found here:
> https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardiness
>

FWIW, a weasel suggested to me a potentially better solution than the
iffy summary files.

He suggested parsing consensuses and putting them in an sqlite3
database. Each time we have a new consensus, parse it and import it into
the database. Then query the database when creating the guardiness
output file.

The good thing with this approach is that you only need to parse
consensuses once, instead of every hour. Also, we don't need to do
file management for summary files etc. I like this approach and I will
be looking into it over the following days.

I think that the weasel suggested that the database should have an
entry for each guard, and for each guard it should note down when that
guard was observed in a consensus (the precise date is required so
that we can discard expired observations).