[tor-dev] Guardiness: Yet another external dirauth script

George Kadianakis desnacked at riseup.net
Tue Sep 16 14:15:59 UTC 2014


==Guardiness: Yet another external dirauth script==

====Introduction====

One well-known problem with Tor relays is that guards suffer a big
loss of traffic as soon as they get the Guard flag. This happens
because clients pick guards every 2-3 months, so young guards will not
get picked by old clients and will mainly attract new clients. This is
documented in 'phase three' of Roger's blog post:
https://blog.torproject.org/blog/lifecycle-of-a-new-relay

The problem gets even worse if we extend the guard lifetime to 8-9 months.

The plan to solve this problem is to make client load balancing a bit
smarter by prioritizing guards that suffer this traffic loss as middle
relays.

The reason I'm sending this email is because this feature is by far
the trickiest part of prop236 (guard node security) and I wanted to
inform all dirauths of our plan and ask for feedback on the deployment
procedure.

====How guardiness works====

Authorities calculate, for each relay, the fraction of consensuses
over the past 2-3 months in which it has had the Guard flag, and then
they note that fraction down in the consensus.

Then clients parse the consensus, and if they see a guard that has
been a guard for 55% of the past consensuses, they will consider that
relay to be 55% guard and 45% non-guard (that's 100% - 55%).
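
To illustrate the client-side arithmetic (a minimal Python sketch;
the real logic lives in Tor's C path selection code, and this helper
function is hypothetical):

    def split_bandwidth(bandwidth, guard_fraction):
        """Split a relay's bandwidth into a guard part and a
        non-guard part based on its GuardFraction percentage."""
        guard_bw = bandwidth * guard_fraction / 100.0
        return guard_bw, bandwidth - guard_bw

    # A guard with 1000 bandwidth units and GuardFraction=55
    # counts as 550 units of guard and 450 units of non-guard:
    print(split_bandwidth(1000, 55))  # -> (550.0, 450.0)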

You can find more information at:
https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-guard-node.txt#l101

The idea was that the guardiness script would be an external script
run by Tor in a similar fashion to the bandwidth auth scripts. We
chose that approach because we could write the script in a high-level
language, and because it could be modular so that we could change the
algorithm in the future if we wanted. Unfortunately, external scripts
on dirauths seem to be a PITA to maintain, as can be seen by the lack
of bwauth operators.

====How the guardiness script works====

The guardiness script is supposed to parse 2-3 months' worth of
consensuses (but should also be able to do the same for 9 months'
worth), calculate the guard fraction of each guard, save it to a
file, and have the dirauth read that file to update its routerstatuses.

One problem I encountered early on is that stem takes about 30
minutes to parse 3 months of consensuses (~2000 consensuses). Since
this script should ideally be run every hour before each authority
votes, such a long parsing time is unacceptable.
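
For reference, the straightforward stem-based counting loop looks
roughly like this (a sketch of the approach, not the exact code in
the branch linked below):

    import stem.descriptor

    def count_guard_flags(consensus_paths):
        """For each relay fingerprint, count in how many consensuses
        it appeared and in how many it had the Guard flag. Fully
        parsing every consensus with stem is what makes this slow."""
        counts = {}  # fingerprint -> (appearances, guard_appearances)
        for path in consensus_paths:
            for router in stem.descriptor.parse_file(
                    path,
                    descriptor_type='network-status-consensus-3 1.0'):
                seen, guard = counts.get(router.fingerprint, (0, 0))
                counts[router.fingerprint] = (
                    seen + 1, guard + ('Guard' in router.flags))
        return counts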

I mentioned this problem at 
https://trac.torproject.org/projects/tor/ticket/9321#comment:19
and stated a few possible solutions.

I received some feedback from Nick, and the solution I decided to take
in the end is to have another script that is called first and
summarizes consensuses to summary files. Summary files are then saved
to disk, and parsed by the guardiness script to produce an output file
that is read by dirauths.

Summary files are designed to be quick to parse (even with Python) and
contain all the information necessary for guardiness. For example,
parsing 2000 summary files on my laptop takes about 10 seconds.
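
The exact on-disk format is whatever the summarizer emits; to give
the flavor (this particular layout is hypothetical, not necessarily
what the branch uses), think of one fingerprint per line for every
relay that had the Guard flag, so reading a summary back needs no
real parsing:

    def read_summary(path):
        """Return the set of fingerprints that had the Guard flag
        in the consensus this summary file was generated from."""
        with open(path) as f:
            return set(line.strip() for line in f if line.strip())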

FWIW, the guardiness scripts are ready for review and can be found here:
https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardiness

====How the guardiness script will be deployed====

The idea is that dirauths will add another script to their crontab
that is called every hour (before or after the bwauth scripts).

The script first calls the summarizer script, which goes to the
consensus/ directory and summarizes all consensuses it finds and puts
them in the summary/ directory. The summarizer script then deletes all
the consensuses that got summarized.

Then the script calls the guardiness script, which goes to the
summary/ directory, parses all summary files it finds, and outputs a
guardiness output file that gets parsed by the dirauth prior to voting.
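
Put together, the cron job could be a tiny driver like the following
(a sketch; the file names and CLI switches shown here are
illustrative, not necessarily the exact ones in the branch):

    #!/usr/bin/env python
    import subprocess

    # Summarize new consensuses (and clean up the ones summarized).
    subprocess.check_call(['./summarizer.py',
                           '--consensus-dir', 'consensus/',
                           '--summary-dir', 'summary/'])

    # Compute guard fractions from the summaries and write the
    # output file that the dirauth reads before voting.
    subprocess.check_call(['./guardiness.py',
                           '--summary-dir', 'summary/',
                           '--output', 'guardiness.output'])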

That should be all. Easy, eh? :)

Now I will start a FAQ section where I state my doubts and fears.

====FAQ====

- Q: Where do dirauths find all those old consensuses?

There are various ways for dirauths to populate their consensus/
directory. They could fetch consensuses from metrics, or they could
add a cron job that copies cached-consensus to a directory every hour.
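
For instance, such an hourly copy could be as simple as (a sketch;
the cached-consensus path depends on the DataDirectory, and the
metrics-style file naming is my assumption):

    import shutil, time

    # Timestamped name in the metrics archive style,
    # e.g. 2014-09-16-14-00-00-consensus.
    stamp = time.strftime('%Y-%m-%d-%H-00-00', time.gmtime())
    shutil.copy('/var/lib/tor/cached-consensus',
                'consensus/%s-consensus' % stamp)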

However, I think the cleanest solution is to use Daniel Martí's
upcoming consensus diff changes. Daniel will add a torrc option that
allows Tor to save consensuses to a directory. My idea was to have
dirauths use Daniel's code to populate their consensus/ directory for
two or three months, and then enable the guardiness scripts.

To make sure that this is indeed the best approach, I need to learn
from Nick when he plans to merge Daniel's code to Tor.

- Q: What does guardiness look like in the consensus?

Here is how a guard with a guardiness (GuardFraction) of 10% looks in
the consensus:

 r test006r HyS1DRHzEojbQVPZ1B3zAHc/HY0 9St4yWfV4huz5V86mt24HL3Yi2I 2014-09-06 13:44:28 127.0.0.1 5006 7006
 s Exit Fast Guard HSDir Running Stable V2Dir Valid
 v Tor 0.2.6.0-alpha-dev
 w Bandwidth=111 Unmeasured=1 GuardFraction=10
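
For illustration, pulling GuardFraction out of such a "w" line is
trivial (a hypothetical Python helper, not code from the branch):

    def parse_w_line(line):
        """Parse a consensus 'w' line into a dict of integers."""
        return dict((k, int(v)) for k, v in
                    (item.split('=') for item in line.split()[1:]))

    w = parse_w_line('w Bandwidth=111 Unmeasured=1 GuardFraction=10')
    print(w['GuardFraction'])  # -> 10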

- Q: What are you afraid of?

I'm mainly afraid of misconfiguration problems. This guardiness system
is a bit complex, and I'm not expecting dirauths to learn how to use
and debug it, so it should just work, easily and well...

Here are some specific issues: 

-- File management

For example, I'm afraid of the file management mess that summary files
can cause. We need to make sure that we don't leave old
consensus/summary files rotting in the filesystem, and that we don't
summarize the same consensuses over and over again. To address that, I
added some optional cleanup switches to both scripts:

Specifically, the summarizer script can delete consensus files that
already got summarized and can also delete consensus files older than
3 months (or N months). Similarly, the guardiness.py script can delete
summary files older than 3 months (or N months).

The idea is that every time the cron job triggers, both scripts will
auto-delete the oldest summary/consensus files, keeping on disk only
the useful files.
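
The cleanup itself is nothing fancy; conceptually it is just the
following (a sketch that keys off file mtimes, whereas the real
scripts may use the consensus timestamps instead):

    import os, time

    def delete_older_than(directory, months):
        """Delete files older than ~N months (a month
        approximated as 30 days)."""
        cutoff = time.time() - months * 30 * 24 * 3600
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                os.remove(path)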

-- Incomplete consensus data set

I'm afraid that a directory authority might not have a properly
populated consensus directory and hence advertise wrong guard
fractions. For example, maybe it only has 10 consensuses in its
consensus directory instead of 1900. Since the authorities only state
the guardiness percentage in the consensus, it's not possible to learn
how many consensuses were in their dataset. Maybe we need to add a
"guardiness-consensus-parsed" line to their votes, to make such issues
easier to debug?

Also, 3 months' worth of consensuses is 2160 consensuses. Because
dirauths sometimes misbehave, it's certain that not all 2160
consensuses will have been issued, and that's normal. But how do we
tell whether dirauths have a sufficiently good consensus data set?
Is 2000 out of 2160 consensuses an OK data set? What about 1000 out
of 2160?
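
For concreteness, the arithmetic is simply the following; what
threshold counts as "sufficient" is exactly the open question:

    def coverage(num_consensuses, months):
        """Fraction of the expected hourly consensuses that are
        actually present (a month approximated as 30 days)."""
        return num_consensuses / float(months * 30 * 24)

    print(coverage(2000, 3))  # -> ~0.93
    print(coverage(1000, 3))  # -> ~0.46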

Furthermore, we need to make sure that dirauths don't consider old
consensuses in their GuardFraction calculations. To achieve this, both
scripts have a mandatory switch that allows operators to specify the
maximum consensus age that is acceptable. So for example, if you call
the summarizer script with 3 months of consensus age, it will not
parse consensuses older than 3 months. Furthermore, there is a CLI
switch that allows the scripts to delete expired consensuses.
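
As a sketch of the age check (assuming metrics-style file names such
as "2014-09-16-14-00-00-consensus"; the real scripts may read the
valid-after time from the document itself instead):

    import datetime, os

    def is_too_old(filename, max_months):
        """Decide from the timestamped file name whether a
        consensus is past the maximum acceptable age."""
        valid_after = datetime.datetime.strptime(
            os.path.basename(filename)[:19], '%Y-%m-%d-%H-%M-%S')
        age = datetime.datetime.utcnow() - valid_after
        return age > datetime.timedelta(days=max_months * 30)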

- Q: Why do you use slow stem instead of parsing consensuses with Python on your own?

This is another part where I might have taken the wrong design
decision, but I decided to not get into the consensus parsing business
and just rely on stem.

This is also because I was hoping to use stem to verify consensus
signatures. However, now that we might use Daniel's patch to populate
our consensus database, maybe we don't need to treat consensuses as
untrusted anymore.

If you think that I should try to parse the consensuses on my own,
please tell me and I will give it a try. Maybe it will be
fast. Definitely not as fast as summary files, but maybe we can parse
3 months' worth of consensuses in 15 to 40 seconds.

- Q: Why do you mess with multiple summary files instead of having just *one* summary file?

Because of the rolling nature of guardiness (we always want to
consider the past 3 months), every hour we need to _discard_ the
oldest observations (the consensus from 3 months ago) and start
considering the newest consensus.

Because we need to discard that oldest consensus, it's hard to keep
information about each consensus in a single summary file. And that's
why I chose to have a summary file for each consensus. Maybe it's the
wrong decision though...
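
With one timestamped summary file per consensus, maintaining the
rolling window reduces to a sort (a sketch, assuming
lexicographically sortable file names):

    import os

    def rolling_window(summary_dir, window_size):
        """Keep only the newest N summaries; discarding the oldest
        observation is just dropping the head of the sorted list."""
        names = sorted(os.listdir(summary_dir))
        return names[-window_size:]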

- Q: What's up with the name "guardiness"?

It's quite terrible, I know, but it's the name we have used for this
project from quite early on.

I think before finalizing this task I'm going to rename everything to
'GuardFraction' since it's more self-explanatory. I'm also considering
names like "GuardTrafficLoadBalancer" etc.


