[tor-dev] Implications of switching to a single guard node: some conclusions

13 Mar 2014

      tl;dr: analysis seems to indicate that switching to one guard node
might not be catastrophic to the performance of Tor. To improve
performance some increased guard bandwidth thresholds are proposed
that seem to help without completely destroying the anonymity of the
network. Enjoy the therapeutic qualities of the graphs and please read
the whole post.

We start this post by assuming that we _should_ switch to one guard
for the security/anonymity arguments that were detailed in Tariq's
paper and Roger's blog post.

=== Performance implications of switching to 1 guard ===

The question now becomes, if we indeed switch to 1 guard, how does
that influence the performance of the Tor network? To answer this
question we look at the following graph which shows the expected
bandwidth for a client circuit:

https://people.torproject.org/~asn/guards2/perf_cdf_guard_bw_desc.png 
(see green and orange lines)

(I calculate the bandwidth using the descriptor bandwidth values [0],
and in the case of 3 guards I take the expected bandwidth to be the
average of the bandwidths of the three guards. [1])
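
To make the method above concrete, here is a minimal sketch of how such
a CDF could be sampled. It is not the code from the repository: the
(consensus_weight, descriptor_bw) tuple layout, the function names and
the toy guard list are all made up for illustration, and it assumes
guards are picked without replacement with probability proportional to
their consensus weight.

```python
import random


def sample_client_bandwidth(guards, n_guards, rng):
    """Pick n_guards distinct guards, weighted by consensus weight, and
    return the mean of their descriptor bandwidths.

    guards: list of (consensus_weight, descriptor_bw) tuples
            (hypothetical layout, not the repository's format).
    """
    pool = list(guards)
    chosen_bw = []
    for _ in range(n_guards):
        weights = [weight for weight, _ in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen_bw.append(pool.pop(idx)[1])
    return sum(chosen_bw) / n_guards


def percentile(samples, fraction):
    """Value below which roughly `fraction` of the samples fall."""
    ordered = sorted(samples)
    idx = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[idx]


# Toy data only: four invented guards, bandwidth in MB/s.
toy_guards = [(1, 0.5), (2, 1.0), (10, 5.0), (40, 20.0)]
rng = random.Random(2014)
one_guard = [sample_client_bandwidth(toy_guards, 1, rng) for _ in range(1000)]
three_guards = [sample_client_bandwidth(toy_guards, 3, rng) for _ in range(1000)]
```

Comparing percentile(one_guard, 0.2) with percentile(three_guards, 0.2)
gives the kind of "unlucky 1/5th" comparison that the graph shows.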

For example, looking at the graph, we see that when three guards are
used, 1/5th of the clients will have performance below 5MB/s, whereas
with one guard 1/5th of the clients will have performance below 3MB/s.
If our assumptions hold, the unlucky 1/5th of single guard clients
that happened to pick a weak guard get only about 3/5ths of the
bandwidth they would get with three guards: not good.

At a later stage of our CDF, we see that in the three guards case,
half of the clients will have performance below 8MB/s whereas in the
one guard case they will have performance below 7MB/s. This is not
terribly bad, and the reason for this is that powerful guards have
more chance to be selected, so single-guard clients will tend to pick
those.

Finally, a crossover happens for the lucky 2/5ths of the single guard
clients, where they actually experience better performance than the
three guards clients since they picked a powerful guard and they only
use that. This is interesting but in real life the results might not
be so peachy, because the powerful guards will get more overloaded.

=== Client performance implications of bumping up the guard bandwidth threshold ===

So, now that we analyzed the performance implications of using a
single guard, let's see if we can improve the performance. One obvious
way of doing so is by increasing the bandwidth threshold for the Guard
flag. The threshold is currently at 250KB/s (according to dir-spec),
but let's see what happens from a performance perspective if we bump
it up to 2MB/s. Looking at the same graph as before, now pay attention
to the blue line.

We can see that for the unlucky 1/5th of the single guard clients who
had a bandwidth of 3MB/s, their bandwidth now becomes 4MB/s, which
seems like a decent improvement. Furthermore, the crossover happens
earlier now, which means that _supposedly_ half of the clients are
going to have better performance (modulo guard overload) compared to
the three guard case! 

I also made graphs for a bandwidth threshold of 1MB/s (in case 2MB/s
sounds too crazy). You can find them here [2]:
https://people.torproject.org/~asn/guards2/perf_cdf_guard_bw_desc_1000.png
https://people.torproject.org/~asn/guards2/perf_cdf_guard_bw_consensus_1000....

=== Network performance implications of bumping up the guard bandwidth threshold ===

Now that we analyzed the performance difference for individual
clients, let's see what will happen to the total bandwidth of the Tor
network if we bump up the guard bandwidth threshold. This might help
us understand how much we will overload the Tor network with this
change.

Here is a graph that shows the fraction of the total guard bandwidth
we discard when we impose various bandwidth thresholds [3]:
https://people.torproject.org/~asn/guards2/perf_bw_fraction.png {1}

The graph above is not very meaningful on its own, but it combos well
with the following metrics graph:
https://metrics.torproject.org/network.html#bandwidth-flags {2}
(see yellow and orange lines)
...
From {2}, we see that the Tor network has 6000MiB/s of advertised guard
bandwidth (orange line), but supposedly is only using 3500MiB/s
(yellow line). This means that supposedly we are only using about
3/5ths of our guard capacity: we have 2500MiB/s spare.
Looking back at {1}, we see that if we increase the guard bandwidth
threshold to 2MB/s we will discard 1/10th of our total guard
bandwidth. This is not a terrible problem if we have 2/5ths of spare
guard capacity...
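
The arithmetic above can be written down as a small sketch. The
function name is made up, the numbers are just the values read off
graphs {1} and {2}, and it assumes that the used bandwidth stays the
same after the threshold changes (which is exactly the part that might
not hold in real life).

```python
def spare_guard_capacity(advertised, used, discarded_fraction):
    """Spare guard bandwidth left after discarding a fraction of the
    advertised guard bandwidth, assuming usage does not change."""
    remaining = advertised * (1.0 - discarded_fraction)
    return remaining - used


# Values read off the graphs, in MiB/s.
spare_now = spare_guard_capacity(6000, 3500, 0.0)          # current network
spare_after_2mb = spare_guard_capacity(6000, 3500, 0.1)    # 2MB/s threshold
```

With these inputs the spare capacity goes from 2500MiB/s to roughly
1900MiB/s, so on paper there is still headroom.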

.oO(this sounds too good to be true, doesn't it?)

=== Security implications of bumping up the guard bandwidth threshold ===

Unfortunately, we can't simply go ahead and discard most of our
guard nodes. Discarding nodes has definite implications for the
anonymity of the Tor network. Let's try to understand them.

Here is a graph that shows the number of guard nodes and how that
changes over different bandwidth thresholds:
https://people.torproject.org/~asn/guards2/diversity_guards_n.png

For example, we see that increasing the bandwidth threshold to 2MB/s
will cut our guard nodes in half: from 2000 to 1000. This is not
really good. Even a smaller threshold of 1MB/s will cut them down to
1400 or so.

But before we pull a Filliol, let's try to understand how much
discarding 1000 guard nodes influences the diversity of our guard
selection. Here is a graph that shows what's the probability of
picking any of the guard nodes we discarded for different bandwidth
thresholds:
https://people.torproject.org/~asn/guards2/diversity_discarded_prob.png

So for example, we see that those 1000 nodes that we discarded in the
2MB/s case, only had 0.07 probability of being selected. That's around
1/15 chance of picking one of those 1000 guard nodes, so even though
there were many of them they were not providing much diversity to the
guard selection process. Of course, there are many possible attacks
and threat models involving guards, so this analysis might be valid to
some and irrelevant to others.
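
As a rough illustration of what the two diversity graphs measure, here
is a minimal sketch that, for a given threshold, counts the guards that
survive and sums up the selection probability of the ones that get
discarded. The (consensus_weight, bandwidth) tuple layout and the
function name are assumptions made for this example, and the sketch
treats selection probability as plain consensus weight share, ignoring
the position weights in the consensus.

```python
def threshold_effect(guards, threshold):
    """Return (guards_kept, probability_of_picking_a_discarded_guard).

    guards: list of (consensus_weight, bandwidth) tuples
            (hypothetical layout for illustration).
    threshold: minimum bandwidth needed to keep the Guard flag,
               in the same unit as the bandwidth values.
    """
    total_weight = sum(weight for weight, _ in guards)
    kept_weight = sum(weight for weight, bw in guards if bw >= threshold)
    kept_count = sum(1 for _, bw in guards if bw >= threshold)
    discarded_prob = (total_weight - kept_weight) / total_weight
    return kept_count, discarded_prob


# Toy data only: two slow guards and one fast guard.
toy_guards = [(1, 0.5), (1, 1.5), (8, 10.0)]
kept, discarded_prob = threshold_effect(toy_guards, 2.0)
```

In this toy example two thirds of the guards are discarded, but they
only carried 2/10 of the selection probability, which is the same
shape of result as the 1000 discarded nodes with 0.07 probability.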

The fact that those guards had only 1/15 chance of being selected also
gives us hope that we will not overload the network by discarding
them, since only a "small" portion of clients were choosing them
anyway. These clients will now spread over the remaining 1000
nodes, which are much better at handling them (famous last words).

=== Fingerprinting implications of switching to 1 guard === 

See https://bugs.torproject.org/10969 for the background of this.

Here is a graph with the expected number of clients for the biggest
and smallest guard over different bandwidth thresholds:
https://people.torproject.org/~asn/guards2/fingerprinting_expected_clients.p...
The graph considers 500k clients choosing guards simultaneously.

Switching to 1 guard will make guard set fingerprinting harder if you
are a lucky client that picked a popular guard, since now you are
blending in with thousands of other clients who are using that guard.

If you were unlucky enough to choose a small guard, your anonymity set
is still shit. For example, without considering bandwidth cutoffs, the
smallest guard has an expected number of clients less than 1, which
means that it will uniquely represent you. Even with a bandwidth
cutoff of 2MB/s, the expected number of clients is 10 which is not
much better. Heck, even with a cutoff of 9MB/s, there will only be 100
clients on average for the smallest guard; that's a pretty small
number if we consider Tor clients all over the globe.
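
The expected number of clients per guard is just the number of clients
times the probability of picking that guard. Here is a minimal sketch
of that calculation, assuming each client independently picks one guard
with probability proportional to consensus weight. The function name
and the toy weights are made up; only the 500k client count comes from
the graph.

```python
def expected_clients(consensus_weights, n_clients=500_000):
    """Return the expected number of clients on the smallest and on
    the biggest guard, given each guard's consensus weight."""
    total = sum(consensus_weights)
    per_guard = [n_clients * weight / total for weight in consensus_weights]
    return min(per_guard), max(per_guard)


# Toy data only: one tiny guard next to one huge guard.
smallest, biggest = expected_clients([1, 999], n_clients=1000)
```

Raising the bandwidth threshold removes the smallest weights from the
list, which is what pushes the expected client count of the smallest
remaining guard upwards.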

=== Conclusions ===

It seems that the performance implications of switching to 1 guard are
not terrible. The performance of some clients will indeed get worse,
but we might be able to help that by increasing the bandwidth
threshold for being a Guard node.

A guard bandwidth threshold of 2MB/s (or 1MB/s if that sounds too
crazy) seems like it would considerably improve client performance
without screwing terribly with the security or the total performance
of the network.

The fingerprinting problem will be improved in some cases, but still
remains unsolved for many of the users (TODO: calculate the
percentage). A proper solution might involve guard node buckets as
explained in:
https://trac.torproject.org/projects/tor/ticket/9273#comment:4

Also, through the analysis it seems that people who pick slow guards
are unlucky (even though they will share those guards with fewer
people). Should we do anything about people who are going to choose
new guards till they hit the good ones? Or torrc lines on the Internet
that statically pick the best guard nodes?

=== Closing notes and disclaimers ===

I would say that our analysis has shown that switching to one guard is
probably viable but we should be aware of the drawbacks and be
prepared for possible surprises.

Furthermore, I would like to disclose that one month ago I didn't even
know how guard node selection happens and now I'm partly responsible
for choosing whether we switch to one guard node or not. Also, even
though this project is a serious research project, I felt that I had
to rush it and do it in 3 weeks. This was not ideal, because I don't
feel I understand all the variables in the equation. So please read
the whole document and make sure that I have not fucked up majorly. I
would like to avoid being the man who destroyed the Tor network ;)

Also, it's my first time producing graphs with Python, so I wouldn't
be surprised if there are errors. Hopefully most of the graphs that I
produced seem to agree with the graphs that Nick Hopper or Tariq have
produced, which gives me some slight confidence.

The code I used can be found in https://gitorious.org/guards/guards [4]
You can find all the graphs here:
https://people.torproject.org/~asn/guards2/

Don't worry be happy.

[0]: Important note: even though I calculate the plotted bandwidth
     using descriptor bandwidth values, I still calculate the guard
     probabilities using the consensus bandwidth values. This seemed
     to me to be the correct way; if it's not I can easily change it.

     Also see
     https://people.torproject.org/~asn/guards2/perf_cdf_guard_bw_consensus.png
     for the same graph but using the bandwidth values from the
     consensus (measured by the bandwidth authorities) everywhere.

[1]: This graph makes the pretty bold assumption that "higher
     guard bandwidth" == "better client performance" which is probably
     not entirely true because of the bandwidth-based load balancing
     during path selection. However, we need an assumption to work
     with and this one might not be too bad.

     It also assumes that the mean of the bandwidth of three guards
     represents the actual performance of a client, which is not
     entirely true. A correct solution in this case should take
     the circuit-build-times (CBT) logic of tor into account.

[2]: Because of technical difficulties I could not put everything in
     one graph! Graphs are hard!

[3]: Nick Hopper made a similar graph earlier in this thread:
     https://www-users.cs.umn.edu/~hopper/guards/guard_thresholds_bandwidth.png

[4]: It's rushed research quality code, which means that I'm probably
     the only person who can use it atm. If you feel experimental, you
     can try generating some graphs, for example:
     $ python guard_probs.py consensus descriptors

George Kadianakis