[tor-dev] Guard nodes and network down events

Fri Aug 15 20:29:40 UTC 2014

On Wed, Aug 13, 2014 at 03:47:43PM +0300, George Kadianakis wrote:
> Hello friends :)
> 
> This is a post to discuss how Tor should treat its entry guards when
> its network goes down. This is part of ticket #12595 [0] which aims to
> design better interfaces and data structures for entry guards.
> 

<hilight>
> This thread investigates what should happen when the network goes down
> and Tor's connection to a guard fails. How should Tor recognize that
> and connect back to that guard when the network goes up again?
>
<snip />
> 
> The fundamental issue here is that Tor does not have a primitive that
> detects whether the network is up or down, since any such primitive
> stands out to a network attacker [3]. This means that when Tor fails
> to connect to an entry guard, Tor can never be sure whether the guard
> was actually down or whether the network is down, and that complicates
> its algorithm significantly.
> 
</hilight>

These are both important problems, but I think we'll get ourselves into
trouble if we try to solve them both in the same discussion, one color
per bikeshed is enough :)

Both David and Tom have good ideas, but I think this will take the
discussion offtopic from "how do we choose which guard we should use
when we realize we have a connection to the internet". I think further
discussion should go into a "How should Tor detect that it has an
internet connection?" thread.

> In #12595, I laid down a few different ways that Tor could improve its
> current "network down" entry guard algorithm [4]. After thinking some
> more, I've been leaning towards algorithm (a), which is:
> 
>   Everytime we manage to connect to a guard, if it's not the top
>   guard in our list, mark all previous guards as retriable and try
>   again from the top.
> 
> For me, this seems like the most complete way of ensuring that the
> guards in the top of the list are always going to get a fair hearing
> even if the network was down when they were first probed.
> 
> However, that algorithm is not without its problems. Here are some
> notes:
> 
> - This algorithm suffers from an infinite loop.
> 
>   The naive version suffers from an obvious infinite loop if the first
>   guards in our list are actually down. To protect against that, we
>   will probably need to rewrite the algorithm to:
> 
>     Everytime we manage to connect to a guard, if it's not the top
>     guard in our list, mark all previous guards as retriable and try
>     again from the top: this time pick whichever guard we can connect
>     to even if it's not the top one.
> 
> - This algorithm is not actually robust.
> 
>   If we wanted to design a truly robust algorithm, it would have to be
>   robust even against adversarial "network down" events. That is, the
>   algorithm would need to work well even if you imagine the network as
>   an "on"/"off" switch that the adversary can toggle at will.
> 
>   Here is how that algorithm can fail against such an adversary:
> 
>   1) Tor starts up. Adversary switches network off.
>   2) Tor starts going through guards and fails to connect to them.
>   3) Adversary switches network on.
>   4) Tor detects "network up" event, marks all guards as up and goes from the top.
>   5) Adversary switches network off.
>   6) Tor starts going through guards and fails to connect to them.
>   7) Adversary switches network on, and Tor establishes a circuit to
>      the guard that it was currently walking over.
> 
>   If the adversary is a LAN adversary, she learns the order of the
>   guards in Tor's list, so she can basically choose whichever guard
>   she wants.
> 
>   FWIW, whether such an adversary is realistic is up for debate.
> 
> - This algorithm is not very elegant.
> 
> - This algorithm might make guard fingerprinting worse.
> 
>   Imagine that the first 2 guards in your list are actually
>   down. Everytime Tor detects a "network up" event, it will attempt to
>   connect to those 2 dead guards before connecting to the third guard
>   which is up.

This is true, but in this case Tor will only try connecting to the
first two guards in this case for a few hours, until they're dropped
from the consensus. But, in terms of fingerprinting, "hours" is too
long, which I think is part of your point.

Can we do something simple like:
   1) Tor starts up. Adversary switches network off.
   2) Tor starts going through guards and fails to connect to them.
   3) Adversary switches network on.
   4) Tor detects "network up" event, marks all guards as up and goes from the top.
   5) Adversary switches network off.
   6) Tor starts going through guards and fails to connect to them.
   7) Adversary switches network on, and Tor establishes a circuit to
      the guard that it was currently walking over.
   8) Tor walks the list of guards from the top, again.
   9) Tor establishes connection with first available Guard
     9a) If guard in 9) is different and earlier in the list than guard
         in 7), drop 7) and use 9).
     9b) or, if guard in 9) is the same or later in the list than 7),
         then use 7)

This has some disadvantages, but at least we will have a baseline
(using guard in (7) when we rewalk the list) and a partial guarantee that
if we can't connect to the guard then it is probably down and not that
the network is faulty.