[tor-dev] Getting people started on guard selection algorithms [prop259]

Wed Feb 3 14:45:49 UTC 2016

> On Tue, Feb 02, 2016 at 02:29:22PM -0500, Ola Bini wrote:
>> Hi,
>> 
>> We have now started looking at the proposal and the existing code. Our
>> current plan is to first of all code more of the simulations referred
>> to in stuff-to-test.txt. We also noticed that the hashring
>> implementation for choosing the [DYSTOPIC/UTOPIC]_GUARDLIST hasn't
>> been implemented in the simulation code - in fact, in the code it
>> seems the [DYSTOPIC/UTOPIC]_GUARDS is used as the guardlist. We are
>> planning on implementing the two different varieties of the guardlist
>> selection algorithm as well.
>> 

Sounds good. 

Another thing to note is that during our last meeting, we decided not to make
the utopic/dystopic guardlists disjoint (mainly for load balancing reasons):
        https://lists.torproject.org/pipermail/tor-dev/2016-January/010265.html
Check the meeting logs if you care for more info.

Unfortunately, I think prop259 has not been updated to specify this new behavior.

This might be too researchy for what you are trying to do, but finding a nice
behavior here would be very helpful.

Here is an example idea for doing the 80/443 fascist firewall detection
heuristic without two disjoint guard pools:

   Alice initializes a single guard list with all the guard nodes. Then she
   does steps 1 to 4 from §2 of prop259. Then in step 5, if Alice has tried
   more than GUARDLIST_FAILOVER_THRESHOLD guards from her guard list, she goes
   into "dystopic firewall" mode. During this mode, Alice only picks 80/443
   nodes as guards (maybe from a separate dystopic guardlist). If those don't
   work either (she tries GUARDLIST_FAILOVER_THRESHOLD of them), then Alice
   "should make a note to herself that the network has potentially gone down"
   as suggested by step 5.b.

So in the above idea, there are two guardlists. Guardlist GUARDS has all the
guards in the network, and then there is a DYSTOPIC_GUARDS guard list (which is
a subset of GUARDS) which is only used during "dystopic firewall" mode. I think
this has better load balancing and anonymity properties. But there might be
even better behaviors. Feel free to come up with your own and test them! 
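
A rough Python sketch of the two-stage idea above, in case it helps when
writing the simulation. This is not prop259 and not the existing simulation
code. The threshold value, the can_connect callback, and the dict layout for
guards are all made up for illustration, and the guard list is assumed to be
already ordered by preference:

```python
# Hypothetical value; the real threshold is whatever prop259 settles on.
GUARDLIST_FAILOVER_THRESHOLD = 3
DYSTOPIC_PORTS = {80, 443}


def pick_guard(guards, can_connect, threshold=GUARDLIST_FAILOVER_THRESHOLD):
    """Return (guard, mode); guard is None if both stages fail.

    guards: ordered list of dicts like {"name": "A", "port": 9001}
            (made-up layout, stands in for the GUARDS list).
    can_connect: callback guard -> bool, stands in for a connection attempt.
    """
    # Stage 1: try up to `threshold` guards from the full guard list.
    for guard in guards[:threshold]:
        if can_connect(guard):
            return guard, "normal"

    # Stage 2: "dystopic firewall" mode, only 80/443 guards are tried.
    # DYSTOPIC_GUARDS is a subset of GUARDS, not a disjoint pool.
    dystopic_guards = [g for g in guards if g["port"] in DYSTOPIC_PORTS]
    for guard in dystopic_guards[:threshold]:
        if can_connect(guard):
            return guard, "dystopic"

    # Both stages exhausted: note that the network may be down (step 5.b).
    return None, "network-down"
```

Calling pick_guard() with a can_connect that only accepts 80/443 guards
should end up in "dystopic" mode, which is the firewall case the heuristic is
trying to detect.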

>> After that, we are going to start looking at where it fits in the main
>> Tor code base.
>> 

Sounds good. You might enjoy choose_random_entry_impl() as a starting point.

Feel free to ask us any questions you have about the Tor code base, either here
or on IRC.

>> The only thing I'm a bit unclear about from the specification is the
>> idea of primary guards, and what the procedure is when no primary
>> guards are possible - 259.§2.3 talks about "all available and fitting
>> entry guards" - is this from the list of primary guards or the
>> guardlist?
>> 

IIUC it's from the whole guardlist, but this should only happen if the primary
guards are unreachable.

The idea with primary guards is that in an ideal world, a Tor client would
always connect only to the top guard of its guardlist, to expose itself
minimally to the network. Unfortunately, the network is fiddly, so this is not
possible: the top guard will eventually go down. The concept of primary
guards tries to compensate for that by going to extra lengths to ensure that
you at least always connect to one of the top N=3 guards in your guard
list. It does this by periodically checking the reachability of those top N=3
guards, and marking them online if they are reachable (see step 2 of §2). So,
even if you or your guards have reachability issues and you drift to your 12th
guard or something, you will eventually come back to one of your primary
guards when they are found online again.

Also, for the above heuristic to work, nodes that are not listed in the latest
consensus should not be considered primary guards.
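
To make the primary guard behavior above concrete, here is a minimal Python
sketch. The function names, the use of plain strings for guards, and the
"reachable" set (standing in for the periodic reachability checks) are all
assumptions for illustration, not anything from the proposal or the Tor code:

```python
N_PRIMARY_GUARDS = 3  # the N=3 mentioned above


def primary_guards(guardlist, consensus, n=N_PRIMARY_GUARDS):
    """Top n guards of the guardlist that are listed in the consensus."""
    listed = [g for g in guardlist if g in consensus]
    return listed[:n]


def choose_guard(guardlist, consensus, reachable):
    """Prefer a reachable primary guard; otherwise walk down the guardlist.

    reachable: set of guards currently believed to be online (hypothetical
               stand-in for the result of the reachability checks).
    """
    # Always look at the primary guards first, so that a client that has
    # drifted far down its list comes back as soon as one is online again.
    for guard in primary_guards(guardlist, consensus):
        if guard in reachable:
            return guard

    # No primary guard is reachable: fall back to the rest of the guardlist.
    for guard in guardlist:
        if guard in consensus and guard in reachable:
            return guard
    return None
```

For example, with a 12-entry guardlist where only the 12th guard is
reachable, choose_guard() returns that 12th guard; once one of the primaries
shows up in "reachable" again, the next call returns the primary instead.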

>> 259.§2.4 says "adds a new entry guard" - is that adding it to the list
>> of primary guards or something else?
>> 

Good question. I'm not sure what the proposal means there. Maybe isis can
clarify this further?

Here is an attempt to help. Hope I don't confuse you further.

The currently implemented Tor guard algorithm keeps a list USED_GUARDS of the
guards it has already connected to (it's also saved on disk and is the guard
list you see in your Tor state file). Every time Tor tries a new guard node, it
adds it to USED_GUARDS. The top N guards of USED_GUARDS are the primary
guards. If you have exhausted all of the guards in USED_GUARDS and you still
can't connect, then you need to add a new node to USED_GUARDS and attempt to
connect to it. I imagine that the "list of all available and fitting entry
guards" referenced in step 3 is something like USED_GUARDS.

(It's actually not called USED_GUARDS in the code. I just named it like this
for this email.)
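
A small Python sketch of the USED_GUARDS behavior described above. This is
not how the Tor code is written; next_guard(), can_connect, and the uniform
random choice of a new guard are assumptions made for this example (the
selection of a new guard in real Tor is more involved):

```python
import random


def next_guard(used_guards, all_guards, can_connect, rng=random):
    """Return a guard we can connect to, extending used_guards if needed.

    used_guards: ordered list, mutated in place (stands in for USED_GUARDS).
    all_guards: every guard currently available in the network.
    can_connect: callback guard -> bool, stands in for a connection attempt.
    """
    # First walk the guards we have already used, in order.
    for guard in used_guards:
        if can_connect(guard):
            return guard

    # USED_GUARDS is exhausted: add new guards one at a time and try each.
    candidates = [g for g in all_guards if g not in used_guards]
    while candidates:
        guard = rng.choice(candidates)  # assumption: uniform choice
        candidates.remove(guard)
        used_guards.append(guard)  # every guard we try gets recorded
        if can_connect(guard):
            return guard
    return None
```

The point to notice is that used_guards only grows when every guard already
in it has failed, so the list records every guard the client has ever tried.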