[tor-dev] [RFC] On new guard algorithms and data structures

Thu Aug 20 11:28:26 UTC 2015

Hello there,

Recently we've been busy specifying various important improvements to entry
guard security. For instance, see proposals 250 and 241, and ticket #16861.

Unfortunately, the current guard codebase is dusty and full of problems (see
#12466, #12450). We believe that refactoring and cleaning up the entry guard
code is essential before we proceed to more advanced security improvements.

We've been working on new algorithms and data structures for guard nodes as part
of ticket #12595.

In this mail I include some pseudocode for the new algorithm, in the hope that
it will serve as a draft for implementing these changes. You can find the
pseudocode here:

   https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug12595

A short description of the algorithm is included on top, and then various
methods and functions are prototyped underneath to make the logic more concrete.

Apart from the comments and XXXs on the code, here are some more thoughts on
this work:

- This new design focuses on protecting against path bias attacks, at the cost
  of slightly reduced reachability.

  Specifically, the old design is better at recovering in filtered networks,
  because it keeps adding new nodes until one succeeds. In this new design, we
  will not try more than 80 relays at a time. So if none of them is reachable
  through the filtered network: bad luck, no Tor.

  While this failure mode should not happen often, it's bad news for users
  behind FascistFirewalls, which are actually quite common. A quick fix here
  would be to always add an 80/443 guard to our list. However, as it stands only
  30% of the guards are 80/443 guards, so this has bad anonymity consequences.
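
  To make this trade-off concrete, here is a minimal sketch in C of a capped
  guard search behind a firewall that only passes ports 80 and 443. It is not
  the pseudocode from the branch: sketch_guard_t, sketch_port_allowed(),
  sketch_pick_guard() and SKETCH_MAX_GUARDS_TRIED are made-up names, and the
  cap simply reuses the figure of 80 quoted above.

  ```c
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical cap, reusing the "80 relays" figure from the text. */
  #define SKETCH_MAX_GUARDS_TRIED 80

  /* Hypothetical, simplified guard record; not Tor's real structure. */
  typedef struct {
    unsigned short or_port; /* port the guard listens on */
  } sketch_guard_t;

  /* Stand-in for a firewall that only lets ports 80 and 443 through. */
  bool
  sketch_port_allowed(unsigned short port)
  {
    return port == 80 || port == 443;
  }

  /* Walk the candidate list in order and give up after
   * SKETCH_MAX_GUARDS_TRIED attempts. Returns the index of the first
   * reachable guard, or -1 if none was found within the cap. */
  int
  sketch_pick_guard(const sketch_guard_t *guards, size_t n_guards)
  {
    size_t limit = n_guards < SKETCH_MAX_GUARDS_TRIED
                     ? n_guards : SKETCH_MAX_GUARDS_TRIED;
    for (size_t i = 0; i < limit; i++) {
      if (sketch_port_allowed(guards[i].or_port))
        return (int)i;
    }
    return -1;
  }
  ```

  With 100 candidates, a client whose only reachable guard sits at index 90
  gets -1 here, whereas an uncapped search would eventually have found it.
  That is the reachability cost described above.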

- To improve our algorithm and make it more robust we need to understand further
  what kind of path bias attacks are relevant here. The adversary here is a
  network adversary (like a gateway) that can block our connections to certain
  guards. What nasty attacks can this adversary do?

  If we can't find bad attacks here, then maybe we should stop worrying about
  those path bias attacks so much.

  For example, one threat with the old guard logic is that if we used this evil
  gateway for just 10 minutes (say, in an airport), the adversary could launch
  a path bias attack and force us to connect to her guard node. Then, even after
  we left that airport, we would still stick to the evil guard node, which is
  bad.

  Also, an adversary that manages to own our guard using path bias attacks then
  has further possibilities for biasing the rest of the circuit. What can this
  adversary do?

- Notice that the pseudocode contains no logic about bridges. I'm not sure how
  bridges should be handled here.

- I tried to keep the dirguard logic very simple, hoping that we can eventually
  forget about dirguards entirely when #12538 is done.

  The main dirguard feature is that we assume that populate_live_entry_guards()
  and add_an_entry_guard() will return dirguards when the circuit is a directory
  circuit.

  Maybe we should consider introducing the "primary dirguard" concept as well.
  And maybe also add some logic where Tor moves on to the next dirguard if it
  fails to receive a document from the current dirguard.
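
  The "move on to the next dirguard" idea could look roughly like the sketch
  below. This is only an illustration: sketch_next_dirguard() is a made-up
  helper, and treating the dirguards as an ordered list that wraps around is an
  assumption, not something the pseudocode specifies.

  ```c
  #include <stdbool.h>
  #include <stddef.h>

  /* Return the index of the dirguard to use for the next directory fetch.
   * If the last fetch succeeded we stay on the current dirguard; otherwise
   * we advance to the next one, wrapping around at the end of the list. */
  size_t
  sketch_next_dirguard(size_t current, size_t n_dirguards, bool last_fetch_ok)
  {
    if (n_dirguards == 0)
      return 0; /* nothing to choose from; the caller has to handle this */
    if (last_fetch_ok)
      return current;
    return (current + 1) % n_dirguards;
  }
  ```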

- I used the ATTEMPTED_THRESHOLD concept of prop241, but did not use the
  NET_THRESHOLD and CONNECTED_THRESHOLD ideas.

  I removed NET_THRESHOLD because I increased the value of ATTEMPTED_THRESHOLD
  to the point that it can also be used as a network down indicator.

  Also, I was not sure what CONNECTED_THRESHOLD was useful for, and there were
  certain engineering issues with it (for example, if that threshold is hit, we
  need logic that will *only retry the successfully connected guards*, and not
  all guards).
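
  As a rough illustration of using a single attempt counter as a "network
  down" indicator, consider the C sketch below. The struct, the function names
  and the value 80 are placeholders chosen for this example; they are not taken
  from prop241 or from the pseudocode.

  ```c
  #include <stdbool.h>

  /* Placeholder value; the real constant is still subject to change. */
  #define SKETCH_ATTEMPTED_THRESHOLD 80

  /* Hypothetical bookkeeping; not a real Tor structure. */
  typedef struct {
    unsigned n_attempted; /* guards tried without success since last success */
  } sketch_attempt_state_t;

  /* Record a failed connection attempt to a guard. The counter saturates
   * at the threshold so it cannot overflow. */
  void
  sketch_note_failure(sketch_attempt_state_t *st)
  {
    if (st->n_attempted < SKETCH_ATTEMPTED_THRESHOLD)
      st->n_attempted++;
  }

  /* A successful connection resets the counter. */
  void
  sketch_note_success(sketch_attempt_state_t *st)
  {
    st->n_attempted = 0;
  }

  /* True once so many guards have been attempted that we treat it as a
   * sign that the network itself may be down. */
  bool
  sketch_assume_network_down(const sketch_attempt_state_t *st)
  {
    return st->n_attempted >= SKETCH_ATTEMPTED_THRESHOLD;
  }
  ```

  What the caller does once sketch_assume_network_down() returns true (for
  instance, going back to guards already on the list instead of adding new
  ones) is left open here.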

- There is no log message warning the user of path bias attacks or bad network
  or anything. That's because there is no way to figure out what the problem
  is, and issuing an alarming log message here would confuse and panic the user.

  If we want to inform the user anyhow, then maybe when the user is *actively*
  trying to visit a destination and we've been cycling through our guard list
  for ages, we should issue a log message telling the user that something is
  wrong with the network.
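
  Such a check could be as small as the following C sketch. Everything in it
  is hypothetical: the function name, the convention that a zero timestamp
  means "not cycling", and the 10 minute cutoff standing in for "ages".

  ```c
  #include <stdbool.h>
  #include <time.h>

  /* Arbitrary stand-in for "ages": ten minutes, in seconds. */
  #define SKETCH_CYCLING_WARN_AFTER (10 * 60)

  /* Decide whether to tell the user that something is wrong with the
   * network. cycling_since is the time we started cycling through the
   * guard list without success, or 0 if we are not cycling right now. */
  bool
  sketch_should_warn_user(bool user_is_active, time_t cycling_since,
                          time_t now)
  {
    if (!user_is_active)
      return false; /* nobody is waiting on us, so stay quiet */
    if (cycling_since == 0)
      return false; /* not cycling through the guard list */
    return now - cycling_since >= SKETCH_CYCLING_WARN_AFTER;
  }
  ```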

- In general, I tried to keep the number of heuristics and kludges to a
  minimum to keep the logic simple. Unfortunately, it seems that without a
  "network down" indicator (#16120) there is no way to avoid edge cases and
  false positives here.

  We should try to fix all problems here that can occur frequently or have
  security consequences, but there will always be scenarios where Tor will end
  up thinking there is no network while it's actually on a filternet. For this
  reason, we should test this feature thoroughly before we ship it to real
  users!

- Finally, all the constants & parameters in the pseudocode are subject to
  change. I tried to motivate some of them, but others are just arbitrary.

Feedback is very welcome, so please let me know of any security or reachability
issues that you find, or of ways the pseudocode could be altered to make it more
useful for implementors.

Cheers!