Re: [tor-dev] [RFC] On new guard algorithms and data structures

9 Sep 2015

      Mike Perry <mikeperry@torproject.org> writes:
...
George Kadianakis:
...
Hello there,
recently we've been busy specifying various important improvements to entry
guard security. For instance see proposals 250, 241 and ticket #16861.
Unfortunately, the current guard codebase is dusty and full of problems (see
#12466, #12450). We believe that refactoring and cleaning up the entry guard
code is essential before we proceed to more advanced security improvements.
We've been working on new algorithms and data structures for guard nodes as part
of ticket #12595.
In this mail I include some pseudocode for this new algorithm with the hope that
it will act as a draft for implementing these changes. You can find the pseucode
here:
https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug...
Have you considered how these data structures might change for Prop 247?
I like Prop 247, and think it is a significant step in the right
direction for the HS-specific problem (though I'd like to think a bit
more about how it would compare to a more general virtual circuit
mechanism that could also protect clients even for instances of
application-specific Guard discovery).
More generally, I'm worried in that this is yet another case where
upstream deliverables that were written over a year ago are going to
cause us to arrive at a sub-optimal solution that we're forced to deploy
only to rip out later. Hurray deliverables!
In terms of specific suggestions: I find that code conceptually makes
more sense when it at least pretends to be object oriented. You might
want to consider a single guardset object that holds all of the Guard
configuration state, including the main guard list, vanguards, path bias
info, etc. This object would then be passed in as the first argument to
all guard-related functions, especially if we're talking about
overhauling it anyway.
I agree about looking at this from an OOP prespective.

I think the main class that is introduced here is the guardset (or guardlist).
The second class is the guard itself, which is basically a node_t in the code.

A guardlist contains many guards, and knows how to to manipulate them correctly. 

The way I was thinking that this would work along with prop247, is that each
layer of guards will be a separate guardset. So when you pick your first-layer
guard, you pick it from the first-layer guardset and when you pick your
second-layer guards, you pick them from the second-layer guardset.

Now, if we want to do guard buckets for the third layer in a way that each
second-layer guard has a dedicated guardset of third-layer guards, we will need
to split the third-layer guards to N guardsets, and assign each of these
guardset to one of the N second-layer guards.

This way, when you make a circuit through second-layer guard k, you will only
use the third-layer guardset that corresponds to k.

There are probably more subtetlies to this design, but this is how I imagine
these two proposals to interface in basic terms.
...
All guard-related functions would then be prefixed with guardset_ as
well, to make it clear that you could find them in guardset.c. (Also
note I deliberately used guardset_ instead of guardlist_ due to Prop
247).
...
A short description of the algorithm is included on top, and then various
methods and functions are prototyped underneath to make the logic more concrete.
Apart from the comments and XXXs on the code, here are some more thoughts on
this work:
- This new design focuses on protecting against path bias attacks, by slightly
  damaging our reachability.
Specifically, the old design is better at recovering in filtered networks,
  because it will keep on adding new nodes till one succeeds. In this new
  design, we will not try more than 80 relays per time. So if none of them
  passes the filtered network, bad luck no Tor.
While this failure mode should not happen much, it's bad news for users behind
  FascistFirewalls which are actually quite frequent. A quick fix here would be
  to always add an 80/443 guard on our list, however as it stands only 30% of
  the guards are 80/443 guards, so this has bad anonymity consequences.
I had mixed feelings about not trying to handle this better in the past,
but given that such users can use the default Tor Browser bridges (and
likely effectively have to, since bootstrapping takes tens of minutes
anyway in this case) it may not be worth micro-optimizing for?
As always it's a tradeoff between security (against path bias attacks) and
connectivity (or how quickly if ever you manage to connect to Tor).

At this point, it might make sense to default to security, but make it easy for
filtered people to get connectivity. So, like, if we fail to bootstrap for a
while, maybe it makes sense for tor-launcher to present a "connect me anyway!"
option to the user (which will use bridges or turn on FascistFirewall).
...
...
- To improve our algorithm and make it more robust we need to understand further
  what kind of path bias attacks are relevant here. The adversary here is a
  network adversary (like a gateway) that can block our connections to certain
  guards. What nasty attacks can this adversary do?
I see this attack vector as adding two capabilities:
1. The adversary can induce you to connect to only its chosen guards to
find you later on other networks through fingerprinting (as others have
mentioned).
2. The adversary gets to perform attacks that would otherwise be
impossible without the Guard node's identity key (which is required in
order to unwrap TLS). There are lots of these attacks: XOR-based tagging
of the cipherstream to force you to use specific exits, other instances
of per-circuit failure, HS circuit fingerprinting, circuit-level traffic
analysis, etc.
...
- In general, I tried to keep the number of heuristics and kludges to the
  minimum to keep the logic simple. Unfortunately, it seems that without a
  "network down" indicator (#16120) there is no way to avoid edge cases and
  false positives here.
If you can't find a more reliable OS-specific indicator, you might look
at circuit_build_times_network_is_live() and where it is called. I think
the channel code also sets indicators in similar places.