[tor-dev] [RFC] On new guard algorithms and data structures

Thu Sep 3 06:22:58 UTC 2015

George Kadianakis:
> Hello there,
> 
> recently we've been busy specifying various important improvements to entry
> guard security. For instance see proposals 250, 241 and ticket #16861.
> 
> Unfortunately, the current guard codebase is dusty and full of problems (see
> #12466, #12450). We believe that refactoring and cleaning up the entry guard
> code is essential before we proceed to more advanced security improvements.
> 
> We've been working on new algorithms and data structures for guard nodes as part
> of ticket #12595.
> 
> In this mail I include some pseudocode for this new algorithm with the hope that
> it will act as a draft for implementing these changes. You can find the pseucode
> here:
> 
>    https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug12595

Have you considered how these data structures might change for Prop 247?

I like Prop 247, and think it is a significant step in the right
direction for the HS-specific problem (though I'd like to think a bit
more about how it would compare to a more general virtual circuit
mechanism that could also protect clients even for instances of
application-specific Guard discovery).

More generally, I'm worried in that this is yet another case where
upstream deliverables that were written over a year ago are going to
cause us to arrive at a sub-optimal solution that we're forced to deploy
only to rip out later. Hurray deliverables!

In terms of specific suggestions: I find that code conceptually makes
more sense when it at least pretends to be object oriented. You might
want to consider a single guardset object that holds all of the Guard
configuration state, including the main guard list, vanguards, path bias
info, etc. This object would then be passed in as the first argument to
all guard-related functions, especially if we're talking about
overhauling it anyway.

All guard-related functions would then be prefixed with guardset_ as
well, to make it clear that you could find them in guardset.c. (Also
note I deliberately used guardset_ instead of guardlist_ due to Prop
247).

> A short description of the algorithm is included on top, and then various
> methods and functions are prototyped underneath to make the logic more concrete.
> 
> Apart from the comments and XXXs on the code, here are some more thoughts on
> this work:
> 
> - This new design focuses on protecting against path bias attacks, by slightly
>   damaging our reachability.
> 
>   Specifically, the old design is better at recovering in filtered networks,
>   because it will keep on adding new nodes till one succeeds. In this new
>   design, we will not try more than 80 relays per time. So if none of them
>   passes the filtered network, bad luck no Tor.
> 
>   While this failure mode should not happen much, it's bad news for users behind
>   FascistFirewalls which are actually quite frequent. A quick fix here would be
>   to always add an 80/443 guard on our list, however as it stands only 30% of
>   the guards are 80/443 guards, so this has bad anonymity consequences.

I had mixed feelings about not trying to handle this better in the past,
but given that such users can use the default Tor Browser bridges (and
likely effectively have to, since bootstrapping takes tens of minutes
anyway in this case) it may not be worth micro-optimizing for?

> - To improve our algorithm and make it more robust we need to understand further
>   what kind of path bias attacks are relevant here. The adversary here is a
>   network adversary (like a gateway) that can block our connections to certain
>   guards. What nasty attacks can this adversary do?

I see this attack vector as adding two capabilities:

1. The adversary can induce you to connect to only its chosen guards to
find you later on other networks through fingerprinting (as others have
mentioned).

2. The adversary gets to perform attacks that would otherwise be
impossible without the Guard node's identity key (which is required in
order to unwrap TLS). There are lots of these attacks: XOR-based tagging
of the cipherstream to force you to use specific exits, other instances
of per-circuit failure, HS circuit fingerprinting, circuit-level traffic
analysis, etc.

> - In general, I tried to keep the number of heuristics and kludges to the
>   minimum to keep the logic simple. Unfortunately, it seems that without a
>   "network down" indicator (#16120) there is no way to avoid edge cases and
>   false positives here.

If you can't find a more reliable OS-specific indicator, you might look
at circuit_build_times_network_is_live() and where it is called. I think
the channel code also sets indicators in similar places.

-- 
Mike Perry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Digital signature
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20150902/64d527d5/attachment.sig>