[tor-dev] [RFC] On new guard algorithms and data structures

s7r s7r at sky-ip.org
Thu Aug 20 12:27:18 UTC 2015



On 8/20/2015 2:28 PM, George Kadianakis wrote:
> Hello there,
> recently we've been busy specifying various important improvements 
> to entry guard security. For instance see proposals 250, 241 and 
> ticket #16861.
> Unfortunately, the current guard codebase is dusty and full of 
> problems (see #12466, #12450). We believe that refactoring and 
> cleaning up the entry guard code is essential before we proceed to 
> more advanced security improvements.
> We've been working on new algorithms and data structures for guard 
> nodes as part of ticket #12595.
> In this mail I include some pseudocode for this new algorithm with 
> the hope that it will act as a draft for implementing these 
> changes. You can find the pseudocode here:
> https://gitweb.torproject.org/user/asn/tor.git/tree/src/or/guardlist.c?h=bug12595
> A short description of the algorithm is included on top, and then
> various methods and functions are prototyped underneath to make
> the logic more concrete.
> Apart from the comments and XXXs on the code, here are some more 
> thoughts on this work:
> - This new design focuses on protecting against path bias attacks, 
> by slightly damaging our reachability.
> Specifically, the old design is better at recovering in filtered 
> networks, because it will keep on adding new nodes till one 
> succeeds. In this new design, we will not try more than 80 relays 
> at a time. So if none of them gets through the filtered network, bad
> luck: no Tor.
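To make the "no more than 80 relays at a time" rule concrete, here is a minimal C sketch. It is not taken from the guardlist.c pseudocode: the function name, the constant MAX_GUARDS_ATTEMPTED, and the reachable[] array (which stands in for the outcome of each connection attempt) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap mirroring "we will not try more than 80 relays at a time". */
#define MAX_GUARDS_ATTEMPTED 80

/* Try guards in list order, but never more than MAX_GUARDS_ATTEMPTED of them.
 * reachable[i] simulates whether a connection to guard i would succeed.
 * Returns the index of the first reachable guard, or -1 if every attempted
 * guard failed (e.g. a filtered network that blocks all of them). */
long pick_first_reachable_guard(const bool *reachable, size_t n_guards)
{
  size_t limit = n_guards < MAX_GUARDS_ATTEMPTED ? n_guards
                                                 : MAX_GUARDS_ATTEMPTED;
  for (size_t i = 0; i < limit; i++) {
    if (reachable[i])
      return (long)i;
  }
  /* Unlike the old design, we do not keep adding new guards here. */
  return -1;
}
```

With 100 guards where only guard 90 is reachable, this returns -1: the one working guard is never tried, which is the reachability cost described above.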

This number looks good to me. Could you make it dynamic, so that we
don't have to change this code in the future? I'm being optimistic
here about Tor's future scale. For example, calculate:
GUARDS_ATTEMPTED_THRESHOLD = 'total number of Guards in the consensus' * 0.05
and update it in our 'State' every time we receive a valid new
consensus document that changes it. The changes should be slight,
like maybe 78, maybe 82, etc. If the result of the above calculation
is not a whole number, round down (e.g. if the result is 81.6, set
the limit to 81).
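A minimal sketch of the dynamic threshold, assuming a hypothetical guard_state_t and a hypothetical hook that runs after each validated consensus. It uses integer arithmetic (5% as 50 per mille) so the truncating division performs the round-down, e.g. 1632 Guards gives 81.6, which becomes 81.

```c
#include <assert.h>
#include <stddef.h>

/* 5% expressed in per mille so integer division rounds down for us. */
#define GUARDS_ATTEMPTED_PERMILLE 50

/* Hypothetical slice of the persistent 'State'. */
typedef struct {
  size_t attempted_threshold;
} guard_state_t;

/* floor(n_guards * 0.05) without floating point. */
size_t guards_attempted_threshold(size_t n_guards)
{
  return n_guards * GUARDS_ATTEMPTED_PERMILLE / 1000;
}

/* Hypothetical hook: call only after a new consensus has been validated. */
void guard_state_on_new_consensus(guard_state_t *state, size_t n_guards)
{
  state->attempted_threshold = guards_attempted_threshold(n_guards);
}
```

With about 1600 Guards in the consensus this reproduces the current value of 80, and it drifts slowly (78, 82, ...) as the network changes.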

> While this failure mode should not happen much, it's bad news for 
> users behind FascistFirewalls which are actually quite frequent. A 
> quick fix here would be to always add an 80/443 guard on our list, 
> however as it stands only 30% of the guards are 80/443 guards, so 
> this has bad anonymity consequences.

Bad idea for anonymity, and also not a very good idea for load
balancing (80/443 Guards might get hammered more). We do have a torrc
option for this (FascistFirewall): users in that situation should
enable it so Tor will only look for 80/443 Guards, or use bridges.

> - To improve our algorithm and make it more robust we need to 
> understand further what kind of path bias attacks are relevant 
> here. The adversary here is a network adversary (like a gateway) 
> that can block our connections to certain guards. What nasty 
> attacks can this adversary do?
> If we can't find bad attacks here, then maybe we should stop 
> worrying about those path bias attacks so much.
> For example a threat here with the old guard logic, is that if we 
> used this evil gateway just for 10 minutes (in an airport), the 
> adversary could launch a path bias attack and force us to connect 
> to her guard node. Then even after we left that airport, we would 
> still stick to the evil guard node which is bad.

That is why we have some primary guards which we retry for some time,
and which we do not remove from the list if we cannot connect to them
once or twice. Our network could be down, or the Guard's network could
be down, etc.
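One way to express "retry for some time, don't drop after one or two failures" is to demote a primary guard only when it has failed repeatedly over a whole retry period. This is a sketch under assumptions: PRIMARY_MAX_FAILURES, PRIMARY_RETRY_PERIOD and the struct are hypothetical, and the values are illustrative, not taken from any proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical tuning knobs, illustrative values only. */
#define PRIMARY_MAX_FAILURES 3
#define PRIMARY_RETRY_PERIOD (24 * 60 * 60) /* seconds */

typedef struct {
  int failures;         /* consecutive failed connection attempts */
  time_t first_failure; /* start of the current run of failures */
} primary_guard_t;

void primary_guard_note_success(primary_guard_t *g)
{
  g->failures = 0;
  g->first_failure = 0;
}

void primary_guard_note_failure(primary_guard_t *g, time_t now)
{
  if (g->failures == 0)
    g->first_failure = now;
  g->failures++;
}

/* Demote only after repeated failures spanning the whole retry period,
 * so a briefly-down local or guard network does not cost us the guard. */
bool primary_guard_should_demote(const primary_guard_t *g, time_t now)
{
  return g->failures >= PRIMARY_MAX_FAILURES &&
         now - g->first_failure >= PRIMARY_RETRY_PERIOD;
}
```

Requiring both a count and a time span means a short hostile episode (the airport example) cannot by itself push us off our primary guards.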

> Also, an adversary that manages to own our guard using path bias 
> attacks, then has further possibilites for biasing the rest of the 
> circuit. What can this adversary do?

Would it make sense for Tor to change its Guard if more than n
circuits through it fail within a given time window? If the attacker
owns our guard and wants to mount a path bias attack on the rest of
the circuit, this will cause a lot of circuit failures on the client
side, since the client is the one who selects the path. We should use
this as a metric to detect this possibility and defend against it.
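A minimal sketch of the "more than n failures in a time window" metric, using a ring buffer of the most recent failure timestamps. CIRC_FAIL_MAX, CIRC_FAIL_WINDOW and circ_fail_note() are hypothetical names with illustrative values; real thresholds would need measurement so that ordinary network flakiness does not trigger a guard change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical: suspicious if more than CIRC_FAIL_MAX circuit failures
 * occur within CIRC_FAIL_WINDOW seconds. */
#define CIRC_FAIL_MAX    10
#define CIRC_FAIL_WINDOW (10 * 60)

/* Timestamps of the last CIRC_FAIL_MAX + 1 failures. */
typedef struct {
  time_t stamps[CIRC_FAIL_MAX + 1];
  size_t next;  /* slot to overwrite next, i.e. the oldest once full */
  size_t count; /* valid entries, at most CIRC_FAIL_MAX + 1 */
} circ_fail_window_t;

/* Record a failure. Returns true if more than CIRC_FAIL_MAX failures
 * fell inside the last CIRC_FAIL_WINDOW seconds. */
bool circ_fail_note(circ_fail_window_t *w, time_t now)
{
  w->stamps[w->next] = now;
  w->next = (w->next + 1) % (CIRC_FAIL_MAX + 1);
  if (w->count < CIRC_FAIL_MAX + 1)
    w->count++;
  if (w->count < CIRC_FAIL_MAX + 1)
    return false;
  /* Buffer full: oldest of the last CIRC_FAIL_MAX + 1 failures is at next. */
  return now - w->stamps[w->next] <= CIRC_FAIL_WINDOW;
}
```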

> - Notice that the pseudocode contains no logic about bridges. I'm 
> not sure how bridges should be handled here.

Prop#188 is very important for bridges. I'm not sure what algorithm we
could use here, since bridges are deliberately a bit hard to get in
unlimited quantities, and they are fetched and added to Tor manually.

> - I tried to keep the dirguard logic very simple, hoping that we 
> can eventually forget about dirguards entirely when #12538 is 
> done.

Indeed, this is not so important, particularly because a DirGuard is
far less dangerous than an Entry Guard. Just select 3 main DirGuards
and add more to the list until we get a valid consensus document
(which we verify ourselves anyway). After that, keep retrying the 3
main DirGuards for a while and eventually replace them with the
DirGuards we were able to connect to. I suggest retrying a DirGuard 5
times, once every 20 minutes, before we replace it in the primary
DirGuard table.
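The suggested schedule (5 retries, one every 20 minutes, then replacement) can be sketched as a small state machine for a primary DirGuard that is currently failing. The names below are hypothetical; last_attempt is assumed to start at the time of the initial failed attempt.

```c
#include <assert.h>
#include <time.h>

/* Numbers from the suggestion above; names are hypothetical. */
#define DIRGUARD_MAX_RETRIES    5
#define DIRGUARD_RETRY_INTERVAL (20 * 60) /* seconds */

typedef struct {
  int retries;         /* retries done since the first failure */
  time_t last_attempt; /* time of the most recent attempt */
} dirguard_t;

typedef enum {
  DIRGUARD_WAIT,   /* not time to retry yet */
  DIRGUARD_RETRY,  /* retry this primary DirGuard now */
  DIRGUARD_REPLACE /* retries exhausted: swap in a DirGuard we reached */
} dirguard_action_t;

/* Decide what to do with a failing primary DirGuard at time now. */
dirguard_action_t dirguard_next_action(dirguard_t *g, time_t now)
{
  if (g->retries >= DIRGUARD_MAX_RETRIES)
    return DIRGUARD_REPLACE;
  if (now - g->last_attempt < DIRGUARD_RETRY_INTERVAL)
    return DIRGUARD_WAIT;
  g->retries++;
  g->last_attempt = now;
  return DIRGUARD_RETRY;
}
```

Since the consensus is verified independently, this schedule only affects how quickly we fetch directory documents, not which relays we trust.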

We can remove this code when #12538 is done.

> The main dirguard feature is that we assume that 
> populate_live_entry_guards() and add_an_entry_guard() will return 
> dirguards when the circuit is a directory circuit.
> Maybe we should consider introducing the "primary dirguard"
> concept as well. And maybe also add some logic where Tor will move
> on to the next dirguard if it failed to receive a document from
> the current dirguard.
> - I used the ATTEMPTED_THRESHOLD concept of prop241, but I removed
> NET_THRESHOLD because I increased the value of
> ATTEMPTED_THRESHOLD to the point that it can also be used as a 
> network down indicator.


> Also, I was not sure what CONNECTED_THRESHOLD was useful for, and 
> there were certain engineering issues with it (Like, if that 
> threshold is hit, we need a logic that will *only retry the 
> successfully connected guards*, and not all guards).
> - There is no log message warning the user of path bias attacks or 
> bad network or anything.  That's because there is no way to figure 
> out what's the problem, and issuing an alarming log message here 
> would confuse and panic the user.


> If we want to inform the user anyhow, maybe if the user is 
> *actively* trying to visit a destination, and we've been cycling 
> through our guard list for ages, maybe we should then issue a log 
> message telling the user that something is wrong with the network.
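A sketch of that warning condition, under assumptions: a hypothetical guard_status_t tracks how long we have been cycling through the guard list without success and whether the user has a pending request. GUARD_CYCLING_WARN_AFTER and its value are illustrative, not from the pseudocode. The warning fires at most once per outage and never for an idle client.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical: how long we must have been cycling before warning. */
#define GUARD_CYCLING_WARN_AFTER (30 * 60) /* seconds */

typedef struct {
  time_t cycling_since;     /* 0 if we currently have a working guard */
  bool user_stream_pending; /* user is actively trying to reach a destination */
  bool warned;              /* already warned during this outage */
} guard_status_t;

/* True exactly once per outage, and only while the user is waiting. */
bool should_warn_network_problem(guard_status_t *s, time_t now)
{
  if (s->warned || !s->user_stream_pending || s->cycling_since == 0)
    return false;
  if (now - s->cycling_since < GUARD_CYCLING_WARN_AFTER)
    return false;
  s->warned = true;
  return true;
}
```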

If we are under an attack which tries to force us into using a certain
Guard, we need to exit after we have tried everything above and log a
message saying that there's something wrong with the network and Tor
cannot establish connections.

> - In general, I tried to keep the number of heuristics and kludges 
> to the minimum to keep the logic simple. Unfortunately, it seems 
> that without a "network down" indicator (#16120) there is no way
> to avoid edge cases and false positives here.

It's hard to tell the difference between the network being down (for
real) and a gateway that has a consensus document and drops packets
sent to all (or almost all) Guards. There is nothing to do but follow
our protocol, eliminate all the options and exit with a log message.
If restarted, start again, but reuse the same selected primary guards
and other state data and follow the algorithm again; maybe the network
has been fixed. Exit again if not.

> We should try to fix all problems here that can occur frequently
> or have security consequences, but there will always be scenarios 
> where Tor will end up thinking there is no network while it's 
> actually on a filternet. For this reason, we should give plenty of 
> testing to this feature before we ship it to real users!

As I said above, trying to detect that the network is down would make
things a lot more complicated for us, with little benefit, since it
can be trivially gamed. We are not an operating system; we don't care
whether the network is down for real (for all destinations). If the
network is down for Tor (we cannot establish connections to any Guard
or to most Guards [path bias attack]), then for us the network is
down, period.

What is the difference, from Tor's perspective, between having no link
on the network interface and having a link which only forwards packets
to xxx.xxx.xxx.xxx and yyy.yyy.yyy.yyy?

The consensus document (and the list of relays in the network) is
public information. This is simply a limitation here, but not the end
of the world.

> - Finally, all the constants & parameters in the pseudocode are 
> subject to change. I tried to motivate some of them, but others
> are just arbitrary.
> Feedback is very welcome and please let me know of any issues with 
> security or reachability that you find! Or of how the pseudocode 
> should be altered to make it more useful for implementors.
> Cheers!

