Proposal 151: Path Selection Improvements

Mike Perry mikeperry at
Fri Jul 11 23:19:47 UTC 2008

Thus spake Nick Mathewson (nickm at

> On Sun, Jul 06, 2008 at 04:38:56PM -0700, Mike Perry wrote:
>  [...]
> > I've updated this proposal in svn to include plans for migrating away
> > from guards with high failure rates that Fallon and I discussed last
> > night.
> > 
> > For convenience:
> >
> > 
> If we're taking a dynamic approach... who is to do the measurements?
> Clients, or authorities?  Having authorities do measurements seems
> more efficient, and more resilient to attack where the adversary acts
> differently towards different clients in order to change their view of
> the network parameters, or make them use different guards, or such.

The variables in the proposal refer to client parameters and client
storage. We chose a dynamic approach by the clients because the
timeout will vary depending upon the types of connections the clients
have. If the timeout is set from measurements by really fast
authorities (for example, my node has ping times under 10ms to most
places in the US), it may get set too low for modem users in Iran, and
they will churn, creating far more circuits than they should. If we
have the clients set it, we can be sure that we are excluding exactly
n% of the paths of the network for that particular client.
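
A minimal sketch of the per-client idea, assuming each client keeps a list of its own observed circuit build times and places the timeout at an empirical percentile, so that about n% of its own slowest builds fall past the cutoff. The helper name `client_build_timeout` and the nearest-rank percentile method are hypothetical illustrations, not the algorithm specified in the proposal, and the build times below are made up.

```python
def client_build_timeout(build_times_ms, exclude_pct):
    """Return a per-client circuit build timeout (ms) chosen so that about
    exclude_pct percent of this client's own observed build times exceed it.

    Hypothetical helper: uses a nearest-rank empirical percentile, an
    illustrative choice rather than the method specified in the proposal.
    exclude_pct is an integer percentage in [0, 100).
    """
    if not build_times_ms:
        raise ValueError("need at least one observed build time")
    if not 0 <= exclude_pct < 100:
        raise ValueError("exclude_pct must be in [0, 100)")
    ordered = sorted(build_times_ms)
    keep_pct = 100 - exclude_pct
    # Nearest rank, computed with integer ceiling division to avoid float error.
    rank = max(1, (keep_pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Illustrative (made-up) build times for two very different clients.
fast_client = [100, 120, 150, 180, 200, 250, 300, 400, 600, 1200]
modem_client = [2000, 2500, 3000, 3500, 4000, 5000, 6000, 8000, 12000, 30000]

print(client_build_timeout(fast_client, 20))   # 400
print(client_build_timeout(modem_client, 20))  # 8000
```

With a single 400 ms cutoff taken from the fast client, every one of the modem client's builds would time out; computing the cutoff per client keeps the excluded share at roughly 20% for both.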

> On CircuitBuildTimeout, a question: for general-purpose circuits,
> perhaps the behavior of CircuitBuildTimeout is just plain wrong.
> Perhaps the sensible behavior is, instead of discarding the circuit,
> launching a new circuit.  This way, if the circuit is just being slow,
> it can still get used instead of discarded.

Get used when, though? Once it is built, it will still likely be
unbearably slow for normal client traffic.
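
For comparison, the alternative Nick describes (on timeout, launch a new circuit without discarding the slow one, and use whichever finishes first) can be sketched with asyncio. Circuit builds are simulated with `asyncio.sleep`; `build_circuit` and `first_circuit` are hypothetical names, and this says nothing about how Tor itself schedules builds.

```python
import asyncio


async def build_circuit(name, delay_s):
    """Stand-in for a circuit build (hypothetical): just sleeps, then succeeds."""
    await asyncio.sleep(delay_s)
    return name


async def first_circuit(timeout_s, first_delay_s, retry_delay_s):
    """Start one build; if it misses the timeout, keep it running, start a
    second build, and return whichever completes first."""
    first = asyncio.create_task(build_circuit("first", first_delay_s))
    done, _ = await asyncio.wait({first}, timeout=timeout_s)
    if done:
        return first.result()
    # Timed out: instead of discarding the slow build, race it against a new one.
    second = asyncio.create_task(build_circuit("second", retry_delay_s))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the loser once we have a usable circuit
    return done.pop().result()


print(asyncio.run(first_circuit(0.05, 0.08, 0.5)))  # slow build still wins
print(asyncio.run(first_circuit(0.05, 1.0, 0.01)))  # replacement wins
```

This only saves wasted build work; as noted in the reply, a circuit that finishes late may still be too slow to be useful.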

> WRT this paragraph:
> > In addition, we have noticed that some entry guards are much more
> > failure prone than others. In particular, the circuit failure rates for
> > the fastest entry guards were approximately 20-25%, whereas slower
> > guards exhibit failure rates as high as 45-50%.
> I'm very curious about the causes and symptoms of these failures.
> What makes them happen?  How do they look to the client?  This seems
> excessively high, and in addition to responding to these rates (as 151
> proposes) we might also do well to look into reducing them.

By far the most common failure is an unspecified timeout, especially
for the slower guards. I suspect this is due to stream balancing issues
caused by people abusing the Tor network for connection-heavy protocols
such as BitTorrent, which saturate their guards when they happen to
pick a slow one. The next most common failure is DESTROYED with reason
OR_CONN_CLOSED. This also seems correlated with load.
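
A sketch of the kind of per-guard tally behind these numbers, assuming a hypothetical record format of `(guard, outcome)` pairs where `outcome` is `None` for success or a reason string for failure. This is not the actual measurement or graphing script; the records are made up and only reuse the failure reasons named above.

```python
from collections import Counter, defaultdict


def summarize_failures(records):
    """Per-guard failure rate and failure-reason breakdown.

    records: iterable of (guard, outcome) pairs, where outcome is None for a
    successful circuit build or a reason string for a failed one. This record
    format is hypothetical. Returns {guard: (failure_rate, Counter of reasons)}.
    """
    totals = Counter()
    reasons = defaultdict(Counter)
    for guard, outcome in records:
        totals[guard] += 1
        if outcome is not None:
            reasons[guard][outcome] += 1
    return {g: (sum(reasons[g].values()) / totals[g], reasons[g]) for g in totals}


# Made-up records using the failure reasons named above.
records = (
    [("fast_guard", None)] * 3
    + [("fast_guard", "TIMEOUT")]
    + [("slow_guard", None)] * 2
    + [("slow_guard", "TIMEOUT")] * 2
    + [("slow_guard", "DESTROYED/OR_CONN_CLOSED")]
)
for guard, (rate, why) in sorted(summarize_failures(records).items()):
    print(guard, rate, why.most_common())
```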

We can produce more detailed results as soon as we work out a couple
of bugs in the graphing script.

> > In [1], it was
> > demonstrated that failing guard nodes can deliberately bias path
> > selection to improve their success at capturing traffic. For both these
> > reasons, failing guards should be avoided. 
> >
> > We propose increasing the number of entry guards to five, and gathering
> > circuit failure statistics on each entry guard. Any guards that exceed
> > the average failure rate of all guards by 10% after we have
> > gathered ncircuits_to_observe circuits will be replaced.
> If clients make their own measurements on this, there's actually a
> neat class of attack that we'd be enabling.  Instead of a malicious
> guard failing in response to the client building a circuit through a
> non-compromised path, malicious second hops can fail in response to
> circuits built through targeted non-compromised guards.  If they
> manage to raise failure rates for non-malicious guards high enough,
> those guards will stop getting used.
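
The replacement rule from the proposal text quoted above can be sketched as follows. Several details are assumptions, since the quoted text does not fix them: "by 10%" is read as ten percentage points, the average is the unweighted mean of per-guard rates, and every guard must have at least `ncircuits_to_observe` circuits before any replacement happens. The guard names and counts are made up to echo the 20-25% vs 45-50% figures.

```python
def guards_to_replace(stats, ncircuits_to_observe, margin=0.10):
    """Return guards whose failure rate exceeds the average rate of all
    guards by more than `margin`.

    stats maps guard -> (failed_circuits, total_circuits). Assumptions (not
    fixed by the proposal text): "by 10%" means ten percentage points, the
    average is the unweighted mean of per-guard rates, and each guard needs
    at least ncircuits_to_observe circuits before anything is replaced.
    """
    needed = max(1, ncircuits_to_observe)
    if not stats or any(total < needed for _, total in stats.values()):
        return []  # not enough observations yet
    rates = {g: failed / total for g, (failed, total) in stats.items()}
    average = sum(rates.values()) / len(rates)
    return sorted(g for g, r in rates.items() if r > average + margin)


# Five guards with 100 observed circuits each (made-up numbers).
stats = {"A": (22, 100), "B": (25, 100), "C": (20, 100), "D": (24, 100), "E": (48, 100)}
print(guards_to_replace(stats, ncircuits_to_observe=100))  # ['E']
```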

Hrmm. That's true. As soon as the adversary obtains 10% +
.10*natural_failure_rate% of the middle-node network bandwidth, they
would be able to do this. 
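
A short derivation that reproduces this figure, under assumptions stated here rather than in the proposal: every guard has the same natural failure rate $p$; the adversary holds a fraction $b$ of middle-node bandwidth, so about a fraction $b$ of circuits through the targeted guard use an adversary middle hop, and it fails all of those; the client's average failure rate stays near $p$; and "exceed the average by 10%" means ten percentage points.

```latex
% Observed failure rate of the targeted guard:
f_{\text{target}} = b + (1-b)\,p = p + b\,(1-p)

% Replacement triggers once it exceeds the average by ten points:
f_{\text{target}} > p + 0.10
  \iff b > \frac{0.10}{1-p}
  = 0.10\,(1 + p + p^2 + \cdots)
  \approx 0.10 + 0.10\,p
```

For $p = 0.25$ the exact threshold is $0.10/0.75 \approx 0.133$, versus $0.125$ from the first-order approximation.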

The problem is that malicious guard nodes can provide good service to
centralized scanners, so as far as preventing failure-based circuit
biasing goes, centralized scanning won't work either. With a side
channel to detect a colluding adversary at the exit node, failure
attacks can get quite damaging, especially when combined with lying
about bandwidth and failing the circuits on which they don't detect
their peer at the other end.

Centralized measurement of failures would detect the overloaded
condition, but it would not give us the finer-grained
balancing control that would still allow us to use those guards as

Do you think there might be any way to salvage this part of the
proposal, or should it just be cut out?

I do think in general we should increase the number of guards, so that
the user has a better chance of selecting some non-overloaded guard
nodes as their guards...
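
A rough illustration of why more guards help here, assuming each chosen guard is overloaded independently with the same probability. Real guard selection is bandwidth-weighted and without replacement, so this is only a back-of-the-envelope sketch; `p_some_unloaded` is a hypothetical name.

```python
def p_some_unloaded(overloaded_fraction, num_guards):
    """Probability that at least one of num_guards guards is not overloaded,
    assuming each guard is overloaded independently with probability
    overloaded_fraction (ignores bandwidth weighting and sampling without
    replacement, so it is only a rough illustration)."""
    return 1 - overloaded_fraction ** num_guards


for k in (3, 5):
    print(k, p_some_unloaded(0.5, k))
```

With half of the guards overloaded, going from three guards to the proposed five raises this probability from 0.875 to about 0.97.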

> Here's a more interesting attack: Suppose that we have a couple of bad
> guards and they're targeting us in particular.  Let's say that they
> have the same failure rate as average (based on CPU limits or
> connectivity or limited bandwidth or whatever), but that they can
> divert their resources towards particular circuits.  These malicious
> guards should devote extra resources towards some users, and fail all
> the time for the users they aren't targeting.  If they do this, they
> may be able to get the target users to abandon other guards in favor
> of them.

Why does this proposal enable this attack? I was under the impression
this was always possible.
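
One way to see how the quoted scenario interacts with an average-relative replacement rule, using made-up numbers: if malicious guards give a targeted client better-than-average service, they pull that client's average failure rate down, which can push honest guards over the threshold. The rule is the same assumed reading as before (ten percentage points above the unweighted mean), and `flagged` is a hypothetical helper.

```python
def flagged(rates, margin=0.10):
    """Guards whose failure rate exceeds the unweighted mean of all rates by
    more than margin (assumed reading of the proposal's replacement rule)."""
    avg = sum(rates.values()) / len(rates)
    return sorted(g for g, r in rates.items() if r > avg + margin)


# Honest guards fail ~30% of the time for everyone (illustrative number).
honest_only = {"h1": 0.30, "h2": 0.30, "h3": 0.30, "h4": 0.30, "h5": 0.30}
# Two malicious guards devote their resources to this client: 0% failure.
with_attackers = {"h1": 0.30, "h2": 0.30, "h3": 0.30, "m1": 0.0, "m2": 0.0}

print(flagged(honest_only))     # []
print(flagged(with_attackers))  # ['h1', 'h2', 'h3']
```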

> [As an aside: reputation-like systems on anonymity networks are the
> best example I know of the principle in security that when you amend a
> design, the attacker gets to make up attacks based on _your_ design,
> and isn't restricted to the attacks that worked on the last one.]
> yrs,
> -- 
> Nick

Mike Perry
Mad Computer Scientist evil labs

More information about the tor-dev mailing list