Master's Thesis: Performance-Improved Onion Routing

Sat Sep 29 16:43:02 UTC 2007

On Fri, Sep 28, 2007 at 05:56:52PM +0200, Johannes Renner wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Nick Mathewson wrote:
> >> proposed techniques is presented. Included evaluations cover the
> >> performance that is reached on average, but also the degree of
> >> anonymity that is achieved when employing different methods of
> >> path selection.
> >
> > What are you using as your anonymity measure?
> 
> [Answering here since it seems to be interesting]
> 
> To measure a "degree of anonymity", a certain method of path
> selection can provide, I do the following:
> 
> I generate a (preferably big) amount of paths, without actually
> creating any circuit. This list of paths can then be evaluated
> regarding the occurrences of specific nodes on the single
> positions of the paths. So, as a first metric I simply count the
> total number of different nodes that is chosen on the single
> positions of the paths. The higher the single numbers for
> entry/middle/exit positions, the more anonymity there is.

This seems pretty weird; it implies that an adversary can increase
people's anonymity by signing up more nodes.  (Since the more nodes on
the network, the more possible nodes will appear in entry/middle/exit
positions.)

> I know, this metric does not consider that some methods choose
> some nodes with a higher probability (means more often) than
> others. Therefore, I work towards the conditional probability
> that a specific entry node is chosen, given the IP of an exit
> node and vice-versa (according to an advice from Mike Perry).

This is a bit more reasonable, though it probably makes more sense to
look at conditional entropy of entry node given exit node and/or
target IP... or perhaps additional information gained about entry
nodes given knowledge of the exit node or destination?  (If entry
nodes are chosen dependent on the client's current location, we want
to look at additional info gained about client location too.)

 [I see below that you *do* go for entropy.  Cool.]

Also, if we're worried about the attacker being the exit, we need to
look at additional info gained given knowledge of the exit _and_
middle nodes:  if you're the exit node, you not only know where you
are, but you also know the previous node in the circuit.

In any case, that's a decent measure of the attack where the adversary
observes some nodes in the circuit and tries to infer the rest.  But
there are other attacks, and so there there are other measures that
count too!  These include:

  - Given a fraction of nodes compromised by an adversary, what is the
    probability that a given path will have first and last node
    compromised?

  - Given a fraction of nodes compromised by an adversary, and
    assuming guard nodes are in use, what is the probability that
    adversary can be first and last node for a sample of
    paths?  (This is important if what you're worried about is
    statistical sampling.)

> This means, instead of counting the occurrences of single nodes,
> I count the total number of different combinations of entry and
> exit nodes (EE-combinations). Intuitively, the more different
> EE-combinations there are in a sample of n paths, the better the
> achieved anonymity.

This doesn't take into account knowledge of the middle node that the
attacker gets by being the exit.

> Further, I calculate the entropy regarding the probabilities of
> the different EE-combinations, and from this, a bounded degree
> of anonymity 'd'. This is done using the maximum entropy that is
> calculated from a sample of size n, where no two paths have the
> same EE-combination (which means d=1, iff all of the generated
> paths show different EE-combinations). This way it is easy to
> compare different methods of path selection regarding the
> achieved anonymity by generating samples of the same size. From
> these samples, 'd' is calculated and the results can be compared.
> For my thesis I generated samples of 100.000 paths (only), so it
> will be interesting to increase n to maybe 1.000.000 and repeat
> the experiment.

 [THis looks interesting; I think I should read the relevant bits of
 the thesis for details and think about it more.]

> Note, that I only had limited time for writing my thesis, and
> the anonymity evaluation was not the most important aspect in
> it (even if it is maybe the most interesting). So, all of this
> surely offers room for improvements, please send me your ideas!

My main idea is this: too many papers decide to characterize some
measure as "anonymity" when that measure only reflects the difficulty
of one possible attack among many.  This is indeed a valuable
contribution to research, since resisting more attacks is quite
worthwhile... but it's possible to resist one attack at the expense of
becoming weaker to another.  Thus, it's important to look at many
different measures and consider the effects of design choices on all
the significant attacks that effect your system.

For all I know, you do this in your thesis; I should check it out in
more detail.

anyways, thanks again for your work!  I have the copy of your thesis
that you sent, and I look forward to reading it.

yrs,
-- 
Nick Mathewson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 652 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20070929/84aa35cc/attachment.pgp>