Roger Dingledine arma at mit.edu
Wed Aug 14 23:09:52 UTC 2002

I'm still trying to decide if you're agreeing with me or disagreeing. :)

On Mon, Aug 12, 2002 at 09:37:46AM +0100, Andrei Serjantov wrote:
> > Scenario 1 also covers "Alice has an account on slashdot.org from which
> > she writes comments", "Alice uses Freedom to check her nym's mail at
> > the popserver", and "Alice mails a bunch of reply blocks to a Mixminion
> > nymserver so it can deliver her mail".
>
> In my opinion, this is too hard. Pseudonymity requires an entirely
> different infrastructure than anonymity (end to end padding). Sorry.

Really? But this is going to be a common use of the system. Or at least,
it will be unless we explicitly tell people they won't get anonymity if
they do it. (And even then...)

> Scenario 2 -- I agree with. Here is my view on the way the attacker can
> analyse his observations:
>
> He watches Bob and computes anonymity sets every time there is a request
> coming to her. Suppose they are as follows:
> {1,2,3,4,5};{1,6,7,8,9};{1,2,6,10},{1,3,10}
>
> Now he comuptes all the possible combinations of what could have happened.
> There are many, eg:
>
> 1->Bob;
> 1->Bob;
> 1->Bob;
> 1->Bob; (call this 1^4)
>
> Or, 1^2;2;3, or 1^2;10^2. Etc.
>
> Now we need to build a probability distribution over these traces.  If you
> assume that P(X^2) = P(X;Y) where X \= Y, then basically all the possible
> traces will have identical probabilities. Now you need to convince me that
> P(X^2) = k*P(X;Y) (X \= Y \and k = *large* constant).

Translating this back into English, you want me to convince you that the
the chance of Alice going to Bob twice is much higher than the chance
of Alice going to Bob and Arnold going to Bob. (Is that accurate?)

Is this the right way to look at it? I've already argued above that in
some scenarios, Alice *will* make multiple connections to Bob.

Whether there are a bunch of other people talking to Bob, and thus perhaps
it's much more common to find two unrelated connections to Bob than two
connections from the same Alice ... I'm not sure that matters. I guess my
point is that we can look at the chances that Arnold (an ordinary guy) is
in the set of suspects, based on some assumed distribution of recipients
(I think randomly choosing recipients is a fine first model), and if we
find an Alice that's in the set more often than she should be, she is more
suspicious. The more often she's in the set, the more suspicious she is.

Does that make sense? (How) does that tie in with what you were saying?

> Note that Scenario 1 is basically knowing that the trace we are looking
> for is X^4 and deducing that 1^4 is the only possible trace.

It's a bit more complex than that -- we may not see all of the connections
from Alice, so a 1^3 may be her if there's no 1^4.

> The magnitude of k (or actually the k_i where
>
> P(X^i) = k_i*P(X_0...X_i-1) where X_i's are distinct),
>
> and the size of the anonymity set determine the outcoming probability
> distribution and therefore the anonymity.
>
> We have also assumed that P(X) = P(Y) i.e. at the every request everyone
> in the anonymity set is equally likely to communicate to Bob. This can, of
> course, be easily generalised.
>
> So, tell me what k_i's are and I will tell you how big the anonymity set
> needs to be to reduce anonymity down to 1/2 in r rounds!

Does this mean you agree that this is a strong attack? :) Can you tie
the k_i values into something real so we can have a better intuition