Brainstorming about Tor, Germany, and data retention
arma at mit.edu
Wed Oct 8 13:12:01 UTC 2008
We need to build a plan for Tor, data retention, and Germany. The data
retention law in Germany comes into effect Jan 1 2009. Coincidentally,
this is just after the 25C3 congress, and many people there will be
asking me about our plan. I hope to do a talk there on Tor's plan regarding
German data retention. Maybe we should also consider a panel with an
actual informed German person, like Frank or Julius or Andreas?
Now, part of the challenge here is that there's so much misinformation
(and lack of information) floating around. Some questions we need answered:
1) Will ISPs be required to log connection timing information of their
users? What exactly will the information be -- destination IP address,
port, timestamp of beginning of connection, timestamp of end? Or more
than that? Or less?
2) Are there ISPs that plan to not log? If so, how many?
3) Do we expect the authorities will only use the logs in a pinpoint
way, or will they also trawl? That is, will they go to an ISP and say
"tell me what this user did during this five minute period", or can they
ask every ISP at once "tell me all your info about every connection to
amazon.com on Saturday"? As a special case of this, is asking for info
from 500 Tor relays more like a pinpoint request or more like trawling?
4) Is it just each ISP that directly interacts with customers that has
to log, or is it all the ISPs, all the way up the chain? That is, are
there tier-1 ISPs that will end up with massive logs, and will there be
many places you could go to see what a user has been up to?
5) Will Tor relays be required by law to log "stuff" too? If so, is it
the same stuff as in question #1?
6) Are there Tor relays that plan to not log? E.g. CCC or Foebud or GPF?
Is fighting a law by not following it even a normal way of fighting a
law in Germany? :)
First, let's recall that Tor is most vulnerable at the endpoints: the
entry point and the exit point. If an attacker gets data about the middle
of the circuit, but doesn't get both the entry point and the exit point,
then they don't really win much.
There are two categories of users we can consider: people inside
Germany and people outside Germany. And as Karsten keeps reminding me,
the third category is relay operators in Germany, who really want some guidance.
The first defense that comes to mind is to never set the Guard flag on
German Tor relays. That means that people outside Germany will never
reveal their location to a German Tor relay, since their guards will
always be outside. Folks inside Germany will still have a problem, since
*their* ISPs will probably still log connection data.
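For concreteness, here is a rough sketch of what "never set the Guard flag on German relays" could look like on the directory authority side. It is a toy under loud assumptions: the Relay record, its country field (assumed to come from a GeoIP lookup), the meets_guard_criteria shortcut, and NO_GUARD_COUNTRIES are all hypothetical names, and real authorities weigh uptime, bandwidth, and stability rather than a single boolean.

```python
from dataclasses import dataclass, field


# Hypothetical, heavily simplified relay record (not Tor's real descriptor format).
@dataclass
class Relay:
    nickname: str
    country: str                # assumed ISO 3166-1 alpha-2 code from a GeoIP lookup
    meets_guard_criteria: bool  # stand-in for the usual uptime/bandwidth/stability checks
    flags: set = field(default_factory=set)


# Hypothetical policy knob: relays in these countries never receive Guard.
NO_GUARD_COUNTRIES = {"DE"}


def assign_guard_flag(relay: Relay) -> None:
    """Give Guard only to qualifying relays outside the excluded countries."""
    if relay.meets_guard_criteria and relay.country not in NO_GUARD_COUNTRIES:
        relay.flags.add("Guard")
    else:
        relay.flags.discard("Guard")


if __name__ == "__main__":
    relays = [
        Relay("berlin1", "DE", True),   # qualifies, but is in Germany
        Relay("oslo1", "NO", True),     # qualifies and is outside Germany
        Relay("slow1", "US", False),    # outside Germany, but does not qualify
    ]
    for r in relays:
        assign_guard_flag(r)
    print([r.nickname for r in relays if "Guard" in r.flags])  # ['oslo1']
```

Because clients only pick guards from relays carrying the flag, a change like this on the authority side would take effect without touching client code.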
Note that German users contacting German websites are always going to
have a problem; we can't do anything about it if the ISP of the user
and the ISP of the website both log, and later compare notes.
That said, though, if the logs they keep aren't very precise, then
comparing them won't actually be very useful. In particular, if they just
have TLS connection start and stop timestamps from the entry node side,
then we could adapt Tor so it starts the TLS connection well before
it needs it (this is already done to a large extent, since we build
circuits preemptively), and so it stops the TLS connection well after
it is done with it (we are mostly set up for this already too, since we
delay a while before closing unused connections -- we just need to make
that a variable delay rather than a constant delay).
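A minimal sketch of the "open early, close late, with a variable delay" idea, to show what an ISP that logs only TLS start and stop times would end up seeing. The parameter names and ranges (PREBUILD_LEAD, IDLE_CLOSE) are made up for illustration and are not Tor's actual values; the random lead only models the effect of preemptive circuit building.

```python
import random

# Hypothetical ranges in seconds, chosen for illustration only (NOT Tor's values).
PREBUILD_LEAD = (60.0, 300.0)  # how long the TLS link is assumed open before the stream
IDLE_CLOSE = (180.0, 600.0)    # random idle delay before closing an unused TLS link


def idle_close_delay(rng: random.Random) -> float:
    """Variable (rather than constant) delay before closing an idle TLS connection."""
    return rng.uniform(*IDLE_CLOSE)


def tls_window(stream_start: float, stream_end: float, rng: random.Random) -> tuple:
    """Return the (open, close) times an ISP would log for the TLS link carrying a stream.

    The link opens a random time before the stream (modelling preemptive circuit
    building) and closes a random time after the stream ends.
    """
    tls_open = stream_start - rng.uniform(*PREBUILD_LEAD)
    tls_close = stream_end + idle_close_delay(rng)
    return tls_open, tls_close


if __name__ == "__main__":
    rng = random.Random(2008)
    # The stream runs from t=1000 to t=1100; the ISP only sees the wider TLS window.
    print(tls_window(1000.0, 1100.0, rng))
```

The randomness is the point: with a constant close delay, anyone comparing logs could simply subtract the known constant and recover the stream's end time.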
Of course, if every ISP were to keep logs of the timing of every _packet_
on every connection, we'd be back in bad shape. But I think such a burden
is impractical (to say nothing of being illegal, even under current German law).
Another catch here is the lesson we learned from the students at U
Colorado with their PETS paper: even if you only logged one side of a
conversation, it's still really risky to have that log, because you have
no idea who *else* happened to log somewhere else on the Tor network
at the same time. Both parties could be thinking to themselves "hey,
I've only got half the conversation, this can't hurt anybody", but if
they both publish then suddenly users are linked.
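To illustrate why two "harmless" half-logs are dangerous together, here is a toy matcher that pairs entry-side and exit-side records whose start and end times agree within a tolerance. This is a deliberately naive sketch: the LogEntry fields, the 2-second tolerance, and the example addresses are invented, and real correlation attacks use much richer signals (packet timing, volume) than two timestamps.

```python
from typing import List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    who: str      # client IP on the entry side, destination on the exit side
    start: float  # connection start, in seconds
    end: float    # connection end, in seconds


def link(entry_log: List[LogEntry], exit_log: List[LogEntry],
         tolerance: float = 2.0) -> List[Tuple[str, str]]:
    """Pair records whose start AND end timestamps both agree within `tolerance`."""
    matches = []
    for a in entry_log:
        for b in exit_log:
            if abs(a.start - b.start) <= tolerance and abs(a.end - b.end) <= tolerance:
                matches.append((a.who, b.who))
    return matches


if __name__ == "__main__":
    exit_side = [LogEntry("amazon.com", 100.8, 159.5)]

    # Precise entry-side timestamps: the two halves line up.
    entry_precise = [LogEntry("198.51.100.7", 100.0, 160.0),
                     LogEntry("203.0.113.9", 300.0, 420.0)]
    print(link(entry_precise, exit_side))  # [('198.51.100.7', 'amazon.com')]

    # Entry-side TLS window padded well before and after the stream: no match.
    entry_padded = [LogEntry("198.51.100.7", -80.0, 480.0)]
    print(link(entry_padded, exit_side))   # []
```

The second case is the hope behind the TLS start/stop trick; note that the exit-side record is unpadded either way.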
If we conclude then that logging even one side of a conversation is
bad (because you never know who else might have data that matches up),
then we should be really uncomfortable with exit relays in Germany too.
After all, their ISPs will have half the conversation logged already. And
while there's no trivial way to turn a log from an exit relay into
knowing where the clients are, it's still one of the steps down that path.
Worse, logs at the exit relay side won't be padded by the above "start
the TLS connection early and end it late" strategies, since they'll be
seeing the bare exit connections.
If we truly believed that the databases these ISPs build will be kept
secure against all attackers, and we truly believed that the databases
would never be used for trawling (see question #3 at the top), then it
might not be so bad. But that's a lot to ask.
On the other hand, it would be a real shame to withhold both the Guard
flag and the Exit flag from all Tor relays in Germany. There really isn't
much left, especially if we plan to experiment with 2-hop paths one day.
Now, what about the relay operators, and *their* duty to log? First,
note that from the perspective of an exit relay, the game is already up:
if the ISP logs connections, it has pretty much everything useful the
exit relay could be logging anyway.
(Exception: that isn't the case if the circuit consists entirely of
German Tor relays. No matter what strategy we conclude here, it seems
clear that we should disallow circuits like that.)
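Disallowing all-German circuits is a simple path-selection check. A sketch, assuming each hop's country code is already known from a (hypothetical) GeoIP lookup; circuit_allowed is an illustrative name, not an existing Tor function.

```python
from typing import Sequence


def circuit_allowed(hop_countries: Sequence[str], country: str = "DE") -> bool:
    """Reject a circuit whose hops are ALL in `country`.

    `hop_countries` lists the assumed GeoIP-derived country code of each hop,
    entry first. A single hop outside `country` is enough to pass.
    """
    if not hop_countries:
        raise ValueError("a circuit needs at least one hop")
    return any(c != country for c in hop_countries)


if __name__ == "__main__":
    print(circuit_allowed(["NL", "DE", "DE"]))  # True
    print(circuit_allowed(["DE", "DE", "DE"]))  # False
```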
I don't mean to say "therefore it is fine for Tor exit relays to log"
-- I think it is a dark day when anonymity infrastructure operators
start tracking their users -- but we should recognize that the damage is
already done here by the ISP, regardless of what the exit relay chooses.
On the other hand, if we do the "make TLS start and stop times less
dependent on stream connection times" trick, then what the entry guard
knows and what the entry guard's ISP knows *are* in fact different. This
argues for never letting German Tor relays be entry guards, so we don't
put them in this position.
Speaking of which, there's another lesson we can learn from the distant
past. Once upon a time, during my first congress talk about Tor back at
21C3, the Wikipedia people stood up and asked how they were supposed to
deal with anonymous users. My answer at the time was basically "there
are effectively anonymous users on the Internet already, sorry, you'll
just have to deal." Their eventual answer was to build a big list of
anything ever associated with Tor, and block edits from it. If we had
worked with them from the start, we could have saved a lot of grief by
giving them precise lists of current exit IP addresses, etc. The lesson
here is that we need a better answer for both German Tor relay operators
and for German law enforcement than "sorry, you'll just have to deal",
since otherwise they *will* come up with answers that we don't like.
That said, my first reaction is still "Tor relays must not log, even in
Germany. If you're planning to log, please shut down your relay instead."
Is there some approach we can take that doesn't result in 1/3 of the Tor
network disappearing in January?
For example, if Tor users always avoid German Tor relays for circuit
positions where they know more than their ISP knows (i.e. entry guard),
can German relay operators then argue that they don't know any more than
their ISP does, so if you want logs just go hassle the ISP? If we explain
the design clearly enough, that puts operators in a better position than
"sorry, sir, I *could* log that for you, but I have chosen not to."
Ok, so let's break this down into cases.
1) User outside of Germany, entry outside of Germany, exit and destination
outside of Germany. User is in pretty good shape, other than relatively
minor "partitioning" attacks coming from people with access to German
logs being able to rule out a fraction of the Tor network.
2) User outside of Germany, entry outside Germany, exit or destination
inside Germany. User is also in pretty good shape, in that just by
having exit or destination logs, you don't know where the user could be.
We're vulnerable to somebody who happened to collect logs of entry
traffic, and have to hope they never combine them with the German logs,
but that's a plausible thing to hope.
3) User inside Germany, entry outside Germany, exit and destination
outside Germany. Country-wide logs could enumerate German Tor users
(if they don't use bridges), but there wouldn't be anything to line up
with on the exit side.
4) User inside Germany, entry outside Germany, exit or destination
inside Germany. If our TLS start/stop trick is good enough, and there
are many Tor users in Germany, and the ISP logs can't be fetched in a
trawling manner ("show me all German users who were connected to the
Tor network during this time period"), then it's also not easy to line
up user to exit.
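The four cases above can be summarized as a lookup table. This is only a restatement of the reasoning in the list, with the summary strings paraphrased; classify and CASES are hypothetical names, and the table assumes (as all four cases do) that the entry guard is outside Germany.

```python
from typing import Dict, Tuple

# Keys: (user_in_germany, exit_or_destination_in_germany).
# Every case assumes the entry guard is outside Germany.
CASES: Dict[Tuple[bool, bool], Tuple[int, str]] = {
    (False, False): (1, "minor partitioning: German logs rule out part of the network"),
    (False, True): (2, "exit or destination logs alone don't reveal where the user is"),
    (True, False): (3, "German Tor users enumerable without bridges; nothing to match at exit"),
    (True, True): (4, "OK only if TLS padding works, German users are many, and no trawling"),
}


def classify(user_in_germany: bool, exit_or_dest_in_germany: bool,
             entry_in_germany: bool = False) -> Tuple[int, str]:
    """Map a path to its case number and a one-line summary of the exposure."""
    if entry_in_germany:
        raise ValueError("not covered: the analysis assumes no German entry guards")
    return CASES[(user_in_germany, exit_or_dest_in_germany)]


if __name__ == "__main__":
    print(classify(True, True)[0])    # 4
    print(classify(False, False)[0])  # 1
```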
As an aside, it seems that German Tor users may benefit from using
(non-German) bridge relays as their first hop, as it complicates the
"all German users who were connected to the Tor network" step.
Also, because we can make the "take away the Guard flag" change at the
directory authorities, clients get the new protection without needing
to upgrade. This will make it easier to argue that all Tor users are
choosing paths in the new way.
One thing I missed in the analysis is Internet connections that traverse
Germany, for example the connection from an Austrian Tor user to a Danish
Tor entry guard. I don't know how common these paths are, and I don't know
whether the proposed law would require logging such connections.
I have also made generous assumptions about law enforcement, namely that
they will be rational in how they interpret the law. I know we can be
paranoid and assume the worst, but in that case we might as well excise
all German Tor relays from the network and be done with it.
What else did I miss?