[tor-relays] could Tor devs provide an update on DOS attacks?

Tue Jan 16 08:27:21 UTC 2018

Hi everybody,

Thanks for your patience. Here is a quick update -- hopefully we'll have
another update in the upcoming days too.

On Sat, Dec 30, 2017 at 06:25:28PM -0500, Roger Dingledine wrote:
> (0) Thanks everybody for your work keeping the network going in the
> meantime! I see that the total number of relays has dropped off a
> tiny bit:
> https://metrics.torproject.org/networksize.html
> but the overall capacity and load on the network has stayed about
> the same:
> https://metrics.torproject.org/bandwidth.html
> So I wouldn't say the sky is falling at this point.

This part is still true! :)

> (1) I don't currently have any reason to think this is an intentional
> denial-of-service attack.

Actually, I now think there is an intentional component to it. But
it's not as straightforward as we might have thought.

I think the pain started because somebody is trying to overload a set
of onion services with rendezvous requests. But the real pain for the
network as a whole comes when those onion services try to keep up with
responding to the rendezvous requests.

Counterintuitively, by generating so many response circuits on the
network, they're actually loading down the network enough that many of
their response attempts will fail.

For one concrete example, when a v2 (that is, non-nextgen) onion service
is building its response rendezvous circuit, the last hop in that circuit
(the one to the rendezvous point) uses the old "TAP" circuit handshake,
which takes a lot more CPU and is given much lower priority by that
relay. So if people are flooding the relay with a bunch of circuit create
requests, it will take an extra long time to get around to processing the
TAP cell, which is part of why their rendezvous circuits are failing. That
explanation also matches how people here observed a spike in TAP cells
on their relays.
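
To make the queueing effect concrete, here is a toy Python sketch. It is
not Tor's actual scheduler: the class and function names are made up for
illustration, and "always serve ntor before TAP" is a simplifying
assumption standing in for "TAP gets much lower priority". It just shows
how one TAP handshake can sit behind a flood of cheaper create requests.

```python
from collections import deque


class CreateCellQueue:
    """Hypothetical model of a relay's pending circuit handshakes.

    Assumption: ntor handshakes are always served before TAP ones.
    """

    def __init__(self):
        self.ntor = deque()
        self.tap = deque()

    def push(self, kind, cell_id):
        # File each pending handshake under its handshake type.
        if kind == "ntor":
            self.ntor.append(cell_id)
        elif kind == "tap":
            self.tap.append(cell_id)
        else:
            raise ValueError(f"unknown handshake type: {kind}")

    def pop(self):
        # TAP is only served once no ntor handshake is waiting.
        if self.ntor:
            return ("ntor", self.ntor.popleft())
        if self.tap:
            return ("tap", self.tap.popleft())
        return None


def handshakes_served_before_tap(flood_size):
    """Count handshakes processed before one queued TAP cell is served."""
    queue = CreateCellQueue()
    # The onion service's rendezvous response arrives first...
    queue.push("tap", "rend-response")
    # ...followed by a flood of other create requests.
    for i in range(flood_size):
        queue.push("ntor", f"flood-{i}")

    served = 0
    while True:
        kind, _cell_id = queue.pop()
        if kind == "tap":
            return served
        served += 1
```

In this model the TAP cell waits behind the entire flood even though it
arrived first, so the larger the flood, the more likely the rendezvous
attempt times out.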

> (2b) If anybody has great contacts at Hetzner or OVH and can help us get
> a message to whoever is running these clients, that would be grand. ("Hi,
> did you know that you're hurting the Tor network? The Tor people would
> love to talk to you to help you do whatever it is you're trying to do,
> in a less harmful way.")

We talked to some OVH abuse people who are Tor fans, who requested that we
file a formal abuse ticket asking for contact. I did, and they passed it
on to "the customer", but then the OVH Tor clients mysteriously vanished
a few days later, with as far as I can tell no attempts at contact.
https://metrics.torproject.org/userstats-relay-country.html?start=2017-10-15&end=2018-01-13&country=fr

The Hetzner clients still remain so far:
https://metrics.torproject.org/userstats-relay-country.html?start=2017-10-15&end=2018-01-13&country=de
and we've actually heard from some of them, who are onion service
operators trying to keep up with the load.

But the number of people we have heard from only explains a tiny fraction
of the "million plus" new users in Germany, so there are still some good
mysteries left.

But again, it seems that (some of) these connections from OVH and Hetzner
aren't really the origin of the problem. So defenses that focus only on
stopping these "attacks" are leaving out a big piece of the puzzle.

> (3) I took some steps on Dec 22 to reduce the load that these clients
> (well, all clients) are putting on the network in terms of circuit
> creates. It seems like maybe it helped a bit, or maybe it didn't, but
> I'm the only one who has posted any stats for comparison. You can read
> more here:
> https://trac.torproject.org/24716

Alas, I think these consensus param changes didn't make a huge
difference. We still have the main change in place, but I plan to try
backing it out sometime soon, to see if we see any difference.

The other directions we're working on fall into four categories:

A) Bugfixes and design changes to help onion services not overload the
network when they're trying to respond to so many requests. That is,
ways to make them more efficient at responding to the most actual users
with the fewest wasted circuits.

B) Ways to block or throttle jerks who are trying to overload the onion
services. I actually think I have a good way to do it for this particular
attack, but I'd like to work harder to be a few steps ahead in the arms
race first -- that is, move from "bump out these jerks" to "make it
harder to use the Tor internal protocols for amplification attacks".

C) Mitigations that relays can use to be more fair with their available
resources. This one is actually quite tough from a design perspective,
because if one relay is really fast, meaning it could handle all of the
create cells it receives, maybe it should nonetheless opt to fail some
of them, for the good of later relays in those circuits.

D) Talking to the humans involved to try to get them to stop and/or make
things less bad, in the meantime.
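
As a rough illustration of what a fairness mitigation in category C could
look like, here is a minimal token-bucket sketch in Python. This is not
what Tor implements and not a proposal: the class names, the per-source
keying, and the rate and burst numbers are all assumptions picked for the
example. The point is only that a relay can decline create cells from a
busy source even when it has the CPU to handle them.

```python
class TokenBucket:
    """Allows `burst` events at once, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at burst.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CreateCellLimiter:
    """Hypothetical per-source limiter for incoming create cells."""

    def __init__(self, rate=3.0, burst=10):
        # Illustrative defaults only, not tuned values.
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow(self, source, now):
        # Each source gets its own bucket, so one noisy source
        # cannot use up the allowance of the others.
        bucket = self.buckets.get(source)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, now)
            self.buckets[source] = bucket
        return bucket.allow(now)
```

A caller would pass its own clock, for example
`limiter.allow(source_addr, time.monotonic())`, and fail the create cell
when the result is false. Note that this limits a source regardless of how
much spare capacity the relay has, which is exactly the design tension
described in C.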

Hope that helps explain. More soon as we learn more and/or as we merge
in defenses and/or as we get permission to share things from the people
who have told us things.

Thanks,
--Roger