[tor-relays] >23% Tor exit relay capacity found to be malicious - call for support for proposal to limit large scale attacks

Roger Dingledine arma at torproject.org
Wed Jul 8 10:35:42 UTC 2020


On Tue, Jul 07, 2020 at 01:01:12AM +0200, nusenu wrote:
> > https://gitlab.torproject.org/tpo/metrics/relay-search/-/issues/40001
> 
> thanks, I'll reply here since I (and probably others) can not reply there.

Fwiw, anybody who wants a gitlab account should just ask for one. Don't
be shy. :)

The instructions for asking are here:
https://gitlab.torproject.org/users/sign_in

> > (A) Limiting each "unverified" relay family to 0.5% doesn't by itself
> > limit the total fraction of the network that's unverified. I see a lot of
> > merit in another option, where the total (global, network-wide) influence
> > from relays we don't "know" is limited to some fraction, like 50% or 25%.
> 
> I like it (it is even stricter than what I proposed), you are basically saying
> the "known" pool should always control a fixed (or minimal?) portion - lets say 75% - 
> of the entire network no matter what capacity the "unknown" pool has

Right.

> but it doesn't address the key question: 
> How do you specifically define "known" and how do you verify entities before you move them to the "known" pool?

Well, the first answer is that these are two separate mechanisms, which
we can consider almost independently:

* One is dividing the network into known and unknown relays, where we
reserve some minimum fraction of attention for the known relays. Here
the next steps are to figure out how to do load balancing properly with
this new parameter (mainly a math problem; there's a rough sketch of that
math after these two points), and to sort out the logistics for how to
label the known relays so directory authorities can assign weights
properly (mainly coding / operator ux).

* Two is the process we use for deciding if a relay counts as known. My
suggested first version here is that we put together a small team of Tor
core contributors to pool their knowledge about which relay operators
we've met in person or otherwise have a known social relationship with.
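
To make the "math problem" in the first point concrete, here's a rough
sketch of the kind of weight adjustment I have in mind. The cap value,
function name, and numbers are all just illustration, not anything
we've decided:

  # Sketch only: cap the total consensus weight of "unknown" relays at
  # some fraction of the whole network. Numbers are made up.
  def unknown_scale_factor(known_weight, unknown_weight,
                           max_unknown_fraction=0.25):
      """Factor to multiply every unknown relay's weight by so that the
      unknown pool carries at most max_unknown_fraction of the total."""
      if unknown_weight == 0:
          return 1.0
      fraction = unknown_weight / (known_weight + unknown_weight)
      if fraction <= max_unknown_fraction:
          return 1.0  # already under the cap, leave weights alone
      # Solve s*U / (K + s*U) = f for the scale factor s:
      #   s = f*K / ((1 - f) * U)
      f = max_unknown_fraction
      return (f * known_weight) / ((1 - f) * unknown_weight)

  # Example: unknown relays hold 40% of the weight; with a 25% cap each
  # unknown relay's weight gets halved (0.5), bringing them down to 25%.
  print(unknown_scale_factor(known_weight=6000, unknown_weight=4000))

The real version would have to interact with the existing
bandwidth-weight calculations in the consensus, which is where the
actual math work is.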

One nice property of "do we know you" over "do you respond to mail at
a physical address" is that the thing you're proving matters into the
future too. We meet people at relay operator meetups at CCC and Fosdem
and Tor dev meetings, and many of them are connected to their own local
hacker scenes or other local communities. Or said another way, burning
your "was I able to answer a letter at this fake address" effort is a
different tradeoff than burning your "was I able to convince a bunch of
people in my local and/or international communities that I mean well?"

I am thinking back to various informal meetings over the years at
C-base, Hacking At Random, Defcon, etc. The "social connectivity" bond
is definitely not perfect, but I think it is the best tool available to
us, and it provides some better robustness properties compared to more
faceless "proof of effort" approaches.

That said, on the surface it sure seems to limit the diversity we can
get in the network: people we haven't met in Russia or Mongolia or
wherever can still (eventually, postal service issues aside) answer a
postal letter, whereas it is harder for them to attend a CCC meetup. But
I think the answer there is that we do have a pretty good social fabric
around the world, e.g. with connections to OTF fellows, the communities
that OONI has been building, etc., so for many of those places we can
ask people we know there for input.

And it is valuable for other reasons to build and strengthen these
community connections -- so the incentives align.

Here the next step is to figure out the workflow for annotating relays. I
had originally imagined some sort of web-based UI that leads me
through constructing and maintaining a list of fingerprints that I have
annotated as 'known' and a list annotated as 'unknown', shows me how
my lists have been doing over time, and presents me with new
not-yet-annotated relays.

But maybe a set of scripts, that I run locally, is almost as good and
much simpler to put together. Especially since, at least at first,
we are talking about a system that has on the order of ten users.
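
As a strawman for what "a set of scripts" could look like, here's how
the local annotations list itself might be stored and loaded: a plain
text file with one fingerprint and a known/unknown label per line. The
file format and function names are just my illustration:

  # Sketch of a local annotations store. Purely illustrative.
  def load_annotations(path="annotations.txt"):
      """Return a dict mapping relay fingerprint -> 'known' or 'unknown'."""
      annotations = {}
      try:
          with open(path) as f:
              for line in f:
                  line = line.strip()
                  if not line or line.startswith("#"):
                      continue  # skip blank lines and comments
                  fingerprint, label = line.split()
                  annotations[fingerprint] = label
      except FileNotFoundError:
          pass  # first run: start with an empty list
      return annotations

  def save_annotation(fingerprint, label, path="annotations.txt"):
      """Append one annotation decision to the file."""
      with open(path, "a") as f:
          f.write("%s %s\n" % (fingerprint, label))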

One of the central functions in those scripts would be to sort the
relays by network impact (some function of consensus weight, bandwidth
carried, time in network, etc), so it's easy to identify the
not-yet-annotated ones that will mean the biggest shifts. Maybe this
ordered list is something we can teach onionoo to output, and then all
the local scripts need to do is go through each relay in the onionoo
list, look it up in the local annotations list to see if it's already
annotated, and present the user with the unannotated ones.
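
Even without teaching onionoo anything new, a script can fetch the
details documents and do the sorting itself. I'm going from memory on
the parameter and field names (fingerprint, nickname, consensus_weight,
contact), so treat this as an illustration of the shape of the script
rather than tested code:

  # Sketch: fetch running relays from onionoo, sort by consensus weight
  # (standing in for "network impact"), and keep the not-yet-annotated
  # ones. Field names are my best recollection of the onionoo protocol.
  import requests

  def unannotated_by_impact(annotations):
      url = "https://onionoo.torproject.org/details"
      params = {"type": "relay", "running": "true",
                "fields": "fingerprint,nickname,consensus_weight,contact"}
      relays = requests.get(url, params=params, timeout=60).json()["relays"]
      relays.sort(key=lambda r: r.get("consensus_weight", 0), reverse=True)
      return [r for r in relays if r["fingerprint"] not in annotations]

  # Show the twenty biggest relays I haven't annotated yet, using the
  # load_annotations() sketch from above.
  for relay in unannotated_by_impact(load_annotations())[:20]:
      print(relay.get("consensus_weight", 0), relay["fingerprint"],
            relay.get("nickname", ""), relay.get("contact", ""))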

To avoid centralizing too far, I could imagine some process that gathers
the current annotations from the several people who are maintaining them,
and aggregates them somehow. The simplest version of aggregation is
"any relay that anybody in the group knows counts as known", but we
could imagine more complex algorithms too.
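
That simplest version of aggregation is just a set union over
everybody's annotation files; a toy version, reusing the dict format
from above:

  # Toy aggregation: a relay counts as known if any maintainer marked it
  # known. More complex rules (thresholds, vetoes) would go here instead.
  def aggregate(annotation_dicts):
      known = set()
      for annotations in annotation_dicts:
          known.update(fp for fp, label in annotations.items()
                       if label == "known")
      return known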

And lastly, above I said we can consider the two mechanisms "almost
independently" -- the big overlap point is that we need to better
understand what fraction of the network we are considering "known",
and make sure to not screw up the load balancing / performance of the
network too much.
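
Measuring that is straightforward once the annotations and the onionoo
data are in hand; roughly (again, just a sketch):

  # Sketch: what fraction of total consensus weight do the known relays
  # carry right now? 'relays' is the onionoo list from the earlier
  # sketch, 'known_fingerprints' the aggregated known set.
  def known_weight_fraction(relays, known_fingerprints):
      total = sum(r.get("consensus_weight", 0) for r in relays)
      known = sum(r.get("consensus_weight", 0) for r in relays
                  if r["fingerprint"] in known_fingerprints)
      return known / total if total else 0.0

If that number turns out to be well below whatever floor we pick for
the known pool, turning the floor on would move a lot of traffic
around, so we'd want to phase it in gradually.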

> > (2) We need to get rid of http and other unauthenticated internet protocols:
> 
> This is something browser vendors will tackle for us I hope, but it
> will not be anytime soon.

Well, we could potentially tackle it sooner than the mainstream browser
vendors. See
https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/19850#note_2685098
where maybe (I'm not sure, but maybe) https-everywhere has a lot of the
development work already done.

--Roger


