[tor-relays] Guidelines and processes for our bad relay work

8 Apr 2020

Hello!

There has been some confusion among relay operators about how we deal
with bad relays, who actually makes the decisions, and how the overall
process works. Even though we don't have a document yet to point to
for answering all those questions (more on that below), we thought it
could be useful to give a status update to the relay community and
outline possible next steps.

One of the tasks in the network health area is to make sure bad
relays are found and excluded from the network. This should happen
according to transparent criteria that help relay operators
understand both our expectations and our processes. Ideally, a document
containing those criteria would also give operators some insight into
how we arrived at them.

Unfortunately, as I said above, we have not written up that document
yet. However, that does not mean that removing relays from the network
is currently arbitrary. Rather, we have some rules of thumb and some
unwritten guidelines that still seem worth sharing at this point to
help relay operators better understand what is going on in the bad
relay detection world.

A bad relay is one that either doesn't work properly or tampers with
users' connections, whether through maliciousness or through
misconfiguration. To find those relays, we rely on scanners that check
for common issues and on volunteers who spot things beyond what our
scanners target.

To give you some examples of issues we are concerned about:

a) Tampering with exit traffic
b) Running HSDirs that harvest and probe .onion addresses
c) Issues with resolving DNS queries on exit relays
d) Flooding the network with relays to deanonymize users
e) Running outdated Tor versions
...

Now, how do we tell maliciousness apart from misconfiguration, and what
do we do about each?

There is behavior that we think is clearly malicious, like tampering
with exit traffic or trying to harvest and probe .onion addresses. In
those cases we reject the relays outright. In the past we thought relays
that tampered with exit traffic could still be useful as non-exit relays,
so they got the BadExit flag. But it turned out that a number of those
had other, more subtle misbehavior, so we decided to be on the safe side
and nowadays simply reject those malicious relays.

For behavior that could be either malicious or the result of a
misconfiguration (like missing MyFamily settings), things get messier.

Ways of contacting relay operators (e.g. a meaningful ContactInfo
entry) are very important in cases where misconfiguration can play a
role. In those cases we usually contact the operators (if possible) to
figure out what is going on and help them get their configurations
right. That means we do not forcibly remove, say, relays that did not
have their MyFamily configuration set up properly (we know it can be
tricky). That approach is successful in a lot of cases and also helps
us build a relationship with operators, which is worthwhile in itself.
However, in cases where we don't get a response, or where we become
confident that the operator's intentions are malicious, we'll reject the
relay(s) to protect our users.
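One common MyFamily mistake for operators running several relays is a one-sided entry: Tor clients only treat two relays as the same family when both relays declare it, so relay A listing relay B does nothing unless B also lists A. As a rough illustration only (this is not a tool used in our bad relay work), here is a minimal Python sketch that finds such one-sided entries. It assumes you have already collected each of your relays' MyFamily fingerprints into a dict; the function name, the input shape, and the short placeholder fingerprints are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def one_sided_family_entries(families: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where relay a lists b in MyFamily but b does not list a.

    Hypothetical input shape: relay fingerprint -> set of fingerprints that
    relay declares in its MyFamily line. Real fingerprints are 40 hex chars.
    """
    problems = []
    for relay, declared in sorted(families.items()):
        for other in sorted(declared):
            if other == relay:
                continue  # a self-reference is not a mismatch, so skip it
            # A family only counts if the other relay lists this one back.
            if relay not in families.get(other, set()):
                problems.append((relay, other))
    return problems


if __name__ == "__main__":
    # Placeholder fingerprints for readability; CCCC forgot to list BBBB.
    operator_relays = {
        "AAAA": {"BBBB", "CCCC"},
        "BBBB": {"AAAA", "CCCC"},
        "CCCC": {"AAAA"},
    }
    for a, b in one_sided_family_entries(operator_relays):
        print(f"{a} lists {b}, but {b} does not list {a}")
```

Run as-is, this reports that BBBB lists CCCC but CCCC does not list BBBB; the fix is to add every other relay's fingerprint to each relay's MyFamily line.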

All the activities mentioned above are coordinated on the bad-relays
list, which is private and used by members of the team to discuss cases
and keep each other in the loop.

As to next steps: yes, we need to sit down and finish that document,
listing all the criteria we are concerned with and giving some rationale
for each of them. Alas, there is no timeframe for getting this work
done. But once we are there, we'll consult tor-internal and the
tor-relays list for input and make changes as needed.

I hope this helps clear some things up. I am happy to answer questions
and reply to concerns, on and off-list, should there be any.

Georg

Georg Koppen