[network-health] Relays that can't extend

31 Mar 2021

Hi folks,

I've been probing which relays can't extend to my relay:
https://metrics.torproject.org/rs.html#details/7B35DB92BA72BA0BBFD51B35B11A4...

I'm simply making two-hop paths from my client through every relay to
my destination relay.

Here's a snapshot of some relays that fail most/all of the extend
attempts.

The first data set is Feb 27 through Mar 1, and I've attached it as
feb27-mar1.txt. The second data set is Mar 30, and it's attached as
mar30.txt.

The way to read the ratio on each line is: number of failed extend
attempts / (number of failed + number of succeeded).

(There are some edge cases where I failed to reach the relay as the first
hop, for example because it has gone down, or it's in the new consensus
but my Tor client doesn't have the descriptor for it yet. I've omitted
those edge cases from the ratio calculation: they're neither failures
nor successes, so they would just muddy the analysis.)
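
To make the ratio concrete, here is a minimal Python sketch of the
calculation described above. It is not the script used to produce the
attached files; the outcome labels "fail", "ok", and "skip" and the
function name are made up for illustration.

```python
from collections import Counter

def extend_failure_ratio(outcomes):
    """Summarize extend attempts through one relay.

    `outcomes` is an iterable of hypothetical labels:
      "fail" - first hop was reached, but the extend failed
      "ok"   - the extend succeeded
      "skip" - first hop could not be reached (edge case, not counted)
    Returns (failed, total, ratio), or None if nothing was countable.
    """
    counts = Counter(outcomes)
    failed = counts["fail"]
    succeeded = counts["ok"]
    total = failed + succeeded  # "skip" is deliberately left out
    if total == 0:
        return None
    return failed, total, failed / total
```

For example, two failures, one success, and one skipped attempt would
be reported as 2/3, since the skipped attempt is not part of the
denominator.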

Then I combined them: cat feb27-mar1.txt mar30.txt |sort > combined.txt
to make it easy to see which relays had problems in both sets.
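
An alternative to eyeballing the sorted file would be to intersect the
two sets directly. The sketch below assumes, purely for illustration,
that each line begins with a relay identifier (e.g. a fingerprint) as
its first whitespace-separated field; the attached files may well be
laid out differently.

```python
def relays_in_both(first_lines, second_lines):
    """Return identifiers that appear in both data sets, sorted.

    Assumption (hypothetical format): the relay identifier is the
    first whitespace-separated field on each non-blank line.
    """
    def identifiers(lines):
        return {line.split()[0] for line in lines if line.strip()}

    return sorted(identifiers(first_lines) & identifiers(second_lines))
```

With the two attachments read into lists of lines, the result would be
the relays that failed extends in both the Feb 27 through Mar 1 run and
the Mar 30 run.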

I've started mailing some of the relay operators to ask them to
investigate. (Many of them alas have no workable contactinfo.) The most
likely explanation is that they're out of file descriptors (in which case
their Tor logs should be complaining constantly). A backup explanation
might be that their relay is censored from reaching my relay, perhaps
by destination port or address.

Eventually we'll want to use these results to compare against the
self-reported "overload" metrics from Proposal 328 ("Make Relays Report
When They Are Overloaded"). I'm not sure what the step after that is,
but maybe it will be reducing the consensus weights for relays that
aren't performing well.

--Roger

Roger Dingledine