Hi folks,
I've been probing which relays can't extend to my relay: https://metrics.torproject.org/rs.html#details/7B35DB92BA72BA0BBFD51B35B11A4...
I'm simply making two-hop paths from my client through every relay to my destination relay.
Here's a snapshot of some relays that fail most or all of the extend attempts.
The first data set is Feb 27 through Mar 1, and I've attached it as feb27-mar1.txt. The second data set is Mar 30, and it's attached as mar30.txt.
Read the ratio on each line as: number of failed extend attempts / (number of failed + number of succeeded).
(There are some edge cases where I failed to reach the relay as the first hop, for example because it had gone down, or because it's in the new consensus but my Tor client doesn't have its descriptor yet. I've omitted those edge cases from the ratio calculation: they're neither failures nor successes, so they would just muddy the analysis.)
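For anyone who wants to reproduce the per-relay ratio from their own probe logs, here is a minimal sketch. It assumes (hypothetically) that each extend attempt has been recorded as a (fingerprint, outcome) pair, with "success" and "failure" as the only outcomes that count; any other label, such as a first hop that was down or a missing descriptor, is skipped, as described above. The outcome labels are illustrative, not what my scripts actually emit.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical outcome labels. Anything else (first hop down, no descriptor
# yet, ...) is treated as an edge case and left out of the ratio.
SUCCESS = "success"
FAILURE = "failure"


def failure_ratios(attempts: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each relay fingerprint to failed / (failed + succeeded)."""
    counts: Dict[str, Counter] = {}
    for fingerprint, outcome in attempts:
        if outcome not in (SUCCESS, FAILURE):
            continue  # edge case: neither a success nor a failure
        counts.setdefault(fingerprint, Counter())[outcome] += 1
    # Only relays with at least one real success or failure appear here,
    # so the denominator is never zero.
    return {
        fp: c[FAILURE] / (c[FAILURE] + c[SUCCESS])
        for fp, c in counts.items()
    }
```

A relay whose only attempts were edge cases simply doesn't show up in the result, rather than being reported as 0/0.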
Then I combined them with cat feb27-mar1.txt mar30.txt | sort > combined.txt, to make it easy to see which relays had problems in both sets.
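The sort trick puts lines for the same relay next to each other so you can eyeball them. If you'd rather get the overlap directly, a small Python sketch can intersect the two sets. It assumes (this is an assumption about the attached files, not a documented format) that each non-empty line begins with the relay's fingerprint followed by whitespace; the helper names fingerprints and in_both are hypothetical.

```python
from typing import Iterable, List, Set


def fingerprints(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated field of each non-empty line.

    Assumes (hypothetically) that this field is the relay fingerprint.
    """
    return {line.split()[0] for line in lines if line.strip()}


def in_both(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Return, sorted, the fingerprints that appear in both data sets."""
    return sorted(fingerprints(first) & fingerprints(second))
```

With the attached files you could call in_both(open("feb27-mar1.txt"), open("mar30.txt")) to list the relays that had problems in both periods.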
I've started mailing some of the relay operators to ask them to investigate. (Alas, many of them have no workable contactinfo.) The most likely explanation is that they're out of file descriptors, in which case their Tor logs should be complaining constantly. A backup explanation might be that their relay is censored from reaching my relay, perhaps by destination port or address.
Eventually we'll want to compare these results against the self-reported "overload" metrics from Proposal 328 ("Make Relays Report When They Are Overloaded"). I'm not sure what the step after that is, but maybe it will be reducing the consensus weights of relays that aren't performing well.
--Roger