Re: [tor-dev] connectivity failure for top 100 relays

13 Mar 2018

On Tue, Mar 13, 2018 at 02:55:12AM +0000, dawuud wrote:
> ...
> Out of 9900 possible two-hop tor circuits among the top 100 tor relays
> only 935 circuit builds have succeeded. This is way worse than the last
> time I sent a report 6 months ago during the Montreal tor dev meeting.
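For scale, assuming "possible two hop circuits" means ordered (first hop, second hop) pairs of distinct relays, 100 relays give 100 × 99 = 9900 pairs, so 935 successes is a success rate of under 10%. A quick check of that arithmetic (the relay names are placeholders; only the count matters):

```python
from itertools import permutations

# Placeholder names standing in for the top 100 relays; only the count matters.
relays = [f"relay{i:03d}" for i in range(100)]

# A two-hop circuit uses an ordered (first hop, second hop) pair of distinct relays.
pairs = list(permutations(relays, 2))
succeeded = 935  # figure reported in the quoted message

success_rate = succeeded / len(pairs)
print(len(pairs))             # 9900
print(f"{success_rate:.1%}")  # 9.4%
```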
The next step here would be to try to debug your results, to understand
if it's actually an issue with the Tor network (in which case, what
exactly is the issue), or if it's a bug in your scripts.

Teor asked some good questions.

Other questions I'd want to investigate:

(A) Are the failures consistent, or intermittent? That is, does a
failed link always fail, or only sometimes?
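One way to answer (A) is to scan the same pairs several times and label each link by its history. A minimal sketch, assuming a hypothetical record format of one (first_hop, second_hop, succeeded) tuple per build attempt; the example data is made up:

```python
from collections import defaultdict


def classify_links(trials):
    """Group repeated build attempts by link and label each link.

    `trials` is an iterable of (first_hop, second_hop, succeeded) tuples,
    one per circuit build attempt (a hypothetical record format).
    """
    outcomes = defaultdict(list)
    for first, second, succeeded in trials:
        outcomes[(first, second)].append(succeeded)

    labels = {}
    for link, results in outcomes.items():
        if all(results):
            labels[link] = "always succeeds"
        elif not any(results):
            labels[link] = "always fails"
        else:
            labels[link] = "intermittent"
    return labels


# Made-up results from three rounds of scanning.
trials = [
    ("A", "B", False), ("A", "B", False), ("A", "B", False),
    ("A", "C", True), ("A", "C", False), ("A", "C", True),
    ("B", "C", True), ("B", "C", True), ("B", "C", True),
]
for link, label in sorted(classify_links(trials).items()):
    print(link, label)
```

Links that land in "always fails" are the better candidates for the closer debugging in (C); "intermittent" ones point more toward load or timing.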

(B) Are you really sure that it failed? I would guess that 'failed'
is different from 'timeout' in that it got an explicit destroy cell back?
If so, don't destroy cells have a 'reason' field? Which reasons are
happening most commonly?
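If the scanning tool drives tor through its control port, failed circuits show up as CIRC events with status FAILED, and tor's control-spec describes REASON= and (for teardowns that came from a relay) REMOTE_REASON= fields on them. A minimal sketch of tallying those fields, assuming the raw event lines were already collected and contain no quoted values with spaces; the example lines are made up, not real scan output:

```python
from collections import Counter


def tally_failure_reasons(event_lines):
    """Count (REASON, REMOTE_REASON) pairs for FAILED circuit events.

    Assumes each line looks like a control-port CIRC event, e.g.
    "650 CIRC <id> FAILED <path> ... REASON=X [REMOTE_REASON=Y]".
    """
    counts = Counter()
    for line in event_lines:
        tokens = line.split()
        if len(tokens) < 4 or tokens[1] != "CIRC" or tokens[3] != "FAILED":
            continue
        # Keep only KEY=VALUE tokens; the path token has no "=".
        fields = dict(tok.split("=", 1) for tok in tokens[4:] if "=" in tok)
        counts[(fields.get("REASON", "?"), fields.get("REMOTE_REASON", "-"))] += 1
    return counts


# Made-up event lines, for illustration only.
events = [
    "650 CIRC 1 FAILED $AAAA~relayA,$BBBB~relayB REASON=DESTROYED REMOTE_REASON=CONNECTFAILED",
    "650 CIRC 2 FAILED $AAAA~relayA,$CCCC~relayC REASON=TIMEOUT",
    "650 CIRC 3 FAILED $BBBB~relayB,$CCCC~relayC REASON=DESTROYED REMOTE_REASON=CONNECTFAILED",
    "650 CIRC 4 BUILT $BBBB~relayB,$AAAA~relayA",
]
for (reason, remote), n in tally_failure_reasons(events).most_common():
    print(f"{n:4d}  REASON={reason}  REMOTE_REASON={remote}")
```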

(C) We should find a link that is failing between two relays that we
both control, and look at each one more closely to see if there are any
hints. For example, is there anything in the logs? If we turn up the
logging, do we get any hints then?
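Picking candidates for (C) is a simple filter over the failing links. A minimal sketch, assuming the scan results can be reduced to (first, second) fingerprint pairs and that we have a set of fingerprints for relays we operate; both inputs and the shortened fingerprints are hypothetical:

```python
def links_we_control(failed_links, our_relays):
    """Return failing links whose two endpoints are both relays we operate.

    `failed_links` is an iterable of (first_fp, second_fp) fingerprint pairs;
    `our_relays` is a set of fingerprints we control. Both are hypothetical
    inputs; the fingerprints below are shortened placeholders.
    """
    return sorted(
        (first, second)
        for first, second in failed_links
        if first in our_relays and second in our_relays
    )


failed = [("AAAA", "BBBB"), ("AAAA", "CCCC"), ("DDDD", "BBBB")]
ours = {"AAAA", "BBBB"}
print(links_we_control(failed, ours))  # [('AAAA', 'BBBB')]
```

Once such a pair turns up, both relays' logs can be compared around the timestamps of the failed builds, before and after raising the log level.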

(D) ...which leads to: we should run this same tool on the test network
that teor and dgoulet et al. run, and look for failures there. Assuming we
find some, since there are no users on the test network, we can investigate
much more thoroughly.

(E) I wonder if there's a correlation between the failed links and
whether a TLS connection is already established on that link. That is,
when there is no connection already, there are many more steps that
need to be taken to extend the circuit, and those steps could lead to
increased failure rates, either due to the extra time that is needed,
or because part of tor's link handshake (NETINFO, etc.) is going wrong.
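Checking (E) comes down to comparing failure rates between two groups of extend attempts. A minimal sketch, assuming each attempt can be labeled with whether a connection between the two hops already existed; how to learn that label (for example from the first hop's logs) is an open assumption, and the records below are made up:

```python
def failure_rates_by_connection(records):
    """Compare failure rates for extends with and without an existing connection.

    `records` holds (had_existing_conn, succeeded) booleans, one per extend
    attempt (a hypothetical record format).
    """
    totals = {True: [0, 0], False: [0, 0]}  # key -> [failures, attempts]
    for had_conn, succeeded in records:
        totals[had_conn][1] += 1
        if not succeeded:
            totals[had_conn][0] += 1
    # None marks a group with no attempts, rather than dividing by zero.
    return {
        had_conn: (fails / attempts if attempts else None)
        for had_conn, (fails, attempts) in totals.items()
    }


# Made-up records: 4 extends over existing connections, 4 needing a new one.
records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]
rates = failure_rates_by_connection(records)
print(rates)  # {True: 0.25, False: 0.75}
```

A large gap between the two rates would point toward the connection-setup path; with real data, a test on the 2x2 table (such as Fisher's exact test) would be worth running before reading too much into it.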

And a last point: this tool, and these investigations, are exactly in
scope for the "network health" topic that the network team has been
discussing as one of the key open areas that need more attention.

--Roger

Roger Dingledine