> Other questions I'd want to investigate:
> (A) Are the failures consistent, or intermittent? That is, does a
> failed link always fail, or only sometimes?

Yes, this is what our new testing methodology should support.
My current scanner is not sufficient for this, so we want to improve it.
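
As a rough sketch of what "support" could mean here: if each link is
tried several times, the outcomes can be tallied per link and the link
labelled as always failing or only sometimes failing. The record format,
the relay names, and the function name below are all hypothetical; this
is not how the current scanner stores its results.

```python
from collections import defaultdict

def classify_links(results):
    """Label each link from repeated attempts.

    results: iterable of ((first_hop, second_hop), succeeded) tuples.
    Returns a dict mapping link -> "ok", "consistent" or "intermittent".
    """
    tally = defaultdict(lambda: [0, 0])  # link -> [successes, failures]
    for link, succeeded in results:
        tally[link][0 if succeeded else 1] += 1

    verdict = {}
    for link, (successes, failures) in tally.items():
        if failures == 0:
            verdict[link] = "ok"
        elif successes == 0:
            verdict[link] = "consistent"    # failed on every attempt
        else:
            verdict[link] = "intermittent"  # failed on some attempts only
    return verdict

# Made-up scan records, for illustration only.
scans = [
    (("relayA", "relayB"), False),
    (("relayA", "relayB"), False),
    (("relayA", "relayC"), True),
    (("relayA", "relayC"), False),
    (("relayB", "relayC"), True),
]
print(classify_links(scans))
```

With only a handful of attempts per link, a "consistent" label is weak
evidence, so the number of retries would need some thought.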

> (B) Are you really sure that it failed? I would guess that 'failed'
> is different from 'timeout' because it got an explicit destroy back?
> If so, don't destroy cells have 'reason' components? Which reasons are
> happening most commonly?

Yes, I am sure it failed. It would be cool if txtorcon could expose the
'reason', but I think that it cannot. I suppose it will show up in the
tor log file if I set it to debug logging.
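
Separately from txtorcon, my understanding of the control protocol is
that CIRC events for failed circuits carry a REASON field, plus a
REMOTE_REASON field when the reason is DESTROYED. Assuming that is
right, and assuming the raw event lines can be captured somehow, a
tally of the reasons could be sketched as below. The sample lines are
made up, with placeholder fingerprints, and are not real scanner output.

```python
from collections import Counter

def circ_failure_reasons(event_lines):
    """Count (REASON, REMOTE_REASON) pairs over FAILED CIRC events.

    Assumed line shape: "650 CIRC <id> <status> [<path>] [KEY=VALUE ...]".
    """
    wanted = ("REASON", "REMOTE_REASON")
    counts = Counter()
    for line in event_lines:
        parts = line.split()
        if len(parts) < 4 or parts[1] != "CIRC" or parts[3] != "FAILED":
            continue
        fields = {}
        for token in parts[4:]:
            key, sep, value = token.partition("=")
            # Only pick out the two fields of interest; the path token
            # and other keywords are ignored.
            if sep and key in wanted:
                fields[key] = value
        counts[(fields.get("REASON"), fields.get("REMOTE_REASON"))] += 1
    return counts

# Illustrative lines only: fingerprints and nicknames are placeholders.
sample = [
    "650 CIRC 7 FAILED $AAAA~relayA,$BBBB~relayB PURPOSE=GENERAL "
    "REASON=DESTROYED REMOTE_REASON=CONNECTFAILED",
    "650 CIRC 8 FAILED $AAAA~relayA PURPOSE=GENERAL REASON=TIMEOUT",
    "650 CIRC 9 BUILT $AAAA~relayA,$CCCC~relayC PURPOSE=GENERAL",
]
for reason, count in circ_failure_reasons(sample).items():
    print(reason, count)
```

This would also separate a local TIMEOUT from an explicit DESTROYED,
which is the distinction asked about above.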

> (C) We should find a link that is failing between two relays that we
> both control, and look at each one more closely to see if there are any
> hints. For example, is there anything in the logs? If we turn up the
> logging, do we get any hints then?

Sounds good. I would certainly be willing to collaborate with teor or anyone
else who might like to help with this.

> (D) ...which leads to: we should run this same tool on the test network
> that teor and dgoulet et al run, and look for failures there. Assuming we
> find some, since there are no users on the test network, we can investigate
> much more thoroughly.

Sounds good. Let me know if there is anything I can do to help with this.

> (E) I wonder if there's a correlation between the failed links and
> whether a TLS connection is already established on that link. That is,
> when there is no connection already, there are many more steps that
> need to be taken to extend the circuit, and those steps could lead to
> increased failure rates, either due to the extra time that is needed,
> or because part of tor's link handshake (NETINFO, etc) is going wrong.

Ah yes, this is another good question for which I currently do not have an answer.
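
If the scanner could record, for each extend attempt, whether a
connection to the next relay already existed, the comparison itself
would be simple. A minimal sketch follows, assuming such a flag is
available (how to obtain it is exactly the open part); the data and
the function name are hypothetical.

```python
def failure_rate_by_existing_conn(attempts):
    """Failure rate split by whether a connection already existed.

    attempts: iterable of (had_connection, succeeded) boolean pairs.
    Returns {True: rate, False: rate}; rate is None if there is no data.
    """
    stats = {True: [0, 0], False: [0, 0]}  # had_connection -> [failures, total]
    for had_connection, succeeded in attempts:
        stats[had_connection][1] += 1
        if not succeeded:
            stats[had_connection][0] += 1
    return {
        had: (failures / total if total else None)
        for had, (failures, total) in stats.items()
    }

# Made-up observations, for illustration only.
attempts = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, True),
]
print(failure_rate_by_existing_conn(attempts))
```

A clearly higher rate in the no-connection group would point toward
the handshake or the extra setup time, though a real analysis would
need many more samples than this.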
