On Wed, Mar 31, 2021 at 11:14:11AM +0200, Dennis Jackson wrote:
> Just to check I understand correctly - The attempt is considered a failure if (and only if) you can connect to the test relay correctly, but can't extend from the test relay to your own relay?
Right.
> As many of these relays have weights, presumably they can successfully extend to at least some relays in order to have their bandwidth measured. I wonder how the probabilities would look if you tested with some other (highly weighted) relays in the 2nd-hop position?
Right. I intentionally picked my own tiny relay for the second hop, first so that I know it's working (to remove that extra variable), but second because there probably won't already be a TLS connection open (because I wanted to test both the TCP/TLS connectivity and also the circuit extend part).
You're right that a good follow-up test would be to compare these results to one with a hugely popular second hop relay, because then there's more chance of an existing long-term conn in place (though that's complicated by flags -- e.g. exit relays probably won't connect to guard relays, but guard relays might be used in the middle hop to connect to exit relays, and circuit extends don't care about which way the existing orconn got originally created).
> Would you be comfortable sharing the unfiltered dataset? It would be interesting to approximate the probability a client circuit is impacted by this kind of failure.
Yeah, I'll publish all these things once I figure out the right place for them (currently they're inside my bermuda repo, which is one of the bad-relays repos).
But in the meantime, in case you are excited for some more scripting, here are the full output results for the past day, plus the little Perl script I use for turning the results into that ratio format you saw earlier. This is more data than the mail from 8 hours ago because the scans are still going.
I retest successes after 12 hours, failures on the first hop after 2 hours, and failures on the second hop after 1 hour, which is why you'll see more attempts for the fingerprints that are flakier. I should be able to reconstruct approximate timestamps for each test if we find a use for them, and I have the full set of circ controller events too.
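For anyone scripting against the results, the retest schedule above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual scanner code: the outcome labels and the `next_test_time` helper are hypothetical names, and only the three intervals come from the description above.

```python
from datetime import datetime, timedelta

# Retest intervals as described in the mail. The outcome labels are
# hypothetical names chosen for this sketch.
RETEST_DELAY = {
    "success": timedelta(hours=12),
    "first_hop_failure": timedelta(hours=2),
    "second_hop_failure": timedelta(hours=1),
}


def next_test_time(outcome: str, tested_at: datetime) -> datetime:
    """Return when a fingerprint should next be tested, given its last outcome."""
    return tested_at + RETEST_DELAY[outcome]


if __name__ == "__main__":
    last = datetime(2021, 3, 31, 11, 0)
    for outcome in RETEST_DELAY:
        print(outcome, "->", next_test_time(outcome, last).isoformat())
```

Because failures are retried sooner than successes, a relay that keeps failing on the second hop can be tested up to twelve times in the period a consistently working relay is tested once, which is why the flakier fingerprints have more attempts in the data.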
--Roger