Re: [tor-dev] connectivity failure for top 100 relays

27 Apr 2018

Greetings,

(
Meejah and I made txtorcon report the reason for circuit
build failures here: https://github.com/meejah/txtorcon/pull/299
My scanner now uses this txtorcon feature:
https://github.com/david415/tor_partition_scanner
)

I used a consensus file from CollecTor: 2018-04-27-19-00-00-consensus

wget https://collector.torproject.org/recent/relay-descriptors/consensuses/2018-0...

From it I extracted the 100 relays with the highest consensus weights
among those that have both the Stable and Fast flags:

./helpers/query_fingerprints_from_consensus_file.py 2018-04-27-19-00-00-consensus > top100.relays
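
(For anyone without the helper script at hand, the selection step can be
approximated as below. This is a minimal sketch and not the code of
query_fingerprints_from_consensus_file.py: the function name top_relays is
hypothetical, and it assumes the usual consensus layout where each relay has
an "r" line with the base64 identity as its third field, an "s" line of
flags, and a "w Bandwidth=" line.)

```python
import base64


def top_relays(consensus_text, n=100, required_flags=("Stable", "Fast")):
    """Return the hex fingerprints of the n highest-weight relays
    that carry every flag in required_flags."""
    relays = []
    current = None
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            # Third field is the identity, base64 with the padding stripped.
            identity = line.split()[2]
            padded = identity + "=" * (-len(identity) % 4)
            fingerprint = base64.b64decode(padded).hex().upper()
            current = {"fingerprint": fingerprint, "flags": set(), "bandwidth": 0}
            relays.append(current)
        elif line.startswith("s ") and current is not None:
            current["flags"] = set(line.split()[1:])
        elif line.startswith("w ") and current is not None:
            for item in line.split()[1:]:
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    current["bandwidth"] = int(value)
    wanted = set(required_flags)
    eligible = [r for r in relays if wanted <= r["flags"]]
    eligible.sort(key=lambda r: r["bandwidth"], reverse=True)
    return [r["fingerprint"] for r in eligible[:n]]
```

Reading the downloaded consensus file into a string and passing it to
top_relays() would yield up to 100 fingerprints, highest weight first.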

I then performed the scan, building 9900 2-hop Tor circuits:

detect_partitions.py --tor-control unix:/var/run/tor/control --log-dir ./ --status-log ./status_log \
   --relay-list top100.relays --secret secretTorEmpireOfRelays --partitions 1 --this-partition 0 \
   --build-duration .25 --circuit-timeout 60 --log-chunk-size 1000 --max-concurrency 100
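
(An aside on where 9900 comes from: it is consistent with scanning every
ordered pair of distinct relays once, since 100 * 99 = 9900. The sketch below
only illustrates that count. It assumes plain ordered pairs, the relay names
are placeholders, and how detect_partitions.py actually orders or shuffles
the pairs is not shown here.)

```python
from itertools import permutations


def two_hop_pairs(relays):
    """Return every ordered (first_hop, second_hop) pair of distinct relays."""
    return list(permutations(relays, 2))


relays = ["relay%03d" % i for i in range(100)]  # placeholder names
pairs = two_hop_pairs(relays)
print(len(pairs))
```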

This resulted in only 307 circuit build failures:

echo "select reason from scan_log where status = 'failure'
...
;" | sqlite3 scan1.db | wc -l
307

Of these 307 failures, 301 had a circuit build failure REASON that little-t tor reported as TIMEOUT:

echo "select reason from scan_log where status = 'failure';" | sqlite3 scan1.db | grep -i timeout | wc -l
301

Here are the non-timeout REASONs for the remaining circuit build failures:

echo "select reason from scan_log where status = 'failure';" | sqlite3 scan1.db | grep -vi timeout

DESTROYED, FINISHED
DESTROYED, FINISHED
DESTROYED, CHANNEL_CLOSED
DESTROYED, CHANNEL_CLOSED
DESTROYED, CHANNEL_CLOSED
DESTROYED, CHANNEL_CLOSED
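
(The same tally can be produced in one pass instead of several grep
pipelines. The sketch below assumes only what the queries above show, namely
a scan_log table with status and reason columns. The function name
count_failure_reasons is made up for illustration.)

```python
import sqlite3
from collections import Counter


def count_failure_reasons(conn):
    """Count how often each reason appears among failed circuit builds."""
    rows = conn.execute("SELECT reason FROM scan_log WHERE status = 'failure'")
    return Counter(reason for (reason,) in rows)
```

Opening the database with sqlite3.connect("scan1.db") and printing
count_failure_reasons(conn).most_common() would list each reason with its
count, most frequent first.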

I'm curious to try this scan at different times of day to see if results vary.

Cheers,

David

On Tue, Mar 13, 2018 at 11:48:30PM +0000, dawuud wrote:
...
I did another scan, this time with 3 seconds between each circuit
build and set the max connections to 50 with similar results as
yesterday:

9354 failure
2 timeout
544 success

Most of the circuit build failures happened in under a second:

echo "select (end_time - start_time) / 1000 as duration from scan_log where duration < 1 AND status = 'failure';" | sqlite3 scan1.db | wc -l
9344
...
txtorcon does expose both the 'reason' and the 'remote_reason' flags
returned by the failure messages. In fact, it returns all flags that Tor
sent during stream or circuit failures.
The **kwargs in stream_closed, circuit_closed or circuit_failed
notifications should all include "REASON" and many times will also
include "REMOTE_REASON" (e.g. if the "other" relay closed the
connection). For convenience, txtorcon also includes lower-cased
versions of all the flags.
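
(Put differently, a failure handler can read the reason straight out of the
keyword arguments. A minimal sketch, assuming only the key names mentioned
above, "REASON" and, when present, "REMOTE_REASON". The function
describe_failure is hypothetical and not part of txtorcon, and the "UNKNOWN"
fallback is an assumption made for this example.)

```python
def describe_failure(**kw):
    """Format the flags passed to a circuit_failed-style callback."""
    reason = kw.get("REASON", "UNKNOWN")  # fallback value is an assumption
    remote = kw.get("REMOTE_REASON")      # only set when the far side reported one
    if remote is None:
        return reason
    return "%s, %s" % (reason, remote)
```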

Ah, ok! I will take a look at this. I'd like to do another scan
while collecting this additional information.
...
Would it be better, then, to pick one first hop and scan (sequentially)
every second-hop using that first hop? (And maybe have say 5 or 10 such
things going on at once?)

Maybe it's ok to make 7,000+ Tor circuits sequentially from the same relay
if it's done very slowly?

...