[tor-bugs] #32473 [Metrics/Exit Scanner]: Evaluate the results from the exitmap based scanner compared to the current exit lists system

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Nov 13 14:18:00 UTC 2019


#32473: Evaluate the results from the exitmap based scanner compared to the current
exit lists system
----------------------------------+------------------------------
 Reporter:  irl                   |          Owner:  metrics-team
     Type:  task                  |         Status:  new
 Priority:  Medium                |      Milestone:
Component:  Metrics/Exit Scanner  |        Version:
 Severity:  Normal                |     Resolution:
 Keywords:                        |  Actual Points:
Parent ID:                        |         Points:
 Reviewer:                        |        Sponsor:
----------------------------------+------------------------------

Comment (by karsten):

 Today I looked at an early log file produced by the new exitmap based
 scanner. Some observations:

  - The log file is the result of a single iteration over all exits in the
 consensus. This differs from the typical operation of an exit scanner,
 which periodically checks which exits it hasn't scanned in a while and
 then scans them. This means that for this analysis we cannot compare the
 scanners regarding how soon they scan new relays or relays that have
 uploaded a new descriptor. We should do that in the next step, once more
 data is available from the new scanner.

  - I compared these single-run scan results with exit lists to see how
 much they agree. I found:
    - 191 relays that were contained in exit lists but not scanned by the
 new scanner. However, I considered exit lists from the 24 hours before the
 new scanner's run time, and I did not check which of these relays were
 still running when the new scanner started. I could imagine that this
 explains most of these 191 cases, plus a few scan errors. Needs more
 analysis, possibly in the next step.
    - 764 relays for which both scanners found a scanned exit address
 identical to the descriptor address. This makes sense.
    - 33 relays for which the scanned exit address differs from the
 descriptor address, but in all these cases the scanners agreed.
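
 The comparison above could be sketched roughly as follows. This is only an
 illustration, not the script used for the numbers in this comment: the
 function name, the three dictionaries keyed by relay fingerprint, and the
 category labels are all assumptions, and parsing of exit lists and scanner
 logs is left out.

```python
from collections import Counter


def classify_relays(descriptor_addr, exit_list_addrs, new_scan_addrs):
    """Classify each relay found in exit lists by how the two scanners compare.

    All three arguments are hypothetical dicts keyed by relay fingerprint:
    descriptor_addr maps to a single address string, the other two map to
    sets of scanned exit addresses.
    """
    counts = Counter()
    for fingerprint, old_addrs in exit_list_addrs.items():
        new_addrs = new_scan_addrs.get(fingerprint)
        if new_addrs is None:
            # Contained in exit lists, but not scanned by the new scanner.
            counts["only_in_exit_lists"] += 1
        elif old_addrs != new_addrs:
            # Both scanners scanned the relay but found different addresses.
            counts["scanners_disagree"] += 1
        elif old_addrs == {descriptor_addr.get(fingerprint)}:
            # Scanners agree, and the exit address is the descriptor address.
            counts["agree_same_as_descriptor"] += 1
        else:
            # Scanners agree on an exit address that differs from the descriptor.
            counts["agree_differs_from_descriptor"] += 1
    return counts
```

 Relays seen only by the new scanner are not counted here; iterating over
 new_scan_addrs in the same way would cover that direction.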

 These early results are promising! I suggest that I take another look as
 soon as the new scanner produces output similar to that of the current
 exit scanner, and once it's running continuously. I have some metrics in
 mind for that comparison, including: number of scans per hour, overlap of
 scanned relays (similar to the analysis above), and time between new
 descriptor and first exit scan. But this will be easier once the data
 exists. I'd like to wait for that now.
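
 For reference, two of the metrics mentioned above might be computed along
 these lines once the data exists. This is a minimal sketch under
 assumptions: the function names are hypothetical, and scan times and
 descriptor publication times are assumed to be available as datetime
 objects already parsed from the scanner output and descriptors.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def scans_per_hour(scan_times: Iterable[datetime]) -> Counter:
    """Count scans per hour by truncating each scan time to the full hour."""
    return Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in scan_times
    )


def first_scan_delay(
    published: datetime, scan_times: List[datetime]
) -> Optional[timedelta]:
    """Return the time between a new descriptor and the first scan after it.

    Returns None if the relay was not scanned at or after the publication time.
    """
    later = [t for t in scan_times if t >= published]
    return min(later) - published if later else None
```

 Running both functions on the output of each scanner over the same time
 period would give directly comparable numbers.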

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32473#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

