[metrics-team] Please fact-check my guide to determining which website visitors use Tor

Karsten Loesing karsten at torproject.org
Wed Mar 23 17:52:38 UTC 2016


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi David,

On 23/03/16 16:52, David Fifield wrote:
> This is some information Sadia Afroz and I are planning to send to 
> Twitter, after our talk on Tor blocking today. Our goal is to show
> how to measure how many users are on Tor, correctly. Would you
> please check it for errors and suggest changes?

This all looks plausible!

What you might want to add is that people can get a quick first idea
of IP addresses used by relays to register in the Tor network, IP
addresses used for exiting to the internet, and exit policies by
looking at Onionoo data:

https://onionoo.torproject.org/details?limit=500

https://onionoo.torproject.org/protocol.html

In particular the "or_addresses", "exit_addresses", and "exit_policy"
fields seem relevant here.

Of course, people can also process the raw data directly.  But
maybe Onionoo data is sufficient for a first prototype.
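
As a rough illustration of what such a first look could involve, here
is a minimal Python sketch that pulls those three fields out of a
details document. The sample document, its values, and the helper
names (split_or_address, relay_addresses, exit_ip_set) are made up for
illustration; only the field names come from the Onionoo protocol
page. It assumes "or_addresses" entries are "address:port" strings
with IPv6 addresses in square brackets, and it treats a missing field
as empty.

```python
# Hypothetical sample shaped like an Onionoo details document.
# Fingerprints and addresses are invented (documentation address ranges).
SAMPLE = {
    "relays": [
        {
            "fingerprint": "A" * 40,
            "or_addresses": ["192.0.2.10:9001", "[2001:db8::10]:9001"],
            "exit_addresses": ["192.0.2.20"],
            "exit_policy": ["accept *:80", "accept *:443", "reject *:*"],
        },
        {
            "fingerprint": "B" * 40,
            "or_addresses": ["198.51.100.5:443"],
            "exit_policy": ["reject *:*"],
        },
    ]
}


def split_or_address(or_address):
    """Drop the port from 'address:port'; unwrap a bracketed IPv6 address."""
    host, _, _port = or_address.rpartition(":")
    return host.strip("[]")


def relay_addresses(details):
    """Return one dict per relay with its OR IPs, exit IPs, and exit policy."""
    rows = []
    for relay in details.get("relays", []):
        or_ips = sorted({split_or_address(a) for a in relay.get("or_addresses", [])})
        rows.append({
            "fingerprint": relay.get("fingerprint"),
            "or_ips": or_ips,
            "exit_ips": list(relay.get("exit_addresses", [])),
            "exit_policy": list(relay.get("exit_policy", [])),
        })
    return rows


def exit_ip_set(details):
    """Collect every address listed in any relay's exit_addresses field."""
    return {ip for row in relay_addresses(details) for ip in row["exit_ips"]}


if __name__ == "__main__":
    for row in relay_addresses(SAMPLE):
        print(row["fingerprint"][:8], row["or_ips"], row["exit_ips"])
```

To try this on real data, one would load the JSON from the details
URL above and pass the decoded object to relay_addresses().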

All the best,
Karsten


> 
> 
> == Measuring Tor users ==
> 
> You can mine your past logs to see what fraction of sessions used
> Tor. The data source you want to use for this is:
>     https://collector.torproject.org/#type-tordnsel
>     https://collector.torproject.org/archive/exit-lists/
> It contains records of this form:
>     ExitNode 63BA28370F543D175173E414D5450590D73E22DC
>     Published 2010-12-28 07:35:55
>     LastStatus 2010-12-28 08:10:11
>     ExitAddress 91.102.152.236 2010-12-28 07:10:30
>     ExitAddress 91.102.152.227 2010-12-28 10:35:30
> The "ExitAddress" lines are determined by actually building
> circuits through the exit; i.e., they won't be fooled by exits that
> exit traffic on a different IP address than they accept Tor
> connections on.
> 
> To be especially rigorous, you would want to also consider each
> exit node's exit policy, to check whether it allows exiting to
> Twitter on ports you care about. Those exit nodes that do not,
> should not be considered "exit nodes" from Twitter's point of view.
> For that, you probably want network status documents, and join on
> the fingerprint field. But I would guess that effect is very small:
> it would only matter if someone had an exit that did not allow
> access to Twitter, but they themselves access Twitter (not through
> Tor) on the same IP address. 
> https://collector.torproject.org/#type-network-status-consensus-3
> 
> This is the same process that powers the
> https://check.torproject.org/ online test that checks if you are
> using Tor, and the https://exonerator.torproject.org/ service that
> checks if an IP address was an exit in the past. For real-time
> checks, you'll want to have a process that continually refreshes
> the exit list from
>     https://collector.torproject.org/recent/exit-lists/
> (they are published hourly).
>
> There is documentation and source code for running the Check and
> Exonerator services:
>     https://gitweb.torproject.org/check.git/tree/
>     https://gitweb.torproject.org/exonerator.git/tree/
>
> Here is sample Python code that parses various Collector documents
> and outputs a list of IP addresses:
>     https://gitweb.torproject.org/check.git/tree/scripts/exitips.py
> The output of the above code is available here (same format as the
> tordnsel documents):
>     current: https://check.torproject.org/exit-addresses
> 
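
To make the log-mining step in the quoted guide a bit more concrete,
here is a minimal Python sketch that parses exit list records of the
form shown there and asks whether an address was measured as an exit
near a given time. The function names and the 24-hour matching window
are assumptions chosen for illustration, not something Collector or
ExoneraTor prescribes; pick a window that suits your logs. Timestamps
are assumed to be UTC and are handled as naive datetimes.

```python
from datetime import datetime, timedelta

# The record quoted in the guide, one keyword per line.
SAMPLE = """\
ExitNode 63BA28370F543D175173E414D5450590D73E22DC
Published 2010-12-28 07:35:55
LastStatus 2010-12-28 08:10:11
ExitAddress 91.102.152.236 2010-12-28 07:10:30
ExitAddress 91.102.152.227 2010-12-28 10:35:30
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_exit_list(text):
    """Return (fingerprint, ip, measured_at) tuples, one per ExitAddress line.

    Lines with other keywords (Published, LastStatus, file headers) are skipped.
    """
    observations = []
    fingerprint = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "ExitNode" and len(parts) == 2:
            fingerprint = parts[1]
        elif parts[0] == "ExitAddress" and len(parts) == 4:
            measured_at = datetime.strptime(parts[2] + " " + parts[3], TIME_FORMAT)
            observations.append((fingerprint, parts[1], measured_at))
    return observations


def was_exit(observations, ip, when, window=timedelta(hours=24)):
    """True if `ip` was measured as an exit address within `window` of `when`.

    The default window is an assumption, not a Tor-defined value.
    """
    return any(
        obs_ip == ip and abs(when - measured_at) <= window
        for _fingerprint, obs_ip, measured_at in observations
    )


if __name__ == "__main__":
    obs = parse_exit_list(SAMPLE)
    session_time = datetime(2010, 12, 28, 12, 0, 0)
    for ip in ("91.102.152.227", "192.0.2.1"):
        print(ip, was_exit(obs, ip, session_time))
```

For past logs, one would feed each archived exit list through
parse_exit_list() and test each session's client address and
timestamp with was_exit().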

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJW8tfmAAoJEC3ESO/4X7XBzg0IAJE1bxYe7wTdAAo0ISbEIf9L
xtXUO4olnc7rIqfydZCEbuwlvEyKTyinZzcoGNwI/5GQqDzehkGyN8sueJ+nzLO2
8uhp7UfhR8m/t18x476ylifTj4AFIauRDlyaDGBQIdlDrCtrECHKgagN6PblPYcJ
mGqF4A7GRlj9+DypAf32O3mO8iN9QP7K9ryvTJKpN4PBfGR9S1vRBfXBHT1IZ+d3
9naL5+jLFRHRDXK8beX1t+4+YFE59YKiAcO5AT8+89hUFoVs2dyqcbx2EiW2ZLpl
NrYMzoyrqg7BGDXt3jGpuNMuCT3pQyRAtGifdiZe1MVuPUTPsL5H9RuYzZt81XY=
=XItX
-----END PGP SIGNATURE-----

