[metrics-team] Please fact-check my guide to determining which website visitors use Tor

David Fifield david at bamsoftware.com
Wed Mar 23 15:52:32 UTC 2016


This is some information Sadia Afroz and I are planning to send to
Twitter, after our talk on Tor blocking today. Our goal is to show how
to correctly measure how many users are on Tor. Would you please check
it for errors and suggest changes?


== Measuring Tor users ==

You can mine your past logs to see what fraction of sessions used Tor.
The data source you want to use for this is:
	https://collector.torproject.org/#type-tordnsel
	https://collector.torproject.org/archive/exit-lists/
It contains records of this form:
	ExitNode 63BA28370F543D175173E414D5450590D73E22DC
	Published 2010-12-28 07:35:55
	LastStatus 2010-12-28 08:10:11
	ExitAddress 91.102.152.236 2010-12-28 07:10:30
	ExitAddress 91.102.152.227 2010-12-28 10:35:30
The "ExitAddress" lines are determined by actually building circuits
through the exit; i.e., they won't be fooled by exits whose outgoing
traffic leaves from a different IP address than the one on which they
accept Tor connections.
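
As a rough illustration of mining logs against these records, here is a
minimal Python sketch. It assumes your sessions are available as
(ip, datetime) pairs, and it counts a session as Tor if its address
appeared in an "ExitAddress" line within some window of the session
time. The function names and the 24-hour window are my assumptions for
the example, not part of the exit list format.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_exit_list(text):
    """Return a list of (fingerprint, address, observed_at) tuples.

    Only "ExitNode" and "ExitAddress" lines are used; other lines
    (Published, LastStatus, headers) are skipped.
    """
    observations = []
    fingerprint = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ExitNode" and len(fields) == 2:
            fingerprint = fields[1]
        elif fields[0] == "ExitAddress" and len(fields) == 4:
            observed_at = datetime.strptime(
                fields[2] + " " + fields[3], TIME_FORMAT)
            observations.append((fingerprint, fields[1], observed_at))
    return observations


def fraction_tor(sessions, observations, window=timedelta(hours=24)):
    """Fraction of (ip, datetime) sessions that match an exit address.

    The matching window is an assumption chosen for illustration.
    """
    by_address = {}
    for _, address, observed_at in observations:
        by_address.setdefault(address, []).append(observed_at)
    sessions = list(sessions)
    if not sessions:
        return 0.0
    hits = sum(
        1 for ip, when in sessions
        if any(abs(when - seen) <= window
               for seen in by_address.get(ip, ()))
    )
    return hits / len(sessions)
```

You would feed parse_exit_list() the contents of one or more exit list
files covering the period of your logs, then pass the result to
fraction_tor() together with your sessions.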

To be especially rigorous, you would also want to consider each exit
node's exit policy, to check whether it allows exiting to Twitter on
the ports you care about. Exit nodes that do not allow it should not be
considered "exit nodes" from Twitter's point of view. For that, you
probably want the network status documents, joined with the exit lists
on the fingerprint field. But I would guess that the effect is very
small: it would only matter if someone ran an exit that did not allow
access to Twitter, but they themselves accessed Twitter (not through
Tor) from the same IP address.
	https://collector.torproject.org/#type-network-status-consensus-3
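
To make the join concrete, here is a hedged Python sketch. It assumes
the consensus format in which each relay has an "r" line whose third
field is the identity fingerprint in base64 with the padding removed,
and a "p" line holding a port summary such as "accept 80,443". Note
that the port summary describes the policy for most addresses, not for
Twitter's addresses specifically, so treat this as an approximation.
The function names are made up for the example.

```python
import base64
import binascii


def fingerprint_to_identity(hex_fingerprint):
    """Convert a hex fingerprint (as in exit lists) to the unpadded
    base64 form assumed to appear in consensus "r" lines."""
    raw = binascii.unhexlify(hex_fingerprint)
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def parse_port_summaries(consensus_text):
    """Map identity -> (keyword, [(low, high), ...]) from "p" lines.

    keyword is "accept" or "reject"; each range is inclusive.
    """
    summaries = {}
    identity = None
    for line in consensus_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "r":
            identity = fields[2]
        elif len(fields) == 3 and fields[0] == "p" and identity is not None:
            ranges = []
            for part in fields[2].split(","):
                low, _, high = part.partition("-")
                ranges.append((int(low), int(high or low)))
            summaries[identity] = (fields[1], ranges)
    return summaries


def allows_port(summary, port):
    """True if the port summary permits exiting to the given port."""
    keyword, ranges = summary
    listed = any(low <= port <= high for low, high in ranges)
    return listed if keyword == "accept" else not listed
```

For each "ExitNode" fingerprint in the exit list, you would look up
fingerprint_to_identity(fingerprint) in the parsed summaries and keep
the relay only if allows_port() is true for port 443 (or whichever
ports you serve).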

This is the same process that powers the https://check.torproject.org/
online test that checks if you are using Tor, and the
https://exonerator.torproject.org/ service that checks if an IP address
was an exit in the past. For real-time checks, you'll want to have a
process that continually refreshes the exit list from
	https://collector.torproject.org/recent/exit-lists/
(they are published hourly). There is documentation and source code for
running the Check and Exonerator services:
	https://gitweb.torproject.org/check.git/tree/
	https://gitweb.torproject.org/exonerator.git/tree/
Here is sample Python code that parses various Collector documents and
outputs a list of IP addresses:
	https://gitweb.torproject.org/check.git/tree/scripts/exitips.py
The output of the above code is available here (same format as the
tordnsel documents):
	current: https://check.torproject.org/exit-addresses
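
For the real-time case, a refreshing lookup might look roughly like the
following Python sketch. It assumes you fetch the exit-addresses file
named above and reload it once it is older than an hour; the class
name, the one-hour max_age, and the choice to keep serving a stale list
when a refresh fails are all assumptions for the example.

```python
import time
import urllib.request

EXIT_ADDRESSES_URL = "https://check.torproject.org/exit-addresses"


def fetch_exit_addresses(url=EXIT_ADDRESSES_URL):
    """Download the exit list and return it as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("ascii", errors="replace")


def extract_addresses(text):
    """Return the set of addresses found on "ExitAddress" lines."""
    addresses = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ExitAddress":
            addresses.add(fields[1])
    return addresses


class ExitAddressCache:
    """Answers "is this IP a Tor exit?" from a periodically reloaded list."""

    def __init__(self, fetch=fetch_exit_addresses, max_age=3600,
                 clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._addresses = set()
        self._loaded_at = None

    def is_exit(self, ip):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            try:
                self._addresses = extract_addresses(self._fetch())
                self._loaded_at = now
            except OSError:
                # Keep using the previous list if a refresh fails, but
                # do not hide a failure of the very first load.
                if self._loaded_at is None:
                    raise
        return ip in self._addresses
```

A web frontend would create one ExitAddressCache and call
cache.is_exit(client_ip) per request; in production you would more
likely refresh in a background job than inline on a request.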

