-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hello Tor relay operators,
We could use your help in a pilot project to improve Tor security. As you may be aware, the anonymity of a connection over Tor is vulnerable to an adversary who can observe it in enough places along its route. For example, traffic that crosses the same country as it enters and leaves the Tor network can potentially be deanonymized by an authority in that country who can monitor all network communication. Researchers have been working to figure out how Tor traffic gets routed over the Internet [0-3], but determining routes with high confidence has been difficult.
That's where you come in. To figure out where traffic travels from your relay, we'd like you to run a bunch of "traceroutes" - network measurements that show the paths traffic takes. This is a one-time experiment for now, but, depending on what we find out, regularly making such measurements may become a part of Tor itself. We have already gotten some results thanks to Linus Nordberg of DFRI and Moritz Bartl of torservers.net, and now it's time to ask all relay operators to help. We would like to start this right away.
We have written some shell scripts to automate most of the process. The easiest way for you to get them is with git, using the following commands:
git clone https://bitbucket.org/anupam_das/traceroute-from-tor-relays
cd traceroute-from-tor-relays
git checkout f253f768d14e3368e4fe4de9895acd2715a19412
You can also just download the files directly by visiting [4]. Detailed instructions for setting up and running the experiment are in the README.
Basically the experiment does traceroutes to three groups: all "routable IP prefixes", all Tor relays, and then all /24 subnets. These kinds of measurements are not uncommon, and they will not be done at a high rate. By default the scripts will periodically move the results to our server [5] via SSH, although you can keep the results around and/or not send them automatically if you wish (see the README). The traceroute data recorded is not sensitive or private at all. We plan to make the code and data public, following Tor's practice of open cooperation with the research community [6].
The measurements will work best if you have the "scamper" tool from the Cooperative Association for Internet Data Analysis (CAIDA) installed (see the README for installation instructions). This is a standard, open-source tool that handles the many complexities of modern Internet routing measurement. If you are not able to run scamper, the script will also work with the more common, but less accurate and slower, "traceroute" utility. We do not currently support Windows relays. With scamper, the output will take up around 500KB of disk space (110MB if you disable automatic removal after upload); with the "traceroute" utility, each output will be around 4MB (1GB with automatic removal after upload disabled). The total running time depends on whether you use scamper or traceroute, but the traceroutes to "routable IP prefixes" and to all Tor relays should finish within one week (possibly earlier). We ask relay operators to upload those results once they are finished.
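Because the scripts can use either scamper or traceroute, it may help to check which one your machine actually has before starting. A minimal bash sketch; the function name pick_tool is hypothetical and is not taken from the project's scripts, which make their own choice:

```bash
# Hypothetical helper (not part of traceroutes.sh): report which
# measurement tool is installed, preferring scamper over traceroute.
pick_tool() {
  if command -v scamper >/dev/null 2>&1; then
    echo scamper
  elif command -v traceroute >/dev/null 2>&1; then
    echo traceroute
  else
    echo none
  fi
}

pick_tool
```

If this prints "none", neither tool is on your PATH and you would need to install one first (see the README).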
This experiment is in collaboration with several researchers, but the leads are Anupam Das, a Ph.D. student at the University of Illinois at Urbana-Champaign, and his advisor Nikita Borisov. Based on a review of the scripts of commit f253f768d14e3368e4fe4de9895acd2715a19412, we believe that they operate as described above. Please do read through them yourself, and let us know if you have any questions or concerns. Also feel free to contact any of us for help or with suggestions.
Because of you, Tor is the "king" of anonymous communication. With your help, we will keep improving to face the new challenges to privacy and freedom online.
Thank you,
Karsten Loesing karsten@torproject.org
Anupam Das das17@illinois.edu
Nikita Borisov nikita@illinois.edu
[0] "Protecting anonymity in the presence of autonomous system and internet exchange level adversaries" by Joshua Juen. Master's Thesis, UIUC, 2012. https://www.ideals.illinois.edu/handle/2142/34363
[1] "Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries" by Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson. ACM CCS 2013. http://freehaven.net/anonbib/cache/ccs2013-usersrouted.pdf
[2] "AS-awareness in Tor path selection" by Matthew Edman and Paul F. Syverson. ACM CCS 2009. http://freehaven.net/anonbib/cache/DBLP:conf/ccs/EdmanS09.pdf
[3] "Sampled Traffic Analysis by Internet-Exchange-Level Adversaries" by Steven J. Murdoch and Piotr Zieliński. PETS 2007. http://freehaven.net/anonbib/cache/murdoch-pet2007.pdf
[4] https://bitbucket.org/anupam_das/traceroute-from-tor-relays/downloads
[5] ttat-control.iti.illinois.edu
[6] https://metrics.torproject.org/
On Wed, 23 Oct 2013, Karsten Loesing wrote:
As you may be aware, the anonymity of a connection over Tor is vulnerable to an adversary who can observe it in enough places along its route.
Are you trying to work backward to physical-layer chokepoints, like how the intercontinental network topology maps to the landing sites on http://www.submarinecablemap.com/?
Basically the experiment does traceroutes to three groups: all "routable IP prefixes", all Tor relays, and then all /24 subnets.
Based on my read of your input data, these will run traceroutes to 491762, 9058, and 13597431 IPs respectively, sending at least 100 million packets? This is a bigger ask than I think you made clear.
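The scale of the request can be checked with shell arithmetic. The three target counts come from the message above. The probes-per-target figure is an illustrative assumption, since the real number depends on path length and tool settings. For reference, "at least 100 million packets" over about 14.1 million targets works out to roughly 7 probes per target on average.

```bash
# Target counts for the three groups, as quoted above.
prefixes=491762
relays=9058
slash24s=13597431

total=$((prefixes + relays + slash24s))
echo "total targets: $total"          # 14098251

# ASSUMPTION: illustrative average probes per target, not a measured value.
probes_per_target=8
echo "estimated packets: $((total * probes_per_target))"   # 112786008
```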
I question the utility of scanning all /24s. By definition, all /24s in a BGP prefix take the same path to its origin AS; the only variation will be within that. If you are looking for chokepoints, you've already found it with the origin AS.
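To see how many /24 targets a single BGP prefix expands into: an IPv4 prefix of length n (for 8 <= n <= 24) contains 2^(24-n) /24 subnets, so a /16 alone yields 256 targets. A minimal bash sketch that lists them; the function names are hypothetical and the sketch handles IPv4 only.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# List every /24 inside PREFIX (e.g. 192.0.2.0/23); length must be 8..24.
list_slash24s() {
  local net=${1%/*} len=${1#*/}
  if (( len < 8 || len > 24 )); then
    echo "prefix length must be between 8 and 24" >&2
    return 1
  fi
  local base count i
  # Clear host bits so a non-aligned input still starts on the prefix boundary.
  base=$(( $(ip_to_int "$net") & ((0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF) ))
  count=$(( 1 << (24 - len) ))   # a /len holds 2^(24-len) /24s
  for (( i = 0; i < count; i++ )); do
    echo "$(int_to_ip $(( base + (i << 8) )))/24"
  done
}

list_slash24s 192.0.2.0/23   # prints 192.0.2.0/24 and 192.0.3.0/24
```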
This also does all scans sequentially, which will have a couple of negative side-effects. You are much more likely to trigger ICMP response rate-limiting on intermediate routers and more likely to trigger IDS alarms than if you'd randomized your target selection. Running your target list through one of the following would mitigate this:
awk 'BEGIN{srand()}{print rand()"\t"$0}' | sort -k1 -n | cut -f2-
perl -MList::Util -e 'print List::Util::shuffle <>'
sort -R
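The randomization suggested above can be wrapped in a small bash function that uses GNU shuf when it is installed and otherwise falls back to the quoted awk decorate-sort-undecorate approach. One detail in the fallback: awk's print formats numbers with "%.6g", so a very small rand() value prints in exponent form (for example 5e-05), which sort -n misreads. The sketch therefore formats the key with printf. The name shuffle_targets is hypothetical.

```bash
# Shuffle a target list (one IP or prefix per line) read from stdin,
# so consecutive probes go to unrelated networks.
shuffle_targets() {
  if command -v shuf >/dev/null 2>&1; then
    shuf
  else
    # Fixed-point keys avoid awk's exponent output, which sort -n misorders.
    awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' |
      sort -k1,1 -n | cut -f2-
  fi
}

# Example: shuffle_targets < targets.txt > targets.shuffled.txt
```

Note that GNU sort -R sorts by a random hash of each line, so any duplicate lines stay adjacent. That is harmless for a list of unique targets.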
These kinds of measurements are not uncommon, and they will not be done at a high rate.
They are uncommon from a Tor exit node. An exit already receives enough complaints that it is really helpful to be able to truthfully tell my ISP and others that none of the traffic was generated by me, that there's not much I can do to stop it, and that I have no logs about anything.
If you are not able to run scamper, the script will also work with the more-common but less-accurate and slower "traceroute" utility.
Which by default will try to keep 128 traceroute processes running all the time. This is potentially problematic for relays with limited RAM or CPU available. I'd recommend making this more clear.
I may run this from a machine on the same network as my Tor node, but definitely not on the Tor node itself.
-- Aaron
On Wed, Oct 23, 2013 at 1:26 PM, Aaron Hopkins lists@die.net wrote:
On Wed, 23 Oct 2013, Karsten Loesing wrote:
Basically the experiment does traceroutes to three groups: all "routable IP prefixes", all Tor relays, and then all /24 subnets.
Based on my read of your input data, these will run traceroutes to 491762, 9058, and 13597431 IPs respectively, sending at least 100 million packets? This is a bigger ask than I think you made clear.
I question the utility of scanning all /24s. By definition, all /24s in a BGP prefix take the same path to its origin AS; the only variation will be within that. If you are looking for chokepoints, you've already found it with the origin AS.
Initially I didn't see the sense in this either; perhaps it's explained in the referenced docs. If the internal paths of a large aggregated AS were of interest, it could be used for that. Though you might use the BGP table to reduce the /24 query set to just those matching the AS you reside in. Then again, there can be higher-precedence side peerings that would not be found that way.
Also, I'd merge in current BGP AS and GeoIP data for each hop of the trace. That could probably be done upon receipt of a submission if submissions happen daily. However, doing it on the client may be better, since the clients will need to update their BGP tables anyway in order to test new routes over time.
And for the Tor node... capture the IP, IP whois, DNS ptr, DNS ptr whois, BGP AS and prefix, and GeoIP.
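Annotating each hop with its BGP AS amounts to a longest-prefix match of the hop address against a prefix-to-AS table. A minimal bash sketch follows. The two-column table format ("prefix ASN" per line) and the function names are hypothetical; a real table would be derived from a BGP dump such as RouteViews. The test ASNs are from the documentation range 64496-64511 (RFC 5398).

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# lookup_as IP TABLE_FILE
# TABLE_FILE (hypothetical format): one "prefix/len ASN" pair per line.
# Prints the ASN of the most specific covering prefix, or nothing.
lookup_as() {
  local ip best_len=-1 best_as="" prefix asn net len mask netint
  ip=$(ip_to_int "$1")
  while read -r prefix asn; do
    [ -z "$prefix" ] && continue
    net=${prefix%/*}
    len=${prefix#*/}
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    netint=$(ip_to_int "$net")
    # Keep the match only if it is more specific than the best so far.
    if (( (ip & mask) == (netint & mask) && len > best_len )); then
      best_len=$len
      best_as=$asn
    fi
  done < "$2"
  if [ -n "$best_as" ]; then echo "$best_as"; fi
}
```

For example, with a table containing "10.0.0.0/8 64500" and "10.1.0.0/16 64501", lookup_as 10.1.2.3 returns 64501 because the /16 is more specific. A real deployment would use a proper radix-tree lookup rather than a linear scan.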
I can see this being useful for siting nodes in the future.
They are uncommon from a Tor exit node, which already receives enough
It's opt-in so I see no issue here.
Which by default will try to keep 128 traceroute processes running all the time. This is potentially problematic for relays with limited RAM or CPU available. I'd recommend making this more clear.
...and more tunable via a config file.
If you're looking for more network input and similar work, make a post to NANOG etc. There have been lots of traceroute projects.
So we have received some questions about running our traceroute measurements. Let me answer some of the questions:
------------------------------------------------------ From: Aaron Hopkins [lists@die.net]
Are you trying to work backward to physical-layer chokepoints, like how the intercontinental network topology maps to the landing sites on http://www.submarinecablemap.com/?
We are not looking for "chokepoints" exactly. We are interested in knowing as much as possible about where the traffic goes and who can see it.
I question the utility of scanning all /24s. By definition, all /24s in a BGP prefix take the same path to its origin AS; the only variation will be within that. If you are looking for chokepoints, you've already found it with the origin AS.
Our BGP prefixes are derived from RouteViews BGP tables. There is no guarantee that these are the prefixes used by all Internet routers, much less that routing is consistent over all IPs in the prefix. In addition, these prefixes change over time. Doing all /24 prefixes would enable us to study finer-grained routing behavior.
This also does all scans sequentially, which will have a couple of negative side-effects. You are much more likely to trigger ICMP response rate-limiting on intermediate routers and more likely to trigger IDS alarms than if you'd randomized your target selection. Running your target list through one of the following would mitigate this:
awk 'BEGIN{srand()}{print rand()"\t"$0}' | sort -k1 -n | cut -f2-
perl -MList::Util -e 'print List::Util::shuffle <>'
sort -R
We have randomized the prefix list, and will soon randomize the /24 list too.
Which by default will try to keep 128 traceroute processes running all the time. This is potentially problematic for relays with limited RAM or CPU available. I'd recommend making this more clear.
We tested running 128 traceroutes in parallel and found the CPU (<5%) and RAM (<0.2% of 8GB) requirements to be very small. You can also choose how many traceroutes to run in parallel; for example, to run 64:
PARALLEL=64 ./traceroutes.sh &
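For readers curious how a concurrency cap like PARALLEL can be enforced in shell, one common approach is xargs -P, which keeps at most N copies of a command running. This bash sketch is an illustration only, not the implementation inside traceroutes.sh; run_parallel and targets.txt are hypothetical names.

```bash
# Run COMMAND once per whitespace-separated target on stdin,
# with at most N invocations running at the same time.
run_parallel() {
  local n=$1
  shift
  xargs -P "$n" -n 1 "$@"
}

# Example: run_parallel 64 traceroute -n < targets.txt
```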
If you are using scamper (which we encourage you to do), you can tell the script how many packets per second to send, where 1<=PPS<=1000:
PPS=800 ./traceroutes.sh &
I may run this from a machine on the same network as my Tor node, but definitely not on the Tor node itself.
Running from another machine on the same network is just as good, as long as both machines definitely send their traffic through the same Internet gateways.
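A quick first check of the "same gateway" condition on Linux is to compare the next hop each machine uses. `ip route get ADDRESS` (iproute2) prints the route the kernel would pick, including "via GATEWAY" when the destination is reached through a router. Matching next hops do not guarantee identical upstream paths, so comparing a few full traceroutes from both machines remains the definitive test. The helper name gateway_of is hypothetical.

```bash
# Read `ip route get` output on stdin and print the next-hop gateway,
# i.e. the token after "via". Prints nothing for directly connected routes.
gateway_of() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# Usage on Linux: ip route get 1.1.1.1 | gateway_of
# Run on both machines and compare the results.
```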
------------------------------------------------------------------- From: Geoff Down geoffdown at fastmail.net
Your README should probably explicitly say that you need to run sudo chmod 04555 /path/to/scamper after installing scamper or it won't work.
Yes, the README now mentions this.
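For context on that step: in mode 04555, the leading 4 sets the setuid bit, so scamper runs with its owner's privileges (root, when installed system-wide). Those privileges are what it needs to open raw sockets for sending probes. The remaining 555 gives read and execute permission to owner, group, and others. A small bash sketch to confirm the bit took effect; is_setuid_exec is a hypothetical helper name.

```bash
# After installing scamper:  sudo chmod 04555 "$(command -v scamper)"

# Succeed if FILE has the setuid bit set and is executable.
is_setuid_exec() {
  [ -u "$1" ] && [ -x "$1" ]
}

# Example: is_setuid_exec "$(command -v scamper)" && echo "scamper is setuid"
```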
-------------------------------------------------------------------- From: Jesse Victors jvictors at jessevictors.com
How much bandwidth will this be taking up, and roughly how much will be uploaded/downloaded? I've cloned the repo, but I'm nervous about running this if it's going to be a significant bandwidth hog for a whole week. As Aaron said in Issue 39, it looks like it's going to be a lot of IPs and a large amount of packets. Also, ISPs may not take kindly to all these scans. What's the word on that? Has anyone run this tool, and what's their opinion? I'd be happy to help, but I'd like to know the full details of the various resources this tool will be consuming.
We have provided some resource requirements at the end of the README file. I'll summarize them here:
1. Upload: The script generates less than 500MB of data in total. By default the files are erased once they are uploaded, but you can choose not to erase any data.
2. Bandwidth: If scamper is used with all the default settings, it consumes at most 0.5 megabits per second of bandwidth. However, you can reduce this by changing the PPS (packets-per-second) parameter (1<=PPS<=1000) using the following command:
PPS=800 ./traceroutes.sh &
On a reasonably fast line (>100Mbps), these traceroutes shouldn't be much of a load for any ISP.
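The outgoing probe bandwidth is roughly packets per second × bytes per packet × 8 bits per byte. Assuming probes of about 60 bytes (an illustrative size, not a documented scamper default), the maximum PPS of 1000 comes to about 0.5 Mbit/s, consistent with the figure above. A bash sketch with a hypothetical helper:

```bash
# Approximate outgoing probe bandwidth in bits per second.
# ASSUMPTION: the bytes-per-probe argument is illustrative; real probe
# sizes depend on the tool and its settings.
probe_bps() {
  local pps=$1 bytes=$2
  echo $(( pps * bytes * 8 ))
}

probe_bps 1000 60   # 480000 bit/s, about 0.5 Mbit/s
probe_bps 800 60    # 384000 bit/s
```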
3. RAM and CPU usage: We have tested the script on multiple machines and found the following resource usage:
RAM <0.2% (of 8GB) and CPU <5%.
Hope this gives you some idea about the resource requirements.
Thanks
Anupam Das
Das, Anupam:
So we have received some questions about running our traceroute measurements. Let me answer some of the questions:
Here are two more:
1. Is the traffic to go *through* tor, or just clearnet off the machine running the relay?
2. Is this only applicable to exit relays? That's not clear at all, whether results from non-exit relay boxes (or bridges) are useful or wanted.
Best, -Gordon M.
On 27 Oct 2013 21:23, "Gordon Morehouse" gordon@morehouse.me wrote:
- Is the traffic to go *through* tor, or just clearnet off the machine running the relay?
Traffic is sent directly, not through Tor.
- Is this only applicable to exit relays? That's not clear at all, whether results from non-exit relay boxes (or bridges) are useful or wanted.
That's a great question. A relay that can act as a guard is also immediately useful to us. Bridges aren't directly addressed by our research; I'm sure we could find some use for measurements from bridges in the future, but right now, guards and exits are what we need.
Thanks! - Nikita
Hi Gordon, Thanks for the questions. We have put up a short description of the project and FAQs (including your questions) at the following link:
http://web.engr.illinois.edu/~das17/tor-traceroute_v1.html
Hope you find the page helpful.
Thanks
Anupam Das
________________________________________ From: tor-relays [tor-relays-bounces@lists.torproject.org] on behalf of Gordon Morehouse [gordon@morehouse.me] Sent: Sunday, October 27, 2013 4:22 PM To: tor-relays@lists.torproject.org Subject: Re: [tor-relays] Traceroute measurement from Tor relays
_______________________________________________ tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
Das, Anupam:
Hi Gordon, Thanks for the questions. We have put up a small description of the project and FAQs (including your posted questions) at the following link-
http://web.engr.illinois.edu/~das17/tor-traceroute_v1.html
Hope you find the page helpful.
Yep, I read it, but I still had the two questions - particularly whether non-exit relays should run the script or not.
Anyway, I'll run it on any of my relays with the Guard flag. Thanks, and thanks for doing this research! :)
Best, -Gordon M.
On 10/23/2013 10:09 AM, Karsten Loesing wrote:
Is this Big Brother phishing for better ways to compromise the Tor network?
On Wed, Oct 23, 2013, at 04:09 PM, Karsten Loesing wrote:
The measurements will work best if you have the "scamper" tool from the Cooperative Association for Internet Data Analysis (CAIDA) installed (see the README for installation instructions).
Your README should probably explicitly say that you need to run sudo chmod 04555 /path/to/scamper after installing scamper or it won't work. GD