We have received some questions about running our traceroute measurements. Let me answer some of them:
------------------------------------------------------ From: Aaron Hopkins [lists@die.net]
Are you trying to work backward to physical-layer chokepoints, like how the inter-continental network topology maps to the landing sites on http://www.submarinecablemap.com/?
We are not looking for "chokepoints" exactly. We are interested in knowing as much as possible about where the traffic goes and who can see it.
I question the utility of scanning all /24s. By definition, all /24s in a BGP prefix take the same path to its origin AS; the only variation will be within that. If you are looking for chokepoints, you've already found it with the origin AS.
Our BGP prefixes are derived from RouteViews BGP tables. There is no guarantee that these are the prefixes used by all Internet routers, much less that routing is consistent over all IPs in the prefix. In addition, these prefixes change over time. Doing all /24 prefixes would enable us to study finer-grained routing behavior.
This also does all scans sequentially, which will have a couple of negative side-effects. You are much more likely to trigger ICMP response rate-limiting on intermediate routers and more likely to trigger IDS alarms than if you'd randomized your target selection. Running your target list through one of the following would mitigate this:
awk 'BEGIN{srand()}{print rand()"\t"$0}' | sort -k1 -n | cut -f2-
perl -MList::Util -e 'print List::Util::shuffle <>'
sort -R
We have randomized the prefix list and will soon randomize the /24 list too.
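To make the two steps discussed here concrete (expanding a prefix into its /24s, then randomizing the order), here is a minimal sketch. It is not the code in traceroutes.sh; the function name list_24s is made up for this example, and it assumes IPv4 prefixes of length 24 or shorter.

```shell
# list_24s PREFIX/LEN : print every /24 inside an IPv4 prefix (LEN <= 24).
# Hypothetical helper for illustration only.
list_24s() {
    ip=${1%/*}
    len=${1#*/}
    # Split the dotted quad into its four octets.
    old_ifs=$IFS; IFS=.
    set -- $ip
    IFS=$old_ifs
    base=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) ))
    # Clear the host bits so the walk starts at the network address.
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    base=$(( base & mask ))
    # A /LEN prefix contains 2^(24-LEN) blocks of size /24.
    count=$(( 1 << (24 - len) ))
    i=0
    while [ "$i" -lt "$count" ]; do
        n=$(( base + (i << 8) ))
        echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).0/24"
        i=$(( i + 1 ))
    done
}
```

The output can then be piped through any of the shuffling one-liners quoted above, for example: list_24s 192.0.2.0/23 | sort -R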
Which by default will try to keep 128 traceroute processes running all the time. This is potentially problematic for relays with limited RAM or CPU available. I'd recommend making this more clear.
We tested running 128 parallel traceroutes and found the CPU (<5%) and RAM (<0.2% of 8GB) requirements to be really small. You can also choose how many traceroutes to run in parallel. For example, to run 64:
PARALLEL=64 ./traceroutes.sh &
If you are using scamper (which we encourage you to do), you can tell the script how many packets per second to send, where 1<=PPS<=1000:
PPS=800 ./traceroutes.sh &
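If you want to guard against typos before launching, a small check like the following could be used. This is only a sketch: the PPS variable and its 1 to 1000 range come from this thread, but valid_pps is a hypothetical helper and not part of traceroutes.sh.

```shell
# valid_pps VALUE : succeed only if VALUE is an integer in 1..1000.
# Hypothetical helper; the range is the one stated for PPS above.
valid_pps() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 1000 ]
}
```

For example: valid_pps "$PPS" && PPS="$PPS" ./traceroutes.sh &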
I may run this from a machine on the same network as my Tor node, but definitely not on the Tor node itself.
Running from a machine on the same network is just as good, as long as its traffic leaves through the same Internet gateways as the Tor node's traffic.
------------------------------------------------------------------- From: Geoff Down geoffdown at fastmail.net
Your README should probably explicitly say that you need to run sudo chmod 04555 /path/to/scamper after installing scamper or it won't work.
Yes, we have now mentioned that in the README.
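To confirm that the chmod step took effect, you can test for the set-user-ID bit. The sketch below assumes scamper is on your PATH; has_setuid is a name made up for this example. Note that it only checks the bit itself, not who owns the file.

```shell
# has_setuid FILE : succeed if FILE exists and has its set-user-ID bit set.
# Hypothetical helper; "-u" is the standard test operator for this.
has_setuid() {
    [ -u "$1" ]
}

# Assumption: scamper is on PATH; substitute the real path otherwise.
scamper_path=$(command -v scamper || true)
if [ -n "$scamper_path" ] && has_setuid "$scamper_path"; then
    echo "scamper is setuid"
else
    echo "scamper not found or not setuid; see the chmod step above"
fi
```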
-------------------------------------------------------------------- From: Jesse Victors jvictors at jessevictors.com
How much bandwidth will this be taking up, and roughly how much will be uploaded/downloaded? I've cloned the repo, but I'm nervous about running this if it's going to be a significant bandwidth hog for a whole week. As Aaron said in Issue 39, it looks like it's going to be a lot of IPs and a large amount of packets. Also, ISPs may not take kindly to all these scans. What's the word on that? Has anyone run this tool, and what's their opinion? I'd be happy to help, but I'd like to know the full details of the various resources this tool will be consuming.
We have provided some resource requirements at the end of the README file. I'll summarize them here:
1. Upload: The script generates less than 500MB of total data. By default the files are erased once they are uploaded, but you can choose not to erase any data.
2. Bandwidth: If scamper is used, then with all the default settings it will consume at most 0.5 megabit per second of bandwidth. You can reduce this by lowering the PPS (packets-per-second) parameter (1<=PPS<=1000), for example:
PPS=800 ./traceroutes.sh &
On a reasonably fast line (>100Mbps), traceroutes at this rate shouldn't be much of a load for any ISP.
3. RAM and CPU usage: We have tested the script on multiple machines and found the following resource usage: RAM <0.2% (of 8GB) and CPU <5%.
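For a rough feel of how PPS relates to the bandwidth figure in item 2, the outbound rate is approximately PPS times the probe size in bits. The sketch below assumes a probe size of 60 bytes, which is an illustrative guess rather than a measured value, and estimate_kbps is a made-up helper name.

```shell
# estimate_kbps PPS [BYTES_PER_PACKET]
# Prints the approximate outbound rate in kbit/s (integer division).
# The 60-byte default is an assumed probe size, not a measured value.
estimate_kbps() {
    pps=$1
    bytes=${2:-60}
    echo $(( pps * bytes * 8 / 1000 ))
}

estimate_kbps 1000   # prints 480
estimate_kbps 800    # prints 384
```

Under that assumption, even the maximum of 1000 packets per second stays just under 0.5 megabit per second.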
Hope this gives you some idea about the resource requirements.
Thanks
Anupam Das