Hi!
Attached is a pretty simple Python script to compute some statistics about relays. Here's the doc:
Usage: tor-relays-stats.py <output> [args ...]
Where <output> is one of:
 - countries [FLAGS]
     relative percentage of the consensus in each country
 - as-sets [FLAGS] [COUNTRIES]
     relative percentage of the consensus in each AS set
 - top [COUNT] [FLAGS] [COUNTRIES]
     top relays according to their place in the whole consensus
Examples:
 - To get the top five exit nodes in France:
     tor-relays-stats.py top 5 Exit fr
 - To get weights of each AS of all relays in Germany:
     tor-relays-stats.py as-sets Running de
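For readers wondering what `countries` and `as-sets` compute, here is a minimal Python sketch of that kind of aggregation. It is not the script's code: the relay keys (`country`, `as_name`, `flags`, `consensus_weight`) are assumed names for fields of the Onionoo details document, and the sample relays are made up. The sketch normalizes against the selected relays, because the first 25 entries of the `as-sets Exit` output quoted later in the thread add up to about 90%, which suggests the percentages are relative to the selection rather than to the whole consensus.

```python
from collections import defaultdict

# Made-up relays. The keys ("country", "as_name", "flags",
# "consensus_weight") are assumed Onionoo field names.
RELAYS = [
    {"nickname": "relayA", "country": "fr", "as_name": "OVH Systems",
     "flags": ["Running", "Exit"], "consensus_weight": 600},
    {"nickname": "relayB", "country": "de", "as_name": "Hetzner Online AG RZ",
     "flags": ["Running", "Guard"], "consensus_weight": 300},
    {"nickname": "relayC", "country": "de", "as_name": "Host Europe GmbH",
     "flags": ["Running", "Exit"], "consensus_weight": 100},
]


def select_relays(relays, flags=(), countries=()):
    """Keep relays that carry every flag in `flags` and, if `countries`
    is non-empty, are located in one of those countries."""
    return [r for r in relays
            if all(f in r["flags"] for f in flags)
            and (not countries or r["country"] in countries)]


def relative_weights(relays, key):
    """Map each value of `key` (e.g. "country") to its percentage of the
    total weight of `relays`; an empty selection gives an empty dict."""
    total = sum(r["consensus_weight"] for r in relays)
    if total == 0:
        return {}
    groups = defaultdict(float)
    for r in relays:
        groups[r[key]] += 100.0 * r["consensus_weight"] / total
    return dict(groups)


if __name__ == "__main__":
    # Roughly "countries" and "as-sets Exit", on toy data.
    print(relative_weights(select_relays(RELAYS), "country"))
    print(relative_weights(select_relays(RELAYS, flags=["Exit"]), "as_name"))
```

Passing `flags=["Running"], countries=["de"]` and grouping by `as_name` would correspond to the `as-sets Running de` example.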
Maybe it has flaws. Maybe it should land in some Git repository. I just felt it might be of interest to other folks. Feel free to comment and hack.
On 12.07.2012 11:24, delber wrote:
> Hi!
> Attached is a pretty simple Python script to compute some statistics about relays. Here's the doc:
Thanks for this. Mind posting some output "examples" in a regular fashion to some website?
On Thu, Jul 12, 2012 at 10:28:03PM +0200, Moritz Bartl wrote:
> Thanks for this. Mind posting some output "examples" in a regular fashion to some website?
It would probably make more sense to get such stats from Onionoo and display them in Atlas. The script is actually partly based on the PoC found at https://trac.torproject.org/projects/tor/ticket/6329.
On 12.07.2012 11:24, delber wrote:
> Attached is a pretty simple Python script to compute some statistics about relays. [...]
I figured the top ASes would be interesting to some on the list who don't want to run the script themselves.
$ ./tor-relays-stats.py as-sets Exit | tac | head -n 25
21.0682% rrbone UG (haftungsbeschraenkt)
8.9403% NFOrce Entertainment BV
7.4181% Hosting Services Inc
6.2603% Applied Operations, LLC
5.2239% ViaEuropa Sweden
5.1970% Voxility SRL
4.8918% OVH Systems
4.8338% Host Europe GmbH
4.1482% OC3 Networks & Web Solutions, LLC
3.7870% Teknikbyran i Sverige AB
3.3924% Default Route, Inc.
2.1942% Ecatel Network
1.9386% Hetzner Online AG RZ
1.6148% University of Waterloo
1.5782% GutCon GmbH
1.3157% Bahnhof Internet AB
1.0233% Boston University
1.0210% XMission, L.C.
1.0168% 23Media GmbH
0.8925% Foreningen for digitala fri- och rattigheter
0.7443% Ownit Broadband AB
0.4449% Nessus Internet Dienstleistungs GmbH
0.4149% STRATO STRATO AG
0.4109% Colo4, LLC
0.4028% Saitis Network, N.Desir
$ ./tor-relays-stats.py as-sets | tac | head -n 25
10.1986% rrbone UG (haftungsbeschraenkt)
6.9121% OVH Systems
5.0553% Hetzner Online AG RZ
4.5189% NFOrce Entertainment BV
3.6003% Hosting Services Inc
3.4540% Team Cymru Inc.
3.0407% Host Europe GmbH
3.0307% Applied Operations, LLC
2.8976% Voxility SRL
2.5592% NORDUnet
2.5290% ViaEuropa Sweden
2.2029% Bahnhof Internet AB
2.1017% OC3 Networks & Web Solutions, LLC
1.9433% SILVER SERVER GmbH
1.9390% Teknikbyran i Sverige AB
1.8996% intergenia AG
1.6422% Default Route, Inc.
1.5037% WestHost, Inc.
1.3081% Ecatel Network
1.0363% Cogent/PSI
0.9654% LeaseWeb B.V.
0.9330% FDCservers.net
0.8687% SURFnet, The Netherlands
0.8661% XS4ALL Internet BV
0.8552% MDNX MDNX
On 7/12/12 11:24 AM, delber wrote:
> Hi!
> Attached is a pretty simple Python script to compute some statistics about relays. Here's the doc:
> [...]
> Maybe it has flaws. Maybe it should land in some Git repository. I just felt it might be of interest to other folks. Feel free to comment and hack.
Nice work!
Yes, we should put this script in a Git repository. How about we put it in metrics-tasks.git for now? The script is related to #6329, so we could create a new directory task-6329/ and put the script and a README in there. I can do that, or you can clone metrics-tasks.git, make a commit, and tell me from where to pull. Once we're happy with the kind of output, we can add the code to Atlas and put it in its Git repo.
I'd like to add a link to the script to Onionoo's project page as an example for useful applications using its data. Do you mind if I do that?
Here are two suggestions for tweaking the script a bit:
- Would it make sense to add a COUNT parameter to all outputs with a default of, say, 10? Also, should results be sorted in descending order? I guess most people are interested in countries/ASes/relays that are picked by clients most often. Maybe the "top" option should then be renamed to "relays" when all options have a COUNT parameter.
- Instead of downloading the full /details file, you could use this link: https://onionoo.torproject.org/details?type=relay&running=true. You're only interested in running relays anyway, and this cuts down the download from 5.8M to 2.8M. Maybe add the curl command to the usage output, too, for the lazy people who don't open source files.
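Both suggestions are small to implement. A minimal Python sketch, assuming the per-group percentages have already been computed into a dict: it sorts in descending order with a default of 10 results instead of piping through `tac | head`, and fetches the filtered document from the URL above with urllib (roughly what `curl -o details.json '<URL>'` would do). The function names and the output format are illustrative guesses, not the script's code.

```python
import heapq
import urllib.request

# Filtered query suggested above: running relays only.
DETAILS_URL = "https://onionoo.torproject.org/details?type=relay&running=true"
DEFAULT_COUNT = 10  # default proposed above for every output


def download_details(path="details.json", url=DETAILS_URL):
    """Save the Onionoo details document to `path` (needs network access)."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


def top(percentages, count=DEFAULT_COUNT):
    """Return up to `count` (name, percentage) pairs, largest first."""
    return heapq.nlargest(count, percentages.items(), key=lambda kv: kv[1])


def format_rows(rows):
    """Render rows in the style of the output quoted above."""
    return ["%.4f%% %s" % (pct, name) for name, pct in rows]


if __name__ == "__main__":
    sample = {"AS one": 1.5, "AS two": 10.25, "AS three": 3.0}  # made-up data
    for line in format_rows(top(sample, count=2)):
        print(line)
```

`heapq.nlargest` gives the same result as `sorted(..., reverse=True)[:count]` but does not need to sort the full list when only the first few entries are shown.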
Again, looks really good. Thanks for hacking on this! :)
Best, Karsten
On Fri, Jul 13, 2012 at 12:10:42PM +0200, Karsten Loesing wrote:
> Yes, we should put this script in a Git repository. How about we put it in metrics-tasks.git for now? The script is related to #6329, so we could create a new directory task-6329/ and put the script and a README in there. I can do that, or you can clone metrics-tasks.git, make a commit, and tell me from where to pull.
Please pull from git://repo.or.cz/tor-metrics-tasks/delber.git.
> I'd like to add a link to the script to Onionoo's project page as an example for useful applications using its data. Do you mind if I do that?
Not at all! :)
> Here are two suggestions for tweaking the script a bit:
> - Would it make sense to add a COUNT parameter to all outputs with a default of, say, 10? Also, should results be sorted in descending order? I guess most people are interested in countries/ASes/relays that are picked by clients most often. Maybe the "top" option should then be renamed to "relays" when all options have a COUNT parameter.
> - Instead of downloading the full /details file, you could use this link: https://onionoo.torproject.org/details?type=relay&running=true. You're only interested in running relays anyway, and this cuts down the download from 5.8M to 2.8M. Maybe add the curl command to the usage output, too, for the lazy people who don't open source files.
Sound suggestions. All implemented.
Hi delber,
On 7/14/12 3:22 PM, delber wrote:
> On Fri, Jul 13, 2012 at 12:10:42PM +0200, Karsten Loesing wrote:
> > Yes, we should put this script in a Git repository. How about we put it in metrics-tasks.git for now? The script is related to #6329, so we could create a new directory task-6329/ and put the script and a README in there. I can do that, or you can clone metrics-tasks.git, make a commit, and tell me from where to pull.
> Please pull from git://repo.or.cz/tor-metrics-tasks/delber.git.
Merged. Thanks!
> > I'd like to add a link to the script to Onionoo's project page as an example for useful applications using its data. Do you mind if I do that?
> Not at all! :)
Great, will do!
> > Here are two suggestions for tweaking the script a bit:
> > - Would it make sense to add a COUNT parameter to all outputs with a default of, say, 10? Also, should results be sorted in descending order? I guess most people are interested in countries/ASes/relays that are picked by clients most often. Maybe the "top" option should then be renamed to "relays" when all options have a COUNT parameter.
> > - Instead of downloading the full /details file, you could use this link: https://onionoo.torproject.org/details?type=relay&running=true. You're only interested in running relays anyway, and this cuts down the download from 5.8M to 2.8M. Maybe add the curl command to the usage output, too, for the lazy people who don't open source files.
> Sound suggestions. All implemented.
Thanks for implementing those changes!
I played around with the script and made a few more changes here:
https://gitweb.torproject.org/karsten/metrics-tasks.git/commitdiff/dd6329a
I didn't merge these changes into the official metrics-tasks.git yet, because I'd like to hear from you first if you like them. Also, since I lack proper Python skills, some of the changes seem like a hack to me. If you like the changes, maybe you can turn the new code into real Python code, and I'll merge the result?
Thanks! Karsten
On Sun, Jul 15, 2012 at 10:26:14AM +0200, Karsten Loesing wrote:
> I played around with the script and made a few more changes here:
> https://gitweb.torproject.org/karsten/metrics-tasks.git/commitdiff/dd6329a
> I didn't merge these changes into the official metrics-tasks.git yet, because I'd like to hear from you first if you like them. Also, since I lack proper Python skills, some of the changes seem like a hack to me. If you like the changes, maybe you can turn the new code into real Python code, and I'll merge the result?
If it works, just merge them! From my quick look they felt fine and meaningful. If that script gets extended some more, a little bit of refactoring is probably in order, but at the moment, it feels good enough. :)
May I ask if Onionoo will get a treatment for #6266 similar to what happened with #6334? The country-based statistics are probably off due to 'a1' being present in the country list at the moment.
On 7/15/12 7:37 PM, delber wrote:
> On Sun, Jul 15, 2012 at 10:26:14AM +0200, Karsten Loesing wrote:
> > I played around with the script and made a few more changes here:
> > https://gitweb.torproject.org/karsten/metrics-tasks.git/commitdiff/dd6329a
> > I didn't merge these changes into the official metrics-tasks.git yet, because I'd like to hear from you first if you like them. Also, since I lack proper Python skills, some of the changes seem like a hack to me. If you like the changes, maybe you can turn the new code into real Python code, and I'll merge the result?
> If it works, just merge them! From my quick look they felt fine and meaningful. If that script gets extended some more, a little bit of refactoring is probably in order, but at the moment, it feels good enough. :)
Great, merged. :)
> May I ask if Onionoo will get a treatment for #6266 similar to what happened with #6334? The country-based statistics are probably off due to 'a1' being present in the country list at the moment.
Done by reverting to the February database (MaxMind doesn't have archives of their binary-formatted database, and February was the last copy that I had locally). Hardly any relays in "a1" land anymore.
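To check whether a given details file is still affected before trusting per-country numbers, a sketch like the following measures how much weight sits behind pseudo-country codes. It reuses the assumed `country` and `consensus_weight` keys from the Onionoo details document, and treating "a2" as a pseudo-code goes beyond what the thread mentions (it comes from MaxMind's legacy GeoIP code list). It does not fix anything by itself: the fix described here was swapping the GeoIP database.

```python
# In MaxMind's legacy GeoIP databases "a1" marks anonymous proxies and
# "a2" satellite providers; neither says where a relay actually is.
PSEUDO_COUNTRIES = {"a1", "a2"}


def country_code(relay):
    # Assumption: relays without a resolvable location may lack the key.
    return (relay.get("country") or "").lower()


def pseudo_country_share(relays):
    """Percentage of the total weight located in a pseudo-country."""
    total = sum(r["consensus_weight"] for r in relays)
    if total == 0:
        return 0.0
    hidden = sum(r["consensus_weight"] for r in relays
                 if country_code(r) in PSEUDO_COUNTRIES)
    return 100.0 * hidden / total


def drop_pseudo_countries(relays):
    """Keep only relays whose country code is real or missing."""
    return [r for r in relays if country_code(r) not in PSEUDO_COUNTRIES]
```

A share close to zero matches the "hardly any relays in a1 land" observation; a large share means per-country percentages are skewed.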
Best, Karsten
> > May I ask if Onionoo will get a treatment for #6266 similar to what happened with #6334? The country-based statistics are probably off due to 'a1' being present in the country list at the moment.
> Done by reverting to the February database (MaxMind doesn't have archives of their binary-formatted database, and February was the last copy that I had locally). Hardly any relays in "a1" land anymore.
Great. That did change the outputs quite a bit.
While preparing another batch of stats on French relays [1], I made a few other fixes to the script that you can pull from the same place as before.
[1] https://lists.riseup.net/www/arc/tor-relays-fr/2012-07/msg00021.html
Hi delber,
On 7/17/12 11:10 AM, Karsten Loesing wrote:
> On 7/16/12 5:54 PM, delber wrote:
> > While preparing another batch of stats on French relays [1], I made a few other fixes to the script that you can pull from the same place as before.
> Thanks, merged! :)
I made a couple more changes to the script. We could clean up the code a bit and improve the documentation, but I consider the script pretty much feature-complete now. Yay! :)
Next step is to find a good product name. "tor-relays-stats" doesn't say very much and is hard to remember. How about Compass? (Think: supplemental tool to Atlas to navigate the Tor network.)
Here's the script, in case others want to try it out:
https://gitweb.torproject.org/metrics-tasks.git/blob_plain/HEAD:/task-6329/t...
And here's the usage output:
Usage: tor-relays-stats.py [options]
Options:
  -h, --help            show this help message and exit
  -d, --download        download details.json from Onionoo service

Filtering options:
  -a AS, --as=AS        select only relays from autonomous system number AS
  -c CC, --country=CC   select only relays from country with code CC
  -e, --exits-only      select only relays suitable for exit position
  -g, --guards-only     select only relays suitable for guard position

Grouping options:
  -A, --by-as           group relays by AS
  -C, --by-country      group relays by country

Display options:
  -t NUM, --top=NUM     display only the top results (default: 10)
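As a starting point for a similar command-line interface, here is a sketch that rebuilds the option layout above with Python's argparse. The usage text looks like optparse output, so argparse is a swapped-in module and this is not the script's actual code. The `as_number` destination is a made-up name, needed because `as` is a Python keyword, and since the thread does not define "suitable for exit/guard position" precisely, the sketch only records `-e` and `-g` as booleans.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="tor-relays-stats.py")
    parser.add_argument("-d", "--download", action="store_true",
                        help="download details.json from Onionoo service")

    filtering = parser.add_argument_group("Filtering options")
    # "as" is a Python keyword, so the value is stored as "as_number".
    filtering.add_argument("-a", "--as", dest="as_number", metavar="AS",
                           help="select only relays from autonomous system number AS")
    filtering.add_argument("-c", "--country", metavar="CC",
                           help="select only relays from country with code CC")
    filtering.add_argument("-e", "--exits-only", action="store_true",
                           help="select only relays suitable for exit position")
    filtering.add_argument("-g", "--guards-only", action="store_true",
                           help="select only relays suitable for guard position")

    grouping = parser.add_argument_group("Grouping options")
    grouping.add_argument("-A", "--by-as", action="store_true",
                          help="group relays by AS")
    grouping.add_argument("-C", "--by-country", action="store_true",
                          help="group relays by country")

    display = parser.add_argument_group("Display options")
    display.add_argument("-t", "--top", type=int, default=10, metavar="NUM",
                         help="display only the top results (default: 10)")
    return parser


if __name__ == "__main__":
    # Exit relays in France, grouped by country, top 5.
    print(build_parser().parse_args(["-e", "-c", "fr", "-C", "-t", "5"]))
```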
Of course, feedback is much appreciated!
Thanks, Karsten
On Fri, Jul 20, 2012 at 04:31:48PM +0200, Karsten Loesing wrote:
> I made a couple more changes to the script. We could clean up the code a bit and improve the documentation, but I consider the script pretty much feature-complete now. Yay! :)
I have used it a bit. This is simply brilliant. :)
> Next step is to find a good product name. "tor-relays-stats" doesn't say very much and is hard to remember. How about Compass? (Think: supplemental tool to Atlas to navigate the Tor network.)
Compass is fine by me. Although, you have to admit that it does not say more than `tor-relay-stats`.
On 26.07.2012 16:39, delber wrote:
> Compass is fine by me. Although, you have to admit that it does not say more than `tor-relay-stats`.
I opt for tor-relay-stats. I don't believe a helper script needs a made up name that doesn't say anything :)
On 7/26/12 4:50 PM, Moritz Bartl wrote:
> On 26.07.2012 16:39, delber wrote:
> > Compass is fine by me. Although, you have to admit that it does not say more than `tor-relay-stats`.
I agree that Compass does not say more than tor-relay-stats. But it's a name, and once you know about it, it's much easier to refer to the thing by using the name. tor-relay-stats is not as useful as a name, because it's long and trying to be descriptive. But tor-relay-stats can be pretty much anything. metrics.tpo is full of Tor relay stats. ;)
> I opt for tor-relay-stats. I don't believe a helper script needs a made up name that doesn't say anything :)
Maybe, yes. Not feeling strongly, sticking with tor-relay-stats. :)
Best, Karsten
tor-relays@lists.torproject.org