[metrics-team] requesting traffic data per day per country

Karsten Loesing karsten at torproject.org
Fri Aug 4 20:00:10 UTC 2017

Hi Alexei,

I'm moving this thread back to the mailing list.

Quick response inline.

On 2017-08-04 18:04, Alexei Abrahams wrote:
> Thanks very much for the clarification, Karsten. I would be quite happy to whip up a Python script that downloads archival data from your website, parses it for the information I am interested in, and outputs a csv.

You should look at the Development page if you want to start working
with data from Tor Metrics:


But be warned that it's really a lot of data. This can easily become
time consuming.

> I'm only trying to confirm up front that the data I'm looking for are actually archived. I'm grateful also to Philipp Winter, a Tor researcher at Princeton, for responding to this thread and confirming that you guys have indeed archived *per-relay-per-day* the number of IP addresses *per country* that accessed that relay. It seems those data are stored in the relay-data/extra-info files such as
> https://collector.torproject.org/recent/relay-descriptors/extra-infos/2017-08-04-14-05-01-extra-infos

That's indeed where all statistics gathered by relays are stored. You'll
find a few more details about that data on the CollecTor page:


> (It seems the IP counts per country are listed after the string "entry-ips")

No, we're not using that for our user statistics. We're using directory
requests per country and converting that to user numbers. Here are some
questions and answers about that:


> Please correct me if I'm wrong, but it looks to me like the entire archive of these files is stored by month as tarballs at this link: https://collector.torproject.org/archive/relay-descriptors/extra-infos/
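
A minimal Python sketch of reading such a monthly tarball without
unpacking it to disk first. The helper name iter_descriptor_texts and
the file name in the usage comment are assumptions for illustration,
not part of any Tor Metrics tooling; only the standard tarfile module
is used.

```python
import io
import tarfile


def iter_descriptor_texts(fileobj):
    """Yield (member_name, text) for every regular file in a tar archive.

    Hypothetical helper: fileobj is any seekable binary file object,
    for example an opened monthly tarball downloaded from CollecTor.
    """
    # "r:*" lets tarfile detect the compression (gz, bz2, xz) by itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other special entries
            handle = archive.extractfile(member)
            yield member.name, handle.read().decode("utf-8", errors="replace")


# Usage, assuming a tarball was already downloaded under this
# (hypothetical) local file name:
#
# with open("extra-infos-2017-07.tar.xz", "rb") as tarball:
#     for name, text in iter_descriptor_texts(tarball):
#         print(name, len(text))
```

Reading one member at a time keeps memory use bounded by the largest
single file in the archive rather than by the whole month of data.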


> Philipp's guess was that while you do record the number of IPs per country served by each relay per day, you do *not* archive the quantity of traffic processed at this level of specificity. So for example, I can see that 8 IP addresses from Australia and 20 IP addresses from Ukraine were served by relay x on such and such a day, but I can't tell how much traffic the Australians ran through this relay versus the Ukrainians. Is that correct that Tor does not collect such data?

The number of connecting users and the amount of traffic are not directly related in our data.

I also think I misinterpreted your earlier question and thought you'd be
interested in traffic by relays located in a given country. For example,
how much traffic was provided by relays in the U.S. on a given day?

But traffic of users per country is different and not something you
could extract from our data.
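
For the per-country counts that the extra-info descriptors do contain,
here is a rough Python sketch of the parsing step. It assumes the
plain-text descriptor layout with "extra-info", "published",
"dirreq-v3-reqs" and "entry-ips" lines; the function names and the
sample descriptor (nickname, fingerprint, counts) are made up for
illustration.

```python
import csv
import io

# Invented sample descriptor, shortened to the lines used below.
SAMPLE = """\
extra-info ExampleRelay 0123456789ABCDEF0123456789ABCDEF01234567
published 2017-08-04 13:00:00
dirreq-v3-reqs au=8,ua=24
entry-ips au=8,ua=16
"""


def parse_country_counts(value):
    """Turn a string like 'au=8,ua=24' into {'au': 8, 'ua': 24}."""
    counts = {}
    for item in value.split(","):
        if not item:
            continue
        country, _, number = item.partition("=")
        counts[country] = int(number)
    return counts


def extract_rows(descriptors_text, keyword="dirreq-v3-reqs"):
    """Yield (fingerprint, published, country, count) for one keyword."""
    fingerprint = published = None
    for line in descriptors_text.splitlines():
        key, _, rest = line.partition(" ")
        if key == "extra-info":
            # "extra-info <nickname> <fingerprint>" starts a new descriptor.
            fields = rest.split()
            fingerprint = fields[1] if len(fields) > 1 else None
            published = None
        elif key == "published":
            published = rest
        elif key == keyword:
            for country, count in sorted(parse_country_counts(rest).items()):
                yield fingerprint, published, country, count


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fingerprint", "published", "country", "count"])
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(extract_rows(SAMPLE)))
```

Passing keyword="entry-ips" to extract_rows would pull the per-country
IP counts instead of the directory requests. Neither says anything
about traffic volume per country.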

I'm in a bit of a rush right now. Maybe read through the links I gave you
and look over what else you can find on Tor Metrics. If you have
further questions, please ask them on the mailing list, and somebody
will answer you, possibly early next week.

> Many thanks,
> Alexei

You're welcome!

All the best,

> ________________________________________
> From: Karsten Loesing [karsten.loesing at gmail.com] on behalf of Karsten Loesing [karsten at torproject.org]
> Sent: Friday, August 04, 2017 10:45 AM
> To: Alexei Abrahams
> Cc: Metrics Team discussion list
> Subject: Re: [metrics-team] requesting traffic data per day per country
> On 2017-08-03 16:12, Alexei Abrahams wrote:
>> Dear Tor Metrics Team,
> Hi Alexei,
>> I am a political economy researcher at Princeton University and I just
>> started looking at your Tor metrics data. I am wondering if you are
>> willing to post data on the amount of traffic handled by relays/bridges
>> per country per day. At this
>> link https://metrics.torproject.org/networksize.html I can download a
>> csv file recording the number of active relays/bridges per country per
>> day.
> There are indeed such numbers, but as the specification page says:
> "Statistics on relays by country code are only available until January
> 31, 2013."
> https://metrics.torproject.org/stats.html#servers
> There are plans to resume computing these numbers, but they're not top
> priority at the moment.
>> But I cannot tell how much traffic volume is being handled by those
>> relays/bridges per country per day. The traffic data csv
>> at https://metrics.torproject.org/bandwidth.html reports traffic per day
>> **but not per country**. Without these data I cannot draw network bubble
>> maps like you have drawn
>> (https://metrics.torproject.org/bubbles.html#country).
> Right, but that graph is based on the Onionoo service and only displays
> the very latest network status without any history. I assume you'll want
> history.
> https://metrics.torproject.org/onionoo.html
>> I believe these
>> data would be helpful to me in my research. I am happy to provide
>> further details if desired.
> That would indeed be an interesting statistic. We don't have those
> numbers, though. It's all in the raw Tor descriptors, but we'd have to
> go through the archive and import a fair amount of data into a database
> before exporting those numbers. That could easily take a few days to get
> the code right and a few weeks to import all the data. It should be
> easier, but unfortunately it isn't.
> I'm afraid the best thing I can offer now is that you create a ticket,
> and whenever we next work on related parts of Tor Metrics we'll consider
> implementing this statistic.
>> Many thanks in advance for your assistance. I am a big fan of the Tor
>> project! :)
> Glad to hear! Hope that's still the case after this response. ;)
>> Regards,
>> Alexei Abrahams
> All the best,
> Karsten

