[metrics-team] Data Files Country Codes
teor at riseup.net
Fri Jun 14 03:39:39 UTC 2019
> On 14 Jun 2019, at 06:14, Karsten Loesing <karsten at torproject.org> wrote:
> On 2019-06-13 19:29, Daniel Herschel wrote:
> Hi Daniel,
>> I was looking to do some data analysis and data visualization using your
>> publicly available datasets (these
>> ones: https://metrics.torproject.org/stats.html), and I had a question
>> regarding the country column present in a number of the datasets.
>> The columns documentation says that the country codes are based on GeoIP
>> addresses. Using a list of GeoIP country codes (found
>> here: https://dev.maxmind.com/geoip/legacy/codes/iso3166/), I was able
>> to convert most of these codes to their corresponding country name for
>> ease in reading on visualizations.
>> However, I did find some countries that did not have a mapping. Do you
>> know what these countries would be/what the codes correspond to? The
>> image below shows the codes in question. (dd, xk, an, cs, du are the
>> specific codes I am looking at. NaN means the entry was empty and ?? is
>> your code for unknown.)
> It looks like these codes come from single relays reporting statistics
> and using another GeoIP database than the one shipped with the tor software.
> I think it's safe to just consider all of these users as coming from an
> unknown country (??).
Tor uses ?? internally when it doesn't know the country.
Perhaps some of these ?? codes are leaking out into the statistics files:
I don't know whether tor intends to put them in its statistics, or whether
metrics replaces ?? with something else.
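
A minimal sketch, in Python, of the suggestion quoted above: fold any country code without a known mapping (including empty or NaN entries) into the unknown code ??. The KNOWN_COUNTRIES table and both function names are hypothetical and only for illustration; a real table would be loaded from whichever ISO 3166 list is in use.

```python
# Hypothetical, deliberately tiny code-to-name table (an assumption for
# illustration). A real one would be built from a full ISO 3166 list.
KNOWN_COUNTRIES = {
    "de": "Germany",
    "fr": "France",
    "us": "United States",
}

UNKNOWN_CODE = "??"  # the code used for an unknown country


def normalize_country(code, known=KNOWN_COUNTRIES):
    """Return the lowercased code if it has a mapping, else '??'."""
    if code is None:
        return UNKNOWN_CODE
    # str() also covers float NaN from an empty cell: it becomes "nan",
    # which has no mapping and therefore falls through to '??'.
    cleaned = str(code).strip().lower()
    return cleaned if cleaned in known else UNKNOWN_CODE


def country_label(code, known=KNOWN_COUNTRIES):
    """Return a display name for a code, or 'Unknown' if unmapped."""
    return known.get(normalize_country(code, known), "Unknown")


if __name__ == "__main__":
    for raw in ["de", "US", "dd", "xk", "an", "cs", "du", "??", "", None]:
        print(repr(raw), "->", normalize_country(raw), "/", country_label(raw))
```

Counting users per normalized code, rather than per raw code, then puts all of the unmapped entries into a single ?? bucket.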