[metrics-team] Data Files Country Codes
karsten at torproject.org
Fri Jun 14 06:54:51 UTC 2019
On 2019-06-14 05:39, teor wrote:
>> On 14 Jun 2019, at 06:14, Karsten Loesing <karsten at torproject.org> wrote:
>> On 2019-06-13 19:29, Daniel Herschel wrote:
>> Hi Daniel,
>>> I was looking to do some data analysis and data visualization using your
>>> publicly available datasets (these
>>> ones: https://metrics.torproject.org/stats.html), and I had a question
>>> regarding the country column present in a number of the datasets.
>>> The columns documentation says that the country codes are based on GeoIP
>>> addresses. Using a list of GeoIP country codes (found
>>> here: https://dev.maxmind.com/geoip/legacy/codes/iso3166/), I was able
>>> to convert most of these codes to their corresponding country name for
>>> ease in reading on visualizations.
>>> However, I did find some countries that did not have a mapping. Do you
>>> know what these countries would be/what the codes correspond to? The
>>> image below shows the codes in question. (dd, xk, an, cs, du are the
>>> specific codes I am looking at. NaN means the entry was empty and ?? is
>>> your code for unknown.)
>> It looks like these codes come from individual relays that report statistics
>> using a different GeoIP database from the one shipped with the tor software.
>> I think it's safe to just consider all of these users as coming from an
>> unknown country (??).
> Tor uses ?? internally when it doesn't know the country.
> Perhaps some of these ?? codes are leaking out into the statistics files:
> I don't know whether tor intends to put them in its statistics, or whether
> metrics replaces ?? with something else.
These ?? codes do make it into the statistics reported by tor relays,
and metrics treats ?? as any other country code.
All the best,