[metrics-team] Data Files Country Codes

teor teor at riseup.net
Fri Jun 14 03:39:39 UTC 2019


> On 14 Jun 2019, at 06:14, Karsten Loesing <karsten at torproject.org> wrote:
> On 2019-06-13 19:29, Daniel Herschel wrote:
>> Hello,
> Hi Daniel,
>> I was looking to do some data analysis and data visualization using your
>> publicly available datasets (these
>> ones: https://metrics.torproject.org/stats.html), and I had a question
>> regarding the country column present in a number of the datasets.
>> The columns documentation says that the country codes are based on GeoIP
>> addresses.  Using a list of GeoIP country codes (found
>> here: https://dev.maxmind.com/geoip/legacy/codes/iso3166/), I was able
>> to convert most of these codes to their corresponding country name for
>> ease in reading on visualizations.
>> However, I did find some countries that did not have a mapping.  Do you
>> know what these countries would be/what the codes correspond to?  The
>> image below shows the codes in question.  (dd, xk, an, cs, du are the
>> specific codes I am looking at.  NaN means the entry was empty and ?? is
>> your code for unknown.)
> It looks like these codes come from single relays reporting statistics
> and using another GeoIP database than the one shipped with the tor software.
> I think it's safe to just consider all of these users as coming from an
> unknown country (??).

Tor uses ?? internally when it doesn't know the country.

Perhaps some of these ?? codes are leaking out into the statistics files:
I don't know whether tor intends to put them in its statistics, or whether
metrics replaces ?? with something else.
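
If it helps with the analysis, here is a minimal Python sketch of Karsten's
suggestion: any code that is missing from your lookup table (dd, xk, an, cs,
du, empty/NaN entries) is folded into ??. The table contents and the names
KNOWN_COUNTRIES, normalize_country and country_name are assumptions made up
for illustration; they are not part of tor or the metrics code, and a real
table would be loaded from the full list you are already using.

```python
# Hypothetical, deliberately tiny lookup table (assumption for illustration).
# A real one would be built from the full country-code list.
KNOWN_COUNTRIES = {
    "de": "Germany",
    "fr": "France",
    "us": "United States",
}

UNKNOWN_CODE = "??"  # the code used for "unknown country" in this thread


def normalize_country(code):
    """Return a known lower-case code, or ?? for anything unrecognized."""
    if code is None:
        return UNKNOWN_CODE
    # str() also covers float NaN values read from an empty CSV cell:
    # str(float("nan")) is "nan", which is not in the table.
    cleaned = str(code).strip().lower()
    if cleaned in KNOWN_COUNTRIES:
        return cleaned
    return UNKNOWN_CODE


def country_name(code):
    """Return a display name, using "Unknown" for ?? and unmapped codes."""
    return KNOWN_COUNTRIES.get(normalize_country(code), "Unknown")


if __name__ == "__main__":
    for raw in ["de", "US", "dd", "xk", "an", "cs", "du", "??", "", None]:
        print(repr(raw), "->", normalize_country(raw), "/", country_name(raw))
```

Normalizing before aggregating means the unrecognized codes are counted
together with ?? instead of showing up as separate, unlabeled countries in a
visualization.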

