GeoIP database comparison

Karsten Loesing karsten.loesing at gmx.net
Thu Apr 29 08:58:27 UTC 2010


Hi everybody,

As you may or may not know, Tor uses a GeoIP database to resolve client
IP addresses to country codes for statistics. That's how we get the data
to plot usage-by-country graphs such as these:

  http://metrics.torproject.org/recurring-users-graphs.html
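For readers unfamiliar with how such a lookup works, here is a minimal
sketch, not Tor's actual implementation. It assumes the database is a
list of (first address, last address, country code) ranges; the table
layout and the names RANGES, STARTS, and lookup_country are made up for
illustration. The two sample ranges are the Tunisian ones discussed
further down in this mail.

```python
import bisect
import ipaddress

# Hypothetical in-memory table, sorted by first address. A real GeoIP
# database would hold many thousands of such ranges.
RANGES = sorted(
    (int(net.network_address), int(net.broadcast_address), code)
    for net, code in [
        (ipaddress.ip_network("41.224.0.0/13"), "TN"),
        (ipaddress.ip_network("196.203.0.0/16"), "TN"),
    ]
)
STARTS = [first for first, _, _ in RANGES]


def lookup_country(address):
    """Return the country code for an IPv4 address, or None if no range matches."""
    value = int(ipaddress.IPv4Address(address))
    # Find the last range whose first address is <= value.
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and RANGES[i][0] <= value <= RANGES[i][1]:
        return RANGES[i][2]
    return None


print(lookup_country("41.230.1.1"))
print(lookup_country("8.8.8.8"))
```

An address that falls into no range comes back as None, which is what
happens to every Tunisian client if the database lacks these ranges.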

However, our GeoIP database was last updated on June 6, 2009, because a
subsequent update from our GeoIP database provider (ip-to-country) was
broken and declared most of the US IP addresses as unassigned. After
waiting for a few weeks for the database provider to fix this, we gave
up and kept on using the June 6, 2009 database.

Another issue with our current GeoIP database is that it doesn't
resolve a single observed IP address to Tunisia, for example. But we
know we have users in Tunisia, so this cannot be right.

Time to look for alternatives. The alternatives I investigated are:

- Update to the most recent ip-to-country database from April 26, 2010,
available at
http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip

- Switch to the Host IP database from April 26, 2010, available at
http://www.hostip.info/faq.html

- Switch to Maxmind's free GeoLite Country database from April 1, 2010,
available at http://www.maxmind.com/app/geolitecountry

In addition to these databases, I looked at two more sources, mostly for
comparison, not for shipping them with Tor:

- Maxmind's commercial GeoIP Country database, last updated on April 20,
2010

- Jake's blockfinder tool http://github.com/ioerror/blockfinder

In particular, I looked at IP address ranges assigned to Tunisia,
because our current database fails entirely here and because Tunisia has
a conveniently small number of IP ranges in all the databases, which
keeps this comparison manageable. I attached the analysis to this mail
as a PDF (32K), so that it's in the mail archives.
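Pulling one country's ranges out of a database can be sketched roughly
like this. It is only an illustration under assumptions: it assumes CSV
rows of the form first-address,last-address,country-code with addresses
as integers, which may not match the actual column layout of any of the
databases above. The function name country_networks, the helper _row,
and the sample rows (including the placeholder code ZZ) are
hypothetical.

```python
import csv
import io
import ipaddress


def country_networks(csv_text, country="TN"):
    """Yield CIDR networks for all rows that carry the given country code."""
    for row in csv.reader(io.StringIO(csv_text)):
        first, last, code = int(row[0]), int(row[1]), row[2]
        if code != country:
            continue
        # A first/last pair may need more than one CIDR block to express.
        yield from ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
        )


def _row(first, last, code):
    """Build one sample CSV row from dotted-quad addresses (illustration only)."""
    return f"{int(ipaddress.IPv4Address(first))},{int(ipaddress.IPv4Address(last))},{code}"


# Hypothetical sample input, not taken from any real database file.
SAMPLE = "\n".join([
    _row("41.224.0.0", "41.231.255.255", "TN"),
    _row("192.0.2.0", "192.0.2.255", "ZZ"),
    _row("196.203.0.0", "196.203.255.255", "TN"),
])

for network in country_networks(SAMPLE):
    print(network)
```

Expressing every database's ranges as CIDR blocks makes it easier to
line them up in rows for a side-by-side comparison.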

The PDF has IP address ranges in its rows and country codes resolved by
the various databases in its columns. Larger IP address ranges are bold
and bigger. Here are some observations:

- Our current database (ip-to-country 6/3/09) resolves almost zero IP
addresses to Tunisia, just one /29, one /30, and one /28. No wonder
we're seeing no users from Tunisia. Let's give up on this database.

- The most recent ip-to-country database from 4/26/10 has the very same
IP address ranges for Tunisia. No need to upgrade, IMO.

- Host IP disagrees most with the other databases. Host IP has lots of
/24 ranges that it thinks are Tunisia, but no other database agrees
with these. Unlike the two Maxmind databases and blockfinder, Host IP
fails to identify the two largest ranges, 41.224.0.0/13 and
196.203.0.0/16, as Tunisia. I should also say that /24 is the smallest
unit that Host IP knows, which is quite imprecise. I think Host IP is
out, too.

- blockfinder knows about the largest ranges only, which are all at
least /24 in size. It agrees with both Maxmind databases in all these
ranges.

- Both Maxmind databases have quite a few smaller IP address ranges that
few or none of the other databases know about. One of them is a /26,
the others are /28 or smaller.

- The commercial and free Maxmind databases have almost the same ranges
for Tunisia. One example of a difference is 196.203.0.0/16, which the
commercial database splits into 5 ranges covering 65470 addresses
(compared to 65536 in the free database). That is an overlap of
99.899%.
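To put the numbers above in perspective, here is a small sketch of the
arithmetic. The prefix lengths and address counts are taken from the
observations above; the variable names and the helper size are just
illustrative. The five commercial sub-ranges are not listed in this
mail, so only their total count is used.

```python
import ipaddress


def size(prefix_length):
    """Number of IPv4 addresses in a range with the given prefix length."""
    return 2 ** (32 - prefix_length)


# Addresses that ip-to-country resolves to Tunisia: one /29, one /30, one /28.
old_tunisia = size(29) + size(30) + size(28)

# The two largest Tunisian ranges known to Maxmind and blockfinder.
largest = (
    ipaddress.ip_network("41.224.0.0/13").num_addresses
    + ipaddress.ip_network("196.203.0.0/16").num_addresses
)

# Overlap of commercial vs. free Maxmind within 196.203.0.0/16.
free_count = ipaddress.ip_network("196.203.0.0/16").num_addresses
commercial_count = 65470  # total of the 5 commercial ranges, as stated above
overlap_percent = 100 * commercial_count / free_count

print(old_tunisia, largest, round(overlap_percent, 3))
```

So ip-to-country covers 28 Tunisian addresses, while the two largest
ranges alone contain 589824.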


