[or-cvs] r12599: more progress on the geoip proposal (tor/trunk/doc/spec/proposals)

arma at seul.org arma at seul.org
Thu Nov 29 06:14:41 UTC 2007


Author: arma
Date: 2007-11-29 01:14:41 -0500 (Thu, 29 Nov 2007)
New Revision: 12599

Modified:
   tor/trunk/doc/spec/proposals/126-geoip-reporting.txt
Log:
more progress on the geoip proposal


Modified: tor/trunk/doc/spec/proposals/126-geoip-reporting.txt
===================================================================
--- tor/trunk/doc/spec/proposals/126-geoip-reporting.txt	2007-11-28 20:21:28 UTC (rev 12598)
+++ tor/trunk/doc/spec/proposals/126-geoip-reporting.txt	2007-11-29 06:14:41 UTC (rev 12599)
@@ -1,10 +1,10 @@
 Filename: 126-geoip-fetching.txt
-Title: Fetching GeoIP databases for clients, relays, and bridges
+Title: Getting GeoIP data and publishing usage summaries
 Version: $Revision: 11988 $
 Last-Modified: $Date: 2007-10-16 12:59:42 -0400 (Tue, 16 Oct 2007) $
 Author: Roger Dingledine
 Created: 2007-11-24
-Status: Open
+Status: Researching
 
 1. Background and motivation
 
@@ -17,7 +17,7 @@
   is the only reason we haven't deployed "directory guards" (think of
   them like entry guards but for directory information; in practice,
   it would seem that Tor clients should simply use their entry guards
-  as their directory guards).
+  as their directory guards; see also proposal 125).
 
   With the move toward bridges, we will no longer be able to track Tor
   clients that use bridges, since they use their bridges as directory
@@ -25,40 +25,137 @@
   use from certain countries (and are thus likely blocked), so we can
   avoid giving them out to other users in those countries.
 
-  Right now we support GeoIP lookups through Vidalia: Vidalia draws relays
+  Right now we already do GeoIP lookups in Vidalia: Vidalia draws relays
   and circuits on its 'network map', and it performs anonymized GeoIP
   lookups to its central servers to know where to put the dots. Vidalia
   caches answers it gets -- to reduce delay, to reduce overhead on
   the network, and to reduce anonymity issues where users reveal their
-  behavior through which IP addresses they ask about.
+  knowledge about the network through which IP addresses they ask about.
 
   But with the advent of bridges, Tor clients are asking about IP
   addresses that aren't in the main directory. In particular, bridge
-  users tell the central Vidalia servers about each bridge as they
+  users inform the central Vidalia servers about each bridge as they
   discover it and their Vidalia tries to map it.
 
   Also, we wouldn't mind letting Vidalia do a GeoIP lookup on the client's
   own IP address, so it can provide a more useful map.
 
-  Also, Vidalia's central servers leave users open to partitioning
+  Finally, Vidalia's central servers leave users open to partitioning
   attacks, even if they can't target specific users. Further, as we
   start using GeoIP results for more operational or security-relevant
   goals, such as avoiding or including particular countries in circuits,
   it becomes more important that users can't be singled out in terms of
   their IP-to-country mapping beliefs.
 
-  This proposal describes a way for Tor relays, bridges, and clients to
-  download a local copy of a GeoIP database, so they can do local private
-  queries. Thus we can avoid sending detailed queries to central servers.
+2. The available GeoIP databases
 
-2. Publishing and caching the GeoIP database
+  There are at least two classes of GeoIP database out there: "IP to
+  country", which tells us the country code for the IP address but
+  no more details, and "IP to city", which tells us the country code,
+  the name of the city, and some basic latitude/longitude guesses.
 
-  We assume that we use a free GeoIP db, like ip2country. We will need
-  to standardize on its format; see Section 5.
+  A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
+  bytes. A typical line is:
+    "205500992","208605279","US","USA","UNITED STATES"
+  http://ip-to-country.webhosting.info/node/view/5
 
+  Similarly, the maxmind GeoLite Country database is also about 500KB
+  compressed.
+  http://www.maxmind.com/app/geolitecountry
+
+  The maxmind GeoLite City database gives more fine-grained detail, such
+  as geo coordinates and city name. Vidalia currently makes use of this
+  information. On the other hand it's 16MB compressed. A typical line is:
+    206.124.149.146,Bellevue,WA,US,47.6051,-122.1134
+  http://www.maxmind.com/app/geolitecity
+
+  There are other databases out there, like
+  http://www.hostip.info/faq.html
+  http://www.webconfs.com/ip-to-city.php
+  that want more attention, but for now let's assume that all the db's
+  are around this size.
+
+3. What we'd like to solve
+
+  Goal #1a: Tor relays collect IP-to-country user stats and publish
+  sanitized versions.
+  Goal #1b: Tor bridges collect IP-to-country user stats and publish
+  sanitized versions.
+
+  Goal #2a: Vidalia learns IP-to-city stats for Tor relays, for better
+  mapping.
+  Goal #2b: Vidalia learns IP-to-country stats for Tor relays, so the user
+  can pick countries for her paths.
+
+  Goal #3: Vidalia doesn't do external lookups on bridge relay addresses.
+
+  Goal #4: Vidalia resolves the Tor client's IP-to-country or IP-to-city
+  for better mapping.
+
+  Goal #5: Reduce partitioning opportunities where Vidalia central
+  servers can give different (distinguishing) responses.
+
+4. Solution overview
+
+  Our goal is to allow Tor relays, bridges, and clients to learn enough
+  GeoIP information so they can do local private queries.
+
+4.1. The IP-to-country db
+
+  Directory authorities should publish a "geoip" file that contains
+  IP-to-country mappings. Directory caches will mirror it, and Tor clients
+  and relays (including bridge relays) will fetch it. Thus we can solve
+  goals 1a and 1b (publish sanitized usage info). Controllers could also
+  use this to solve goal 2b (choosing path by country attributes). It
+  also solves goal 4 (learning the Tor client's country), though for
+  huge countries like the US we'd still need to decide where the "middle"
+  should be when we're mapping that address.
+
+  The IP-to-country details are described further in Sections 5 and
+  6 below.
+
+4.2. The IP-to-city db
+
+  In an ideal world, the IP-to-city db would be small enough that we
+  could distribute it in the above manner too. But for now, it is too
+  large. Here's where the design choice forks.
+
+  Option A: Vidalia should continue doing its anonymized IP-to-city
+  queries. Thus we can achieve goals 2a and 2b. We would solve goal
+  3 by only doing lookups on descriptors that are purpose "general"
+  (or, alternately, by only doing lookups on descriptors that are in
+  the networkstatus consensus). We would leave goal 5 unsolved.
+
+  Option B: Each directory authority should keep an IP-to-city db,
+  look up the value for each router it lists, and include that line in
+  the router's network-status entry. The network-status consensus would
+  then use the line that appears in the majority of votes. This approach
+  also solves goals 2a and 2b, goal 3 (Vidalia doesn't do any lookups
+  at all now), and goal 5 (reduced partitioning risks).
+
+  Option B has the advantage that Vidalia can simplify its operation,
+  and the advantage that this consensus IP-to-city data is available to
+  other controllers besides just Vidalia. But it has the disadvantage
+  that the networkstatus consensus becomes larger, even though most of
+  the GeoIP information won't change from one consensus to the next. Is
+  there another reasonable location for it that can provide similar
+  consensus security properties?
+
+4.3. Recommendation
+
+  My overall recommendation is that we should implement 4.1 soon
+  (e.g. early in 0.2.1.x), and we can go with 4.2 option A for now,
+  with the hope that later we discover a better way to distribute the
+  IP-to-city info and can switch to 4.2 option B.
+
+  Below we discuss more how to go about achieving 4.1.
+
+5. Publishing and caching the GeoIP (IP-to-country) database
+
   Each v3 directory authority should put a copy of the "geoip" file in
-  its datadirectory. Then its votes should include a hash of this file,
-  and the resulting consensus directory should specify the consensus hash.
+  its datadirectory. Then its network-status votes should include a hash
+  of this file (Recommended-geoip-hash: %s), and the resulting consensus
+  directory should specify the consensus hash.
 
   There should be a new URL for fetching this geoip db (by "current.z"
   for testing purposes, and by hash.z for typical downloads). Authorities
@@ -70,55 +167,42 @@
   same URLs.
 
   We assume that the file would change at most a few times a month. Should
-  Tor ship with a bootstrap geoip file?
+  Tor ship with a bootstrap geoip file? An out-of-date geoip file may
+  open you up to partitioning attacks, but for the most part it won't
+  be that different.
 
-3. Clients use it for Vidalia
-
-  Tor fetches the geoip file as above, and puts it in Tor's DataDirectory.
-  Then we could have a status event that tells controllers that a new
-  geoip file has arrived.
-
-  Then Vidalia would either read the file directly, or we would add
-  a control protocol interface for querying. Since Tor probably needs
-  to parse the file itself (see Section 4 below), offering the control
-  interface is probably cleanest.
-
   There should be a config option to disable updating the geoip file,
   in case users want to use their own file (e.g. they have a proprietary
   GeoIP file they prefer to use). In that case we leave it up to the
   user to update his geoip file out-of-band.
 
-4. Bridges use it for usage summaries
+  [XXX Should consider forward/backward compatibility, e.g. if we want
+  to move to a new geoip file format. -RD]
 
-  Once bridges have a GeoIP database locally, they can start to publish
-  sanitized summaries of client usage -- how many users they see and from
-  what countries. This might also be a more useful way for ordinary Tor
-  relays to convey the level of usage they see.
+6. Controllers use the IP-to-country db for mapping and for path building
 
-  But how to safely summarize this information without opening too many
-  anonymity leaks seems hard, so I'm going to leave it for a different
-  proposal.
+  Vidalia can use the IP-to-country mappings for placing on its map:
+  - The location of the client
+  - The location of the bridges, or other relays not in the
+    networkstatus.
+  - Any relays that it doesn't yet have an IP-to-city answer for.
 
-5. Which db to use?
+  Controllers can also use it to set EntryNodes, ExitNodes, etc. in a
+  per-country way. To support this feature, we need to export the
+  IP-to-country data via the Tor controller protocol.
 
-  A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
-  bytes. This isn't so bad. But we can easily cut it down further; some
-  sample lines are:
-    "205500992","208605279","US","USA","UNITED STATES"
-    "208605280","208605311","CA","CAN","CANADA"
-    "208605312","210784255","US","USA","UNITED STATES"
-  My guess is the compression will solve most of the redundancy, so we
-  can stick with the default format.
-  http://ip-to-country.webhosting.info/node/view/5
+  Is it sufficient just to add a new GETINFO command? For example:
+    GETINFO ip-to-country/128.31.0.34
+    250+ip-to-country/128.31.0.34="US","USA","UNITED STATES"
 
-  The maxmind GeoLite Country database is also about 500KB compressed.
-  http://www.maxmind.com/app/geolitecountry
+7. Relays and bridges use the IP-to-country db for usage summaries
 
-  The maxmind GeoLite City database gives more finegrained detail, such
-  as geo coordinates and city name. Vidalia currently makes use of this
-  information. On the other hand it's 16MB compressed, which would seem
-  to be out of our reach.
-  http://www.maxmind.com/app/geolitecity
+  Once bridges have a GeoIP database locally, they can start to publish
+  sanitized summaries of client usage -- how many users they see and from
+  what countries. This might also be a more useful way for ordinary Tor
+  relays to convey the level of usage they see, which would allow us to
+  switch to using directory guards for all users by default.
 
-  What other options are there?
+  But how to safely summarize this information without opening too many
+  anonymity leaks seems hard...
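
The ip-to-country.csv lines quoted in the proposal give an integer range
start, an integer range end, and then the two-letter code, three-letter
code, and country name. A local private lookup over that format could look
roughly like the sketch below. It is an illustration only and not part of
the proposal or of Tor: the function names load_ranges and lookup are
hypothetical, the sample data is just the three lines quoted in the diff,
and it assumes the integers are IPv4 addresses in host byte order and that
ranges do not overlap.

```python
import bisect
import csv
import io
import ipaddress

# The three sample lines quoted in the proposal text.
SAMPLE = '''"205500992","208605279","US","USA","UNITED STATES"
"208605280","208605311","CA","CAN","CANADA"
"208605312","210784255","US","USA","UNITED STATES"
'''


def load_ranges(text):
    """Parse ip-to-country.csv text into a sorted list of range tuples."""
    ranges = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        start, end, cc2, cc3, name = row
        ranges.append((int(start), int(end), cc2, cc3, name))
    ranges.sort()
    return ranges


def lookup(ranges, address):
    """Return (cc2, cc3, name) for a dotted-quad address, or None."""
    value = int(ipaddress.IPv4Address(address))
    # Rebuilt on every call for brevity; a real implementation would cache it.
    starts = [r[0] for r in ranges]
    # Find the last range whose start is <= value, then check its end.
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and ranges[i][0] <= value <= ranges[i][1]:
        return ranges[i][2:]
    return None


if __name__ == "__main__":
    table = load_ranges(SAMPLE)
    print(lookup(table, "12.111.16.100"))
    print(lookup(table, "1.2.3.4"))
```

Because the lookup runs entirely against the local table, no query about
a bridge address or the client's own address leaves the machine, which is
the property Section 4 is after.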
 


