[tor-bugs] #5247 [Onionoo]: Include reverse DNS lookup results in details

Mon Feb 27 15:12:26 UTC 2012

#5247: Include reverse DNS lookup results in details
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  karsten
     Type:  enhancement  |         Status:  new    
 Priority:  normal       |      Milestone:         
Component:  Onionoo      |        Version:         
 Keywords:               |         Parent:         
   Points:               |   Actualpoints:         
-------------------------+--------------------------------------------------
 We should run reverse DNS lookups and include their results in details
 documents. What's the best way to run these lookups in Java? Also, do we
 have to run them every hour for every relay?

 I wrote a simple Java application that looks up host names using the
 following line of code:

 {{{
 InetAddress.getByName(address).getHostName()
 }}}

 The application also measures how long each lookup took.  I ran it for the
 first 1000 relays in the consensus published on 2012-02-18 at 03:00:00.
 Here are some simple statistics on the lookup times, in milliseconds:

 {{{
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 114.0   688.8  1032.0  1906.0  1628.0 81120.0
 }}}
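
 For reference, here is a minimal sketch of how such a timing measurement
 could look.  It is not the actual application; the class and method names
 (TimedLookup, reverseLookup, time) are made up for illustration, and the
 resolver is passed in as a function so that the timing code can be tried
 without touching the network.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Function;

// Hypothetical helper, not part of Onionoo: times a single reverse lookup.
class TimedLookup {

    /** Host name (or null on failure) and elapsed wall-clock milliseconds. */
    static final class Result {
        final String hostName;
        final long millis;

        Result(String hostName, long millis) {
            this.hostName = hostName;
            this.millis = millis;
        }
    }

    /**
     * Same call as in the ticket.  getHostName() falls back to the textual
     * IP address if the reverse lookup does not yield a name.
     */
    static String reverseLookup(String address) {
        try {
            return InetAddress.getByName(address).getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** Runs the given resolver once and measures how long it takes. */
    static Result time(String address, Function<String, String> resolver) {
        long start = System.nanoTime();
        String hostName = resolver.apply(address);
        long millis = (System.nanoTime() - start) / 1_000_000L;
        return new Result(hostName, millis);
    }

    public static void main(String[] args) {
        // Addresses are taken from the command line, one lookup after another.
        for (String address : args) {
            Result r = time(address, TimedLookup::reverseLookup);
            System.out.println(address + " " + r.hostName + " " + r.millis + " ms");
        }
    }
}
```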

 So, looking up all 2759 relays in the consensus would have taken about 1.5
 hours.  There's no way we can sequentially look up reverse DNS entries for
 all relays in a consensus every hour.  We'll need to make some
 optimizations before even starting.  Questions are:

  - Is there a faster way to look up reverse DNS entries than the one used
 in this simple Java application?

  - Can we group multiple lookups and make a single request for them?

  - How often do we need to refresh a reverse DNS lookup result?  In theory
 we could cache results for an arbitrary time, but would they still be
 accurate after 3, 6, 12, 24 hours?

  - How many requests can we make in parallel using Java threads?  The Java
 side is easy and probably doesn't eat too much CPU time, but would we
 trigger some mechanism at our ISP when we make 100 requests at a time?
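
 To make the question about parallel lookups more concrete, here is a rough
 sketch using a fixed-size thread pool from java.util.concurrent.  The
 class and method names (ParallelLookups, lookUpAll) are hypothetical, the
 number of threads is just a parameter, and whether parallel lookups
 actually scale or upset anyone's resolver is untested.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Onionoo: runs lookups on a thread pool.
class ParallelLookups {

    /** Same call as in the ticket; returns null if the address is invalid. */
    static String reverseLookup(String address) {
        try {
            return InetAddress.getByName(address).getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /**
     * Resolves all addresses using the given number of threads and returns
     * address -> host name in input order.  Failed lookups map to null.
     */
    static Map<String, String> lookUpAll(List<String> addresses,
            Function<String, String> resolver, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so that lookups overlap.
            List<Future<String>> futures = new ArrayList<>();
            for (String address : addresses) {
                Callable<String> task = () -> resolver.apply(address);
                futures.add(pool.submit(task));
            }
            // Then collect the results in the original order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < addresses.size(); i++) {
                try {
                    results.put(addresses.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(addresses.get(i), null);
                } catch (InterruptedException e) {
                    // Restore the flag and stop collecting further results.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The pool size of 5 is only an example value.
        Map<String, String> names =
                lookUpAll(Arrays.asList(args), ParallelLookups::reverseLookup, 5);
        names.forEach((address, name) -> System.out.println(address + " " + name));
    }
}
```

 As a back-of-the-envelope estimate: 2759 relays at a mean of 1.906 seconds
 each are about 88 minutes sequentially; with 5 threads that would drop to
 roughly 17.5 minutes, assuming lookups parallelize perfectly.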

 Here are some comments after talking to George and Damian:

  - An average lookup time of 1.9 seconds per request isn't that unusual.

  - Using a thread pool with 5 lookup threads should be a fine start.

  - Caching results for 12 hours should work fine.  It's much more likely
 that a relay IP address changes than that the host name changes.  We could
 also keep some simple statistics on how often host names actually change
 when we look them up again; if the fraction is higher than we'd like it to
 be, we can still reduce the caching period to 6 hours or less.  We should
 document in protocol.html how often host names are looked up.

  - Performing multiple lookups per request would be cool, but is probably
 not supported by Java libraries.

  - I re-ran the analysis above, but this time with the `host` tool instead
 of Java.  Lookup times are much lower, so there must be something going on
 in Java which slows down the lookup.  More research needed.

 {{{
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0320  0.1800  0.3780  0.4252  0.5420 12.0300
 }}}
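
 The caching idea from the comments above could be sketched roughly as
 follows.  This is only an illustration under assumptions: the class name
 ReverseDnsCache and its methods are made up, the maximum age is a
 parameter (12 hours in the discussion), and the sketch is not thread-safe,
 so it would need some care before being shared between lookup threads.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper, not part of Onionoo: caches reverse lookup results
// and counts how often a refreshed host name differs from the cached one.
class ReverseDnsCache {

    private static final class Entry {
        final String hostName;
        final Instant lookedUp;

        Entry(String hostName, Instant lookedUp) {
            this.hostName = hostName;
            this.lookedUp = lookedUp;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, String> resolver;
    private final Duration maxAge;
    private int refreshes = 0;
    private int changes = 0;

    ReverseDnsCache(Function<String, String> resolver, Duration maxAge) {
        this.resolver = resolver;
        this.maxAge = maxAge;
    }

    /** Returns the cached host name, or looks it up again if too old. */
    String get(String address, Instant now) {
        Entry cached = entries.get(address);
        if (cached != null
                && Duration.between(cached.lookedUp, now).compareTo(maxAge) < 0) {
            return cached.hostName;
        }
        String hostName = resolver.apply(address);
        if (cached != null) {
            // This is a refresh of an expired entry; note whether it changed.
            refreshes++;
            if (!Objects.equals(cached.hostName, hostName)) {
                changes++;
            }
        }
        entries.put(address, new Entry(hostName, now));
        return hostName;
    }

    /** Fraction of refreshes that returned a different host name. */
    double changedFraction() {
        return refreshes == 0 ? 0.0 : (double) changes / refreshes;
    }
}
```

 If changedFraction() turns out higher than we'd like, the maximum age
 passed to the constructor could be lowered, e.g. to Duration.ofHours(6).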

 (This was issue 7 in my GitHub repository.)

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5247>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online