[tor-bugs] #4439 [Metrics Utilities]: Develop a Java/Python API that wraps relay descriptor sources and provides unified access to them

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Tue Nov 8 09:35:49 UTC 2011


#4439: Develop a Java/Python API that wraps relay descriptor sources and provides
unified access to them
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:  karsten
     Type:  task               |         Status:  new    
 Priority:  normal             |      Milestone:         
Component:  Metrics Utilities  |        Version:         
 Keywords:                     |         Parent:         
   Points:                     |   Actualpoints:         
-------------------------------+--------------------------------------------
 Quite a few metrics tools are processing archived and current relay
 descriptors to provide aggregate statistics, make descriptor archives
 searchable, or monitor the Tor network.  These tools have a non-trivial
 amount of code in common that imports relay descriptors from various
 sources.  Copying code is bad.  Let's write an API that all these metrics
 tools can use and that facilitates developing new tools.

 Note that this API is different from existing Tor controller APIs, which
 connect to a Tor process's control port and provide the descriptors that
 this Tor process knows about.  The new API won't connect to a Tor control
 port (that would be possible, but it isn't required).  Instead, it may read
 the cached descriptors from a Tor data directory, along with importing
 relay descriptors from other sources.  Of course, the two APIs can be
 combined, but there's also a reason for the API described here to exist
 separately: none of the metrics tools needs to control a Tor process.

 There are two major sources for relay descriptors:

  - Local directories: We can read relay descriptors from the cached-*
 files of a local Tor data directory or from the output directory of the
 directory-archive script or metrics-db.  Some of these local directories
 can grow quite large, so we'll need an efficient way to exclude
 descriptors that we already know.  Also, some files in these directories
 may contain multiple relay descriptors, while others contain only one.
 We'll want to support an arbitrary number of local directories in the new
 API.

  - Directory authorities/mirrors: We can download relay descriptors from
 the directory authorities or directory mirrors via Tor's directory
 protocol.  We should keep downloads to a minimum and only download
 missing descriptors.  We should also download compressed descriptors if
 possible.  In some cases we're interested in whether a directory authority
 serves a descriptor (e.g., for the consensus-health script).  In most cases
 we want to set a timeout for downloading descriptors.
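 The local-directory case above might look roughly like the following
 Python sketch, which is not taken from any existing metrics code.  It walks
 any number of directories, splits files that hold several descriptors, and
 skips descriptors the caller already knows.  The names
 `read_local_directories` and `split_descriptors`, the start keywords, the
 skipping of "@" annotation lines, the modification-time filter, and the use
 of a SHA-1 over the whole descriptor text as a de-duplication key are all
 assumptions for illustration.

```python
import hashlib
import os
from typing import Iterable, Iterator, List, Set, Tuple

# Assumption: a new descriptor starts at a line beginning with one of these
# keywords.  This covers server descriptors ("router ") and extra-info
# descriptors ("extra-info "); other descriptor types would need more rules.
START_KEYWORDS = ("router ", "extra-info ")


def split_descriptors(text: str) -> Iterator[str]:
    """Split a file's text into single descriptors, one or many per file."""
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("@"):
            # Skip annotation lines such as "@downloaded-at" or "@source".
            continue
        if line.startswith(START_KEYWORDS):
            if current:
                yield "".join(current)
            current = [line]
        elif current:
            # Lines before the first start keyword are ignored.
            current.append(line)
    if current:
        yield "".join(current)


def read_local_directories(
    roots: Iterable[str], known: Set[str], min_mtime: float = 0.0
) -> Iterator[Tuple[str, str]]:
    """Yield (key, descriptor) pairs for descriptors not yet in `known`.

    The caller owns `known` and `min_mtime`, so no state is kept between
    executions.  Files last modified before `min_mtime` are skipped without
    being read, which keeps rescans of large directories cheap.
    """
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) < min_mtime:
                    continue
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
                for desc in split_descriptors(text):
                    # Simplification: SHA-1 over the whole descriptor text,
                    # not Tor's own descriptor digest.
                    key = hashlib.sha1(desc.encode("utf-8")).hexdigest()
                    if key in known:
                        continue
                    known.add(key)
                    yield key, desc
```

 A tool would pass the same `known` set back in on its next run, for
 example after loading it from its own database, so the API itself stays
 stateless.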

 We should design the new API so that it's stateless across executions and
 doesn't have its own configuration.  A tool that uses the API should first
 initialize it by creating relay descriptor data sources and then request
 the descriptors it wants to process.
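 A minimal sketch of how a tool might create data sources and then request
 only missing descriptors from directory authorities or mirrors, with
 compression and a timeout.  `DirectoryMirrorSource`, `fetch_missing`, and
 `build_descriptor_url` are hypothetical names; the URL form and the ".z"
 compression suffix are assumptions based on Tor's directory protocol
 specification and should be checked against dir-spec.  The `opener`
 parameter defaults to `urllib.request.urlopen` and exists only so the
 sketch can be exercised without network access.

```python
import urllib.request
import zlib
from typing import Callable, Dict, Iterable, List, Set


def build_descriptor_url(address: str, digests: Iterable[str],
                         compressed: bool = True) -> str:
    """Build a request for server descriptors by digest.

    Assumes the dir-spec URL form "/tor/server/d/<D1>+<D2>..." and that a
    trailing ".z" asks the directory for a zlib-compressed response.
    """
    url = "http://%s/tor/server/d/%s" % (address, "+".join(digests))
    return url + ".z" if compressed else url


class DirectoryMirrorSource:
    """Hypothetical download source for one directory authority or mirror.

    It holds only what the calling tool passes in; there is no config file.
    """

    def __init__(self, address: str, timeout: float = 60.0,
                 opener: Callable = urllib.request.urlopen) -> None:
        self.address = address
        self.timeout = timeout
        self.opener = opener  # injectable so the sketch can run offline

    def fetch(self, digests: List[str]) -> bytes:
        url = build_descriptor_url(self.address, digests)
        # The timeout bounds how long we wait on a slow directory.
        with self.opener(url, timeout=self.timeout) as response:
            return zlib.decompress(response.read())


def fetch_missing(sources: Iterable[DirectoryMirrorSource],
                  wanted: Iterable[str], known: Set[str]) -> Dict[str, bytes]:
    """Ask each source only for digests the caller doesn't have yet.

    Results are keyed by source address, so a tool like consensus-health
    can compare what each directory served.  A tool that only wants the
    descriptors could stop after the first source that answers.
    """
    missing = [d for d in wanted if d not in known]
    if not missing:
        return {}
    return {s.address: s.fetch(missing) for s in sources}
```

 Because each source is an ordinary object built by the calling tool, there
 is nothing to configure inside the API and nothing carried over between
 executions.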

 The following tools may use the new API once it's ready: metrics-db, the
 part of metrics-web that aggregates statistics, the ExoneraTor database,
 the relay search database, the consensus-health script, the
 descriptor-health script, and the basic monitoring infrastructure.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4439>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online