[tor-bugs] #4439 [Metrics Utilities]: Develop a Java/Python API that wraps relay descriptor sources and provides unified access to them

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Mon Dec 5 18:22:15 UTC 2011

#4439: Develop a Java/Python API that wraps relay descriptor sources and provides
unified access to them
 Reporter:  karsten            |          Owner:  karsten
     Type:  task               |         Status:  new    
 Priority:  normal             |      Milestone:         
Component:  Metrics Utilities  |        Version:         
 Keywords:                     |         Parent:         
   Points:                     |   Actualpoints:         

Comment(by karsten):

 Replying to [comment:9 atagar]:
 > The main value of this library is being able to pull consensus
 information via several methods without a tor instance, yes? This
 description seems to focus more on the use cases you're planning rather
 than what the library does. Maybe alternatively phrase this as "that
 directly fetches consensus information from a variety of sources like
 cached descriptors and directory authorities/mirrors."

 Good idea, added your sentence before my first sentence.

 > How does a bridge descriptor differ from relays? I don't think that I've
 ever dealt with them.

 See https://metrics.torproject.org/formats.html#bridgedesc .

 > Ahhh interesting, I hadn't thought of eagerly loaded consensus
 snapshots. As the last sentence mentions this works well for batch jobs,
 being simpler for callers and more faithful to how consensuses are
 published. However, it also comes with the drawbacks of a lengthy
 initialization and high memory usage. Can we follow an iteration or
 callback pattern instead so we can process the descriptors as they come in
 (and free the memory)?

 I agree that potentially lengthy initialization and high memory usage may
 be problematic.  In fact, I started with a callback pattern, but discarded
 that because it's more difficult to use for some applications.  On second
 thought, though, that difficulty doesn't apply to all applications.  For
 example, the
 consensus-health checker needs to have all consensuses and votes available
 before it can do anything; it would essentially have to implement a
 descriptor store itself.  But the metrics-web database importer could
 easily implement a listener and start importing once the first descriptor
 arrives.

 How about we implement both the descriptor store and a callback pattern?
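
 As a rough sketch of how the two could coexist (Python here only for
 brevity; every name below is made up for illustration and is not an
 existing API): the reader pushes descriptors to a listener, and the
 descriptor store is just one listener that happens to keep everything.

```python
from typing import Callable, Iterable, List

# Assumption: a descriptor is represented by its raw text in this sketch.
Descriptor = str
Listener = Callable[[Descriptor], None]


def read_descriptors(source: Iterable[Descriptor], listener: Listener) -> None:
    """Callback pattern: hand each descriptor to the listener as it arrives."""
    for descriptor in source:
        listener(descriptor)


class DescriptorStore:
    """Store pattern: collect everything first, then let the caller query it."""

    def __init__(self) -> None:
        self.descriptors: List[Descriptor] = []

    def add(self, descriptor: Descriptor) -> None:
        self.descriptors.append(descriptor)


# Placeholder data standing in for real consensuses and votes.
source = ["consensus 1", "vote A", "vote B"]

# A consumer that needs everything at once (e.g. a health checker)
# registers the store itself as the listener.
store = DescriptorStore()
read_descriptors(source, store.add)

# A streaming consumer (e.g. a database importer) processes each
# descriptor immediately and keeps nothing around.
imported = []
read_descriptors(source, lambda d: imported.append(len(d)))
```

 With this shape, applications that need all descriptors up front use the
 store, and everything else registers its own listener.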

 What would the iteration pattern look like?  Do you have an example?
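
 To make the question concrete, here is one guess at what it might mean: a
 generator that yields one descriptor at a time, so the caller can drop
 each one after processing it.  The function name, the "router " split
 keyword and the sample lines are assumptions for illustration only, not
 real descriptor data or an agreed design.

```python
from typing import Iterable, Iterator


def iter_descriptors(lines: Iterable[str],
                     keyword: str = "router ") -> Iterator[str]:
    """Yield one descriptor at a time.

    A new descriptor is assumed to begin at every line that starts with
    `keyword`; any lines before the first keyword are yielded as their
    own chunk.
    """
    current = []
    for line in lines:
        if line.startswith(keyword) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


# Placeholder lines, not real descriptor contents.
sample = [
    "router relayA",
    "bandwidth 1",
    "router relayB",
    "bandwidth 2",
]

# The caller only ever holds one descriptor in memory at a time.
nicknames = [d.split()[1] for d in iter_descriptors(sample)]
```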

 > I had been planning on an api where we lazy load and cache descriptors
 requested by our caller. That would have been better for some use cases,
 but certainly not when we want to process the majority of the consensus.
 It might also not be realistic with how we can fetch consensuses
 information when detached from the control socket (I assume when dealing
 with authorities and cached consensuses fetching is more of an all-or-
 nothing operation).

 I'm not entirely sure what you mean here.  Fetching a consensus is an all-
 or-nothing operation when downloading via the directory protocol.  Also,
 requests for multiple server descriptors or extra-info descriptors should
 be combined in a single request to reduce the download overhead.
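
 A minimal sketch of the combining step, assuming the "+"-joined resource
 form from the directory protocol spec (e.g. /tor/server/fp/<F1>+<F2>).
 The function name, the batch size and the fingerprints are placeholders;
 the sketch only builds request paths and does no network I/O.

```python
from typing import Iterator, List


def batched_paths(fingerprints: List[str],
                  prefix: str = "/tor/server/fp/",
                  batch_size: int = 2) -> Iterator[str]:
    """Group fingerprints into '+'-joined request paths.

    batch_size is an arbitrary illustrative value, not a limit taken
    from the spec.
    """
    for start in range(0, len(fingerprints), batch_size):
        batch = fingerprints[start:start + batch_size]
        yield prefix + "+".join(batch)


# Placeholder fingerprints, not real relay identities.
paths = list(batched_paths(["AAAA", "BBBB", "CCCC"]))
```

 Three descriptors then cost two requests instead of three; the same
 helper would work for extra-info descriptors with a different prefix.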

Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4439#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
