[tor-bugs] #4439 [Metrics Utilities]: Develop a Java/Python API that wraps relay descriptor sources and provides unified access to them

Tue Dec 6 08:05:33 UTC 2011

#4439: Develop a Java/Python API that wraps relay descriptor sources and provides
unified access to them
-------------------------------+--------------------------------------------
 Reporter:  karsten            |          Owner:  karsten
     Type:  task               |         Status:  new    
 Priority:  normal             |      Milestone:         
Component:  Metrics Utilities  |        Version:         
 Keywords:                     |         Parent:         
   Points:                     |   Actualpoints:         
-------------------------------+--------------------------------------------

Comment(by karsten):

 Replying to [comment:11 atagar]:
 > I favor the iterator for that reason - callers that want everything
 buffered can read everything into a list (simple to do with both Python
 and Java).

 Right.  Makes sense.

 > The callback is bad because you're having the handler block reads,

 Oh, right, I hadn't thought of that.

 > and stores are bad for the reasons mentioned earlier. If we went with an
 iterator then it would be the best of both worlds: unblocked reads,
 limited memory usage if the handler is faster than reads, and can be
 converted into a store too. The only advantage to a callback is that it
 would guarantee constant memory usage (if your handler is slow then you
 could consume as much memory as your buffer size, which would probably be
 unbounded). On second thought that would be likely to come up when reading
 local cached descriptors... let's do both.

 We could even suspend adding new descriptors to the queue if the handler
 is slow.  That would work both for downloads and for reading from disk.
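
 To illustrate the suspension idea, here is a minimal sketch in Python,
 assuming a bounded queue.Queue sits between the reader and the handler.
 The function names, the None sentinel, and the sleep that stands in for
 a slow handler are all hypothetical and not part of any existing API.

```python
import queue
import threading
import time


def read_descriptors(raw_descriptors, buffer):
    """Producer: put() blocks while the buffer is full, which pauses reading."""
    for raw in raw_descriptors:
        buffer.put(raw)
    buffer.put(None)  # hypothetical end-of-input sentinel


def handle_slowly(buffer):
    """Consumer: a deliberately slow handler that records the peak backlog."""
    results, peak = [], 0
    while True:
        peak = max(peak, buffer.qsize())
        item = buffer.get()
        if item is None:
            return results, peak
        time.sleep(0.001)  # stand-in for slow per-descriptor work
        results.append(item)


def run_demo(count=10, max_buffered=2):
    buffer = queue.Queue(maxsize=max_buffered)
    raw = ["descriptor %d" % i for i in range(count)]
    producer = threading.Thread(target=read_descriptors, args=(raw, buffer))
    producer.start()
    results, peak = handle_slowly(buffer)
    producer.join()
    return results, peak
```

 With a bounded queue the backlog never exceeds max_buffered, so memory
 use stays limited whether the descriptors come from a download or from
 disk.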

 And we could implement descriptor parsing on demand, that is, when a
 handler runs the first getter of a descriptor they received from the
 queue.  That would save quite some memory, too.
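
 Parsing on demand could look roughly like the following sketch.  The
 class name, the getters, and the simplistic keyword-based parsing are
 assumptions made for illustration only; a real parser would have to
 follow dir-spec.txt much more closely.

```python
class LazyDescriptor:
    """Keeps the raw descriptor text and parses it on the first getter call."""

    def __init__(self, raw):
        self._raw = raw
        self._fields = None  # filled in by the first getter

    def _parse(self):
        if self._fields is None:
            fields = {}
            for line in self._raw.splitlines():
                keyword, _, rest = line.partition(" ")
                # Keep only the first occurrence of each keyword.
                fields.setdefault(keyword, rest)
            self._fields = fields
        return self._fields

    def is_parsed(self):
        return self._fields is not None

    def get_nickname(self):
        # Assumes a line of the form "router <nickname> <address> ...".
        return self._parse()["router"].split(" ")[0]

    def get_platform(self):
        return self._parse().get("platform")


descriptor = LazyDescriptor(
    "router example 192.0.2.1 9001 0 9030\n"
    "platform Tor 0.2.2.35 on Linux"
)
```

 Descriptors that a handler never inspects are never parsed, so only the
 raw text is held for them.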

 But!  These are ideas to optimize something that's not even there.

 I'd like to start with a single pattern.  We can always make it more
 complex later on.

 > Iterator would just be a simple producer/consumer. The producer thread
 adds descriptors to a buffer as they're read and the consumer pops
 elements off and provides them to the caller (blocking if there's no
 input). IIRC this would be handled in both Python and Java by a
 synchronized queue (I forget the class...
 java.util.concurrent.BlockingQueue?).

 Cool.  I think I like that pattern most.  (Let me update the API and
 example applications, and hopefully I'll still like it afterwards.)
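
 A rough sketch of the iterator pattern in Python, assuming the
 descriptor source is any iterable.  The name iterate_descriptors and
 the _END sentinel are hypothetical; the Java counterpart would
 presumably sit on top of a java.util.concurrent.BlockingQueue.

```python
import queue
import threading

_END = object()  # hypothetical marker for "no more descriptors"


def iterate_descriptors(source, max_buffered=0):
    """Yield descriptors from source while a producer thread reads ahead.

    max_buffered=0 means an unbounded buffer.
    """
    buffer = queue.Queue(maxsize=max_buffered)

    def produce():
        try:
            for descriptor in source:
                buffer.put(descriptor)
        finally:
            # Always signal the end so the consumer cannot block forever.
            buffer.put(_END)

    threading.Thread(target=produce, daemon=True).start()

    while True:
        item = buffer.get()  # blocks while there is no input
        if item is _END:
            return
        yield item


# Callers that want everything buffered can read it all into a list.
everything = list(iterate_descriptors(["d1", "d2", "d3"]))
```

 Note that this sketch does not report errors raised by the source; it
 only ends the iteration early.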

 > Requesting descriptors via the control socket can be for individual
 relays. I was thinking there may be some counterpart for 'give me
 descriptor for fingerprint X' via directory mirrors and authorities but on
 second thought tor wouldn't use that so it would be odd if that capability
 existed. Oh well...

 Well, you can ask for the descriptor for fingerprint X.  But the better
 approach is to ask by descriptor ID, not by fingerprint.  And it's better
 to ask for more than one descriptor at a time, because it causes less
 overhead for the directory.  When you're bored, look at dir-spec.txt and
 search for "http" to see what fancy things the directory protocol allows
 you to do.
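
 As a sketch of such a request, the helper below builds the URL for
 fetching several server descriptors at once.  It assumes the
 /tor/server/d/ (by descriptor digest) and /tor/server/fp/ (by
 fingerprint) URL layout from dir-spec.txt, with multiple hex IDs joined
 by "+" and an optional ".z" suffix for compressed responses.  The
 function name and the host are made up.

```python
import string


def descriptor_url(host, hex_ids, by="d", compressed=True):
    """Build a directory URL for one or more server descriptors.

    by="d" asks by descriptor digest, by="fp" by relay fingerprint.
    """
    if by not in ("d", "fp"):
        raise ValueError("by must be 'd' or 'fp'")
    if not hex_ids:
        raise ValueError("need at least one ID")
    for hex_id in hex_ids:
        # Both kinds of ID are SHA-1 digests: 40 hex characters.
        if len(hex_id) != 40 or not set(hex_id) <= set(string.hexdigits):
            raise ValueError("not a 40-character hex ID: %r" % hex_id)
    suffix = ".z" if compressed else ""
    return "http://%s/tor/server/%s/%s%s" % (host, by, "+".join(hex_ids), suffix)


# dir.example:9030 is a placeholder, not a real directory.
url = descriptor_url("dir.example:9030", ["A" * 40, "B" * 40])
```

 Asking for several IDs in one request is what keeps the overhead for
 the directory low.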

 Anyway, let's focus on the iterator idea first.

 I'll ask Sebastian to create two personal "DescripTor" repositories for
 us.  That way you can make changes to the code or documentation and tell
 me to pull them, rather than having to describe your suggested changes
 here.  And once we agree on a project name, we can create an official
 repository.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4439#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online