[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances

Sun Jun 12 20:25:16 UTC 2016

#18910: distributing descriptors across CollecTor instances
-------------------------------+-----------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_information
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------

Comment (by karsten):

 Replying to [comment:3 iwakeh]:
 > I agree, the protocol option is too much implementation effort.  The
 protocol design could be made simple by copying the existing protocol, but
 implementing this protocol and keeping a service up and running all the
 time to answer requests is a lot of work and not really necessary.

 Agreed.

 > Regarding your suggestion for the download option from 'recent', I'm
 wondering if this could be designed to be a little more fine-grained, in
 order to save a bit of bandwidth, processing time, and memory?
 > Usually only a few descriptors are missing, and it is easy to determine
 which document to download.  For votes and consensuses the download URL
 can be constructed directly, and for the referenced descriptors it is
 possible to infer (using a directory listing from the remote CollecTor
 instance, e.g. <other instance>/recent/relay-descriptors/extra-infos/)
 which document, and hence which URL, should provide the missing information.
 > Would that be a feasible approach?
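 For illustration, the "construct the URL directly" part of the quoted
 proposal could look roughly like the sketch below.  It assumes that
 'recent' stores one file per consensus, named after its valid-after time
 (e.g. recent/relay-descriptors/consensuses/2016-06-12-20-00-00-consensus),
 and that consensuses are published hourly.  The base URL and the function
 names are hypothetical, not part of CollecTor.

```python
from datetime import datetime, timedelta


def consensus_url(base: str, valid_after: datetime) -> str:
    # Assumed 'recent' layout: one file per consensus, named after its
    # valid-after time, e.g. 2016-06-12-20-00-00-consensus.
    name = valid_after.strftime("%Y-%m-%d-%H-%M-%S") + "-consensus"
    return f"{base}/recent/relay-descriptors/consensuses/{name}"


def missing_consensus_urls(base: str, have: set,
                           start: datetime, end: datetime) -> list:
    """Return URLs of hourly consensuses in [start, end) not in `have`.

    `have` holds the valid-after times of consensuses stored locally.
    Hourly publication is an assumption of this sketch.
    """
    urls = []
    t = start
    while t < end:
        if t not in have:
            urls.append(consensus_url(base, t))
        t += timedelta(hours=1)
    return urls


# Hypothetical remote instance; only the 19:00 consensus is missing here.
print(missing_consensus_urls(
    "https://collector.example.org",
    {datetime(2016, 6, 12, 18), datetime(2016, 6, 12, 20)},
    datetime(2016, 6, 12, 18), datetime(2016, 6, 12, 21)))
```

 Referenced descriptors (server descriptors, extra-info descriptors) cannot
 be addressed this way, which is why the proposal falls back to the remote
 directory listing for them.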

 My sense is that we shouldn't worry about bandwidth, processing time, and
 memory yet but instead go for the solution that takes the least
 engineering effort and is hence potentially more robust.

 But I also don't fully understand your suggestion above.  Sure, votes and
 consensuses and in general all files containing just a single descriptor
 could be skipped just by looking at the file name, file size, or file
 last-modified time.  But how would we handle files containing dozens or even
 hundreds of descriptors?  It seems that those files would be different in
 almost all cases, except when two instances download the exact same
 descriptors in a given hour, which won't happen if one instance reads
 cached-* descriptors or another instance fetches a missing descriptor from
 a third instance.
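 A rough sketch of the listing comparison described above, assuming the
 remote directory listing has already been parsed into (name, size) pairs
 (parsing the listing itself is left out, and the Entry type and function
 name are hypothetical).  Single-descriptor files with matching names and
 sizes are skipped; multi-descriptor files will almost always differ in size
 between instances and therefore end up being fetched anyway, which is the
 limitation raised here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str  # file name in the 'recent' directory
    size: int  # file size in bytes


def files_to_fetch(remote: list, local: list) -> list:
    """Return names of remote files that are missing or differ locally.

    Last-modified times are deliberately not compared: each instance
    writes its own files, so timestamps would rarely match (assumption).
    """
    local_by_name = {e.name: e for e in local}
    fetch = []
    for r in remote:
        mine = local_by_name.get(r.name)
        if mine is None or mine.size != r.size:
            fetch.append(r.name)
    return fetch


print(files_to_fetch(
    [Entry("a", 10), Entry("b", 5), Entry("c", 7)],
    [Entry("a", 10), Entry("b", 6)]))
```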

 Overall, I think I'd rather want us to keep things simple here for now and
 think about optimizing later.  What do you think?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online