[tor-bugs] #19934 [Metrics/CollecTor]: CollecTor should use new metrics-lib json classes

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Sep 8 07:55:22 UTC 2016


#19934: CollecTor should use new metrics-lib json classes
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  Medium             |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by karsten):

 Replying to [comment:4 iwakeh]:
 > Some thoughts:
 >
 > 1. The implementation of #18910 requires CollecTor to have a Java
 representation of index.json to choose the documents to download from the
 partner-synch-Collector instance(s), especially with the pick-and-choose
 requirements from comment:11 in #18910.

 So, this is about using metrics-lib `*Node` classes for obtaining
 descriptors, not for providing `index.json*` files, right?

 I don't recall the exact requirements we discussed for #18910, and I think
 we discussed quite a few variations there.  But what we can already do is
 specify an array of directories to synchronize.  The local CollecTor
 instance would then decide, by looking at the synchronized files, which
 ones to keep and copy over and which ones to ignore.

 What we ''could'' do is pass a list of excluded paths to
 `DescriptorCollector`, which would contain paths of consensuses and votes,
 possibly even with last modified times, that we already have and that we
 don't want to synchronize.  However, this feels a bit like premature
 optimization.  There's no big harm in downloading the entire `recent/`
 folder from a remote CollecTor instance and deciding locally what to do
 with the data.  We're already moving around larger chunks of bytes than
 that.
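
 A minimal sketch of the exclusion-list idea above, assuming paths are
 plain strings and last modified times are milliseconds.  The class and
 method names are hypothetical and not part of metrics-lib, and the file
 names in `main` are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not a metrics-lib class. */
public class ExcludedPathsSketch {

  /**
   * Returns the remote paths worth fetching: those that are not in the
   * exclusion map, or whose remote copy is newer than the excluded one.
   * Both maps go from path to last modified time in milliseconds.
   */
  public static List<String> pathsToFetch(Map<String, Long> remoteFiles,
      Map<String, Long> excluded) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Long> remote : remoteFiles.entrySet()) {
      Long localLastModified = excluded.get(remote.getKey());
      // Fetch if we don't have the file yet or the remote copy is newer.
      if (localLastModified == null
          || remote.getValue() > localLastModified) {
        result.add(remote.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up file names and timestamps.
    Map<String, Long> remote = new LinkedHashMap<>();
    remote.put("recent/relay-descriptors/consensuses/example-a", 2000L);
    remote.put("recent/relay-descriptors/votes/example-b", 3000L);
    remote.put("recent/relay-descriptors/votes/example-c", 4000L);

    Map<String, Long> excluded = new LinkedHashMap<>();
    excluded.put("recent/relay-descriptors/consensuses/example-a", 2000L);
    excluded.put("recent/relay-descriptors/votes/example-b", 1000L);

    // example-a is skipped; example-b (newer remotely) and example-c
    // (not excluded) are selected.
    System.out.println(pathsToFetch(remote, excluded));
  }
}
```

 Whether this bookkeeping is worth it compared to simply downloading all
 of `recent/` is exactly the open question here.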

 > 2. Shouldn't a CollecTor instance have more fine-grained control when
 creating index.json? More than just specifying a directory. Currently it
 already includes `recent` and `archive`.

 Well, we could accept an array of directories instead of just one, if that
 helps.  Or a base directory and an array of contained subdirectories to
 include.  Whatever is most intuitive and does the job.
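
 The "base directory plus array of subdirectories" variant could look
 roughly like the following sketch.  It only collects the relative paths
 that would go into an index and does not write any `index.json`; the
 class and method names are hypothetical and not part of metrics-lib or
 CollecTor:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not a metrics-lib class. */
public class IndexScopeSketch {

  /**
   * Lists all regular files below the given subdirectories of baseDir,
   * as sorted paths relative to baseDir using '/' as separator.
   */
  public static List<String> listIndexedFiles(Path baseDir,
      String[] includedSubdirs) throws IOException {
    List<String> result = new ArrayList<>();
    for (String subdir : includedSubdirs) {
      Path dir = baseDir.resolve(subdir);
      if (!Files.isDirectory(dir)) {
        continue; // Skip configured subdirectories that don't exist.
      }
      try (Stream<Path> walk = Files.walk(dir)) {
        walk.filter(Files::isRegularFile)
            .map(p -> baseDir.relativize(p).toString().replace('\\', '/'))
            .forEach(result::add);
      }
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Build a small throwaway directory tree with made-up file names.
    Path base = Files.createTempDirectory("collector-sketch");
    Files.createDirectories(base.resolve("recent/votes"));
    Files.createDirectories(base.resolve("archive"));
    Files.createDirectories(base.resolve("scratch"));
    Files.createFile(base.resolve("recent/votes/example-vote"));
    Files.createFile(base.resolve("archive/example-tarball"));
    Files.createFile(base.resolve("scratch/tmp"));

    // Only files below recent/ and archive/ are listed; scratch/ is not.
    System.out.println(
        listIndexedFiles(base, new String[] {"recent", "archive"}));
  }
}
```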

 > 3. The package was designated as '''alpha''' to prevent premature
 reliance on the new API and to keep more flexibility when implementing
 #18910.  It is well tested in metrics-lib.

 Oh, I'm not worried that it might not be tested well enough.  I'm worried
 about making the API bigger and having to maintain these parts in the
 future.  This whole `index.json*` stuff is something that library users
 ideally shouldn't have to worry about.  That's why I'm still trying hard
 to hide it away as best as I can.  If this turns out to be impossible or
 impracticable, so be it.  But I'm not there yet. :)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19934#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

