[tor-bugs] #19791 [Metrics/metrics-lib]: Use CollecTor's index.json for download; adapt current download to use new date format

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Aug 24 16:12:02 UTC 2016


#19791: Use CollecTor's index.json for download; adapt current download to use new
date format
---------------------------------+-----------------------------------
 Reporter:  karsten              |          Owner:  iwakeh
     Type:  defect               |         Status:  merge_ready
 Priority:  High                 |      Milestone:  metrics-lib 1.4.0
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+-----------------------------------
Changes (by karsten):

 * status:  needs_review => merge_ready


Comment:

 Replying to [comment:11 iwakeh]:
 > Replying to [comment:10 karsten]:
 > [...]
 >
 > > Regarding the new subpackage `org.torproject.descriptor.index`, I'm
 yet unclear whether that's supposed to be "visible" to API users or not,
 that is, if they're supposed to import and/or use those classes directly.
 If it is, that's going to grow the overall interface quite a bit,
 requiring us to keep the public parts stable in subsequent releases.  For
 example, I'm unclear whether CollecTor is supposed to instantiate its own
 `*Node` classes when writing its `index.json*` files.  Ideally, metrics-
 lib would hide away those classes by providing some kind of
 `DescriptorIndexGenerator` interface that takes a local directory as
 parameter and writes one or more `index.json[*]` files, but I'm not sure
 what your plan is there.  If the plan is to hide that package from users,
 how about we rename it to `org.torproject.descriptor.impl.index` to make
 that clearer?  We could even create similar implementation subpackages in
 the future to clean up the `impl` package a bit, but that's not something
 that users would have to care about.
 >
 > I'd like to add the package as '''alpha'''. Warning that it can change
 still in unexpected ways.  Thus, it can be used in CollecTor and from the
 findings there we can decide to remove it from the public interface, or
 change it w/o being backward compatible.
 > I provide a commit with the package-info and more javadoc below.
 >
 > The *Node classes are of use on their own for clients that want to know
 about a CollecTor's instance files.  They are there anyway and maintained,
 so others can just use them as well.

 Okay, I'm fine postponing this decision, and adding that alpha notice
 should be clear enough.

 However, I'm yet unconvinced by the reasons you're giving.  Ideally,
 clients shouldn't have to deal with the details of an index.json file.  We
 should give them interfaces that are powerful enough to do all the things
 they'd want to do.  Also, my past experience is that maintaining code can
 be very different depending on whether it's exposed in an API or not.  I'd
 rather want to provide fewer details in interfaces than what's already
 there.  But, no need to find a conclusion for this today, we can still
 think about this for 1.5.0.

 > > Can we avoid changing the parameter usage of
 `DescriptorIndexCollector#collectDescriptors()` by overloading that method
 in the interface?  How about we add a new method `collectDescriptor(String
 collecTorIndexUrl, String collecTorIndexPath, String[] remoteDirectories,
 ...)` and use an implementation-specific default for `collecTorIndexPath`
 in the current method, such as `"/index.html"` for
 `DescriptorCollectorImpl` and `"/index.json.xz"` for
 `DescriptorIndexCollector`?  If we retain backward compatibility here, I'd
 say we could switch to `DescriptorIndexCollector` in 1.5.0 when this class
 has seen some more testing.  Otherwise I'd suggest waiting for 2.0.0.
 >
 > I wouldn't want to change the interface for that.  Both are URL strings
 and whoever uses the non-default implementation must know what their
 doing, i.e. at least read javadoc of the class they explicitly request to
 be used.  At the point DescriptorIndexCollector becomes default we might
 not want to keep maintaining the old implementation and simply remove it.
 Then the additional method wouldn't make sense any more.

 My point is that we could do the switch to the new implementation in a
 version 1.5.0 if it remains backward-compatible, whereas we'd have to call
 that version 2.0.0 otherwise.  So, we can merge this code as is for 1.4.0,
 but we should aim for a backward-compatible version if we want to make it
 the new default in 1.5.0.

 Here's an even better suggestion: we accept both, base URLs (like
 `"https://collector.torproject.org"`) and full URLs to index files (like
 `"https://collector.torproject.org/index.html"` or
 `"https://collector.torproject.org/index.json.xz"`), in the current
 `collectDescriptors()` method, and if no file is specified, we'll use an
 implementation-specific default (`"/index.html"` in
 `DescriptorCollectorImpl` and `"/index.json.xz"` in
 `DescriptorIndexCollector`).  That doesn't break current applications that
 just provide a base URL, and it enables future applications to select
 their favorite compression type.  Likewise, no need to make a decision
 today, but let's consider this for 1.5.0.

 > [...]
 >
 > > Is `FileNode#lastModifiedMillis()` called often?  If so, it may be
 better to use `ParseHelper#getDateFormat()` to obtain a thread-safe
 `DateFormat` instance for the given format and store it in a map.  That
 avoids creating single-use instances of that class over and over.  We'd
 have to make that method public for this, but that shouldn't be an issue.
 However, if it's not called very often, we can skip this.
 > >
 >
 > I'm aware of `ParseHelper#getDateFormat()`, but I didn't want to
 introduce dependencies to the `impl` package.
 > Actually,  `ParseHelper` should be split into a utility class that is
 part of the interface and the implementation specific part (should be part
 of the redesign ticket #19640).  Once, this class exists the
 `getDateFormat` method should be used; until then it wont break I think.

 Ah, fair enough.  Happy to discuss this more on that other ticket.

 > [...]
 >
 > > Can we pick a smaller `test.json` and compressed equivalents in
 `src/test/resources/`?  For example, we could take the `index.json` from
 the main CollecTor instance and throw out all files that haven't been
 modified in the past, say, four weeks.  Or did you include a large file on
 purpose to check something that cannot be tested with a smaller file?
 > >
 >
 > I wanted to use the 'real' file as it is provided by the current main
 CollecTor without any edits, because I only have these small toy test
 files otherwise. There can be a timeout to the test method and we'd notice
 if that takes too long?

 Okay to leave these files in.  No need to add a timeout to the test, as
 we'll notice slow tests pretty easily, and then we can always take them
 out.  I was also more concerned about the file size than the test
 execution time, but I admit that file size doesn't really matter here.
 Never mind.

 > > I'd say once we have resolved these questions, it's good to go into
 1.4.0, even today.  Thanks for working on this!
 >
 > Fine, thanks for all the things you spotted!
 >
 > Please review the [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/?h=task-19791 new commits] based on your branch above.

 All changes look good.  I'll squash your latest commits and mine into
 32f628f, run some tests, and release 1.4.0.  Let's just not forget the
 issues I'm raising above for 1.5.0.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19791#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list