[tor-bugs] #19791 [Metrics/metrics-lib]: Use CollecTor's index.json for download; adapt current download to use new date format

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Aug 24 13:44:34 UTC 2016


#19791: Use CollecTor's index.json for download; adapt current download to use new
date format
---------------------------------+-----------------------------------
 Reporter:  karsten              |          Owner:  iwakeh
     Type:  defect               |         Status:  needs_review
 Priority:  High                 |      Milestone:  metrics-lib 1.4.0
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+-----------------------------------
Changes (by iwakeh):

 * status:  needs_revision => needs_review


Comment:

 Replying to [comment:10 karsten]:
 > Alright, let's go through the last four commits in that branch:

 Thanks for looking at all that!

 >
 > 91fdb9b is already contained in master as 1dab40d.

 correct.

 >
 > a74563a looks good and is merged to my branch (see below) as 7a0e109.
 >
 > 1276c14 looks good, too, and is merged to my branch as e9c0731.  Thanks
 for caring about tests!

 Without tests a codebase becomes unmaintainable very soon ...

 >
 > 32f628f is preliminary merged to my branch as 82eb1cf, but I have a
 couple of questions and change requests, as follows:
 >
 > Commit bf5ba78 in [https://gitweb.torproject.org/user/karsten/metrics-
 lib.git/log/?h=task-19791 my task-19791 branch] contains a few trivial
 fixes.  Please take a look and let me know if you're okay with those.

 Does look fine.

 >
 > Regarding the new subpackage `org.torproject.descriptor.index`, I'm yet
 unclear whether that's supposed to be "visible" to API users or not, that
 is, if they're supposed to import and/or use those classes directly.  If
 it is, that's going to grow the overall interface quite a bit, requiring
 us to keep the public parts stable in subsequent releases.  For example,
 I'm unclear whether CollecTor is supposed to instantiate its own `*Node`
 classes when writing its `index.json*` files.  Ideally, metrics-lib would
 hide away those classes by providing some kind of
 `DescriptorIndexGenerator` interface that takes a local directory as
 parameter and writes one or more `index.json[*]` files, but I'm not sure
 what your plan is there.  If the plan is to hide that package from users,
 how about we rename it to `org.torproject.descriptor.impl.index` to make
 that clearer?  We could even create similar implementation subpackages in
 the future to clean up the `impl` package a bit, but that's not something
 that users would have to care about.

 I'd like to add the package as '''alpha'''. Warning that it can change
 still in unexpected ways.  Thus, it can be used in CollecTor and from the
 findings there we can decide to remove it from the public interface, or
 change it w/o being backward compatible.
 I provide a commit with the package-info and more javadoc below.

 The *Node classes are of use on their own for clients that want to know
 about a CollecTor's instance files.  They are there anyway and maintained,
 so others can just use them as well.

 >
 > Can we avoid changing the parameter usage of
 `DescriptorIndexCollector#collectDescriptors()` by overloading that method
 in the interface?  How about we add a new method `collectDescriptor(String
 collecTorIndexUrl, String collecTorIndexPath, String[] remoteDirectories,
 ...)` and use an implementation-specific default for `collecTorIndexPath`
 in the current method, such as `"/index.html"` for
 `DescriptorCollectorImpl` and `"/index.json.xz"` for
 `DescriptorIndexCollector`?  If we retain backward compatibility here, I'd
 say we could switch to `DescriptorIndexCollector` in 1.5.0 when this class
 has seen some more testing.  Otherwise I'd suggest waiting for 2.0.0.

 I wouldn't want to change the interface for that.  Both are URL strings
 and whoever uses the non-default implementation must know what their
 doing, i.e. at least read javadoc of the class they explicitly request to
 be used.  At the point DescriptorIndexCollector becomes default we might
 not want to keep maintaining the old implementation and simply remove it.
 Then the additional method wouldn't make sense any more.

 >
 > The condition `if (!filepath.exists() && filepath.mkdirs())` in
 `DescriptorIndexCollector#fetchRemoteFiles()` looks wrong.  Shouldn't it
 be `!filepath.mkdirs()` there?  I didn't test this, but the API says that
 `mkdirs()` returns "true if and only if the directory was created, along
 with all necessary parent directories; false otherwise".

 Thanks for spotting this!
 I'll added a test and the fix in a new commit.

 >
 > Is `FileNode#lastModifiedMillis()` called often?  If so, it may be
 better to use `ParseHelper#getDateFormat()` to obtain a thread-safe
 `DateFormat` instance for the given format and store it in a map.  That
 avoids creating single-use instances of that class over and over.  We'd
 have to make that method public for this, but that shouldn't be an issue.
 However, if it's not called very often, we can skip this.
 >

 I'm aware of `ParseHelper#getDateFormat()`, but I didn't want to introduce
 dependencies to the `impl` package.
 Actually,  `ParseHelper` should be split into a utility class that is part
 of the interface and the implementation specific part (should be part of
 the redesign ticket #19640).  Once, this class exists the `getDateFormat`
 method should be used; until then it wont break I think.

 > In `IndexNode#retrieveFiles()` I'm not yet clear what situations lead to
 returning the half-done map in the middle of the loop.  If one invalid
 remote directory leads to that case, I think we should either warn and
 skip that remote directory or return an empty map.  Otherwise, the order
 of remote directories would affect the result, and that might lead to bugs
 in user applications that are quite hard to find.

 Good catch! Indeed there should be a `continue`. Adapted the tests to
 catch this and corrected the method an also another bug.

 >
 > Can we pick a smaller `test.json` and compressed equivalents in
 `src/test/resources/`?  For example, we could take the `index.json` from
 the main CollecTor instance and throw out all files that haven't been
 modified in the past, say, four weeks.  Or did you include a large file on
 purpose to check something that cannot be tested with a smaller file?
 >

 I wanted to use the 'real' file as it is provided by the current main
 CollecTor without any edits, because I only have these small toy test
 files otherwise. There can be a timeout to the test method and we'd notice
 if that takes too long?

 > I'd say once we have resolved these questions, it's good to go into
 1.4.0, even today.  Thanks for working on this!

 Fine, thanks for all the things you spotted!

 Please review the [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/?h=task-19791 new commits] based on your branch above.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19791#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list