[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 12 13:42:45 UTC 2016


#18910: distributing descriptors across CollecTor instances
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by karsten):

 Alright, here's a review of f4026fc:

  - All classes in the new `persist` package contain unused imports or
 unused attributes.  Can you remove those in a fixup commit?

  - Can we somehow use `printf` patterns for putting together paths to make
 that bunch of `DescriptorPersistence` classes easier to read?  It should
 be possible to express all paths using placeholders for dates/times,
 strings, and chars.  And if we care about path separators being platform
 independent, we could split the resulting path at `/` characters and put
 it back together using `Paths.get()`.  Untested example for a bridge
 extra-info descriptor:

 {{{
 String.format("bridge-descriptors/%tY/%<tm/extra-infos/%c/%c/%s",
     desc.getPublishedMillis(),
     desc.getExtraInfoDigest().charAt(0),
     desc.getExtraInfoDigest().charAt(1),
     desc.getExtraInfoDigest());
 }}}
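
 A minimal sketch of the split-and-reassemble idea, not the committed
 code.  `PathPatternSketch`, `toPath()`, and `bridgeExtraInfoPath()` are
 hypothetical names.  One assumption differs from the example above:
 passing raw milliseconds to `%tY` formats in the JVM's default time zone,
 so this sketch converts to UTC first.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.Locale;

final class PathPatternSketch {

  /** Formats a '/'-separated pattern, then rebuilds it with Paths.get(). */
  static Path toPath(String pattern, Object... args) {
    String formatted = String.format(Locale.ROOT, pattern, args);
    String[] parts = formatted.split("/");
    // Paths.get() joins the parts with the platform's own separator.
    return Paths.get(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
  }

  /** Hypothetical helper for a bridge extra-info descriptor. */
  static Path bridgeExtraInfoPath(long publishedMillis, String digest) {
    // Convert to UTC explicitly; %tY on a long would use the default zone.
    ZonedDateTime published =
        Instant.ofEpochMilli(publishedMillis).atZone(ZoneOffset.UTC);
    return toPath("bridge-descriptors/%tY/%<tm/extra-infos/%c/%c/%s",
        published, digest.charAt(0), digest.charAt(1), digest);
  }
}
```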

  - Building upon the previous idea, maybe we can avoid having
 `DescriptorPersistence` classes at all if all we need is a type annotation
 and two `printf` patterns.  We could define an order of `printf` arguments
 for all patterns, like: published, received, source, digest, first char of
 digest, second char of digest.  That would really save a few lines of
 code, wouldn't it?  Maybe something for later?
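
 To illustrate the fixed argument order, here is a rough sketch using
 explicit `printf` argument indexes.  The enum, its constants, and its
 method names are made up for this example.  The patterns are modeled on
 paths listed further down in this review, and which timestamp goes into
 the `recent` file name is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

/**
 * Hypothetical replacement for the DescriptorPersistence classes.
 * Argument order for every pattern: 1 published, 2 received, 3 source,
 * 4 digest, 5 first char of digest, 6 second char of digest.
 */
enum DescriptorLayoutSketch {

  BRIDGE_EXTRA_INFO("@type bridge-extra-info 1.3",
      "bridge-descriptors/%1$tY/%1$tm/extra-infos/%5$c/%6$c/%4$s",
      "bridge-descriptors/extra-infos/"
          + "%2$tY-%2$tm-%2$td-%2$tH-%2$tM-%2$tS-extra-infos"),

  // Annotation string assumed, not taken from the reviewed commit.
  RELAY_VOTE("@type network-status-vote-3 1.0",
      "relay-descriptors/vote/%1$tY/%1$tm/%1$td/"
          + "%1$tY-%1$tm-%1$td-%1$tH-%1$tM-%1$tS-vote-%3$s-%4$s",
      "relay-descriptors/votes/"
          + "%1$tY-%1$tm-%1$td-%1$tH-%1$tM-%1$tS-vote-%3$s-%4$s");

  final String typeAnnotation;
  private final String outPattern;
  private final String recentPattern;

  DescriptorLayoutSketch(String typeAnnotation, String outPattern,
      String recentPattern) {
    this.typeAnnotation = typeAnnotation;
    this.outPattern = outPattern;
    this.recentPattern = recentPattern;
  }

  String outPath(long published, long received, String source,
      String digest) {
    return format(outPattern, published, received, source, digest);
  }

  String recentPath(long published, long received, String source,
      String digest) {
    return format(recentPattern, published, received, source, digest);
  }

  private static String format(String pattern, long published,
      long received, String source, String digest) {
    ZonedDateTime pub = Instant.ofEpochMilli(published).atZone(ZoneOffset.UTC);
    ZonedDateTime rec = Instant.ofEpochMilli(received).atZone(ZoneOffset.UTC);
    // Arguments that a pattern does not reference are simply ignored.
    return String.format(Locale.ROOT, pattern, pub, rec, source, digest,
        digest.charAt(0), digest.charAt(1));
  }
}
```

 With this shape, adding a descriptor type would mean adding one enum
 constant rather than one class.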

  - Half of the paths generated by `DescriptorPersistence` subclasses do
 not match paths as they are currently implemented.  I only read the code
 and did not run it, but I believe the following actual paths would not
 have been generated by these classes:

 {{{
 ./out/bridge-descriptors/2016/10/extra-infos/0/0/0000000001764ef8b8b5bc9ed70b9e99225112ffd04
 ./out/bridge-descriptors/2016/10/server-descriptors/1/1/112fd90a05866092a50a4ab5b1e07ee1749259fe
 ./out/bridge-descriptors/2016/10/statuses/06/20161006-123817-1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1
 ./out/relay-descriptors/microdesc/2016/10/consensus-microdesc/06/2016-10-06-12-00-00-consensus-microdesc
 ./out/relay-descriptors/microdesc/2016/10/micro/6/6/66d7428cefc72f63b26fbee812797cc6c4ff1c34729b79631bf6c1717d46e82e
 ./out/relay-descriptors/server-descriptor/2016/10/5/3/53c11f7232b3d7ef6501112eb1cdc2fb998b0197
 ./out/torperf/2016/10/06/moria-51200-2016-10-06.tpf
 ./recent/bridge-descriptors/statuses/20161012-073817-1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1
 ./recent/relay-descriptors/microdescs/consensus-microdesc/2016-10-12-03-00-00-consensus-microdesc
 ./recent/relay-descriptors/server-descriptors/2016-10-12-07-05-00-server-descriptors
 ./recent/relay-descriptors/votes/2016-10-09-12-00-00-vote-23D15D965BC35114467363C165C4F724B64B4F66-732E4ED709E1E9F84AD19B686C18202DB1524410
 }}}

  - For the sake of completeness, I believe that the following paths would
 have been generated by these classes:

 {{{
 ./out/exit-lists/2016/10/12/2016-10-12-03-02-00
 ./out/relay-descriptors/consensus/2016/10/06/2016-10-06-12-00-00-consensus
 ./out/relay-descriptors/extra-info/2016/10/1/1/1135684f37075fa58f525216444512c4f64e2e9c
 ./out/relay-descriptors/vote/2016/10/02/2016-10-02-13-00-00-vote-E8A9C45EDE6D711294FADF8E7951F4DE6CA56B58-2A409B6A22D0266CBA2175E837B39C435BACA312
 ./recent/bridge-descriptors/extra-infos/2016-10-12-02-09-00-extra-infos
 ./recent/bridge-descriptors/server-descriptors/2016-10-12-02-09-00-server-descriptors
 ./recent/exit-lists/2016-10-12-03-02-00
 ./recent/relay-descriptors/consensuses/2016-10-11-12-00-00-consensus
 ./recent/relay-descriptors/extra-infos/2016-10-12-05-05-00-extra-infos
 ./recent/relay-descriptors/microdescs/micro/2016-10-12-09-05-00-micro
 ./recent/torperf/moria-51200-2016-10-12.tpf
 }}}

  - In `ConsensusPersistence`, the check for `null ==
 desc.getConsensusFlavor()` seems too fragile with respect to flavors we'll
 add in the future.  We wouldn't notice except for microdesc consensuses
 being overwritten.  We should check the actual consensus flavor if it's
 not `null`.
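
 A sketch of what an explicit flavor check could look like.  The class
 and method names are invented, and I'm assuming the flavor is a string
 that is `null` for unflavored consensuses and `"microdesc"` for microdesc
 consensuses.  Anything else fails loudly instead of silently landing in
 one of the two existing directories.

```java
final class ConsensusFlavorSketch {

  /** Returns the directory for a consensus of the given flavor. */
  static String consensusDirectory(String flavor) {
    if (flavor == null) {
      // Unflavored consensus.
      return "relay-descriptors/consensus";
    }
    switch (flavor) {
      case "microdesc":
        return "relay-descriptors/microdesc";
      default:
        // A flavor added in the future must be handled explicitly.
        throw new IllegalArgumentException(
            "Unknown consensus flavor: " + flavor);
    }
  }
}
```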

  - In `DescriptorPersistence`, I wonder whether, instead of accepting
 `StandardOpenOption` or even `StandardOpenOption ...` to distinguish
 replace vs. append, we should accept just a boolean.  The open options
 seem like an implementation-level detail we're handing out to the caller.
 Are we sure we're implementing all options, and in particular all
 combinations of options, correctly?  Those checks where we're only looking
 at how many such option values we're being given look really fragile.  If
 we only want to support replace vs. append, we should only provide those
 two options in the interface.
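
 Roughly what I have in mind, as an untested sketch with invented names.
 The caller only says replace or append, and the mapping to
 `StandardOpenOption` values stays inside the method.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class StoreModeSketch {

  /** Writes bytes to file, either appending to or replacing its content. */
  static void store(Path file, byte[] bytes, boolean append)
      throws IOException {
    Path parent = file.getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    if (append) {
      Files.write(file, bytes,
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } else {
      Files.write(file, bytes, StandardOpenOption.CREATE,
          StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }
  }
}
```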

  - Same class, method `storeAll()` should make explicit, in its name or at
 least in its documentation, that `recent` will only be written if writing
 `archive` is successful.

  - Same class, methods `storeRecent()` and `storeOut()` might have wrong
 argument orders in their `Paths.get()` calls, but I'm not sure.

  - `ExtraInfoPersistence` does not distinguish enough between relay and
 bridge extra-info descriptors.  The latter use a different `@type`
 annotation which needs to be reflected in the constructor.  It might be
 easier to use two distinct `DescriptorPersistence` classes for
 `RelayExtraInfoDescriptor` and `BridgeExtraInfoDescriptor`.

  - Same class, `@type bridge-extra-info` is not `1.0` but `1.3` right now.
 And we'll almost certainly forget to update the version number there in
 case we move to `1.4`.  Can we avoid having to remember that?  This
 probably applies to all `@type` annotations, though the bridge descriptors
 numbers are most likely to change over time.

  - In `PersistenceUtils`, method `storeToFileSystem()`, that check for
 `append.length == 0` could be changed to `append.length <= i` to make the
 method a tiny bit more robust against wrong usage.

  - Same class, are the `dateDependentPath*()` methods used anywhere?  If
 not and if there are no plans to use them, can you remove them?

  - Same class, `dateDependentPathYm()` (assuming that it'll be used) does
 not indicate that it also includes the day in the path, and it appends the
 `append` part without dash, unlike `dateDependentPathYmd()`.

  - `ServerDescriptorPersistence`: see comments for `ExtraInfoPersistence`.

  - `StatusPersistence`: why not call it `BridgeNetworkStatusPersistence`?
 But regardless of that, the version must be `1.1` and adapted whenever it
 updates.  See above.

  - The following didn't become entirely clear from reading the code, so
 let me ask: are files written to temporary files like `.somefile.tmp` and
 later renamed to `somefile`?  We should make sure we're doing that for
 files in `recent/` to which we append descriptors, because otherwise those
 half-written files will be indexed and made available, which we should
 avoid.  Basically, if a file may change after being written, let's write
 it to a temp file and only rename it to its final file name when we're
 sure it won't change anymore.
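
 For reference, a minimal sketch of the write-then-rename approach with
 `java.nio.file`.  The class and method names are hypothetical.  Whether
 an atomic move replaces an existing target is implementation-specific,
 hence the fallback.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicWriteSketch {

  /** Writes bytes to ".name.tmp" next to target, then renames it. */
  static void writeThenRename(Path target, byte[] bytes)
      throws IOException {
    Path dir = target.toAbsolutePath().getParent();
    Files.createDirectories(dir);
    Path tmp = dir.resolve("." + target.getFileName() + ".tmp");
    // All writing happens on the temp file, which indexing can skip.
    Files.write(tmp, bytes);
    try {
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    } catch (AtomicMoveNotSupportedException e) {
      // Fallback for file systems without atomic rename support.
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```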

  - Similar to the previous comments, I found a few formatting issues that
 should have been caught by checkstyle.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:25>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

