[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 19 12:18:15 UTC 2016


#18910: distributing descriptors accross CollecTor instances
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by karsten):

 Alright!  Please find [https://gitweb.torproject.org/karsten/metrics-
 db.git/log/?h=task-18910-3 branch task-18910-3 in my repository].  I
 created fixup/squash commits for most things I found.  Please take a close
 look and challenge anything you don't like to be squashed into your
 commit.  In particular, please take a look why the unit tests don't pass
 anymore. ;)

 More comments below:

  - I believe that index.json would include the `.tmp` files written during
 sync, because we're only skipping files starting with `.`.  And that's not
 even a new issue, because the `.tmp` files written during normal updating
 would also be included.  The issue will only become more prevalent.  I'd
 say we adapt the indexer, which can happen in a separate commit.  Or we
 could change temporary files to start with `.`, but that seems a bit more
 error-prone.  Suggestion: '''fix'''

  - As mentioned before somewhere, we're appending server/extra-info
 descriptors to different files in the synchronization run than in the
 earlier update run.  The reason is that `SyncManager` creates its own
 `collectionDate` in the constructor.  However, we cannot just pass a
 `Date` instance to both `SyncManager` and `ArchiveWriter`, because then
 `SyncManager` would append to a file that we're already serving, in which
 case the file size would change once we're done synchronizing, and we
 should try to avoid that.  Let's fix this as soon as the current updaters
 are modules just like the synchronization module, and let's keep in mind
 not to close and rename temporary files to their final names between
 modules.  Suggestion: '''postpone'''

  - As said earlier, let's release metrics-lib and use that new release as
 basis for this patch.  Then we won't have to import and instantiate
 `org.torproject.descriptor.index.DescriptorIndexCollector` anymore, and we
 can change defaults for the `*SyncOrigins` properties in
 `collector.properties` back to `https://collector.torproject.org`.
 Suggestion: '''change'''

  - Regarding comment 25 above, I'm still a fan of using `printf` as
 opposed to creating `SimpleDateFormat` instances and concatenating
 strings.  I also believe it would be more efficient.  But I'm happy to
 postpone that change.  Suggestion: '''postpone'''

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:56>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list