[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances

Wed Oct 26 12:17:02 UTC 2016

#18910: distributing descriptors across CollecTor instances
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by iwakeh):

 1. Valid points about file-io.  My local runs identified the network as
 the bottleneck rather than file-io when doing an initial sync to an empty
 CollecTor instance.  Subsequent runs were much shorter.  The mirror,
 running as a CollecTor instance, needed only 20 min for the first sync run
 and now takes far less (between 1 and 3 min).
  But anyway, it's true that some of the copying could and should be
 reduced.

 2. Ideally index.json should be a true picture of 'recent', but in
 practice it will always be only a snapshot, even if it's updated with each
 change, because the syncing instance cannot refresh its copy of index.json
 continuously.  So CollecTor's sync should accommodate the possible
 differences, which I think it currently does.
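
 A rough sketch of what "accommodating the differences" means here, not
 the actual CollecTor code: the sync plans its fetches from the snapshot
 and simply skips entries that have vanished in the meantime.  All names
 (plan_sync, sync, fetch) and the path-to-timestamp mapping are
 hypothetical and only stand in for whatever index.json really provides.

```python
def plan_sync(remote_index, local_files):
    """Return paths from the remote snapshot that are missing or stale locally.

    Both arguments are hypothetical dicts mapping a file path to a
    last-modified string such as "2016-10-26 12:17"; strings in that
    format compare correctly in lexicographic order.
    """
    to_fetch = []
    for path, remote_mtime in sorted(remote_index.items()):
        local_mtime = local_files.get(path)
        if local_mtime is None or local_mtime < remote_mtime:
            to_fetch.append(path)
    return to_fetch


def sync(remote_index, local_files, fetch):
    """Fetch the planned paths, tolerating a snapshot that is already stale.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that returns the file content, or None if the remote no longer serves
    the path.
    """
    fetched, skipped = [], []
    for path in plan_sync(remote_index, local_files):
        data = fetch(path)
        if data is None:
            # Listed in the snapshot but gone by now: skip, do not fail.
            skipped.append(path)
            continue
        local_files[path] = remote_index[path]
        fetched.append(path)
    return fetched, skipped
```

 The point is only that a stale entry costs one skipped fetch and gets
 picked up (or dropped) with the next snapshot.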

 How to proceed?
 Do you think this should block the release?
 I think it can be released as is, because the current set-up increases
 descriptor availability a lot and is tested.
 I'm wary of tuning it now, as that would mean delaying the release.  Also,
 for both writing and parsing there are duplicate and triplicate
 implementations in the code base, which should be streamlined and can be
 tuned in that process.

 I'd suggest releasing now and opening new tickets (which will be part of
 the other tickets for planning the streamlining, modularization, and other
 improvements):
 1. Streamline writing all over the code base, with an emphasis on reducing
 file-io for CollecTor.
 2. Make index.json as close to the current state as necessary and
 feasible, which includes considering how accurate it needs to be for the
 given use cases.  Maybe have a clean-up module run before the index run.

 Is that an ok plan?

--
Ticket URL: <https://troodi.torproject.org/projects/tor/ticket/18910#comment:89>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online