[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances

Wed Oct 26 17:32:45 UTC 2016

#18910: distributing descriptors across CollecTor instances
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by karsten):

 Replying to [comment:91 iwakeh]:
 > Replying to [comment:90 karsten]:
 > > Well, the sync runs here don't take just 3 or 20 minutes, but many
 > > hours.  My guess is that the backup daemon is trying to capture all
 > > file system changes for the next backup, or something similar, and I
 > > could imagine the same happening on servers.
 >
 > Well, please investigate whether the backup is the reason.  A backup
 > shouldn't hamper the production system.

 This was just a guess.  I didn't investigate further after finding out
 what the code was doing.  And even if backup daemons were not the reason
 for the slowness here, we shouldn't ship this code now that we know how
 inefficient it is.

 > Just to make sure we're talking about the same use case:
 > * A fresh installation without previous data is not what sync is for.
 >   Here, the archived data of the last three days should be provided
 >   first, followed by one or two regular download runs.  After that, sync
 >   can be turned on.
 > * A running instance, such as a mirror, can be enhanced with sync.

 Why would we discourage turning on sync right from the start?  In this
 case, the problem was not limited to the first run: later sync runs were
 even slower.

 > > I don't understand your reasoning about `index.json` not being a true
 > > picture of `recent/`.  We skip `*.tmp` files when creating that file,
 > > and we always append to a `.tmp` file and only rename it to the
 > > destination file once we're sure it won't change anymore.  Where does
 > > that become inaccurate?
 >
 > Now I see we're talking about different kinds of accuracy:
 >
 > You're referring to the single file assembled in one download run (or,
 > hopefully soon, in one sync run).  The index.json of a syncing instance
 > could become inaccurate in that case.

 Yes, that's what I was referring to.
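 For reference, the append-to-`.tmp`-then-rename pattern described above,
 together with skipping `*.tmp` files when building an index, could be
 sketched as follows.  This is a minimal Python illustration under stated
 assumptions, not CollecTor's actual (Java) code: the helper names
 `append_descriptor`, `finalize`, and `build_index`, the example file name,
 and the index entry fields `path` and `size` are all hypothetical.

```python
import os
import tempfile


# Hypothetical helpers illustrating the pattern; not CollecTor's actual API.
def append_descriptor(out_dir: str, name: str, data: bytes) -> str:
    """Append data to the in-progress file <name>.tmp and return its path."""
    tmp_path = os.path.join(out_dir, name + ".tmp")
    with open(tmp_path, "ab") as f:
        f.write(data)
    return tmp_path


def finalize(out_dir: str, name: str) -> str:
    """Rename <name>.tmp to <name> once nothing more will be appended."""
    tmp_path = os.path.join(out_dir, name + ".tmp")
    final_path = os.path.join(out_dir, name)
    # On POSIX, this rename is atomic if both paths are on the same file
    # system, so the destination path never refers to a partial file.
    os.replace(tmp_path, final_path)
    return final_path


def build_index(recent_dir: str) -> dict:
    """List finished files below recent_dir, skipping in-progress *.tmp files.

    The entry fields ("path", "size") are hypothetical, not index.json's schema.
    """
    files = []
    for root, dirs, names in os.walk(recent_dir):
        dirs.sort()  # walk subdirectories in a deterministic order
        for n in sorted(names):
            if n.endswith(".tmp"):
                continue  # still being written; leave it out of the index
            full = os.path.join(root, n)
            files.append({"path": os.path.relpath(full, recent_dir),
                          "size": os.path.getsize(full)})
    return {"files": files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        append_descriptor(d, "example-descriptors", b"line 1\n")
        print(build_index(d))  # {'files': []}
        finalize(d, "example-descriptors")
        print(build_index(d))
        # {'files': [{'path': 'example-descriptors', 'size': 7}]}
```

 Note that the rename only keeps a single file consistent while it is being
 written; it does not stop an index from going stale when files listed in
 it are later deleted by a clean-up run.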

 > I'm referring to the accuracy lost through regular operation after
 > index.json has been created or downloaded.  Take ticket #20430: the
 > syncing instance retrieves index.json, shortly afterwards the main
 > instance runs a clean-up, and files listed in index.json no longer exist.

 True, that's unrelated to what I meant above.

 > > By the way, we'll need to merge #20380 before putting out the release.
 > > And I'd want to start an overnight test run before releasing, so the
 > > release cannot happen today anyway. :(
 >
 > That is vital information.
 > If the release is postponed, there is time to change the code and test
 > it, as I said before.
 > I'll take a look.

 If you have some code for me today, I'll run it overnight, and maybe we
 can release tomorrow! :)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:92>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online