[tor-bugs] #20228 [Metrics/CollecTor]: Append all votes with same valid-after time to a single file in `recent/`

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Sep 29 08:49:46 UTC 2016


#20228: Append all votes with same valid-after time to a single file in `recent/`
-------------------------------+---------------------
 Reporter:  karsten            |          Owner:
     Type:  enhancement        |         Status:  new
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------

Comment (by karsten):

 Replying to [comment:3 iwakeh]:
 > * Regarding grouping by download vs. published time which came up in
 >  #20234, too.
 >  Let's have the discussion for all descriptors here, if this is ok?
 >  1. Grouping by published time brings more data consistency between
 >  CollecTor instances, as their download times for the same descriptors
 >  surely differ often.

 Agreed, I guess we can assume that files in the `recent/` directories
 might differ between CollecTor instances.  But is that important, as long
 as the set of contained descriptors with publication time in the past,
 say, 60 hours is 99.9% the same?  I mean, even files grouped by
 publication hour could, and very likely would, contain descriptors in
 different orders.  Do we care?

 >  2. Grouping by download time means keeping track of a data item, i.e.
 >  download time, that so far is not part of the Tor protocol.  Why introduce
 >  it for descriptors that provide a published time?  Which is the download
 >  time after syncing descriptors: the initial download by the supplying
 >  CollecTor or the sync-download-time by the receiving one?

 Right now, a CollecTor instance records the timestamp when it starts
 downloading and uses that as the name of the descriptors file to which it
 appends all descriptors it learns about in that run.  That includes
 descriptors found via initial download or via synchronization from other
 instances.  And 72 hours later, when the file gets deleted, the download
 time will not be relevant anymore.

 >  3. Regarding #20234:comment:5: Clients might not be interested in past
 >  or future (according to published time) descriptors and just download
 >  the file they consider current, if it changed since their last visit.

 Right, this is an important argument for storing descriptors by published
 hour, so that clients can retrieve them easily.  However, the presumption
 there is that the client knows the publication time of a descriptor before
 downloading something, and that's not always the case.  It might be that
 the client would have to download several files and search for the
 descriptor it's looking for.

 And the most important argument against storing descriptors by published
 hour is that clients that just want the new descriptors will have to
 download about 8 files per hour (due to #20234) rather than 1, where 6 or
 7 of these files contain mostly the same descriptors as before.
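
 To illustrate the difference, here is a small sketch that counts which
 files change in a single run under each scheme.  The function name, the
 file name patterns, and the example timestamps are all made up; the
 real spread of publication times per run is what #20234 is about.

```python
from datetime import datetime, timezone


def files_touched(published_times, run_start):
    """Return the files one run would change under each grouping scheme."""
    # Grouping by published hour: one file per distinct publication hour.
    by_published_hour = sorted(
        {t.strftime("%Y-%m-%d-%H") for t in published_times})
    # Grouping by download run: a single file named after the run start.
    by_download_run = (
        [run_start.strftime("%Y-%m-%d-%H-%M-%S")] if published_times else [])
    return by_published_hour, by_download_run


# Hypothetical run that learns four descriptors published in three
# different hours.
run_start = datetime(2016, 9, 29, 8, 5, tzinfo=timezone.utc)
published = [
    datetime(2016, 9, 29, 7, 58, tzinfo=timezone.utc),
    datetime(2016, 9, 29, 6, 12, tzinfo=timezone.utc),
    datetime(2016, 9, 29, 6, 40, tzinfo=timezone.utc),
    datetime(2016, 9, 28, 23, 1, tzinfo=timezone.utc),
]
by_hour, by_run = files_touched(published, run_start)
```

 In this made-up run, a client that only wants the new descriptors would
 have to fetch three changed files when grouping by published hour, each
 of them mostly unchanged, but only one file when grouping by download
 run.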

 > * Regarding the notice:  I think the two week time frame is fine.

 Sounds good.  Let's first reach a conclusion here and then tell the
 world.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20228#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

