[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Sep 15 12:49:48 UTC 2016


#18910: distributing descriptors across CollecTor instances
-------------------------------+-----------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_information
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------

Comment (by karsten):

 Hmm, the suggested config options would imply that there's only one new
 sync manager module that syncs all descriptors from the various sources
 and that runs, say, once per hour?  I wonder how to schedule that in a way
 that it does not interfere with the other modules.  So far, modules were
 pretty much independent, but this new module would create a dependency
 between modules.

 Alternative suggestion: we add four (sets of) configurations, one for each
 module, that internally re-use the same code for syncing descriptors and
 for importing them.  For example, `SyncRelayDescriptors`,
 `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`.  We
 could then provide a remote path for finding descriptor files (like
 `/recent/relay-descriptors/`) and could implicitly consider only those
 descriptor types that the respective module understands (like
 `RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not
 `BridgeServerDescriptor`).
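
 To illustrate the implicit type filtering, here is a minimal sketch in
 Java.  It is not existing CollecTor code: the class `SyncTypeFilter`, the
 method `accepted`, and the mapping itself are assumptions made up for this
 example; only the option names and descriptor type names come from the
 suggestion above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch; not part of CollecTor. */
final class SyncTypeFilter {

  /** Assumed mapping from sync option name to the descriptor types
   * that the respective module understands. */
  static final Map<String, Set<String>> ACCEPTED_TYPES = Map.of(
      "SyncRelayDescriptors",
      Set.of("RelayServerDescriptor", "RelayExtraInfoDescriptor"),
      "SyncBridgeDescriptors",
      Set.of("BridgeServerDescriptor"));

  /** Returns those remotely found descriptor types that the module
   * behind the given sync option understands, in their original order.
   * Unknown options accept nothing. */
  static List<String> accepted(String syncOption,
      List<String> remoteTypes) {
    Set<String> understood =
        ACCEPTED_TYPES.getOrDefault(syncOption, Set.of());
    return remoteTypes.stream()
        .filter(understood::contains)
        .collect(Collectors.toList());
  }
}
```

 With this mapping, the relay module would skip a
 `BridgeServerDescriptor` that it finds under the remote path without
 needing an extra config option for that.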

 (If we're worried that there are too many config options already, I'm more
 than happy to make a list of options that can go away!  But this shouldn't
 mean we should hold back useful new options.)

 Here's a potential policy we could apply to decide whether to keep a
 local or a remote descriptor: while syncing, if we find that a remotely
 obtained descriptor would be stored under a file name that already exists
 locally, we always discard the remote descriptor; and while processing
 descriptors locally, if we find that a file with the same name but
 different content already exists, which we likely received while syncing,
 we always overwrite it.  This means that syncing only ever adds data and
 never replaces locally processed data.
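
 A rough sketch of that policy in Java, assuming descriptors are stored as
 plain files.  The class `StoragePolicy` and the methods `storeSynced` and
 `storeLocal` are hypothetical names for this example, not existing
 CollecTor code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the keep-local-over-remote policy. */
final class StoragePolicy {

  /** Sync path: stores a remotely obtained descriptor only if no file
   * with that name exists locally.  Returns whether it was stored. */
  static boolean storeSynced(Path target, byte[] remoteContent)
      throws IOException {
    if (Files.exists(target)) {
      return false; // discard the remote descriptor
    }
    createParent(target);
    Files.write(target, remoteContent);
    return true;
  }

  /** Local path: always writes the locally processed descriptor,
   * overwriting a file that was possibly received while syncing. */
  static void storeLocal(Path target, byte[] localContent)
      throws IOException {
    createParent(target);
    Files.write(target, localContent); // creates or truncates
  }

  /** Creates the parent directory of the target file if needed. */
  private static void createParent(Path target) throws IOException {
    Path parent = target.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
  }
}
```

 The check-then-write in `storeSynced` is not atomic; that seems fine as
 long as a module's sync and local processing don't run concurrently, which
 is an assumption of this sketch.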

 Regarding deleting synced descriptors, we should never do that, but we
 should rather let `DescriptorCollector` clean up the local directory when
 it finds that a local file does not exist anymore remotely.
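
 For illustration only, the clean-up idea boils down to something like the
 following Java sketch.  This is not how `DescriptorCollector` is
 implemented; the class `StaleFileCleaner` and the method `deleteStale` are
 hypothetical, and the sketch only looks at a single flat directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the DescriptorCollector implementation. */
final class StaleFileCleaner {

  /** Deletes regular files in localDir whose names are not contained
   * in remoteNames and returns the names of the deleted files. */
  static List<String> deleteStale(Path localDir,
      Set<String> remoteNames) throws IOException {
    List<String> deleted = new ArrayList<>();
    try (DirectoryStream<Path> entries =
        Files.newDirectoryStream(localDir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        if (Files.isRegularFile(entry)
            && !remoteNames.contains(name)) {
          Files.delete(entry);
          deleted.add(name);
        }
      }
    }
    return deleted;
  }
}
```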

 Here's something else to watch out for while writing this code: whenever
 we learn descriptors from syncing, we'll have to include them in our
 `/recent/` directory, too.  This wasn't entirely clear to me from the
 description above, so if this was already the plan, never mind.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

