[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Sep 16 17:51:33 UTC 2016


#18910: distributing descriptors across CollecTor instances
-------------------------------+-----------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_information
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------

Comment (by iwakeh):

 Thanks for the remarks and suggestions!
 I'm replying inline below and have also added a wiki page
 [wiki:doc/CollecTor/DescriptorDistribution CollecTor Sync] that contains
 the current status of the discussion.  Please take a look there to see
 the entire picture.

 Replying to [comment:14 karsten]:
 > Hmm, the suggested config options would imply that there's only one new
 sync manager module that syncs all descriptors from the various sources
 and that runs, say, once per hour?  I wonder how to schedule that in a way
 that it does not interfere with the other modules.  So far, modules were
 pretty much independent, but this new module would create a dependency
 between modules.

 You're right, they should stay independent.  I intended that, too, but I
 had a different (more complicated) architecture in mind.

 >
 > Alternative suggestion: we add four (sets of) configurations, one for
 each module, that internally re-use the same code for syncing descriptors
 and for importing them.  For example, `SyncRelayDescriptors`,
 `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`.

 Good idea! So we run the sync function after, or instead of, the module
 run (see wiki page for more).

 > We could then provide a remote path where to find descriptor files (like
`/recent/relay-descriptors/`) and could implicitly only consider descriptor
 types that the respective module understands (like
 `RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not
 `BridgeServerDescriptor`).

 Actually, the directory structure of a CollecTor's 'recent' is given, i.e.
 the different mirrors won't or shouldn't use a different directory
 structure than the main instance.  So, it suffices to activate the module
 and set the sync or sync-only option.  The path structure for the actual
 download is thereby determined: straightforward paths for torperf and
 exit lists, and a more complex structure for bridge and relay
 descriptors.
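
 As a rough illustration of why no extra path option is needed, here is a
 minimal sketch, assuming a fixed mapping from the four suggested sync
 options to sub-directories of 'recent'.  Only `/recent/relay-descriptors/`
 is named in this discussion; the other three paths, the function name
 `remote_recent_url`, and the example host are assumptions for
 illustration, not part of CollecTor.

```python
from urllib.parse import urljoin

# Hypothetical mapping from the per-module sync options suggested in this
# ticket to sub-directories of an instance's 'recent' directory.
# Only recent/relay-descriptors/ appears in the discussion; the other
# three paths are assumed here for illustration.
RECENT_PATHS = {
    "SyncRelayDescriptors": "recent/relay-descriptors/",
    "SyncBridgeDescriptors": "recent/bridge-descriptors/",
    "SyncExitLists": "recent/exit-lists/",
    "SyncTorperfFiles": "recent/torperf/",
}


def remote_recent_url(instance_base_url, sync_option):
    """Build the remote 'recent' URL a module would sync from.

    Because every instance is expected to use the same directory
    structure, the base URL of the remote instance is the only input
    that varies.
    """
    # Make sure the base ends with a slash so urljoin appends the path
    # instead of replacing the last path segment.
    if not instance_base_url.endswith("/"):
        instance_base_url += "/"
    return urljoin(instance_base_url, RECENT_PATHS[sync_option])
```

 With this, activating a module's sync option and naming the remote
 instance would be enough to derive the download location.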

 >
 > Here's a potential policy we could apply to decide whether to keep a
 local or remote descriptor: while syncing, if we find out that a remotely
 obtained descriptor would be stored under a file name that already exists
 locally, we always discard that;...

 So, //while syncing// means while retrieving descriptors from a different
 instance and writing them to the local `SyncFolder` structure.  And,
 during this process descriptors already available in the sync-folder are
 not replaced.
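
 To make sure we mean the same thing, here is a minimal sketch of that
 first rule only, assuming descriptors are stored as plain files below the
 sync folder.  The function name `store_synced_descriptor` and its
 parameters are hypothetical and do not come from the CollecTor code base.

```python
import os


def store_synced_descriptor(sync_folder, relative_name, content):
    """Store a remotely fetched descriptor below the sync folder.

    Returns True if the descriptor was written, and False if a file of
    that name already existed and the remote copy was discarded.
    """
    target = os.path.join(sync_folder, relative_name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    try:
        # Mode "x" creates the file exclusively and raises
        # FileExistsError if it is already there, so an existing
        # descriptor is never replaced while syncing.
        with open(target, "xb") as out:
            out.write(content)
    except FileExistsError:
        return False
    return True
```

 The exclusive-create mode keeps the existence check and the write in one
 step, so syncing can only add files to the sync folder.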

 > ... and while processing descriptors locally, if we find that we already
 have a file locally with different content, which we likely received while
 syncing, we always overwrite that.  This means that we're only adding data
 but never replacing data.

 This refers to the process of comparing the descriptors fetched from
 remote instances with descriptors already in the 'recent' folder of the
 syncing instance?  Such local descriptors could have been obtained by
 direct download or a different syncing operation. Did I miss something
 here?

 >
 > Regarding deleting synced descriptors, we should never do that, but we
 should rather let `DescriptorCollector` clean up the local directory when
 it finds that a local file does not exist anymore remotely.

 True, if this refers to descriptors in the SyncFolder.

 >
 > Here's something else to watch out for while writing this code: whenever
 we learn descriptors from syncing, we'll have to include them in our
 `/recent/` directory, too.  This wasn't entirely clear to me from the
 description above, so if this was already the plan, never mind.

 That was intended, but should be clearly stated; will be added to the wiki
 page.

 I hope I'm not making things more complicated than they are.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

