[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Oct 12 15:58:16 UTC 2016


#18910: distributing descriptors accross CollecTor instances
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------

Comment (by karsten):

 Next in line is 4efd190:

  - In `SyncManager`, I think we'll have to obtain the `collectionDate`
 from the corresponding `CollecTorMain` subclass somehow, so that we're
 appending descriptors to the same files as previously downloaded or
 locally read descriptors.  Otherwise we'd have two files in the `recent/`
 directory per descriptor type, making the index larger than necessary.
 Let's avoid that if we can.

  - As stated in an earlier comment, there are unused variables and imports
 in this commit that should be removed.

  - Still in `SyncManager`, method `mergeWithLocalStorage()`, it seems that
 we're using the exact same path for parse history files for different
 descriptors.  Example: we're reading descriptor type A with history file
 X, and when we're done we're reading descriptor type B with the same
 history file X.  The effect is that we're not skipping any B descriptors
 and overwriting everything in X that has to do with A.  Basically, we're
 reading the entire directory of synced descriptors in every execution.
 What we need is a distinct history file per descriptor type.  Or we could
 use a single descriptor reader regardless of descriptor type with a single
 history file.  By the way, this could have caused your OOM troubles.

  - Your log statement "Done reading {} of type {}." is misleading.  The
 `readDescriptors()` method returns immediately, but that does not at all
 mean that there are any descriptors in the iterator at that point.  A
 better message would be "Started reading ...".

  - In `SyncPersistence`, method `storeDescs()`, the documentation says
 "OutputDirectory" where it should say "OutputPath".

  - Regarding the three `//` comments in `storeDescs()`: it's indeed
 unfortunate that we need to infer the authority fingerprint of a bridge
 network status from the file name.  We should consider adding a new line
 for that to sanitized bridge network statuses.  That's entirely unrelated
 to the changes here but probably worth a ticket.

  - The second comment there is probably moot with microdescriptor syncing
 being postponed.

  - Regarding the third comment about exit lists, we're currently using the
 initial download time for storing in `out/` and `recent/`.  But I think
 that's something to ensure in `ExitlistPersistence`, not here, so the code
 here can stay unchanged.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:33>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list