[metrics-bugs] #23421 [Metrics/CollecTor]: Use persistence functionality throughout all modules

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Nov 3 09:02:19 UTC 2017


#23421: Use persistence functionality throughout all modules
-------------------------------+-----------------------------------
 Reporter:  iwakeh             |          Owner:  metrics-team
     Type:  enhancement        |         Status:  needs_information
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  metrics-2017       |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------

Comment (by karsten):

 Replying to [comment:1 iwakeh]:
 > Here's an overview (cf. #21759 comment:12 following):

 Some quick thoughts:

 >    * bridge-desc (all types): after sanitization the descriptor is
 >      written; if a descriptor cannot be sanitized, it is skipped

 Sounds reasonable. The decision whether to skip a descriptor should
 remain outside the persistence module. We could easily remove the
 synchronization functionality here, because a CollecTor instance either
 sanitizes bridge descriptors or copies them over from another instance,
 but never both. But I don't mind keeping it as long as it doesn't get in
 the way by making designs more complex than they need to be.
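 The split of responsibilities described above can be sketched as follows.
 This is a minimal illustration, not CollecTor code: the classes
 `DescriptorStore`, `BridgeSanitizer` and `BridgeModule` are hypothetical, and
 the "sanitizing" step is only a placeholder that masks IPv4 addresses. The
 point is that the sanitizer alone decides to skip a descriptor, while the
 store writes whatever it is handed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical persistence layer: stores whatever it is handed and never filters. */
final class DescriptorStore {
  private final List<String> written = new ArrayList<>();

  void store(String descriptor) {
    // A real store would write a file; this sketch only records the call.
    written.add(descriptor);
  }

  List<String> written() {
    return written;
  }
}

/** Hypothetical sanitizer: the only place that decides whether to skip a descriptor. */
final class BridgeSanitizer {
  /** Returns the sanitized descriptor, or an empty Optional if it cannot be sanitized. */
  Optional<String> sanitize(String raw) {
    if (!raw.startsWith("router ")) {
      return Optional.empty(); // cannot be sanitized: skip it here, not in the store
    }
    // Placeholder step only: mask IPv4 addresses. Real bridge sanitizing does much more.
    return Optional.of(raw.replaceAll("\\d+\\.\\d+\\.\\d+\\.\\d+", "0.0.0.0"));
  }
}

final class BridgeModule {
  /** Hands only successfully sanitized descriptors to the store. */
  static void process(List<String> raws, BridgeSanitizer sanitizer, DescriptorStore store) {
    for (String raw : raws) {
      sanitizer.sanitize(raw).ifPresent(store::store);
    }
  }
}
```

 Because the store has no filtering logic, the same store could be reused
 unchanged by a module that copies descriptors from another instance.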

 >    * relay-desc (all types): descriptors written one by one, skipping
 >      problematic ones

 I wonder if we should take out the part where we skip problematic
 descriptors, so that we handle descriptors coming from directory
 authorities and from other CollecTor instances the same way. We only need
 basic fields like the publication time, descriptor digest, etc. to
 determine file names. But maybe it shouldn't be up to us to decide whether
 to reject a descriptor. Needs discussion.
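 To make the "only basic fields" idea concrete, here is a minimal sketch that
 derives a file path from nothing but the `published` line and a digest, and
 ignores every other field. The directory layout
 `server-descriptors/yyyy/MM/<d0>/<d1>/<digest>` and the class
 `RelayDescriptorPaths` are hypothetical, and the digest is assumed to be
 computed elsewhere and passed in.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

final class RelayDescriptorPaths {
  private static final DateTimeFormatter PUBLISHED =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
  private static final DateTimeFormatter YEAR_MONTH =
      DateTimeFormatter.ofPattern("yyyy/MM", Locale.ROOT);

  /** Extracts the value of the "published" line, if present and well-formed. */
  static Optional<LocalDateTime> parsePublished(String raw) {
    for (String line : raw.split("\n")) {
      if (line.startsWith("published ")) {
        try {
          String value = line.substring("published ".length()).trim();
          return Optional.of(LocalDateTime.parse(value, PUBLISHED));
        } catch (DateTimeParseException e) {
          return Optional.empty();
        }
      }
    }
    return Optional.empty();
  }

  /**
   * Builds a relative path from the publication time and hex digest only
   * (hypothetical layout). No other field is inspected, so a descriptor with
   * unrelated problems is still stored.
   */
  static Optional<String> pathFor(String raw, String digestHex) {
    Optional<LocalDateTime> published = parsePublished(raw);
    if (!published.isPresent() || digestHex == null || digestHex.length() < 2) {
      return Optional.empty(); // not enough metadata to choose a file name
    }
    String digest = digestHex.toLowerCase(Locale.ROOT);
    return Optional.of(String.format("server-descriptors/%s/%c/%c/%s",
        published.get().format(YEAR_MONTH), digest.charAt(0), digest.charAt(1), digest));
  }
}
```

 Under this approach, a descriptor is rejected only when it lacks the
 metadata needed to name its file, not because some unrelated line fails
 validation.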

 >    * exitlists: always stored as a single file.

 Yup. Nothing special here, I think.

 >    * onionperf: currently implemented using an implicit transaction,
 >      i.e., the descriptors in one downloaded descriptor file are only
 >      stored if all of them are valid. This differs from the sync approach,
 >      where invalid or unparseable descriptors are ignored but valid ones
 >      are stored, even if they came in the same file.

 The implicit transaction thing was a mistake that we should fix. But
 similar to relaydescs, we should think about accepting all measurements
 containing just enough data to put them into the right directories. We can
 still warn about validation errors, just like we do when downloading
 relaydescs, but we'd store them anyway.
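 The difference between the current implicit transaction and the proposed
 per-measurement handling can be sketched side by side. Everything here is
 hypothetical: `Measurement`, its `source` and `date` fields (standing in for
 "just enough data to put them into the right directories"), and
 `OnionPerfImport` are illustrative names, not CollecTor's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical parsed measurement with just the fields used to choose a directory. */
final class Measurement {
  final String source; // hypothetical: name of the measuring host
  final String date;   // hypothetical: yyyy-MM-dd, used as directory name
  final List<String> validationErrors;

  Measurement(String source, String date, List<String> validationErrors) {
    this.source = source;
    this.date = date;
    this.validationErrors = validationErrors;
  }

  boolean hasPlacementMetadata() {
    return source != null && date != null;
  }
}

final class OnionPerfImport {
  private static final Logger LOG = Logger.getLogger(OnionPerfImport.class.getName());

  /** Implicit transaction: nothing from a file is stored unless every entry is valid. */
  static List<Measurement> allOrNothing(List<Measurement> file) {
    for (Measurement m : file) {
      if (!m.hasPlacementMetadata() || !m.validationErrors.isEmpty()) {
        return new ArrayList<>();
      }
    }
    return new ArrayList<>(file);
  }

  /** Proposed: keep each measurement that can be placed; only warn about validation errors. */
  static List<Measurement> storeEach(List<Measurement> file) {
    List<Measurement> stored = new ArrayList<>();
    for (Measurement m : file) {
      if (!m.hasPlacementMetadata()) {
        LOG.warning("Skipping measurement without source or date.");
        continue;
      }
      if (!m.validationErrors.isEmpty()) {
        LOG.warning("Storing measurement despite validation errors: " + m.validationErrors);
      }
      stored.add(m);
    }
    return stored;
  }
}
```

 With one valid measurement, one that fails validation, and one without a
 date, `allOrNothing` stores nothing, while `storeEach` stores the first two
 and logs warnings.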

 Regarding webstats: maybe we can take an approach similar to the one we
 have for bridgedescs, where the sanitizer decides which files or lines go
 through and the persistence layer just stores what it gets, as long as it
 has the necessary metadata. We don't really need synchronization here, for
 the same reason we don't need it for bridgedescs.

 Is this the information you were looking for?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23421#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
