[tor-bugs] #20234 [Metrics/CollecTor]: Define CollecTor's file-structure protocol 1.0

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Sep 28 09:30:37 UTC 2016


#20234: Define CollecTor's file-structure protocol 1.0
-------------------------------+--------------------------------
 Reporter:  karsten            |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_revision
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+--------------------------------
Changes (by karsten):

 * status:  needs_review => needs_revision


Comment:

 Thanks for starting this!  Here are some answers and some feedback:

  - It makes sense to specify the web-visible directories in this protocol,
 but what's the reason for also specifying the web-invisible `out/`
 directory there?  If the audience is developers who rely on the directory
 structure provided via HTTP, I'd say it's fine and even better to leave
 out that last directory.  And if the audience is operators and
 contributors, then we might have to include even more directories,
 including the `stats/` directory and others.  For comparison, the Onionoo
 protocol specification doesn't say anything about the `status/` directory
 which would be important for operators and contributors but which Onionoo
 client developers don't need to worry about.

  - "Shouldn't 'exit-list' be changed to 'exit-lists'?" -- Yes, we can do
 that.  In fact, I had this on my local TODO list for years and only
 recently dropped it, because meh, but if you also found this confusing,
 then it gets above the meh threshold again.  Let's do it.

  - "Shouldn't there be different markers for different torperf sources?"
 -- Maybe, but I'd rather not touch anything with the label Torperf
 on it unless it breaks apart or explodes.  Let's wait for the switch to
 OnionPerf and do something reasonable there.

  - "The 'compression-type' is one element of "xz", "gz", or "zip".  XXXX
 Is this true?" -- No, the only compression type that is currently in use
 is "xz".  We did use "bz2" until a few years ago, but we recompressed all
 tarballs, because "xz" compresses much better.  Of course, there's no
 guarantee that we'll stick with "xz" forever, so it might be fine to
 mention all possible compression types there.

  - Section 2.4 says that server descriptors are sorted into tarballs by
 download date.  That's not true: we sort them by published date, just as
 we sort extra-info descriptors into tarballs.

  - In Section 4.1.1, you ask: "Shouldn't the seconds be dropped?" -- No,
 because it's just a coincidence that seconds are always zero.  That's
 because the new scheduler is super precise compared to the cron-based
 scheduling which put a 01 or 02 there at times.

  - Also in Section 4.1.1, "Why not group extra-info according to published
 time?" -- I don't understand that question.  Can you rephrase?

  - In Section 4.2.1, "What is the reason _not_ to group according to
 published time?" -- This question is closely related to my recent thoughts on
 appending multiple votes to a single file:
 https://trac.torproject.org/projects/tor/ticket/20228#comment:2.
 Basically, if we were to store server descriptors and extra-info
 descriptors in hourly files, I'd expect that we update a couple of those
 files during a single update run.  (In fact, see the command and output
 below.)  And a client who wants to stay up to date would have to download
 all files that have changed.  Therefore it's much easier to append
 everything we learn in a single execution to a single file.

 {{{
 wget -O - https://collector.torproject.org/recent/relay-descriptors/server-descriptors/2016-09-28-09-05-00-server-descriptors \
   | grep "^published " | cut -c1-23 | sort | uniq -c
    1 published 2016-09-28 04   # <- this comes quite late
    7 published 2016-09-28 07   # <- these, too
  786 published 2016-09-28 08   # <- one would only expect those
   16 published 2016-09-28 09   # <- and maybe a few of those
    3 published 2016-09-28 10   # <- hello, future
    1 published 2016-09-28 11   # <- and future
    1 published 2016-09-28 16   # <- and future
    1 published 2016-09-28 18   # <- hello, wrong clock
 }}}

  - I didn't look at Section 5 yet, because it's not yet clear whether that
 section belongs in the protocol.
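
 To make the Section 4.2.1 point more concrete, here is a minimal Python
 sketch, not part of CollecTor, that reproduces the per-hour count from the
 command output above and relates each published hour to the hour encoded
 in the file name.  The function names (`file_timestamp`,
 `published_hours`, `hour_offsets`) and the sample text are made up for
 illustration; the only things taken from the above are the
 `yyyy-MM-dd-HH-mm-ss` file name prefix and the `published` lines.

```python
from collections import Counter
from datetime import datetime


def file_timestamp(file_name):
    """Parse the leading yyyy-MM-dd-HH-mm-ss part of a file name (assumed layout)."""
    return datetime.strptime(file_name[:19], "%Y-%m-%d-%H-%M-%S")


def published_hours(descriptor_text):
    """Count descriptors per published hour, like grep | cut -c1-23 | sort | uniq -c."""
    counts = Counter()
    for line in descriptor_text.splitlines():
        if line.startswith("published "):
            # "published 2016-09-28 08:12:34" -> "2016-09-28 08"
            counts[line[10:23]] += 1
    return counts


def hour_offsets(file_name, counts):
    """Map each published hour to its distance, in hours, from the file's hour."""
    run_hour = file_timestamp(file_name).replace(minute=0, second=0)
    offsets = {}
    for hour in counts:
        published = datetime.strptime(hour, "%Y-%m-%d %H")
        offsets[hour] = int((published - run_hour).total_seconds() // 3600)
    return offsets


# Made-up sample standing in for the contents of a downloaded file.
SAMPLE = "\n".join([
    "published 2016-09-28 04:59:01",
    "published 2016-09-28 08:12:34",
    "published 2016-09-28 08:47:10",
    "published 2016-09-28 11:03:00",
])
FILE_NAME = "2016-09-28-09-05-00-server-descriptors"

if __name__ == "__main__":
    counts = published_hours(SAMPLE)
    offsets = hour_offsets(FILE_NAME, counts)
    for hour in sorted(counts):
        print(f"{counts[hour]:5d} published {hour}  ({offsets[hour]:+d}h)")
```

 Applied to the real file from the command above, the offsets would range
 from -5 hours to +9 hours, which is why hourly files keyed by published
 time would mean touching several files in a single update run.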

 Again, thanks for writing this document!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20234#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

