[tor-bugs] #17321 [CollecTor]: Index to better support downloaders

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Oct 12 08:20:00 UTC 2015


#17321: Index to better support downloaders
-----------------------------+-----------------
     Reporter:  atagar       |      Owner:
         Type:  enhancement  |     Status:  new
     Priority:  major        |  Milestone:
    Component:  CollecTor    |    Version:
   Resolution:               |   Keywords:
Actual Points:               |  Parent ID:
       Points:               |    Sponsor:
-----------------------------+-----------------

Comment (by karsten):

 Thanks for starting this discussion.  I've also spent some time thinking
 about this over the past few days, and while I don't have a complete plan
 in mind yet, I'd like to share some ideas:

 You placed your example index.json somewhere in the middle of the
 directory tree and listed all directly contained directories and files.
 That's how Apache's index.html works.  But that also means that tools like
 Stem would need to navigate through the directory tree and read multiple
 of these index.json files.  And it means that CollecTor would have to
 rewrite all these index.json files after an update.  While this could
 work, it's somewhat complex.

 How about we write a single index.json, say
 https://collector.torproject.org/index.json, that contains all directories
 and files in the directory tree?  This would make processing a lot easier.
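
 To make the single-file idea more concrete, here is a minimal Python
 sketch of how such an index could be built from a flat list of paths and
 then walked by a downloader.  The field names ("path", "directories",
 "files") and the example file names are assumptions made up for
 illustration; no format has been agreed on in this ticket.

```python
import json

# Hypothetical layout: the keys "path", "directories" and "files" are
# assumptions for illustration, not an agreed-upon index.json format.


def build_index(base_url, file_paths):
    """Build one nested index for all files, given paths relative to base_url."""
    root = {"path": base_url, "directories": [], "files": []}
    for file_path in sorted(file_paths):
        *dir_names, file_name = file_path.split("/")
        node = root
        for dir_name in dir_names:
            # Reuse the directory node if it exists, otherwise create it.
            for child in node["directories"]:
                if child["path"] == dir_name:
                    node = child
                    break
            else:
                child = {"path": dir_name, "directories": [], "files": []}
                node["directories"].append(child)
                node = child
        node["files"].append({"path": file_name})
    return root


def iter_file_urls(node, prefix=""):
    """Yield the full URL of every file below node, depth first."""
    base = prefix + node["path"].rstrip("/") + "/"
    for entry in node["files"]:
        yield base + entry["path"]
    for child in node["directories"]:
        yield from iter_file_urls(child, base)


if __name__ == "__main__":
    # The file names below are made up for this example.
    index = build_index(
        "https://collector.torproject.org/",
        ["recent/exit-lists/example-a", "archive/torperf/example-b"],
    )
    print(json.dumps(index, indent=2))
    for url in iter_file_urls(index):
        print(url)
```

 With a layout like this, a downloader fetches one file and never has to
 walk the directory tree over HTTP, and the base URL is stored only once.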

 The obvious downside is that this file could grow quite big.  I'm listing
 all directories and the number of contained files here:

 {{{
     0 https://collector.torproject.org/
     0 https://collector.torproject.org/archive/
    89 https://collector.torproject.org/archive/bridge-descriptors/
    52 https://collector.torproject.org/archive/bridge-pool-assignments/
    68 https://collector.torproject.org/archive/exit-lists/
     1 https://collector.torproject.org/archive/relay-descriptors/
    96 https://collector.torproject.org/archive/relay-descriptors/consensuses/
    98 https://collector.torproject.org/archive/relay-descriptors/extra-infos/
    21 https://collector.torproject.org/archive/relay-descriptors/microdescs/
   117 https://collector.torproject.org/archive/relay-descriptors/server-descriptors/
    76 https://collector.torproject.org/archive/relay-descriptors/statuses/
    40 https://collector.torproject.org/archive/relay-descriptors/tor/
    96 https://collector.torproject.org/archive/relay-descriptors/votes/
    75 https://collector.torproject.org/archive/torperf/
     0 https://collector.torproject.org/recent/
     0 https://collector.torproject.org/recent/bridge-descriptors/
    72 https://collector.torproject.org/recent/bridge-descriptors/extra-infos/
    72 https://collector.torproject.org/recent/bridge-descriptors/server-descriptors/
    72 https://collector.torproject.org/recent/bridge-descriptors/statuses/
    72 https://collector.torproject.org/recent/exit-lists/
     0 https://collector.torproject.org/recent/relay-descriptors/
    72 https://collector.torproject.org/recent/relay-descriptors/consensuses/
    72 https://collector.torproject.org/recent/relay-descriptors/extra-infos/
     0 https://collector.torproject.org/recent/relay-descriptors/microdescs/
    72 https://collector.torproject.org/recent/relay-descriptors/microdescs/consensus-microdesc/
    72 https://collector.torproject.org/recent/relay-descriptors/microdescs/
    72 https://collector.torproject.org/recent/relay-descriptors/server-descriptors/
   576 https://collector.torproject.org/recent/relay-descriptors/votes/
    37 https://collector.torproject.org/recent/torperf/
  2090 (total)
 }}}

 If we assume that each directory or file requires 200 characters/bytes in
 the index.json, the 2090 files and 29 directories listed above add up to
 an uncompressed file size of roughly 413 KiB.  We can probably save a bit
 here by removing whitespace, not repeating the
 https://collector.torproject.org/ part over and over, etc.  What do you
 think, is that still reasonable?
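
 The estimate can be reproduced with a few lines of Python.  This sketch
 takes the 2090 files and the 29 directory lines from the listing above
 and the assumed 200 bytes per entry.  The second half only illustrates
 the two proposed savings on a single entry; the "path" key and the file
 name there are made up.

```python
import json

FILE_COUNT = 2090        # "(total)" line of the listing above
DIRECTORY_COUNT = 29     # number of directory lines in the listing above
BYTES_PER_ENTRY = 200    # assumed average size of one index entry


def estimate_kib(entry_count, bytes_per_entry=BYTES_PER_ENTRY):
    """Return the estimated uncompressed index size in KiB."""
    return entry_count * bytes_per_entry / 1024


# Illustration of the two savings: drop whitespace and stop repeating the
# base URL.  The "path" key and the file name are hypothetical.
BASE_URL = "https://collector.torproject.org/"
RELATIVE_PATH = "recent/exit-lists/example-file"

verbose = json.dumps({"path": BASE_URL + RELATIVE_PATH}, indent=2)
compact = json.dumps({"path": RELATIVE_PATH}, separators=(",", ":"))

if __name__ == "__main__":
    total = FILE_COUNT + DIRECTORY_COUNT
    print("entries: %d" % total)
    print("estimated size: %.1f KiB" % estimate_kib(total))
    print("one entry, verbose: %d bytes" % len(verbose))
    print("one entry, compact: %d bytes" % len(compact))
```

 How much the real file shrinks depends on which fields end up in each
 entry, so the per-entry numbers here are only indicative.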

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/17321#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

