[tor-bugs] #17321 [CollecTor]: Index to better support downloaders

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Oct 13 21:29:47 UTC 2015


#17321: Index to better support downloaders
-------------------------+---------------------
 Reporter:  atagar       |          Owner:
     Type:  enhancement  |         Status:  new
 Priority:  High         |      Milestone:
Component:  CollecTor    |        Version:
 Severity:  Blocker      |     Resolution:
 Keywords:               |  Actual Points:
Parent ID:               |         Points:
  Sponsor:               |
-------------------------+---------------------
Changes (by karsten):

 * severity:   => Blocker


Comment:

 Replying to [comment:6 atagar]:
 > > What do you think, should we move forward with this?
 >
 > Yikes, I'm surprised you put this together so quickly. This is great,
 let's do it!

 Great!

 > > "path": "/srv/collector.torproject.org/htdocs/",
 >
 > The filesystem path is being used for the first entry. This threw me off
 for a sec. In the example above you have
 "https://collector.torproject.org/" which made a bit more sense for the
 root.

 Sure, it's supposed to be that.  I just didn't edit the output file.

 > > "last_modified": "2014-07-07 09:15"
 >
 > You mentioned above that you wanted to switch to ISO timestamps. That,
 or these timestamps (which tor uses) are both fine to me as long as it's
 UTC.

 Oh, and ISO would be "2014-07-07T09:15Z"?  Well, I'd say let's use the
 format that tor uses then.
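
 As a rough illustration of the two formats discussed here, the following
 sketch parses the tor-style timestamp and prints the ISO-style form. The
 function name parse_index_time is hypothetical, and treating the value as
 UTC is an assumption taken from the discussion above.

```python
from datetime import datetime, timezone

# Timestamp format used in the draft index, e.g. "2014-07-07 09:15".
# The value carries no zone marker; UTC is assumed, as discussed in the ticket.
INDEX_TIME_FORMAT = "%Y-%m-%d %H:%M"


def parse_index_time(value):
    """Parse an index timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, INDEX_TIME_FORMAT).replace(tzinfo=timezone.utc)


parsed = parse_index_time("2014-07-07 09:15")
iso_form = parsed.strftime("%Y-%m-%dT%H:%MZ")
print(iso_form)  # 2014-07-07T09:15Z
```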

 > Personally I'm still not clear on its use though. I care about 'what
 time period are the descriptors in this resource for'. I don't have a use
 case at the moment for last modified timestamps, though if you have need
 for them then feel free to include them.
 >
 > If we're certain they're needed include them. If unsure I'd suggest
 leaving them out for now to cut down on the size. We can always add them
 later.

 The main use case is to only fetch files that have changed since the last
 time we fetched.  While most files cannot change, some can.  For example,
 Torperf files will be appended to multiple times over the day and monthly
 tarballs are updated every three days.

 So, I think I'll need this field in metrics-lib.
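
 A minimal sketch of that use case, not the metrics-lib implementation:
 given file entries from the index and the time of the previous fetch,
 keep only the entries modified since then. The names files_to_fetch and
 parse_index_time and the second sample entry are hypothetical.

```python
from datetime import datetime, timezone


def parse_index_time(value):
    """Parse "YYYY-MM-DD HH:MM" (assumed UTC) into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def files_to_fetch(files, last_fetch):
    """Return the paths of entries modified after last_fetch (a UTC datetime)."""
    return [
        entry["path"]
        for entry in files
        if parse_index_time(entry["last_modified"]) > last_fetch
    ]


files = [
    # Entry taken from the example index in this ticket.
    {"path": "bridge-descriptors-2008-05.tar.xz", "size": 624156,
     "last_modified": "2012-05-30 19:41"},
    # Hypothetical entry standing in for a file that is still being updated.
    {"path": "example-updated-file", "size": 1,
     "last_modified": "2015-10-13 20:00"},
]

last_fetch = datetime(2015, 10, 10, 0, 0, tzinfo=timezone.utc)
changed = files_to_fetch(files, last_fetch)
print(changed)  # ['example-updated-file']
```

 Because the timestamps only have minute resolution, a client that wants to
 be safe could compare with >= instead and accept an occasional repeated
 download.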

 > > "path": "exit-lists/",
 >
 > Very minor but personally I'd opt to not include the trailing slash. It
 doesn't add anything. But that's definitely in bikeshed territory. :P

 Bikeshedding is still fine at this stage.  Removed the trailing slash.

 But let me bikeshed back: also removed the trailing "/" from the
 "https://collector.torproject.org/" that you suggested. ;)

 > One general suggestion: let's put the whole thing in a 'contents' entry
 so we can include additional metadata. For instance, 'when was this index
 created' could be very useful to help detect if the script keeping it up
 to date broke...
 >
 > {{{
 > {
 >   "index_created": "2014-06-05 10:52",
 >   "contents": {
 >     ... all the stuff...
 >   }
 > }
 > }}}
 >
 > More fields might come into play in the future.

 Good idea.  I tweaked it a tiny bit by turning the top-level object into a
 special type of directory object with "path", "directories", and
 (optional) "files", but which may also contain additional fields like
 "index_created".  I think that's easier to process for applications.

 Here's an example:

 {{{
 {
   "index_created": "2015-10-13 21:00",
   "path": "https://collector.torproject.org",
   "directories": [
     {
       "path": "archive",
       "directories": [
         {
           "path": "bridge-descriptors",
           "files": [
             {
               "path": "bridge-descriptors-2008-05.tar.xz",
               "size": 624156,
               "last_modified": "2012-05-30 19:41"
             },
             {
               "path": "bridge-descriptors-2008-06.tar.xz",
               "size": 1010648,
               "last_modified": "2012-05-30 19:41"
             },
             {
               "path": "bridge-descriptors-2008-07.tar.xz",
               "size": 1173032,
               "last_modified": "2012-05-30 19:41"
             },
 }}}
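
 To illustrate why a uniform directory object is easy to process, here is
 a sketch of a recursive walk over a shortened copy of the example above.
 The function iter_files is hypothetical, and joining "path" values with
 "/" is an assumption that follows from dropping the trailing slashes.

```python
def iter_files(directory, base=""):
    """Yield (full_path, file_entry) for every file below a directory object."""
    # The top-level object and nested directories share the same shape,
    # so one function handles both.
    path = directory["path"] if not base else base + "/" + directory["path"]
    for entry in directory.get("files", []):  # "files" is optional
        yield path + "/" + entry["path"], entry
    for sub in directory.get("directories", []):
        yield from iter_files(sub, path)


index = {
    "index_created": "2015-10-13 21:00",
    "path": "https://collector.torproject.org",
    "directories": [
        {
            "path": "archive",
            "directories": [
                {
                    "path": "bridge-descriptors",
                    "files": [
                        {
                            "path": "bridge-descriptors-2008-05.tar.xz",
                            "size": 624156,
                            "last_modified": "2012-05-30 19:41",
                        },
                    ],
                },
            ],
        },
    ],
}

found = list(iter_files(index))
for url, entry in found:
    print(url, entry["size"])
```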

 What do you think?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/17321#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

