[metrics-bugs] #25523 [Metrics/Library]: Add support for webstats tarballs

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Mar 16 16:04:27 UTC 2018


#25523: Add support for webstats tarballs
---------------------------------+----------------------
     Reporter:  karsten          |      Owner:  iwakeh
         Type:  defect           |     Status:  assigned
     Priority:  Medium           |  Milestone:
    Component:  Metrics/Library  |    Version:
     Severity:  Normal           |   Keywords:
Actual Points:                   |  Parent ID:
       Points:                   |   Reviewer:
      Sponsor:                   |
---------------------------------+----------------------
 I started creating tarballs containing `.xz`-compressed webstats files.
 When I attempt to feed them into `DescriptorReader`, it fails with an
 exception like the following:

 {{{
 Cannot parse descriptor file ’in/webstats-2016-01.tar’.
 ��s",�����k)�nnq����w؆jG�I�[1��eѰCx%��'.
         at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
         at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
         at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
         at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
         at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
         at java.lang.Thread.run(Thread.java:745)}
 }}}

 The tarballs I created contain files as follows:

 {{{
 $ tar tf webstats-2016-01.tar
 [...]
 webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
 webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz
 }}}

 When I extract the tarballs first and then read the extracted files with
 `DescriptorReader`, everything works just fine.

 I ''think'' that the issue is that
 `DescriptorParserImpl#detectTypeAndParseDescriptors()` looks at
 `descriptorFile` rather than `fileName` to obtain the file name. The
 effect is that it learns the ''tarball'' file name, rather than the file
 name of the contained log file:

 {{{
 -    if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
 +    if (fileName.contains(LogDescriptorImpl.MARKER)
 }}}

 The above is untested and probably insufficient. It's just supposed to
 start the bug hunting. Priority is medium, because we can just extract
 tarballs for now. But it's a bug, and it may confuse users as soon as we
 provide these tarballs and no working code to process them.
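
 To make the suspected cause easier to see outside of metrics-lib, here is
 a minimal, self-contained sketch. It is not library code: the class
 `TypeDetectionSketch`, the method `looksLikeWebstatsLog()`, and the marker
 value `"access.log"` are assumptions made up for illustration, and the
 real `LogDescriptorImpl.MARKER` may differ. It only shows why a check on
 the tarball name would miss a log file that a check on the contained
 file's name would catch.

```java
/** Illustration only; none of these names exist in metrics-lib. */
final class TypeDetectionSketch {

  // Assumed stand-in for LogDescriptorImpl.MARKER; the real value may differ.
  static final String MARKER = "access.log";

  /** Returns whether the given name contains the (assumed) log marker. */
  static boolean looksLikeWebstatsLog(String name) {
    return name.contains(MARKER);
  }

  public static void main(String[] args) {
    // Name of the tarball as passed to the reader.
    String tarballName = "webstats-2016-01.tar";
    // Name of one contained file, taken from the listing in this ticket.
    String entryName = "webstats-2016-01/torproject.org/2016/01/25/"
        + "torproject.org_aroides.torproject.org_access.log_20160125.xz";

    // Checking the tarball name never sees the marker ...
    System.out.println("by tarball name: " + looksLikeWebstatsLog(tarballName));
    // ... while checking the contained file's name does.
    System.out.println("by entry name:   " + looksLikeWebstatsLog(entryName));
  }
}
```

 If the real detection works along these lines, the parser would fall
 through to the non-log code path for every tarball entry, which would
 match the exception shown above.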

 This is also related to #22695.

 Assigning to iwakeh who said they'd like to grab it.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25523>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

