[tor-bugs] #11788 [Metrics Data Processor]: Consider providing descriptor tarballs as .tar.xz rather than .tar.bz2

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed May 7 18:03:50 UTC 2014


#11788: Consider providing descriptor tarballs as .tar.xz rather than .tar.bz2
----------------------------------------+-----------------
     Reporter:  karsten                 |      Owner:
         Type:  enhancement             |     Status:  new
     Priority:  normal                  |  Milestone:
    Component:  Metrics Data Processor  |    Version:
   Resolution:                          |   Keywords:
Actual Points:                          |  Parent ID:
       Points:                          |
----------------------------------------+-----------------

Comment (by karsten):

 Replying to [comment:1 wfn]:
 > A purely procedural/logistical thing: I wonder how many services/tools
 > use the Metrics archives, and whether it makes sense to convert all
 > existing/previous .tar.bz2 archives to .tar.xz. Of course as Karsten says,
 > if the latter is not done, "the downside is that new tools will have to
 > support both .tar.bz2 and .tar.xz if we don't recompress existing
 > archives."

 Right, I think at some point we'll want to provide archives using a single
 compression method.
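
 For the interim period in which both formats exist, a reading tool does
 not necessarily need two code paths. The following is only a minimal
 sketch, assuming a tool written in Python; the function name
 `list_descriptor_files` is hypothetical and not part of any existing
 Metrics tool.

```python
import tarfile


def list_descriptor_files(tarball_path):
    """Return the names of regular files in a .tar.bz2 or .tar.xz tarball."""
    # "r:*" lets tarfile detect the compression from the file contents,
    # so the same code path handles both formats.
    with tarfile.open(tarball_path, mode="r:*") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]
```

 Tools in other languages would need an equivalent of this detection, which
 is why a single compression method is still the simpler end state.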

 > In any case, ''quietly'' changing to .tar.xz is maybe not the way to go,
 > in the sense that this should at the very least be announced. How many
 > existing tools/software may rely on these Metrics archives?

 Agreed about not making this change quietly.  Here's what we could do:

  1. Start compressing new tarballs with `xz` ''in addition to'' `bzip2`
 and recompress existing tarballs using `xz` but without deleting the
 `bzip2` ones.  Change links on https://metrics.torproject.org/data.html to
 the `.tar.xz` tarballs.  Tell people on tor-dev@ about the change, but say
 that `.tar.bz2` tarballs will be available for another two months.
  2. Two months later, stop creating `.tar.bz2` tarballs and delete
 existing `.tar.bz2` tarballs.
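
 The recompression part of step 1 could look roughly like the sketch below.
 It is an illustration only: the function name `recompress_to_xz` and the
 default `preset=6` are assumptions, since no compression level has been
 decided in this ticket. The original `.tar.bz2` file is left in place, as
 step 1 requires.

```python
import bz2
import lzma
import shutil
from pathlib import Path


def recompress_to_xz(bz2_path, preset=6):
    """Write a .tar.xz copy next to a .tar.bz2 tarball and keep the original."""
    bz2_path = Path(bz2_path)
    if not bz2_path.name.endswith(".tar.bz2"):
        raise ValueError(f"not a .tar.bz2 tarball: {bz2_path}")
    # Swap only the final ".bz2" suffix for ".xz"; the ".tar" part stays.
    xz_path = bz2_path.with_name(bz2_path.name[: -len(".bz2")] + ".xz")
    # Stream the decompressed tar bytes into the xz compressor so the
    # uncompressed tarball never has to fit in memory at once.
    with bz2.open(bz2_path, "rb") as src, lzma.open(xz_path, "wb", preset=preset) as dst:
        shutil.copyfileobj(src, dst)
    return xz_path
```

 Step 2 would then only need to delete the `.tar.bz2` files once the two
 months have passed.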

 > > Anything else?
 >
 > Memory usage?[1][2] Though as Nick said, `xz` memory usage seems to be
 > constant / invariant to target size (depends on compression level only), I
 > guess because the compression level chooses the dictionary size; and that
 > is what uses the memory.
 >
 > [1]: http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
 > [2]: http://linux.die.net/man/1/xz

 Right.  The first link you mention there says we'll need up to 673MB for
 compressing and up to 64MB for decompressing a tarball using `xz`.  Sounds
 reasonable.
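
 On the decompressing side, a tool that wants to enforce such a bound can
 give the decoder a memory limit. This is a sketch only, assuming a Python
 tool; the name `decompress_with_limit` is hypothetical, and the 64 MiB
 default simply mirrors the decompression figure quoted above.

```python
import lzma


def decompress_with_limit(xz_bytes, memlimit=64 * 1024 * 1024):
    """Decompress a single xz stream held in memory.

    Raises lzma.LZMAError if the decoder would need more than memlimit bytes.
    """
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    return decompressor.decompress(xz_bytes)
```

 The memory the decoder needs is fixed by the settings chosen when the
 tarball was compressed, so picking the compression level also decides what
 readers have to provide.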

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11788#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

