[tor-bugs] #32515 [Webpages]: Keep Good Summaries From Reliable Sources. Example: Tor's CODEC Strategy, zlib, lzma, zstd

Fri Nov 15 20:55:51 UTC 2019

#32515: Keep Good Summaries From Reliable Sources. Example: Tor's CODEC Strategy,
zlib, lzma, zstd
------------------------------+--------------------------
 Reporter:  werd              |          Owner:  (none)
     Type:  enhancement       |         Status:  new
 Priority:  Medium            |      Component:  Webpages
  Version:                    |       Severity:  Normal
 Keywords:  zlib, lzma, zstd  |  Actual Points:
Parent ID:                    |         Points:
 Reviewer:                    |        Sponsor:
------------------------------+--------------------------
 Ironically, about a week after I spent a bunch of time reading up on zlib,
 lzma, and zstd, on 2019-09-13 someone (Steve Snyder) on the tor-dev list
 asked:
  Given the multiple compression types supported (none, lzma, zlib, zstd),
 what is the order of preference for runtime use?

  Put another way, which compression method(s) should be supported to get
 optimal runtime performance from a Tor node?

 To which Nick Mathewson replied:
  For big objects like consensuses or consensus diffs that are sent over
 and over, relays prefer to use whichever compression method has the
 highest compression -- that's lzma2, then zstd, then zlib, then none.
 Lzma2 (aka xz) is more expensive to calculate, but the relays only need to
 calculate it once per compressed object, and then they can send it over
 and over.

  For smaller objects that are compressed in a stream (descriptors and
 microdescriptors), relays will not use xz, since it would be too expensive
 to recompute it for every stream. They'll prefer zstd, then zlib, then
 none.

  So if you want to save bandwidth above all, you should enable all
 compression algorithms.

  If you want to save CPU above all, you should enable all compression
 algorithms except xz.

  If you want to save bandwidth and CPU, I _think_  enabling all the
 compression algorithms will result in Tor making good choices (as
 described above).  But I'd appreciate benchmarks if anybody has tried it
 both ways to find out.

 This awesome summary, which I had not gleaned from all my reading, should
 be preserved somewhere in the hierarchy of a builder/developer FAQ.
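
 As a rough illustration, the preference order Nick describes could be
 sketched like this in Python. The list names and the pick_method helper
 are made up for this example and are not Tor's actual code or identifiers;
 "lzma" here stands for the lzma2/xz method from the quote.

```python
# Hypothetical sketch of the preference order quoted above.
# None of these names come from Tor's source; they are illustrative only.

# Objects computed once and served many times (consensuses, consensus diffs):
# prefer the highest compression.
PRECOMPUTED_PREFERENCE = ["lzma", "zstd", "zlib", "none"]

# Objects compressed per stream (descriptors, microdescriptors):
# xz/lzma is skipped because recomputing it per stream is too expensive.
STREAMING_PREFERENCE = ["zstd", "zlib", "none"]


def pick_method(supported, streaming):
    """Return the most preferred method that is in `supported`."""
    order = STREAMING_PREFERENCE if streaming else PRECOMPUTED_PREFERENCE
    for method in order:
        if method in supported:
            return method
    # Fall back to no compression if nothing else matches.
    return "none"
```

 Under these assumptions, a build with only zlib and lzma compiled in
 would pick lzma for a consensus and zlib for a descriptor stream.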

 As you know, Tor by default reports the compression types compiled into
 it, both on standard output and in its logs. I wanted to know whether it
 was worth the effort to add more than the current default.
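
 For anyone who wants a feel for the size versus CPU trade-off before
 rebuilding Tor, here is a minimal sketch using only Python's standard
 zlib and lzma modules. It is not a Tor benchmark: the sample data is
 synthetic rather than a real consensus, the measure helper is my own
 invention, and zstd is left out because older Python standard libraries
 do not include it.

```python
import lzma
import time
import zlib


def measure(name, compress, data):
    """Compress `data` once and record output size and wall-clock time."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return {"method": name, "bytes": len(out), "seconds": elapsed}


# Synthetic, highly repetitive stand-in for directory data (an assumption;
# real consensuses will compress differently).
sample = b"r relay 192.0.2.1 9001 0\ns Fast Running Stable Valid\n" * 5000

results = [
    measure("none", lambda d: d, sample),
    measure("zlib", zlib.compress, sample),
    measure("lzma", lzma.compress, sample),
]

for r in results:
    print(f"{r['method']:5s} {r['bytes']:8d} bytes {r['seconds']:.4f} s")
```

 Numbers from a run like this only hint at the general trend; measuring
 on a real relay with real directory documents is what would actually
 answer the question.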

 There didn't seem to be a place for it in the existing structure. Both
 https://2019.www.torproject.org/docs/tor-doc-unix.html.en and
 https://2019.www.torproject.org/docs/tor-doc-osx.html.en are places where
 it would make sense, but as best I can tell that raises a few issues:
 duplicating information is not great, those pages are official, and from
 what I've read elsewhere on TorProject.org, those pages are generated by
 some other software. The trac index page at
 https://trac.torproject.org/projects/tor/wiki/TitleIndex is also
 overwhelming, and I'm not sure this tidbit should become yet another page
 on that list.

 Please excuse me if I placed this under the wrong Type; I also wasn't
 sure which Subcomponent to choose.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32515>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online