[tor-dev] patch to improve consensus download decompress performance

Tim Wilson-Brown - teor teor2345 at gmail.com
Fri Aug 28 01:52:22 UTC 2015

> On 28 Aug 2015, at 00:50, starlight.2015q3 at binnacle.cx wrote:

Thanks for this patch. I have logged it in the Tor Trac system as #16914.

The full details are on Trac, but I’ll summarise my analysis of the patch here.

This patch modifies the expected compression ratio used for every gzip
decompression performed by tor, not just the decompression of the consensus
documents. However, it is the largest files that matter for decompression speed
and memory usage analysis.

Running `gzip -9 -v * 2>&1 | cut -f1 -d%` on some recent tor directory
documents gives the following compression ratios, in percent (this is not
exactly tor’s compression, as the headers are different):
cached-microdesc-consensus-20150826-1400-UTC:      56.3
cached-microdesc-consensus-20150827-2300-UTC:      56.1
cached-microdescs-20150826-1100-UTC:               51.9
cached-microdescs-20150827-2300-UTC:               52.1
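
For anyone who wants to reproduce these figures without the shell pipeline, here is a minimal Python sketch of the same measurement. It assumes the ratio is the size reduction in percent, which is roughly what `gzip -v` prints (gzip's own figure treats the header bytes slightly differently). The function name and the sample data are made up for illustration and are not real tor documents.

```python
import gzip

def gzip_reduction_percent(data: bytes, level: int = 9) -> float:
    """Return the size reduction in percent, approximately as `gzip -v` reports it."""
    if not data:
        raise ValueError("empty input has no defined compression ratio")
    compressed = gzip.compress(data, compresslevel=level)
    return 100.0 * (1.0 - len(compressed) / len(data))

# Synthetic, highly repetitive stand-in for a directory document.
# Real documents compress far less well than this.
sample = b"r example 0123456789abcdef 127.0.0.1 9001\n" * 2000
print(f"{gzip_reduction_percent(sample):.1f}")
```

To measure a real file, read it with `open(path, "rb").read()` and pass the bytes to the function.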

The microdescriptor consensuses are 1.2 MBytes each, and the combined size of
all cached microdescriptors is 2.9 MBytes.

The microdescriptor consensus and microdescriptors are downloaded by
most Tor clients (and, therefore, by most Tor instances). The microdescriptors
are downloaded individually, so their ratios may be slightly lower. (The
"full" consensus and descriptors aren't used by most clients, so they can
be ignored for the purposes of this analysis.)

I suggest we go with the patch, and increase the expected ratio to 75%. We
currently double the size of the buffer when we need to reallocate anyway, so we
are already using that much RAM for every decompression: we allocate for a 50%
ratio, then reallocate for 75%. (The exception is when an individual
microdescriptor's ratio falls under 50%; in those cases, we only use the 50%
allocation.)
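
The allocate-then-double behaviour can be modelled with a short Python sketch. This is a simplified model, not tor's code: it assumes the initial buffer is sized as compressed length / (1 - expected ratio), and that the buffer doubles whenever it is too small. The function name and the example sizes are hypothetical.

```python
import math

def simulate_decompress_buffer(compressed_len, actual_len, expected_ratio):
    """Return (final_buffer_size, reallocations) for a doubling output buffer.

    expected_ratio is the assumed fractional size reduction (0.5 means 50%).
    """
    if not 0.0 <= expected_ratio < 1.0:
        raise ValueError("expected_ratio must be in [0, 1)")
    # Initial guess: output size implied by the expected compression ratio.
    buf = math.ceil(compressed_len / (1.0 - expected_ratio))
    reallocations = 0
    while buf < actual_len:
        buf *= 2  # modelled as a plain doubling on each reallocation
        reallocations += 1
    return buf, reallocations

# Hypothetical document: 1,200,000 bytes that compress by 56% to 528,000 bytes.
for ratio in (0.50, 0.70, 0.75):
    print(ratio, simulate_decompress_buffer(528_000, 1_200_000, ratio))
```

Under this model, a 50% estimate needs one reallocation and ends at the same buffer size that a 75% estimate allocates up front, while a 70% estimate also avoids the reallocation with a smaller buffer.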

Alternatively, we could increase the expected ratio to 70%, saving 1
reallocation and 5% RAM in most cases. This would be a win for both
performance and RAM usage. The 5% difference between 70% and 75% is 61 KByte
for the 1.2 MByte microdescriptor consensus, which is the largest individual
file. This optimisation costs us nothing, with one caveat: if we cut it too
close and the compression ratio improves, we get a memory doubling during
decompression, which is what we’re trying to avoid. This would most likely
happen in the full consensus decompression (currently a 66% compression ratio
on 1.4 MBytes).
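
To make "cutting it too close" concrete, the following Python sketch computes the headroom, in percentage points, between a candidate expected ratio and the ratios quoted in this message. The dictionary keys and the function name are illustrative only. A negative headroom would mean the initial buffer is too small and a doubling occurs.

```python
# Ratios quoted in this message, in percent size reduction.
OBSERVED = {
    "microdesc consensus": 56.3,
    "microdescs": 52.1,
    "full consensus": 66.0,
}

def headroom(expected_percent, observed=OBSERVED):
    """Percentage points between the expected ratio and each observed ratio.

    A negative value means the initial buffer would be too small, so the
    decompression would need a reallocation (a doubling, per the text).
    """
    return {name: expected_percent - ratio for name, ratio in observed.items()}

for candidate in (70.0, 75.0):
    margins = headroom(candidate)
    tightest = min(margins, key=margins.get)
    print(candidate, tightest, round(margins[tightest], 1))
```

With these figures the full consensus is the tightest case for both candidates, which is why it is the document most likely to trigger a doubling if compression improves.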

To proceed with the patch, we need to know / decide:

What is the range of compression ratios on recent microdescriptors and
microdescriptor consensuses? Do they vary much?
(Does someone have an archive somewhere?)

Do we want to go with the 70% option to save both RAM and performance?
