On 28 Aug 2015, at 00:50, starlight.2015q3@binnacle.cx wrote:
<tor-0.2.6.10-gz4x_guess.patch>
Thanks for this patch, I have logged it into the Tor Trac system as #16914 https://trac.torproject.org/projects/tor/ticket/16914
The full details are on Trac, but I’ll summarise my analysis of the patch here: This patch modifies the estimate for every gzip decompression performed by tor, not just the decompression of the consensus documents. But it's the largest files that matter for decompression speed and memory usage analysis.
Running `gzip -9 -v * 2>&1 | cut -f1 -d%` on some recent tor directory documents gives the following ratios (it’s not exactly tor’s compression, the headers are different):
{{{
cached-microdesc-consensus-20150826-1400-UTC: 56.3
cached-microdesc-consensus-20150827-2300-UTC: 56.1
cached-microdescs-20150826-1100-UTC: 51.9
cached-microdescs-20150827-2300-UTC: 52.1
}}}
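If anyone wants to repeat this measurement over a larger set of documents, here is a rough sketch in Python. It uses the standard `gzip` module at level 9, so like the command above it only approximates tor's own compression. The helper names `reduction_percent` and `ratio_range` are made up for illustration, and the figure is computed as the percentage by which the data shrinks, which I'm assuming is close enough to what `gzip -v` prints.

```python
import gzip


def reduction_percent(data: bytes, level: int = 9) -> float:
    """Percentage by which gzip shrinks `data` (higher means better compression).

    Assumption: this approximates the figure printed by `gzip -v`;
    it is not tor's own compression code.
    """
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = gzip.compress(data, compresslevel=level)
    return 100.0 * (1.0 - len(compressed) / len(data))


def ratio_range(documents: dict) -> tuple:
    """Return (lowest, highest) reduction percentage over named documents.

    `documents` maps a file name to its uncompressed bytes.
    """
    ratios = [reduction_percent(body) for body in documents.values()]
    return min(ratios), max(ratios)
```

Feeding `ratio_range` the contents of an archive of cached-microdesc-consensus files would show how much the ratios vary over time.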
The microdescriptor consensuses are 1.2 MBytes each, and the combined size of all cached microdescriptors is 2.9 MBytes.
The microdescriptor consensus and microdescriptors are downloaded by most Tor clients (and, therefore, by most Tor instances). The microdescriptors are downloaded individually, so their ratios may be slightly lower. (The "full" consensus and descriptors aren't used by most clients, so they can be ignored for the purposes of this analysis.)
I suggest we go with the patch, and increase the expected ratio to 75%. We currently double the size of the buffer when we need to reallocate anyway, so we are already using that much RAM for every decompression: we're just allocating for 50%, then reallocating for 75%. (The exception is when an individual microdescriptor's ratio falls under 50%; in those cases, we're only using the 50% allocation.)
Alternatively, we could increase the expected ratio to 70%, saving 1 reallocation and 5% RAM in most cases. This would be a win for both performance and RAM usage. The 5% difference between 70% and 75% is 61 KBytes for the 1.2 MByte microdescriptor consensus, which is the largest individual file. This optimisation costs us nothing, except that if we cut it too close and the compression ratio improves, we get a memory doubling during decompression, which is exactly what we’re trying to avoid. This would most likely happen in the full consensus decompression (currently 66% compression ratio on 1.4 MBytes).
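To make the trade-off between the guesses easier to play with, here is a small model of the allocation behaviour in Python. It is only a sketch under stated assumptions: that the initial buffer is sized as compressed length / (1 - expected ratio), and that the buffer doubles each time it turns out to be too small. The function name `simulate_decompression` and the byte counts in the usage note are hypothetical round figures, not measurements from tor.

```python
def simulate_decompression(compressed_len: int, actual_len: int,
                           guess_percent: int) -> tuple:
    """Return (final_buffer_size, reallocations) for one decompression.

    Assumed model, not tor's actual code:
      * initial buffer = compressed_len / (1 - guess_percent / 100), rounded up
      * the buffer doubles whenever it is too small for the output
    """
    if not 0 <= guess_percent < 100:
        raise ValueError("guess_percent must be in the range 0..99")
    denom = 100 - guess_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    size = (compressed_len * 100 + denom - 1) // denom
    reallocations = 0
    while size < actual_len:
        size *= 2
        reallocations += 1
    return size, reallocations


if __name__ == "__main__":
    # Hypothetical document: 1,200,000 bytes that compress to 528,000 bytes.
    for guess in (50, 70, 75):
        print(guess, simulate_decompression(528_000, 1_200_000, guess))
```

Under this model, a 528,000 byte input that expands to 1,200,000 bytes ends up in a 2,112,000 byte buffer with both the 50% guess (after one reallocation) and the 75% guess (with none), while the 70% guess needs 1,760,000 bytes and no reallocation. A 1,400,000 byte document fits the 70% guess at a 66% ratio, but if the ratio improves to 72% the buffer doubles once.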
To proceed with the patch, we need to know / decide:
What is the range of compression ratios on recent microdescriptors and microdescriptor consensuses? Do they vary much? (Does someone have an archive somewhere?)
Do we want to go with the 70% option to save both RAM and performance?