On 16 Apr 2016, at 05:47, David Goulet dgoulet@ev0ke.net wrote:
Hi!
(For the following, I'm only talking about HS directory.)
Here is my conundrum. I was working on plugging the new cache for proposal 224 into our OOM. I stumbled upon a fun problem.
We plan to have both current and new HS protocol in parallel. This means two caches for both version 2 (current) and 3 (prop224). So far so good.
However, here is the problem. Right now, our OOM handler is called from cell_queues_check_size(). The heuristic is basically:
/* If we're spending over 20% of the memory limit on hidden service
 * descriptors, free them until we're down to 10%. */
Considering we now have _two_ caches, we somehow have to clean both of them up "evenly": if we have to remove 20kb, we either remove 10kb from each (if possible), or we clean them by entry expiry time regardless of which version the entry belongs to.
I think the latter is what we want, but that complicates things a bit since the two caches contain _different_ data structures, indexed with different keys (DIGEST vs DIGEST256). One option here is to go over all entries in both caches, create temporary objects, and put them in a list sorted by expiry time. Each "temp object" would contain: 1) the cache version, 2) the expiry time and 3) the index key. Once done, we go over that list and delete entries until we reach the number of bytes we need to remove.
This could work, since we would only do it when we are at 20% memory, but it still requires allocating some small objects in order to _clear_ memory... Does anyone have something _better_ to propose? :)
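To make the "temp object" idea concrete, here is a minimal sketch in C. All names (oom_entry_t, oom_entries_to_evict, etc.) are illustrative, not part of Tor's actual API, and the real code would fill the array by walking both caches:

```c
/* Sketch of the "temporary index object" approach: merge index records
 * from both caches into one array, sort by expiry, evict oldest first.
 * All names here are hypothetical. */
#include <stdlib.h>
#include <time.h>

typedef struct oom_entry_t {
  int cache_version;        /* 2 or 3 */
  time_t expiry;            /* entry expiry time */
  const char *index_key;    /* DIGEST (v2) or DIGEST256 (v3) */
  size_t entry_size;        /* bytes freed if we delete this entry */
} oom_entry_t;

/* qsort comparator: oldest expiry first. */
static int
oom_entry_cmp(const void *a, const void *b)
{
  const oom_entry_t *ea = a, *eb = b;
  if (ea->expiry < eb->expiry) return -1;
  if (ea->expiry > eb->expiry) return 1;
  return 0;
}

/* Sort the merged entries oldest-first and return how many of them
 * must be evicted to free at least bytes_to_remove bytes. */
size_t
oom_entries_to_evict(oom_entry_t *entries, size_t n, size_t bytes_to_remove)
{
  size_t freed = 0, i;
  qsort(entries, n, sizeof(*entries), oom_entry_cmp);
  for (i = 0; i < n && freed < bytes_to_remove; i++)
    freed += entries[i].entry_size;
  return i;
}
```

The caller would then delete the first N entries of the sorted array from whichever cache each one names, using its stored index key.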
(Keep in mind that minimal change to the current HS code is preferable. I know that unifying both caches somehow would help, but that's a lot more work! Also, the new cache for prop224 will support versioning, which our current one doesn't; it's baked for v2.)
I'm asking because this will require some refactoring, and I want to make sure we agree on how to proceed so it doesn't end up being the wrong solution.
Basically, my goal is to move the OOM handling for both v2 and v3 into one single function called hs_cache_handle_oom(), which will do the gymnastics described above. After that, we'll stop calling rend_cache_clean_v2_descs_as_dir() explicitly in cell_queues_check_size().
Having code that is only executed under rare conditions is error-prone. Allocating memory at OOM time is error-prone.
We could add cache entries to the unified list when they are created, rather than at OOM time. This also has the benefit of being faster than creating and sorting a large list at OOM time.
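A minimal sketch of that idea, keeping one unified expiry-ordered list that both caches feed at insertion time (names here are hypothetical, and a real version would use Tor's own list primitives):

```c
/* Sketch of a unified, expiry-ordered list maintained at entry-creation
 * time rather than built at OOM time.  All names are illustrative. */
#include <stdlib.h>
#include <time.h>

typedef struct oom_node_t {
  time_t expiry;
  int cache_version;          /* 2 or 3: which cache owns this entry */
  struct oom_node_t *next;
} oom_node_t;

/* Insert a new record, keeping the list sorted oldest-first, and
 * return the (possibly new) head.  This simple version scans from the
 * head; in practice descriptors arrive with near-monotonic expiry
 * times, so inserting from the tail would typically be O(1). */
oom_node_t *
oom_list_insert(oom_node_t *head, time_t expiry, int version)
{
  oom_node_t *node = malloc(sizeof(*node));
  node->expiry = expiry;
  node->cache_version = version;
  if (!head || expiry < head->expiry) {
    node->next = head;
    return node;
  }
  oom_node_t *cur = head;
  while (cur->next && cur->next->expiry <= expiry)
    cur = cur->next;
  node->next = cur->next;
  cur->next = node;
  return head;
}
```

At OOM time, eviction is then just popping from the head of this list until enough memory is freed, with no allocations on the OOM path.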
Alternately, we could iterate through each cache like this:

  Find ta, the oldest expiry time in Cache A
  Find tb, the oldest expiry time in Cache B
  Set Cache X to the cache with the oldest descriptor
  Set tx to the minimum expiry time from ta and tb
  Set ty to the maximum expiry time from ta and tb
  While tx <= ty and we are above the desired memory limit:
    Deallocate the oldest descriptor from Cache X
    Set tx to the oldest expiry time in Cache X
  If we are still above the desired memory limit, repeat this entire process
We should add entries to a time-sorted list when they are created, otherwise this algorithm is O(n^2).
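The loop above can be sketched in C using two expiry-sorted arrays as stand-ins for the v2 and v3 caches. This assumes each cache can report its oldest surviving entry cheaply (e.g. via a time-sorted list, per the note above); fake_cache_t and evict_oldest_first are hypothetical names:

```c
/* Sketch of the two-cache "evict globally oldest first" loop.
 * Each fake cache is an expiry-sorted array with a head cursor
 * marking the oldest surviving entry.  Names are illustrative. */
#include <stddef.h>
#include <time.h>

typedef struct fake_cache_t {
  const time_t *expiries;   /* entry expiry times, sorted ascending */
  const size_t *sizes;      /* entry sizes in bytes, parallel array */
  size_t n;                 /* total entries */
  size_t head;              /* index of the oldest surviving entry */
} fake_cache_t;

/* Evict the globally-oldest entry from whichever cache holds it,
 * until freed bytes reach bytes_to_remove or both caches are empty.
 * Returns the total bytes freed. */
size_t
evict_oldest_first(fake_cache_t *a, fake_cache_t *b, size_t bytes_to_remove)
{
  size_t freed = 0;
  while (freed < bytes_to_remove && (a->head < a->n || b->head < b->n)) {
    fake_cache_t *x;
    if (a->head >= a->n)
      x = b;                /* Cache A exhausted */
    else if (b->head >= b->n)
      x = a;                /* Cache B exhausted */
    else
      x = (a->expiries[a->head] <= b->expiries[b->head]) ? a : b;
    freed += x->sizes[x->head];
    x->head++;              /* deallocate the oldest entry of Cache X */
  }
  return freed;
}
```

Interleaving the two caches this way means eviction order matches global expiry order, without ever merging the caches themselves.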
A third alternative is that we can iterate through each time period:

  Set K to the oldest expected descriptor age in hours, minus 1 hour
  Deallocate all entries from Cache A that are older than K hours
  Deallocate all entries from Cache B that are older than K hours
  Set K to K - 1 and repeat this process
This algorithm is O(Kn), which is OK as long as K is small. It carries a slight risk of over-deallocating cache entries, which is fine at OOM time. I like this one, because it's simple, performant, and doesn't need any extra memory allocations.
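The hour-bucket sweep could look roughly like this in C, again with stand-in arrays instead of the real caches (sweep_cache, oom_sweep_by_age and the alive[] flags are all hypothetical):

```c
/* Sketch of the hour-bucket sweep: repeatedly drop everything older
 * than K hours from both caches, decrementing K until enough memory
 * has been freed.  Names are illustrative. */
#include <stddef.h>

/* Mark entries older than max_age_hours as dead; return bytes freed. */
static size_t
sweep_cache(const unsigned *age_hours, const size_t *sizes, int *alive,
            size_t n, unsigned max_age_hours)
{
  size_t freed = 0, i;
  for (i = 0; i < n; i++) {
    if (alive[i] && age_hours[i] > max_age_hours) {
      alive[i] = 0;
      freed += sizes[i];
    }
  }
  return freed;
}

/* Sweep both caches, one hour bucket at a time, oldest first.
 * Returns the total bytes freed. */
size_t
oom_sweep_by_age(const unsigned *ages_a, const size_t *sizes_a, int *alive_a,
                 size_t na,
                 const unsigned *ages_b, const size_t *sizes_b, int *alive_b,
                 size_t nb,
                 unsigned oldest_age_hours, size_t bytes_to_remove)
{
  size_t freed = 0;
  unsigned k = oldest_age_hours ? oldest_age_hours - 1 : 0;
  while (freed < bytes_to_remove) {
    freed += sweep_cache(ages_a, sizes_a, alive_a, na, k);
    freed += sweep_cache(ages_b, sizes_b, alive_b, nb, k);
    if (k == 0)
      break;                /* nothing older left to drop */
    k--;
  }
  return freed;
}
```

Because each sweep drops a whole hour bucket at once, the final pass can free somewhat more than strictly needed, which is the slight over-deallocation risk mentioned above.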
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n