[tor-dev] prop224: HSDir caches question with OOM

Tim Wilson-Brown - teor teor2345 at gmail.com
Wed Apr 20 01:04:26 UTC 2016

> On 20 Apr 2016, at 07:22, David Goulet <dgoulet at ev0ke.net> wrote:
> On 18 Apr (13:18:25), George Kadianakis wrote:
>> Tim Wilson-Brown - teor <teor2345 at gmail.com> writes:
>>> [ text/plain ]
>>>> On 16 Apr 2016, at 05:47, David Goulet <dgoulet at ev0ke.net> wrote:
>>>> Hi!
>>>> (For the following, I'm only talking about HS directory.)
>>>> Here is my conundrum. I was working on plugging the new cache for proposal 224
>>>> into our OOM. I stumbled upon a fun problem.
>>>> <snip>
>>> We should add entries to a time-sorted list when they are created, otherwise this algorithm is O(n^2).
>>> A third alternative is that we can iterate through each time period:
>>> Set K to the oldest expected descriptor age in hours, minus 1 hour
>>> Deallocate all entries from Cache A that are older than K hours
>>> Deallocate all entries from Cache B that are older than K hours
>>> Set K to K - 1 and repeat this process
>>> This algorithm is O(Kn), which is ok as long as K is small.
>>> This carries a slight risk of over-deallocating cache entries. Which is OK at OOM time.
>>> I like this one, because it's simple, performant, and doesn't need any extra memory allocations.
>> I feel this is a reasonable approach here. It will probably have to be combined
>> with time-sorted lists as teor suggested, otherwise each of those steps will
>> require walking through the whole cache every time.

Yes, that's why I said O(K*n), which is alright as long as K is small.
If we keep sorted lists (warn: v2 code changes), then the algorithm is O(n) instead.

>> BTW, before it says "Set K to K - 1 and repeat this process" we should also
>> check whether our memory issue has been addressed and if so then we can exit,
>> right?
> Yeah I would do that. Every time we cleanup a cache, we check if we've reached
> our bytes cutoff.

Yes, we can terminate the algorithm as soon as we reach our goal.

>> In that case, maybe we can add the same check between these two checks:
>>> Deallocate all entries from Cache A that are older than K hours
>>> Deallocate all entries from Cache B that are older than K hours
>> so that maybe sometimes we can get away with just removing stuff from the v2 cache.
>> Are there any other issues with the above approach?
> While implementing this solution, I found one annoying issue :).
> In order to compute K, I use the maximum lifetime of a cache entry. So
> basically if my maximum lifetime in hours is M=10 then K will start at 10 - 1.
> (Where 1 hour is basically the "assumed" HS post period.)

Note that we can decrement K by a multiple of the post period - for example,
it can be 3 hours: 9, 6, 3, 0.

This makes the algorithm run faster (as it would iterate through the unsorted v3
list 11 times, rather than 33 times), at the cost of sometimes deallocating some
more recent descriptors in that 3 hour period.

> However, our v2 and v3 cache have different lifetimes.
>    v2: 48 hours is the max lifetime (REND_CACHE_MAX_AGE) plus the maximum
>        clock skew of 24 hours (REND_CACHE_MAX_SKEW) == 72 hours.
>    v3: We've settled on 33 hours which is 24h time period, 6 hours of overlap
>        period and 3 hours of maximum clock skew.
> So somehow, we need to "unify" K to fit both caches. Some options (some I do
> not like):
> 1) We could iterate over both caches at first and find out the oldest
>   descriptor which requires a first pass of O(n) + O(m). Use that value for
>   our starting point of K. But then inherently, we'll most of the time wipe v2
>   descriptors since they have a long lifetime. (maybe good!?)

I think that clients are just as likely to re-use descriptors that are an hour old,
regardless of the maximum lifetimes of v2 and v3 descriptors.

So I don't care if we take out more v2 descriptors.

> 2) We could be crazier and set the v2 max lifetime to be the same as the v3...
>   (FYI, we already have an open ticket about that to bring it down).

I am ok with this, but we have no stats on it, so it's hard to tell what impact it
will have (if any).

Cutting the size of the v2 cache in half would also make space for the v3 cache
during the transition period. So I support this idea.

> 3) Go back to having a sorted list of very small objects that would contains
>   created timestamp, version and ptr to cache entry. This adds a bit more
>   memory for each entry (not that crazy) and require more codes in both v2 and
>   v3. (I'm sekritly trying to _not_ add more v2 code.)

This would make deallocation O(n), but as you say, it involves v2 code changes.
I wonder if we will want the sorted list in v3 going forward, regardless of what we do
with v2.


Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160420/dbc9b0ce/attachment.sig>

More information about the tor-dev mailing list