[tor-dev] prop224: HSDir caches question with OOM

David Goulet dgoulet at ev0ke.net
Tue Apr 19 21:22:01 UTC 2016

On 18 Apr (13:18:25), George Kadianakis wrote:
> Tim Wilson-Brown - teor <teor2345 at gmail.com> writes:
> > [ text/plain ]
> >
> >> On 16 Apr 2016, at 05:47, David Goulet <dgoulet at ev0ke.net> wrote:
> >> 
> >> Hi!
> >> 
> >> (For the following, I'm only talking about HS directory.)
> >> 
> >> Here is my conundrum. I was working on plugging the new cache for proposal 224
> >> into our OOM. I stumbled upon a fun problem.
> >> 
> >> <snip>
> >
> > We should add entries to a time-sorted list when they are created, otherwise this algorithm is O(n^2).
> >
> > A third alternative is that we can iterate through each time period:
> > Set K to the oldest expected descriptor age in hours, minus 1 hour
> > Deallocate all entries from Cache A that are older than K hours
> > Deallocate all entries from Cache B that are older than K hours
> > Set K to K - 1 and repeat this process
> >
> > This algorithm is O(Kn), which is ok as long as K is small.
> > This carries a slight risk of over-deallocating cache entries. Which is OK at OOM time.
> > I like this one, because it's simple, performant, and doesn't need any extra memory allocations.
> >
> I feel this is a reasonable approach here. It will probably have to be combined
> with time-sorted lists as teor suggested, otherwise each of those steps will
> require walking through the whole cache every time.
> BTW, before it says "Set K to K - 1 and repeat this process" we should also
> check whether our memory issue has been addressed and if so then we can exit,
> right?

Yeah I would do that. Every time we cleanup a cache, we check if we've reached
our bytes cutoff.

> In that case, maybe we can add the same check between these two checks:
> > Deallocate all entries from Cache A that are older than K hours
> > Deallocate all entries from Cache B that are older than K hours
> so that maybe sometimes we can get away with just removing stuff from the v2 cache.
> Are there any other issues with the above approach?

While implementing this solution, I found one annoying issue :).

In order to compute K, I use the maximum lifetime of a cache entry. So
basically if my maximum lifetime in hours is M=10 then K will start at 10 - 1.
(Where 1 hour is basically the "assumed" HS post period.)

However, our v2 and v3 cache have different lifetimes.

    v2: 48 hours is the max lifetime (REND_CACHE_MAX_AGE) plus the maximum
        clock skew of 24 hours (REND_CACHE_MAX_SKEW) == 72 hours.

    v3: We've settled on 33 hours which is 24h time period, 6 hours of overlap
        period and 3 hours of maximum clock skew.

So somehow, we need to "unify" K to fit both caches. Some options (some I do
not like):

1) We could iterate over both caches at first and find out the oldest
   descriptor which requires a first pass of O(n) + O(m). Use that value for
   our starting point of K. But then inherently, we'll most of the time wipe v2
   descriptors since they have a long lifetime. (maybe good!?)

2) We could be crazier and set the v2 max lifetime to be the same as the v3...
   (FYI, we already have an open ticket about that to bring it down).

3) Go back to having a sorted list of very small objects that would contains
   created timestamp, version and ptr to cache entry. This adds a bit more
   memory for each entry (not that crazy) and require more codes in both v2 and
   v3. (I'm sekritly trying to _not_ add more v2 code.)

4) <insert ideas>

> PS: We could also in theory imagine writing those descriptors on disk, and only
>     caching the ones that we have received requests for lately. This way we
>     would not utilize our RAM that much...

A new level of complexity here :). Before we go there, we should try to find
out how much memory an HSDir uses periodically for its cache. Fun stat to have
actually (for you out there stats lover! ;)

I expect that this could be a needed solution if HS adoption goes crazy while
not having a growth in the number of relays (and even then...). Proposal? :)


> _______________________________________________
> tor-dev mailing list
> tor-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 603 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160419/df326fba/attachment.sig>

More information about the tor-dev mailing list