On 18 Apr (13:18:25), George Kadianakis wrote:
Tim Wilson-Brown - teor teor2345@gmail.com writes:
[ text/plain ]
On 16 Apr 2016, at 05:47, David Goulet dgoulet@ev0ke.net wrote:
Hi!
(For the following, I'm only talking about HS directory.)
Here is my conundrum. I was working on plugging the new cache for proposal 224 into our OOM. I stumbled upon a fun problem.
<snip>
We should add entries to a time-sorted list when they are created, otherwise this algorithm is O(n^2).
A third alternative is that we can iterate through each time period: Set K to the oldest expected descriptor age in hours, minus 1 hour Deallocate all entries from Cache A that are older than K hours Deallocate all entries from Cache B that are older than K hours Set K to K - 1 and repeat this process
This algorithm is O(Kn), which is ok as long as K is small. This carries a slight risk of over-deallocating cache entries. Which is OK at OOM time. I like this one, because it's simple, performant, and doesn't need any extra memory allocations.
I feel this is a reasonable approach here. It will probably have to be combined with time-sorted lists as teor suggested, otherwise each of those steps will require walking through the whole cache every time.
BTW, before it says "Set K to K - 1 and repeat this process" we should also check whether our memory issue has been addressed and if so then we can exit, right?
Yeah I would do that. Every time we cleanup a cache, we check if we've reached our bytes cutoff.
In that case, maybe we can add the same check between these two checks:
Deallocate all entries from Cache A that are older than K hours Deallocate all entries from Cache B that are older than K hours
so that maybe sometimes we can get away with just removing stuff from the v2 cache.
Are there any other issues with the above approach?
While implementing this solution, I found one annoying issue :).
In order to compute K, I use the maximum lifetime of a cache entry. So basically if my maximum lifetime in hours is M=10 then K will start at 10 - 1. (Where 1 hour is basically the "assumed" HS post period.)
However, our v2 and v3 cache have different lifetimes.
v2: 48 hours is the max lifetime (REND_CACHE_MAX_AGE) plus the maximum clock skew of 24 hours (REND_CACHE_MAX_SKEW) == 72 hours.
v3: We've settled on 33 hours which is 24h time period, 6 hours of overlap period and 3 hours of maximum clock skew.
So somehow, we need to "unify" K to fit both caches. Some options (some I do not like):
1) We could iterate over both caches at first and find out the oldest descriptor which requires a first pass of O(n) + O(m). Use that value for our starting point of K. But then inherently, we'll most of the time wipe v2 descriptors since they have a long lifetime. (maybe good!?)
2) We could be crazier and set the v2 max lifetime to be the same as the v3... (FYI, we already have an open ticket about that to bring it down).
3) Go back to having a sorted list of very small objects that would contains created timestamp, version and ptr to cache entry. This adds a bit more memory for each entry (not that crazy) and require more codes in both v2 and v3. (I'm sekritly trying to _not_ add more v2 code.)
4) <insert ideas>
PS: We could also in theory imagine writing those descriptors on disk, and only caching the ones that we have received requests for lately. This way we would not utilize our RAM that much...
A new level of complexity here :). Before we go there, we should try to find out how much memory an HSDir uses periodically for its cache. Fun stat to have actually (for you out there stats lover! ;)
I expect that this could be a needed solution if HS adoption goes crazy while not having a growth in the number of relays (and even then...). Proposal? :)
Cheers! David
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev