Hello,
On 4/16/2016 4:11 PM, David Goulet wrote: [snip]
A third alternative is that we can iterate through each time period:
* Set K to the oldest expected descriptor age in hours, minus 1 hour
* Deallocate all entries from Cache A that are older than K hours
* Deallocate all entries from Cache B that are older than K hours
* Set K to K - 1 and repeat this process
This algorithm is O(Kn), which is OK as long as K is small. It carries a slight risk of over-deallocating cache entries, which is acceptable at OOM time. I like this one, because it's simple, performant, and doesn't need any extra memory allocations.
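For readers following the thread, the time-period loop could be sketched roughly as below. This is only an illustration in Python, not Tor's C code: the function name `oom_clean_by_age`, the dict-based caches, the `(created_at, size)` tuples, and the "stop once `bytes_to_free` is reached" condition are all assumptions made for the example.

```python
import time

def oom_clean_by_age(caches, bytes_to_free, max_age_hours, now=None):
    """Evict old entries from every cache, one hour step at a time.

    caches: list of dicts mapping key -> (created_at_seconds, size_bytes).
            (Hypothetical layout, chosen only for this sketch.)
    Returns the number of bytes freed.
    """
    now = time.time() if now is None else now
    freed = 0
    k = max_age_hours - 1  # oldest expected descriptor age, minus 1 hour
    # Assumed stop condition: enough memory freed, or nothing left to age out.
    while freed < bytes_to_free and k >= 0:
        cutoff = now - k * 3600
        for cache in caches:  # Cache A, then Cache B, treated identically
            stale = [key for key, (created, _) in cache.items()
                     if created < cutoff]
            for key in stale:
                freed += cache.pop(key)[1]
        k -= 1  # move on to the next, younger time period
    return freed
```

Each pass scans every entry once, so K passes cost O(Kn). Because a pass always finishes both caches before the target is checked again, it can free somewhat more than requested, which is the over-deallocation risk mentioned above.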
I do also like this one. It's pretty simple and efficient.
Now there is a fourth alternative that Yawning proposed in #tor-dev yesterday: always prioritize our v2 cache in the OOM handling, that is, clean the v2 cache first and only go to the v3 cache if we still have to. It would be a "v3 is much more important than v2" kind of incentive.
As he described it, it's a bit like our tap vs ntor situation under pressure: we prioritize ntor and drop tap if needed.
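For comparison, the v2-first idea could look roughly like the sketch below. Again this is an illustrative Python fragment, not Tor code: `oom_clean_v2_first`, the dict-based caches, and the "oldest first within each cache" ordering are assumptions for the example.

```python
def oom_clean_v2_first(v2_cache, v3_cache, bytes_to_free):
    """Drain the v2 cache before touching the v3 cache.

    Each cache maps key -> (created_at_seconds, size_bytes).
    (Hypothetical layout, chosen only for this sketch.)
    Returns the number of bytes freed.
    """
    freed = 0
    for cache in (v2_cache, v3_cache):  # v2 always goes first
        # Assumption: within one cache, evict the oldest entries first.
        for key in sorted(cache, key=lambda k: cache[k][0]):
            if freed >= bytes_to_free:
                return freed
            freed += cache.pop(key)[1]
    return freed
```

Note that with this ordering a fresh v2 descriptor can be dropped while an older v3 descriptor survives, which is the reachability trade-off at stake here.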
I'm still quite _unsure_ about this. The v3 will bring more memory pressure with this second HSDir cache. And my intuition is that most users won't switch directly to v3 but will probably have a migration path from v2 to v3 like having the v2 onion on for X months before discontinuing it.
So losing reachability because we decide to drop v2 first might not be desirable. But then also, how often is an HSDir OOM triggered...?
Anyway, right now I'm leaning towards your approach, teor, of just using the time period.
More eyes on this would be great :).
Cheers! David
I agree that teor's O(Kn) is the best approach from a performance (no additional memory allocations), simplicity, and efficacy point of view. The O(Kn) algorithm clears entries based only on their expiration time; it does not try to favor one of the v2 / v3 caches over the other, which is good, given that we do not know how long HS operators will take or need to upgrade their services to prop 224.
The tap vs ntor situation was a good measure, but the threat model was different (we were trying to ensure that new clients using ntor get resources from relays with priority, as opposed to non-updated botnet zombies using tap). In the current situation we care about the v2 and v3 HS caches exactly the same, for an unknown period of time which might not be short, so we shouldn't penalize v2 in any way.
This needs to be covered regardless of how often an HSDir has its OOM triggered. I don't think we should assume it's hard to flood HSDirs with descriptors until the memory is full.
Now that HSDirs will need to handle two caches, is 20% of the total memory allocated for HS descriptors a good value? What harm would increasing it to let's say 25% do?