On Thu, Oct 11, 2012 at 5:32 AM, Mike Perry <mikeperry@torproject.org> wrote:
I misread this paragraph at first. I thought you were suggesting 3 parallel directory downloads when in fact you were discussing 3 parallel TLS connections, with only the first one that finishes actually getting a download.
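A minimal sketch of that reading, for illustration only: several connection attempts start in parallel, the first to complete is used for the download, and the rest are dropped. The function names, mirror names, and delays are hypothetical, the handshake is simulated with a sleep, and failed handshakes are not handled; none of this is taken from the proposal text or from Tor's code.

```python
import asyncio


async def fake_tls_connect(name, delay):
    """Hypothetical stand-in for a TLS handshake; just sleeps for `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def race_connections(candidates):
    """Start every attempt at once and return the name of the first to finish.

    `candidates` is a list of (name, simulated_handshake_delay) pairs.
    """
    tasks = [asyncio.create_task(fake_tls_connect(n, d)) for n, d in candidates]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Only the winner gets the download; the slower attempts are cancelled.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


if __name__ == "__main__":
    mirrors = [("mirror-a", 0.05), ("mirror-b", 0.01), ("mirror-c", 0.03)]
    print(asyncio.run(race_connections(mirrors)))
```

The point of the sketch is that the extra cost of the parallel attempts is connection setup only, not three directory downloads.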
[...]
Design: Fallback Dir Mirror Selection
Out of scope for this proposal; relevant for proposal 206.
To be clear, it's the part of this proposal that's shared with proposal 206 (directory sources) that would lower load on the authorities.
How do we know that bootstrapping is rare?
This looks like an argument of the form "The outcome would be horrible, but the current outcome is also horrible, so we wouldn't break stuff any worse." Right?
I wonder if in this case the answer isn't to actually back off from fetching after N minutes or M servers, like a sane system. Or to treat "hey, that's not a good consensus!" as different from "couldn't connect to directory server" in terms of what it means for how we back off.
Thus spake Nick Mathewson (nickm@alum.mit.edu):
Ok. Consider it a vote for your "third option" in proposal 206 then.
Also consider that I wrote this proposal in such a way that it both depends on 206 and is meant to make it possible to relax our requirements for mirror selection in 206.
I think the parallel connection idea means we have to worry much less about rigorously vetting the fallback dir mirrors for uptime+longevity, in addition to improving bootstrap delay in the event of dirauth downtime.
Yes, this proposal depends upon 206. It doesn't make as much sense to implement it by itself, I don't think.
I guess it depends on the definition of rare. I meant compared to normal directory activity.
The lack of a TBB update mechanism probably does make bootstrap more prevalent than we'd like, I guess.
Also, if clients re-bootstrap whenever they've been idle for more than 24 hours, then it's probably quite prevalent. I assumed they at least attempted to keep their consensus fresh, even if they were not being used. Am I wrong?
Well, more like "the outcome would be slightly less horrible, but also more resilient to unavailability, and more performant."
I analyzed the extreme case specifically because it allows us to see the load consequences of the scheme more easily than if we were to get bogged down by, say, trying to estimate bootstrap frequency in normal operations. I think that is a distraction.
I have limits on the number of retries and total concurrent connection counts in the proposals. We can tweak them.
I thought about putting in a back-off in terms of retry frequency, but it didn't seem like a clear win over just limiting things in the first place, because there's already an implicit backoff by virtue of simply waiting for the TLS connection timeouts to expire once we hit the total pending connection limit.
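To make the implicit backoff concrete, here is a rough virtual-time sketch of the worst case, where every connection hangs until its TLS timeout. The function name, parameter names, and all numbers are assumptions chosen for illustration; they are not the limits from the proposal.

```python
import heapq


def launch_times(max_attempts, max_pending, launch_interval, tls_timeout):
    """Return the virtual times at which each connection attempt is launched.

    Assumes the worst case: every attempt hangs until `tls_timeout` elapses.
    A new attempt is wanted every `launch_interval` seconds, but it may only
    start while fewer than `max_pending` attempts are outstanding.
    """
    expiries = []  # min-heap of times at which pending attempts time out
    launches = []
    now = 0.0
    for _ in range(max_attempts):
        # Forget attempts that have already timed out.
        while expiries and expiries[0] <= now:
            heapq.heappop(expiries)
        if len(expiries) >= max_pending:
            # At the pending limit: wait for the oldest attempt to time out.
            now = heapq.heappop(expiries)
        launches.append(now)
        heapq.heappush(expiries, now + tls_timeout)
        now += launch_interval
    return launches


if __name__ == "__main__":
    # Hypothetical numbers: 6 attempts total, 3 pending at most,
    # one new attempt per second, 10 second TLS timeout.
    print(launch_times(6, 3, 1.0, 10.0))
```

With these made-up numbers, the first three attempts go out one second apart, and the fourth has to wait until the first one times out. The pending limit therefore throttles the retry rate without a separate backoff schedule.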