[or-cvs] r9993: Describe a simpler implementation for proposal 108, and note (in tor/trunk: . doc/spec/proposals)
nickm at freehaven.net
Fri Apr 27 15:49:45 UTC 2007
On Wed, Apr 25, 2007 at 03:16:28PM -0400, Roger Dingledine wrote:
> On Fri, Apr 20, 2007 at 01:17:15PM -0400, nickm at seul.org wrote:
> > tor/trunk/doc/spec/proposals/108-mtbf-based-stability.txt
> > +Alternative:
> > +
> > + "A router's Stability shall be defined as the sum of $alpha ^ d$ for every
> > + $d$ such that the router was not observed to be unavailable $d$ days ago."
> > +
> > + This allows a simpler implementation: every day, we multiply yesterday's
> > + Stability by alpha, and if the router was running for all of today, we add
> > + 1.
> I don't think you mean quite that. For a server that just appeared,
> there are an infinite number of previous days where it was not observed
> to be unavailable. Do you mean 'was observed to be available'?
Ah, you're right.
> And by available, do we mean available for the entire day?
I think so, for arbitrary values of "day".
> What are some ways we can choose \alpha?
We should probably decide how much we'd like to discount the distant
past. Something between .80 and .95 is probably around right.
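To make the "simpler implementation" concrete, here is a small sketch of the daily update under the corrected definition (Stability is the sum of alpha^d over every d days ago on which the router was observed available for the whole day). The function names and the particular alpha are mine, chosen from the .80-.95 range discussed above, not anything fixed in the proposal:

```python
# Sketch of the simpler Stability update from proposal 108 (corrected
# definition: credit only days the router was observed up all day).
ALPHA = 0.9  # illustrative discount factor from the .80-.95 range

def update_stability(yesterday_stability, up_all_day):
    """Run once per day: decay yesterday's score, add 1 for a full day up."""
    return ALPHA * yesterday_stability + (1.0 if up_all_day else 0.0)

def stability_after_n_days_up(n, alpha=ALPHA):
    """Closed form for a router up n consecutive whole days:
    sum_{d=0}^{n-1} alpha^d = (1 - alpha^n) / (1 - alpha)."""
    return (1.0 - alpha ** n) / (1.0 - alpha)
```

One nice property of this form: a long-running router's score converges to 1/(1 - alpha), so the ceiling is bounded and a single down day costs exactly one day's worth of credit plus the decay.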
> > +Limitations:
> > +
> > + Authorities can have false positives and false negatives when trying to
> > + tell whether a router is up or down. So long as these aren't terribly
> > + wrong, and so long as they aren't significantly biased, we should be able
> > + to use them to estimate stability pretty well.
> I haven't seen any discussion about how the router's declared uptime fits
> into this. If a router goes down and then comes up again in between
> measurements, the proposed approach will treat it as being up the
> whole time -- yet connections through it will be broken. One approach
> to handling this would be to notice if the uptime decreases from one
> descriptor to the next. This would indicate a self-declared downtime
> for the router, and we can just figure that into the calculations.
This would be a good thing, but it _would_ give routers incentive to
lie about uptime.
> I'm not sure how we should compute the length of the downtime though:
> in some cases it will be just a split second as for a reboot or upgrade,
> but in others maybe the computer, network, or Tor process went down
> and then came back a long time later. I guess since our computations
> are just rough approximations anyway, we can just assume a zero-length
> downtime unless our active testing also noticed it.
Actually, I chose "up for an entire day" as a minimum quantum for a
reason. The main problem with router instability isn't the fraction
of time it's down; if you try to connect to a router that isn't there,
that's not a big deal. The problem with router instability is the
likelihood that it will _go_ down and drop all your circuits.
Remember, a router that goes down for 5 minutes out of every hour
has a _higher_ fractional uptime than a router that goes down for one
day out of every week... but the latter router is far more stable, and
far more useful if your goal is long-lived circuits.
(That's why I originally chose MTBF rather than uptime percentage.
I'm _trying_ to approximate the same insight by requiring you to be up
for the entirety of a day rather than a fraction of it, but there may
be better ways to approximate it.)
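The 5-minutes-per-hour vs. one-day-per-week comparison above is worth writing out as arithmetic; the numbers here are just the two example schedules from the paragraph:

```python
# Router A: down 5 minutes out of every hour.
# Router B: down one day out of every week.
uptime_a = 55 / 60            # fractional uptime, about 0.917
uptime_b = 6 / 7              # fractional uptime, about 0.857

mtbf_a_hours = 55 / 60        # A runs ~55 minutes between failures
mtbf_b_hours = 6 * 24         # B runs 6 days (144 hours) between failures
```

A wins on fractional uptime, but B's mean time between failures is more than a hundred times longer, which is what matters for keeping long-lived circuits alive.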
> Speaking of the active testing, here's what we do right now:
> Every 10 seconds, we call dirserv_test_reachability(), and it tries making
> connections to a different 1/128 of the router list. So a given router
> gets tried every 1280 seconds, or a bit over 21 minutes. We declare a
> router to be unreachable if it has not been successfully found reachable
> within the past 45 minutes. So at least two testing periods must go
> by before a running router is considered to be no longer running.
> So our measurements won't be perfect, but I think this approach is a
> much better one than just blindly believing the uptime entry in the
> router descriptor.
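The testing cadence Roger describes reduces to a bit of arithmetic, reproduced here as a sanity check (constants taken directly from his description):

```python
TEST_INTERVAL_SEC = 10        # dirserv_test_reachability() runs every 10 s
BUCKETS = 128                 # each run tests 1/128 of the router list
UNREACHABLE_WINDOW_SEC = 45 * 60  # declared unreachable after 45 min

cycle_sec = TEST_INTERVAL_SEC * BUCKETS            # 1280 s, a bit over 21 min
periods_in_window = UNREACHABLE_WINDOW_SEC / cycle_sec  # roughly 2.1 cycles
```

So each router is probed about once per 21-minute cycle, and the 45-minute window guarantees at least two full probe opportunities before a router is marked down.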
> What is our plan for storing (and publishing?) the observed uptime
> periods for each router?
I don't think publishing is necessary; there's nothing to stop us from
doing it later if we choose.
To store the uptime, I was thinking of a flat file written
periodically; it would probably be something like 64K at the moment,
which wouldn't be a big problem for authorities to flush every 10
minutes or so. If we wanted to be fancier, we could keep an
append-only events journal, and periodically use it to rebuild a
status file, but that doesn't seem necessary.
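A minimal sketch of that flat-file approach, under assumptions of my own (the file format, the function name, and one line per router fingerprint are all made up for illustration; Tor's actual authority code is C and may do this differently):

```python
# Hypothetical periodic flush of per-router stability to a flat file,
# rewriting the whole file each time rather than keeping a journal.
import os
import tempfile

def flush_stability(path, stability):
    """stability: dict mapping router fingerprint -> stability score.
    Write to a temp file and rename, so readers never see a torn file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        for fingerprint, score in sorted(stability.items()):
            f.write("%s %.6f\n" % (fingerprint, score))
    os.replace(tmp, path)  # atomic rename on POSIX
```

At ~64K rewritten every 10 minutes this is cheap, and the write-then-rename step means a crash mid-flush leaves the previous file intact, which gets most of the durability a journal would buy.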
We could also start poking at the dark sad world of Berkeley DB and
friends, I guess. The annoyances of that are well known, but it won't
be too bad if we only require it on authorities.