Proposal 161: Computing Bandwidth Adjustments
nickm at torproject.org
Wed May 13 05:30:06 UTC 2009
On Tue, May 12, 2009 at 09:46:05PM -0700, Mike Perry uttered:
> Thus spake Nick Mathewson (nickm at torproject.org):
> > There are some magic numbers here: Why 3%? How large are these files
> > and how often are they downloaded? Why "several hundred"? Is the
> > "several hundred" a function of the "45" above?
> Currently no. I've been meaning to do some statistics on how long it
> takes to converge. Eyeballing seems to indicate the filtered values
> converge after about 100 fetches or so, or about 3-4 per node. I've
> been doing runs of 250 just for good measure.
As discussed on IRC, we should see if we can get away with fetching
smaller files to test the weaker slices. If we can, hooray for us! If
it doesn't work, too bad. :/
As for the slice size, it seems that maybe a quasi-constant slice size
would work better than having it be fixed at 3% of the network. That
way, we always send the same amount of traffic per node, rather than
getting sparser and sparser as the slices get bigger but our traffic
stays the same.
> > I want to see a calculation for what fraction of the network capacity
> > we expect to use for this testing. I wouldn't expect it to be over a
> > percent, but I'd like to see the math to prove it.
> This is a function of how much we parallelize and how many scanners we
> have. Right now, there is 0 parallelization built in, which means that
> each scanner takes up approximately 20-50Kbytes/sec of two nodes at a
> time. This is compared to the 120Mbytes/sec of exit capacity Tor
> currently has.
> > Also, how long does it take to run these tests over the network? If
> > it takes much longer than a consensus interval, we should think about
> > what happens if a node's advertised capacity changes greatly, and
> > whether we want to treat newer measurements as more accurate.
> Each slice takes around 3-6 hours. The entire network can take as long
> as a week.. Hence the desire to parallelize..
My intuition says that unless we scan the whole network every two days
(or ideally less), our data will likely be useless.
What if, instead of running in a batch mode, we were constantly
scanning and iteratively refining our guesses for ratios, discounting
old data as newer data arrived? We'd still want to parallelize some,
but we wouldn't run into the problem of having some nodes' estimated
values be far more accurate than others.
> For the capacity change case, let's consider what happens in stages:
> 0. The node gets more capacity from its ISP or by boosting rate
> limiting (these may not actually be equivalent cases, but lets
> assume they are for now).
> 1. It is able to transmit more bytes in a 10 second interval. If it
> has enough clients trying to use it through other high capacity
> nodes, it will eventually fill its capacity during this interval.
> 2. Between now and the next time the consensus is published, the
> streams through it will exhibit higher average capacity.
> 3. At the next consensus interval, the authorities will publish the
> higher advertised bandwidth that the node has observed.
> 4. As the published value makes its way to clients, they will
> choose the node more often for their streams.
> 5. The average stream capacity of the node should now begin to drop.
> Steps 1-3 may actually occur over several consensus intervals. I've
> noticed that changing the capacity on my nodes drastically actually
> takes several consensus publishings to really boost their observed
> values to the maximum, probably because of the limiting factor of the
> spare capacity of other nodes in the circuits that run through my
> So we can see that we may actually notice higher capacities of nodes
> before the values are published in the consensus (ie steps 0-2). This
> should not be a big problem, as our reported bandwidth updates will be
> based on multiples of the old consensus values, and should approximate
> the new capacity of the node.
> The other issue is between steps 3 and 5: The node's new capacity is
> published, but it is not yet attracting enough client traffic to bring
> its average stream capacity down. In this case, we may publish a value
> that is too high.
> I'm not sure what to do with this second problem as it is
> non-deterministic. Using the alpha smoothing factor from your reply to
> 160 seems like it should help. I do actually track bandwidth changes
> during the scan, I and I can write some code to abstain from
> publishing updates to nodes whose bandwidths change drastically during
> the scan. But what is "drastically", and is this edge case worth the
> complexity, or is the alpha smoothing good enough?
I don't know, alas. Is there a way we can measure how this is working
in practice? I think that if we refresh our bandwidth estimates only
(say) every day or so, we will really hammer nodes that want to change
their bandwidth downward, which seems poor. Lagging on nodes that
raise their bandwidth isn't quite as bad.
Roger -- any intuition on this one? It is too hard for 01:30 EDT Nick.
> > (Crazy idea 1: If the values change too fast or slow, we could
> > exponentiate the ratios out of the algorithm before using them.
> > Slightly less crazy idea 2: if the values change too fast, we could do
> > a standard time-weighted average, where we compute the new declared
> > bandwidth BW' as a weighted average of the old declared bandwidth and
> > the new measured bandwidth.)
> Yeah, I like idea 2, basically what you mentioned in 160. This would
> be done Tor-end, not scanner end, right?
I don't have a strong feeling about that. To me, it seems easier to
do it at the scanner end, or as a processing step between the scanner
end and the authorities... but that's just because I'm thinking of the
scanner stuff as easier to change if we think of a better algorithm.
More information about the tor-dev