Proposal 161: Computing Bandwidth Adjustments

Mike Perry mikeperry at fscked.org
Wed May 13 04:46:05 UTC 2009


Thus spake Nick Mathewson (nickm at torproject.org):

> > 2. Average Stream Bandwidth Calculation
> >  
> >   The average stream bandwidths are obtained by dividing the network
> >   into 3% slices according to advertised node bandwidth, yielding 
> >   about 45 nodes per slice in the current network.
> > 
> >   Two hop circuits are built using nodes from the same slice, and a large
> >   file is downloaded via these circuits. This process is repeated
> >   several hundred times, and average stream capacities are assigned to
> >   each node from these results.
> 
> There are some magic numbers here: Why 3%?  How large are these files
> and how often are they downloaded?  Why "several hundred"?  Is the
> "several hundred" a function of the "45" above?

Currently, no. I've been meaning to do some statistics on how long it
takes to converge. Eyeballing suggests the filtered values converge
after about 100 fetches, or about 3-4 streams per node. I've been
doing runs of 250 fetches just for good measure.
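
The check I have in mind would look something like this rough sketch
(the window size and tolerance are eyeballed guesses, not measured
values):

    # Declare a slice converged once the running filtered mean stops
    # moving.  window and tol are illustrative guesses, not tuned.
    def converged(running_means, window=20, tol=0.05):
        """running_means: filtered mean stream bw, one entry per fetch."""
        if len(running_means) < window:
            return False
        recent = running_means[-window:]
        return (max(recent) - min(recent)) <= tol * recent[-1]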

> I want to see a calculation for what fraction of the network capacity
> we expect to use for this testing.  I wouldn't expect it to be over a
> percent, but I'd like to see the math to prove it.

This is a function of how much we parallelize and how many scanners we
have. Right now there is no parallelization built in, which means that
each scanner consumes approximately 20-50 Kbytes/sec on two nodes at a
time, compared to the roughly 120 Mbytes/sec of exit capacity Tor
currently has.
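
To put rough numbers on that (taking the high end of the range):

    50 Kbytes/sec / 120,000 Kbytes/sec ~= 0.04% of exit capacity
    per scanner, and twice that in total relay bandwidth, since
    the circuit has two hops.

So even a handful of parallel scanners should stay well under a
percent.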
 
> Also, how long does it take to run these tests over the network?  If
> it takes much longer than a consensus interval, we should think about
> what happens if a node's advertised capacity changes greatly, and
> whether we want to treat newer measurements as more accurate.

Each slice takes around 3-6 hours. The entire network can take as long
as a week. Hence the desire to parallelize.
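
(That is just the slice math: at 3% per slice there are about 33
slices, and 33 slices * 3-6 hours each is roughly 100-200 hours, or
4-8 days, for a single scanner.)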

For the capacity change case, let's consider what happens in stages:

 0. The node gets more capacity from its ISP or by raising its rate
    limit (these may not actually be equivalent cases, but let's
    assume they are for now).
 1. It is able to transmit more bytes in a 10 second interval. If it 
    has enough clients trying to use it through other high capacity 
    nodes, it will eventually fill its capacity during this interval.
 2. Between now and the next time the consensus is published, the
    streams through it will exhibit higher average capacity.
 3. At the next consensus interval, the authorities will publish the
    higher advertised bandwidth that the node has observed.
 4. As the published value makes its way to clients, they will 
    choose the node more often for their streams.
 5. The average stream capacity of the node should now begin to drop.

Steps 1-3 may actually occur over several consensus intervals. I've
noticed that drastically changing the capacity of my nodes takes
several consensus publications to fully boost their observed values,
probably because the spare capacity of the other nodes in circuits
through my node is the limiting factor.

So we can see that we may actually notice higher capacities of nodes
before the values are published in the consensus (i.e., steps 0-2). This
should not be a big problem, as our reported bandwidth updates will be
based on multiples of the old consensus values, and should approximate
the new capacity of the node.
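
In other words, the value we report is roughly (the names here are
just shorthand for "our measured ratio" and "the node's old consensus
bandwidth"):

    new_bw(N) = measured_ratio(N) * old_consensus_bw(N)

so if a node's streams are already running faster than its old
consensus value implies, the ratio exceeds 1.0 and scales the old
value up toward the new capacity.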

The other issue is between steps 3 and 5: The node's new capacity is
published, but it is not yet attracting enough client traffic to bring
its average stream capacity down. In this case, we may publish a value
that is too high. 

I'm not sure what to do with this second problem as it is
non-deterministic. Using the alpha smoothing factor from your reply to
160 seems like it should help. I do actually track bandwidth changes
during the scan, and I can write some code to abstain from
publishing updates to nodes whose bandwidths change drastically during
the scan. But what is "drastically", and is this edge case worth the
complexity, or is the alpha smoothing good enough?
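
For reference, the abstention check could be as simple as this sketch
(the 2.0 threshold is a stand-in for whatever "drastically" turns out
to mean):

    # Abstain from publishing a node whose advertised bandwidth moved
    # by more than `threshold` (as a ratio) between the start and end
    # of the scan.  The 2.0 default is a placeholder, not a tuned value.
    def should_publish(bw_scan_start, bw_scan_end, threshold=2.0):
        if bw_scan_start <= 0:
            return False
        change = bw_scan_end / float(bw_scan_start)
        return 1.0 / threshold <= change <= threshold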

> Another possibility I want to consider: what if, instead of putting
> nodes into slices based on their declared bandwidths, we put them into
> slices based on the most recent bandwidth for them in the consensus?
> That way, if a node advertises 100x as much bandwidth as it really
> has, we will soon start measuring it along with the other really slow
> nodes, rather than measuring it along with the high-capacity nodes
> forever.
> 
> [Ah.  I see on IRC that the above paragraph is what you already have
> in mind.]

Yes. We can also use the ratio of the consensus values to the
published values to evaluate experimental reweighting mechanisms (such
as Steven Murdoch's queuing theory-based weighting or adjusting
directory mirror weighting) without affecting the network balancing.
If the ratios get closer to 1, the new algorithm is an improvement. If
they get farther from 1, the new algorithm is worse.
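
Concretely, the comparison could be as simple as this sketch (log
distance is my choice here, so that ratios of 2.0 and 0.5 count as
equally far from 1):

    import math

    # Score a reweighting algorithm by the mean distance of per-node
    # consensus/published ratios from 1.0.  Lower scores are better.
    def balance_score(ratios):
        return sum(abs(math.log(r)) for r in ratios) / float(len(ratios))

If a candidate algorithm lowers this score versus the current one, it
is an improvement.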


>  [...]
> > 3. Ratio Filtering
> > 
> >   After the base ratios are calculated, a second pass is performed
> >   to remove any streams with nodes of ratios less than X=0.5 from
> >   the results of other nodes.   In addition, all outlying streams
> >   with capacity of one standard deviation below a node's average
> >   are also removed.
> > 
> >   The final ratio result will be calculated as the maximum of 
> >   these two resulting ratios if both are less than 1.0, the minimum
> >   if both are greater than 1.0, and the mean if one is greater
> >   and one is less than 1.0.
> 
> I am a little confused about what happens here.  Can you clarify this
> with some pseudocode for the whole algorithm?  Based on what you've
> written, I _think_ it would go something like:

This is roughly right except for steps 4 and 6:
 
>      1. Partition nodes into equal-sized sets of approximately 45
>         nodes, dividing slices by declared bandwidth.
>      2. For each slice, build about 100 two-hop circuits, and check
>         the bandwidth on each.
>      3. For each node N, let BW_measured(N) = MEAN(b | b is the bandwidth
>         of a stream that went through N).
>      4. For each slice S, let BW_avg(S) = MEAN(b | b is the bandwidth
>         of a stream built through S).  Let BW_stddev(S) = The standard
>         deviation of all such streams.  Let Normal_Streams(S) = { all
>         streams in S such that their bandwidth b is at least 0.5 *
>         BW_avg(S), and such that |b-BW_avg(S)| <= BW_stddev(S) }


Let Bw_stddev(N) = Standard deviation of stream bw through node N

Let Normal_Routers(S) = 
   {all routers N in S with Bw_measured(N)/Bw_avg(S) > 0.5 }

Let Normal_Streams(N) = 
   {all streams containing N such that every node in them other than
    N is in Normal_Routers(S), and such that their capacity is 
    greater than Bw_measured(N)-Bw_stddev(N) }

Let Bw_avg2(S) = MEAN(b | b in Normal_Streams(N) for all N)

Note that only the low side of the stddev is filtered: streams faster
than Bw_measured(N)+Bw_stddev(N) are not removed from Normal_Streams(N).


>      5. For each node N, let BW_measured2(N) = MEAN(b | b is the bandwidth
>         of a stream that went through N, for all streams in
>         Normal_Streams(N))

Let Bw_Ratio(N) = Bw_measured(N)/Bw_avg(S)

Let Bw_Ratio2(N) = Bw_measured2(N)/Bw_avg2(S)

      6. For each node N, if Bw_Ratio(N) and Bw_Ratio2(N) are on the
         same side of 1.0, let Bw_final(N) be whichever of the two is
         closer to 1.0.  Otherwise let Bw_final(N) be the mean of
         Bw_Ratio(N) and Bw_Ratio2(N).
> 
> Am I close?  Let's get the right pseudocode into the proposal; this is
> a complicated enough algorithm that it could use a mathy exposition.

Ok. Will update it with some pseudocode based on the above.
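
Roughly, in Python (a sketch only: the stream and node structures are
made up for illustration, and both passes are folded into one
function):

    # Rough sketch of the full per-slice pass, per the corrected steps
    # above.  Each stream is a (bandwidth, node_ids) pair.

    def stddev(vals):
        mean = sum(vals) / float(len(vals))
        return (sum((v - mean) ** 2 for v in vals) / float(len(vals))) ** 0.5

    def slice_ratios(streams):
        nodes = set(n for _, ids in streams for n in ids)
        bw_avg = sum(bw for bw, _ in streams) / float(len(streams))

        # Pass one: per-node mean stream bandwidth.
        bw_measured = {}
        for n in nodes:
            through = [bw for bw, ids in streams if n in ids]
            bw_measured[n] = sum(through) / float(len(through))

        # Routers whose first-pass ratio is above the X=0.5 cutoff.
        normal_routers = set(n for n in nodes
                             if bw_measured[n] / bw_avg > 0.5)

        # Pass two: drop streams with sub-normal partners, and drop
        # only the low tail (more than one stddev below the mean).
        bw_measured2 = {}
        kept_all = []
        for n in nodes:
            through = [bw for bw, ids in streams if n in ids]
            sd = stddev(through)
            kept = [bw for bw, ids in streams
                    if n in ids
                    and all(m in normal_routers for m in ids if m != n)
                    and bw > bw_measured[n] - sd]
            if kept:
                bw_measured2[n] = sum(kept) / float(len(kept))
                kept_all.extend(kept)
        bw_avg2 = sum(kept_all) / float(len(kept_all))

        # Step 6: take the ratio closer to 1.0 if both agree on
        # direction, otherwise their mean.
        bw_final = {}
        for n in nodes:
            r1 = bw_measured[n] / bw_avg
            # Fall back to the pass-one mean if all of n's streams
            # were filtered out.
            r2 = bw_measured2.get(n, bw_measured[n]) / bw_avg2
            if (r1 - 1.0) * (r2 - 1.0) > 0:
                bw_final[n] = min((r1, r2), key=lambda r: abs(r - 1.0))
            else:
                bw_final[n] = (r1 + r2) / 2.0
        return bw_final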
 
> Also, I'm not sure what the motivation is.  Have you tried it without
> this step, and found that it didn't work so well?

Well, I was bothered by the fact that fast nodes were being penalized
by slow nodes when we happened to use both in the same circuit. I was
also bothered by the possibility of nodes being able to sabotage
faster nodes by acting really slow whenever they observed them as one
of their neighbors.

I actually notice better convergence if I employ filtering as well.

> > 4. Integration with Proposal 160
> > 
> >   The final results will be produced for the voting mechanism
> >   described in Proposal 160 by multiplying the derived ratio by 
> >   the average observed advertised bandwidth during the course of the
> >   scan. This will produce a new bandwidth value that will be 
> >   output into a file consisting of lines of the form:
> > 
> >   <node-idhex> SP new_bandwidth NL
> > 
> >   This file can be either copied or rsynced into a directory readable
> >   by the directory authority.
> 
> Let's make this a little more extensible, so we can add new fields in
> the future without breaking things.  Let's say that the format is
> 
>     "node id=<idhex> SP bw=<new_bandwidth> NL"
> 
> and specify that implementations must ignore any lines starting with
> any keywords they don't recognize, or any key=value values inside a
> line if they don't recognize the keyword.
> 
> That way, if we want to tell Tor more in the future, we can.
> 
> 
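For reference, the ignore-unknown rule keeps the parsing on the
authority side trivial; a rough sketch:

    # Minimal parser for the proposed file format.  Unknown keywords,
    # both whole lines and key=value pairs within a line, are ignored
    # per the extensibility rule above.  Sketch only.
    def parse_measurements(lines):
        results = {}
        for line in lines:
            parts = line.split()
            if not parts or parts[0] != "node":
                continue  # line starts with an unrecognized keyword
            fields = {}
            for kv in parts[1:]:
                if "=" in kv:
                    key, _, val = kv.partition("=")
                    fields[key] = val  # unrecognized keys go unused
            if "id" in fields and "bw" in fields:
                results[fields["id"]] = int(fields["bw"])
        return results
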
> * *
> 
> So, something that didn't come up: I think that it's important to have
> values for weights change a bit slowly based on node speed or
> slowness.  Otherwise, we can wind up with a situation where we notice
> a node is slow, so everybody leaves, so we notice it is fast, so
> everybody hammers it.  I _think_ that the above algorithm has this
> property--does it?
> 
> (Crazy idea 1: If the values change too fast or slow, we could
> exponentiate the ratios out of the algorithm before using them.
> Slightly less crazy idea 2: if the values change too fast, we could do
> a standard time-weighted average, where we compute the new declared
> bandwidth BW' as a weighted average of the old declared bandwidth and
> the new measured bandwidth.)

Yeah, I like idea 2; it's basically what you mentioned in 160. This
would be done on the Tor end, not the scanner end, right?
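
Concretely, I'm picturing something like this (a sketch; ALPHA is a
placeholder, not a tuned value):

    # Exponentially-smoothed bandwidth update, per idea 2.
    ALPHA = 0.25  # placeholder smoothing factor

    def smoothed_bw(old_declared_bw, new_measured_bw, alpha=ALPHA):
        # Weighted average of the old declared bandwidth and the new
        # measured bandwidth.
        return (1.0 - alpha) * old_declared_bw + alpha * new_measured_bw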


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs