Node-Weight Balancing Corrections

Thu Jan 21 00:17:33 UTC 2010

For purposes of load balancing, exit and guard node node bandwidths
are weighted by fractional values when they are probabilistically
selected for circuit positions other than their primary purpose. This
done is to govern scarcity and properly distribute load amongst the
node classes.

The exit weight and guard weight are both currently computed from the
formulas in: http://archives.seul.org/or/dev/Jul-2007/msg00056.html
and http://archives.seul.org/or/dev/Jul-2007/msg00021.html.

We unfortunately made two oversights in the derivation of these
formulas. Firstly, exit-only nodes cannot be placed in the Entry
position, and Guard-only nodes cannot be placed in the Exit position.
Secondly, Guard+Exit flagged nodes are not properly used in many cases
of scarcity.

These oversights are evident in measurement results and in review of
the source code. 

While looking over data for Bug 1217
(https://bugs.torproject.org/flyspray/index.php?do=details&id=1217), I
discovered that nodes that were flagged for both Guard and Exit seemed
to be being rated as significantly less loaded by the bandwidth
authorities than either Guards or Exits alone. I also noticed that
Guard, Middle, and Exit capacity measurements were vastly different on
average. These measurements were observed using the consensus
statistic gathering script in TorFlow:
https://svn.torproject.org/svn/torflow/trunk/NetworkScanners/statsplitter.py

The source code actually has three implementations of the above
formulas: one for guards, one for exits, and one for guard+exits
(which is simply a multiplication of the exit and guard result).

After spot-checking the source against different edge cases, it
quickly became apparent that for the case when either or both Guards
or Exits were scarce, we were not properly weighting Guard+Exit nodes
in smartlist_choose_by_bandwidth(). This can be seen by considering
what happens when exit_weight=0 and/or guard_weight=0 in that
function:

 if (is_exit && is_guard)
   bw = ((uint64_t)(bandwidths[i] * exit_weight * guard_weight));

Because of the multiplication, the result is that when both guards and
exits are scarce, Exit+Guard nodes cannot be used for either the Exit
or the Guard position. Additionally, they are not considered at their
full bandwidth weight for the exit position if exits but not guards
are scarce. Similarly, they are not properly weighted for the guard
position if guards but not exits are scarce. This is obviously wrong.

To derive the correct formulas for weighting Guard, Exit, and
Guard+Exit nodes, we actually need to consider four separate cases of
Exit and Guard capacity:

1. Neither are scarce
2. Both Guard and Exit are scarce
3. One of Guard or Exit is scarce

Case 1: Neither are scarce

This is the most general case, and also the most common for the
current Tor network.

Let G be the total Guard-only capacity.
Let M be the total capacity of all Middle-only nodes.
Let E be the total Exit-only capacity.
Let D be the total Exit+Guard capacity (Dual-flagged nodes)
Let T be the total network capacity.

Note that M, E, G, and D are disjoint, and thus:
 T = G + M + E + D

If neither Guards nor Exits are scarce, we know that E > T/3 and 
G > T/3. Therefore the remainder of the bandwidth for middle nodes
must be M+D < T/3. In this case, it is relatively easy to see that the
Guard+Exit bandwidth could be devoted exclusively to the middle
position.

However, this will reduce entropy in the network, and exclude some
potentially scarce, stable Exit+Guard nodes from ever being used as
Exits. Let's instead divide the Exit+Guard bandwidth equally between
the Guard, Middle, and Exit positions in this case.

In order to do this, we want to define a set of constraints such that
the amount of bandwidth from Guard (G), Middle (M), Exit (E), and
Guard+Exit (D) nodes devoted to the entry, middle, and exit positions
are equal.

Let Wgd be the weight for choosing a Guard+Exit for the guard position.
Let Wmd be the weight for choosing a Guard+Exit for the middle position.
Let Wed be the weight for choosing a Guard+Exit for the exit position.

Let Wme be the weight for choosing an Exit for the middle position.
Let Wmg be the weight for choosing a Guard for the middle position.

Let Wgg be the weight for choosing a Guard for the guard position.
Let Wee be the weight for choosing an Exit for the exit position.

The allocation equality condition is then:

  Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G   (guard bw = middle bw)
  Wgg*G + Wgd*D = Wee*E + Wed*D               (guard bw = exit bw)

We also want to constrain the Guard+Exit weights to divide the
bandwidth equally between the entry, middle, and exit positions:

  Wed*D + Wmd*D + Wgd*D = D               (aka: Wed + Wmd + Wdg = 1)
  Wed = Wmd = Wgd = 1/3

Also, since the amount of Guard and Exit bandwidth placed on the
middle node is also removed from the entry and exit positions:

  Wmg*G + Wgg*G = G                       (aka: Wgg = 1-Wmg)
  Wme*E + Wee*E = E                       (aka: Wee = 1-Wme)

We can then simplify down to 2 equations with 2 unknowns:

  1. (1-Wme)*E + D/3 = (1-Wmg)*G + D/3
  2. (1-Wmg)*G + D/3 = M + D/3 + Wme*E + Wmg*G

Solving these two equations gives:

  Wmg = (2G - E - M)/3G
  Wme = (2E - M - G)/3E

Here is a Mathematica line to verify this from the general formula:
Reduce[{Wgg*G + Wgd*D == M + Wmd*D + Wme*E + Wmg*G,
        Wgg*G + Wgd*D == Wee*E + Wed*D,
        Wed*D + Wmd*D + Wgd*D == D,
        Wed == Wmd, Wmd == Wgd, Wgd == 1/3,
        Wmg*G + Wgg*G == G, Wme*E + Wee*E == E},
        {Wgg, Wgd, Wmd, Wme, Wmg, Wee, Wed}]

Case 2: Both Guards and Exits are scarce

If both exits and guards are scarce, we need to determine weightings
to distribute the Exit+Guard bandwidth evenly between the guard and
exit positions, and ignore the middle position. First, we should
consider some subcases.

Let R denote the more scarce class (Rare) between Guard vs Exit.
Let S denote the less scarce class.

Subcase a: R+D <= S

For this case, we should simply devote all of D to the more scarce
condition. This is because we don't have enough Exit+Guard bandwidth
to make the more-scarce position have as much capacity as the
less-scarce one.

Subcase b: R+D > S

For this case, we should divide D to make the two scarce classes
equal.

Using Case 1's General Form:

  Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G    (guard bw = middle bw)
  Wgg*G + Wgd*D = Wee*E + Wed*D                (guard bw = exit bw)
  Wed + Wmd + Wgd = 1                          (Wed*D+Wmd*D+Wgd*D = D)
  Wgg = 1-Wmg                                  (Wmg*G + Wgg*G = G)
  Wee = 1-Wme                                  (Wme*E + Wee*E = E)

We want guard bandwidth to equal exit bandwidth, and also:
   Wgg = Wee = 1 and Wme = Wmg = Wmd = 0.

Therefore, we know we want the following two equations to hold:
   1.   G +     Wgd*D = E + Wed*D     (guard bw = exit bw)
   2.   Wed*D + Wgd*D = D             (properly divide D)

Solving for Wed by adding 1+2:
    2*Wed*D + E + Wgd*D = G + Wgd*D + D
    2*Wed*D = G + D - E
    Wed = (G+D-E)/2D
    Wed = (G-E)/2D + 1/2

Solving for Wgd by rewriting 2 as Wgd=1-Wed:
    Wgd = 1-(G+D-E)/2D
    Wgd = (E+D-G)/2D
    Wgd = (E-G)/2D + 1/2

Here is a Mathematica line to verify this:
Reduce[{G + Wgd*D == E + Wed*D,
        Wed*D + Wgd*D == D},
       {Wed, Wgd}]

Case 3: One of Exit or Guard is scarce

If only one of exits or guards are scarce, we want to proceed
similarly as Case 2.

Let S be the scarce class.

This has two subcases:

Subcase a: S+D < T/3

In this case, we can simply treat D as the scarce class, and attempt
to balance the load between the non-scarce class and the middle nodes.
For simplification's sake, lets say S=G.

Using Case 1's General Form:

  Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G    (guard bw = middle bw)
  Wgg*G + Wgd*D = Wee*E + Wed*D                (guard bw = exit bw)
  Wed + Wmd + Wgd = 1                          (Wed*D+Wmd*D+Wgd*D = D)
  Wgg = 1-Wmg                                  (Wmg*G + Wgg*G = G)
  Wee = 1-Wme                                  (Wme*E + Wee*E = E)

We then want Wgg = Wgd = 1 and Wmd = Wed = Wmg = 0.

We're now left with:

  G + D < M + Wme*E                            (guard bw < middle bw)
  G + D < Wee*E                                (guard bw < exit bw)
  M + Wme = Wee*E                              (middle bw = exit bw)
  Wee = 1-Wme                                  (Wme*E + Wee*E = E)

Which is only two equations with two unknowns.

Solving for Wme:

  M + Wme = (1-Wme)*E
  M + Wme = E - Wme*E
  Wme + Wme*E = E - M
  (1+E)*Wme = E-M
  Wme = (E-M)/(1+E)

  Wee = (1+E)/(1+E) - (E-M)/(1+E)
  Wee = (1+M)/(1+E)

Here is a Mathematica line to verify this:
Reduce[{M + Wme == Wee*E, Wee == 1-Wme}, {Wme, Wee}]

Subcase b: S+D >= T/3

In this case, the formerly scarce class is no longer scarce, and now
we have a condition that is similar to Case 1 with the exception that
we want to change the D weighting from 1/3 for position S to whatever
quantity brings it up to T/3. For simplification's sake, lets say S=G:

  G+Wgd*D = T/3                                (guard bw = T/3)
  Wgd = (T/3-G)/D

Using Case 1's General Form:

  Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G    (guard bw = middle bw)
  Wgg*G + Wgd*D = Wee*E + Wed*D                (guard bw = exit bw)
  Wed + Wmd + Wgd = 1                          (Wed*D+Wmd*D+Wgd*D = D)
  Wgg = 1-Wmg                                  (Wmg*G + Wgg*G = G)
  Wee = 1-Wme                                  (Wme*E + Wee*E = E)

We then want Wgg = 1 and Wmg = 0.

We can then evenly divide D between middle and exit positions:

  Wed = Wmd

We're now left with:

  G + Wgd*D = M + Wed*D + Wme*E
  G + Wgd*D = (1-Wme)*E + Wed*D
  2*Wed*D + Wgd*D = D
  Wgd = (T/3-G)/D

This gives:

  Wed = 1/2 - (T/3-G)/2D
  Wme = (E-M)/2E

Here is a Mathematica line to verify this:
Reduce[{G + Wgd*D == M + Wed*D + Wme*E,
        G + Wgd*D == (1-Wme)*E + Wed*D,
        2*Wed*D + Wgd*D == D,
        Wgd == (T/3-G)/D,
        T == G + M + E + D},
        {Wed, Wme, Wgd}]

Implementation Notes:

The current smartlist_choose_by_bandwidth() calculates Guard, Exit,
and Total bandwidth based on the smartlist passed in, which has
already been restricted to viable nodes. We'll need to refactor this
function to receive G, E, M, D and T as parameters independent of the
current node selection list, which is a local position property
independent from the global capacity and balance of the network.

Furthermore, even these weights are approximate, because in actual
operation not all Exit nodes are equal. In fact, some nodes lack an
Exit flag, but still contribute Exit bandwidth for certain ports. We
could envision dividing the classes such that E and D are counted only
for the current exit port instead of purely based on flag, but this is
not always known at time of circuit construction.

The implementation should verify that the total weighted bandwidth
available to each position is properly balanced according to the
current scarcity case of the network. This will be difficult to do in
a unit test, but there should be debug ifdefs on non-release Tors to
check these conditions on the current consensus during selection.

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20100120/46a1c9d6/attachment.pgp>