[tor-dev] Proposal: Load Balancing with Overhead Parameters

Mike Perry mikeperry at torproject.org
Thu Jan 14 16:07:47 UTC 2016

Tim Wilson-Brown - teor:
> > On 13 Jan 2016, at 00:53, Mike Perry <mikeperry at torproject.org> wrote:
> > 1. Overview
> > 
> > For padding overhead due to Proposals 251 and 254, and changes to hidden
> > service path selection in Proposal 247, it will be useful to be able to
> > specify a pair of parameters that represents the additional traffic
> > present on Guard and Middle nodes due to these changes.
> I don't know if it's worth noting that proposals 252 and 260 (the Single Onion
> Services variants) will reduce traffic to guard and middle nodes, as they
> remove the nodes between the hidden service and client-controlled endpoint
> (rendezvous point in Rendezvous Single Onion Services, extend point in
> Single Onion Services).
> I think these will just be part of the overhead parameters?

Probably, though this will require us to have some estimation on the
amount of network traffic using these services. Will they be broken out
separately in the extra-info hidden service stats?

> > 2. Simplified Load Balancing Equations
> > 
> > Recall that the point of the load balancing equations in section 3.8.3
> > of dir-spec.txt is to ensure that an equal amount of client traffic is
> > distributed between Guards, Middles, Exits, and Guard+Exits, where each
> > flag type can occupy one or more positions in a path. This allocation is
> > accomplished by solving a system of equations for weights for flag
> > position selection to ensure equal allocation of client traffic for each
> > position in a circuit.
> > 
> > If we ignore overhead for the moment and treat Guard+Exit nodes as Exit
> > nodes, then this allows the simplified system of equations to become:
> > 
> >  Wgg*G == M + Wme*E + Wmg*G    # Guard position == middle position
> >  Wgg*G == Wee*E                # Guard position == equals exit position
> >  Wmg*G + Wgg*G == G            # Guard allocation weights sum to 1
> >  Wme*E + Wee*E == E            # Exit allocation weights sum to 1
> I think this analysis assumes that all paths in the Tor network are 3-hop paths.

That is correct.
> What about the 4-hop paths created by circuit cannibalisation?
> (We don't actually know what proportion of the Tor network has 3-hop vs 4-hop paths.)
> Depending on whether an exit or internal circuit is cannibalised, they can look like:
> G M E E
> G M M E
> Or, if cannibalised from an exit or internal circuit:
> G M E M
> G M M M
> Again, I think these will just be part of the overhead parameters?

Ugh. This will be complicated to work out. Cannibalization rate is a
function of how often you need to build a path, and have circuits
available, but none with the right exit/endpoint at the time. This will
be hard to predict accurately in the wild.

Cannibalization has been the bane of my existence every time I've made
any performance or path selection change to Tor. It was an optimization
introduced because TAP was believed to be too CPU-expensive to allow us
to opt to build fresh circuits when we needed a different endpoint than
was currently available in the open set, and so we needed to add the
new required endpoint to an existing open circuit instead.

Now that we have ntor, I am wondering if we can just remove
cannibalization. My guess is that without it, we will pay some up-front
latency cost in terms of needing to build fresh full circuits in some
(rare?) cases, but probably will have faster and higher capacity
circuits for those cases once they are built (especially since the
Circuit Build Timeout cutoff works better without factoring in
cannibalization time, as well). Circuit build prediction should also
reduce this cost over time, since it will ensure that we always have a
circuit open to recently used endpoint types (either exit ports or
hidserv-capable circs).

David Goulet supposedly has a simulator+test script to examine the
impact of cannibalization. He said that he'd look into updating it for
0.2.7/master, but here's a graph of his results from 0.2.5:

Basically, that says that unless you make a whole lot of rendezvous
connections, cannibalized circuits are slower. I'm not sure why their
speed would be a function of quantity at all.. As I said, circuit
prediction should reduce the need for cannibalization over time, but it
shouldn't actually make the circuits that do get cannibalized any

All told, I think in the interest of simplification and also better load
balancing, we should still kill cannibalization for most cases. The one
case where I think it may still be desired is in the connection to the
hsdir, since it seems natural to just extend an existing circuit rather
than build a whole new one there.

> And what about hidden service paths (paths that include two middle nodes?)
> G M M

This one definitely needs to be included in the M_o overhead. I believe
the right additive amount is the current estimated percentage of network
hidden service traffic (by bytes), but we need to make sure we use the
estimate that reflects the fact that two hidden service paths are used
for each hidden service connection. So: network total hidden service
traffic percentage, not actual total hidden service throughput.

Does that make sense?

Mike Perry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Digital signature
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160114/e794ec6f/attachment.sig>

More information about the tor-dev mailing list