[tor-dev] Proposal: Load-balancing hidden services by splitting introduction from rendezvous

Nick Mathewson nickm at alum.mit.edu
Wed Oct 7 18:08:59 UTC 2015


On Wed, Sep 30, 2015 at 11:27 AM, Tom van der Woerdt <info at tvdw.eu> wrote:
> Hey all,
>
> I'd like your thoughts and comments on this proposal.
>
> Tom
>
>
> PS: If you want to deliver them in person, I'm in Berlin.
>
>
>
>
> Filename: xxx-intro-rendezvous-controlsocket.txt
> Title: Load-balancing hidden services by splitting introduction from
>        rendezvous


IMO great idea.   I ignored it until the Berlin meeting because the
title didn't reflect what it actually does in a way I understood.
Instead I would suggest a title more like:
   "Controller features to allow hidden-service introduce2 handling to
happen on a separate host from rendezvous2 sending"

> Author: Tom van der Woerdt
> Created: 2015-09-30
> Status: draft
>
> 1. Overview and motivation
>
> To address scaling concerns with the onion web, we want to be able to
> spread the load of hidden services across multiple machines.
> OnionBalance is a great stab at this, and it can currently give us 60x
> the capacity by publishing 6 separate descriptors, each with 10
> introduction points, but more is better. This proposal aims to address
> hidden service scaling up to a point where we can handle millions of
> concurrent connections.
>
> The basic idea involves splitting the 'introduce' from the
> 'rendezvous', in the tor implementation, and adding new events and
> commands to the control specification to allow intercepting
> introductions and transmitting them to different nodes, which will then
> take care of the actual rendezvous. External controller code could
> relay the data to another node or a pool of nodes, all of which are run by
> the hidden service operator, effectively distributing the load of
> hidden services over multiple processes.
>
> By cleverly utilizing the current descriptor methods, we could publish
> up to sixty unique introduction points, which could translate to many
> thousands of parallel tor workers. This should allow hidden services to
> go multi-threaded, with a few small changes.
>
>
> 2. Specification
>
> We propose two additions to the control specification, of which one is
> an event and the other is a new command. We also introduce a new
> configuration option.
>
>
> 2.1. DisableAutomaticRendezvous configuration option
>
> The syntax is:
>     "DisableAutomaticRendezvous" SP [1|0] CRLF
>
> This configuration option is defined to be a boolean toggle which, if
> set, stops the tor implementation from automatically doing a rendezvous
> when an INTRODUCE2 cell is received. Instead, an event will be sent to
> the controllers. If no controllers are present, the introduction cell
> should be dropped, as acting on it instead of dropping it could open a
> window for a DoS.
>
> For security reasons, the configuration should be made available only
> in the configuration files, and not as an option settable by the
> controller.
>
>
> 2.2. The "INTRODUCE" event
>
> The syntax is:
>     "650" SP "INTRODUCE" SP RendezvousData CRLF
>
>     RendezvousData = implementation-specific, but must not contain
>                      whitespace, must only contain human-readable
>                      characters, and should be no longer than 512 bytes
>
> The INTRODUCE event should contain sufficient data to allow continuing
> the rendezvous from another Tor instance. The exact format is left
> unspecified and left up to the implementation. It follows that
> only matching versions can be used safely to coordinate the rendezvous
> of hidden service connections.

Recommendation: Allow it to be longer than 512 bytes (futureproofing),
and rename it to something like "INTRODUCE_REQUEST_RECEIVED".

Recommendation: Specify what it would look like as implemented for today's Tor.
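
To make the event syntax above concrete, here is a minimal sketch of how
a controller might pull RendezvousData out of an event line. It is only
an illustration under stated assumptions: the function name
parse_introduce_event is hypothetical, "human-readable characters" is
read here as printable non-whitespace ASCII, and no length limit is
enforced because the recommendation above is to drop the 512-byte cap.

```python
import string

# Assumption: "human-readable" means printable ASCII without whitespace.
ALLOWED = set(string.printable) - set(string.whitespace)


def parse_introduce_event(line: str) -> str:
    """Return the opaque RendezvousData from a '650 INTRODUCE <data>' line.

    Hypothetical helper; the proposal leaves the payload format
    implementation-specific, so it is treated as an opaque token.
    """
    if not line.endswith("\r\n"):
        raise ValueError("control lines end in CRLF")
    parts = line[:-2].split(" ")
    if len(parts) != 3 or parts[0] != "650" or parts[1] != "INTRODUCE":
        raise ValueError("not an INTRODUCE event")
    data = parts[2]
    if not data or any(ch not in ALLOWED for ch in data):
        raise ValueError("RendezvousData must be printable, without whitespace")
    return data
```

The payload is returned untouched so that it can be handed to another
Tor instance without the controller having to understand it.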


>
> 2.3. "PERFORM-RENDEZVOUS" command
>
> The syntax is:
>   "PERFORM-RENDEZVOUS" SP RendezvousData CRLF
>
> This command allows a controller to perform a rendezvous using data
> received through an INTRODUCE event. The format of RendezvousData is
> not specified other than that it must not contain whitespace, and
> should be no longer than 512 bytes.

Recommendation: Allow it to be longer than 512 bytes (futureproofing),
and rename it to something like "ANSWER_RENDEZVOUS".

Recommendation: Specify what it would look like as implemented for today's Tor.
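
As a rough illustration of the command syntax quoted above, a controller
could frame the command like this. The helper name
build_perform_rendezvous is hypothetical, the command name is the one
from the draft (not the suggested rename), and the payload is assumed to
be ASCII; none of this is implemented in Tor today.

```python
def build_perform_rendezvous(rendezvous_data: str) -> bytes:
    """Frame a PERFORM-RENDEZVOUS command for the control port.

    Hypothetical helper: checks only the one constraint the draft
    states outright (no whitespace) and appends the CRLF terminator.
    """
    if not rendezvous_data or any(ch.isspace() for ch in rendezvous_data):
        raise ValueError("RendezvousData must be non-empty, without whitespace")
    # encode("ascii") raises UnicodeEncodeError on non-ASCII input.
    return f"PERFORM-RENDEZVOUS {rendezvous_data}\r\n".encode("ascii")
```

The returned bytes would be written to an already authenticated control
connection on the node that is supposed to perform the rendezvous.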


> 3. Compatibility and security
>
> The implementation of these methods should, ideally, not change
> anything in the network, and all control changes are opt-in, so this
> proposal is fully backwards compatible.
>
> Controllers handling this data must be careful to not leak rendezvous
> data to untrusted parties, as it could be used to intercept and
> manipulate hidden services traffic.
>
>
> 4. Example
>
> Let's take an example where a client (Alice) tries to contact Bob's
> hidden service. To do this, Bob follows the normal hidden service
> specification, except he sets up ten servers to do this. One of these
> publishes the descriptor, the others have this disabled. When the
> INTRODUCE2 cell arrives at the node which published the descriptor, it
> does not immediately try to perform the rendezvous, but instead outputs
> this to the controller. Through an out-of-band process this message is
> relayed to a controller of another node of Bob's, and this transmits
> the "PERFORM-RENDEZVOUS" command to that node. This node finally
> performs the rendezvous, and will continue to serve data to Alice,
> whose client will now not have to talk to the introduction point
> anymore.
>
>
> 5. Other considerations
>
> We have left the actual format of the rendezvous data in the control
> protocol unspecified, so that controllers do not need to worry about
> the various types of hidden service connections, most notably proposal
> 224.

IMO we need to specify what this looks like for current hidden
services, and for hidden services under proposal 224, or else this
proposal is not complete.

> The decision to not implement the actual cell relaying in the tor
> implementation itself was taken to allow more advanced configurations,
> and to leave the actual load-balancing algorithm to the implementor of
> the controller. The developer of the tor implementation should not
> have to choose between a round-robin algorithm and something that could
> pull CPU load averages from a centralized monitoring system.
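
The split described above, with the balancing policy living in the
controller, could be sketched roughly as follows. Everything here is
illustrative: the worker names, the loads mapping, and the functions
round_robin, least_loaded and dispatch are hypothetical, and the actual
transport to each worker's control port is left out.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Tuple


def round_robin(workers: List[str]) -> Iterator[str]:
    """Yield workers in a fixed rotation, forever."""
    return itertools.cycle(workers)


def least_loaded(loads: Dict[str, float]) -> Iterator[str]:
    """Yield whichever worker currently reports the lowest load.

    'loads' is assumed to be updated elsewhere, for example by a
    monitoring system; this generator rereads it on every pick.
    """
    while True:
        yield min(loads, key=loads.get)


def dispatch(payloads: Iterable[str],
             picker: Iterator[str]) -> List[Tuple[str, str]]:
    """Pair each INTRODUCE payload with a worker and the command for it."""
    plan = []
    for data in payloads:
        worker = next(picker)
        plan.append((worker, f"PERFORM-RENDEZVOUS {data}\r\n"))
    return plan
```

Swapping round_robin(...) for least_loaded(...) changes the policy
without touching dispatch, which is the kind of flexibility the quoted
paragraph argues for.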

