Hello people,
you might have noticed that there is a redesign of hidden services going on.
For a while we've been told that "hidden services don't scale" and "there is a max number of clients that a hidden service can handle" so we decided to also consider hidden service scalability as part of the upcoming redesign. Unfortunately, we are not experienced in maintaining busy hidden services so we need some help here.
For example, a load balancing technique that hidden services are missing is DNS round-robin; that is, something that load balances on the IP layer (you send some clients to one IP and the rest to another IP). To do the equivalent in hidden services you have to do the load distribution in the Introduction Point layer. Some people have been thinking about this: https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html https://lists.torproject.org/pipermail/tor-dev/2013-October/005674.html
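To make the DNS round-robin idea concrete, here is a minimal Python sketch. It is not taken from any Tor code: the addresses and the `assign_clients` helper are made up for illustration, and a real DNS server rotates the order of records in its answers rather than tracking clients. It only shows the effect described above, with some clients ending up on one IP and the rest on others.

```python
from collections import Counter
from itertools import cycle

# Hypothetical backend addresses (RFC 5737 documentation range).
BACKENDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

def assign_clients(clients, backends):
    """Hand out backends in rotation, one per client, round-robin style."""
    rotation = cycle(backends)
    return {client: next(rotation) for client in clients}

# Nine hypothetical clients spread over three backends.
assignment = assign_clients([f"client{i}" for i in range(9)], BACKENDS)
load = Counter(assignment.values())
print(load)
```

Doing the equivalent for hidden services would mean making this kind of choice at the Introduction Point layer, which is what the linked proposals try to do.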
Unfortunately, the above designs are not trivial to implement. They require the hidden service peers to communicate with each other so that they can generate and recycle keys, and they also put Introduction Points in a position where they can find out the status or number of hidden service peers.
For this reason we started wondering whether DNS-round-robin-like scalability is actually worth such trouble. AFAIK most big websites use DNS round-robin, but is it necessary? What about application-layer solutions like HAProxy? Do application-layer load balancing solutions exist for other (stateful) protocols (IRC, XMPP, etc.)?
What should we do to make Hidden Services scale to a large number of clients?
Also, is "scalability" the correct phrase here? Should we replace it with a combination of "load balancing", "high availability", or something else?
[I sent this mail twice because the first did not get posted in tor-dev.]
On Fri, 20 Dec 2013, George Kadianakis wrote:
I think HAProxy and DNS round-robin solve different kinds of scalability issues:
- the application running behind the hidden service is taking too many resources for a single node. Something like HAProxy can help split the application across multiple nodes, but all traffic still goes through a single node.
- the network bandwidth that a single node can have, or the number of connections it can handle, is not enough. A solution like HAProxy doesn't help, because everything still has to go through a single node. I think DNS round-robin is used to solve this kind of problem.
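The difference between the two cases can be put in numbers with a rough Python sketch. The `connections_per_node` function, the node names, and the even split across backends are all assumptions made for illustration; they do not describe how HAProxy or any DNS server is actually implemented.

```python
def connections_per_node(n_clients, n_backends, mode):
    """Count the client connections each node has to carry.

    mode "proxy": one front node accepts everything and forwards it.
    mode "round_robin": clients connect directly to one of the backends.
    """
    base, extra = divmod(n_clients, n_backends)
    # Spread clients as evenly as possible over the backends.
    backends = {
        f"backend{i}": base + (1 if i < extra else 0)
        for i in range(n_backends)
    }
    if mode == "proxy":
        # The front node still sees every single connection.
        return {"front": n_clients, **backends}
    if mode == "round_robin":
        return backends
    raise ValueError(f"unknown mode: {mode}")

proxy_load = connections_per_node(1000, 4, "proxy")
rr_load = connections_per_node(1000, 4, "round_robin")
print(max(proxy_load.values()), max(rr_load.values()))
```

In the proxy case the busiest node carries all 1000 connections no matter how many backends are added, while in the round-robin case the busiest node carries only its share.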
You'll probably run out of CPU crunching circuits before bandwidth or anything else. Multipublishing the same onion doesn't really work. Some services do code different onions into the URLs embedded in their pages, to serve as their own CDN/scaling layer, but the first onion still takes the initial handoff hit. DNS over Tor does work fine with OnionCat IPv6, but the Tor devs seem determined to break this style of transport overlay with the new HS proposals, without providing a replacement. There is some partial utility in TCP resolvers on the host, but that's not as useful as an overlay.
As some people said, another aspect of this project is increasing the 'availability' of hidden services.
That is, increasing the number of nodes of a Hidden Service so that even if one is down, the rest can still handle clients. This is not something that application-layer solutions like HAProxy can solve.
This even has security implications for Hidden Services, since DoS attacks can lead to privacy attacks. For example, if someone suspects that a Hidden Service is hosted on a specific IP, they can take that IP offline and do a confirmation attack by checking whether the HS is still reachable. If it's not reachable, chances are that the suspicion was correct.
If we could somehow allow multiple nodes to handle introductions and rendezvous requests, we might be able to make HSes more secure against DoS confirmation attacks like the above.
Unfortunately, this doesn't seem easy to do properly [0]. And of course, an attacker can still do confirmation attacks after the rendezvous is done, by sending specific amounts of data to the HS and checking the network of the suspected IP for that amount of incoming data...
[0]: For example, the proposed "scalable HS" solutions allow multiple nodes to handle introductions for an HS, but in the proposed designs the same node that handles the introduction is also the one that handles the rendezvous. If an attacker suspects a specific node to be part of an HS, she can do an introduction to it, then DoS it, and see if the node ever shows up at the rendezvous point. If it doesn't, then the confirmation might be successful. There are probably even more subtle attacks that an attacker can do if she wants to do a confirmation attack against an HS node during the introduction or rendezvous steps.