[tor-dev] Proposal Waterfilling

Wed Mar 7 17:34:58 UTC 2018

Hi Florentin,

I've added some comments below.

Overall, I think a useful discussion for the community to have is whether or not we think Waterfilling is even a good idea in the first place, before you go ahead and do a bunch of work writing and fixing a proposal that may just end up in the pile of old grad student research ideas. (Maybe I'm too late, or maybe you want a proposal out there in any case.)

> On Mar 7, 2018, at 3:28 AM, Florentin Rochet <florentin.rochet at uclouvain.be> wrote:
> 
> Hi Aaron,
> 
> Thanks for your comments, you are definitely touching interesting aspects.
> 
> Here are thoughts regarding your objections:
> 
> 1) The cost of IPs vs. bandwidth is definitely a function of market offers. Your $500/Gbps/month seems quite expensive compared to what can be found on OVH (which is hosting a large number of relays): they ask ~3 euros/IP/month, including unlimited 100 Mbps traffic. If we assume that wgg = 2/3 and a water level at 10Mbps, this means that, if you want to have 1Gbps of guard bandwidth,
> - the current Tor mechanisms would cost you 3 * 10 * 3/2 = 45 euros/month
> - the waterfilling mechanism would cost you 3 * 100 = 300 euros/month
> 
> We do not believe that this is conclusive, as the market changes, and there certainly are dozens of other providers.
> 
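The arithmetic in the two bullet points above can be reproduced with a short Python sketch. It takes the quoted figures at face value (3 euros per IP per month, 100 Mbps per IP treated as fully usable, wgg = 2/3, a 10 Mbps water level). Like the quoted arithmetic, it assumes that under waterfilling each relay contributes exactly the water level to the guard position. The function and constant names are illustrative only.

```python
import math
from fractions import Fraction

def monthly_cost(target_mbps, per_relay_guard_mbps, eur_per_ip):
    """Relays (one IP each) and monthly cost to supply target_mbps in guard position."""
    relays = math.ceil(Fraction(target_mbps) / per_relay_guard_mbps)
    return relays, relays * eur_per_ip

LINK_MBPS = 100          # advertised per-IP link speed (assumed fully usable)
WGG = Fraction(2, 3)     # assumed guard-position weight wgg
WATER_LEVEL_MBPS = 10    # assumed water level
EUR_PER_IP = 3           # quoted price per IP per month
TARGET_MBPS = 1000       # 1 Gbps of guard-position bandwidth

# Current weighting: each relay puts wgg * 100 Mbps into the guard position.
current = monthly_cost(TARGET_MBPS, LINK_MBPS * WGG, EUR_PER_IP)
# Waterfilling: each relay's guard-position share is capped at the water level.
waterfilling = monthly_cost(TARGET_MBPS, WATER_LEVEL_MBPS, EUR_PER_IP)
print(current)       # (15, 45)
print(waterfilling)  # (100, 300)
```

`math.ceil` reflects that relays come in whole units, and `Fraction` keeps 1000 / (100 * 2/3) exact so no floating-point rounding happens before the ceiling is taken.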

Have you purchased service from OVH and run relays yourself? Have you talked to anyone who has? I strongly believe that you will not find a provider that legitimately offers you continuous 100 Mbit/s over a long period of time for 3 euros. Providers tend to use "unmetered" and "unlimited" bandwidth as marketing terms, but they don't actually mean what you think unlimited means. What they mean is that you have a 100 Mbit/s network card, and they allow you to burst up to the full 100 Mbit/s. However, they usually have a total bandwidth cap on such service, or they become angry and threaten to disconnect your service if you don't cut down your usage (this has happened to me).

It is far more expensive to obtain *continuous*, i.e., *sustained* bandwidth usage over time. Generally, it's cheaper to buy in bulk. In the US, the cheapest bandwidth service we found (that also allows us to run Tor relays) was one that offers sustained 1 Gbit/s for an average of $500/month (including service fees).

> The same applies to 0-day attacks: if you need to buy them just for attacking Tor, then they are expensive. If you are an organization in the business of handling 0-day attacks for various other reasons, then the costs are very different. And it may be unclear whether it is easier/cheaper to compromise 1 top relay or 20 mid-level relays.
> 

It's hard to reason about this, since I'm not in the business. However, if you already have a zero-day, why would you want to waste it on a Tor relay? You would risk being discovered accessing the machine of a likely security-conscious relay operator, and you could just run your own relays. Running your own relays does have some cost, but it is far easier to manage and more reliable, since you don't have to worry about being discovered or losing access because the software is patched.

> And we are not sure that the picture is so clear about botnets either: bots that can become guards need to have high availability (in order to pass the guard stability requirements), and such high availability bots are also likely to have a bandwidth that is higher than the water level (abandoned machines in university networks, ...). As a result, waterfilling would increase the number of high availability bots that are needed, which is likely to be hard.
> 

I think it's much more likely that bots are running on my parents' Windows machines than on high-bandwidth university machines. Sure, there might be some machines with outdated OSes out there on university networks, but they are also monitored pretty heavily for suspicious activity by the university IT folks, who regularly check in with the machine owners when anything suspicious occurs on the network.

> 2) Waterfilling makes it necessary for an adversary to run a larger number of relays. Apart from the costs of service providers, this large number of relays needs to be managed in an apparently independent way, otherwise they would become suspicious to community members, like nusenu, who is doing a great job spotting all anomalies. It seems plausible that running 100 relays in such a way that they look independent is at least as difficult as doing that with 10 relays.
> 

But not much more difficult, and not difficult enough that an intern could not whip up a managed deployment in a few weeks. There are various tools out there that can automate software installation and configuration. Ansible, Chef, and Puppet are popular ones, but here is a longer list:

https://en.wikipedia.org/wiki/Comparison_of_open-source_configuration_management_software

I would be surprised if at least TorServers.net didn't already use something like this, since they manage a large number of relays.

> 3) The question of the protection from relays, ASes or IXPs is puzzling, and we do not have a strong opinion about it. We focused on relays because they are what is available to any attacker, compared to ASes or IXPs which are more specific adversaries. But, if there is a consensus that ASes or IXPs should rather be considered as the main target, it is easy to implement waterfilling at the AS or IXP level rather than at the IP level: just aggregate the bandwidth relayed per AS or IXP, and apply the waterfilling level computation method to them. Or we could mix the weights obtained for all these adversaries, in order to get some improvement against all of them instead of an improvement against only one and being agnostic about the others.
> 
> 4) More fundamentally, since the core idea of Tor is to mix traffic through a large number of relays, it seems to be a sound design principle to make the choice of the critical relays as uniform as possible, as Waterfilling aims to do.
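The step described in point 3 ("aggregate the bandwidth relayed per AS or IXP, and apply the waterfilling level computation method to them") can be sketched in Python. This is a minimal sketch under stated assumptions, not the proposal's exact method. It assumes the guard-position budget is `wgg` times the total capacity, as with the current weighting. It finds the level L such that the sum of min(c, L) equals that budget and caps each group's guard-position weight at L. The function names and the example AS labels and bandwidths are hypothetical, and how a group's capped weight would be split among its relays is left out.

```python
from collections import defaultdict

def aggregate_by_group(relays):
    """Sum relay bandwidths per group key (e.g. an AS number or an IXP)."""
    totals = defaultdict(float)
    for group, bandwidth in relays:
        totals[group] += bandwidth
    return dict(totals)

def water_level(capacities, target):
    """Return L such that sum(min(c, L) for c in capacities) == target."""
    if not 0 <= target <= sum(capacities):
        raise ValueError("target must lie between 0 and the total capacity")
    caps = sorted(capacities)
    n = len(caps)
    filled = 0.0  # capacity of entities lying entirely below the water level
    for k, c in enumerate(caps):
        level = (target - filled) / (n - k)  # spread the remainder evenly
        if level <= c:
            return level
        filled += c
    return 0.0  # only reached when capacities is empty (so target is 0)

def guard_weights(capacities, wgg):
    """Cap each entity's guard-position weight at the water level."""
    level = water_level(capacities.values(), wgg * sum(capacities.values()))
    return {name: min(c, level) for name, c in capacities.items()}

def selection_probabilities(weights):
    """Normalize weights into guard selection probabilities."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical relays as (AS, bandwidth in Mbps) pairs.
relays = [("AS1", 100.0), ("AS1", 100.0), ("AS2", 50.0), ("AS3", 30.0), ("AS4", 20.0)]
per_as = aggregate_by_group(relays)
weights = guard_weights(per_as, wgg=2 / 3)
print(selection_probabilities(per_as)["AS1"])   # ~0.667 with plain bandwidth weighting
print(selection_probabilities(weights)["AS1"])  # ~0.5 with the water level applied
```

In this made-up example, AS1 holds two thirds of the bandwidth and would receive two thirds of plain bandwidth-weighted guard selections. With the water level applied, its share drops to one half, and the rest of its bandwidth stays available for the middle position.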

I think this is the crux of my disagreement. We should base relay choice on security and/or performance, whether or not that means uniform choice. In a world where it is more costly to start up new relays than it is to run high bandwidth relays, the waterfilling approach may improve security. But, in my opinion, there are too many open questions and speculations going on here to be convinced that that's the world we live in.

> A casual Tor user may be concerned to see that his traffic is very likely to be routed through a very small number of top relays, and this effect is likely to increase as soon as a multi-core capable implementation of Tor arrives (Rust development). Current top relays, which suffer from the main CPU bottleneck, will probably be free to relay even more bandwidth than they already do, and gain an even more disproportionate consensus weight. Waterfilling might prevent that, and keep those useful relays doing their job at the middle position of paths.
> 

I run several high bandwidth relays. I can say that the only thing that eliminating the CPU bottleneck would do for me is allow me to run fewer relays in order to consume the bandwidth available on my machine; I still control the same amount of bandwidth overall. The fact that I have to run 4 relays to consume my bandwidth vs. 1 relay does not have any impact on my decision of whether or not I will run more; the main criterion in that decision is cost.

> We hope those thoughts can help, and thanks again for sharing yours.

I hope my perspective on things is useful in some way!

Best,
Rob