On 1 Sep 2015, at 07:45, Philipp Winter <phw@nymity.ch> wrote:
We sometimes see attacks from relays that are hosted on cloud platforms. I have been wondering if the benefit of having cloud-hosted relays outweighs the abuse we see from them.
To get an idea of the benefit, I analysed the bandwidth that is contributed by cloud-hosted relays. I first obtained the network blocks owned by three cloud providers (Amazon AWS, Google Cloud, Microsoft Azure), and determined the percentage of bandwidth they contributed in July 2015. The results show that there were typically ~200 cloud-hosted relays online:
https://nymity.ch/sybilhunting/png/cloud-hosted_relays_2015-07.png
The spike shortly after hour 200 was caused by a large number of Amazon relays named "DenkoNet". The spike at the very beginning was caused by a number of relays that might very well belong together, too, based on their uptime pattern.
What counts, however, is bandwidth. Here's the total bandwidth fraction contributed by cloud-hosted relays over July 2015:
https://nymity.ch/sybilhunting/png/cloud-hosted_bandwidth_2015-07.png
There were no Google Cloud relays to contribute any bandwidth. Amazon AWS-powered relays contributed the majority of bandwidth, followed by Microsoft Azure-powered relays. Here's a summary of the time series in percent:

Min.    Mean    Median    Max.
0.2%    0.8%    0.79%     1.5%
In an average consensus in July 2015, cloud-hosted relays contributed only around 0.8% of bandwidth. Note, however, that this is just a lower bound. The netblocks I used for the analysis could have changed, and I didn't consider providers other than Google, Amazon, and Microsoft.
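A minimal sketch of this kind of measurement, not the script that produced the numbers above. It assumes each consensus is already available as a list of (IP address, bandwidth) pairs; the netblocks are documentation ranges standing in for real provider ranges, and the names `CLOUD_NETBLOCKS`, `provider_of`, `cloud_bandwidth_fraction`, and `summarize` are hypothetical.

```python
import ipaddress
from statistics import mean, median

# Placeholder netblocks (RFC 5737 documentation ranges), NOT real provider ranges.
CLOUD_NETBLOCKS = {
    "ExampleCloudA": [ipaddress.ip_network("192.0.2.0/24")],
    "ExampleCloudB": [ipaddress.ip_network("198.51.100.0/24")],
}


def provider_of(address, netblocks=CLOUD_NETBLOCKS):
    """Return the provider whose netblock contains address, or None."""
    ip = ipaddress.ip_address(address)
    for provider, networks in netblocks.items():
        if any(ip in net for net in networks):
            return provider
    return None


def cloud_bandwidth_fraction(relays, netblocks=CLOUD_NETBLOCKS):
    """Fraction of bandwidth in one consensus that comes from cloud relays.

    relays is an iterable of (ip_address_string, bandwidth) pairs.
    """
    total = 0
    cloud = 0
    for address, bandwidth in relays:
        total += bandwidth
        if provider_of(address, netblocks) is not None:
            cloud += bandwidth
    return cloud / total if total else 0.0


def summarize(fractions):
    """Min, mean, median, and max over a time series of fractions."""
    return {
        "min": min(fractions),
        "mean": mean(fractions),
        "median": median(fractions),
        "max": max(fractions),
    }


# One made-up consensus: a single relay falls inside a placeholder netblock.
consensus = [("192.0.2.10", 100), ("203.0.113.5", 900)]
fraction = cloud_bandwidth_fraction(consensus)
```

Running `cloud_bandwidth_fraction` over every consensus in a month and passing the resulting list to `summarize` would give a table of the same shape as the one above. Any provider missing from the netblock list is simply not counted, which is why the result is a lower bound.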
There are also cloud-hosted bridges. Tor Cloud, however, has shut down, and the number of EC2 bridges is declining: https://metrics.torproject.org/cloudbridges.html?graph=cloudbridges&start=2015-01-01&end=2015-07-31
Can we preserve cloud-hosted bridges independently of whatever we decide to do to cloud-hosted relays?
The harm caused by cloud-hosted relays is more difficult to quantify. Getting rid of them also wouldn't mean getting rid of any attacks. At best, attackers would have to jump through more hoops.
If we were to decide to permanently reject cloud-hosted relays, we would have to obtain the netblocks that are periodically published by all three (and perhaps more) cloud providers:
https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
https://msdn.microsoft.com/en-us/library/azure/Dn175718.aspx
https://cloud.google.com/appengine/kb/general?hl=en#static-ip
Note that this should be done periodically because the netblocks are subject to change.
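As a rough illustration of what consuming one of these lists could look like, here is a sketch that parses a document in the JSON layout AWS uses for its published ranges (a `prefixes` list with `ip_prefix` entries and an `ipv6_prefixes` list with `ipv6_prefix` entries). The sample data uses documentation ranges rather than real AWS netblocks, the function names are hypothetical, and fetching the file on a schedule is left out. The other providers publish in different formats and would need their own parsers.

```python
import ipaddress
import json

# Made-up sample in the AWS ip-ranges JSON layout; these are documentation
# ranges, not real AWS netblocks.
SAMPLE = """
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "example-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "example-1", "service": "S3"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "example-1", "service": "EC2"}
  ]
}
"""


def parse_aws_ranges(json_text, service=None):
    """Return the netblocks in json_text, optionally limited to one service."""
    data = json.loads(json_text)
    networks = []
    for entry in data.get("prefixes", []):
        if service is None or entry.get("service") == service:
            networks.append(ipaddress.ip_network(entry["ip_prefix"]))
    for entry in data.get("ipv6_prefixes", []):
        if service is None or entry.get("service") == service:
            networks.append(ipaddress.ip_network(entry["ipv6_prefix"]))
    return networks


def is_cloud_address(address, networks):
    """True if address falls inside any of the given netblocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


ec2_networks = parse_aws_ranges(SAMPLE, service="EC2")
```

Re-running the parse each time the provider republishes its list keeps the netblocks current. Whether and where such a check would be applied to relays is a separate policy question.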
I wonder about the impact of this proposal on Tor research and on Tor developers.
Some may consider it a benefit if researchers have to take more steps to interact with the Tor network.
I wonder how many Tor developers develop using cloud machines, and whether it’s a benefit for them to be able to test changes on the live Tor network, or a drawback. I test my changes on Linux using a cloud machine, and have used it at times to ensure that my changes don’t break when deployed on the live network. (I don’t do this at home, for both legal and connectivity reasons.)
Of course, I use chutney to test my changes on a test network, before I use them on the live network. So that’s another option for both researchers and developers. As an aside, we're working on making chutney easier to use, and we’re getting there incrementally. Here is a very rough draft plan: https://trac.torproject.org/projects/tor/wiki/doc/TorChutneyGuide
Of course, if researchers or developers or others really need a machine, they can move to a smaller cloud provider. This has benefits for diversity, and reduces what Google, Amazon, and Microsoft can know about Tor.
Tim (teor)