On 1 Sep 2015, at 07:45, Philipp Winter <phw@nymity.ch> wrote:

We sometimes see attacks from relays that are hosted on cloud platforms.
I have been wondering whether the benefit of having cloud-hosted relays
outweighs the abuse we see from them.

To get an idea of the benefit, I analysed the bandwidth that is
contributed by cloud-hosted relays.  I first obtained the network blocks
owned by three cloud providers (Amazon AWS, Google Cloud, Microsoft
Azure), and determined the percentage of bandwidth they contributed in July
2015.  The results show that there were typically ~200 cloud-hosted
relays online:
<https://nymity.ch/sybilhunting/png/cloud-hosted_relays_2015-07.png>
The spike shortly after hour 200 was caused by a large number of Amazon
relays named "DenkoNet".  The spike at the very beginning was caused by
a number of relays that, judging by their uptime pattern, may very well
belong together too.
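
The matching step might look roughly like the sketch below.  The
netblocks shown are made-up placeholders, not the providers' real
ranges, and cloud_provider() is just an illustrative name:

    import ipaddress

    # Made-up placeholder netblocks; the real analysis would use the
    # providers' published ranges (see the links further down).
    PROVIDER_NETBLOCKS = {
        "Amazon AWS":      ["52.0.0.0/11", "54.64.0.0/11"],
        "Microsoft Azure": ["104.40.0.0/13"],
        "Google Cloud":    ["104.154.0.0/15"],
    }

    # Parse each CIDR string once; ip_network() rejects malformed blocks.
    PARSED = {
        provider: [ipaddress.ip_network(block) for block in blocks]
        for provider, blocks in PROVIDER_NETBLOCKS.items()
    }

    def cloud_provider(address):
        """Return the provider whose netblocks contain address, or None."""
        addr = ipaddress.ip_address(address)
        for provider, networks in PARSED.items():
            if any(addr in network for network in networks):
                return provider
        return None

    print(cloud_provider("54.93.0.1"))    # "Amazon AWS"
    print(cloud_provider("192.0.2.1"))    # None (not cloud-hosted)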

What counts, however, is bandwidth.  Here's the total bandwidth fraction
contributed by cloud-hosted relays over July 2015:
<https://nymity.ch/sybilhunting/png/cloud-hosted_bandwidth_2015-07.png>
No Google Cloud relays contributed any bandwidth.  Amazon AWS-powered
relays contributed the majority, followed by Microsoft Azure-powered
relays.  Here's a summary of the time series, in percent:

 Min.  Mean  Median  Max.
 0.2%  0.8%  0.79%   1.5%
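
These statistics could be reproduced with something like the following
sketch, assuming a hypothetical july_consensuses list of pre-parsed
consensus documents (e.g., extracted from the CollecTor archives with
Stem) and the cloud_provider() helper above:

    from statistics import mean, median

    def cloud_bandwidth_fraction(consensus):
        """consensus: list of (address, bandwidth_weight) pairs taken
        from one consensus document.  Returns the fraction of total
        bandwidth weight contributed by relays in cloud netblocks."""
        total = sum(bw for _, bw in consensus)
        cloud = sum(bw for addr, bw in consensus if cloud_provider(addr))
        return cloud / total if total else 0.0

    # july_consensuses is a hypothetical pre-parsed input: one entry
    # per hourly consensus in July 2015 (24 * 31 = 744 in all).
    fractions = [cloud_bandwidth_fraction(c) for c in july_consensuses]
    print("Min.   %.2f%%" % (100 * min(fractions)))
    print("Mean   %.2f%%" % (100 * mean(fractions)))
    print("Median %.2f%%" % (100 * median(fractions)))
    print("Max.   %.2f%%" % (100 * max(fractions)))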

In an average consensus in July 2015, cloud-hosted relays contributed
only around 0.8% of bandwidth.  Note, however, that this is just a lower
bound.  The netblocks I used for the analysis could have changed, and I
didn't consider providers other than Google, Amazon, and Microsoft.

There are also cloud-hosted bridges.  Tor Cloud, however, has shut down,
and the number of EC2 bridges is declining:
<https://metrics.torproject.org/cloudbridges.html?graph=cloudbridges&start=2015-01-01&end=2015-07-31>

Can we preserve cloud-hosted bridges independently of whatever we decide to do about cloud-hosted relays?

The harm caused by cloud-hosted relays is more difficult to quantify.
Getting rid of them also wouldn't get rid of the attacks.  At best,
attackers would have to jump through more hoops.

If we were to decide to permanently reject cloud-hosted relays, we would
have to obtain the netblocks that are periodically published by all
three (and perhaps more) cloud providers:
<https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html>
<https://msdn.microsoft.com/en-us/library/azure/Dn175718.aspx>
<https://cloud.google.com/appengine/kb/general?hl=en#static-ip>
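
For example, the AWS list is machine-readable JSON at a stable URL, so
fetching it takes only a few lines (a minimal sketch; the Azure and
Google lists use different formats and would each need their own
parser):

    import json
    import urllib.request

    # Amazon publishes its ranges as JSON at a stable URL (documented
    # at the AWS link above).
    AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    with urllib.request.urlopen(AWS_RANGES_URL) as response:
        data = json.loads(response.read().decode("utf-8"))

    # Keep only EC2 prefixes: that's the service relays would run on.
    aws_blocks = sorted({p["ip_prefix"] for p in data["prefixes"]
                         if p["service"] == "EC2"})
    print("%d EC2 netblocks" % len(aws_blocks))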

Note that this should be done periodically because the netblocks are
subject to change.
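
If rejection happened at the directory authorities, a periodic job
could regenerate a torrc fragment from the fetched blocks.  A sketch,
assuming the existing AuthDirReject option is the mechanism we'd use
(that choice is an assumption, not a settled plan):

    # AuthDirReject is tor's directory authority option for refusing
    # to list matching relays; aws_blocks comes from the sketch above.
    def authdir_reject_lines(netblocks):
        return ["AuthDirReject %s" % block for block in netblocks]

    with open("cloud-reject.inc", "w") as fragment:
        fragment.write("\n".join(authdir_reject_lines(aws_blocks)) + "\n")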

I wonder about the impact of this proposal on Tor research and on Tor developers.

Some may consider it a benefit if researchers have to take more steps to interact with the Tor network.

I wonder how many Tor developers develop on cloud machines, and whether being able to test changes on the live Tor network from those machines is a benefit for them, or a drawback.
I test my changes on Linux using a cloud machine, and have used it at times to ensure that my changes don’t break when deployed on the live network. (I don’t do this at home, for both legal and connectivity reasons.)

Of course, I use chutney to test my changes on a test network before I use them on the live network.
So that’s another option for both researchers and developers.
As an aside, we're working on making chutney easier to use, and we’re getting there incrementally.
Here is a very rough draft plan:
https://trac.torproject.org/projects/tor/wiki/doc/TorChutneyGuide

Of course, if researchers, developers, or others really need a cloud machine, they can move to a smaller provider. This has benefits for diversity, and it reduces what Google, Amazon, and Microsoft can learn about Tor.

Tim (teor)