I think that is a bad idea. You don't know enough about a relay to have a clue about what the underlying hardware looks like from any of those metrics.

Simple example: you have an 8-core, 16-thread CPU and a 10 Gbit pipe, and you run 4 instances, each pinned to 2 threads. Each Tor relay will run at maximum speed without affecting any of the other relays on the same server. But with your chosen metrics you would slow all of them down just in case. I don't think there are any metrics from which you could infer this; the relay operator would have to set limits for it to work effectively. Alternatively, if Tor had proper multithreading support, you would only need to run one instance per server, and the external measurements you have done would be enough.

On 08.08.2019 14:34, teor wrote:
Hi Rob,

On 8 Aug 2019, at 22:15, Rob Jansen <rob.g.jansen@nrl.navy.mil> wrote:

On Aug 6, 2019, at 5:48 PM, Roger Dingledine <arma@torproject.org> wrote:

On Tue, Aug 06, 2019 at 05:31:39PM -0400, Rob Jansen wrote:
Today, I started running the speedtest on all relays in the network. So far, I have finished about 100 relays (and counting). I expect that the advertised bandwidths reported by metrics will increase over the next few days. For this to happen, the bandwidth histories observed by a relay during my speedtest are first committed to the bandwidth history table (within 24 hours), and then reported in the server descriptors (within 18-36 hours, depending on when the bandwidth history commit happens).
Great.

There will be another confusing (confounding) factor, which is that the
weights in the consensus are chosen by the bandwidth authorities, so
even if the relay's self-reported bandwidth goes up (because it now sees
that it can handle more traffic), that doesn't mean that the consensus
weight will necessarily go up. In theory it ought to, but with a day or
so delay, as the bwauths catch on to the larger value in the descriptor;
but in practice, I am not willing to make bets on whether it will behave
as intended. :) So, call it another thing to keep an eye out for during
the experiment.
Another wrinkle to keep in mind is that my script measures one relay at a time. If there are multiple relays running on the same NIC, after my measurement each of them will think they have the full capacity of the NIC. So if we just add up all of the advertised bandwidths after my measurement without considering that some of them share a NIC, that will result in an over-estimate of the available capacity of the network.

To avoid over-estimating network capacity, we could use IP-based heuristics to guess which relays share a machine (e.g., if they share an IP address, or have a nearby IP address). In the long term, it would be nice if Tor would collect and report some sort of machine ID the same way it reports the platform.
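
A minimal sketch of the IP-based heuristic described above, assuming relays in the same /24 share a link and that a group's capacity is roughly the largest advertised bandwidth in it. The relay names, addresses, bandwidth values, the /24 cutoff, and the helper names are all hypothetical and chosen only for illustration; none of this is how Tor or the bandwidth authorities actually compute anything.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(relays, prefix_len=24):
    """Group relays whose IPv4 addresses fall in the same network prefix.

    `relays` is an iterable of (name, ip, advertised_bw) tuples.
    The /24 default is an assumption, not a value taken from Tor.
    """
    groups = defaultdict(list)
    for name, ip, advertised_bw in relays:
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[network].append((name, advertised_bw))
    return groups


def estimate_capacity(relays, prefix_len=24):
    """Count each group once, using its largest advertised bandwidth.

    Assumption: every relay in a group measured the same shared link,
    so summing them would count that link several times.
    """
    groups = group_by_prefix(relays, prefix_len)
    return sum(max(bw for _, bw in members) for members in groups.values())


# Hypothetical data: two relays on nearby addresses, one elsewhere.
# Bandwidth units are arbitrary.
relays = [
    ("RELAY_A", "192.0.2.10", 100),
    ("RELAY_B", "192.0.2.11", 100),
    ("RELAY_C", "198.51.100.7", 40),
]

naive_total = sum(bw for _, _, bw in relays)
grouped_total = estimate_capacity(relays)
print(naive_total, grouped_total)
```

Taking the maximum per group is deliberately conservative. It under-estimates capacity whenever relays on nearby addresses run on separate hardware, or share a machine that has enough CPU and bandwidth for all of them.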
More precisely, we're trying to answer the question:
"Which small sets of machines are limited by a common network link or shared CPU?"

A machine ID is an incomplete answer to this question: it doesn't deal with VMs, or
multiple machines that share a router.

Here are some other potential heuristics:
* clock skew / precise time: machine/VM?
* nearby IP addresses and common ASN: machine?/VM?/router?
* platform: machine
* tor version: operator? (a proxy for machine/VM/router)

Is there a cross-platform API for machine IDs?
Or similar APIs for our most common relay platforms? (Linux, BSDs, Windows)

T

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays