Hi,
Here are some detailed diagnostics.
My overall conclusion is: there isn't much bandwidth left on that exit.
On Sun, Jun 02, 2019 at 01:30:18PM +1000, teor wrote:
Which bandwidth authorities are limiting the consensus weight of these relays? Where are they located?
The one in question is in Sweden: https://metrics.torproject.org/rs.html#details/D5F2C65F4131A1468D5B67A8838A9...
It has votes of:
w Bandwidth=10000 Measured=65200
w Bandwidth=10000 Measured=70000
w Bandwidth=10000 Measured=74200
w Bandwidth=10000 Measured=77000
w Bandwidth=10000 Measured=99400
w Bandwidth=10000 Measured=102000
and it currently reports a self-measured peak at 56MBytes/s.
So one could interpret the current bwauths as saying that it is a bit above average compared to other 56 MByte/s relays. Maybe that's because the other 56 MByte/s relays got better lately, or maybe there's less overall traffic on the network. But my guess is that it's stuck in a rut: the bwauths are not good at realizing it could go a lot faster.
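For reference, here's a sketch of how the authorities turn those six Measured= votes into one number. If I remember the dirvote code correctly, it takes a "low median" (for an even number of votes, the lower of the two middle values, not their average); treat that tie-breaking detail as an assumption on my part:

```python
# Measured= values from the six bandwidth authority votes above (scaled kB/s)
measured = [65200, 70000, 74200, 77000, 99400, 102000]

# Low median: with an even count, pick the lower of the two middle values.
def low_median(values):
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]

print(low_median(measured))  # 74200
```

That lands close to the consensus weight of 75000 we actually see, though the votes in any given consensus won't match this list exactly.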
Well, it's not a simple geographical bias. That's the most common measurement issue we see. The closest bwauth has the median measurement, and the North American bwauths are evenly distributed above and below the median.
Interestingly, sbws measures just slightly above the median, so this also isn't an instance of torflow's "stuck in a partition" bug.
It would be nice to have some evidence that the relay is stuck, rather than just slow, poorly connected, or variable.
The Relays Search bandwidth history shows that both relays on that machine vary a lot: https://metrics.torproject.org/rs.html#details/D5F2C65F4131A1468D5B67A8838A9... https://metrics.torproject.org/rs.html#details/6B37261F1248DA6E6BB924161F8D7...
But it doesn't tell us *why* they vary.
Are the relays' observed bandwidths limiting their consensus weight?
bandwidth 89600000 102400000 55999620
So it looks like no.
I'm sorry, my question was poorly phrased.
The observed bandwidth is part of the torflow/sbws scaling algorithm, so it's always limiting the consensus weight.
In this case, if the relay observed more bandwidth, it would get about 1.3x that bandwidth as its consensus weight.
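The 1.3x figure is just the ratio between the numbers quoted in this thread (the actual torflow/sbws scaling algorithm is more involved than a single ratio):

```python
observed_bw = 55_999_620   # relay's self-observed bandwidth, bytes/s
consensus_bw = 75_000_000  # consensus weight in scaled bytes (75000 scaled kB)

# Ratio of consensus weight to observed bandwidth:
ratio = consensus_bw / observed_bw
print(f"{ratio:.2f}")  # ~1.34
```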
If the relays are being measured by longclaw's sbws instance, we should also look at their detailed measurement diagnostics.
Looks like yes, it is measured:
w Bandwidth=10000 Measured=78000
I look forward to hearing about these detailed measurement diagnostics. :)
We wrote a spec to answer all^ your questions: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
^ except for these undocumented fields: https://trac.torproject.org/projects/tor/ticket/30726
Here are some of the diagnostics from the latest bandwidth file:
1559468088 version=1.4.0 earliest_bandwidth=2019-05-28T09:35:16 file_created=2019-06-02T09:35:04 generator_started=2019-05-19T14:04:34 latest_bandwidth=2019-06-02T09:34:48
sbws has been running for a few weeks, and it's still measuring.
number_consensus_relays=6552 number_eligible_relays=6302 percent_eligible_relays=96
It's measuring 96% of Running relays.
recent_measurement_attempt_count=329137 recent_measurement_failure_count=301111
It has a 90% measurement failure rate, which is way too high: https://trac.torproject.org/projects/tor/ticket/30719
But it's still measuring 96% of Running relays, so this bug might not be as much of a blocker as we thought.
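The failure rate comes straight from the two counters above ("90%" is my rounding; the exact figure is closer to 91%):

```python
attempts = 329_137  # recent_measurement_attempt_count
failures = 301_111  # recent_measurement_failure_count

failure_rate = failures / attempts
print(f"{failure_rate:.0%}")  # ~91%
```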
recent_measurements_excluded_error_count=892 recent_measurements_excluded_few_count=647 recent_measurements_excluded_near_count=232 recent_measurements_excluded_old_count=0
1-4% of measurements are excluded for various reasons. We think that's normal. But it's hard to check, because torflow has limited diagnostics.
software=sbws software_version=1.1.0 time_to_report_half_network=224554
2.6 days is quite a long time to measure half the network. Probably due to #30719.
And here are the diagnostics for that relay, split over a few lines:
bw=7700
This is the vote measured bandwidth.
bw_mean=803269 bw_median=805104
This is the raw measured bandwidth, 784 KBytes/s. This is a *lot* lower than the observed bandwidth of 56 MBytes/s.
The most likely explanation is that the relay doesn't have much bandwidth left over.
But maybe this sbws instance needs more bandwidth. If we fixed #30719, there might be a lot more sbws bandwidth for successful measurements.
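To put numbers on that gap, using the figures above (bw_mean is in bytes/s per the spec):

```python
bw_mean = 803_269      # sbws raw measured mean, bytes/s
observed = 55_999_620  # relay's self-observed bandwidth, bytes/s

print(f"{bw_mean / 1024:.0f} KBytes/s measured")    # ~784
print(f"{observed / bw_mean:.0f}x gap to observed")  # ~70x
```

A 70x gap is too large to be noise, so it's either congestion at the relay or a starved sbws instance.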
consensus_bandwidth=75000000 consensus_bandwidth_is_unmeasured=False
This is the consensus measured bandwidth in the sbws client's consensus, converted from scaled-kilobytes to scaled-bytes.
desc_bw_avg=89600000 desc_bw_bur=102400000
This relay is rate-limited to about 85 MBytes/s.
Maybe it would have more bandwidth if it wasn't rate-limited.
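(The 85 MBytes/s figure is desc_bw_avg converted from bytes/s, using binary megabytes:)

```python
desc_bw_avg = 89_600_000  # advertised average rate limit, bytes/s
print(f"{desc_bw_avg / (1024 * 1024):.1f} MBytes/s")  # ~85.4
```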
desc_bw_obs_last=54690734 desc_bw_obs_mean=54690734
sbws is operating off a slightly older descriptor, in which the observed bandwidth was: 54690734
But the relay is now reporting: 55999620
So we might see the consensus weight increase a little bit in the next day or so.
error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0
This relay has no measurement errors.
master_key_ed25519=Q2Ft/AsNiru+HEx4KRdRxhnuohOs3ByA0t816gUG+Kk nick=che node_id=$D5F2C65F4131A1468D5B67A8838A9B7ED8C049E2
Yes, I am analysing the right relay.
relay_in_recent_consensus_count=310
It has been running for a while. This consensus count is surprising, but there's no spec for it, so I don't know what it's meant to be: https://trac.torproject.org/projects/tor/ticket/30724 https://trac.torproject.org/projects/tor/ticket/30726
relay_recent_measurement_attempt_count=1 relay_recent_priority_list_count=1
1 measurement in the last 5 days is very low. Probably due to #30719.
success=4
4 successful measurements is good, but it's weird that there is only 1 recent measurement attempt. These figures should be similar: https://trac.torproject.org/projects/tor/ticket/30725
time=2019-06-01T14:56:32
It was last measured about 18 hours ago.
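("About 18 hours" is just the gap between the relay's time= field and the file_created= timestamp in the bandwidth file header:)

```python
from datetime import datetime, timezone

# time= field for the relay vs. file_created= from the bandwidth file header
measured = datetime(2019, 6, 1, 14, 56, 32, tzinfo=timezone.utc)
file_created = datetime(2019, 6, 2, 9, 35, 4, tzinfo=timezone.utc)

hours = (file_created - measured).total_seconds() / 3600
print(f"{hours:.1f} hours")  # ~18.6
```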
T