I'm back to complain further about erratic bandwidth authority behavior, previously
[tor-relays] BWauth kookiness https://lists.torproject.org/pipermail/tor-relays/2015-May/007039.html
Granted that the BWauths are in a bit of flux, with tor26 replaced by maatuska and moria1 dropping the GuardFraction calculation, the bandwidth calculations exhibit wildly erratic swings.
Specifically my relay, which is perfectly stable, reliable and fast (9.375 Mbyte/sec), has been assigned a jaggedly random series of consensus weights.
https://atlas.torproject.org/#details/4F0DB7E687FC7C0AE55C8F243DA8B0EB27FBF1...
earlier, fairly sane:

  *gabelmoo Bandwidth=7701 Measured=9960
  tor26     Bandwidth=7701 Measured=9340
  moria1    Bandwidth=7701 Measured=18000 GuardFraction=66
  longclaw  Bandwidth=7701 Measured=12800
later, a bit high, nutty:

  gabelmoo  Bandwidth=9375 Measured=17100
  moria1    Bandwidth=9375 Measured=77900 GuardFraction=75
  *longclaw Bandwidth=9375 Measured=23000
now, sane but undervalued:

  gabelmoo  Bandwidth=8925 Measured=14900
  maatuska  Bandwidth=8925 Measured=17200
  moria1    Bandwidth=8925 Measured=5330
  *longclaw Bandwidth=8925 Measured=7440
moria1 here is downright schizophrenic, but the other BWauths also regularly double and halve the bandwidth value they assign. The graph shows it more vividly.
What the graphs and numbers do not show is that the effective difference between the consensus values of ~7000 and ~23000 is staggering. At the low end of this range the relay shows roughly 2500 active circuits and an average load factor of about 20% of actual bandwidth, while an assignment of 23000 results in almost 8000 circuits and a load factor of more like 50% (both per Blutmagie.de).
My point is that BWauths should not be arbitrarily flipping stable, well run relays from the top to the bottom of this steeply sloped sweet-spot of the weighting curve. Perhaps the sweet-spot range has moved over the last couple of years as the price of bandwidth has dropped and faster connections become more prevalent, and this has been overlooked in the algorithm.
Regardless, it seems BWauth measurement should be more nuanced in this particular range, so that relays are not constantly slammed between "rather popular" and "a bit boring" irrespective of their actual available capacity.
One reason this vexes is that I would like to see how well the relay runs with Address Sanitizer active. ASAN provides obvious benefits w/r/t security, but entails a performance trade-off. With the BWauths throwing darts, eyes closed, when choosing weighting, it's impossible to gauge the performance impact of various adjustments.
In the bigger picture view, erratic BWauth weighting can't be adding clarity to the performance, capacity and utilization situation.
A straightforward improvement to BWauth measurement crossed my mind.
Seems likely that part of the volatile, bipolar measurement issue is over-fast feedback between weighting increases and the increased traffic that results.
For example, a BWauth measures 8 Mbyte/sec of bandwidth on day one and increases the assigned score to 20k. The relay's weight attracts a pile-on of new traffic, and by day three the relay measures 2 Mbyte/sec of available bandwidth due to the presence of a huge amount of traffic, so the BWauth crashes the assigned value back to perhaps 10k.
Thus the weight of the relay swings wildly between two extremes.
The solution is for BWauths to time-average several days of measurements, probably with a decaying weight for older samples. Ten days of samples with the oldest four assigned declining weights comes to mind as a starting point, though of course the number of days and the weighting parameters should be easily adjustable.
This will result in gradual shifting of BW weights assigned to relays with an equilibrium outcome rather than wild swings.
It will also compensate for random sample timing, where a BWauth may test a relay at a busy time on one day and a light-load time the next.
Probably a downside threshold should also exist that triggers a reset of the accumulated data points, to handle relays that fail or deteriorate rapidly.
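To make the idea concrete, here is a rough sketch of the decayed average I have in mind. The ten-sample window, the particular declining weights, and the function name are all arbitrary starting values of my own invention, not anything taken from the BWauth code:

```python
def averaged_measurement(samples):
    """Weighted mean of up to ten daily bandwidth samples, newest first.

    The six newest samples count in full; the oldest four get declining
    weights. All of these parameters are illustrative placeholders and
    should be tunable in a real implementation.
    """
    weights = [1.0] * 6 + [0.8, 0.6, 0.4, 0.2]
    used = samples[:10]
    w = weights[:len(used)]
    return sum(s * wt for s, wt in zip(used, w)) / sum(w)

# Six recent days at 8 Mbyte/sec dominate four old days at 2 Mbyte/sec,
# so a sudden change shifts the result gradually instead of instantly.
print(averaged_measurement([8.0] * 6 + [2.0] * 4))  # -> 6.5
```

The downside-threshold reset would sit in front of this: if the newest sample falls far below the running average, discard the history rather than let stale samples prop up a failing relay.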
At 02:43 6/6/2015 -0400, starlight.2015q2@binnacle.cx wrote:
I'm back to complain further about erratic bandwidth authority behavior, previously. . .
MYSTERY SOLVED!!!
Of course one should always RTFS (read the fine specification) when trying to understand all things Tor.
First, scratch my previous time-averaging suggestion, as the current implementation incorporates a sophisticated time-averaging feedback algorithm borrowed from industrial control systems.
Turns out the problem of sudden, sharp, "schizophrenic" consensus jumps is a boundary artifact.
My relay is hovering right around the 12% threshold between the fastest and second fastest groups of relays (out of four groups).
These two groups operate as separate statistical domains, and the algorithm is complex enough that it would be shocking if the measurement of any one relay came out the same when moved from one group to the other group.
Looks to me like the relay is bouncing back and forth between the 0-12% band and the 12-35% band. This happens independently for each of the four BWauths. The weight assigned in the 0-12% band is anywhere from 130% to 200% of the weight calculated in the 12-35% band.
Group-hopping explains why the individual BWauth weights tend to jump dramatically and suddenly. The median-weight consensus selection adds a further element of randomness.
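For what it's worth, the consensus selection can be illustrated with the "now" numbers above. I am assuming a simple low median over the per-BWauth Measured values here; whether the spec rounds up or down with an even number of voters is something I have not verified:

```python
import statistics

# Per-BWauth Measured values from the "now" vote above
measured = {"gabelmoo": 14900, "maatuska": 17200,
            "moria1": 5330, "longclaw": 7440}

# median_low picks the lower of the two middle values for an even count;
# this is an assumption about the tie-breaking rule, not a verified fact.
consensus = statistics.median_low(measured.values())
print(consensus)  # -> 7440
```

Note how one BWauth hopping bands can drag the median across a large gap: the chosen value jumps from one voter's measurement to another's, with nothing in between.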
Assigned-weight moves smoothly within either group, but is discontinuous when shifting from one group to the next.
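A toy version of the group assignment shows how discontinuous this is for a relay hovering at the 12% line. Only the 12% and 35% boundaries come from my reading of the code; the other two boundaries below are invented placeholders:

```python
def band(rank_percentile):
    """Which measurement group a relay falls in, by rank percentile.

    The 12% and 35% cutoffs are from the thread above; the 60% cutoff
    and the four-way split details are hypothetical placeholders.
    """
    for upper, name in ((12, "0-12%"), (35, "12-35%"),
                        (60, "35-60%"), (100, "60-100%")):
        if rank_percentile < upper:
            return name
    return "60-100%"

# A relay hovering at the 12% line flips groups on a tiny rank change,
# landing in a different statistical domain each time:
print(band(11.8), band(12.2))  # -> 0-12% 12-35%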
It is interesting that 'moria1' consistently reports much higher weights than the three other BWauths. Might be that one of the algorithm parameters is set to a different value for that authority.
tor-relays@lists.torproject.org