Nicholas Hopper <hopper@cs.umn.edu> writes:
I think I'll have more to say later, but...
On Wed, Mar 26, 2014 at 11:36 AM, George Kadianakis <desnacked@riseup.net> wrote:
1.3. Age of guard as a factor on guard probabilities
By increasing the guard rotation period, we also worsen the shortage of clients for young guards, since clients will now rotate guards even less frequently (see 'Phase three' of [1]).
We can try to mitigate this by giving young guards a higher priority when clients pick guards:
To do so, every time an authority needs to vote for a guard, it reads a set of consensus documents spanning the past NNN months and calculates the age of the guard; that is, in how many of those consensuses its public key has been included.
The authorities include the age of each guard by appending '[SP "Age=" INT]' to the guard's "w" line.
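A minimal sketch of the authority-side step, assuming each past consensus in the NNN-month window has already been parsed into a set of relay identity fingerprints. The helper names `guard_age` and `w_line` and the sample data are hypothetical, and the other optional "w" fields (e.g. Measured=) are left out for brevity.

```python
from typing import Iterable, Set


def guard_age(fingerprint: str, past_consensuses: Iterable[Set[str]]) -> int:
    """Count the past consensuses (each a set of fingerprints) that list this relay."""
    return sum(1 for relays in past_consensuses if fingerprint in relays)


def w_line(bandwidth: int, age: int) -> str:
    """Build a "w" line with the proposed trailing Age= field (other fields omitted)."""
    return f"w Bandwidth={bandwidth} Age={age}"


# Hypothetical window of three already-parsed consensuses.
history = [{"AAAA", "BBBB"}, {"AAAA"}, {"AAAA", "CCCC"}]
print(w_line(5000, guard_age("AAAA", history)))  # w Bandwidth=5000 Age=3
```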
When a client picks a guard, it applies the age of each guard as a weight on its guard probability. XXX unspecified how
I'm pretty sure this section has it backwards from the intent of the discussion of "guard age" at the dev meeting. The weight factor should be applied when choosing a "new guard" as a middle or exit node, because it is being underutilized as a guard. This makes it straightforward to apply: if relay R has had the guard flag for fraction k of the last rotation period, then its weight for some other position should be
k*(weight with the guard flag) + (1-k)*(weight without the guard flag)
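As a sketch of the formula above, assuming both inputs are already the relay's weight for the target position (with and without the guard flag); `blended_weight` is a hypothetical name, and the sample numbers are the middle-position weights Wmg=4198 and Wmm=10000 from the consensus quoted in [1] below.

```python
def blended_weight(k: float, weight_with_guard: float, weight_without_guard: float) -> float:
    """Interpolate between the two weights by the fraction k of the last
    rotation period during which the relay held the guard flag."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be a fraction in [0, 1]")
    return k * weight_with_guard + (1 - k) * weight_without_guard


# A relay that was a guard for half of the rotation period, middle position.
print(blended_weight(0.5, 4198, 10000))  # 7099.0
```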
Ah, thanks for this!
I admit I didn't entirely grasp this suggestion during our discussion at the dev meeting, and I think I still don't really understand it.
So, based on your response, IIUC, the idea is that because young guards are underutilized, we want to increase the probability of them being chosen for non-guard positions, so that they are better utilized until more clients pick them as guards?
Some questions on the terminology you used:
a) What do you mean by 'last rotation period'? When you say "for fraction k of the last rotation period", do you mean that if the authorities read consensuses for the past 12 months, and relay R was a guard for 6 of those months, then k would be 6/12 == 0.5?
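Under that reading, k is just the share of consensuses in the window in which R carried the Guard flag. A small sketch, assuming one boolean per consensus; `guard_fraction` is a hypothetical helper, and the example uses 12 samples standing in for the 12 months.

```python
from typing import Sequence


def guard_fraction(had_guard_flag: Sequence[bool]) -> float:
    """Fraction k of the consensuses in the window where the relay had Guard.
    An empty history is treated as k = 0 (never a guard)."""
    if not had_guard_flag:
        return 0.0
    return sum(had_guard_flag) / len(had_guard_flag)


# Guard flag held for 6 of 12 samples.
print(guard_fraction([False] * 6 + [True] * 6))  # 0.5
```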
b) By (weight with the guard flag), do you mean the result of <consensus BW> * <consensus weight>?
So, for example, for a guard relay with bandwidth == 10000 competing to be the guard node of a circuit, do you mean the result of 10000 * Wgg?
And by (weight without the guard flag), do you mean that in the above example we would compute 10000 * Wgm (assuming that the relay doesn't have the Exit flag)?
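To make that reading concrete, here is a sketch that picks the bandwidth-weight name from a position letter ('g', 'm', 'e', 'b') and the relay's flags ('d' for Guard+Exit, 'g' for Guard, 'e' for Exit, 'm' for neither), then multiplies by bandwidth with and without the Guard flag. The function names are hypothetical; the sample weights are the middle-position values from [1]. The consensus weights are scaled so that 10000 means 1.0; dividing by that scale would not change relative selection probabilities, so the sketch leaves it out.

```python
from typing import Dict, Tuple


def weight_key(position: str, guard: bool, exit_: bool) -> str:
    """Name of the bandwidth-weight (e.g. 'Wmg') for a position and flag set."""
    if guard and exit_:
        flag = "d"
    elif guard:
        flag = "g"
    elif exit_:
        flag = "e"
    else:
        flag = "m"
    return f"W{position}{flag}"


def weight_with_and_without_guard(bw: int, position: str, exit_: bool,
                                  weights: Dict[str, int]) -> Tuple[int, int]:
    """Return (bw * weight with Guard, bw * weight with Guard removed)."""
    with_guard = bw * weights[weight_key(position, True, exit_)]
    without_guard = bw * weights[weight_key(position, False, exit_)]
    return with_guard, without_guard


weights = {"Wmg": 4198, "Wmm": 10000, "Wmd": 871, "Wme": 0}
print(weight_with_and_without_guard(10000, "m", False, weights))  # (41980000, 100000000)
```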
Is that right, or did I misunderstand you?
Assuming I've understood the terminology correctly, I began trying to understand your formula. First of all, how did you arrive at it? Is this some standard form? I'm not very familiar with these things.
Here are some thoughts I generated while meditating on that formula:
a) The goal is that this formula should give us bigger values as k gets smaller (since the smaller the k, the younger the guard).
So, we want the maximum value at k=0 and the minimum value at k=1. In our case:
for k=0, new_weight = [weight without the guard flag]
for k=1, new_weight = [weight with the guard flag]
This seems reasonable assuming that: [weight without the guard flag] > [weight with the guard flag]
As k goes from 0 to 1, the formula is a linear function that moves from the maximum to the minimum [0].
It's worth noting that the maximum weight a young guard can achieve for any position is its weight without the guard flag. Maybe we need to evaluate whether this weight increase is substantial?
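One rough way to gauge this, assuming a Guard-only relay and the weights from the consensus in [1], is the ratio of the k=0 weight to the k=1 weight, i.e. the largest multiplicative boost a brand-new guard could get in each position. `max_boost` and the position labels are hypothetical.

```python
def max_boost(with_guard: int, without_guard: int) -> float:
    """Ratio of the k=0 weight (Guard removed) to the k=1 weight (with Guard)."""
    return without_guard / with_guard


# (weight with Guard, weight without Guard) for a Guard-only relay.
PAIRS = {
    "middle": (4198, 10000),     # Wmg, Wmm
    "exit": (8258, 10000),       # Weg, Wem
    "begin_dir": (4198, 10000),  # Wbg, Wbm
}

for position, (with_guard, without_guard) in PAIRS.items():
    print(f"{position}: at most {max_boost(with_guard, without_guard):.2f}x")
```

For these numbers, the middle and BEGIN_DIR weights could grow by up to about 2.38x, but the exit weight only by about 1.21x.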
b) I also checked whether [weight without the guard flag] > [weight with the guard flag] holds, using the consensus weights of a recent consensus [1].
The assumption generally holds for a relay that has only the Guard flag. For example, 'Wmm > Wmg' and 'Wem > Weg'.
However, if the relay has both the Guard and Exit flags, there are some bad cases. For example, 'Wme < Wmd' and 'Wbe < Wbd'.
That is, the consensus weight of a Guard+Exit node for the middle position (Wmd = 871) is bigger than the consensus weight of an Exit node for the middle position (Wme = 0).
So, in this case: [weight without the guard flag] < [weight with the guard flag]
To be honest, that doesn't really make sense to me from a load balancing perspective (since Guard+Exits are probably more overloaded than Exits), but I don't really understand consensus weights anyway.
It's worth noting that the weights in the above example are small enough that they probably wouldn't make a big difference IRL. Still, they violate the formula's assumption.
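The check above can be scripted. This sketch, assuming the weights from the consensus in [1], compares each position's weight for a Guard (or Guard+Exit) relay with the weight it would get once the Guard flag is removed; `assumption_holds` is a hypothetical helper, and the guard position is skipped because Wgg and friends are not in the excerpt.

```python
# Bandwidth-weights from the consensus with valid-after 2014-03-13 00:00:00.
WEIGHTS = {
    "Wme": 0, "Wmd": 871, "Wmg": 4198, "Wmm": 10000,
    "Weg": 8258, "Wed": 8258, "Wee": 10000, "Wem": 10000,
    "Wbe": 0, "Wbd": 871, "Wbg": 4198, "Wbm": 10000,
}


def assumption_holds(position: str, exit_flag: bool) -> bool:
    """True if removing the Guard flag strictly raises the relay's weight."""
    with_key = f"W{position}{'d' if exit_flag else 'g'}"
    without_key = f"W{position}{'e' if exit_flag else 'm'}"
    return WEIGHTS[without_key] > WEIGHTS[with_key]


for position in ("m", "e", "b"):
    for exit_flag in (False, True):
        label = "Guard+Exit" if exit_flag else "Guard"
        print(position, label, assumption_holds(position, exit_flag))
```

Only the Guard+Exit cases in the middle and BEGIN_DIR positions come out False, matching the two bad cases listed above.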
[0]: The formula new_weight = k*[weight with the guard flag] + (1-k)*[weight without the guard flag] can be rearranged to: new_weight = [weight without the guard flag] + k*([weight with the guard flag] - [weight without the guard flag])
This is a linear function 'y = a*x + b' of k, with slope [weight with the guard flag] - [weight without the guard flag] and intercept [weight without the guard flag].
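Writing W_+ and W_- as shorthand (introduced here, not used elsewhere in the thread) for the weights with and without the guard flag, the slope makes explicit why the assumption in a) is exactly what is needed:

```latex
w(k) = k\,W_{+} + (1-k)\,W_{-}, \qquad
\frac{dw}{dk} = W_{+} - W_{-} < 0 \iff W_{-} > W_{+}
```

So new_weight decreases as the guard ages only when the weight without the guard flag is the larger one.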
(Sorry for the mental masturbation...)
[1]: Here are some consensus weights from the consensus with "valid-after 2014-03-13 00:00:00":
Wme=0 (Weight for Exit-flagged nodes in the middle position)
Wmd=871 (Weight for Guard+Exit-flagged nodes in the middle position)
Wmg=4198 (! Weight for Guard-flagged nodes in the middle position)
Wmm=10000 (Weight for non-flagged nodes in the middle position)
Weg=8258 (! Weight for Guard-flagged nodes in the exit position)
Wed=8258 (Weight for Guard+Exit-flagged nodes in the exit position)
Wee=10000 (Weight for Exit-flagged nodes in the exit position)
Wem=10000 (Weight for non-flagged nodes in the exit position)
Wbe=0 (Weight for Exit-flagged nodes for BEGIN_DIR requests)
Wbd=871 (Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests)
Wbg=4198 (! Weight for Guard-flagged nodes for BEGIN_DIR requests)
Wbm=10000 (Weight for non-flagged nodes for BEGIN_DIR requests)