Ola Bini obini@thoughtworks.com writes:
OK, with your feedback and thinking a bit more about it, here is a revision of the algorithm from yesterday. I think we are starting to get close so we will rip out the original simulation code and implement something that matches this now. Hopefully, the changes will be smaller.
Start of algorithm (arguments: USED_GUARDS, EXCLUDE_NODES)
- If selecting directory guards, GUARDS is all guards from the consensus with the V2Flag, otherwise GUARDS is all guards from the consensus
- Set UTOPIC_GUARDS to be all guards to use under utopic conditions from GUARDS
- Set DYSTOPIC_GUARDS to be all guards to use under dystopic conditions from GUARDS
- Set REMAINING_UTOPIC_GUARDS to be UTOPIC_GUARDS without EXCLUDE_NODES
- Set REMAINING_DYSTOPIC_GUARDS to be DYSTOPIC_GUARDS without EXCLUDE_NODES
- Create a list of PRIMARY_GUARDS that contain N_PRIMARY_GUARDS that are not bad by:
- Taking the next entry from USED_GUARDS
- If USED_GUARDS is empty:
- randomly select an entry from UTOPIC_GUARDS, weighted by bandwidth
- Set TRIED_GUARDS to be an empty set
- Set TRIED_DYSTOPIC_GUARDS to be an empty set
- Set state = STATE_PRIMARY_GUARDS
Each iteration of algorithm
If a new consensus has arrived:
- Update all guard profiles with new bad/non-bad information
- If any PRIMARY_GUARDS have become bad:
- re-add to the list of PRIMARY_GUARDS using the same procedure
- If any USED_GUARDS have become non-bad:
- add it back to PRIMARY_GUARDS at the place it would have been if it was non-bad when running the start of the algorithm. If this results in PRIMARY_GUARDS being larger than N_PRIMARY_GUARDS, remove from the end of the list until the list is N_PRIMARY_GUARDS long
- Ensure that UTOPIC_GUARDS and DYSTOPIC_GUARDS are updated with the changes from the consensus
If it was at least 3 minutes since we tried the primary guards and we are not in STATE_PRIMARY_GUARDS:
- save previous state
- set state = STATE_PRIMARY_GUARDS
STATE_PRIMARY_GUARDS:
- return each entry in PRIMARY_GUARDS in turn
- mark each entry as "unreachable" if algorithm doesn't terminate
- restore previous state (or STATE_TRY_UTOPIC if no previous)
STATE_TRY_UTOPIC:
- for each entry in TRIED_GUARDS that was marked as unreachable more than 20 minutes ago
- add it back to REMAINING_UTOPIC_GUARDS
- return each remaining entry from USED_GUARDS in turn
- for each entry, if algorithm doesn't terminate
- mark entry as "unreachable"
- add entry to TRIED_GUARDS
- if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
- if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC
- return each entry from REMAINING_UTOPIC_GUARDS randomly selected, weighted by bandwidth
- remove the returned entry from REMAINING_UTOPIC_GUARDS
- for each entry, if algorithm doesn't terminate
- mark entry as "unreachable"
- add entry to TRIED_GUARDS
- if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
- if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC
STATE_TRY_DYSTOPIC:
- for each entry in TRIED_DYSTOPIC_GUARDS that was marked as unreachable more than 20 minutes ago
- add it back to REMAINING_DYSTOPIC_GUARDS
- return each remaining DYSTOPIC entry from USED_GUARDS in turn
- for each entry, if algorithm doesn't terminate
- mark entry as "unreachable"
- add entry to TRIED_DYSTOPIC_GUARDS
- if the number of entries in TRIED_GUARDS+TRIED_DYSTOPIC_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
- if the number of entries in TRIED_DYSTOPIC_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of DYSTOPIC_GUARDS:
- mark all guards in PRIMARY_GUARDS, TRIED_GUARDS and TRIED_DYSTOPIC_GUARDS as not "unreachable"
- return failure from the algorithm
- return each entry from REMAINING_DYSTOPIC_GUARDS randomly selected, weighted by bandwidth
- remove the returned entry from REMAINING_DYSTOPIC_GUARDS
- for each entry, if algorithm doesn't terminate
- mark entry as "unreachable"
- add entry to TRIED_DYSTOPIC_GUARDS
- if the number of entries in TRIED_GUARDS+TRIED_DYSTOPIC_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
- if the number of entries in TRIED_DYSTOPIC_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of DYSTOPIC_GUARDS:
- mark all guards in PRIMARY_GUARDS, TRIED_GUARDS and TRIED_DYSTOPIC_GUARDS as not "unreachable"
- return failure from the algorithm
End of algorithm
- If circuit is set up correctly, let algorithm know
- Algorithm marks the guard chosen as used and makes sure it is in USED_GUARDS
- Otherwise do another run of the algorithm
Ah great! Good improvements. I think we are going the right way.
Here is another quick review. I include a second copy of the algorithm and comment inline:
- Start of algorithm (arguments: USED_GUARDS, EXCLUDE_NODES)
- If selecting directory guards, GUARDS is all guards from the consensus with the V2Flag, otherwise GUARDS is all guards from the consensus
- Set UTOPIC_GUARDS to be all guards to use under utopic conditions from GUARDS
- Set DYSTOPIC_GUARDS to be all guards to use under dystopic conditions from GUARDS
- Set REMAINING_UTOPIC_GUARDS to be UTOPIC_GUARDS without EXCLUDE_NODES
- Set REMAINING_DYSTOPIC_GUARDS to be DYSTOPIC_GUARDS without EXCLUDE_NODES
Maybe we also need to exclude USED_GUARDS from these two lists?
Create a list of PRIMARY_GUARDS that contain N_PRIMARY_GUARDS that are not bad by:
- Taking the next entry from USED_GUARDS
- If USED_GUARDS is empty:
- randomly select an entry from UTOPIC_GUARDS, weighted by bandwidth
Set TRIED_GUARDS to be an empty set
Set TRIED_DYSTOPIC_GUARDS to be an empty set
Set state = STATE_PRIMARY_GUARDS
Each iteration of algorithm
- If a new consensus has arrived:
- Update all guard profiles with new bad/non-bad information
- If any PRIMARY_GUARDS have become bad:
- re-add to the list of PRIMARY_GUARDS using the same procedure
- If any USED_GUARDS have become non-bad:
- add it back to PRIMARY_GUARDS at the place it would have been if it was non-bad when running the start of the algorithm. If this results in PRIMARY_GUARDS being larger than N_PRIMARY_GUARDS, remove from the end of the list until the list is N_PRIMARY_GUARDS long
- Ensure that UTOPIC_GUARDS and DYSTOPIC_GUARDS are updated with the changes from the consensus
Not sure if this is part of this algorithm, or it's actually another helper algorithm that is called when a consensus arrives. I feel it might be cleaner if we do it as a separate algo, but we can proceed like this as well since it's not too confusing.
If it was at least 3 minutes since we tried the primary guards and we are not in STATE_PRIMARY_GUARDS:
- save previous state
- set state = STATE_PRIMARY_GUARDS
STATE_PRIMARY_GUARDS:
- return each entry in PRIMARY_GUARDS in turn
- mark each entry as "unreachable" if algorithm doesn't terminate
So IIUC the "algorithm" referenced here is _not_ the algorithm that we are describing right now (let's call it ALGO_CHOOSE_GUARD). Instead the "algorithm" here is the _caller of ALGO_CHOOSE_GUARD_, which is the algorithm responsible for creating circuits, testing whether they work and reporting the results back to CHOOSE_GUARD_ALGO (let's call this other algorithm ALGO_BUILD_CIRCUIT).
Maybe we can clarify this?
Also, do PRIMARY_GUARDS go into TRIED_GUARDS and count against our thresholds?
- restore previous state (or STATE_TRY_UTOPIC if no previous)
- STATE_TRY_UTOPIC:
- for each entry in TRIED_GUARDS that was marked as unreachable more than 20 minutes ago
- add it back to REMAINING_UTOPIC_GUARDS
- return each remaining entry from USED_GUARDS in turn
- for each entry, if algorithm doesn't terminate
In general, I think we should make the context of this algorithm (ALGO_CHOOSE_GUARD) a bit clearer, so that we know when "Start of algorithm" is supposed to run, and when "End of algorithm" is supposed to run.
Because for example one could think here that in an asynchronous setting, when we try a entry from USED_GUARDS and find out whether the circuit succeeded or not, we need to run the algorithm from the beginning including the "Start of algorithm" step. Whereas if I understand correctly you assume that we will just drop in and continue exactly from this point.
I feel that this confusion is caused by the ALGO_CHOOSE_GUARD algorithm being pseudo-asynchronous. To address this I suggested additional states for when we are waiting for the results of a circuit construction, but I agree that this might complicate things too much.
Is there a way we can make this cleaner?
- mark entry as "unreachable" - add entry to TRIED_GUARDS - if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm - if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC - return each entry from REMAINING_UTOPIC_GUARDS randomly selected, weighted by bandwidth - remove the returned entry from REMAINING_UTOPIC_GUARDS - for each entry, if algorithm doesn't terminate - mark entry as "unreachable" - add entry to TRIED_GUARDS - if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm - if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC
<snip>
Finally, what kind of statistics and measurements does the simulator conduct?
For example, I can think of reachability related stats like:
* Time (or number of guards) till we manage to build first circuit * Time till we manage to recover from flaky network
and I can also think of security related stats like:
* Number of guards we tried before succeeding first circuit * Number of guards we exposed ourselves to after time t
etc.
Hi,
Thanks for the review.
Maybe we also need to exclude USED_GUARDS from these two lists?
Good point, we should definitely do that.
Not sure if this is part of this algorithm, or it's actually another helper algorithm that is called when a consensus arrives. I feel it might be cleaner if we do it as a separate algo, but we can proceed like this as well since it's not too confusing.
We can probably do that. The reason I put it in here is because it works on the same data structures that are used for the rest of the algorithm.
- STATE_PRIMARY_GUARDS:
- return each entry in PRIMARY_GUARDS in turn
- mark each entry as "unreachable" if algorithm doesn't terminate
So IIUC the "algorithm" referenced here is _not_ the algorithm that we are describing right now (let's call it ALGO_CHOOSE_GUARD). Instead the "algorithm" here is the _caller of ALGO_CHOOSE_GUARD_, which is the algorithm responsible for creating circuits, testing whether they work and reporting the results back to CHOOSE_GUARD_ALGO (let's call this other algorithm ALGO_BUILD_CIRCUIT).
We should probably clarify it. But yes, the algorithm referenced is ALGO_CHOOSE_GUARD. ALGO_CHOOSE_GUARD will terminate if ALGO_BUILD_CIRCUIT successfully builds a circuit.
Also, do PRIMARY_GUARDS go into TRIED_GUARDS and count against our thresholds?
Yes, they probably should. Will change it.
In general, I think we should make the context of this algorithm (ALGO_CHOOSE_GUARD) a bit clearer, so that we know when "Start of algorithm" is supposed to run, and when "End of algorithm" is supposed to run.
Because for example one could think here that in an asynchronous setting, when we try a entry from USED_GUARDS and find out whether the circuit succeeded or not, we need to run the algorithm from the beginning including the "Start of algorithm" step. Whereas if I understand correctly you assume that we will just drop in and continue exactly from this point.
Yes, exactly. "Start of algorithm" sets up the data structures etc. "Each iteration of algorithm" will be called repeatedly until we get failure or a working guard, and at that point "End of algorithm" will be run. I can make it clearer.
I feel that this confusion is caused by the ALGO_CHOOSE_GUARD algorithm being pseudo-asynchronous. To address this I suggested additional states for when we are waiting for the results of a circuit construction, but I agree that this might complicate things too much.
I can do that - not sure if it would clarify things. That state is entered every time the word "return" is used in "Each iteration of algorithm". Once "Each iteration of algorithm" is called again, it will continue at the same place. Not sure what you consider pseudo asynchronous about that. =)
Finally, what kind of statistics and measurements does the simulator conduct?
For example, I can think of reachability related stats like:
- Time (or number of guards) till we manage to build first circuit
- Time till we manage to recover from flaky network
and I can also think of security related stats like:
- Number of guards we tried before succeeding first circuit
- Number of guards we exposed ourselves to after time t
Haven't thought abou that yet - we currently only have the things that were already in the simulator code, which primarily checks how many failed vs successful guards we got. These other heuristics look interesting to check as well.
Cheers -- Ola Bini (https://olabini.se)
"Yields falsehood when quined" yields falsehood when quined.