On 2/16/16 12:20, George Kadianakis wrote:
The very latest prop259 basically forgets the unreachable guard status as soon as the algorithm terminates. I wonder if we actually want this. Hopefully guardsim has a simulation scenario that will illustrate whether that's a good idea or not.
We are assuming the unreachable state is persisted (by setting unreachable_since in the guard) in the current simulation. We are also ignoring entries known to be unreachable when choosing guards from either USED_GUARDS or REMAINING_UTOPIC_GUARDS to be our PRIMARY_GUARDS.
Maybe to reduce this exposure, we should try to go back to STATE_PRIMARY_GUARDS in those cases? Tor does a similar trick right now which has been very helpful: https://gitweb.torproject.org/tor.git/tree/src/or/entrynodes.c?id=tor-0.2.7....
Maybe an equivalent heuristic would be that if we are in STATE_RETRY_ONLY and we manage to connect to a non-primary guard, we hang up the connection, and go back into STATE_PRIMARY_GUARDS.
Having this in mind, and revisiting entry_guard_register_connect_status(), it seems the appropriate approach would be marking all USED_GUARDS for retry, discarding the current circuit and trying to build a new circuit (restart the whole algorithm) rather than marking the PRIMARY_GUARDS and going back to STATE_PRIMARY_GUARDS.
This is because by marking all the used guards for retry we can expect the next circuit to use the first available USED_GUARD as an entry guard.
In addition to that, my understanding of this behavior in the current guard selection algorithm is:
When you succeed to build a circuit with a new guard, you mark all previously used guards (the global entry_guards var in tor) for retry which are:
- listed in the latest consensus - not bad_since some time - unreachable_since some time and is "time to retry"* - (and it does not have other restrictions which are not important for our simulation, like being a bridge when required or fast/stable)
If you find any retryable guard in the condition above, you discard the current circuit and try to build a new one:
- best case: you have a new circuit with one of the previously unreachable used guards. - worst case: all previously unreachable used guards marked for retry fail again, and you have a new circuit with the same guard you discarded before (unless it is also unreachable now :S).
It's not really "every time you connect to a new guard, retry the previously used" but "every time you connect to a new guard, check if its time to retry a unreachable previously used guard".
* time to retry is progressive depending on how long the node has been unreachable, for example hourly in the first 6 hours, every 4 hours until 3 days, etc. (see: entry_is_time_to_retry).
Can this heuristic be improved? I think it should be considered for the algorithm.
We have reviewed this strategy in the simulation for the original algorithm and we are generating new data. We are also updating the proposal to include the described strategy.
Cheers,
Reinaldo and Iván