Reinaldo de Souza Jr rjunior@thoughtworks.com writes:
On 2/16/16 12:20, George Kadianakis wrote:
The very latest prop259 basically forgets the unreachable guard status as soon as the algorithm terminates. I wonder if we actually want this. Hopefully guardsim has a simulation scenario that will illustrate whether that's a good idea or not.
We are assuming the unreachable state is persisted (by setting unreachable_since in the guard) in the current simulation. We are also ignoring entries known to be unreachable when choosing guards from either USED_GUARDS or REMAINING_UTOPIC_GUARDS to be our PRIMARY_GUARDS.
Thanks for the work on the proposal! I haven't had time to read the new version yet apart from the stuff you point out in your email.
Making the unreachable state persistent seems like a reasonable idea.
However, I'm not so sure about rejecting previously unreachable guards from being primary guards. The concept of primary guards is to _always_ try to use the top N guards in your guardlist (as long as they are in the latest consensus). So the list of primary guards should stay the same as long as none of those nodes disappear from the consensus.
Consider the following case: The network was offline for a few seconds and we ended up marking our top 8 guards as unreachable. Then the network comes back up, and we want to build a new circuit: When it's time to choose the primary guards set we end up choosing the 9th, 10th and 11th node instead of the top 3 guards, even though the top 3 guards are actually online. Ideally Tor will always connect to the top 3 guards, and this is notthe case here.
Maybe the above issue is not possible with the new prop259 algorithm? If that's the case, why?
I'm also curious to see how your simulation reacts to this new change.
Maybe to reduce this exposure, we should try to go back to STATE_PRIMARY_GUARDS in those cases? Tor does a similar trick right now which has been very helpful: https://gitweb.torproject.org/tor.git/tree/src/or/entrynodes.c?id=tor-0.2.7....
Maybe an equivalent heuristic would be that if we are in STATE_RETRY_ONLY and we manage to connect to a non-primary guard, we hang up the connection, and go back into STATE_PRIMARY_GUARDS.
Having this in mind, and revisiting entry_guard_register_connect_status(), it seems the appropriate approach would be marking all USED_GUARDS for retry, discarding the current circuit and trying to build a new circuit (restart the whole algorithm) rather than marking the PRIMARY_GUARDS and going back to STATE_PRIMARY_GUARDS.
This is because by marking all the used guards for retry we can expect the next circuit to use the first available USED_GUARD as an entry guard.
In addition to that, my understanding of this behavior in the current guard selection algorithm is:
When you succeed to build a circuit with a new guard, you mark all previously used guards (the global entry_guards var in tor) for retry which are:
- listed in the latest consensus
- not bad_since some time
- unreachable_since some time and is "time to retry"*
- (and it does not have other restrictions which are not important for
our simulation, like being a bridge when required or fast/stable)
If you find any retryable guard in the condition above, you discard the current circuit and try to build a new one:
- best case: you have a new circuit with one of the previously
unreachable used guards.
- worst case: all previously unreachable used guards marked for retry
fail again, and you have a new circuit with the same guard you discarded before (unless it is also unreachable now :S).
I think your evaluation is correct here.
It's not really "every time you connect to a new guard, retry the previously used" but "every time you connect to a new guard, check if its time to retry a unreachable previously used guard".
- time to retry is progressive depending on how long the node has been
unreachable, for example hourly in the first 6 hours, every 4 hours until 3 days, etc. (see: entry_is_time_to_retry).
Can this heuristic be improved? I think it should be considered for the algorithm.
We have reviewed this strategy in the simulation for the original algorithm and we are generating new data. We are also updating the proposal to include the described strategy.
Great!
BTW, I noticed that the graphs here: https://github.com/twstrike/tor_guardsim/issues/1 are not really explained, and also the x and y axes are undocumented some times. I have not had time to read the new tor_guardsim to understand these graphs. Some documentation would be really helpful here; otherwise, you can explain these things to me in Valencia.
Thanks!