Re: [tor-dev] Next version of the algorithm

25 Feb 2016

      Reinaldo de Souza Jr <rjunior@thoughtworks.com> writes:
...
On 2/16/16 12:20, George Kadianakis wrote:
...
The
very latest prop259 basically forgets the unreachable guard status as soon as
the algorithm terminates. I wonder if we actually want this. Hopefully guardsim
has a simulation scenario that will illustrate whether that's a good idea or
not.
We are assuming the unreachable state is persisted (by setting
unreachable_since in the guard) in the current simulation. We are also
ignoring entries known to be unreachable when choosing guards from
either USED_GUARDS or REMAINING_UTOPIC_GUARDS to be our PRIMARY_GUARDS.
Thanks for the work on the proposal! I haven't had time to read the new version
yet apart from the stuff you point out in your email.

Making the unreachable state persistent seems like a reasonable idea.

However, I'm not so sure about rejecting previously unreachable guards from
being primary guards. The concept of primary guards is to _always_ try to use
the top N guards in your guardlist (as long as they are in the latest
consensus).  So the list of primary guards should stay the same as long as none
of those nodes disappear from the consensus.

Consider the following case: The network was offline for a few seconds and we
ended up marking our top 8 guards as unreachable. Then the network comes back
up, and we want to build a new circuit: When it's time to choose the primary
guards set we end up choosing the 9th, 10th and 11th node instead of the top 3
guards, even though the top 3 guards are actually online. Ideally Tor will
always connect to the top 3 guards, and this is notthe case here.

Maybe the above issue is not possible with the new prop259 algorithm? If that's
the case, why?

I'm also curious to see how your simulation reacts to this new change.
...
...
Maybe to reduce this
exposure, we should try to go back to STATE_PRIMARY_GUARDS in those cases? Tor
does a similar trick right now which has been very helpful:
    https://gitweb.torproject.org/tor.git/tree/src/or/entrynodes.c?id=tor-0.2.7....
Maybe an equivalent heuristic would be that if we are in STATE_RETRY_ONLY and
we manage to connect to a non-primary guard, we hang up the connection, and go
back into STATE_PRIMARY_GUARDS.
Having this in mind, and revisiting
entry_guard_register_connect_status(), it seems the
appropriate approach would be marking all USED_GUARDS for retry,
discarding the current circuit and trying to build a new circuit
(restart the whole algorithm) rather than marking the PRIMARY_GUARDS and
going back to STATE_PRIMARY_GUARDS.
This is because by marking all the used guards for retry we can expect
the next circuit to use the first available USED_GUARD as an entry guard.
In addition to that, my understanding of this behavior in the current
guard selection algorithm is:
When you succeed to build a circuit with a new guard, you mark all
previously used guards (the global entry_guards var in tor) for retry
which are:
- listed in the latest consensus
- not bad_since some time
- unreachable_since some time and is "time to retry"*
- (and it does not have other restrictions which are not important for
our simulation, like being a bridge when required or fast/stable)
If you find any retryable guard in the condition above, you discard the
current circuit and try to build a new one:
- best case: you have a new circuit with one of the previously
unreachable used guards.
  - worst case: all previously unreachable used guards marked for retry
fail again, and you have a new circuit with the same guard you discarded
before (unless it is also unreachable now :S).
I think your evaluation is correct here.
...
It's not really "every time you connect to a new guard, retry the
previously used" but "every time you connect to a new guard, check if
its time to retry a unreachable previously used guard".
* time to retry is progressive depending on how long the node has been
unreachable, for example hourly in the first 6 hours, every 4 hours
until 3 days, etc. (see: entry_is_time_to_retry).

...
...
Can this heuristic be improved? I think it should be considered for 
the algorithm.
We have reviewed this strategy in the simulation for the original
algorithm and we are generating new data. We are also updating the
proposal to include the described strategy.
Great!

BTW, I noticed that the graphs here:
        https://github.com/twstrike/tor_guardsim/issues/1
are not really explained, and also the x and y axes are undocumented some times.
I have not had time to read the new tor_guardsim to understand these
graphs. Some documentation would be really helpful here; otherwise, you can
explain these things to me in Valencia.

Thanks!