Re: [tor-dev] Next version of the algorithm

16 Feb 2016

      ...
Hi there,
<snip>
ALGO_CHOOSE_ENTRY_GUARD keeps track of unreachable status for guards in state
private to the algorithm - this is initialized every time
ALGO_CHOOSE_ENTRY_GUARD_START is called.
...
Interesting. That seems like both a bug and a feature in some ways.
It's a security feature because we will try our guard list from the
beginning more frequently.
It's a performance "bug" because we have to cycle through all the
unreachable nodes every time we restart the algorithm, because we forgot
they were unreachable.  If the first several guards in your USED_GUARDS are
actually unreachable, then this will delay bootstrap by some time. Consider
the case where you need to make three circuits to connect to a hidden
service as a client (HSDir/IP/RP), so you have to call the algorithm three
times in a row.
Of course, if a guard is really unreachable it _should_ be marked as bad
within an hour because it won't be listed in the next consensus. While this
makes sense, I wonder why my laptop guard list (in the state file) has a
total of 24 guards, where 18 of them are marked as unreachable and only 6 of
them are marked as bad. Maybe they were all marked unreachable when the
internet was down. I wonder if this influences the performance of the
algorithm.
It would be nice to know if the security/performance tradeoff here is
acceptable. Simulations might help, or we will have to learn it the hard way
when we implement the algorithm and try it out in various types of
networks.
Yes, interesting. Hmm. I'll try to come up with an unreachable measure
that sits outside of the algorithm, and see if we can simulate both
alternatives.
Returning to this for a bit. I think it would be good to decide whether we
should keep the unreachable status of guards in permanent disk state or not.
The very latest prop259 basically forgets the unreachable guard status as
soon as the algorithm terminates. I wonder if we actually want this.
Hopefully guardsim has a simulation scenario that will illustrate whether
that's a good idea or not.

As an example of a troublesome edge case, consider Alice who operates a busy
hidden service that gets dozens of client requests per second. If the first few
guards on Alice's guardlist are actually offline, Tor will have to spend a few
seconds probing them for _every_ client request (to make the corresponding
rendezvous circuit). That seems like it will definitely influence performance.
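
To make the cost concrete, here is a rough Python sketch that counts guard
probes over repeated runs of the algorithm, with and without remembering
which guards were unreachable. It is not prop259 or guardsim code: the
function names, the guard names and the "one probe per attempted guard"
cost model are all assumptions made up for illustration.

```python
def choose_guard(guards, reachable, known_unreachable=None):
    """Walk the guard list in priority order; return (guard, probes).

    Hypothetical model: each attempted guard costs one probe.
    If known_unreachable is a set, guards in it are skipped and newly
    failed guards are added to it (state that outlives a single run).
    If it is None, nothing is remembered between runs.
    """
    probes = 0
    for guard in guards:
        if known_unreachable is not None and guard in known_unreachable:
            continue  # remembered as unreachable, do not probe again
        probes += 1
        if guard in reachable:
            return guard, probes
        if known_unreachable is not None:
            known_unreachable.add(guard)
    return None, probes


def total_probes(guards, reachable, calls, remember):
    """Sum the probes spent over `calls` consecutive runs."""
    memory = set() if remember else None
    total = 0
    for _ in range(calls):
        _, probes = choose_guard(guards, reachable, memory)
        total += probes
    return total


# Assumed scenario: the first three guards on the list are offline, and
# three circuits are needed in a row (e.g. HSDir/IP/RP).
guards = ["g1", "g2", "g3", "g4", "g5"]
reachable = {"g4", "g5"}

print(total_probes(guards, reachable, calls=3, remember=False))  # 12
print(total_probes(guards, reachable, calls=3, remember=True))   # 6
```

In this toy model the forgetful variant pays for the three dead guards on
every run, while the remembering variant pays only once. The sketch says
nothing about the security side of the tradeoff (retrying the list from the
top more often), which is what a real simulation would have to weigh.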

---

I'd also like to point out another security consideration on how
STATE_PRIMARY_GUARDS works. I currently like how the 3 minute retry trigger
works; I think it can enforce correct guard usage in various unhandleable edge
cases. I wonder if this time-based trigger should be the only way to go back to
our primary guards.

For example, consider Bob, a travelling laptop user whose Internet is constantly
up and down. While Bob has no Internet, Tor will keep on cycling through
guards.  When Bob finally manages to connect to a guard, chances are it's going
to be a low priority guard, or Tor will already be in STATE_RETRY_ONLY. In that
case, Bob will connect to this shitty guard, and only after 3 minutes (max) it
will start retrying its primary guards. This way, Bob is going to expose
himself to lots of guards on the network over time. Maybe to reduce this
exposure, we should try to go back to STATE_PRIMARY_GUARDS in those cases? Tor
does a similar trick right now which has been very helpful:
    https://gitweb.torproject.org/tor.git/tree/src/or/entrynodes.c?id=tor-0.2.7....

Maybe an equivalent heuristic would be that if we are in STATE_RETRY_ONLY and
we manage to connect to a non-primary guard, we hang up the connection, and go
back into STATE_PRIMARY_GUARDS.
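
As a minimal sketch of that heuristic (the state names come from the
proposal; the function, its arguments and its return value are hypothetical
and only meant to pin down the behaviour being suggested):

```python
STATE_PRIMARY_GUARDS = "STATE_PRIMARY_GUARDS"
STATE_RETRY_ONLY = "STATE_RETRY_ONLY"


def on_guard_connected(state, guard, primary_guards):
    """Decide what to do when a connection to `guard` succeeds.

    Returns (new_state, keep_connection). Hypothetical helper, not
    part of prop259: if we are in STATE_RETRY_ONLY and the guard that
    answered is not a primary guard, drop the connection and go back
    to trying the primary guards.
    """
    if state == STATE_RETRY_ONLY and guard not in primary_guards:
        return STATE_PRIMARY_GUARDS, False
    return state, True


primary = {"g1", "g2", "g3"}

# A low-priority guard answers while we are in STATE_RETRY_ONLY:
print(on_guard_connected(STATE_RETRY_ONLY, "g17", primary))
# A primary guard answers: keep the connection, stay in the same state.
print(on_guard_connected(STATE_RETRY_ONLY, "g2", primary))
```

The sketch deliberately leaves open what happens if the primary guards are
still unreachable after the hang-up; it only covers the single transition
described above.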

Can this heuristic be improved? I think it should be considered for the
algorithm.

Thanks!

George Kadianakis