[tor-dev] Next version of the algorithm
desnacked at riseup.net
Tue Feb 16 17:20:31 UTC 2016
> Hi there,
> ALGO_CHOOSE_ENTRY_GUARD keeps track of unreachable status for guards in state
> private to the algorithm - this is initialized every time
> ALGO_CHOOSE_ENTRY_GUARD_START is called.
>> Interesting. That seems like both a bug and a feature in some ways.
>> It's a security feature because we will try our guard list from the
>> beginning more frequently.
>> It's a performance "bug" because we have to cycle through all the
>> nodes every time we restart the algorithm, because we forgot they were
>> unreachable. If the first several guards in your USED_GUARDS are actually
>> unreachable, then this will delay bootstrap by some time. Consider the case
>> where you need to make three circuits to connect to a hidden service as a
>> client (HSDir/IP/RP), so you have to call the algorithm three times in a
>> Of course, if a guard is really unreachable it _should_ be marked as bad
>> an hour because it won't be listed in the next consensus. While this makes
>> sense, I wonder why my laptop guard list (in the state file) has a total of
>> guards, where 18 of them are marked as unreachable and only 6 of them are
>> marked as bad. Maybe they were all marked unreachable when the internet was
>> down. I wonder if this influences the performance of the algorithm.
>> It would be nice to know if the security/performance tradeoff here is
>> acceptable. Simulations might help, or we will have to learn it the hard way
>> when we implement the algorithm and try it out in various types of
> Yes, interesting. Hmm. I'll try to come up with an unreachable measure
> that sits outside of the algorithm, and see if we can simulate both
Returning to this for a bit. I think it would be good to decide whether we
should keep the unreachable status of guards on permanent disk state or
very latest prop259 basically forgets the unreachable guard status as soon as
the algorithm terminates. I wonder if we actually want this. Hopefully guardsim
has a simulation scenario that will illustrate whether that's a good idea or
As an example of a troublesome edge case, consider Alice who operates a busy
hidden service that gets dozens of client requests per second. If the first few
guards on Alice's guardlist are actually offline, Tor will have to spend a few
seconds probing them for _every_ client request (to make the corresponding
rendezvous circuit). That seems like it will definitely influence performance.
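
To put rough numbers on that edge case, here is a toy sketch (not guardsim, and not the prop259 algorithm itself) that compares forgetting the unreachable status after every run against remembering it across runs. Everything in it is an assumption for illustration: the 1 second probe timeout, the function names, and the guard names. It also ignores any retrying of guards that were remembered as unreachable.

```python
# Hypothetical sketch: none of these names come from prop259 or from tor.
PROBE_TIMEOUT = 1.0  # assumed seconds wasted probing one offline guard


def pick_guard(guards, online, known_unreachable=None):
    """Run one pass of a simplified guard selection loop.

    guards: guard names in priority order.
    online: set of guards that are actually reachable.
    known_unreachable: set remembered across runs, or None to forget
        everything as soon as the run terminates.
    Returns (chosen_guard_or_None, seconds_wasted).
    """
    wasted = 0.0
    for guard in guards:
        if known_unreachable is not None and guard in known_unreachable:
            continue  # remembered as unreachable: skip without probing
        if guard in online:
            return guard, wasted
        wasted += PROBE_TIMEOUT
        if known_unreachable is not None:
            known_unreachable.add(guard)
    return None, wasted


def total_delay(requests, guards, online, remember):
    """Sum the probing delay over `requests` independent runs."""
    memory = set() if remember else None
    return sum(pick_guard(guards, online, memory)[1] for _ in range(requests))


if __name__ == "__main__":
    guards = ["g1", "g2", "g3", "g4"]  # first three are offline
    online = {"g4"}
    print("forgetting: ", total_delay(100, guards, online, remember=False))
    print("remembering:", total_delay(100, guards, online, remember=True))
```

With these made-up numbers, the forgetting variant pays the probing cost on every request, while the remembering variant pays it once. The flip side, as discussed above, is that remembering means we go back to the top of the guard list less often.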
I'd also like to point out another security consideration on how
STATE_PRIMARY_GUARDS works. I currently like how the 3 minute retry trigger
works; I think it can enforce correct guard usage in various unhandleable edge
cases. I wonder if this time-based trigger should be the only way to go back to
our primary guards.
For example, consider Bob, a travelling laptop user whose Internet is constantly
up and down. While Bob has no Internet, Tor will keep on cycling through
guards. When Bob finally manages to connect to a guard, chances are it's going
to be a low priority guard, or Tor will already be in STATE_RETRY_ONLY. In that
case, Bob will connect to this shitty guard, and only after 3 minutes (max) will
Tor start retrying its primary guards. This way, Bob is going to expose
himself to lots of guards on the network over time. Maybe to reduce this
exposure, we should try to go back to STATE_PRIMARY_GUARDS in those cases? Tor
does a similar trick right now which has been very helpful:
Maybe an equivalent heuristic would be that if we are in STATE_RETRY_ONLY and
we manage to connect to a non-primary guard, we hang up the connection, and go
back into STATE_PRIMARY_GUARDS.
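
A minimal sketch of that heuristic follows. The two state names are the ones used in this thread; the function name, its signature, and the idea of returning a "keep the connection" flag are assumptions made for illustration, not anything specified in prop259.

```python
# Hypothetical sketch; only the state names come from the thread.
STATE_PRIMARY_GUARDS = "STATE_PRIMARY_GUARDS"
STATE_RETRY_ONLY = "STATE_RETRY_ONLY"


def on_guard_connected(state, guard, primary_guards):
    """Decide what to do after a successful connection to `guard`.

    Returns (new_state, keep_connection).
    """
    if state == STATE_RETRY_ONLY and guard not in primary_guards:
        # A non-primary guard answered, so the network is probably back:
        # hang up and give the primary guards another chance.
        return STATE_PRIMARY_GUARDS, False
    # Otherwise keep the connection and stay in the current state.
    return state, True


if __name__ == "__main__":
    primaries = {"g1", "g2", "g3"}
    print(on_guard_connected(STATE_RETRY_ONLY, "g17", primaries))
    print(on_guard_connected(STATE_RETRY_ONLY, "g1", primaries))
```

One open question with this sketch is what happens if the primary guards are still down after we hang up: we would then cycle back to the non-primary guard, so some limit on how often the heuristic fires is probably needed.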
Can this heuristic be improved? I think it should be considered for the