[tor-dev] Onion Service Intro Point Retry Behavior

Fri Nov 1 17:51:37 UTC 2019

Hi David,

On 29/10/2019 14:52, David Goulet wrote:
> Long story short, a couple of weeks ago we almost merged a new behavior on the
> service side with #31561 that would have ditched an intro point if its circuit
> timed out instead of retrying it. (Today, a service always retries its
> intro points up to 3 times on any type of circuit failure.)

Thanks for not merging this yet. :-)
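To make the difference between the two service-side policies concrete, here is a toy Python sketch. It assumes "up to 3 times" means at most three retries per intro point. The `CircuitFailure` categories and both function names are hypothetical and do not correspond to Tor's internals.

```python
from enum import Enum

class CircuitFailure(Enum):
    # Hypothetical failure categories; Tor's real close reasons are finer-grained.
    TIMEOUT = "timeout"
    OTHER = "other"

# Assumption: "up to 3 times" read as at most 3 retries per intro point.
MAX_INTRO_RETRIES = 3

def should_retry_today(failure: CircuitFailure, retries_so_far: int) -> bool:
    """Behaviour as described in the thread: retry on any failure until the limit."""
    return retries_so_far < MAX_INTRO_RETRIES

def should_retry_31561(failure: CircuitFailure, retries_so_far: int) -> bool:
    """Behaviour described for #31561: ditch the intro point on a timeout."""
    if failure is CircuitFailure.TIMEOUT:
        return False
    # Non-timeout failures keep today's retry behaviour.
    return should_retry_today(failure, retries_so_far)

for failure in CircuitFailure:
    print(failure.value, should_retry_today(failure, 0), should_retry_31561(failure, 0))
```

The only divergence is the timeout case, which is exactly the case a mobile service hits when its guard connection silently dies.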

> The primary original argument for retrying is based on the mobile use case. If
> a .onion is running on a cellphone and the network suddenly goes bad, the
> service is better off re-establishing its intro circuits, so that the client's
> retry attempts eventually succeed, instead of the client having to re-fetch a
> descriptor and go to the new intro points.
> 
> Thus, in theory, it is mostly a reachability argument.
> 
> One question that can arise from this is: Will the client be able to reconnect
> using the old intro points by the time the service has re-established them?
> 
> In other words, does the retry behavior of the *client* allow enough time for
> the service to stabilize in the mobile use case? I'm curious to learn from
> people with experience with this!

For what it's worth, we used to run into the following problem with Briar:

* Device X tries to connect to device Y's hidden service
* X has a cached descriptor for Y's HS
* Since the time when X cached the descriptor, Y has lost its guard
connection, so it's built new intro circuits to new intro points
* After multiple connection attempts, X gives up on the intro points in
the cached descriptor and fetches a new descriptor
* This causes a delay in X connecting to Y

A typical mobile device loses its guard connection frequently - not
necessarily because it loses internet access, but because it switches
between wifi and mobile data. So the scenario above was very common.

Before the HS behaviour was changed to reuse the old intro points, we
had to maintain a patch against Tor to add a controller command for
flushing a cached HS descriptor before trying to connect. This
essentially made the client's descriptor cache redundant, so it was a
slight loss of efficiency, but better than trying a bunch of stale intro
points and then fetching a new descriptor anyway.
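The trade-off above can be put in rough numbers with a toy model. The timings below are invented for illustration and are not Tor's real timeouts or retry counts; the model also assumes that a freshly fetched descriptor always lists working intro points.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Timing:
    # Illustrative numbers only; these are NOT Tor's real timeouts.
    failed_intro_secs: float = 10.0     # cost of giving up on one stale intro point
    descriptor_fetch_secs: float = 5.0  # cost of fetching a fresh descriptor

def delay_with_cache(cached_intros: list[str], live_intros: set[str], t: Timing) -> float:
    """Try the cached descriptor's intro points in order, then refetch if all are stale."""
    delay = 0.0
    for intro in cached_intros:
        if intro in live_intros:
            return delay
        delay += t.failed_intro_secs
    # Assumption: the fresh descriptor lists live intro points and the first one works.
    return delay + t.descriptor_fetch_secs

def delay_with_flush(t: Timing) -> float:
    """Workaround: always flush the cached descriptor and fetch a fresh one first."""
    return t.descriptor_fetch_secs

t = Timing()
print(delay_with_cache(["a", "b", "c"], {"x", "y", "z"}, t), delay_with_flush(t))
print(delay_with_cache(["a", "b", "c"], {"a", "b", "c"}, t), delay_with_flush(t))
```

With all three cached intro points stale, the cache path pays for every failed attempt plus the fetch it ends up doing anyway, while flushing always pays for exactly one fetch. When the cached intro points are still live, the cache wins, which is the "slight loss of efficiency" being traded away.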

If you're considering switching back to the old behaviour, I'd like to
discuss whether we could make one of the following changes to continue
supporting the mobile HS use case:

1. Add a controller command for flushing an HS descriptor
2. Add a controller command for notifying Tor that we lost/gained
internet access, or switched between wifi and mobile data, so Tor knows
that (a) its guard connection may be dead, and (b) its intro circuits
may be dead, but not due to an attack by the intro points, so it can
safely reuse the intro points
3. If intro circuits are closed due to DisableNetwork changing from 0 to
1, remember this and reuse the intro points when the network is re-enabled
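Option 3 boils down to classifying why an intro circuit closed. A minimal sketch, assuming a hypothetical `CloseReason` classification and a hypothetical `choose_new` callback for picking a replacement relay (neither exists under these names in Tor). Closures we caused by setting DisableNetwork keep the intro point; online timeouts replace it, as in the #31561 proposal.

```python
from __future__ import annotations
from enum import Enum, auto
from typing import Callable

class CloseReason(Enum):
    # Hypothetical reasons; Tor's real circuit close reasons are more detailed.
    DISABLE_NETWORK = auto()  # we set DisableNetwork=1 ourselves
    TIMEOUT = auto()          # circuit timed out while the network was enabled

class IntroPointManager:
    """Toy model of option 3; names and structure are illustrative, not Tor's."""

    def __init__(self, intro_points: list[str], choose_new: Callable[[], str]) -> None:
        self.intro_points = list(intro_points)
        self.choose_new = choose_new  # hypothetical: returns a fresh relay to use

    def on_intro_circuit_closed(self, intro: str, reason: CloseReason) -> None:
        if reason is CloseReason.DISABLE_NETWORK:
            # We caused this closure, so it says nothing about the intro point:
            # keep it and reuse it once the network is re-enabled.
            return
        # Closure we did not cause: replace the intro point.
        self.intro_points[self.intro_points.index(intro)] = self.choose_new()

    def intro_points_to_rebuild(self) -> list[str]:
        """Intro points to build circuits to after DisableNetwork goes back to 0."""
        return list(self.intro_points)

fresh = iter(["relay-D", "relay-E"])
mgr = IntroPointManager(["relay-A", "relay-B", "relay-C"], lambda: next(fresh))
mgr.on_intro_circuit_closed("relay-A", CloseReason.DISABLE_NETWORK)  # kept
mgr.on_intro_circuit_closed("relay-B", CloseReason.TIMEOUT)          # replaced
print(mgr.intro_points_to_rebuild())  # ['relay-A', 'relay-D', 'relay-C']
```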

Android notifies apps of connectivity changes, so Briar could easily
pass this information on to Tor via a new controller command or by
setting DisableNetwork. (The general problem of detecting whether our
internet connectivity is broken, for some definition of broken, remains
hard, but fortunately we don't need to solve that to handle the common
cases of switching between wifi and mobile data, and losing mobile
signal, which the OS can tell us about.)
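A sketch of the DisableNetwork route from the app's side, assuming the app already holds an authenticated control-port connection. `SETCONF` is the control-protocol command for changing a torrc option at runtime. The `Connectivity` events and the disable-then-re-enable handling of a wifi/mobile-data switch are assumptions for illustration, not existing Briar or Tor behaviour.

```python
from __future__ import annotations
from enum import Enum

class Connectivity(Enum):
    # Hypothetical events an app might derive from Android's connectivity callbacks.
    LOST = "lost"
    GAINED = "gained"
    SWITCHED = "switched"  # e.g. wifi -> mobile data

def setconf_disable_network(disabled: bool) -> bytes:
    """Build a control-port SETCONF command line (CRLF-terminated) for DisableNetwork."""
    return f"SETCONF DisableNetwork={1 if disabled else 0}\r\n".encode("ascii")

def commands_for(event: Connectivity) -> list[bytes]:
    """Map a connectivity event to the control-port commands to send."""
    if event is Connectivity.LOST:
        return [setconf_disable_network(True)]
    if event is Connectivity.GAINED:
        return [setconf_disable_network(False)]
    # Assumption: on a network switch, briefly disable and re-enable the network
    # so that Tor drops connections tied to the old interface.
    return [setconf_disable_network(True), setconf_disable_network(False)]

for event in Connectivity:
    print(event.value, commands_for(event))
```

With Stem, the same change could be made with `Controller.set_conf('DisableNetwork', '1')` rather than writing raw command lines.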

My one-sided two cents. ;-)

Cheers,
Michael