[tor-dev] Meeting about the new guard algorithm proposal (prop259)

Sat Mar 19 06:20:29 UTC 2016

Reinaldo de Souza Jr <rjunior at thoughtworks.com> writes:

> Thank you.
>
> Another thing I'm interested in is how the proposed algorithm structure
> fits into current tor code. The proposed algorithm is:
>
>     OPEN_CIRCUIT:
>       context = ALGO_CHOOSE_ENTRY_GUARD_START(...)
>       while True:
>         entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
>         circuit = composeCircuitAndConnect(entryGuard)
>
>         if not SHOULD_CONTINUE(isSuccessful(circuit)):
>           ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard)
>           return circuit
>
>
>  I'd like to have ideas of current tor functions with similar purposes.
>  This is the correlation I was able to find by reading the source code:

Hmm, yes, finding the right interface here is very important! We might find that
we need to change the structure of the proposed algorithm slightly to fit into
Tor's networking logic.

Here is a quick reply:

>  a) OPEN_CIRCUIT()
>  Seems to be equivalent to circuit_establish_circuit()

Seems to be the case.

>  b) while True:
>  Seems to be equivalent to onion_populate_cpath(). It even has a
>  "timeout" after 32 tries!

I'm not actually sure if this is the loop you are looking for. I don't think
any networking happens in onion_populate_cpath() at all. I think that loop is
there just to make sure that the final yet-to-be-created circuit will have at
least one node that supports ntor (for crypto/security reasons). However no
networking has taken place yet; it's just doing checks on the hypothetical
future circuit.

Because of the asynchronous networking logic of Tor, I'm not sure if you will
find a while loop that does precisely what you want here.

When a circuit fails in Tor, there is some retry logic to make a new one to
carry out its job. This retry logic might be the loop you are looking for, but
I'm not sure if the logic is somewhere centralized, or if it's special for each
different type of cell/circuit.
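
To illustrate what I mean by the loop not existing as a loop: below is a rough
sketch, in the same Python-like style as the proposal, of how the `while True`
could be turned inside out into a callback that runs whenever a circuit build
finishes. Every name here (GuardSelection, on_circuit_result, max_attempts) is
made up for the example and does not exist in tor; the final `while` only
stands in for the event loop delivering results.

```python
# Hypothetical sketch: none of these names exist in tor.
class GuardSelection:
    """Holds the state that the proposal keeps in `context`."""

    def __init__(self, guards, max_attempts=3):
        self.remaining = list(guards)
        self.max_attempts = max_attempts
        self.attempts = 0
        self.chosen = None

    def next_guard(self):
        """Plays the role of ALGO_CHOOSE_ENTRY_GUARD_NEXT."""
        if not self.remaining or self.attempts >= self.max_attempts:
            return None
        self.attempts += 1
        return self.remaining.pop(0)


def on_circuit_result(context, guard, succeeded):
    """Callback run when a circuit build finishes.

    Returns the next guard to try, or None when the selection is over.
    """
    if succeeded:
        context.chosen = guard  # plays the role of ALGO_CHOOSE_ENTRY_GUARD_END
        return None
    return context.next_guard()  # the retry logic would launch a new circuit


# Simulated run: the first two builds fail, the third succeeds.
outcomes = {"guardA": False, "guardB": False, "guardC": True}
ctx = GuardSelection(["guardA", "guardB", "guardC"])
guard = ctx.next_guard()
while guard is not None:  # stand-in for the event loop delivering results
    guard = on_circuit_result(ctx, guard, outcomes[guard])
```

The point is only that the state lives in `context` between events, so no
function ever has to block waiting for the network.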

Sorry for not being more helpful here, but I have to leave now for the weekend.
I'd suggest you do some runtime analysis of Tor with plenty of logs added, to
find the right place. I will try to have more feedback for you on Wednesday.

>  c) ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
>  Seems to be equivalent to choose_good_entry_server() as used in
>  onion_extend_cpath().

Seems to be the case, yes.

>  d) composeCircuitAndConnect(entryGuard)
>  This is the most uncertain to me. It seems to be circuit_handle_first_hop().

Indeed circuit_handle_first_hop() seems to be the function opening the initial
connection to the guard, after the circuit has been constructed. The function
channel_connect_for_circuit() seems to be the one actually doing the dirty
networking work here: setting up the channel and calling connection_or_connect().

>  The issue is: we rely on `unreachable_since` being updated in case we
>  fail to connect to the guard before the next call to (c). In current tor
>  code, this is done by entry_guard_register_connect_status() but I got
>  lost tracking when it happens.

Hmm, this is the part where the asynchronous networking logic of tor takes
over. A good way to comprehend this IMO is the good ol' "put log statements
everywhere, run tor under various scenarios and check the code flow".

In any case, here is an attempt at untangling the code:

If something goes bad connecting to the guard, I think circuit_build_failed()
will be called eventually, which is one of the places that sets
unreachable_since via entry_guard_register_connect_status().

Then, since that circuit failed, the retry logic of Tor (the exact details here
depend on the type of circuit, etc.) will try to create a new circuit to
complete the job that the previous circuit was supposed to do. During this
second circuit creation, the first entry guard will already have been marked by
circuit_build_failed() so unreachable_since will have been set and the first
entry guard will be skipped.
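
Here is a toy model of that flow, in case it helps to see it written down. The
function names mirror the tor identifiers above, but the bodies are my
simplified guess at the behaviour, not tor's actual code; EntryGuard and
choose_entry_guard are invented for the example.

```python
import time

# Hypothetical model: names mirror the tor identifiers mentioned above,
# but this is not tor's implementation.
class EntryGuard:
    def __init__(self, nickname):
        self.nickname = nickname
        self.unreachable_since = None  # None means "believed reachable"


def register_connect_status(guard, succeeded, now):
    """Stand-in for entry_guard_register_connect_status()."""
    if succeeded:
        guard.unreachable_since = None
    elif guard.unreachable_since is None:
        guard.unreachable_since = now  # remember the first failure time


def circuit_build_failed(guard, now):
    """Stand-in for the failure path: record the failed first hop."""
    register_connect_status(guard, False, now)


def choose_entry_guard(guards):
    """Return the first guard not marked unreachable, else None."""
    for guard in guards:
        if guard.unreachable_since is None:
            return guard
    return None


guards = [EntryGuard("guardA"), EntryGuard("guardB")]
first = choose_entry_guard(guards)            # picks guardA
circuit_build_failed(first, now=time.time())  # first circuit fails
second = choose_entry_guard(guards)           # retry skips guardA
```

What this model takes for granted is exactly the ordering you are asking about:
the failure has to be registered before the retry picks its guard.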

I think this is approximately how it works, but I'd suggest you add log
statements in all the functions calling entry_guard_register_connect_status()
and run tor to see for yourself, because I might be wrong.

>  circuit_mark_for_close_() seems to be called when
>  circuit_handle_first_hop() fails but it's unclear to me if
>  entry_guard_register_connect_status() will ever be called as part of
>  circuit_mark_for_close_() - and if there is any guarantee it will be
>  called before the next invocation of onion_extend_cpath().
>
>  e) SHOULD_CONTINUE(isSuccessful(circuit))
>  This is also tricky. If onion_extend_cpath() is our loop, this is
>  supposed to break it in some case.
>
>  It is very similar to how entry_guard_register_connect_status() is used
>  in channel_do_open_actions(). But similarly to
>  entry_guard_register_connect_status(), I'm also unsure if
>  channel_do_open_actions() is called as part of onion_extend_cpath().
>
>  f) ALGO_CHOOSE_ENTRY_GUARD_END()
>  It could be entry_guard_register_connect_status().

Plausible, yes.

>  Sorry for such a dense email. I'm trying to make an informed decision
>  before following my gut seeing how it breaks.
>
>  If there is any technique or call graph tool you find useful to get such
>  information, it would be much appreciated.

printf or log_warn debugging is the technique I would suggest here. Make sure
you run tor on various types of networks to see what happens.
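
Once you have the logs, a few lines of script can pull out the call order. This
is just a sketch under one assumption: that each log statement you add contains
the text "TRACE <function_name>". That marker is something I made up for the
example, not a tor convention, and the sample lines below are invented too.

```python
import re

# Assumption: every debugging log statement you add includes the text
# "TRACE <function_name>". This marker is made up for the example.
TRACE_RE = re.compile(r"TRACE (\w+)")


def call_sequence(log_lines):
    """Return the traced function names in the order they were logged."""
    sequence = []
    for line in log_lines:
        match = TRACE_RE.search(line)
        if match:
            sequence.append(match.group(1))
    return sequence


def logged_before(sequence, earlier, later):
    """True if `earlier` was logged at least once before the first `later`."""
    if earlier not in sequence or later not in sequence:
        return False
    return sequence.index(earlier) < sequence.index(later)


# Invented sample lines, only to show the expected input shape.
sample_log = [
    "[warn] TRACE onion_extend_cpath",
    "[warn] TRACE circuit_handle_first_hop",
    "[notice] unrelated message",
    "[warn] TRACE entry_guard_register_connect_status",
    "[warn] TRACE onion_extend_cpath",
]
sequence = call_sequence(sample_log)
```

With a real log you could then ask, for instance, whether
entry_guard_register_connect_status was logged after the first
onion_extend_cpath via `logged_before(sequence, "onion_extend_cpath",
"entry_guard_register_connect_status")`.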

Unfortunately, I didn't have time to answer all your questions. Will try to
have some smart thoughts on this for Wednesday.

[Email doubly sent, because I forgot to CC tor-dev originally]