Hello there,
seems like the prop259 algorithm has kind of stabilized and you guys have jumped into implementation. That's great!
A small problem in this process is that I'm probably the only person in Tor who understands the new algorithm right now. We could fix this by doing a small proposal IRC meeting where you guys could summarize the current state of the algorithm, as well as provide some simulation results. I think that folks like Nick, Mike and isis could provide valuable feedback at this point.
Would you guys be interested in something like this? I'm fine with doing it next week at some point. Maybe Wednesday or Thursday? Maybe at 15:00 UTC? Let me know if that's convenient for you.
Cheers!
Sounds good to me, Iván and company?
That would be great!
I have a couple of questions that may help me better prepare for this meeting:
1) In the proposal we assume the arrival of a new consensus as a discrete event. Does this assumption match current tor implementation, or is it more like "having at least X relay descriptors available"? What is the entrypoint for taking actions after we receive a new consensus?
Keep in mind we refer to "receiving a new consensus" meaning "the list of guards has changed", they might be different things. We are interested in reacting to changes to the "list of guards".
2) Once we have a guard_t, how can we know if it "is present in the latest consensus"?
We found the property `bad_since` but it seems to have a different semantics: when the guard was considered "nonfunctional, unlisted, excluded, or otherwise nonusable" (according to `ENTRY_GUARD_REMOVE_AFTER` description).
We also found `entry_guard_set_status` and how it consider a guard to be unlisted: a failure to find a node with the same identity as the guard using node_get_by_id().
At this point, my understanding is whatever is "in the consensus" can be found by node_get_by_id() and `bad_since` can be used as an additional data for decision making. Is this correct?
3) Current tor implementation seems to prefer handling lists of node_t rather than entry_guard_t, is there a reason for this?
The proposal implementation currently manipulate lists of entry_guard_t but when we need to call functions in existing tor code (node_sl_choose_by_bandwidth) we need to convert from guard do node, using node_get_by_id().
As a consequence, if I'm correct about 2, this will automatically filter out unlisted guards.
On 3/18/16 04:19, George Kadianakis wrote:
Hello there,
seems like the prop259 algorithm has kind of stabilized and you guys have jumped into implementation. That's great!
A small problem in this process is that I'm probably the only person in Tor who understands the new algorithm right now. We could fix this by doing a small proposal IRC meeting where you guys could summarize the current state of the algorithm, as well as provide some simulation results. I think that folks like Nick, Mike and isis could provide valuable feedback at this point.
Would you guys be interested in something like this? I'm fine with doing it next week at some point. Maybe Wednesday or Thursday? Maybe at 15:00 UTC? Let me know if that's convenient for you.
Cheers!
Reinaldo de Souza Jr rjunior@thoughtworks.com writes:
That would be great!
OK. Let's say Wednesday 15:00 UTC then!
I have a couple of questions that may help me better prepare for this meeting:
- In the proposal we assume the arrival of a new consensus as a
discrete event. Does this assumption match current tor implementation, or is it more like "having at least X relay descriptors available"? What is the entrypoint for taking actions after we receive a new consensus?
Indeed, in the current tor implementation it's a discrete event.
See entry_guards_compute_status() called by directory_info_has_arrived().
Keep in mind we refer to "receiving a new consensus" meaning "the list of guards has changed", they might be different things. We are interested in reacting to changes to the "list of guards".
- Once we have a guard_t, how can we know if it "is present in the
latest consensus"?
We found the property `bad_since` but it seems to have a different semantics: when the guard was considered "nonfunctional, unlisted, excluded, or otherwise nonusable" (according to `ENTRY_GUARD_REMOVE_AFTER` description).
We also found `entry_guard_set_status` and how it consider a guard to be unlisted: a failure to find a node with the same identity as the guard using node_get_by_id().
At this point, my understanding is whatever is "in the consensus" can be found by node_get_by_id() and `bad_since` can be used as an additional data for decision making. Is this correct?
Took a quick look and I think you are right.
node_get_by_id() uses the global nodelist for lookups. The global nodelist gets updated everytime tor receives a new consensus. So if node_get_by_id() can't find a node, it means it was not listed in the previous consensus.
Specifically in entry_guards_compute_status() we have:
SMARTLIST_FOREACH_BEGIN(entry_guards, entry_guard_t *, entry) { const node_t *r = node_get_by_id(entry->identity); const char *reason = NULL; if (entry_guard_set_status(entry, r, now, options, &reason)) changed = 1; ... } SMARTLIST_FOREACH_END(entry);
- Current tor implementation seems to prefer handling lists of node_t
rather than entry_guard_t, is there a reason for this?
I think tor uses entry_guard_t simply as a store of guard-related information about a node. However, to actually do networking with a node (e.g. connect to it) you need to go from entry_guard_t to node_t, because it's node_t that has networking capabilities.
Cheers!
The proposal implementation currently manipulate lists of entry_guard_t but when we need to call functions in existing tor code (node_sl_choose_by_bandwidth) we need to convert from guard do node, using node_get_by_id().
As a consequence, if I'm correct about 2, this will automatically filter out unlisted guards.
On 3/18/16 04:19, George Kadianakis wrote:
Hello there,
seems like the prop259 algorithm has kind of stabilized and you guys have jumped into implementation. That's great!
A small problem in this process is that I'm probably the only person in Tor who understands the new algorithm right now. We could fix this by doing a small proposal IRC meeting where you guys could summarize the current state of the algorithm, as well as provide some simulation results. I think that folks like Nick, Mike and isis could provide valuable feedback at this point.
Would you guys be interested in something like this? I'm fine with doing it next week at some point. Maybe Wednesday or Thursday? Maybe at 15:00 UTC? Let me know if that's convenient for you.
Cheers!
Thank you.
Another thing I'm interested in is how the proposed algorithm structure fits into current tor code. The proposed algorithm is:
OPEN_CIRCUIT: context = ALGO_CHOOSE_ENTRY_GUARD_START(...) while True: entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context) circuit = composeCircuitAndConnect(entryGuard)
if not SHOULD_CONTINUE(isSuccessful(circuit)): ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard) return circuit
I'd like to have ideas of current tor functions with similar purposes. This is the correlation I was able to find by reading the source code:
a) OPEN_CIRCUIT() Seems to be equivalent to circuit_establish_circuit()
b) while True: Seems to be equivalent to onion_populate_cpath(). It even has a "timeout" after 32 tries!
c) ALGO_CHOOSE_ENTRY_GUARD_NEXT(context) Seems to be equivalent to choose_good_entry_server() as used in onion_extend_cpath().
d) composeCircuitAndConnect(entryGuard) This is the most uncertain to me. It seems to be circuit_handle_first_hop().
The issue is: we rely on `unreachable_since` being updated in case we fail to connect to the guard before the next call to (c). In current tor code, this is done by entry_guard_register_connect_status() but I got lost tracking when it happens.
circuit_mark_for_close_() seems to be called when circuit_handle_first_hop() fails but it's unclear to me if entry_guard_register_connect_status() will ever be called as part of circuit_mark_for_close_() - and if there is any guarantee it will be called before the next invocation of onion_extend_cpath().
e) SHOULD_CONTINUE(isSuccessful(circuit)) This is also tricky. If onion_extend_cpath() is our loop, this is supposed to break it in some case.
It is very similar to how entry_guard_register_connect_status() is used in channel_do_open_actions(). But similarly to entry_guard_register_connect_status(), I'm also unsure if channel_do_open_actions() is called as part of onion_extend_cpath().
f) ALGO_CHOOSE_ENTRY_GUARD_END() It could be entry_guard_register_connect_status().
Sorry for such a dense email. I'm trying to make an informed decision before following my gut seeing how it breaks.
If there is any technique or call graph tool you find useful to get such information, it would be much appreciated.
Reinaldo de Souza Jr rjunior@thoughtworks.com writes:
[ text/plain ] Thank you.
Another thing I'm interested in is how the proposed algorithm structure fits into current tor code. The proposed algorithm is:
OPEN_CIRCUIT: context = ALGO_CHOOSE_ENTRY_GUARD_START(...) while True: entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context) circuit = composeCircuitAndConnect(entryGuard) if not SHOULD_CONTINUE(isSuccessful(circuit)): ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard) return circuit
I'd like to have ideas of current tor functions with similar purposes. This is the correlation I was able to find by reading the source code:
Hmm, yes finding the right interface here is very important! We might find that we need to change the structure of the proposed algorithm slightly to fit into Tor's networking logic.
Here is a quick reply:
a) OPEN_CIRCUIT() Seems to be equivalent to circuit_establish_circuit()
Seems to be the case.
b) while True: Seems to be equivalent to onion_populate_cpath(). It even has a "timeout" after 32 tries!
I'm not actually sure if this is the loop you are looking for. I don't think any networking happens in onion_populate_cpath() at all. I think that loop is there just to make sure that the final yet-to-be-created circuit will have at least one node that supports ntor (for crypto/security reasons). However no networking has taken place yet; it's just doing checks on the hypothetical future circuit.
Because of the asynchronous networking logic of Tor, I'm not sure if you will find a while loop that does precisely what you want here.
When a circuit fails in Tor, there is some retry logic to make a new one to carry out its job. This retry logic might be the loop you are looking for, but I'm not sure if the logic is somewhere centralized, or if it's special for each different type of cell/circuit.
Sorry for not being more helpful here, but I have to move now for the weekend. I'd suggest you do some runtime analysis of Tor with plenty of logs added, to find the right place. I will try to have more feedback for you on Wednesday.
c) ALGO_CHOOSE_ENTRY_GUARD_NEXT(context) Seems to be equivalent to choose_good_entry_server() as used in onion_extend_cpath().
Seems to be the case, yes.
d) composeCircuitAndConnect(entryGuard) This is the most uncertain to me. It seems to be circuit_handle_first_hop().
Indeed circuit_handle_first_hop() seems to be the function opening the initial connection to the guard, after the circuit has been constructed. The function channel_connect_for_circuit() seems to be the one actually doing the dirty networking work here; setting up the channel and calling connection_or_connect().
The issue is: we rely on `unreachable_since` being updated in case we fail to connect to the guard before the next call to (c). In current tor code, this is done by entry_guard_register_connect_status() but I got lost tracking when it happens.
Hmm, this is the part where the asynchronous networking logic of tor takes over. A good way to comprehend this IMO is the good ol' "put log statements everywhere, run tor under various scenarios and check the code flow".
In any case, here is an attempt at untangling the code:
If something goes bad connecting to the guard, I think circuit_build_failed() will be called eventually, which is one of the places that sets unreachable_since via entry_guard_register_connect_status().
Then, since that circuit failed, the retry logic of Tor (the exact details here depend on the type of circuit, etc.) will try to create a new circuit to complete the job that the previous circuit was supposed to do. During this second circuit creation, the first entry guard will already have been marked by circuit_build_failed() so unreachable_since will have been set and the first entry guard will be skipped.
I think this is approximately how it works, but I'd suggest you add log statements in all the functions calling entry_guard_register_connect_status() and run tor, to see it yourself because I might be wrong.
circuit_mark_for_close_() seems to be called when circuit_handle_first_hop() fails but it's unclear to me if
[ 12 more citation lines. Click/Enter to show. ]
entry_guard_register_connect_status() will ever be called as part of circuit_mark_for_close_() - and if there is any guarantee it will be called before the next invocation of onion_extend_cpath().
e) SHOULD_CONTINUE(isSuccessful(circuit)) This is also tricky. If onion_extend_cpath() is our loop, this is supposed to break it in some case.
It is very similar to how entry_guard_register_connect_status() is used in channel_do_open_actions(). But similarly to entry_guard_register_connect_status(), I'm also unsure if channel_do_open_actions() is called as part of onion_extend_cpath().
f) ALGO_CHOOSE_ENTRY_GUARD_END() It could be entry_guard_register_connect_status().
Plausible yes.
Sorry for such a dense email. I'm trying to make an informed decision before following my gut seeing how it breaks.
If there is any technique or call graph tool you find useful to get such information, it would be much appreciated.
printf or log_warn debugging is the technique I would suggest here. Make sure you run tor on various types of networks to see what happens.
Unfortunately, I didn't have time to answer all your questions. Will try to have some smart thoughts on this for Wednesday.
[Email doubly sent, because I forgot to CC tor-dev originally]