On Mon, Aug 12, 2013 at 09:14:19PM -0400, Nick Mathewson wrote:
> I propose that in 0.2.5.x, Tor clients stop sending CREATE_FAST cells, and use CREATE or CREATE2 cells instead as appropriate.
I'm a fan. Especially since some relays (like mine) have upgraded to Tor 0.2.5.x but their OpenSSL isn't new enough to have the new ciphers for stronger TLS.
There's one problem that I foresee though: in circuit_build_failed():
  if (circ->cpath &&
      circ->cpath->state != CPATH_STATE_OPEN) {
    /* We failed at the first hop. If there's an OR connection
     * to blame, blame it. Also, avoid this relay for a while, and
     * fail any one-hop directory fetches destined for it. */
    [...]
        log_info(LD_OR,
                 "Our circuit failed to get a response from the first hop "
                 "(%s). I'm going to try to rotate to a better connection.",
                 channel_get_canonical_remote_descr(n_chan));
        n_chan->is_bad_for_new_circs = 1;
    [...]
      entry_guard_register_connect_status(n_chan_id, 0, 1, time(NULL));
In short, if the circuit fails before it's established its first hop, the Tor client blames the or_connection, marks the relay unusable, and moves on.
That behavior makes sense in the land of create_fast, since not getting a response is a sign of a real problem with the TCP connection to the relay (most commonly this happens because the TCP connection is dead but the kernel hasn't told us yet).
So, if we implement this proposal, it seems to me that we're going to have to teach circuit_build_failed() about *why* the circuit failed, so we can blame the or_conn if it's a timeout (see circuit_expire_building()), but not blame it if it got an explicit destroy cell.
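To make that concrete, here is a rough sketch of the distinction I mean. It is not Tor code: build_fail_reason_t, sketch_chan_t and sketch_first_hop_failed() are names made up for illustration, and the struct stands in for only the one channel flag touched in the snippet above. The idea is just that the caller passes in the reason, and only a timeout gets the connection blamed.

```c
#include <stdbool.h>

/* Hypothetical: why a circuit failed before its first hop was open. */
typedef enum {
  BUILD_FAIL_TIMEOUT,  /* no reply before the build timeout expired */
  BUILD_FAIL_DESTROY   /* the relay answered with an explicit destroy cell */
} build_fail_reason_t;

/* Hypothetical stand-in for the channel state; only the flag that the
 * existing code sets is modeled here. */
typedef struct {
  bool is_bad_for_new_circs;
} sketch_chan_t;

/* Decide whether to blame the connection for a first-hop failure.
 * Returns true if the connection was marked bad for new circuits. */
bool
sketch_first_hop_failed(sketch_chan_t *chan, build_fail_reason_t reason)
{
  if (reason == BUILD_FAIL_TIMEOUT) {
    /* Silence suggests the connection itself may be dead: rotate away. */
    chan->is_bad_for_new_circs = true;
    return true;
  }
  /* An explicit destroy means the connection works and the relay
   * answered, so leave the connection alone. */
  return false;
}
```

In this sketch a timeout marks the connection bad exactly as today, while an explicit destroy leaves it untouched. Whether the guard bookkeeping should also differ between the two cases is a separate question that the sketch does not try to answer.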
I also pondered whether we would have weird interactions with our cbt (circuit build timeout) code, e.g. if some of our guards have long delays for processing creates and some don't. But I guess maybe that is a feature, not a bug?
--Roger