Ola Bini <obini@thoughtworks.com> writes:
Hi,
Sorry for the string of emails!
Hopefully a simple question: The current proposal contains logic for keeping track of network up/down and setting timeouts for exponential backoff to test the network again. But if I understand correctly, this proposal is basically about replacing the algorithm used for choose_good_entry_server() - correct? So it seems like keeping track of network status doesn't really belong inside this algorithm at all. Wouldn't it make sense to return a specific failure to the caller and let the caller be in charge of when to retry?
Hmm.
It's quite hard (if not impossible) for a user-land application like Tor to actually keep track of whether the network is up or down in a multiplatform, secure and scalable manner (e.g. without using the dirauths as an oracle). So instead all we do is connect to Tor nodes and check whether we could reach them or not.
I think currently if a relay fails to answer a CREATE cell, we treat the relay as unreachable and we mark it as such in our guardlist (see entry_guard_register_connect_status()). This logic is considered suboptimal and it sometimes ends up marking good relays as unreachable. The retry logic in prop259 tries to compensate for this, by eventually retrying nodes that might have been marked as offline by a shaky network.
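To make the "eventually retry" idea concrete, here is a rough sketch in Python of marking a guard unreachable and only retrying it after a growing delay. This is not the prop259 text or Tor's actual code: the Guard class, the function names, and the 60 second / 1 hour constants are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration only; not taken from prop259 or Tor.
BASE_RETRY_SECONDS = 60.0
MAX_RETRY_SECONDS = 3600.0


@dataclass
class Guard:
    """Hypothetical per-guard bookkeeping."""
    name: str
    failures: int = 0                       # consecutive failed attempts
    last_failure: Optional[float] = None    # None means believed reachable


def register_connect_status(guard: Guard, succeeded: bool, now: float) -> None:
    """Record the outcome of one connection attempt to this guard."""
    if succeeded:
        guard.failures = 0
        guard.last_failure = None
    else:
        guard.failures += 1
        guard.last_failure = now


def retry_delay(failures: int) -> float:
    """Delay doubles with each consecutive failure, up to a cap."""
    return min(BASE_RETRY_SECONDS * 2 ** (failures - 1), MAX_RETRY_SECONDS)


def is_retry_due(guard: Guard, now: float) -> bool:
    """A reachable guard is always usable; an unreachable one only after its delay."""
    if guard.last_failure is None:
        return True
    return now - guard.last_failure >= retry_delay(guard.failures)
```

The point is just that a guard marked offline by a shaky network is not lost forever: it becomes eligible again once its delay has passed, and a single success clears the mark.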
I agree it would be nice to decouple the above behavior from the guard picking behavior. However, the two behaviors are certainly linked with each other, since the only nodes that a Tor client connects to are its guard nodes. I'm not sure what separating these two behaviors would look like, but if you guys think it would simplify things I'd definitely be interested in hearing about it :)
WRT current code, if you look at choose_random_entry_impl() (which is called by choose_good_entry_server()), you will notice that the status of relays is very central to this function, since populate_live_entry_guards() will remove any guards that are inactive or were previously found unreachable from the guardlist.
choose_random_entry_impl() does not contain logic about the network itself being up and down, because currently Tor does not have such logic. Instead, Tor will endlessly keep on cycling through guards till the network comes up.
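Very roughly, and only as a toy model of the behavior described above (not the real C code), the cycling looks something like the Python sketch below. The names live_guards() and cycle_until_connected() are hypothetical, the max_rounds limit exists only so the example terminates, and guards are tried in list order to keep it deterministic, whereas the real code picks among the live guards.

```python
from typing import Callable, List, Optional, Set, Tuple


def live_guards(guards: List[str], unreachable: Set[str]) -> List[str]:
    """Drop guards previously found unreachable (toy analogue of the filtering step)."""
    return [g for g in guards if g not in unreachable]


def cycle_until_connected(
    guards: List[str],
    try_connect: Callable[[str], bool],
    max_rounds: int,
) -> Tuple[Optional[str], int]:
    """Keep cycling through guards; there is no notion of 'network is down'.

    Returns (guard that answered or None, number of connection attempts).
    """
    unreachable: Set[str] = set()
    attempts = 0
    for _ in range(max_rounds):
        candidates = live_guards(guards, unreachable)
        if not candidates:
            # Everything is marked unreachable: forget the marks and start over.
            unreachable.clear()
            candidates = list(guards)
        for guard in candidates:
            attempts += 1
            if try_connect(guard):
                return guard, attempts
            unreachable.add(guard)
    return None, attempts
```

Note that nothing in here distinguishes "this guard is down" from "the whole network is down": when the network is down, every guard simply gets marked unreachable in turn until the list is exhausted and the cycle restarts.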
Hopefully I answered the right question here.
BTW, if you guys want, we can have another meeting next Tuesday at the same time, if that would be helpful to you.
cheers!