On 13 Aug (15:47:43), George Kadianakis wrote:
Hello friends :)
This is a post to discuss how Tor should treat its entry guards when its network goes down. This is part of ticket #12595 [0] which aims to design better interfaces and data structures for entry guards.
This thread investigates what should happen when the network goes down and Tor's connection to a guard fails. How should Tor recognize that and connect back to that guard when the network goes up again?
I recently sent an email to tor-dev [1] which explains Tor's current behavior and its problems. tl;dr there are edge cases [2] where Tor will not detect the "network back up" event and will connect to low priority guards instead of connecting to its primary guards.
The fundamental issue here is that Tor does not have a primitive that detects whether the network is up or down, since any such primitive stands out to a network attacker [3]. This means that when Tor fails to connect to an entry guard, Tor can never be sure whether the guard was actually down or whether the network is down, and that complicates its algorithm significantly.
That's a difficult thing to detect, but there are ways to get a fairly good estimate of whether it is the network that is down or the endpoint.
I'm going to describe the different scenarios that can happen with the network and that can be considered a "network down" event.
1) No power on the interface so basically an unplugged cable or dead switch.
This will trigger an event on the net device which will set the interface to a DOWN state, so it will no longer have the IFF_UP flag in the kernel (netdevice(7)). That can be probed with an ioctl() using SIOCGIFFLAGS. The flag does NOT tell you whether you have connectivity to the network, simply whether the interface can physically transmit data.
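To make the SIOCGIFFLAGS probe concrete, here is a minimal Linux-only sketch in Python. The function name interface_is_up() is made up for illustration, and the constants are the Linux values (this is not how Tor does anything today). It only reads the IFF_UP flag, so it says nothing about actual connectivity.

```python
import fcntl
import socket
import struct
from typing import Optional

SIOCGIFFLAGS = 0x8913  # Linux ioctl number, from <linux/sockios.h>
IFF_UP = 0x1           # "interface is up" flag, see netdevice(7)


def interface_is_up(ifname: str) -> Optional[bool]:
    """Return the IFF_UP state of ifname, or None if it cannot be read.

    Hypothetical helper: None covers a missing interface, a non-Linux
    kernel, or any other ioctl failure.
    """
    # struct ifreq: 16-byte interface name, then a union whose first
    # member (for this ioctl) is a short holding the flags.
    ifreq = struct.pack("16sH22x", ifname.encode()[:15], 0)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            result = fcntl.ioctl(s.fileno(), SIOCGIFFLAGS, ifreq)
    except OSError:
        return None
    (flags,) = struct.unpack_from("H", result, 16)
    return bool(flags & IFF_UP)
```

Calling interface_is_up("eth0") would tell you whether that interface is administratively up; the interface name is of course system specific.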
2) The default route or the network route to the address you are trying to reach does not exist.
Your connect() will fail with errno set to "ENETUNREACH", "ENETDOWN", "EADDRNOTAVAIL" (this one is mostly for multicast and bind()) or "EHOSTUNREACH". This can happen, for instance, if your interface loses its address and/or the routing table changes. But note that those error codes are events detected by the kernel, thus NOTHING will come out of your computer onto the network.
3) Everything is quiescent but the network drops everything.
In that case, your connect() will most likely fail with a timeout in blocking mode, or loop with EINPROGRESS in non-blocking mode.
(Please add some scenarios here if I'm missing some!).
That being said, we could use a combination of those to try to assert as best as possible whether it's a network issue or not. Note that the ioctl() part is pretty Linux specific. Points 2) and 3) rely on POSIX standard error codes.
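As a rough illustration of scenarios 2) and 3), the errno from a failed connect() could be bucketed like this. The function name and the three labels are invented for the sketch, and treating all four errno values as "locally detected" is an assumption taken straight from the description above rather than something verified against every kernel path.

```python
import errno

# Assumption from scenario 2): the kernel reports these without the
# connection attempt having reached the guard.
LOCAL_NETWORK_ERRNOS = {
    errno.ENETUNREACH,
    errno.ENETDOWN,
    errno.EADDRNOTAVAIL,
    errno.EHOSTUNREACH,
}


def classify_connect_failure(err: int) -> str:
    """Map a connect() errno to a coarse guess about what went wrong."""
    if err in LOCAL_NETWORK_ERRNOS:
        # Scenario 2): missing route or address, so the guard is not to blame.
        return "network-down"
    if err == errno.ETIMEDOUT:
        # Scenario 3): could be a dead guard or a network dropping packets.
        return "inconclusive"
    return "other"
```

Only the "network-down" bucket would justify keeping a guard marked as usable; a timeout on its own does not distinguish a dead guard from a dead network.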
In #12595, I laid down a few different ways that Tor could improve its current "network down" entry guard algorithm [4]. After thinking some more, I've been leaning towards algorithm (a), which is:
Every time we manage to connect to a guard, if it's not the top guard in our list, mark all previous guards as retriable and try again from the top.
For me, this seems like the most complete way of ensuring that the guards in the top of the list are always going to get a fair hearing even if the network was down when they were first probed.
However, that algorithm is not without its problems. Here are some notes:
This algorithm suffers from an infinite loop.
The naive version suffers from an obvious infinite loop if the first guards in our list are actually down. To protect against that, we will probably need to rewrite the algorithm to:
Every time we manage to connect to a guard, if it's not the top guard in our list, mark all previous guards as retriable and try again from the top: this time pick whichever guard we can connect to even if it's not the top one.
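The revised rule can be sketched as a two-pass walk over the guard list. This is only an illustration: pick_guard() and the try_connect callback are hypothetical names, guards are plain strings, and falling back to the first-pass guard when the second pass finds nothing is my own assumption, since the text above does not say what should happen in that case.

```python
from typing import Callable, List, Optional


def pick_guard(guards: List[str],
               try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the guard to use, or None if no guard is reachable."""

    def first_reachable() -> Optional[int]:
        # Walk the list in priority order and stop at the first success.
        for index, guard in enumerate(guards):
            if try_connect(guard):
                return index
        return None

    first = first_reachable()
    if first is None:
        return None
    if first == 0:
        return guards[0]

    # We reached a lower-priority guard, so the network is up right now.
    # Earlier failures may have been a network outage: treat the earlier
    # guards as retriable and walk from the top exactly once more.
    second = first_reachable()
    if second is None:
        # Assumption: keep the guard from the first pass.
        return guards[first]
    # Accept whatever the second pass found, which bounds the loop.
    return guards[second]
```

Because the second pass accepts any guard, the walk runs at most twice, which is what removes the infinite loop when the top guards really are dead.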
This algorithm is not actually robust.
If we wanted to design a truly robust algorithm, it would have to be robust even against adversarial "network down" events. That is, the algorithm would need to work well even if you imagine the network as an "on"/"off" switch that the adversary can toggle at will.
Here is how that algorithm can fail against such an adversary:
- Tor starts up. Adversary switches network off.
- Tor starts going through guards and fails to connect to them.
- Adversary switches network on.
- Tor detects "network up" event, marks all guards as up and goes from the top.
- Adversary switches network off.
- Tor starts going through guards and fails to connect to them.
- Adversary switches network on, and Tor establishes a circuit to the guard that it was currently walking over.
I think the crucial point here is step 5 of the list above (the second time the adversary switches the network off). This has to be detected, and if it is, you stop, reflag everyone and try again when you get a network up event.
If the adversary is a LAN adversary, she learns the order of the guards in Tor's list, so she can basically choose whichever guard she wants.
FWIW, whether such an adversary is realistic is up for debate.
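To see why the on/off adversary gets to choose the guard, here is a toy simulation. Everything in it is hypothetical: two_pass_pick() is a compact stand-in for the revised algorithm (a), and the adversary is modelled as a try_connect callback that only lets the connection through while Tor is probing the guard she wants.

```python
from typing import Callable, List, Optional


def two_pass_pick(guards: List[str],
                  try_connect: Callable[[str], bool]) -> Optional[str]:
    """Walk the list; on landing on a non-top guard, walk it once more."""

    def walk() -> Optional[str]:
        return next((g for g in guards if try_connect(g)), None)

    first = walk()
    if first is None or first == guards[0]:
        return first
    # Second pass from the top; keep the first result if it finds nothing.
    return walk() or first


def adversary_steers_to(guards: List[str], target: str) -> Optional[str]:
    # The network is "on" only while Tor is probing the target guard,
    # so every other connection attempt looks like a failure.
    return two_pass_pick(guards, lambda g: g == target)
```

With guards g1..g4 and a target of g3, both passes fail on g1 and g2 and succeed on g3, so Tor ends up on exactly the guard the adversary picked.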
This algorithm is not very elegant.
This algorithm might make guard fingerprinting worse.
Imagine that the first 2 guards in your list are actually down. Every time Tor detects a "network up" event, it will attempt to connect to those 2 dead guards before connecting to the third guard which is up.
Tor should make a best effort to distinguish dead guards from network issues. This is doable. I don't think it's going to be 100% reliable at differentiating the two events, but I'm pretty sure we can get to a very close estimate (see the notes above on the network up/down detection).
When we detect a guard that is not connecting (and thus assume it is dead for a period of time X), would it be possible to simply choose a new one?
A LAN/WAN adversary will be able to see those failed connections and the third successful connection, which form a nice tight guard fingerprinting vector (similar to #10969).
An attacker looking at the network and simply dropping packets so every connect times out... It's a bit of a losing game. That might be mitigated a bit by having tor go to sleep for a while after too many unsuccessful connect() calls in a row, but in the long run that will just make the attack go slower...
Whatever is used, a network down issue can be detected locally on the machine, whereas detecting that a guard is down implies sending data out of the box, thus revealing your plan.
Cheers! David
And that's all folks.
I'm looking forward to some feedback on the proposed algorithms as well as improvements and suggestions.
PS: I think that Tor was doing a trick with UDP to learn its public IP address or something. I need to read up on that trick, and see if it can be used to build a "network up" primitive.
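One commonly used trick of this kind (not necessarily the exact one Tor uses) is to connect() a UDP socket and read back the local address the kernel selected. A UDP connect() only performs a route lookup, so no datagram is sent. The sketch below is an assumption-laden illustration: local_source_address() is a made-up name, and the probe address is from a documentation range chosen arbitrarily.

```python
import socket
from typing import Optional


def local_source_address(probe_addr: str = "198.51.100.1",
                         port: int = 9) -> Optional[str]:
    """Return the local IPv4 address the kernel would use to reach
    probe_addr, or None if there is no usable route."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            # For UDP, connect() just fixes the peer and picks a route;
            # nothing is transmitted on the wire.
            s.connect((probe_addr, port))
            return s.getsockname()[0]
    except OSError:
        # e.g. ENETUNREACH when no route exists
        return None
```

This only covers the "no route" scenario: a non-None result means the kernel has a route and a source address, not that the network is actually passing traffic.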
https://trac.torproject.org/projects/tor/ticket/12595#comment:6