Thus spake Mike Perry (mikeperry@torproject.org):
Also exists at https://gitweb.torproject.org/user/mikeperry/torspec.git/blob/path-bias-tuni...
I've updated this proposal to address some questions and comments from people who have reviewed it via private email. The URL for these changes is: https://gitweb.torproject.org/user/mikeperry/torspec.git/blob/path-bias-tuni...
The following sections were added: "Security Considerations: Targeted Failure Attacks" and "Implementation Notes: Differences between this proposal and source".
I also added a couple paragraphs to the Motivation and Design Description sections, to clarify some points.
During the course of off-list discussion, implementation, and testing, I've decided to make the following major changes in the code that are not yet reflected in the proposal:
1. Instead of counting circuit attempts after the first hop succeeds, we want to wait until the second hop also succeeds. This is because there is currently a large amount of variation in the per-hop rate of onionskin failure due to CPU overload conditions. During testing, I watched the end-to-end circuit success rate repeatedly fluctuate between 90% and 50%, with the difference being due almost entirely to per-hop onionskin failure with reasons RESOURCELIMIT and/or INTERNAL (CPU overload).
Waiting until the second hop completes removes much of this effect without impacting what we're looking for (guard-to-exit bias). To see why, imagine that each node occasionally experiences as much as 25% onionskin failure. If we count after the first hop in such a network, the two remaining hops give a success rate of only (1-.25)*(1-.25) = 0.56, which triggers our notice alarm (set at 70%). However, if we wait until two hops have completed, only the final hop can still fail, so the success rate is 1-.25 = 0.75 and the alarm should not trigger in this same scenario.
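The 25% example can be checked with a small model. This is a sketch under a simplifying assumption (every hop of a 3-hop circuit fails independently with the same probability, and only hops built after the counting point affect counted circuits); the constant and function names are hypothetical, not taken from the Tor source.

```python
# Simplified model (assumption): each hop of a 3-hop circuit fails its
# onionskin independently with the same probability, and only hops built
# *after* the counting point can make a counted attempt fail.
CIRCUIT_HOPS = 3
NOTICE_THRESHOLD = 0.70  # notice alarm level quoted in the text


def counted_success_rate(per_hop_failure: float, count_after_hop: int) -> float:
    """Fraction of counted circuit attempts that go on to complete every hop."""
    remaining_hops = CIRCUIT_HOPS - count_after_hop
    return (1.0 - per_hop_failure) ** remaining_hops


for start in (1, 2):
    rate = counted_success_rate(0.25, start)
    alarm = "yes" if rate < NOTICE_THRESHOLD else "no"
    print(f"count after hop {start}: success {rate:.2f}, alarm: {alarm}")
```

Moving the counting point from hop 1 to hop 2 removes one factor of (1 - p) from the product, which is what lifts the example above the threshold.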
Further, because of this squaring of the per-hop success rates, per-hop failure is far less appealing to the adversary than end-to-end tagging for the same amount of network compromise. An adversary that controls 30% of the network would have to drive the end-to-end circuit success rate down to 9% (.3*.3) for per-hop failure, but only to 30% for end-to-end tagging-based failure. In either case, we'd still catch the per-hop adversary as they failed their last hop, just as we would catch the end-to-end tagger.
Thanks to Anupam Das for bringing up the per-hop failure issue.
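The 30% adversary example can be expressed the same way. This sketch assumes the client's guard is malicious, the adversary controls a fraction c of the later path choices, and honest hops never fail on their own; the function names are hypothetical.

```python
# Simplified model (assumption): the client's guard is malicious, the
# adversary controls a fraction `c` (0 < c <= 1) of middle and exit choices,
# and honest hops never fail on their own.

def tagging_success(c: float) -> float:
    """End-to-end tagging: circuits survive only if the exit is compromised."""
    return c


def per_hop_success(c: float) -> float:
    """Per-hop failure: both the middle and the exit must be compromised."""
    return c * c


def per_hop_counted_rate(c: float) -> float:
    """Success rate the client records when counting starts after hop two.

    Circuits killed at the guard-to-middle extension are never counted, but
    the counted ones (compromised middle) still need a compromised exit.
    """
    return per_hop_success(c) / c
```

Under this model the per-hop attacker pays c squared in end-to-end success, yet the rate the client records after the second hop is still c, so the "last hop" failures remain visible.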
2. It turns out we also need to track successful circuit use as opposed to just construction. There are ways to use cryptographic tagging after a circuit is successfully built that enable an adversary to either destroy that circuit before use, or simply time out or fail all stream attempts on that circuit.
In the implementation, we wait until the circuit is marked for close to decide this. I currently consider a circuit successfully used if it gets a valid RELAY response during its lifetime. A circuit is unsuccessfully used if it was marked dirty but never received any successful cells back from the exit, or if it closes unexpectedly before we get a chance to try to use it.
Thanks to Rob Jansen for pointing this out.
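The use-accounting rule from item 2 can be sketched as a decision made at mark-for-close time. The field names below are hypothetical stand-ins, not members of Tor's actual circuit structures, and the "not counted" outcome for a circuit that was never handed out and closed cleanly is an assumption the text does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class UseResult(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    NOT_COUNTED = "not counted"  # assumption: clean close of an unused circuit


@dataclass
class CircuitState:
    # Hypothetical fields, not Tor's real circuit members.
    got_valid_relay_response: bool = False  # any valid RELAY cell back, ever
    marked_dirty: bool = False              # circuit was handed out for use
    closed_unexpectedly: bool = False       # closed before we could use it


def classify_use_at_close(circ: CircuitState) -> UseResult:
    """Decide path-bias use accounting when the circuit is marked for close."""
    if circ.got_valid_relay_response:
        return UseResult.SUCCEEDED
    if circ.marked_dirty or circ.closed_unexpectedly:
        # Used (or about to be) but nothing valid ever came back from the exit.
        return UseResult.FAILED
    return UseResult.NOT_COUNTED
```

Deferring the decision until close means a single valid reply at any point in the circuit's lifetime is enough to count it as a success.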
3. We can't count path bias for circuits where the adversary controls our destination node, because that node could be selected to cause us to repeatedly fail circuits to it and make us distrust our guard nodes. Right now, this is limited to client-side hidden service INTRO circuits, and server-side REND circuits.
Arguably, the server-side REND case would be more serious than the client-side INTRO case, were it not for the fact that malicious hidden service web pages could source lots of third-party content elements from failing .onions, causing us to mistrust our guard nodes. (Per-origin stream isolation can't fix this, unfortunately.)
This one I discovered during testing.
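The exclusion in item 3 amounts to a filter on circuit purpose before any path-bias counting. The purpose labels here are hypothetical and only loosely modeled on Tor's hidden service circuit roles; the excluded set is the one named in the text.

```python
from enum import Enum, auto


class Purpose(Enum):
    # Hypothetical labels loosely modeled on Tor's circuit purposes.
    GENERAL = auto()
    CLIENT_INTRO = auto()
    CLIENT_REND = auto()
    SERVER_INTRO = auto()
    SERVER_REND = auto()


# Circuits whose final hop can be chosen by the adversary, per the text.
ADVERSARY_CHOSEN_DESTINATION = frozenset({Purpose.CLIENT_INTRO, Purpose.SERVER_REND})


def counts_toward_path_bias(purpose: Purpose) -> bool:
    """Skip circuits whose destination the adversary controls."""
    return purpose not in ADVERSARY_CHOSEN_DESTINATION
```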
4. Related to 2 and 3: If a stream merely times out or experiences any other "retriable" failure that causes us to simply try another circuit, we need to "probe" that circuit with a faux RELAY_BEGIN cell to ensure that we get that cell back and don't receive any (potentially tagged) unrecognized garbage in the interim. In addition to catching RELAY-cell taggers, this also helps us avoid the situation where an adversary forces us to repeatedly connect to an unresponsive Internet server. See #7691.
This one I discovered during testing.
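The probe check in item 4 can be sketched as a scan over the cells that arrive after the faux RELAY_BEGIN is sent. The cell representation, the stream-id matching, and treating "no reply before the timer fires" as failure are all assumptions for illustration, not details of Tor's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReceivedCell:
    # Hypothetical view of a cell arriving on the probed circuit.
    recognized: bool          # did it decrypt and verify as a valid relay cell?
    stream_id: Optional[int]  # stream it belongs to, if recognized


def probe_succeeded(cells: Iterable[ReceivedCell], probe_stream_id: int) -> bool:
    """True only if the probe's reply arrives before any unrecognized cell.

    `cells` are the cells received after sending the faux RELAY_BEGIN, up to
    the moment the probe timer fires; running out of cells means a timeout.
    """
    for cell in cells:
        if not cell.recognized:
            return False  # possibly tagged garbage: count as a use failure
        if cell.stream_id == probe_stream_id:
            return True   # the probe came back intact
    return False  # no reply before the timer fired
```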