commit 9b2886f674c5ed835962b6788f0272e38cc33977 Author: Mike Perry mikeperry-git@fscked.org Date: Wed Oct 10 16:51:45 2012 -0700
Add path bias tuning proposal. --- proposals/xxx-path-bias-tuning.txt | 196 ++++++++++++++++++++++++++++++++++++ 1 files changed, 196 insertions(+), 0 deletions(-)
diff --git a/proposals/xxx-path-bias-tuning.txt b/proposals/xxx-path-bias-tuning.txt new file mode 100644 index 0000000..b6cf165 --- /dev/null +++ b/proposals/xxx-path-bias-tuning.txt @@ -0,0 +1,196 @@ +Title: Tuning the Parameters for the Path Bias Defense +Author: Mike Perry +Created: 01-10-2012 +Status: Open +Target: 0.2.4.x+ + + +Overview + + This proposal describes how we can use the results of network scans to + set reasonable limits for the Path Bias defense, which causes clients to + be informed about and ideally rotate away from Guards that provide + extremely low circuit success rates. + +Motivation + + The Path Bias defense is designed to defend against a type of route + capture where malicious Guard nodes deliberately fail circuits that + are determined to extend to non-colluding Exit nodes to maximize + their network utilization in favor of carrying only compromised + traffic. + + This attack was explored in the academic literature in [1], and a + variant involving cryptographic tagging was posted to tor-dev[2] in + March. + + In the extreme, the attack allows an adversary that carries c/n + of the network capacity to deanonymize c/n of the network + connections, breaking the O((c/n)^2) property of Tor's original + threat model. + +Design Description + + The Path Bias defense is a client-side accounting mechanism in Tor that + tracks the circuit failure rate for each of the client's guards. + + Clients maintain two integers for each of their guards: a count of the + number of times a circuit was extended at least one hop through that + guard, and a count of the number of circuits that successfully complete + through that guard. The ratio of these two numbers is used to determine + a circuit success rate for that Guard. + + The system should issue a notice log message when Guard success rate + falls below 70%, a warn when Guard success rate falls below 50%, and + should drop the Guard when the success rate falls below 30%. + + To ensure correctness, checks are performed to ensure that + we do not count successes without also counting the first hop. + + Similarly, to provide a moving average of recent Guard activity while + still preserving the ability to ensure correctness, we "scale" the + success counts by an integer divisor (currently 2) when the counts + exceed the moving average window (300) and when the division + does not produce integer truncation. + + No log messages should be displayed, nor should any Guard be + dropped until it has completed more than 150 first hops. + +Analysis: Simulation + + To test the defense in the face of various types of malicious and + non-malicious Guard behavior, I wrote a simulation program in + Python[3]. + + The simulation confirmed that without any defense, an adversary + that provides c/n of the network capacity is able to observe c/n + of the network flows using circuit failure attacks. + + It also showed that with the defense, an adversary that wishes to + evade detection has compromise rates bounded by: + + P(compromise) <= (c/n)^2 * (100/CUTOFF_PERCENT) + circs_per_client <= circuit_attempts*(c/n) + + In this way, the defense restores the O((c/n)^2) compromise property, + but unfortunately only over long periods of time (see Security + Considerations below). + + The spread between the cutoff values and the normal rate of circuit + success has a substantial effect on false positives. From the + simulation's results, the sweet spot for the size of this spread appears + to be 10%. In other words, we want to set the cutoffs such that they are + 10% below the success rate we expect to see in normal usage. + + The simulation also demonstrates that larger "scaling window" sizes + reduce false positives for instances where non-malicious guards + experience some ambient rate of circuit failure. + +Analysis: Live Scan + + Preliminary Guard node scanning using the txtorcon circuit scanner[4] + shows normal circuit completion rates between 80-90% for most Guard + nodes. + + However, it also showed that CPU overload conditions can easily push + success rates as low as 45%. Even more concerning is that for a brief + period during the live scan, success rates dropped to 50-60% + network-wide (regardless of Guard node choice). + + Based on these results, the notice condition should be 70%, the warn + condition should be 50%, and the drop condition should be 30%. + +Future Analysis: Deployed Clients + + It's my belief that further analysis should be done by deploying + loglines for all three thresholds in clients in the live network + and utilize user reports on how often high rates of circuit failure + are seen before we deploy changes to rotate away from failing Guards. + + I believe these log lines should be deployed in 0.2.3.x clients, + to maximize the exposure of the code to varying network conditions, + so that we have enough data to consider deploying the Guard-dropping + cutoff in 0.2.4.x. + +Security Considerations + + Circuit failure can occur for many reasons. The most common of which is + CPU resource exhaustion at high speed relays. + + While the scaling window does provide freshness, it also creates the + possibility of conditions where clients can be forced off their Guards + due to temporary network-wide or Guard-specific CPU DoS. This provides + another reason beyond false positive concerns to set the window as + large is reasonable. + + In my mind, the value to aim for here is a number of circuits large + enough to put this attack vector on par with the timescales used by + the bandwidth authorities to redirect client activity away from + loaded (ie DoSed) nodes. + + Simulation results show that a Guard node that previously + succeeded 80% of its circuits would need to be DoSed down to + 5% success rate for around 230 circuit attempts before the client + would reject it, using a scaling window of 300 circuits. + + Assuming one circuit per Guard per 10 minutes of normal client + activity, this is a sustained DoS attack of around 38 hours, + which is on the order of the bandwidth authority measurement + interval (nodes are measured every 1-2 days). + + The converse of this, though, is that larger values allow + Guard nodes to compromise clients for duty cycles of around the + size of this window (up to the (c/n)^2 * 100/CUTOFF_PERCENT + limit in aggregate), so we do have to consider that trade off. + +Implementation Notes: Log Messages + + Log messages need to be chosen with care to avoid alarming users. + I suggest: + + Notice: "It appears that your Guard %s is failing a larger number of + circuits than usual. This could mean that the Guard is overloaded, or + the Tor network is overloaded. Success counts are %d/%d." + + Warn: "It appears that your Guard %s is failing a very large number of + circuits. Most likely, this could mean that the Guard is overloaded, + or the Tor network is overloaded, but it could also mean an attack + against you or the Guard. Success counts are %d/%d." + + Drop: "It appears that your Guard %s is failing an extremely large + number of circuits. [Tor has disabled use of this Guard.] Success + counts are %d/%d." + + The second piece of the Drop message would not be present in 0.2.3.x, + since the Guard won't actually be dropped. + +Implementation Notes: Consensus Parameters + + The following consensus parameters reflect the constants listed + in the proposal. These parameters should also be available + for override in torrc. + + pb_mincircs=150 + The minimum number of first hops before we log or drop Guards. + + pb_noticepct=70 + The threshhold of circuit success below which we display a notice. + + pb_warnpct=50 + The threshhold of circuit success below which we display a warn. + + pb_disablepct=30 + The threshhold of circuit success below which we disable the guard. + + pb_scalecircs=300 + The number of first hops at which we scale the counts down. + + pb_scalefactor=2 + The integer divisor by which we scale. + + + +1. http://freehaven.net/anonbib/cache/ccs07-doa.pdf +2. https://lists.torproject.org/pipermail/tor-dev/2012-March/003347.html +3. https://gitweb.torproject.org/torflow.git/tree/HEAD:/CircuitAnalysis/PathBia... +4. https://github.com/meejah/txtorcon/blob/exit_scanner/apps/exit_scanner/failu...