[tor-commits] [torspec/master] Add path bias tuning proposal.
nickm at torproject.org
Thu Oct 11 14:34:57 UTC 2012
Author: Mike Perry <mikeperry-git at fscked.org>
Date: Wed Oct 10 16:51:45 2012 -0700
Add path bias tuning proposal.
proposals/xxx-path-bias-tuning.txt | 196 ++++++++++++++++++++++++++++++++++++
1 files changed, 196 insertions(+), 0 deletions(-)
diff --git a/proposals/xxx-path-bias-tuning.txt b/proposals/xxx-path-bias-tuning.txt
new file mode 100644
@@ -0,0 +1,196 @@
+Title: Tuning the Parameters for the Path Bias Defense
+Author: Mike Perry
+ This proposal describes how we can use the results of network scans to
+ set reasonable limits for the Path Bias defense, which causes clients to
+ be informed about and ideally rotate away from Guards that provide
+ extremely low circuit success rates.
+ The Path Bias defense is designed to defend against a type of route
+ capture where malicious Guard nodes deliberately fail circuits that
+ are determined to extend to non-colluding Exit nodes to maximize
+ their network utilization in favor of carrying only compromised
+ traffic.
+ This attack was explored in the academic literature, and a
+ variant involving cryptographic tagging was posted to tor-dev.
+ In the extreme, the attack allows an adversary that carries c/n
+ of the network capacity to deanonymize c/n of the network
+ connections, breaking the O((c/n)^2) property of Tor's original
+ threat model.
+ The Path Bias defense is a client-side accounting mechanism in Tor that
+ tracks the circuit failure rate for each of the client's guards.
+ Clients maintain two integers for each of their guards: a count of the
+ number of times a circuit was extended at least one hop through that
+ guard, and a count of the number of circuits that successfully complete
+ through that guard. The ratio of these two numbers is used to determine
+ a circuit success rate for that Guard.
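The per-guard accounting described above can be sketched as follows (a minimal illustration; names are hypothetical, and the actual implementation lives in Tor's C code):

```python
class GuardStats:
    """Client-side path-bias counters for a single guard (sketch)."""

    def __init__(self):
        self.first_hops = 0  # circuits extended at least one hop through this guard
        self.successes = 0   # circuits that completed through this guard

    def record_first_hop(self):
        self.first_hops += 1

    def record_success(self):
        # Correctness check: a success must have a corresponding first hop.
        assert self.successes < self.first_hops
        self.successes += 1

    def success_rate(self):
        # With no data yet, treat the guard as fully successful.
        if self.first_hops == 0:
            return 1.0
        return self.successes / self.first_hops
```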
+ The system should issue a notice log message when Guard success rate
+ falls below 70%, a warn when Guard success rate falls below 50%, and
+ should drop the Guard when the success rate falls below 30%.
+ To ensure correctness, checks are performed to ensure that
+ we do not count successes without also counting the first hop.
+ Similarly, to provide a moving average of recent Guard activity while
+ still preserving the ability to ensure correctness, we "scale" the
+ success counts by an integer divisor (currently 2) when the counts
+ exceed the moving average window (300) and when the division
+ does not produce integer truncation.
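The scaling step might be sketched like this, using the window (300) and divisor (2) from the text; scaling is skipped whenever halving either count would truncate, so the recorded ratio is preserved exactly:

```python
SCALE_WINDOW = 300  # moving-average window, in first hops
SCALE_FACTOR = 2    # integer divisor applied to both counts

def maybe_scale(first_hops, successes):
    """Scale both counts down once the window is exceeded, but only
    when neither division would produce integer truncation."""
    if (first_hops > SCALE_WINDOW
            and first_hops % SCALE_FACTOR == 0
            and successes % SCALE_FACTOR == 0):
        first_hops //= SCALE_FACTOR
        successes //= SCALE_FACTOR
    return first_hops, successes
```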
+ No log messages should be displayed, nor should any Guard be
+ dropped until it has completed more than 150 first hops.
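Putting the cutoffs and the 150-circuit minimum together, the decision logic might look like this (a sketch of the rules above, not the actual implementation):

```python
MIN_CIRCS = 150  # first hops required before logging or dropping
NOTICE_RATE, WARN_RATE, DROP_RATE = 0.70, 0.50, 0.30

def path_bias_action(first_hops, successes):
    """Return None, 'notice', 'warn', or 'drop' for the given counts."""
    if first_hops <= MIN_CIRCS:
        return None  # too little data to judge this guard
    rate = successes / first_hops
    if rate < DROP_RATE:
        return "drop"
    if rate < WARN_RATE:
        return "warn"
    if rate < NOTICE_RATE:
        return "notice"
    return None
```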
+ To test the defense in the face of various types of malicious and
+ non-malicious Guard behavior, I wrote a simulation program in
+ The simulation confirmed that without any defense, an adversary
+ that provides c/n of the network capacity is able to observe c/n
+ of the network flows using circuit failure attacks.
+ It also showed that with the defense, an adversary that wishes to
+ evade detection has compromise rates bounded by:
+ P(compromise) <= (c/n)^2 * (100/CUTOFF_PERCENT)
+ circs_per_client <= circuit_attempts*(c/n)
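As a worked example of the first bound (with an assumed adversary size of c/n = 0.1 and the 30% drop cutoff): 0.1^2 * (100/30) ≈ 0.033, i.e. at most about 3.3% of circuits compromised by an adversary evading detection.

```python
def compromise_bound(c_over_n, cutoff_percent):
    """Upper bound on the compromise rate for a detection-evading
    adversary, per the formula above."""
    return (c_over_n ** 2) * (100.0 / cutoff_percent)
```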
+ In this way, the defense restores the O((c/n)^2) compromise property,
+ but unfortunately only over long periods of time (see Security
+ Considerations below).
+ The spread between the cutoff values and the normal rate of circuit
+ success has a substantial effect on false positives. From the
+ simulation's results, the sweet spot for the size of this spread appears
+ to be 10%. In other words, we want to set the cutoffs such that they are
+ 10% below the success rate we expect to see in normal usage.
+ The simulation also demonstrates that larger "scaling window" sizes
+ reduce false positives for instances where non-malicious guards
+ experience some ambient rate of circuit failure.
+Analysis: Live Scan
+ Preliminary Guard node scanning using the txtorcon circuit scanner
+ shows normal circuit completion rates between 80% and 90% for most
+ Guard nodes.
+ However, it also showed that CPU overload conditions can easily push
+ success rates as low as 45%. Even more concerning is that for a brief
+ period during the live scan, success rates dropped to 50-60%
+ network-wide (regardless of Guard node choice).
+ Based on these results, the notice condition should be 70%, the warn
+ condition should be 50%, and the drop condition should be 30%.
+Future Analysis: Deployed Clients
+ It's my belief that further analysis should be done by deploying
+ log lines for all three thresholds to clients on the live network,
+ and by using user reports of how often high rates of circuit failure
+ are seen, before we deploy changes to rotate away from failing Guards.
+ I believe these log lines should be deployed in 0.2.3.x clients,
+ to maximize the exposure of the code to varying network conditions,
+ so that we have enough data to consider deploying the Guard-dropping
+ cutoff in 0.2.4.x.
+ Circuit failure can occur for many reasons, the most common of which
+ is CPU resource exhaustion at high-speed relays.
+ While the scaling window does provide freshness, it also creates the
+ possibility of conditions where clients can be forced off their Guards
+ due to temporary network-wide or Guard-specific CPU DoS. This provides
+ another reason beyond false positive concerns to set the window as
+ large as is reasonable.
+ In my mind, the value to aim for here is a number of circuits large
+ enough to put this attack vector on par with the timescales used by
+ the bandwidth authorities to redirect client activity away from
+ loaded (i.e. DoSed) nodes.
+ Simulation results show that a Guard node that previously
+ succeeded 80% of its circuits would need to be DoSed down to
+ 5% success rate for around 230 circuit attempts before the client
+ would reject it, using a scaling window of 300 circuits.
+ Assuming one circuit per Guard per 10 minutes of normal client
+ activity, this is a sustained DoS attack of around 38 hours,
+ which is on the order of the bandwidth authority measurement
+ interval (nodes are measured every 1-2 days).
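The 38-hour figure follows from simple arithmetic, given the assumed rate of one circuit per Guard per 10 minutes:

```python
circuit_attempts = 230    # attempts needed before the client rejects (simulation)
minutes_per_circuit = 10  # assumed normal client activity

dos_hours = circuit_attempts * minutes_per_circuit / 60
# 230 * 10 / 60 ≈ 38.3 hours of sustained DoS required
```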
+ The converse of this, though, is that larger values allow
+ Guard nodes to compromise clients for duty cycles of around the
+ size of this window (up to the (c/n)^2 * 100/CUTOFF_PERCENT
+ limit in aggregate), so we do have to consider that trade off.
+Implementation Notes: Log Messages
+ Log messages need to be chosen with care to avoid alarming users.
+ I suggest:
+ Notice: "It appears that your Guard %s is failing a larger number of
+ circuits than usual. This could mean that the Guard is overloaded, or
+ the Tor network is overloaded. Success counts are %d/%d."
+ Warn: "It appears that your Guard %s is failing a very large number of
+ circuits. This could mean that the Guard is overloaded, or the Tor
+ network is overloaded, but it could also mean an attack against you
+ or the Guard. Success counts are %d/%d."
+ Drop: "It appears that your Guard %s is failing an extremely large
+ number of circuits. [Tor has disabled use of this Guard.] Success
+ counts are %d/%d."
+ The second piece of the Drop message would not be present in 0.2.3.x,
+ since the Guard won't actually be dropped.
+Implementation Notes: Consensus Parameters
+ The following consensus parameters reflect the constants listed
+ in the proposal. These parameters should also be available
+ for override in torrc.
+ The minimum number of first hops before we log or drop Guards.
+ The threshold of circuit success below which we display a notice.
+ The threshold of circuit success below which we display a warn.
+ The threshold of circuit success below which we disable the guard.
+ The number of first hops at which we scale the counts down.
+ The integer divisor by which we scale.
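Collecting the defaults described in this proposal in one place (the key names below are placeholders; the proposal text does not specify the final consensus parameter names):

```python
# Placeholder names only; actual consensus keys are not given here.
PATH_BIAS_DEFAULTS = {
    "min_circuits":   150,  # first hops before we log or drop Guards
    "notice_pct":      70,  # notice below this success percentage
    "warn_pct":        50,  # warn below this success percentage
    "drop_pct":        30,  # disable the guard below this
    "scale_circuits": 300,  # first hops at which counts are scaled down
    "scale_factor":     2,  # integer divisor used for scaling
}
```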