commit 273e77c2afafcfa8efbafac5f24059497727ac74
Author: Nick Mathewson <nickm@torproject.org>
Date:   Thu Oct 11 10:34:52 2012 -0400
Make "path bias tuning" into proposal 209. --- proposals/000-index.txt | 4 + proposals/209-path-bias-tuning.txt | 201 ++++++++++++++++++++++++++++++++++++ proposals/xxx-path-bias-tuning.txt | 200 ----------------------------------- 3 files changed, 205 insertions(+), 200 deletions(-)
diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index e2cd9cf..2863d79 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -129,6 +129,8 @@ Proposals by number:
 206 Preconfigured directory sources for bootstrapping [OPEN]
 207 Directory guards [OPEN]
 208 IPv6 Exits Redux [OPEN]
+209 Limit reported bandwidth of unmeasured nodes [OPEN]
+209 Tuning the Parameters for the Path Bias Defense [OPEN]
@@ -174,6 +176,8 @@ Proposals by status:
 206 Preconfigured directory sources for bootstrapping [for 0.2.4.x]
 207 Directory guards [for 0.2.4.x]
 208 IPv6 Exits Redux [for 0.2.4.x]
+209 Limit reported bandwidth of unmeasured nodes [for 0.2.4.x]
+209 Tuning the Parameters for the Path Bias Defense [for 0.2.4.x+]
 ACCEPTED:
 117 IPv6 exits [for 0.2.4.x]
 140 Provide diffs between consensuses

diff --git a/proposals/209-path-bias-tuning.txt b/proposals/209-path-bias-tuning.txt
new file mode 100644
index 0000000..004729e
--- /dev/null
+++ b/proposals/209-path-bias-tuning.txt
@@ -0,0 +1,201 @@
Filename: 209-path-bias-tuning.txt
Title: Tuning the Parameters for the Path Bias Defense
Author: Mike Perry
Created: 01-10-2012
Status: Open
Target: 0.2.4.x+


Overview

  This proposal describes how we can use the results of simulations, in
  combination with network scans, to set reasonable limits for the Path
  Bias defense, which causes clients to be informed about, and ideally
  rotate away from, Guards that provide extremely low circuit success
  rates.

Motivation

  The Path Bias defense is designed to defend against a type of route
  capture where malicious Guard nodes deliberately fail circuits that
  extend to non-colluding Exit nodes, so as to devote as much of their
  capacity as possible to carrying compromised traffic.

  This attack was explored in the academic literature in [1], and a
  variant involving cryptographic tagging was posted to tor-dev[2] in
  March.

  In the extreme, the attack allows an adversary that carries c/n
  of the network capacity to deanonymize c/n of the network
  connections, breaking the O((c/n)^2) property of Tor's original
  threat model.

Design Description

  The Path Bias defense is a client-side accounting mechanism in Tor that
  tracks the circuit failure rate for each of the client's guards.

  Clients maintain two integers for each of their guards: a count of the
  number of times a circuit was extended at least one hop through that
  guard, and a count of the number of circuits that successfully complete
  through that guard. The ratio of these two numbers is used to determine
  a circuit success rate for that Guard.

  The system should issue a notice log message when the Guard success
  rate falls below 70%, a warn when it falls below 50%, and should drop
  the Guard when the success rate falls below 30%.

  To ensure correctness, checks are performed to ensure that
  we do not count successes without also counting the first hop.

  Similarly, to provide a moving average of recent Guard activity while
  still preserving the ability to verify correctness, we "scale" the
  success counts by an integer divisor (currently 2) when the counts
  exceed the moving average window (300) and when the division
  does not produce integer truncation.

  No log messages should be displayed, nor should any Guard be
  dropped, until the Guard has completed at least 150 first hops
  (inclusive).

Analysis: Simulation

  To test the defense in the face of various types of malicious and
  non-malicious Guard behavior, I wrote a simulation program in
  Python[3].

  The simulation confirmed that without any defense, an adversary
  that provides c/n of the network capacity is able to observe c/n
  of the network flows using circuit failure attacks.
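  For concreteness, the accounting being simulated can be sketched as
  follows. This is a minimal illustrative sketch in Python (the
  simulator's language), not the simulator from [3] or the Tor
  implementation itself; the GuardStats class and its method names are
  invented here, and the constants mirror the proposed consensus
  parameters.

    PB_MINCIRCS = 150     # first hops required before we judge a Guard
    PB_NOTICEPCT = 70     # notice threshold, in percent
    PB_WARNPCT = 50       # warn threshold, in percent
    PB_DISABLEPCT = 30    # drop threshold, in percent
    PB_SCALECIRCS = 300   # moving-average window, in first hops
    PB_SCALEFACTOR = 2    # integer divisor used to scale the counts

    class GuardStats:
        def __init__(self):
            self.first_hops = 0  # circuits extended at least one hop
            self.successes = 0   # circuits that completed

        def note_first_hop(self):
            self.first_hops += 1

        def note_success(self):
            # Correctness check: never count a success whose first
            # hop was not counted.
            assert self.successes < self.first_hops
            self.successes += 1
            self._maybe_scale()

        def _maybe_scale(self):
            # Scale both counts down only once they exceed the window
            # and the division truncates neither count.
            if (self.first_hops > PB_SCALECIRCS and
                    self.first_hops % PB_SCALEFACTOR == 0 and
                    self.successes % PB_SCALEFACTOR == 0):
                self.first_hops //= PB_SCALEFACTOR
                self.successes //= PB_SCALEFACTOR

        def status(self):
            # Stay quiet until enough first hops have completed.
            if self.first_hops < PB_MINCIRCS:
                return "ok"
            pct = 100.0 * self.successes / self.first_hops
            if pct < PB_DISABLEPCT:
                return "drop"
            if pct < PB_WARNPCT:
                return "warn"
            if pct < PB_NOTICEPCT:
                return "notice"
            return "ok"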
  The simulation also showed that with the defense, an adversary that
  wishes to evade detection has compromise rates bounded by:

    P(compromise) <= (c/n)^2 * (100/CUTOFF_PERCENT)
    circs_per_client <= circuit_attempts*(c/n)

  In this way, the defense restores the O((c/n)^2) compromise property,
  but unfortunately only over long periods of time (see Security
  Considerations below).

  The spread between the cutoff values and the normal rate of circuit
  success has a substantial effect on false positives. From the
  simulation's results, the sweet spot for the size of this spread appears
  to be 10%. In other words, we want to set the cutoffs such that they are
  10% below the success rate we expect to see in normal usage.

  The simulation also demonstrates that larger "scaling window" sizes
  reduce false positives for instances where non-malicious guards
  experience some ambient rate of circuit failure.

Analysis: Live Scan

  Preliminary Guard node scanning using the txtorcon circuit scanner[4]
  shows normal circuit completion rates of 80-90% for most Guard
  nodes.

  However, it also showed that CPU overload conditions can easily push
  success rates as low as 45%. Even more concerning, for a brief
  period during the live scan, success rates dropped to 50-60%
  network-wide (regardless of Guard node choice).

  Based on these results, the notice condition should be 70%, the warn
  condition should be 50%, and the drop condition should be 30%.

Future Analysis: Deployed Clients

  I believe further analysis should be done by deploying log lines for
  all three thresholds to clients on the live network, so that we can
  use user reports of how often high rates of circuit failure are seen
  before we deploy changes that rotate away from failing Guards.

  These log lines should be deployed in 0.2.3.x clients, to maximize
  the exposure of the code to varying network conditions, so that we
  have enough data to consider deploying the Guard-dropping cutoff in
  0.2.4.x.

Security Considerations

  While the scaling window does provide freshness and can help mitigate
  "bait-and-switch" attacks, it also creates the possibility of conditions
  where clients can be forced off their Guards by a temporary
  network-wide CPU DoS. This provides another reason, beyond false
  positive concerns, to set the scaling window as large as is reasonable.

  A DoS directed at specific Guard nodes is unlikely to let an
  adversary cause clients to rotate away from that Guard, because it
  is unlikely that the DoS can be precise enough to allow first hops to
  that Guard to succeed while causing extends to fail. This leaves
  network-wide DoS as the primary vector for influencing clients.

  Simulation results show that in order to cause clients to rotate away
  from a Guard node that previously succeeded 80% of its circuits, an
  adversary would need to induce a 25% success rate for around 350 circuit
  attempts before the client would reject it, or a 5% success rate
  for around 215 attempts, both using a scaling window of 300 circuits.

  Assuming one circuit per Guard per 10 minutes of active client
  activity, this is a sustained network-wide DoS attack of 60 hours
  for the 25% case, or 38 hours for the 5% case.
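  As a back-of-the-envelope check of those durations, under the stated
  assumption of one circuit per Guard per 10 minutes of active client
  time:

    # Hours of sustained DoS needed before a client rejects a Guard
    # that previously succeeded 80% of its circuits.
    MINUTES_PER_CIRCUIT = 10
    for induced_rate, attempts in (("25%", 350), ("5%", 215)):
        hours = attempts * MINUTES_PER_CIRCUIT / 60.0
        print("%s induced rate: %d attempts ~ %.0f hours"
              % (induced_rate, attempts, hours))
    # 25% -> ~58 hours and 5% -> ~36 hours, in line with the roughly
    # 60- and 38-hour figures above.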
  Presumably this is enough time for the directory authorities to respond
  by altering the pb_disablepct consensus parameter before clients rotate,
  especially given that most clients are not active for even 38 hours on
  end, and will tend to stop building circuits while idle.

  If we raised the scaling window to 500 circuits, the DoS would need to
  last for 1050 circuits if it brought circuit success down to 25% (175
  hours), and 415 circuits if it brought circuit success down to 5% (69
  hours).

  The tradeoff, though, is that larger scaling window values allow Guard
  nodes to compromise clients for duty cycles of around the size of this
  window (up to the (c/n)^2 * 100/CUTOFF_PERCENT limit in aggregate), so
  we do have to strike a balance between these concerns.

Implementation Notes: Log Messages

  Log messages need to be chosen with care to avoid alarming users.
  I suggest:

  Notice: "Your Guard %s is failing more circuits than usual. Most likely
  this means the Tor network is overloaded. Success counts are %d/%d."

  Warn: "Your Guard %s is failing a very large number of circuits. Most
  likely this means the Tor network is overloaded, but it could also mean
  an attack against you or potentially the Guard itself. Success counts
  are %d/%d."

  Drop: "Your Guard %s is failing an extremely large number of circuits.
  [Tor has disabled use of this Guard.] Success counts are %d/%d."

  The second piece of the Drop message would not be present in 0.2.3.x,
  since the Guard won't actually be dropped.

Implementation Notes: Consensus Parameters

  The following consensus parameters reflect the constants listed
  in the proposal. These parameters should also be available
  for override in torrc.

  pb_mincircs=150
    The minimum number of first hops before we log or drop Guards.

  pb_noticepct=70
    The threshold of circuit success below which we display a notice.

  pb_warnpct=50
    The threshold of circuit success below which we display a warn.

  pb_disablepct=30
    The threshold of circuit success below which we disable the guard.

  pb_scalecircs=300
    The number of first hops at which we scale the counts down.

  pb_scalefactor=2
    The integer divisor by which we scale.


1. http://freehaven.net/anonbib/cache/ccs07-doa.pdf
2. https://lists.torproject.org/pipermail/tor-dev/2012-March/003347.html
3. https://gitweb.torproject.org/torflow.git/tree/HEAD:/CircuitAnalysis/PathBia...
4. https://github.com/meejah/txtorcon/blob/exit_scanner/apps/exit_scanner/failu...
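  As a worked example of the aggregate bound from "Analysis: Simulation",
  the following sketch (helper names invented here) evaluates the two
  inequalities for a hypothetical 10% adversary under the proposed
  pb_disablepct of 30:

    # P(compromise) <= (c/n)^2 * (100/CUTOFF_PERCENT)
    def compromise_bound(c_over_n, cutoff_percent):
        return (c_over_n ** 2) * (100.0 / cutoff_percent)

    # circs_per_client <= circuit_attempts * (c/n)
    def circs_through_adversary(circuit_attempts, c_over_n):
        return circuit_attempts * c_over_n

    print(compromise_bound(0.10, 30))          # ~0.033: at most ~3.3%
                                               # of flows compromised
    print(circs_through_adversary(300, 0.10))  # ~30 of 300 attempts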
diff --git a/proposals/xxx-path-bias-tuning.txt b/proposals/xxx-path-bias-tuning.txt
deleted file mode 100644
index 964aedb..0000000