[tor-commits] [torspec/master] Write a proposal for the mesh vanguards design.

Sun May 27 13:59:29 UTC 2018

commit d4aaf28eb3a3ae9f60996d97ed6bf21153cc7601
Author: Mike Perry <mikeperry-git at torproject.org>
Date:   Tue May 8 21:14:43 2018 +0000

    Write a proposal for the mesh vanguards design.
---
 proposals/xxx-mesh-vanguards.txt | 528 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 528 insertions(+)

diff --git a/proposals/xxx-mesh-vanguards.txt b/proposals/xxx-mesh-vanguards.txt
new file mode 100644
index 0000000..86a74ad
--- /dev/null
+++ b/proposals/xxx-mesh-vanguards.txt
@@ -0,0 +1,528 @@
+Title: Mesh-based vanguards
+Authors: George Kadianakis and Mike Perry
+Created: 2018-05-08
+Status: Draft
+
+0. Motivation
+
+  A guard discovery attack allows attackers to determine the guard
+  node of a Tor client. The hidden service rendezvous protocol
+  provides an attack vector for a guard discovery attack since anyone
+  can force an HS to construct a 3-hop circuit to a relay (#9001).
+
+  Following the guard discovery attack with a compromise and/or
+  coercion of the guard node can lead to the deanonymization of a
+  hidden service.
+
+1. Overview
+
+  This document tries to make the above guard discovery + compromise
+  attack harder to launch. It introduces a configuration
+  option which makes the hidden service also pin the second and third
+  hops of its circuits for a longer duration.
+
+  With this new path selection, we force the adversary to perform a
+  Sybil attack and two compromise attacks before succeeding. This is
+  an improvement over the current state where the Sybil attack is
+  trivial to pull off, and only a single compromise attack is required.
+
+  With this new path selection, an attacker is forced to do a one or
+  more node compromise attacks before learning the guard node of a hidden
+  service. This increases the uncertainty of the attacker, since
+  compromise attacks are costly and potentially detectable, so an
+  attacker will have to think twice before beginning a chain of node
+  compromise attacks that they might not be able to complete.
+
+1.1. Tor integration
+
+  The mechanisms introduced in this proposal are currently implemented
+  partially in Tor and partially through an external Python script:
+            https://github.com/mikeperry-tor/vanguards
+
+  The Python script uses the new Tor configuration options HSLayer2Nodes and
+  HSLayer3Nodes to be able to select nodes for the guard layers. The Python
+  script is tasked with maintaining and rotating the guard nodes as needed
+  based on the lifetimes described in this proposal.
+
+  In the future, we are aiming to include the whole functionality into Tor,
+  with no need for external scripts.
+
+1.2. Visuals
+
+  Here is how a hidden service rendezvous circuit currently looks like:
+
+                     -> middle_1 -> middle_A
+                     -> middle_2 -> middle_B
+                     -> middle_3 -> middle_C
+                     -> middle_4 -> middle_D
+       HS -> guard   -> middle_5 -> middle_E
+                     -> middle_6 -> middle_F
+                     -> middle_7 -> middle_G
+                     -> middle_8 -> middle_H
+                     ->   ...    ->  ...
+                     -> middle_n -> middle_n
+
+  this proposal pins the two middle positions into a much more
+  restricted sets, as follows:
+
+                       -> guard_2A
+                                   -> guard_3A
+          -> guard_1A  -> guard_2B -> guard_3B
+       HS                          -> guard_3C
+          -> guard_1B  -> guard_2C -> guard_3D
+                                   -> guard_3E
+                       -> guard_2D -> guard_3F
+
+  Additionally, to avoid linkability, we insert an extra middle node
+  after the third layer guard for client side intro and hsdir circuits,
+  and service-side rendezvous circuits. This means that the set of
+  paths for Client (C) and Service (S) side look like this:
+
+     C - G - L2 - L3 - R
+     S - G - L2 - L3 - HSDIR
+     S - G - L2 - L3 - I
+     C - G - L2 - L3 - M - I
+     C - G - L2 - L3 - M - HSDIR
+     S - G - L2 - L3 - M - R
+
+1.3. Threat model, Assumptions, and Goals
+
+  Consider an adversary with the following powers:
+
+     - Can launch a Sybil guard discovery attack against any node of a
+       rendezvous circuit. The slower the rotation period of the node,
+       the longer the attack takes. Similarly, the higher the percentage
+       of the network is compromised, the faster the attack runs.
+
+     - Can compromise any node on the network, but this compromise takes
+       time and potentially even coercive action, and also carries risk
+       of discovery.
+
+  We also make the following assumptions about the types of attacks:
+
+  1. A Sybil attack is observable by both people monitoring the network
+     for large numbers of new nodes, as well as vigilant hidden service
+     operators. It will require either large amounts of traffic sent
+     towards the hidden service, multiple test circuits, or both.
+
+  2. A Sybil attack against the second or first layer Guards will be
+     more noisy than a Sybil attack against the third layer guard, since the
+     second and first layer Sybil attack requires a timing side channel in
+     order to determine success, whereas the Sybil success is almost
+     immediately obvious to third layer guard, since it will be instructed
+     to connect to a cooperating malicious rend point by the adversary.
+
+  3. As soon as the adversary is confident they have won the Sybil attack,
+     an even more aggressive circuit building attack will allow them to
+     determine the next node very fast (an hour or less).
+
+  4. The adversary is strongly disincentivized from compromising nodes that
+     may prove useless, as node compromise is even more risky for the
+     adversary than a Sybil attack in terms of being noticed.
+
+  Given this threat model, our security parameters were selected so that
+  the first two layers of guards should be hard to attack using a Sybil
+  guard discovery attack and hence require a node compromise attack. Ideally,
+  we want the node compromise attacks to carry a non-negligible probability of
+  being useless to the adversary by the time they complete.
+
+  On the other hand, the outermost layer of guards should rotate fast enough to
+  _require_ a Sybil attack.
+
+  See our vanguard simulator project for a simulation of the above adversary
+  model and a motivation for the parameters selected within this proposal:
+        https://github.com/asn-d6/vanguard_simulator
+        https://github.com/asn-d6/vanguard_simulator/wiki/Optimizing-vanguard-topologies
+
+
+2. Design
+
+  When a hidden service picks its guard nodes, it also picks an
+  additional NUM_LAYER2_GUARDS-sized set of middle nodes for its
+  `second_guard_set`, as well as a NUM_LAYER3_GUARDS-sized set of
+  middle nodes for its `third_guard_set`.
+
+  When a hidden service needs to establish a circuit to an HSDir,
+  introduction point or a rendezvous point, it uses nodes from
+  `second_guard_set` as the second hop of the circuit and nodes from
+  `third_guard_set` as third hop of the circuit.
+
+  A hidden service rotates nodes from the 'second_guard_set' at a random
+  time between MIN_SECOND_GUARD_LIFETIME hours and
+  MAX_SECOND_GUARD_LIFETIME hours.
+
+  A hidden service rotates nodes from the 'third_guard_set' at a random
+  time between MIN_THIRD_GUARD_LIFETIME and MAX_THIRD_GUARD_LIFETIME
+  hours.
+
+  Each node's rotation time is tracked independently, to avoid disclosing
+  the rotation times of the primary and second-level guards.
+
+2.1. Security parameters
+
+  We set NUM_LAYER2_GUARDS to 4 nodes and NUM_LAYER3_GUARDS to 6 nodes.
+
+  We set MIN_SECOND_GUARD_LIFETIME to 1 day, and MAX_SECOND_GUARD_LIFETIME
+  to 45 days inclusive, for an average rotation rate of 29.5 days, using
+  the max(X,X) distribution specified in Section 3.3.
+
+  We set MIN_THIRD_GUARD_LIFETIME to 1 hour, and MAX_THIRD_GUARD_LIFETIME
+  to 48 hours inclusive, for an average rotation rate of 31.5 hours, using
+  the max(X,X) distribution specified in Section 3.3.
+
+  See Section 3 for more analysis on these constants.
+
+2.2. Path restriction changes
+
+  In order to avoid information leaks and ensure paths can be built, path
+  restrictions must be loosened.
+
+  In particular, we allow the following:
+     1. Nodes from the same /16 and same family for any/all hops
+     2. Guard nodes can be chosen for RP/IP/HSDIR
+     3. Guard nodes can be chosen for hop before RP/IP/HSDIR.
+
+  The first change prevents the situation where paths cannot be built if two
+  layers all share the same subnet and/or node family. It also prevents the
+  the use of a different entry guard based on the family or subnet of the
+  IP, HSDIR, or RP.
+
+  The second change prevents an adversary from forcing the use of a different
+  entry guard by enumerating all guard-flaged nodes as the RP.
+
+  The third change prevents an adversary from learning the guard node by way
+  of noticing which nodes were not chosen for the hop before it.
+
+
+3. Rationale and Security Parameter Selection
+
+3.1. Sybil rotation counts for a given number of Guards
+
+  The probability of Sybil success for Guard discovery can be modeled as
+  the probability of choosing 1 or more malicious middle nodes for a
+  sensitive circuit over some period of time.
+
+  P(At least 1 bad middle) = 1 - P(All Good Middles)
+                           = 1 - P(One Good middle)^(num_middles)
+                           = 1 - (1 - c/n)^(num_middles)
+
+  c/n is the adversary compromise percentage
+
+  In the case of Vanguards, num_middles is the number of Guards you rotate
+  through in a given time period. This is a function of the number of vanguards
+  in that position (v), as well as the number of rotations (r).
+
+  P(At least one bad middle) = 1 - (1 - c/n)^(v*r)
+
+  Here's detailed tables in terms of the number of rotations required for
+  a given Sybil success rate for certain number of guards.
+
+  1.0% Network Compromise:
+   Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
+    10%            11     6     4     3     3     2     2     2     2     1       1
+    15%            17     9     6     5     4     3     3     2     2     2       2
+    25%            29    15    10     8     6     5     4     4     3     3       2
+    50%            69    35    23    18    14    12     9     8     7     6       5
+    60%            92    46    31    23    19    16    12    11    10     8       6
+    75%           138    69    46    35    28    23    18    16    14    12       9
+    85%           189    95    63    48    38    32    24    21    19    16      12
+    90%           230   115    77    58    46    39    29    26    23    20      15
+    95%           299   150   100    75    60    50    38    34    30    25      19
+    99%           459   230   153   115    92    77    58    51    46    39      29
+
+  5.0% Network Compromise:
+   Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
+    10%             3     2     1     1     1     1     1     1     1     1       1
+    15%             4     2     2     1     1     1     1     1     1     1       1
+    25%             6     3     2     2     2     1     1     1     1     1       1
+    50%            14     7     5     4     3     3     2     2     2     2       1
+    60%            18     9     6     5     4     3     3     2     2     2       2
+    75%            28    14    10     7     6     5     4     4     3     3       2
+    85%            37    19    13    10     8     7     5     5     4     4       3
+    90%            45    23    15    12     9     8     6     5     5     4       3
+    95%            59    30    20    15    12    10     8     7     6     5       4
+    99%            90    45    30    23    18    15    12    10     9     8       6
+
+  10.0% Network Compromise:
+   Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen
+    10%             2     1     1     1     1     1     1     1     1     1       1
+    15%             2     1     1     1     1     1     1     1     1     1       1
+    25%             3     2     1     1     1     1     1     1     1     1       1
+    50%             7     4     3     2     2     2     1     1     1     1       1
+    60%             9     5     3     3     2     2     2     1     1     1       1
+    75%            14     7     5     4     3     3     2     2     2     2       1
+    85%            19    10     7     5     4     4     3     3     2     2       2
+    90%            22    11     8     6     5     4     3     3     3     2       2
+    95%            29    15    10     8     6     5     4     4     3     3       2
+    99%            44    22    15    11     9     8     6     5     5     4       3
+
+  The rotation counts in these tables were generated with:
+     def num_rotations(c, v, success):
+       r = 0
+       while 1-math.pow((1-c), v*r) < success: r += 1
+       return r
+
+3.2. Rotation Period
+
+  As specified in Section 1.2, the primary driving force for the third
+  layer selection was to ensure that these nodes rotate fast enough that
+  it is not worth trying to compromise them, because it is unlikely for
+  compromise to succeed and yield useful information before the nodes stop
+  being used.
+
+  From the table in Section 3.1, with NUM_LAYER2_GUARDS=4 and
+  NUM_LAYER3_GUARDS=6, it can be seen that this means that the Sybil attack
+  on layer3 will complete with 50% chance in 12*31.5 hours (15.75 days)
+  for the 1% adversary, ~4 days for the 5% adversary, and 2.62 days for the
+  10% adversary.
+
+  Since rotation of each node happens independently, the distribution of
+  when the adversary expects to win this Sybil attack in order to discover
+  the next node up is uniform. This means that on average, the adversary
+  should expect that half of the rotation period of the next node is already
+  over by the time that they win the Sybil.
+
+  With this fact, we choose our range and distribution for the second
+  layer rotation to be short enough to cause the adversary to risk
+  compromising nodes that are useless, yet long enough to require a
+  Sybil attack to be noticeable in terms of client activity. For this
+  reason, we choose a minimum second-layer guard lifetime of 1 day,
+  since this gives the adversary a minimum expected value of 12 hours for
+  during which they can compromise a guard before it might be rotated.
+  If the total expected rotation rate is 29.5 days, then the adversary can
+  expect overall to have 14.75 days remaining after completing their Sybil
+  attack before a second-layer guard rotates away.
+
+3.3. Rotation distributions
+
+  In order to skew the distribution of the third layer guard towards
+  higher values, we use max(X,X) for the distribution, where X is a
+  random variable that takes on values from the uniform distribution.
+
+  Here's a table of expectation (arithmetic means) for relevant
+  ranges of X (sampled from 0..N-1). The table was generated with the
+  following python functions:
+
+  def ProbMinXX(N, i): return (2.0*(N-i)-1)/(N*N)
+  def ProbMaxXX(N, i): return (2.0*i+1)/(N*N)
+
+  def ExpFn(N, ProbFunc):
+    exp = 0.0
+    for i in xrange(N): exp += i*ProbFunc(N, i)
+    return exp
+
+  The current choice for second-layer guards is noted with **, and
+  the current choice for third-layer guards is noted with ***.
+
+   Range  Min(X,X)   Max(X,X)
+   40      12.84       26.16
+   41      13.17       26.83
+   42      13.50       27.50
+   43      13.84       28.16
+   44      14.17       28.83
+   45      14.50       29.50**
+   46      14.84       30.16
+   47      15.17       30.83
+   48      15.50       31.50***
+
+  The Cumulative Density Function (CDF) tells us the probability that a
+  guard will no longer be in use after a given number of time units have
+  passed.
+
+  Because the Sybil attack on the third node is expected to complete at any
+  point in the second node's rotation period with uniform probability, if we
+  want to know the probability that a second-level Guard node will still be in
+  use after t days, we first need to compute the probability distribution of
+  the rotation duration of the second-level guard at a uniformly random point
+  in time. Let's call this P(R=r).
+
+  For P(R=r), the probability of the rotation duration depends on the selection
+  probability of a rotation duration, and the fraction of total time that
+  rotation is likely to be in use. This can be written as:
+
+  P(R=r) = ProbMaxXX(X=r)*r / \sum_{i=1}^N ProbMaxXX(X=i)*i
+
+  or in Python:
+
+  def ProbR(N, r, ProbFunc=ProbMaxXX):
+     return ProbFunc(N, r)*r/ExpFn(N, ProbFunc)
+
+  For the full CDF, we simply sum up the fractional probability density for
+  all rotation durations. For rotation durations less than t days, we add the
+  entire probability mass for that period to the density function. For
+  durations d greater than t days, we take the fraction of that rotation
+  period's selection probability and multiply it by t/d and add it to the
+  density. In other words:
+
+  def FullCDF(N, t, ProbFunc=ProbR):
+    density = 0.0
+    for d in xrange(N):
+      if t >= d: density += ProbFunc(N, d)
+      # The +1's below compensate for 0-indexed arrays:
+      else: density += ProbFunc(N, d)*(float(t+1))/(d+1)
+    return density
+
+  Computing this yields the following distribution for our current parameters:
+
+   t          P(SECOND_ROTATION <= t)
+   1               0.03247
+   2               0.06494
+   3               0.09738
+   4               0.12977
+   5               0.16207
+  10               0.32111
+  15               0.47298
+  20               0.61353
+  25               0.73856
+  30               0.84391
+  35               0.92539
+  40               0.97882
+  45               1.00000
+
+  This CDF tells us that for the second-level Guard rotation, the
+  adversary can expect that 3.3% of the time, their third-level Sybil
+  attack will provide them with a second-level guard node that has only
+  1 day remaining before it rotates. 6.5% of the time, there will
+  be only 2 day or less remaining, and 9.7% of the time, 3 days or less.
+
+  Note that this distribution is still a day-resolution approximation.
+
+
+4. Security concerns and mitigations
+
+4.1. Mitigating fingerprinting of new HS circuits
+
+  By pinning the middle nodes of rendezvous circuits, we make it
+  easier for all hops of the circuit to detect that they are part of a
+  special hidden service circuit with varying degrees of certainty.
+
+  The Guard node is able to recognize a Vanguard client with a high
+  degree of certainty because it will observe a client IP creating the
+  overwhelming majority of its circuits to just a few middle nodes in
+  any given 31.5 day time period.
+
+  The middle nodes will be able to tell with a variable certainty that
+  depends on both its traffic volume and upon the popularity of the
+  service, because they will see a large number of circuits that tend to
+  pick the same Guard and Exit.
+
+  The final nodes will be able to tell with a similar level of certainty
+  that depends on their capacity and the service popularity, because they
+  will see a lot of handshakes that all tend to have the same second
+  hops.
+
+  The most serious of these is the Guard fingerprinting issue. When
+  proposal 254-padding-negotiation is implemented, services that enable
+  this feature should use those padding primitives to create fake circuits
+  to random middle nodes that are not their guards, in an attempt to look
+  more like a client.
+
+  Additionally, if Tor Browser implements "virtual circuits" based on
+  SOCKS username+password isolation in order to enforce the re-use of
+  paths when SOCKS username+passwords are re-used, then the number of
+  middle nodes in use during a typical user's browsing session will be
+  proportional to the number of sites they are viewing at any one time.
+  This is likely to be much lower than one new middle node every ten
+  minutes, and for some users, may be close to the number of Vanguards
+  we're considering.
+
+  This same reasoning is also an argument for increasing the number of
+  second-level guards beyond just two, as it will spread the hidden
+  service's traffic over a wider set of middle nodes, making it both
+  easier to cover, and behave closer to a client using SOCKS virtual
+  circuit isolation.
+
+5. Default vs optional behavior
+
+  We suggest this torrc option to be optional because it changes path
+  selection in a way that may seriously impact hidden service performance,
+  especially for high traffic services that happen to pick slow guard
+  nodes.
+
+  However, by having this setting be disabled by default, we make hidden
+  services who use it stand out a lot. For this reason, we should in fact
+  enable this feature globally, but only after we verify its viability for
+  high-traffic hidden services, and ensure that it is free of second-order
+  load balancing effects.
+
+  Even after that point, until Single Onion Services are implemented,
+  there will likely still be classes of very high traffic hidden services
+  for whom some degree of location anonymity is desired, but for which
+  performance is much more important than the benefit of Vanguards, so there
+  should always remain a way to turn this option off.
+
+  In the meantime, a reference implementation is available at:
+  https://github.com/mikeperry-tor/vanguards/blob/master/vanguards/vanguards.py
+
+
+Appendix A: Full Python program for generating tables in this proposal
+
+#!/usr/bin/python
+import math
+
+############ Section 3.1 #################
+def num_rotations(c, v, success):
+  i = 0
+  while 1-math.pow((1-c), v*i) < success: i += 1
+  return i
+
+def rotation_line(c, pct):
+  print "    %2d%%        %6d%6d%6d%6d%6d%6d%6d%6d%6d%6d%8d" % \
+     (pct, num_rotations(c, 1, pct/100.0), num_rotations(c, 2, pct/100.0), \
+      num_rotations(c, 3, pct/100.0), num_rotations(c, 4, pct/100.0),
+      num_rotations(c, 5, pct/100.0), num_rotations(c, 6, pct/100.0),
+      num_rotations(c, 8, pct/100.0), num_rotations(c, 9, pct/100.0),
+      num_rotations(c, 10, pct/100.0), num_rotations(c, 12, pct/100.0),
+      num_rotations(c, 16, pct/100.0))
+
+def rotation_table_31():
+  for c in [1,5,10]:
+    print "\n  %2.1f%% Network Compromise: " % c
+    print "   Sybil Success   One   Two  Three  Four  Five  Six  Eight  Nine  Ten  Twelve  Sixteen"
+    for success in [10,15,25,50,60,75,85,90,95,99]:
+      rotation_line(c/100.0, success)
+
+############ Section 3.3 #################
+def ProbMinXX(N, i): return (2.0*(N-i)-1)/(N*N)
+def ProbMaxXX(N, i): return (2.0*i+1)/(N*N)
+
+def ExpFn(N, ProbFunc):
+  exp = 0.0
+  for i in xrange(N): exp += i*ProbFunc(N, i)
+  return exp
+
+def ProbUniformX(N, i): return 1.0/N
+
+def ProbR(N, r, ProbFunc=ProbMaxXX):
+  return ProbFunc(N, r)*r/ExpFn(N, ProbFunc)
+
+def FullCDF(N, t, ProbFunc=ProbR):
+  density = 0.0
+  for d in xrange(N):
+    if t >= d: density += ProbFunc(N, d)
+    # The +1's below compensate for 0-indexed arrays:
+    else: density += ProbFunc(N, d)*float(t+1)/(d+1)
+  return density
+
+def expectation_table_33():
+  print "\n   Range  Min(X,X)   Max(X,X)"
+  for i in xrange(10,49):
+    print "   %2d      %2.2f       %2.2f" % (i, ExpFn(i,ProbMinXX), ExpFn(i, ProbMaxXX))
+
+def CDF_table_33():
+  print "\n   t          P(SECOND_ROTATION <= t)"
+  for i in xrange(1,46):
+    print "  %2d               %2.5f" % (i, FullCDF(45, i-1))
+
+########### Output ############
+
+# Section 3.1
+rotation_table_31()
+
+# Section 3.3
+expectation_table_33()
+CDF_table_33()
+
+----------------------
+
+1. https://onionbalance.readthedocs.org/en/latest/design.html#overview