[tor-commits] [torspec/master] Further improvements to 238-hs-relay-stats.txt.

nickm at torproject.org nickm at torproject.org
Tue Jan 6 17:54:02 UTC 2015


commit ae59ee1e346ef31208572c3accd3ed9ae513da81
Author: George Kadianakis <desnacked at riseup.net>
Date:   Sun Dec 14 16:42:47 2014 +0200

    Further improvements to 238-hs-relay-stats.txt.
    
    - Reverse the order of the obfuscation methods as discussed on the
      mailing list.
    
    - Change delta_f of the HSDir stats to be 8 instead of 1.
    
    - Add links to pfm's graphs.
---
 proposals/238-hs-relay-stats.txt |   94 ++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 34 deletions(-)

diff --git a/proposals/238-hs-relay-stats.txt b/proposals/238-hs-relay-stats.txt
index e7bf184..d46e35d 100644
--- a/proposals/238-hs-relay-stats.txt
+++ b/proposals/238-hs-relay-stats.txt
@@ -84,14 +84,15 @@ Status: Draft
         direction on a circuit after receiving and successfully
         processing a RENDEZVOUS1 cell.
 
-        The actual number is obfuscated as detailed in section
-        "2.4. Statistics obfuscation". The parameters of the
-        obfuscation are included in the key=val part of the line.
+        The actual number is obfuscated as detailed in
+        [STAT-OBFUSCATION]. The parameters of the obfuscation are
+        included in the key=val part of the line.
 
    The obfuscatory parameters for this statistic are:
      * delta_f = 2048
      * epsilon = 0.3
      * bin_size = 1024
+   (Also see [CELL-LAPLACE-GRAPH] for a graph of the Laplace distribution.)
 
    So, an example line could be:
      hidserv-rend-relayed-cells 19456 delta_f=2048 epsilon=0.30 binsize=1024
@@ -111,55 +112,45 @@ Status: Draft
         descriptors published to and accepted by this hidden-service
         directory.
 
-        The actual number number is obfuscated as detailed in section
-        "2.4. Statistics obfuscation". The parameters of the
-        obfuscation are included in the key=val part of the line.
+        The actual number is obfuscated as detailed in
+        [STAT-OBFUSCATION]. The parameters of the obfuscation are
+        included in the key=val part of the line.
 
-   The obfuscatory parameters for these statistics are:
-     * delta_f = 1
+   The obfuscatory parameters for this statistic are:
+     * delta_f = 8
      * epsilon = 0.3
      * bin_size = 8
+   (Also see [ONIONS-LAPLACE-GRAPH] for a graph of the Laplace distribution.)
 
    So, an example line could be:
     hidserv-dir-onions-seen 112 delta_f=1 epsilon=0.30 binsize=8
 
-2.4. Statistics obfuscation
+2.4. Statistics obfuscation [STAT-OBFUSCATION]
 
   We believe that publishing the actual measurement values in such a
   system might have unpredictable effects, so we obfuscate these
   statistics before publishing:
 
-                   +--------------+    +--------------------+
-   actual value -> |additive noise| -> |round-up obfuscation| -> public statistic
-                   +--------------+    +--------------------+
+                   +-----------+    +--------------+
+   actual value -> |  binning  | -> |additive noise| -> public statistic
+                   +-----------+    +--------------+
 
   We are using two obfuscation methods to better hide the actual
   numbers even if they remain the same over multiple measurement
   periods.
 
-  Specifically, given the actual measurement value, we first deploy
-  additive noise in a fashion similar to basic differential
-  privacy. Then, we round up this obfuscated result to the nearest
-  multiple of an integer (which is a security parameter), to derive a
-  final result which can be published safely.
+  Specifically, given the actual measurement value, we first apply
+  data binning to it (we round it up to the nearest multiple of an
+  integer, see [DATA-BINNING]). We then apply additive noise to the
+  binned value in a fashion similar to differential privacy.
 
   More information about the obfuscation methods follows:
 
-2.4.1. Additive noise
-
-  We apply additive noise to the actual measurement by adding to it a
-  random value sampled from a Laplace distribution . Following the
-  differential privacy methodology [DIFF-PRIVACY], our obfuscatory
-  Laplace distribution has \mu = 0 and b = (delta_f / epsilon).
-
-  The precise values of delta_f and epsilon are different for each
-  statistic and are defined on the respective statistics sections.
+2.4.1. Data binning
 
-2.4.2. Round-up obfuscation
-
-  To further hide any patterns, before publishing statistics, we round
-  up the result to the nearest multiple of 'bin_size'. 'bin_size' is
-  an integer security parameter and can be found on the respective
+  The first thing we do to the original measurement value is to round
+  it up to the nearest multiple of 'bin_size'.  'bin_size' is an
+  integer security parameter and can be found on the respective
   statistics sections.
 
   This is similar to how Tor keeps bridge user statistics. As an
@@ -168,6 +159,17 @@ Status: Draft
   values, so for example, if the measurement value is -9 and bin_size
   is 8, the value will be rounded up to -8.
 
+2.4.2. Additive noise
+
+  Then, before publishing the statistics, we apply additive noise to
+  the binned value by adding to it a random value sampled from a
+  Laplace distribution. Following the differential privacy
+  methodology [DIFF-PRIVACY], our obfuscatory Laplace distribution has
+  mu = 0 and b = (delta_f / epsilon).
+
+  The precise values of delta_f and epsilon are different for each
+  statistic and are defined on the respective statistics sections.
+
 
 3. Security
 
@@ -196,7 +198,7 @@ Status: Draft
 
 4. Discussion
 
-4.1. Why count only RP cells? Why not also count IP cells?
+4.1. Why count only RP cells? Why not count IP cells too?
 
    There are three phases in the rendezvous protocol where traffic is
    generated: (1) when hidden services make themselves available in
@@ -211,7 +213,7 @@ Status: Draft
 
 4.2. How to use these stats?
 
- 4.2.1. How to use RP Cell statistics
+ 4.2.1. How to use rendezvous cell statistics
 
    We plan to extrapolate reported values to network totals by dividing
    values by the probability of clients picking relays as rendezvous
@@ -259,9 +261,33 @@ Status: Draft
    consider the part of the statistics interval following the valid-after
    time of that consensus.
 
+4.3. Why does the obfuscation work?
+
+   By applying data binning, we smudge the original value, making it
+   harder for attackers to guess it. Specifically, an attacker who
+   knows the bin can only guess the underlying value with probability
+   1/bin_size.
+
+   By applying additive noise, we make it harder for the adversary to
+   find out the current bin, which makes it even harder to get the
+   original value. If additive noise were not applied, an adversary
+   could try to detect changes in the original value by checking when
+   we switch bins.
+
+5. Acknowledgements
 
-5. References
+   Thanks go to 'pfm' for the helpful Laplace graphs.
+
+6. References
 
 [GUARD-DISCOVERY]: https://lists.torproject.org/pipermail/tor-dev/2014-September/007474.html
 
 [DIFF-PRIVACY]: http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf
+
+[DATA-BINNING]: https://en.wikipedia.org/wiki/Data_binning
+
+[CELL-LAPLACE-GRAPH]: https://raw.githubusercontent.com/corcra/pioton/master/vis/laplacePDF_mu0_b6800.png
+                      https://raw.githubusercontent.com/corcra/pioton/master/vis/laplaceCDF_mu0_b6800.png
+
+[ONIONS-LAPLACE-GRAPH]: https://raw.githubusercontent.com/corcra/pioton/master/vis/laplacePDF_mu0_b3.png
+                        https://raw.githubusercontent.com/corcra/pioton/master/vis/laplaceCDF_mu0_b3.png
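
The obfuscation order introduced by this patch (data binning first, then
additive Laplace noise with mu = 0 and b = delta_f / epsilon) can be
sketched in Python as follows. This is only an illustrative sketch, not
Tor's implementation: the function names, the choice of random number
generator, and the decision to return an unrounded float are all
assumptions made for the example, since the patch does not specify them.

```python
import random


def bin_value(value: int, bin_size: int) -> int:
    """Round value up to the nearest multiple of bin_size.

    Hypothetical helper. Matches the proposal's example: with
    bin_size = 8, a value of -9 is rounded up to -8.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be a positive integer")
    # -(-a // b) is integer ceiling division and also works for negatives.
    return -(-value // bin_size) * bin_size


def laplace_scale(delta_f: float, epsilon: float) -> float:
    """Scale parameter b of the Laplace distribution: b = delta_f / epsilon."""
    return delta_f / epsilon


def sample_laplace(b: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(mu = 0, b).

    Uses the fact that the difference of two independent exponential
    variables with mean b follows a Laplace(0, b) distribution.
    """
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def obfuscate(value, delta_f, epsilon, bin_size, rng=None):
    """Apply binning, then additive noise, in the order this patch describes.

    Assumption: the result is returned as a float; how the published
    statistic is made integral is not covered here.
    """
    if rng is None:
        # Assumption: an OS-backed generator; the patch names no RNG.
        rng = random.SystemRandom()
    binned = bin_value(value, bin_size)
    return binned + sample_laplace(laplace_scale(delta_f, epsilon), rng)
```

For example, obfuscate(105, delta_f=8, epsilon=0.3, bin_size=8) first bins
105 up to 112 and then adds noise drawn with b = 8 / 0.3, roughly 26.7, so
the returned value differs from call to call.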




