[or-cvs] r16556: {tor} Update proposal after feedback from Nick. (tor/trunk/doc/spec/proposals)

mikeperry at seul.org mikeperry at seul.org
Fri Aug 15 04:13:13 UTC 2008


Author: mikeperry
Date: 2008-08-15 00:13:11 -0400 (Fri, 15 Aug 2008)
New Revision: 16556

Modified:
   tor/trunk/doc/spec/proposals/151-path-selection-improvements.txt
Log:

Update proposal after feedback from Nick.



Modified: tor/trunk/doc/spec/proposals/151-path-selection-improvements.txt
===================================================================
--- tor/trunk/doc/spec/proposals/151-path-selection-improvements.txt	2008-08-14 23:09:48 UTC (rev 16555)
+++ tor/trunk/doc/spec/proposals/151-path-selection-improvements.txt	2008-08-15 04:13:11 UTC (rev 16556)
@@ -34,6 +34,9 @@
 
     From our initial observations, this value appears to be on the order 
     of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
+    The exact value for this #define will be determined by performing
+    goodness of fit tests using measurments obtained from the shufflebt.py
+    script from TorFlow.
  
   Long Term Storage
 
@@ -41,18 +44,32 @@
     histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when 
     writing out the statistics to disk. The format of this histogram on disk 
     is yet to be finalized, but it will likely be of the format 
-    'CircuitBuildTime <bin> <count>'.
+    'CircuitBuildTime <bin> <count>', with the total specified as 
+    'TotalBuildTimes <total>'
     Example:
 
-    CircuitBuildTimeBin 1 100
-    CircuitBuildTimeBin 2 50
+    TotalBuildTimes 100
+    CircuitBuildTimeBin 1 50
+    CircuitBuildTimeBin 2 25
+    CircuitBuildTimeBin 3 13
     ...
 
     Reading the histogram in will entail multiplying each bin by the 
     BUILDTIME_BIN_WIDTH and then inserting <count> values into the 
     circuit_build_times array each with the value of
-    <bin>*BUILDTIME_BIN_WIDTH.
+    <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the 
+    values in the circular array, a form of index skipping must
+    be employed. Values from bin #N with bin count C and total T
+    will occupy indexes specified by N+((T/C)*k)-1, where k is the
+    set of integers ranging from 0 to C-1.
 
+    For example, this would mean that the values from bin 1 would
+    occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
+    The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
+    will be inserted at the first empty position in the array greater 
+    than the selected index (which may requiring looping around the 
+    array back to index 0).
+
   Learning the CircuitBuildTimeout
 
     Based on studies of build times, we found that the distribution of
@@ -71,6 +88,27 @@
     From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
     the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. 
 
+  Testing
+
+    After circuit build times, storage, and learning are implemented,
+    the resulting histogram should be checked for consistency by
+    verifying it persists across successive Tor invocations where 
+    no circuits are built. In addition, we can also use the existing
+    buildtime scripts to record build times, and verify that the histogram 
+    the python produces matches that which is output to the state file in Tor,
+    and verify that the Pareto parameters and cutoff points also match.
+  
+  Soft timeout vs Hard Timeout
+   
+    At some point, it may be desirable to change the cutoff from a 
+    single hard cutoff that destroys the circuit to a soft cutoff and
+    a hard cutoff, where the soft cutoff merely triggers the building
+    of a new circuit, and the hard cutoff triggers destruction of the 
+    circuit.
+
+    Good values for hard and soft cutoffs seem to be 85% and 65% 
+    respectively, but we should eventually justify this with observation.
+
   When to Begin Calculation
 
     The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before 
@@ -87,9 +125,6 @@
     Also, in the event of network failure, the observation mechanism 
     should stop collecting timeout data.
 
-    Circuits that timeout will be destroyed, as this indicates one
-    or more of their respective nodes are currently overloaded.
-
   Client Hints
 
     Some research still needs to be done to provide initial values



More information about the tor-commits mailing list