Hi everyone,<br><br>Here's a proposal for improving path selection through collecting build time statistics. We're still collecting data, so it's quite short. Comments are appreciated. <br><br>Fallon<br><br><br>
Filename: path-selection-improvements.txt<br>Title: Improving Tor Path Selection <br>Version:<br>Last-Modified: <br>Author: Fallon Chen, Mike Perry<br>Created: 5-Jul-2008<br>Status: Draft<br><br>Overview<br> The performance of paths selected can be improved by adjusting the<br>
CircuitBuildTimeout and the number of guards. This proposal describes <br> a method of tracking buildtime statistics, and using those statistics to adjust the<br> CircuitBuildTimeout and the number of guards. <br><br>
Motivation<br> Tor's performance can be improved by excluding those circuits that<br> have long buildtimes (and by extension, high latency). For those Tor users<br> who require better performance and have lower requirements for anonymity,<br>
this would be a very useful option to have. <br><br>Implementation<br><br> Learning the CircuitBuildTimeout<br> Based on studies of build times, we found that the distribution of<br> circuit buildtimes appears to be a Pareto distribution. The number of <br>
circuits to observe (ncircuits_to_observe) before changing the<br> CircuitBuildTimeout will be tunable. From our preliminary measurements,<br> it is likely that ncircuits_to_observe will be somewhere on the order of <br>
1000. The values can be represented compactly in Tor in milliseconds as a <br> circular array of 16 bit integers. More compact long-term storage<br> representations can be implemented by simply storing a histogram with<br>
50 millisecond buckets when writing out the statistics to disk.<br> <br> Calculating the preferred CircuitBuildTimeout<br> Circuits that have longer buildtimes than some x% of the estimated CDF of<br> the Pareto distribution will be excluded. x will be tunable as well. <br>
<br> Circuit timeouts<br> In the event of a timeout, backoff values should include the 100-x% of<br> expected CDF of timeouts.<br> Also, in the event of network failure, the observation mechanism should<br> stop collecting timeout data.<br>
<br> Other notes<br> Since this follows a Pareto distribution, large reductions on the timeout<br> can be achieved without cutting off a great number of the total paths.<br> However, hard statistics on which cutoff percentage gives optimal<br>
performance have not yet been gathered.<br><br>Issues<br> Impact on anonymity<br> <br><br><br><br>