[or-cvs] r19162: {projects} start making a 2009 todo list out of the performance ideas. (projects/performance)

Fri Mar 27 17:03:48 UTC 2009

Author: arma
Date: 2009-03-27 13:03:48 -0400 (Fri, 27 Mar 2009)
New Revision: 19162

Added:
   projects/performance/perf-todo
Modified:
   projects/performance/performance.tex
Log:
start making a 2009 todo list out of the performance ideas. put names
on items, note which can be left out of 2009. needs to be finished,
and then re-weighted so names don't show up too often.


Added: projects/performance/perf-todo
===================================================================

--- projects/performance/perf-todo	                        (rev 0)
+++ projects/performance/perf-todo	2009-03-27 17:03:48 UTC (rev 19162)
@@ -0,0 +1,119 @@
+
+performance roadmap (elaboration of sec 3.1 from the 3-year roadmap)
+
+Coderman:
+  - 1.1, UDP-Tor.
+    - Explore how hard it would be to get a user-space TCP stack with
+      suitable properties.
+    - Help Ian and Chris deploy a testbed (prototype) network
+
+Roger, Steven:
+  - 1.2, new circuit window sizes
+    - Conclude whether the transition period will hurt as much as it
+      seems like it will.
+    - Pick a lower number, and switch to it.
+Metrics: gather queue sizes from relays so we have a better sense of
+what's actually going on.
+
+Roger, others:
+  - 2.1, squeeze loud circuits
+    - Evaluate the code to see what stats we can keep about circuit use.
+    - Write proposals for various meddling. Look at the research papers
+      that Juliusz pointed us to. Ask our systems friends.
+
+  - 2.4, rate-limit at clients
+    - Consider ways to choose what rate limits to use.
+    - Reconsider in the context of 2.1 proposals above
+
+  - 2.5, Default exit policies
+    - Change Vidalia's default exit policy to leave "other protocols" unchecked.
+    D let exit relays specify some destination networks/ports that get
+      rate limited further.
+Metrics: At what fraction of exit relays allowing a given port out
+do connections to that port start to suffer? That is, if even 5%
+of the relays (by bandwidth) allowing a port to exit are enough for
+most connections to that port to work fine, then we're going to have
+a tough time pushing unwanted traffic off the network just by changing
+some exit policies. (Alas, this question is messy because it pretends
+that the amount of traffic generated for port x is independent of x.
+How to phrase it so it's more useful?)
+
+  - 2.6, tell users not to file-share
+    - Put statement on the Tor front page
+    - Put statement on the download pages too
+    - And the FAQ
+    D Should we put some sort of popup in Vidalia? How to detect?
+    - Contact Azureus people and get them to fix their docs, and/or
+      remove the feature, and/or pop up a warning.
+
+  - 3.1.2, Tor weather
+    - Link to it from the Tor relay page
+    - and the torrc.sample
+M   - Put a link in Vidalia's relay interface
+I   - Implement time-to-notification (immediate, a day, a week)
+N   - Build a plan for how Tor weather can learn about hibernating
+      relays if we take them out of the v3 consensus and we obsolete v2.
+
+Steven, with help from Jake:
+  - 3.1.3, facebook app
+    - ? [Steven should fill this in]
+
+  - 3.6, incentives to relay
+    - Sort out how to do circuit priority in practice. I think the only
+      answer here is to make different TLS connections for different
+      priorities. (Otherwise other people can free-ride on your
+      high-priority conns.)
+Metrics: what period of time should the gold star status last? That is,
+what period of time, taken as a rolling snapshot of which relays are
+present in the network, guarantees a sufficiently large anonymity set
+for high-priority relays?
+    D design the actual crypto for certs, and build something.
+
+  - 3.7, clients automatically become relays
+    D what's the algorithm for knowing when you should upgrade from
+      being a client to a bridge, and from being a bridge to a relay?
+    D implement enough internal performance/stability tracking for
+      clients to be able to know when they've crossed a threshold.
+    (both of these only really doable once we've done more work on
+     looking at anonymity risks)
+
+  - 4.1, balance traffic better
+    - Steven and Mike should decide if we should do Steven's plan
+      (rejigger the bandwidth numbers at the authorities based on
+      Steven's algorithm), or Mike's plan (relay scanning to identify
+      the unbalanced relays and fix them on the fly), or both.
+    - Figure out how to actually modify bandwidths in the consensus. We
+      may need to change the consensus voting algorithm to decide what
+      bandwidth to advertise based on something other than median:
+      if 7 authorities provide bandwidths, and 2 are doing scanning,
+      then the 5 that aren't scanning will outvote any changes. Should
+      all 7 scan? Should only some vote? Extra points if it doesn't
+      change all the numbers every new consensus, so consensus diffing
+      is still practical.
+    - Make clients actually use the bandwidth numbers in the consensus.
+    - Should a relay with rate/burst of 100/100 have the same capacity
+      as a relay with rate/burst of 100/500?
+
+  - 4.2, getting better bandwidth estimates
+Metrics: how accurate are the ten-second-bandwidth-burst advertised
+numbers anyway, in terms of guessing capacity? Steven says we're at 50%
+load, but is that just because our advertised bandwidth is a function
+of our recent load?
+    - What is "true" capacity anyway?
+Metrics: What other algorithms can we use to produce a more accurate
+advertised bandwidth?
+    - Compare Mike's active probe data to the capacities. Where there
+      are differences, which one is less wrong?
+    - We should instrument Tor relays to compute peer bandwidths. But
+      to do that, we need to understand the anonymity implications of
+      publishing all this aggregated traffic data. Is it really safe
+      enough?
+
+  - 4.3, Micah's latency plan
+    - Hear some numbers from him about how good it can be and how
+      reliable it can be, in theory.
+    - Micah writes a proposal to make Tor relays compute their
+      coordinates, so we can do more direct measurements of whether it
+      should work.
+
+

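A minimal sketch of the median concern raised under item 4.1 above: if the
consensus bandwidth for a relay is the median of the authorities' votes, two
scanning authorities cannot move the result while five others vote the
unadjusted number. The function name and the bandwidth figures (100 and 40)
are hypothetical, chosen only for illustration; the real voting code is not
shown here.

```python
from statistics import median


def consensus_bandwidth(votes):
    """Return the median of the per-authority bandwidth votes.

    Assumption: each authority votes one number per relay and the
    consensus takes the plain median, as described in item 4.1.
    """
    return median(votes)


# Hypothetical figures: five non-scanning authorities repeat the relay's
# self-advertised value, two scanning authorities vote a measured value.
non_scanning = [100] * 5
scanning = [40] * 2

# The two scanners are outvoted: the median stays at the unadjusted value.
outvoted = consensus_bandwidth(non_scanning + scanning)

# Only once a majority (four of seven) scans does the median move.
majority = consensus_bandwidth([100] * 3 + [40] * 4)

print(outvoted, majority)
```

Under these assumptions the measured value only takes effect once at least
four of the seven authorities scan, which is why the todo item asks whether
all seven should scan or whether only some should vote.
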
Modified: projects/performance/performance.tex
===================================================================
--- projects/performance/performance.tex	2009-03-27 16:23:46 UTC (rev 19161)
+++ projects/performance/performance.tex	2009-03-27 17:03:48 UTC (rev 19162)
@@ -907,7 +907,7 @@
 %Whereas in practice Tor relays have finite length queues (which controls network load), and the distribution of input cells is not known.
 %Unfortunately, these assumptions are necessary to apply standard queueing theory results.
 
-To find the optimum relay selection probabilities the model, Steven
+To find the optimum relay selection probabilities for the model, Steven
 used a hill-climbing algorithm to minimize network latency, with a Tor
 directory snapshot as input.
 The results (shown in \prettyref{fig:optimum-selection} and