# [or-cvs] r18836: {projects} finish section 3. (projects/performance)

arma at seul.org arma at seul.org
Tue Mar 10 03:14:48 UTC 2009

Author: arma
Date: 2009-03-09 23:14:48 -0400 (Mon, 09 Mar 2009)
New Revision: 18836

Modified:
projects/performance/performance.tex
Log:
finish section 3.

Modified: projects/performance/performance.tex
===================================================================
--- projects/performance/performance.tex	2009-03-10 01:21:27 UTC (rev 18835)
+++ projects/performance/performance.tex	2009-03-10 03:14:48 UTC (rev 18836)
@@ -493,6 +493,17 @@
Roger and Jake have been working on this angle, and Jake will be ramping
up even more on it in 2009.

+Advocacy and education are particularly important in the context of
+new and quickly-changing policies. In particular, the data retention
+question in Germany is causing some instability in the overall set
+of volunteers running relays. Karsten's latest metrics\footnote{\url{
+https://www.torproject.org/projects/metrics}} show that while the number
+of relays in other countries held steady or went up during 2008, the
+numbers in Germany went down over the course of 2008. On the other hand,
+the total amount of bandwidth provided by German relays held steady
+during 2008 -- so while other operators picked up the slack, we still
+lose overall diversity of relays.
+
\subsubsection{Better support for relay operators}

Getting somebody to set up a relay is one thing; getting them to keep it
@@ -539,15 +550,15 @@

\subsection{Funding more relays directly}

-Another option is to directly pay for hosting fees for fast relays.
+Another option is to directly pay for hosting fees for fast relays (or
+to directly sponsor others to run them).

-The main challenge with this approach is that the efficiency is low:
-at even cheap hosting rates, the cost of a significant number of new
-relays grows quickly. For example, if we can find 100 non-exit relays
-providing 1MB/s for as low as \$100/mo, that's \$120k per year. Figure
-twice that if we count maintenance and coordination costs, the overhead
-to find 100 locations that are on sufficiently different networks and
+The main challenge with this approach is that the efficiency is low: at
+even cheap hosting rates, the cost of a significant number of new relays
+grows quickly. For example, if we can find 100 non-exit relays providing
+1MB/s for as low as \$100/mo, that's \$120k per year. Figure some more
+for maintenance and coordination, the overhead to find 100 locations
+that are on sufficiently different networks and administrative zones, etc.

The amount of work involved in running them as exit relays might be a
few times this cost, due to higher hosting fees, more effort involved
@@ -566,19 +577,20 @@
{\bf Risk}: Low.

{\bf Plan}: If we end up with extra funding, sure. Otherwise, I think
-our time and effort is better spent on the development items that will
+our time and effort are better spent on design and coding that will
have long-term impact rather than be recurring costs.

\subsection{Fast Tor relays on Windows}
+\label{sec:overlapped-io}

Advocating that users set up relays is all well and good, but if most
users are on Windows, and Tor doesn't support fast relays on Windows
well, then we're in a bad position.

Nick has been adapting libevent so it can handle a buffer-based
-abstraction rather than its traditional socket-based abstraction. Then
-we will modify Tor so it uses this new abstraction. Nick's blog
-post\footnote{\url{
+abstraction rather than the traditional Unix-style socket-based
+abstraction. Then we will modify Tor to use this new abstraction. Nick's
+blog post\footnote{\url{
https://blog.torproject.org/blog/some-notes-progress-iocp-and-libevent}}
provides more detail.

@@ -588,55 +600,78 @@

{\bf Risk}: Low.

-{\bf Plan}: Keep at it. We're on schedule to get a test version (that
+{\bf Plan}: Keep at it. We're on schedule to get a test version (one that
works for Nick) out in September 2009. Then iterate until it works
for everybody.

\subsection{Node scanning to find overloaded nodes or broken exits}

-Part of the reason that Tor is slow is because a few of the relays are
+Part of the reason that Tor is slow is that some of the relays are
advertising more bandwidth than they can realistically handle. These
anomalies might be due to bad load balancing on the part of the Tor
designers, bad rate limiting or flaky network connectivity on the part
-of the relay operator, or malicious intent. Further, some exit relays
-might fail to give back the `real' content, requiring users to try again
-and again.
+of the relay operator, or malicious intent. Similarly, some exit relays
+might fail to give back the `real' content, requiring users to repeat
+their connection attempts.

-If we
+Mike has been working on tools to
+identify these relays: SpeedRacer\footnote{\url
+and SoaT\footnote{\url
+Once the tools are further refined, we should be able to figure out if
+there are general classes of problems (load balancing, common usability
+problems, etc.) that mean we should modify our design to compensate. The
+end goal is to get our tools to the point where they can automatically
+tell the directory authorities to leave out certain misbehaving relays in
+for each relay.

-
-
{\bf Impact}: Low.

{\bf Effort}: Medium.

{\bf Risk}: Low.

-{\bf Plan}: Keep at it. We're on schedule to get a test version (that
-works for Nick) out in September 2009. Then iterate until it works
+{\bf Plan}: Keep at it. We're on schedule to get a test version
+(that works for Mike) out in mid-2009. Then iterate until it works
for everybody.

+\subsection{Getting dynamic-IP relays back into the relay list quickly}

+Currently there is a delay of 2-5 hours between when a relay changes its
+IP address and when that relay gets used again by clients. This delay
+causes two problems: relays on dynamic IP addresses will be underutilized
+(contributing less to the total network capacity than they could),
+and clients waste time connecting to relay IP addresses that are no
+longer listening.

-\subsection{getting dynamic ip relays back into the client list quickly}
+There are several approaches that can mitigate this problem by notifying
+clients sooner of IP address changes.
+The first approach is to continue on our path of simplifying directory
+information (see \prettyref{sec:directory-overhead}): if we can put out
+``diffs'' of the network status more often than once an hour, clients
+can get updated quicker.
+A second approach is for each relay to estimate how volatile its IP address
+is, and advertise this in its descriptor.
+Clients then ignore relays with volatile IP addresses and old descriptors.
+Similarly, directory authorities could prioritise the distribution of
+updated IP addresses for freshly changed nodes.

-Use of nodes on dynamic IP addresses
+As a last note here, we currently have some bugs that are causing relays
+with dynamic IP addresses to fall out of the network entirely. If a
+third to half of the relays are running on dynamic IP addresses, that's

-Currently there is a significant delay between a node changing IP address
-and that node being used by clients.
-For this reason, nodes on dynamic IP addresses will be underutilized,
-and connections to their old IP address will fail.
-To mitigate these problems, clients could be notified sooner of IP
-One possibility is to for nodes to estimate how volatile their IP address
-is, and advertise this in their descriptor.
-Clients ignore nodes with volatile IP addresses and old descriptor.
-Similarly, directory authorities could prioritise the distribution of
-updated P addresses for freshly changed nodes.
+{\bf Impact}: Low-medium.

+{\bf Effort}: Low-medium.

+{\bf Risk}: Low.
+
+{\bf Plan}: Track down and fix bugs for Tor 0.2.2.x. Continue simplifying
+directory information so we can get new info to clients quicker.
+
\subsection{Incentives to relay}

Our blog post on this topic\footnote{\url
@@ -646,17 +681,22 @@
anonymity problem, and one that's quite complex.

I think we should move forward with the first (simple but flawed)
-design. There are two phases to moving it forward. The first phase
+design. There are several pieces to moving it forward. The first phase
is changing Tor's queueing mechanisms to be able to give some circuits
priority over others. This step also ties into the other development items
in this document regarding cell-, circuit-, and connection-priorities. The
second phase is then redesigning the ``gold star'' mechanism so the
-priority that relays earn lasts long enough that there's a sufficient
-anonymity set for them. We'll need to look at current network metrics
-to discover a good upper bound on relay churn.
+priority earned by relays lasts long enough that there's a sufficient
+anonymity set for them. We'll need to look at current and projected
+network metrics to discover a good upper bound on relay churn. The
+question to answer is: ``What period of time, taken as a rolling snapshot
+of which relays are present in the network, guarantees a sufficiently
+large anonymity set for high-priority relays?'' Hopefully the answer is
+something like 7 or 14 days. There are other missing pieces in there, like
+what do we mean by ``sufficiently?'', that we'll just have to guess about.
+The third phase is to actually sort out how to construct and distribute
+gold-star cryptographic certificates that entry relays can verify.

-
-
{\bf Impact}: Medium-high.

{\bf Effort}: Medium-high.
@@ -665,13 +705,35 @@
community-oriented infrastructure, we might end up hurting more than
we help.

-{\bf Plan}:
+{\bf Plan}: Accomplishing the three phases above will put us in a much
+better position to decide whether to deploy this idea. At the same time,
+the more complex options might become more within reach as other research
+teams investigate and refine them, so we should keep an eye on them too.

+\subsection{Reachable clients become relays automatically}

+Even if we don't add in an incentive scheme, simply making suitable
+users into relays by default should do a lot for our capacity problems.

-\subsection{reachable clients become relays automatically}
+reachability testing, bandwidth estimation, UPnP support built in to
+Vidalia, and so on.

+{\bf Impact}: High.

+{\bf Effort}: Medium, now that we've done a lot of hard work already.
+
+{\bf Risk}: Medium. Relaying traffic could introduce anonymity risks,
+and making users relays by default could upset some of them.
+
+{\bf Plan}: Wrap up our investigations into the anonymity implications
+of being a relay, at the same time as working on a plan for exactly how
+the Tor client should decide if it's suitable for elevation to relay
+status. We need to finish deployment of \prettyref{sec:overlapped-io}
+before we can roll this out, or we'll just make a bunch of Windows
+machines crash.
+
\section{Choosing paths imperfectly}

\subsection{We don't balance the load over our bandwidth numbers correctly}
@@ -966,6 +1028,7 @@
\section{Network overhead too high for modem users}