# [or-cvs] more work

Roger Dingledine arma at seul.org
Tue Oct 28 21:55:41 UTC 2003

Update of /home/or/cvsroot/doc
In directory moria.mit.edu:/tmp/cvs-serv26399

Modified Files:
tor-design.bib tor-design.tex tor-spec.txt
Log Message:
more work

Index: tor-design.bib
===================================================================
RCS file: /home/or/cvsroot/doc/tor-design.bib,v
retrieving revision 1.10
retrieving revision 1.11
diff -u -d -r1.10 -r1.11
--- tor-design.bib	27 Oct 2003 12:05:35 -0000	1.10
+++ tor-design.bib	28 Oct 2003 21:55:38 -0000	1.11
@@ -24,6 +24,7 @@
$<$http://www.pmg.lcs.mit.edu/$\tilde{\hspace{5pt}}$castro/application/recovery.pdf$>$},
}

+% wasn't jean camp an editor for FC03 too?
@inproceedings{econymics,
title = {On the Economics of Anonymity},
author = {Alessandro Acquisti and Roger Dingledine and Paul Syverson},

Index: tor-design.tex
===================================================================
RCS file: /home/or/cvsroot/doc/tor-design.tex,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -d -r1.32 -r1.33
--- tor-design.tex	28 Oct 2003 11:29:46 -0000	1.32
+++ tor-design.tex	28 Oct 2003 21:55:38 -0000	1.33
@@ -58,6 +58,7 @@
protects against known anonymity-breaking attacks as well
as or better than other systems with similar design parameters.
% and we present a big list of open problems at the end
+% and we present a new practical design for rendezvous points
\end{abstract}

%\begin{center}
@@ -773,7 +774,7 @@
\end{aligned}

-Being able to prove knowledge of this $K$ shows both that it was Bob
+The second step shows both that it was Bob
who received $g^x$, and that it was Bob who came up with $y$. We use
PK encryption in the first step (rather than, eg, using the first two
steps of STS, which has a signature in the second step) because we
@@ -804,8 +805,8 @@
it recognizes the stream ID. A stream ID is recognized either if it
is an already open stream at that OR, or if it is equal to zero. The
zero stream ID is treated specially, and is used for control messages,
-e.g.  starting a new stream. If the stream ID is unrecognized, the OR
-sends the relay cell downstream. This \emph{leaky pipe} circuit design
+e.g. starting a new stream. If the stream ID is unrecognized, the OR
+passes the relay cell downstream. This \emph{leaky pipe} circuit design
allows Alice's streams to exit at different ORs, for example to tolerate
different exit policies, or to keep the ORs from knowing that two streams
originate at the same person.
@@ -815,9 +816,9 @@
that circuit, and passes a new destroy cell forward. But since circuits
can be built incrementally, they can also be torn down incrementally:
Alice can send a relay truncate cell to a node along the circuit. That
-node will send a destroy cell forward, and reply with a relay truncated
-acknowledgement. Alice might truncate her circuit so she can extend it
-to different nodes without notifying the first few nodes (or somebody
+node will send a destroy cell forward, and reply with an acknowledgement
+(relay truncated). Alice might truncate her circuit so she can extend it
+to different nodes without signaling to the first few nodes (or somebody
observing them) that she is changing her circuit. That is, nodes in the
middle are not even aware that the circuit was truncated, because the
relay cells are encrypted. Similarly, if a node on the circuit goes down,
@@ -868,29 +869,31 @@
for TCP-based applications that, for example, close a socket for writing
but are still willing to read.

-\SubSection{Tagging attacks on streams}
+\SubSection{Integrity checking on streams}

In the old Onion Routing design, traffic was vulnerable to a malleability
-attack: since there was no integrity checking, an adversary could
+attack: without integrity checking, an adversary could
guess some of the plaintext of a cell, xor it out, and xor in his own
encryption!

-Some examples of this attack might be to change a create cell to a
-destroy cell, to change the destination address in a relay begin cell
-to the adversary's webserver, or to change a user on an ftp connection
+For example, an adversary could change a create cell to a
+destroy cell; change the destination address in a relay begin cell
+to the adversary's webserver; or change a user on an ftp connection
from typing ``dir'' to typing ``delete *''. Any node or observer along
the path can introduce such corruption in a stream.

-Tor solves the tagging attack with respect to external adversaries simply
-by using TLS. Addressing the insider tagging attack is more complex.
+Tor solves this malleability attack with respect to external adversaries
+simply by using TLS. Addressing the insider malleability attack is more
+complex.

Rather than doing integrity checking of the relay cells at each hop
(like Mixminion \cite{minion-design}), which would increase packet size
by a function of path length\footnote{This is also the argument against
using recent cipher modes like EAX \cite{eax} --- we don't want the added
message-expansion overhead at each hop, and we don't want to leak the path
-length}, we choose to accept passive timing attacks, and do integrity
+length (or pad to some max path length).}, we choose to accept passive
+timing attacks, and do integrity
checking only at the edges of the circuit. When Alice negotiates a key
with that hop, they both start a SHA-1 with some derivative of that key,
thus starting out with randomness that only the two of them know. From
@@ -915,6 +918,7 @@

\SubSection{Website fingerprinting attacks}

+% this subsection probably wants to move to analysis -RD
old onion routing is vulnerable to website fingerprinting attacks like
david martin's from usenix sec and drew's from pet2002. so is tor. we
@@ -922,24 +926,97 @@
a followup to \cite{defensive-dropping} that tells us what, exactly,
to do, and why, exactly, it helps.

-\SubSection{Congestion control and fairness}
+\SubSection{Rate limiting and fairness}
+
+Nodes use a token bucket approach \cite{foo} to limit the number of
+bytes they receive. Tokens are added to the bucket each second (when
+the bucket is full, new tokens are discarded.) Each token represents
+permission to receive one byte from the network --- to receive a byte,
+the connection must remove a token from the bucket. Thus if the bucket
+is empty, that connection must wait until more tokens arrive. The number
+of tokens we add enforces a longterm average rate of incoming bytes, yet
+we still permit short-term bursts above the allowed bandwidth. Currently
+bucket sizes are set to ten seconds worth of traffic.
+
+Further, we want to avoid starving any Tor streams. Entire circuits
+could starve if we read greedily from connections and one connection
+uses all the remaining bandwidth. We solve this by dividing the number
+of tokens in the bucket by the number of connections that want to read,
+and reading at most that number of bytes from each connection. We iterate
+this procedure until the number of tokens in the bucket is under some
+threshold (eg 10KB), at which point we greedily read from connections.
+
+Because the number of bytes going out of a node is roughly the same
+as the number of bytes that have come in, doing rate limiting only on
+incoming bytes should be sufficient.
+
+of the circuit can automatically distinguish interactive streams compared
+to bulk streams --- interactive streams supply cells only rarely. We can
+get good latency for these streams by giving them preferential service,
+while still getting good overall throughput to the bulk streams. Such
+preferential treatment can have impact on anonymity, but an adversary
+who can observe the stream can already learn this information through
+timing attacks.
+
+\SubSection{Congestion control}
\label{subsec:congestion}

-Even with bandwidth throttling, we still need to worry about congestion,
-either accidental or intentional. If a lot of people make circuits into
-the same node, and they all come out through the same connection, then
-that connection may become saturated (be unable to send out cells as
-quickly as it wants to). For example, an adversary can make a 'put'
-request through the onion routing network to a webserver he owns,
+Even with bandwidth rate limiting, we still need to worry about
+congestion, either accidental or intentional. If enough users choose
+the same OR-to-OR connection for their circuits, that connection
+will become saturated. For example, an adversary can make a `put'
+request through the onion routing network to a webserver he runs,
and then refuse to read any of the bytes at the webserver end of the
-circuit. These bottlenecks can propagate back through the entire network,
-mucking up everything.
+circuit. Without some congestion control mechanism, these bottlenecks
+can propagate back through the entire network.

+\subsubsection{Circuit-level}

-Describe circuit-level and stream-level
-congestion control issues and solutions.
-Describe circuit-level and stream-level fairness issues; cite Marc's
-anonnet stuff.
+To control a circuit's bandwidth usage, each OR keeps track of two
+windows. The package window tracks how many relay data cells the OR is
+allowed to package (from outside streams) for transmission back to the OP,
+and the deliver window tracks how many relay data cells it is willing
+to deliver to streams outside the network. Each window is initialized
+(say, to 1000 data cells). When a data cell is packaged or delivered,
+the appropriate window is decremented. When an OR has received enough
+data cells (currently 100), it sends a relay sendme cell towards the OP,
+with stream ID zero. When an OR receives a relay sendme cell with stream
+ID zero, it increments its packaging window. Either of these cells
+increments the corresponding window by 100. If the packaging window
+reaches 0, the OR stops reading from TCP connections for all streams
+on the corresponding circuit, and sends no more relay data cells until
+receiving a relay sendme cell.
+
+The OP behaves identically, except that it must track a packaging window
+and a delivery window for every OR in the circuit. If a packaging window
+reaches 0, it stops reading from streams destined for that OR.
+
+\subsubsection{Stream-level}
+
+The stream-level congestion control mechanism is similar to the
+circuit-level mechanism above. ORs and OPs use relay sendme cells
+to implement end-to-end flow control for individual streams across
+circuits. Each stream begins with a package window (e.g. 500 cells),
+and increments the window by a fixed value (50) upon receiving a relay
+sendme cell. Rather than always returning a relay sendme cell as soon
+as enough cells have arrived, the stream-level congestion control also
+has to check whether data has been successfully flushed onto the TCP
+stream; it sends a relay sendme only when the number of bytes pending
+to be flushed is under some threshold (currently 10 cells worth).
+
+Currently, non-data relay cells do not affect the windows. Thus we
+avoid potential deadlock issues, e.g. because a stream can't send a
+relay sendme cell because its packaging window is empty.
+
+\subsubsection{Needs more research}
+
+We don't need to reimplement full TCP windows (with sequence numbers,
+the ability to drop cells when we're full and retransmit later, etc),
+because the TCP streams already guarantee in-order delivery of each
+cell. But we need to investigate further the effects of the current
+parameters on throughput and latency, while also keeping privacy in mind;
+see Section \ref{sec:maintaining-anonymity} for more discussion.

\Section{Other design decisions}
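
The token-bucket rate limiting and fair-read procedure added in the "Rate limiting and fairness" subsection of this hunk can be sketched roughly as follows. This is an illustrative Python sketch, not the Tor implementation: the names `TokenBucket`, `fair_read` and `greedy_threshold` are hypothetical, the once-per-second refill, ten-second bucket size and 10KB threshold are taken from the text, and the order of the final greedy pass is an assumption.

```python
class TokenBucket:
    """Byte-granularity bucket: one token permits receiving one byte."""

    def __init__(self, rate_bytes_per_sec, burst_seconds=10):
        self.rate = rate_bytes_per_sec
        # The text sets bucket sizes to ten seconds' worth of traffic.
        self.capacity = rate_bytes_per_sec * burst_seconds
        self.tokens = self.capacity

    def refill(self):
        """Call once per second; tokens beyond capacity are discarded."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def take(self, n):
        """Remove up to n tokens and return how many were granted."""
        granted = min(n, self.tokens)
        self.tokens -= granted
        return granted


def fair_read(bucket, pending, greedy_threshold=10 * 1024):
    """Share the bucket among connections that want to read.

    pending maps a (hypothetical) connection name to its waiting bytes.
    Returns the number of bytes read per connection.
    """
    read = {name: 0 for name in pending}
    remaining = dict(pending)
    # Fair phase: each round, every waiting connection reads at most
    # tokens // (number of waiting connections) bytes.
    while bucket.tokens >= greedy_threshold:
        waiting = [c for c, n in remaining.items() if n > 0]
        if not waiting:
            return read
        share = bucket.tokens // len(waiting)
        if share == 0:
            break
        for c in waiting:
            got = bucket.take(min(share, remaining[c]))
            read[c] += got
            remaining[c] -= got
    # Greedy phase: under the threshold, read in whatever order comes up.
    for c, n in remaining.items():
        got = bucket.take(n)
        read[c] += got
        remaining[c] -= got
    return read
```

With a 10000 byte/s bucket and two connections, one bulk and one nearly idle, the idle connection gets everything it asked for in the first round and the bulk connection absorbs the rest, which is the starvation-avoidance behavior the paragraph describes.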

@@ -1159,9 +1236,11 @@
\label{sec:rendezvous}

Rendezvous points are a building block for \emph{location-hidden services}
-(aka responder anonymity) in the Tor network. Location-hidden
-services means Bob can offer a tcp service, such as a webserver,
-without revealing the IP of that service.
+(aka responder anonymity) in the Tor network. Location-hidden services
+means Bob can offer a TCP service, such as a webserver, without revealing
+the IP of that service. One motivation for location privacy is to provide
+protection against DDoS attacks: attackers are forced to attack the
+onion routing network as a whole rather than just Bob's IP.

We provide this censorship resistance for Bob by allowing him to
advertise several onion routers (his \emph{Introduction Points}) as his
@@ -1171,8 +1250,12 @@
rendezvous
point. This extra level of indirection means Bob's introduction points
don't open themselves up to abuse by serving files directly, eg if Bob
-chooses a node in France to serve material distateful to the French. The
-extra level of indirection also allows Bob to respond to some requests
+chooses a node in France to serve material distateful to the French,
+%
+% We need a more legitimate-sounding reason here.
+%
+or if Bob's service tends to get DDoS'ed by script kiddies.
+The extra level of indirection also allows Bob to respond to some requests
and ignore others.

We provide the necessary glue so that Alice can view webpages from Bob's
@@ -1220,9 +1303,13 @@
and gets connected to Alice's pipe, his first cell contains the
other half of the DH key exchange.

-% briefly talk about our notion of giving cookies to people proportional
-% to how important they are, for location-protected servers hardened
-% against DDoS threat? -RD
+The authentication tokens can be used to provide selective access to users
+proportional to how important it is that they maintain uninterrupted access
+to the service. During normal situations, Bob's service might simply be
+offered directly from mirrors; Bob also gives out authentication cookies
+to special users. When those mirrors are knocked down by DDoS attacks,
+those special users can switch to accessing Bob's service via the Tor
+rendezvous system.

\subsection{Integration with user applications}

Index: tor-spec.txt
===================================================================
RCS file: /home/or/cvsroot/doc/tor-spec.txt,v
retrieving revision 1.40
retrieving revision 1.41
diff -u -d -r1.40 -r1.41
--- tor-spec.txt	27 Oct 2003 10:24:27 -0000	1.40
+++ tor-spec.txt	28 Oct 2003 21:55:38 -0000	1.41
@@ -178,6 +178,7 @@

1. Choose a chain of N onion routers (R_1...R_N) to constitute
the path, such that no router appears in the path twice.
+         [this is wrong, see October 2003 discussion on or-dev]

2. If not already connected to the first router in the chain,
open a new connection to that router.
@@ -424,7 +425,7 @@
receives a RELAY_SENDME cell with stream ID zero, it increments its
packaging window.

-   Either of these cells increment the corresponding window by 100.
+   Each of these cells increments the corresponding window by 100.

The OP behaves identically, except that it must track a packaging
window and a delivery window for every OR in the circuit.
@@ -435,6 +436,7 @@
If a packaging window reaches 0, the OR or OP stops reading from
TCP connections for all streams on the corresponding circuit, and
sends no more RELAY_DATA cells until receiving a RELAY_SENDME cell.
+[this stuff is badly worded; copy in the tor-design section -RD]

6.4. Stream-level flow control

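
The circuit-level window bookkeeping described in the flow-control hunks above can be sketched roughly as follows. This is an illustrative Python sketch under one reading of the text, not code from Tor: the class and constant names are hypothetical, the values 1000 and 100 come from the tor-design.tex wording ("say, to 1000 data cells", "currently 100"), and treating the sender of a RELAY_SENDME as topping up its own delivery window by 100 is an assumption drawn from "Each of these cells increments the corresponding window by 100".

```python
WINDOW_START = 1000      # initial size of each window, in data cells
WINDOW_INCREMENT = 100   # cells acknowledged by one RELAY_SENDME


class CircuitWindows:
    """Per-circuit packaging and delivery windows kept by one OR."""

    def __init__(self):
        self.package_window = WINDOW_START
        self.deliver_window = WINDOW_START
        self.sendmes_sent = 0

    def package_data_cell(self):
        """Try to package one RELAY_DATA cell; False means stop reading."""
        if self.package_window <= 0:
            # Window exhausted: stop reading from the TCP connections of
            # all streams on this circuit until a RELAY_SENDME arrives.
            return False
        self.package_window -= 1
        return True

    def deliver_data_cell(self):
        """Account for one RELAY_DATA cell delivered outside the network."""
        self.deliver_window -= 1
        if self.deliver_window <= WINDOW_START - WINDOW_INCREMENT:
            # Enough cells have arrived: send a RELAY_SENDME with
            # stream ID zero and reopen the delivery window.
            self.deliver_window += WINDOW_INCREMENT
            self.sendmes_sent += 1

    def receive_sendme(self):
        """A RELAY_SENDME with stream ID zero reopens the packaging window."""
        self.package_window += WINDOW_INCREMENT
```

An OP would keep one such pair of windows for every OR in the circuit, as the text notes.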