commit f69101ea9ea33dbff0ede8185b9ef700c074b6d7
Author: Nick Mathewson <nickm(a)torproject.org>
Date: Fri Nov 9 19:21:44 2012 -0500
More revisions from the TODO:
- revise abstract
- v3 directory system
- v3 link protocol
- more on isolation
- create_fast.
---
todo | 25 ++++
tor-design-2012.tex | 323 +++++++++++++++++++++++++++++++++++----------------
2 files changed, 250 insertions(+), 98 deletions(-)
diff --git a/todo b/todo
index c807072..6ddbcb3 100644
--- a/todo
+++ b/todo
@@ -1,18 +1,43 @@
Tentative breakdown. Feel free to take on something here that isn't done
yet!
+LEGEND:
+ - Not done
+ o Done
+ . Partially done
+
+ITEMS:
+
* Integrate the content from the first blog post [nick] **
+ o Node discovery and the directory protocol
+ o Security improvements to hidden services
+ o DHT
+ - Improved authorization model for hidden services
+ o Faster first-hop circuit establishment with CREATE_FAST
+ o Cell queueing and scheduling.
* Integrate content from the second blog post [steven]
+ - guard nodes
+ - Bridges, censorship resistance, and pluggable transports
+ - Changes and complexities in our path selection algorithms
+ o stream isolation
* Integrate content from the third blog post [steven]
+ o link protocol tls
+ - rise and fall of .exit
+ . controller protocol
+ o torbutton
+ o tor browser bundle
* Revise the abstract and introduction [nick]
+ o Abstract
+ - Introduction
* Revise related work [steven]
* Revise design goals and assumptions [steven]
* Revise tor-design up to "opening and closing streams" [nick] **
* Revise tor-design "opening and closing streams" onward [steven]
* Revise hidden services section [nick]
+ . somewhat done? DHT and authorization
* Revise "other design decisions" [nick]
* Revise "attacks and defenses" [steven]
diff --git a/tor-design-2012.tex b/tor-design-2012.tex
index e00d963..e7a662b 100644
--- a/tor-design-2012.tex
+++ b/tor-design-2012.tex
@@ -74,19 +74,23 @@ Paul Syverson \\ Naval Research Lab \\ syverson(a)itd.nrl.navy.mil}
\begin{abstract}
We present Tor, a circuit-based low-latency anonymous
-communication service. This second-generation Onion Routing
-system addresses limitations in the original design by adding
+communication service. This Onion Routing
+system addresses limitations in the earlier design by adding
perfect forward secrecy, congestion control, directory servers,
-integrity checking, configurable exit policies, and a practical
+integrity checking, configurable exit policies,
+anticensorship features, guard nodes, application- and
+user-selectable stream isolation, and a practical
design for location-hidden services via rendezvous points. Tor
-works on the real-world Internet, requires no special privileges
+is deployed on the real-world Internet, requires no special privileges
or kernel modifications, requires little synchronization or
coordination between nodes, and provides a reasonable tradeoff
-between anonymity, usability, and efficiency. We briefly
-describe our experiences with an international network of more
-than 30 nodes. We close with a list of open problems in
+between anonymity, usability, and efficiency.
+An earlier paper in 2004 described Tor's original design;
+here we explain Tor's current design as of late 2012, and
+describe our experiences with an international network of
+approximately 3000 nodes and XXXXX %?????
+users. We close with a list of open problems in
anonymous communication.
-% TODO: Abstract needs rewrite when we're done. -NM
\end{abstract}
%\begin{center}
@@ -202,19 +206,16 @@ until the congestion subsides.
% We've been working on this some; we have found that our current approach
% doesn't work so well. -NM
-\textbf{Directory authorities:} The earlier Onion Routing design
-planned to flood state information through the network---an
-approach that can be unreliable and complex. Tor takes a
-simplified view toward distributing this information. Certain
-more trusted nodes act as \emph{directory authorities}: they
-provide signed directories describing known routers and their
-current state. Users periodically download them directly from
-the authorities or from a mirror, via HTTP tunelled over a Tor
-circuit.
-% The above paragraph is almost right. But the more trusted nodes are called
-% ``authorities'' and we use http-over-tor to fetch stuff. There's a layer
-% of caches too. -NM
-% Believed done - SJM
+\textbf{Directory authorities:} The earlier Onion Routing
+design planned to flood state information through the
+network---an approach that can be unreliable and complex.
+Tor takes a simplified view toward distributing this
+information. Certain more trusted nodes act as
+\emph{directory authorities}: they collaborate to generate
+signed directory documents describing known routers and
+their current state. Users periodically download these
+documents directly from the authorities or a mirror, via
+HTTP tunneled over a Tor circuit.
\textbf{Variable exit policies:} Tor provides a consistent
mechanism for each node to advertise a policy describing the
@@ -635,13 +636,12 @@ Each onion router maintains a long-term identity key and a
short-term onion key. The identity key is used to sign TLS
certificates, to sign the OR's \emph{router descriptor} (a
summary of its keys, address, bandwidth, exit policy, and so
-on), and (by directory servers) to sign directories. The onion
+on). The onion
key is used to decrypt requests from users to set up a circuit
and negotiate ephemeral keys. The TLS protocol also establishes
a short-term link key when communicating between ORs. Short-term
keys are rotated periodically and independently, to limit the
impact of key compromise.
-% Directories are not signed with identity keys any longer. -NM
% Clarify the role of the link keys. -NM
% XXXX I hope somewhere in this paperwe talk more about the link protocol, so
% we can say more abotu the v2 and v3 versions of it. -NM
@@ -744,6 +744,67 @@ and commands in more detail below.
\mbox{\epsfig{figure=cell-struct,width=7cm}}
\end{figure}
+\subsection{TLS details}
+Tor's original (version 1) TLS handshake was fairly
+straightforward. The initiator offered a sensible set of
+cryptographic algorithms and parameters (\emph{ciphersuites},
+in TLS terminology), and the responder selected one. If one
+side wanted to prove to the other that it was a Tor node, it
+would send a two-element certificate chain signed by the key
+published in the Tor directory.
+
+This approach provided all the security properties envisaged
+when the 2004 design paper was written, but Tor's increasing
+use for censorship resistance changed the requirements: Tor's
+protocol signature also had to resemble (to the extent
+possible) that of HTTPS web traffic, to prevent censors from
+using deep packet inspection to detect and block Tor. Tor's
+use of fixed two-certificate chains was a giveaway.
+
+After an intermediate design that relied on TLS
+renegotiation, a mechanism that proved fragile
+% Cite stuff about how TLS renegotiation went away for a
+% while once everybody realized it was insecure -NM
+and observable%
+% Cite proposal 130.
+, Tor shifted to a mixed authentication
+model, in which the TLS handshake can complete with any
+(secure) credentials and ciphersuites, and an inner
+handshake carried within the TLS connection provides the
+authentication that Tor actually needs.\footnote{To
+ signal that this newer link protocol handshake is in use,
+ the initiator avoids offering the exact set of ciphersuites
+ that early Tor versions offered, and the responder presents
+ an X.509 certificate unlike those generated by earlier
+ versions of Tor.
+% Cite proposal 176 and tor-spec
+ This signaling may be too clever for Tor's own good; we
+ intend to eliminate it once every supported version of Tor
+ supports this version of the link protocol.}
+
+To perform the inner handshake once the TLS handshake is
+done, the parties negotiate a Tor link protocol version by
+exchanging \emph{versions} cells, each listing the link
+protocol versions its sender supports, and then choosing the
+highest version supported by both. Next, the responder sends
+a \emph{certs} cell containing the actual certificate chain
+that authenticates, with its identity key, the public key it
+used for the TLS handshake. The responder also sends a
+random nonce as a challenge. If the initiator also wishes to
+authenticate herself as an OR, she sends a \emph{certs} cell
+of her own, followed by an \emph{authenticate} cell signed
+by her link key and containing: a digest of both identity
+keys, a digest of all messages she has sent and received so
+far, a digest of the responder's TLS link certificate, the
+current time, a random nonce, and a MAC of the TLS
+handshake's client\_random and server\_random parameters,
+keyed with the TLS master secret.
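+
+Stated compactly (the notation $V_I$ and $V_R$ is introduced
+here only for illustration), if the initiator's and the
+responder's \emph{versions} cells list the version sets $V_I$
+and $V_R$, the parties use link protocol version
+\[ v = \max(V_I \cap V_R), \]
+and no version can be agreed on if $V_I \cap V_R = \emptyset$.
+For example, a hypothetical initiator listing $\{2,3\}$ and a
+responder listing only $\{2\}$ would settle on version~2.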
+
+% Justify the above. -MN
+
+
\subsection{Circuits and streams}
\label{subsec:circuits}
@@ -754,23 +815,52 @@ design imposed high costs on applications like web browsing that
open many TCP streams.
In Tor, each circuit can be shared by many TCP streams. To
-avoid delays, users construct circuits preemptively.
-% Clarify: OPs construct circuits preemptively, not users. -NM
+avoid delays, OPs construct circuits preemptively.
To limit linkability among their streams, the user's OP will not
-assign a new stream to a circuit if the circuit has previously
-carried a stream which the user has indicated should be separate
+assign a new stream to a circuit if the circuit\footnote{
+ Some have suggested that isolating streams by \emph{exit}
+ would be better than isolating them by circuit, so that two
+ isolated streams would never appear to come from the same
+ IP address. A little analysis shows that this approach
+ would hurt anonymity, however: a destination service that
+ observed two accounts both using Tor, but never arriving
+ from the same exit node IP address at the same time, could
+ conclude that those accounts were probably run by the same
+ user.}
+has previously
+carried a stream which the user has indicated should be isolated
from the new one. By default, a user signals that two streams
should not be linkable by making SOCKS connections to different
ports, from a different IP address, or with different SOCKS
-authentication credentials. Even when a stream would otherwise
+authentication credentials. Tor's SOCKS ports can
+additionally be configured to isolate streams based on
+destination port\footnote{Some designs have suggested
+ port-based isolation as a way to keep a user's use of
+ different protocols from becoming linked. This does not
+ work, though, when one of the protocols is HTTP or HTTPS,
+ since applications speaking those protocols can typically
+ be induced to connect to any attacker-selected port.}
+or address. Even when a stream would otherwise
be permitted to be carried by a circuit, if the circuit's first
stream was created more than 10 minutes (by default) ago, that
circuit will not be considered for re-use and closed once there
are no remaining streams, then the OP will build a new circuit
preemptively.
-% Also mention that there are mechanisms that applications can use
-% to signal that streams shouldn't be sent over the same circuit. -NM
-% Believed done -SJM
+
+With careful configuration, this system can be used to avoid
+numerous linking attacks. For example, a user accessing
+multiple pseudonymous chat accounts could configure her chat
+application to use a separate SOCKS username for each one,
+thus telling Tor not to place streams for different accounts
+on the same circuit (sharing a circuit would suggest to the
+exit node that the accounts belonged to the same user).
+For applications that do not support SOCKS authentication,
+the user might instead configure multiple SOCKS ports and
+give each application a different port, so that, for
+example, her anonymous web browsing never shares a circuit
+with her pseudonymous IM usage.
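+
+As an illustration only (the port numbers are arbitrary, and
+this is a sketch rather than a recommended configuration), a
+\emph{torrc} fragment along the following lines gives a user
+two SOCKS ports whose streams never share a circuit with each
+other. The second port additionally isolates streams that
+target different destination ports.
+\begin{verbatim}
+SocksPort 9050
+SocksPort 9052 IsolateDestPort
+\end{verbatim}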
+
OPs
consider rotating to a new circuit once a minute: thus even
heavy users spend negligible time building circuits, but a
@@ -857,6 +947,15 @@ Dolev-Yao model.
% implementation of the protocol above is a little fraught.
% Maaaybe mention ACE and ntor handshakes as future directions
% here; if not, mention them in future work. -NM
+
+As an optimization, Alice may send a \emph{create\_fast} cell in
+place of her first \emph{create} cell: instead of sending an encrypted $g^x$
+value, she simply sends a random value $x$; Bob replies with a
+\emph{created\_fast} cell containing a random value $y$, and they base their
+shared keys on $H(x|y)$. This handshake saves the expense of RSA and
+Diffie-Hellman, but on its own provides no authentication, integrity,
+confidentiality, or forward secrecy: it relies on the TLS link that Alice
+and Bob are already using to achieve these properties.
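+
+Written as a message exchange (an informal summary that omits
+cell framing and key expansion), the \emph{create\_fast}
+handshake is:
+\begin{eqnarray*}
+\mathrm{Alice} \rightarrow \mathrm{Bob} &:& x\\
+\mathrm{Bob} \rightarrow \mathrm{Alice} &:& y\\
+K &=& H(x|y)
+\end{eqnarray*}
+where $x$ and $y$ are fresh random values carried in the
+\emph{create\_fast} and \emph{created\_fast} cells, and $K$ is
+the material on which the circuit's shared keys are based.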
\\
\noindent{\large\bf Relay cells}\\
@@ -1091,11 +1190,6 @@ Currently each cell has a 30-second half-life. Such
preferential treatment presents a possible end-to-end attack,
but an adversary observing both ends of the stream can already
learn this information through timing attacks.
-% I don't think we do anything like what we had in mind when we
-% wrote the above paragraph. -NM
-
-% We should mention EWMA in this section. -NM
-% Believed done -SJM
\subsection{Congestion control}
\label{subsec:congestion}
@@ -1195,9 +1289,6 @@ can unauthorized users not connect to the hidden service or its
introduction points (the descriptor contains an authentication
credential), they also cannot discover whether the hidden
service is online.
-% We eventually went and built a distributed directory in Tor to deal with
-% this. -NM
-% Believed done -SJM
Alice, the client, chooses an OR as her
\emph{rendezvous point}. She connects to one of Bob's
@@ -1523,8 +1614,6 @@ project~\cite{darkside} give us a glimpse of likely issues.
\subsection{Directory Servers}
\label{subsec:dirservers}
-% This whole section needs a rewrite -NM
-
First-generation Onion Routing
designs~\cite{freedom2-arch,or-jsac98} used in-band network
status updates: each router flooded a signed statement to its
@@ -1545,65 +1634,103 @@ track changes in network topology and node state, including keys
and exit policies. Each such \emph{directory server} acts as an
HTTP server, so clients can fetch current network state and
router lists, and so other ORs can upload state information.
-Onion routers periodically publish signed statements of their
-state to each directory server. The directory servers combine
-this information with their own views of network liveness, and
-generate a signed description (a \emph{directory}) of the entire
-network state. Client software is pre-loaded with a list of the
-directory servers and their keys, to bootstrap each client's
-view of the network.
-
-When a directory server receives a signed statement for an OR,
-it checks whether the OR's identity key is recognized. Directory
-servers do not advertise unrecognized ORs---if they did, an
-adversary could take over the network by creating many
-servers~\cite{sybil}. Instead, new nodes must be approved by the
-directory server administrator before they are
-included. Mechanisms for automated node approval are an area of
-active research, and are discussed more in
-Section~\ref{sec:maintaining-anonymity}.
-
-Of course, a variety of attacks remain. An adversary who
-controls a directory server can track clients by providing them
+
+A small number of partially trusted directory servers (nine
+as of late 2012) are ``directory authorities.'' Onion
+routers periodically publish signed statements of their
+state to each directory authority. The authorities
+combine this information with their own views of network
+liveness, and periodically collaborate to vote on a
+description (a consensus \emph{directory}) of the entire
+network state, signed by as many of the authorities as
+possible. Client software is pre-loaded with a list of the
+directory authorities and their public keys, to bootstrap
+each client's view of the network.
+
+When a directory authority receives a signed statement for
+an OR, it does not advertise the node as running until it
+has tested that the node correctly responds to direct and
+anonymous circuit creation attempts. The number of nodes
+that can run on a single IP address is limited, and
+authority operators watch for nodes that appear to be
+configured too similarly or that all run on the same
+subnet. Other than that, the authority subsystem takes no
+action to prevent Sybil attacks~\cite{sybil}. Previous
+designs had declared that authority operators should
+hand-approve each new node, but this approach proved
+ineffective in practice.
+
+To avoid centralizing trust in any single authority, clients
+will not use a consensus document unless it has been signed
+by a threshold (half, rounded up) of the authorities that
+the client recognizes. To prevent rollback attacks, each
+consensus document specifies a range of times in which it is
+valid, and clients do not use a consensus whose validity
+period ended too long ago.
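+
+Read literally, this threshold means that a client
+recognizing $n$ authorities accepts a consensus only if the
+set $S$ of those authorities that signed it satisfies
+\[ |S| \geq \lceil n/2 \rceil; \]
+with the nine authorities of late 2012, that is at least
+five signatures.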
+
+Requiring a consensus view of the network prevents
+individual directory authorities from mounting a variety of
+attacks: if clients trusted a single directory authority, then
+an attacker who
+controlled that server could track clients by providing each client
different information---perhaps by listing only nodes under its
control, or by informing only certain clients about a given
-node. Even an external adversary can exploit differences in
-client knowledge: clients who use a node listed on one directory
-server but not the others are vulnerable.
-
-Thus these directory servers must be synchronized and redundant,
-so that they can agree on a common directory. Clients should
-only trust this directory if it is signed by a threshold of the
-directory servers.
-
-The directory servers in Tor are modeled after those in
-Mixminion~\cite{minion-design}, but our situation is
-easier. First, we make the simplifying assumption that all
-participants agree on the set of directory servers. Second,
-while Mixminion needs to predict node behavior, Tor only needs a
-threshold consensus of the current state of the network. Third,
-we assume that we can fall back to the human administrators to
-discover and resolve problems when a consensus directory cannot
-be reached. Since there are relatively few directory servers
-(currently 3, but we expect as many as 9 as the network scales),
-we can afford operations like broadcast to simplify the
-consensus-building protocol.
-
-To avoid attacks where a router connects to all the directory
-servers but refuses to relay traffic from other routers, the
-directory servers must also build circuits and use them to
-anonymously test router
-reliability~\cite{mix-acc}. Unfortunately, this defense is not
-yet designed or implemented.
-
-Using directory servers is simpler and more flexible than
-flooding. Flooding is expensive, and complicates the analysis
-when we start experimenting with non-clique network
-topologies. Signed directories can be cached by other onion
-routers, so directory servers are not a performance bottleneck
-when we have many users, and do not aid traffic analysis by
-forcing clients to announce their existence to any central
-point.
+node. Even an external adversary could exploit differences in
+client knowledge: clients who use a node listed by one
+authority but not by another are distinguishable, and hence
+vulnerable.
+% Cite epistemic attacks. -NM
+
+The directory authorities use a voting algorithm chosen more
+for simplicity of implementation than for Byzantine fault
+tolerance. During one interval before a vote is to be taken,
+every authority floods the others with a signed vote document
+containing its view of the composition of the network and
+the status of all routers in it. During the next interval,
+each authority asks the other authorities for any votes it
+has not yet received. Then each authority follows a
+well-specified voting algorithm such that, if all
+authorities hold the same set of votes, each will produce
+the same consensus as output. Finally, each authority signs
+this consensus document and collects signatures from every
+other authority that computed the same consensus.
+
+This voting system is not robust to ill-timed authority
+failures, ill-behaved authorities giving their peers
+different votes, authorities who disagree about the
+composition of the set of authorities, and similar
+issues. In practice, we handle accidental failures in
+directory authority operation by setting consensus validity
+intervals so that an occasional day or two of missing
+consensus votes does not hurt the network, and by keeping in
+touch with the authority operators, who try to keep the
+number of running authorities well above the threshold. We
+have not yet needed to deal with a hostile or compromised
+authority: our design limits the damage such an authority
+could do to casting a maliciously crafted vote or
+preventing the vote from occurring. In the event of such a
+denial of service from a hostile authority, it would
+suffice to detect the authority's malfeasance and remove it
+from the authority set.
+
+Authorities' long-term private keys are kept offline. Rather
+than signing documents with them directly, authorities use
+them to sign certificates for shorter-term ``signing
+keys,'' which they keep online and use for signing documents.
+
+%To avoid attacks where a router connects to all the directory
+%servers but refuses to relay traffic from other routers, the
+%directory servers must also build circuits and use them to
+%anonymously test router
+%reliability~\cite{mix-acc}. Unfortunately, this defense is not
+%yet designed or implemented.
+
+To avoid excessive load on the directory authorities,
+clients do not contact them directly except when
+bootstrapping. Instead, most Tor servers act as ``directory
+caches'' that periodically fetch network consensus
+documents; once clients know which servers are caches, they
+fetch directory documents from a cache instead.
\section{Attacks and Defenses}
\label{sec:attacks}