commit f69101ea9ea33dbff0ede8185b9ef700c074b6d7
Author: Nick Mathewson <nickm@torproject.org>
Date:   Fri Nov 9 19:21:44 2012 -0500
More revisions from the TODO:
  - revise abstract
  - v3 directory system
  - v3 link protocol
  - more on isolation
  - create_fast.
---
 todo                |  25 ++++
 tor-design-2012.tex | 323 +++++++++++++++++++++++++++++++++++----------------
 2 files changed, 250 insertions(+), 98 deletions(-)
diff --git a/todo b/todo
index c807072..6ddbcb3 100644
--- a/todo
+++ b/todo
@@ -1,18 +1,43 @@ Tentative breakdown. Feel free to take on something here that isn't done yet!
+LEGEND:
+ - Not done
+ o Done
+ . Partially done
+
+ITEMS:
+
* Integrate the content from the first blog post [nick] **
+ o Node discovery and the directory protocol
+ o Security improvements to hidden services
+ o DHT
+ - Improved authorization model for hidden services
+ o Faster first-hop circuit establishment with CREATE_FAST
+ o Cell queueing and scheduling.
* Integrate content from the second blog post [steven]
+ - guard nodes
+ - Bridges, censorship resistance, and pluggable transports
+ - Changes and complexities in our path selection algorithms
+ o stream isolation
* Integrate content from the third blog post [steven]
+ o link protocol tls
+ - rise and fall of .exit
+ . controller protocol
+ o torbutton
+ o tor browser bundle
* Revise the abstract and introduction [nick]
+ o Abstract
+ - Introduction
* Revise related work [steven]
* Revise design goals and assumptions [steven]
* Revise tor-design up to "opening and closing streams" [nick] **
* Revise tor-design "opening and closing streams" onward [steven]
* Revise hidden services section [nick]
+ . somewhat done? DHT and autho
* Revise "other design decisions" [nick] * Revise "attacks and defenses" [steven] diff --git a/tor-design-2012.tex b/tor-design-2012.tex index e00d963..e7a662b 100644 --- a/tor-design-2012.tex +++ b/tor-design-2012.tex @@ -74,19 +74,23 @@ Paul Syverson \ Naval Research Lab \ syverson@itd.nrl.navy.mil}
\begin{abstract}
We present Tor, a circuit-based low-latency anonymous
-communication service. This second-generation Onion Routing
-system addresses limitations in the original design by adding
+communication service. This Onion Routing
+system addresses limitations in the earlier design by adding
perfect forward secrecy, congestion control, directory servers,
-integrity checking, configurable exit policies, and a practical
+integrity checking, configurable exit policies,
+anticensorship features, guard nodes, application- and
+user-selectable stream isolation, and a practical
design for location-hidden services via rendezvous points. Tor
-works on the real-world Internet, requires no special privileges
+is deployed on the real-world Internet, requires no special privileges
or kernel modifications, requires little synchronization or coordination between nodes, and provides a reasonable tradeoff
-between anonymity, usability, and efficiency. We briefly
-describe our experiences with an international network of more
-than 30 nodes. We close with a list of open problems in
+between anonymity, usability, and efficiency.
+An earlier paper in 2004 described Tor's original design;
+here we explain Tor's current design as of late 2012, and
+describe our experiences with an international network of
+approximately 3000 nodes and XXXXX %?????
+users. We close with a list of open problems in
anonymous communication.
-% TODO: Abstract needs rewrite when we're done. -NM
\end{abstract}
%\begin{center}
@@ -202,19 +206,16 @@ until the congestion subsides.
% We've been working on this some; we have found that our current approach
% doesn't work so well. -NM
-\textbf{Directory authorities:} The earlier Onion Routing design
-planned to flood state information through the network---an
-approach that can be unreliable and complex. Tor takes a
-simplified view toward distributing this information. Certain
-more trusted nodes act as \emph{directory authorities}: they
-provide signed directories describing known routers and their
-current state. Users periodically download them directly from
-the authorities or from a mirror, via HTTP tunelled over a Tor
-circuit.
-% The above paragraph is almost right. But the more trusted nodes are called
-% ``authorities'' and we use http-over-tor to fetch stuff. There's a layer
-% of caches too. -NM
-% Believed done - SJM
+\textbf{Directory authorities:} The earlier Onion Routing
+design planned to flood state information through the
+network---an approach that can be unreliable and complex.
+Tor takes a simplified view toward distributing this
+information. Certain more trusted nodes act as
+\emph{directory authorities}: they collaborate to generate
+signed directory documents describing known routers and
+their current state. Users periodically download these
+documents directly from the authorities or a mirror, via
+HTTP tunnelled over a Tor circuit.
\textbf{Variable exit policies:} Tor provides a consistent mechanism for each node to advertise a policy describing the
@@ -635,13 +636,12 @@ Each onion router maintains a long-term identity key and a short-term onion key. The identity key is used to sign TLS certificates, to sign the OR's \emph{router descriptor} (a summary of its keys, address, bandwidth, exit policy, and so
-on), and (by directory servers) to sign directories. The onion
+on). The onion
key is used to decrypt requests from users to set up a circuit and negotiate ephemeral keys. The TLS protocol also establishes a short-term link key when communicating between ORs. Short-term keys are rotated periodically and independently, to limit the impact of key compromise.
-% Directories are not signed with identity keys any longer. -NM
% Clarify the role of the link keys. -NM
% XXXX I hope somewhere in this paper we talk more about the link protocol, so
% we can say more about the v2 and v3 versions of it. -NM
@@ -744,6 +744,67 @@ and commands in more detail below.
\mbox{\epsfig{figure=cell-struct,width=7cm}}
\end{figure}
+\subsection{TLS details}
+Tor's original (version 1) TLS handshake was fairly
+straightforward. The initiator said that it supported a
+sensible set of cryptographic algorithms and parameters
+(ciphersuites, in TLS terminology) and the responder selected
+one. If one side wanted to prove to the other that it was a
+Tor node, it would send a two-element certificate chain
+signed by the key published in the Tor directory.
+
+This approach provided all the security properties envisaged
+at the time the 2004 design paper was written, but Tor's
+increasing use for censorship resistance changed the
+requirements: Tor's protocol signature also had to look (to
+the extent possible) like that of HTTPS web traffic, to
+prevent censors from using deep-packet inspection to detect
+and block Tor. Tor's use of fixed two-certificate chains was
+a giveaway.
+
+After an intermediate design that relied (fragilely
+% Cite stuff about how TLS renegotiation went away for a
+% while once everybody realized it was insecure -NM
+and observably)
+on TLS renegotiation
+% Cite proposal 130.
+, Tor shifted to a mixed authentication
+model, where the TLS handshake can complete with any
+(secure) credentials and ciphersuites desired, and an inner
+handshake, performed inside the resulting TLS channel,
+provides the authentication that Tor actually wants.\footnote{To
+  determine that this newer version of the link protocol handshake
+  is to be used, the initiator avoids using the exact set
+  of ciphersuites used by early Tor versions, and the Tor
+  responder uses an X.509 certificate unlike those generated by
+  earlier versions of Tor.
+% Cite proposal 176 and tor-spec
+  This may be too clever for Tor's
+  own good; we mean to eliminate it once every supported version of
+  Tor supports this link protocol.}
+
+To perform the inner handshake once the TLS handshake is
+done, the parties negotiate a Tor link protocol version by
+exchanging \emph{versions} cells containing the list of link
+protocol versions each supports, then choosing the highest
+version supported by both. Next, the responder sends a
+\emph{certs} cell containing the certificate chain that
+binds the public key it used for the TLS handshake to its
+identity key. The responder also sends a random nonce as a
+challenge. If the initiator also wishes to authenticate
+herself as an OR, she sends a \emph{certs} cell of her own,
+followed by an \emph{authenticate} cell signed by her link
+key, containing: a digest of both identity keys, a digest of
+all messages she has sent and received so far, a digest of
+the responder's TLS link certificate, the current time, a
+random nonce, and a MAC, keyed with the TLS master secret,
+of the TLS handshake's client\_random and server\_random
+parameters.
+
+% Justify the above. -NM
+
+
\subsection{Circuits and streams}
\label{subsec:circuits}
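To make the shape of the inner link handshake just described more concrete, here is a minimal Python sketch of assembling that kind of authentication body. The hash choice, field order, and encodings are illustrative assumptions rather than Tor's actual cell format, and the result would still need to be signed and wrapped in an authenticate cell.

    import hashlib, hmac, os, struct, time

    def build_auth_body(initiator_id_key, responder_id_key, transcript,
                        responder_tls_cert, tls_master_secret,
                        client_random, server_random):
        # Each field below corresponds to one item listed in the text above.
        h = lambda b: hashlib.sha256(b).digest()
        mac = hmac.new(tls_master_secret,              # keyed with the TLS master secret...
                       client_random + server_random,  # ...over the handshake randoms
                       hashlib.sha256).digest()
        return b"".join([
            h(initiator_id_key),                 # digest of initiator's identity key
            h(responder_id_key),                 # digest of responder's identity key
            h(transcript),                       # digest of all cells sent and received so far
            h(responder_tls_cert),               # digest of responder's TLS link certificate
            struct.pack(">Q", int(time.time())), # current time
            os.urandom(32),                      # fresh random nonce
            mac,
        ])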
@@ -754,23 +815,52 @@ design imposed high costs on applications like web browsing that open many TCP streams.
In Tor, each circuit can be shared by many TCP streams. To
-avoid delays, users construct circuits preemptively.
-% Clarify: OPs construct circuits preemptively, not users. -NM
+avoid delays, OPs construct circuits preemptively.
To limit linkability among their streams, the user's OP will not
-assign a new stream to a circuit if the circuit has previously
-carried a stream which the user has indicated should be separate
+assign a new stream to a circuit if the circuit\footnote{
+  Occasionally people suggest that isolating \emph{exits}
+  would be better than isolating circuits, so that two
+  isolated streams would never appear to come from the same
+  IP as one another. A little analysis shows that this
+  approach would hurt anonymity, however: a destination
+  service could observe that two accounts both used Tor, but
+  never arrived from the same exit node IP at the same time, and
+  thereby conclude that those accounts were probably run by
+  the same user.}
+has previously
+carried a stream which the user has indicated should be isolated
from the new one. By default, a user signals that two streams should not be linkable by making SOCKS connections to different ports, from a different IP address, or with different SOCKS
-authentication credentials. Even when a stream would otherwise
+authentication credentials. Tor's SOCKS ports can
+additionally be configured to isolate streams based on
+destination port\footnote{Some designs have suggested
+  port-based isolation as a means for keeping use of separate
+  protocols from becoming linked to each other. This is
+  unworkable, though, if one of the protocols is HTTP or
+  HTTPS, where applications can typically be made to connect
+  to any attacker-selected port.}
+or address. Even when a stream would otherwise
be permitted to be carried by a circuit, if the circuit's first stream was created more than 10 minutes (by default) ago, that circuit will not be considered for re-use; it will be closed once there are no remaining streams, and the OP will then build a new circuit preemptively.
-% Also mention that there are mechanisms that applications can use
-% to signal that streams shouldn't be sent over the same circuit. -NM
-% Believed done -SJM
+
+With careful configuration, this system can be used to avoid
+numerous linking attacks. For example, a user accessing
+multiple pseudonymous chat accounts could configure her chat
+application to use a separate SOCKS username for each one,
+thus telling Tor not to place any of their streams on the
+same circuit (which would reveal the connection between them
+to the exit node and suggest that the accounts were run by
+the same user).
+For applications that don't support SOCKS authentication,
+the user might instead configure multiple SOCKS ports and
+point each application at a different one, so that, for
+example, her anonymous web browsing never shares a circuit
+with her pseudonymous IM usage.
+
OPs consider rotating to a new circuit once a minute: thus even heavy users spend negligible time building circuits, but a
@@ -857,6 +947,15 @@ Dolev-Yao model.
% implementation of the protocol above is a little fraught.
% Maaaybe mention ACE and ntor handshakes as future directions
% here; if not, mention them in future work. -NM
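The stream-isolation and circuit re-use rules described earlier in this hunk can be summarized in a short Python sketch. The class names, fields, and the single "isolation key" notion are illustrative assumptions made for exposition, not Tor's actual data structures.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class IsolationKey:
        # Streams whose keys differ must not share a circuit.
        socks_port: int                       # which SOCKS listener the stream arrived on
        client_addr: str                      # source address of the application
        socks_username: Optional[str] = None  # SOCKS authentication credentials, if any
        socks_password: Optional[str] = None
        dest_port: Optional[int] = None       # set only if the listener isolates by destination port
        dest_addr: Optional[str] = None       # set only if the listener isolates by destination address

    @dataclass
    class Circuit:
        isolation: Optional[IsolationKey] = None  # fixed by the first stream attached
        first_stream_at: Optional[float] = None   # timestamp of that first stream
        MAX_DIRTINESS = 10 * 60                   # default: stop re-using after 10 minutes

        def can_carry(self, key: IsolationKey, now: float) -> bool:
            too_old = (self.first_stream_at is not None
                       and now - self.first_stream_at > self.MAX_DIRTINESS)
            return not too_old and (self.isolation is None or self.isolation == key)

        def attach(self, key: IsolationKey, now: float) -> None:
            # The OP calls this only after can_carry() has returned True.
            if self.isolation is None:
                self.isolation = key
                self.first_stream_at = now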
+
+As an optimization, Alice's client may send a \emph{create\_fast} cell in
+place of her first \emph{create} cell: instead of sending an encrypted $g^x$
+value, she simply sends a random value $x$; Bob replies with a
+\emph{created\_fast} cell containing a random value $y$, and they base their
+shared keys on $H(x|y)$. This handshake saves the expense of RSA and
+Diffie-Hellman, but provides no authentication, integrity, confidentiality, or
+forward secrecy on its own: it relies on the TLS protocol that Alice and Bob
+are already using for their link in order to achieve these properties.
\
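A minimal sketch of this exchange follows, assuming SHA-1 as $H$ and a simple counter-based expansion of the shared seed into key material; the value lengths and expansion scheme are illustrative, not a normative description of Tor's key derivation.

    import hashlib, os

    def create_fast_handshake(key_len=72):
        # Alice sends x in her create_fast cell; Bob answers with y in created_fast.
        x = os.urandom(20)
        y = os.urandom(20)
        # Both sides now derive shared key material from H(x|y), expanding it
        # with a counter until they have enough bytes for the circuit's keys.
        seed = x + y
        material = b""
        counter = 0
        while len(material) < key_len:
            material += hashlib.sha1(seed + bytes([counter])).digest()
            counter += 1
        return x, y, material[:key_len]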
\noindent{\large\bf Relay cells}\
@@ -1091,11 +1190,6 @@ Currently each cell has a 30-second half-life. Such preferential treatment presents a possible end-to-end attack, but an adversary observing both ends of the stream can already learn this information through timing attacks.
-% I don't think we do anything like what we had in mind when we
-% wrote the above paragraph. -NM
-
-% We should mention EWMA in this section. -NM
-% Believed done -SJM
\subsection{Congestion control}
\label{subsec:congestion}
@@ -1195,9 +1289,6 @@ can unauthorized users not connect to the hidden service or its introduction points (the descriptor contains an authentication credential), they also cannot discover whether the hidden service is online.
-% We eventually went and built a distributed directory in Tor to deal with
-% this. -NM
-% Believed done -SJM
Alice, the client, chooses an OR as her \emph{rendezvous point}. She connects to one of Bob's
@@ -1523,8 +1614,6 @@ project~\cite{darkside} give us a glimpse of likely issues.
\subsection{Directory Servers}
\label{subsec:dirservers}
-% This whole section needs a rewrite -NM
-
First-generation Onion Routing designs~\cite{freedom2-arch,or-jsac98} used in-band network status updates: each router flooded a signed statement to its
@@ -1545,65 +1634,103 @@ track changes in network topology and node state, including keys and exit policies. Each such \emph{directory server} acts as an HTTP server, so clients can fetch current network state and router lists, and so other ORs can upload state information.
-Onion routers periodically publish signed statements of their
-state to each directory server. The directory servers combine
-this information with their own views of network liveness, and
-generate a signed description (a \emph{directory}) of the entire
-network state. Client software is pre-loaded with a list of the
-directory servers and their keys, to bootstrap each client's
-view of the network.
-
-When a directory server receives a signed statement for an OR,
-it checks whether the OR's identity key is recognized. Directory
-servers do not advertise unrecognized ORs---if they did, an
-adversary could take over the network by creating many
-servers~\cite{sybil}. Instead, new nodes must be approved by the
-directory server administrator before they are
-included. Mechanisms for automated node approval are an area of
-active research, and are discussed more in
-Section~\ref{sec:maintaining-anonymity}.
-
-Of course, a variety of attacks remain. An adversary who
-controls a directory server can track clients by providing them
+
+A small number of partially trusted directory servers (nine
+as of late 2012) are ``directory authorities.'' Onion
+routers periodically publish signed statements of their
+state to each directory authority. The authorities combine
+this information with their own views of network liveness,
+and periodically collaborate to vote on a description (a
+consensus \emph{directory}) of the entire network state,
+signed by as many of the authorities as possible. Client
+software is pre-loaded with a list of the directory
+authorities and their public keys, to bootstrap each
+client's view of the network.
+
+When a directory authority receives a signed statement for
+an OR, it does not advertise the node as running until it
+has tested that the node correctly responds to direct and
+anonymous circuit-creation attempts. The number of nodes
+that can run at a single IP address is limited, and
+authority administrators try to keep a lookout for nodes
+that appear to be configured too similarly or that are all
+running on the same subnet. Other than that, the authority
+subsystem takes no action to prevent Sybil
+attacks~\cite{sybil}. Previous designs had declared that
+authority operators should hand-approve each new node, but
+this system proved ineffective in practice.
+
+To avoid centralizing trust in any single authority, clients
+will not use a consensus document unless it has been signed
+by a threshold (half, rounded up) of the authorities that
+the client recognizes. To prevent rollback attacks, each
+consensus document has a range of times in which it is
+valid, and clients do not use a consensus that has been
+invalid for too long.
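A client-side acceptance check along these lines might look like the following Python sketch; the data types, the grace period, and the signature-verification interface are assumptions for illustration, not Tor's actual code or document formats.

    import math
    from dataclasses import dataclass
    from datetime import datetime, timedelta
    from typing import Callable, List, Set

    @dataclass
    class Signature:
        authority_id: str
        verify: Callable[[bytes], bool]   # checks this signature over the signed body

    @dataclass
    class Consensus:
        signed_body: bytes
        valid_after: datetime
        valid_until: datetime
        signatures: List[Signature]

    # How long past its validity window a client still tolerates a consensus;
    # the exact grace period here is an illustrative assumption.
    GRACE = timedelta(hours=24)

    def acceptable(consensus: Consensus, recognized: Set[str], now: datetime) -> bool:
        """Accept a consensus only if it is signed by at least half (rounded up)
        of the authorities this client recognizes, and is not too stale."""
        threshold = math.ceil(len(recognized) / 2)
        good = sum(1 for s in consensus.signatures
                   if s.authority_id in recognized and s.verify(consensus.signed_body))
        return (good >= threshold
                and consensus.valid_after <= now <= consensus.valid_until + GRACE)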
+
+Requiring a consensus view of the network prevents
+individual directory authorities from mounting a variety of
+attacks: if clients trusted a single directory authority, then
+an attacker who
+controlled that server could track clients by providing each client
different information---perhaps by listing only nodes under its control, or by informing only certain clients about a given
-node. Even an external adversary can exploit differences in
-client knowledge: clients who use a node listed on one directory
-server but not the others are vulnerable.
-
-Thus these directory servers must be synchronized and redundant,
-so that they can agree on a common directory. Clients should
-only trust this directory if it is signed by a threshold of the
-directory servers.
-
-The directory servers in Tor are modeled after those in
-Mixminion~\cite{minion-design}, but our situation is
-easier. First, we make the simplifying assumption that all
-participants agree on the set of directory servers. Second,
-while Mixminion needs to predict node behavior, Tor only needs a
-threshold consensus of the current state of the network. Third,
-we assume that we can fall back to the human administrators to
-discover and resolve problems when a consensus directory cannot
-be reached. Since there are relatively few directory servers
-(currently 3, but we expect as many as 9 as the network scales),
-we can afford operations like broadcast to simplify the
-consensus-building protocol.
-
-To avoid attacks where a router connects to all the directory
-servers but refuses to relay traffic from other routers, the
-directory servers must also build circuits and use them to
-anonymously test router
-reliability~\cite{mix-acc}. Unfortunately, this defense is not
-yet designed or implemented.
-
-Using directory servers is simpler and more flexible than
-flooding. Flooding is expensive, and complicates the analysis
-when we start experimenting with non-clique network
-topologies. Signed directories can be cached by other onion
-routers, so directory servers are not a performance bottleneck
-when we have many users, and do not aid traffic analysis by
-forcing clients to announce their existence to any central
-point.
+node. Even an external adversary could exploit differences in
+client knowledge: clients who use a node listed by one
+authority but not another are distinguishable, and hence
+vulnerable.
+% Cite epistemic attacks. -NM
+
+The directory authorities use a voting algorithm chosen more
+for simplicity of implementation than for Byzantine fault
+tolerance. At an interval before a vote is to be taken,
+every authority floods the others with a signed vote document
+containing its view of the composition of the network and
+the status of all routers in it. In the next interval, each
+authority asks the other authorities for any votes it has
+not yet received. Then each authority follows a
+well-specified voting algorithm such that, if all have the
+same set of votes, all will produce the same consensus as
+output. Finally, each signs this consensus document and
+collects signatures from every authority that computed the
+same consensus.
+
+This voting system is not robust to ill-timed authority
+failures, ill-behaved authorities giving their peers
+different votes, authorities who disagree about the
+composition of the set of authorities, and similar
+issues.
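The determinism requirement above (same votes in, same consensus out) can be illustrated with a toy version of the tallying step. The majority rules and data layout below are assumptions made for exposition; the real algorithm covers many more fields and parameters.

    from typing import Dict, List, Set

    # A vote maps a router fingerprint to the set of status flags an authority assigns.
    Vote = Dict[str, Set[str]]

    def compute_consensus(votes: List[Vote]) -> Dict[str, Set[str]]:
        """Toy deterministic tally: every authority holding the same votes
        computes exactly the same output."""
        consensus: Dict[str, Set[str]] = {}
        quorum = len(votes) // 2 + 1              # strict majority of the votes present
        all_routers = {fp for vote in votes for fp in vote}
        for fp in sorted(all_routers):            # sorted: identical ordering everywhere
            listings = [vote[fp] for vote in votes if fp in vote]
            if len(listings) < quorum:
                continue                          # too few authorities list this router
            flags = {f for f in set().union(*listings)
                     if sum(f in listing for listing in listings) >= quorum}
            consensus[fp] = flags
        return consensus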
+In practice, we handle accidental failures in
+directory authority operation by setting consensus validity
+intervals so that an occasional day or two of missing
+consensus votes doesn't hurt the network, and by keeping in
+touch with the authority operators, who try to keep the
+number of running authorities well above the threshold. We
+have not yet needed to deal with a hostile or compromised
+authority: our design restricts the damage that such an
+authority could do to casting a maliciously designed vote,
+or preventing the vote from occurring. In the event of such
+a denial of service from a hostile authority, it would be
+sufficient to detect the authority's malfeasance and remove
+it from the authority set.
+
+Authorities' long-term private keys are kept offline. Rather
+than signing documents with them directly, authorities use
+them to sign certificates for shorter-term ``signing keys''
+that they keep online and use to sign directory documents.
+
+%To avoid attacks where a router connects to all the directory
+%servers but refuses to relay traffic from other routers, the
+%directory servers must also build circuits and use them to
+%anonymously test router
+%reliability~\cite{mix-acc}. Unfortunately, this defense is not
+%yet designed or implemented.
+
+To avoid excessive load on the directory authorities,
+clients do not contact them directly except when
+bootstrapping. Instead, most Tor servers act as ``directory
+caches'' and periodically fetch network consensus
+documents; clients contact a cache once they
+know who the caches are.
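The offline-identity-key scheme above implies a two-step signature check, sketched below with abstract verification callables; the type names and fields are illustrative assumptions, not Tor's certificate or consensus formats.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Callable

    # (signature, data) -> ok, for some public key held by the caller.
    Verify = Callable[[bytes, bytes], bool]

    @dataclass
    class KeyCertificate:
        """Binds a short-term signing key to an authority's offline identity key."""
        signing_key_verify: Verify   # verifies signatures made with the online signing key
        cert_signature: bytes        # made with the offline identity key
        cert_body: bytes             # encodes the signing key and its expiry
        expires: datetime

    def check_consensus_signature(identity_verify: Verify, cert: KeyCertificate,
                                  consensus_body: bytes, consensus_sig: bytes,
                                  now: datetime) -> bool:
        """Step 1: the identity key certifies the signing key.
        Step 2: the signing key signs the consensus document."""
        if now > cert.expires:
            return False
        if not identity_verify(cert.cert_signature, cert.cert_body):
            return False
        return cert.signing_key_verify(consensus_sig, consensus_body)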
\section{Attacks and Defenses}
\label{sec:attacks}