[tor-commits] [tor-design-2012/master] Various updates to the design paper

sjm217 at torproject.org sjm217 at torproject.org
Tue Nov 6 18:37:02 UTC 2012


commit f4b9a440ccb01490f0596c0857506d89378a9ea4
Author: Steven Murdoch <Steven.Murdoch at cl.cam.ac.uk>
Date:   Tue Nov 6 18:35:16 2012 +0000

    Various updates to the design paper
    
    - Stream isolation
    - Mention about Torbutton
    - Node hibernation
    - EWMA
    - DHT and authorization for hidden services
    - Path length limitations
    - Abuse management (DNSBL and ExoneraTor)
---
 tor-design-2012.tex |  173 +++++++++++++++++++++++++++++++++++----------------
 1 files changed, 120 insertions(+), 53 deletions(-)

diff --git a/tor-design-2012.tex b/tor-design-2012.tex
index 66b6ad4..e00d963 100644
--- a/tor-design-2012.tex
+++ b/tor-design-2012.tex
@@ -756,12 +756,21 @@ open many TCP streams.
 In Tor, each circuit can be shared by many TCP streams.  To
 avoid delays, users construct circuits preemptively.
 % Clarify: OPs construct circuits preemptively, not users. -NM
-To limit
-linkability among their streams, users' OPs build a new circuit
-periodically if the previous ones have been used, and expire old
-used circuits that no longer have any open streams.
+To limit linkability among their streams, the user's OP will not
+assign a new stream to a circuit if the circuit has previously
+carried a stream which the user has indicated should be separate
+from the new one.  By default, a user signals that two streams
+should not be linkable by making SOCKS connections to different
+ports, from different IP addresses, or with different SOCKS
+authentication credentials.  Even when a stream would otherwise
+be permitted to be carried by a circuit, if the circuit's first
+stream was created more than 10 minutes (by default) ago, that
+circuit will not be considered for re-use and will be closed once
+it has no remaining streams; the OP will then build a new circuit
+preemptively.
 % Also mention that there are mechanisms that applications can use
 % to signal that streams shouldn't be sent over the same circuit. -NM
+% Believed done -SJM
 OPs
 consider rotating to a new circuit once a minute: thus even
 heavy users spend negligible time building circuits, but a
@@ -939,24 +948,40 @@ resolve it into an IP address first and then pass the IP address
 to the Tor client. If the application does DNS resolution first,
 Alice thereby reveals her destination to the remote DNS server,
 rather than sending the hostname through the Tor network to be
-resolved at the far end. Common applications like Mozilla and
-SSH have this flaw.
+resolved at the far end. Common applications like Firefox and
+SSH need to be configured to use SOCKS4A or SOCKS5 (with the
+option to send hostnames rather than IP addresses) to avoid this
+vulnerability.
 % No longer true wrt mozilla and SSH. -NM
+% Believed done -SJM
 
-With Mozilla, the flaw is easy to address: the filtering HTTP
-proxy called Privoxy gives a hostname to the Tor client, so
-Alice's computer never does DNS resolution.  But a portable
-general solution, such as is needed for SSH, is an open
-problem. Modifying or replacing the local nameserver can be
-invasive, brittle, and unportable. Forcing the resolver library
-to prefer TCP rather than UDP is hard, and also has portability
-problems. Dynamically intercepting system calls to the resolver
-library seems a promising direction. We could also provide a
-tool similar to \emph{dig} to perform a private lookup through
-the Tor network. Currently, we encourage the use of
-privacy-aware proxies like Privoxy wherever possible.
+With Firefox, the Torbutton add-on ensures that the browser
+sends requests via Tor by configuring Firefox to correctly use a
+SOCKS proxy. However, this is not in itself sufficient to
+provide private web browsing, because the browser provides many
+ways for a malicious site to link separate accesses as coming
+from the same user. Therefore Torbutton and patches
+applied to the version of Firefox delivered with Tor also
+restrict tracking capabilities -- both intended ones (such as
+cookies, and more modern variants like DOM storage) and
+unintended channels (such as TLS session IDs), normalize
+browser characteristics accessible from JavaScript (such as
+screen size and system colors), and block plugins which may
+leak identifying information. Previously the Privoxy filtering
+proxy was used for this purpose, but its major weakness is that
+it is unable to protect the user from being tracked over HTTPS
+because a proxy sees only encrypted content.
+
+Other applications that can be configured to use SOCKS (and
+to send the proxy a hostname rather than an IP address) may be
+connected directly to Tor. Other options are to
+intercept calls to the resolver and sockets libraries with
+\emph{torsocks} or to use a firewall (Tor supports the built-in
+firewalls of BSD and Linux) to intercept outgoing DNS requests
+and TCP connections, sending them via Tor.
 % We don't recommend privoxy any more.  Proxies are one solution though.  But
 % for web, we think you need a custom browser. -NM
+% Believed done -SJM
 
 Closing a Tor stream is analogous to closing a TCP stream: it
 uses a two-step handshake for normal operation, or a one-step
@@ -1042,6 +1067,11 @@ their bandwidth usage. To accommodate them, Tor servers use a
 token bucket approach~\cite{tannenbaum96} to
 enforce a long-term average rate of incoming bytes, while still
 permitting short-term bursts above the allowed bandwidth.
+To accommodate volunteers who are charged when their daily,
+weekly, or monthly bandwidth usage exceeds a limit, Tor servers
+can be configured to ``hibernate'', closing existing connections
+and refusing new ones, once the limit has been reached. The
+server will become usable again at the start of the next period.
 
 Because the Tor protocol outputs about the same number of bytes
 as it takes in, it is sufficient in practice to limit only
@@ -1052,19 +1082,20 @@ because the local application may be awaiting a reply.)
 Therefore, we treat this case as if the entire cell size had
 been read, regardless of the cell's fullness.
 
-Further, inspired by Rennhard et al's design in~\cite{anonnet},
-a circuit's edges can heuristically distinguish interactive
-streams from bulk streams by comparing the frequency with which
-they supply cells.  We can provide good latency for interactive
-streams by giving them preferential service, while still giving
-good overall throughput to the bulk streams. Such preferential
-treatment presents a possible end-to-end attack, but an
-adversary observing both ends of the stream can already learn
-this information through timing attacks.
-% I don't think we do anything like what we had in mind when we wrote the
-% above paragraph. -NM
+To provide good latency for interactive service, Tor chooses
+which cells to deliver favouring circuits that have been quiet
+recently. Specifically, when Tor is about to put a cell on an
+outgoing connection it chooses the circuit with the lowest
+exponentially-decaying count of cells sent so far. Currently
+each cell's contribution has a 30-second half-life.  Such
+preferential treatment presents a possible end-to-end attack,
+but an adversary observing both ends of the stream can already
+learn this information through timing attacks.
+% I don't think we do anything like what we had in mind when we
+% wrote the above paragraph. -NM
 
 % We should mention EWMA in this section. -NM
+% Believed done -SJM
 
 \subsection{Congestion control}
 \label{subsec:congestion}
@@ -1153,15 +1184,21 @@ not require them to modify their applications.
 
 We provide location-hiding for Bob by allowing him to advertise
 several onion routers (his \emph{introduction points}) as
-contact points. He may do this on any robust efficient key-value
-lookup system with authenticated updates, such as a distributed
-hash table (DHT) like CFS~\cite{cfs:sosp01}.\footnote{ Rather
-  than rely on an external infrastructure, the Onion Routing
-  network can run the lookup service itself.  Our current
-  implementation provides a simple lookup system on the
-  directory servers.}
+contact points, in a distributed hash table (DHT). He does this
+by publishing the hidden service descriptor (containing the
+introduction points' addresses) to the ORs whose identity keys
+are closest to a hash of the location-hidden service's identity
+key, the current date, and a replica number. Optionally, the
+hidden service descriptor can be encrypted under a key shared
+with authorized users of the hidden service. Therefore not only
+can unauthorized users not connect to the hidden service or its
+introduction points (the descriptor contains an authentication
+credential), they also cannot discover whether the hidden
+service is online.
 % We eventually went and built a distributed directory in Tor to deal with
 % this.  -NM
+% Believed done -SJM
+
 Alice, the client, chooses an OR as her
 \emph{rendezvous point}. She connects to one of Bob's
 introduction points, informs him of her rendezvous point, and
@@ -1175,6 +1212,7 @@ also allows Bob to respond to some requests and ignore others.
 
 % Mention that the list of introduction points can now be encrypted. -NM
 % Mention user authentication? -NM
+% Believed done -SJM
 
 \subsection{Rendezvous points in Tor}
 
@@ -1344,6 +1382,31 @@ other users when they build new circuits.
 
 % What about link-to-link rate limiting?
 
+The Tor network itself can be exploited as a DoS amplifier,
+because for every relay cell sent by an OP, a cell is
+generated at each hop on the circuit. An adversary could create
+a long path, potentially going through the same node many times,
+and overload CPU or network resources with only a small
+investment of both. To resist this attack, the length of a path
+is limited to 8, enforced by the distinction between
+\emph{relay} and \emph{relay\_early} cells. Incoming
+\emph{relay\_early} cells may contain any type of relay cell but,
+if they are not destined for the OR which receives them, result
+in a further \emph{relay\_early} cell being generated.
+Only 8 \emph{relay\_early} cells are permitted to be sent on a
+circuit.  Similarly, \emph{relay} cells result in a \emph{relay}
+cell being created, and may be sent without limit, but
+\emph{relay} cells cannot contain an extend request. In this
+way, intermediate ORs cannot know how long the path is
+(they always see up to 8 \emph{relay\_early} cells, and don't
+know what they contain) but an OP cannot send more than 8 extend
+requests and so cannot generate a path longer than 8 hops.
+This does not, however, prevent an adversary from tunneling Tor
+over Tor, connecting from an exit node back to the Tor network.
+
+% Mention long-path defense -NM
+% Believed done -SJM
+
 Adversaries can also attack the Tor network's hosts and network
 links. Disrupting a single circuit or link breaks all streams
 passing along that part of the circuit. Users similarly lose
@@ -1357,8 +1420,6 @@ require more buffering at the network edges, however, and the
 performance and anonymity implications from this extra
 complexity still require investigation.
 
-% Mention long-path defense -NM
-
 % Mention asymmetries in protocols, where a little effort from an attacker
 % can make Tor do more calculation.  That's bad. -NM
 
@@ -1412,22 +1473,28 @@ limited set of services, such as HTTP, SSH, or AIM.  This is not
 a complete solution, of course, since abuse opportunities for
 these protocols are still well known.
 
-We have not yet encountered any abuse in the deployed network,
-but if we do we should consider using proxies to clean traffic
-for certain protocols as it leaves the network.  For example,
-much abusive HTTP behavior (such as exploiting buffer overflows
-or well-known script vulnerabilities) can be detected in a
-straightforward manner.  Similarly, one could run automatic spam
-filtering software (such as SpamAssassin) on email exiting the
-OR network.
+%We have not yet encountered any abuse in the deployed network,
+%but if we do we should consider using proxies to clean traffic
+%for certain protocols as it leaves the network.  For example,
+%much abusive HTTP behavior (such as exploiting buffer overflows
+%or well-known script vulnerabilities) can be detected in a
+%straightforward manner.  Similarly, one could run automatic spam
+%filtering software (such as SpamAssassin) on email exiting the
+%OR network.
 % Above paragraph is no longer true at all even a little -NM
-
-ORs may also rewrite exiting traffic to append headers or other
-information indicating that the traffic has passed through an
-anonymity service.  This approach is commonly used by email-only
-anonymity systems.  ORs can also run on servers with hostnames
-like {\tt anonymous} to further alert abuse targets to the
-nature of the anonymous traffic.
+% Removed for now -SJM
+
+To manage the abuse potential, The Tor Project operates a DNS
+blacklist system, allowing service operators to easily identify
+whether a particular incoming connection may have arrived over
+the Tor network. The service may then choose to block the
+connection, subject it to extra scrutiny, or restrict the rights
+of the user who is connecting via Tor (such as giving read-only,
+rather than read-write, access to a wiki). A similar service
+supports retrospective queries over the list of exit nodes, so
+that exit-node operators can show that their computer was an
+exit node at the time an abusive connection was made, and
+therefore that they should not be liable for any harm caused.
 
 A mixture of open and restricted exit nodes allows the most
 flexibility for volunteers running servers. But while having


