[or-cvs] migrate stuff from section 4 to 5 and vice versa

Roger Dingledine arma at seul.org
Tue Feb 8 07:54:30 UTC 2005


Update of /home2/or/cvsroot/tor/doc/design-paper
In directory moria.mit.edu:/home2/arma/work/onion/cvs/tor/doc/design-paper

Modified Files:
	challenges.tex 
Log Message:
migrate stuff from section 4 to 5 and vice versa


Index: challenges.tex
===================================================================
RCS file: /home2/or/cvsroot/tor/doc/design-paper/challenges.tex,v
retrieving revision 1.52
retrieving revision 1.53
diff -u -d -r1.52 -r1.53
--- challenges.tex	8 Feb 2005 07:37:30 -0000	1.52
+++ challenges.tex	8 Feb 2005 07:54:28 -0000	1.53
@@ -423,7 +423,7 @@
 % this para should probably move to the scalability / directory system. -RD
 % Nope. Cut for space, except for small comment added above -PFS
 
-\section{Policy issues}
+\section{Social challenges}
 
 Many of the issues the Tor project needs to address extend beyond
 system design and technology development. In particular, the
@@ -498,7 +498,7 @@
 
 On the other hand, while the number of active concurrent users may not
 matter as much as we'd like, it still helps to have some other users
-who use the network. We investigate this issue in the next section.
+on the network. We investigate this issue next.
 
 \subsection{Reputability and perceived social value}
 Another factor impacting the network's security is its reputability:
@@ -803,8 +803,8 @@
 
 \section{Design choices}
 
-In addition to social issues, Tor also faces some design challenges that must
-be addressed as the network develops.
+In addition to social issues, Tor also faces some design tradeoffs that must
+be investigated as the network develops.
 
 \subsection{Transporting the stream vs transporting the packets}
 \label{subsec:stream-vs-packet}
@@ -915,54 +915,6 @@
 mid-latency as they are constructed, we could handle both types of traffic
 on the same network, giving users a choice between speed and security.
 
-\subsection{Measuring performance and capacity}
-\label{subsec:performance}
-
-One of the paradoxes with engineering an anonymity network is that we'd like
-to learn as much as we can about how traffic flows so we can improve the
-network, but we want to prevent others from learning how traffic flows in
-order to trace users' connections through the network.  Furthermore, many
-mechanisms that help Tor run efficiently
-require measurements about the network.
-
-Currently, nodes try to deduce their own available bandwidth (based on how
-much traffic they have been able to transfer recently) and include this
-information in the descriptors they upload to the directory. Clients
-choose servers weighted by their bandwidth, neglecting really slow
-servers and capping the influence of really fast ones.
-
-This is, of course, eminently cheatable.  A malicious node can get a
-disproportionate amount of traffic simply by claiming to have more bandwidth
-than it does.  But better mechanisms have their problems.  If bandwidth data
-is to be measured rather than self-reported, it is usually possible for
-nodes to selectively provide better service for the measuring party, or
-sabotage the measured value of other nodes.  Complex solutions for
-mix networks have been proposed, but do not address the issues
-completely~\cite{mix-acc,casc-rep}.
-
-Even with no cheating, network measurement is complex.  It is common
-for views of a node's latency and/or bandwidth to vary wildly between
-observers.  Further, it is unclear whether total bandwidth is really
-the right measure; perhaps clients should instead be considering nodes
-based on unused bandwidth or observed throughput.
-% XXXX say more here?
-
-%How to measure performance without letting people selectively deny service
-%by distinguishing pings. Heck, just how to measure performance at all. In
-%practice people have funny firewalls that don't match up to their exit
-%policies and Tor doesn't deal.
-
-%Network investigation: Is all this bandwidth publishing thing a good idea?
-%How can we collect stats better? Note weasel's smokeping, at
-%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
-%which probably gives george and steven enough info to break tor?
-
-Even if we can collect and use this network information effectively, we need
-to make sure that it is not more useful to attackers than to us.  While it
-seems plausible that bandwidth data alone is not enough to reveal
-sender-recipient connections under most circumstances, it could certainly
-reveal the path taken by large traffic flows under low-usage circumstances.
-
 \subsection{Running a Tor node, path length, and helper nodes}
 \label{subsec:helper-nodes}
 
@@ -1111,79 +1063,119 @@
 a way for their users, using unmodified software, to get end-to-end
 encryption and end-to-end authentication to their website.
 
-\subsection{Trust and discovery}
-\label{subsec:trust-and-discovery}
+\subsection{Location diversity and ISP-class adversaries}
+\label{subsec:routing-zones}
 
-The published Tor design adopted a deliberately simplistic design for
-authorizing new nodes and informing clients about Tor nodes and their status.
-In the early Tor designs, all nodes periodically uploaded a signed description
-of their locations, keys, and capabilities to each of several well-known {\it
-  directory servers}.  These directory servers constructed a signed summary
-of all known Tor nodes (a ``directory''), and a signed statement of which
-nodes they
-believed to be operational at any given time (a ``network status'').  Clients
-periodically downloaded a directory in order to learn the latest nodes and
-keys, and more frequently downloaded a network status to learn which nodes are
-likely to be running.  Tor nodes also operate as directory caches, in order to
-lighten the bandwidth on the authoritative directory servers.
+Anonymity networks have long relied on diversity of node location for
+protection against attacks---typically an adversary who can observe a
+larger fraction of the network can launch a more effective attack. One
+way to achieve dispersal involves growing the network so a given adversary
+sees less. Alternately, we can arrange the topology so traffic can enter
+or exit at many places (for example, by using a free-route network
+like Tor rather than a cascade network like JAP). Lastly, we can use
+distributed trust to spread each transaction over multiple jurisdictions.
+But how do we decide whether two nodes are in related locations?
 
-In order to prevent Sybil attacks (wherein an adversary signs up many
-purportedly independent nodes in order to increase her chances of observing
-a stream as it enters and leaves the network), the early Tor directory design
-required the operators of the authoritative directory servers to manually
-approve new nodes.  Unapproved nodes were included in the directory,
-but clients
-did not use them at the start or end of their circuits.  In practice,
-directory administrators performed little actual verification, and tended to
-approve any Tor node whose operator could compose a coherent email.
-This procedure
-may have prevented trivial automated Sybil attacks, but would do little
-against a clever attacker.
+Feamster and Dingledine defined a \emph{location diversity} metric
+in \cite{feamster:wpes2004}, and began investigating a variant of location
+diversity based on the fact that the Internet is divided into thousands of
+independently operated networks called {\em autonomous systems} (ASes).
+The key insight from their paper is that while we typically think of a
+connection as going directly from the Tor client to her first Tor node,
+actually it traverses many different ASes on each hop. An adversary at
+any of these ASes can monitor or influence traffic. Specifically, given
+plausible initiators and recipients and random path selection,
+some ASes in the simulation were able to observe 10\% to 30\% of the
+transactions (that is, learn both the origin and the destination) on
+the deployed Tor network (33 nodes as of June 2004).
 
-There are a number of flaws in this system that need to be addressed as we
-move forward.  They include:
-\begin{tightlist}
-\item Each directory server represents an independent point of failure; if
-  any one were compromised, it could immediately compromise all of its users
-  by recommending only compromised nodes.
-\item The more nodes join the network, the more unreasonable it
-  becomes to expect clients to know about them all.  Directories
-  become infeasibly large, and downloading the list of nodes becomes
-  burdensome.
-\item The validation scheme may do as much harm as it does good.  It is not
-  only incapable of preventing clever attackers from mounting Sybil attacks,
-  but may deter node operators from joining the network.  (For instance, if
-  they expect the validation process to be difficult, or if they do not share
-  any languages in common with the directory server operators.)
-\end{tightlist}
+The paper concludes that for best protection against the AS-level
+adversary, nodes should be in ASes that have the most links to other ASes:
+Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction
+is safest when it starts or ends in a Tier-1 ISP. Therefore, assuming
+initiator and responder are both in the U.S., it actually \emph{hurts}
+our location diversity to add far-flung nodes in continents like Asia
+or South America.
 
-We could try to move the system in several directions, depending on our
-choice of threat model and requirements.  If we did not need to increase
-network capacity in order to support more users, we could simply
- adopt even stricter validation requirements, and reduce the number of
-nodes in the network to a trusted minimum.  
-But, we can only do that if can simultaneously make node capacity
-scale much more than we anticipate feasible soon, and if we can find
-entities willing to run such nodes, an equally daunting prospect.
+Many open questions remain. First, it will be an immense engineering
+challenge to get an entire BGP routing table to each Tor client, or to
+summarize it sufficiently. Without a local copy, clients won't be
+able to safely predict what ASes will be traversed on the various paths
+through the Tor network to the final destination. Tarzan~\cite{tarzan:ccs02}
+and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to
+determine location diversity; but the above paper showed that in practice
+many of the Mixmaster nodes that share a single AS have entirely different
+IP prefixes. When the network has scaled to thousands of nodes, does IP
+prefix comparison become a more useful approximation?
+%
+Second, we can take advantage of caching certain content at the
+exit nodes, to limit the number of requests that need to leave the
+network at all. What about taking advantage of caches like Akamai or
+Google~\cite{shsm03}? (Note that they're also well-positioned as global
+adversaries.)
+%
+Third, if we follow the paper's recommendations and tailor path selection
+to avoid choosing endpoints in similar locations, how much are we hurting
+anonymity against larger real-world adversaries who can take advantage
+of knowing our algorithm?
+%
+Lastly, can we use this knowledge to figure out which gaps in our network
+would most improve our robustness to this class of attack, and go recruit
+new nodes with those ASes in mind?
 
+%Tor's security relies in large part on the dispersal properties of its
+%network. We need to be more aware of the anonymity properties of various
+%approaches so we can make better design decisions in the future.
 
-In order to address the first two issues, it seems wise to move to a system
-including a number of semi-trusted directory servers, no one of which can
-compromise a user on its own.  Ultimately, of course, we cannot escape the
-problem of a first introducer: since most users will run Tor in whatever
-configuration the software ships with, the Tor distribution itself will
-remain a potential single point of failure so long as it includes the seed
-keys for directory servers, a list of directory servers, or any other means
-to learn which nodes are on the network.  But omitting this information
-from the Tor distribution would only delegate the trust problem to the
-individual users, most of whom are presumably less informed about how to make
-trust decisions than the Tor developers.
+\subsection{The China problem}
+\label{subsec:china}
 
-%Network discovery, sybil, node admission, scaling. It seems that the code
-%will ship with something and that's our trust root. We could try to get
-%people to build a web of trust, but no. Where we go from here depends
-%on what threats we have in mind. Really decentralized if your threat is
-%RIAA; less so if threat is to application data or individuals or...
+Citizens in a variety of countries, such as most recently China and
+Iran, are periodically blocked from accessing various sites outside
+their country. These users try to find any tools available to allow
+them to get around these firewalls. Some anonymity networks, such as
+Six-Four~\cite{six-four}, are designed specifically with this goal in
+mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors
+such as Voice of America to set up a network to encourage Internet
+freedom. Even though Tor wasn't
+designed with ubiquitous access to the network in mind, thousands of
+users across the world are trying to use it for exactly this purpose.
+% Academic and NGO organizations, peacefire, \cite{berkman}, etc
+
+Anti-censorship networks hoping to bridge country-level blocks face
+a variety of challenges. One of these is that they need to find enough
+exit nodes---servers on the `free' side that are willing to relay
+arbitrary traffic from users to their final destinations. Anonymizing
+networks including Tor are well-suited to this task, since we have
+already gathered a set of exit nodes that are willing to tolerate some
+political heat.
+
+The other main challenge is to distribute a list of reachable relays
+to the users inside the country, and give them software to use them,
+without letting the authorities also enumerate this list and block each
+relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
+addresses (or having them donated), abandoning old addresses as they are
+`used up', and telling a few users about the new ones. Distributed
+anonymizing networks again have an advantage here, in that we already
+have tens of thousands of separate IP addresses whose users might
+volunteer to provide this service since they've already installed and use
+the software for their own privacy~\cite{koepsell:wpes2004}. Because
+the Tor protocol separates routing from network discovery \cite{tor-design},
+volunteers could configure their Tor clients
+to generate node descriptors and send them to a special directory
+server that gives them out to dissidents who need to get around blocks.
+
+Of course, this still doesn't prevent the adversary
+from enumerating all the volunteer relays and blocking them preemptively.
+Perhaps a tiered-trust system could be built where a few individuals are
+given relays' locations, and they recommend other individuals by telling them
+those addresses, thus providing a built-in incentive to avoid letting the
+adversary intercept them. Max-flow trust algorithms~\cite{advogato}
+might help to bound the number of IP addresses leaked to the adversary. Groups
+like the W3C are looking into using Tor as a component in an overall system to
+help address censorship; we wish them luck.
+
+%\cite{infranet}
 
 \section{Scaling}
 \label{sec:scaling}
@@ -1282,119 +1274,127 @@
 %efficiency over baseline, and also to determine how far we are from
 %optimal efficiency (what we could get if we ignored the anonymity goals).
 
-\subsection{Location diversity and ISP-class adversaries}
-\label{subsec:routing-zones}
+\subsection{Trust and discovery}
+\label{subsec:trust-and-discovery}
 
-Anonymity networks have long relied on diversity of node location for
-protection against attacks---typically an adversary who can observe a
-larger fraction of the network can launch a more effective attack. One
-way to achieve dispersal involves growing the network so a given adversary
-sees less. Alternately, we can arrange the topology so traffic can enter
-or exit at many places (for example, by using a free-route network
-like Tor rather than a cascade network like JAP). Lastly, we can use
-distributed trust to spread each transaction over multiple jurisdictions.
-But how do we decide whether two nodes are in related locations?
+The published Tor design adopted a deliberately simplistic design for
+authorizing new nodes and informing clients about Tor nodes and their status.
+In the early Tor designs, all nodes periodically uploaded a signed description
+of their locations, keys, and capabilities to each of several well-known {\it
+  directory servers}.  These directory servers constructed a signed summary
+of all known Tor nodes (a ``directory''), and a signed statement of which
+nodes they
+believed to be operational at any given time (a ``network status'').  Clients
+periodically downloaded a directory in order to learn the latest nodes and
+keys, and more frequently downloaded a network status to learn which nodes are
+likely to be running.  Tor nodes also operate as directory caches, in order to
+lighten the bandwidth on the authoritative directory servers.
 
-Feamster and Dingledine defined a \emph{location diversity} metric
-in \cite{feamster:wpes2004}, and began investigating a variant of location
-diversity based on the fact that the Internet is divided into thousands of
-independently operated networks called {\em autonomous systems} (ASes).
-The key insight from their paper is that while we typically think of a
-connection as going directly from the Tor client to her first Tor node,
-actually it traverses many different ASes on each hop. An adversary at
-any of these ASes can monitor or influence traffic. Specifically, given
-plausible initiators and recipients and path random path selection,
-some ASes in the simulation were able to observe 10\% to 30\% of the
-transactions (that is, learn both the origin and the destination) on
-the deployed Tor network (33 nodes as of June 2004).
+In order to prevent Sybil attacks (wherein an adversary signs up many
+purportedly independent nodes in order to increase her chances of observing
+a stream as it enters and leaves the network), the early Tor directory design
+required the operators of the authoritative directory servers to manually
+approve new nodes.  Unapproved nodes were included in the directory,
+but clients
+did not use them at the start or end of their circuits.  In practice,
+directory administrators performed little actual verification, and tended to
+approve any Tor node whose operator could compose a coherent email.
+This procedure
+may have prevented trivial automated Sybil attacks, but would do little
+against a clever attacker.
 
-The paper concludes that for best protection against the AS-level
-adversary, nodes should be in ASes that have the most links to other ASes:
-Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction
-is safest when it starts or ends in a Tier-1 ISP. Therefore, assuming
-initiator and responder are both in the U.S., it actually \emph{hurts}
-our location diversity to add far-flung nodes in continents like Asia
-or South America.
+There are a number of flaws in this system that need to be addressed as we
+move forward.  They include:
+\begin{tightlist}
+\item Each directory server represents an independent point of failure; if
+  any one were compromised, it could immediately compromise all of its users
+  by recommending only compromised nodes.
+\item The more nodes join the network, the more unreasonable it
+  becomes to expect clients to know about them all.  Directories
+  become infeasibly large, and downloading the list of nodes becomes
+  burdensome.
+\item The validation scheme may do as much harm as it does good.  It is not
+  only incapable of preventing clever attackers from mounting Sybil attacks,
+  but may deter node operators from joining the network.  (For instance, if
+  they expect the validation process to be difficult, or if they do not share
+  any languages in common with the directory server operators.)
+\end{tightlist}
 
-Many open questions remain. First, it will be an immense engineering
-challenge to get an entire BGP routing table to each Tor client, or to
-summarize it sufficiently. Without a local copy, clients won't be
-able to safely predict what ASes will be traversed on the various paths
-through the Tor network to the final destination. Tarzan~\cite{tarzan:ccs02}
-and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to
-determine location diversity; but the above paper showed that in practice
-many of the Mixmaster nodes that share a single AS have entirely different
-IP prefixes. When the network has scaled to thousands of nodes, does IP
-prefix comparison become a more useful approximation?
-%
-Second, we can take advantage of caching certain content at the
-exit nodes, to limit the number of requests that need to leave the
-network at all. What about taking advantage of caches like Akamai or
-Google~\cite{shsm03}? (Note that they're also well-positioned as global
-adversaries.)
-%
-Third, if we follow the paper's recommendations and tailor path selection
-to avoid choosing endpoints in similar locations, how much are we hurting
-anonymity against larger real-world adversaries who can take advantage
-of knowing our algorithm?
-%
-Lastly, can we use this knowledge to figure out which gaps in our network
-would most improve our robustness to this class of attack, and go recruit
-new nodes with those ASes in mind?
+We could try to move the system in several directions, depending on our
+choice of threat model and requirements.  If we did not need to increase
+network capacity in order to support more users, we could simply
+ adopt even stricter validation requirements, and reduce the number of
+nodes in the network to a trusted minimum.  
+But we can only do that if we can simultaneously make node capacity
+scale much more than we anticipate feasible soon, and if we can find
+entities willing to run such nodes, an equally daunting prospect.
 
-%Tor's security relies in large part on the dispersal properties of its
-%network. We need to be more aware of the anonymity properties of various
-%approaches so we can make better design decisions in the future.
 
-\subsection{The China problem}
-\label{subsec:china}
+In order to address the first two issues, it seems wise to move to a system
+including a number of semi-trusted directory servers, no one of which can
+compromise a user on its own.  Ultimately, of course, we cannot escape the
+problem of a first introducer: since most users will run Tor in whatever
+configuration the software ships with, the Tor distribution itself will
+remain a potential single point of failure so long as it includes the seed
+keys for directory servers, a list of directory servers, or any other means
+to learn which nodes are on the network.  But omitting this information
+from the Tor distribution would only delegate the trust problem to the
+individual users, most of whom are presumably less informed about how to make
+trust decisions than the Tor developers.
 
-Citizens in a variety of countries, such as most recently China and
-Iran, are periodically blocked from accessing various sites outside
-their country. These users try to find any tools available to allow
-them to get-around these firewalls. Some anonymity networks, such as
-Six-Four~\cite{six-four}, are designed specifically with this goal in
-mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors
-such as Voice of America to set up a network to encourage Internet
-freedom. Even though Tor wasn't
-designed with ubiquitous access to the network in mind, thousands of
-users across the world are trying to use it for exactly this purpose.
-% Academic and NGO organizations, peacefire, \cite{berkman}, etc
+%Network discovery, sybil, node admission, scaling. It seems that the code
+%will ship with something and that's our trust root. We could try to get
+%people to build a web of trust, but no. Where we go from here depends
+%on what threats we have in mind. Really decentralized if your threat is
+%RIAA; less so if threat is to application data or individuals or...
 
-Anti-censorship networks hoping to bridge country-level blocks face
-a variety of challenges. One of these is that they need to find enough
-exit nodes---servers on the `free' side that are willing to relay
-arbitrary traffic from users to their final destinations. Anonymizing
-networks including Tor are well-suited to this task, since we have
-already gathered a set of exit nodes that are willing to tolerate some
-political heat.
+\subsection{Measuring performance and capacity}
+\label{subsec:performance}
 
-The other main challenge is to distribute a list of reachable relays
-to the users inside the country, and give them software to use them,
-without letting the authorities also enumerate this list and block each
-relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
-addresses (or having them donated), abandoning old addresses as they are
-`used up', and telling a few users about the new ones. Distributed
-anonymizing networks again have an advantage here, in that we already
-have tens of thousands of separate IP addresses whose users might
-volunteer to provide this service since they've already installed and use
-the software for their own privacy~\cite{koepsell:wpes2004}. Because
-the Tor protocol separates routing from network discovery \cite{tor-design},
-volunteers could configure their Tor clients
-to generate node descriptors and send them to a special directory
-server that gives them out to dissidents who need to get around blocks.
+One of the paradoxes with engineering an anonymity network is that we'd like
+to learn as much as we can about how traffic flows so we can improve the
+network, but we want to prevent others from learning how traffic flows in
+order to trace users' connections through the network.  Furthermore, many
+mechanisms that help Tor run efficiently
+require measurements about the network.
 
-Of course, this still doesn't prevent the adversary
-from enumerating all the volunteer relays and blocking them preemptively.
-Perhaps a tiered-trust system could be built where a few individuals are
-given relays' locations, and they recommend other individuals by telling them
-those addresses, thus providing a built-in incentive to avoid letting the
-adversary intercept them. Max-flow trust algorithms~\cite{advogato}
-might help to bound the number of IP addresses leaked to the adversary. Groups
-like the W3C are looking into using Tor as a component in an overall system to
-help address censorship; we wish them luck.
+Currently, nodes try to deduce their own available bandwidth (based on how
+much traffic they have been able to transfer recently) and include this
+information in the descriptors they upload to the directory. Clients
+choose servers weighted by their bandwidth, neglecting really slow
+servers and capping the influence of really fast ones.
 
-%\cite{infranet}
+This is, of course, eminently cheatable.  A malicious node can get a
+disproportionate amount of traffic simply by claiming to have more bandwidth
+than it does.  But better mechanisms have their problems.  If bandwidth data
+is to be measured rather than self-reported, it is usually possible for
+nodes to selectively provide better service for the measuring party, or
+sabotage the measured value of other nodes.  Complex solutions for
+mix networks have been proposed, but do not address the issues
+completely~\cite{mix-acc,casc-rep}.
+
+Even with no cheating, network measurement is complex.  It is common
+for views of a node's latency and/or bandwidth to vary wildly between
+observers.  Further, it is unclear whether total bandwidth is really
+the right measure; perhaps clients should instead be considering nodes
+based on unused bandwidth or observed throughput.
+% XXXX say more here?
+
+%How to measure performance without letting people selectively deny service
+%by distinguishing pings. Heck, just how to measure performance at all. In
+%practice people have funny firewalls that don't match up to their exit
+%policies and Tor doesn't deal.
+
+%Network investigation: Is all this bandwidth publishing thing a good idea?
+%How can we collect stats better? Note weasel's smokeping, at
+%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
+%which probably gives george and steven enough info to break tor?
+
+Even if we can collect and use this network information effectively, we need
+to make sure that it is not more useful to attackers than to us.  While it
+seems plausible that bandwidth data alone is not enough to reveal
+sender-recipient connections under most circumstances, it could certainly
+reveal the path taken by large traffic flows under low-usage circumstances.
 
 \subsection{Non-clique topologies}
 
@@ -1493,7 +1493,7 @@
 authentication mechanisms. We can't just keep escalating the blacklist
 standoff forever.
 %
-Fourth, as described in Section~\ref{sec:scaling}, the current Tor
+Fourth, the current Tor
 architecture does not scale even to handle current user demand. We must
 find designs and incentives to let clients relay traffic too, without
 sacrificing too much anonymity.
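
Note on the moved subsection "Measuring performance and capacity": the client-side rule it describes (choose servers weighted by their bandwidth, neglecting really slow servers and capping the influence of really fast ones) can be sketched in Python as below. This is only an illustrative sketch, not Tor's implementation: the function names, the `min_bw` and `cap_bw` parameters, and their default values are assumptions made for the example.

```python
import random


def selection_weights(bandwidths, min_bw, cap_bw):
    """Map each eligible server to its selection weight.

    bandwidths: dict of server name -> self-reported bandwidth (bytes/s).
    Servers below min_bw are neglected; weights above cap_bw are capped.
    """
    return {
        name: min(bw, cap_bw)
        for name, bw in bandwidths.items()
        if bw >= min_bw
    }


def pick_server(bandwidths, min_bw=20_000, cap_bw=1_500_000, rng=random):
    """Pick one server at random, weighted by capped bandwidth.

    The default thresholds are hypothetical values, chosen only to
    make the example concrete.
    """
    weights = selection_weights(bandwidths, min_bw, cap_bw)
    if not weights:
        raise ValueError("no server meets the minimum bandwidth")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Because the weights come from self-reported values, a node that inflates its claim gains weight only up to `cap_bw` in this sketch; the cap limits, but does not remove, the cheating that the subsection describes.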


