[or-cvs] r15283: Minor cleanups, reworked discussion section. (projects/hidserv/trunk/doc)

kloesing at seul.org kloesing at seul.org
Sun Jun 15 20:19:11 UTC 2008


Author: kloesing
Date: 2008-06-15 16:19:11 -0400 (Sun, 15 Jun 2008)
New Revision: 15283

Modified:
   projects/hidserv/trunk/doc/report.pdf
   projects/hidserv/trunk/doc/report.tex
Log:
Minor cleanups, reworked discussion section.

Modified: projects/hidserv/trunk/doc/report.pdf
===================================================================
(Binary files differ)

Modified: projects/hidserv/trunk/doc/report.tex
===================================================================
--- projects/hidserv/trunk/doc/report.tex	2008-06-15 19:41:31 UTC (rev 15282)
+++ projects/hidserv/trunk/doc/report.tex	2008-06-15 20:19:11 UTC (rev 15283)
@@ -79,9 +79,7 @@
 %
 \begin{figure}
 \centering
-\includegraphics[width=0.8\textwidth]{hs_overview.png}\\
-\emph{DONE Christian: The cell names are hardly readable on screen. Can
-you try making them bold or use the font that you used in Fetch RSD?}
+\includegraphics[width=0.8\textwidth]{hs_overview.png}
 \caption{Overview of hidden service establishment and access}
 \label{fig:hs_overview}
 \end{figure}
@@ -337,7 +335,7 @@
 them build circuits previously.
 
 When setting up the measurement environment, a bug concerning the RendNodes
-configuration option is discovered. Using this configuration option the user
+configuration option was discovered. Using this configuration option the user
 can suggest specific Tor relays to be selected as rendezvous node by providing
 a list of nicknames or identifiers. Since the rendezvous point is usually 
 established using cannibalization, there must be an existing circuit available 
@@ -349,27 +347,9 @@
 while the former scenario reduces performance, because we have to wait until 
 the new circuit is built.
 
-\emph{DONE Christian: Mention bug here that selecting a rendezvous by means
-of configuring it in torrc fails. This bug was probably introduced with the
-config option for RendNodes, right?}
-\emph{TODO Karsten: When was this?}
+\emph{TODO Karsten: This bug was probably introduced with the
+config option for RendNodes, right? When was this?}
 
-\emph{DONE Christian: You started the Tor process configured to provide the
-hidden service only \emph{minutes} after the Tor relays? Then you didn't
-use the V3 directory, right? You should mention this fact.}
-
-%% The following paragraphs describe the measured substeps in detail and which 
-%% log statements are used to determine the values. An overview of all steps 
-%% is shown in Figure \ref{fig:substeps}. Messages are indicated by solid lines,
-%% while circuits are represented by dashed lines.
-%%
-%% \begin{figure}
-%% \centering
-%% \includegraphics[width=0.8\textwidth]{substeps.pdf}
-%% \caption{Overview of connection establishment substeps}
-%% \label{fig:substeps}
-%% \end{figure}
-
 \paragraph{Rendezvous Descriptor Round-trip Time}
 
 The time for fetching the rendezvous service descriptor from a hidden
@@ -567,9 +547,6 @@
 log files are also online.\footnote{\url{http://freehaven.net/~karsten/hidserv/perfdata-2008-04-22.tar.bz2}}
 All roles used Tor version 0.2.0.7-alpha.
 
-%\emph{DONE Christian: Make raw data available, too. Which Tor version was
-%used (see first line in log files)?}
-
 \subsection{Service Publication}
 
 Figure~\ref{fig:publtime} shows the overall service publication times as a
@@ -801,15 +778,10 @@
 to third hop can be responsible for the failure. If no appropriate circuit is
 available, neither open nor still being built, a completely new circuit is opened,
 which also takes longer than extending an existing circuit. Since the 
-measurements were performed with Tor version \emph{0.2.0.7-alpha}, there is
+measurements were performed with Tor version 0.2.0.7-alpha, there is
 no connection to the bug discussed in the previous section, which was introduced
 in Tor 0.2.0.14-alpha.
 
-%\emph{DONE Christian: Can you confirm that these measurements were
-%performed with a Tor version that was \emph{not} affected by the bug
-%that was discussed in the previous subsection and that was introduced in
-%Tor 0.2.0.14-alpha? If so, mention this fact here.}
-
 The timeout of 60 seconds seems too long. It is the same timeout that applies
 for general circuit establishment, but in this case only an extension by one
 hop takes place. So for cannibalization a lower timeout value might improve
@@ -820,25 +792,6 @@
 after receiving the rendezvous descriptor. Considering intermediate substeps
 like receiving the INTRODUCE\_ACK cell for timeouts should also be investigated.
 
-
-%% \emph{TODO Christian: See Roger's comment that a timeout of one minute
-%% seems to be too long. Should the timeout be reduced here? And does this
-%% timeout apply only to the step of connecting to the introduction point or
-%% to the whole establishment process? In the latter case, would it make sense
-%% to have more than one timeout for the different phases of connection
-%% establishment, e.g.\ 20 seconds for connecting to the introduction point
-%% and 40 seconds for the rest of connection establishment? It might be
-%% another improvement to further investigate this and recommend new
-%% timeouts.
-%% (Would it further make sense to include receipt of INTRO\_ACK into a new
-%% timeout logic?)}
-
-%\emph{DONE Christian: The text implies that cannibalization takes place in
-%\emph{all} cases. Is this really the case?}
-
-%\emph{DONE Christian: You text implies that a circuit would be chosen for
-%extension if it is not open? Does this really happen?}
-
 \begin{figure}
 \centering
 \includegraphics[width=0.8\textwidth]{introcirc.png}
@@ -846,8 +799,6 @@
 Min. & 1st Qu. & Median & Mean & 3rd Qu. & Max. & StdDev\\\hline
 0.035 & 0.636 & 1.793 & 7.784 & 4.642 & 114.400 & 17.532
 \end{tabular}
-%\emph{DONE Christian: Can you add some summary values here as it is done
-%in the figures above?}
 \caption{Histogram of times until introduction circuit is open}
 \label{fig:introcirc}
 \end{figure}
@@ -867,9 +818,6 @@
 
 \emph{TODO Christian: What might that mean? That's weird. -RD}
 
-%\emph{DONE Christian: What kind of improvement would you suggest from these
-%findings?}
-
 \begin{figure}
 \centering
 \includegraphics[width=0.8\textwidth]{estrend_rendack.png}
@@ -878,8 +826,6 @@
 ESTABLISH\_RENDEZVOUS & 0.0090 & 0.1100 & 0.2670 & 0.8891 & 0.7775 & 56.1000 \\
 RENDEZVOUS\_ACK & 0.0040 & 0.0720 & 0.2040 & 0.7455 & 0.8993 & 32.7700
 \end{tabular}
-%\emph{DONE Christian: Can you add some summary values here as it is done
-%in the figures above?}
 \caption{ESTABLISH\_RENDEZVOUS and RENDEZVOUS\_ACK transfer times}
 \label{fig:estrend_rendack}
 \end{figure}
@@ -969,29 +915,164 @@
 
 \section{Discussion}
 
-Ideas what changes are most likely to improve the overall performance.
+The analysis of setting up and accessing a hidden service has revealed
+several opportunities for performance improvements, which are summarized
+here. First, a few \emph{bugs} were spotted, some of which have already
+been fixed in the course of this analysis:
 
-\begin{itemize}
-\item Handle upload failures for each directory separately. (See Outliers)
-\item Restrict resetting 30 seconds delay to events that have an effect of
-the uploaded descriptor only.
-\item Prevent upload of empty descriptors if we can upload a non-empty one
-shortly after.
-\item Fix bug that completely ignores introduction points originating from
-cannibalized circuits.
-\item Think about 30-seconds delay when publishing descriptor.
-\item Cannibalize two introduction circuits simultaneously, possibly to two
-different introduction points.
-\item Pre-open more circuits for cannibalization to avoid waiting completeley
-new circuits.
-\item Discard rendezvous point when using {\O}verlier's simplified hidden
-service protocol.
-\item Not performance enhancing, but fix bug that RendNodes config option does
-not work in case of cannibalization.
-\item $\cdots$
-\end{itemize}
+\begin{description}
+\item[Premature Descriptor Upload] In very rare situations new hidden
+service descriptors were published earlier than 30 seconds after the last
+change to the service, although the current thinking is that a hidden
+service descriptor is worth publishing only once it has been stable for 30
+seconds. This minor bug has been in the code since Tor version 0.0.9pre6,
+released on November 15, 2004. It is fixed in Tor version 0.2.1.1-alpha,
+released on June 13, 2008.
+%
+\item[Abandoning Valid Introduction Points] While setting up a hidden
+service, some valid introduction circuits were overlooked and abandoned.
+This might be the reason for the long delay in making a hidden service
+available. This major bug was introduced with Tor version 0.2.0.14-alpha,
+released on December 23, 2007. The fix is included in Tor
+0.2.1.1-alpha and 0.2.0.28-rc, which were both released on June 13, 2008.
+%
+\item[Ignoring Cannibalized Introduction Points] When establishing a
+hidden service, introduction points that originate from cannibalized
+circuits are completely ignored and not included in rendezvous service
+descriptors. This might be another major reason for delay in making a
+hidden service available. This bug was introduced in Tor with version
+0.2.0.14-alpha released on December 23, 2007. It will probably be fixed in
+the upcoming Tor versions 0.2.0.29(-rc) and 0.2.1.2-alpha.
+%
+\item[Disregarding Predefined Set of Rendezvous Points]
+\emph{TODO Christian: Describe RendNodes bug in more detail here: Not
+performance enhancing, but fix bug that RendNodes config option does not
+work in case of cannibalization.}
+\end{description}
 
-\emph{TODO Karsten: When this list is reasonably populated, make two
-paragraphs out of it.}
+The analysis further revealed a couple of \emph{possible improvements} for
+the phases of making a hidden service available and for establishing a
+connection:
+
+\begin{description}
+\item[Descriptor Upload Failures] The current logic to upload rendezvous
+service descriptors does not handle failures in a reasonable way. In case
+of a failure, Tor waits for a full hour before making the next attempt.
+There should be either a shorter timeout or individual handling of
+failures per directory.
+%
+\item[Inaccuracies in Descriptor Upload Logic] The logic to decide whether
+a descriptor should be uploaded needs reworking. Currently, unrelated
+events such as giving up on an introduction point candidate reset an
+internal 30-second timer. As a result, the delay before upload can
+unexpectedly exceed 30 seconds. Another odd effect is the upload of
+descriptors that contain no introduction points.
+%
+\item[Descriptor Upload Timing] The choice to wait for 30 seconds for a
+service to have a stable set of introduction points is rather arbitrary. An
+analysis of typical delays in establishing introduction points might help
+to apply a more suitable algorithm here.
+%
+\item[Increase Count of Internal Circuits] The number of preemptively built
+internal circuits for later cannibalization should be increased. Really
+popular hidden services require more than two internal circuits in the pool
+to answer multiple client requests at the same time. This scenario was not
+yet analyzed, but will probably exhibit even worse performance than measured
+here. The number of preemptively built internal circuits should be a
+function of connection requests in the past to adapt to changing needs.
+Furthermore, an increased number of internal circuits on the client side would
+allow clients to establish connections to more than one hidden service at
+a time.
+%
+\item[Build More Introduction Circuits than Needed on Hidden Server] When
+establishing introduction points, a hidden service could launch 5 instead
+of 3 introduction circuits at the same time and use only the first 3 that
+could be established. The remaining two circuits could still be used for
+other purposes afterwards.
+%
+\item[Parallel Connection to Two Introduction Points by Clients] A client
+could attempt to establish two introduction circuits to two different
+introduction points simultaneously and only use the first that succeeds.
+The slower circuit could still be used for another purpose. However, there
+is a possible anonymity issue here that needs to be taken into
+consideration, because the circuit can now be linked to one of the hidden
+service's introduction points. Further, it needs to be evaluated whether
+the extension of two circuits at the same time has a negative effect on
+clients with low bandwidth.
+\end{description}
+
+Some areas are candidates for performance improvements, but are still
+rather vague at the moment and require considerable effort for
+\emph{further investigation} before they can be realized.
+These items will be considered in the course of this project:
+
+\begin{description}
+\item[Rendezvous Protocol Simplifications] {\O}verlier and Syverson have
+proposed two simplified rendezvous protocols.\footnote{Lasse {\O}verlier
+and Paul Syverson, Improving Efficiency and Simplicity of Tor Circuit
+Establishment and Hidden Services,
+\url{http://www.freehaven.net/anonbib/#overlier-pet2007}}
+Their first protocol aims at using a single circuit on the client side to
+contain the rendezvous point and connect to an introduction point. The
+second protocol goes further and unites the roles of introduction
+point and rendezvous point, saving another circuit. It can be assumed that
+these simplifications improve connection establishment times. However, their
+effects on anonymity are as yet unclear, given that they are implemented
+without the authors' proposed valet node approach, which requires a major
+redesign of hidden services.
+%
+\item[Distributed Hidden Service Directory] The current measurements have
+been performed using the central hidden service directory only. In the near
+future the central directory will be replaced by a distributed directory
+consisting of a subset of all Tor relays. This design is already deployed
+and in an experimental state. As soon as a reasonable number of distributed
+hidden service directory nodes are deployed, there should be further
+measurements to compare performance with the current directory design. It
+might turn out that more sophisticated means for load balancing between
+distributed directory nodes are necessary than are currently implemented.
+%
+\item[Grand Scaling Plan] \emph{TODO Karsten: Find out some details about
+this one.}
+%
+\item[Low-Bandwidth Measurements] For some improvement suggestions,
+e.g.\ reducing timeouts, the effect on clients with low bandwidth is as yet
+unclear. Future measurements should therefore include clients and possibly
+hidden servers on low-bandwidth Internet connections.
+\end{description}
+
+Finally, there are a few approaches that could have a positive effect on the
+performance of Tor Hidden Services, but which require substantial changes
+to the core parts of Tor. This puts them \emph{out of the scope} of this
+project, which aims at improving hidden services only. Nevertheless, these
+ideas are worth mentioning for later attempts to improve Tor performance
+in general:
+
+\begin{description}
+\item[Circuit Establishment Timeout] A Tor client could keep track of how
+long it takes to establish a circuit, and discard circuits that took too
+long to establish. The justification is that a circuit that took long to
+build will probably take long to transport messages. This approach should
+considerably speed up all parts of hidden services, too.
+%
+\item[Allow Failing Circuit Extensions] The current rationale of Tor is to
+discard a complete circuit once an extend attempt has failed. The main
+reason is to keep an adversary from controlling the set of relays to
+which a client can build circuits by preventing extensions to nodes other
+than the adversary's. A relaxation of this logic by allowing, e.g., 3
+extend failures per hop might significantly speed up hidden services,
+which depend on fast circuit extension to a great extent.
+%
+\item[Better Load Balancing on Relays] The reason why some circuits are
+orders of magnitude slower than others might be that individual relays
+are overloaded. A better path selection scheme based on a
+relay's current load might lead to better load balancing and thereby
+lower delay in message transmission.
+%
+\item[Prioritize Low-Volume Circuits] Tor relays currently handle all
+incoming cells on a first-come-first-served basis. A better approach might
+be to give higher priority to circuits that have not sent as many
+cells lately. This would especially prioritize introduction and rendezvous
+circuits and thereby accelerate connection establishment.
+\end{description}
 \end{document}
 


