commit 4c459ffd374f5416f769923afee242943a4c494c
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Fri Sep 6 15:46:17 2013 +0200
First draft of torperf2 design document.
---
2013/torperf2/.gitignore | 2 +
2013/torperf2/torperf2.bib | 10 ++
2013/torperf2/torperf2.tex | 372 ++++++++++++++++++++++++++++++++++++++++++
2013/torperf2/tortechrep.cls | 1 +
4 files changed, 385 insertions(+)
diff --git a/2013/torperf2/.gitignore b/2013/torperf2/.gitignore
new file mode 100644
index 0000000..78ee6f6
--- /dev/null
+++ b/2013/torperf2/.gitignore
@@ -0,0 +1,2 @@
+torperf2.pdf
+
diff --git a/2013/torperf2/torperf2.bib b/2013/torperf2/torperf2.bib
new file mode 100644
index 0000000..6435dd2
--- /dev/null
+++ b/2013/torperf2/torperf2.bib
@@ -0,0 +1,10 @@
+@techreport{tor-2009-09-001,
+ author = {Karsten Loesing},
+ title = {Performance of Requests over the {Tor} Network},
+ institution = {The Tor Project},
+ number = {2009-09-001},
+ year = {2009},
+ month = {September},
+ url = {https://research.torproject.org/techreports/torperf-2009-09-22.pdf}
+}
+
diff --git a/2013/torperf2/torperf2.tex b/2013/torperf2/torperf2.tex
new file mode 100644
index 0000000..b3eb955
--- /dev/null
+++ b/2013/torperf2/torperf2.tex
@@ -0,0 +1,372 @@
+\documentclass{tortechrep}
+\usepackage{url}
+\usepackage{graphicx}
+\usepackage{enumerate}
+\usepackage{hyperref}
+
+\begin{document}
+
+\title{Requirements and Software Design for a\\
+Better Tor Performance Measurement Tool}
+
+\author{Karsten Loesing}
+
+\contact{\href{mailto:karsten@torproject.org}{karsten@torproject.org}}
+\reportid{}
+\date{\emph{\\--- This report has not been published yet, so it has no
+number and its URL is going to change once it's published; better not
+reference it yet. Work in progress. ---}}
+
+\maketitle
+
+\section{Introduction}
+
+Four years ago, we presented a simple tool to measure performance of the
+Tor network \cite{tor-2009-09-001}.
+This tool, called Torperf, requests static files of three different sizes
+over the Tor network and logs timestamps of various request substeps.
+These data turned out to be quite useful to observe user-perceived network
+performance over time.%
+\footnote{\url{https://metrics.torproject.org/performance.html}}
+However, static file downloads are not the typical use case of a user
+browsing the web using Tor, so absolute numbers are not very meaningful.
+Also, Torperf consists of a bunch of shell scripts which makes it neither
+very user-friendly to set up and run, nor extensible to cover new use
+cases.
+
+For reference, an earlier proposal, made 1.5 years later, suggested
+redesigning the Python parts of Torperf, but that redesign never
+happened.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/2565}}
+
+In this report we outline requirements and a software design for a rewrite
+of Torperf.
+We loosely start with non-functional requirements, that is, aspects like
+user-friendliness or extensibility, because these requirements drive the
+rewrite more than the immediate need for new features.
+After that, we discuss actual functional requirements, that is, experiments that
+we want to perform to measure things in the Tor network on a regular and
+automated basis.
+Finally we suggest a software design that fulfills all these requirements.
+This report is meant to be updated in the process of writing this new
+software, so that it can later serve as documentation.
+Only then will it be published officially.
+
+\section{Requirements}
+
+Before we talk about functional requirements, that is, actual experiments
+the new Torperf is supposed to perform, we want to discuss more general
+requirements of a Torperf rewrite.
+For the time being, assume that the major functional requirement is to
+``make client requests over the Tor network and record timestamps and
+other meta data for later analysis.''
+Whichever type of request this is, it's important that all or most of the
+non-functional requirements listed in the following are met.
+We start with configuration requirements, followed by requirements for
+results formats.
+
+\subsection{Configuration requirements}
+
+\subsubsection{Installation and upgrade}
+
+Whoever installs Torperf shouldn't be required to understand its codebase,
+nor rely on support by someone who does.\footnote{This sounds obvious, but
+the current, shell script based Torperf requires exactly that, which is
+also the reason why its current known userbase is 2.}
+Ideally, the new Torperf comes as an OS package, meaning that it only relies
+on third-party software that is also either available as OS package or
+that is shipped with the Torperf package.
+To put it simply, installing on Debian Wheezy should be as easy as
+\verb+apt-get install+, and upgrading should simply be an
+\verb+apt-get update && apt-get upgrade+ away.
+
+\subsubsection{Run as service}
+
+The new Torperf should run as an OS service, unrelated to a specific user.
+It should have a config file that is easy to understand, and
+it should come with scripts to start, restart, and stop the Torperf
+service.
+If the service operator wants to run multiple experiments at once, they'd
+simply configure all those experiments in the configuration file or
+directory, rather than starting multiple instances of Torperf.
+
+\subsubsection{Single configuration point}
+
+All requests that Torperf performs use a local Tor client, go over the Tor
+network, and are answered by some server.
+Some experiments may be designed to use remote servers not controlled by
+the person running Torperf, but others require setting up a custom server.
+In the latter case, configuring, starting, restarting, or stopping client
+and server should happen in a single place, that is, on a single machine
+as part of the Torperf service.
+Fortunately, running client and server on the same physical host should not
+have any effect on measurement results, because all requests and responses
+traverse the Tor network.
+
+\subsubsection{User-defined tor version or binary}
+
+A key part of measurements is the tor software version or binary used to
+make requests or even to handle requests in the hidden-service case.
+It should be easy to specify a tor version that is not the current tor
+version shipped with the OS.
+Similarly, it should be easy to point to a custom tor binary to use for
+measurements.
+It should be possible to run different experiments with different tor
+versions or binaries in the same Torperf service instance.
+
+\subsubsection{User-defined third-party software version or binary}
+
+Similar to the previous requirement, it should be easy to specify custom
+versions or binaries of third-party software other than the version
+currently shipped with the OS.
+This applies, for example, to Firefox when attempting to make measurements
+more realistic.
+
+\subsection{Results requirements}
+
+\subsubsection{Results format}
+
+The measurement results produced by Torperf contain no sensitive data by
+design, because all requests are made by Torperf itself, not by actual Tor
+users.
+We'll want to collect as much information as needed to perform useful
+analysis.
+But at the same time we want to keep it as easy as possible to process
+results.
+
+Results may come from various sources, e.g., an HTTP/SOCKS client,
+Selenium/Firefox, a tor client used to make the request or to answer the
+request as hidden service, an HTTP server, etc.
+Torperf should not store original logs, because that would only
+shift the issue of processing different log formats to the analysis step.
+Such an approach would also generate unnecessarily large results files.
+Torperf should rather process data from these sources and store them in a
+custom results format that can be easily processed using tools shipped
+with Torperf.
+Results may include data which appear not immediately relevant to
+measuring Tor performance, but which may be useful for related purposes.
+For example, Torperf should include data about circuit failures in its
+results, even though these circuits may not have been used in actual
+requests.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/8662}}
+Deciding which data to store should be the responsibility of whoever
+designs a Torperf experiment, though formats should be somewhat uniform
+between experiments.
+A possible results data format could be JSON, but other data formats are
+plausible, too.
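+
+As a rough illustration of the JSON option, a single static file download
+might be stored as one object like the following.
+All field names and values here are hypothetical placeholders chosen for
+this sketch, not a specification and not actual measurement results:
+
+\begin{verbatim}
+{
+  "source": "example-instance",
+  "experiment": "static-file-download",
+  "filesize": 51200,
+  "start": 1378475177.00,
+  "datarequest": 1378475178.10,
+  "dataresponse": 1378475178.90,
+  "datacomplete": 1378475180.40,
+  "didtimeout": false,
+  "circuit_failures": 0
+}
+\end{verbatim}
+
+Storing one such object per request would keep results easy to filter by
+time or by experiment without parsing tool-specific logs.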
+
+\subsubsection{Results web interface}
+
+Torperf should provide all its results via a public web interface.
+This includes measured timestamps as well as any meta data collected in
+an experiment.
+Results should be available for all measurements as well as for a subset
+specified by measurement time or filtered by other criteria.
+
+\subsubsection{Results accumulator}
+
+A Torperf service instance should be able to accumulate results from its
+own experiments and remote Torperf service instances.
+To do so, it should simply download new results from their web interfaces
+and incorporate them in its own database.
+It would be useful to have the accumulator warn if a remote service
+instance becomes stale and doesn't serve recent results anymore.
+Note that the service instance name or identifier should therefore become
+part of Torperf results meta data.
+
+\subsubsection{Truncate old results}
+
+A Torperf service instance should be able to limit space requirements for
+storing past results.
+This could be done by discarding results that are older than a configured
+number of days.
+
+\subsubsection{Results graph data}
+
+Ideally, Torperf provides aggregate statistics via its web interface which
+can then be visualized by others.
+Even more ideally, Torperf serves a few HTML pages containing the
+necessary JavaScript to visualize results.
+While this may sound like going overboard here, making Torperf the tool
+to run performance measurements \emph{and} to present results has the
+advantage of not building and maintaining a separate tool for the latter.%
+\footnote{For reference, the current Torperf produces measurement results
+which are re-formatted by metrics-db and visualized by metrics-web with
+help of metrics-lib.
+Any change to Torperf triggers subsequent changes to the other three
+codebases, which is suboptimal.}
+
+\subsubsection{Results parsing library}
+
+The new Torperf should come with an easy-to-use library to process its
+results.
+Alternatively, this library could be provided and maintained as part of
+stem or another parsing library.
+
+\section{Experiments}
+
+\subsection{High priority experiments}
+
+\subsubsection{Alexa top-X websites using Selenium/Firefox}
+
+One major reason for rewriting Torperf is to make its results more
+realistic.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/7168}}
+It was suggested to track down Will Scott's torperf-like scripts, make
+them public if needed, and do a trial deployment somewhere.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/7516}}
+An experiment making these more realistic measurements should use
+something like Selenium/Firefox to control an actual browser to make
+requests.
+As a variant, this experiment should be run with a Firefox that uses
+optimistic data.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/3875}}
+
+\subsubsection{Static file downloads}
+
+Another major reason for rewriting Torperf is to supersede the existing,
+bit-rotting codebase.
+The new Torperf should therefore provide an experiment that is identical
+to the current, single Torperf experiment: download static files of sizes
+50 KiB, 1 MiB, and 5 MiB over the Tor network and record timestamps and
+relevant meta data.
+
+There are quite a few details in getting these downloads right, which
+shall not all be specified here.
+For example, Torperf needs to enforce using a fresh circuit for each run,
+which is currently ensured by reducing the maximum circuit dirtiness to a
+value that is lower than the experiment period.
+An alternative may be to send the \verb+NEWNYM+ signal to the tor process
+after every stream.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/2766}}
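+
+A minimal sketch of these two options, assuming a hypothetical experiment
+period of 300 seconds; the value 60 is only an example of a value below
+that period.
+The first option is a line in the tor configuration file, the second is a
+command sent over tor's control port after authenticating.
+Lines starting with \verb+#+ are annotations for this sketch and are not
+part of the control protocol:
+
+\begin{verbatim}
+# Option 1 (torrc): never attach new streams to circuits
+# that were first used more than 60 seconds ago.
+MaxCircuitDirtiness 60
+
+# Option 2 (control port): ask tor to use fresh circuits
+# for subsequent streams.
+SIGNAL NEWNYM
+\end{verbatim}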
+
+The best way to extract these requirements is to read the source.%
+\footnote{\url{https://gitweb.torproject.org/torperf.git/tree}}
+There also exists a proof-of-concept implementation of this experiment in
+Twisted which can and should serve as starting point for this rewrite.%
+\footnote{\url{https://gitweb.torproject.org/karsten/torperf.git/tree/refs/heads/perfd}}
+
+The results format of the current Torperf codebase shall serve as
+blueprint for designing a new results format.%
+\footnote{\url{https://metrics.torproject.org/formats.html\#torperf}}
+As stated before, the new results format should make use of JSON or
+another data format, so that results can be processed more easily.
+
+Results may even contain more information than the original Torperf
+results.
+For example, assuming that the new Torperf service controls both HTTP
+client and server, new timestamps could be added for serving the first
+and last byte.
+Matching timestamps between client and server can be achieved by serving
+new random content in each request and using this content (or a digest
+thereof) as request identifier.
+
+\subsection{Lower priority experiments}
+
+\subsubsection{Canonical median web page}
+
+A lower-priority experiment would be to devise and deploy the canonical
+median web page.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/7517}}
+Such an experiment would share a lot with static file downloads, but
+would serve an actual web page using Torperf's own webserver.
+
+\subsubsection{Hidden service performance}
+
+Another possible experiment would be to measure performance to a
+hidden service.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/1944}}
+This hidden service could easily run on the same tor client instance,
+or on a separate tor client instance running on the same physical host.
+We'll want to add additional timestamps for hidden service events.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/2554}}
+The hidden service could serve static files, a canonical median web page,
+or even an Alexa top-X website.
+
+The latter has the minor disadvantage that server timestamps cannot be
+matched with client timestamps based on content, but only
+probabilistically via timing.
+This may become problematic when running lots and lots of experiments at
+the same time.
+Or maybe it's possible to match client and server observations using
+identifiers, like the rendezvous cookie, exchanged in the rendezvous
+process instead.
+
+\subsubsection{GET/POST performance}
+
+The experiments so far all measured download speed, but it's also easy
+to measure upload speed.%
+\footnote{\url{https://trac.torproject.org/projects/tor/ticket/7010}}
+Torperf's own web server could accept POST data and record timestamps of
+the first byte received, the last byte received, etc.
+Similar to static file downloads, client and server timestamps can be
+matched based on the random data that is posted to the server, or a digest
+of that data.
+
+\section{Software design}
+
+The previous sections outlined requirements for the Torperf rewrite and
+possible experiments, unrelated to an actual implementation.
+The purpose of this section is to suggest a possible software design
+matching these requirements.
+It's quite possible that there are better software designs.
+This section should serve as starting point for a discussion.
+
+We have an initial proof-of-concept prototype that uses Twisted to
+implement a small subset of the new Torperf.
+Even though the new Torperf does quite a few things at once, like starting
+tor processes, making requests, answering requests, collecting results,
+etc., it can be implemented as a single Twisted application.
+This shifts some deployment problems to what people usually do when
+deploying Twisted applications, rather than forcing us to solve these
+problems yet another time.
+Of course, at the same time it forces us to follow the Twisted model of
+writing applications, which seems not too bad in this case.
+The current prototype also requires twisted-socks as SOCKS client,
+txtorcon for communicating with tor clients, and stem for event parsing.
+
+The following list outlines tasks that the new Torperf needs to perform.
+These could be implemented as Python classes, though this list was not
+written with Python or Twisted specifics in mind:
+
+\begin{description}
+\item[configuration handler] Validate and parse configuration file or
+directory, provide configuration values to other application parts.
+\item[logger] Log operation details for debugging purposes and for normal
+service operation, unrelated to storing measurement results.
+\item[tor process starter] Configure, start, hup, and stop local tor
+client processes, using a previously configured tor version or binary.
+\item[tor controller] Connect to tor's control port, force-set guards if
+required by the experiment, register for and handle asynchronous events,
+set up and tear down hidden services.
+\item[request scheduler] Start new requests following a previously
+configured schedule.
+\item[request runner] Handle a single request from creation through various
+possible substates to timeout, failure, or completion.
+\item[HTTP/SOCKS client] Make HTTP GET requests using our own SOCKS client
+and connecting to a local tor client to gather as many timestamps of
+substeps as possible.
+\item[Selenium/Firefox wrapper] Make web requests using Selenium/Firefox,
+possibly using Xvfb and using a previously configured Firefox version or
+binary, and collect as many timestamps of substeps as possible.
+\item[SOCKS or HTTP proxy] Capture even more timestamps by proxying
+requests made via Selenium/Firefox through our own SOCKS or HTTP proxy
+before handing them to the local tor client.
+\item[HTTP server] Run on port 80, serve static files and the canonical
+median web page, accept POST requests, provide measurement results via
+RESTful API, present measurement results on web page.
+\item[Alexa top-X web pages updater] Periodically retrieve list of top-X
+web pages.
+\item[results database] Store request details, retrieve results,
+periodically delete old results if configured.
+\item[results accumulator] Periodically collect results from other Torperf
+instances, warn if they're out of date.
+\item[analysis scripts] Command-line tools to process measurement results
+and produce graphs and other aggregate statistics.
+Could also become part of stem instead, together with a tutorial.
+\end{description}
+
+\bibliography{torperf2}
+
+\end{document}
+
diff --git a/2013/torperf2/tortechrep.cls b/2013/torperf2/tortechrep.cls
new file mode 120000
index 0000000..4c24db2
--- /dev/null
+++ b/2013/torperf2/tortechrep.cls
@@ -0,0 +1 @@
+../../tortechrep.cls
\ No newline at end of file