[cc tor-dev]
On 16 September 2013 09:47, Karsten Loesing karsten@torproject.org wrote:
> Hmm, I don't think the HTTP client/server part is the right interface to write another client and server and call it Torperf compatible. The Torperf data API would be a better interface for that: people could write their own experiments and provide data that is Torperf compatible, or they could use the data that Torperf provides and analyze or visualize it in a better way. But writing only half of an experiment, client or server, wouldn't be of much use.
I thought a bit more about this and I don't fully agree (though you've nailed it that the data is the core API). I've come to the conclusion that it makes sense for TorPerf to do much of the heavy lifting and the core server aspects, but that the experiments should be decoupled in such a way as to allow for flexible client implementations.
Below are a bunch of semi-related musings on how to get there:
Of course, since many experiments will just be doing simple HTTP requests, the experiments will really just be wrapper scripts around a bundled default client implementation, similar to the one in your perfd branch.
Specifically, to aid in this, I'd propose something like the folder structure below: https://etherpad.mozilla.org/iqrgueVFd6
[Why a set of directories? There's no solid reason other than that it forces unique names and gives a good opportunity to isolate experiment-specific data. If an experiment has no specific data files, it should be alright to just have the config file instead of a directory. A directory structure also mimics the idea of a RESTful web service as mentioned in the PDF, i.e. the user could easily know to go to http:/.../results/myexperiment/ to see results filtered for that single experiment. Either way, it's a minor detail; I just feel it's easier for a user.]
A good use of the experiment-specific data could be for static files, where anything in an experiment's '/public' folder would be served by the webserver while that experiment is running. To ensure nothing gets proxy-cached, the experiment could be responsible for generating random data files for each run. (This is a clear separation of server vs. experiment implementation concerns.) Another idea could be custom experiment views (JS, probably) that would transform the experiment results when viewed on the web dashboard.
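To make the directory idea concrete, a purely illustrative layout for a couple of experiments might look something like this (the names are mine, not what's on the etherpad):

    experiments/
      alexa_top_100/
        config         # execution rate, command, required tor version, ...
        fetch_list.py  # experiment-specific helper script
        public/        # static files served by the webserver during runs
        data/          # experiment-specific data, e.g. the cached site list
      facebook/
        config         # no extra data files, so a lone config file would also do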
The experiments should not be responsible for determining details such as the SOCKS port or control port; the TorPerf service should deal with load-balancing a bunch of tor instances on different ports and just tell an experiment 'do your stuff using this SOCKS port, this public ip:port, etc.' via environment variables. (The config could ask that certain aspects of the torrc file are set up specifically, but managing ports and the like is just asking for user error unless there's some experiment that requires it.)
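For example (the variable names here are made up, whatever we end up standardising on), the service would export the ports before spawning the command and the experiment would just read them:

    import os

    # The TorPerf service picks ports and addresses; the experiment only reads them.
    socks_port  = int(os.environ["TORPERF_SOCKS_PORT"])   # hypothetical name
    public_addr = os.environ["TORPERF_PUBLIC_ADDR"]       # hypothetical name, "ip:port"

    # ...fetch through 127.0.0.1:socks_port, and request http://<public_addr>/...
    # for anything the experiment put in its public/ folder.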
The config should minimally have an execution rate and a command. The command could be to just execute the bundled TorPerf client implementation with boring parameters, e.g. just fetch facebook.com and record the normal timings that the default implementation tracks.
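So the boring case would be a config of only a couple of lines, something like this (the key names and the bundled client's command line are just a sketch, not a proposed format):

    # experiments/facebook/config -- illustrative only
    rate    = every 5 minutes
    command = torperf-client --url https://www.facebook.com/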
A more interesting config's command could be alexa_top_100: a basic script/makefile that fetches a new Alexa list if the one in the local folder is older than X days and then runs the TorPerf client for each site in the list.
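A rough sketch of what that wrapper could look like (the list URL, file names and the bundled client's command line are all placeholders):

    #!/usr/bin/env python
    import os, subprocess, sys, time
    import urllib.request

    ALEXA_LIST = "data/alexa_top_100.txt"   # cached copy in the experiment folder
    MAX_AGE = 7 * 24 * 3600                 # "X days", here 7

    # Refresh the cached list if it's missing or older than X days.
    if (not os.path.exists(ALEXA_LIST)
            or time.time() - os.path.getmtime(ALEXA_LIST) > MAX_AGE):
        urllib.request.urlretrieve("https://example.com/top-100.txt", ALEXA_LIST)
        sys.stderr.write("alexa_top_100: refreshed site list\n")  # informational, exit code stays 0

    # Run the bundled client once per site; its stdout (the TorPerf results)
    # passes straight through to the service.
    with open(ALEXA_LIST) as f:
        for site in (line.strip() for line in f if line.strip()):
            subprocess.check_call(["torperf-client", "--url", "http://%s/" % site])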
The TorPerf service instance should be able to run experiments by just executing the commands and recording whatever is written to stdout and stderr. After the command exits, if the exit code is non-zero then the stderr output is treated as error messages, while if it's zero it's treated as info messages. The stdout data should be treated as TorPerf results, and it's an error if it's not well-formed; if it's not well-formed it should still be captured in the results file for debugging purposes. An example of informational output might be that the alexa_top_100 experiment updated its list before this result set. [On the web interface it should be clear which result sets contained errors or information.]
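On the service side that rule is simple enough; a minimal sketch, assuming the command is spawned with the environment variables from above:

    import subprocess

    def run_experiment(command, env):
        """Run one experiment command and sort its output per the rules above."""
        proc = subprocess.Popen(command, shell=True, env=env,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = proc.communicate()
        # Non-zero exit: stderr lines are errors; zero exit: they're informational.
        level = "error" if proc.returncode != 0 else "info"
        messages = [(level, line) for line in stderr.splitlines() if line.strip()]
        # stdout is the TorPerf result data; the caller checks whether it's
        # well-formed and keeps it in the results file either way for debugging.
        return proc.returncode, stdout, messages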
Once the experiment finishes, the service should postprocess the results from the experiment and replace any fingerprinted entries (this needs to be well defined) with the server-side timing information for that specific fingerprint. Then the server should store the results in a timestamped file (timestamped probably by experiment start time) and update its database (if there is one).
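Roughly, assuming the results are parsed into per-request entries carrying a fingerprint field and the server keeps its own timings per fingerprint (the actual format is exactly the part that still needs defining), that step could look like:

    import json, os, time

    def postprocess(results, server_timings, start_time, out_dir):
        """Merge server-side timings into client results by fingerprint (sketch)."""
        for entry in results:                       # list of dicts, one per request
            fp = entry.get("FINGERPRINT")           # field name is an assumption
            if fp in server_timings:
                entry.update(server_timings[fp])    # fill in server-side timestamps
        # Store under a file name derived from the experiment start time.
        name = time.strftime("%Y-%m-%d-%H%M%S", time.gmtime(start_time))
        path = os.path.join(out_dir, name + ".json")
        with open(path, "w") as f:
            json.dump(results, f, indent=2)
        return path                                 # a database update would go here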
The experiments should be able to specify their required tor version in their config, but it should accept placeholder values such as the default 'latest', 'latest-stable' or even 'latest(3)', which would run the experiment for all three of the latest tor versions. I think the ability to check the performance of the same experiment over multiple Tor versions could be interesting, especially to determine whether any one build has caused anomalies in performance. I would expect very few experiments to run across multiple versions, though.
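Resolving those placeholders would be cheap, something like this (again only a sketch, with the stable/alpha split assumed to be handled elsewhere):

    def resolve_tor_versions(spec, available):
        """Expand 'latest', 'latest-stable' or 'latest(N)' into concrete versions.

        `available` is assumed to be a list of version strings, newest first.
        """
        if spec.startswith("latest(") and spec.endswith(")"):
            n = int(spec[len("latest("):-1])
            return available[:n]        # the N most recent versions
        if spec in ("latest", "latest-stable"):
            return available[:1]
        return [spec]                   # an explicit version string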
Additionally, something that could be neat, though it's not clearly in the requirements: should TorPerf be responsible for notifying the user when there are new tor clients available to run as 'latest'? Would it be useful to be able to specify that some experiments should be run on 'master' or a gitref, which would be pulled between runs? That's probably not practical.
Apologies for the length and lack of order! Kevin