Re: [tor-dev] Torperf implementation considerations

24 Sep 2013


On 9/24/13 1:39 PM, Sathyanarayanan Gunasekaran wrote:
Hi.
On Tue, Sep 24, 2013 at 6:03 AM, Karsten Loesing <karsten@torproject.org> wrote:
On 9/23/13 12:53 AM, Sathyanarayanan Gunasekaran wrote:
I don't understand how this will work when users just apt-get install
torperf. Ideally if someone writes a good experiment, they should send
the patches upstream and get it merged, and then we update torperf to
include those tests and then the users just update torperf with their
package managers.
I agree with you that this is a rather unusual requirement and that
adding new experiments to Torperf is the better approach.  That's why
the paragraph said "should" and "ideally".  I added your concerns to the
design document to make this clearer.  (Maybe we should mark
requirements as either "must-do", "should-do", or "could-do"?)
Well, "ideally" implies that we want to do this at some point. Do we?
I don't feel strongly.  I'd prefer a design that makes it easy to add
new experiments, but I'm fine with an approach that requires merging
patches.  We can always add the functionality to drop something in a
directory and make Torperf magically detect and run the new experiment,
but that can happen later.  Maybe we shouldn't let that distract us from
getting the first version done.  I commented out this section.
...
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run
different torperf instances for different tor versions.
Running multiple Torperf instances has disadvantages that I'm not sure
how to work around.  For example, we want a single web server listening
on port 80 for all experiments and for providing results.
Oh. I did not mean running multiple torperf instances
*simultaneously*; I just meant sequentially.
But what if we want to run multiple experiments at the same time?
That's quite a common requirement.  Right now, we run 3 Torperf
experiments on ferrinii at the same time.  A few years ago, we ran 15
experiments with tors using different guard selection strategies and
downloading different file sizes.
...
Why do you think it's hard to run different tor versions or binaries in
the same Torperf service instance?
Then each experiment needs to deal with locating, bootstrapping, and
shutting down Tor. We could just run a torperf test against a
particular tor version; once that's completed, we can run against
another tor version, and so on. I'm not against this idea -- it can be
done. I just don't think it's high priority.
Torperf should help with bootstrapping and shutting down tor, because
that's something that all experiments need.  Locating tor could just be
a question of passing the path to a tor binary to Torperf.  See above
for sequential vs. parallel experiments.
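A minimal Python sketch of what "help with bootstrapping and shutting down tor" could look like, assuming each experiment gets its own tor process started from a given binary path. The helper names (tor_command, managed_process) are hypothetical and not part of any Torperf code; "--SocksPort" and "--DataDirectory" are standard tor options. Waiting for tor to finish bootstrapping is deliberately left out.

```python
import contextlib
import subprocess


def tor_command(tor_binary, socks_port, data_dir):
    """Build the argv for one tor process (hypothetical helper)."""
    return [tor_binary,
            "--SocksPort", str(socks_port),
            "--DataDirectory", data_dir]


@contextlib.contextmanager
def managed_process(argv, stop_timeout=10.0):
    """Start a process and guarantee it is stopped on exit.

    Assumption: the service, not the experiment, owns the process
    lifecycle, so every experiment gets the same start/stop handling.
    """
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        if proc.poll() is None:       # still running
            proc.terminate()          # ask politely first
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()           # force it if it does not exit
                proc.wait()
```

Several such processes, each with its own port and data directory, could run in parallel inside one service instance; an experiment would only see the SOCKS port it was given.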
...
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or
verify signatures. We have good package managers that do that already.
Ah, we don't just want to measure packaged tors.  We might also want to
measure older versions which aren't contained in package repositories
anymore, and we might want to measure custom branches with performance
tweaks.  Not sure if we actually want to verify signatures of tor versions.
I think we should take Shadow's approach (or something similar).  Shadow
can download a user-defined tor version ('--tor-version'), or it can
build a local tor path ('--tor-prefix'):
If the user wants to run torperf against tor versions that are not
present in the package managers, then the user should download and
build tor -- not torperf. Once a local binary is present, the user can
run torperf against it with a --tor prefix.
It's perfectly fine if the first version only supports a '--tor-binary'
option and leaves downloading and building of custom tor versions to the
user.  Of course, Torperf should be able to support the default tor
binary that comes with the operating system for non-expert users.  But
supporting a '--tor-version' option that downloads and builds a tor
binary can come in version two.  I tried to describe this approach in
the design document.  (Please rephrase as needed.)
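A minimal Python sketch of the first-version behaviour described above: a '--tor-binary' option, with a fallback to whatever tor the operating system provides. Only the option name comes from this thread; the function name and the use of a PATH lookup as the "default tor binary" are assumptions for illustration.

```python
import argparse
import shutil


def parse_args(argv=None):
    """Parse a hypothetical Torperf command line."""
    parser = argparse.ArgumentParser(prog="torperf")
    parser.add_argument(
        "--tor-binary",
        default=None,
        help="path to the tor binary to measure "
             "(default: the tor found on PATH)")
    args = parser.parse_args(argv)
    if args.tor_binary is None:
        # Assumption: "default tor binary" means the one on PATH.
        # shutil.which returns None if no tor is installed.
        args.tor_binary = shutil.which("tor")
    return args
```

A '--tor-version' option that downloads and builds tor could later be added next to this one without changing how experiments receive the binary path.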
...
https://github.com/shadow/shadow/blob/master/setup#L109
Do you see any problems with this?
Nope, this is perfectly fine. I just don't want torperf to download,
verify and build tor.
Perfectly fine to ignore for now.  It's not a crazy feature.  But let's
do this later.
...
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
Torperf should not accumulate results from remote Torperf service
instances. If by "accumulate", you mean read another file from
/results which the *user* has downloaded, then yes. Torperf shouldn't
*download* result files from remote instances.
Why not?  The alternative is to build another tool that downloads result
files from remote instances.  That's what we do right now (see footnote:
"For reference, the current Torperf produces measurement results which
are re-formatted by metrics-db and visualized by metrics-web with help
of metrics-lib.  Any change to Torperf triggers subsequent changes to
the other three codebases, which is suboptimal.")
This could just be a wget script that downloads the results from
another server. I just don't want that to be a part of torperf.
Torperf should just measure performance and display data, IMHO -- not
worry about downloading and aggregating results from another system.
Or maybe we can do this later and change it to "Ideally torperf should
.."
This isn't the most urgent feature to build, though we need it before we
can kill the current Torperf and replace it with the new one.  However,
using wget to download results from another service is exactly the
approach that brought us to the current situation of Torperf being a
bunch of scripts.  I'd rather not write a single script for Torperf to
do what it's supposed to do, but instead design it so that it can
already do all the things we want it to do.  Accumulating results and
presenting them is part of these things.
...
The new Torperf should come with an easy-to-use library to process its results
Torperf results should just be JSON (or similar) files that already
have libraries; we shouldn't invent a new result format and write a
library for it.
Yes, that's what I mean.  If you understood this differently, can you
rephrase the paragraph?
"Torperf should store its results in a format that is widely used and
already has libraries (like JSON), so that other applications can use
the results and build on them". Maybe?
Changed.  (See also the footnote which I put in a few hours ago.)
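To illustrate the point about using a widely supported format, here is a small Python sketch that writes and reads one result with the standard json module. The field names and values are made up for illustration; the thread does not define a result schema.

```python
import json


def encode_result(result):
    """Serialize one measurement result as a single JSON line."""
    return json.dumps(result, sort_keys=True)


def decode_result(line):
    """Parse a result line back into a dict; any JSON library can do this."""
    return json.loads(line)


# Hypothetical result record; every field name here is an assumption.
example = {
    "experiment": "static_file_download",
    "tor_binary": "/usr/bin/tor",
    "filesize_bytes": 51200,
    "start": "2013-09-24T12:00:00Z",
    "duration_seconds": 1.84,
    "status": "completed",
}
```

Because the format is plain JSON, a consumer in another language needs no Torperf-specific library to read the results.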
...
request scheduler Start new requests following a previously configured schedule.
request runner Handle a single request from creation over various possible sub states to timeout, failure, or completion.
These are experiment specific. Some tests may not even need to do
requests. No need for these to be a part of torperf.
I'm thinking how we can reduce code duplication as much as possible.
The experiments in the design document all make requests, so it would be
beneficial for them to have Torperf schedule and handle their requests.
If an experiment doesn't have the notion of a request, it doesn't have to
use the request scheduler or runner.  But how would such an experiment
work?  Do you have an example?
Nope I don't have an example. Maybe as I write the tests, I'll have a
better idea about the structure. Ignore this comment for now!
Okay.
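For reference, a minimal Python sketch of the two shared components discussed above, a scheduler and a runner. All names are hypothetical. As a simplification, the runner classifies a slow request as timed out after it returns instead of interrupting it, which a real implementation would need to do.

```python
import enum
import time


class RequestOutcome(enum.Enum):
    """Terminal states of a single request."""
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


def run_request(fetch, timeout_seconds, clock=time.monotonic):
    """Run one request and return (outcome, elapsed_seconds).

    `fetch` is any callable that performs the request and raises on
    failure. The clock is injectable so the logic can be tested.
    """
    start = clock()
    try:
        fetch()
    except Exception:
        return RequestOutcome.FAILED, clock() - start
    elapsed = clock() - start
    if elapsed > timeout_seconds:
        return RequestOutcome.TIMED_OUT, elapsed
    return RequestOutcome.COMPLETED, elapsed


def due_times(start, interval_seconds, count):
    """Return the start times of `count` requests on a fixed schedule."""
    return [start + i * interval_seconds for i in range(count)]
```

An experiment without a notion of requests would simply not call either function.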
...
results database Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Rephrased to data store.  I still think a database makes sense here, but
this is not a requirement.  As long as we can store, retrieve, and
periodically delete results, everything's fine.
Cool!
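The three operations named above (store, retrieve, periodically delete) can be sketched in Python with the standard sqlite3 module. This is only one possible backing store, chosen here for illustration; the class name, table layout, and method names are assumptions, not part of the design document.

```python
import json
import sqlite3
import time


class ResultStore:
    """Hypothetical data store: store, retrieve, and expire results."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "id INTEGER PRIMARY KEY, experiment TEXT NOT NULL, "
            "created REAL NOT NULL, body TEXT NOT NULL)")

    def store(self, experiment, result, created=None):
        created = time.time() if created is None else created
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO results (experiment, created, body) "
                "VALUES (?, ?, ?)",
                (experiment, created, json.dumps(result)))

    def retrieve(self, experiment):
        rows = self.conn.execute(
            "SELECT body FROM results WHERE experiment = ? "
            "ORDER BY created", (experiment,))
        return [json.loads(body) for (body,) in rows]

    def delete_older_than(self, max_age_seconds, now=None):
        """Delete results older than the cutoff; return how many."""
        now = time.time() if now is None else now
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM results WHERE created < ?",
                (now - max_age_seconds,))
        return cur.rowcount
```

A plain directory of JSON files could offer the same three methods, so experiments would not need to know which store is in use.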
...
Again, thanks a lot for your input!
Updated PDF:
https://people.torproject.org/~karsten/volatile/torperf2.pdf
Great, thanks!
Updated the PDF again, same URL.  Thanks!

All the best,
Karsten