[tor-dev] Torperf implementation considerations

Karsten Loesing karsten at torproject.org
Tue Sep 24 13:11:48 UTC 2013

On 9/24/13 1:39 PM, Sathyanarayanan Gunasekaran wrote:
> Hi.
> On Tue, Sep 24, 2013 at 6:03 AM, Karsten Loesing <karsten at torproject.org> wrote:
>> On 9/23/13 12:53 AM, Sathyanarayanan Gunasekaran wrote:
>>> I don't understand how this will work when users just apt-get install
>>> torperf. Ideally if someone writes a good experiment, they should send
>>> the patches upstream and get it merged, and then we update torperf to
>>> include those tests and then the users just update torperf with their
>>> package managers.
>> I agree with you that this is a rather unusual requirement and that
>> adding new experiments to Torperf is the better approach.  That's why
>> the paragraph said "should" and "ideally".  I added your concerns to the
>> design document to make this clearer.  (Maybe we should mark
>> requirements as either "must-do", "should-do", or "could-do"?)
> Well, "ideally" implies that we want to do this at some point. Do we?

I don't feel strongly.  I'd prefer a design that makes it easy to add
new experiments, but I'm fine with an approach that requires merging
patches.  We can always add the functionality to drop something in a
directory and make Torperf magically detect and run the new experiment,
but that can happen later.  Maybe we shouldn't let that distract us from
getting the first version done.  I commented out this section.

>>>> It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
>>> I don't think we need this now. I'm totally ok with having users run
>>> different torperf instances for different tor versions.
>> Running multiple Torperf instances has disadvantages that I'm not sure
>> how to work around.  For example, we want a single web server listening
>> on port 80 for all experiments and for providing results.
> Oh. I did not mean running multiple torperf instances
> *simultaneously*; I just meant sequentially.

But what if we want to run multiple experiments at the same time?
That's quite a common requirement.  Right now, we run 3 Torperf
experiments on ferrinii at the same time.  A few years ago, we ran 15
experiments with tors using different guard selection strategies and
downloading different file sizes.

>> Why do you think it's hard to run different tor versions or binaries in
>> the same Torperf service instance?
> Then each experiment needs to deal with locating, bootstrapping, and
> shutting down Tor. We could just run a torperf test against a
> particular tor version; once that's completed, we can run against
> another tor version and so on. I'm not against this idea -- it can be
> done. I just don't think it's high priority.

Torperf should help with bootstrapping and shutting down tor, because
that's something that all experiments need.  Locating tor could just be
a question of passing the path to a tor binary to Torperf.  See above
for sequential vs. parallel experiments.
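To make the shared bootstrap/teardown idea concrete, here is a rough
sketch in plain Python (the class name, torrc options, and default ports
are my invention for illustration, not Torperf's actual API):

```python
# Illustrative sketch: one helper that all experiments could share for
# locating, bootstrapping, and shutting down a tor process.
import subprocess
import tempfile

class TorProcess:
    """Launch and tear down one tor instance for an experiment."""

    def __init__(self, tor_binary="tor", socks_port=9050, control_port=9051):
        self.tor_binary = tor_binary          # path passed in by the user
        self.socks_port = socks_port
        self.control_port = control_port
        self.data_dir = tempfile.mkdtemp(prefix="torperf-")
        self.proc = None

    def command(self):
        # Every experiment needs the same bootstrap options; only the
        # ports and data directory differ per instance, which is what
        # lets several experiments run in parallel.
        return [self.tor_binary,
                "--SocksPort", str(self.socks_port),
                "--ControlPort", str(self.control_port),
                "--DataDirectory", self.data_dir]

    def start(self):
        self.proc = subprocess.Popen(self.command())

    def stop(self):
        if self.proc is not None:
            self.proc.terminate()
            self.proc.wait()

# Two instances with distinct ports and binaries can run side by side:
a = TorProcess(socks_port=9050, control_port=9051)
b = TorProcess(tor_binary="/opt/tor-0.2.3/bin/tor",
               socks_port=9060, control_port=9061)
print(a.command()[0], b.command()[0])
```

With a wrapper like this, "locating tor" really is just the path handed
to the constructor, and parallel experiments only need distinct ports.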

>>>> It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
>>> IMHO, torperf should just measure performance, not download Tor or
>>> verify signatures. We have good package managers that do that already.
>> Ah, we don't just want to measure packaged tors.  We might also want to
>> measure older versions which aren't contained in package repositories
>> anymore, and we might want to measure custom branches with performance
>> tweaks.  Not sure if we actually want to verify signatures of tor versions.
>> I think we should take Shadow's approach (or something similar).  Shadow
>> can download a user-defined tor version ('--tor-version'), or it can
>> build a local tor path ('--tor-prefix'):
> If the user wants to run torperf against tor versions that are not
> present in the package managers, then the user should download and
> build tor -- not torperf. Once a local binary is present, the user can
> run torperf against it with a '--tor-prefix' option.

It's perfectly fine if the first version only supports a '--tor-binary'
option and leaves downloading and building of custom tor versions to the
user.  Of course, Torperf should be able to support the default tor
binary that comes with the operating system for non-expert users.  But
supporting a '--tor-version' option that downloads and builds a tor
binary can come in version two.  I tried to describe this approach in
the design document.  (Please rephrase as needed.)
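To sketch what that option surface might look like (the option name
'--tor-binary' is taken from this thread; the argparse wiring and
default are my assumption):

```python
# Hedged sketch of the proposed command line, not Torperf's actual CLI.
import argparse

parser = argparse.ArgumentParser(prog="torperf")
parser.add_argument("--tor-binary", default="tor",
                    help="path to a tor binary (defaults to the system tor, "
                         "so non-expert users need not pass anything)")
# A '--tor-version' flag that downloads and builds tor could be added
# in version two without changing any existing invocations.
args = parser.parse_args(["--tor-binary", "/usr/local/bin/tor"])
print(args.tor_binary)
```

The point of the sketch is that the version-two feature is purely
additive: a new flag, no change to how version one is invoked.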

>> https://github.com/shadow/shadow/blob/master/setup#L109
>> Do you see any problems with this?
> Nope, this is perfectly fine. I just don't want torperf to download,
> verify and build tor.

Perfectly fine to ignore for now.  It's not a crazy feature.  But let's
do this later.

>>>> A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
>>> Torperf should not accumulate results from remote Torperf service
>>> instances. If by "accumulate", you mean read another file from
>>> /results which the *user* has downloaded, then yes. Torperf shouldn't
>>> *download* result files from remote instances.
>> Why not?  The alternative is to build another tool that downloads result
>> files from remote instances.  That's what we do right now (see footnote:
>> "For reference, the current Torperf produces measurement results which
>> are re-formatted by metrics-db and visualized by metrics-web with help
>> of metrics-lib.  Any change to Torperf triggers subsequent changes to
>> the other three codebases, which is suboptimal.")
> This could just be a wget script that downloads the results from
> another server. I just don't want that to be a part of torperf.
> Torperf should just measure performance and display data, IMHO -- not
> worry about downloading and aggregating results from another system.
> Or maybe we can do this later and change it to "Ideally torperf should
> .."

This isn't the most urgent feature to build, though we need it before we
can kill the current Torperf and replace it with the new one.  However,
using wget to download results from another service is exactly the
approach that brought us to the current situation of Torperf being a
bunch of scripts.  I'd rather not write yet another one-off script to
do what Torperf is supposed to do, but instead design it so that it can
already do all the things we want it to do.  Accumulating results and
presenting them is one of those things.

>>>> The new Torperf should come with an easy-to-use library to process its results
>>> Torperf results should just be JSON (or similar) files that already
>>> have libraries, and we shouldn't invent a new result format and
>>> write a library for it.
>> Yes, that's what I mean.  If you understood this differently, can you
>> rephrase the paragraph?
> "Torperf should store its results in a format that is widely used and
> already has libraries(like JSON), so that other applications can use
> the results and build on it". Maybe?

Changed.  (See also the footnote which I put in a few hours ago.)
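As an illustration only, a single result serialized as one JSON line
could look like this (the field names are invented for the example, not
the design document's schema):

```python
# Hedged sketch: one measurement result as a JSON record.  Any language
# with a JSON library can consume this without a Torperf-specific parser.
import json

result = {
    "experiment": "static-file-download",   # illustrative field names
    "filesize": 51200,
    "start": "2013-09-24T13:11:48Z",
    "datacomplete_seconds": 2.41,
    "failure": None,
}
line = json.dumps(result, sort_keys=True)   # one result per line
parsed = json.loads(line)                   # round-trips without custom code
print(parsed["experiment"])
```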

>>>> request scheduler Start new requests following a previously configured schedule.
>>>> request runner Handle a single request from creation over various possible sub states to timeout, failure, or completion.
>>> These are experiment specific. Some tests may not even need to do
>>> requests. No need for these to be a part of torperf.
>> I'm thinking how we can reduce code duplication as much as possible.
>> The experiments in the design document all make requests, so it would be
>> beneficial for them to have Torperf schedule and handle their requests.
>>  If an experiment doesn't have the notion of a request, it doesn't have to
>> use the request scheduler or runner.  But how would such an experiment
>> work?  Do you have an example?
> Nope I don't have an example. Maybe as I write the tests, I'll have a
> better idea about the structure. Ignore this comment for now!


>>>> results database Store request details, retrieve results, periodically delete old results if configured.
>>> Not sure if we really need a database. These tests look pretty simple to me.
>> Rephrased to data store.  I still think a database makes sense here, but
>> this is not a requirement.  As long as we can store, retrieve, and
>> periodically delete results, everything's fine.
> Cool!
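To make the store/retrieve/periodically-delete requirement concrete,
here is a sketch using stdlib sqlite3 (the schema, table name, and
retention policy are my assumptions, not part of the design document):

```python
# Hedged sketch of the "data store" component: store request details,
# retrieve results, and periodically delete old results if configured.
import sqlite3
import time

class ResultStore:
    def __init__(self, path=":memory:", retention_seconds=30 * 86400):
        self.conn = sqlite3.connect(path)
        self.retention_seconds = retention_seconds
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(started REAL, experiment TEXT, details TEXT)")

    def store(self, started, experiment, details):
        self.conn.execute("INSERT INTO results VALUES (?, ?, ?)",
                          (started, experiment, details))

    def retrieve(self, experiment):
        cur = self.conn.execute(
            "SELECT started, details FROM results WHERE experiment = ?",
            (experiment,))
        return cur.fetchall()

    def expire(self, now=None):
        # Periodically delete results older than the configured retention.
        now = time.time() if now is None else now
        self.conn.execute("DELETE FROM results WHERE started < ?",
                          (now - self.retention_seconds,))

store = ResultStore()
store.store(1000.0, "50kb-download", '{"datacomplete": 2.4}')
store.store(2000.0, "50kb-download", '{"datacomplete": 2.6}')
store.expire(now=1500.0 + store.retention_seconds)  # drops the 1000.0 row
print(store.retrieve("50kb-download"))
```

Swapping sqlite for flat files would keep the same three operations, so
the choice of backend really can stay a non-requirement.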
>> Again, thanks a lot for your input!
>> Updated PDF:
>> https://people.torproject.org/~karsten/volatile/torperf2.pdf
> Great, thanks!

Updated the PDF again, same URL.  Thanks!

All the best,
