[tor-dev] Torperf implementation considerations

Kevin haqkrs at gmail.com
Wed Sep 25 20:30:46 UTC 2013


Hi Karsten, Sathya,

Sorry for the delayed response; I've been having connection issues all
week on anything other than a phone ~_~. I've also included updates from
Sathya's later mails below, with additional comments added.

> I don't see how we could make new experiments language agnostic.  These
> new experiments will want to re-use Torperf classes/components, which
> they can't do if they're just "shell scripts".  They need to implement
> an interface provided by Torperf, and Torperf needs to provide
> functionality via an interface that the experiments understand.  If an
> experiment is in fact just a shell script, it should be trivial to write
> a wrapper class to call that.  But I think that's the exception, not the
> rule.
> 
> Or maybe we have a different understanding of an "experiment".  Can you
> give an example for an experiment that is not listed in Section 3 of the
> design document and state how you'd want to integrate that into Torperf
> without touching its sources?
I don't think this is such a hard thing to achieve if you consider the
Alexa & Selenium experiment. We're going to need a basic HTTP proxy
implementation inside the Torperf service (proxying to the SOCKS proxy
of a specific tor version, as specified by the experiment's config).

If you imagine this, the HTTP proxy is already an interface that handles
most of the required logic (recording the appropriate HTTP and SOCKS
timings, SOCKS ports, tor version, start time, etc.), so the Selenium
client is really only responsible for its unique data.

Then, assuming the result format is not hard to produce (as you've
specified it currently, it should be simple), getting language-agnostic
experiments would not be difficult.
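To illustrate, here's a minimal sketch (purely hypothetical; the class
name, environment variables, and result fields are my own inventions,
not from the design document) of how a thin wrapper could turn an
arbitrary shell script into an experiment, with the Torperf-provided
HTTP proxy doing the timing work:

```python
import os
import subprocess
import time

class ScriptExperiment:
    """Hypothetical wrapper that runs any external script as an experiment.

    Torperf would pass the address of its timing HTTP proxy via
    environment variables; the script only reports its own unique data
    on stdout, and Torperf records timings on the proxy side.
    """

    def __init__(self, script_path, proxy_host, proxy_port):
        self.script_path = script_path
        self.proxy_host = proxy_host
        self.proxy_port = proxy_port

    def run(self):
        # Hand the proxy address to the script via the environment.
        env = dict(os.environ,
                   TORPERF_PROXY_HOST=self.proxy_host,
                   TORPERF_PROXY_PORT=str(self.proxy_port))
        started = time.time()
        proc = subprocess.run(["/bin/sh", self.script_path],
                              env=env, capture_output=True, text=True)
        # The script's stdout is its experiment-specific result data.
        return {
            "time_started": started,
            "exit_status": proc.returncode,
            "experiment_data": proc.stdout.strip(),
        }
```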

A concrete example that differs from Section 3: to change the Alexa
experiment to fetch, say, the top 5 sites in France, it should hopefully
be trivial to just change a text file and be done with it, instead of
having to be familiar with Python/whatever.
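For example, the experiment's URL list could live in a plain text file
like this (filename and URLs are entirely made up):

```
# urls.txt -- one URL per line; edit and restart, no code changes needed
https://example.fr/
https://example.org/
```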

That said, it is an 'ideally we could' kind of point, so it's not a
blocker if we don't aim for it. Either way, users will be free to hack
on whatever experiments they like, so I'm sure it won't be hard for them
to do the above by hacking on the final implementation :) The real users
are likely technically adept, I guess!(?)

> I also added an Appendix A with suggested data formats.  Maybe these
> data formats make it clearer what I'd expect as output from an experiment.
This is great, thanks for this. We're thinking along the same lines for
that experiment at least :) I think it would be useful to also add
desired/required information for the other experiments as we progress,
as it can definitely help clarify what is required from each implementation.

>>> I agree with you that this is a rather unusual requirement and that
>>> adding new experiments to Torperf is the better approach.  That's why
>>> the paragraph said "should" and "ideally".  I added your concerns to the
>>> design document to make this clearer.  (Maybe we should mark
>>> requirements as either "must-do", "should-do", or "could-do"?)
>>
>> Well, "ideally" implies that we want to do this at some point. Do we?
> 
> I don't feel strongly.  I'd prefer a design that makes it easy to add
> new experiments, but I'm fine with an approach that requires merging
> patches.  We can always add the functionality to drop something in a
> directory and make Torperf magically detect and run the new experiment,
> but that can happen later.  Maybe we shouldn't let that distract us from
> getting the first version done.  I commented out this section.
I think the magically-detect-and-run part can definitely be left for the
future, but installation should still be this easy.

Surely it's just as easy to implement detecting new experiments on
service startup as to implement not doing that. (While still somehow
allowing experiments to be added... or is this implying hardcoded
experiments?)

Also, perhaps you don't want to support this, but how does the
patch-and-merge system work for quick deployments of short-lived
experiments? (Is there ever such a thing? Karsten?)
Or what if someone does develop a neat set of experiments for their own
personal use that doesn't really apply to the project as a whole, are we
expected to merge them upstream? What if they don't want to share?

>>>>> It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
>>>>
>>>> I don't think we need this now. I'm totally ok with having users run
>>>> different torperf instances for different tor versions.
>>>
>>> Running multiple Torperf instances has disadvantages that I'm not sure
>>> how to work around.  For example, we want a single web server listening
>>> on port 80 for all experiments and for providing results.
>>
>> Oh. I did not mean running multiple torperf instances
>> *simultaneously*; I just meant sequentially.
> 
> But what if we want to run multiple experiments at the same time?
> That's a quite common requirement.  Right now, we run 3 Torperf
> experiments on ferrinii at the same time.  A few years ago, we ran 15
> experiments with tors using different guard selection strategies and
> downloading different file sizes.

I disagree with removing this requirement.

>>> Why do you think it's hard to run different tor versions or binaries in
>>> the same Torperf service instance?
>>
>> Then each experiment needs to deal with locating, bootstrapping, and
>> shutting down Tor. We could just run a torperf test against a
>> particular tor version, once that's completed, we can run against
>> another tor version and so on. I'm not against this idea -- it can be
>> done. I just don't think it's high priority.
> 
> Torperf should help with bootstrapping and shutting down tor, because
> that's something that all experiments need.  Locating tor could just be
> a question of passing the path to a tor binary to Torperf.  See above
> for sequential vs. parallel experiments.
Locating Tor should just be settings in the Torperf config, e.g.:
{ ... "tor_versions": { "0.X.X": "/Path/to/0/x/x/" } ... }

Is there a requirement to run the *same* experiment across different Tor
versions at the same time (literally in parallel), or just to have "I as
a user set this up to run for versions X, Y, Z, ran it once, and got all
my results"?

I think this is what Sathya is saying with:
> We could just run a torperf test against a particular tor version, 
> once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated to
it at any time, and it does its versioned runs sequentially.
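A rough sketch of what I mean, assuming hypothetical start_tor/stop_tor
helpers (the real bootstrap/shutdown logic would live in Torperf, and
these names are mine, not from the design document):

```python
from collections import namedtuple

TorProcess = namedtuple("TorProcess", ["socks_port"])

def start_tor(binary_path):
    # Placeholder: would launch the given binary and wait for bootstrap.
    return TorProcess(socks_port=9050)

def stop_tor(tor):
    # Placeholder: would terminate the tor process cleanly.
    pass

def run_versioned(run_experiment, tor_versions):
    """Run one experiment against each configured tor version in turn.

    Only one Tor instance is allocated to the experiment at any time;
    versions are handled sequentially, not in parallel.
    """
    results = []
    for version, binary_path in tor_versions.items():
        tor = start_tor(binary_path)
        try:
            result = run_experiment(tor.socks_port)
            result["tor_version"] = version
            results.append(result)
        finally:
            stop_tor(tor)  # shut down before moving to the next version
    return results
```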

I think the discussion above is talking about two different things; it
would be beneficial to decide what actually needs to be parallel and
what just needs to be a one-time setup for a user.

Are there any concerns around parallel requests causing noise in the
timing information? Or are we happy to live with a small 1-2(?)ms noise
level per experiment in order to benefit from faster experiment runtimes
in aggregate?

On that, can we be clear with our vocabulary, "Torperf tests" means
"Torperf experiments", right?

>>>>> It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
>>>>
>>>> IMHO, torperf should just measure performance, not download Tor or
>>>> verify signatures. We have good package managers that do that already.
>>>
>>> Ah, we don't just want to measure packaged tors.  We might also want to
>>> measure older versions which aren't contained in package repositories
>>> anymore, and we might want to measure custom branches with performance
>>> tweaks.  Not sure if we actually want to verify signatures of tor versions.
>>>
>>> I think we should take Shadow's approach (or something similar).  Shadow
>>> can download a user-defined tor version ('--tor-version'), or it can
>>> build a local tor path ('--tor-prefix'):
>>
>> If the user wants to run torperf against tor versions that are not
>> present in the package managers, then the user should download and
>> build tor -- not torperf. Once a local binary is present, the user can
>> run torperf against it with a --tor prefix.
> 
> It's perfectly fine if the first version only supports a '--tor-binary'
> option and leaves downloading and building of custom tor versions to the
> user.  Of course, Torperf should be able to support the default tor
> binary that comes with the operating system for non-expert users.  But
> supporting a '--tor-version' option that downloads and builds a tor
> binary can come in version two.  I tried to describe this approach in
> the design document.  (Please rephrase as needed.)
> 
>>> https://github.com/shadow/shadow/blob/master/setup#L109
>>>
>>> Do you see any problems with this?
>>
>> Nope, this is perfectly fine. I just don't want torperf to download,
>> verify and build tor.
> 
> Perfectly fine to ignore for now.  It's not a crazy feature.  But let's
> do this later.

Agreed on this: no need to do downloading / verifying / installing Tor
in initial releases; it's likely a huge PITA.
But I think we should have the tor binary locations listed in the config
rather than in a command-line flag. (Listing multiple Tor versions via
command-line flags seems a lot more error-prone to me.)

>>>>> A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
>>>>
>>>> Torperf should not accumulate results from remote Torperf service
>>>> instances. If by "accumulate", you mean read another file from
>>>> /results which the *user* has downloaded, then yes. Torperf shouldn't
>>>> *download* result files from remote instances.
>>>
>>> Why not?  The alternative is to build another tool that downloads result
>>> files from remote instances.  That's what we do right now (see footnote:
>>> "For reference, the current Torperf produces measurement results which
>>> are re-formatted by metrics-db and visualized by metrics-web with help
>>> of metrics-lib.  Any change to Torperf triggers subsequent changes to
>>> the other three codebases, which is suboptimal.")
>>
>> This could just be a wget script that downloads the results from
>> another server. I just don't want that to be a part of torperf.
>> Torperf should just measure performance and display data, IMHO -- not
>> worry about downloading and aggregating results from another system.
>> Or maybe we can do this later and change it to "Ideally torperf should
>> .."
> 
> This isn't the most urgent feature to build, though we need it before we
> can kill the current Torperf and replace it with the new one.  However,
> using wget to download results from another service is exactly the
> approach that brought us to the current situation of Torperf being a
> bunch of scripts.  I'd rather not want to write a single script for
> Torperf to do what it's supposed to do, but design it in a way that it
> can already do all the things we want it to do.  Accumulating results
> and presenting them is part of these things.

"Torperf should just measure performance and display data", displaying
aggregate data is displaying data! :P

But surely, if Torperf achieves this just by wget'ing stuff, and the
user doesn't have to worry about anything other than setting a remote
server and a polling interval, that would be considered done? (Torperf
handles the scheduling and managing of the data files.)
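As a sketch of what that could look like inside Torperf itself (the
/results URL layout and the per-record "id" field are assumptions on my
part, not from the design document):

```python
import json
import urllib.request

def fetch_results(url):
    """Download a remote instance's results, assumed to be a JSON list
    served from something like http://remote/results."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def merge_results(store, records):
    """Merge fetched records into the local store, deduplicating by a
    hypothetical unique 'id' field so repeated polls are idempotent."""
    for record in records:
        store.setdefault(record["id"], record)
    return store
```

Torperf's scheduler would then just call fetch_results on each
configured remote at the configured interval and merge into the local
data store.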

>>>>> results database Store request details, retrieve results, periodically delete old results if configured.
>>>>
>>>> Not sure if we really need a database. These tests look pretty simple to me.
>>>
>>> Rephrased to data store.  I still think a database makes sense here, but
>>> this is not a requirement.  As long as we can store, retrieve, and
>>> periodically delete results, everything's fine.
>>>
>>
>> Cool!

I don't think we need a database for the actual results (but a flat file
structure is just a crap database! :). I do, however, think that once we
start to provide the data visualisation aspects, we will need a database
to get good performance on queries that are more than simple listings.

regards,
Kevin



