[tor-dev] Torperf implementation considerations

Tue Oct 1 01:03:48 UTC 2013

Hi Karsten, Sathya,

Hope you've both had great weekends. Please see inline!

> Want to help define the remaining data formats?  I think we need these
> formats:
> 
> - file_upload would be quite similar to file_download, but for the GET
> POST performance experiment.  Or maybe we can generalize file_download
> to cover either GET or POST requests and the respective timings.
> 
> - We'll need a document for hidden_service_request that does not only
> contain timings, but also references the client-side circuit used to
> fetch the hidden service descriptor, rendezvous circuit, and
> introduction circuit, and server-side introduction and rendezvous circuits.
> 
> - These data formats are all for fetching/posting static files.  We
> should decide on a data format for actual website fetches.  Rob van der
> Hoeven suggested HAR, which I included in a footnote.  So, maybe we
> should extend HAR to store the tor-specific stuff, or we should come up
> with something else.
> 
> - Are there any other data formats missing?
> 

I think extending the HAR format (with minimal changes really, it's
already reasonably generic) would be a good fit for the real fetches
indeed. Do you feel HAR is overkill for the others?

I think it wouldn't be such a bad idea to use it for all of them;
perhaps this could be a future requirement if not an initial one. (E.g.
for static_file_downloads the experiment would be named in 'creator',
and the file would have multiple 'entries', each representing a static
file, with our own fields added: filesize, tor_version, etc.)
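A rough sketch of what such a file could look like, written in Python
for illustration. The top-level log/creator/entries layout follows HAR
1.2, and HAR asks that custom fields start with an underscore; the
specific names _filesize and _tor_version, the example URLs, and the
input dict layout are assumptions of mine, and the entries are heavily
abbreviated compared to a complete HAR entry.

```python
import json


def make_static_download_har(tor_version, downloads):
    """Build an abbreviated HAR-like dict, one entry per static file.

    Underscore-prefixed keys are custom additions (hypothetical names).
    """
    entries = []
    for d in downloads:
        entries.append({
            "startedDateTime": d["started"],
            "time": d["total_ms"],
            "request": {"method": "GET", "url": d["url"]},
            "response": {"status": d["status"], "bodySize": d["filesize"]},
            "_filesize": d["filesize"],
            "_tor_version": tor_version,
        })
    return {
        "log": {
            "version": "1.2",
            "creator": {"name": "static_file_downloads", "version": "0.0.1"},
            "entries": entries,
        }
    }


har = make_static_download_har("0.2.4.x", [
    {"started": "2013-10-01T01:00:00Z", "url": "http://example.com/50kib",
     "total_ms": 1234, "status": 200, "filesize": 51200},
    {"started": "2013-10-01T01:05:00Z", "url": "http://example.com/1mib",
     "total_ms": 9876, "status": 200, "filesize": 1048576},
])
print(json.dumps(har, indent=2))
```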

There are a number of perks with using HAR:
- TBB probably already knows how to record .HAR files so the selenium
work would basically just be to open a browser and record a few
navigations to .HAR (I know Chrome can do this easily, so I'm assuming
our TBB version of Firefox is also capable)

- We can benefit from any tooling built around HAR, either to
statistically analyse, or to provide visualisation.

- There is a decent amount of research around HAR compression
(although it basically seems to just be gzipping), but if we can support
compressed HAR then we can allow servers to store a lot more history.

There is also the negative that the HAR files will probably provide *too
much* data, but we could probably prune the files before archiving them
or as a stage before total deletion.
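As a sketch of the prune-then-compress idea, assuming we only care
about timing fields plus our own underscore-prefixed additions: the set
of keys to keep, the function names and the file name are all
hypothetical, and the compression is plain gzip from the Python
standard library.

```python
import gzip
import json
import os
import tempfile

# Hypothetical choice of standard entry fields worth archiving.
KEEP_ENTRY_KEYS = {"startedDateTime", "time", "timings"}


def prune_har(har):
    """Return a copy keeping only timing fields and custom (_-prefixed) ones."""
    pruned_entries = [
        {k: v for k, v in entry.items()
         if k in KEEP_ENTRY_KEYS or k.startswith("_")}
        for entry in har["log"]["entries"]
    ]
    return {"log": dict(har["log"], entries=pruned_entries)}


def archive_har(har, path):
    """Write the HAR as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(har, f)


def load_har(path):
    """Read a gzip-compressed HAR back into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


sample = {"log": {
    "version": "1.2",
    "creator": {"name": "torperf", "version": "0.0.1"},
    "entries": [{
        "startedDateTime": "2013-10-01T01:00:00Z",
        "time": 1234,
        "response": {"content": {"text": "x" * 1000}},  # bulky, dropped
        "_tor_version": "0.2.4.x",
    }],
}}
pruned = prune_har(sample)
path = os.path.join(tempfile.mkdtemp(), "alexa.har.gz")
archive_har(pruned, path)
restored = load_har(path)
```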

While spending some time implementing these things, I have been playing
with a faked alexa experiment, and I think the HAR format, or at least
something that allows for multiple entries per experiment result set, is
necessary for all our experiments (even static file).

I'll get back to you regarding these data formats in the future, when I
have time to actually look at what the other experiments need. (I've
mainly focused on alexa & static downloading so far.)

>> I think the magically detect and run part can definitely be left for
>> future, but installation should still be this easy.
>>
>> Surely it's just as easy to implement detecting new experiments on
>> service startup as to implement not doing that. (while still somehow
>> allowing experiments to be added... is this implying hardcoded experiments?)
> 
> I guess experiment types will be hard-coded, but experiment instances
> will be configurable.
> 
>> Also, perhaps you don't want to support this, but how does the patch and
>> merge system work for quick deployments of short lived experiments? (Is
>> there ever such a thing? Karsten?)
> 
> Yes, there might be such a thing as short-lived experiments.  We'd
> probably commit such a patch to a separate branch and decide after the
> experiment if it's worth adding the experiment to the master branch.
> 
>> Or what if someone does develop a neat set of experiments for their own
>> personal use that doesn't really apply to the project as a whole, are we
>> expected to merge them upstream? What if they don't want to share?
> 
> I think we should only merge experiments that are general enough for
> others to run.
> 

This doesn't entirely address my concern. If upstream branches are made
for short-lived experiments (rather than just sharing a folder between
people), how do the users install that? (They would presumably have
installed via apt-get initially, not from git/svn.)

And on non-general or non-shared experiments, the answer skips the
issue: how will their authors distribute them to whoever needs to run
them? Through their own git repo infrastructure? (But of course I agree
we should only upstream general things.)

>>> Torperf should help with bootstrapping and shutting down tor, because
>>> that's something that all experiments need.  Locating tor could just be
>>> a question of passing the path to a tor binary to Torperf.  See above
>>> for sequential vs. parallel experiments.
>> Locating Tor should just be settings in 'the Torperf config'.
>> { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
>>
>> Is there requirement to run the *same* experiment across different Tor
>> versions at the same time (literally parallel) or just to have "I as a
>> user set this up to run for X,Y,Z versions and ran it one time and got
>> all my results."?
> 
> You would typically not run an experiment a single time, but set it up
> to run for a few days.  And you'd probably set up parallel experiments
> to start with a short time offset.  (Not sure if this answers your
> question.)
> 
I messed up the question with that single word; I'll address this below.
I meant 'ran torperf once' and 'got my results periodically as the
schedule defines'.

>> I think this is what Sathya is saying with:
>>> We could just run a torperf test against a particular tor version, 
>>> once that's completed, we can run against another tor version and so on.
>> i.e. for each experiment, there's only one instance of Tor allocated for
>> it at any time, and it does it's versioned runs sequentially.
> 
> For each experiment there's one tor instance running a given version.
> You wouldn't stop, downgrade/upgrade, and restart the tor instance while
> the experiment is running.  If you want to run an experiment on
> different tor versions, you'd start multiple experiments.  For example:
> 
> 1. download 50KiB static file, use tor 0.2.3.x on socks port 9001, start
> every five minutes starting at :01 of the hour.
> 
> 2. download 50KiB static file, use tor 0.2.4.x on socks port 9002, start
> every five minutes starting at :02 of the hour.
> 
> 3. download 50KiB static file, use tor 0.2.5.x on socks port 9003, start
> every five minutes starting at :03 of the hour.
> 
I don't think it's a good idea to require humans to define things this
specifically. They should be able to just define the 50KiB (and probably
more file sizes) download with a five-minute interval for versions x, y,
z. (I also don't think the user should define the socks port to use, but
that's a minor detail.)
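To illustrate, the user-facing config could be expanded into concrete
experiment instances by the service, something like the Python sketch
below. The function name, the dict keys, the binary paths and the
choice of 9001 as the first socks port are all assumptions for the
example, not an agreed format.

```python
from itertools import product


def expand_experiments(file_sizes, interval_minutes, tor_versions,
                       first_socks_port=9001):
    """Turn one short config into one instance per (file size, tor version).

    tor_versions maps a version string to the path of that tor binary.
    Socks ports are assigned automatically, one per instance.
    """
    instances = []
    combos = product(file_sizes, sorted(tor_versions))
    for index, (size, version) in enumerate(combos):
        instances.append({
            "file_size": size,
            "tor_version": version,
            "tor_path": tor_versions[version],
            "interval_minutes": interval_minutes,
            "socks_port": first_socks_port + index,
        })
    return instances


instances = expand_experiments(
    file_sizes=["50KiB", "1MiB"],
    interval_minutes=5,
    tor_versions={
        "0.2.3.x": "/path/to/tor-0.2.3.x",  # hypothetical paths
        "0.2.4.x": "/path/to/tor-0.2.4.x",
    },
)
for instance in instances:
    print(instance)
```

Here two file sizes and two versions give four instances without the
user writing out each one or picking a port.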

I think you've answered my question here though. I'll summarise below!

>> I think the discussion above is talking about two different things, I
>> think it would be beneficial to decide what needs to be actually
>> parallel and what just needs to be one-time setup for a user.
>>
>> Are there any concerns around parallel requests causing noise in the
>> timing information? Or are we happy to live with a small 1-2(?)ms noise
>> level per experiment in order to benefit from faster experiment runtimes
>> in aggregate?
> 
> Not sure which of these questions are still open.  We should definitely
> get this clear in the design document.  What would we write, and where
> in the document would we put it?
> 
Perhaps we can cover this near 2.1.2 "If the service operator wants to
run multiple experiments...."

I think you've defined above that no experiment should be run at the
same time as another, i.e. the main service should not be executing code
for different experiments at the same time. (Above you've manually set
each to start a minute after the previous one, which assumes each
experiment takes under a minute to execute. What happens if there are
timeouts or slow networks?)

I would agree with this, as it will help to keep experiments more
accurate (e.g. a static file download of a 50MB file wouldn't adversely
affect the performance of a hidden service test that started at the
same time).

I think the service itself should handle scheduling things on a periodic
basis and should make it clear how late each experiment started compared
to its desired schedule (e.g. one experiment took longer than a minute,
so the other started 5 seconds late, so its next run should start in
(interval - 5 seconds) time).
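The arithmetic for that is small. A sketch in Python, with times as
plain seconds and a hypothetical function name:

```python
def next_delay(interval, scheduled_start, actual_start):
    """Return (delay, lateness) in seconds.

    delay is how long to wait after actual_start so that the next run
    lands back on the original schedule; lateness is what we would
    report in the results.
    """
    lateness = actual_start - scheduled_start
    delay = max(0, interval - lateness)
    return delay, lateness


# Five-minute interval, run started 5 seconds late.
delay, lateness = next_delay(interval=300, scheduled_start=1000,
                             actual_start=1005)
print(delay, lateness)
```

If a run starts more than a whole interval late, the delay is clamped
to zero, so the next run starts immediately rather than in the past.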

In practice, the service starts up, checks the results files for the
last run time of each experiment, then runs any that haven't run within
their last INTERVAL seconds. Assuming each experiment executes in less
than SHORTEST_ALLOWED_INTERVAL/NUMBER_OF_EXPERIMENTS, the system will on
average keep to the schedule by default.
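A sketch of that startup check in Python. Reading the results files is
left out; last_run is assumed to already map experiment names to the
timestamp of their last run, and the function names and experiment
names are made up for the example.

```python
def due_experiments(last_run, intervals, now):
    """Names of experiments that have not run within their interval.

    Experiments with no recorded run count as infinitely overdue.
    The result is ordered most overdue first.
    """
    overdue = []
    for name, interval in intervals.items():
        last = last_run.get(name)  # None if there is no result yet
        elapsed = float("inf") if last is None else now - last
        if elapsed >= interval:
            overdue.append((elapsed - interval, name))
    return [name for _, name in sorted(overdue, reverse=True)]


def fits_without_delay(max_runtime, shortest_interval, n_experiments):
    """True if every experiment runs in under shortest_interval / n."""
    return max_runtime < shortest_interval / n_experiments


intervals = {"static_50KiB": 300, "alexa": 600, "hidden_service": 300}
last_run = {"static_50KiB": 900, "alexa": 300}
due = due_experiments(last_run, intervals, now=1000)
print(due)
```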

There could be problems with this approach though. For example, what
happens if there is a large experiment, e.g. Alexa when the network is
slow, which takes, say, 5 minutes? If this is scheduled to run every 10
minutes and there are some other experiments that are scheduled to run
every 2 minutes, then we have a problem. Either the 2-minute experiments
run on their intended schedule, with potential inaccuracies in the
timings, or the 2-minute interval is not really a 2-minute interval. I
think we should aim for the latter and warn the user when they have made
schedules like the above.
- Another option is to break up experiments into chunks, where overall
only one request is going at a time, so a request is our atomic
scheduling option, but that becomes harder to coordinate and is highly
inefficient in terms of network throughput.
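The warning for schedules like the Alexa example could be a simple
check when the config is loaded. A sketch in Python, assuming the
service has some expected worst-case runtime per experiment (whether
user-supplied or measured is left open); the function name, tuple
layout and numbers are illustrative only.

```python
def schedule_warnings(experiments):
    """Warn when one experiment can outlast another experiment's interval.

    experiments maps name -> (interval_seconds, expected_runtime_seconds).
    """
    warnings = []
    for long_name, (_, runtime) in experiments.items():
        for name, (interval, _) in experiments.items():
            if name != long_name and runtime > interval:
                warnings.append(
                    f"{long_name} may run for {runtime}s, longer than the "
                    f"{interval}s interval of {name}; {name} will be delayed"
                )
    return warnings


warnings = schedule_warnings({
    "alexa": (600, 300),          # every 10 minutes, may take 5 minutes
    "static_50KiB": (120, 10),    # every 2 minutes
    "hidden_service": (120, 20),  # every 2 minutes
})
for warning in warnings:
    print(warning)
```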

We could monitor each experiment's average runtime and determine
'optimal' scheduling based on that, but I think the best thing in the
short term is just to say 'don't schedule long experiments to be run
frequently if you plan to run lots of other small experiments'.

>> On that, can we be clear with our vocabulary, "Torperf tests" means
>> "Torperf experiments", right?
> 
> Yes.  Hope my "experiment type" vs. "experiment instance" was not too
> confusing. ;)
> 
That's perfectly clear to me :)

Regards,
Kevin