[tor-dev] Torperf implementation considerations

Karsten Loesing karsten at torproject.org
Wed Oct 23 11:51:12 UTC 2013


On 10/1/13 3:03 AM, Kevin Butler wrote:
> Hi Karsten, Sathya,
> 
> Hope you've both had great weekends, please see inline!

Hi Kevin,

apologies for not replying earlier!  Finally, replying now.

>> Want to help define the remaining data formats?  I think we need these
>> formats:
>>
>> - file_upload would be quite similar to file_download, but for the
>> GET/POST performance experiment.  Or maybe we can generalize
>> file_download to cover either GET or POST requests and the respective
>> timings.
>>
>> - We'll need a document for hidden_service_request that does not only
>> contain timings, but also references the client-side circuit used to
>> fetch the hidden service descriptor, rendezvous circuit, and
>> introduction circuit, and server-side introduction and rendezvous circuits.
>>
>> - These data formats are all for fetching/posting static files.  We
>> should decide on a data format for actual website fetches.  Rob van der
>> Hoeven suggested HAR, which I included in a footnote.  So, maybe we
>> should extend HAR to store the tor-specific stuff, or we should come up
>> with something else.
>>
>> - Are there any other data formats missing?
>>
> 
> I think extending the HAR format (with minimal changes really, it's
> already reasonably generic) would be a good fit for the real fetches
> indeed. Do you feel HAR is overkill for the others?

I'm not sure, because I don't know the HAR format (this was a suggestion
that didn't look crazy to me, so I added it to the PDF).  But I think we
could use HAR for all kinds of requests.  We'll probably need to use
something else for stream and circuit information, because they can be
unrelated to specific requests.
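
To make this a bit more concrete (based only on a quick skim of the HAR
1.2 spec, so take it with a grain of salt): HAR allows custom fields as
long as their names start with an underscore, so per-request
tor-specific data could live right in the entry.  Here's a rough sketch
as Python data, with entirely made-up "_tor_*" field names:

  # Hypothetical HAR entry extended with tor-specific fields.  The
  # standard HAR 1.2 fields are real; the "_tor_*" ones only illustrate
  # where stream/circuit references could live.
  entry = {
      "startedDateTime": "2013-10-23T11:51:12.000Z",
      "time": 2350,                # total request time in milliseconds
      "request": {"method": "GET",
                  "url": "http://example.com/50kb.bin"},
      "response": {"status": 200, "bodySize": 51200},
      "timings": {"send": 5, "wait": 1800, "receive": 545},
      "_tor_version": "0.2.4.17-rc",
      "_tor_stream_id": 42,        # stream used for this request
      "_tor_circuit_id": 7,        # circuit the stream was attached to
  }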

> I think it wouldn't be such a bad idea to use it for all; perhaps this
> could be a future requirement if not an initial one.  (E.g. the
> static_file_downloads would have that in 'creator', but would have
> multiple 'entries', each representing a static file, with our own
> fields added: filesize, tor_version, etc.)

Plausible, though I can't really comment on this with my limited
knowledge of the HAR format.

> There are a number of perks with using HAR:
> - TBB probably already knows how to record .HAR files, so the Selenium
> work would basically just be to open a browser and record a few
> navigations to .HAR (I know Chrome can do this easily, so I'm assuming
> our TBB version of Firefox is also capable)

Probably.  Though we'll need to add stream/circuit references.  Can we
do that?
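
One way this might work (a rough, untested sketch using Stem and the
control port; I'm not sure yet whether that's the approach we'd take):
listen for STREAM events while Selenium drives the browser, remember
which circuit each stream got attached to, and annotate the recorded
HAR afterwards by matching targets.

  # Sketch only: record stream-to-circuit mappings via the control port.
  from stem import StreamStatus
  from stem.control import Controller, EventType

  stream_to_circuit = {}

  def handle_stream(event):
      # Once a stream has been attached and succeeded, remember which
      # circuit carried it, keyed by its target (host:port).
      if event.status == StreamStatus.SUCCEEDED:
          stream_to_circuit[event.target] = (event.id, event.circ_id)

  with Controller.from_port(port=9051) as controller:
      controller.authenticate()
      controller.add_event_listener(handle_stream, EventType.STREAM)
      # ... drive the browser and record the HAR here, then annotate
      # each HAR entry with the matching stream/circuit IDs ...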

> - We can benefit from any tooling built around HAR, either to
> statistically analyse, or to provide visualisation.
> 
> - There is a decent amount of research around HAR compression
> (although it basically seems to just be gzipping), but if we can support
> compressed HAR then we can allow servers to store a lot more history.
> 
> There is also the negative that the HAR files will probably provide *too
> much* data, but we could probably prune the files before archiving them
> or as a stage before total deletion.
> 
> While spending some time implementing these things, I have been playing
> with a faked alexa experiment, and I think the HAR format, or at least
> something that allows for multiple entries per experiment result set, is
> necessary for all our experiments (even static file).
> 
> I'll get back to you regarding these data formats in the future when I
> have time to actually look at what the other experiments need (I've
> mainly focused on alexa & static downloading so far)

Makes sense.

>>> I think the magically detect and run part can definitely be left for
>>> future, but installation should still be this easy.
>>>
>>> Surely it's just as easy to implement detecting new experiments on
>>> service startup as to implement not doing that. (while still somehow
>>> allowing experiments to be added... is this implying hardcoded experiments?)
>>
>> I guess experiment types will be hard-coded, but experiment instances
>> will be configurable.
>>
>>> Also, perhaps you don't want to support this, but how does the patch and
>>> merge system work for quick deployments of short lived experiments? (Is
>>> there ever such a thing? Karsten?)
>>
>> Yes, there might be such a thing as short-lived experiments.  We'd
>> probably commit such a patch to a separate branch and decide after the
>> experiment if it's worth adding the experiment to the master branch.
>>
>>> Or what if someone does develop a neat set of experiments for their own
>>> personal use that doesn't really apply to the project as a whole, are we
>>> expected to merge them upstream? What if they don't want to share?
>>
>> I think we should only merge experiments that are general enough for
>> others to run.
>>
> 
> This doesn't entirely address my concern. If upstream branches are made
> for short-lived experiments (rather than just sharing a folder between
> people), how do the users install that? (since they would have initially
> apt-get installed? not git/svn?)
> 
> And my concern around non-general or non-shared experiments skips the
> issue, how will they distribute them to whoever needs to run it? Their
> own git repo infrastructure? (But of course I agree we should only
> upstream general things)

I'm mostly thinking of developers who would run custom branches.  And if
somebody cannot handle Git, we can give them a tarball.  But really,
custom experiments should be the exception, not the rule.

>>>> Torperf should help with bootstrapping and shutting down tor, because
>>>> that's something that all experiments need.  Locating tor could just be
>>>> a question of passing the path to a tor binary to Torperf.  See above
>>>> for sequential vs. parallel experiments.
>>> Locating Tor should just be settings in 'the Torperf config'.
>>> { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
>>>
>>> Is there requirement to run the *same* experiment across different Tor
>>> versions at the same time (literally parallel) or just to have "I as a
>>> user set this up to run for X,Y,Z versions and ran it one time and got
>>> all my results."?
>>
>> You would typically not run an experiment a single time, but set it up
>> to run for a few days.  And you'd probably set up parallel experiments
>> to start with a short time offset.  (Not sure if this answers your
>> question.)
>>
> I messed up the question with that single word, I'll address this below.
> I meant 'ran torperf once' and 'got my results periodically as the
> schedule defines'
> 
>>> I think this is what Sathya is saying with:
>>>> We could just run a torperf test against a particular tor version, 
>>>> once that's completed, we can run against another tor version and so on.
>>> i.e. for each experiment, there's only one instance of Tor allocated for
>>> it at any time, and it does its versioned runs sequentially.
>>
>> For each experiment there's one tor instance running a given version.
>> You wouldn't stop, downgrade/upgrade, and restart the tor instance while
>> the experiment is running.  If you want to run an experiment on
>> different tor versions, you'd start multiple experiments.  For example:
>>
>> 1. download 50KiB static file, use tor 0.2.3.x on socks port 9001, start
>> every five minutes starting at :01 of the hour.
>>
>> 2. download 50KiB static file, use tor 0.2.4.x on socks port 9002, start
>> every five minutes starting at :02 of the hour.
>>
>> 3. download 50KiB static file, use tor 0.2.5.x on socks port 9003, start
>> every five minutes starting at :03 of the hour.
>>
> I don't think it's a good idea to have to define such specificity for
> humans. They should be able to just define the 50kb (and probably more
> file sizes) with a five-minute interval for versions x, y, z. (I also
> don't think the user should define the socks port to use, but that's a
> minor detail.)
> 
> I think you've answered my question here though. I'll summarise below!
> 
>>> I think the discussion above is talking about two different things; I
>>> think it would be beneficial to decide what needs to be actually
>>> parallel and what just needs to be one-time setup for a user.
>>>
>>> Are there any concerns around parallel requests causing noise in the
>>> timing information? Or are we happy to live with a small 1-2(?)ms noise
>>> level per experiment in order to benefit from faster experiment runtimes
>>> in aggregate?
>>
>> Not sure which of these questions are still open.  We should definitely
>> get this clear in the design document.  What would we write, and where
>> in the document would we put it?
>>
> Perhaps we can cover this near 2.1.2 "If the service operator wants to
> run multiple experiments...."
> 
> I think you've defined above that no experiment should be run at the
> same time as another. I.e. the main service should not be executing code
> for different experiments at the same time. (Above you've manually
> specified starting each one a minute after the other, which assumes each
> experiment takes under a minute to execute; what happens if there are
> timeouts or slow networks?)

Ah, I didn't mean that experiments should finish under 1 minute.  They
can run up to five minutes.  Starting them at :01, :02, and :03 was just
a naive way of avoiding bottlenecks during connection establishment.
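
In configuration terms, those three instances might end up looking
roughly like this.  This is just Python data for illustration; none of
the key names are settled, and the offsets simply encode the :01/:02/:03
starting points within the five-minute interval:

  # Hypothetical experiment instances; all key names are placeholders.
  experiments = [
      {"type": "static_file_download", "file_size": "50KiB",
       "tor_version": "0.2.3.x", "socks_port": 9001,
       "interval": 300, "offset": 60},    # every 5 min, at :01
      {"type": "static_file_download", "file_size": "50KiB",
       "tor_version": "0.2.4.x", "socks_port": 9002,
       "interval": 300, "offset": 120},   # every 5 min, at :02
      {"type": "static_file_download", "file_size": "50KiB",
       "tor_version": "0.2.5.x", "socks_port": 9003,
       "interval": 300, "offset": 180},   # every 5 min, at :03
  ]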

> I would agree with this as it will help to keep experiments more
> accurate (e.g. static file download for a 50MB file didn't adversely
> affect the performance for a hidden service test that started at the
> same time)

(If somebody configures their Torperf to download a 50MB file, they
deserve to have their tests break horribly.  Let's leave some bandwidth
for actual users. ;))

> I think the service itself should handle scheduling things on a periodic
> basis and should make it clear how far behind its desired schedule it is
> running (e.g. one experiment took longer than a minute, so the other
> started 5 seconds late, so it should start in (interval - 5 seconds)
> time).
> 
> How this would happen in practice: the service starts up, checks results
> files for the last runtime of each experiment, then runs any that
> haven't run in the last INTERVAL seconds. Assuming experiments execute
> in less time than SHORTEST_ALLOWED_INTERVAL/NUMBER_OF_EXPERIMENTS, then
> on average the system will schedule perfectly by default.
> 
> There could be problems with this approach though, for example, what
> happens if there is a large experiment, e.g. Alexa when the network is
> slow, which takes say, 5 minutes. If this is scheduled to run every 10
> minutes and there are some other experiments that are scheduled to run
> every 2 minutes, then we have a problem. Either the 2-minute experiments
> run on their intended schedule with potential inaccuracies caused, or
> the 2-minute interval is not a 2-minute interval. I think we should aim
> for the latter and warn the user when they have made schedules like the
> above.
> - Another option is to break up experiments into chunks, where overall
> only one request is going at a time, so a request is our atomic
> scheduling option, but that becomes harder to coordinate and is highly
> inefficient in terms of network throughput.
> 
> We could monitor experiments' average runtimes and determine 'optimal'
> scheduling based on that, but I think the best thing in the short term
> is just to say 'don't schedule long experiments to be run frequently if
> you plan to run lots of other small experiments'

We could make the service smart enough not to start all requests at the
same time, e.g., by adding random delays.  And we could allow users to
override this by defining a manual offset for each experiment.  That
way, new users don't have to care, and expert users can fine-tune things.
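
Something along these lines is what I have in mind (Python, rough
sketch only, all names made up): every experiment fires at a fixed
offset within its interval, and if the user didn't configure an offset
we pick a random one so that parallel experiments don't all start
establishing connections at the same instant.

  import random
  import time

  def next_start(interval, offset=None, now=None):
      """Return the next wall-clock start time for one experiment."""
      if offset is None:
          # No manual offset configured: pick a random one.
          offset = random.uniform(0, interval)
      if now is None:
          now = time.time()
      start = now - (now % interval) + offset
      if start <= now:
          start += interval
      return start

For example, next_start(300, offset=60) gives the "every five minutes
starting at :01" schedule from the earlier example, while
next_start(300) gives an experiment its own random slot.  In a real
implementation the random offset would presumably be drawn once per
experiment instance rather than per call, so that runs stay evenly
spaced.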

>>> On that, can we be clear with our vocabulary, "Torperf tests" means
>>> "Torperf experiments", right?
>>
>> Yes.  Hope my "experiment type" vs. "experiment instance" was not too
>> confusing. ;)
>>
> That's perfectly clear to me :)

How do we proceed?  Would you mind sending me a diff of the changes to
the design document that make these things clearer to you?

Also, I'm thinking about publishing the tech report, even though there's
no running code yet (AFAIK).  The reason is that I'd like to call this
report the output of sponsor F deliverable 8.  Originally, we promised
code, but a design document is better than nothing.

https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year3

I can include changes until, say, Monday, October 28.

Thanks in advance!  And sorry again for the long delay!

All the best,
Karsten


