[cc tor-dev]
On 16 September 2013 09:47, Karsten Loesing karsten@torproject.org wrote:
Hmm, I don't think the HTTP client/server part is the right interface to write another client and server and call it Torperf compatible. The Torperf data API would be a better interface for that: people could write their own experiments and provide data that is Torperf compatible, or they could use the data that Torperf provides and analyze or visualize it in a better way. But writing only half of an experiment, client or server, wouldn't be of much use.
I thought a bit more about this and I don't fully agree (but you've nailed it that the data is the core API). I've come to the conclusion that it makes sense for TorPerf to do much of the heavy lifting and the core server aspects, but the experiments should be decoupled in such a way as to allow for flexible client implementations.
Below are a bunch of semi related musings on how to get to this:
Of course, as many experiments will just be doing simple HTTP requests, the experiments will really just be wrapper scripts around a bundled default client implementation, similar to the one in your perfd branch.
Specifically, to aid in this, I'd propose something like the folder structure below: https://etherpad.mozilla.org/iqrgueVFd6
[Why a set of directories? There's no solid reason other than that it forces unique names and gives a good opportunity to isolate experiment-specific data. If an experiment has no specific data files, it should be alright to just have the config file instead of a directory. A directory structure also mimics the idea of a RESTful web service as mentioned in the PDF, i.e. the user could easily know to go to http://.../results/myexperiment/ to see results filtered for that single experiment. Either way, it's a minor detail, I just feel it's easier for a user.]
A good use of the experiment-specific data could be for static files, where anything in an experiment's '/public' folder would be served by the webserver while that experiment is running. To ensure nothing gets proxy-cached, the experiment could be responsible for generating random data files for each run. (This is a clear separation of server vs. experiment implementation concerns.) Another idea could be custom experiment views (JS, probably) that would transform the experiment results when viewed on the web dashboard.
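To make the folder idea a bit more concrete, here's one hypothetical layout along those lines (the names are only illustrative; the etherpad has the actual proposal):

    experiments/
      simple_http/
        config              # execution rate, command, required tor version, ...
      alexa_top_100/
        config
        alexa_list.txt      # experiment-specific data, refreshed between runs
        public/             # static files served by the webserver while this experiment runs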
The experiments should not be responsible for determining details such as SOCKS port or control port; the TorPerf service should deal with load balancing a bunch of tor instances on different ports and just tell an experiment 'Do your stuff using this SOCKS port, this public ip:port, etc...' via environment variables. (The config could ask that certain aspects of the torrc file are set up specifically, but managing ports and such is just asking for user error unless there's some experiment that requires it.)
The config should minimally have an execution rate and a command. The command could be to just execute the bundled TorPerf client implementation with boring parameters e.g. just fetch facebook.com and record the normal timings that the default implementation tracks.
A more interesting config's command could be the alexa_top_100 one, where it's just a basic script/makefile that fetches a new Alexa list if the one in the local folder is older than X days and then, for each site in the list, runs the TorPerf client instance.
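As a rough sketch of such a wrapper (Python here, though the command could be any script) — the list URL, the TORPERF_SOCKS_PORT variable, and the torperf-client command name are all made up for illustration:

    import os
    import subprocess
    import sys
    import time
    import urllib.request

    LIST_FILE = "alexa_list.txt"                        # lives in the experiment's folder
    LIST_URL = "https://example.com/alexa-top-100.txt"  # placeholder URL
    MAX_AGE = 7 * 24 * 60 * 60                          # refresh if older than X = 7 days

    def refresh_list():
        try:
            age = time.time() - os.path.getmtime(LIST_FILE)
        except OSError:
            age = MAX_AGE + 1
        if age > MAX_AGE:
            urllib.request.urlretrieve(LIST_URL, LIST_FILE)
            # informational output for the service goes to stderr
            print("alexa_top_100: refreshed site list", file=sys.stderr)

    def main():
        refresh_list()
        socks_port = os.environ["TORPERF_SOCKS_PORT"]   # handed to us by the service
        with open(LIST_FILE) as f:
            for site in (line.strip() for line in f if line.strip()):
                # the bundled client writes its results to stdout, which the
                # service collects as TorPerf results
                subprocess.run(["torperf-client", "--socks-port", socks_port,
                                "--url", "http://" + site])

    if __name__ == "__main__":
        main()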
The TorPerf service instance should be able to run experiments by just executing the commands and recording whatever is written to stdout and stderr. After the command exits, if the exit code is non-zero then the stderr output is treated as error messages, while if it's zero then stderr is treated as info messages. The stdout data should be treated as TorPerf results, and it's an error if it's not well formed; in that case the output should still be captured in the results file for debug purposes. An example of informational output might be that the alexa_top_100 experiment updated its list before this result set. [On the web interface it should be clear which result sets contained errors or information]
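On the service side, that convention could stay very small; a sketch of the idea (the one-JSON-object-per-line assumption is mine, not something we've agreed on):

    import json
    import logging
    import subprocess

    def run_experiment(command, env=None):
        """Run one experiment command and sort its output as described above."""
        proc = subprocess.run(command, env=env, capture_output=True, text=True)
        if proc.returncode != 0:
            logging.error(proc.stderr)   # non-zero exit: stderr holds error messages
        else:
            logging.info(proc.stderr)    # zero exit: stderr holds info messages
        try:
            # stdout must be well-formed TorPerf results, e.g. one JSON object per line
            results = [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
        except ValueError:
            logging.error("malformed results; keeping raw stdout for debugging")
            results = None
        return results, proc.stdout      # raw stdout is stored either way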
Once the experiment finishes, the service should postprocess the results from the experiment and replace any fingerprinted entries (this needs to be well defined) with the server-side timing information for that specific fingerprint. Then the server should store the results in a timestamped file (timestamped probably by experiment start time) and update its database (if there is one).
The experiments should be able to specify their required tor version in their config, but it should accept placeholder values such as the default 'latest', 'latest-stable' or even 'latest(3)' which would run the experiment for all 3 of the latest tor versions. I think the ability to check the performance of the same experiment over multiple Tor versions could be interesting, especially to determine if any one build has caused anomalies in performance. I would expect very few experiments to run across multiple versions though.
Additionally, something that could be neat, but isn't clearly in the requirements: should TorPerf be responsible for notifying the user when there are new tor clients available to run as 'latest'? Would it be useful to be able to specify that some experiments should be run on 'master' or a gitref and that it would be pulled between runs? That's probably not practical.
Apologies for the length and lack of order! Kevin
On 9/17/13 3:33 AM, Kevin Butler wrote:
[cc tor-dev]
On 16 September 2013 09:47, Karsten Loesing karsten@torproject.org wrote:
Hmm, I don't think the HTTP client/server part is the right interface to write another client and server and call it Torperf compatible. The Torperf data API would be a better interface for that: people could write their own experiments and provide data that is Torperf compatible, or they could use the data that Torperf provides and analyze or visualize it in a better way. But writing only half of an experiment, client or server, wouldn't be of much use.
I thought a bit more about this and I don't fully agree (but you've nailed it that the data is the core API). I've come to the conclusion that it makes sense for TorPerf to do much of the heavy lifting and the core server aspects, but the experiments should be decoupled in such a way as to allow for flexible client implementations.
Ah, let me clarify what I meant above: splitting the client part and the server part of an experiment doesn't seem of much use to me. For example, the HTTP/SOCKS client that fetches static files and the HTTP server that serves those files shouldn't be distributed to two code repositories or packages. Because if either part changes, the other part needs to be changed, too.
But I totally agree with you that it should be easy to add new experiments to Torperf. When I mentioned the data API, my idea was that somebody writes their own Torperf and provides data in a format that our Torperf understands, or that somebody takes our results and does neat stuff with them.
Of course, another way to allow for adding new experiments is to define a clear interface for extending our Torperf to support them. That's what you have in mind, I think.
Below are a bunch of semi related musings on how to get to this:
Without going into the details, there are some great ideas below!
Can you help me add some structure to your ideas by adding them to the appropriate sections of the design document? Can you clone the Git repo, edit the .tex file, commit your changes, run git format-patch HEAD^, and send me the output? Here's the repository:
https://gitweb.torproject.org/user/karsten/tech-reports.git, branch torperf2
A few quick comments:
Of course, as many experiments will just be doing simple HTTP requests, the experiments will really just be wrapper scripts around a bundled default client implementation, similar to the one in your perfd branch.
Specifically, to aid in this, I'd propose something like the folder structure below: https://etherpad.mozilla.org/iqrgueVFd6
[Why a set of directories? There's no solid reason other than that it forces unique names and gives a good opportunity to isolate experiment-specific data. If an experiment has no specific data files, it should be alright to just have the config file instead of a directory. A directory structure also mimics the idea of a RESTful web service as mentioned in the PDF, i.e. the user could easily know to go to http://.../results/myexperiment/ to see results filtered for that single experiment. Either way, it's a minor detail, I just feel it's easier for a user.]
I like the idea of configuration directories.
A good use of the experiment-specific data could be for static files, where anything in an experiment's '/public' folder would be served by the webserver while that experiment is running. To ensure nothing gets proxy-cached, the experiment could be responsible for generating random data files for each run. (This is a clear separation of server vs. experiment implementation concerns.) Another idea could be custom experiment views (JS, probably) that would transform the experiment results when viewed on the web dashboard.
The experiments should not be responsible for determining details such as SOCKS port or control port; the TorPerf service should deal with load balancing a bunch of tor instances on different ports and just tell an experiment 'Do your stuff using this SOCKS port, this public ip:port, etc...' via environment variables. (The config could ask that certain aspects of the torrc file are set up specifically, but managing ports and such is just asking for user error unless there's some experiment that requires it.)
The config should minimally have an execution rate and a command. The command could be to just execute the bundled TorPerf client implementation with boring parameters e.g. just fetch facebook.com and record the normal timings that the default implementation tracks.
A more interesting config's command could be the alexa_top_100 one, where it's just a basic script/makefile that fetches a new Alexa list if the one in the local folder is older than X days and then, for each site in the list, runs the TorPerf client instance.
The TorPerf service instance should be able to run experiments by just executing the commands and recording whatever is written to stdout and stderr. After the command exits, if the exit code is non-zero then the stderr output is treated as error messages, while if it's zero then stderr is treated as info messages. The stdout data should be treated as TorPerf results, and it's an error if it's not well formed; in that case the output should still be captured in the results file for debug purposes. An example of informational output might be that the alexa_top_100 experiment updated its list before this result set. [On the web interface it should be clear which result sets contained errors or information]
Executing scripts and reading stdout/stderr is probably too low-level. I think we need a Python/Twisted (or whatever language Torperf will be written in) interface for running an experiment and retrieving results.
Once the experiment finishes, the service should postprocess the results from the experiment and replace any fingerprinted entries (this needs to be well defined) with the server-side timing information for that specific fingerprint. Then the server should store the results in a timestamped file (timestamped probably by experiment start time) and update its database (if there is one).
The experiments should be able to specify their required tor version in their config, but it should accept placeholder values such as the default 'latest', 'latest-stable' or even 'latest(3)' which would run the experiment for all 3 of the latest tor versions. I think the ability to check the performance of the same experiment over multiple Tor versions could be interesting, especially to determine if any one build has caused anomalies in performance. I would expect very few experiments to run across multiple versions though.
Additionally, something that could be neat, but isn't clearly in the requirements: should TorPerf be responsible for notifying the user when there are new tor clients available to run as 'latest'? Would it be useful to be able to specify that some experiments should be run on 'master' or a gitref and that it would be pulled between runs? That's probably not practical.
Apologies for the length and lack of order!
Well, thanks for your input! As I said above, it would help a lot if you added these ideas to the appropriate sections of the design document.
Thanks in advance!
All the best, Karsten
Executing scripts and reading stdout/stderr is probably too low-level. I think we need a Python/Twisted (or whatever language Torperf will be written in) interface for running an experiment and retrieving results.
You're probably right on stdout/err being too low level, but most experiments would be reusing the provided implementation, just wrapping it in a simple manner. Anyway, I think there's a language-agnostic way to get this working, without forcing any extensibility on matching Torperf's language of choice. I've tried to be pretty generic in my attached changes. :)
Well, thanks for your input! As I said above, it would help a lot if you added these ideas to the appropriate sections of the design document.
Please see attached.
Regards, Kevin
On 9/18/13 3:49 AM, Kevin Butler wrote:
Executing scripts and reading stdout/stderr is probably too low-level. I think we need a Python/Twisted (or whatever language Torperf will be written in) interface for running an experiment and retrieving results.
You're probably right on stdout/err being too low level, but most experiments would be reusing the provided implementation, just wrapping it in a simple manner. Anyway, I think there's a language-agnostic way to get this working, without forcing any extensibility on matching Torperf's language of choice. I've tried to be pretty generic in my attached changes. :)
I don't see how we could make new experiments language agnostic. These new experiments will want to re-use Torperf classes/components, which they can't do if they're just "shell scripts". They need to implement an interface provided by Torperf, and Torperf needs to provide functionality via an interface that the experiments understand. If an experiment is in fact just a shell script, it should be trivial to write a wrapper class to call that. But I think that's the exception, not the rule.
Or maybe we have a different understanding of an "experiment". Can you give an example for an experiment that is not listed in Section 3 of the design document and state how you'd want to integrate that into Torperf without touching its sources?
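To illustrate the kind of interface I have in mind, here is a rough sketch in Python (all class, method, and variable names are invented, nothing from the design document):

    import json
    import os
    import subprocess
    from abc import ABC, abstractmethod

    class Experiment(ABC):
        """What Torperf would expect an experiment to implement."""

        @abstractmethod
        def run(self, socks_port):
            """Run one iteration and return a list of result dicts."""

    class ShellScriptExperiment(Experiment):
        """Wrapper for the exceptional case where an experiment really is a shell script."""

        def __init__(self, command):
            self.command = command

        def run(self, socks_port):
            env = dict(os.environ, TORPERF_SOCKS_PORT=str(socks_port))
            proc = subprocess.run(self.command, env=env, capture_output=True, text=True)
            # treat each non-empty stdout line as one JSON result object
            return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]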
Well, thanks for your input! As I said above, it would help a lot if you added these ideas to the appropriate sections of the design document.
Please see attached.
Awesome! I applied your patch, though I tweaked some parts and commented out other parts, explaining my reasons in the LaTeX sources. Happy to discuss these things further if you want!
I also added an Appendix A with suggested data formats. Maybe these data formats make it clearer what I'd expect as output from an experiment.
https://people.torproject.org/~karsten/volatile/torperf2.pdf
If you have further suggestions how to improve the requirements or design, please send me a new patch, and I'll apply it and comment on it. Thanks!
All the best, Karsten
Hi,
I have some comments on the updated pdf -
It should be easy for a user to implement or install an experiment that isn’t bundled with the core distribution. Ideally, installing an experiment should be as simple as unzipping a folder or config file into an experiments folder.
I don't understand how this will work when users just apt-get install torperf. Ideally if someone writes a good experiment, they should send the patches upstream and get it merged, and then we update torperf to include those tests and then the users just update torperf with their package managers.
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run different torperf instances for different tor versions.
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or verify signatures. We have good package managers that do that already.
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
Torperf should not accumulate results from remote Torperf service instances. If by "accumulate", you mean read another file from /results which the *user* has downloaded, then yes. Torperf shouldn't *download* result files from remote instances.
The new Torperf should come with an easy-to-use library to process its results
Torperf results should just be JSON (or similar) files that already have libraries; we should not invent a new result format and write a library for it.
request scheduler: Start new requests following a previously configured schedule. request runner: Handle a single request from creation over various possible sub-states to timeout, failure, or completion.
These are experiment specific. Some tests may not even need to do requests. No need for these to be a part of torperf.
results database Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Thanks, --Sathya
On Wed, Sep 18, 2013 at 6:21 AM, Karsten Loesing karsten@torproject.org wrote:
On 9/18/13 3:49 AM, Kevin Butler wrote:
Executing scripts and reading stdout/stderr is probably too low-level. I think we need a Python/Twisted (or whatever language Torperf will be written in) interface for running an experiment and retrieving results.
You're probably right on stdout/err being too low level, but most experiments would be reusing the provided implementation, just wrapping it in a simple manner. Anyway, I think there's a language-agnostic way to get this working, without forcing any extensibility on matching Torperf's language of choice. I've tried to be pretty generic in my attached changes. :)
I don't see how we could make new experiments language agnostic. These new experiments will want to re-use Torperf classes/components, which they can't do if they're just "shell scripts". They need to implement an interface provided by Torperf, and Torperf needs to provide functionality via an interface that the experiments understand. If an experiment is in fact just a shell script, it should be trivial to write a wrapper class to call that. But I think that's the exception, not the rule.
Or maybe we have a different understanding of an "experiment". Can you give an example for an experiment that is not listed in Section 3 of the design document and state how you'd want to integrate that into Torperf without touching its sources?
Well, thanks for your input! As I said above, it would help a lot if you added these ideas to the appropriate sections of the design document.
Please see attached.
Awesome! I applied your patch, though I tweaked some parts and commented out other parts, explaining my reasons in the LaTeX sources. Happy to discuss these things further if you want!
I also added an Appendix A with suggested data formats. Maybe these data formats make it clearer what I'd expect as output from an experiment.
https://people.torproject.org/~karsten/volatile/torperf2.pdf
If you have further suggestions how to improve the requirements or design, please send me a new patch, and I'll apply it and comment on it. Thanks!
All the best, Karsten
On 9/23/13 12:53 AM, Sathyanarayanan Gunasekaran wrote:
Hi,
I have some comments on the updated pdf -
Thanks! Much appreciated!
It should be easy for a user to implement or install an experiment that isn’t bundled with the core distribution. Ideally, installing an experiment should be as simple as unzipping a folder or config file into an experiments folder.
I don't understand how this will work when users just apt-get install torperf. Ideally if someone writes a good experiment, they should send the patches upstream and get it merged, and then we update torperf to include those tests and then the users just update torperf with their package managers.
I agree with you that this is a rather unusual requirement and that adding new experiments to Torperf is the better approach. That's why the paragraph said "should" and "ideally". I added your concerns to the design document to make this clearer. (Maybe we should mark requirements as either "must-do", "should-do", or "could-do"?)
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run different torperf instances for different tor versions.
Running multiple Torperf instances has disadvantages that I'm not sure how to work around. For example, we want a single web server listening on port 80 for all experiments and for providing results.
Why do you think it's hard to run different tor versions or binaries in the same Torperf service instance?
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or verify signatures. We have good package managers that do that already.
Ah, we don't just want to measure packaged tors. We might also want to measure older versions which aren't contained in package repositories anymore, and we might want to measure custom branches with performance tweaks. Not sure if we actually want to verify signatures of tor versions.
I think we should take Shadow's approach (or something similar). Shadow can download a user-defined tor version ('--tor-version'), or it can build tor from a local path ('--tor-prefix'):
https://github.com/shadow/shadow/blob/master/setup#L109
Do you see any problems with this?
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
Torperf should not accumulate results from remote Torperf service instances. If by "accumulate", you mean read another file from /results which the *user* has downloaded, then yes. Torperf shouldn't *download* result files from remote instances.
Why not? The alternative is to build another tool that downloads result files from remote instances. That's what we do right now (see footnote: "For reference, the current Torperf produces measurement results which are re-formatted by metrics-db and visualized by metrics-web with help of metrics-lib. Any change to Torperf triggers subsequent changes to the other three codebases, which is suboptimal.")
The new Torperf should come with an easy-to-use library to process its results
Torperf results should just be JSON (or similar) files that already have libraries; we should not invent a new result format and write a library for it.
Yes, that's what I mean. If you understood this differently, can you rephrase the paragraph?
request scheduler Start new requests following a previously configured schedule. request runner Handle a single request from creation over various possible sub states to timeout, failure, or completion.
These are experiment specific. Some tests may not even need to do requests. No need for these to be a part of torperf.
I'm thinking how we can reduce code duplication as much as possible. The experiments in the design document all make requests, so it would be beneficial for them to have Torperf schedule and handle their requests. If an experiment doesn't have the notion of request it doesn't have to use the request scheduler or runner. But how would such an experiment work? Do you have an example?
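For what it's worth, if Torperf ends up being written with Twisted, the request scheduler part could be quite thin; a sketch, with the interval and the callback as placeholders:

    from twisted.internet import reactor, task

    def start_request():
        # hand a new request over to the request runner; just a placeholder here
        print("starting request")

    # start a new request every five minutes, per the experiment's configured schedule
    scheduler = task.LoopingCall(start_request)
    scheduler.start(300, now=True)
    reactor.run()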
results database Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Rephrased to data store. I still think a database makes sense here, but this is not a requirement. As long as we can store, retrieve, and periodically delete results, everything's fine.
Again, thanks a lot for your input!
Updated PDF:
https://people.torproject.org/~karsten/volatile/torperf2.pdf
All the best, Karsten
Hi.
On Tue, Sep 24, 2013 at 6:03 AM, Karsten Loesing karsten@torproject.org wrote:
On 9/23/13 12:53 AM, Sathyanarayanan Gunasekaran wrote:
I don't understand how this will work when users just apt-get install torperf. Ideally if someone writes a good experiment, they should send the patches upstream and get it merged, and then we update torperf to include those tests and then the users just update torperf with their package managers.
I agree with you that this is a rather unusual requirement and that adding new experiments to Torperf is the better approach. That's why the paragraph said "should" and "ideally". I added your concerns to the design document to make this clearer. (Maybe we should mark requirements as either "must-do", "should-do", or "could-do"?)
Well, "ideally" implies that we want to do this at some point. Do we?
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run different torperf instances for different tor versions.
Running multiple Torperf instances has disadvantages that I'm not sure how to work around. For example, we want a single web server listening on port 80 for all experiments and for providing results.
Oh. I did not mean running multiple torperf instances *simultaneously*; I just meant sequentially.
Why do you think it's hard to run different tor versions or binaries in the same Torperf service instance?
Then each experiment needs to deal with locating, bootstrapping, and shutting down Tor. We could just run a torperf test against a particular tor version; once that's completed, we can run against another tor version, and so on. I'm not against this idea -- it can be done. I just don't think it's high priority.
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or verify signatures. We have good package managers that do that already.
Ah, we don't just want to measure packaged tors. We might also want to measure older versions which aren't contained in package repositories anymore, and we might want to measure custom branches with performance tweaks. Not sure if we actually want to verify signatures of tor versions.
I think we should take Shadow's approach (or something similar). Shadow can download a user-defined tor version ('--tor-version'), or it can build a local tor path ('--tor-prefix'):
If the user wants to run torperf against tor versions that are not present in the package managers, then the user should download and build tor -- not torperf. Once a local binary is present, the user can run torperf against it with the '--tor-prefix' option.
https://github.com/shadow/shadow/blob/master/setup#L109
Do you see any problems with this?
Nope, this is perfectly fine. I just don't want torperf to download, verify and build tor.
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
Torperf should not accumulate results from remote Torperf service instances. If by "accumulate", you mean read another file from /results which the *user* has downloaded, then yes. Torperf shouldn't *download* result files from remote instances.
Why not? The alternative is to build another tool that downloads result files from remote instances. That's what we do right now (see footnote: "For reference, the current Torperf produces measurement results which are re-formatted by metrics-db and visualized by metrics-web with help of metrics-lib. Any change to Torperf triggers subsequent changes to the other three codebases, which is suboptimal.")
This could just be a wget script that downloads the results from another server. I just don't want that to be a part of torperf. Torperf should just measure performance and display data, IMHO -- not worry about downloading and aggregating results from another system. Or maybe we can do this later and change it to "Ideally torperf should .."
The new Torperf should come with an easy-to-use library to process its results
Torperf results should just be JSON (or similar) files that already have libraries; we should not invent a new result format and write a library for it.
Yes, that's what I mean. If you understood this differently, can you rephrase the paragraph?
"Torperf should store its results in a format that is widely used and already has libraries(like JSON), so that other applications can use the results and build on it". Maybe?
request scheduler Start new requests following a previously configured schedule. request runner Handle a single request from creation over various possible sub states to timeout, failure, or completion.
These are experiment specific. Some tests may not even need to do requests. No need for these to be a part of torperf.
I'm thinking how we can reduce code duplication as much as possible. The experiments in the design document all make requests, so it would be beneficial for them to have Torperf schedule and handle their requests. If an experiment doesn't have the notion of request it doesn't have to use the request scheduler or runner. But how would such an experiment work? Do you have an example?
Nope I don't have an example. Maybe as I write the tests, I'll have a better idea about the structure. Ignore this comment for now!
results database Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Rephrased to data store. I still think a database makes sense here, but this is not a requirement. As long as we can store, retrieve, and periodically delete results, everything's fine.
Cool!
Again, thanks a lot for your input!
Updated PDF:
https://people.torproject.org/~karsten/volatile/torperf2.pdf
Great, thanks!
--Sathya
On 9/24/13 1:39 PM, Sathyanarayanan Gunasekaran wrote:
Hi.
On Tue, Sep 24, 2013 at 6:03 AM, Karsten Loesing karsten@torproject.org wrote:
On 9/23/13 12:53 AM, Sathyanarayanan Gunasekaran wrote:
I don't understand how this will work when users just apt-get install torperf. Ideally if someone writes a good experiment, they should send the patches upstream and get it merged, and then we update torperf to include those tests and then the users just update torperf with their package managers.
I agree with you that this is a rather unusual requirement and that adding new experiments to Torperf is the better approach. That's why the paragraph said "should" and "ideally". I added your concerns to the design document to make this clearer. (Maybe we should mark requirements as either "must-do", "should-do", or "could-do"?)
Well, "ideally" implies that we want to do this at some point. Do we?
I don't feel strongly. I'd prefer a design that makes it easy to add new experiments, but I'm fine with an approach that requires merging patches. We can always add the functionality to drop something in a directory and make Torperf magically detect and run the new experiment, but that can happen later. Maybe we shouldn't let that distract us from getting the first version done. I commented out this section.
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run different torperf instances for different tor versions.
Running multiple Torperf instances has disadvantages that I'm not sure how to work around. For example, we want a single web server listening on port 80 for all experiments and for providing results.
Oh. I did not mean running multiple torperf instances *simultaneously*; I just meant sequentially.
But what if we want to run multiple experiments at the same time? That's a quite common requirement. Right now, we run 3 Torperf experiments on ferrinii at the same time. A few years ago, we ran 15 experiments with tors using different guard selection strategies and downloading different file sizes.
Why do you think it's hard to run different tor versions or binaries in the same Torperf service instance?
Then each experiment needs to deal with locating, bootstrapping, and shutting down Tor. We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on. I'm not against this idea -- it can be done. I just don't think it's high priority.
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or verify signatures. We have good package managers that do that already.
Ah, we don't just want to measure packaged tors. We might also want to measure older versions which aren't contained in package repositories anymore, and we might want to measure custom branches with performance tweaks. Not sure if we actually want to verify signatures of tor versions.
I think we should take Shadow's approach (or something similar). Shadow can download a user-defined tor version ('--tor-version'), or it can build a local tor path ('--tor-prefix'):
If the user wants to run torperf against tor versions that are not present in the package managers, then the user should download and build tor -- not torperf. Once a local binary is present, the user can run torperf against it with a --tor prefix.
It's perfectly fine if the first version only supports a '--tor-binary' option and leaves downloading and building of custom tor versions to the user. Of course, Torperf should be able to support the default tor binary that comes with the operating system for non-expert users. But supporting a '--tor-version' option that downloads and builds a tor binary can come in version two. I tried to describe this approach in the design document. (Please rephrase as needed.)
https://github.com/shadow/shadow/blob/master/setup#L109
Do you see any problems with this?
Nope, this is perfectly fine. I just don't want torperf to download, verify and build tor.
Perfectly fine to ignore for now. It's not a crazy feature. But let's do this later.
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
Torperf should not accumulate results from remote Torperf service instances. If by "accumulate", you mean read another file from /results which the *user* has downloaded, then yes. Torperf shouldn't *download* result files from remote instances.
Why not? The alternative is to build another tool that downloads result files from remote instances. That's what we do right now (see footnote: "For reference, the current Torperf produces measurement results which are re-formatted by metrics-db and visualized by metrics-web with help of metrics-lib. Any change to Torperf triggers subsequent changes to the other three codebases, which is suboptimal.")
This could just be a wget script that downloads the results from another server. I just don't want that to be a part of torperf. Torperf should just measure performance and display data, IMHO -- not worry about downloading and aggregating results from another system. Or maybe we can do this later and change it to "Ideally torperf should .."
This isn't the most urgent feature to build, though we need it before we can kill the current Torperf and replace it with the new one. However, using wget to download results from another service is exactly the approach that brought us to the current situation of Torperf being a bunch of scripts. I'd rather not want to write a single script for Torperf to do what it's supposed to do, but design it in a way that it can already do all the things we want it to do. Accumulating results and presenting them is part of these things.
The new Torperf should come with an easy-to-use library to process its results
Torperf results should just be JSON (or similar) files that already have libraries; we should not invent a new result format and write a library for it.
Yes, that's what I mean. If you understood this differently, can you rephrase the paragraph?
"Torperf should store its results in a format that is widely used and already has libraries(like JSON), so that other applications can use the results and build on it". Maybe?
Changed. (See also the footnote which I put in a few hours ago.)
request scheduler Start new requests following a previously configured schedule. request runner Handle a single request from creation over various possible sub states to timeout, failure, or completion.
These are experiment specific. Some tests may not even need to do requests. No need for these to be a part of torperf.
I'm thinking how we can reduce code duplication as much as possible. The experiments in the design document all make requests, so it would be beneficial for them to have Torperf schedule and handle their requests. If an experiment doesn't have the notion of request it doesn't have to use the request scheduler or runner. But how would such an experiment work? Do you have an example?
Nope I don't have an example. Maybe as I write the tests, I'll have a better idea about the structure. Ignore this comment for now!
Okay.
results database Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Rephrased to data store. I still think a database makes sense here, but this is not a requirement. As long as we can store, retrieve, and periodically delete results, everything's fine.
Cool!
Again, thanks a lot for your input!
Updated PDF:
https://people.torproject.org/~karsten/volatile/torperf2.pdf
Great, thanks!
Updated the PDF again, same URL. Thanks!
All the best, Karsten
Hi Karsten, Sathya,
Sorry for the delayed response, been having connection issues all week on anything other than a phone ~_~. I've included updates from Sathya's later mails below also with additional comments added.
I don't see how we could make new experiments language agnostic. These new experiments will want to re-use Torperf classes/components, which they can't do if they're just "shell scripts". They need to implement an interface provided by Torperf, and Torperf needs to provide functionality via an interface that the experiments understand. If an experiment is in fact just a shell script, it should be trivial to write a wrapper class to call that. But I think that's the exception, not the rule.
Or maybe we have a different understanding of an "experiment". Can you give an example for an experiment that is not listed in Section 3 of the design document and state how you'd want to integrate that into Torperf without touching its sources?
I don't think it's such a hard piece to achieve, if you consider the Alexa & Selenium experiment. We're going to need to have a basic HTTP proxy implementation inside the Torperf service (proxying to the socks proxy of a specific tor version, specified by the experiment's config).
If you imagine this, the HTTP proxy is already an interface that applies most of the logic required (it records the appropriate HTTP and SOCKS timings, SOCKS port, tor version, time started, etc.), so the selenium client is really just responsible for its unique data.
Then, assuming the result format is not hard to muster up (as you've specified it currently, it should be simple), gaining agnostic experiments would not be difficult.
A concrete example that differs from Section 3 would be changing the alexa experiment to do, say, the top 5 sites in France or something; it should hopefully be trivial to just change a text file and be done with it, instead of having to be familiar with Python/whatever.
That said, it is an 'ideally we could' kind of point, so it's not a blocker if we don't aim for it. Either way, the user will be free to hack on whatever experiments, so I'm sure it won't be hard for them to do the above by just hacking on the final implementation :) The real users are likely technically adept, I guess!(?)
I also added an Appendix A with suggested data formats. Maybe these data formats make it clearer what I'd expect as output from an experiment.
This is great, thanks for this. We're thinking along the same lines for that experiment at least :) I think it would be useful to also add desired/required information on other experiments as we progress as it can definitely help clarify what is required from each implementation.
I agree with you that this is a rather unusual requirement and that adding new experiments to Torperf is the better approach. That's why the paragraph said "should" and "ideally". I added your concerns to the design document to make this clearer. (Maybe we should mark requirements as either "must-do", "should-do", or "could-do"?)
Well, "ideally" implies that we want to do this at some point. Do we?
I don't feel strongly. I'd prefer a design that makes it easy to add new experiments, but I'm fine with an approach that requires merging patches. We can always add the functionality to drop something in a directory and make Torperf magically detect and run the new experiment, but that can happen later. Maybe we shouldn't let that distract us from getting the first version done. I commented out this section.
I think the magically detect and run part can definitely be left for future, but installation should still be this easy.
Surely it's just as easy to implement detecting new experiments on service startup as to implement not doing that. (while still somehow allowing experiments to be added... is this implying hardcoded experiments?)
Also, perhaps you don't want to support this, but how does the patch and merge system work for quick deployments of short lived experiments? (Is there ever such a thing? Karsten?) Or what if someone does develop a neat set of experiments for their own personal use that doesn't really apply to the project as a whole, are we expected to merge them upstream? What if they don't want to share?
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run different torperf instances for different tor versions.
Running multiple Torperf instances has disadvantages that I'm not sure how to work around. For example, we want a single web server listening on port 80 for all experiments and for providing results.
Oh. I did not mean running multiple torperf instances *simultaneously*; I just meant sequentially.
But what if we want to run multiple experiments at the same time? That's a quite common requirement. Right now, we run 3 Torperf experiments on ferrinii at the same time. A few years ago, we ran 15 experiments with tors using different guard selection strategies and downloading different file sizes.
I disagree with removing this requirement.
Why do you think it's hard to run different tor versions or binaries in the same Torperf service instance?
Then each experiment needs to deal with locating, bootstrapping, and shutting down Tor. We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on. I'm not against this idea -- it can be done. I just don't think it's high priority.
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
Locating Tor should just be settings in 'the Torperf config'. { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
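Spelled out a bit more (only a sketch of the idea; the keys, labels and paths here are invented), the service config could map version labels to binaries and experiments could refer to them by label:

    {
      "tor_versions": {
        "0.2.3": "/usr/local/bin/tor-0.2.3",
        "0.2.4": "/usr/local/bin/tor-0.2.4"
      },
      "experiments": {
        "static_50kb": { "tor_version": "0.2.4", "interval_seconds": 300 }
      }
    }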
Is there requirement to run the *same* experiment across different Tor versions at the same time (literally parallel) or just to have "I as a user set this up to run for X,Y,Z versions and ran it one time and got all my results."?
I think this is what Sathya is saying with:
We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated for it at any time, and it does its versioned runs sequentially.
I think the discussion above is talking about two different things; it would be beneficial to decide what needs to be actually parallel and what just needs to be one-time setup for a user.
Are there any concerns around parallel requests causing noise in the timing information? Or are we happy to live with a small 1-2(?)ms noise level per experiment in order to benefit from faster experiment runtimes in aggregate?
On that, can we be clear with our vocabulary, "Torperf tests" means "Torperf experiments", right?
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify if they plan to test stable, beta or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or verify signatures. We have good package managers that do that already.
Ah, we don't just want to measure packaged tors. We might also want to measure older versions which aren't contained in package repositories anymore, and we might want to measure custom branches with performance tweaks. Not sure if we actually want to verify signatures of tor versions.
I think we should take Shadow's approach (or something similar). Shadow can download a user-defined tor version ('--tor-version'), or it can build a local tor path ('--tor-prefix'):
If the user wants to run torperf against tor versions that are not present in the package managers, then the user should download and build tor -- not torperf. Once a local binary is present, the user can run torperf against it with a --tor prefix.
It's perfectly fine if the first version only supports a '--tor-binary' option and leaves downloading and building of custom tor versions to the user. Of course, Torperf should be able to support the default tor binary that comes with the operating system for non-expert users. But supporting a '--tor-version' option that downloads and builds a tor binary can come in version two. I tried to describe this approach in the design document. (Please rephrase as needed.)
https://github.com/shadow/shadow/blob/master/setup#L109
Do you see any problems with this?
Nope, this is perfectly fine. I just don't want torperf to download, verify and build tor.
Perfectly fine to ignore for now. It's not a crazy feature. But let's do this later.
Agree on this, no need to do downloading / verifying / installing Tor in initial releases, it's likely a huge PITA. But I think we should have the tor binary locations listed in the config rather than a command flag. (Listing multiple Tor versions via command flag seems a lot more error prone to me)
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances
Torperf should not accumulate results from remote Torperf service instances. If by "accumulate", you mean read another file from /results which the *user* has downloaded, then yes. Torperf shouldn't *download* result files from remote instances.
Why not? The alternative is to build another tool that downloads result files from remote instances. That's what we do right now (see footnote: "For reference, the current Torperf produces measurement results which are re-formatted by metrics-db and visualized by metrics-web with help of metrics-lib. Any change to Torperf triggers subsequent changes to the other three codebases, which is suboptimal.")
This could just be a wget script that downloads the results from another server. I just don't want that to be a part of torperf. Torperf should just measure performance and display data, IMHO -- not worry about downloading and aggregating results from another system. Or maybe we can do this later and change it to "Ideally torperf should .."
This isn't the most urgent feature to build, though we need it before we can kill the current Torperf and replace it with the new one. However, using wget to download results from another service is exactly the approach that brought us to the current situation of Torperf being a bunch of scripts. I'd rather not want to write a single script for Torperf to do what it's supposed to do, but design it in a way that it can already do all the things we want it to do. Accumulating results and presenting them is part of these things.
"Torperf should just measure performance and display data", displaying aggregate data is displaying data! :P
But, surely if Torperf just achieves this by wget'ing stuff, and the user doesn't have to worry about anything other than setting a remote server and an interval to poll, that would be considered done? (Torperf handles the scheduling and managing of the data files)
results database Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Rephrased to data store. I still think a database makes sense here, but this is not a requirement. As long as we can store, retrieve, and periodically delete results, everything's fine.
Cool!
I don't think we need a database for the actual results (but a flat file structure is just a crap database! :). I do however think, once we start to provide the data visualisation aspects, it will need a database for performance when doing queries that are more than simple listings.
regards, Kevin
On 9/25/13 10:30 PM, Kevin wrote:
Hi Karsten, Sathya,
Sorry for the delayed response, been having connection issues all week on anything other than a phone ~_~. I've included updates from Sathya's later mails below also with additional comments added.
I don't see how we could make new experiments language agnostic. These new experiments will want to re-use Torperf classes/components, which they can't do if they're just "shell scripts". They need to implement an interface provided by Torperf, and Torperf needs to provide functionality via an interface that the experiments understand. If an experiment is in fact just a shell script, it should be trivial to write a wrapper class to call that. But I think that's the exception, not the rule.
Or maybe we have a different understanding of an "experiment". Can you give an example for an experiment that is not listed in Section 3 of the design document and state how you'd want to integrate that into Torperf without touching its sources?
I don't think it's such a hard piece to achieve, if you consider the Alexa & Selenium experiment. We're going to need to have a basic HTTP proxy implementation inside the Torperf service (proxying to the socks proxy of a specific tor version, specified by the experiment's config).
If you imagine this, the HTTP proxy is already an interface that applies most of the logic required (it records the appropriate HTTP and SOCKS timings, SOCKS port, tor version, time started, etc.), so the selenium client is really just responsible for its unique data.
Then, assuming the result format is not hard to muster up (as you've specified it currently, it should be simple), gaining agnostic experiments would not be difficult.
A concrete example that differs from Section 3 would be changing the alexa experiment to do, say, the top 5 sites in France or something; it should hopefully be trivial to just change a text file and be done with it, instead of having to be familiar with Python/whatever.
That said, it is an 'ideally we could' kind of point, so it's not a blocker if we don't aim for it. Either way, the user will be free to hack on whatever experiments, so I'm sure it won't be hard for them to do the above by just hacking on the final implementation :) The real users are likely technically adept, I guess!(?)
Users are likely technically adept, yes, but it might be that some just do us a favor by running Torperf on their well-connected machine and don't want to think hard about what they're doing.
But it seems we agree here that we shouldn't include this in the first version.
I also added an Appendix A with suggested data formats. Maybe these data formats make it clearer what I'd expect as output from an experiment.
This is great, thanks for this. We're thinking along the same lines for that experiment at least :) I think it would be useful to also add desired/required information on other experiments as we progress as it can definitely help clarify what is required from each implementation.
Want to help define the remaining data formats? I think we need these formats:
- file_upload would be quite similar to file_download, but for the GET/POST performance experiment. Or maybe we can generalize file_download to cover either GET or POST requests and the respective timings (a rough sketch of such a generalized record follows this list).
- We'll need a document for hidden_service_request that contains not only timings, but also references to the client-side circuits used to fetch the hidden service descriptor, the rendezvous circuit, and the introduction circuit, plus the server-side introduction and rendezvous circuits.
- These data formats are all for fetching/posting static files. We should decide on a data format for actual website fetches. Rob van der Hoeven suggested HAR, which I included in a footnote. So, maybe we should extend HAR to store the tor-specific stuff, or we should come up with something else.
- Are there any other data formats missing?
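To make the first item a bit more concrete, a generalized file_download/file_upload record could look roughly like this; every field name here is made up for illustration, not taken from Appendix A:

    {
      "experiment": "static_file_get",
      "method": "GET",
      "url": "https://torperf.example.org/.static/50kb",
      "tor_version": "0.2.4.17-rc",
      "start": "2013-09-25 12:00:01.234",
      "socks_connect_millis": 12,
      "data_complete_millis": 3456,
      "bytes_transferred": 51200,
      "timeout": false,
      "failure": false
    }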
I agree with you that this is a rather unusual requirement and that adding new experiments to Torperf is the better approach. That's why the paragraph said "should" and "ideally". I added your concerns to the design document to make this clearer. (Maybe we should mark requirements as either "must-do", "should-do", or "could-do"?)
Well, "ideally" implies that we want to do this at some point. Do we?
I don't feel strongly. I'd prefer a design that makes it easy to add new experiments, but I'm fine with an approach that requires merging patches. We can always add the functionality to drop something in a directory and make Torperf magically detect and run the new experiment, but that can happen later. Maybe we shouldn't let that distract us from getting the first version done. I commented out this section.
I think the magically detect and run part can definitely be left for future, but installation should still be this easy.
Surely it's just as easy to implement detecting new experiments on service startup as to implement not doing that. (while still somehow allowing experiments to be added... is this implying hardcoded experiments?)
I guess experiment types will be hard-coded, but experiment instances will be configurable.
Also, perhaps you don't want to support this, but how does the patch and merge system work for quick deployments of short lived experiments? (Is there ever such a thing? Karsten?)
Yes, there might be such a thing as short-lived experiments. We'd probably commit such a patch to a separate branch and decide after the experiment if it's worth adding the experiment to the master branch.
Or what if someone does develop a neat set of experiments for their own personal use that doesn't really apply to the project as a whole, are we expected to merge them upstream? What if they don't want to share?
I think we should only merge experiments that are general enough for others to run.
It should be possible to run different experiments with different tor versions or binaries in the same Torperf service instance.
I don't think we need this now. I'm totally ok with having users run different torperf instances for different tor versions.
Running multiple Torperf instances has disadvantages that I'm not sure how to work around. For example, we want a single web server listening on port 80 for all experiments and for providing results.
Oh. I did not mean running multiple torperf instances *simultaneously*; I just meant sequentially.
But what if we want to run multiple experiments at the same time? That's a quite common requirement. Right now, we run 3 Torperf experiments on ferrinii at the same time. A few years ago, we ran 15 experiments with tors using different guard selection strategies and downloading different file sizes.
I disagree with removing this requirement.
Yes, so do I.
Why do you think it's hard to run different tor versions or binaries in the same Torperf service instance?
Then each experiment needs to deal with locating, bootstrapping, and shutting down Tor. We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on. I'm not against this idea -- it can be done. I just don't think it's high priority.
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
Locating Tor should just be settings in 'the Torperf config'. { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
Is there a requirement to run the *same* experiment across different Tor versions at the same time (literally parallel), or just to have "I as a user set this up to run for X,Y,Z versions and ran it one time and got all my results"?
You would typically not run an experiment a single time, but set it up to run for a few days. And you'd probably set up parallel experiments to start with a short time offset. (Not sure if this answers your question.)
I think this is what Sathya is saying with:
We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated for it at any time, and it does its versioned runs sequentially.
For each experiment there's one tor instance running a given version. You wouldn't stop, downgrade/upgrade, and restart the tor instance while the experiment is running. If you want to run an experiment on different tor versions, you'd start multiple experiments. For example:
1. download 50KiB static file, use tor 0.2.3.x on socks port 9001, start every five minutes starting at :01 of the hour.
2. download 50KiB static file, use tor 0.2.4.x on socks port 9002, start every five minutes starting at :02 of the hour.
3. download 50KiB static file, use tor 0.2.5.x on socks port 9003, start every five minutes starting at :03 of the hour.
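To make that concrete in config terms, and to combine it with the tor_versions mapping suggested earlier, a sketch like the following could work; every key name here is invented for illustration, not a settled Torperf format.

# Illustrative only: one possible shape for 'the Torperf config', combining
# the tor_versions mapping with the three experiment instances above.
# All key names are hypothetical.
TORPERF_CONFIG = {
    "tor_versions": {
        "0.2.3.x": "/path/to/tor-0.2.3/bin/tor",
        "0.2.4.x": "/path/to/tor-0.2.4/bin/tor",
        "0.2.5.x": "/path/to/tor-0.2.5/bin/tor",
    },
    "experiments": [
        # Same experiment type, one instance per tor version; the service
        # itself would allocate socks ports and start offsets.
        {"type": "file_download", "filesize": "50KiB",
         "tor_version": "0.2.3.x", "interval_seconds": 300},
        {"type": "file_download", "filesize": "50KiB",
         "tor_version": "0.2.4.x", "interval_seconds": 300},
        {"type": "file_download", "filesize": "50KiB",
         "tor_version": "0.2.5.x", "interval_seconds": 300},
    ],
}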
I think the discussion above is talking about two different things, I think it would be beneficial to decide what needs to be actually parallel and what just needs to be one-time setup for a user.
Are there any concerns around parallel requests causing noise in the timing information? Or are we happy to live with a small 1-2(?)ms noise level per experiment in order to benefit from faster experiment runtimes in aggregate?
Not sure which of these questions are still open. We should definitely get this clear in the design document. What would we write, and where in the document would we put it?
On that, can we be clear with our vocabulary, "Torperf tests" means "Torperf experiments", right?
Yes. Hope my "experiment type" vs. "experiment instance" was not too confusing. ;)
It might be beneficial to provide a mechanism to download and verify the signature of new tor versions as they are released. The user could specify whether they plan to test stable, beta, or alpha versions of tor with their Torperf instance.
IMHO, torperf should just measure performance, not download Tor or verify signatures. We have good package managers that do that already.
Ah, we don't just want to measure packaged tors. We might also want to measure older versions which aren't contained in package repositories anymore, and we might want to measure custom branches with performance tweaks. Not sure if we actually want to verify signatures of tor versions.
I think we should take Shadow's approach (or something similar). Shadow can download a user-defined tor version ('--tor-version'), or it can build a local tor path ('--tor-prefix'):
If the user wants to run torperf against tor versions that are not present in the package managers, then the user should download and build tor -- not torperf. Once a local binary is present, the user can run torperf against it with a '--tor-prefix' option.
It's perfectly fine if the first version only supports a '--tor-binary' option and leaves downloading and building of custom tor versions to the user. Of course, Torperf should be able to support the default tor binary that comes with the operating system for non-expert users. But supporting a '--tor-version' option that downloads and builds a tor binary can come in version two. I tried to describe this approach in the design document. (Please rephrase as needed.)
https://github.com/shadow/shadow/blob/master/setup#L109
Do you see any problems with this?
Nope, this is perfectly fine. I just don't want torperf to download, verify and build tor.
Perfectly fine to ignore for now. It's not a crazy feature. But let's do this later.
Agree on this, no need to do downloading / verifying / installing Tor in initial releases, it's likely a huge PITA. But I think we should have the tor binary locations listed in the config rather than a command flag. (Listing multiple Tor versions via command flag seems a lot more error prone to me)
Ah, my mistake with the command flag. Yes, config option.
A Torperf service instance should be able to accumulate results from its own experiments and remote Torperf service instances.
Torperf should not accumulate results from remote Torperf service instances. If by "accumulate", you mean read another file from /results which the *user* has downloaded, then yes. Torperf shouldn't *download* result files from remote instances.
Why not? The alternative is to build another tool that downloads result files from remote instances. That's what we do right now (see footnote: "For reference, the current Torperf produces measurement results which are re-formatted by metrics-db and visualized by metrics-web with help of metrics-lib. Any change to Torperf triggers subsequent changes to the other three codebases, which is suboptimal.")
This could just be a wget script that downloads the results from another server. I just don't want that to be a part of torperf. Torperf should just measure performance and display data, IMHO -- not worry about downloading and aggregating results from another system. Or maybe we can do this later and change it to "Ideally torperf should .."
This isn't the most urgent feature to build, though we need it before we can kill the current Torperf and replace it with the new one. However, using wget to download results from another service is exactly the approach that brought us to the current situation of Torperf being a bunch of scripts. I'd rather not write a single script for Torperf to do what it's supposed to do, but design it in a way that it can already do all the things we want it to do. Accumulating results and presenting them is part of these things.
"Torperf should just measure performance and display data", displaying aggregate data is displaying data! :P
But, surely if Torperf just achieves this by wget'ing stuff, and the user doesn't have to worry about anything other than setting a remote server and an interval to poll, that would be considered done? (Torperf handles the scheduling and managing of the data files)
This isn't going to be difficult code, but I'd want to avoid relying on wget if you mean the command-line tool.
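For what it's worth, a minimal sketch of how Torperf itself could poll a remote instance without shelling out to wget; the '/results' path and the polling interval are assumptions on my part.

import time
import urllib.request

# Minimal sketch: periodically fetch a remote Torperf instance's results and
# store them locally. The '/results' path, the local file layout, and the lack
# of deduplication are all assumptions for illustration.
def accumulate_remote_results(remote_base_url, local_path, poll_interval_seconds=3600):
    while True:
        url = remote_base_url.rstrip("/") + "/results"
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(local_path, "wb") as local_file:
            local_file.write(data)
        time.sleep(poll_interval_seconds)

# Example: accumulate_remote_results("http://remote-torperf.example.org", "remote-results.json")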
results database: Store request details, retrieve results, periodically delete old results if configured.
Not sure if we really need a database. These tests look pretty simple to me.
Rephrased to data store. I still think a database makes sense here, but this is not a requirement. As long as we can store, retrieve, and periodically delete results, everything's fine.
Cool!
I don't think we need a database for the actual results (but a flat file structure is just a crap database! :). I do however think, once we start to provide the data visualisation aspects, it will need a database for performance when doing queries that are more than simple listings.
Yup, don't feel strongly.
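Either way, the operations are small enough to sketch. Here is what store / retrieve / periodically-delete could look like with sqlite3, purely as an example backend; the schema is invented.

import json
import sqlite3
import time

# Sketch of a minimal results data store covering store / retrieve /
# periodically delete. sqlite3 is just an example backend; the schema is
# invented for illustration.
class ResultsStore:
    def __init__(self, path="torperf-results.db"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS results "
                        "(experiment TEXT, recorded_at REAL, payload TEXT)")

    def store(self, experiment, record):
        self.db.execute("INSERT INTO results VALUES (?, ?, ?)",
                        (experiment, time.time(), json.dumps(record)))
        self.db.commit()

    def retrieve(self, experiment):
        rows = self.db.execute("SELECT payload FROM results WHERE experiment = ?",
                               (experiment,))
        return [json.loads(payload) for (payload,) in rows]

    def delete_older_than(self, max_age_seconds):
        self.db.execute("DELETE FROM results WHERE recorded_at < ?",
                        (time.time() - max_age_seconds,))
        self.db.commit()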
Thanks for your feedback!
All the best, Karsten
Hi Karsten, Sathya,
Hope you've both had great weekends, please see inline!
Want to help define the remaining data formats? I think we need these formats:
- file_upload would be quite similar to file_download, but for the GET/POST performance experiment. Or maybe we can generalize file_download to cover either GET or POST requests and the respective timings.
- We'll need a document for hidden_service_request that does not only
contain timings, but also references the client-side circuit used to fetch the hidden service descriptor, rendezvous circuit, and introduction circuit, and server-side introduction and rendezvous circuits.
- These data formats are all for fetching/posting static files. We
should decide on a data format for actual website fetches. Rob van der Hoeven suggested HAR, which I included in a footnote. So, maybe we should extend HAR to store the tor-specific stuff, or we should come up with something else.
- Are there any other data formats missing?
I think extending the HAR format (with minimal changes really, it's already reasonably generic) would be a good fit for the real fetches indeed. Do you feel HAR is overkill for the others?
I think it wouldn't be such a bad idea to use it for all; perhaps this could be a future requirement if not an initial one (e.g. the static_file_downloads experiment would have that in 'creator', but it would have multiple 'entries', each representing a static file, with our own fields added: filesize, tor_version, etc.).
There are a number of perks with using HAR:
- TBB probably already knows how to record .HAR files, so the selenium work would basically just be to open a browser and record a few navigations to .HAR (I know Chrome can do this easily, so I'm assuming our TBB version of Firefox is also capable).
- We can benefit from any tooling built around HAR, either to statistically analyse or to provide visualisation.
- There is a decent amount of research around HAR compression (although it basically seems to just be gzipping), but if we can support compressed HAR then we can allow servers to store a lot more history.
There is also the negative that the HAR files will probably provide *too much* data, but we could probably prune the files before archiving them or as a stage before total deletion.
While spending some time implementing these things, I have been playing with a faked alexa experiment, and I think the HAR format, or at least something that allows for multiple entries per experiment result set, is necessary for all our experiments (even static file).
I'll get back to you regarding these data formats in future when I have time to actually look at what the other experiments need (I've mainly focused on alexa & static downloading this far)
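To make the 'our own fields added' part a bit more concrete: HAR allows custom fields (conventionally prefixed with an underscore), so an entry could carry the tor-specific extras roughly like this; the specific underscore field names below are placeholders I made up, not an agreed format.

import json

# Rough sketch of a HAR-style log with one entry carrying tor-specific extras.
# The underscore-prefixed fields (_tor_version, _circuit_id, _filesize) are
# placeholders, not an agreed format.
har_log = {
    "log": {
        "version": "1.2",
        "creator": {"name": "torperf-static-file-download", "version": "0.1"},
        "entries": [
            {
                "startedDateTime": "2013-10-01T03:03:00.000Z",
                "time": 1450,  # total request time in ms
                "request": {"method": "GET", "url": "http://example.com/50KiB.bin"},
                "response": {"status": 200, "bodySize": 51200},
                "timings": {"connect": 210, "wait": 770, "receive": 470},
                "_tor_version": "0.2.4.x",
                "_circuit_id": 42,
                "_filesize": 51200,
            }
        ],
    }
}
print(json.dumps(har_log, indent=2))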
I think the magically detect and run part can definitely be left for future, but installation should still be this easy.
Surely it's just as easy to implement detecting new experiments on service startup as to implement not doing that. (while still somehow allowing experiments to be added... is this implying hardcoded experiments?)
I guess experiment types will be hard-coded, but experiment instances will be configurable.
Also, perhaps you don't want to support this, but how does the patch and merge system work for quick deployments of short lived experiments? (Is there ever such a thing? Karsten?)
Yes, there might be such a thing as short-lived experiments. We'd probably commit such a patch to a separate branch and decide after the experiment if it's worth adding the experiment to the master branch.
Or what if someone does develop a neat set of experiments for their own personal use that doesn't really apply to the project as a whole, are we expected to merge them upstream? What if they don't want to share?
I think we should only merge experiments that are general enough for others to run.
This doesn't entirely address my concern. If upstream branches are made for short lived experiments (rather than just sharing a folder between people), how do the users install that? (since they would have initially apt-get installed? not git/svn?)
And my concern around non-general or non-shared experiments skips the issue, how will they distribute them to whoever needs to run it? Their own git repo infrastructure? (But of course I agree we should only upstream general things)
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
Locating Tor should just be settings in 'the Torperf config'. { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
Is there a requirement to run the *same* experiment across different Tor versions at the same time (literally parallel), or just to have "I as a user set this up to run for X,Y,Z versions and ran it one time and got all my results"?
You would typically not run an experiment a single time, but set it up to run for a few days. And you'd probably set up parallel experiments to start with a short time offset. (Not sure if this answers your question.)
I messed up the question with that single word, I'll address this below. I meant 'ran torperf once' and 'got my results periodically as the schedule defines'
I think this is what Sathya is saying with:
We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated for it at any time, and it does its versioned runs sequentially.
For each experiment there's one tor instance running a given version. You wouldn't stop, downgrade/upgrade, and restart the tor instance while the experiment is running. If you want to run an experiment on different tor versions, you'd start multiple experiments. For example:
- download 50KiB static file, use tor 0.2.3.x on socks port 9001, start
every five minutes starting at :01 of the hour.
- download 50KiB static file, use tor 0.2.4.x on socks port 9002, start
every five minutes starting at :02 of the hour.
- download 50KiB static file, use tor 0.2.5.x on socks port 9003, start
every five minutes starting at :03 of the hour.
I don't think it's a good idea to have to define such specificity for humans. They should be able to just define the 50KiB file (and probably more file sizes) with a five-minute interval for versions x, y, z. (I also don't think the user should define the socks port to use, but that's a minor detail.)
I think you've answered my question here though. I'll summarise below!
I think the discussion above is talking about two different things, I think it would be beneficial to decide what needs to be actually parallel and what just needs to be one-time setup for a user.
Are there any concerns around parallel requests causing noise in the timing information? Or are we happy to live with a small 1-2(?)ms noise level per experiment in order to benefit from faster experiment runtimes in aggregate?
Not sure which of these questions are still open. We should definitely get this clear in the design document. What would we write, and where in the document would we put it?
Perhaps we can cover this near 2.1.2 "If the service operator wants to run multiple experiments...."
I think you've defined above that no experiment should be run at the same time as another, i.e. the main service should not be executing code for different experiments at the same time. (Above you've manually set each experiment to start a minute after the previous, which assumes each experiment takes under a minute to execute --- what happens if there are timeouts or slow networks?)
I would agree with this as it will help to keep experiments more accurate (e.g. static file download for a 50MB file didn't adversely affect the performance for a hidden service test that started at the same time)
I think the service itself should handle scheduling things on a periodic basis and should make it clear how late the service is running compared to its desired schedule (e.g. one experiment took longer than a minute, so the other started 5 seconds late, so it should start again in (interval - 5 seconds) time).
How this would happen in practice: the service starts up, checks the results files for each experiment's last runtime, then runs any that haven't run in the last INTERVAL seconds. Assuming experiments execute in less time than SHORTEST_ALLOWED_INTERVAL/NUMBER_OF_EXPERIMENTS, the system will on average schedule perfectly by default.
There could be problems with this approach, though. For example, what happens if there is a large experiment, e.g. Alexa when the network is slow, which takes, say, 5 minutes? If this is scheduled to run every 10 minutes and there are some other experiments that are scheduled to run every 2 minutes, then we have a problem: either the 2-minute experiments run on their intended schedule with potential inaccuracies caused, or the 2-minute interval is not a 2-minute interval. I think we should aim for the latter and warn the user when they have made schedules like the above.
- Another option is to break up experiments into chunks, where overall only one request is going at a time, so a request is our atomic scheduling option, but that becomes harder to coordinate and is highly inefficient in terms of network throughput.
We could monitor experiments' average runtime and determine 'optimal' scheduling based on that, but I think the best thing in the short term is just to say 'don't schedule long experiments to run frequently if you plan to run lots of other small experiments'.
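A rough sketch of the catch-up-on-startup scheduling described above, with the results-file lookup and the experiment execution stubbed out; all names are illustrative only.

import time

# Illustrative sketch of the scheduling idea above: read each experiment's
# last run time (stubbed here), run whichever experiments are overdue, one
# at a time, and note how late each run started.
def last_run_time(experiment):
    # Stub: in practice this would come from the experiment's results files.
    return experiment.get("last_run", 0.0)

def run_experiment(experiment):
    # Stub: execute the experiment's command and record stdout as results.
    print("running", experiment["name"])

def scheduler_loop(experiments):
    while True:
        now = time.time()
        for experiment in experiments:
            due_at = last_run_time(experiment) + experiment["interval_seconds"]
            if now >= due_at:
                print("%s starting %.1f seconds late" % (experiment["name"], now - due_at))
                run_experiment(experiment)  # experiments run one at a time
                experiment["last_run"] = time.time()
        time.sleep(1)

# Example: scheduler_loop([{"name": "static_50KiB", "interval_seconds": 300}])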
On that, can we be clear with our vocabulary, "Torperf tests" means "Torperf experiments", right?
Yes. Hope my "experiment type" vs. "experiment instance" was not too confusing. ;)
That's perfectly clear to me :)
Regards, Kevin
On 10/1/13 3:03 AM, Kevin Butler wrote:
Hi Karsten, Sathya,
Hope you've both had great weekends, please see inline!
Hi Kevin,
apologies for not replying earlier! Finally, replying now.
Want to help define the remaining data formats? I think we need these formats:
- file_upload would be quite similar to file_download, but for the GET/POST performance experiment. Or maybe we can generalize file_download to cover either GET or POST requests and the respective timings.
- We'll need a document for hidden_service_request that does not only
contain timings, but also references the client-side circuit used to fetch the hidden service descriptor, rendezvous circuit, and introduction circuit, and server-side introduction and rendezvous circuits.
- These data formats are all for fetching/posting static files. We
should decide on a data format for actual website fetches. Rob van der Hoeven suggested HAR, which I included in a footnote. So, maybe we should extend HAR to store the tor-specific stuff, or we should come up with something else.
- Are there any other data formats missing?
I think extending the HAR format (with minimal changes really, it's already reasonably generic) would be a good fit for the real fetches indeed. Do you feel HAR is overkill for the others?
I'm not sure, because I don't know the HAR format (this was a suggestion that didn't look crazy to me, so I added it to the PDF). But I think we could use HAR for all kinds of requests. We'll probably need to use something else for stream and circuit information, because they can be unrelated to specific requests.
I think it wouldn't be such a bad idea to use it for all; perhaps this could be a future requirement if not an initial one (e.g. the static_file_downloads experiment would have that in 'creator', but it would have multiple 'entries', each representing a static file, with our own fields added: filesize, tor_version, etc.).
Plausible, though I can't really comment on this with my limited knowledge of the HAR format.
There are a number of perks with using HAR:
- TBB probably already knows how to record .HAR files so the selenium
work would basically just be to open a browser and record a few navigations to .HAR (I know Chrome can do this easily, so I'm assuming our TBB version of Firefox is also capable)
Probably. Though we'll need to add stream/circuit references. Can we do that?
- We can benefit from any tooling built around HAR, either to statistically analyse or to provide visualisation.
- There is a decent amount of research around HAR compression (although it basically seems to just be gzipping), but if we can support compressed HAR then we can allow servers to store a lot more history.
There is also the negative that the HAR files will probably provide *too much* data, but we could probably prune the files before archiving them or as a stage before total deletion.
While spending some time implementing these things, I have been playing with a faked alexa experiment, and I think the HAR format, or at least something that allows for multiple entries per experiment result set, is necessary for all our experiments (even static file).
I'll get back to you regarding these data formats in future when I have time to actually look at what the other experiments need (I've mainly focused on alexa & static downloading this far)
Makes sense.
I think the magically detect and run part can definitely be left for future, but installation should still be this easy.
Surely it's just as easy to implement detecting new experiments on service startup as to implement not doing that. (while still somehow allowing experiments to be added... is this implying hardcoded experiments?)
I guess experiment types will be hard-coded, but experiment instances will be configurable.
Also, perhaps you don't want to support this, but how does the patch and merge system work for quick deployments of short lived experiments? (Is there ever such a thing? Karsten?)
Yes, there might be such a thing as short-lived experiments. We'd probably commit such a patch to a separate branch and decide after the experiment if it's worth adding the experiment to the master branch.
Or what if someone does develop a neat set of experiments for their own personal use that doesn't really apply to the project as a whole, are we expected to merge them upstream? What if they don't want to share?
I think we should only merge experiments that are general enough for others to run.
This doesn't entirely address my concern. If upstream branches are made for short lived experiments (rather than just sharing a folder between people), how do the users install that? (since they would have initially apt-get installed? not git/svn?)
And my concern around non-general or non-shared experiments skips the issue, how will they distribute them to whoever needs to run it? Their own git repo infrastructure? (But of course I agree we should only upstream general things)
I'm mostly thinking of developers who would run custom branches. And if somebody cannot handle Git, we can give them a tarball. But really, custom experiments should be the exception, not the rule.
Torperf should help with bootstrapping and shutting down tor, because that's something that all experiments need. Locating tor could just be a question of passing the path to a tor binary to Torperf. See above for sequential vs. parallel experiments.
Locating Tor should just be settings in 'the Torperf config'. { ... tor_versions: { '0.X.X' => '/Path/to/0/x/x/' } ... }
Is there a requirement to run the *same* experiment across different Tor versions at the same time (literally parallel), or just to have "I as a user set this up to run for X,Y,Z versions and ran it one time and got all my results"?
You would typically not run an experiment a single time, but set it up to run for a few days. And you'd probably set up parallel experiments to start with a short time offset. (Not sure if this answers your question.)
I messed up the question with that single word, I'll address this below. I meant 'ran torperf once' and 'got my results periodically as the schedule defines'
I think this is what Sathya is saying with:
We could just run a torperf test against a particular tor version, once that's completed, we can run against another tor version and so on.
i.e. for each experiment, there's only one instance of Tor allocated for it at any time, and it does its versioned runs sequentially.
For each experiment there's one tor instance running a given version. You wouldn't stop, downgrade/upgrade, and restart the tor instance while the experiment is running. If you want to run an experiment on different tor versions, you'd start multiple experiments. For example:
- download 50KiB static file, use tor 0.2.3.x on socks port 9001, start
every five minutes starting at :01 of the hour.
- download 50KiB static file, use tor 0.2.4.x on socks port 9002, start
every five minutes starting at :02 of the hour.
- download 50KiB static file, use tor 0.2.5.x on socks port 9003, start
every five minutes starting at :03 of the hour.
I don't think it's a good idea to have to define such specificity for humans. They should be able to just define the 50KiB file (and probably more file sizes) with a five-minute interval for versions x, y, z. (I also don't think the user should define the socks port to use, but that's a minor detail.)
I think you've answered my question here though. I'll summarise below!
I think the discussion above is talking about two different things, I think it would be beneficial to decide what needs to be actually parallel and what just needs to be one-time setup for a user.
Are there any concerns around parallel requests causing noise in the timing information? Or are we happy to live with a small 1-2(?)ms noise level per experiment in order to benefit from faster experiment runtimes in aggregate?
Not sure which of these questions are still open. We should definitely get this clear in the design document. What would we write, and where in the document would we put it?
Perhaps we can cover this near 2.1.2 "If the service operator wants to run multiple experiments...."
I think you've defined above that no experiment should be run at the same time as another, i.e. the main service should not be executing code for different experiments at the same time. (Above you've manually set each experiment to start a minute after the previous, which assumes each experiment takes under a minute to execute --- what happens if there are timeouts or slow networks?)
Ah, I didn't mean that experiments should finish under 1 minute. They can run up to five minutes. Starting them at :01, :02, and :03 was just a naive way of avoiding bottlenecks during connection establishment.
I would agree with this as it will help to keep experiments more accurate (e.g. static file download for a 50MB file didn't adversely affect the performance for a hidden service test that started at the same time)
(If somebody configures their Torperf to download a 50MB file, they deserve that their tests break horribly. Let's leave some bandwidth for actual users. ;))
I think the service itself should handle scheduling things on a periodic basis and should make it clear how late the service is running compared to its desired schedule (e.g. one experiment took longer than a minute, so the other started 5 seconds late, so it should start again in (interval - 5 seconds) time).
How this would happen in practice: the service starts up, checks the results files for each experiment's last runtime, then runs any that haven't run in the last INTERVAL seconds. Assuming experiments execute in less time than SHORTEST_ALLOWED_INTERVAL/NUMBER_OF_EXPERIMENTS, the system will on average schedule perfectly by default.
There could be problems with this approach, though. For example, what happens if there is a large experiment, e.g. Alexa when the network is slow, which takes, say, 5 minutes? If this is scheduled to run every 10 minutes and there are some other experiments that are scheduled to run every 2 minutes, then we have a problem: either the 2-minute experiments run on their intended schedule with potential inaccuracies caused, or the 2-minute interval is not a 2-minute interval. I think we should aim for the latter and warn the user when they have made schedules like the above.
- Another option is to break up experiments into chunks, where overall
only one request is going at a time, so a request is our atomic scheduling option, but that becomes harder to coordinate and is highly inefficient in terms of network throughput.
We could monitor experiments' average runtime and determine 'optimal' scheduling based on that, but I think the best thing in the short term is just to say 'don't schedule long experiments to run frequently if you plan to run lots of other small experiments'.
We could make the service smart enough not to start all requests at the same time, e.g., by adding random delays. And we could allow users to override this by defining a manual offset for each experiment. That way, new users don't have to care, and expert users can fine-tune things.
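That could be as small as something like this; 'offset_seconds' is a hypothetical config key for the expert override.

import random

# Sketch of the idea above: experiments get a small random start delay by
# default, and an expert user can override it with a manual offset.
# 'offset_seconds' is a hypothetical config key.
def start_offset(experiment, max_random_delay_seconds=60):
    manual = experiment.get("offset_seconds")
    if manual is not None:
        return manual
    return random.uniform(0, max_random_delay_seconds)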
On that, can we be clear with our vocabulary, "Torperf tests" means "Torperf experiments", right?
Yes. Hope my "experiment type" vs. "experiment instance" was not too confusing. ;)
That's perfectly clear to me :)
How do we proceed? Would you mind sending me a diff of the changes to the design document that make these things clearer to you?
Also, I'm thinking about publishing the tech report, even though there's no running code yet (AFAIK). The reason is that I'd like to call this report the output of sponsor F deliverable 8. Originally, we promised code, but a design document is better than nothing.
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year3
I can include changes until, say, Monday, October 28.
Thanks in advance! And sorry again for the long delay!
All the best, Karsten