Hi! This is the outcome of some discussion I've had with dgoulet, and some work I've done on identifying current problem points in our use of integration testing tools.
I'm posting it here for initial feedback, and so I have a URL to link to in my monthly report. :)
INTEGRATION TEST PLANS FOR TOR
November 2014
1. Goals, non-goals, and scope
This is not a list of all the tests we need; this is just a list of the kind of tests we can and should run with Chutney.
These tests need to be the kind that a random developer can reasonably run on their own. Longer/more expensive tests may be okay too, but if a test needs anything spiffier than a Linux desktop, consider using Shadow instead.
Setting up an environment to run the tests needs to be so easy that nobody who writes C for Tor is likely to be dissuaded from running them.
Writing new tests needs to be pretty simple too.
Most tests need to be runnable on all the platforms we support.
We should support load tests. Though doing so is not likely to give an accurate picture of how the network behaves under load, it's probably good enough to identify bottlenecks in the code. (David Goulet has had some success here already for identifying HS performance issues.)
We should specify our design and interfaces to keep components loosely coupled and easily replaceable. With that done, we should avoid over-designing components at first: experience teaches that only experience can teach what facilities need which features.
2. Architecture
Here are the components. I'm treating them as conceptually separate, though in practice several of them may get folded into Chutney.
A. Usage simulator
One or more programs that emulate users and servers on the internet, and report what succeeded, what failed, and how long everything took. Right now we're using curl and nc for this.
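For concreteness, here is a rough sketch of what a minimal simulator of this kind could look like in Python: one throwaway "server on the internet" plus one emulated user fetching from it through a client's SocksPort, reporting success and elapsed time. The port numbers, the use of requests with SOCKS support (requests[socks]), and the assumption that the test network's exits will talk to a local test server are all made up for illustration; none of this is existing chutney code.

    # Minimal usage-simulator sketch (illustrative only).
    import http.server
    import threading
    import time
    import requests

    def start_server(port=4747):
        """The emulated "server on the internet": a throwaway local HTTP server."""
        httpd = http.server.HTTPServer(("127.0.0.1", port),
                                       http.server.SimpleHTTPRequestHandler)
        threading.Thread(target=httpd.serve_forever, daemon=True).start()
        return httpd

    def fetch(url, socks_port=9008):
        """The emulated user: fetch url through a client SocksPort, like curl."""
        proxies = {"http": "socks5h://127.0.0.1:%d" % socks_port}
        start = time.monotonic()
        try:
            ok = requests.get(url, proxies=proxies, timeout=60).ok
        except requests.RequestException:
            ok = False
        return ok, time.monotonic() - start

    if __name__ == "__main__":
        server = start_server()
        ok, elapsed = fetch("http://127.0.0.1:4747/")
        print("%s after %.1fs" % ("succeeded" if ok else "failed", elapsed))
        server.shutdown()

The same fetch() could be driven from a thread pool to produce the hundreds-to-thousands of simultaneous requests mentioned in 2.1 A, and to do the kind of load testing mentioned in section 1.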
B. Network manager
This is what Chutney does today. It launches a set of Tor nodes according to a provided configuration.
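For illustration, here is what a thin Python wrapper over today's chutney command line might look like. The configure/start/status/stop verbs are, as far as I know, what chutney already accepts; the wrapper class and its conventions are invented here.

    # Sketch of a network-manager wrapper around the chutney CLI.
    import subprocess

    class ChutneyNetwork(object):
        def __init__(self, network_file, chutney_dir="."):
            self.network_file = network_file      # e.g. "networks/basic"
            self.chutney_dir = chutney_dir        # a chutney checkout

        def _chutney(self, verb):
            return subprocess.call(["./chutney", verb, self.network_file],
                                   cwd=self.chutney_dir) == 0

        def launch(self):
            """Configure and start the Tor nodes described by network_file."""
            return self._chutney("configure") and self._chutney("start")

        def is_running(self):
            """Ask chutney whether the nodes are up (assuming "status"
            reflects that in its exit code)."""
            return self._chutney("status")

        def stop(self):
            return self._chutney("stop")

The "database of named networks" in 2.1 B could start out as nothing fancier than a directory of network files plus a keyword list per network.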
C. Testcase scripts
We do this in shell today: launch the network, wait for it to bootstrap, send some traffic through it, and report success or failure.
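Written out as the kind of standalone testcase we'd like to end up with, that flow is roughly the following (a sketch only: it shells out to chutney's existing verbs, and the fixed sleep is a crude stand-in for a real "wait until bootstrapped"):

    # Today's shell flow as a standalone testcase; the exit status is the result.
    # Assumes it is run from a chutney checkout.
    import subprocess
    import sys
    import time

    NETWORK = "networks/basic"

    def chutney(verb):
        return subprocess.call(["./chutney", verb, NETWORK]) == 0

    def main():
        if not (chutney("configure") and chutney("start")):
            return 1
        time.sleep(35)                  # stand-in for "wait until bootstrapped"
        ok = chutney("verify")          # send some traffic through the network
        chutney("stop")
        return 0 if ok else 1

    if __name__ == "__main__":
        sys.exit(main())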
D. Test driver
This part is responsible for determining which testcases to run, in what order, on what network.
There is no current analogue for this component; we've only got the one test-network.sh script, and it assumes a single type of network.
One thing to notice here is that testcase scripts need to work with multiple kinds of network manager configurations. For example, we'd like to be able to run HTTP-style connectivity tests on small networks, large networks, heterogeneous networks, dispersed networks, and so on. We therefore need to make sure that each kind of network can work with as many tests as possible, so that the work needed to write the tests doesn't grow quadratically.
The coupling between the components will work as follows (sketched in code after this list):
A. Usage simulations will need to expose their status to test scripts.
B. The network manager will need to expose information about available networks and network configurations to the test scripts, so that the test scripts know how to configure usage simulations to use them. It will need to expose commands like "wait until bootstrapped", "check server logs for trouble", etc.
C. Each testcase needs to be able to identify which features it needs from a network, invoke the network commands it needs, and invoke usage simulations. It needs to export information about its running status, and whether it's making progress.
D. The test driver needs to be able to enumerate networks and testcases and figure out which are compatible with each other, and which can run locally, and which ones meet the user's requirements.
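Sketched as Python interfaces, the coupling above might look like this. Every name here is invented for illustration; none of it is existing chutney or tor code, and the method lists will surely change with experience (see section 1).

    from abc import ABC, abstractmethod

    class UsageSimulation(ABC):
        """A: emulated users/servers; exposes status to test scripts."""
        @abstractmethod
        def start(self, socks_port):
            ...
        @abstractmethod
        def status(self):
            """Return e.g. {"requests": 100, "failures": 2, "mean_latency": 0.8}."""

    class Network(ABC):
        """B: what the network manager exposes about a running network."""
        @abstractmethod
        def features(self):
            """Return a set of keywords, e.g. {"hs", "exit", "big"}."""
        @abstractmethod
        def wait_until_bootstrapped(self, timeout):
            ...
        @abstractmethod
        def check_server_logs_for_trouble(self):
            ...

    class Testcase(ABC):
        """C: declares what it needs; runs against any compatible network."""
        required_features = set()
        @abstractmethod
        def run(self, network, simulation):
            """Return True on success; log progress while running."""

    def compatible(testcase, network):
        """D: the driver pairs each testcase with networks that provide
        everything it needs."""
        return testcase.required_features <= network.features()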
2.1. Minimal versions of the above:
A. The minimal user tools are an http server and client. Use appropriate tooling to support generating and receiving hundreds to thousands of simultaneous requests.
B. The minimal network manager is probably chutney. It needs the ability to export information about networks, to "wait until bootstrapped", to export information about servers, and so on. Its network list needs to turn into a database of named networks.
C. Testcases need to be independent, and ideally abstracted. They shouldn't run in-process with chutney. For starters, they can duplicate the current functionality of test-network and of dgoulet's hidden service tests. Probing for features can be keyword-based. Reporting results can use some logging framework.
D. The test driver can do its initial matching by keyword tags exported by the other objects. It should treat testcases and networks as arbitrary subprocesses that it can launch, so that they can be written in any language. (A minimal sketch follows.)
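A sketch of that minimal driver, assuming one invented convention: every testcase and network description is an executable that prints its keyword tags when run with a "--keywords" flag (required features for a testcase, provided features for a network), and a testcase is run as "testcase <network-name>". Nothing enforces this today.

    import subprocess

    def keywords(path):
        out = subprocess.check_output([path, "--keywords"]).decode()
        return set(out.split())

    def run_all(testcases, networks):
        """Run every testcase on every network that provides what it needs."""
        failures = []
        for net in networks:
            provided = keywords(net)
            for tc in testcases:
                if not keywords(tc) <= provided:
                    continue                      # incompatible; skip quietly
                if subprocess.call([tc, net]) != 0:
                    failures.append((tc, net))
        return failures

Because both sides are plain subprocesses, testcases and network tools can be written in whatever language suits them.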
3. A short inventory of use-cases
- Voting
- Controllers
- HS
  * IP+RP up/down
  * HSDir up/down
  * Authenticated HS
- Bad clients
  * Wrong handshake
  * Wrong desc.
- Bad relays
  * dropping cells/circ/traffic
  * BAD TLS
- Pathing (see the sketch after this list)
  * Does it behave the way we think?
  * 4 hops for HS
- Relay
  * Up/Down
  * Multiple IPs for a single key
  * OOM handling
  * Scheduling
- Client
  * Does traffic go through?
    - For HS and Exit
  * DNS testing
    - Caches at the exit
  * Stream isolation using n SocksPort
  * AddressMap
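As one example of how an inventory item turns into an automated check, here is a hedged sketch of a "Pathing" probe using stem (see section 4): connect to a node's control port and look at the circuits it has built. The control port number and the authentication setup are assumptions about how the test network would be configured, not how chutney sets things up today.

    # List circuit purposes and hop counts via stem, e.g. to check that
    # hidden-service circuits have the number of hops we expect.
    from stem.control import Controller

    with Controller.from_port(port=8000) as controller:
        controller.authenticate()
        for circ in controller.get_circuits():
            print("circuit %s: purpose=%s hops=%d"
                  % (circ.id, circ.purpose, len(circ.path)))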
4. Longer-term
We might look into using "The Internet" (no, not that one. The one at https://github.com/nsec/the-internet) on Linux to simulate latency between nodes.
When designing tools and systems, we should do so with an eye to migrating them into shadow.
We should refactor or adapt chutney to support using stem, to take advantage of stem's improved templating and control features, so we can better inspect running servers. We should probably retain the original hands-off non-controller chutney design too, to better detect heisenbugs.
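For instance, "wait until bootstrapped" and basic server inspection get much simpler over a control connection. A sketch, again assuming a node with its ControlPort at 8000 and authentication already set up:

    # Controller-based inspection with stem (the port is an assumption).
    from stem.control import Controller

    with Controller.from_port(port=8000) as controller:
        controller.authenticate()
        # e.g. "NOTICE BOOTSTRAP PROGRESS=100 TAG=done SUMMARY=..."
        print(controller.get_info("status/bootstrap-phase"))
        print(controller.get_version())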
5. Immediate steps
- Turn the above into a work plan.
- Specify initial interfaces, with a plan for migration to better ones once we have more experience.
- Identify current chutney and test-network issues standing in the way of reliably getting current tests to work.
- Refactor our current integration tests to the above framework.