[ooni-dev] Feedback on new OONI test deck format

Wed Aug 13 17:00:37 UTC 2014

Hi all,

We have been discussing lately some changes that we would like to make
to the ooni-probe test deck format [1] and would like to have some
feedback on what we have come up with.

For those of you not familiar with ooni-probe, a test deck is basically
a way of telling it "Run this list of OONI tests with these inputs and
by the way be sure you also set these options properly when doing so".

The previous deck format involved a lot of boilerplate and did not make
it possible to bake inputs into the deck itself. This new format is
supposed to overcome some of the limitations of the old design and we
hope that a major redesign will not be needed in the near future.

The specification can be found in git [2], but to facilitate commenting
I am also going to paste it into this email. All feedback is greatly
appreciated.

-- BEGIN SPEC --

# Test deck specification

* version: 0.1.0
* date: 2014-08-13
* author: Alejandro López (kudrom), Arturo Filastò

# 0. Terminology
Analyst: The person who writes a deck.

Collector: A machine running the ooni-backend that collects the reports
generated by the execution of ooni-probe.

Nettest: A test whose execution by ooniprobe creates a report that is later
sent to the collector, if one is provided.

OONI: Open Observatory of Network Interference.

Ooni-probe: The client side of ooni used to execute a set of nettests.

Tester: The person who executes ooni-probe.

YAML: A human-readable data serialization format.

# 1. Rationale

To ease the execution of nettests, the ooni developers came up with the idea
of a container that would allow a tester to easily execute a bunch of
configured nettests. That container was called a deck.

This way, an analyst interested in a particular behaviour would write a deck;
then she would distribute that deck to every tester interested in the ongoing
analysis. Finally, the tester would execute ooni-probe with that deck to
properly create and send the nettests' reports.

Unfortunately, right now neither the deck format nor ooni-probe allows that
level of automation. We want to solve this by writing a new spec for the deck
format, one that lets us add the required functionality to ooni-probe with
confidence. The document that you're reading is that spec.

# 2. Goals

Allow an analyst to reuse well written and tested nettests in an easy way.

Allow a tester to easily execute a battery of nettests with a complex
configuration.

Protect the privacy of the tester.

# 3. The data format

Every deck is a YAML file composed of two major sections: the header and the
body.

The header is a dictionary that provides all the shared and global
configuration of every nettest included in the deck. Its main purpose is to
reduce boilerplate by letting the analyst express common behaviour in one
section instead of in every nettest execution. The ooni-probe options allowed
in this section are:

1. collector: Address of the collector of test results.
2. bouncer: Address of the bouncer for test helpers.
3. annotations: Annotate the report with a key:value[, key:value] format.
4. no-collector: Disable the collector. (FLAG)
5. no-geoip: Disable the geoip support. (FLAG)

Every option in the header section is a key of the dictionary, except those
labeled FLAG, which are members of a list called flags. For example, the
following is a valid header section:

```
header:
  collector: 'http://localhost'
  annotations:
    key1: value1
    key2: value2
  flags:
  - no-collector
```

The deck header can also contain metadata associated with the test deck. The
possible fields are:

1. name: The name of the test deck.
2. description: A short description of the test deck.
3. author: The author of the test deck.
4. version: A version number for the test deck.
5. requires-root: A flag to indicate that the test deck requires root to run. (FLAG)
6. requires-tor: A flag to indicate that the test deck requires tor to run. (FLAG)

The header may also be omitted.
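
As a rough illustration of the rules above, the following Python sketch checks
a header that has already been parsed from YAML into a dictionary. The function
name `validate_header` and the choice to raise `ValueError` are assumptions
made for this example; they are not part of ooni-probe. The key and flag names
are the ones listed in this section.

```python
# Hypothetical helper: only the key and flag names come from the spec.
ALLOWED_OPTIONS = {"collector", "bouncer", "annotations"}
ALLOWED_METADATA = {"name", "description", "author", "version"}
ALLOWED_FLAGS = {"no-collector", "no-geoip", "requires-root", "requires-tor"}


def validate_header(header):
    """Return a normalised copy of the header, or raise ValueError."""
    if header is None:
        # The header may be omitted altogether.
        return {"flags": []}
    if not isinstance(header, dict):
        raise ValueError("header must be a dictionary")

    allowed_keys = ALLOWED_OPTIONS | ALLOWED_METADATA | {"flags"}
    unknown = set(header) - allowed_keys
    if unknown:
        raise ValueError("unknown header keys: %s" % sorted(unknown))

    # Options without arguments live in the "flags" list.
    flags = header.get("flags") or []
    bad_flags = set(flags) - ALLOWED_FLAGS
    if bad_flags:
        raise ValueError("unknown flags: %s" % sorted(bad_flags))

    normalised = dict(header)
    normalised["flags"] = list(flags)
    return normalised
```

Rejecting unknown keys outright is one possible policy; an implementation
could equally choose to ignore them with a warning.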

The body is a list composed of one element per nettest execution. Every
nettest execution is a dictionary composed of the following three keys:

1. nettest: name of the nettest to execute (MANDATORY)
2. local_options: local options of the test execution (OPTIONAL)
3. global_options: global options of the test execution (OPTIONAL)

As with the header, every option that doesn't take any arguments can be a
member of a flags list. A valid body can be:

```
body:
- nettest: manipulation/http_request
  local_options:
    url: 'http://torproject.org'
  global_options:
    collector: 'http://localhost'
    flags:
    - no-geoip
- nettest: manipulation/captiveportal
```
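
To make the structure of the body concrete, here is a minimal Python sketch
that walks an already-parsed body list and enforces the mandatory `nettest`
key. The function name `validate_body` and the use of `ValueError` are
assumptions for illustration only.

```python
# Hypothetical helper: only the three key names come from the spec.
BODY_KEYS = {"nettest", "local_options", "global_options"}


def validate_body(body):
    """Return one normalised dictionary per nettest execution."""
    if not isinstance(body, list):
        raise ValueError("body must be a list")

    executions = []
    for index, entry in enumerate(body):
        if not isinstance(entry, dict) or "nettest" not in entry:
            raise ValueError("entry %d lacks the mandatory 'nettest' key" % index)
        unknown = set(entry) - BODY_KEYS
        if unknown:
            raise ValueError("entry %d has unknown keys: %s" % (index, sorted(unknown)))
        executions.append({
            "nettest": entry["nettest"],
            # Both option dictionaries are optional and default to empty.
            "local_options": dict(entry.get("local_options") or {}),
            "global_options": dict(entry.get("global_options") or {}),
        })
    return executions
```

For the two-entry body shown above, this would return two dictionaries, the
second one with empty `local_options` and `global_options`.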

File paths that refer to files contained inside the test deck must be relative
and must start with "deck/". Paths that refer to files to be downloaded via
Tor must start with "http(o|s)://", that is, "httpo://" or "https://".

Any other file path should be rejected and raise an exception. This is because
we do not want an analyst creating a deck to be able to pass arbitrary files
on the tester's filesystem as arguments to a test.
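
The path rule could be enforced along these lines. This is only a sketch: the
name `check_deck_path` is made up for the example, and the extra rejection of
paths that escape the deck through ".." is an assumption that goes beyond what
the spec text says.

```python
import posixpath

# "http(o|s)://" in the spec is read here as these two prefixes.
REMOTE_PREFIXES = ("httpo://", "https://")


def check_deck_path(path):
    """Return the path if a deck may reference it, else raise ValueError."""
    if path.startswith(REMOTE_PREFIXES):
        return path
    if path.startswith("deck/"):
        normalised = posixpath.normpath(path)
        # Assumption beyond the spec: refuse "deck/../x" style escapes.
        if normalised.startswith("deck/"):
            return normalised
    raise ValueError("file path not allowed in a deck: %r" % path)
```

With this check, "deck/input-filename-1.txt" is accepted, while "/etc/passwd",
"deck/../../etc/passwd" and a plain "http://" URL all raise an exception.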

## 3.1 Container format

The container proposed is tar+gzip because it's well supported in Python. The
deck container will be composed of a directory named "deck" containing the
deck file and the inputs.

The directory layout will be:

deck/test.deck

deck/input-filename-1.txt

deck/input-filename-2.txt

This will then be compressed using tar+gzip.
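
A container with this layout can be produced with Python's standard `tarfile`
module. The sketch below is illustrative: the function names
`build_deck_container` and `list_deck_container` are made up, and the output
file is assumed to live outside the directory being packed.

```python
import tarfile


def build_deck_container(deck_dir, output_path):
    """Pack deck_dir into a gzip-compressed tar rooted at "deck"."""
    with tarfile.open(output_path, "w:gz") as archive:
        # arcname forces the top-level directory inside the archive to be
        # "deck", whatever the directory is called on disk.
        archive.add(deck_dir, arcname="deck")
    return output_path


def list_deck_container(container_path):
    """Return the sorted member names of a deck container."""
    with tarfile.open(container_path, "r:gz") as archive:
        return sorted(member.name for member in archive.getmembers())
```

Listing rather than extracting is the safer first step on the tester's side,
since member names can then be checked before anything is written to disk.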

# 4. Implementation details

## 4.1 Introduction

Each execution of a nettest in ooniprobe needs four main inputs.

1. The global config file
2. The global cmd options
3. The nettest/deck
4. The local options for each nettest

The first three are mandatory to run a nettest; the last one depends on
the nettest.

When a single nettest is executed, all options except the first one are
passed in the cmd line.

The difference between the global config options and the global cmd options
is that the latter share some options with the config file and add some
subcommands to ooniprobe.

When a deck is involved in the execution, the last three inputs are meant to
be passed (at least partially) in the deck. In fact, the deck is no more than
the last three inputs, with some subtleties that I explain in the following
sections.

## 4.2 Who overwrites who

In a single-nettest execution, the global config is parsed and a global object
is built. Then the cmd line is parsed and it overwrites the global config.
This allows the most dynamic options to be set in the console without having
to write them to the config file each time.

The whole ooni-probe code base, including the nettests, is allowed to access
that global object. So the global options affect the way ooni-probe behaves,
but they can also affect the way the executing nettest behaves. That's the
reason why the deck must be able to specify some global options, but not all
of them, because some powerful options would put the tester in danger.
The allowed options are listed in the section [3. The data format].

In a deck-nettest execution, the header section of the deck is parsed first
and a global object is built. Then the config file is parsed and the object is
overwritten where it applies. Finally the cmd line of ooni-probe is parsed
and the object is overwritten again.

We read the header section first to prevent the analyst from overwriting
sensitive options of the config file, which should only be modified by the
tester.
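
The ordering for a deck-nettest execution can be pictured as three layers
applied one after another, with later layers winning. In the sketch below the
function name `build_global_options` is hypothetical, each layer is assumed to
be a plain dictionary, and a value of `None` is assumed to mean "not set".

```python
def build_global_options(header_options, config_file_options, cmdline_options):
    """Merge the three layers; later layers overwrite earlier ones."""
    merged = {}
    # Order matters: deck header first, then config file, then cmd line.
    for layer in (header_options, config_file_options, cmdline_options):
        for key, value in layer.items():
            # Assumption: None means the option was not given in this layer.
            if value is not None:
                merged[key] = value
    return merged
```

With this ordering, a collector set in the deck header is replaced by one set
in the tester's config file, which is in turn replaced by one given on the
cmd line.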

## 4.3 Copy-on-write

To reduce the boilerplate in the deck, the file is split into a shared
section for all nettests and a local one for each nettest.

The idea is to allow the writer of the deck to express common and global
behaviour of all the nettests in the header of the deck and to put specific
and local options in each nettest element, as was already explained in
[3. The data format].

What this means for the execution of the nettests is that when a nettest
overwrites a global configuration of the header, this change is only visible
to that nettest, not to every nettest that may ever execute in the same
instance of ooni-probe. It follows that every change to the global options of
ooni-probe in the local section of a deck should follow a copy-on-write policy
regarding the global config object.
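
One way to get this behaviour in Python 3 is to give each nettest an overlay
on top of the shared options, for example with `collections.ChainMap`, where
reads fall through to the shared dictionary and writes stay in the overlay.
This is a sketch of the idea, not how ooni-probe implements it; the function
name `options_for_nettest` is an assumption.

```python
from collections import ChainMap


def options_for_nettest(global_options, nettest_global_options):
    """Return a per-nettest view of the global options.

    Lookups fall through to the shared global_options, while any write
    lands in the private first mapping and is invisible to other nettests.
    """
    return ChainMap(dict(nettest_global_options), global_options)


# Usage: the override is visible to this nettest only.
shared = {"collector": "http://example-collector", "bouncer": "httpo://example"}
view = options_for_nettest(shared, {"collector": "http://localhost"})
view["no-geoip"] = True
```

Note that the overlay is shallow: a mutable value such as the annotations
dictionary would still be shared, so it would need to be copied before being
modified.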

## 4.4 The input file

Ooni-probe invokes every nettest method with the information saved in a file
called the input file. This file is part of the local configuration of every
test, and therefore must be provided with the deck.

So the deck should be a compressed container which includes both the deck
file and every input file needed by the nettests included in the deck.
Otherwise, the analyst would have to send the input files to the tester
separately, which is unacceptable.

The container proposed is tar+gzip because it's well supported in Python.

# 5. Example deck

The complete.deck provided with each installation of ooni-probe would be:

```
header:
  name: Complete
  description: Runs all the existing ooniprobe tests
  author: 'OONI <ooni-dev at lists.torproject.org>'
  version: 0.1.0
  flags:
  - requires-root
  - requires-tor
body:
- nettest: blocking/http_request
  local_options:
    input_file: 'httpo://ihiderha53f36lsd.onion/input/37e60e13536f6afe47a830bfb6b371b5cf65da66d7ad65137344679b24fdccd1'

- nettest: blocking/dns_consistency
  local_options:
    input_file: 'httpo://ihiderha53f36lsd.onion/input/37e60e13536f6afe47a830bfb6b371b5cf65da66d7ad65137344679b24fdccd1'

- nettest: manipulation/http_invalid_request_line

- nettest: manipulation/http_header_field_manipulation

- nettest: manipulation/traceroute

- nettest: blocking/http_host
  local_options:
    input_file: 'httpo://ihiderha53f36lsd.onion/input/37e60e13536f6afe47a830bfb6b371b5cf65da66d7ad65137344679b24fdccd1'
```

-- END SPEC --

~ Art.

[1] https://trac.torproject.org/projects/tor/ticket/12823
[2] https://gitweb.torproject.org/ooni/spec.git/blob_plain/HEAD:/test-decks/td-spec.md