[tor-dev] [OONI] Designing the OONI Backend (OONIB). RESTful API vs rsync

15 Jul 2012

I would like to follow up on the discussion we had in Florence on some
design choices behind OONIB.

In particular, most of the controversy was about whether to use HTTP or rsync.

Before discussing the pros and cons of one choice over the other, it
would be useful to frame
exactly what the requirements for OONIB are.

# What is OONIB

OONIB is the backend of OONI. It will run mainly on one centralized
machine and may at
a later stage be distributed across multiple machines. We have
not yet thought about how
to make it scale to a distributed setup, so for now we will treat it as if it
were running only
on one central machine.

It will be responsible for:

* Reporting
  a) Collecting reports from OONIProbes. These reports are the results of
tests.
  b) Collecting reports from third party censorship measurement tools
(e.g. Bismark, NeuBot, etc.)

* Assistance in test running
  Certain OONI tests require a backend component (e.g. b0wser).
OONIB will host
  the server-side component that assists in running such tests.

  Note: Certain tests require the server to make connections to the
client. This means that the
  client will need to ask the server to probe it.

* Control Channel Signaling
  Some measurements need this to verify that the test-specific data
received by the backend
  matches the data sent by the client.
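
To make the control channel idea concrete: the client could report a
digest of what it sent, and the backend could compare that with a digest
of what it actually received. This is only a sketch under assumptions.
The function names and the choice of SHA-256 are hypothetical and not
part of any decided design.

```python
import hashlib
import hmac


def payload_digest(payload: bytes) -> str:
    """Return a hex SHA-256 digest of the test payload."""
    return hashlib.sha256(payload).hexdigest()


def matches_client_report(received: bytes, client_digest: str) -> bool:
    """Compare what the backend received with what the client says it sent.

    hmac.compare_digest does a constant-time comparison of the two strings.
    """
    return hmac.compare_digest(payload_digest(received), client_digest)


# The client sends the payload on the test channel and its digest on the
# control channel (made-up example data).
sent = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
reported = payload_digest(sent)

# Unmodified traffic matches; traffic altered in transit does not.
untouched = matches_client_report(sent, reported)
tampered = matches_client_report(
    sent.replace(b"example.com", b"example.org"), reported
)
```

A mismatch here would only tell us that something changed between client
and backend, not what changed or who changed it.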

# What properties we would like it to have
note: these are not ordered.

* Efficient even over high latency networks.

* Ease of integration for third party developers.

* Expandable to support requirements of new tests we develop.

* Anonymous

* Secure

# HTTP and rsync comparison
note: I will not deal with the security aspects of OONIB. We will
assume we have
an encrypted and authenticated transport (this can be TLS, Tor Hidden
Services,
etc.)

## Rsync:

Pro:

* It supports good compression algorithms

* It's efficient and supports resume

* It does integrity checking on the uploaded files

Contra:

* It's designed only for copying files, which means we can't implement
any more advanced
API-like logic. [*]

* It's not supported by many languages (for example, in Python we only
have an implementation
of the rsync algorithm, not of the protocol [1])

* It's not as commonly used by other application developers that have
similar requirements.

* Painful to do sanitization of the data sent by clients.

* Does not allow bidirectional communication (Request-Response pattern)

[*] I would like to be able to create a session ID for a specific test
and be able to reference
that ID when interacting with the test helpers. rsync is one way: I
push data to the server,
but the server cannot signal me back with some data. This largely
impedes its usefulness as an
API interface.
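
A minimal sketch of the session ID idea, assuming a request-response
transport: the first call hands back an ID and later calls reference it.
`ReportStore` and its methods are hypothetical names, and the in-memory
dictionary only stands in for whatever storage the backend would use.

```python
import uuid


class ReportStore:
    """In-memory stand-in for the backend's report storage (hypothetical)."""

    def __init__(self):
        self._reports = {}

    def create(self, test_name: str) -> str:
        """Start a report and return the ID the client must reference later."""
        report_id = uuid.uuid4().hex
        self._reports[report_id] = {"test_name": test_name, "entries": []}
        return report_id

    def append(self, report_id: str, entry: dict) -> int:
        """Add an entry to an existing report; return how many are stored."""
        if report_id not in self._reports:
            raise KeyError("unknown report ID")
        entries = self._reports[report_id]["entries"]
        entries.append(entry)
        return len(entries)


store = ReportStore()
# Request 1: the client asks for a new report and the server answers with an ID.
rid = store.create("b0wser")
# Later requests: the client (or a test helper) references that ID.
count = store.append(rid, {"result": "ok"})
```

The point is the return value of `create`: the server has to be able to
send something back, which a push-only file copy does not give us.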

## HTTP:
note: I am not necessarily talking only about HTTP, we could use any
other protocol with similar
properties (e.g. SPDY). I will discuss HTTP because it is the one that I
am most familiar with,
but don't

Pro:

* Industry standard for exposing APIs

* Supported natively in most programming languages

* Well understood protocol

* Sanitizing data sent by clients is easier to implement

* Allows bidirectional communication

* Good support in Twisted (the framework we use for OONIB)

Contra:

* Compression is not enabled by default (we can use gzip compression
with HTTP 1.1), and there is no compression for
  headers.

* No resume support (this can be implemented on top of HTTP, we could
even implement the rsync algorithm
on top of HTTP).

* No support for deltas (we can use the rsync algorithm over HTTP if we
really need this).
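
The first two points can be worked around at the application level. The
sketch below assumes the client gzips the report body and that the server
remembers how many bytes of an upload it already has, so that the client
can ask for that offset and continue from there. `UploadState` is a
hypothetical name and no real HTTP request is made here.

```python
import gzip


class UploadState:
    """Server-side record of how much of one upload has arrived (hypothetical)."""

    def __init__(self):
        self.data = bytearray()

    def offset(self) -> int:
        """What the server would answer when the client asks where to resume."""
        return len(self.data)

    def write(self, offset: int, chunk: bytes) -> int:
        """Accept a chunk only if it starts exactly where the last one ended."""
        if offset != len(self.data):
            raise ValueError("offset mismatch, client must re-query the offset")
        self.data.extend(chunk)
        return len(self.data)


report = b"test_name: b0wser\nresult: ok\n" * 200
body = gzip.compress(report)  # would be sent with "Content-Encoding: gzip"

state = UploadState()
half = len(body) // 2
state.write(0, body[:half])  # the connection drops after the first chunk

resume_at = state.offset()  # the client asks the server where to continue
state.write(resume_at, body[resume_at:])

restored = gzip.decompress(bytes(state.data))
```

This gives resume but not deltas, and it does nothing about header
compression.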

I feel like we are comparing apples and oranges a bit here, and I don't
see why we could not use
the rsync algorithm on top of HTTP. Anyway, I would like to get some
feedback on what we should use
for something that should have the properties described above.
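
For reference, the part of the rsync algorithm that makes it cheap is the
weak rolling checksum, which can slide over a file one byte at a time
without being recomputed. The sketch below follows the recurrence from
the rsync technical report as far as I understand it. It covers the weak
checksum only: a real implementation also needs a strong per-block hash
and the block matching logic, and nothing here is tied to HTTP yet.

```python
M = 1 << 16  # modulus used for both halves of the checksum


def weak_checksum(block: bytes):
    """Compute the (a, b) pair over a whole block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte is weighted n, the last byte is weighted 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int):
    """Slide a window of length n one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = b"the quick brown fox jumps over the lazy dog"
n = 8  # window (block) length, arbitrary for this example

a, b = weak_checksum(data[:n])
rolled = []
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    rolled.append((a, b))

# Recompute every window from scratch to compare against the rolled values.
recomputed = [weak_checksum(data[k:k + n]) for k in range(1, len(data) - n + 1)]
```

The rolled values and the recomputed ones should agree for every window,
which is what lets a receiver scan a file for blocks it already has.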

Thoughts?

- Art.

[1] https://github.com/isislovecruft/pyrsync

Arturo Filastò