[tor-dev] Tor Bandwidth Measurements Document Format

teor teor2345 at gmail.com
Tue Apr 17 13:01:30 UTC 2018


Hi,

Thanks for writing this draft spec.

Please see my suggested changes below:

> On 17 Apr 2018, at 21:23, juga <juga at riseup.net> wrote:
> 
> Hi,
> 
> as commented with teor and pastly, i send in-line a draft specification
> for the document format that the bandwidth scanner implementations
> should produce.
> 
> I've left my own questions/notes in square brackets.
> 
> Thanks,
> juga.
> 
> =======================================
> 
>          Tor Bandwidth Measurements Document Format
>          [juga: which name should we give to this document?]

That's a fine name.
You can leave out the "Document" if you want.

> 1. Scope and preliminaries
> 
>  This document describes the format of Tor's bandwidth measurements
>  document, version X.X.X [juga: which version should be this?]

It doesn't matter, so let's use semantic versioning:
* the original torflow format was 1.0.0
* the format in this spec adds the "version" feature, so it is 1.1.0
  (it is compatible with 1.0.0, as long as parsers ignore unrecognised
  lines)

>  and later.
> 
>  Since Tor version X.X.X [juga: which tor version?]

It looks like 0.2.4.12-alpha added measured bandwidths
https://gitweb.torproject.org/tor.git/tree/ChangeLog#n12710

> the directory
>  authorities use the bandwidth measurements document called
>  "V3BandwidthsFile" and produced by Torflow [1]
>  (format described in README.spec.txt [2]).
> 
>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
>    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
>    "OPTIONAL" in this document are to be interpreted as described in
>    RFC 2119.
> 
> 1.2. Acknowledgements
> 
>  The original bandwidth measurement scanner (Torflow) and format was
>  created by mike. Teor suggested to write this specification while
>  contributing on pastly's new bandwidth scanner implementation.
> 
>  This specification was revised after feedback from:
> 
>    XXX
> 
> 1.3 Outline
> 
>  The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
>  of dir-spec.txt [3] are obtained by bandwidth authorities, which are
>  either directory authorities or other servers running bandwidth
>  measurement scanners and sending the results to the former.
>  [juga: it seems that bandwidth authorities have not been formally
>  before]

You could use the definition in the man page:
"the bandwidth-authority generated file storing information on
relays' measured bandwidth capacities"

> 2. Format details
> 
>  Bandwidth measurements MUST contain the following sections:
>  - Header (exactly once)
>  - Relays measurements (zero or more times)
> 
>  Each section (or entry) ends with a separator.

This line is a copy-paste error, it should be deleted.

> 2.1. Nonterminals
> 
>  The following nonterminals are defined in the Onionoo details
>  document specification [4]:
> 
>    fingerprint
>    nickname

This file format gets the fingerprint and nickname from the
consensus, so you should reference dir-spec.txt.

(dir-list-spec.txt gets relay fingerprints and nicknames from
Onionoo. That's why it uses the Onionoo definitions.)

Here are the definitions of hexdigest (fingerprint) and nickname:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1268

>  In the bandwidth measurement documents nickname is optional.

"optional" is not relevant in a definition.
Let's delete this line, it's already documented as optional later on.

>  The following nonterminals are defined in the in dir-spec.txt:
> 
>    NL      (newline)
>    SP      (space)
> 
>    "bw" = INT, the aggregated measured bandwidth of this relay, in
>    kilobytes per second.

bw is not defined in dir-spec.txt. And the formatting is confusing.
Double quotes are used for ASCII literal strings in dir-spec.txt.
Can you please follow the format used in dir-spec.txt?

Here is one example:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n210

Here's how you can define bw using the Int definition from
dir-spec.txt:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n795

bw = Int

bw is the aggregated measured bandwidth of this relay, in kilobytes
per second.

>  We introduce the following nonterminals:
>  [juga: this should probably be defined more formally and should
>  probably link to other documents, which ones?]

dir-spec.txt

>    "version" = The name and the version of the bandwidth scannner
>    software, such as "sbws 0.1.0".

Our newest spec uses "version" for the file format version:
https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n148

So please don't make a field with a different meaning and structure,
and call it "version".

I suggest:
* use "version" for the file format version (or don't use "version")
* use "source" for the implementation software name and version

Please fix the formatting of this definition to be like dir-spec.txt.
This definition has two arguments separated by spaces, the name,
and the version.

>    The name of the software, if absent, is assumed to be "torflow".
>    [juga: which should be the version if absent?]

"if absent" is not relevant in a definition.
Let's move these lines to the header section.

The software version should be optional.
Torflow does not have a version, so we cannot require a version.

>    "timestamp" = INT, the Unix Epoch time when the file was created.

Please fix the formatting of this definition to be like dir-spec.txt.

> 2.2. Header format

We should say if order matters.
We should say how new items get added to the header.
(We could say that parsers MUST ignore unrecognised lines.)

>  It MUST consists of:
> 
>    "timestamp" timestamp NL
>    "version" version NL

The sbws sample data has:
1523911758
version=0.1.0

The first line does not have a "timestamp" string literal.
The second line has an equals sign.
The second line is optional (see the torflow sample data).

Does Tor ignore the version line?
If it does, we should document it.

> 2.3. Relay measurements format

You should say that order on a line doesn't matter, and relay order
also doesn't matter.

>  Relays measurements MUST consist of the following items.
> 
>    "node_id" fingerprint SP
>    "bw" bandwidth SP

The format has equals signs, but this definition does not.

>  When there are no more items, the "bw" item ends with NL instead of
>  SP.

It might be easier to say that each line allows extra arguments, and
reference the dir-spec.txt definition:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n261

But does each argument need an equals sign?

> 2.4. Optional extra items
> 
>  Different implementations of the bandwidth measurements scanners MAY
>  include other items per relay.
> 
>  For instance, sbws includes:
> 
>    "rtt" = INT, Round Trip Time (to obtain 1B)

This definition belongs in the definition section.
Please fix the formatting of this definition to be like dir-spec.txt.

>  Every relay measurement in sbws consists of:
> 
>    "node_id" fingerprint SP
>    "bw" bandwidth SP

The format has equals signs, but this definition does not.

The fingerprints in the sample data have $ signs.
Does Tor require them? Or are they optional?
We should document it either way.

>    "nick=" nickname SP
>    "rtt=" rtt SP
>    "time=" timestamp NL

The equals signs are correct here.

>  Every relay measurement in Torflow consists of:
> 
>    "node_id" fingerprint SP
>    "bw" bandwidth SP

The format has equals signs, but this definition does not.

>    "nick=" nickname SP
>    "measured_at=" slice timestamp NL

slice is not defined, just use "timestamp", then explain using
the next line.

>  The "measured_at" does not correspond to the "time" in sbws.

Is it worth explaining the difference?

>  [juga: actually, if bwauths use "measured_at", then the code on them
>  or sbws should be changed].

Tor does not contain the string "measured_at":
https://github.com/torproject/tor/search?q=measured_at

For consistency, please remove "measured_at", or add "updated_at".

>  Torflow includes other items that are out of the scope of this
>  document.

We should think about which torflow fields are worth documenting.

> References:
> 
> 1. https://gitweb.torproject.org/torflow.git
> 2.
> https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
> 3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
> 4. https://metrics.torproject.org/onionoo.html#details
> 
> 
> A. Sample data

Maybe the sample data should contain more than one relay?

> A.1. Torflow
> 
> 1523911758
> node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test
> measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719
> pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577
> circ_fail=0.2 scanner=/filepath
> 
> A.2. sbws
> 
> 1523911758
> version=0.1.0
> node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test
> rtt=380 time=1523911725

T
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20180417/a2d8ce50/attachment-0001.html>


More information about the tor-dev mailing list