[tor-dev] Tor Bandwidth List Format specification

juga juga at riseup.net
Wed May 9 12:08:00 UTC 2018


Hi,

after nick, irl and teor reviewed the last version i sent [1], i paste
below a new version of the specification, covering format versions
1.0.0 and 1.1.0.
It's the same version as commit
https://github.com/juga0/torspec/commit/c7cdfd4fcb4b5623e1407e2bec38e9fdf7b70e6b.

The main question that came up was whether we should create a backwards
incompatible specification version 2.0.0.

Since right now it's faster to implement version 1.1.0 of this
specification, and assuming that we can add the 2.0.0 specification and
code later, i'd propose to continue with 1.1.0.

I've asked the dirauths for their opinion on this.

Thanks,
juga

[1] https://lists.torproject.org/pipermail/tor-dev/2018-May/013141.html
[2] https://lists.torproject.org/pipermail/tor-dev/2018-May/013154.html

-----------------------------------------------------------------------

                  Tor Bandwidth List Format
                            juga
                            teor

1. Scope and preliminaries

  This document describes the format of Tor's Bandwidth List,
  versions 1.0.0, 1.1.0 and later.
  It is a new specification for the existing format 1.0.0.
  It also describes a new format 1.1.0, which is backwards compatible
  with 1.0.0 parsers.

  Since Tor version 0.2.4.12-alpha the directory authorities use
  the Bandwidth List file called "V3BandwidthsFile" generated by
  Torflow [1]. The format is described in Torflow's README.spec.txt and
  is considered to be version 1.0.0.

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
    "OPTIONAL" in this document are to be interpreted as described in
    RFC 2119.

1.2. Acknowledgements

  The original bandwidth generator (Torflow) and format were
  created by mike. Teor suggested writing this specification while
  contributing to pastly's new bandwidth generator implementation.

  This specification was revised after feedback from:

    Nick Mathewson (nickm)
    Iain Learmonth (irl)

1.3. Outline

  The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
  and 3.4.2, uses the term "bandwidth measurements" to refer to what
  is called Bandwidth List here.
  A Bandwidth List file contains information on relays' bandwidth
  capacities and is produced by bandwidth generators, previously known
  as bandwidth scanners.

1.4. Format Versions

   1.0.0 - The legacy fallback Bandwidth List format

   1.1.0 - Adds KeyValue Lines to the Header List section, KeyValues
           to RelayLines, and format versions.

  All Tor versions can consume format version 1.0.0.
  All Tor versions can consume format version 1.1.0,
  but they warn on additional header Lines.
  [TODO: this might be fixed, and if it is fixed should be said which
  version of Tor]

2. Format details

  The Bandwidth List MUST contain the following sections:
  - Header List (exactly once)
  - Relays' Bandwidth List (zero or more times)
  If it does not contain these sections, parsers SHOULD ignore the file.

2.1. Definitions

  The following nonterminals are defined in Tor directory protocol
  sections 1.2., 2.1.1., 2.1.3.:

    Int
    SP (space)
    NL (newline)
    Keyword
    ArgumentChar
    nickname
    hexdigest (a '$', followed by 40 hexadecimal characters
      ([A-Fa-f0-9]))

  Nonterminal defined in section 2 of version-spec.txt [4]:

    version_number

  We define the following nonterminals:

    Line ::= ArgumentChar* NL
    RelayLine ::= KeyValue (SP KeyValue)* NL
    KeyValue ::= Keyword "=" Value
    Value ::= ArgumentCharValue+
    ArgumentCharValue ::= any printing ASCII character except NL and SP.
    Terminator ::= "====="
    Timestamp ::= Int
    Bandwidth ::= Int
    MasterKey ::= a base64-encoded Ed25519 public key, with
    padding characters omitted.
    DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601

  Note that key_value and value are defined in Tor directory protocol
  with different formats to KeyValue and Value here.

  All Lines in the file MUST be 510 characters or fewer, to allow for
  the trailing newline and NUL characters.
  The previous limit was 254 characters in Tor 0.2.6.2-alpha and
  earlier.
  The parser MAY ignore longer Lines.
  [TODO: Change this restriction in 1.1.0 or later]
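
  The grammar above can be sketched as a small tokenizer. This is only
  an illustration, not part of the specification: the function names
  are hypothetical, and the Keyword pattern is an assumption (letters,
  digits, "-" and "_", since the keys used in this document contain
  underscores). The authoritative Keyword definition is in dir-spec.txt.

```python
import re

# Assumption: Keyword is approximated as letters, digits, "-" and "_".
# Value is one or more printing ASCII characters other than SP and NL.
KEYVALUE_RE = re.compile(r"^([A-Za-z0-9_-]+)=([\x21-\x7e]+)$")
MAX_LINE_CHARS = 510  # limit stated above, not counting the trailing NL


def parse_keyvalue(token):
    """Return (keyword, value) for a KeyValue token, or None if malformed."""
    match = KEYVALUE_RE.match(token)
    if match is None:
        return None
    return match.group(1), match.group(2)


def parse_relay_line(line):
    """Split a RelayLine into a dict; return None for over-long Lines."""
    line = line.rstrip("\n")
    if len(line) > MAX_LINE_CHARS:
        return None  # the parser MAY ignore longer Lines
    pairs = {}
    for token in line.split(" "):
        parsed = parse_keyvalue(token)
        if parsed is None:
            continue  # skip material that is not a KeyValue
        keyword, value = parsed
        pairs.setdefault(keyword, value)  # arbitrary choice: keep the first
    return pairs
```

  Values are kept as strings here; converting them (for example "bw" to
  an integer) is left to the caller.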

2.2. Header List format

Some header Lines MUST appear in specific positions, as documented
below.
All other Lines can appear in any order.
If a parser does not recognize any extra material in a header Line,
the Line MUST be ignored.
If a header Line does not conform to this format, the Line SHOULD be
ignored by parsers.

The Header List consists of:

  Timestamp NL

    [At start, exactly once.]

    The Unix Epoch time in seconds when the file was created.
    It does not follow the KeyValue format for backwards
    compatibility with version 1.0.0.

  "version=" version_number NL

    [In second position, zero or one time.]

    The specification document format version.
    It uses semantic versioning [5].

    This Line has been added in version 1.1.0 of this specification.

    Version 1.0.0 documents do not contain this Line, and the
    version_number is considered to be "1.0.0".

  "software=" Value NL

    [Zero or one time.]

    The name of the software that created the document.

    This Line has been added in version 1.1.0 of this specification.

    Version 1.0.0 documents do not contain this Line, and the software
    is considered to be "torflow".

  "software_version=" Value NL

    [Zero or one time.]

    The version of the software that created the document.
    The version may be a version_number, a git commit, or some other
    version scheme.

    This Line has been added in version 1.1.0 of this specification.

  "generator_started=" DateTime NL

    [Zero or one time.]

    The date and time, in ISO 8601 format and UTC, at which the
    generator started.

    This Line has been added in version 1.1.0 of this specification.

  "earliest_bandwidth=" DateTime NL

    [Zero or one time.]

    The date and time, in ISO 8601 format and UTC, at which the
    first relay bandwidth was obtained.

    This Line has been added in version 1.1.0 of this specification.

  KeyValue NL

    [Zero or more times.]

    There MUST NOT be multiple KeyValue header Lines with the same key.
    If there are, the parser SHOULD choose an arbitrary Line.

    If a parser does not recognize a Keyword in a KeyValue Line, it
    MUST be ignored.

    Future format versions may include additional KeyValue header Lines.
    Additional header Lines will be accompanied by a minor version
    increment.

    Implementations MAY add additional header Lines as needed. This
    specification SHOULD be updated to avoid conflicting meanings for
    the same header keys.

    Parsers MUST NOT rely on the order of these additional Lines.

    Additional header Lines MUST NOT use any keywords specified in the
    relay measurements format.
    If they do, the parser MAY ignore conflicting keywords.

  Terminator NL

    [Zero or one time.]

    The Header List section ends with this Terminator.

    In version 1.0.0, the Header List ends at the first Line that
    conforms to the RelayLine format described in the next section.
    Implementations of version 1.1.0 SHOULD include this Line.
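
A minimal sketch of a header parser following the rules above. The
function name and the returned dict layout are hypothetical. It assumes
the input is a list of Lines with NL already stripped, and it uses a
simplification that is not part of this specification: any Line that
contains SP or starts with "node_id=" is treated as the first RelayLine.

```python
def parse_header(lines):
    """Parse header Lines; return (header dict, index of first RelayLine)."""
    header = {"timestamp": int(lines[0])}  # first Line is the Timestamp
    index = 1
    while index < len(lines):
        line = lines[index]
        if line == "=====":  # Terminator ends the Header List (1.1.0)
            index += 1
            break
        if line.startswith("node_id=") or " " in line:
            break  # assumed first RelayLine (1.0.0 has no Terminator)
        keyword, sep, value = line.partition("=")
        if sep and keyword and value:
            header.setdefault(keyword, value)  # duplicate keys: keep first
        index += 1  # Lines that are not KeyValue are ignored
    # Defaults for 1.0.0 documents, which lack these Lines.
    header.setdefault("version", "1.0.0")
    if header["version"] == "1.0.0":
        header.setdefault("software", "torflow")
    return header, index
```

Unrecognized keywords are stored rather than dropped, so a caller can
ignore them as required while still being able to log them.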

2.3. Relays' Bandwidth List format

It consists of zero or more RelayLines with the relays' bandwidth
in arbitrary order.

There MUST NOT be multiple KeyValue pairs with the same key in the same
RelayLine.
If there are, the parser SHOULD choose an arbitrary Value.

There MUST NOT be multiple RelayLines per relay identity (node_id or
master_key_ed25519).
If there are, parsers SHOULD issue a warning and MAY choose an arbitrary
value or ignore both values.

If a parser does not recognize any extra material in a RelayLine,
the extra material MUST be ignored.

In version 1.0.0, node_id MUST NOT be at the end of the Line.
In version 1.1.0, the KeyValue pairs can be in any order.
[TODO: list of Tor versions that support it, when it's done]

Each RelayLine MUST include the following KeyValue pairs:

  "node_id=" hexdigest

    [Exactly once.]

    The fingerprint for the relay's RSA identity key.

  "master_key_ed25519=" MasterKey

    [Zero or one time.]

    The relay's master Ed25519 key, base64 encoded,
    without trailing "="s, to avoid ambiguity with the KeyValue "="
    character.

    Implementations of version 1.1.0 SHOULD include both node_id and
    master_key_ed25519.
    Parsers SHOULD accept Lines that contain at least one of them.

  "bw=" Bandwidth

    [Exactly once.]

    The measured bandwidth of this relay.

    Tor accepts zero bandwidths, but they trigger bugs in older Tor
    implementations. Therefore, implementations SHOULD NOT produce zero
    bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
    If there are zero bandwidths, the parser MAY ignore them.

    Multiple measurements can be aggregated using an averaging scheme,
    such as a mean, median, or decaying average.

    Torflow scales bandwidths to kilobytes per second. Other
    implementations SHOULD use kilobytes per second for their initial
    bandwidth scaling.

    If different implementations or configurations are used in votes for
    the same network, their measurements MAY need further scaling. See
    Appendix B for information about scaling, and one possible scaling
    method.

  KeyValue

    [Zero or more times.]

    Future format versions may include additional KeyValue pairs on a
    RelayLine.
    Additional KeyValue pairs will be accompanied by a minor version
    increment.

    Implementations MAY add additional relay KeyValue pairs as needed.
    This specification SHOULD be updated to avoid conflicting meanings
    for the same Keywords.

    Parsers MUST NOT rely on the order of these additional KeyValue
    pairs.

    Additional KeyValue pairs MUST NOT use any keywords specified in the
    header format.
    If they do, the parser MAY ignore conflicting keywords.
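
The parser-side rules of this section can be sketched as follows. The
function name is hypothetical, and the input is assumed to be an
iterable of dicts, one per RelayLine, mapping Keywords to string Values.
Where the text allows a choice, this sketch ignores zero bandwidths and
drops every RelayLine of a duplicated relay; other choices are equally
valid.

```python
import logging


def collect_relays(relay_dicts):
    """Validate parsed RelayLines; return {identity: bandwidth}."""
    relays = {}
    duplicated = set()
    for pairs in relay_dicts:
        # Accept Lines that contain at least one of the two identities.
        identity = pairs.get("node_id") or pairs.get("master_key_ed25519")
        if identity is None or "bw" not in pairs:
            continue  # missing required KeyValue pairs
        try:
            bandwidth = int(pairs["bw"])
        except ValueError:
            continue  # bw is not an Int
        if bandwidth <= 0:
            continue  # the parser MAY ignore zero bandwidths
        if identity in relays or identity in duplicated:
            logging.warning("multiple RelayLines for %s", identity)
            relays.pop(identity, None)
            duplicated.add(identity)  # choice made here: ignore all values
            continue
        relays[identity] = bandwidth
    return relays
```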

2.4. Implementation notes

This section lists the KeyValue pairs in RelayLines that current
implementations generate.

2.4.1. Simple Bandwidth Scanner

Every RelayLine in sbws version 0.1.0 consists of:

  "node_id=" hexdigest SP

    As above.

  "bw=" Bandwidth SP

    As above.

  "nick=" nickname SP

    [Exactly once.]

    The relay nickname.

  "rtt=" Int SP

    [Exactly once.]

    The Round Trip Time in milliseconds to obtain 1 byte of data.

  "time=" DateTime NL

    [Exactly once.]

    The date and time, in ISO 8601 format and UTC, at which the
    last bandwidth was obtained.
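
As an illustration only, a generator could build such a RelayLine as
sketched below. This is not sbws code: the function name and parameters
are hypothetical, and `measured` is assumed to be a timezone-aware
datetime.

```python
from datetime import datetime, timezone


def format_sbws_relay_line(node_id, bw, nick, rtt, measured):
    """Build one RelayLine with the KeyValue pairs listed above."""
    bw = max(int(bw), 1)  # generators SHOULD NOT produce zero bandwidths
    # DateTime is "YYYY-MM-DDTHH:MM:SS" in UTC, without an offset suffix.
    time_str = measured.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    pairs = [
        ("node_id", node_id),
        ("bw", bw),
        ("nick", nick),
        ("rtt", int(rtt)),
        ("time", time_str),
    ]
    return " ".join(f"{key}={value}" for key, value in pairs) + "\n"
```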

2.4.2. Torflow

Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].

References:

1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
5. https://semver.org/

A. Sample data

The following has not been obtained from any real measurement.

A.1. Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath

A.2. Generated by sbws version 0.1.X
[TODO: this needs to be implemented when this spec is finished]

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
generator_started=2018-05-08T16:13:25
earliest_bandwidth=2018-05-08T16:13:26
=====
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36

B. Scaling bandwidths

B.1. Scaling requirements

Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
 * If the total bandwidth is zero, all relays should be given equal
   bandwidths.
 * If the scaled bandwidth is zero, it should be rounded up to one.
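
The two checks above could look like this in a generator. The function
name is hypothetical. The equal bandwidth assigned when the total is
zero is assumed to be one, which the text does not specify, and
rounding to the nearest integer is likewise an assumption.

```python
def apply_zero_checks(scaled):
    """Apply the two checks above to a {relay: scaled bandwidth} dict."""
    if sum(scaled.values()) == 0:
        # Total is zero: give every relay the same bandwidth (assumed: 1).
        return {relay: 1 for relay in scaled}
    # A bandwidth that would round to zero is raised to one.
    return {relay: max(1, round(bw)) for relay, bw in scaled.items()}
```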

Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.

B.2. A linear scaling method

If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:

1. Calculate the relay quota by dividing the total measured bandwidth
   in all votes, by the number of relays with measured bandwidth
   votes. In the public tor network, this is approximately 7500 as of
   April 2018. The quota should be a consensus parameter, so it can be
   adjusted for all generators on the network.

2. Calculate a vote quota by multiplying the relay quota by the number
   of relays this bandwidth authority has measured
   bandwidths for.

3. Calculate a scaling factor by dividing the vote quota by the
   total unscaled measured bandwidth in this bandwidth
   authority's upcoming vote.

4. Multiply each unscaled measured bandwidth by the scaling
   factor.

Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the quota.
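
Steps 2 to 4 can be sketched as below, with the relay quota from step 1
passed in as a parameter. The function name is hypothetical, and the
default of 7500 is only the approximate April 2018 figure quoted in
step 1. Results are left as floats; rounding and zero handling are not
covered by this sketch.

```python
def linear_scale(measured, relay_quota=7500):
    """Scale {relay: measured bandwidth} towards the vote quota."""
    total = sum(measured.values())
    if not measured or total == 0:
        return dict(measured)  # nothing to scale, avoid dividing by zero
    vote_quota = relay_quota * len(measured)  # step 2
    factor = vote_quota / total               # step 3
    # Step 4: multiply each unscaled bandwidth by the scaling factor.
    return {relay: bw * factor for relay, bw in measured.items()}
```

For example, two relays measured at 100 and 300 give a vote quota of
15000 and a scaling factor of 37.5, so the scaled bandwidths are 3750
and 11250.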

B.3. Quota changes

If all generators are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.

