Hi,
after nick, irl and teor reviewed the last version i sent [1], i paste below a new version of the specification versions 1.1.0 and 1.0.0. It's the same version as commit https://github.com/juga0/torspec/commit/c7cdfd4fcb4b5623e1407e2bec38e9fdf7b7....
The main question that came up was whether we should create a backwards incompatible specification version 2.0.0.
Since right now it's faster to implement the version 1.1.0 of this specification, and assuming that we can add the specification and the code later, i'd propose to continue with 1.1.0.
I've asked dirauths about their opinion about this.
Thanks, juga
[1] https://lists.torproject.org/pipermail/tor-dev/2018-May/013141.html
[2] https://lists.torproject.org/pipermail/tor-dev/2018-May/013154.html
-----------------------------------------------------------------------
Tor Bandwidth List Format
juga
teor
1. Scope and preliminaries
This document describes the format of Tor's Bandwidth List, version 1.0.0, 1.1.0 and later. It is a new specification for the existing format 1.0.0. It also describes a new format 1.1.0, which is backwards compatible with 1.0.0 parsers.
Since Tor version 0.2.4.12-alpha the directory authorities use the Bandwidth List file called "V3BandwidthsFile" generated by Torflow [1]. The format is described in Torflow's README.spec.txt and is considered to be version 1.0.0.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.2. Acknowledgements
The original bandwidth generator (Torflow) and format were created by mike. Teor suggested writing this specification while contributing to pastly's new bandwidth generator implementation.
This specification was revised after feedback from:
Nick Mathewson (nickm)
Iain Learmonth (irl)
1.3. Outline
The Tor directory protocol (dir-spec.txt [3]), in sections 3.4.1 and 3.4.2, uses the term "bandwidth measurements" to refer to what is called a Bandwidth List here. A Bandwidth List file contains information on relays' bandwidth capacities and is produced by bandwidth generators, previously known as bandwidth scanners.
1.4. Format Versions
1.0.0 - The legacy fallback Bandwidth List format
1.1.0 - Adds KeyValue Lines to the Header List section, KeyValue pairs to RelayLines, and format versions.
All Tor versions can consume format version 1.0.0. All Tor versions can consume format version 1.1.0, but they warn on additional header Lines. [TODO: this might be fixed; if it is, state in which version of Tor]
2. Format details
The Bandwidth List MUST contain the following sections:
- Header List (exactly once)
- Relays' Bandwidth List (zero or more times)
If it does not contain these sections, parsers SHOULD ignore the file.
2.1. Definitions
The following nonterminals are defined in Tor directory protocol sections 1.2., 2.1.1., 2.1.3.:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
nickname
hexdigest (a '$', followed by 40 hexadecimal characters ([A-Fa-f0-9]))
Nonterminal defined in section 2 of version-spec.txt [4]:
version_number
We define the following nonterminals:
Line ::= ArgumentChar* NL
RelayLine ::= KeyValue (SP KeyValue)* NL
KeyValue ::= Keyword "=" Value
Value ::= ArgumentCharValue+
ArgumentCharValue ::= any printing ASCII character except NL and SP.
Terminator ::= "====="
Timestamp ::= Int
Bandwidth ::= Int
MasterKey ::= a base64-encoded Ed25519 public key, with padding characters omitted.
DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
Note that key_value and value are defined in Tor directory protocol with different formats to KeyValue and Value here.
All Lines in the file MUST be 510 characters or fewer, to allow for the trailing newline and NUL characters. The previous limit was 254 characters in Tor 0.2.6.2-alpha and earlier. The parser MAY ignore longer Lines. [TODO: Change this restriction in 1.1.0 or later]
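As a rough illustration of the definitions above, the following Python sketch splits one Line into KeyValue pairs and applies the 510-character limit. The names MAX_LINE_LEN, KEY_VALUE_RE and parse_key_values are hypothetical, and the Keyword pattern is a simplification that assumes keywords consist of letters, digits, "-" and "_". It is not how tor parses the file.

```python
import re

MAX_LINE_LEN = 510  # limit from this section, excluding the trailing newline

# Simplified KeyValue pattern: Keyword "=" Value, where Value has no spaces.
# Accepting "_" in keywords is an assumption, made so that keys such as
# "node_id" match.
KEY_VALUE_RE = re.compile(r"^([A-Za-z0-9_-]+)=(\S+)$")


def parse_key_values(line):
    """Return a dict of the KeyValue pairs in one Line.

    Returns None if the Line is longer than the limit. Tokens that are
    not KeyValue pairs are skipped.
    """
    line = line.rstrip("\n")
    if len(line) > MAX_LINE_LEN:
        return None  # the parser MAY ignore longer Lines
    pairs = {}
    for token in line.split(" "):
        match = KEY_VALUE_RE.match(token)
        if match:
            # On duplicate keys this keeps the last Value, which is one
            # possible arbitrary choice.
            pairs[match.group(1)] = match.group(2)
    return pairs
```

Skipping unrecognized tokens instead of rejecting the whole Line follows the general rule in this document that extra material is ignored.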
2.2. Header List format
Some header Lines MUST appear in specific positions, as documented below. All other Lines can appear in any order. If a parser does not recognize any extra material in a header Line, the Line MUST be ignored. If a header Line does not conform to this format, the Line SHOULD be ignored by parsers.
It consists of:
Timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created. It does not follow the KeyValue format for backwards compatibility with version 1.0.0.
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version. It uses semantic versioning [5].
This Line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this Line, and the version_number is considered to be "1.0.0".
"software=" Value NL
[Zero or one time.]
The name of the software that created the document.
This Line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this Line, and the software is considered to be "torflow".
"software_version=" Value NL
[Zero or one time.]
The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme.
This Line has been added in version 1.1.0 of this specification.
"generator_started=" DateTime NL
[Zero or one time.]
The date and time, in ISO 8601 format and UTC time zone, when the generator started.
This Line has been added in version 1.1.0 of this specification.
"earliest_bandwidth=" DateTime NL
[Zero or one time.]
The date and time, in ISO 8601 format and UTC time zone, when the first relay bandwidth was obtained.
This Line has been added in version 1.1.0 of this specification.
KeyValue NL
[Zero or more times.]
There MUST NOT be multiple KeyValue header Lines with the same key. If there are, the parser SHOULD choose an arbitrary Line.
If a parser does not recognize a Keyword in a KeyValue Line, it MUST be ignored.
Future format versions may include additional KeyValue header Lines. Additional header Lines will be accompanied by a minor version increment.
Implementations MAY add additional header Lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys.
Parsers MUST NOT rely on the order of these additional Lines.
Additional header Lines MUST NOT use any keywords specified in the relay measurements format. If they do, the parser MAY ignore the conflicting keywords.
Terminator NL
[Zero or one time.]
The Header List section ends with this Terminator.
In version 1.0.0, Header List ends when the first relay bandwidth is found conforming to the next section. Implementations of version 1.1.0 SHOULD include this Line.
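A minimal sketch of a Header List parser, in Python, may help to show how the rules in this section fit together. The function name parse_header, the DEFAULTS table and the returned dict layout are hypothetical. Treating any Line that contains a space as the first RelayLine is a simplification for version 1.0.0 documents, which have no Terminator.

```python
from datetime import datetime, timezone

TERMINATOR = "====="
# Values assumed for version 1.0.0 documents, which lack these Lines.
DEFAULTS = {"version": "1.0.0", "software": "torflow"}
DATETIME_KEYS = ("generator_started", "earliest_bandwidth")


def parse_header(lines):
    """Parse header Lines given without trailing newlines.

    Returns (header dict, index of the first RelayLine). Raises
    ValueError if the first Line is missing or is not a Timestamp.
    """
    if not lines:
        raise ValueError("empty Bandwidth List")
    timestamp = int(lines[0])  # the first Line is a bare Unix timestamp
    found = {}
    index = 1
    while index < len(lines):
        line = lines[index]
        if line == TERMINATOR:
            index += 1
            break
        if " " in line:
            # No Terminator seen: assume this is the first RelayLine.
            break
        key, sep, value = line.partition("=")
        if sep and key and value:
            found.setdefault(key, value)  # duplicate keys: keep the first
        index += 1
    header = {**DEFAULTS, **found, "timestamp": timestamp}
    for key in DATETIME_KEYS:
        if key in header:
            try:
                parsed = datetime.strptime(header[key], "%Y-%m-%dT%H:%M:%S")
            except ValueError:
                del header[key]  # non-conforming Line: ignore it
            else:
                header[key] = parsed.replace(tzinfo=timezone.utc)
    return header, index
```

Unknown keywords are kept in the returned dict but never interpreted, so callers that only read the keys they know effectively ignore the rest.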
2.3. Relays' Bandwidth List format
It consists of zero or more RelayLines with the relays' bandwidth in arbitrary order.
There MUST NOT be multiple KeyValue pairs with the same key in the same RelayLine. If there are, the parser SHOULD choose an arbitrary Value.
There MUST NOT be multiple RelayLines per relay identity (node_id or master_key_ed25519). If there are, parsers SHOULD issue a warning and MAY choose an arbitrary value or ignore both values.
If a parser does not recognize any extra material in a RelayLine, the extra material MUST be ignored.
Each RelayLine MUST include the following KeyValue pairs. In version 1.0.0, node_id MUST NOT be at the end of the Line. In version 1.1.0, the KeyValue pairs can be in any order. [TODO: list of Tor versions that support it, when it's done]
"node_id=" hexdigest
[Exactly once.]
The fingerprint for the relay's RSA identity key.
"master_key_ed25519=" MasterKey
[Zero or one time.]
The relay's master Ed25519 key, base64 encoded, without trailing "="s, to avoid ambiguity with the KeyValue "=" character.
Implementations of version 1.1.0 SHOULD include both node_id and master_key_ed25519. Parsers SHOULD accept Lines that contain at least one of them.
"bw=" Bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth. If there are zero bandwidths, the parser MAY ignore them.
Multiple measurements can be aggregated using an averaging scheme, such as a mean, median, or decaying average.
Torflow scales bandwidths to kilobytes per second. Other implementations SHOULD use kilobytes per second for their initial bandwidth scaling.
If different implementations or configurations are used in votes for the same network, their measurements MAY need further scaling. See Appendix B for information about scaling, and one possible scaling method.
KeyValue
[Zero or more times.]
Future format versions may include additional KeyValue pairs on a RelayLine. Additional KeyValue pairs will be accompanied by a minor version increment.
Implementations MAY add additional relay KeyValue pairs as needed. This specification SHOULD be updated to avoid conflicting meanings for the same Keywords.
Parsers MUST NOT rely on the order of these additional KeyValue pairs.
Additional KeyValue pairs MUST NOT use any keywords specified in the header format. If they do, the parser MAY ignore the conflicting keywords.
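The RelayLine rules in this section can be sketched in Python as follows. The function names parse_relay_line and parse_relay_lines are hypothetical. The sketch picks one of the permitted behaviours in each case: it keeps the first Value for a duplicated key, ignores zero bandwidths, and drops every RelayLine of a relay identity that appears more than once. Using int() to read the Bandwidth is a simplification, since it accepts a few forms that Int does not.

```python
def parse_relay_line(line):
    """Return a dict for one RelayLine, or None if it should be ignored."""
    pairs = {}
    for token in line.split(" "):
        key, sep, value = token.partition("=")
        if sep and key and value:
            pairs.setdefault(key, value)  # duplicate keys: keep the first
    # Accept Lines that carry at least one relay identity.
    if "node_id" not in pairs and "master_key_ed25519" not in pairs:
        return None
    try:
        bandwidth = int(pairs["bw"])
    except (KeyError, ValueError):
        return None  # bw missing or not an integer
    if bandwidth <= 0:
        return None  # the parser MAY ignore zero bandwidths
    pairs["bw"] = bandwidth
    return pairs


def parse_relay_lines(lines):
    """Parse RelayLines, dropping relay identities that appear twice."""
    by_identity = {}
    duplicated = set()
    for line in lines:
        relay = parse_relay_line(line)
        if relay is None:
            continue
        identity = relay.get("node_id") or relay["master_key_ed25519"]
        if identity in by_identity:
            duplicated.add(identity)  # a real parser SHOULD also warn here
        else:
            by_identity[identity] = relay
    return [r for i, r in by_identity.items() if i not in duplicated]
```

Unknown KeyValue pairs stay in the returned dicts untouched, which matches the requirement to ignore extra material without rejecting the Line.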
2.4. Implementation notes
This section lists the KeyValue pairs in RelayLines that current implementations generate.
2.4.1. Simple Bandwidth Scanner
Every RelayLine in sbws version 0.1.0 consists of:
"node_id=" hexdigest SP
As above.
"bw=" Bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.]
The relay nickname.
"rtt=" Int SP
[Exactly once.]
The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" DateTime NL
[Exactly once.]
The date and time, in ISO 8601 format and UTC time zone, when the last bandwidth was obtained.
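For illustration only, a generator could assemble a RelayLine in the key order listed above along these lines. The function name format_sbws_relay_line and its parameters are hypothetical and are not taken from the sbws code base. Raising bw to a minimum of one follows the advice in section 2.3.

```python
from datetime import datetime


def format_sbws_relay_line(node_id, bw, nick, rtt, time):
    """Build one RelayLine string, including the trailing newline.

    `time` is a datetime that is assumed to be in UTC already.
    """
    pairs = [
        ("node_id", node_id),
        ("bw", max(1, int(bw))),  # never emit a zero bandwidth
        ("nick", nick),
        ("rtt", int(rtt)),
        ("time", time.strftime("%Y-%m-%dT%H:%M:%S")),
    ]
    return " ".join("{}={}".format(key, value) for key, value in pairs) + "\n"


example = format_sbws_relay_line(
    "$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80",
    760,
    "Test",
    380,
    datetime(2018, 5, 8, 16, 13, 26),
)
```

A generator would also want to check that the resulting Line stays within the length limit from section 2.1 before writing it.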
2.4.2. Torflow
Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
References:
1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
5. https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
This is an example version 1.0.0 document:
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.X [TODO: this needs to be implemented when this spec is finished]
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
generator_started=2018-05-08T16:13:25
earliest_bandwidth=2018-05-08T16:13:26
=====
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, scaling methods SHOULD perform the following checks:
* If the total bandwidth is zero, all relays should be given equal bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
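The two checks above could be applied after any scaling method, roughly as in this Python sketch. The function name apply_scaling_checks is hypothetical, and using one as the equal bandwidth in the all-zero case is an assumption, since the text does not say which equal value to use.

```python
def apply_scaling_checks(scaled):
    """Apply both checks to a dict that maps relay to scaled bandwidth."""
    if not scaled:
        return {}
    if sum(scaled.values()) == 0:
        # Total is zero: give every relay the same bandwidth. The value 1
        # is an assumption made for this sketch.
        return {relay: 1 for relay in scaled}
    # Round to an integer and raise anything that would be zero to one.
    return {relay: max(1, round(bw)) for relay, bw in scaled.items()}
```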
Initial experiments indicate that scaling may not be needed for torflow and sbws, because their measured bandwidths are similar enough already.
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling method, which ensures that all bandwidth votes contain approximately the same total bandwidth:
1. Calculate the relay quota by dividing the total measured bandwidth in all votes by the number of relays with measured bandwidth votes. In the public Tor network, this is approximately 7500 as of April 2018. The quota should be a consensus parameter, so it can be adjusted for all generators on the network.
2. Calculate a vote quota by multiplying the relay quota by the number of relays this bandwidth authority has measured bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the total unscaled measured bandwidth in this bandwidth authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling factor.
Now, the total scaled bandwidth in the upcoming vote is approximately equal to the vote quota.
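Steps 2 to 4 can be sketched in Python as below. The function name linear_scale is hypothetical, and the relay quota from step 1 is taken as an input, with the April 2018 approximation of 7500 as its default. Giving each relay the relay quota when the total is zero is an assumption that keeps the vote total equal to the vote quota.

```python
def linear_scale(measured, relay_quota=7500):
    """Scale one authority's measurements (dict of relay to bandwidth)."""
    total = sum(measured.values())
    if total == 0:
        # All measurements are zero: fall back to equal bandwidths.
        return {relay: relay_quota for relay in measured}
    vote_quota = relay_quota * len(measured)  # step 2
    factor = vote_quota / total               # step 3
    # Step 4, rounding to an integer and never going below one.
    return {relay: max(1, round(bw * factor)) for relay, bw in measured.items()}
```

For example, with measurements of 100 and 300 and the default relay quota, the vote quota is 15000 and the scaling factor is 37.5, so the scaled bandwidths are 3750 and 11250, which sum to the vote quota.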
B.3. Quota changes
If all generators are using scaling, the quota can be gradually reduced or increased as needed. Smaller quotas decrease the size of uncompressed consensuses, and may decrease the size of consensus diffs and compressed consensuses. But if the relay quota is too small, some relays may be over- or under-weighted.
Hi,
On 09/05/18 13:08, juga wrote:
> after nick, irl and teor reviewed the last version i sent [1], i paste below a new version of the specification versions 1.1.0 and 1.0.0. It's the same version as commit https://github.com/juga0/torspec/commit/c7cdfd4fcb4b5623e1407e2bec38e9fdf7b7....
Awesome. As soon as the opportunity presents itself I will take another read through.
> The main question that came up was whether we should create a backwards incompatible specification version 2.0.0.
I do think it may be easier to finish this specification first before moving on to a 2.0.0 version. We discussed this specification at the last Tor Metrics meeting: Currently Tor Metrics' descriptor parsing library (metrics-lib) does not have a parser for the current (1.0.0) bandwidth lists. We should implement this, and hopefully it would not be too much extra work to make this also work for 1.1.0 lists.
If there were to be a 2.0.0 version, I would hope this brings the syntax closer to other Tor descriptors to enable us to reuse existing code in our parser (and also for tor to reuse existing code too) rather than being something entirely new.
Thanks, Iain.
Iain Learmonth:
>> The main question that came up was whether we should create a backwards incompatible specification version 2.0.0.
> I do think it may be easier to finish this specification first before moving on to a 2.0.0 version. We discussed this specification at the last Tor Metrics meeting: Currently Tor Metrics' descriptor parsing library (metrics-lib) does not have a parser for the current (1.0.0) bandwidth lists. We should implement this, and hopefully it would not be too much extra work to make this also work for 1.1.0 lists.
It shouldn't be much extra work, if metrics-lib ignores the extra header lines and extra bandwidth KeyValue pairs (as Tor currently does).
> If there were to be a 2.0.0 version, I would hope this brings the syntax closer to other Tor descriptors to enable us to reuse existing code in our parser (and also for tor to reuse existing code too) rather than being something entirely new.
That was the idea :)
Thanks! juga.
Hi,
as commented by nickm [0], the specification has been merged into the torspec git repository.
If you find any issues with it, as usual, we can open a ticket and patch the specification.
Thanks, juga.
[0] https://trac.torproject.org/projects/tor/ticket/25869#comment:10