commit 9465f9c0713b2185643a976ef14e0959d3885e80
Author: juga0 juga@riseup.net
Date:   Mon May 14 14:31:05 2018 -0400

    Bandwidth-measurement file specification, as sent to tor-dev
---
 bandwidth-file-spec.txt | 412 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 412 insertions(+)

diff --git a/bandwidth-file-spec.txt b/bandwidth-file-spec.txt
new file mode 100644
index 0000000..0755329
--- /dev/null
+++ b/bandwidth-file-spec.txt
@@ -0,0 +1,412 @@
+                      Tor Bandwidth List Format
+                                 juga
+                                 teor
+
+1. Scope and preliminaries
+
+  This document describes the format of Tor's Bandwidth List,
+  version 1.0.0, 1.1.0 and later.
+  It is a new specification for the existing format 1.0.0.
+  It also describes a new format, 1.1.0, which is backwards compatible with
+  1.0.0 parsers.
+
+  Since Tor version 0.2.4.12-alpha the directory authorities use
+  the Bandwidth List file called "V3BandwidthsFile" generated by
+  Torflow [1]. The format is described in Torflow's README.spec.txt and
+  is considered to be version 1.0.0.
+
+  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+  NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+  "OPTIONAL" in this document are to be interpreted as described in
+  RFC 2119.
+
+1.2. Acknowledgements
+
+  The original bandwidth generator (Torflow) and format were
+  created by mike. Teor suggested writing this specification while
+  contributing to pastly's new bandwidth generator implementation.
+
+  This specification was revised after feedback from:
+
+    Nick Mathewson (nickm)
+    Iain Learmonth (irl)
+
+1.3. Outline
+
+  The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
+  and 3.4.2, uses the term "bandwidth measurements" to refer to what
+  is here called the Bandwidth List.
+  A Bandwidth List file contains information on relays' bandwidth
+  capacities and is produced by bandwidth generators, previously known
+  as bandwidth scanners.
+
+1.4. Format Versions
+
+  1.0.0 - The legacy fallback Bandwidth List format
+
+  1.1.0 - Adds KeyValue Lines to the Header List section, and adds
+          KeyValues to RelayLines and format versions.
+
+  All Tor versions can consume format version 1.0.0.
+  All Tor versions can consume format version 1.1.0,
+  but they warn on additional header Lines.
+  [TODO: this might be fixed; if it is, this should say in which
+  version of Tor]
+
+2. Format details
+
+  The Bandwidth List MUST contain the following sections:
+  - Header List (exactly once)
+  - Relays' Bandwidth List (zero or more times)
+  If it does not contain these sections, parsers SHOULD ignore the file.
+
+2.1. Definitions
+
+  The following nonterminals are defined in Tor directory protocol
+  sections 1.2., 2.1.1., 2.1.3.:
+
+    Int
+    SP (space)
+    NL (newline)
+    Keyword
+    ArgumentChar
+    nickname
+    hexdigest (a '$', followed by 40 hexadecimal characters
+      ([A-Fa-f0-9]))
+
+  Nonterminal defined in section 2 of version-spec.txt [4]:
+
+    version_number
+
+  We define the following nonterminals:
+
+    Line ::= ArgumentChar* NL
+    RelayLine ::= KeyValue (SP KeyValue)* NL
+    KeyValue ::= Keyword "=" Value
+    Value ::= ArgumentCharValue+
+    ArgumentCharValue ::= any printing ASCII character except NL and SP.
+    Terminator ::= "====="
+    Timestamp ::= Int
+    Bandwidth ::= Int
+    MasterKey ::= a base64-encoded Ed25519 public key, with
+                  padding characters omitted.
+    DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
+
+  Note that key_value and value are defined in Tor directory protocol
+  with different formats to KeyValue and Value here.
+
+  All Lines in the file MUST be 510 characters or fewer, to allow for the
+  trailing newline and NUL characters.
+  The previous limit was 254 characters in Tor 0.2.6.2-alpha and
+  earlier.
+  The parser MAY ignore longer Lines.
+  [TODO: Change this restriction in 1.1.0 or later]
+
+2.2. Header List format
+
+Some header Lines MUST appear in specific positions, as documented
+below.
+All other Lines can appear in any order.
+If a parser does not recognize any extra material in a header Line,
+the Line MUST be ignored.
+If a header Line does not conform to this format, the Line SHOULD be
+ignored by parsers.
+
+It consists of:
+
+  Timestamp NL
+
+    [At start, exactly once.]
+
+    The Unix Epoch time in seconds when the file was created.
+    It does not follow the KeyValue format for backwards
+    compatibility with version 1.0.0.
+
+  "version=" version_number NL
+
+    [In second position, zero or one time.]
+
+    The specification document format version.
+    It uses semantic versioning [5].
+
+    This Line has been added in version 1.1.0 of this specification.
+
+    Version 1.0.0 documents do not contain this Line, and the
+    version_number is considered to be "1.0.0".
+
+  "software=" Value NL
+
+    [Zero or one time.]
+
+    The name of the software that created the document.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+    Version 1.0.0 documents do not contain this Line, and the software
+    is considered to be "torflow".
+
+  "software_version=" Value NL
+
+    [Zero or one time.]
+
+    The version of the software that created the document.
+    The version may be a version_number, a git commit, or some other
+    version scheme.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+  "generator_started=" DateTime NL
+
+    [Zero or one time.]
+
+    The date and time timestamp in ISO 8601 format and UTC time zone
+    when the generator started.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+  "earliest_bandwidth=" DateTime NL
+
+    [Zero or one time.]
+
+    The date and time timestamp in ISO 8601 format and UTC time zone
+    when the first relay bandwidth was obtained.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+  KeyValue NL
+
+    [Zero or more times.]
+
+    There MUST NOT be multiple KeyValue header Lines with the same key.
+    If there are, the parser SHOULD choose an arbitrary Line.
+
+    If a parser does not recognize a Keyword in a KeyValue Line, the
+    Line MUST be ignored.
+
+    Future format versions may include additional KeyValue header Lines.
+    Additional header Lines will be accompanied by a minor version
+    increment.
+
+    Implementations MAY add additional header Lines as needed. This
+    specification SHOULD be updated to avoid conflicting meanings for
+    the same header keys.
+
+    Parsers MUST NOT rely on the order of these additional Lines.
+
+    Additional header Lines MUST NOT use any keywords specified in the
+    relay measurements format.
+    If they do, the parser MAY ignore the conflicting keywords.
+
+  Terminator NL
+
+    [Zero or one time.]
+
+    The Header List section ends with this Terminator.
+
+    In version 1.0.0, the Header List ends when the first relay bandwidth
+    is found conforming to the next section.
+    Implementations of version 1.1.0 SHOULD include this Line.
+
+2.3. Relays' Bandwidth List format
+
+It consists of zero or more RelayLines with the relays' bandwidth
+in arbitrary order.
+
+There MUST NOT be multiple KeyValue pairs with the same key in the same
+RelayLine.
+If there are, the parser SHOULD choose an arbitrary Value.
+
+There MUST NOT be multiple RelayLines per relay identity (node_id or
+master_key_ed25519).
+If there are, parsers SHOULD issue a warning and MAY choose an arbitrary
+value or ignore both values.
+
+If a parser does not recognize any extra material in a RelayLine,
+the extra material MUST be ignored.
+
+Each RelayLine MUST include the following KeyValue pairs:
+In version 1.0.0, node_id MUST NOT be at the end of the Line.
+In version 1.1.0, the KeyValue pairs can be in any order.
+[TODO: list the Tor versions that support it, when it's done]
+
+  "node_id=" hexdigest
+
+    [Exactly once.]
+
+    The fingerprint for the relay's RSA identity key.
+
+  "master_key_ed25519=" MasterKey
+
+    [Zero or one time.]
+
+    The relay's master Ed25519 key, base64 encoded,
+    without trailing "="s, to avoid ambiguity with the KeyValue "="
+    character.
+
+    Implementations of version 1.1.0 SHOULD include both node_id and
+    master_key_ed25519.
+    Parsers SHOULD accept Lines that contain at least one of them.
+
+  "bw=" Bandwidth
+
+    [Exactly once.]
+
+    The measured bandwidth of this relay.
+
+    Tor accepts zero bandwidths, but they trigger bugs in older Tor
+    implementations. Therefore, implementations SHOULD NOT produce zero
+    bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
+    If there are zero bandwidths, the parser MAY ignore them.
+
+    Multiple measurements can be aggregated using an averaging scheme,
+    such as a mean, median, or decaying average.
+
+    Torflow scales bandwidths to kilobytes per second. Other
+    implementations SHOULD use kilobytes per second for their initial
+    bandwidth scaling.
+
+    If different implementations or configurations are used in votes for
+    the same network, their measurements MAY need further scaling. See
+    Appendix B for information about scaling, and one possible scaling
+    method.
+
+  KeyValue
+
+    [Zero or more times.]
+
+    Future format versions may include additional KeyValue pairs on a
+    RelayLine.
+    Additional KeyValue pairs will be accompanied by a minor version
+    increment.
+
+    Implementations MAY add additional relay KeyValue pairs as needed.
+    This specification SHOULD be updated to avoid conflicting meanings
+    for the same Keywords.
+
+    Parsers MUST NOT rely on the order of these additional KeyValue
+    pairs.
+
+    Additional KeyValue pairs MUST NOT use any keywords specified in the
+    header format.
+    If they do, the parser MAY ignore the conflicting keywords.
+
+2.4. Implementation notes
+
+Current implementations generate the following RelayLine KeyValue pairs.
+
+2.4.1. Simple Bandwidth Scanner
+
+Every RelayLine in sbws version 0.1.0 consists of:
+
+  "node_id=" hexdigest SP
+
+    As above.
+
+  "bw=" Bandwidth SP
+
+    As above.
+
+  "nick=" nickname SP
+
+    [Exactly once.]
+
+    The relay nickname.
+
+  "rtt=" Int SP
+
+    [Exactly once.]
+
+    The Round Trip Time in milliseconds to obtain 1 byte of data.
+
+  "time=" DateTime NL
+
+    [Exactly once.]
+
+    The date and time timestamp in ISO 8601 format and UTC time zone
+    when the last bandwidth was obtained.
+
+2.4.2. Torflow
+
+Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
+
+References:
+
+1. https://gitweb.torproject.org/torflow.git
+2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...
+3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
+4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
+5. https://semver.org/
+
+A. Sample data
+
+The following has not been obtained from any real measurement.
+
+A.1. Generated by Torflow
+
+This is an example version 1.0.0 document:
+
+1523911758
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
+node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
+
+A.2. Generated by sbws version 0.1.X
+[TODO: this needs to be implemented when this spec is finished]
+
+1523911758
+version=1.1.0
+software=sbws
+software_version=0.1.0
+generator_started=2018-05-08T16:13:25
+earliest_bandwidth=2018-05-08T16:13:26
+=====
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
+node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36
+
+B. Scaling bandwidths
+
+B.1. Scaling requirements
+
+Tor accepts zero bandwidths, but they trigger bugs in older Tor
+implementations. Therefore, scaling methods SHOULD perform the
+following checks:
+  * If the total bandwidth is zero, all relays should be given equal
+    bandwidths.
+  * If the scaled bandwidth is zero, it should be rounded up to one.
+
+Initial experiments indicate that scaling may not be needed for
+torflow and sbws, because their measured bandwidths are similar
+enough already.
+
+B.2. A linear scaling method
+
+If scaling is required, here is a simple linear bandwidth scaling
+method, which ensures that all bandwidth votes contain approximately
+the same total bandwidth:
+
+1. Calculate the relay quota by dividing the total measured bandwidth
+   in all votes by the number of relays with measured bandwidth
+   votes. In the public tor network, this is approximately 7500 as of
+   April 2018. The quota should be a consensus parameter, so it can be
+   adjusted for all generators on the network.
+
+2. Calculate a vote quota by multiplying the relay quota by the number
+   of relays this bandwidth authority has measured
+   bandwidths for.
+
+3. Calculate a scaling factor by dividing the vote quota by the
+   total unscaled measured bandwidth in this bandwidth
+   authority's upcoming vote.
+
+4. Multiply each unscaled measured bandwidth by the scaling
+   factor.
+
+Now, the total scaled bandwidth in the upcoming vote is
+approximately equal to the vote quota.
+
+B.3. Quota changes
+
+If all generators are using scaling, the quota can be gradually
+reduced or increased as needed. Smaller quotas decrease the size
+of uncompressed consensuses, and may decrease the size of
+consensus diffs and compressed consensuses. But if the relay
+quota is too small, some relays may be over- or under-weighted.
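
Editorial sketch, not part of the patch: steps 2 to 4 of the linear
scaling method in B.2, together with the two checks from B.1, might look
roughly like the Python below. The function name scale_bandwidths, the
dict-based input and the relay keys are hypothetical. The relay quota
(step 1) is taken as an input and defaults to the approximate April 2018
figure quoted in B.2. Giving each relay exactly the relay quota when the
total is zero is one assumed way to assign "equal bandwidths"; the
specification does not say which equal value to use.

```python
from typing import Dict


def scale_bandwidths(unscaled: Dict[str, int],
                     relay_quota: int = 7500) -> Dict[str, int]:
    """Linearly scale one authority's measured bandwidths (sketch of B.2).

    `unscaled` maps a relay identifier to its unscaled measured bandwidth.
    `relay_quota` is the result of step 1, assumed to be supplied by the
    caller (for example from a consensus parameter).
    """
    if not unscaled:
        return {}

    # Step 2: vote quota = relay quota * number of relays measured.
    vote_quota = relay_quota * len(unscaled)
    total = sum(unscaled.values())

    if total == 0:
        # B.1 check: zero total bandwidth, so every relay gets an equal
        # share. Using the relay quota itself is an assumption.
        return {relay: max(1, relay_quota) for relay in unscaled}

    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total

    # Step 4, plus the B.1 check that a scaled zero is rounded up to one.
    return {relay: max(1, round(bw * factor))
            for relay, bw in unscaled.items()}


if __name__ == "__main__":
    # Bandwidths taken from the sample RelayLines in Appendix A.
    print(scale_bandwidths({"Test": 760, "Test2": 189}))
```

With the two sample bandwidths from Appendix A (760 and 189) and the
default quota, the scaled values are 12013 and 2987, which sum to the
vote quota of 15000.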