[torflow/master] Add first draft of bw scanner spec.

25 Oct 2011

commit 3a2811beb90af3d28c8e2cb41d86517f3ec858fe
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Thu Apr 14 13:00:42 2011 +0200

    Add first draft of bw scanner spec.
---
 bwauth-spec.txt |  326 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 326 insertions(+), 0 deletions(-)

diff --git a/bwauth-spec.txt b/bwauth-spec.txt
new file mode 100644
index 0000000..bb9451d
--- /dev/null
+++ b/bwauth-spec.txt
@@ -0,0 +1,326 @@
+
+                      Bandwidth Scanner specification
+
+
+          "This is Fail City and sqlalchemy is running for mayor"
+                                  - or -
+   How to Understand What The Heck the Tor Bandwidth Scanners are Doing
+
+
+                             Karsten Loesing
+                                Mike Perry
+
+0. Preliminaries
+
+   The Tor bandwidth scanners measure the bandwidth of relays in the Tor
+   network to adjust the relays' self-advertised bandwidth values.  The
+   bandwidth scanners are run by a subset of Tor directory authorities
+   which include the results in their network status votes.  Consensus
+   bandwidth weights are then used by Tor clients to make better path
+   selection decisions.  The outcome is a better load-balanced Tor network
+   with more efficient use of the available bandwidth capacity by users.
+
+   This document describes the implementation of the bandwidth scanners as
+   part of the Torflow and TorCtl packages.  This document has two main
+   sections:
+
+    - Section 1 covers the operation of the continuously running bandwidth
+      scanners, which split the set of running relays into workable
+      subsets, select two-hop paths between these relays, perform
+      downloads, and write performance results to disk.
+
+    - Section 2 describes the periodically run step to aggregate results
+      in order to include them in the network status voting process.
+
+   The "interfaces" of this document are Tor's control and SOCKS protocols
+   for performing measurements and Tor's directory protocol for including
+   results in the network status voting process.
+
+   The focus of this document is the functionality of the bandwidth
+   scanners in their default configuration.  Whenever there are
+   configuration options that significantly change behavior, this is
+   noted.  But this document is not a manual and does not describe any
+   configuration options in detail.  Refer to README.BwAuthorities for the
+   operation of bandwidth scanners.
+
+1. Measuring relay bandwidth
+
+   Every directory authority that wants to include bandwidth scanner
+   results in its vote operates a set of four bandwidth scanners running
+   in parallel.  These bandwidth scanners divide the Tor network into four
+   partitions from fastest to slowest relays and continuously measure the
+   relays' bandwidth capacity.  Each bandwidth scanner runs the steps as
+   described in this section.  The results of all four bandwidth scanners
+   are periodically aggregated as described in the next section.
+
+1.1. Configuring and running a Tor client
+
+   All four bandwidth scanners use a single Tor client for their
+   measurements.  This Tor client has two non-standard configuration
+   options set.  The first:
+
+      FetchUselessDescriptors 1
+
+   configures Tor to fetch descriptors of non-running relays.  The second:
+
+      __LeaveStreamsUnattached 1
+
+   instructs Tor to leave streams unattached and let the controller attach
+   new streams to circuits.
+#
+# Why does bwauthority.py set and reset these configuration options when
+# the provided torrc already contains them?  Particularly the resetting
+# part seems to be broken, because bwauthority.py sets
+# __LeaveStreamsUnattached 0 even though other scanners might still be
+# running.  The whole code should be removed from bwauthority.py.  -KL
+
+1.2. Connecting to Tor via its control port
+
+   At startup, the bandwidth scanners connect to the Tor client via its
+   control port using cookie authentication.  The bandwidth scanners
+   register for events of the following types:
+
+    - NEWCONSENSUS
+    - NEWDESC
+    - CIRC
+    - STREAM
+    - BW
+    - STREAM_BW
+
+   These events are used to learn about updated Tor directory information
+   and about measurement progress.
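The control-port steps above can be sketched at the wire level, following Tor's control protocol: with cookie authentication the client sends the contents of Tor's cookie file, hex-encoded, in an AUTHENTICATE command, and then subscribes to the listed event types with SETEVENTS. This is only a sketch under assumptions: the helper name control_handshake is hypothetical, the scanners presumably do this through TorCtl rather than by hand, and the sketch does not read or check Tor's replies.

```python
from typing import List

# Event types the scanners register for, as listed in section 1.2.
SCANNER_EVENTS = ["NEWCONSENSUS", "NEWDESC", "CIRC", "STREAM", "BW", "STREAM_BW"]


def control_handshake(cookie: bytes) -> List[str]:
    """Return the control-port commands sent right after connecting.

    Hypothetical helper: `cookie` is the raw content of Tor's
    authentication cookie file; it is sent hex-encoded.
    """
    return [
        "AUTHENTICATE " + cookie.hex() + "\r\n",
        "SETEVENTS " + " ".join(SCANNER_EVENTS) + "\r\n",
    ]


if __name__ == "__main__":
    for line in control_handshake(bytes(32)):
        print(repr(line))
```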
+
+1.3. Selecting slices of relays
+
+   Each of the four bandwidth scanners is responsible for a subset of
+   running relays, determined by a fixed percentile range of bandwidths
+   listed in the network status consensus.  By default the four scanners
+   are responsible for the relays with consensus bandwidth:
+
+    1. from  0th to  12th percentile (fastest relays),
+    2. from 12th to  35th percentile (fast relays),
+    3. from 35th to  60th percentile (slow relays), and
+    4. from 60th to 100th percentile (slowest relays).
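The default partition above can be sketched as follows. This assumes a particular percentile definition that the text does not spell out: relays are ranked by descending consensus bandwidth, and relay i of N sits at percentile 100 * i / N, so the fastest relay is at the 0th percentile. The function name relays_for_scanner is hypothetical, and scanners are indexed 0 to 3 here rather than 1 to 4.

```python
from typing import List, Tuple

# Default percentile ranges of the four scanners, fastest relays first.
SCANNER_RANGES = [(0, 12), (12, 35), (35, 60), (60, 100)]


def relays_for_scanner(relays: List[Tuple[str, int]], scanner: int) -> List[str]:
    """Return the fingerprints assigned to scanner 0..3.

    relays holds (fingerprint, consensus bandwidth) pairs. Assumption:
    relay i of N, ranked by descending bandwidth, sits at percentile
    100 * i / N, so the fastest relay is at the 0th percentile.
    """
    lo, hi = SCANNER_RANGES[scanner]
    ranked = sorted(relays, key=lambda r: r[1], reverse=True)
    n = len(ranked)
    return [fp for i, (fp, _) in enumerate(ranked) if lo <= 100.0 * i / n < hi]
```

With 100 relays, the four scanners get 12, 23, 25, and 40 relays respectively, and every relay belongs to exactly one scanner.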
+
+   The bandwidth scanners further subdivide the share of relays they are
+   responsible for into slices of 50 relays to perform measurements.
+
+   A slice does not consist of 50 fixed relays, but is defined by a
+   percentile range containing 50 relays.  The lower bound of the
+   percentile range equals the upper bound of the previous slice, or
+   0 if this is the first slice.  The upper bound is determined from the
+   network status consensus at the time of starting the slice.  The upper
+   percentile may exceed the percentile range that the bandwidth scanner
+   is responsible for, whereas the lower percentile may not.  The set of
+   relays contained in the slice can change arbitrarily often while
+   performing measurements.
+#
+# What if we approach the upper bound of the interval we're responsible
# for and there are fewer than 50 relays left?  Is the last slice going to have
+# fewer relays, or do we decrease the lower percentile until we have 50
+# relays?  Example: There are 101 relays between 60th and 100th
+# percentile, and we just finished relays 51 to 100.  Is the next slice
+# going to have only 1 relay?  I saw output files from 100th to 102nd
+# percentile on gabelmoo.  How's that possible?  -KL
+#
+# The paragraph above contains a lot of guesswork and may be completely
+# wrong.  But we need some definition of what relays are contained in a
+# slice and whether membership can change over time.  -KL
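Since the slice definition above is itself marked as guesswork, the following is only a sketch of that reading, not of the actual code. It assumes the consensus size stays constant at num_relays, so every slice spans the same percentile width; the real scanner reportedly recomputes the upper bound from the consensus at the start of each slice. The name slice_bounds is hypothetical.

```python
from typing import Iterator, Tuple

SLICE_SIZE = 50  # relays per slice


def slice_bounds(range_lo: float, range_hi: float, num_relays: int,
                 slice_size: int = SLICE_SIZE) -> Iterator[Tuple[float, float]]:
    """Yield (lower, upper) percentile pairs covering [range_lo, range_hi).

    Assumption: the consensus size stays at num_relays; the real scanner
    recomputes the upper bound from the consensus at the start of each slice.
    """
    width = 100.0 * slice_size / num_relays  # percentile span of slice_size relays
    lower = range_lo
    while lower < range_hi:
        upper = lower + width  # may exceed range_hi, as the text allows
        yield lower, upper
        lower = upper  # the next slice starts where this one ended
```

For example, with 1500 relays a slice spans about 3.33 percentiles, so the fastest scanner (0th to 12th percentile) would run four slices, and the last one would end near the 13.3rd percentile, beyond the scanner's own range.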
+
+   A bandwidth scanner keeps measuring the bandwidth of the relays in a
+   slice until:
+
+    - every relay in the slice has been selected for measurement at least
+      5 times, and
+
+    - the number of successful fetches is at least 65% of the possible
+      path combinations (5 x number of relays / 2).
+
+   Note that the second requirement makes no assumptions about successful
+   fetches for a given relay or path.  It is just an abstract number to
+   avoid skipping slices in case of temporary network failure.
+#
+# If selection is random, isn't there a small chance of never picking a
+# relay and never reaching the 5 measurements for this relay?  -KL
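The two completion criteria above can be written as a single predicate. The function and parameter names are hypothetical. Note that the success threshold is computed from the slice size alone, which matches the text's remark that it makes no assumptions about individual relays or paths.

```python
from typing import Dict

MIN_SELECTIONS = 5          # times each relay must have been selected
MIN_SUCCESS_FRACTION = 0.65


def slice_finished(selections: Dict[str, int], successful_fetches: int) -> bool:
    """Apply both completion criteria to a slice.

    selections maps relay fingerprint -> number of times the relay was
    selected for a measurement path (hypothetical bookkeeping).
    """
    every_relay_selected = all(n >= MIN_SELECTIONS for n in selections.values())
    # "Possible path combinations" as defined in the text: 5 x relays / 2.
    possible_paths = MIN_SELECTIONS * len(selections) / 2
    enough_fetches = successful_fetches >= MIN_SUCCESS_FRACTION * possible_paths
    return every_relay_selected and enough_fetches
```

For a full slice of 50 relays there are 125 "possible" paths, so at least 82 successful fetches (65% of 125 is 81.25) are needed once every relay has been selected 5 times.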
+
+1.4. Selecting paths for measurements
+
+   Before selecting a new path for a measurement, a bandwidth scanner
+   makes sure that it has a valid consensus, and if it doesn't, it waits
+   for the Tor client to provide one.
+
+   The bandwidth scanners also check the local system time and avoid
+   starting new measurements between 01:30 and 04:30 local time.
+#
+# Why do the authorities sleep for three hours in the *default*
+# configuration?  It seems useful to have this as a configuration option,
+# but why is it enabled by default?  -KL
+#
+# It seems that after waking up from this 3 hour break, we don't wait for
+# a valid consensus.  Should we?  -KL
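The nightly pause described above amounts to a time-of-day check before each new measurement. Whether the boundaries are inclusive is not stated in the text; this sketch assumes 01:30 is inside the pause and 04:30 is not. The name may_start_measurement is hypothetical.

```python
import datetime

# Window in local time during which no new measurements are started.
PAUSE_START = datetime.time(1, 30)
PAUSE_END = datetime.time(4, 30)


def may_start_measurement(now: datetime.datetime) -> bool:
    """False from 01:30 (inclusive) up to 04:30 (exclusive), local time."""
    return not (PAUSE_START <= now.time() < PAUSE_END)
```

A caller would pass datetime.datetime.now(), which returns local time, matching the text's "local time".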
+
+   The bandwidth scanners then select a path and instruct Tor to build a
+   circuit that meets the following requirements:
+
+    - All relays for the new path need to be members of the current slice.
+
+    - The minimum consensus bandwidth for relays to be selected is 1
+      KiB/s.
+
+    - Path length is always 2.
+
+    - Selection is uniform, that is, there is no preference for relays,
+      e.g., based on bandwidth.
+
+    - Relays in the paths must come from different /16 subnets.
+
+    - Entry relays must have the Running and Fast flags and must not
+      permit exiting to 255.255.255.255:443.
+
+    - Exit relays must have the Running and Fast flags, must not have the
+      BadExit flag, and must permit exiting to 255.255.255.255:443.
+#
+# If the Fast flag is really required for both positions, does this mean
+# that non-Fast relays are not measured?  How does this work with the
+# criteria to consider a slice finished?  And what if the criteria for
+# assigning the Fast flag are tightened in the future?  -KL
+#
+# The sets of entry and exit relays don't overlap, right?  What if a slice
# of 50 relays contains only entry relays or only exit relays?
+# Right, it's highly unlikely, but does this mean we wouldn't measure
+# anything?  -KL
+#
+# There's even more guesswork involved here.  This needs review!  -KL
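Because the path requirements above are flagged as needing review, the following is only a sketch of them as written, not of the TorCtl path selection code. The Relay record, its exits_to_probe field (standing in for "permits exiting to 255.255.255.255:443"), MAX_TRIES, and choose_path are all hypothetical. Entries and exits are drawn uniformly, and pairs in the same /16 are rejected and redrawn, which keeps the choice uniform over the valid pairs.

```python
import ipaddress
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

MIN_BW_KIB = 1          # minimum consensus bandwidth in KiB/s
MAX_TRIES = 100         # hypothetical cap on redraws for the /16 rule


@dataclass(frozen=True)
class Relay:
    fingerprint: str
    address: str              # IPv4 address as a string
    bandwidth_kib: int        # consensus bandwidth in KiB/s
    flags: FrozenSet[str]
    exits_to_probe: bool      # policy permits 255.255.255.255:443


def subnet16(address: str) -> ipaddress.IPv4Network:
    return ipaddress.ip_network(address + "/16", strict=False)


def usable_entry(r: Relay) -> bool:
    return ({"Running", "Fast"} <= r.flags and not r.exits_to_probe
            and r.bandwidth_kib >= MIN_BW_KIB)


def usable_exit(r: Relay) -> bool:
    return ({"Running", "Fast"} <= r.flags and "BadExit" not in r.flags
            and r.exits_to_probe and r.bandwidth_kib >= MIN_BW_KIB)


def choose_path(slice_relays: List[Relay],
                rng: random.Random) -> Optional[Tuple[Relay, Relay]]:
    """Pick (entry, exit) uniformly from the slice, or None if impossible."""
    entries = [r for r in slice_relays if usable_entry(r)]
    exits = [r for r in slice_relays if usable_exit(r)]
    if not entries or not exits:
        return None
    for _ in range(MAX_TRIES):
        entry, exit_relay = rng.choice(entries), rng.choice(exits)
        # Reject pairs from the same /16 and draw again.
        if subnet16(entry.address) != subnet16(exit_relay.address):
            return entry, exit_relay
    return None
```

Under these rules the entry and exit sets are disjoint, so a slice containing only one kind of relay yields no path at all, which is the case raised in the comment above.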
+
+1.5. Performing measurements
+
+   Once the circuit is built, the bandwidth scanners download a test file
+   via Tor's SOCKS port using SOCKS protocol version 5.
+
+   All downloads go to the same bandwidth authority server.
+
+   All requests are sent to port 443 using HTTPS to avoid caching on the
+   exit relay.
+#
+# Do the bandwidth scanners check the result size and/or the bandwidth
+# authority certificate somewhere?  If not, should they?  Otherwise,
+# malicious exits could manipulate their bandwidth weights too easily.
+# -KL
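At the SOCKS layer, each download starts with a SOCKS 5 greeting and a CONNECT request as defined in RFC 1928. The sketch below builds both messages. Passing the server as a hostname (address type 3) leaves name resolution to the exit relay. The hostname used in the example is a placeholder, since the text does not name the bandwidth authority server, and the TLS handshake that follows on the same connection is not shown.

```python
import struct

NO_AUTH = 0x00


def socks5_greeting() -> bytes:
    """VER=5, one method offered: 0x00 (no authentication)."""
    return bytes([0x05, 0x01, NO_AUTH])


def socks5_connect(host: str, port: int = 443) -> bytes:
    """CONNECT request using ATYP=3 so the exit relay resolves the name."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), length prefix
    return (bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name
            + struct.pack("!H", port))
```

Port 443 is encoded in network byte order as the two bytes 0x01 0xBB.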
+
+   The requested resource for performing the measurement varies with the
+   lower percentile of the slice under investigation.  The default file
+   sizes by lower percentiles are:
+
+    -  0th to  10th percentile:   2 MiB
+    - 10th to  20th percentile:   1 MiB
+    - 20th to  30th percentile: 512 KiB
+    - 30th to  50th percentile: 256 KiB
+    - 50th to 100th percentile: 128 KiB
+#
+# In choose_url(), we raise an exception saying that no nodes are left for
+# the URL choice, but really we can only run into this exception when we
+# pass a value > 100 for percentile.  -KL
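The size table above maps naturally to a lookup on the slice's lower percentile. This sketch is not the actual choose_url(). It assumes that a lower percentile exactly on a boundary (for example 10) falls into the next, smaller size, and, like the comment above, that only values outside 0 to 100 are errors. The name file_size_for_percentile is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# (exclusive upper bound of the slice's lower percentile, file size in bytes)
FILE_SIZES = [
    (10, 2 * MIB),
    (20, 1 * MIB),
    (30, 512 * KIB),
    (50, 256 * KIB),
    (100, 128 * KIB),
]


def file_size_for_percentile(lower_pct: float) -> int:
    """Return the test-file size for a slice starting at lower_pct."""
    if not 0 <= lower_pct <= 100:
        raise ValueError("percentile out of range: %r" % (lower_pct,))
    for bound, size in FILE_SIZES:
        if lower_pct < bound:
            return size
    return FILE_SIZES[-1][1]  # lower_pct == 100
```

With the default scanner ranges, the slowest scanner (60th to 100th percentile) always downloads the 128 KiB file.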
+
+   The bandwidth scanners use the following fixed user-agent string for
+   their requests:
+
+      Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; \
+      .NET CLR 1.0.3705; .NET CLR 1.1.4322)
+
+   Unfinished downloads are aborted after 30 minutes.
+#
+# That's a pretty high timeout, right?  This can slow us down
+# significantly, given that downloads are not run in parallel for a given
+# bandwidth scanner.  A better timeout might be 10 or 15 minutes.  -KL
+#
+# There's a code line "if ret == 1 and build_exit:" with the else case
+# including build_exit in the log message.  What if ret == 0 and
+# build_exit is null?  -KL
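The fixed user agent and the 30-minute timeout can be illustrated with a hand-built HTTP request. This is a sketch under assumptions: the text does not say which HTTP version or headers the scanners send, so HTTP/1.1 with "Connection: close" is a choice made here, and build_get_request, download_timed_out, and the example host and path are hypothetical. The request would travel inside the TLS session to port 443.

```python
USER_AGENT = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
              ".NET CLR 1.0.3705; .NET CLR 1.1.4322)")
DOWNLOAD_TIMEOUT_SECONDS = 30 * 60  # unfinished downloads are aborted


def build_get_request(host: str, path: str) -> bytes:
    """HTTP GET request carrying the fixed user agent (sent inside TLS)."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "User-Agent: %s" % USER_AGENT,
        "Connection: close",
        "",
        "",  # the two empty entries produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def download_timed_out(started: float, now: float) -> bool:
    """True once a download has run for the full timeout."""
    return now - started >= DOWNLOAD_TIMEOUT_SECONDS
```

Lowering the timeout to 10 or 15 minutes, as suggested in the comment above, would only change DOWNLOAD_TIMEOUT_SECONDS in this sketch.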
+
+   For each download, the bandwidth scanners collect the following data:
+#
+# TODO Most of this happens in TorCtl-land that I'm even less familiar
+# with than with Torflow-land.  Mike, can you give me some pointers what
+# code parts to look at in order to understand which Tor controller events
+# are processed where and what we learn from them?  -KL
+
+1.6. Writing measurement results
+
+   Once a bandwidth scanner has completed a slice of relays, it writes the
+   measurement results to disk.
+
+   The output file contains information about the slice number, the
+   timestamp of completing the slice, and the measurement results for the
+   measured relays.
+
+   Only relays with at least 1 successful measurement, non-negative
+   filtered stream bandwidth, and non-negative stream bandwidth are
+   included in the output file.
+#
+# What's the difference between stream and filtered stream?  -KL
+
+   The filename of an output file is derived from the lower and upper
+   slice percentiles and the measurement completion time.  The format is
+
+      "bws-" lower percentile ":" upper percentile "-done-" timestamp
+
+   Both lower and upper percentiles are decimal numbers rounded to 1
+   decimal place.  The timestamp is formatted "YYYY-MM-DD-HH:MM:SS".
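The filename format above can be produced and parsed as follows. The "%.1f" conversion implements the rounding to one decimal place. The text does not say whether the timestamp is local time or UTC, so the sketch formats whatever datetime it is given. The function names are hypothetical.

```python
import datetime
import re

_TIME_FMT = "%Y-%m-%d-%H:%M:%S"
_NAME_RE = re.compile(
    r"^bws-(\d+\.\d):(\d+\.\d)-done-(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2})$")


def output_filename(lower: float, upper: float,
                    done: datetime.datetime) -> str:
    """bws-<lower>:<upper>-done-<timestamp>, percentiles to 1 decimal."""
    return "bws-%.1f:%.1f-done-%s" % (lower, upper, done.strftime(_TIME_FMT))


def parse_filename(name: str):
    """Return (lower, upper, completion time) from an output filename."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError("not a scanner output filename: %r" % (name,))
    return (float(m.group(1)), float(m.group(2)),
            datetime.datetime.strptime(m.group(3), _TIME_FMT))
```

For example, a slice from the 12th to the 14.37th percentile completed at 2011-04-14 13:00:42 is written as "bws-12.0:14.4-done-2011-04-14-13:00:42".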
+
+   The first line of an output file contains the slice number:
+
+      "slicenum=" slice number NL
+
+   The second line contains the UNIX timestamp when the output file was
+   written:
+
+      timestamp NL
+
+   Subsequent lines contain the measurement results of all relays in the
+   slice in arbitrary order.  There can be at most one such line per relay
+   identity:
+
+      "node_id=" fingerprint SP
+      "nick=" nickname SP
+      "strm_bw=" stream bandwidth SP
+      "filt_bw=" filtered stream bandwidth SP
+      "desc_bw=" descriptor bandwidth SP
+      "ns_bw=" network status bandwidth NL
+
+   The meaning of these fields is as follows: fingerprint is the
+   hex-encoded, upper-case relay identity fingerprint; nickname is the
+   relay's nickname; stream bandwidth and filtered stream bandwidth
+   contain the average measurements; descriptor bandwidth is the average
+   self-advertised bandwidth contained in relay descriptors; and network
+   status bandwidth is the average relay bandwidth contained in network
+   status consensuses.
+#
+# Which nickname is chosen here if a relay changes its nickname between
+# two measurements?  Does it matter?  -KL
+#
+# Starting to count slices at 0 whenever we start at the lower end of our
+# percentile range seems error-prone.  What if the number of slices
# changes while we're only halfway through all slices?  Isn't there a
# potential for overlooking results?  Or do we not care about the slice
+# number when aggregating results?  -KL
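The file layout above is simple enough to write and read back with a few lines of code. This sketch follows the field order and the "at most one line per relay identity" rule as stated. It assumes integer bandwidth values and tolerates a fractional UNIX timestamp when parsing, because the text does not pin down either number format. Splitting on spaces and the first "=" is safe because Tor nicknames contain only alphanumeric characters. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

FIELDS = ["node_id", "nick", "strm_bw", "filt_bw", "desc_bw", "ns_bw"]


def format_output(slicenum: int, timestamp: int,
                  results: List[Dict[str, object]]) -> str:
    """Render an output file: slice number, timestamp, one line per relay."""
    lines = ["slicenum=%d" % slicenum, "%d" % timestamp]
    for r in results:
        lines.append(" ".join("%s=%s" % (k, r[k]) for k in FIELDS))
    return "\n".join(lines) + "\n"


def parse_output(text: str) -> Tuple[int, int, Dict[str, Dict[str, str]]]:
    """Inverse of format_output; relay values are returned as strings."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("slicenum="):
        raise ValueError("missing slicenum line")
    slicenum = int(lines[0][len("slicenum="):])
    timestamp = int(float(lines[1]))  # tolerate a fractional timestamp
    relays: Dict[str, Dict[str, str]] = {}
    for line in lines[2:]:
        if not line.strip():
            continue
        fields = dict(item.split("=", 1) for item in line.split())
        relays[fields["node_id"]] = fields  # at most one line per identity
    return slicenum, timestamp, relays
```

An aggregation step reading these files would key results by node_id, so a relay's nickname at measurement time is informational only.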
+
+2. Aggregating scanner results
+
+   Every few hours, the bandwidth scanner results are aggregated in order
+   to include them in the network status consensus process.  This
+   aggregation step looks at the finished measurements, ....
+
+2.1. Connecting to Tor client
+
+# The aggregate script connects to the same Tor client that bandwidth
+# scanners use and requests the currently valid network status consensus
+# from it.  Does that mean we won't have an opinion on relays that are
+# offline right now?  -KL
+
+2.2. [...]
+
+# BETA, GUARD_BETA, ALPHA, and GUARD_ALPHA are all set to 0 in the default
+# configuration.  Is the plan to change their values and use the more
+# complex aggregation mechanism anytime soon?  Or were they only in the
+# code to run experiments and should go away?  -KL
+

    

mikeperry＠torproject.org
