[or-cvs] [tor/master] Proposal: Including Network Statistics in Extra-Info Documents

Nick Mathewson nickm at seul.org
Thu Jul 23 14:59:45 UTC 2009


Author: Karsten Loesing <karsten.loesing at gmx.net>
Date: Thu, 23 Jul 2009 10:59:00 -0400
Subject: Proposal: Including Network Statistics in Extra-Info Documents
Commit: 884c621aa73edd9727cb97ef249cefdaf8ec4741

---
 doc/spec/proposals/000-index.txt                   |    2 +
 .../proposals/166-statistics-extra-info-docs.txt   |  391 ++++++++++++++++++++
 2 files changed, 393 insertions(+), 0 deletions(-)
 create mode 100644 doc/spec/proposals/166-statistics-extra-info-docs.txt

diff --git a/doc/spec/proposals/000-index.txt b/doc/spec/proposals/000-index.txt
index b26f382..af1f40b 100644
--- a/doc/spec/proposals/000-index.txt
+++ b/doc/spec/proposals/000-index.txt
@@ -86,6 +86,7 @@ Proposals by number:
 163  Detecting whether a connection comes from a client [OPEN]
 164  Reporting the status of server votes [OPEN]
 165  Easy migration for voting authority sets [OPEN]
+166  Including Network Statistics in Extra-Info Documents [OPEN]
 
 
 Proposals by status:
@@ -113,6 +114,7 @@ Proposals by status:
    163  Detecting whether a connection comes from a client [for 0.2.2]
    164  Reporting the status of server votes [for 0.2.2]
    165  Easy migration for voting authority sets
+   166  Including Network Statistics in Extra-Info Documents [for 0.2.2]
  ACCEPTED:
    110  Avoiding infinite length circuits [for 0.2.1.x] [in 0.2.1.3-alpha]
    117  IPv6 exits [for 0.2.1.x]
diff --git a/doc/spec/proposals/166-statistics-extra-info-docs.txt b/doc/spec/proposals/166-statistics-extra-info-docs.txt
new file mode 100644
index 0000000..3716c04
--- /dev/null
+++ b/doc/spec/proposals/166-statistics-extra-info-docs.txt
@@ -0,0 +1,391 @@
+Filename: 166-statistics-extra-info-docs.txt
+Title: Including Network Statistics in Extra-Info Documents
+Author: Karsten Loesing
+Created: 21-Jul-2009
+Target: 0.2.2
+Status: Open
+
+Change history:
+
+  21-Jul-2009  Initial proposal for or-dev
+
+
+Overview:
+
+  The Tor network has grown to almost two thousand relays and millions
+  of casual users over the past few years. With growth has come
+  increasing performance problems and attempts by some countries to
+  block access to the Tor network. In order to address these problems,
+  we need to learn more about the Tor network. This proposal suggests
+  measuring additional statistics and including them in extra-info
+  documents to help us understand the Tor network better.
+
+
+Introduction:
+
+  As of May 2009, relays, bridges, and directories gather the following
+  data for statistical purposes:
+
+  - Relays and bridges count the number of bytes that they have pushed
+    in 15-minute intervals over the past 24 hours. Relays and bridges
+    include these data in extra-info documents that they send to the
+    directory authorities whenever they publish their server descriptor.
+
+  - Bridges further include a rough number of clients per country that
+    they have seen in the past 48 hours in their extra-info documents.
+
+  - Directories can be configured to count the number of clients they
+    see per country in the past 24 hours and to write them to a local
+    file.
+
+  Since then, we have extended the network statistics in Tor. The
+  extensions include:
+
+  - Directories now gather more precise statistics about connecting
+    clients. Fixes include measuring in intervals of exactly 24 hours,
+    counting unsuccessful requests, measuring download times, etc. The
+    directories append their statistics to a local file every 24 hours.
+
+  - Entry guards count the number of clients per country per day like
+    bridges do and write them to a local file every 24 hours.
+
+  - Relays measure statistics of the number of cells in their circuit
+    queues and how much time these cells spend waiting there. Relays
+    write these statistics to a local file every 24 hours.
+
+  - Exit nodes count the number of read and written bytes on exit
+    connections per port as well as the number of opened exit streams
+    per port in 24-hour intervals. Exit nodes write their statistics to
+    a local file.
+
+  The following four sections contain descriptions for adding these
+  statistics to the relays' extra-info documents.
+
+
+Directory request statistics:
+
+  The first type of statistics aims at measuring directory requests sent
+  by clients to a directory mirror or directory authority. More
+  precisely, these statistics aim at requests for v2 and v3 network
+  statuses only. These directory requests are sent non-anonymously,
+  either via HTTP-like requests to a directory's Dir port or tunneled
+  over a 1-hop circuit.
+
+  Measuring directory request statistics is useful for several reasons:
+  First, the number of locally seen directory requests can be used to
+  estimate the total number of clients in the Tor network. Second, the
+  country-wise classification of requests using a GeoIP database can
+  help count the relative and absolute number of users per country.
+  Third, the download times can hint at the bandwidth capacity
+  available at clients.
+
+  Directory requests do not reveal anything about the contents that
+  clients send or receive over the Tor network. Every client requests
+  network statuses from the directories, so gathering these statistics
+  raises no anonymity-related concerns. It might be, though, that
+  clients wish to hide the fact that they are connecting to the Tor
+  network. Therefore, IP addresses are resolved to country codes in
+  memory, events are accumulated over 24 hours, and numbers are rounded
+  up to multiples of 4 or 8.
+
+   "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
+      [At most once.]
+
+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
+      interval of length NSEC seconds (86400 seconds by default).
+
+      A "dirreq-stats-end" line, as well as any other "dirreq-*" line,
+      is only added when the relay has opened its Dir port and after 24
+      hours of measuring directory requests.
+
+   "dirreq-v2-ips" CC=N,CC=N,... NL
+      [At most once.]
+   "dirreq-v3-ips" CC=N,CC=N,... NL
+      [At most once.]
+
+      List of mappings from two-letter country codes to the number of
+      unique IP addresses that have connected from that country to
+      request a v2/v3 network status, rounded up to the nearest multiple
+      of 8. Only those IP addresses are counted that the directory can
+      answer with a 200 OK status code.
+
+   "dirreq-v2-reqs" CC=N,CC=N,... NL
+      [At most once.]
+   "dirreq-v3-reqs" CC=N,CC=N,... NL
+      [At most once.]
+
+      List of mappings from two-letter country codes to the number of
+      requests for v2/v3 network statuses from that country, rounded up
+      to the nearest multiple of 8. Only those requests are counted that
+      the directory can answer with a 200 OK status code.
+
+   "dirreq-v2-share" num% NL
+      [At most once.]
+   "dirreq-v3-share" num% NL
+      [At most once.]
+
+      The share of v2/v3 network status requests that the directory
+      expects to receive from clients based on its advertised bandwidth
+      compared to the overall network bandwidth capacity. Shares are
+      formatted in percent with two decimal places. Shares are
+      calculated as means over the whole 24-hour interval.
+
+   "dirreq-v2-resp" status=num,... NL
+      [At most once.]
+   "dirreq-v3-resp" status=num,... NL
+      [At most once.]
+
+      List of mappings from response statuses to the number of requests
+      for v2/v3 network statuses that were answered with that response
+      status, rounded up to the nearest multiple of 4. Only response
+      statuses with at least 1 response are reported. New response
+      statuses can be added at any time. The current list of response
+      statuses is as follows:
+
+      "ok": a network status request is answered; this number
+         corresponds to the sum of all requests as reported in
+         "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
+         rounding up.
+      "not-enough-sigs": a version 3 network status is not signed by a
+         sufficient number of requested authorities.
+      "unavailable": a requested network status object is unavailable.
+      "not-found": a requested network status is not found.
+      "not-modified": a network status has not been modified since the
+         If-Modified-Since time that is included in the request.
+      "busy": the directory is busy.
+
+   "dirreq-v2-direct-dl" key=val,... NL
+      [At most once.]
+   "dirreq-v3-direct-dl" key=val,... NL
+      [At most once.]
+   "dirreq-v2-tunneled-dl" key=val,... NL
+      [At most once.]
+   "dirreq-v3-tunneled-dl" key=val,... NL
+      [At most once.]
+
+      List of statistics about possible failures in the download process
+      of v2/v3 network statuses. Requests are either "direct"
+      HTTP-encoded requests over the relay's directory port, or
+      "tunneled" requests using a BEGIN_DIR cell over the relay's OR
+      port. The list of possible statistics can change, and statistics
+      can be left out from reporting. The current list of statistics is
+      as follows:
+
+      Successful downloads and failures:
+
+      "complete": a client has finished the download successfully.
+      "timeout": a download did not finish within 10 minutes after
+         starting to send the response.
+      "running": a download is still running at the end of the
+         measurement period, less than 10 minutes after starting to
+         send the response.
+
+      Download times:
+
+      "min", "max": smallest and largest measured bandwidth in B/s.
+      "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
+         bandwidth in B/s. For a given decile i, i/10 of all downloads
+         had a smaller bandwidth than di, and (10-i)/10 of all downloads
+         had a larger bandwidth than di.
+      "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
+         fourth of all downloads had a smaller bandwidth than q1, one
+         fourth of all downloads had a larger bandwidth than q3, and the
+         remaining half of all downloads had a bandwidth between q1 and
+         q3.
+      "md": median of measured bandwidth in B/s. Half of the downloads
+         had a smaller bandwidth than md, the other half had a larger
+         bandwidth than md.
+
+
+Entry guard statistics:
+
+  Entry guard statistics include the number of clients per country and
+  per day that are connecting directly to an entry guard.
+
+  Entry guard statistics are important to learn more about the
+  distribution of clients to countries. In the future, this knowledge
+  can be useful to detect if there are or start to be any restrictions
+  for clients connecting from specific countries.
+
+  The information about which clients connect to a given entry guard is
+  very sensitive. It must not be combined with information about what
+  contents are leaving the network at the exit nodes. Therefore,
+  entry guard statistics need to be aggregated to prevent them from
+  becoming useful for de-anonymization. Aggregation includes resolving
+  IP addresses to country codes, counting events over 24-hour intervals,
+  and rounding up numbers to the next multiple of 8.
+
+   "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
+      [At most once.]
+
+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
+      interval of length NSEC seconds (86400 seconds by default).
+
+      An "entry-stats-end" line, as well as any other "entry-*"
+      line, is first added after the relay has been running for at least
+      24 hours.
+
+   "entry-ips" CC=N,CC=N,... NL
+      [At most once.]
+
+      List of mappings from two-letter country codes to the number of
+      unique IP addresses that have connected from that country to the
+      relay and that are not known to be other relays, rounded up to the
+      nearest multiple of 8.
+
+
+Cell statistics:
+
+  The third type of statistics concerns the time that cells spend in
+  circuit queues. In order to gather these statistics, the relay
+  memorizes when it puts a given cell in a circuit queue and when this
+  cell is flushed. The relay further notes the lifetime of the circuit.
+  These data are sufficient to determine the mean number of cells in a
+  queue over time and the mean time that cells spend in a queue.
+
+  Cell statistics are necessary to learn more about possible reasons for
+  the poor performance of the Tor network, especially high latencies.
+  The same statistics are also useful to determine the effects of
+  design changes by comparing today's data with future data.
+
+  Measuring cell statistics raises basically no privacy concerns,
+  regardless of whether a node is an entry, middle, or exit node.
+
+   "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
+      [At most once.]
+
+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
+      interval of length NSEC seconds (86400 seconds by default).
+
+      A "cell-stats-end" line, as well as any other "cell-*" line,
+      is first added after the relay has been running for at least 24
+      hours.
+
+   "cell-processed-cells" num,...,num NL
+      [At most once.]
+
+      Mean number of processed cells per circuit, subdivided into
+      deciles of circuits by the number of cells they have processed in
+      descending order from loudest to quietest circuits.
+
+   "cell-queued-cells" num,...,num NL
+      [At most once.]
+
+      Mean number of cells contained in queues by circuit decile. These
+      means are calculated by 1) determining the mean number of cells in
+      a single circuit between its creation and its termination and 2)
+      calculating the mean for all circuits in a given decile as
+      determined in "cell-processed-cells". Numbers have a precision of
+      two decimal places.
+
+   "cell-time-in-queue" num,...,num NL
+      [At most once.]
+
+      Mean time cells spend in circuit queues in milliseconds. Times are
+      calculated by 1) determining the mean time cells spend in the
+      queue of a single circuit and 2) calculating the mean for all
+      circuits in a given decile as determined in
+      "cell-processed-cells".
+
+   "cell-circuits-per-decile" num NL
+      [At most once.]
+
+      Mean number of circuits that are included in any of the deciles,
+      rounded up to the next integer.
+
+
+Exit statistics:
+
+  The last type of statistics has exit nodes count the number of bytes
+  written and read and the number of streams opened per port and per 24
+  hours. Exit port statistics can be measured by looking at the headers
+  of BEGIN and DATA cells. A BEGIN cell contains the exit port that is
+  required for the exit node to open a new exit stream. Subsequent DATA
+  cells coming from the client or being sent back to the client contain
+  a length field stating how many bytes of application data are
+  contained in the cell.
+
+  Exit port statistics are important to measure in order to identify
+  possible load-balancing problems with respect to exit policies. Exit
+  nodes that permit more ports than others are very likely overloaded
+  with traffic for those ports plus traffic for other ports. Improving
+  load balancing in the Tor network improves the overall utilization of
+  bandwidth capacity.
+
+  Exit traffic is one of the most sensitive parts of network data in the
+  Tor network. Even though these statistics do not require looking at
+  traffic contents, statistics are aggregated so that they are not
+  useful for de-anonymizing users. Only those ports are reported that
+  have seen at least 0.1% of exiting or incoming bytes, numbers of bytes
+  are rounded up to full kibibytes (KiB), and stream numbers are rounded
+  up to the next multiple of 4.
+
+   "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
+      [At most once.]
+
+      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
+      interval of length NSEC seconds (86400 seconds by default).
+
+      An "exit-stats-end" line, as well as any other "exit-*" line, is
+      first added after the relay has been running for at least 24 hours
+      and only if the relay permits exiting (where exiting to a single
+      port and IP address is sufficient).
+
+   "exit-kibibytes-written" port=N,port=N,... NL
+      [At most once.]
+   "exit-kibibytes-read" port=N,port=N,... NL
+      [At most once.]
+
+      List of mappings from ports to the number of kibibytes that the
+      relay has written to or read from exit connections to that port,
+      rounded up to the next full kibibyte.
+
+   "exit-streams-opened" port=N,port=N,... NL
+      [At most once.]
+
+      List of mappings from ports to the number of opened exit streams
+      to that port, rounded up to the nearest multiple of 4.
+
+
+Implementation notes:
+
+  Right now, relays that are configured accordingly write similar
+  statistics to those described in this proposal to disk every 24 hours.
+  With this proposal being implemented, relays include the contents of
+  these files in extra-info documents.
+
+  The following steps are necessary to implement this proposal:
+
+  1. The current format of [dirreq|entry|buffer|exit]-stats files needs
+     to be adapted to the description in this proposal. This step
+     basically means renaming keywords.
+
+  2. The timing of writing the four *-stats files should be unified, so
+     that they are written exactly 24 hours after starting the relay.
+     Right now, the measurement intervals for dirreq, entry, and exit
+     stats start with the first observed request, and files are
+     written when observing the first request that occurs more than 24
+     hours after the beginning of the measurement interval. With this
+     proposal, the measurement intervals should all start at the same
+     time, and files should be written exactly 24 hours later.
+
+  3. It is advantageous to cache statistics in local files in the data
+     directory until they are included in extra-info documents. The
+     reason is that the 24-hour measurement interval can be very
+     different from the 18-hour publication interval of extra-info
+     documents. If a relay crashed after finishing a measurement
+     interval, but before publishing the next extra-info document,
+     statistics would be lost. Therefore, statistics are written to
+     disk when finishing a measurement interval and read from disk when
+     generating an extra-info document. As a result, the *-stats files
+     need to be overwritten after 24 hours, rather than appending new
+     statistics to them. Further, the contents of the *-stats files need
+     to be checked in the process of generating extra-info documents.
+
+  4. With the statistics patches being tested, the ./configure options
+     should be removed and the statistics code be compiled by default.
+     It is still required for relay operators to add configuration
+     options (DirReqStatistics, ExitPortStatistics, etc.) to enable
+     gathering statistics. However, in the near future, statistics shall
+     be gathered by all relays by default, where requiring a
+     ./configure option would be a barrier for many relay operators.
-- 
1.5.6.5
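
Note on the rounding rules in the proposal above: counts in lines such
as "dirreq-v3-ips" and "entry-ips" are rounded up to a multiple of 8
before they are reported. The following Python sketch illustrates that
aggregation step. It is not Tor's implementation (Tor is written in C);
the names round_up and format_country_line are hypothetical, and
dropping zero counts and ordering entries by descending count are
assumptions that the proposal does not specify.

```python
def round_up(n, multiple):
    """Round a non-negative integer up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple  # ceiling division, then scale back


def format_country_line(keyword, counts, multiple=8):
    """Build a 'keyword CC=N,CC=N,...' line from raw per-country counts."""
    # Assumption: countries with no observations are left out.
    rounded = {cc: round_up(n, multiple) for cc, n in counts.items() if n > 0}
    # Assumption: order by descending count, then by country code.
    ordered = sorted(rounded.items(), key=lambda kv: (-kv[1], kv[0]))
    return keyword + " " + ",".join(f"{cc}={n}" for cc, n in ordered)


line = format_country_line("dirreq-v3-ips", {"de": 13, "us": 41, "cn": 8, "fr": 0})
print(line)  # dirreq-v3-ips us=48,de=16,cn=8
```

Passing multiple=4 gives the rounding used for "dirreq-v3-resp" and
"exit-streams-opened".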

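Note on the "*-stats-end" lines in the proposal above: each one gives
the end of a measurement interval and its length in seconds. A minimal
Python sketch of how a consumer of extra-info documents might recover
the interval is shown below. The function name parse_stats_end and the
regular expression are illustrative assumptions, and the timestamp in
the example is made up.

```python
import re
from datetime import datetime, timedelta

# Matches e.g.: exit-stats-end 2009-07-23 14:59:45 (86400 s)
STATS_END = re.compile(
    r"^(?P<keyword>[a-z]+-stats-end) "
    r"(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<nsec>\d+) s\)$"
)


def parse_stats_end(line):
    """Return (keyword, interval_start, interval_end) for a stats-end line."""
    m = STATS_END.match(line.strip())
    if m is None:
        raise ValueError("not a stats-end line: %r" % line)
    end = datetime.strptime(m.group("end"), "%Y-%m-%d %H:%M:%S")
    length = timedelta(seconds=int(m.group("nsec")))
    return m.group("keyword"), end - length, end


keyword, start, end = parse_stats_end("exit-stats-end 2009-07-23 14:59:45 (86400 s)")
print(keyword, start, end)
```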

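Note on the cell statistics in the proposal above: circuits are sorted
from loudest to quietest by the number of processed cells, split into
deciles, and a mean is reported per decile. The Python sketch below
shows one way to do this. How circuits are assigned to deciles when
their number is not divisible by 10 is not specified in the proposal;
the contiguous split used here is an assumption, as are all function
names.

```python
import math


def split_into_deciles(items):
    """Split a list into 10 contiguous groups of nearly equal size."""
    n = len(items)
    bounds = [i * n // 10 for i in range(11)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(10)]


def cell_processed_cells(processed_per_circuit):
    """Mean number of processed cells per decile, loudest circuits first."""
    ordered = sorted(processed_per_circuit, reverse=True)
    # Assumption: an empty decile (fewer than 10 circuits) reports 0.0.
    return [sum(g) / len(g) if g else 0.0 for g in split_into_deciles(ordered)]


def circuits_per_decile(num_circuits):
    """Mean number of circuits per decile, rounded up to the next integer."""
    return math.ceil(num_circuits / 10)


counts = list(range(1, 21))  # 20 circuits that processed 1..20 cells
means = cell_processed_cells(counts)
print(means, circuits_per_decile(len(counts)))
```

The same grouping would be reused for "cell-queued-cells" and
"cell-time-in-queue", with the per-circuit means taking the place of
the raw cell counts.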