[tor-commits] [metrics-web/master] Move contents from Statistics page to text file.

karsten at torproject.org karsten at torproject.org
Thu Jun 26 14:48:14 UTC 2014


commit 84336ccac18d3a6347d5768e067ca5d9719d917e
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Thu Jun 26 16:06:18 2014 +0200

    Move contents from Statistics page to text file.
    
    The Statistics page is more like a spec, and it's likely only interesting
    for 5% of visitors.  Let's not overwhelm the remaining 95% with something
    they don't care about.
---
 doc/stats-spec.txt             |  264 ++++++++++++++++++++++++++++++++++++++++
 website/web/WEB-INF/banner.jsp |    3 -
 website/web/WEB-INF/error.jsp  |    1 -
 website/web/WEB-INF/stats.jsp  |    6 +
 4 files changed, 270 insertions(+), 4 deletions(-)

diff --git a/doc/stats-spec.txt b/doc/stats-spec.txt
new file mode 100644
index 0000000..a0c45c3
--- /dev/null
+++ b/doc/stats-spec.txt
@@ -0,0 +1,264 @@
+Statistics produced by Tor Metrics
+==================================
+
+Tor Metrics aggregates large amounts of Tor network data and visualizes
+results in customizable graphs and tables.  All aggregated data are also
+available for download, so that people can easily plot their own graphs or
+even develop a prettier metrics website without writing their own data
+aggregation code.  Data formats of aggregate statistics are specified
+below.
+
+Statistics files are available for download at:
+
+  https://metrics.torproject.org/stats/
+
+
+Number of relays and bridges
+----------------------------
+
+Statistics file servers.csv contains the average number of relays and
+bridges in the Tor network.  All averages are calculated per day by
+evaluating the relay and bridge lists published by the directory
+authorities.  Statistics include subsets of relays or bridges by relay
+flag (only relays), country code (only relays, only until February 2013),
+Tor software version (only relays), operating system (only relays), and
+EC2 cloud (only bridges).  The statistics file contains the following
+columns:
+
+ - date: UTC date (YYYY-MM-DD) when relays or bridges have been listed as
+   running.
+
+ - flag: Relay flag assigned by the directory authorities.  Examples are
+   "Exit", "Guard", "Fast", "Stable", and "HSDir".  Relays can have none,
+   some, or all these relay flags assigned.  Relays that don't have the
+   "Running" flag are not included in these statistics regardless of their
+   other flags.  If this column contains the empty string, all running
+   relays are included, regardless of assigned flags.  There are no
+   statistics on the number of bridges by relay flag.
+
+ - country: Two-letter lower-case country code as found in a GeoIP
+   database by resolving the relay's first onion-routing IP address, or
+   "??" if an IP address could not be resolved.  If this column contains
+   the empty string, all running relays are included, regardless of their
+   resolved country code.  Statistics on relays by country code are only
+   available until January 31, 2013.  There are no statistics on the
+   number of bridges by country code.
+
+ - version: First three dotted numbers of the Tor software version as
+   reported by the relay.  An example is "0.2.5".  If this column contains
+   the empty string, all running relays are included, regardless of the
+   Tor software version they run.  There are no statistics on the number
+   of bridges by Tor software version.
+
+ - platform: Operating system as reported by the relay.  Examples are
+   "Linux", "Darwin" (Mac OS X), "FreeBSD", "Windows", and "Other".  If
+   this column contains the empty string, all running relays are included,
+   regardless of the operating system they run on.  There are no
+   statistics on the number of bridges by operating system.
+
+ - ec2bridge: Whether bridges are running in the EC2 cloud or not.  More
+   precisely, bridges in the EC2 cloud running an image provided by Tor by
+   default set their nickname to "ec2bridger" plus 8 random hex
+   characters.  This column either contains "t" for bridges matching this
+   naming scheme, or the empty string for all bridges regardless of their
+   nickname.  There are no statistics on the number of relays running in
+   the EC2 cloud.
+
+ - relays: The average number of relays matching the criteria in the
+   previous columns.  If the values in previous columns are specific to
+   bridges only, this column contains the empty string.
+
+ - bridges: The average number of bridges matching the criteria in the
+   previous columns.  If the values in previous columns are specific to
+   relays only, this column contains the empty string.
+
+
+Bandwidth provided and consumed by relays
+-----------------------------------------
+
+Statistics on bandwidth provided and consumed by relays are contained in
+file bandwidth.csv.  This file contains three different bandwidth metrics:
+(1) bandwidth that relays are capable of providing, (2) bandwidth that
+relays report having consumed for any kind of traffic, and (3) bandwidth
+that relays report having consumed for serving directory data.  Relays are
+categorized by having only the "Exit" flag, only the "Guard" flag, both
+flags, or neither.  The statistics file contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) that relays reported bandwidth data for.
+
+ - isexit: Whether relays included in this line have the "Exit" relay flag
+   or not, which can be "t" or "f".  If this column contains the empty
+   string, bandwidth data from all running relays are included, regardless
+   of assigned relay flags.
+
+ - isguard: Whether relays included in this line have the "Guard" relay
+   flag or not, which can be "t" or "f".  If this column contains the
+   empty string, bandwidth data from all running relays are included,
+   regardless of assigned relay flags.
+
+ - advbw: Total advertised bandwidth in bytes per second that relays are
+   capable of providing.
+
+ - bwread: Total bandwidth in bytes per second that relays have read.
+   This metric includes any kind of traffic.
+
+ - bwwrite: Similar to bwread, but for traffic written by relays.
+
+ - dirread: Bandwidth in bytes per second that relays have read when
+   serving directory data.  Not all relays report how many bytes they read
+   when serving directory data, so this value is an estimate from the
+   available data.  This metric is not available for subsets of relays
+   with certain relay flags, so this column contains the empty string if
+   either isexit or isguard is non-empty.
+
+ - dirwrite: Similar to dirread, but for traffic written by relays when
+   serving directory data.
+
+
+Advertised bandwidth distribution and n-th fastest relays
+---------------------------------------------------------
+
+Statistics file advbwdist.csv contains statistics on the advertised
+bandwidth of relays in the network.  These statistics include advertised
+bandwidth percentiles and advertised bandwidth values of the n-th fastest
+relays.  The statistics file contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) when relays have been listed as running.
+
+ - isexit: Whether relays included in this line have the "Exit" relay
+   flag, which would be indicated as "t".  If this column contains the
+   empty string, advertised bandwidths from all running relays are
+   included, regardless of assigned relay flags.
+
+ - relay: Position of the relay in an ordered list of all advertised
+   bandwidths, starting at 1 for the fastest relay in the network.  May be
+   the empty string if this line contains advertised bandwidth by
+   percentile.
+
+ - percentile: Advertised bandwidth percentile given in this line.  May be
+   the empty string if this line contains advertised bandwidth by fastest
+   relays.
+
+ - advbw: Advertised bandwidth in B/s.
+
+
+Estimated number of clients in the Tor network
+----------------------------------------------
+
+Statistics file clients.csv contains estimates on the number of clients in
+the Tor network.  These estimates are based on the number of directory
+requests counted on directory mirrors and bridges.  Statistics are
+available for clients connecting directly to the Tor network and clients
+connecting via bridges.  For relays, there exist statistics on the number
+of clients by country, and for bridges, statistics are available by
+country, by transport, and by IP version.  Statistics further include
+expected client numbers from past observations which can be used to detect
+censorship or release of censorship.  The statistics file contains the
+following columns:
+
+ - date: UTC date (YYYY-MM-DD) for which client numbers are estimated.
+
+ - node: The node type to which clients connect first, which can be either
+   "relay" or "bridge".
+
+ - country: Two-letter lower-case country code as found in a GeoIP
+   database by resolving clients' IP addresses, or "??" if client IP
+   addresses could not be resolved.  If this column contains the empty
+   string, all clients are included, regardless of their country code.
+
+ - transport: Transport name used by clients to connect to the Tor network
+   using bridges.  Examples are "obfs2", "obfs3", "websocket", or "<OR>"
+   (original onion routing protocol).  If this column contains the empty
+   string, all clients are included, regardless of their transport.  There
+   are no statistics on the number of clients by transport that connect to
+   the Tor network via relays.
+
+ - version: IP version used by clients to connect to the Tor network using
+   bridges.  Examples are "v4" and "v6".  If this column contains the
+   empty string, all clients are included, regardless of their IP version.
+   There are no statistics on the number of clients by IP version that
+   connect directly to the Tor network using relays.
+
+ - lower: Lower number of expected clients under the assumption that there
+   has been no censorship event.  If this column contains the empty
+   string, there are no expectations on the number of clients.
+
+ - upper: Upper number of expected clients under the assumption that there
+   has been no release of censorship.  If this column contains the empty
+   string, there are no expectations on the number of clients.
+
+ - clients: Estimated number of clients.
+
+ - frac: Fraction of relays or bridges in percent that the estimate is
+   based on.  The higher this value, the more reliable the estimate.
+   Values above 50 can be considered reliable enough for most purposes;
+   lower values should be handled with more care.
+
+
+Performance of downloading static files over Tor
+------------------------------------------------
+
+Statistics file torperf.csv contains aggregate statistics on download
+performance over time.  These statistics come from the Torperf service
+that periodically downloads static files over Tor.  The statistics file
+contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) when download performance was measured.
+
+ - size: Size of the downloaded file in bytes.
+
+ - source: Name of the Torperf service performing measurements.  If this
+   column contains the empty string, all measurements are included,
+   regardless of which Torperf service performed them.  Examples are
+   "moria", "siv", and "torperf".
+
+ - q1: First quartile of time until receiving the last byte in
+   milliseconds.
+
+ - md: Median of time until receiving the last byte in milliseconds.
+
+ - q3: Third quartile of time until receiving the last byte in
+   milliseconds.
+
+ - timeouts: Number of timeouts that occurred when attempting to download
+   the static file over Tor.
+
+ - failures: Number of failures that occurred when attempting to download
+   the static file over Tor.
+
+ - requests: Total number of requests made to download the static file
+   over Tor.
+
+
+Fraction of connections used uni-/bidirectionally
+-------------------------------------------------
+
+Statistics file connbidirect.csv contains statistics on the fraction of
+connections that are used uni- or bidirectionally.  Every 10 seconds,
+relays determine for every connection whether they read and wrote less
+than a threshold of 20 KiB, and count such connections as "below."  For
+the remaining connections, relays determine whether they read/wrote at
+least 10 times as many bytes as they wrote/read.  If so, they classify a
+connection as "mostly reading" or "mostly writing," respectively.  All
+other connections are classified as "both reading and writing."  After
+classifying connections, read and write counters are reset for the next
+10-second interval.  Statistics are aggregated over 24 hours.  The
+statistics file contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) for which statistics on uni-/bidirectional
+   connection usage were reported.
+
+ - source: Fingerprint of the relay reporting statistics.
+
+ - below: Number of 10-second intervals of connections with less than
+   20 KiB read and written data.
+
+ - read: Number of 10-second intervals of connections with at least 10
+   times as many read bytes as written bytes.
+
+ - write: Number of 10-second intervals of connections with at least 10
+   times as many written bytes as read bytes.
+
+ - both: Number of 10-second intervals of connections with less than
+   10 times as many written or read bytes as in the other direction.
+
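
Editorial sketch, not part of the commit: the per-interval classification described in the connbidirect.csv section above could look roughly like the following. The function and constant names are hypothetical. The sketch also assumes that the 20 KiB threshold applies to read and written bytes combined, which the spec text does not state explicitly.

```python
# Hypothetical illustration of the connbidirect classification rules.
KIB = 1024
THRESHOLD_BYTES = 20 * KIB  # "threshold of 20 KiB" from the spec text
FACTOR = 10                 # "at least 10 times as many bytes"


def classify_connection(bytes_read, bytes_written):
    """Return the connbidirect.csv column one 10-second interval counts toward."""
    # Assumption: the threshold is compared against read + written bytes.
    if bytes_read + bytes_written < THRESHOLD_BYTES:
        return "below"
    # Mostly reading: at least FACTOR times as many read as written bytes.
    if bytes_read >= FACTOR * bytes_written:
        return "read"
    # Mostly writing: at least FACTOR times as many written as read bytes.
    if bytes_written >= FACTOR * bytes_read:
        return "write"
    # Everything else is both reading and writing.
    return "both"
```

A relay would apply such a rule to every connection once per 10-second interval and then reset its counters; the file reports the resulting counts per relay and day.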
diff --git a/website/web/WEB-INF/banner.jsp b/website/web/WEB-INF/banner.jsp
index 2b27632..3a3cf5d 100644
--- a/website/web/WEB-INF/banner.jsp
+++ b/website/web/WEB-INF/banner.jsp
@@ -20,9 +20,6 @@
     <a <% if (currentPage.endsWith("performance.jsp")) {
         %>class="current"<%} else {%>href="/performance.html"<%}
         %>>Performance</a>
-    <a <% if (currentPage.endsWith("stats.jsp")) {
-        %>class="current"<%} else {%>href="/stats.html"<%}
-        %>>Statistics</a>
   </td>
   <td class="banner-right"></td>
 </tr>
diff --git a/website/web/WEB-INF/error.jsp b/website/web/WEB-INF/error.jsp
index e6f1e71..bd6d442 100644
--- a/website/web/WEB-INF/error.jsp
+++ b/website/web/WEB-INF/error.jsp
@@ -46,7 +46,6 @@ Maybe you find what you're looking for on our sitemap:
 <li><a href="bubbles.html">Diversity</a></li>
 <li><a href="users.html">Users</a></li>
 <li><a href="performance.html">Performance</a></li>
-<li><a href="stats.html">Statistics</a></li>
 </ul>
 </p>
 
diff --git a/website/web/WEB-INF/stats.jsp b/website/web/WEB-INF/stats.jsp
index d708910..005235e 100644
--- a/website/web/WEB-INF/stats.jsp
+++ b/website/web/WEB-INF/stats.jsp
@@ -13,6 +13,12 @@
 <h2>Tor Metrics: Statistics</h2>
 <br>
 
+<p><font color="red"><b>Notice:</b> The specification on this page has
+moved
+<a href="https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/doc/stats-spec.txt">here</a>.
+This page will be removed after July 26, 2014.</font>
+</p>
+
 <p>Tor Metrics aggregates large amounts of Tor network
 <a href="data.html">data</a> and visualizes results in customizable
 <a href="graphs.html">graphs</a> and tables.
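
Editorial sketch, not part of the commit: one way to consume clients.csv as specified in doc/stats-spec.txt above. It assumes the file starts with a header row naming the columns in the order the spec lists them. The sample rows are made-up values for illustration only, and `reliable_totals` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows, for illustration only (not real Tor Metrics data).
SAMPLE = """date,node,country,transport,version,lower,upper,clients,frac
2014-06-01,relay,,,,,,1000,80
2014-06-01,relay,de,,,900,1100,200,80
2014-06-01,bridge,,,,,,50,40
"""


def reliable_totals(csv_text, node, min_frac=50):
    """Return (date, clients) pairs for all clients of one node type.

    Rows are kept only if country, transport, and version are empty
    (meaning "all clients") and frac is above min_frac.
    """
    totals = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["node"] != node:
            continue
        # An empty string means the row is not restricted to a subset.
        if row["country"] or row["transport"] or row["version"]:
            continue
        # The spec treats values above 50 as reliable enough for most purposes.
        if float(row["frac"]) <= min_frac:
            continue
        totals.append((row["date"], int(row["clients"])))
    return totals
```

The same empty-string convention applies to the other statistics files, so a similar filter could select the unrestricted rows in servers.csv or bandwidth.csv.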