commit 84336ccac18d3a6347d5768e067ca5d9719d917e
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Thu Jun 26 16:06:18 2014 +0200

Move contents from Statistics page to text file.
The Statistics page is more like a spec, and it's likely only interesting
for 5% of visitors. Let's not overwhelm the remaining 95% with something
they don't care about.
---
 doc/stats-spec.txt             |  264 ++++++++++++++++++++++++++++++++++++++++
 website/web/WEB-INF/banner.jsp |    3 -
 website/web/WEB-INF/error.jsp  |    1 -
 website/web/WEB-INF/stats.jsp  |    6 +
 4 files changed, 270 insertions(+), 4 deletions(-)

diff --git a/doc/stats-spec.txt b/doc/stats-spec.txt
new file mode 100644
index 0000000..a0c45c3
--- /dev/null
+++ b/doc/stats-spec.txt
@@ -0,0 +1,264 @@
+Statistics produced by Tor Metrics
+==================================
+
+Tor Metrics aggregates large amounts of Tor network data and visualizes
+results in customizable graphs and tables. All aggregated data are also
+available for download, so that people can easily plot their own graphs or
+even develop a prettier metrics website without writing their own data
+aggregation code. Data formats of aggregate statistics are specified
+below.
+
+Statistics files are available for download at:
+
+  https://metrics.torproject.org/stats/
+
+
+Number of relays and bridges
+----------------------------
+
+Statistics file servers.csv contains the average number of relays and
+bridges in the Tor network. All averages are calculated per day by
+evaluating the relay and bridge lists published by the directory
+authorities. Statistics include subsets of relays or bridges by relay
+flag (only relays), country code (only relays, only until February 2013),
+Tor software version (only relays), operating system (only relays), and
+EC2 cloud (only bridges). The statistics file contains the following
+columns:
+
+  - date: UTC date (YYYY-MM-DD) when relays or bridges have been listed as
+    running.
+
+  - flag: Relay flag assigned by the directory authorities. Examples are
+    "Exit", "Guard", "Fast", "Stable", and "HSDir". Relays can have none,
+    some, or all these relay flags assigned. Relays that don't have the
+    "Running" flag are not included in these statistics regardless of their
+    other flags. If this column contains the empty string, all running
+    relays are included, regardless of assigned flags. There are no
+    statistics on the number of bridges by relay flag.
+
+  - country: Two-letter lower-case country code as found in a GeoIP
+    database by resolving the relay's first onion-routing IP address, or
+    "??" if an IP address could not be resolved. If this column contains
+    the empty string, all running relays are included, regardless of their
+    resolved country code. Statistics on relays by country code are only
+    available until January 31, 2013. There are no statistics on the
+    number of bridges by country code.
+
+  - version: First three dotted numbers of the Tor software version as
+    reported by the relay. An example is "0.2.5". If this column contains
+    the empty string, all running relays are included, regardless of the
+    Tor software version they run. There are no statistics on the number
+    of bridges by Tor software version.
+
+  - platform: Operating system as reported by the relay. Examples are
+    "Linux", "Darwin" (Mac OS X), "FreeBSD", "Windows", and "Other". If
+    this column contains the empty string, all running relays are included,
+    regardless of the operating system they run on. There are no
+    statistics on the number of bridges by operating system.
+
+  - ec2bridge: Whether bridges are running in the EC2 cloud or not. More
+    precisely, bridges in the EC2 cloud running an image provided by Tor by
+    default set their nickname to "ec2bridger" plus 8 random hex
+    characters. This column either contains "t" for bridges matching this
+    naming scheme, or the empty string for all bridges regardless of their
+    nickname. There are no statistics on the number of relays running in
+    the EC2 cloud.
+
+  - relays: The average number of relays matching the criteria in the
+    previous columns. If the values in previous columns are specific to
+    bridges only, this column contains the empty string.
+
+  - bridges: The average number of bridges matching the criteria in the
+    previous columns. If the values in previous columns are specific to
+    relays only, this column contains the empty string.
+
+
+Bandwidth provided and consumed by relays
+-----------------------------------------
+
+Statistics on bandwidth provided and consumed by relays are contained in
+file bandwidth.csv. This file contains three different bandwidth metrics:
+(1) bandwidth that relays are capable of providing and bandwidth that
+relays report to have consumed, either (2) for any traffic, or (3) only
+traffic from serving directory data. Relays providing bandwidth statistics
+are categorized by having the "Exit" or "Guard" relay flag, having both, or
+not having either. The statistics file contains the following columns:
+
+  - date: UTC date (YYYY-MM-DD) that relays reported bandwidth data for.
+
+  - isexit: Whether relays included in this line have the "Exit" relay flag
+    or not, which can be "t" or "f". If this column contains the empty
+    string, bandwidth data from all running relays are included, regardless
+    of assigned relay flags.
+
+  - isguard: Whether relays included in this line have the "Guard" relay
+    flag or not, which can be "t" or "f". If this column contains the
+    empty string, bandwidth data from all running relays are included,
+    regardless of assigned relay flags.
+
+  - advbw: Total advertised bandwidth in bytes per second that relays are
+    capable of providing.
+
+  - bwread: Total bandwidth in bytes per second that relays have read.
+    This metric includes any kind of traffic.
+
+  - bwwrite: Similar to bwread, but for traffic written by relays.
+
+  - dirread: Bandwidth in bytes per second that relays have read when
+    serving directory data. Not all relays report how many bytes they read
+    when serving directory data, which is why this value is an estimate from
+    the available data. This metric is not available for subsets of relays
+    with certain relay flags, so that this column will contain the empty
+    string if either isexit or isguard is non-empty.
+
+  - dirwrite: Similar to dirread, but for traffic written by relays when
+    serving directory data.
+
+
+Advertised bandwidth distribution and n-th fastest relays
+---------------------------------------------------------
+
+Statistics file advbwdist.csv contains statistics on the advertised
+bandwidth of relays in the network. These statistics include advertised
+bandwidth percentiles and advertised bandwidth values of the n-th fastest
+relays. The statistics file contains the following columns:
+
+  - date: UTC date (YYYY-MM-DD) when relays have been listed as running.
+
+  - isexit: Whether relays included in this line have the "Exit" relay
+    flag, which would be indicated as "t". If this column contains the
+    empty string, advertised bandwidths from all running relays are
+    included, regardless of assigned relay flags.
+
+  - relay: Position of the relay in an ordered list of all advertised
+    bandwidths, starting at 1 for the fastest relay in the network. May be
+    the empty string if this line contains advertised bandwidth by
+    percentile.
+
+  - percentile: Advertised bandwidth percentile given in this line. May be
+    the empty string if this line contains advertised bandwidth by fastest
+    relays.
+
+  - advbw: Advertised bandwidth in B/s.
+
+
+Estimated number of clients in the Tor network
+----------------------------------------------
+
+Statistics file clients.csv contains estimates on the number of clients in
+the Tor network. These estimates are based on the number of directory
+requests counted on directory mirrors and bridges. Statistics are
+available for clients connecting directly to the Tor network and clients
+connecting via bridges. For relays, there exist statistics on the number
+of clients by country, and for bridges, statistics are available by
+country, by transport, and by IP version. Statistics further include
+expected client numbers from past observations which can be used to detect
+censorship or release of censorship. The statistics file contains the
+following columns:
+
+  - date: UTC date (YYYY-MM-DD) for which client numbers are estimated.
+
+  - node: The node type to which clients connect first, which can be either
+    "relay" or "bridge".
+
+  - country: Two-letter lower-case country code as found in a GeoIP
+    database by resolving clients' IP addresses, or "??" if client IP
+    addresses could not be resolved. If this column contains the empty
+    string, all clients are included, regardless of their country code.
+
+  - transport: Transport name used by clients to connect to the Tor network
+    using bridges. Examples are "obfs2", "obfs3", "websocket", and "<OR>"
+    (original onion routing protocol). If this column contains the empty
+    string, all clients are included, regardless of their transport. There
+    are no statistics on the number of clients by transport that connect to
+    the Tor network via relays.
+
+  - version: IP version used by clients to connect to the Tor network using
+    bridges. Examples are "v4" and "v6". If this column contains the
+    empty string, all clients are included, regardless of their IP version.
+    There are no statistics on the number of clients by IP version that
+    connect directly to the Tor network using relays.
+
+  - lower: Lower number of expected clients under the assumption that there
+    has been no censorship event. If this column contains the empty
+    string, there are no expectations on the number of clients.
+
+  - upper: Upper number of expected clients under the assumption that there
+    has been no release of censorship. If this column contains the empty
+    string, there are no expectations on the number of clients.
+
+  - clients: Estimated number of clients.
+
+  - frac: Fraction of relays or bridges in percent that the estimate is
+    based on. The higher this value, the more reliable the estimate is.
+    Values above 50 can be considered reliable enough for most purposes;
+    lower values should be handled with more care.
+
+
+Performance of downloading static files over Tor
+------------------------------------------------
+
+Statistics file torperf.csv contains aggregate statistics on download
+performance over time. These statistics come from the Torperf service
+that periodically downloads static files over Tor. The statistics file
+contains the following columns:
+
+  - date: UTC date (YYYY-MM-DD) when download performance was measured.
+
+  - size: Size of the downloaded file in bytes.
+
+  - source: Name of the Torperf service performing measurements. If this
+    column contains the empty string, all measurements are included,
+    regardless of which Torperf service performed them. Examples are
+    "moria", "siv", and "torperf".
+
+  - q1: First quartile of time until receiving the last byte in
+    milliseconds.
+
+  - md: Median of time until receiving the last byte in milliseconds.
+
+  - q3: Third quartile of time until receiving the last byte in
+    milliseconds.
+
+  - timeouts: Number of timeouts that occurred when attempting to download
+    the static file over Tor.
+
+  - failures: Number of failures that occurred when attempting to download
+    the static file over Tor.
+
+  - requests: Total number of requests made to download the static file
+    over Tor.
+
+
+Fraction of connections used uni-/bidirectionally
+-------------------------------------------------
+
+Statistics file connbidirect.csv contains statistics on the fraction of
+connections that are used uni- or bidirectionally. Every 10 seconds,
+relays determine for every connection whether they read and wrote less
+than a threshold of 20 KiB. For the remaining connections, relays report
+whether they read/wrote at least 10 times as many bytes as they
+wrote/read. If so, they classify a connection as "mostly reading" or
+"mostly writing," respectively. All other connections are classified as
+"both reading and writing." After classifying connections, read and write
+counters are reset for the next 10-second interval. Statistics are
+aggregated over 24 hours. The statistics file contains the following
+columns:
+
+  - date: UTC date (YYYY-MM-DD) for which statistics on uni-/bidirectional
+    connection usage were reported.
+
+  - source: Fingerprint of the relay reporting statistics.
+
+  - below: Number of 10-second intervals of connections with less than
+    20 KiB read and written data.
+
+  - read: Number of 10-second intervals of connections with at least
+    10 times as many read bytes as written bytes.
+
+  - write: Number of 10-second intervals of connections with at least
+    10 times as many written bytes as read bytes.
+
+  - both: Number of 10-second intervals of connections with less than
+    10 times as many written or read bytes as in the other direction.
+
diff --git a/website/web/WEB-INF/banner.jsp b/website/web/WEB-INF/banner.jsp
index 2b27632..3a3cf5d 100644
--- a/website/web/WEB-INF/banner.jsp
+++ b/website/web/WEB-INF/banner.jsp
@@ -20,9 +20,6 @@
         <a <% if (currentPage.endsWith("performance.jsp")) {
           %>class="current"<%} else {%>href="/performance.html"<%}
           %>>Performance</a>
-        <a <% if (currentPage.endsWith("stats.jsp")) {
-          %>class="current"<%} else {%>href="/stats.html"<%}
-          %>>Statistics</a>
       </td>
       <td class="banner-right"></td>
     </tr>
diff --git a/website/web/WEB-INF/error.jsp b/website/web/WEB-INF/error.jsp
index e6f1e71..bd6d442 100644
--- a/website/web/WEB-INF/error.jsp
+++ b/website/web/WEB-INF/error.jsp
@@ -46,7 +46,6 @@ Maybe you find what you're looking for on our sitemap:
 <li><a href="bubbles.html">Diversity</a></li>
 <li><a href="users.html">Users</a></li>
 <li><a href="performance.html">Performance</a></li>
-<li><a href="stats.html">Statistics</a></li>
 </ul>
 </p>
diff --git a/website/web/WEB-INF/stats.jsp b/website/web/WEB-INF/stats.jsp
index d708910..005235e 100644
--- a/website/web/WEB-INF/stats.jsp
+++ b/website/web/WEB-INF/stats.jsp
@@ -13,6 +13,12 @@
 <h2>Tor Metrics: Statistics</h2>
 <br>
+<p><font color="red"><b>Notice:</b> The specification on this page has
+moved
+<a href="https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/doc/stats-spec.txt">here</a>.
+This page will be removed after July 26, 2014.</font>
+</p>
+
 <p>Tor Metrics aggregates large amounts of Tor network
 <a href="data.html">data</a> and visualizes results in customizable
 <a href="graphs.html">graphs</a> and tables.
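
Usage sketch for the spec added in doc/stats-spec.txt: every statistics file uses the empty string in a column to mean "all values, no restriction", so a consumer has to filter on empty columns to get totals. The Python below is a minimal illustration for clients.csv, not part of this commit. It assumes the file starts with a header row naming the columns in the order the spec lists them, which the spec does not state. The sample rows are invented and the helper name total_clients is hypothetical. The default threshold of 50 comes from the spec's remark on the frac column.

```python
import csv
import io

# Invented sample rows using the clients.csv columns from the spec.
# The header row and all numbers are assumptions for illustration only.
SAMPLE = """date,node,country,transport,version,lower,upper,clients,frac
2014-06-01,relay,,,,,,2000000,80
2014-06-01,relay,de,,,150000,250000,200000,80
2014-06-01,bridge,,,,,,30000,40
2014-06-01,bridge,,obfs3,,,,5000,40
"""


def total_clients(rows, node, min_frac=50):
    """Return {date: clients} for rows that cover all clients of one node type.

    Hypothetical helper: keeps only rows whose country, transport, and
    version columns are empty, and whose frac value is at least min_frac.
    """
    totals = {}
    for row in rows:
        if row["node"] != node:
            continue
        # An empty string means "no restriction" on that dimension.
        if row["country"] or row["transport"] or row["version"]:
            continue
        # The spec treats frac values above 50 as reliable enough.
        if float(row["frac"]) < min_frac:
            continue
        totals[row["date"]] = int(row["clients"])
    return totals


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
relay_totals = total_clients(rows, "relay")
bridge_totals = total_clients(rows, "bridge")
print(relay_totals)
print(bridge_totals)
```

With the invented sample, the bridge total is dropped because its frac value is below the threshold; passing min_frac=0 keeps it. The same empty-string filter applies to the flag, country, version, platform, and ec2bridge columns of servers.csv.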