commit 447e394f0c69124ee8254f55e3762e1b1e4efd73 Author: Karsten Loesing karsten.loesing@gmx.net Date: Wed Jun 4 17:19:42 2014 +0200
Extend new CollecTor homepage.
At this point, everything relevant from the metrics website should be included. --- web/css/style.css | 31 +-- web/formats.html | 551 +++++++++++++++++++++++++++++++++++++++++++++++++++++ web/index.html | 151 +++++++++++---- 3 files changed, 673 insertions(+), 60 deletions(-)
diff --git a/web/css/style.css b/web/css/style.css index 9978a5d..344d8e6 100644 --- a/web/css/style.css +++ b/web/css/style.css @@ -1,38 +1,13 @@ body { font-family: "Open Sans","lucida grande","Segoe UI",arial,verdana, "lucida sans unicode",tahoma,sans-serif; background: #fafafa; font-size: 13px; line-height: 22px; color: #222; } -h1 { font-size: 20px; font-weight: normal; text-align: center; } -h3 { color: #7D4698; position: relative } a { color: #7D4698; text-decoration: none; font-weight: bold; } -ul { list-style: none; padding: 0; margin: 0; } p { margin: 0; padding: 10px; } a[name] { padding: 0; margin: 0; } .box { max-width: 850px; width: 100%; margin: 0 auto 30px auto; padding-bottom: 30px; background: white; border: 1px solid #eee; } .box > * { margin-left: 30px; margin-right: 30px; } -.box h3 a { visibility: hidden; } -.box:hover h3 a { visibility: visible; } -.api-request { border-bottom: 1px solid #eee; position: relative } -.request-url, .request-type, .request-response { padding: 8px 10px; - vertical-align: middle } -.request-type { color: #57145F; display: inline-block; } -.request-url { color: #333; font-size: 18px; } -.request-response { position: absolute; color: #666; right: 0; } -h3 .request-response { padding: 0 !important; } -.api-urls>li:last-child { border-bottom: 0; } -.required-true, .required-false, .typeof { display: inline-block; - vertical-align: middle; padding: 5px 10px; } -.required-true { color: #1d7508; } -.required-false { color: #aaa; } -.properties { margin-top: 10px; margin-bottom: 10px; - border: 1px solid #eee; } -.properties li { padding: 5px 0; } -.properties li ul { border: 1px solid #eee; margin: 10px 10px 10px 40px; - background: white; } -.properties .properties { margin-left: 10px; } -.properties li:nth-child(even) { background: #fafafa; } -.properties p { padding: 10px 15px; } -.properties b { padding: 5px 10px; display: inline-block; - vertical-align: middle; } -.api-urls{ margin-top: 30px; margin-bottom: 30px; } 
+.box h2 a { visibility: hidden; } +.box:hover h2 a { visibility: visible; } +h3 .type-annotation { float: right; color: #666; }
diff --git a/web/formats.html b/web/formats.html new file mode 100644 index 0000000..68fddee --- /dev/null +++ b/web/formats.html @@ -0,0 +1,551 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> +<html> +<head> +<title>CollecTor — What is in the data?</title> +<link href="css/style.css" type="text/css" rel="stylesheet"> +<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"> +<link href="favicon.ico" type="image/x-icon" rel="shortcut icon"> +</head> +<body> + +<div class="box"> + +<h1><a href="index.html">CollecTor</a> —</h1> +<h2>What is in the data?</h2> + +<p> +The Tor network data provided here comes from five different sources which +are explained in more detail on this page. +You may either read through the entire page or jump to the type of data +you're most interested in: + +<ul> +<li><a href="#relay-descriptors">Tor relay descriptors</a></li> +<li><a href="#bridge-descriptors">Tor bridge descriptors</a></li> +<li><a href="#bridge-pool-assignments">BridgeDB's bridge pool +assignments</a></li> +<li><a href="#exit-lists">TorDNSEL's exit lists</a></li> +<li><a href="#torperf">Torperf's performance data</a></li> +</ul> + +<p> +Each descriptor provided here contains an <tt>@type</tt> annotation using +the format <tt>@type $descriptortype $major.$minor</tt>. +Any tool that processes these descriptors may parse files without meta +data or with an unknown descriptor type at its own risk, can safely parse +files with known descriptor type and same major version number, and should +not parse files with known descriptor type and higher major version +number. +</p> + +</div> <!-- box --> + +<div class="box"> + +<a name="relay-descriptors"></a> +<h2>Tor relay descriptors <a href="#relay-descriptors">#</a></h2> + +<p> +Relays and directory authorities publish relay descriptors, so that +clients can select relays for their paths through the Tor network. 
+All these relay descriptors are specified in the +<a href="https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt">Tor +directory protocol, version 3</a> specification document (or in the +earlier protocol versions +<a href="https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec-v2.txt">2</a> or +<a href="https://gitweb.torproject.org/torspec.git/blob/HEAD:/attic/dir-spec-v1.txt">1</a>). +This page shall give a quick overview of what relay descriptors are +available. +</p> + +<h3>Server descriptors +(<a href="archive/relay-descriptors/server-descriptors/">archive</a>, +<a href="recent/relay-descriptors/server-descriptors/">recent</a>) +<span class="type-annotation"><tt>@type server-descriptor 1.0</tt></span> +</h3> + +<p> +Server descriptors contain information that relays publish about +themselves. +Tor clients once downloaded this information, but now they use +microdescriptors instead. +The server descriptors in +<a href="archive/relay-descriptors/server-descriptors/">archive</a> +contain one descriptor per file, whereas the files in +<a href="recent/relay-descriptors/server-descriptors/">recent</a> +contain all descriptors collected in an hour concatenated into a single +file. +</p> + +<h3>Extra-info descriptors +(<a href="archive/relay-descriptors/extra-infos/">archive</a>, +<a href="recent/relay-descriptors/extra-infos/">recent</a>) +<span class="type-annotation"><tt>@type extra-info 1.0</tt></span> +</h3> + +<p> +Extra-info descriptors contain relay information that Tor clients do not +need in order to function. +This is self-published, like server descriptors, but not downloaded by +clients by default. +The extra-info descriptors in +<a href="archive/relay-descriptors/extra-infos/">archive</a> +contain one descriptor per file, whereas the files in +<a href="recent/relay-descriptors/extra-infos/">recent</a> +contain all descriptors collected in an hour concatenated into a single +file. 
+</p> + +<h3>Network status consensuses +(<a href="archive/relay-descriptors/consensuses/">archive</a>, +<a href="recent/relay-descriptors/consensuses/">recent</a>) +<span class="type-annotation"><tt>@type network-status-consensus-3 +1.0</tt></span> +</h3> + +<p> +Though Tor relays are decentralized, the directories that track the +overall network are not. +These central points are called directory authorities, and every hour they +publish a document called a consensus, or network status document. +The consensus in turn is made up of router status entries containing +flags, heuristics used for relay selection, etc. +</p> + +<h3>Network status votes +(<a href="archive/relay-descriptors/votes/">archive</a>, +<a href="recent/relay-descriptors/votes/">recent</a>) +<span class="type-annotation"><tt>@type network-status-vote-3 +1.0</tt></span> +</h3> + +<p> +The directory authorities exchange votes every hour to come up with a +common consensus. +Vote documents are by far the largest documents provided here. +</p> + +<h3>Directory key certificates +(<a href="archive/relay-descriptors/certs.xz">archive</a>) +<span class="type-annotation"><tt>@type dir-key-certificate-3 +1.0</tt></span> +</h3> + +<p> +The directory authorities sign their votes and the consensus with their +key that they publish in a key certificate. +These key certificates change once every few months, so they are only +available in the +<a href="archive/relay-descriptors/certs.xz">archive</a>. +</p> + +<h3>Microdescriptor consensuses +(<a href="archive/relay-descriptors/microdescs/">archive</a>, +<a href="recent/relay-descriptors/microdescs/">recent</a>) +<span class="type-annotation"><tt>@type +network-status-microdesc-consensus-3 1.0</tt></span> +</h3> + +<p> +Tor clients used to download all server descriptors of active relays, but +now they only download the smaller microdescriptors which are derived from +server descriptors. 
+The microdescriptor consensus lists all active relays and references their +currently used microdescriptor. +The tarballs in +<a href="archive/relay-descriptors/microdescs/">archive</a> +contain both microdescriptor consensuses and referenced microdescriptors +together. +</p> + +<h3>Microdescriptors +(<a href="archive/relay-descriptors/microdescs/">archive</a>, +<a href="recent/relay-descriptors/microdescs/">recent</a>) +<span class="type-annotation"><tt>@type microdescriptor 1.0</tt></span> +</h3> + +<p> +Microdescriptors are minimalistic documents that include only the +information necessary for Tor clients to work. +The tarballs in +<a href="archive/relay-descriptors/microdescs/">archive</a> +contain both microdescriptor consensuses and referenced microdescriptors +together. +The microdescriptors in +<a href="archive/relay-descriptors/microdescs/">archive</a> +contain one descriptor per file, whereas the files in +<a href="recent/relay-descriptors/microdescs/">recent</a> +contain all descriptors collected in an hour concatenated into a single +file. +</p> + +<h3>Version 2 network statuses +(<a href="archive/relay-descriptors/statuses/">archive</a>) +<span class="type-annotation"><tt>@type network-status-2 1.0</tt></span> +</h3> + +<p> +Version 2 network statuses were published by the directory +authorities before consensuses were introduced. +In contrast to consensuses, each directory authority published its own +authoritative view of the network, and clients combined these documents +locally. +We stopped archiving version 2 network statuses in 2012. +</p> + +<h3>Version 1 directories +(<a href="archive/relay-descriptors/tor/">archive</a>) +<span class="type-annotation"><tt>@type directory 1.0</tt></span> +</h3> + +<p> +The first directory protocol version combined the list of active relays +with server descriptors in a single directory document. +We stopped archiving version 1 directories in 2007. 
+</p> + +</div> <!-- box --> + +<div class="box"> + +<a name="bridge-descriptors"></a> +<h2>Tor bridge descriptors <a href="#bridge-descriptors">#</a></h2> + +<p> +Bridges and the bridge authority publish bridge descriptors that are used +by censored clients to connect to the Tor network. +We cannot, however, make bridge descriptors available as we do with relay +descriptors, because that would defeat the purpose of making bridges hard +to enumerate for censors. +We therefore sanitize bridge descriptors by removing all potentially +identifying information and publish sanitized versions here. +The sanitizing steps are as follows: +</p> + +<ol> +<li><b>Replace the bridge identity with its SHA1 value:</b> Clients +can request a bridge's current descriptor by sending its identity string +to the bridge authority. +This is a feature to make bridges on dynamic IP addresses useful. +Therefore, the original identities (and anything that could be used to +derive them) need to be removed from the descriptors. +The bridge identity is replaced with its SHA1 hash value. +The idea is to have a consistent replacement that remains stable over +months or even years (without keeping a secret for a keyed hash +function).</li> +<li><b>Remove all cryptographic keys and signatures:</b> It would be +straightforward to learn about the bridge identity from the bridge's +public key. +Replacing keys by newly generated ones seemed to be unnecessary (and would +involve keeping a state over months/years), so that all cryptographic +objects have simply been removed.</li> +<li><b>Replace IP address with IP address hash:</b> Of course, IP +addresses need to be sanitized, too. +<ul><li>IPv4 addresses are replaced with <tt>10.x.x.x</tt> with +<tt>x.x.x</tt> being the 3 byte output of +<tt>H(IP address | bridge identity | secret)[:3]</tt>. +The input <tt>IP address</tt> is the 4-byte long binary representation of +the bridge's current IP address. 
+The <tt>bridge identity</tt> is the 20-byte long binary representation of +the bridge's long-term identity fingerprint. +The <tt>secret</tt> is a 31-byte long secure random string that changes +once per month for all descriptors and statuses published in that month. +<tt>H()</tt> is SHA-256. +The <tt>[:3]</tt> operator means that we pick the 3 most significant bytes +of the result.</li> +<li>IPv6 addresses are replaced with <tt>[fd9f:2e19:3bcf::xx:xxxx]</tt> +with <tt>xx:xxxx</tt> being the hex-formatted 3 byte output of a similar +hash function as described for IPv4 addresses. +The only differences are that the input <tt>IP address</tt> is 16 bytes +long and the <tt>secret</tt> is only 19 bytes long.</li></ul> +<li><b>Replace contact information:</b> If there is contact information in +a descriptor, the contact line is changed to +<tt>somebody</tt>.</li> +<li><b>Remove pluggable transport addresses and arguments:</b> Bridges may +provide transports in addition to the onion-routing protocol and include +information about these transports in their extra-info descriptors for +BridgeDB. +In that case, any IP addresses, TCP ports, or additional arguments are +removed, only leaving in the supported transport names.</li> +<li><b>Append descriptor digest:</b> Descriptors are often referenced by +their digest, but that is not possible anymore once their content is +changed. +As a workaround, sanitized descriptors may contain a new line +<tt>router-digest</tt> with the hex representation of the SHA-1 of the +original descriptor digest. +</ol> + +<h3>Network statuses +(<a href="archive/bridge-descriptors/">archive</a>, +<a href="recent/bridge-descriptors/statuses/">recent</a>) +<span class="type-annotation"><tt>@type bridge-network-status +1.0</tt></span> +</h3> + +<p> +Sanitized bridge network statuses are similar to version 2 relay network +statuses, but with only a <tt>published</tt> line in the header and +without any lines in the footer. 
+The tarballs in +<a href="archive/bridge-descriptors/">archive</a> contain all bridge +descriptors of a given month, not just network statuses. +</p> + +<h3>Server descriptors +(<a href="archive/bridge-descriptors/">archive</a>, +<a href="recent/bridge-descriptors/server-descriptors/">recent</a>) +<span class="type-annotation"><tt>@type bridge-server-descriptor +1.0</tt></span> +</h3> + +<p> +Bridge server descriptors follow the same format as relay server +descriptors, except for the sanitizing steps described above. +The tarballs in +<a href="archive/bridge-descriptors/">archive</a> contain all bridge +descriptors of a given month, not just server descriptors. +These tarballs contain one descriptor per file, whereas the +files in +<a href="recent/bridge-descriptors/server-descriptors/">recent</a> +contain all descriptors collected in an hour concatenated into a single +file to reduce the number of files. +</p> + +<h3>Extra-info descriptors +(<a href="archive/bridge-descriptors/">archive</a>, +<a href="recent/bridge-descriptors/extra-infos/">recent</a>) +<span class="type-annotation"><tt>@type bridge-extra-info 1.2</tt></span> +</h3> + +<p> +Bridge extra-info descriptors follow the same format as relay extra-info +descriptors, except for the sanitizing steps described above. +The format has changed over time to accommodate changes to the sanitizing +process, with earlier versions being: +</p> + +<ul> +<li><font color="#666"><tt>@type bridge-extra-info 1.0</tt> was the first +version.</font></li> +<li><font color="#666"><tt>@type bridge-extra-info 1.1</tt> added +sanitized <tt>transport</tt> lines</font>.</li> +<li><tt>@type bridge-extra-info 1.2</tt> added <tt>ntor-onion-key</tt> +lines.</li> +</ul> + +<p> +The tarballs in +<a href="archive/bridge-descriptors/">archive</a> contain all bridge +descriptors of a given month, not just extra-info descriptors. 
+These tarballs contain one descriptor per file, whereas the +files in +<a href="recent/bridge-descriptors/extra-infos/">recent</a> +contain all descriptors collected in an hour concatenated into a single +file to reduce the number of files. +</p> + +</div> <!-- box --> + +<div class="box"> + +<a name="bridge-pool-assignments"></a> +<h2>BridgeDB's bridge pool assignments +<a href="#bridge-pool-assignments">#</a></h2> + +<p> +The bridge distribution service BridgeDB publishes bridge pool assignments +describing which bridges it has assigned to which distribution pool. +BridgeDB receives bridge network statuses from the bridge authority, +assigns these bridges to persistent distribution rings, and hands them out +to bridge users. +BridgeDB periodically dumps the list of running bridges with information +about the rings, subrings, and file buckets to which they are assigned to +a local file. +The sanitized versions of these lists containing SHA-1 hashes of bridge +fingerprints instead of the original fingerprints are available for +statistical analysis. +</p> + +<h3>Bridge pool assignments +(<a href="archive/bridge-pool-assignments/">archive</a>, +<a href="recent/bridge-pool-assignments/">recent</a>) +<span class="type-annotation"><tt>@type bridge-pool-assignment +1.0</tt></span> +</h3> + +<p> +The document below shows a BridgeDB pool assignment file +from March 13, 2011. +Every such file begins with a line containing the timestamp when BridgeDB +wrote this file. +Subsequent lines start with the SHA-1 hash of a bridge fingerprint, +followed by ring, subring, and/or file bucket information. +There are currently three distributor ring types in BridgeDB: +</p> + +<ol> +<li><b>unallocated:</b> These bridges are not distributed by BridgeDB, +but are either reserved for manual distribution or are written to file +buckets for distribution via an external tool. 
+If a bridge in the <tt>unallocated</tt> ring is assigned to a file bucket, +this is noted by <tt>bucket=$bucketname</tt>.</li> +<li><b>email:</b> These bridges are distributed via an e-mail +autoresponder. Bridges can be assigned to subrings by their OR port or +relay flag, which is defined by <tt>port=$port</tt> and/or <tt>flag=$flag</tt>. +</li> +<li><b>https:</b> These bridges are distributed via an HTTPS server. +There are multiple https rings to further distribute bridges by IP address +ranges, which is denoted by <tt>ring=$ring</tt>. +Bridges in the <tt>https</tt> ring can also be assigned to subrings by +OR port or relay flag, which is defined by <tt>port=$port</tt> and/or +<tt>flag=$flag</tt>.</li> +</ol> + +<pre> +bridge-pool-assignment 2011-03-13 14:38:03 +00b834117566035736fc6bd4ece950eace8e057a unallocated +00e923e7a8d87d28954fee7503e480f3a03ce4ee email port=443 flag=stable +0103bb5b00ad3102b2dbafe9ce709a0a7c1060e4 https ring=2 port=443 flag=stable +[...] +</pre> + +</div> <!-- box --> + +<div class="box"> + +<a name="exit-lists"></a> +<h2>TorDNSEL's exit lists <a href="#exit-lists">#</a></h2> + +<p> +The exit list service +<a href="https://www.torproject.org/tordnsel/dist/">TorDNSEL</a> +publishes exit lists containing the IP addresses of relays that it found +when exiting through them. +</p> + +<h3>Exit lists +(<a href="archive/exit-lists/">archive</a>, +<a href="recent/exit-lists/">recent</a>) +<span class="type-annotation"><tt>@type tordnsel 1.0</tt></span> +</h3> + +<p> +Tor Check makes the list of known exits and corresponding exit IP +addresses available in a specific format. +The document below shows an entry of the exit list written on +December 28, 2010 at 15:21:44 UTC. +This entry means that the relay with fingerprint <tt>63BA..</tt>, which +published a descriptor at 07:35:55 and was contained in a version 2 +network status from 08:10:11, uses two different IP addresses for exiting. 
+The first address <tt>91.102.152.236</tt> was found in a test performed at +07:10:30. +When looking at the corresponding server descriptor, one finds that this +is also the IP address on which the relay accepts connections from inside +the Tor network. +A second test performed at 10:35:30 reveals that the relay also uses IP +address <tt>91.102.152.227</tt> for exiting. +</p> + +<pre> +ExitNode 63BA28370F543D175173E414D5450590D73E22DC +Published 2010-12-28 07:35:55 +LastStatus 2010-12-28 08:10:11 +ExitAddress 91.102.152.236 2010-12-28 07:10:30 +ExitAddress 91.102.152.227 2010-12-28 10:35:30 +</pre> + +</div> <!-- box --> + +<div class="box"> + +<a name="torperf"></a> +<h2>Torperf's performance data <a href="#torperf">#</a></h2> + +<p> +The performance measurement service Torperf publishes performance data +from making simple HTTP requests over the Tor network. +Torperf uses a trivial SOCKS client to download files of various sizes +over the Tor network and notes how long substeps take. +</p> + +<h3>Torperf measurement results +(<a href="archive/torperf/">archive</a>, +<a href="recent/torperf/">recent</a>) +<span class="type-annotation"><tt>@type torperf 1.0</tt></span> +</h3> + +<p> +A Torperf results file contains a single line per Torperf run with +<tt>key=value</tt> pairs. +Such a result line is sufficient to learn about 1) the Tor and Torperf +configuration, 2) measurement results, and 3) additional information that +might help explain the results. +Known keys are explained below. 
+</p> +<ul> +<li>Configuration +<ul> +<li><tt>SOURCE:</tt> Configured name of the data source; required.</li> +<li><tt>FILESIZE:</tt> Configured file size in bytes; required.</li> +<li>Other meta data describing the Tor or Torperf configuration, e.g., +GUARD for a custom guard choice; optional.</li> +</ul> +<li>Measurement results +<ul> +<li><tt>START:</tt> Time when the connection process starts; +required.</li> +<li><tt>SOCKET:</tt> Time when the socket was created; required.</li> +<li><tt>CONNECT:</tt> Time when the socket was connected; required.</li> +<li><tt>NEGOTIATE:</tt> Time when SOCKS 5 authentication methods have been +negotiated; required.</li> +<li><tt>REQUEST:</tt> Time when the SOCKS request was sent; required.</li> +<li><tt>RESPONSE:</tt> Time when the SOCKS response was received; +required.</li> +<li><tt>DATAREQUEST:</tt> Time when the HTTP request was written; +required.</li> +<li><tt>DATARESPONSE:</tt> Time when the first response was received; +required.</li> +<li><tt>DATACOMPLETE:</tt> Time when the payload was complete; +required.</li> +<li><tt>WRITEBYTES:</tt> Total number of bytes written; required.</li> +<li><tt>READBYTES:</tt> Total number of bytes read; required.</li> +<li><tt>DIDTIMEOUT:</tt> 1 if the request timed out, 0 otherwise; +optional.</li> +<li><tt>DATAPERCx:</tt> Time when x% of expected bytes were read for +x = { 10, 20, 30, 40, 50, 60, 70, 80, 90 }; optional.</li> +<li>Other measurement results, e.g., START_RENDCIRC, GOT_INTROCIRC, etc. 
+for hidden-service measurements; optional.</li> +</ul> +<li>Additional information +<ul> +<li><tt>LAUNCH:</tt> Time when the circuit was launched; optional.</li> +<li><tt>USED_AT:</tt> Time when this circuit was used; optional.</li> +<li><tt>PATH:</tt> List of relays in the circuit, separated by commas; +optional.</li> +<li><tt>BUILDTIMES:</tt> List of times when circuit hops were built, +separated by commas; optional.</li> +<li><tt>TIMEOUT:</tt> Circuit build timeout that the Tor client used when +building this circuit; optional.</li> +<li><tt>QUANTILE:</tt> Circuit build time quantile that the Tor client +uses to determine its circuit-build timeout; optional.</li> +<li><tt>CIRC_ID:</tt> Circuit identifier of the circuit used for this +measurement; optional.</li> +<li><tt>USED_BY:</tt> Stream identifier of the stream used for this +measurement; optional.</li> +<li>Other fields containing additional information; optional.</li> +</ul> +</ul> + +<p> +The files in <a href="recent/torperf/extra-infos/">recent</a> +accumulate all new Torperf measurements of a given day, which means that +they may change throughout the day. +This is different from all other files in the <a href="recent/">recent</a> +directory which do not change once they are written. +</p> + +</div> <!-- box --> + +</body> +</html> + diff --git a/web/index.html b/web/index.html index c687798..e4eadc2 100644 --- a/web/index.html +++ b/web/index.html @@ -1,7 +1,7 @@ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html> <head> -<title>CollecTor — your friendly data-collecting service in the Tor +<title>CollecTor — Your friendly data-collecting service in the Tor network</title> <link href="css/style.css" type="text/css" rel="stylesheet"> <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"> @@ -11,8 +11,8 @@ network</title>
<div class="box">
-<h1>CollecTor — your friendly data-collecting service in the Tor -network</h1> +<h1><a href="index.html">CollecTor</a> —</h1> +<h2>Your friendly data-collecting service in the Tor network</h2>
<p> Welcome to CollecTor, your friendly data-collecting service in the Tor @@ -23,12 +23,51 @@ If you're doing research on the Tor network, or if you're developing an application that uses Tor network data, this is your place to start. </p>
+<ul> +<li><a href="#formats">What is in the data?</a></li> +<li><a href="#download">Where do I get the data?</a></li> +<li><a href="#libraries">How can I parse the data?</a></li> +<li><a href="#references">What did others do with the data?</a></li> +<li><a href="#support">How can I get support?</a></li> +</ul> + +</div> <!-- box --> + +<div class="box"> + +<a name="formats"></a> +<h2>What is in the data? <a href="#formats">#</a></h2> + +<p> +The Tor network data provided here currently comes from five different +sources (each of which is explained in more detail on a +<a href="formats.html">separate page</a>): +</p> + +<ol> +<li>Relays and directory authorities publish +<a href="formats.html#relay-descriptors">relay descriptors</a>, so that +clients can select relays for their paths through the Tor network.</li> +<li>Bridges and the bridge authority publish +<a href="formats.html#bridge-descriptors">bridge descriptors</a> that are +used by censored clients to connect to the Tor network.</li> +<li>The bridge distribution service BridgeDB publishes +<a href="formats.html#bridge-pool-assignments">bridge pool assignments</a> +describing which bridges it has assigned to which distribution pool.</li> +<li>The exit list service TorDNSEL publishes +<a href="formats.html#exit-lists">exit lists</a> containing the IP +addresses of relays that it found when exiting through them.</li> +<li>The performance measurement service Torperf publishes +<a href="formats.html#torperf">performance data</a> from making simple +HTTP requests over the Tor network.</li> +</ol> + </div> <!-- box -->
<div class="box">
-<a name="archive"></a> -<h3>Archive of monthly tarballs <a href="#archive">#</a></h3> +<a name="download"></a> +<h2>Where do I get the data? <a href="#download">#</a></h2>
<p> We have over 10 years of Tor network data available for download in @@ -36,56 +75,104 @@ monthly tarballs. The latest tarballs are updated every few days. So, if you want to fetch data covering an extended period of time, monthly tarballs are for you. -Note that tarballs can decompress to 20 times the compressed size or even -more. +Just be careful: these tarballs can decompress to 20 times the compressed +size or even more. +Monthly tarballs can be browsed and downloaded in the +<a href="archive/"><tt>archive/</tt></a> subdirectory. </p>
<p> -Monthly tarballs can be browsed and downloaded here: +If you're only interested in recently published data, we also have data +from the last 72 hours available for you. +In contrast to monthly tarballs, this data set is updated every hour. +If you have already bootstrapped your application with monthly tarballs +and want to stay up-to-date, or if you just want to take a peek at the +latest data, this is your data set. +If you're using special software to download these files, you may want to +configure it to accept gzip-compressed data to save us all some bandwidth. +The latest 72 hours of data are available in the +<a href="recent/"><tt>recent/</tt></a> subdirectory. </p>
-<pre> - <a href="archive/">https://collector.torproject.org/archive/</a> -</pre> - </div> <!-- box -->
<div class="box">
-<a name="recent"></a> -<h3>The latest 72 hours <a href="#recent">#</a></h3> +<a name="libraries"></a> +<h2>How can I parse the data? <a href="#libraries">#</a></h2>
<p> -If you're only interested in recently published data, we also have data -from the last 72 hours available for you. -In contrast to monthly tarballs, this data set is updated every hour. -If you have already bootstrapped your application with monthly tarballs -and want to stay up-to-date, or if you just want to take a peak at the -latest data, this is your data set. +We developed two parsing libraries, one for Java and one for Python: </p>
+<ul> +<li>If you're programming in Java, try out the +<a href="https://gitweb.torproject.org/metrics-lib.git">metrics-lib</a> +library.</li> +<li>If you're writing in Python, +<a href="https://stem.torproject.org/">Stem</a> is your library.</li> +</ul> + <p> -The latest 72 hours of data are also available here: +If you developed a parsing library for another language and want it to be +listed here, <a href="#support">please let us know</a>! </p>
-<pre> - <a href="recent/">https://collector.torproject.org/recent/</a> -</pre> +</div> <!-- box --> + +<div class="box"> + +<a name="references"></a> +<h2>What did others do with the data? <a href="#references">#</a></h2> + +<p> +We wrote a couple of applications, and researchers wrote research papers +using the Tor network data provided here. +The following list is not at all exhaustive: +</p> + +<ul> +<li>The metrics portal shows graphs of +<a href="https://metrics.torproject.org/network.html">network growth over +time</a> and <a href="https://metrics.torproject.org/users.html">estimates +of users derived from directory activity</a>.</li> +<li>The <a href="https://exonerator.torproject.org/">ExoneraTor +service</a> allows people to look up whether a given IP address was part +of the Tor network in the past.</li> +<li>The websites <a href="https://atlas.torproject.org/">Atlas</a>, +<a href="https://globe.torproject.org/">Globe</a>, and +<a href="https://compass.torproject.org/">Compass</a> let users explore +how specific relays or bridges contribute to the Tor network. +They all use <a href="https://onionoo.torproject.org/">Onionoo</a> as +their data back-end service which in turn uses the Tor network data +provided here.</li> +<li>The <a href="https://shadow.github.io/">Shadow Simulator</a> uses +archived Tor directory data to generate network topologies that match the +real Tor network as closely as possible.</li> +<li>The <a href="https://torps.github.io/">Tor Path Simulator</a> uses Tor +directory archive data to simulate the effect of changes to Tor's path +selection algorithm.</li> +</ul> + +<p> +If you wrote an application or research paper that uses Tor network data +and that is not yet listed here, <a href="#support">please let us +know</a>! +Please include a short description of what your application does or what your +research was about. +</p>
</div> <!-- box -->
<div class="box">
-<a name="next"></a> -<h3>What's next? <a href="#next">#</a></h3> +<a name="support"></a> +<h2>How can I get support? <a href="#support">#</a></h2>
<p> -Do you need support? -If you have any questions or feedback about the Tor network data provided -here, we'd like to hear from you! -Please send mail to the -<a href="mailto:tor-dev@lists.torproject.org">Tor development mailing -list</a>. +If you have any questions about the Tor network data provided here, we'd +like to <a href="mailto:help@rt.torproject.org">hear from you</a>! +Of course, suggestions or other feedback are welcome, too. </p>
</div>
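
Not part of the commit: formats.html above describes the @type annotation rule and the bridge pool assignment line format in prose. The following is a minimal Python sketch of both, assuming the @type line is handled separately from the pool assignment body. The function names and the `supported` mapping are hypothetical, and neither metrics-lib nor Stem is used here. The sample lines are copied from the example in formats.html.

```python
from datetime import datetime

# Sample lines copied from the bridge pool assignment example in formats.html.
SAMPLE = """\
bridge-pool-assignment 2011-03-13 14:38:03
00b834117566035736fc6bd4ece950eace8e057a unallocated
00e923e7a8d87d28954fee7503e480f3a03ce4ee email port=443 flag=stable
0103bb5b00ad3102b2dbafe9ce709a0a7c1060e4 https ring=2 port=443 flag=stable
"""


def parse_type_annotation(line):
    """Split '@type $descriptortype $major.$minor' into (type, major, minor)."""
    keyword, descriptor_type, version = line.split()
    if keyword != "@type":
        raise ValueError("not a @type annotation: %r" % line)
    major, minor = version.split(".")
    return descriptor_type, int(major), int(minor)


def is_safe_to_parse(line, supported):
    """True only for a known descriptor type with the same major version.

    `supported` is a hypothetical mapping from descriptor type to the major
    version this tool understands, e.g. {"bridge-pool-assignment": 1}.
    """
    descriptor_type, major, _minor = parse_type_annotation(line)
    return supported.get(descriptor_type) == major


def parse_pool_assignments(text):
    """Return (timestamp, {hashed fingerprint: assignment}) for one file."""
    lines = [line for line in text.splitlines() if line.strip()]
    keyword, timestamp = lines[0].split(None, 1)
    if keyword != "bridge-pool-assignment":
        raise ValueError("unexpected header: %r" % lines[0])
    written = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    assignments = {}
    for line in lines[1:]:
        # First token is the SHA-1 of the fingerprint, second the distributor
        # ring type; everything after that is key=value (ring, port, flag, bucket).
        hashed_fingerprint, distributor, *parameters = line.split()
        assignments[hashed_fingerprint] = {
            "distributor": distributor,
            "parameters": dict(p.split("=", 1) for p in parameters),
        }
    return written, assignments
```

The ring type is stored under "distributor" rather than "ring" so that it cannot collide with the `ring=$ring` parameter of https entries.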
tor-commits@lists.torproject.org
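
Also not part of the commit: the IP address sanitizing step in formats.html can be sketched as below. This assumes that `|` in `H(IP address | bridge identity | secret)` means plain byte concatenation, and that the IPv6 variant uses the same input order (the page only says "a similar hash function"). The function names are hypothetical, and this is not the CollecTor implementation.

```python
import hashlib
import ipaddress


def _truncated_hash(ip_bytes, identity, secret):
    """Return the 3 most significant bytes of SHA-256(ip | identity | secret).

    Assumption: '|' is plain concatenation of the binary representations.
    """
    if len(identity) != 20:
        raise ValueError("bridge identity must be 20 bytes")
    return hashlib.sha256(ip_bytes + identity + secret).digest()[:3]


def sanitize_ipv4(address, identity, secret):
    """Map an IPv4 address to 10.x.x.x using a 31-byte monthly secret."""
    if len(secret) != 31:
        raise ValueError("IPv4 secret must be 31 bytes")
    ip_bytes = ipaddress.IPv4Address(address).packed  # 4 bytes
    suffix = _truncated_hash(ip_bytes, identity, secret)
    return str(ipaddress.IPv4Address(b"\x0a" + suffix))


def sanitize_ipv6(address, identity, secret):
    """Map an IPv6 address to [fd9f:2e19:3bcf::xx:xxxx] using a 19-byte secret."""
    if len(secret) != 19:
        raise ValueError("IPv6 secret must be 19 bytes")
    ip_bytes = ipaddress.IPv6Address(address).packed  # 16 bytes
    suffix = _truncated_hash(ip_bytes, identity, secret)
    return "[fd9f:2e19:3bcf::%02x:%02x%02x]" % tuple(suffix)
```

Because the bridge identity is part of the hash input, two bridges behind the same IP address are mapped to different sanitized addresses, and because the secret changes monthly, sanitized addresses cannot be linked across months.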