commit b9ce7127ccb722bfe2a368450ba4569d68cd11e3
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Mon Oct 28 15:44:37 2013 +0100

    Retire old user number estimates.
---
 web/WEB-INF/users.jsp | 407 +++++++++++++++++-------------------------------
 1 file changed, 143 insertions(+), 264 deletions(-)

diff --git a/web/WEB-INF/users.jsp b/web/WEB-INF/users.jsp index 06d7f4a..788b3a8 100644 --- a/web/WEB-INF/users.jsp +++ b/web/WEB-INF/users.jsp @@ -16,238 +16,11 @@ <h2>Tor Metrics Portal: Users</h2> <br>
-<a name="direct-users"></a> -<h3><a href="#direct-users" class="anchor">Directly connecting Tor -users</a></h3> -<br> -<p>After being connected to the Tor network, users need to refresh their -list of running relays on a regular basis. They send their requests to one -out of a few hundred directory mirrors to save bandwidth of the directory -authorities. The following graphs show an estimate of recurring Tor users -based on the requests seen by a few dozen directory mirrors.</p> -<p><b>Daily directly connecting users:</b></p> -<img src="direct-users.png${direct_users_url}" - width="576" height="360" alt="Direct users graph"> -<form action="users.html#direct-users"> - <div class="formrow"> - <input type="hidden" name="graph" value="direct-users"> - <p> - <label>Start date (yyyy-mm-dd):</label> - <input type="text" name="start" size="10" - value="<c:choose><c:when test="${fn:length(direct_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${direct_users_start[0]}</c:otherwise></c:choose>"> - <label>End date (yyyy-mm-dd):</label> - <input type="text" name="end" size="10" - value="<c:choose><c:when test="${fn:length(direct_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${direct_users_end[0]}</c:otherwise></c:choose>"> - </p><p> - Source: <select name="country"> - <option value="all"<c:if test="${direct_users_country[0] eq 'all'}"> selected</c:if>>All users</option> - <c:forEach var="country" items="${countries}" > - <option value="${country[0]}"<c:if test="${direct_users_country[0] eq country[0]}"> selected</c:if>>${country[1]}</option> - </c:forEach> - </select> - </p><p> - Show possible censorship events if available (<a - href="http://research.torproject.org/techreports/detector-2011-09-09.pdf">BETA</a>) - <select name="events"> - <option value="off">Off</option> - <option value="on"<c:if test="${direct_users_events[0] eq 'on'}"> selected</c:if>>On: both points and expected range</option> - <option value="points"<c:if 
test="${direct_users_events[0] eq 'points'}"> selected</c:if>>On: points only, no expected range</option> - </select> - </p><p> - <input class="submit" type="submit" value="Update graph"> - </p> - </div> -</form> -<p>Download graph as -<a href="direct-users.pdf${direct_users_url}">PDF</a> or -<a href="direct-users.svg${direct_users_url}">SVG</a>.</p> -<hr> -<a name="direct-users-table"></a> -<p><b>Top-10 countries by directly connecting users:</b></p> -<form action="users.html#direct-users-table"> - <div class="formrow"> - <input type="hidden" name="table" value="direct-users"> - <p> - <label>Start date (yyyy-mm-dd):</label> - <input type="text" name="start" size="10" - value="<c:choose><c:when test="${fn:length(direct_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${direct_users_start[0]}</c:otherwise></c:choose>"> - <label>End date (yyyy-mm-dd):</label> - <input type="text" name="end" size="10" - value="<c:choose><c:when test="${fn:length(direct_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${direct_users_end[0]}</c:otherwise></c:choose>"> - </p><p> - <input class="submit" type="submit" value="Update table"> - </p> - </div> -</form> -<br> -<table> - <tr> - <th>Country</th> - <th>Mean daily users</th> - </tr> - <c:forEach var="row" items="${direct_users_tabledata}"> - <tr> - <td><a href="users.html?graph=direct-users&country=${row['cc']}#direct-users">${row['country']}</a> </td> - <td>${row['abs']} (<fmt:formatNumber type="number" minFractionDigits="2" value="${row['rel']}" /> %)</td> - </tr> - </c:forEach> -</table> -<hr> -<a name="censorship-events"></a> -<p><b>Top-10 countries by possible censorship events (<a - href="http://research.torproject.org/techreports/detector-2011-09-09.pdf">BETA</a>):</b></p> -<form action="users.html#censorship-events"> - <div class="formrow"> - <input type="hidden" name="table" value="censorship-events"> - <p> - <label>Start date (yyyy-mm-dd):</label> - <input type="text" name="start" size="10" - 
value="<c:choose><c:when test="${fn:length(censorship_events_start) == 0}">${default_start_date}</c:when><c:otherwise>${censorship_events_start[0]}</c:otherwise></c:choose>"> - <label>End date (yyyy-mm-dd):</label> - <input type="text" name="end" size="10" - value="<c:choose><c:when test="${fn:length(censorship_events_end) == 0}">${default_end_date}</c:when><c:otherwise>${censorship_events_end[0]}</c:otherwise></c:choose>"> - </p><p> - <input class="submit" type="submit" value="Update table"> - </p> - </div> -</form> -<br> -<table> - <tr> - <th>Country</th> - <th>Downturns</th> - <th>Upturns</th> - </tr> - <c:forEach var="row" items="${censorship_events_tabledata}"> - <tr> - <td><a href="users.html?graph=direct-users&country=${row['cc']}&events=on#direct-users">${row['country']}</a> </td> - <td>${row['downturns']}</td> - <td>${row['upturns']}</td> - </tr> - </c:forEach> -</table> -<hr> -<p><a href="csv/direct-users.csv">CSV</a> file containing daily directly -connecting users by country.</p> -<p><a href="csv/monthly-users-peak.csv">CSV</a> file containing peak daily -Tor users (direct and bridge) per month by country.</p> -<p><a href="csv/monthly-users-average.csv">CSV</a> file containing average -daily Tor users (direct and bridge) per month by country.</p> -<br> - -<a name="bridge-users"></a> -<h3><a href="#bridge-users" class="anchor">Tor users via bridges</a></h3> -<br> -<p>Users who cannot connect directly to the Tor network instead connect -via bridges, which are non-public relays. 
The following graphs display an -estimate of Tor users via bridges based on the unique IP addresses as seen -by a few hundred bridges.</p> -<img src="bridge-users.png${bridge_users_url}" - width="576" height="360" alt="Bridge users graph"> -<form action="users.html#bridge-users"> - <div class="formrow"> - <input type="hidden" name="graph" value="bridge-users"> - <p> - <label>Start date (yyyy-mm-dd):</label> - <input type="text" name="start" size="10" - value="<c:choose><c:when test="${fn:length(bridge_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${bridge_users_start[0]}</c:otherwise></c:choose>"> - <label>End date (yyyy-mm-dd):</label> - <input type="text" name="end" size="10" - value="<c:choose><c:when test="${fn:length(bridge_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${bridge_users_end[0]}</c:otherwise></c:choose>"> - </p><p> - Source: <select name="country"> - <option value="all"<c:if test="${bridge_users_country[0] eq 'all'}"> selected</c:if>>All users</option> - <c:forEach var="country" items="${countries}" > - <option value="${country[0]}"<c:if test="${bridge_users_country[0] eq country[0]}"> selected</c:if>>${country[1]}</option> - </c:forEach> - </select> - </p><p> - <input class="submit" type="submit" value="Update graph"> - </p> - </div> -</form> -<p>Download graph as -<a href="bridge-users.pdf${bridge_users_url}">PDF</a> or -<a href="bridge-users.svg${bridge_users_url}">SVG</a>.</p> -<hr> -<a name="bridge-users-table"></a> -<p><b>Top-10 countries by bridge users:</b></p> -<form action="users.html#bridge-users-table"> - <div class="formrow"> - <input type="hidden" name="table" value="bridge-users"> - <p> - <label>Start date (yyyy-mm-dd):</label> - <input type="text" name="start" size="10" - value="<c:choose><c:when test="${fn:length(bridge_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${bridge_users_start[0]}</c:otherwise></c:choose>"> - <label>End date (yyyy-mm-dd):</label> - <input type="text" 
name="end" size="10" - value="<c:choose><c:when test="${fn:length(bridge_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${bridge_users_end[0]}</c:otherwise></c:choose>"> - </p><p> - <input class="submit" type="submit" value="Update table"> - </p> - </div> -</form> -<br> -<table> - <tr> - <th>Country</th> - <th>Mean daily users</th> - </tr> - <c:forEach var="row" items="${bridge_users_tabledata}"> - <tr> - <td><a href="users.html?graph=bridge-users&country=${row['cc']}#bridge-users">${row['country']}</a> </td> - <td>${row['abs']} (<fmt:formatNumber type="number" minFractionDigits="2" value="${row['rel']}" /> %)</td> - </tr> - </c:forEach> -</table> -<hr> -<p><a href="csv/bridge-users.csv">CSV</a> file containing all data.</p> -<p><a href="csv/monthly-users-peak.csv">CSV</a> file containing peak daily -Tor users (direct and bridge) per month by country.</p> -<p><a href="csv/monthly-users-average.csv">CSV</a> file containing average -daily Tor users (direct and bridge) per month by country.</p> -<br> - -<hr> -<hr> - -<a name="userstats"></a> -<h3><a href="#userstats" class="anchor">New approach to estimating daily -Tor users (BETA)</a></h3> -<br> -<p>As of April 2013, we are experimenting with a new approach to estimating -daily Tor users. -The new approach works very similar to the existing approach to estimate -directly connecting users, but can also be applied to bridge users. -This new approach can break down user numbers by country, pluggable -transport, and IP version. -See the tech report on -<a href="https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf">counting daily bridge users</a> -and the -<a href="https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462">source code</a> -for details. 
- <a name="userstats-relay-country"></a> -<p><b>Direct users by country (BETA):</b></p> - -<font color="red"> -<p>This graph is quite similar to the graphs above, -except for the following differences:</p> -<ul> -<li>In contrast to the graphs above, this graph is based on -requests to directory mirrors <i>and</i> directory authorities. -The idea is that we want to estimate both new and recurring users. -That is why the numbers here are higher.</li> -<li>This graph uses byte histories for written <i>directory bytes</i> -rather than general byte history to weight what fraction of directory -requests a relay has answered in the network.</li> -<li>The implementation behind this graph is much more efficient, which -reduces time to graph from about 3 days to about 1 day.</li> -</ul> -</font> +<p><b>Direct users by country:</b></p>
<img src="userstats-relay-country.png${userstats_relay_country_url}" - width="576" height="360" alt="Direct users by country graph (BETA)"> + width="576" height="360" alt="Direct users by country graph"> <form action="users.html#userstats-relay-country"> <div class="formrow"> <input type="hidden" name="graph" value="userstats-relay-country"> @@ -283,7 +56,7 @@ reduces time to graph from about 3 days to about 1 day.</li> <a href="userstats-relay-country.svg${userstats_relay_country_url}">SVG</a>.</p> <hr> <a name="userstats-relay-table"></a> -<p><b>Top-10 countries by directly connecting users (BETA):</b></p> +<p><b>Top-10 countries by directly connecting users:</b></p> <form action="users.html#userstats-relay-table"> <div class="formrow"> <input type="hidden" name="table" value="userstats-relay"> @@ -349,16 +122,10 @@ reduces time to graph from about 3 days to about 1 day.</li> <hr>
<a name="userstats-bridge-country"></a> -<p><b>Bridge users by country (BETA):</b></p> - -<p> -<font color="red">In contrast to the bridge-user graph above, this graph -uses directory requests to estimate user numbers, not unique IP address sets. -It's yet to be decided which approach is more correct.</font> -</p> +<p><b>Bridge users by country:</b></p>
<img src="userstats-bridge-country.png${userstats_bridge_country_url}" - width="576" height="360" alt="Bridge users by country graph (BETA)"> + width="576" height="360" alt="Bridge users by country graph"> <form action="users.html#userstats-bridge-country"> <div class="formrow"> <input type="hidden" name="graph" value="userstats-bridge-country"> @@ -386,7 +153,7 @@ It's yet to be decided which approach is more correct.</font> <a href="userstats-bridge-country.svg${userstats_bridge_country_url}">SVG</a>.</p> <hr> <a name="userstats-bridge-table"></a> -<p><b>Top-10 countries by bridge users (BETA):</b></p> +<p><b>Top-10 countries by bridge users:</b></p> <form action="users.html#userstats-bridge-table"> <div class="formrow"> <input type="hidden" name="table" value="userstats-bridge"> @@ -418,19 +185,10 @@ It's yet to be decided which approach is more correct.</font> <hr>
<a name="userstats-bridge-transport"></a> -<p><b>Bridge users by transport (BETA):</b></p> - -<p> -<font color="red">Almost none of the currently running bridges report the -transport name of connecting users, which is why non-OR transport usage is -so low. -By default, we consider all users of a bridge OR transport users, unless told -otherwise. -Non-OR transport numbers will become more accurate over time.</font> -</p> +<p><b>Bridge users by transport:</b></p>
<img src="userstats-bridge-transport.png${userstats_bridge_transport_url}" - width="576" height="360" alt="Bridge users by transport graph (BETA)"> + width="576" height="360" alt="Bridge users by transport graph"> <form action="users.html#userstats-bridge-transport"> <div class="formrow"> <input type="hidden" name="graph" value="userstats-bridge-transport"> @@ -460,18 +218,10 @@ Non-OR transport numbers will become more accurate over time.</font> <hr>
<a name="userstats-bridge-version"></a> -<p><b>Bridge users by IP version (BETA):</b></p> - -<p> -<font color="red">Not all of the currently running bridges report the -IP version of connecting users. -By default, we consider all users of a bridge IPv4 users, unless told -otherwise. -IPv6 numbers will become more accurate over time.</font> -</p> +<p><b>Bridge users by IP version:</b></p>
<img src="userstats-bridge-version.png${userstats_bridge_version_url}" - width="576" height="360" alt="Bridge users by IP version graph (BETA)"> + width="576" height="360" alt="Bridge users by IP version graph"> <form action="users.html#userstats-bridge-version"> <div class="formrow"> <input type="hidden" name="graph" value="userstats-bridge-version"> @@ -498,14 +248,143 @@ IPv6 numbers will become more accurate over time.</font> <hr>
<p><a href="csv/userstats.csv">CSV</a> file containing new user -estimates (BETA).</p> +estimates.</p> <p><a href="csv/monthly-userstats-peak.csv">CSV</a> file containing peak -daily Tor users (direct and bridge) per month by country (BETA).</p> +daily Tor users (direct and bridge) per month by country.</p> <p><a href="csv/monthly-userstats-average.csv">CSV</a> file containing -average daily Tor users (direct and bridge) per month by country -(BETA).</p> +average daily Tor users (direct and bridge) per month by country.</p> <br>
+<hr> +<a name="questions-and-answers"></a> +<p><b>Questions and answers</b></p> +<p> +Q: How is it even possible to count users in an anonymity network?<br/> +A: We actually don't count users, but we count requests to the directories +that clients make periodically to update their list of relays and estimate +user numbers indirectly from there. +</p> +<p> +Q: Do all directories report these directory request numbers?<br/> +A: No, but we can see what fraction of directories reported them, and then +we can extrapolate the total number in the network. +</p> + +<p> +Q: How do you get from these directory requests to user numbers?<br/> +A: We put in the assumption that the average client makes 10 such requests +per day. A tor client that is connected 24/7 makes about 15 requests per +day, but not all clients are connected 24/7, so we picked the number 10 +for the average client. We simply divide directory requests by 10 and +consider the result as the number of users. +</p> + +<p> +Q: So, are these distinct users per day, average number of users connected +over the day, or what?<br/> +A: Average number of users connected over the day. We can't say how many +distinct users there are. +</p> + +<p> +Q: Are these tor clients or users? What if there's more than one user +behind a tor client?<br/> +A: Then we count those users as one. We really count clients, but it's +more intuitive for most people to think of users, that's why we say users +and not clients. +</p> + +<p> +Q: What if a user runs tor on a laptop and changes their IP address a few +times per day? Don't you overcount that user?<br/> +A: No, because that user updates their list of relays as often as a user +that doesn't change IP address over the day. +</p> + +<p> +Q: How do you know which countries users come from?<br/> +A: The directories resolve IP addresses to country codes and report these +numbers in aggregate form. This is one of the reasons why tor ships with +a GeoIP database. 
+</p> + +<p> +Q: Why are there so few bridge users that are not using the default OR +protocol or that are using IPv6?<br/> +A: Very few bridges report data on transports or IP versions yet, and by +default we consider requests to use the default OR protocol and IPv4. +Once more bridges report these data, the numbers will become more +accurate. +</p> + +<p> +Q: Why do the graphs end 2 days in the past and not today?<br/> +A: Relays and bridges report some of the data in 24-hour intervals which +may end at any time of the day. And after such an interval is over relays +and bridges might take another 18 hours to report the data. We cut off +the last two days from the graphs, because we want to avoid that the last +data point in a graph indicates a recent trend change which is in fact +just an artifact of the algorithm. +</p> + +<p> +Q: But I noticed that the last data point went up/down a bit since I last +looked a few hours ago. Why is that?<br/> +A: You're an excellent observer! The reason is that we publish user +numbers once we're confident enough that they won't change significantly +anymore. But it's always possible that a directory reports data a few +hours after we were confident enough, but which then slightly changed the +graph. +</p> + +<p> +Q: Why are no numbers available before September 2011?<br/> +A: We do have descriptor archives from before that time, but those +descriptors didn't contain all the data we use to estimate user numbers. +We do have older user numbers from an earlier estimation approach here +(add link), but we believe the current approach is more accurate. +</p> + +<p> +Q: Why do you believe the current approach to estimate user numbers is +more accurate?<br/> +A: For direct users, we include all directories which we didn't do in the +old approach. We also use histories that only contain bytes written to +answer directory requests, which is more precise than using general byte +histories. 
+</p> + +<p> +Q: And what about the advantage of the current approach over the old one +when it comes to bridge users?<br/> +A: Oh, that's a whole different story. We wrote a 13 page long +<a href="https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf">technical +report</a> explaining the reasons for retiring the old approach. But the +old data is still <a href="/data/old-user-number-estimates.tar.gz">available</a>. +tl;dr: in the old approach we measured the wrong thing, and now we measure +the right thing. +</p> + +<p> +Q: Are the data and the source code for estimating these user numbers +available?<br/> +A: Sure, <a href="/data.html">data</a> and +<a href="https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462">source +code</a> are publicly available. +</p> + +<p> +Q: What are these red and blue dots indicating possible censorship +events?<br/> +A: We run an anomaly-based censorship-detection system that looks at +estimated user numbers over a series of days and predicts the user number +in the next days. If the actual number is higher or lower, this might +indicate a possible censorship event or release of censorship. For more +details, see our +<a href="https://research.torproject.org/techreports/detector-2011-09-09.pdf">technical +report</a>. +</p> + </div> </div> <div class="bottom" id="bottom">
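Note on the arithmetic in the "Questions and answers" text added by this patch: the answers describe extrapolating reported directory requests to the whole network and then dividing by 10 requests per client per day. The following is only a minimal sketch of that calculation, not the code from task-8462. The function name `estimate_daily_users`, its parameters, and the sample numbers are hypothetical, and the real implementation may weight the reporting fraction differently (the Q&A mentions written directory bytes).

```python
def estimate_daily_users(reported_requests, reporting_fraction,
                         requests_per_client=10):
    """Estimate the average number of connected clients for one day.

    reported_requests: directory requests counted by the directories
        that reported statistics (hypothetical input).
    reporting_fraction: share of the network's directory requests that
        those reporting directories are assumed to have answered (0..1].
    requests_per_client: assumed daily requests of an average client;
        the Q&A text uses 10.
    """
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    if requests_per_client <= 0:
        raise ValueError("requests_per_client must be positive")

    # Extrapolate from the reporting directories to the whole network.
    total_requests = reported_requests / reporting_fraction

    # Convert requests to users with the assumed per-client request rate.
    return total_requests / requests_per_client


# Made-up example: 100,000 requests seen by directories that are assumed
# to have handled half of all directory requests on that day.
example_estimate = estimate_daily_users(100_000, 0.5)
```

With these made-up inputs, the extrapolated total is 200,000 requests, which the assumed rate of 10 requests per client turns into 20,000 users. As the Q&A says, this is an average over the day, not a count of distinct users.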