commit f995b0f64febfe288af0f09efe0ef3b68c02100d
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Thu Jun 26 16:43:03 2014 +0200
Move Q-and-A about user statistics to a text file.
These questions and answers are likely read by less than 10% of visitors, and the remaining 90% wonder what that wall of text is. We should write a more general Q-and-A section covering the entire website. Whoever cares about how user statistics are calculated can read the text file.
Tweak the Q's and A's a tiny bit while converting them to plain text.
---
 doc/users-q-and-a.txt         |  94 +++++++++++++++++++++++++++++
 website/web/WEB-INF/users.jsp | 131 +----------------------------------------
 2 files changed, 96 insertions(+), 129 deletions(-)
diff --git a/doc/users-q-and-a.txt b/doc/users-q-and-a.txt
new file mode 100644
index 0000000..15a1084
--- /dev/null
+++ b/doc/users-q-and-a.txt
@@ -0,0 +1,94 @@
+Questions and answers about user statistics
+===========================================
+
+Q: How is it even possible to count users in an anonymity network?
+A: We actually don't count users, but we count requests to the directories
+that clients make periodically to update their list of relays and estimate
+user numbers indirectly from there.
+
+Q: Do all directories report these directory request numbers?
+A: No, but we can see what fraction of directories reported them, and then
+we can extrapolate the total number in the network.
+
+Q: How do you get from these directory requests to user numbers?
+A: We put in the assumption that the average client makes 10 such requests
+per day. A tor client that is connected 24/7 makes about 15 requests per
+day, but not all clients are connected 24/7, so we picked the number 10
+for the average client. We simply divide directory requests by 10 and
+consider the result as the number of users. Another way of looking at it,
+is that we assume that each request represents a client that stays online
+for one tenth of a day, so 2 hours and 24 minutes.
+
+Q: So, are these distinct users per day, average number of users connected
+over the day, or what?
+A: Average number of concurrent users, estimated from data collected over
+a day. We can't say how many distinct users there are.
+
+Q: Are these tor clients or users? What if there's more than one user
+behind a tor client?
+A: Then we count those users as one. We really count clients, but it's
+more intuitive for most people to think of users, that's why we say users
+and not clients.
+
+Q: What if a user runs tor on a laptop and changes their IP address a few
+times per day? Don't you overcount that user?
+A: No, because that user updates their list of relays as often as a user
+that doesn't change IP address over the day.
+
+Q: How do you know which countries users come from?
+A: The directories resolve IP addresses to country codes and report these
+numbers in aggregate form. This is one of the reasons why tor ships with
+a GeoIP database.
+
+Q: Why are there so few bridge users that are not using the default OR
+protocol or that are using IPv6?
+A: Very few bridges report data on transports or IP versions yet, and by
+default we consider requests to use the default OR protocol and IPv4.
+Once more bridges report these data, the numbers will become more
+accurate.
+
+Q: Why do the graphs end 2 days in the past and not today?
+A: Relays and bridges report some of the data in 24-hour intervals which
+may end at any time of the day. And after such an interval is over relays
+and bridges might take another 18 hours to report the data. We cut off
+the last two days from the graphs, because we want to avoid that the last
+data point in a graph indicates a recent trend change which is in fact
+just an artifact of the algorithm.
+
+Q: But I noticed that the last data point went up/down a bit since I last
+looked a few hours ago. Why is that?
+A: The reason is that we publish user numbers once we're confident enough
+that they won't change significantly anymore. But it's always possible
+that a directory reports data a few hours after we were confident enough,
+but which then slightly changed the graph.
+
+Q: Why are no numbers available before September 2011?
+A: We do have descriptor archives from before that time, but those
+descriptors didn't contain all the data we use to estimate user numbers.
+
+Q: Why do you believe the current approach to estimate user numbers is
+more accurate?
+A: For direct users, we include all directories which we didn't do in the
+old approach. We also use histories that only contain bytes written to
+answer directory requests, which is more precise than using general byte
+histories.
+
+Q: And what about the advantage of the current approach over the old one
+when it comes to bridge users?
+A: Oh, that's a whole different story. We wrote a 13 page long technical
+report explaining the reasons for retiring the old approach. tl;dr: in
+the old approach we measured the wrong thing, and now we measure the right
+thing.
+
+ https://research.torproject.org/techreports/counting-daily-bridge-users-2012...
+
+Q: What are these red and blue dots indicating possible censorship
+events?
+A: We run an anomaly-based censorship-detection system that looks at
+estimated user numbers over a series of days and predicts the user number
+in the next days. If the actual number is higher or lower, this might
+indicate a possible censorship event or release of censorship. For more
+details, see our technical report.
+
+ https://research.torproject.org/techreports/detector-2011-09-09.pdf
+
diff --git a/website/web/WEB-INF/users.jsp b/website/web/WEB-INF/users.jsp
index 84cab43..0a31569 100644
--- a/website/web/WEB-INF/users.jsp
+++ b/website/web/WEB-INF/users.jsp
@@ -269,136 +269,9 @@ estimates.</p>
 <br>
 <hr>
-<a name="questions-and-answers"></a>
-<p><b>Questions and answers</b></p>
-<p>
-Q: How is it even possible to count users in an anonymity network?<br/>
-A: We actually don't count users, but we count requests to the directories
-that clients make periodically to update their list of relays and estimate
-user numbers indirectly from there.
-</p>
-<p>
-Q: Do all directories report these directory request numbers?<br/>
-A: No, but we can see what fraction of directories reported them, and then
-we can extrapolate the total number in the network.
-</p>
-<p>
-Q: How do you get from these directory requests to user numbers?<br/>
-A: We put in the assumption that the average client makes 10 such requests
-per day. A tor client that is connected 24/7 makes about 15 requests per
-day, but not all clients are connected 24/7, so we picked the number 10
-for the average client. We simply divide directory requests by 10 and
-consider the result as the number of users. Another way of looking at it,
-is that we assume that each request represents a client that stays online
-for 2 hours and 24 minutes.
-</p>
-
-<p>
-Q: So, are these distinct users per day, average number of users connected
-over the day, or what?<br/>
-A: Average number of concurrent users, estimated from data collected over
-a day. We can't say how many distinct users there are.
-</p>
-
-<p>
-Q: Are these tor clients or users? What if there's more than one user
-behind a tor client?<br/>
-A: Then we count those users as one. We really count clients, but it's
-more intuitive for most people to think of users, that's why we say users
-and not clients.
-</p>
-
-<p>
-Q: What if a user runs tor on a laptop and changes their IP address a few
-times per day? Don't you overcount that user?<br/>
-A: No, because that user updates their list of relays as often as a user
-that doesn't change IP address over the day.
-</p>
-
-<p>
-Q: How do you know which countries users come from?<br/>
-A: The directories resolve IP addresses to country codes and report these
-numbers in aggregate form. This is one of the reasons why tor ships with
-a GeoIP database.
-</p>
-
-<p>
-Q: Why are there so few bridge users that are not using the default OR
-protocol or that are using IPv6?<br/>
-A: Very few bridges report data on transports or IP versions yet, and by
-default we consider requests to use the default OR protocol and IPv4.
-Once more bridges report these data, the numbers will become more
-accurate.
-</p>
-
-<p>
-Q: Why do the graphs end 2 days in the past and not today?<br/>
-A: Relays and bridges report some of the data in 24-hour intervals which
-may end at any time of the day. And after such an interval is over relays
-and bridges might take another 18 hours to report the data. We cut off
-the last two days from the graphs, because we want to avoid that the last
-data point in a graph indicates a recent trend change which is in fact
-just an artifact of the algorithm.
-</p>
-
-<p>
-Q: But I noticed that the last data point went up/down a bit since I last
-looked a few hours ago. Why is that?<br/>
-A: You're an excellent observer! The reason is that we publish user
-numbers once we're confident enough that they won't change significantly
-anymore. But it's always possible that a directory reports data a few
-hours after we were confident enough, but which then slightly changed the
-graph.
-</p>
-
-<p>
-Q: Why are no numbers available before September 2011?<br/>
-A: We do have descriptor archives from before that time, but those
-descriptors didn't contain all the data we use to estimate user numbers.
-We do have older user numbers from an earlier estimation approach
-<a href="/data/old-user-number-estimates.tar.gz">here</a>, but we believe
-the current approach is more accurate.
-</p>
-
-<p>
-Q: Why do you believe the current approach to estimate user numbers is
-more accurate?<br/>
-A: For direct users, we include all directories which we didn't do in the
-old approach. We also use histories that only contain bytes written to
-answer directory requests, which is more precise than using general byte
-histories.
-</p>
-
-<p>
-Q: And what about the advantage of the current approach over the old one
-when it comes to bridge users?<br/>
-A: Oh, that's a whole different story. We wrote a 13 page long
-<a href="https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf">technical
-report</a> explaining the reasons for retiring the old approach.
-tl;dr: in the old approach we measured the wrong thing, and now we measure
-the right thing.
-</p>
-
-<p>
-Q: Are the data and the source code for estimating these user numbers
-available?<br/>
-A: Sure, <a href="/data.html">data</a> and
-<a href="https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462">source
-code</a> are publicly available.
-</p>
-
-<p>
-Q: What are these red and blue dots indicating possible censorship
-events?<br/>
-A: We run an anomaly-based censorship-detection system that looks at
-estimated user numbers over a series of days and predicts the user number
-in the next days. If the actual number is higher or lower, this might
-indicate a possible censorship event or release of censorship. For more
-details, see our
-<a href="https://research.torproject.org/techreports/detector-2011-09-09.pdf">technical
-report</a>.
-</p>
+<p><a href="https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/doc/users-q-and-a.txt">Questions
+and answers about users statistics</a></p>
 </div>
 </div>
tor-commits@lists.torproject.org
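
The arithmetic described in the Q-and-A text moved by this commit can be sketched as follows: extrapolate the reported directory requests to the whole network, then divide by 10 requests per client per day. This is a minimal illustration, not the actual metrics code. The function names, the sample numbers, and the plain division by the reporting fraction are assumptions; only the constant 10 and the resulting 2 hours and 24 minutes come from the text.

```python
# Assumed average number of directory requests a client makes per day,
# as stated in the Q-and-A text.
REQUESTS_PER_CLIENT_PER_DAY = 10


def estimate_users(reported_requests: float, reporting_fraction: float) -> float:
    """Estimate average concurrent users from one day of directory requests.

    reported_requests:  requests counted by the directories that reported.
    reporting_fraction: share of directories that reported, in (0, 1].
    Dividing by the fraction is a simplifying assumption about how the
    extrapolation to the whole network is done.
    """
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    total_requests = reported_requests / reporting_fraction
    return total_requests / REQUESTS_PER_CLIENT_PER_DAY


def implied_online_minutes() -> float:
    """Online time that one request stands for: one tenth of a day."""
    return 24 * 60 / REQUESTS_PER_CLIENT_PER_DAY


if __name__ == "__main__":
    # Hypothetical input: 500,000 requests seen by half of the directories.
    print(estimate_users(500_000, 0.5))
    print(implied_online_minutes())
```

With the hypothetical input above, half of the directories reporting 500,000 requests extrapolates to 1,000,000 requests, hence an estimate of 100,000 users; one request stands for 144 minutes of online time.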