commit 75f5c0c13c2a858ad77309b4b468b39f1003721c Author: Karsten Loesing karsten.loesing@gmx.net Date: Sun Feb 8 19:19:10 2015 +0100
Tweak extrapolation report before publication. --- .../extrapolating-hidserv-stats.tex | 482 ++++++++++++-------- 1 file changed, 285 insertions(+), 197 deletions(-)
diff --git a/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex b/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex index 053a081..bef857f 100644 --- a/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex +++ b/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex @@ -4,32 +4,69 @@ \usepackage{url} \begin{document}
-\title{Extrapolating network totals from hidden-service statistics} +\title{Extrapolating network totals\\from hidden-service statistics}
-\author{yet unnamed authors} +\author{George Kadianakis and Karsten Loesing}
-\reportid{DRAFT} -\date{to be published in January 2015} +\contact{ +\href{mailto:asn@torproject.org}{asn@torproject.org},% +\href{mailto:karsten@torproject.org}{karsten@torproject.org}} + +\reportid{2015-01-001} +\date{January 31, 2015}
\maketitle
\begin{abstract} Starting on December 19, 2014, we added two new statistics to the Tor software that shall give us some first insights into hidden-service usage. -The first is the number of .onion addresses observed by a hidden-service -directory, and the second is the number of cells on rendezvous circuits -observed by a rendezvous point. +The first statistic is the number of cells on rendezvous circuits observed +by a rendezvous point, and the second is the number of unique .onion +addresses observed by a hidden-service directory. Each relay that opts in to reporting these statistics publishes these two numbers for 24-hour intervals of operation. -In the following, we explain our approach for extrapolating network totals +In the following, we describe an approach for extrapolating network totals from these statistics. -The goal is to learn how many unique .onion addresses exist in the network -and what amount of traffic can be attributed to hidden-service usage. -We show that we can extrapolate network totals from hidden-service -statistics with reasonable accuracy as long as at least 1% of relays -report these statistics. +The goal is to learn what amount of traffic can be attributed to +hidden-service usage and how many unique .onion addresses exist in the +network. +We show that we can extrapolate network totals with reasonable accuracy as +long as at least 1% of relays report these statistics. \end{abstract}
+\section*{Introduction} + +As of December 19, 2014, a small number of relays have started reporting +statistics on hidden-service usage. +Similar to other statistics, these statistics are based solely on what the +reporting relay observes, without exchanging observations with other +relays. +In this report we describe a method for extrapolating these statistics to +network totals. + +\begin{figure} +\centering +\includegraphics[width=.8\textwidth]{overview.pdf} +\caption{Overview of the method used for extrapolating network totals +from hidden-service statistics.} +\label{fig:overview} +\end{figure} + +Figure~\ref{fig:overview} gives an overview of the extrapolation method, +where each step corresponds to a section in this report. +In step~1 we parse the statistics that relays report in their extra-info +descriptors. +These statistics contain noise that was added by relays to obfuscate +original observations, which we attempt to remove in step~2. +In step~3 we process consensuses to derive network fractions of reporting +relays, that is, what fraction of hidden-service usage a relay should have +observed. +We use these fractions to remove implausible statistics in step~4. +Then we extrapolate network totals in step~5, where each extrapolation is +based on the report from a single relay. +Finally, in step~6 we select daily averages from these network totals, +which constitute our results. + \section{Parsing reported statistics}
There are two types of documents produced by Tor relays that we consider @@ -40,6 +77,19 @@ The second are consensuses that indicate what fraction of hidden-service descriptors a hidden-service directory has observed and what fraction of rendezvous circuits a relay has handled.
+We start by describing how we're parsing and processing hidden-service +statistics from extra-info descriptors. +Figure~\ref{fig:num-reported-stats} shows the number of statistics +reported by day, and Figure~\ref{fig:extrainfo} shows a sample. +The relevant parts for this analysis are: + +\begin{figure}[b] +\centering +\includegraphics[width=\textwidth]{graphics/num-reported-stats.pdf} +\caption{Number of reported hidden-service statistics.} +\label{fig:num-reported-stats} +\end{figure} + % SAMPLE: % fingerprint F528DED21EACD2E4E9301EC0AABD370EDCAD2C47 % stats_start 2014-12-31 16:17:33 @@ -49,7 +99,7 @@ rendezvous circuits a relay has handled. % prob_rend_point 0.01509326 % frac_hsdesc 0.00069757
-\begin{figure}[b] +\begin{figure} \begin{verbatim} extra-info ryroConoha F528DED21EACD2E4E9301EC0AABD370EDCAD2C47 [...] @@ -62,12 +112,6 @@ descriptor.} \label{fig:extrainfo} \end{figure}
-We start by describing how we're parsing and processing hidden-service -statistics from extra-info descriptors. -Figure~\ref{fig:extrainfo} shows a sample of hidden-service statistics as -contained in extra-info descriptors. -The relevant parts for this analysis are: - \begin{itemize} \item The \verb+extra-info+ line tells us which relay reported these statistics, which we need to know to derive what fraction of @@ -81,21 +125,14 @@ The value for \verb+bin_size+ is the bin size used for rounding up the originally observed cell number, and the values for \verb+delta_f+ and \verb+epsilon+ are inputs for the additive noise following a Laplace distribution. +For more information on how obfuscation is performed, please see Tor +proposal 238.% +\footnote{\url{https://gitweb.torproject.org/torspec.git/tree/proposals/238-hs-relay-stats.... \item And finally, the \verb+hidserv-dir-onions-seen+ line tells us the number of .onion addresses that the relay observed in published hidden-service descriptors in its role as hidden-service directory. \end{itemize}
-\begin{figure} -\centering -\includegraphics[width=\textwidth]{graphics/num-reported-stats.pdf} -\caption{Number of relays reporting hidden-service statistics.} -\label{fig:num-reported-stats} -\end{figure} - -Figure~\ref{fig:num-reported-stats} shows the number of statistics -reported by day. - \section{Removing previously added noise}
When processing hidden-service statistics, we need to handle the fact that @@ -112,24 +149,19 @@ Following these steps, the statistics reported in Figure~\ref{fig:extrainfo} are processed to 152599040~cells and 84~.onion addresses. For the subsequent analysis we're also converting cells/day to -bytes/second by multiplying cell numbers with 512~bytes/cell, dividing by -86400~seconds/day, and dividing by 2 to account for the fact that -statistics include cells in both incoming and outgoing direction. -As a result we obtain 452~KB/s in the given sample. +bits/second by multiplying cell numbers with 512~bytes/cell, multiplying +with 8~bits/byte, dividing by 86400~seconds/day, and dividing by 2 to +account for the fact that statistics include cells in both incoming and +outgoing direction. +As a result we obtain 3.6~Mbit/s in the given sample.
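The unit conversion for the sample value can be restated in a few lines of Python, a plain transcription of the arithmetic described in the text:

```python
cells_per_day = 152599040           # denoised sample value from the text
bytes_per_day = cells_per_day * 512     # 512 bytes per cell
bits_per_day = bytes_per_day * 8        # 8 bits per byte
bits_per_second = bits_per_day / 86400  # 86400 seconds per day
bandwidth = bits_per_second / 2     # cells are counted in both directions
print(round(bandwidth / 1e6, 1))    # 3.6 (Mbit/s)
```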
Figure~\ref{fig:stats-by-day} shows parsed values after removing previously added noise. Negative values are the result of relays adding negative -Laplace-distributed noise values to very small observed values. -We will describe an attempt to remove such values shortly. -\footnote{A plausible step three in the previously described process could -have been to round negative values to 0, because that represents the most -likely rounded value before Laplace noise was added. -However, removing negative values would add bias to the result, because it -would only remove negative noise without being able to detect and remove -positive noise. -That's why we'd rather want to remove implausible values based on other -criteria.} +Laplace-distributed noise values to very small observed values, which we +cannot remove easily. +We will describe an attempt to remove such values in +Sections~\ref{sec:implausible} and \ref{sec:averages}.
\begin{figure} \centering @@ -144,12 +176,12 @@ Laplace-distributed noise values to very small observed values.} \section{Deriving network fractions from consensuses}
The second document type that we consider in our analysis are consensuses. -Not all hidden-service directories observe the same number of -hidden-service descriptors, and the probability of chosing a relay as -rendezvous point is even less uniformly distributed. -Fortunately, we can derive what fraction of descriptors a directory was -responsible for and what fraction of rendezvous circuits a relay has -handled. +The probability of choosing a relay as rendezvous point varies a lot +between relays, and not all hidden-service directories handle the same +number of hidden-service descriptors. +Fortunately, we can derive what fraction of rendezvous circuits a relay +has handled and what fraction of descriptors a directory was responsible +for.
\begin{figure} \begin{verbatim} @@ -179,11 +211,33 @@ directories preceding it.} \end{figure}
Figure~\ref{fig:consensusentry} shows the consensus entry of the relay -that submitted the sample hidden-service statistics mentioned above. +that submitted the sample hidden-service statistics mentioned above, plus +neighboring consensus entries. + +The first fraction that we compute is the probability of a relay to be +selected as rendezvous point. +Clients only select relays with the \verb+Fast+ flag and in some cases the +\verb+Stable+ flag, and they weight relays differently based on their +bandwidth and depending on whether they have the \verb+Exit+ and/or +\verb+Guard+ flags. +(Clients require relays to have the \verb+Stable+ flag if they attempt to +establish a long-running connection, e.g., to a hidden SSH server, but in +the following analysis, we assume that most clients establish connections +that don't need to last for long, e.g., to hidden webservers.) +Clients weight the bandwidth value contained in the consensus entry with +the value of \verb+Wmg+, \verb+Wme+, \verb+Wmd+, or \verb+Wmm+, depending +on whether the relay has only the \verb+Guard+ flag, only the \verb+Exit+ +flag, both such flags, or neither of them. + +Our sample relay, \texttt{ryroConoha}, has the \verb+Fast+ flag, a +bandwidth value of 117000, and neither \verb+Guard+ nor \verb+Exit+ flag. +Its probability for being selected as rendezvous point is calculated as +$117000 \times 10000/10000$ divided by the sum of all such weights in the +consensus, in this case $1.42%$.
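This weighting can be sketched as follows. The consensus-wide sum of weights below is a hypothetical stand-in chosen so that the sample works out to the 1.42% from the text; it is not taken from a real consensus.

```python
def rend_weight(bandwidth, wm):
    # Consensus bandwidth weights Wmg/Wme/Wmd/Wmm are expressed in
    # units of 1/10000.
    return bandwidth * wm / 10000

# ryroConoha has neither the Guard nor the Exit flag, so Wmm applies.
relay_weight = rend_weight(117000, 10000)
total_weight = 8_239_437  # hypothetical sum over all eligible relays
print(round(relay_weight / total_weight, 4))  # 0.0142
```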
-The first fraction that we can derive from this entry is the fraction of -descriptor space that this relay was responsible for in its role as -hidden-service directory. +The second fraction that we can derive from this consensus entry is the +fraction of descriptor space that this relay was responsible for in its +role as hidden-service directory. The Tor Rendezvous Specification\footnote{\url{https://gitweb.torproject.org/torspec.git/tree/rend-spec.txt}} contains the following definition that is relevant here: @@ -195,68 +249,66 @@ three identity digests of HSDir relays following the descriptor ID in a circular list.} \end{quote}
+Based on the fraction of descriptor space that a directory was responsible +for we can compute the fraction of descriptors that this directory has +seen. +Intuitively, one might think that these fractions are the same. +However, this is not the case: each descriptor that is published to a +directory is also published to two other directories. +As a result we need to divide the fraction of descriptor space by +\emph{three} to obtain the fraction of descriptors observed by the +directory. +Note that, without dividing by three, fractions of all directories would +not add up to 100%. + In the sample consensus entry, we'd extract the base64-encoded fingerprint of the statistics-reporting relay, \verb+9Sje0h6...+, and the fingerprint of the hidden-service directory that precedes the relay by three positions, \verb+9PodlaV...+, and compute what fraction of descriptor -space that is, in this case $0.07%$. +space that is, in this case $0.071%$. +So, the relay has observed $0.024%$ of descriptors in the network.
-The second fraction that we compute is the probability of a relay to be -selected as rendezvous point. -Clients select only relays with the \verb+Fast+ and in some cases the -\verb+Stable+ flag, and they weigh relays differently based on their -bandwidth and depending on whether they have the \verb+Exit+ and/or -\verb+Guard+ flags. -(Clients further require relays to have the \verb+Stable+ flag if they -attempt to establish a long-running connection, e.g., to a hidden SSH -server, but in the following analysis, we assume that most clients -establish connections that don't need to last for long, e.g., to a hidden -webserver.) -Clients weigh the bandwidth value contained in the consensus with the -value of \verb+Wmg+, \verb+Wme+, \verb+Wmd+, or \verb+Wmm+, depending on -whether the relay has only the \verb+Guard+ flag, only the \verb+Exit+ -flag, both such flags, or neither of them. - -Our sample relay has the \verb+Fast+ flag, a bandwidth value of 117,000, -and neither \verb+Guard+ nor \verb+Exit+ flag. -Its probability for being selected as rendezvous point is calculated as -$117000 \times 10000/10000$ divided by the sum of all such weights in the -consensus, in this case $1.42%$ +% 9Sje0h6... -> F528DED2 -> 4113096402 +% 9PodlaV... -> F4FA1D95 -> - 4110032277 +% = 3064125 +% / 4294967296 +% = 0.00071342 +% / 3 +% = 0.00023781
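The computation shown in the comment can be reproduced with a short sketch. Like the comment, it compares only the first 32 bits of the identity digests as an approximation; the actual descriptor space is the full 160-bit ring.

```python
def descriptor_space_fraction(hsdir_hex, predecessor_hex):
    # Distance between the two fingerprint prefixes on a circular ring
    # of size 2**32, using only the first 32 bits of each digest.
    a = int(hsdir_hex, 16)
    b = int(predecessor_hex, 16)
    return ((a - b) % 2**32) / 2**32

space = descriptor_space_fraction("F528DED2", "F4FA1D95")
onions_fraction = space / 3  # each descriptor goes to three directories
print(round(space * 100, 3))            # 0.071 (percent of descriptor space)
print(round(onions_fraction * 100, 3))  # 0.024 (percent of descriptors seen)
```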
\begin{figure} \centering \includegraphics[width=\textwidth]{graphics/probs-by-relay.pdf} -\caption{Calculated probabilities for observing hidden-service activity.} +\caption{Calculated network fractions of relays observing hidden-service activity.} \label{fig:probs-by-relay} \end{figure}
-Figure~\ref{fig:probs-by-relay} shows calculated probabilities of -observing hidden-service activities of relays reporting hidden-service +Figure~\ref{fig:probs-by-relay} shows calculated fractions of +hidden-service activity observed by relays that report hidden-service statistics. -That figure shows that most relays have roughly the same (small) -probability for observing a hidden-service descriptor with only few -outliers. -The probability for being selected as rendezvous point is much smaller for -most relays, with only the outliers having a realistic chance of being +The probability for being selected as rendezvous point is very small for +most relays, with only very few relays having a realistic chance of being selected. +In comparison, most relays have roughly the same (small) probability for +observing a hidden-service descriptor with only a few exceptions.
\section{Removing implausible statistics} +\label{sec:implausible}
A relay that opts in to gathering hidden-service statistics reports them even if it couldn't plausibly have observed them. -In particular, a relay that did not have the \verb+HSDir+ flag could not -have observed a single .onion address, and a relay with the \verb+Exit+ -flag could not have been selected as rendezvous point as long as -\verb+Wmd+ and \verb+Wme+ are zero. +In particular, a relay with the \verb+Exit+ flag could not have been +selected as rendezvous point as long as \verb+Wmd+ and \verb+Wme+ are +zero, and a relay that did not have the \verb+HSDir+ flag could not have +observed a single .onion address. + Figure~\ref{fig:zero} shows distributions of reported statistics of relays -with calculated probabilities of exactly zero. +with calculated fractions of exactly zero. These reported values approximately follow the plotted Laplace distributions with $\mu=0$ and $b=2048/0.3$ or $b=8/0.3$ as defined for -the respective statistics. -We can assume that the vast majority of these reported values are just -noise. -In the following analysis, we exclude relays with calculated probabilities -of exactly 0. +the respective statistics, which gives us confidence that the vast +majority of these reported values are just noise. +In the following analysis, we exclude relays with calculated fractions of +exactly 0.
\begin{figure} \centering @@ -271,38 +323,36 @@ of exactly 0. \caption{Statistics reported by relays with calculated probabilities of observing these statistics of zero. The blue lines show Laplace distributions with $\mu=0$ and $b=2048/0.3$ or -$b=8/0.3$ as defined for the respective statistics.} +$b=8/0.3$ as defined for the respective statistics. +The lowest 1% and highest 1% of values have been removed for display +purposes.} \label{fig:zero} \end{figure}
-Another cause for implausible statistics could be very large positive or -negative noise added by the Laplace distribution. +Another kind of implausible statistics consists of very high or very low +absolute reported numbers. +These numbers could be the result of adding very large positive or +negative numbers from the Laplace distribution. In theory, a single relay, with non-zero probability of observing hidden-service activity, could have added noise from $-\infty$ to -$\infty$, which could derail statistics for the entire day. -These extreme values could be removed by calculating an interval of -plausible values for each relay, based on the probability of observing -hidden-service activity, and discarding values outside that interval. -Another option for avoiding these extreme values would be to cap noise -added at relays by adapting concepts from ($\epsilon,\delta)$-differential -privacy to the noise-generating code used by relays.% -\footnote{Whether or not either of these approaches is necessary depends -on whether or not our extrapolation method can handle outliers.} - -\section{Extrapolating hidden-service traffic in the network} - -We start the extrapolation of network totals with reported cells on -rendezvous circuits. -We do this by summing up all observations per day and dividing by the -total fraction of observations made by all reporting relays. -The underlying assumption of this approach is that reported statistics -grow linearly with calculated fractions. -Figure~\ref{fig:corr-probs-by-relay}~(left) shows that this is roughly -the case. -Figure~\ref{fig:corr-probs-by-day}~(left) shows total reported -statistics and calculated probabilities per day, and -Figure~\ref{fig:extrapolated-network-totals}~(bottom) shows extrapolated -network totals based on daily sums. +$\infty$. +Further, relays could lie about hidden-service usage and report very low +or very high absolute values in their statistics in an attempt to derail +statistics.
+It seems difficult to define a range of plausible values, and such a range +might change over time. +It seems easier to handle these extreme values by treating a certain +fraction of extrapolated statistics as outliers, which is what we're going +to do in Section~\ref{sec:averages}. + +\section{Extrapolating network totals} + +We are now ready to extrapolate network totals from reported statistics. +We do this by dividing reported statistics by the calculated fraction of +observations made by the reporting relay. +The underlying assumption is that statistics grow linearly with calculated +fractions. +Figure~\ref{fig:corr-probs-by-relay} shows that this is roughly the case.
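For the sample relay, this single-relay extrapolation looks as follows, using the `prob_rend_point` value from the sample statistics; the resulting network total is only illustrative, since it comes from one relay on one day.

```python
stat_mbit_s = 3.6        # denoised rendezvous-cell statistic, in Mbit/s
prob_rend = 0.01509326   # this relay's rendezvous-point probability
network_total = stat_mbit_s / prob_rend
print(round(network_total))  # 239 (Mbit/s of rendezvous traffic network-wide)
```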
\begin{figure} \centering @@ -319,32 +369,9 @@ calculated probability for observing such activity.} \label{fig:corr-probs-by-relay} \end{figure}
-\begin{figure} -\centering -\begin{subfigure}{.5\textwidth} -\centering -\includegraphics[width=\textwidth]{graphics/corr-probs-cells-by-day.pdf} -\end{subfigure}% -\begin{subfigure}{.5\textwidth} -\centering -\includegraphics[width=\textwidth]{graphics/corr-probs-onions-by-day.pdf} -\end{subfigure}% -\caption{Correlation between the sum of all reports per day and the sum of -calculated probabilities for observing such activity per day.} -\label{fig:corr-probs-by-day} -\end{figure} - -\begin{figure} -\centering -\includegraphics[width=\textwidth]{graphics/extrapolated-network-totals.pdf} -\caption{Extrapolated network totals.} -\label{fig:extrapolated-network-totals} -\end{figure} - -\section{Estimating unique .onion addresses in the network} - -Estimating the number of .onion addresses in the network is slightly more -difficult. +While we can expect this method to work as described for extrapolating +cells on rendezvous circuits, we need to take another step for estimating +the number of unique .onion addresses in the network. The reason is that a .onion address is not only known to a single relay, but to a couple of relays, all of which include that .onion address in their statistics. @@ -369,49 +396,111 @@ statistics. However, for the subsequent analysis, we assume that neither of these cases affects results substantially.
-Similar to the analysis of hidden-service traffic, we want to compute the -fraction of hidden-service activity that a directory observes, where -hidden-service activity means publication of a hidden-service descriptor. -We define this fraction as the part of descriptor space that the directory -is responsible for, divided by \emph{three}, because each descriptor -published to this descriptor is also published to two other directories. -Note that, without dividing the fraction of a relay's descriptor space by -three, fractions would not add up to 100%. -Figure~\ref{fig:corr-probs-by-relay}~(right) shows the correlation of -reported .onion addresses and fraction of hidden-service activity. - -We can now extrapolate reported unique .onion addresses to network totals: -we sum up all reported statistics for a given day, divide by the fraction -of hidden-service activity that we received statistics for on that day, -and divide the result by twelve, following the assumption from above that -each service publishes its descriptor to twelve hidden-service -directories. -Figure~\ref{fig:corr-probs-by-day}~(right) and -\ref{fig:extrapolated-network-totals}~(top) show results. +We can now extrapolate reported unique .onion addresses to network totals. +Figure~\ref{fig:extrapolated} shows the distributions of extrapolated +network totals for all days in the analysis period.
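A single-relay extrapolation of the .onion statistic can be sketched under the assumption, also used in the second simulation in the evaluation, that each hidden service publishes its descriptor to twelve directories. The fraction below is the sample directory's fraction of descriptors computed earlier.

```python
onions_reported = 84           # denoised .onion statistic from the sample
frac_descriptors = 0.00023781  # this directory's fraction of descriptors
per_directory_total = onions_reported / frac_descriptors
network_total = per_directory_total / 12  # 12 directories per service
print(round(network_total))    # on the order of 30,000 unique .onion addresses
```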
-\section{Simulating extrapolation methods} +\begin{figure} +\centering +\begin{subfigure}{.5\textwidth} +\centering +\includegraphics[width=\textwidth]{graphics/extrapolated-cells.pdf} +\end{subfigure}% +\begin{subfigure}{.5\textwidth} +\centering +\includegraphics[width=\textwidth]{graphics/extrapolated-onions.pdf} +\end{subfigure}% +\caption{Distribution of extrapolated network totals for all days in the +analysis period, excluding lowest 1% and highest 1% for display +purposes.} +\label{fig:extrapolated} +\end{figure} + +\section{Selecting daily averages} +\label{sec:averages} + +As the last step in the analysis, we aggregate extrapolated network totals +for a given day to obtain a daily average. +We considered a few options for calculating the average, each of which +has its advantages and drawbacks. + +We started looking at the \emph{weighted mean} of extrapolated network +totals, which is the mean of all values but which uses relay fractions as +weights, so that smaller relays cannot influence the overall result too +much. +This metric is equivalent to summing up all reported statistics and +dividing by the sum of network fractions of reporting relays. +The nice property of this metric is that it considers all statistics +reported by relays on a given day. +But this property is also the biggest disadvantage: single extreme +statistics can affect the overall result. +For example, relays that added very large noise values to their statistics +cannot be filtered out. +The same holds for relays that lie about their statistics. + +Another metric we looked at was the \emph{weighted median}, which also +takes into account that relays contribute different fractions to the +overall statistic. +While this metric is not affected by outliers, basing the daily statistics +on the data from a single relay doesn't seem very robust. + +In the end we decided to pick the \emph{weighted interquartile mean} as +metric for the daily average.
+For this metric we order extrapolated network totals by their value, +discard the lower and the upper quartile by weight, and compute the +weighted mean of the remaining values. +This metric is robust against noisy statistics and lying relays and +considers half of the reported statistics. + +We further define a threshold of 1% for the total fraction of relays +reporting statistics. +If less than 1% of relays report statistics on a given day, we don't +display that day in the end results. +Figure~\ref{fig:probs-by-day} shows total calculated network fractions per +day, and Figure~\ref{fig:extrapolated-network-totals} shows the weighted +interquartile means of the extrapolated network totals per day. + +\begin{figure} +\centering +\includegraphics[width=\textwidth]{graphics/probs-by-day.pdf} +\caption{Total calculated network fractions per day.} +\label{fig:probs-by-day} +\end{figure} + +\begin{figure} +\centering +\includegraphics[width=\textwidth]{graphics/extrapolated-network-totals.pdf} +\caption{Daily averages of extrapolated network totals, calculated as +weighted interquartile means of extrapolations based on statistics by +single relays.} +\label{fig:extrapolated-network-totals} +\end{figure} + +\section*{Evaluation} + +We conducted two simulations to demonstrate that the extrapolation method +used here delivers approximately correct results and to gain some sense +of confidence in the results if only very few relays report +statistics.
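The daily-average metric from the previous section can be sketched as follows; this is our own illustration of a weighted interquartile mean, not the code used for the report.

```python
def weighted_interquartile_mean(values, weights):
    """Order values, discard the lower and upper quartile by weight,
    and return the weighted mean of the remaining values."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    lower, upper = 0.25 * total, 0.75 * total
    acc = num = den = 0.0
    for v, w in pairs:
        # Keep only the part of this relay's weight that falls inside
        # the middle [lower, upper] weight interval.
        start, end = acc, acc + w
        keep = max(0.0, min(end, upper) - max(start, lower))
        num += v * keep
        den += keep
        acc = end
    return num / den
```

With equal weights this reduces to the ordinary interquartile mean, so a single extreme extrapolation, say from a lying relay, is discarded rather than dragging the daily average.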
-We conducted two simulations to demonstrate that the extrapolation methods -used here deliver approximately correct results. In the first simulation we created a network of 3000 middle relays with consensus weights following an exponential distribution. We then randomly selected relays as rendezvous points and assigned them, -in total, $10^9$ cells containing hidden-service traffic. -Each relay obfuscated its real cell count and reported obfuscated +in total, $10^9$ cells containing hidden-service traffic in chunks with +chunk sizes following an exponential distribution with $\lambda=0.0001$. +Each relay obfuscated its observed cell count and reported obfuscated statistics. Finally, we picked different fractions of reported statistics and extrapolated total cell counts in the network based on these. -Figure~\ref{fig:sim}~(left) shows the median and the 95%~confidence -interval for the extrapolation. -As long as we included at least 1% of relays by consensus weight in the -extrapolation, network totals did not deviate by more than 10% in -positive or negative direction. - We also conducted a second simulation with 3000 hidden-service directories -and 40000 hidden services. -Similar to the first simulation, Figure~\ref{fig:sim}~(right) shows that -our extrapolation is roughly accurate if we include statistics from at -least 1% of hidden-service directories. +and 40000 hidden services, each of them publishing descriptors to 12 +directories. + +Figure~\ref{fig:sim} shows the median and the range between 2.5th and +97.5th percentile for the extrapolation. +As long as we included at least 1% of relays by consensus weight in the +extrapolation, network totals did not deviate by more than 5% in positive +or negative direction.
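A compressed version of the first simulation behaves similarly. This is our own sketch under simplifying assumptions: it omits the Laplace obfuscation step and lets relays report in random order until the requested weight fraction is reached.

```python
import bisect
import random
from itertools import accumulate

def simulate(seed, n_relays=3000, total_cells=10**9, report_fraction=0.01):
    """Assign cells to weight-proportionally chosen rendezvous points in
    Exp(lambda=0.0001)-sized chunks, then extrapolate a network total
    from roughly report_fraction of relays by consensus weight."""
    rng = random.Random(seed)
    weights = [rng.expovariate(1.0) for _ in range(n_relays)]
    cum = list(accumulate(weights))
    total_weight = cum[-1]
    cells = [0] * n_relays
    assigned = 0
    while assigned < total_cells:
        chunk = min(int(rng.expovariate(0.0001)) + 1, total_cells - assigned)
        # Pick a rendezvous point proportional to consensus weight.
        r = bisect.bisect(cum, rng.random() * total_weight)
        cells[r] += chunk
        assigned += chunk
    # Relays report in random order until the weight threshold is reached.
    seen_w = seen_cells = 0
    for i in rng.sample(range(n_relays), n_relays):
        seen_w += weights[i]
        seen_cells += cells[i]
        if seen_w >= report_fraction * total_weight:
            break
    return seen_cells / (seen_w / total_weight)
```

With about 1% of relays by weight reporting, the extrapolated total lands close to the true $10^9$ cells, consistent with the simulation results described above.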
\begin{figure} \centering @@ -423,26 +512,25 @@ least 1% of hidden-service directories. \centering \includegraphics[width=\textwidth]{graphics/sim-onions.pdf} \end{subfigure}% -\caption{Median and confidence interval of simulated extrapolations.} +\caption{Median and range from 2.5th to 97.5th percentile of simulated +extrapolations.} \label{fig:sim} \end{figure}
-\section{Open questions} +\section*{Conclusion}
-\begin{itemize} -\item Maybe we should switch back to the first extrapolation method, where -we're extrapolating from single observations, and then take the weighted -mean as best extrapolation result. -This has some advantages for handling outliers. -We'll want to run new simulations using this method. -\item The ribbon in Figure~\ref{fig:extrapolated-network-totals} implies a -confidence interval of some sort, but it's really only the standard error -of the local regression algorithm added by the graphing software. -We should instead calculate the confidence interval of our extrapolation, -similar to the simulation, and graph that. -One option might be to run simulations as part of the extrapolation -process. -\end{itemize} +In this report we described a method for extrapolating network totals from +the two recently added hidden-service statistics. +We showed that we can extrapolate network totals with reasonable accuracy +as long as at least 1% of relays report these statistics. + +\section*{Acknowledgements} + +Thanks to Aaron Johnson for providing invaluable feedback on extrapolating +statistics and on running simulations. +Thanks to the relay operators who enabled the new hidden-service +statistics on their relays and provided us with the data to write this +report.
\end{document}