tor-commits

[tech-reports/master] Tweak extrapolation report before publication.
by karsten@torproject.org 09 Feb '15
commit 75f5c0c13c2a858ad77309b4b468b39f1003721c
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Sun Feb 8 19:19:10 2015 +0100
Tweak extrapolation report before publication.
---
.../extrapolating-hidserv-stats.tex | 482 ++++++++++++--------
1 file changed, 285 insertions(+), 197 deletions(-)
diff --git a/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex b/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex
index 053a081..bef857f 100644
--- a/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex
+++ b/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex
@@ -4,32 +4,69 @@
\usepackage{url}
\begin{document}
-\title{Extrapolating network totals from hidden-service statistics}
+\title{Extrapolating network totals\\from hidden-service statistics}
-\author{yet unnamed authors}
+\author{George Kadianakis and Karsten Loesing}
-\reportid{DRAFT}
-\date{to be published in January 2015}
+\contact{
+\href{mailto:asn@torproject.org}{asn@torproject.org},%
+\href{mailto:karsten@torproject.org}{karsten@torproject.org}}
+
+\reportid{2015-01-001}
+\date{January 31, 2015}
\maketitle
\begin{abstract}
Starting on December 19, 2014, we added two new statistics to the Tor
software that shall give us some first insights into hidden-service usage.
-The first is the number of .onion addresses observed by a hidden-service
-directory, and the second is the number of cells on rendezvous circuits
-observed by a rendezvous point.
+The first statistic is the number of cells on rendezvous circuits observed
+by a rendezvous point, and the second is the number of unique .onion
+addresses observed by a hidden-service directory.
Each relay that opts in to reporting these statistics publishes these two
numbers for 24-hour intervals of operation.
-In the following, we explain our approach for extrapolating network totals
+In the following, we describe an approach for extrapolating network totals
from these statistics.
-The goal is to learn how many unique .onion addresses exist in the network
-and what amount of traffic can be attributed to hidden-service usage.
-We show that we can extrapolate network totals from hidden-service
-statistics with reasonable accuracy as long as at least 1\% of relays
-report these statistics.
+The goal is to learn what amount of traffic can be attributed to
+hidden-service usage and how many unique .onion addresses exist in the
+network.
+We show that we can extrapolate network totals with reasonable accuracy as
+long as at least 1\% of relays report these statistics.
\end{abstract}
+\section*{Introduction}
+
+As of December 19, 2014, a small number of relays has started reporting
+statistics on hidden-service usage.
+Similar to other statistics, these statistics are based solely on what the
+reporting relay observes, without exchanging observations with other
+relays.
+In this report we describe a method for extrapolating these statistics to
+network totals.
+
+\begin{figure}
+\centering
+\includegraphics[width=.8\textwidth]{overview.pdf}
+\caption{Overview of the extrapolation method used for extrapolating
+network totals from hidden-service statistics.}
+\label{fig:overview}
+\end{figure}
+
+Figure~\ref{fig:overview} gives an overview of the extrapolation method
+where each step corresponds to a section in this report.
+In step~1 we parse the statistics that relays report in their extra-info
+descriptors.
+These statistics contain noise that was added by relays to obfuscate
+original observations, which we attempt to remove in step~2.
+In step~3 we process consensuses to derive network fractions of reporting
+relays, that is, what fraction of hidden-service usage a relay should have
+observed.
+We use these fractions to remove implausible statistics in step~4.
+Then we extrapolate network totals in step~5, where each extrapolation is
+based on the report from a single relay.
+Finally, in step~6 we select daily averages from these network totals
+which constitute our result.
+
\section{Parsing reported statistics}
There are two types of documents produced by Tor relays that we consider
@@ -40,6 +77,19 @@ The second are consensuses that indicate what fraction of hidden-service
descriptors a hidden-service directory has observed and what fraction of
rendezvous circuits a relay has handled.
+We start by describing how we're parsing and processing hidden-service
+statistics from extra-info descriptors.
+Figure~\ref{fig:num-reported-stats} shows the number of statistics
+reported by day, and Figure~\ref{fig:extrainfo} shows a sample.
+The relevant parts for this analysis are:
+
+\begin{figure}[b]
+\centering
+\includegraphics[width=\textwidth]{graphics/num-reported-stats.pdf}
+\caption{Number of reported hidden-service statistics.}
+\label{fig:num-reported-stats}
+\end{figure}
+
% SAMPLE:
% fingerprint F528DED21EACD2E4E9301EC0AABD370EDCAD2C47
% stats_start 2014-12-31 16:17:33
@@ -49,7 +99,7 @@ rendezvous circuits a relay has handled.
% prob_rend_point 0.01509326
% frac_hsdesc 0.00069757
-\begin{figure}[b]
+\begin{figure}
\begin{verbatim}
extra-info ryroConoha F528DED21EACD2E4E9301EC0AABD370EDCAD2C47
[...]
@@ -62,12 +112,6 @@ descriptor.}
\label{fig:extrainfo}
\end{figure}
-We start by describing how we're parsing and processing hidden-service
-statistics from extra-info descriptors.
-Figure~\ref{fig:extrainfo} shows a sample of hidden-service statistics as
-contained in extra-info descriptors.
-The relevant parts for this analysis are:
-
\begin{itemize}
\item The \verb+extra-info+ line tells us which relay reported these
statistics, which we need to know to derive what fraction of
@@ -81,21 +125,14 @@ The value for \verb+bin_size+ is the bin size used for rounding up the
originally observed cell number, and the values for \verb+delta_f+ and
\verb+epsilon+ are inputs for the additive noise following a Laplace
distribution.
+For more information on how obfuscation is performed, please see Tor
+proposal 238.%
+\footnote{\url{https://gitweb.torproject.org/torspec.git/tree/proposals/238-hs-relay-stats.txt}}
\item And finally, the \verb+hidserv-dir-onions-seen+ line tells us the
number of .onion addresses that the relay observed in published
hidden-service descriptors in its role as hidden-service directory.
\end{itemize}
-\begin{figure}
-\centering
-\includegraphics[width=\textwidth]{graphics/num-reported-stats.pdf}
-\caption{Number of relays reporting hidden-service statistics.}
-\label{fig:num-reported-stats}
-\end{figure}
-
-Figure~\ref{fig:num-reported-stats} shows the number of statistics
-reported by day.
-
\section{Removing previously added noise}
When processing hidden-service statistics, we need to handle the fact that
@@ -112,24 +149,19 @@ Following these steps, the statistics reported in
Figure~\ref{fig:extrainfo} are processed to 152599040~cells and 84~.onion
addresses.
For the subsequent analysis we're also converting cells/day to
-bytes/second by multiplying cell numbers with 512~bytes/cell, dividing by
-86400~seconds/day, and dividing by 2 to account for the fact that
-statistics include cells in both incoming and outgoing direction.
-As a result we obtain 452~KB/s in the given sample.
+bits/second by multiplying cell numbers with 512~bytes/cell, multiplying
+with 8~bits/byte, dividing by 86400~seconds/day, and dividing by 2 to
+account for the fact that statistics include cells in both incoming and
+outgoing direction.
+As a result we obtain 3.6~Mbit/s in the given sample.
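[Editorial aside, not part of the commit: the stated unit conversion checks out numerically.]

```python
# Reproduce the cells/day -> bits/second conversion described above:
# 512 bytes/cell, 8 bits/byte, 86400 seconds/day, and a final division
# by 2 because cells are counted in both directions.
cells_per_day = 152599040  # noise-removed sample value from the report

bits_per_second = cells_per_day * 512 * 8 / 86400 / 2
print(round(bits_per_second / 1e6, 1))  # 3.6 (Mbit/s, as stated)
```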
Figure~\ref{fig:stats-by-day} shows parsed values after removing
previously added noise.
Negative values are the result of relays adding negative
-Laplace-distributed noise values to very small observed values.
-We will describe an attempt to remove such values shortly.
-\footnote{A plausible step three in the previously described process could
-have been to round negative values to 0, because that represents the most
-likely rounded value before Laplace noise was added.
-However, removing negative values would add bias to the result, because it
-would only remove negative noise without being able to detect and remove
-positive noise.
-That's why we'd rather want to remove implausible values based on other
-criteria.}
+Laplace-distributed noise values to very small observed values, which we
+cannot remove easily.
+We will describe an attempt to remove such values in
+Sections~\ref{sec:implausible} and \ref{sec:averages}.
\begin{figure}
\centering
@@ -144,12 +176,12 @@ Laplace-distributed noise values to very small observed values.}
\section{Deriving network fractions from consensuses}
The second document type that we consider in our analysis are consensuses.
-Not all hidden-service directories observe the same number of
-hidden-service descriptors, and the probability of chosing a relay as
-rendezvous point is even less uniformly distributed.
-Fortunately, we can derive what fraction of descriptors a directory was
-responsible for and what fraction of rendezvous circuits a relay has
-handled.
+The probability of choosing a relay as rendezvous point varies a lot
+between relays, and not all hidden-service directories handle the same
+number of hidden-service descriptors.
+Fortunately, we can derive what fraction of rendezvous circuits a relay
+has handled and what fraction of descriptors a directory was responsible
+for.
\begin{figure}
\begin{verbatim}
@@ -179,11 +211,33 @@ directories preceding it.}
\end{figure}
Figure~\ref{fig:consensusentry} shows the consensus entry of the relay
-that submitted the sample hidden-service statistics mentioned above.
+that submitted the sample hidden-service statistics mentioned above, plus
+neighboring consensus entries.
+
+The first fraction that we compute is the probability of a relay to be
+selected as rendezvous point.
+Clients only select relays with the \verb+Fast+ flag and in some cases the
+\verb+Stable+ flag, and they weight relays differently based on their
+bandwidth and depending on whether they have the \verb+Exit+ and/or
+\verb+Guard+ flags.
+(Clients require relays to have the \verb+Stable+ flag if they attempt to
+establish a long-running connection, e.g., to a hidden SSH server, but in
+the following analysis, we assume that most clients establish connections
+that don't need to last for long, e.g., to hidden webservers.)
+Clients weight the bandwidth value contained in the consensus entry with
+the value of \verb+Wmg+, \verb+Wme+, \verb+Wmd+, or \verb+Wmm+, depending
+on whether the relay has only the \verb+Guard+ flag, only the \verb+Exit+
+flag, both such flags, or neither of them.
+
+Our sample relay, \texttt{ryroConoha}, has the \verb+Fast+ flag, a
+bandwidth value of 117000, and neither \verb+Guard+ nor \verb+Exit+ flag.
+Its probability for being selected as rendezvous point is calculated as
+$117000 \times 10000/10000$ divided by the sum of all such weights in the
+consensus, in this case $1.42\%$.
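[Editorial aside: the weighting rule described above can be sketched as follows. This is a simplified illustration, not Tor's path-selection code; the `Wmm = 10000` value matches the sample calculation in the text.]

```python
def rend_weight(bandwidth, flags, bw_weights):
    """Weight used when selecting a relay as rendezvous point.

    Simplified sketch of the rule described in the report: only Fast
    relays qualify, and the consensus bandwidth value is scaled by Wmg,
    Wme, Wmd, or Wmm (in units of 1/10000) depending on whether the
    relay has only the Guard flag, only the Exit flag, both, or neither.
    """
    if "Fast" not in flags:
        return 0.0
    if "Guard" in flags and "Exit" in flags:
        w = bw_weights["Wmd"]
    elif "Guard" in flags:
        w = bw_weights["Wmg"]
    elif "Exit" in flags:
        w = bw_weights["Wme"]
    else:
        w = bw_weights["Wmm"]
    return bandwidth * w / 10000

# The sample relay: Fast, bandwidth 117000, neither Guard nor Exit.
bw_weights = {"Wmg": 0, "Wme": 0, "Wmd": 0, "Wmm": 10000}
print(rend_weight(117000, {"Fast"}, bw_weights))  # 117000.0
```

Dividing this weight by the sum of all such weights in the consensus gives the selection probability (1.42% in the sample).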
-The first fraction that we can derive from this entry is the fraction of
-descriptor space that this relay was responsible for in its role as
-hidden-service directory.
+The second fraction that we can derive from this consensus entry is the
+fraction of descriptor space that this relay was responsible for in its
+role as hidden-service directory.
The Tor Rendezvous
Specification\footnote{\url{https://gitweb.torproject.org/torspec.git/tree/…
contains the following definition that is relevant here:
@@ -195,68 +249,66 @@ three identity digests of HSDir relays following the descriptor ID in a
circular list.}
\end{quote}
+Based on the fraction of descriptor space that a directory was responsible
+for we can compute the fraction of descriptors that this directory has
+seen.
+Intuitively, one might think that these fractions are the same.
+However, this is not the case: each descriptor that is published to a
+directory is also published to two other directories.
+As a result we need to divide the fraction of descriptor space by
+\emph{three} to obtain the fraction of descriptors observed by the directory.
+Note that, without dividing by three, fractions of all directories would
+not add up to 100\%.
+
In the sample consensus entry, we'd extract the base64-encoded fingerprint
of the statistics-reporting relay, \verb+9Sje0h6...+, and the fingerprint
of the hidden-service directory that precedes the relay by three
positions, \verb+9PodlaV...+, and compute what fraction of descriptor
-space that is, in this case $0.07\%$.
+space that is, in this case $0.071\%$.
+So, the relay has observed $0.024\%$ of descriptors in the network.
-The second fraction that we compute is the probability of a relay to be
-selected as rendezvous point.
-Clients select only relays with the \verb+Fast+ and in some cases the
-\verb+Stable+ flag, and they weigh relays differently based on their
-bandwidth and depending on whether they have the \verb+Exit+ and/or
-\verb+Guard+ flags.
-(Clients further require relays to have the \verb+Stable+ flag if they
-attempt to establish a long-running connection, e.g., to a hidden SSH
-server, but in the following analysis, we assume that most clients
-establish connections that don't need to last for long, e.g., to a hidden
-webserver.)
-Clients weigh the bandwidth value contained in the consensus with the
-value of \verb+Wmg+, \verb+Wme+, \verb+Wmd+, or \verb+Wmm+, depending on
-whether the relay has only the \verb+Guard+ flag, only the \verb+Exit+
-flag, both such flags, or neither of them.
-
-Our sample relay has the \verb+Fast+ flag, a bandwidth value of 117,000,
-and neither \verb+Guard+ nor \verb+Exit+ flag.
-Its probability for being selected as rendezvous point is calculated as
-$117000 \times 10000/10000$ divided by the sum of all such weights in the
-consensus, in this case $1.42\%$
+% 9Sje0h6... -> F528DED2 -> 4113096402
+% 9PodlaV... -> F4FA1D95 -> - 4110032277
+% = 3064125
+% / 4294967296
+% = 0.00071342
+% / 3
+% = 0.00023781
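[Editorial aside: the descriptor-space arithmetic in the comment above can be verified directly from the 32-bit fingerprint prefixes it lists.]

```python
# Fraction of descriptor space between the preceding HSDir's fingerprint
# and the relay's, divided by 2^32, then by three because each descriptor
# is published to three consecutive directories.
relay = 0xF528DED2      # 9Sje0h6... (the statistics-reporting relay)
preceding = 0xF4FA1D95  # 9PodlaV... (HSDir three positions before it)

space = (relay - preceding) / 2**32
print(round(space * 100, 3))      # 0.071 (percent of descriptor space)
print(round(space / 3 * 100, 3))  # 0.024 (percent of descriptors seen)
```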
\begin{figure}
\centering
\includegraphics[width=\textwidth]{graphics/probs-by-relay.pdf}
-\caption{Calculated probabilities for observing hidden-service activity.}
+\caption{Calculated network fractions of relays observing hidden-service activity.}
\label{fig:probs-by-relay}
\end{figure}
-Figure~\ref{fig:probs-by-relay} shows calculated probabilities of
-observing hidden-service activities of relays reporting hidden-service
+Figure~\ref{fig:probs-by-relay} shows calculated fractions of
+hidden-service activity observed by relays that report hidden-service
statistics.
-That figure shows that most relays have roughly the same (small)
-probability for observing a hidden-service descriptor with only few
-outliers.
-The probability for being selected as rendezvous point is much smaller for
-most relays, with only the outliers having a realistic chance of being
+The probability for being selected as rendezvous point is very small for
+most relays, with only very few relays having a realistic chance of being
selected.
+In comparison, most relays have roughly the same (small) probability for
+observing a hidden-service descriptor with only few exceptions.
\section{Removing implausible statistics}
+\label{sec:implausible}
A relay that opts in to gathering hidden-service statistics reports them
even if it couldn't plausibly have observed them.
-In particular, a relay that did not have the \verb+HSDir+ flag could not
-have observed a single .onion address, and a relay with the \verb+Exit+
-flag could not have been selected as rendezvous point as long as
-\verb+Wmd+ and \verb+Wme+ are zero.
+In particular, a relay with the \verb+Exit+ flag could not have been
+selected as rendezvous point as long as \verb+Wmd+ and \verb+Wme+ are
+zero, and a relay that did not have the \verb+HSDir+ flag could not have
+observed a single .onion address.
+
Figure~\ref{fig:zero} shows distributions of reported statistics of relays
-with calculated probabilities of exactly zero.
+with calculated fractions of exactly zero.
These reported values approximately follow the plotted Laplace
distributions with $\mu=0$ and $b=2048/0.3$ or $b=8/0.3$ as defined for
-the respective statistics.
-We can assume that the vast majority of these reported values are just
-noise.
-In the following analysis, we exclude relays with calculated probabilities
-of exactly 0.
+the respective statistics, which gives us confidence that the vast
+majority of these reported values are just noise.
+In the following analysis, we exclude relays with calculated fractions of
+exactly 0.
\begin{figure}
\centering
@@ -271,38 +323,36 @@ of exactly 0.
\caption{Statistics reported by relays with calculated probabilities of
observing these statistics of zero.
The blue lines show Laplace distributions with $\mu=0$ and $b=2048/0.3$ or
-$b=8/0.3$ as defined for the respective statistics.}
+$b=8/0.3$ as defined for the respective statistics.
+The lowest 1\% and highest 1\% of values have been removed for display
+purposes.}
\label{fig:zero}
\end{figure}
-Another cause for implausible statistics could be very large positive or
-negative noise added by the Laplace distribution.
+Another kind of implausible statistic is a very high or very low
+absolute reported number.
+These numbers could be the result of adding very large positive or
+negative numbers from the Laplace distribution.
In theory, a single relay, with non-zero probability of observing
hidden-service activity, could have added noise from $-\infty$ to
-$\infty$, which could derail statistics for the entire day.
-These extreme values could be removed by calculating an interval of
-plausible values for each relay, based on the probability of observing
-hidden-service activity, and discarding values outside that interval.
-Another option for avoiding these extreme values would be to cap noise
-added at relays by adapting concepts from ($\epsilon,\delta)$-differential
-privacy to the noise-generating code used by relays.%
-\footnote{Whether or not either of these approaches is necessary depends
-on whether or not our extrapolation method can handle outliers.}
-
-\section{Extrapolating hidden-service traffic in the network}
-
-We start the extrapolation of network totals with reported cells on
-rendezvous circuits.
-We do this by summing up all observations per day and dividing by the
-total fraction of observations made by all reporting relays.
-The underlying assumption of this approach is that reported statistics
-grow linearly with calculated fractions.
-Figure~\ref{fig:corr-probs-by-relay}~(left) shows that this is roughly
-the case.
-Figure~\ref{fig:corr-probs-by-day}~(left) shows total reported
-statistics and calculated probabilities per day, and
-Figure~\ref{fig:extrapolated-network-totals}~(bottom) shows extrapolated
-network totals based on daily sums.
+$\infty$.
+Further, relays could lie about hidden-service usage and report very low
+or very high absolute values in their statistics in an attempt to derail
+statistics.
+It seems difficult to define a range of plausible values, and such a range
+might change over time.
+It seems easier to handle these extreme values by treating a certain
+fraction of extrapolated statistics as outliers, which is what we're going
+to do in Section~\ref{sec:averages}.
+
+\section{Extrapolating network totals}
+
+We are now ready to extrapolate network totals from reported statistics.
+We do this by dividing reported statistics by the calculated fraction of
+observations made by the reporting relay.
+The underlying assumption is that statistics grow linearly with calculated
+fractions.
+Figure~\ref{fig:corr-probs-by-relay} shows that this is roughly the case.
\begin{figure}
\centering
@@ -319,32 +369,9 @@ calculated probability for observing such activity.}
\label{fig:corr-probs-by-relay}
\end{figure}
-\begin{figure}
-\centering
-\begin{subfigure}{.5\textwidth}
-\centering
-\includegraphics[width=\textwidth]{graphics/corr-probs-cells-by-day.pdf}
-\end{subfigure}%
-\begin{subfigure}{.5\textwidth}
-\centering
-\includegraphics[width=\textwidth]{graphics/corr-probs-onions-by-day.pdf}
-\end{subfigure}%
-\caption{Correlation between the sum of all reports per day and the sum of
-calculated probabilities for observing such activity per day.}
-\label{fig:corr-probs-by-day}
-\end{figure}
-
-\begin{figure}
-\centering
-\includegraphics[width=\textwidth]{graphics/extrapolated-network-totals.pdf}
-\caption{Extrapolated network totals.}
-\label{fig:extrapolated-network-totals}
-\end{figure}
-
-\section{Estimating unique .onion addresses in the network}
-
-Estimating the number of .onion addresses in the network is slightly more
-difficult.
+While we can expect this method to work as described for extrapolating
+cells on rendezvous circuits, we need to take another step for estimating
+the number of unique .onion addresses in the network.
The reason is that a .onion address is not only known to a single relay,
but to a couple of relays, all of which include that .onion address in
their statistics.
@@ -369,49 +396,111 @@ statistics.
However, for the subsequent analysis, we assume that neither of these
cases affects results substantially.
-Similar to the analysis of hidden-service traffic, we want to compute the
-fraction of hidden-service activity that a directory observes, where
-hidden-service activity means publication of a hidden-service descriptor.
-We define this fraction as the part of descriptor space that the directory
-is responsible for, divided by \emph{three}, because each descriptor
-published to this descriptor is also published to two other directories.
-Note that, without dividing the fraction of a relay's descriptor space by
-three, fractions would not add up to 100\%.
-Figure~\ref{fig:corr-probs-by-relay}~(right) shows the correlation of
-reported .onion addresses and fraction of hidden-service activity.
-
-We can now extrapolate reported unique .onion addresses to network totals:
-we sum up all reported statistics for a given day, divide by the fraction
-of hidden-service activity that we received statistics for on that day,
-and divide the result by twelve, following the assumption from above that
-each service publishes its descriptor to twelve hidden-service
-directories.
-Figure~\ref{fig:corr-probs-by-day}~(right) and
-\ref{fig:extrapolated-network-totals}~(top) show results.
+We can now extrapolate reported unique .onion addresses to network totals.
+Figure~\ref{fig:extrapolated} shows the distributions of extrapolated
+network totals for all days in the analysis period.
-\section{Simulating extrapolation methods}
+\begin{figure}
+\centering
+\begin{subfigure}{.5\textwidth}
+\centering
+\includegraphics[width=\textwidth]{graphics/extrapolated-cells.pdf}
+\end{subfigure}%
+\begin{subfigure}{.5\textwidth}
+\centering
+\includegraphics[width=\textwidth]{graphics/extrapolated-onions.pdf}
+\end{subfigure}%
+\caption{Distribution of extrapolated network totals for all days in the
+analysis period, excluding lowest 1\% and highest 1\% for display
+purposes.}
+\label{fig:extrapolated}
+\end{figure}
+
+\section{Selecting daily averages}
+\label{sec:averages}
+
+As last step in the analysis, we aggregate extrapolated network totals for
+a given day to obtain a daily average.
+We considered a few options for calculating the average, each of which
+has its own advantages and drawbacks.
+
+We started looking at the \emph{weighted mean} of extrapolated network
+totals, which is the mean of all values but which uses relay fractions as
+weights, so that smaller relays cannot influence the overall result too
+much.
+This metric is equivalent to summing up all reported statistics and
+dividing by the sum of network fractions of reporting relays.
+The nice property of this metric is that it considers all statistics
+reported by relays on a given day.
+But this property is also the biggest disadvantage: single extreme
+statistics can affect the overall result.
+For example, relays that added very large noise values to their statistics
+cannot be filtered out.
+The same holds for relays that lie about their statistics.
+
+Another metric we looked at was the \emph{weighted median}, which also
+takes into account that relays contribute different fractions to the
+overall statistic.
+While this metric is not affected by outliers, basing the daily statistics
+on the data from a single relay doesn't seem very robust.
+
+In the end we decided to pick the \emph{weighted interquartile mean} as
+metric for the daily average.
+For this metric we order extrapolated network totals by their value,
+discard the lower and the upper quartile by weight, and compute the
+weighted mean of the remaining values.
+This metric is robust against noisy statistics and lying relays and
+considers half of the reported statistics.
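[Editorial aside: the weighted interquartile mean described above can be sketched as follows. This is an illustration of the described metric, not the authors' code.]

```python
def weighted_iq_mean(values, weights):
    """Order values, discard the lower and upper quartile *by weight*,
    and return the weighted mean of the remaining (partial) values."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    lo, hi = 0.25 * total, 0.75 * total
    acc = num = den = 0.0
    for v, w in pairs:
        start, end = acc, acc + w
        acc = end
        # Portion of this relay's weight inside the middle half.
        inside = max(0.0, min(end, hi) - max(start, lo))
        num += v * inside
        den += inside
    return num / den

# A single extreme extrapolation does not move the result:
print(weighted_iq_mean([100, 100, 100, 100, 100, 10**9], [1] * 6))  # 100.0
```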
+
+We further define a threshold of 1\% for the total fraction of relays
+reporting statistics.
+If fewer than 1\% of relays report statistics on a given day, we don't
+display that day in the end results.
+Figure~\ref{fig:probs-by-day} shows total calculated network fractions per
+day, and Figure~\ref{fig:extrapolated-network-totals} shows weighted
+interquartile means of the extrapolated network totals per day.
+
+\begin{figure}
+\centering
+\includegraphics[width=\textwidth]{graphics/probs-by-day.pdf}
+\caption{Total calculated network fractions per day.}
+\label{fig:probs-by-day}
+\end{figure}
+
+\begin{figure}
+\centering
+\includegraphics[width=\textwidth]{graphics/extrapolated-network-totals.pdf}
+\caption{Daily averages of extrapolated network totals, calculated as
+weighted interquartile means of extrapolations based on statistics by
+single relays.}
+\label{fig:extrapolated-network-totals}
+\end{figure}
+
+\section*{Evaluation}
+
+We conducted two simulations to demonstrate that the extrapolation method
+used here delivers approximately correct results and to gain some sense
+of confidence in the results if only very few relays report
+statistics.
-We conducted two simulations to demonstrate that the extrapolation methods
-used here deliver approximately correct results.
In the first simulation we created a network of 3000 middle relays with
consensus weights following an exponential distribution.
We then randomly selected relays as rendezvous points and assigned them,
-in total, $10^9$ cells containing hidden-service traffic.
-Each relay obfuscated its real cell count and reported obfuscated
+in total, $10^9$ cells containing hidden-service traffic in chunks with
+chunk sizes following an exponential distribution with $\lambda=0.0001$.
+Each relay obfuscated its observed cell count and reported obfuscated
statistics.
Finally, we picked different fractions of reported statistics and
extrapolated total cell counts in the network based on these.
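[Editorial aside: the first simulation can be sketched roughly as follows. This is not the authors' simulation code; the bin size of 1024 and Laplace parameter b = 2048/0.3 are assumptions taken from the statistics described earlier in the report.]

```python
import math
import random
from itertools import accumulate

random.seed(1)

N_RELAYS = 3000
TOTAL_CELLS = 10**9
BIN_SIZE = 1024    # assumed bin size for cell statistics
B = 2048 / 0.3     # assumed Laplace parameter b = delta_f / epsilon

# Consensus weights following an exponential distribution.
weights = [random.expovariate(1.0) for _ in range(N_RELAYS)]
cum = list(accumulate(weights))
total_w = cum[-1]

# Hand out 10^9 cells in exponentially sized chunks (lambda = 0.0001),
# choosing each chunk's rendezvous point proportionally to weight.
observed = [0.0] * N_RELAYS
remaining = float(TOTAL_CELLS)
while remaining > 0:
    chunk = min(random.expovariate(0.0001), remaining)
    i = random.choices(range(N_RELAYS), cum_weights=cum)[0]
    observed[i] += chunk
    remaining -= chunk

def obfuscate(cells):
    # Round up to the next multiple of BIN_SIZE, then add Laplace noise.
    binned = math.ceil(cells / BIN_SIZE) * BIN_SIZE
    u = random.random() - 0.5
    noise = -B * (1.0 if u >= 0 else -1.0) * math.log(1.0 - 2.0 * abs(u))
    return binned + noise

reported = [obfuscate(c) for c in observed]

# Extrapolate from a random subset covering about 1% of total weight.
order = random.sample(range(N_RELAYS), N_RELAYS)
frac = stat = 0.0
for i in order:
    frac += weights[i] / total_w
    stat += reported[i]
    if frac >= 0.01:
        break

extrapolated = stat / frac
print(abs(extrapolated / TOTAL_CELLS - 1))  # relative error, typically a few percent
```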
-Figure~\ref{fig:sim}~(left) shows the median and the 95\%~confidence
-interval for the extrapolation.
-As long as we included at least 1\% of relays by consensus weight in the
-extrapolation, network totals did not deviate by more than 10\% in
-positive or negative direction.
-
We also conducted a second simulation with 3000 hidden-service directories
-and 40000 hidden services.
-Similar to the first simulation, Figure~\ref{fig:sim}~(right) shows that
-our extrapolation is roughly accurate if we include statistics from at
-least 1\% of hidden-service directories.
+and 40000 hidden services, each of them publishing descriptors to 12
+directories.
+
+Figure~\ref{fig:sim} shows the median and the range between 2.5th and
+97.5th percentile for the extrapolation.
+As long as we included at least 1\% of relays by consensus weight in the
+extrapolation, network totals did not deviate by more than 5\% in positive
+or negative direction.
\begin{figure}
\centering
@@ -423,26 +512,25 @@ least 1\% of hidden-service directories.
\centering
\includegraphics[width=\textwidth]{graphics/sim-onions.pdf}
\end{subfigure}%
-\caption{Median and confidence interval of simulated extrapolations.}
+\caption{Median and range from 2.5th to 97.5th percentile of simulated
+extrapolations.}
\label{fig:sim}
\end{figure}
-\section{Open questions}
+\section*{Conclusion}
-\begin{itemize}
-\item Maybe we should switch back to the first extrapolation method, where
-we're extrapolating from single observations, and then take the weighted
-mean as best extrapolation result.
-This has some advantages for handling outliers.
-We'll want to run new simulations using this method.
-\item The ribbon in Figure~\ref{fig:extrapolated-network-totals} implies a
-confidence interval of some sort, but it's really only the standard error
-of the local regression algorithm added by the graphing software.
-We should instead calculate the confidence interval of our extrapolation,
-similar to the simulation, and graph that.
-One option might be to run simulations as part of the extrapolation
-process.
-\end{itemize}
+In this report we described a method for extrapolating network totals from
+the two recently added hidden-service statistics.
+We showed that we can extrapolate network totals with reasonable accuracy
+as long as at least 1\% of relays report these statistics.
+
+\section*{Acknowledgements}
+
+Thanks to Aaron Johnson for providing invaluable feedback on extrapolating
+statistics and on running simulations.
+Thanks to the relay operators who enabled the new hidden-service
+statistics on their relays and provided us with the data to write this
+report.
\end{document}

[tech-reports/master] Add graphics, fix an error in extrapolation report.
by karsten@torproject.org 09 Feb '15
commit 85c141e98e828602bc853e2d96d7db2ad2a85b6b
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Mon Feb 9 18:49:49 2015 +0100
Add graphics, fix an error in extrapolation report.
---
2015/extrapolating-hidserv-stats/.gitignore | 1 +
.../extrapolating-hidserv-stats.tex | 16 ++++++++++------
.../graphics/corr-probs-cells-by-relay.pdf | Bin 0 -> 72487 bytes
.../graphics/corr-probs-onions-by-relay.pdf | Bin 0 -> 76829 bytes
.../graphics/extrapolated-cells.pdf | Bin 0 -> 13117 bytes
.../graphics/extrapolated-network-totals.pdf | Bin 0 -> 5247 bytes
.../graphics/extrapolated-onions.pdf | Bin 0 -> 12040 bytes
.../graphics/num-reported-stats.pdf | Bin 0 -> 4804 bytes
2015/extrapolating-hidserv-stats/graphics/overview.odg | Bin 0 -> 18134 bytes
2015/extrapolating-hidserv-stats/graphics/overview.pdf | Bin 0 -> 20330 bytes
.../graphics/probs-by-day.pdf | Bin 0 -> 5568 bytes
.../graphics/probs-by-relay.pdf | Bin 0 -> 11110 bytes
.../extrapolating-hidserv-stats/graphics/sim-cells.pdf | Bin 0 -> 5181 bytes
.../graphics/sim-onions.pdf | Bin 0 -> 5141 bytes
.../graphics/stats-by-day.pdf | Bin 0 -> 11217 bytes
.../graphics/zero-prob-cells.pdf | Bin 0 -> 7169 bytes
.../graphics/zero-prob-onions.pdf | Bin 0 -> 7016 bytes
17 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/2015/extrapolating-hidserv-stats/.gitignore b/2015/extrapolating-hidserv-stats/.gitignore
index 75d2c50..8ae5f9d 100644
--- a/2015/extrapolating-hidserv-stats/.gitignore
+++ b/2015/extrapolating-hidserv-stats/.gitignore
@@ -1,2 +1,3 @@
extrapolating-hidserv-stats.pdf
+extrapolating-hidserv-stats-2015-01-31.pdf
diff --git a/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex b/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex
index bef857f..126c79c 100644
--- a/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex
+++ b/2015/extrapolating-hidserv-stats/extrapolating-hidserv-stats.tex
@@ -46,7 +46,7 @@ network totals.
\begin{figure}
\centering
-\includegraphics[width=.8\textwidth]{overview.pdf}
+\includegraphics[width=.8\textwidth]{graphics/overview.pdf}
\caption{Overview of the extrapolation method used for extrapolating
network totals from hidden-service statistics.}
\label{fig:overview}
@@ -150,10 +150,14 @@ Figure~\ref{fig:extrainfo} are processed to 152599040~cells and 84~.onion
addresses.
For the subsequent analysis we're also converting cells/day to
bits/second by multiplying cell numbers with 512~bytes/cell, multiplying
-with 8~bits/byte, dividing by 86400~seconds/day, and dividing by 2 to
-account for the fact that statistics include cells in both incoming and
-outgoing direction.
-As a result we obtain 3.6~Mbit/s in the given sample.
+with 8~bits/byte, and dividing by 86400~seconds/day.%
+\footnote{The originally published report had another quotient of 2 in
+this calculation, based on the false assumption that we would otherwise
+double-count cells going in incoming and outgoing direction.
+But this is not the case: we're counting each cell going in either
+incoming or outgoing direction only once when relaying it.
+All subsequent results have been fixed accordingly.}
+As a result we obtain 7.2~Mbit/s in the given sample.
Figure~\ref{fig:stats-by-day} shows parsed values after removing
previously added noise.
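The corrected conversion in the footnote above can be checked directly; the sample value of 152599040 cells/day comes straight from the report:

```python
# Conversion from cells/day to Mbit/s as described in the footnote:
# 512 bytes/cell * 8 bits/byte, divided by 86400 seconds/day. No extra
# division by 2: each relayed cell is counted only once, whether it
# travels in the incoming or the outgoing direction.
cells_per_day = 152599040
mbit_per_s = cells_per_day * 512 * 8 / (86400 * 1e6)
print(round(mbit_per_s, 1))  # -> 7.2
```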
@@ -264,7 +268,7 @@ In the sample consensus entry, we'd extract the base64-encoded fingerprint
of the statistics-reporting relay, \verb+9Sje0h6...+, and the fingerprint
of the hidden-service directory that precedes the relay by three
positions, \verb+9PodlaV...+, and compute what fraction of descriptor
-space that is, in this case $0.071$.
+space that is, in this case $0.071\%$.
So, the relay has observed $0.024\%$ of descriptors in the network.
% 9Sje0h6... -> F528DED2 -> 4113096402
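The descriptor-space computation in the hunk above can be sketched as follows. This is an illustrative assumption about the mechanics, not the report's exact code: the first four bytes of each base64 fingerprint give a position on a 2^32 ring (matching the comment `9Sje0h6... -> F528DED2 -> 4113096402`), the relay's share of descriptor space is the ring distance to the preceding directory, and the result is divided by 3 because each descriptor is stored on three directories (cf. `frac_hsdesc / 3.0` in the R code below).

```python
# Hypothetical sketch: fingerprint prefix -> ring position -> fraction of
# descriptor space -> fraction of descriptors seen (divided by 3, since
# each descriptor is replicated to three hidden-service directories).
import base64

RING = 2 ** 32

def position(fp_base64):
    """First 32 bits of a base64-encoded fingerprint as a ring position."""
    raw = base64.b64decode(fp_base64 + "=" * (-len(fp_base64) % 4))
    return int.from_bytes(raw[:4], "big")

def frac_descriptors(relay_fp, preceding_hsdir_fp):
    frac_space = ((position(relay_fp) - position(preceding_hsdir_fp))
                  % RING) / RING
    return frac_space / 3.0

# With the truncated sample fingerprints from the report, this yields a
# small fraction on the order of the 0.024% quoted in the text.
print(frac_descriptors("9Sje0h6", "9PodlaV"))
```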
diff --git a/2015/extrapolating-hidserv-stats/graphics/corr-probs-cells-by-relay.pdf b/2015/extrapolating-hidserv-stats/graphics/corr-probs-cells-by-relay.pdf
new file mode 100644
index 0000000..c8106f5
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/corr-probs-cells-by-relay.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/corr-probs-onions-by-relay.pdf b/2015/extrapolating-hidserv-stats/graphics/corr-probs-onions-by-relay.pdf
new file mode 100644
index 0000000..d4d2756
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/corr-probs-onions-by-relay.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/extrapolated-cells.pdf b/2015/extrapolating-hidserv-stats/graphics/extrapolated-cells.pdf
new file mode 100644
index 0000000..fabacc8
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/extrapolated-cells.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/extrapolated-network-totals.pdf b/2015/extrapolating-hidserv-stats/graphics/extrapolated-network-totals.pdf
new file mode 100644
index 0000000..75e4dfb
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/extrapolated-network-totals.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/extrapolated-onions.pdf b/2015/extrapolating-hidserv-stats/graphics/extrapolated-onions.pdf
new file mode 100644
index 0000000..7ea7c51
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/extrapolated-onions.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/num-reported-stats.pdf b/2015/extrapolating-hidserv-stats/graphics/num-reported-stats.pdf
new file mode 100644
index 0000000..96c2753
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/num-reported-stats.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/overview.odg b/2015/extrapolating-hidserv-stats/graphics/overview.odg
new file mode 100644
index 0000000..46ee61a
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/overview.odg differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/overview.pdf b/2015/extrapolating-hidserv-stats/graphics/overview.pdf
new file mode 100644
index 0000000..74bca3e
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/overview.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/probs-by-day.pdf b/2015/extrapolating-hidserv-stats/graphics/probs-by-day.pdf
new file mode 100644
index 0000000..47b1f6a
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/probs-by-day.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/probs-by-relay.pdf b/2015/extrapolating-hidserv-stats/graphics/probs-by-relay.pdf
new file mode 100644
index 0000000..4cb3a87
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/probs-by-relay.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/sim-cells.pdf b/2015/extrapolating-hidserv-stats/graphics/sim-cells.pdf
new file mode 100644
index 0000000..7205e72
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/sim-cells.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/sim-onions.pdf b/2015/extrapolating-hidserv-stats/graphics/sim-onions.pdf
new file mode 100644
index 0000000..ad12430
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/sim-onions.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/stats-by-day.pdf b/2015/extrapolating-hidserv-stats/graphics/stats-by-day.pdf
new file mode 100644
index 0000000..6fb99c8
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/stats-by-day.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/zero-prob-cells.pdf b/2015/extrapolating-hidserv-stats/graphics/zero-prob-cells.pdf
new file mode 100644
index 0000000..90d77d9
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/zero-prob-cells.pdf differ
diff --git a/2015/extrapolating-hidserv-stats/graphics/zero-prob-onions.pdf b/2015/extrapolating-hidserv-stats/graphics/zero-prob-onions.pdf
new file mode 100644
index 0000000..8419242
Binary files /dev/null and b/2015/extrapolating-hidserv-stats/graphics/zero-prob-onions.pdf differ

09 Feb '15
commit b490d1dbd7f78e66316f5214f0990f553311de62
Author: Translation commit bot <translation(a)torproject.org>
Date: Mon Feb 9 17:45:05 2015 +0000
Update translations for gettor
---
ta/gettor.po | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/ta/gettor.po b/ta/gettor.po
index dbd9af2..2ed35db 100644
--- a/ta/gettor.po
+++ b/ta/gettor.po
@@ -10,7 +10,7 @@ msgstr ""
"Project-Id-Version: The Tor Project\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2013-01-19 13:40+0100\n"
-"PO-Revision-Date: 2015-02-09 17:15+0000\n"
+"PO-Revision-Date: 2015-02-09 17:31+0000\n"
"Last-Translator: git12a <git12(a)openmailbox.org>\n"
"Language-Team: Tamil (http://www.transifex.com/projects/p/torproject/language/ta/)\n"
"MIME-Version: 1.0\n"
@@ -352,14 +352,14 @@ msgid ""
"The Tor Browser Bundle package for Windows operating systems. If you're \n"
"running some version of Windows, like Windows XP, Windows Vista or \n"
"Windows 7, this is the package you should get."
-msgstr ""
+msgstr "windows:\nவிண்டோஸ் இயங்குமென்பொருளுக்கான Tor Browser கட்டு. நீங்கள் \nவிண்டோஸ் XP, விண்டோஸ் Vista, அல்லது விண்டோஸ் 7, உபயோகிக்கிரீர்கள் என்றால், \nஇது நீங்கள் பெற வேண்டும் தொகுப்பு ஆகும்."
#: lib/gettor/i18n.py:227
msgid ""
"macos-i386:\n"
"The Tor Browser Bundle package for OS X, Intel CPU architecture. In \n"
"general, newer Mac hardware will require you to use this package."
-msgstr ""
+msgstr "macos-i386:\nOS X, Intel CPU கட்டமைப்புக்கான Tor Browser கட்டு. பொதுவாக, \nபுதிய Mac வன்பொருளில் இந்த தொகுப்பு பயன்படுத்த வேண்டியிருக்கும்."
#: lib/gettor/i18n.py:231
msgid ""
@@ -481,7 +481,7 @@ msgstr "இது நீங்கள் பயன்படுத்தும்
#: lib/gettor/i18n.py:299
msgid "How do I extract the file(s) you sent me?"
-msgstr "நீங்கள் அனுப்பிய கோப்புகளை விரிவாக்குவது எப்படி?"
+msgstr "நீங்கள் அனுப்பிய கோப்பு(களை) விரிவாக்குவது எப்படி?"
#: lib/gettor/i18n.py:301
msgid "QUESTION:"

[metrics-tasks/master] Update hidserv-stats extrapolation code (#13192).
by karsten@torproject.org 09 Feb '15
commit 3c90c181a13dcc12c69e8e8fa013948b1a6405e2
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Mon Feb 9 18:25:38 2015 +0100
Update hidserv-stats extrapolation code (#13192).
---
task-13192/.gitignore | 1 +
task-13192/src/R/blog.R | 47 ++
task-13192/src/R/plot.R | 214 +++----
task-13192/src/java/Aggregate.java | 129 ++++
task-13192/src/java/Extrapolate.java | 424 +++++++++++++
task-13192/src/java/ExtrapolateHidServStats.java | 722 ----------------------
task-13192/src/java/Simulate.java | 357 +++++++++++
7 files changed, 1066 insertions(+), 828 deletions(-)
diff --git a/task-13192/.gitignore b/task-13192/.gitignore
index 7e8bf3b..89d161b 100644
--- a/task-13192/.gitignore
+++ b/task-13192/.gitignore
@@ -4,4 +4,5 @@ in/
src/bash/
src/bin/
out/
+Rplots.pdf
diff --git a/task-13192/src/R/blog.R b/task-13192/src/R/blog.R
new file mode 100644
index 0000000..82a07e9
--- /dev/null
+++ b/task-13192/src/R/blog.R
@@ -0,0 +1,47 @@
+# Load required libraries.
+require(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
+require(scales, warn.conflicts = FALSE, quietly = TRUE)
+require(reshape, warn.conflicts = FALSE, quietly = TRUE)
+require(splines, warn.conflicts = FALSE, quietly = TRUE)
+require(Hmisc, warn.conflicts = FALSE, quietly = TRUE)
+
+# Read .csv files written by Java.
+h <- read.csv("out/csv/hidserv-stats.csv", stringsAsFactors = FALSE)
+
+# Create directories for graphs.
+dir.create(file.path("out", "graphs", "blog"), showWarnings = FALSE,
+ recursive = TRUE)
+
+# Cut off last two days, because stats might be incomplete for those.
+h <- h[as.Date(h$stats_end) < max(as.Date(h$stats_end) - 1), ]
+
+# Graph the number of reported stats by day.
+h7 <- data.frame(date = as.Date(h$stats_end), reports = 1)
+ggplot(h7, aes(x = date)) +
+geom_bar(colour = 'lightgray', width = .7, binwidth = 1) +
+scale_x_date("") +
+scale_y_continuous("")
+ggsave("out/graphs/blog/num-reported-stats.png", width = 10, height = 3,
+ dpi = 100)
+
+e <- read.csv("out/csv/hidserv-stats-extrapolated.csv",
+ stringsAsFactors = FALSE)
+e <- melt(e, by = c("date", "type"))
+e <- e[e$variable == "wiqm", ]
+e <- rbind(e, data.frame(date = NA, type = c("onions", "cells"),
+ variable = NA, value = 0))
+
+ggplot(e[e$type == "cells", ], aes(x = as.Date(date), y = value)) +
+geom_line() +
+scale_x_date(name = "") +
+scale_y_continuous(name = "")
+ggsave("out/graphs/blog/extrapolated-cells.png", width = 10,
+ height = 3, dpi = 100)
+
+ggplot(e[e$type != "cells", ], aes(x = as.Date(date), y = value)) +
+geom_line() +
+scale_x_date(name = "") +
+scale_y_continuous(name = "")
+ggsave("out/graphs/blog/extrapolated-onions.png", width = 10,
+ height = 3, dpi = 100)
+
diff --git a/task-13192/src/R/plot.R b/task-13192/src/R/plot.R
index 991928b..552b810 100644
--- a/task-13192/src/R/plot.R
+++ b/task-13192/src/R/plot.R
@@ -5,17 +5,12 @@ require(reshape, warn.conflicts = FALSE, quietly = TRUE)
require(splines, warn.conflicts = FALSE, quietly = TRUE)
require(Hmisc, warn.conflicts = FALSE, quietly = TRUE)
-# Avoid scientific notation.
-options(scipen = 15)
-
# Read .csv file written by Java.
h <- read.csv("out/csv/hidserv-stats.csv", stringsAsFactors = FALSE)
# Create directories for graphs.
dir.create(file.path("out", "graphs", "report"), showWarnings = FALSE,
recursive = TRUE)
-dir.create(file.path("out", "graphs", "slides"), showWarnings = FALSE,
- recursive = TRUE)
# Cut off last two days, because stats might be incomplete for those.
h <- h[as.Date(h$stats_end) < max(as.Date(h$stats_end) - 1), ]
@@ -28,17 +23,15 @@ scale_x_date("") +
scale_y_continuous("")
ggsave("out/graphs/report/num-reported-stats.pdf", width = 10, height = 3,
dpi = 100)
-ggsave("out/graphs/slides/hidserv-12.png", width = 8, height = 3,
- dpi = 100)
# Graph distributions of reported values by day.
h1 <- data.frame(date = as.Date(h$stats_end),
- traffic = h$hidserv_rend_relayed_cells * 512 / (86400 * 1000 * 1000),
+ traffic = h$hidserv_rend_relayed_cells * 512 * 8 / (86400 * 1e6),
services = h$hidserv_dir_onions_seen)
h1 <- melt(h1, "date")
h1 <- data.frame(date = h1$date,
- variable = ifelse(h1$variable == "traffic", "traffic in MB/s",
- ".onion addresses"), value = h1$value)
+ variable = ifelse(h1$variable == "traffic", "traffic in Mbit/s",
+ "unique .onion addresses"), value = h1$value)
ggplot(h1, aes(x = date, y = value, group = date)) +
geom_boxplot() +
facet_grid(variable ~ ., scales = "free_y") +
@@ -49,23 +42,22 @@ ggsave("out/graphs/report/stats-by-day.pdf", width = 10, height = 5,
# Graph distributions of calculated fractions by day.
h2 <- data.frame(date = as.Date(h$stats_end),
- prob_rend_point = h$prob_rend_point,
- x_frac_hsdesc = h$frac_hsdesc / 3.0)
+ frac_rend_relayed_cells = h$frac_rend_relayed_cells, x_frac_dir_onions_seen = h$frac_dir_onions_seen)
h2 <- melt(h2, "date")
h2 <- data.frame(date = h2$date,
- variable = ifelse(h2$variable == "prob_rend_point",
- "selected as rendezvous point", "responsible for a descriptor"),
+ variable = ifelse(h2$variable == "frac_rend_relayed_cells",
+ "cells on rendezvous circuits", "hidden-service descriptors"),
value = h2$value)
ggplot(h2, aes(x = date, y = value, group = date)) +
geom_boxplot() +
facet_grid(variable ~ ., scales = "free_y") +
scale_x_date("") +
-scale_y_continuous("Calculated probabilities\n", labels = percent)
+scale_y_continuous("Calculated network fractions\n", labels = percent)
ggsave("out/graphs/report/probs-by-relay.pdf", width = 10, height = 5,
dpi = 100)
# Graph ECDF of cells reported by relays with rend point probability of 0.
-h8 <- h[h$prob_rend_point == 0,
+h8 <- h[h$frac_rend_relayed_cells == 0,
"hidserv_rend_relayed_cells" ]
h8 <- sort(h8)
h8 <- data.frame(x = h8, y = (1:length(h8)) / length(h8))
@@ -75,13 +67,14 @@ laplace_cells <- function(x) {
ggplot(h8, aes(x = x, y = y)) +
geom_line() +
stat_function(fun = laplace_cells, colour = "blue") +
-scale_x_continuous("\nReported cells on rendezvous circuits") +
-scale_y_continuous("Cumulative probability\n")
+scale_x_continuous("\nReported cells on rendezvous circuits",
+ limits = c(max(h8[h8$y < 0.01, "x"]), min(h8[h8$y > 0.99, "x"]))) +
+scale_y_continuous("Cumulative probability\n", labels = percent)
ggsave("out/graphs/report/zero-prob-cells.pdf", width = 5, height = 3,
dpi = 100)
# Graph ECDF of .onions reported by relays with HSDir probability of 0.
-h9 <- h[h$frac_hsdesc == 0, "hidserv_dir_onions_seen"]
+h9 <- h[h$frac_dir_onions_seen == 0, "hidserv_dir_onions_seen"]
h9 <- sort(h9)
h9 <- data.frame(x = h9, y = (1:length(h9)) / length(h9))
laplace_onions <- function(x) {
@@ -90,51 +83,54 @@ laplace_onions <- function(x) {
ggplot(h9, aes(x = x, y = y)) +
geom_line() +
stat_function(fun = laplace_onions, colour = "blue") +
-scale_x_continuous("\nReported .onion addresses") +
-scale_y_continuous("Cumulative probability\n")
+scale_x_continuous("\nReported .onion addresses",
+ limits = c(max(h9[h9$y < 0.01, "x"]), min(h9[h9$y > 0.99, "x"]))) +
+scale_y_continuous("Cumulative probability\n", labels = percent)
ggsave("out/graphs/report/zero-prob-onions.pdf", width = 5, height = 3,
dpi = 100)
# Graph correlation between reports and fractions per relay.
h3 <- rbind(
- data.frame(x = h$frac_hsdesc / 3.0,
- y = ifelse(h$frac_hsdesc == 0, NA, h$hidserv_dir_onions_seen),
+ data.frame(x = h$frac_dir_onions_seen,
+ y = ifelse(h$frac_dir_onions_seen == 0, NA, h$hidserv_dir_onions_seen),
facet = ".onion addresses"),
- data.frame(x = h$prob_rend_point,
- y = ifelse(h$prob_rend_point == 0, NA,
- h$hidserv_rend_relayed_cells * 512 / (86400 * 1000)),
- facet = "traffic in kB/s"))
+ data.frame(x = h$frac_rend_relayed_cells,
+ y = ifelse(h$frac_rend_relayed_cells == 0, NA,
+ h$hidserv_rend_relayed_cells * 512 * 8 / (86400 * 1e6)),
+ facet = "traffic in Mbit/s"))
ggplot(h3[h3$facet == ".onion addresses", ], aes(x = x, y = y)) +
geom_point(alpha = 0.5) +
stat_smooth(method = "lm") +
-scale_x_continuous(name = "\nProbability", labels = percent) +
+scale_x_continuous(name = "\nFraction", labels = percent) +
scale_y_continuous(name = "Reported .onion addresses\n")
ggsave("out/graphs/report/corr-probs-onions-by-relay.pdf", width = 5,
height = 3, dpi = 100)
-ggplot(h3[h3$facet == "traffic in kB/s", ], aes(x = x, y = y)) +
+ggplot(h3[h3$facet == "traffic in Mbit/s", ], aes(x = x, y = y)) +
geom_point(alpha = 0.5) +
stat_smooth(method = "lm") +
-scale_x_continuous(name = "\nProbability", labels = percent) +
-scale_y_continuous(name = "Reported traffic in kB/s\n")
+scale_x_continuous(name = "\nFraction", labels = percent) +
+scale_y_continuous(name = "Reported traffic in Mbit/s\n")
ggsave("out/graphs/report/corr-probs-cells-by-relay.pdf", width = 5,
height = 3, dpi = 100)
# Graph correlation between reports and fractions per day.
h5 <- rbind(
data.frame(date = as.Date(h$stats_end),
- prob = ifelse(h$frac_hsdesc == 0, NA, h$frac_hsdesc / 3.0),
+ prob = ifelse(h$frac_dir_onions_seen == 0, NA, h$frac_dir_onions_seen),
reported = h$hidserv_dir_onions_seen, facet = "published descriptor"),
data.frame(date = as.Date(h$stats_end),
- prob = ifelse(h$prob_rend_point == 0, NA, h$prob_rend_point),
- reported = h$hidserv_rend_relayed_cells * 512 / (86400 * 1000 * 1000),
- facet = "traffic in MB/s"))
+ prob = ifelse(h$frac_rend_relayed_cells == 0, NA,
+ h$frac_rend_relayed_cells),
+ reported = h$hidserv_rend_relayed_cells * 512 * 8 / (86400 * 1e6),
+ facet = "traffic in Mbit/s"))
h5 <- na.omit(h5)
h5 <- aggregate(list(prob = h5$prob, reported = h5$reported),
by = list(date = h5$date, facet = h5$facet), FUN = sum)
-ggplot(h5[h5$facet == "traffic in MB/s", ], aes(x = prob, y = reported)) +
+ggplot(h5[h5$facet == "traffic in Mbit/s", ],
+ aes(x = prob, y = reported)) +
geom_point(alpha = 0.5) +
scale_x_continuous(name = "\nTotal probability", labels = percent) +
-scale_y_continuous(name = "Total traffic in MB/s\n") +
+scale_y_continuous(name = "Total traffic in Mbit/s\n") +
stat_smooth(method = "lm") +
geom_vline(xintercept = 0.01, linetype = 2)
ggsave("out/graphs/report/corr-probs-cells-by-day.pdf", width = 5,
@@ -149,98 +145,104 @@ geom_vline(xintercept = 0.01, linetype = 2)
ggsave("out/graphs/report/corr-probs-onions-by-day.pdf", width = 5,
height = 3, dpi = 100)
+# Graph ECDF of extrapolated cells.
+h20 <- h[h$frac_rend_relayed_cells > 0, ]
+h20 <- h20$hidserv_rend_relayed_cells *
+ 512 * 8 / (86400 * 1e6 * h20$frac_rend_relayed_cells)
+h20 <- sort(h20)
+h20 <- data.frame(x = h20, y = (1:length(h20)) / length(h20))
+ggplot(h20, aes(x = x, y = y)) +
+geom_line() +
+scale_x_continuous("\nExtrapolated total traffic in Mbit/s",
+ limits = c(max(h20[h20$y < 0.01, "x"]), min(h20[h20$y > 0.99, "x"]))) +
+scale_y_continuous("Cumulative probability\n", labels = percent)
+ggsave("out/graphs/report/extrapolated-cells.pdf", width = 5, height = 3,
+ dpi = 100)
+
+# Graph ECDF of extrapolated .onions.
+h21 <- h[h$frac_dir_onions_seen > 0, ]
+h21 <- h21$hidserv_dir_onions_seen / (12 * h21$frac_dir_onions_seen)
+h21 <- sort(h21)
+h21 <- data.frame(x = h21, y = (1:length(h21)) / length(h21))
+ggplot(h21, aes(x = x, y = y)) +
+geom_line() +
+scale_x_continuous("\nExtrapolated total .onion addresses",
+ limits = c(max(h21[h21$y < 0.01, "x"]), min(h21[h21$y > 0.99, "x"]))) +
+scale_y_continuous("Cumulative probability\n", labels = percent)
+ggsave("out/graphs/report/extrapolated-onions.pdf", width = 5, height = 3,
+ dpi = 100)
+
# Graph extrapolated network totals.
-h6 <- data.frame(date = as.Date(h$stats_end),
- traffic = ifelse(h$prob_rend_point == 0, 0,
- h$hidserv_rend_relayed_cells * 512 / (86400 * 1000 * 1000)),
- prob_rend_point = h$prob_rend_point,
- onions = ifelse(h$frac_hsdesc == 0, 0, h$hidserv_dir_onions_seen),
- prob_onion = h$frac_hsdesc * 4.0)
-h6 <- aggregate(list(traffic = h6$traffic,
- prob_rend_point = h6$prob_rend_point,
- onions = h6$onions,
- prob_onion = h6$prob_onion), by = list(date = h6$date), FUN = sum)
-h6 <- data.frame(date = h6$date,
- traffic = ifelse(h6$prob_rend_point < 0.01, 0,
- h6$traffic / h6$prob_rend_point),
- onions = ifelse(h6$prob_onion / 12.0 < 0.01, 0,
- h6$onions / h6$prob_onion))
-h6 <- melt(h6, "date")
-h6 <- h6[h6$value > 0, ]
-h6 <- rbind(h6, data.frame(date = NA, variable = c('traffic', 'onions'),
- value = 0))
-h6 <- data.frame(date = h6$date,
- variable = ifelse(h6$variable == "traffic", "total traffic in MB/s",
- ".onion addresses"), value = h6$value)
-ggplot(h6, aes(date, value)) +
-facet_grid(variable ~ ., scales = "free_y") +
-geom_point() +
-stat_smooth() +
+e <- read.csv("out/csv/hidserv-stats-extrapolated.csv",
+ stringsAsFactors = FALSE)
+e <- melt(e, by = c("date", "type"))
+e <- e[e$variable == "wiqm", ]
+e <- rbind(e, data.frame(date = NA, type = c("onions", "cells"),
+ variable = NA, value = 0))
+e <- data.frame(e, label = ifelse(e$type == "cells", "traffic in Mbit/s",
+ "unique .onion addresses"))
+ggplot(e, aes(x = as.Date(date), y = value)) +
+geom_line() +
+facet_grid(label ~ ., scales = "free_y") +
scale_x_date(name = "") +
scale_y_continuous(name = "Extrapolated network totals\n")
ggsave("out/graphs/report/extrapolated-network-totals.pdf", width = 10,
height = 5, dpi = 100)
-# Graph extrapolated number of .onion addresses.
-h11 <- h6[h6$variable == ".onion addresses", ]
-ggplot(h11, aes(x = date, y = value)) +
-geom_point() +
-stat_smooth() +
-scale_x_date(name = "") +
-scale_y_continuous(name = "")
-ggsave("out/graphs/slides/hidserv-13.png", width = 8, height = 3,
- dpi = 100)
-
-# Graph extrapolated fraction of hidden-service traffic.
-b <- read.csv("in/metrics/bandwidth.csv", stringsAsFactors = FALSE)
-b <- b[b$isexit == '' & b$isguard == '' & b$date > '2014-12-20', ]
-h10 <- data.frame(date = as.Date(h$stats_end),
- traffic = h$hidserv_rend_relayed_cells * 512 / (86400 * 1000 * 1000),
- prob_rend_point = h$prob_rend_point)
-h10 <- aggregate(list(traffic = h10$traffic,
- prob_rend_point = h10$prob_rend_point), by = list(date = h10$date),
- FUN = sum)
-h10 <- data.frame(date = h10$date,
- traffic = ifelse(h10$prob_rend_point < 0.01, 0,
- h10$traffic / h10$prob_rend_point))
-h10 <- melt(h10, "date")
-h10 <- h10[h10$value > 0, ]
-h10 <- rbind(h10, data.frame(date = as.Date(b$date), variable = "bw",
- value = b$bwread + b$bwwrite))
-h10 <- cast(h10, date ~ variable, value = "value")
-h10 <- na.omit(h10)
-h10 <- data.frame(date = h10$date,
- value = h10$traffic * 1000 * 1000 / h10$bw)
-h10 <- rbind(h10, data.frame(date = NA, value = 0))
-ggplot(h10, aes(x = date, y = value)) +
-geom_point() +
-scale_x_date(name = "") +
-scale_y_continuous(name = "", labels = percent) +
-stat_smooth()
-ggsave("out/graphs/slides/hidserv-14.png", width = 8, height = 3,
+# Graph distributions of calculated fractions by day.
+h71 <- data.frame(date = as.Date(h$stats_end),
+ frac_rend_relayed_cells = h$frac_rend_relayed_cells,
+ frac_dir_onions_seen = h$frac_dir_onions_seen)
+summary(h71)
+h71 <- aggregate(list(
+ frac_rend_relayed_cells = h71$frac_rend_relayed_cells,
+ frac_dir_onions_seen = h71$frac_dir_onions_seen),
+ by = list(date = h71$date), FUN = sum)
+summary(h71)
+h71 <- melt(h71, "date")
+summary(h71)
+h71 <- data.frame(date = h71$date,
+ variable = ifelse(h71$variable == "frac_rend_relayed_cells",
+ "cells on rendezvous circuits", "hidden-service descriptors"),
+ value = h71$value)
+ggplot(h71, aes(x = date, y = value)) +
+geom_line() +
+facet_grid(variable ~ ., scales = "free_y") +
+geom_hline(yintercept = 0.01, linetype = 2) +
+scale_x_date("") +
+scale_y_continuous("Total calculated network fractions per day\n",
+ labels = percent)
+ggsave("out/graphs/report/probs-by-day.pdf", width = 10, height = 5,
dpi = 100)
# Graph simulation results for cells on rendezvous circuits.
s <- read.csv("out/csv/sim-cells.csv")
-ggplot(s, aes(x = frac, y = (p500 - 1e10) / 1e10,
- ymin = (p025 - 1e10) / 1e10, ymax = (p975 - 1e10) / 1e10)) +
+s <- do.call(data.frame, aggregate(list(X = s$wiqm),
+ by = list(frac = s$frac), FUN = quantile, probs = c(0.025, 0.5, 0.975)))
+ggplot(s, aes(x = frac, y = (X.50. - 1e10) / 1e10,
+ ymin = (X.2.5. - 1e10) / 1e10, ymax = (X.97.5. - 1e10) / 1e10)) +
geom_line() +
geom_ribbon(alpha = 0.2) +
scale_x_continuous("\nRendezvous points included in extrapolation",
labels = percent) +
-scale_y_continuous("Deviation from network totals\n", labels = percent)
+scale_y_continuous("Deviation from actual\nhidden-service traffic\n",
+ labels = percent)
ggsave("out/graphs/report/sim-cells.pdf", width = 5, height = 3,
dpi = 100)
# Graph simulation results for .onion addresses.
o <- read.csv("out/csv/sim-onions.csv")
-ggplot(o, aes(x = frac, y = (p500 - 40000) / 40000,
- ymin = (p025 - 40000) / 40000, ymax = (p975 - 40000) / 40000)) +
+o <- do.call(data.frame, aggregate(list(X = o$wiqm),
+ by = list(frac = o$frac), FUN = quantile, probs = c(0.025, 0.5, 0.975)))
+ggplot(o, aes(x = frac, y = (X.50. / 12 - 40000) / 40000,
+ ymin = (X.2.5. / 12 - 40000) / 40000,
+ ymax = (X.97.5. / 12 - 40000) / 40000)) +
geom_line() +
geom_ribbon(alpha = 0.2) +
scale_x_continuous("\nDirectories included in extrapolation",
labels = percent) +
-scale_y_continuous("Deviation from network totals\n", labels = percent)
+scale_y_continuous("Deviation from actual\nnumber of .onions\n",
+ labels = percent)
ggsave("out/graphs/report/sim-onions.pdf", width = 5, height = 3,
dpi = 100)
diff --git a/task-13192/src/java/Aggregate.java b/task-13192/src/java/Aggregate.java
new file mode 100644
index 0000000..56bac2c
--- /dev/null
+++ b/task-13192/src/java/Aggregate.java
@@ -0,0 +1,129 @@
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileReader;
+import java.io.FileWriter;
+import java.text.DateFormat;
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TimeZone;
+import java.util.TreeMap;
+
+public class Aggregate {
+
+ private static File hidservStatsCsvFile =
+ new File("out/csv/hidserv-stats.csv");
+
+ private static File hidservStatsExtrapolatedCsvFile =
+ new File("out/csv/hidserv-stats-extrapolated.csv");
+
+ public static void main(String[] args) throws Exception {
+ aggregate();
+ }
+
+ private static final DateFormat DATE_TIME_FORMAT, DATE_FORMAT;
+
+ static {
+ DATE_TIME_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
+ DATE_TIME_FORMAT.setLenient(false);
+ DATE_TIME_FORMAT.setTimeZone(TimeZone.getTimeZone("UTC"));
+ DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd");
+ DATE_FORMAT.setLenient(false);
+ DATE_FORMAT.setTimeZone(TimeZone.getTimeZone("UTC"));
+ }
+
+ private static void aggregate() throws Exception {
+ if (!hidservStatsCsvFile.exists() ||
+ hidservStatsCsvFile.isDirectory()) {
+ System.err.println("Unable to read "
+ + hidservStatsCsvFile.getAbsolutePath() + ". Exiting.");
+ System.exit(1);
+ }
+ SortedMap<String, List<double[]>>
+ extrapolatedCells = new TreeMap<String, List<double[]>>(),
+ extrapolatedOnions = new TreeMap<String, List<double[]>>();
+ BufferedReader br = new BufferedReader(new FileReader(
+ hidservStatsCsvFile));
+ String line = br.readLine();
+ while ((line = br.readLine()) != null) {
+ String[] parts = line.split(",");
+ String date = DATE_FORMAT.format(DATE_TIME_FORMAT.parse(parts[2]));
+ double hidservRendRelayedCells = Double.parseDouble(parts[3]),
+ hidservDirOnionsSeen = Double.parseDouble(parts[4]),
+ fracRendRelayedCells = Double.parseDouble(parts[5]),
+ fracDirOnionsSeen = Double.parseDouble(parts[6]);
+
+ if (fracRendRelayedCells > 0.0) {
+ if (!extrapolatedCells.containsKey(date)) {
+ extrapolatedCells.put(date, new ArrayList<double[]>());
+ }
+ extrapolatedCells.get(date).add(new double[] {
+ hidservRendRelayedCells * 512.0 * 8.0
+ / (86400.0 * 1000000.0 * fracRendRelayedCells),
+ fracRendRelayedCells });
+ }
+ if (fracDirOnionsSeen > 0.0) {
+ if (!extrapolatedOnions.containsKey(date)) {
+ extrapolatedOnions.put(date, new ArrayList<double[]>());
+ }
+ extrapolatedOnions.get(date).add(new double[] {
+ hidservDirOnionsSeen / (12.0 * fracDirOnionsSeen),
+ fracDirOnionsSeen });
+ }
+ }
+ br.close();
+ hidservStatsExtrapolatedCsvFile.getParentFile().mkdirs();
+ BufferedWriter bw = new BufferedWriter(new FileWriter(
+ hidservStatsExtrapolatedCsvFile));
+ bw.write("date,type,wmean,wmedian,wiqm\n");
+ for (int i = 0; i < 2; i++) {
+ String type = i == 0 ? "cells" : "onions";
+ SortedMap<String, List<double[]>> extrapolated = i == 0
+ ? extrapolatedCells : extrapolatedOnions;
+ for (Map.Entry<String, List<double[]>> e :
+ extrapolated.entrySet()) {
+ String date = e.getKey();
+ List<double[]> weightedValues = e.getValue();
+ double totalFrac = 0.0;
+ for (double[] d : weightedValues) {
+ totalFrac += d[1];
+ }
+ if (totalFrac < 0.01) {
+ continue;
+ }
+ Collections.sort(weightedValues,
+ new Comparator<double[]>() {
+ public int compare(double[] o1, double[] o2) {
+ return o1[0] < o2[0] ? -1 : o1[0] > o2[0] ? 1 : 0;
+ }
+ });
+ double totalWeighted = 0.0, totalProbability = 0.0;
+ double totalInterquartileFrac = 0.0,
+ totalWeightedInterquartile = 0.0;
+ Double weightedMedian = null;
+ for (double[] d : weightedValues) {
+ totalWeighted += d[0] * d[1];
+ totalProbability += d[1];
+ if (weightedMedian == null &&
+ totalProbability > totalFrac * 0.5) {
+ weightedMedian = d[0];
+ }
+ if (totalProbability >= totalFrac * 0.25 &&
+ totalProbability - d[1] <= totalFrac * 0.75) {
+ totalWeightedInterquartile += d[0] * d[1];
+ totalInterquartileFrac += d[1];
+ }
+ }
+ bw.write(String.format("%s,%s,%.0f,%.0f,%.0f%n", date, type,
+ totalWeighted / totalProbability, weightedMedian,
+ totalWeightedInterquartile / totalInterquartileFrac));
+ }
+ }
+ bw.close();
+ }
+}
diff --git a/task-13192/src/java/Extrapolate.java b/task-13192/src/java/Extrapolate.java
new file mode 100644
index 0000000..29ff518
--- /dev/null
+++ b/task-13192/src/java/Extrapolate.java
@@ -0,0 +1,424 @@
+import java.io.BufferedWriter;
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.FileWriter;
+import java.math.BigInteger;
+import java.text.DateFormat;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Date;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Scanner;
+import java.util.SortedMap;
+import java.util.SortedSet;
+import java.util.TimeZone;
+import java.util.TreeMap;
+import java.util.TreeSet;
+
+import org.torproject.descriptor.Descriptor;
+import org.torproject.descriptor.DescriptorFile;
+import org.torproject.descriptor.DescriptorReader;
+import org.torproject.descriptor.DescriptorSourceFactory;
+import org.torproject.descriptor.ExtraInfoDescriptor;
+import org.torproject.descriptor.NetworkStatusEntry;
+import org.torproject.descriptor.RelayNetworkStatusConsensus;
+
+public class Extrapolate {
+
+ private static File archiveExtraInfosDirectory =
+ new File("in/collector/archive/relay-descriptors/extra-infos/");
+
+ private static File recentExtraInfosDirectory =
+ new File("in/collector/recent/relay-descriptors/extra-infos/");
+
+ private static File archiveConsensuses =
+ new File("in/collector/archive/relay-descriptors/consensuses/");
+
+ private static File recentConsensuses =
+ new File("in/collector/recent/relay-descriptors/consensuses/");
+
+ private static File hidservStatsCsvFile =
+ new File("out/csv/hidserv-stats.csv");
+
+ public static void main(String[] args) throws Exception {
+ System.out.println("Extracting hidserv-* lines from extra-info "
+ + "descriptors...");
+ SortedMap<String, SortedSet<HidServStats>> hidServStats =
+ extractHidServStats();
+ System.out.println("Extracting fractions from consensuses...");
+ SortedMap<String, SortedSet<ConsensusFraction>> consensusFractions =
+ extractConsensusFractions(hidServStats.keySet());
+ System.out.println("Extrapolating statistics...");
+ extrapolateHidServStats(hidServStats, consensusFractions);
+ System.out.println(new Date() + " Terminating.");
+ }
+
+ private static final DateFormat DATE_TIME_FORMAT;
+
+ static {
+ DATE_TIME_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
+ DATE_TIME_FORMAT.setLenient(false);
+ DATE_TIME_FORMAT.setTimeZone(TimeZone.getTimeZone("UTC"));
+ }
+
+ private static class HidServStats implements Comparable<HidServStats> {
+
+ /* Hidden-service statistics end timestamp in milliseconds. */
+ private long statsEndMillis;
+
+ /* Statistics interval length in seconds. */
+ private long statsIntervalSeconds;
+
+ /* Number of relayed cells reported by the relay and adjusted by
+ * rounding to the nearest right side of a bin and subtracting half of
+ * the bin size. */
+ private long rendRelayedCells;
+
+ /* Number of .onions reported by the relay and adjusted by rounding to
+ * the nearest right side of a bin and subtracting half of the bin
+ * size. */
+ private long dirOnionsSeen;
+
+ private HidServStats(long statsEndMillis, long statsIntervalSeconds,
+ long rendRelayedCells, long dirOnionsSeen) {
+ this.statsEndMillis = statsEndMillis;
+ this.statsIntervalSeconds = statsIntervalSeconds;
+ this.rendRelayedCells = rendRelayedCells;
+ this.dirOnionsSeen = dirOnionsSeen;
+ }
+
+ @Override
+ public boolean equals(Object otherObject) {
+ if (!(otherObject instanceof HidServStats)) {
+ return false;
+ }
+ HidServStats other = (HidServStats) otherObject;
+ return this.statsEndMillis == other.statsEndMillis &&
+ this.statsIntervalSeconds == other.statsIntervalSeconds &&
+ this.rendRelayedCells == other.rendRelayedCells &&
+ this.dirOnionsSeen == other.dirOnionsSeen;
+ }
+
+ @Override
+ public int compareTo(HidServStats other) {
+ return this.statsEndMillis < other.statsEndMillis ? -1 :
+ this.statsEndMillis > other.statsEndMillis ? 1 : 0;
+ }
+ }
+
+ /* Extract fingerprint and hidserv-* lines from extra-info descriptors
+ * located in in/{archive,recent}/relay-descriptors/extra-infos/. */
+ private static SortedMap<String, SortedSet<HidServStats>>
+ extractHidServStats() {
+ SortedMap<String, SortedSet<HidServStats>> extractedHidServStats =
+ new TreeMap<String, SortedSet<HidServStats>>();
+ DescriptorReader descriptorReader =
+ DescriptorSourceFactory.createDescriptorReader();
+ descriptorReader.addDirectory(archiveExtraInfosDirectory);
+ descriptorReader.addDirectory(recentExtraInfosDirectory);
+ Iterator<DescriptorFile> descriptorFiles =
+ descriptorReader.readDescriptors();
+ while (descriptorFiles.hasNext()) {
+ DescriptorFile descriptorFile = descriptorFiles.next();
+ for (Descriptor descriptor : descriptorFile.getDescriptors()) {
+ if (!(descriptor instanceof ExtraInfoDescriptor)) {
+ continue;
+ }
+ String fingerprint =
+ ((ExtraInfoDescriptor) descriptor).getFingerprint();
+ Scanner scanner = new Scanner(new ByteArrayInputStream(
+ descriptor.getRawDescriptorBytes()));
+ Long statsEndMillis = null, statsIntervalSeconds = null,
+ rendRelayedCells = null, dirOnionsSeen = null;
+ try {
+ while (scanner.hasNext()) {
+ String line = scanner.nextLine();
+ if (line.startsWith("hidserv-")) {
+ String[] parts = line.split(" ");
+ if (parts[0].equals("hidserv-stats-end")) {
+ if (parts.length != 5 || !parts[3].startsWith("(") ||
+ !parts[4].equals("s)")) {
+ /* Will warn below, because statsEndMillis and
+ * statsIntervalSeconds are still null. */
+ continue;
+ }
+ statsEndMillis = DATE_TIME_FORMAT.parse(
+ parts[1] + " " + parts[2]).getTime();
+ statsIntervalSeconds =
+ Long.parseLong(parts[3].substring(1));
+ } else if (parts[0].equals("hidserv-rend-relayed-cells")) {
+ if (parts.length != 5 ||
+ !parts[4].startsWith("bin_size=")) {
+ /* Will warn below, because rendRelayedCells is still
+ * null. */
+ continue;
+ }
+ rendRelayedCells = removeNoise(Long.parseLong(parts[1]),
+ Long.parseLong(parts[4].substring(9)));
+ } else if (parts[0].equals("hidserv-dir-onions-seen")) {
+ if (parts.length != 5 ||
+ !parts[4].startsWith("bin_size=")) {
+ /* Will warn below, because dirOnionsSeen is still
+ * null. */
+ continue;
+ }
+ dirOnionsSeen = removeNoise(Long.parseLong(parts[1]),
+ Long.parseLong(parts[4].substring(9)));
+ }
+ }
+ }
+ } catch (ParseException e) {
+ e.printStackTrace();
+ continue;
+ } catch (NumberFormatException e) {
+ e.printStackTrace();
+ continue;
+ }
+ if (statsEndMillis == null && statsIntervalSeconds == null &&
+ rendRelayedCells == null && dirOnionsSeen == null) {
+ continue;
+ } else if (statsEndMillis != null && statsIntervalSeconds != null
+ && rendRelayedCells != null && dirOnionsSeen != null) {
+ if (!extractedHidServStats.containsKey(fingerprint)) {
+ extractedHidServStats.put(fingerprint,
+ new TreeSet<HidServStats>());
+ }
+ extractedHidServStats.get(fingerprint).add(new HidServStats(
+ statsEndMillis, statsIntervalSeconds, rendRelayedCells,
+ dirOnionsSeen));
+ } else {
+ System.err.println("Relay " + fingerprint + " published "
+ + "incomplete hidserv-stats. Ignoring.");
+ }
+ }
+ }
+ return extractedHidServStats;
+ }
+
+ private static long removeNoise(long reportedNumber, long binSize) {
+ long roundedToNearestRightSideOfTheBin =
+ ((reportedNumber + binSize / 2) / binSize) * binSize;
+ long subtractedHalfOfBinSize =
+ roundedToNearestRightSideOfTheBin - binSize / 2;
+ return subtractedHalfOfBinSize;
+ }
+
+ private static class ConsensusFraction
+ implements Comparable<ConsensusFraction> {
+
+ /* Valid-after timestamp of the consensus in milliseconds. */
+ private long validAfterMillis;
+
+ /* Fresh-until timestamp of the consensus in milliseconds. */
+ private long freshUntilMillis;
+
+ /* Probability for being selected by clients as rendezvous point. */
+ private double probabilityRendezvousPoint;
+
+ /* Probability for being selected as directory. This is the fraction
+ * of descriptor identifiers that this relay has been responsible
+ * for, divided by 3. */
+ private double fractionResponsibleDescriptors;
+
+ private ConsensusFraction(long validAfterMillis,
+ long freshUntilMillis,
+ double probabilityRendezvousPoint,
+ double fractionResponsibleDescriptors) {
+ this.validAfterMillis = validAfterMillis;
+ this.freshUntilMillis = freshUntilMillis;
+ this.probabilityRendezvousPoint = probabilityRendezvousPoint;
+ this.fractionResponsibleDescriptors =
+ fractionResponsibleDescriptors;
+ }
+
+ @Override
+ public boolean equals(Object otherObject) {
+ if (!(otherObject instanceof ConsensusFraction)) {
+ return false;
+ }
+ ConsensusFraction other = (ConsensusFraction) otherObject;
+ return this.validAfterMillis == other.validAfterMillis &&
+ this.freshUntilMillis == other.freshUntilMillis &&
+ this.fractionResponsibleDescriptors ==
+ other.fractionResponsibleDescriptors &&
+ this.probabilityRendezvousPoint ==
+ other.probabilityRendezvousPoint;
+ }
+
+ @Override
+ public int compareTo(ConsensusFraction other) {
+ return this.validAfterMillis < other.validAfterMillis ? -1 :
+ this.validAfterMillis > other.validAfterMillis ? 1 : 0;
+ }
+ }
+
+ /* Extract fractions that relays were responsible for from consensuses
+ * located in in/{archive,recent}/relay-descriptors/consensuses/. */
+ private static SortedMap<String, SortedSet<ConsensusFraction>>
+ extractConsensusFractions(Collection<String> fingerprints) {
+ SortedMap<String, SortedSet<ConsensusFraction>>
+ extractedConsensusFractions =
+ new TreeMap<String, SortedSet<ConsensusFraction>>();
+ DescriptorReader descriptorReader =
+ DescriptorSourceFactory.createDescriptorReader();
+ descriptorReader.addDirectory(archiveConsensuses);
+ descriptorReader.addDirectory(recentConsensuses);
+ Iterator<DescriptorFile> descriptorFiles =
+ descriptorReader.readDescriptors();
+ while (descriptorFiles.hasNext()) {
+ DescriptorFile descriptorFile = descriptorFiles.next();
+ for (Descriptor descriptor : descriptorFile.getDescriptors()) {
+ if (!(descriptor instanceof RelayNetworkStatusConsensus)) {
+ continue;
+ }
+ RelayNetworkStatusConsensus consensus =
+ (RelayNetworkStatusConsensus) descriptor;
+ SortedSet<String> weightKeys = new TreeSet<String>(Arrays.asList(
+ "Wmg,Wmm,Wme,Wmd".split(",")));
+ weightKeys.removeAll(consensus.getBandwidthWeights().keySet());
+ if (!weightKeys.isEmpty()) {
+ System.err.println("Consensus with valid-after time "
+ + DATE_TIME_FORMAT.format(consensus.getValidAfterMillis())
+ + " doesn't contain expected Wmx weights. Skipping.");
+ continue;
+ }
+ double wmg = ((double) consensus.getBandwidthWeights().get("Wmg"))
+ / 10000.0;
+ double wmm = ((double) consensus.getBandwidthWeights().get("Wmm"))
+ / 10000.0;
+ double wme = ((double) consensus.getBandwidthWeights().get("Wme"))
+ / 10000.0;
+ double wmd = ((double) consensus.getBandwidthWeights().get("Wmd"))
+ / 10000.0;
+ SortedSet<String> hsDirs = new TreeSet<String>(
+ Collections.reverseOrder());
+ double totalWeightsRendezvousPoint = 0.0;
+ SortedMap<String, Double> weightsRendezvousPoint =
+ new TreeMap<String, Double>();
+ for (Map.Entry<String, NetworkStatusEntry> e :
+ consensus.getStatusEntries().entrySet()) {
+ String fingerprint = e.getKey();
+ NetworkStatusEntry statusEntry = e.getValue();
+ SortedSet<String> flags = statusEntry.getFlags();
+ if (flags.contains("HSDir")) {
+ hsDirs.add(statusEntry.getFingerprint());
+ }
+ double weightRendezvousPoint = 0.0;
+ if (flags.contains("Fast")) {
+ weightRendezvousPoint = (double) statusEntry.getBandwidth();
+ if (flags.contains("Guard") && flags.contains("Exit")) {
+ weightRendezvousPoint *= wmd;
+ } else if (flags.contains("Guard")) {
+ weightRendezvousPoint *= wmg;
+ } else if (flags.contains("Exit")) {
+ weightRendezvousPoint *= wme;
+ } else {
+ weightRendezvousPoint *= wmm;
+ }
+ }
+ weightsRendezvousPoint.put(fingerprint, weightRendezvousPoint);
+ totalWeightsRendezvousPoint += weightRendezvousPoint;
+ }
+ /* Add all HSDir fingerprints twice, once with a leading "0" and once
+ * with a leading "1", to simplify traversing across the start of the
+ * ring. */
+ SortedSet<String> hsDirsCopy = new TreeSet<String>(hsDirs);
+ hsDirs.clear();
+ for (String fingerprint : hsDirsCopy) {
+ hsDirs.add("0" + fingerprint);
+ hsDirs.add("1" + fingerprint);
+ }
+ final double RING_SIZE = new BigInteger(
+ "10000000000000000000000000000000000000000",
+ 16).doubleValue();
+ for (String fingerprint : fingerprints) {
+ double probabilityRendezvousPoint = 0.0,
+ fractionDescriptors = 0.0;
+ NetworkStatusEntry statusEntry =
+ consensus.getStatusEntry(fingerprint);
+ if (statusEntry != null) {
+ if (hsDirs.contains("1" + fingerprint)) {
+ String startResponsible = fingerprint;
+ int positionsToGo = 3;
+ for (String hsDirFingerprint :
+ hsDirs.tailSet("1" + fingerprint)) {
+ startResponsible = hsDirFingerprint;
+ if (positionsToGo-- <= 0) {
+ break;
+ }
+ }
+ fractionDescriptors =
+ new BigInteger("1" + fingerprint, 16).subtract(
+ new BigInteger(startResponsible, 16)).doubleValue()
+ / RING_SIZE;
+ fractionDescriptors /= 3.0;
+ }
+ probabilityRendezvousPoint =
+ weightsRendezvousPoint.get(fingerprint)
+ / totalWeightsRendezvousPoint;
+ }
+ if (!extractedConsensusFractions.containsKey(fingerprint)) {
+ extractedConsensusFractions.put(fingerprint,
+ new TreeSet<ConsensusFraction>());
+ }
+ extractedConsensusFractions.get(fingerprint).add(
+ new ConsensusFraction(consensus.getValidAfterMillis(),
+ consensus.getFreshUntilMillis(), probabilityRendezvousPoint,
+ fractionDescriptors));
+ }
+ }
+ }
+ return extractedConsensusFractions;
+ }
+
+ private static void extrapolateHidServStats(
+ SortedMap<String, SortedSet<HidServStats>> hidServStats,
+ SortedMap<String, SortedSet<ConsensusFraction>>
+ consensusFractions) throws Exception {
+ hidservStatsCsvFile.getParentFile().mkdirs();
+ BufferedWriter bw = new BufferedWriter(
+ new FileWriter(hidservStatsCsvFile));
+ bw.write("fingerprint,stats_start,stats_end,"
+ + "hidserv_rend_relayed_cells,hidserv_dir_onions_seen,"
+ + "frac_rend_relayed_cells,frac_dir_onions_seen\n");
+ for (Map.Entry<String, SortedSet<HidServStats>> e :
+ hidServStats.entrySet()) {
+ String fingerprint = e.getKey();
+ if (!consensusFractions.containsKey(fingerprint)) {
+ System.err.println("We have hidserv-stats but no consensus "
+ + "fractions for " + fingerprint + ". Skipping.");
+ continue;
+ }
+ for (HidServStats stats : e.getValue()) {
+ long statsStartMillis = stats.statsEndMillis
+ - stats.statsIntervalSeconds * 1000L;
+ double sumProbabilityRendezvousPoint = 0.0,
+ sumResponsibleDescriptors = 0.0;
+ int statusEntries = 0;
+ for (ConsensusFraction frac :
+ consensusFractions.get(fingerprint)) {
+ if (statsStartMillis <= frac.validAfterMillis &&
+ frac.validAfterMillis < stats.statsEndMillis) {
+ sumProbabilityRendezvousPoint +=
+ frac.probabilityRendezvousPoint;
+ sumResponsibleDescriptors +=
+ frac.fractionResponsibleDescriptors;
+ statusEntries++;
+ }
+ }
+ double fracCells = sumProbabilityRendezvousPoint / statusEntries,
+ fracDescs = sumResponsibleDescriptors / statusEntries;
+ bw.write(String.format("%s,%s,%s,%d,%d,%.8f,%.8f%n", fingerprint,
+ DATE_TIME_FORMAT.format(statsStartMillis),
+ DATE_TIME_FORMAT.format(stats.statsEndMillis),
+ stats.rendRelayedCells, stats.dirOnionsSeen, fracCells,
+ fracDescs));
+ }
+ }
+ bw.close();
+ }
+}
+
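The `removeNoise()` helper in the new class inverts the obfuscation that relays apply before reporting (rounding a count up to a bin boundary and adding Laplace noise) by snapping the reported value back to the nearest bin boundary and stepping down to the bin midpoint. A standalone sketch of that arithmetic (class name illustrative):

```java
public class RemoveNoiseDemo {

  /* Same arithmetic as removeNoise() in the patch: round the reported
   * number to the nearest multiple of binSize, then subtract half a
   * bin to land on the bin midpoint. */
  public static long removeNoise(long reportedNumber, long binSize) {
    long roundedToNearestRightSideOfTheBin =
        ((reportedNumber + binSize / 2) / binSize) * binSize;
    return roundedToNearestRightSideOfTheBin - binSize / 2;
  }

  public static void main(String[] args) {
    /* A true count of 5 would be binned up to 8; Laplace noise might
     * then shift the report to 11.  Removing the noise yields 4, the
     * midpoint of the bin (0, 8]. */
    System.out.println(removeNoise(11L, 8L));  /* prints 4 */
  }
}
```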
diff --git a/task-13192/src/java/ExtrapolateHidServStats.java b/task-13192/src/java/ExtrapolateHidServStats.java
deleted file mode 100644
index 100520d..0000000
--- a/task-13192/src/java/ExtrapolateHidServStats.java
+++ /dev/null
@@ -1,722 +0,0 @@
-import java.io.BufferedWriter;
-import java.io.ByteArrayInputStream;
-import java.io.File;
-import java.io.FileWriter;
-import java.math.BigInteger;
-import java.text.DateFormat;
-import java.text.ParseException;
-import java.text.SimpleDateFormat;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.Collections;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-import java.util.Random;
-import java.util.Scanner;
-import java.util.SortedMap;
-import java.util.SortedSet;
-import java.util.TimeZone;
-import java.util.TreeMap;
-import java.util.TreeSet;
-
-import org.torproject.descriptor.Descriptor;
-import org.torproject.descriptor.DescriptorFile;
-import org.torproject.descriptor.DescriptorReader;
-import org.torproject.descriptor.DescriptorSourceFactory;
-import org.torproject.descriptor.ExtraInfoDescriptor;
-import org.torproject.descriptor.NetworkStatusEntry;
-import org.torproject.descriptor.RelayNetworkStatusConsensus;
-
-public class ExtrapolateHidServStats {
-
- private static File archiveExtraInfosDirectory =
- new File("in/collector/archive/relay-descriptors/extra-infos/");
-
- private static File recentExtraInfosDirectory =
- new File("in/collector/recent/relay-descriptors/extra-infos/");
-
- private static File archiveConsensuses =
- new File("in/collector/archive/relay-descriptors/consensuses/");
-
- private static File recentConsensuses =
- new File("in/collector/recent/relay-descriptors/consensuses/");
-
- private static File hidservStatsCsvFile =
- new File("out/csv/hidserv-stats.csv");
-
- private static File simCellsCsvFile =
- new File("out/csv/sim-cells.csv");
-
- private static File simOnionsCsvFile =
- new File("out/csv/sim-onions.csv");
-
- public static void main(String[] args) throws Exception {
- System.out.println("Extracting hidserv-* lines from extra-info "
- + "descriptors...");
- SortedMap<String, SortedSet<HidServStats>> hidServStats =
- extractHidServStats();
- System.out.println("Extracting fractions from consensuses...");
- SortedMap<String, SortedSet<ConsensusFraction>> consensusFractions =
- extractConsensusFractions(hidServStats.keySet());
- System.out.println("Extrapolating statistics...");
- extrapolateHidServStats(hidServStats, consensusFractions);
- System.out.println("Simulating extrapolation of rendezvous cells...");
- simulateCells();
- System.out.println("Simulating extrapolation of .onions...");
- simulateOnions();
- System.out.println("Terminating.");
- }
-
- private static final DateFormat DATE_TIME_FORMAT;
-
- static {
- DATE_TIME_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
- DATE_TIME_FORMAT.setLenient(false);
- DATE_TIME_FORMAT.setTimeZone(TimeZone.getTimeZone("UTC"));
- }
-
- private static class HidServStats implements Comparable<HidServStats> {
-
- /* Hidden-service statistics end timestamp in milliseconds. */
- private long statsEndMillis;
-
- /* Statistics interval length in seconds. */
- private long statsIntervalSeconds;
-
- /* Number of relayed cells reported by the relay and adjusted by
- * rounding to the nearest right side of a bin and subtracting half of
- * the bin size. */
- private long rendRelayedCells;
-
- /* Number of .onions reported by the relay and adjusted by rounding to
- * the nearest right side of a bin and subtracting half of the bin
- * size. */
- private long dirOnionsSeen;
-
- private HidServStats(long statsEndMillis, long statsIntervalSeconds,
- long rendRelayedCells, long dirOnionsSeen) {
- this.statsEndMillis = statsEndMillis;
- this.statsIntervalSeconds = statsIntervalSeconds;
- this.rendRelayedCells = rendRelayedCells;
- this.dirOnionsSeen = dirOnionsSeen;
- }
-
- @Override
- public boolean equals(Object otherObject) {
- if (!(otherObject instanceof HidServStats)) {
- return false;
- }
- HidServStats other = (HidServStats) otherObject;
- return this.statsEndMillis == other.statsEndMillis &&
- this.statsIntervalSeconds == other.statsIntervalSeconds &&
- this.rendRelayedCells == other.rendRelayedCells &&
- this.dirOnionsSeen == other.dirOnionsSeen;
- }
-
- @Override
- public int compareTo(HidServStats other) {
- return this.statsEndMillis < other.statsEndMillis ? -1 :
- this.statsEndMillis > other.statsEndMillis ? 1 : 0;
- }
- }
-
- /* Extract fingerprint and hidserv-* lines from extra-info descriptors
- * located in in/{archive,recent}/relay-descriptors/extra-infos/. */
- private static SortedMap<String, SortedSet<HidServStats>>
- extractHidServStats() {
- SortedMap<String, SortedSet<HidServStats>> extractedHidServStats =
- new TreeMap<String, SortedSet<HidServStats>>();
- DescriptorReader descriptorReader =
- DescriptorSourceFactory.createDescriptorReader();
- descriptorReader.addDirectory(archiveExtraInfosDirectory);
- descriptorReader.addDirectory(recentExtraInfosDirectory);
- Iterator<DescriptorFile> descriptorFiles =
- descriptorReader.readDescriptors();
- while (descriptorFiles.hasNext()) {
- DescriptorFile descriptorFile = descriptorFiles.next();
- for (Descriptor descriptor : descriptorFile.getDescriptors()) {
- if (!(descriptor instanceof ExtraInfoDescriptor)) {
- continue;
- }
- String fingerprint =
- ((ExtraInfoDescriptor) descriptor).getFingerprint();
- Scanner scanner = new Scanner(new ByteArrayInputStream(
- descriptor.getRawDescriptorBytes()));
- Long statsEndMillis = null, statsIntervalSeconds = null,
- rendRelayedCells = null, dirOnionsSeen = null;
- try {
- while (scanner.hasNext()) {
- String line = scanner.nextLine();
- if (line.startsWith("hidserv-")) {
- String[] parts = line.split(" ");
- if (parts[0].equals("hidserv-stats-end")) {
- if (parts.length != 5 || !parts[3].startsWith("(") ||
- !parts[4].equals("s)")) {
- /* Will warn below, because statsEndMillis and
- * statsIntervalSeconds are still null. */
- continue;
- }
- statsEndMillis = DATE_TIME_FORMAT.parse(
- parts[1] + " " + parts[2]).getTime();
- statsIntervalSeconds =
- Long.parseLong(parts[3].substring(1));
- } else if (parts[0].equals("hidserv-rend-relayed-cells")) {
- if (parts.length != 5 ||
- !parts[4].startsWith("bin_size=")) {
- /* Will warn below, because rendRelayedCells is still
- * null. */
- continue;
- }
- rendRelayedCells = removeNoise(Long.parseLong(parts[1]),
- Long.parseLong(parts[4].substring(9)));
- } else if (parts[0].equals("hidserv-dir-onions-seen")) {
- if (parts.length != 5 ||
- !parts[4].startsWith("bin_size=")) {
- /* Will warn below, because dirOnionsSeen is still
- * null. */
- continue;
- }
- dirOnionsSeen = removeNoise(Long.parseLong(parts[1]),
- Long.parseLong(parts[4].substring(9)));
- }
- }
- }
- } catch (ParseException e) {
- e.printStackTrace();
- continue;
- } catch (NumberFormatException e) {
- e.printStackTrace();
- continue;
- }
- if (statsEndMillis == null && statsIntervalSeconds == null &&
- rendRelayedCells == null && dirOnionsSeen == null) {
- continue;
- } else if (statsEndMillis != null && statsIntervalSeconds != null
- && rendRelayedCells != null && dirOnionsSeen != null) {
- if (!extractedHidServStats.containsKey(fingerprint)) {
- extractedHidServStats.put(fingerprint,
- new TreeSet<HidServStats>());
- }
- extractedHidServStats.get(fingerprint).add(new HidServStats(
- statsEndMillis, statsIntervalSeconds, rendRelayedCells,
- dirOnionsSeen));
- } else {
- System.err.println("Relay " + fingerprint + " published "
- + "incomplete hidserv-stats. Ignoring.");
- }
- }
- }
- return extractedHidServStats;
- }
-
- private static long removeNoise(long reportedNumber, long binSize) {
- long roundedToNearestRightSideOfTheBin =
- ((reportedNumber + binSize / 2) / binSize) * binSize;
- long subtractedHalfOfBinSize =
- roundedToNearestRightSideOfTheBin - binSize / 2;
- return subtractedHalfOfBinSize;
- }
-
- private static class ConsensusFraction
- implements Comparable<ConsensusFraction> {
-
- /* Valid-after timestamp of the consensus in milliseconds. */
- private long validAfterMillis;
-
- /* Fresh-until timestamp of the consensus in milliseconds. */
- private long freshUntilMillis;
-
- /* Fraction of consensus weight in [0.0, 1.0] of this relay. */
- private double fractionConsensusWeight;
-
- /* Probability for being selected by clients as rendezvous point. */
- private double probabilityRendezvousPoint;
-
- /* Fraction of descriptor identifiers in [0.0, 1.0] that this relay
- * has been responsible for. This is the "distance" from the
- * fingerprint of the relay three HSDir positions earlier in the ring
- * to the fingerprint of this relay. Fractions of all HSDirs in a
- * consensus add up to 3.0, not 1.0. */
- private double fractionResponsibleDescriptors;
-
- private ConsensusFraction(long validAfterMillis,
- long freshUntilMillis,
- double fractionConsensusWeight,
- double probabilityRendezvousPoint,
- double fractionResponsibleDescriptors) {
- this.validAfterMillis = validAfterMillis;
- this.freshUntilMillis = freshUntilMillis;
- this.fractionConsensusWeight = fractionConsensusWeight;
- this.probabilityRendezvousPoint = probabilityRendezvousPoint;
- this.fractionResponsibleDescriptors =
- fractionResponsibleDescriptors;
- }
-
- @Override
- public boolean equals(Object otherObject) {
- if (!(otherObject instanceof ConsensusFraction)) {
- return false;
- }
- ConsensusFraction other = (ConsensusFraction) otherObject;
- return this.validAfterMillis == other.validAfterMillis &&
- this.freshUntilMillis == other.freshUntilMillis &&
- this.fractionResponsibleDescriptors ==
- other.fractionResponsibleDescriptors &&
- this.fractionConsensusWeight == other.fractionConsensusWeight &&
- this.probabilityRendezvousPoint ==
- other.probabilityRendezvousPoint;
- }
-
- @Override
- public int compareTo(ConsensusFraction other) {
- return this.validAfterMillis < other.validAfterMillis ? -1 :
- this.validAfterMillis > other.validAfterMillis ? 1 : 0;
- }
- }
-
- /* Extract fractions that relays were responsible for from consensuses
- * located in in/{archive,recent}/relay-descriptors/consensuses/. */
- private static SortedMap<String, SortedSet<ConsensusFraction>>
- extractConsensusFractions(Collection<String> fingerprints) {
- SortedMap<String, SortedSet<ConsensusFraction>>
- extractedConsensusFractions =
- new TreeMap<String, SortedSet<ConsensusFraction>>();
- DescriptorReader descriptorReader =
- DescriptorSourceFactory.createDescriptorReader();
- descriptorReader.addDirectory(archiveConsensuses);
- descriptorReader.addDirectory(recentConsensuses);
- Iterator<DescriptorFile> descriptorFiles =
- descriptorReader.readDescriptors();
- while (descriptorFiles.hasNext()) {
- DescriptorFile descriptorFile = descriptorFiles.next();
- for (Descriptor descriptor : descriptorFile.getDescriptors()) {
- if (!(descriptor instanceof RelayNetworkStatusConsensus)) {
- continue;
- }
- RelayNetworkStatusConsensus consensus =
- (RelayNetworkStatusConsensus) descriptor;
- SortedSet<String> weightKeys = new TreeSet<String>(Arrays.asList(
- "Wmg,Wmm,Wme,Wmd".split(",")));
- weightKeys.removeAll(consensus.getBandwidthWeights().keySet());
- if (!weightKeys.isEmpty()) {
- System.err.println("Consensus with valid-after time "
- + DATE_TIME_FORMAT.format(consensus.getValidAfterMillis())
- + " doesn't contain expected Wmx weights. Skipping.");
- continue;
- }
- double wmg = ((double) consensus.getBandwidthWeights().get("Wmg"))
- / 10000.0;
- double wmm = ((double) consensus.getBandwidthWeights().get("Wmm"))
- / 10000.0;
- double wme = ((double) consensus.getBandwidthWeights().get("Wme"))
- / 10000.0;
- double wmd = ((double) consensus.getBandwidthWeights().get("Wmd"))
- / 10000.0;
- SortedSet<String> hsDirs = new TreeSet<String>(
- Collections.reverseOrder());
- long totalConsensusWeight = 0L;
- double totalWeightsRendezvousPoint = 0.0;
- SortedMap<String, Double> weightsRendezvousPoint =
- new TreeMap<String, Double>();
- for (Map.Entry<String, NetworkStatusEntry> e :
- consensus.getStatusEntries().entrySet()) {
- String fingerprint = e.getKey();
- NetworkStatusEntry statusEntry = e.getValue();
- SortedSet<String> flags = statusEntry.getFlags();
- if (flags.contains("HSDir")) {
- hsDirs.add(statusEntry.getFingerprint());
- }
- totalConsensusWeight += statusEntry.getBandwidth();
- double weightRendezvousPoint = 0.0;
- if (flags.contains("Fast")) {
- weightRendezvousPoint = (double) statusEntry.getBandwidth();
- if (flags.contains("Guard") && flags.contains("Exit")) {
- weightRendezvousPoint *= wmd;
- } else if (flags.contains("Guard")) {
- weightRendezvousPoint *= wmg;
- } else if (flags.contains("Exit")) {
- weightRendezvousPoint *= wme;
- } else {
- weightRendezvousPoint *= wmm;
- }
- }
- weightsRendezvousPoint.put(fingerprint, weightRendezvousPoint);
- totalWeightsRendezvousPoint += weightRendezvousPoint;
- }
- /* Add all HSDir fingerprints with leading "0" and "1" to
- * simplify the logic to traverse the ring start. */
- SortedSet<String> hsDirsCopy = new TreeSet<String>(hsDirs);
- hsDirs.clear();
- for (String fingerprint : hsDirsCopy) {
- hsDirs.add("0" + fingerprint);
- hsDirs.add("1" + fingerprint);
- }
- final double RING_SIZE = new BigInteger(
- "10000000000000000000000000000000000000000",
- 16).doubleValue();
- for (String fingerprint : fingerprints) {
- double probabilityRendezvousPoint = 0.0,
- fractionResponsibleDescriptors = 0.0,
- fractionConsensusWeight = 0.0;
- NetworkStatusEntry statusEntry =
- consensus.getStatusEntry(fingerprint);
- if (statusEntry != null) {
- if (hsDirs.contains("1" + fingerprint)) {
- String startResponsible = fingerprint;
- int positionsToGo = 3;
- for (String hsDirFingerprint :
- hsDirs.tailSet("1" + fingerprint)) {
- startResponsible = hsDirFingerprint;
- if (positionsToGo-- <= 0) {
- break;
- }
- }
- fractionResponsibleDescriptors =
- new BigInteger("1" + fingerprint, 16).subtract(
- new BigInteger(startResponsible, 16)).doubleValue()
- / RING_SIZE;
- }
- fractionConsensusWeight =
- ((double) statusEntry.getBandwidth())
- / ((double) totalConsensusWeight);
- probabilityRendezvousPoint =
- weightsRendezvousPoint.get(fingerprint)
- / totalWeightsRendezvousPoint;
- }
- if (!extractedConsensusFractions.containsKey(fingerprint)) {
- extractedConsensusFractions.put(fingerprint,
- new TreeSet<ConsensusFraction>());
- }
- extractedConsensusFractions.get(fingerprint).add(
- new ConsensusFraction(consensus.getValidAfterMillis(),
- consensus.getFreshUntilMillis(), fractionConsensusWeight,
- probabilityRendezvousPoint,
- fractionResponsibleDescriptors));
- }
- }
- }
- return extractedConsensusFractions;
- }
-
- private static void extrapolateHidServStats(
- SortedMap<String, SortedSet<HidServStats>> hidServStats,
- SortedMap<String, SortedSet<ConsensusFraction>>
- consensusFractions) throws Exception {
- hidservStatsCsvFile.getParentFile().mkdirs();
- BufferedWriter bw = new BufferedWriter(
- new FileWriter(hidservStatsCsvFile));
- bw.write("fingerprint,stats_start,stats_end,"
- + "hidserv_rend_relayed_cells,hidserv_dir_onions_seen,"
- + "prob_rend_point,frac_hsdesc\n");
- for (Map.Entry<String, SortedSet<HidServStats>> e :
- hidServStats.entrySet()) {
- String fingerprint = e.getKey();
- if (!consensusFractions.containsKey(fingerprint)) {
- System.err.println("We have hidserv-stats but no consensus "
- + "fractions for " + fingerprint + ". Skipping.");
- continue;
- }
- for (HidServStats stats : e.getValue()) {
- long statsStartMillis = stats.statsEndMillis
- - stats.statsIntervalSeconds * 1000L;
- double sumProbabilityRendezvousPoint = 0.0,
- sumResponsibleDescriptors = 0.0;
- int statusEntries = 0;
- for (ConsensusFraction frac :
- consensusFractions.get(fingerprint)) {
- if (statsStartMillis <= frac.validAfterMillis &&
- frac.validAfterMillis < stats.statsEndMillis) {
- sumProbabilityRendezvousPoint +=
- frac.probabilityRendezvousPoint;
- sumResponsibleDescriptors +=
- frac.fractionResponsibleDescriptors;
- statusEntries++;
- }
- }
- bw.write(String.format("%s,%s,%s,%d,%d,%.8f,%.8f%n", fingerprint,
- DATE_TIME_FORMAT.format(statsStartMillis),
- DATE_TIME_FORMAT.format(stats.statsEndMillis),
- stats.rendRelayedCells, stats.dirOnionsSeen,
- sumProbabilityRendezvousPoint / statusEntries,
- sumResponsibleDescriptors / statusEntries));
- }
- }
- bw.close();
- }
-
- private static Random rnd = new Random(3);
-
- private static void simulateCells() throws Exception {
-
- /* Generate consensus weights following an exponential distribution
- * with lambda = 1 for 3000 potential rendezvous points. */
- final int numberRendPoints = 3000;
- double[] consensusWeights = new double[numberRendPoints];
- double totalConsensusWeight = 0.0;
- for (int i = 0; i < numberRendPoints; i++) {
- double consensusWeight = -Math.log(1.0 - rnd.nextDouble());
- consensusWeights[i] = consensusWeight;
- totalConsensusWeight += consensusWeight;
- }
-
- /* Compute probabilities for being selected as rendezvous point. */
- double[] probRendPoint = new double[numberRendPoints];
- for (int i = 0; i < numberRendPoints; i++) {
- probRendPoint[i] = consensusWeights[i] / totalConsensusWeight;
- }
-
- /* Generate 10,000,000,000 (roughly 60 MiB/s) cells in chunks
- * following an exponential distribution with lambda = 0.00001 and
- * randomly assign them to a rendezvous point to report them later. */
- long cellsLeft = 10000000000L;
- final double cellsLambda = 0.00001;
- long[] observedCells = new long[numberRendPoints];
- while (cellsLeft > 0) {
- long cells = (long) (-Math.log(1.0 - rnd.nextDouble())
- / cellsLambda);
- double selectRendPoint = rnd.nextDouble();
- for (int i = 0; i < probRendPoint.length; i++) {
- selectRendPoint -= probRendPoint[i];
- if (selectRendPoint <= 0.0) {
- observedCells[i] += cells;
- break;
- }
- }
- cellsLeft -= cells;
- }
-
- /* Obfuscate reports using binning and Laplace noise, and then attempt
- * to remove noise again. */
- final long binSize = 1024L;
- final double b = 2048.0 / 0.3;
- long[] reportedCells = new long[numberRendPoints];
- long[] removedNoiseCells = new long[numberRendPoints];
- for (int i = 0; i < numberRendPoints; i++) {
- long observed = observedCells[i];
- long afterBinning = ((observed + binSize - 1L) / binSize) * binSize;
- double p = rnd.nextDouble();
- double laplaceNoise = -b * (p > 0.5 ? 1.0 : -1.0) *
- Math.log(1.0 - 2.0 * Math.abs(p - 0.5));
- long reported = afterBinning + (long) laplaceNoise;
- reportedCells[i] = reported;
- long roundedToNearestRightSideOfTheBin =
- ((reported + binSize / 2) / binSize) * binSize;
- long subtractedHalfOfBinSize =
- roundedToNearestRightSideOfTheBin - binSize / 2;
- removedNoiseCells[i] = subtractedHalfOfBinSize;
- }
-
- /* Perform 10,000 extrapolations from random fractions of reports by
- * probability to be selected as rendezvous point. */
- simCellsCsvFile.getParentFile().mkdirs();
- BufferedWriter bw = new BufferedWriter(new FileWriter(
- simCellsCsvFile));
- bw.write("frac,p025,p500,p975\n");
- double[] fractions = new double[] { 0.01, 0.02, 0.03, 0.04, 0.05, 0.1,
- 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99 };
- final int numberOfExtrapolations = 10000;
- for (double fraction : fractions) {
- List<Long> extrapolations = new ArrayList<Long>();
- for (int i = 0; i < numberOfExtrapolations; i++) {
- SortedSet<Integer> nonReportingRelays = new TreeSet<Integer>();
- for (int j = 0; j < numberRendPoints; j++) {
- nonReportingRelays.add(j);
- }
- List<Integer> shuffledRelays = new ArrayList<Integer>(
- nonReportingRelays);
- Collections.shuffle(shuffledRelays);
- SortedSet<Integer> reportingRelays = new TreeSet<Integer>();
- for (int j = 0; j < (int) ((double) numberRendPoints * fraction);
- j++) {
- reportingRelays.add(shuffledRelays.get(j));
- nonReportingRelays.remove(shuffledRelays.get(j));
- }
- double reportingProbability;
- long totalReports;
- do {
- reportingProbability = 0.0;
- totalReports = 0L;
- for (int reportingRelay : reportingRelays) {
- reportingProbability += probRendPoint[reportingRelay];
- totalReports += removedNoiseCells[reportingRelay];
- }
- if (reportingProbability < fraction - 0.001) {
- int addRelay = new ArrayList<Integer>(nonReportingRelays).get(
- rnd.nextInt(nonReportingRelays.size()));
- nonReportingRelays.remove(addRelay);
- reportingRelays.add(addRelay);
- } else if (reportingProbability > fraction + 0.001) {
- int removeRelay = new ArrayList<Integer>(reportingRelays).get(
- rnd.nextInt(reportingRelays.size()));
- reportingRelays.remove(removeRelay);
- nonReportingRelays.add(removeRelay);
- }
- } while (reportingProbability < fraction - 0.001 ||
- reportingProbability > fraction + 0.001);
- extrapolations.add((long) ((double) totalReports
- / reportingProbability));
- }
- Collections.sort(extrapolations);
- long p025 = extrapolations.get((extrapolations.size() * 25) / 1000),
- p500 = extrapolations.get((extrapolations.size() * 500) / 1000),
- p975 = extrapolations.get((extrapolations.size() * 975) / 1000);
- bw.write(String.format("%.2f,%d,%d,%d%n", fraction, p025, p500,
- p975));
- }
- bw.close();
- }
-
- private static void simulateOnions() throws Exception {
-
- /* Generate 3000 HSDirs with "fingerprints" between 0.0 and 1.0. */
- final int numberHsDirs = 3000;
- SortedSet<Double> hsDirFingerprints = new TreeSet<Double>();
- for (int i = 0; i < numberHsDirs; i++) {
- hsDirFingerprints.add(rnd.nextDouble());
- }
-
- /* Compute fractions of observed descriptor space. */
- SortedSet<Double> ring =
- new TreeSet<Double>(Collections.reverseOrder());
- for (double fingerprint : hsDirFingerprints) {
- ring.add(fingerprint);
- ring.add(fingerprint - 1.0);
- }
- SortedMap<Double, Double> hsDirFractions =
- new TreeMap<Double, Double>();
- for (double fingerprint : hsDirFingerprints) {
- double start = fingerprint;
- int positionsToGo = 3;
- for (double prev : ring.tailSet(fingerprint)) {
- start = prev;
- if (positionsToGo-- <= 0) {
- break;
- }
- }
- hsDirFractions.put(fingerprint, fingerprint - start);
- }
-
- /* Generate 40000 .onions with 4 HSDesc IDs, store them on HSDirs. */
- final int numberOnions = 40000;
- final int replicas = 4;
- final int storeOnDirs = 3;
- SortedMap<Double, SortedSet<Integer>> storedDescs =
- new TreeMap<Double, SortedSet<Integer>>();
- for (double fingerprint : hsDirFingerprints) {
- storedDescs.put(fingerprint, new TreeSet<Integer>());
- }
- for (int i = 0; i < numberOnions; i++) {
- for (int j = 0; j < replicas; j++) {
- int leftToStore = storeOnDirs;
- for (double fingerprint :
- hsDirFingerprints.tailSet(rnd.nextDouble())) {
- storedDescs.get(fingerprint).add(i);
- if (--leftToStore <= 0) {
- break;
- }
- }
- if (leftToStore > 0) {
- for (double fingerprint : hsDirFingerprints) {
- storedDescs.get(fingerprint).add(i);
- if (--leftToStore <= 0) {
- break;
- }
- }
- }
- }
- }
-
- /* Obfuscate reports using binning and Laplace noise, and then attempt
- * to remove noise again. */
- final long binSize = 8L;
- final double b = 8.0 / 0.3;
- SortedMap<Double, Long> reportedOnions = new TreeMap<Double, Long>(),
- removedNoiseOnions = new TreeMap<Double, Long>();
- for (Map.Entry<Double, SortedSet<Integer>> e :
- storedDescs.entrySet()) {
- double fingerprint = e.getKey();
- long observed = (long) e.getValue().size();
- long afterBinning = ((observed + binSize - 1L) / binSize) * binSize;
- double p = rnd.nextDouble();
- double laplaceNoise = -b * (p > 0.5 ? 1.0 : -1.0) *
- Math.log(1.0 - 2.0 * Math.abs(p - 0.5));
- long reported = afterBinning + (long) laplaceNoise;
- reportedOnions.put(fingerprint, reported);
- long roundedToNearestRightSideOfTheBin =
- ((reported + binSize / 2) / binSize) * binSize;
- long subtractedHalfOfBinSize =
- roundedToNearestRightSideOfTheBin - binSize / 2;
- removedNoiseOnions.put(fingerprint, subtractedHalfOfBinSize);
- }
-
- /* Perform 10,000 extrapolations from random fractions of reports by
- * probability to be selected as rendezvous point. */
- simOnionsCsvFile.getParentFile().mkdirs();
- BufferedWriter bw = new BufferedWriter(new FileWriter(
- simOnionsCsvFile));
- bw.write("frac,p025,p500,p975\n");
- double[] fractions = new double[] { 0.01, 0.02, 0.03, 0.04, 0.05, 0.1,
- 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99 };
- final int numberOfExtrapolations = 10000;
- for (double fraction : fractions) {
- List<Long> extrapolationsTwo = new ArrayList<Long>();
- for (int i = 0; i < numberOfExtrapolations; i++) {
- SortedSet<Double> nonReportingRelays =
- new TreeSet<Double>(hsDirFractions.keySet());
- List<Double> shuffledRelays = new ArrayList<Double>(
- nonReportingRelays);
- Collections.shuffle(shuffledRelays);
- SortedSet<Double> reportingRelays = new TreeSet<Double>();
- for (int j = 0; j < (int) ((double) hsDirFractions.size()
- * fraction); j++) {
- reportingRelays.add(shuffledRelays.get(j));
- nonReportingRelays.remove(shuffledRelays.get(j));
- }
- double reportingProbability;
- long totalReports;
- do {
- reportingProbability = 0.0;
- totalReports = 0L;
- for (double reportingRelay : reportingRelays) {
- reportingProbability += hsDirFractions.get(reportingRelay)
- / 3.0;
- totalReports += removedNoiseOnions.get(reportingRelay);
- }
- if (reportingProbability < fraction - 0.001) {
- double addRelay =
- new ArrayList<Double>(nonReportingRelays).get(
- rnd.nextInt(nonReportingRelays.size()));
- nonReportingRelays.remove(addRelay);
- reportingRelays.add(addRelay);
- } else if (reportingProbability > fraction + 0.001) {
- double removeRelay =
- new ArrayList<Double>(reportingRelays).get(
- rnd.nextInt(reportingRelays.size()));
- reportingRelays.remove(removeRelay);
- nonReportingRelays.add(removeRelay);
- }
- } while (reportingProbability < fraction - 0.001 ||
- reportingProbability > fraction + 0.001);
- double totalFraction = 0.0;
- for (double fingerprint : reportingRelays) {
- totalFraction += hsDirFractions.get(fingerprint) * 4.0;
- }
- extrapolationsTwo.add((long) ((double) totalReports
- / totalFraction));
- }
- Collections.sort(extrapolationsTwo);
- long pTwo025 = extrapolationsTwo.get(
- (extrapolationsTwo.size() * 25) / 1000),
- pTwo500 = extrapolationsTwo.get(
- (extrapolationsTwo.size() * 500) / 1000),
- pTwo975 = extrapolationsTwo.get(
- (extrapolationsTwo.size() * 975) / 1000);
- bw.write(String.format("%.2f,%d,%d,%d%n", fraction, pTwo025,
- pTwo500, pTwo975));
- }
- bw.close();
- }
-}
-
diff --git a/task-13192/src/java/Simulate.java b/task-13192/src/java/Simulate.java
new file mode 100644
index 0000000..e41e02c
--- /dev/null
+++ b/task-13192/src/java/Simulate.java
@@ -0,0 +1,357 @@
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileWriter;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.SortedMap;
+import java.util.SortedSet;
+import java.util.TreeMap;
+import java.util.TreeSet;
+
+
+public class Simulate {
+ private static File simCellsCsvFile =
+ new File("out/csv/sim-cells.csv");
+
+ private static File simOnionsCsvFile =
+ new File("out/csv/sim-onions.csv");
+
+ public static void main(String[] args) throws Exception {
+ System.out.print("Simulating extrapolation of rendezvous cells");
+ simulateManyCells();
+ System.out.print("\nSimulating extrapolation of .onions");
+ simulateManyOnions();
+ System.out.println("\nTerminating.");
+ }
+
+ private static Random rnd = new Random();
+
+ private static void simulateManyCells() throws Exception {
+ simCellsCsvFile.getParentFile().mkdirs();
+ BufferedWriter bw = new BufferedWriter(new FileWriter(
+ simCellsCsvFile));
+ bw.write("run,frac,wmean,wmedian,wiqm\n");
+ final int numberOfExtrapolations = 1000;
+ for (int i = 0; i < numberOfExtrapolations; i++) {
+ bw.write(simulateCells(i));
+ System.out.print(".");
+ }
+ bw.close();
+ }
+
+ private static void simulateManyOnions() throws Exception {
+ simOnionsCsvFile.getParentFile().mkdirs();
+ BufferedWriter bw = new BufferedWriter(new FileWriter(
+ simOnionsCsvFile));
+ bw.write("run,frac,wmean,wmedian,wiqm\n");
+ final int numberOfExtrapolations = 1000;
+ for (int i = 0; i < numberOfExtrapolations; i++) {
+ bw.write(simulateOnions(i));
+ System.out.print(".");
+ }
+ bw.close();
+ }
+
+ private static String simulateCells(int run) {
+
+ /* Generate consensus weights following an exponential distribution
+ * with lambda = 1 for 3000 potential rendezvous points. */
+ final int numberRendPoints = 3000;
+ double[] consensusWeights = new double[numberRendPoints];
+ double totalConsensusWeight = 0.0;
+ for (int i = 0; i < numberRendPoints; i++) {
+ double consensusWeight = -Math.log(1.0 - rnd.nextDouble());
+ consensusWeights[i] = consensusWeight;
+ totalConsensusWeight += consensusWeight;
+ }
+
+ /* Compute probabilities for being selected as rendezvous point. */
+ double[] probRendPoint = new double[numberRendPoints];
+ for (int i = 0; i < numberRendPoints; i++) {
+ probRendPoint[i] = consensusWeights[i] / totalConsensusWeight;
+ }
+
+ /* Generate 10,000,000,000 cells (474 Mbit/s) in chunks following an
+ * exponential distribution with lambda = 0.0001, so on average
+ * 10,000 cells per chunk, and randomly assign them to a rendezvous
+ * point to report them later. */
+ long cellsLeft = 10000000000L;
+ final double cellsLambda = 0.0001;
+ long[] observedCells = new long[numberRendPoints];
+ while (cellsLeft > 0) {
+ long cells = Math.min(cellsLeft,
+ (long) (-Math.log(1.0 - rnd.nextDouble()) / cellsLambda));
+ double selectRendPoint = rnd.nextDouble();
+ for (int i = 0; i < probRendPoint.length; i++) {
+ selectRendPoint -= probRendPoint[i];
+ if (selectRendPoint <= 0.0) {
+ observedCells[i] += cells;
+ break;
+ }
+ }
+ cellsLeft -= cells;
+ }
+
+ /* Obfuscate reports using binning and Laplace noise, and then attempt
+ * to remove noise again. */
+ final long binSize = 1024L;
+ final double b = 2048.0 / 0.3;
+ long[] reportedCells = new long[numberRendPoints];
+ long[] removedNoiseCells = new long[numberRendPoints];
+ for (int i = 0; i < numberRendPoints; i++) {
+ long observed = observedCells[i];
+ long afterBinning = ((observed + binSize - 1L) / binSize) * binSize;
+ double p = rnd.nextDouble();
+ double laplaceNoise = -b * (p > 0.5 ? 1.0 : -1.0) *
+ Math.log(1.0 - 2.0 * Math.abs(p - 0.5));
+ long reported = afterBinning + (long) laplaceNoise;
+ reportedCells[i] = reported;
+ long roundedToNearestRightSideOfTheBin =
+ ((reported + binSize / 2) / binSize) * binSize;
+ long subtractedHalfOfBinSize =
+ roundedToNearestRightSideOfTheBin - binSize / 2;
+ removedNoiseCells[i] = subtractedHalfOfBinSize;
+ }
+
+ /* Perform extrapolations from random fractions of reports by
+ * probability to be selected as rendezvous point. */
+ StringBuilder sb = new StringBuilder();
+ double[] fractions = new double[] { 0.01, 0.02, 0.03, 0.04, 0.05, 0.1,
+ 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99 };
+ for (double fraction : fractions) {
+ SortedSet<Integer> nonReportingRelays = new TreeSet<Integer>();
+ for (int j = 0; j < numberRendPoints; j++) {
+ nonReportingRelays.add(j);
+ }
+ List<Integer> shuffledRelays = new ArrayList<Integer>(
+ nonReportingRelays);
+ Collections.shuffle(shuffledRelays);
+ SortedSet<Integer> reportingRelays = new TreeSet<Integer>();
+ for (int j = 0; j < (int) ((double) numberRendPoints * fraction);
+ j++) {
+ reportingRelays.add(shuffledRelays.get(j));
+ nonReportingRelays.remove(shuffledRelays.get(j));
+ }
+ List<double[]> singleRelayExtrapolations;
+ double totalReportingProbability;
+ do {
+ singleRelayExtrapolations = new ArrayList<double[]>();
+ totalReportingProbability = 0.0;
+ for (int reportingRelay : reportingRelays) {
+ double probability = probRendPoint[reportingRelay];
+ if (probability > 0.0) {
+ singleRelayExtrapolations.add(
+ new double[] {
+ removedNoiseCells[reportingRelay] / probability,
+ removedNoiseCells[reportingRelay],
+ probability });
+ }
+ totalReportingProbability += probability;
+ }
+ if (totalReportingProbability < fraction - 0.001) {
+ int addRelay = new ArrayList<Integer>(nonReportingRelays).get(
+ rnd.nextInt(nonReportingRelays.size()));
+ nonReportingRelays.remove(addRelay);
+ reportingRelays.add(addRelay);
+ } else if (totalReportingProbability > fraction + 0.001) {
+ int removeRelay = new ArrayList<Integer>(reportingRelays).get(
+ rnd.nextInt(reportingRelays.size()));
+ reportingRelays.remove(removeRelay);
+ nonReportingRelays.add(removeRelay);
+ }
+ } while (totalReportingProbability < fraction - 0.001 ||
+ totalReportingProbability > fraction + 0.001);
+ Collections.sort(singleRelayExtrapolations,
+ new Comparator<double[]>() {
+ public int compare(double[] o1, double[] o2) {
+ return o1[0] < o2[0] ? -1 : o1[0] > o2[0] ? 1 : 0;
+ }
+ });
+ double totalProbability = 0.0, totalValues = 0.0;
+ double totalInterquartileProbability = 0.0,
+ totalInterquartileValues = 0.0;
+ Double weightedMedian = null;
+ for (double[] extrapolation : singleRelayExtrapolations) {
+ totalValues += extrapolation[1];
+ totalProbability += extrapolation[2];
+ if (weightedMedian == null &&
+ totalProbability > totalReportingProbability * 0.5) {
+ weightedMedian = extrapolation[0];
+ }
+ if (totalProbability > totalReportingProbability * 0.25 &&
+ totalProbability < totalReportingProbability * 0.75) {
+ totalInterquartileValues += extrapolation[1];
+ totalInterquartileProbability += extrapolation[2];
+ }
+ }
+ sb.append(String.format("%d,%.2f,%.0f,%.0f,%.0f%n", run, fraction,
+ totalValues / totalProbability, weightedMedian,
+ totalInterquartileValues / totalInterquartileProbability));
+ }
+ return sb.toString();
+ }
+
+ private static String simulateOnions(final int run) {
+
+ /* Generate 3000 HSDirs with "fingerprints" between 0.0 and 1.0. */
+ final int numberHsDirs = 3000;
+ SortedSet<Double> hsDirFingerprints = new TreeSet<Double>();
+ for (int i = 0; i < numberHsDirs; i++) {
+ hsDirFingerprints.add(rnd.nextDouble());
+ }
+
+ /* Compute fractions of observed descriptor space. */
+ SortedSet<Double> ring =
+ new TreeSet<Double>(Collections.reverseOrder());
+ for (double fingerprint : hsDirFingerprints) {
+ ring.add(fingerprint);
+ ring.add(fingerprint - 1.0);
+ }
+ SortedMap<Double, Double> hsDirFractions =
+ new TreeMap<Double, Double>();
+ for (double fingerprint : hsDirFingerprints) {
+ double start = fingerprint;
+ int positionsToGo = 3;
+ for (double prev : ring.tailSet(fingerprint)) {
+ start = prev;
+ if (positionsToGo-- <= 0) {
+ break;
+ }
+ }
+ hsDirFractions.put(fingerprint, fingerprint - start);
+ }
+
+ /* Generate 40000 .onions with 4 HSDesc IDs, store them on HSDirs. */
+ final int numberOnions = 40000;
+ final int replicas = 4;
+ final int storeOnDirs = 3;
+ SortedMap<Double, SortedSet<Integer>> storedDescs =
+ new TreeMap<Double, SortedSet<Integer>>();
+ for (double fingerprint : hsDirFingerprints) {
+ storedDescs.put(fingerprint, new TreeSet<Integer>());
+ }
+ for (int i = 0; i < numberOnions; i++) {
+ for (int j = 0; j < replicas; j++) {
+ int leftToStore = storeOnDirs;
+ for (double fingerprint :
+ hsDirFingerprints.tailSet(rnd.nextDouble())) {
+ storedDescs.get(fingerprint).add(i);
+ if (--leftToStore <= 0) {
+ break;
+ }
+ }
+ if (leftToStore > 0) {
+ for (double fingerprint : hsDirFingerprints) {
+ storedDescs.get(fingerprint).add(i);
+ if (--leftToStore <= 0) {
+ break;
+ }
+ }
+ }
+ }
+ }
+
+ /* Obfuscate reports using binning and Laplace noise, and then attempt
+ * to remove noise again. */
+ final long binSize = 8L;
+ final double b = 8.0 / 0.3;
+ SortedMap<Double, Long> reportedOnions = new TreeMap<Double, Long>(),
+ removedNoiseOnions = new TreeMap<Double, Long>();
+ for (Map.Entry<Double, SortedSet<Integer>> e :
+ storedDescs.entrySet()) {
+ double fingerprint = e.getKey();
+ long observed = (long) e.getValue().size();
+ long afterBinning = ((observed + binSize - 1L) / binSize) * binSize;
+ double p = rnd.nextDouble();
+ double laplaceNoise = -b * (p > 0.5 ? 1.0 : -1.0) *
+ Math.log(1.0 - 2.0 * Math.abs(p - 0.5));
+ long reported = afterBinning + (long) laplaceNoise;
+ reportedOnions.put(fingerprint, reported);
+ long roundedToNearestRightSideOfTheBin =
+ ((reported + binSize / 2) / binSize) * binSize;
+ long subtractedHalfOfBinSize =
+ roundedToNearestRightSideOfTheBin - binSize / 2;
+ removedNoiseOnions.put(fingerprint, subtractedHalfOfBinSize);
+ }
+
+ /* Perform extrapolations from random fractions of reports by
+ * probability to be selected as rendezvous point. */
+ StringBuilder sb = new StringBuilder();
+ double[] fractions = new double[] { 0.01, 0.02, 0.03, 0.04, 0.05, 0.1,
+ 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99 };
+ for (double fraction : fractions) {
+ SortedSet<Double> nonReportingRelays =
+ new TreeSet<Double>(hsDirFractions.keySet());
+ List<Double> shuffledRelays = new ArrayList<Double>(
+ nonReportingRelays);
+ Collections.shuffle(shuffledRelays);
+ SortedSet<Double> reportingRelays = new TreeSet<Double>();
+ for (int j = 0; j < (int) ((double) hsDirFractions.size()
+ * fraction); j++) {
+ reportingRelays.add(shuffledRelays.get(j));
+ nonReportingRelays.remove(shuffledRelays.get(j));
+ }
+ List<double[]> singleRelayExtrapolations;
+ double totalReportingProbability;
+ do {
+ singleRelayExtrapolations = new ArrayList<double[]>();
+ totalReportingProbability = 0.0;
+ for (double reportingRelay : reportingRelays) {
+ double probability = hsDirFractions.get(reportingRelay) / 3.0;
+ if (probability > 0.0) {
+ singleRelayExtrapolations.add(
+ new double[] { removedNoiseOnions.get(reportingRelay)
+ / probability, removedNoiseOnions.get(reportingRelay),
+ probability });
+ }
+ totalReportingProbability += probability;
+ }
+ if (totalReportingProbability < fraction - 0.001) {
+ double addRelay =
+ new ArrayList<Double>(nonReportingRelays).get(
+ rnd.nextInt(nonReportingRelays.size()));
+ nonReportingRelays.remove(addRelay);
+ reportingRelays.add(addRelay);
+ } else if (totalReportingProbability > fraction + 0.001) {
+ double removeRelay =
+ new ArrayList<Double>(reportingRelays).get(
+ rnd.nextInt(reportingRelays.size()));
+ reportingRelays.remove(removeRelay);
+ nonReportingRelays.add(removeRelay);
+ }
+ } while (totalReportingProbability < fraction - 0.001 ||
+ totalReportingProbability > fraction + 0.001);
+ Collections.sort(singleRelayExtrapolations,
+ new Comparator<double[]>() {
+ public int compare(double[] o1, double[] o2) {
+ return o1[0] < o2[0] ? -1 : o1[0] > o2[0] ? 1 : 0;
+ }
+ });
+ double totalProbability = 0.0, totalValues = 0.0;
+ double totalInterquartileProbability = 0.0,
+ totalInterquartileValues = 0.0;
+ Double weightedMedian = null;
+ for (double[] extrapolation : singleRelayExtrapolations) {
+ totalValues += extrapolation[1];
+ totalProbability += extrapolation[2];
+ if (weightedMedian == null &&
+ totalProbability > totalReportingProbability * 0.5) {
+ weightedMedian = extrapolation[0];
+ }
+ if (totalProbability > totalReportingProbability * 0.25 &&
+ totalProbability < totalReportingProbability * 0.75) {
+ totalInterquartileValues += extrapolation[1];
+ totalInterquartileProbability += extrapolation[2];
+ }
+ }
+ sb.append(String.format("%d,%.2f,%.0f,%.0f,%.0f%n", run, fraction,
+ totalValues / totalProbability, weightedMedian,
+ totalInterquartileValues / totalInterquartileProbability));
+ }
+ return sb.toString();
+ }
+}
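The obfuscation scheme used in both simulations above (round the observed count up to the next bin boundary, add Laplace noise sampled by inverse-CDF, then later round back to a bin boundary and subtract half a bin) can be sketched compactly. This is an illustrative Python transcription of the Java logic, not part of the patch; minor differences from Java's integer division for negative intermediate values aside, the arithmetic mirrors the code above.

```python
import math
import random

def laplace_noise(b, rnd):
    # Inverse-CDF sample from Laplace(0, b), as in the Java code above.
    p = rnd.random()
    sign = 1.0 if p > 0.5 else -1.0
    return -b * sign * math.log(1.0 - 2.0 * abs(p - 0.5))

def obfuscate(observed, bin_size, b, rnd):
    # Round the observed count up to the next multiple of bin_size,
    # then add Laplace noise.
    after_binning = ((observed + bin_size - 1) // bin_size) * bin_size
    return after_binning + int(laplace_noise(b, rnd))

def remove_noise(reported, bin_size):
    # Round to the nearest bin boundary and subtract half a bin, so the
    # estimate sits in the middle of the bin the true value fell into.
    nearest_bin = ((reported + bin_size // 2) // bin_size) * bin_size
    return nearest_bin - bin_size // 2
```

For the cell simulation the parameters would be bin_size = 1024 and b = 2048 / 0.3; for the .onion simulation, bin_size = 8 and b = 8 / 0.3.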

[translation/whisperback] Update translations for whisperback
by translation@torproject.org 09 Feb '15
commit f08adb64d5f0db433fa31e57a4252b576c3427d6
Author: Translation commit bot <translation(a)torproject.org>
Date: Mon Feb 9 17:15:16 2015 +0000
Update translations for whisperback
---
ta/ta.po | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ta/ta.po b/ta/ta.po
index c0312a2..413de92 100644
--- a/ta/ta.po
+++ b/ta/ta.po
@@ -11,7 +11,7 @@ msgstr ""
"Project-Id-Version: The Tor Project\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-17 17:40+…
[View More]0100\n"
-"PO-Revision-Date: 2015-02-09 16:41+0000\n"
+"PO-Revision-Date: 2015-02-09 17:02+0000\n"
"Last-Translator: git12a <git12(a)openmailbox.org>\n"
"Language-Team: Tamil (http://www.transifex.com/projects/p/torproject/language/ta/)\n"
"MIME-Version: 1.0\n"
@@ -106,7 +106,7 @@ msgid ""
"As a work-around you can save the bug report as a file on a USB drive and try to send it to us at %s from your email account using another system. Note that your bug report will not be anonymous when doing so unless you take further steps yourself (e.g. using Tor with a throw-away email account).\n"
"\n"
"Do you want to save the bug report to a file?"
-msgstr ""
+msgstr "பிஎ௠à®
றிà®à¯à®à¯ சà¯à®à¯à®µà¯à®Ÿà®°à¯à®à¯ பிரà®à¯à®©à¯à®¯à®Ÿà®²à¯ à®
னà¯à®ªà¯à®ªà®ªà¯à®ªà®à®¯à®¿à®¯à®²à®µà®¿à®²à¯à®²à¯. \n\nபிஎ௠à®
றிà®à¯à®à¯à®¯à¯ USB-à®à®¿à®°à¯à®µà®¿à®²à¯ à®à¯à®Ÿà®ªà¯à®ªà®Ÿà® à®à¯à®®à®¿à®€à¯à®€à¯, வà¯à®±à¯à®°à¯ à®à®£à®¿à®©à®¿à®¯à®¿à®²à¯ à®à®à¯à®à®³à®¿à®©à¯ மினà¯à®©à®à¯à®à®²à¯ à®®à¯à®²à®®à¯ à®à®à¯à®à®³à®¿à®©à¯ %s à®®à¯à®à®µà®°à®¿à®¯à®¿à®à¯à®à¯ à®
னà¯à®ªà¯à®ª à®®à¯à®¯à®±à¯à®à¯à®à®¿à®à¯à®à®²à®Ÿà®®à¯. à®à®µà®©à®€à¯à®€à®¿à®±à¯à®à¯ : சà¯à®à¯à®à®³à¯ à®à®¿à®±à®ªà¯à®ªà¯ வஎிமà¯à®±à¯à®à®³à¯ (à®à®€à®Ÿà®°à®£à®®à¯, Tor பயனà¯à®ªà®à¯à®€à¯à®€à®¿ ஀றà¯à®à®Ÿà®²à®¿à® மினà¯à®©à®à¯à®à®²à¯ à®à®£à®à¯à®à¯ பயனà¯à®ªà®à¯à®€à¯à®€à¯à®€à®²à¯) à®®à¯à®±à¯à®à¯à®³à¯à®³à®µà®¿à®²à¯à®²à¯à®¯à¯à®©à¯à®±à®Ÿà®²à¯ à®à®à¯à®à®³à®¿à®©à¯ பிஎ௠à®
றிà®à¯à®à¯ à®
சடம஀à¯à®¯à
®®à®Ÿà® à®à®°à¯à®à¯à®à®Ÿà®€à¯.\n\nசà¯à®à¯à®à®³à¯ பிஎ௠à®
றிà®à¯à®à¯à®¯à¯ à®à®°à¯ à®à¯à®Ÿà®ªà¯à®ªà®Ÿà® à®à¯à®®à®¿à®à¯à® வà¯à®£à¯à®à¯à®®à®Ÿ?"
#: ../whisperBack/gui.py:389 ../data/whisperback.ui.h:21
msgid "WhisperBack"

09 Feb '15
commit b056a1abdd9b2a57ba152896f5084257b03bcda5
Author: Translation commit bot <translation(a)torproject.org>
Date: Mon Feb 9 17:15:06 2015 +0000
Update translations for gettor
---
ta/gettor.po | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/ta/gettor.po b/ta/gettor.po
index 5664ca4..dbd9af2 100644
--- a/ta/gettor.po
+++ b/ta/gettor.po
@@ -10,7 +10,7 @@ msgstr ""
"Project-Id-Version: The Tor Project\n"
"Report-Msgid-Bugs-To: \n"
"POT-…
[View More]Creation-Date: 2013-01-19 13:40+0100\n"
-"PO-Revision-Date: 2015-02-07 16:10+0000\n"
+"PO-Revision-Date: 2015-02-09 17:15+0000\n"
"Last-Translator: git12a <git12(a)openmailbox.org>\n"
"Language-Team: Tamil (http://www.transifex.com/projects/p/torproject/language/ta/)\n"
"MIME-Version: 1.0\n"
@@ -254,13 +254,13 @@ msgid ""
"in the body of the email to the following email address:\n"
"\n"
" bridges(a)torproject.org"
-msgstr ""
+msgstr "நீங்கள் bridge-களை மின்னஞ்சலின் கடிதவுல்விவரத்தில் \"get bridges\"\nஎன குறிப்பிட்டு கீழேவுள்ள முகவரிக்கு அனுப்பிப் பெறலாம்:\n\n bridges(a)torproject.org"
#: lib/gettor/i18n.py:172
msgid ""
"It is also possible to fetch bridges with a web browser at the following\n"
"url: https://bridges.torproject.org/"
-msgstr ""
+msgstr "இணைய Browser-இல் கீழ்காணும் முகவரியிலும் பெறலாம்:\nhttps://bridges.torproject.org/"
#: lib/gettor/i18n.py:175
msgid ""
@@ -289,7 +289,7 @@ msgstr ""
msgid ""
"It was successfully understood. Your request is currently being processed.\n"
"Your package (%s) should arrive within the next ten minutes."
-msgstr ""
+msgstr "கோரிக்கை வெற்றிகரமாக புரிந்து கொள்ளப்பட்டது. உங்கள் கோரிக்கை தற்போது செயல்முறையிலுள்ளது. உங்களின் (%s) தொகுப்புகள் அடுத்த பத்து நிமிடங்களில் வந்து சேரும்."
#: lib/gettor/i18n.py:191
msgid ""
@@ -301,7 +301,7 @@ msgstr ""
msgid ""
"Unfortunately we are currently experiencing problems and we can't fulfill\n"
"your request right now. Please be patient as we try to resolve this issue."
-msgstr ""
+msgstr "துரதிர்ஷ்டவசமாக நாங்கள் தற்போது பிரச்சினைகள் சந்திக்கிறோம், ஆகையால் உங்கள் கோரிக்கை நிறைவேற்ற முடியாது. இந்தச் பிரச்சினைகளை தீர்க்க முயற்சிக்கும் நேரத்தில் தயவுசெய்து பொறுமையாக இருங்கள்."
#: lib/gettor/i18n.py:197
msgid ""
@@ -315,7 +315,7 @@ msgstr ""
msgid ""
"UNPACKING THE FILES\n"
"==================="
-msgstr ""
+msgstr "கோப்புகளை விரிவாக்குவது எப்படி\n==================="
#: lib/gettor/i18n.py:205
msgid ""
@@ -481,7 +481,7 @@ msgstr "இது நீங்கள் பயன்படுத்தும்
#: lib/gettor/i18n.py:299
msgid "How do I extract the file(s) you sent me?"
-msgstr ""
+msgstr "நீங்கள் அனுப்பிய கோப்புகளை விரிவாக்குவது எப்படி?"
#: lib/gettor/i18n.py:301
msgid "QUESTION:"

[stem/master] Our ONLINE target could fail with an undefined 'streams'
by atagar@torproject.org 09 Feb '15
commit 6069f40bcb5114265345dd74fb312d9af46ee8a3
Author: Damian Johnson <atagar(a)torproject.org>
Date: Mon Feb 9 09:07:08 2015 -0800
Our ONLINE target could fail with an undefined 'streams'
Doesn't make the test pass, but was masking the actual issue which is
"SocksError: [6] TTL expired".
======================================================================
ERROR: test_attachstream
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/atagar/Desktop/stem/test/integ/control/controller.py", line 1206, in test_attachstream
our_stream = [stream for stream in streams if stream.target_address == host][0]
UnboundLocalError: local variable 'streams' referenced before assignment
---
test/integ/control/controller.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/test/integ/control/controller.py b/test/integ/control/controller.py
index 0839bc1..6ec03a0 100644
--- a/test/integ/control/controller.py
+++ b/test/integ/control/controller.py
@@ -1176,7 +1176,7 @@ class TestController(unittest.TestCase):
host = '38.229.72.14' # www.torproject.org
port = 80
- circuit_id = None
+ circuit_id, streams = None, []
def handle_streamcreated(stream):
if stream.status == 'NEW' and circuit_id:
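The bug fixed above is the standard Python pitfall where a local name is bound only on one code path, so later references raise UnboundLocalError when that path never ran. A minimal illustration (names here are invented for the example, not taken from the stem test suite):

```python
def buggy(have_streams):
    if have_streams:
        streams = [('NEW', 80), ('CLOSED', 443)]
    # UnboundLocalError whenever the branch above did not run:
    return [s for s in streams if s[0] == 'NEW']

def fixed(have_streams):
    streams = []  # initialize up front, as the patch does for circuit_id/streams
    if have_streams:
        streams = [('NEW', 80), ('CLOSED', 443)]
    return [s for s in streams if s[0] == 'NEW']
```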

09 Feb '15
commit f36f81bb5c90c6f8a06f22f755e00ababe7407b8
Merge: ba47de6 c9dcc48
Author: Damian Johnson <atagar(a)torproject.org>
Date: Mon Feb 9 08:50:40 2015 -0800
Better argument handling for run_tests.py
Balking if provided with unrecognized arguments, and more intuitive handling
for targets...
https://trac.torproject.org/projects/tor/ticket/14804
run_tests.py | 20 +++++++++++++++++---
stem/interpreter/arguments.py | 8 ++++++--
2 files changed, 23 insertions(+), 5 deletions(-)

[translation/whisperback] Update translations for whisperback
by translation@torproject.org 09 Feb '15
commit afbfec9ee96c79ea4e426cf66d1101cf35327141
Author: Translation commit bot <translation(a)torproject.org>
Date: Mon Feb 9 16:45:11 2015 +0000
Update translations for whisperback
---
ta/ta.po | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/ta/ta.po b/ta/ta.po
index 8e3fcd6..c0312a2 100644
--- a/ta/ta.po
+++ b/ta/ta.po
@@ -4,14 +4,15 @@
#
# Translators:
# annes badusha <badusha000(a)gmail.com>, 2014
# git12a <git12(a)openmailbox.org>, 2015
# Khaleel Jageer <jskcse4(a)gmail.com>, 2014
msgid ""
msgstr ""
"Project-Id-Version: The Tor Project\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-17 17:40+0100\n"
-"PO-Revision-Date: 2014-12-19 17:10+0000\n"
-"Last-Translator: annes badusha <badusha000(a)gmail.com>\n"
+"PO-Revision-Date: 2015-02-09 16:41+0000\n"
+"Last-Translator: git12a <git12(a)openmailbox.org>\n"
"Language-Team: Tamil (http://www.transifex.com/projects/p/torproject/language/ta/)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
@@ -81,7 +82,7 @@ msgid ""
"The bug report could not be sent, likely due to network problems. Please try to reconnect to the network and click send again.\n"
"\n"
"If it does not work, you will be offered to save the bug report."
-msgstr "\nபிழை அறிக்கை நெட்வொர்க் பிரச்னையால் வாய்ப்பு அனுப்பப்படும். நெட்வொர்க்கில் மீண்டும் மற்றும் again. அனுப்ப கிளிக் முயற்சிக்கவும்\n\nஅது வேலை செய்யவில்லை என்றால், நீங்கள் பிழை அறிக்கையை சேமிக்க வழங்கப்படும்."
+msgstr "\n\nபிழை அறிக்கை நெட்வொர்க் பிரச்னையால் அனுப்பப்படயியலவில்லை. நெட்வொர்க்கை மீண்டும் இணைக்க முயற்சித்தபின் , மீண்டும் அனுப்பு கிளிக் செய்யவும்.\n\n\nஅது வேலை செய்யவில்லை என்றால், பிழை அறிக்கையை கோப்பாக சேமிக்க வழிவழங்கப்படும்."
#: ../whisperBack/gui.py:274
msgid "Your message has been sent."

[stem/master] Rather than error, keep the default target when none are provided
by atagar@torproject.org 09 Feb '15
commit c9dcc48de7222f8e1819fdef4e324ac7d9176839
Author: Damian Johnson <atagar(a)torproject.org>
Date: Mon Feb 9 08:37:54 2015 -0800
Rather than error, keep the default target when none are provided
Actually, on reflection if the user only provides attribute targets (ex.
'--target ONLINE') there's no point in erroring. They clearly want to keep the
default.
---
run_tests.py | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/run_tests.py b/run_tests.py
index c696b1d..0d45175 100755
--- a/run_tests.py
+++ b/run_tests.py
@@ -385,11 +385,13 @@ def _get_args(argv):
attribute_targets.remove(Target.RUN_ALL)
run_targets = all_run_targets
- args['run_targets'] = run_targets
- args['attribute_targets'] = attribute_targets
+ # if no RUN_* targets are provided then keep the default (otherwise we
+ # won't have any tests to run)
+
+ if run_targets:
+ args['run_targets'] = run_targets
- if not args['run_targets']:
- raise ValueError("This wouldn't run anything. You need to provide at least one target that starts with 'RUN_'.")
+ args['attribute_targets'] = attribute_targets
elif opt == '--test':
args['specific_test'] = arg
elif opt in ('-l', '--log'):
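The decision the patch makes can be reduced to a small sketch: if the user supplied only attribute targets (such as ONLINE), the default run targets are kept rather than erroring out. The function name and argument shapes below are illustrative, not run_tests.py's actual interface:

```python
def parse_targets(user_targets, default_run_targets):
    # Split user-supplied targets into RUN_* targets and attribute targets.
    run_targets = [t for t in user_targets if t.startswith('RUN_')]
    attribute_targets = [t for t in user_targets if not t.startswith('RUN_')]
    # Keep the default run targets when only attribute targets were given;
    # otherwise there would be no tests to run at all.
    if not run_targets:
        run_targets = list(default_run_targets)
    return run_targets, attribute_targets
```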