# [tor-commits] [tech-reports/master] adding benefits and risks to stats in 4.2 and 4.3, other minor edits

karsten at torproject.org karsten at torproject.org
Wed Jun 17 18:48:07 UTC 2015

commit 56ddf59a769c803b4d08e78da37b7ae7b3e2a146
Author: A. Johnson <aaron.m.johnson at nrl.navy.mil>
Date:   Thu Jan 1 14:20:12 2015 -0500

adding benefits and risks to stats in 4.2 and 4.3, other minor edits
---
2015/hidden-service-stats/hidden-service-stats.tex |  219 +++++++++++++-------
1 file changed, 145 insertions(+), 74 deletions(-)

diff --git a/2015/hidden-service-stats/hidden-service-stats.tex b/2015/hidden-service-stats/hidden-service-stats.tex
index 9aeb64c..20e9dc9 100644
--- a/2015/hidden-service-stats/hidden-service-stats.tex
+++ b/2015/hidden-service-stats/hidden-service-stats.tex
@@ -291,7 +291,8 @@ we want to avoid that statistics can be used to single out a specific
client and learn about its activity.
This includes power users that access lots of services or transfer large
data volumes as well as clients which are services themselves, like
-tor2web.
+tor2web. It also includes the fact that a given user under direct
+observation is using hidden services at all.

\paragraph{Assess precise number of available services}

@@ -424,7 +425,7 @@ the introduction point.
% Risks: Also tricky. That info could tell us clearly if the IP circuit is
% on a new or already established circuit which changes the traffic
% timing. Not sure how useful it is to an attacker though.
-%
+\\
\textbf{Benefits:}
%
We would learn what fraction of introduction points can be established on
@@ -452,7 +453,7 @@ But if we ever decide to re-use existing circuits for rendezvous without
extending them by another hop, this metric will give us an idea on the
adoption of that change.
Admittedly, this benefit is not huge.
-%
+\\
\textbf{Risks:}
%
No obvious risks.  % only talking about aggregate statistics here, not
@@ -485,7 +486,7 @@ More precisely, relays would remember for each circuit how it was built,
and as soon as they receive an \verb+ESTABLISH_INTRO+ cell, they increment
one of two counters.
See ticket 13466 for details.
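The two-counter scheme can be sketched in Python as follows. This is a minimal illustration, not the ticket's implementation. The category labels `BUILT_FRESH`/`BUILT_REUSED` and the class and method names are hypothetical placeholders, because the text above does not name the two ways a circuit may have been built.

```python
from dataclasses import dataclass, field

# Hypothetical labels for "how the circuit was built"; the text above does
# not name the two categories, so these are placeholders.
BUILT_FRESH = "fresh"
BUILT_REUSED = "reused"


@dataclass
class IntroBuildCounters:
    """Per-circuit build info plus two aggregate ESTABLISH_INTRO counters."""
    build_type_by_circuit: dict = field(default_factory=dict)
    counters: dict = field(
        default_factory=lambda: {BUILT_FRESH: 0, BUILT_REUSED: 0})

    def on_circuit_built(self, circ_id: int, build_type: str) -> None:
        # Remember how this circuit was built; nothing is counted yet.
        self.build_type_by_circuit[circ_id] = build_type

    def on_establish_intro(self, circ_id: int) -> None:
        # Increment exactly one of the two counters for a known circuit.
        build_type = self.build_type_by_circuit.get(circ_id)
        if build_type in self.counters:
            self.counters[build_type] += 1

    def on_circuit_closed(self, circ_id: int) -> None:
        # Drop per-circuit state so only the aggregate counters survive.
        self.build_type_by_circuit.pop(circ_id, None)
```

Only the two counters would be reported at the end of the statistics interval. The per-circuit map exists just long enough to classify the ESTABLISH_INTRO cell.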
-%
+\\
\textbf{Benefits:}
%
We would learn what fraction of clients and what fraction of services run
@@ -517,18 +518,18 @@ later section.
%
A relay counts how many \verb+ESTABLISH_INTRO+ cells it receives and acts
upon during the statistics interval.
-%
+\\
\textbf{Benefits:}
%
We could validate that we have a ``uniform'' random distribution among
chosen introduction points in the network.
-If not, there might be a problem.
-%
+If not, there might be a problem. This could also be used to count the
+number of introduction points actually being used by hidden services,
+which could reveal errors or non-standard usage.
+\\
\textbf{Risks:}
-Considering we have a good randomness meaning every relay has the same
-chance to be picked, there are no obvious risks to share this.
-If not, we don't see a real risk for an attacker to know that a specific
-relay got chosen X times instead of the measured average Y.
+This could reveal the introduction points of hidden services with
+encrypted descriptors.

\subsubsection{Time from establishing introduction point to tearing down
circuit (1.1.4.)}
@@ -539,7 +540,7 @@ circuit (1.1.4.)}
How long did an introduction circuit last?
Relays would report statistics like mean/median time, variance/IQR, and/or
percentiles here.
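The summary statistics listed above can be computed with Python's standard `statistics` module. This minimal sketch assumes a relay kept the lifetimes, in seconds, of the introduction circuits closed during one statistics interval. The function name and the choice of the 10th and 90th percentiles are illustrative assumptions. The text leaves open which of these values would actually be published.

```python
import statistics


def summarize_intro_circuit_lifetimes(durations_s: list[float]) -> dict:
    """Summarize introduction-circuit lifetimes (seconds) for one interval.

    Hypothetical helper: computes all candidate statistics mentioned in the
    text; a relay would publish only the agreed-upon subset.
    """
    # Quartiles (inclusive method treats the data as the whole population).
    q1, median, q3 = statistics.quantiles(durations_s, n=4, method="inclusive")
    deciles = statistics.quantiles(durations_s, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(durations_s),
        "median": median,
        "variance": statistics.variance(durations_s),  # sample variance
        "iqr": q3 - q1,
        "p10": deciles[0],
        "p90": deciles[-1],
    }
```

`statistics.variance` needs at least two data points, so intervals with fewer closed introduction circuits would need special handling.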
-%
+\\
\textbf{Benefits:}
%
The longer introduction circuits last, the better, from a performance POV.
@@ -563,7 +564,7 @@ minimum, maximum, average number of hosted descriptors during the
statistics interval.
(There may be more efficient ways to implement these statistics that avoid
keeping a full history with timestamps, which are not discussed here.)
-%
+\\
\textbf{Benefits:}
%
This is an interesting statistic that would allow us to understand how
@@ -580,7 +581,7 @@ services by checking the amount of descriptors received per publishing
period.
If this ever becomes a problem we can imagine publishing fake descriptors
to confuse the directories.
-%
+\\
\textbf{Risks:}
%
Publishing this stat would allow someone who is indexing hidden services
@@ -595,6 +596,13 @@ HSes.''.
This is a bit related to differential privacy as we understand it, but
much more basic.

+Another risk is that changes in the number of HSes could reveal patterns
+in the services over time. For example, it could reveal some services that
+are published during certain times of day and certain days of the week,
+which could correlate with daylight hours and/or working days in certain
+parts of the world. This information could also be correlated with
+network outages over time to narrow down the location of hidden services.
+
\subsubsection{Number of descriptor updates per service (3.1.2.)}

@@ -605,7 +613,13 @@ Assuming that stats are published daily (which is not necessary), this is
going to be a number between 1 and 24 (since RendPostPeriod is currently
one hour) and services pick a new directory after 24 hours (see
\verb+rendcommon.c:get_time_period()+).
-%
+\\
+\textbf{Benefits:}
+This could reveal overall HS descriptor stability, which reflects
+the frequency of events causing descriptor updates, such as changing
+IPs or changing authentication keys. Also, this could reveal client errors
+or DoS attacks on HSDirs.
+\\
\textbf{Risks:}
%
Depending on how many HSes are behind each HSDir, this statistic might or
@@ -616,7 +630,11 @@ RendPostPeriod was publishing to that HSDir (and that the HSDir doesn't
have many clients).
Do we want to reveal that?
OTOH, it seems to me that if the directory is serving many services, this
-statistic doesn't really provide any insight.
+statistic doesn't really provide any insight. In addition, this could
+be used to reveal the introduction points used by a hidden service
+(assuming its address is known, but its descriptors are encrypted) by
+DoSing suspected IPs and observing whether the responsible HSDirs report
+a higher number of descriptor updates.

\subsubsection{Time between last and first published descriptor with same
identifier}
@@ -639,7 +657,7 @@ descriptor identifiers change.
%
Relays report average number of introduction points contained in
hidden-service descriptors, possibly also percentiles.
-%
+\\
\textbf{Benefits:}
%
It would be interesting to know whether services deviate from the default
@@ -654,7 +672,7 @@ This statistic will also be killed by rend-spec-ng.
%
Relays can look at published hidden-service descriptors and count
descriptors with plain-text vs. encrypted introduction point sections.
-%
+\\
\textbf{Benefits:}
%
We would learn what fraction of services uses authentication features.
@@ -673,19 +691,24 @@ later time.
%
A relay reports the total number of descriptor fetch requests, regardless
of the requested hidden service identity.
-%
+\\
+\textbf{Benefits:}
+This statistic would be an indication of the amount of client HS
+activity. It would complement the statistics counting rendezvous data
+cells by indicating the number of ``initial'' connections rather
+than the amount of data, which should be a better estimate of the
+number of unique user-service pairs.
+\\
\textbf{Risks:}
%
-An adversary can use this statistic to evaluate the popularity of an HS.
-An adversary can also use this stat to detect big changes in the numbers
-of visitors of popular HSes.
-Of course, there will be noise in the statistics since multiple services
-correspond to each directory, but the adversary could reduce the noise
-after observing the same service rotating to different directories, and
-also by examining the statistics of all 6 directories that correspond to
-the service.
-This doesn't seem like a problem that is solvable with simple obfuscation
-of stats, and I suggest we don't do this statistic at all.
+An adversary could use this statistic to evaluate the popularity of a
+known HS because the HS's set of HSDirs would likely be unique and also
+known. The adversary could filter out the fetches for other HSes sharing
+the same HSDirs by measuring how much the counts of a given HSDir change
+during the periods in which the target HS is assigned to it.
+This is a major problem that doesn't seem resolvable without some kind
+of anonymization or private aggregation of the per-relay stats.
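The differencing step described above can be sketched as follows. The sketch assumes the adversary has a series of per-period fetch counts for each HSDir and knows in which periods the target HS was assigned to it. Fetches for other HSes are treated as background with the same mean in both kinds of periods, which is a simplifying assumption. The function names and the report format are hypothetical.

```python
from statistics import fmean


def estimate_target_fetches(reports: list[tuple[int, bool]]) -> float:
    """Estimate per-period fetches for a target HS from one HSDir's reports.

    Each report is (fetch_count, target_assigned). Assumption: fetches for
    other HSes have the same average in periods with and without the target,
    so the difference of means isolates the target's share.
    """
    with_target = [c for c, assigned in reports if assigned]
    without_target = [c for c, assigned in reports if not assigned]
    return fmean(with_target) - fmean(without_target)


def estimate_over_hsdirs(
        per_hsdir_reports: dict[str, list[tuple[int, bool]]]) -> float:
    # Clients spread fetches over the responsible HSDirs, so sum the excess.
    return sum(estimate_target_fetches(r) for r in per_hsdir_reports.values())
```

Independent per-period noise tends to average out as more periods are observed. This sketch therefore also shows why simple per-relay obfuscation may not be enough to stop this attack.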

\subsubsection{Number of descriptor fetch requests by service identity
(3.2.2.)} \label{subsubsec:num_descriptor_fetches_per_hs}
@@ -694,6 +717,22 @@ of stats, and I suggest we don't do this statistic at all.
%
Relays report the distribution of descriptor fetch requests to hidden
service identities.
+\\
+\textbf{Benefits:}
+This statistic would have the same benefits
+as~\ref{subsubsec:num_desc_fetches}, as well as being informative
+about HS usage. For example, it could reveal if there are many
+moderately-popular HSes or just a few very popular ones.
+\\
+\textbf{Risks:}
+This statistic would have the same risks
+as~\ref{subsubsec:num_desc_fetches}, as well as potentially
+making it even easier to reveal per-HS popularity. For example,
+if the statistic reported the maximum number of fetches over all
+HSes, and it was already clear which HS was the most popular,
+then this statistic could give the adversary an exact measure
+of popularity.
+%

\subsubsection{Number of established rendezvous points (2.1.1.)}
\label{subsubsec:num_rps}
@@ -701,7 +740,7 @@ service identities.
\textbf{Details:}
%
Relays report how many \verb+ESTABLISH_RENDEZVOUS+ cells they received.
-%
+\\
\textbf{Benefits:}
%
The number of received \verb+ESTABLISH_RENDEZVOUS+ cells indicates how
@@ -714,7 +753,7 @@ connection, and which we may not even gather because of privacy concerns.
We can easily weight the number of \verb+ESTABLISH_RENDEZVOUS+ cells with
the probability of choosing a relay as rendezvous point to estimate the
total number of such cells in the network.
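Here is a minimal sketch of that weighting. It assumes each reporting relay's probability of being chosen as rendezvous point is already known. In practice that probability would have to be derived from the consensus, which is not modelled here. Dividing the summed counts by the summed probabilities of the reporting relays is one straightforward way to extrapolate. The function name is hypothetical.

```python
def extrapolate_network_total(reports: list[tuple[int, float]]) -> float:
    """Estimate the network-wide number of ESTABLISH_RENDEZVOUS cells.

    Each report is (cell_count, selection_probability). Assumption: the
    probability that clients pick the relay as rendezvous point is known.
    """
    total_count = sum(count for count, _ in reports)
    total_probability = sum(p for _, p in reports)
    if total_probability <= 0:
        raise ValueError("reporting relays must have positive total probability")
    return total_count / total_probability
```

For example, two relays reporting 50 and 25 cells with selection probabilities 0.25 and 0.125 together cover 37.5% of rendezvous-point choices, giving an estimate of 200 cells network-wide.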
-%
+\\
\textbf{Risk:}
%
There is no obvious risk from sharing this number if aggregated over a
@@ -724,25 +763,25 @@ large enough time period.
\label{subsubsec:num_intros_from_clients}

\textbf{Details:}
-%
Relays report how many \verb+INTRODUCE1+ cells they received from clients.
-%
+\\
\textbf{Benefits:}
-%
This indicates that there is in fact a client trying to reach a hidden
service, so the number of cells could give us a rough estimate of how
many clients are actually connecting and using hidden services.
-%
+\\
\textbf{Risks:}
-%
-Unclear.
-On the one hand, this is basically the same risk as the amount of time a
-relay is picked as an introduction point.
-On the other hand, an adversary could fetch a hidden-service descriptor,
-learn that a particular relay was an introduction point for that service,
-and then see the relay receive many \verb+INTRODUCE1+ cells.
-Basically, this statistic could be used to learn how many connection
-requests a very popular hidden service gets.
+An adversary could use this statistic to learn the number of attempted
+connections to a known HS. To do this, the adversary would fetch a
+descriptor for the HS, learn the introduction points for the HS,
+and then observe how many \verb+INTRODUCE1+ cells were received by
+those relays. To remove connections due to other HSes sharing the same
+IP, the adversary could observe the counts over the (almost certainly
+unique) set of IPs and identify how these counts differ from other
+relays not used by the target HS and from the same relays in periods
+for which they don't serve as IPs for the target HS.
+This statistic may require anonymization or private aggregation techniques
+to avoid this problem.

% [dgoulet]: I think, after discussing it with Nick, that this might be OK
% if the relay reports this stat for a lot of HS meaning the relay has at
@@ -767,7 +806,7 @@ Relays can serve as introduction point for an arbitrary number of hidden
services.
Relays could report statistics (like percentiles) on received
\verb+INTRODUCE1+ cell by introduction circuit.
-%
+\\
\textbf{Benefits:}
%
This statistic would tell us something about usage diversity of hidden
@@ -775,6 +814,12 @@ services.
A special case would be the number or fraction of established introduction
points that never sees a single \verb+INTRODUCE1+ cell.
It's unclear what we'd do with this information, though.
+\\
+\textbf{Risks:}
+This shares the same risks as \ref{subsubsec:num_intros_from_clients} but
+potentially makes it even easier to differentiate introductions due to
+different HSes with the same IP.
+%

\subsubsection{Number of server rendezvous (2.2.1.)}
\label{subsubsec:num_server_rendezvous}
@@ -782,7 +827,7 @@ It's unclear what we'd do with this information, though.
\textbf{Details:}
%
Relays report the total number of \verb+RENDEZVOUS1+ cells they receive.
-%
+\\
\textbf{Benefits:}
%
The number of received \verb+RENDEZVOUS1+ cells tells us how many
@@ -790,7 +835,7 @@ connection requests are actually accepted by servers.
This number may be lower than the number of \verb+ESTABLISH_RENDEZVOUS+
cells, because of failures in connection establishment, authentication
failures, or other reasons.
-%
+\\
\textbf{Risks:}
%
There is no obvious risk from this metric, because it's unrelated to any
@@ -817,7 +862,7 @@ direction (2.3.2.)} \label{subsubsection:num_cells_rend_circ}
\textbf{Details:}
%
Relays report the number of \verb+RELAY+ cells sent in either direction.
-%
+\\
\textbf{Benefits:}
%
The number of \verb+RELAY+ cells sent by either client or server can give
@@ -831,13 +876,20 @@ peer-to-peer models.
As a special case, we'd want to know what fraction of circuits has zero
\verb+RELAY+ cells, which would indicate a connection problem late in the
process.
-%
+\\
\textbf{Risks:}
%
In contrast to the cells discussed above, \verb+RELAY+ cells contain
actual user content.
The pattern of \verb+RELAY+ cells could also be used to fingerprint a
-given server or even client.
+given server or even client. Even more simply, this statistic could
+reveal the mere existence of an HS connection, especially a large one,
+which the adversary might otherwise not be aware of. This could combine
+badly with background knowledge such as the following: the adversary
+observes at the user's ISP a certain large amount of traffic from a user
+that he suspects might be hidden-service traffic. If he observes an
+increase in this statistic by about the same amount, he can infer that
+the user was indeed using hidden services.
While the total number of cells by direction aggregated over a certain time
period should be okay to measure, any statistics going further than that
need closer analysis.
@@ -849,7 +901,7 @@ need closer analysis.
%
Relays report the time from seeing the first \verb+RELAY+ cell sent by the
client to the circuit being torn down by either client or server.
-%
+\\
\textbf{Benefits:}
%
The time between receiving the first \verb+RELAY+ cell to tearing down the
@@ -859,14 +911,19 @@ short-lived or long-lived.
This information may help us make educated guesses on the type of
applications run over hidden services.
It may also help us improve the selection criteria for rendezvous points.
-%
+\\
\textbf{Risks:}
%
Session length is quite sensitive data that could be correlated with
circuit lifetimes at other places in the network.
Fortunately, the rendezvous point is specific neither to any given client
nor to any given service, which makes this information slightly less sensitive.
-Still, this metric needs further analysis.
+However, if, for example, the adversary knows that a user maintained some
+network activity for a specific amount of time and then observes a relay
+report in this statistic that someone used a rendezvous connection
+for that long, then he could infer that the target user was using hidden
+services. This gets worse if the adversary makes observations over time
+and can observe the changes in the resulting statistics.

% How many rendezvous requests finally succeeded?
% Opposite: What percentage of the time did the rendezvous fail to happen?
@@ -883,6 +940,22 @@ cell} \label{subsubsec:num_rend_circ_no_data}
%
Relays report the number of rendezvous circuits that have been closed
before client or service sent a single data cell.
+\\
+\textbf{Benefits:}
+This could indicate erroneous, malicious, or non-standard behavior by
+HS clients. Learning about any such behavior would be interesting and
+useful to improve hidden-service performance and usability.
+\\
+\textbf{Risks:}
+An adversary at the user's ISP could test whether a user is attempting
+to connect to a hidden service by allowing the connection to proceed until
+just before data gets sent, and then killing it by modifying the packet
+contents to be garbage. Then he could observe whether any relay later
+reports a larger-than-expected value for this statistic. This would be more
+effective if the user's client responds by immediately attempting to create
+new connections, which would make the increase in this statistic even more
+noticeable.
+

\subsection{Performance-related statistics}

@@ -894,7 +967,7 @@ first client introduction (1.2.4.)}
%
Relays report the time between \verb+ESTABLISH_INTRO+ and first
\verb+INTRODUCE1+ cell.
-%
+\\
\textbf{Benefits:}
%
This statistic tells us how long it takes for the hidden service to
@@ -908,7 +981,7 @@ This may not be very useful, but is listed here for completeness.
% [karsten]: again, I think you're wrong about introduction points
% changing at each upload.
% [dgoulet]: Yup, IP do *NOT* change at each upload.
-%
+\\
\textbf{Risks:}
%
No obvious risks.
@@ -920,7 +993,7 @@ server rendezvous (2.2.2.)} \label{subsubsec:time_rp_to_rend1}
%
Relays report the time from receiving an \verb+ESTABLISH_RENDEZVOUS+ cell
to receiving the corresponding \verb+RENDEZVOUS1+ cell.
-%
+\\
\textbf{Benefits:}
%
The time between receiving an \verb+ESTABLISH_RENDEZVOUS+ cell from the
@@ -931,7 +1004,7 @@ events near the beginning and near the end of the connection establishment
process.
If we ever want to improve the substeps in between, this metric is the only
way to measure effectiveness of improvements in the deployed network.
-%
+\\
\textbf{Risks:}
%
Again, there are at least no obvious risks from gathering this statistic.
@@ -943,13 +1016,13 @@ Again, there are at least no obvious risks from gathering this statistic.
%
Relays report the time from receiving a \verb+RENDEZVOUS1+ cell to seeing
the first \verb+RELAY+ cell sent from the client.
-%
+\\
\textbf{Benefits:}
The time from receiving a \verb+RENDEZVOUS1+ cell from the server (and
relaying it as \verb+RENDEZVOUS2+ cell to the client) to receiving the
first \verb+RELAY+ cell from the client is another performance indicator
of the protocol.
-%
+\\
\textbf{Risks:}
%
There are no obvious risks from learning the time between these two
@@ -975,7 +1048,7 @@ and report them along with the total number of received
\verb+ESTABLISH_INTRO+ cells.
Or it would report successes and failures, rather than totals and
failures.
-%
+\\
\textbf{Benefits:}
%
Wrong \verb+ESTABLISH_INTRO+ cells show either a very bad bug in the code
@@ -989,7 +1062,7 @@ or a deliberate action (data mangling, unknown attack, DoS, ...).
% [karsten:] right, this is a fine question, not only limited to this
% statistic.  I added a new paragraph to the section start for "general
% considerations for gathering hidden-service statistics".
-%
+\\
\textbf{Risks:}
%
No obvious risks.
@@ -1001,12 +1074,12 @@ No obvious risks.
%
Relays report frequencies of circuit terminations requested by services
vs. different types of failures.
-%
+\\
\textbf{Benefits:}
%
If more than a small percentage of terminations are failures, we could
decide how to make things more robust.
-%
+\\
\textbf{Risks:}
%
No obvious risks.
@@ -1032,11 +1105,11 @@ descriptor.
For example: a) clock sync issues, b) different network view between, c)
``the hidden service hasn't published recently'', d) ``the hidden service
is offline for months''.
-%
+\\
\textbf{Benefits:}
%
This seems like a statistic that could potentially find bugs in Tor.
-%
+\\
\textbf{Risks:}
%
This statistic could reveal things that we don't really understand and
@@ -1051,7 +1124,7 @@ might reveal information about specific services.
%
How many \verb+INTRODUCE1+ cells have been discarded because of unknown
service/malformed (?)/whatever-can-go-wrong, by introduction point?
-%
+\\
\textbf{Benefits:}
%
Anything exceeding a small portion of discarded \verb+INTRODUCE1+ cells
@@ -1062,7 +1135,7 @@ mangling, unknown attack, DoS, ...).
% help us investigate. It can I guess trigger an alarm but apart from
% that...
% [karsten]: right, see section start.
-%
+\\
\textbf{Risks:}
%
No obvious risks.
@@ -1077,7 +1150,7 @@ only fractions are reported, it's not that bad.
%
Relays report the number of \verb+RENDEZVOUS1+ cells with unknown
-%
+\\
\textbf{Benefits:}
%
The number of \verb+RENDEZVOUS1+ cells that cannot be matched with a
@@ -1086,7 +1159,7 @@ problems in the protocol.
We might even distinguish between rendezvous cookies that were previously
known to the relay and those that seem entirely unrelated.
The benefit gained from this statistic is not huge though.
-%
+\\
\textbf{Risk:}
%
No obvious risks.
@@ -1349,11 +1422,9 @@ should be rounded to a chosen granularity $\delta$:
$\hat{c_i} = \delta[c_i/\delta]$.
$\delta$ should be larger than the amount by which a
single activity could change the bucket count, where again the notion of a
-single activity depends on the context. Also, for simplicity, it is
-recommended that bins
-are not split over multiple buckets (e.g. there should not be buckets for
-values 0 and 1 if $\delta = 2$). The bins here serve the same purpose
-of protecting privacy over time that they did when publishing counts.
+single activity depends on the context. The bins here serve the same
+purpose of protecting privacy over time that they did when publishing
+counts.
\item Fresh Laplace noise $\nu_i$ with distribution
$\textsf{Lap}(2\delta/\epsilon)$ should be added to the center of the
bin of the $i$th bucket. Let the resulting value be