[tor-dev] Hidden Service statistics coming soon!

Karsten Loesing karsten at torproject.org
Mon Feb 9 18:13:55 UTC 2015

On 04/02/15 13:12, George Kadianakis wrote:
> George Kadianakis <desnacked at riseup.net> writes:
>> Hello people,
>> for the past few months we've been working on getting better 
>> statistics for hidden services [0].
>> The questions we are trying to answer are "Approximately how
>> many hidden services are there?" and "Approximately how much
>> traffic of the Tor network is going to hidden services?".
>> We can answer these questions by collecting statistics from Tor 
>> relays: specifically, from hidden service directories (HSDirs)
>> and rendezvous points. In our design, these relays first
>> obfuscate the statistics before publishing them, so that the
>> numbers themselves are not entirely precise [1]. We specify how
>> exactly these statistics are collected in proposal
>> 238-hs-relay-stats.txt [2].
>> We have also developed a Tor branch [3] implementing that
>> proposal that people can run on their relays to start collecting
>> hidden service statistics. The corresponding trac ticket is
>> #13192 if you want to follow the developer discussion [4].
>> Our plan is that in approximately a week we will ask volunteers
>> to run the branch. Then in a month from now we will use those
>> stats to write a blog post about the approximate size of Tor
>> hidden services network and the approximate traffic it's
>> pushing.
> Hello,
> we have now finished writing a tech report with the results of
> these statistics. You can find it in PDF form here: 
> https://research.torproject.org/techreports/extrapolating-hidserv-stats-2015-01-31.pdf
>  We are currently working on a more casual-reader-friendly blog
> post, which will contain additional information that the Tor
> community might be interested in. You will find it in
> blog.torproject.org in two weeks from now or so.

Below is the "additional information" that George was referring to.
And even though it's me sending this mail, most of these words are not
mine but Aaron Johnson's.  George and I only reviewed Aaron's math and
questioned the assumptions he put in.

So, in the tech report we were estimating how much hidden-service
traffic there is in the network.  But we were also wondering what
*fraction* of traffic that is.

We can take two different approaches to answer this question.  First,
we can calculate the hidden-service fraction of "external" traffic,
that is, traffic relayed into Tor from non-Tor sources, which can be
subdivided into exit traffic and hidden-service traffic.  In the
2015-01-19 00:00 consensus, the bandwidth weights for relays with the
Exit flag were 0 for every position except the exit position.
Therefore it is reasonable to assume that all traffic "relayed" (that
is, read and then written) on that day by nodes with the Exit flag was
exiting to non-Tor hosts.  There are edge cases where this assumption
breaks:
there are a small number of relays that don't have the Exit flag but
that still permit a small number of outgoing ports; and it's
conceivable that a non-exit guard accumulated clients and then changed
its exit policy.  But neither of these cases should affect our
calculation substantially.  The amount of exit traffic is at least
193322616 + 1686627828 = 1879950444 bytes per second, where the
summands are the minimum of the read and written traffic figures for
Exit and Exit&Guard relays, respectively.  On the other hand, we
estimate total hidden-service traffic in the network as 526 Mbit/s on
January 19, or 65750000 bytes per second.  Thus hidden-service traffic
constitutes 65750000 / (65750000 + 1879950444) = 0.034 of external
traffic.
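
As a sanity check of the arithmetic above, here is a minimal Python
sketch.  The variable and function names are made up for illustration,
and it assumes the byte figures are per-second rates, as the
conversion from 526 Mbit/s implies.

```python
# Figures quoted above for 2015-01-19 (assumed to be bytes per second).
EXIT_BYTES = 193322616          # min(read, written), Exit relays
EXIT_GUARD_BYTES = 1686627828   # min(read, written), Exit&Guard relays
HS_MBIT_PER_S = 526             # estimated hidden-service traffic


def mbit_to_bytes_per_s(mbit):
    """Convert Mbit/s to bytes/s (1 Mbit = 10^6 bits, 8 bits per byte)."""
    return mbit * 1_000_000 // 8


def external_fraction(hs_bytes, exit_bytes):
    """Share of external traffic (exit + hidden-service) that is HS."""
    return hs_bytes / (hs_bytes + exit_bytes)


hs_bytes = mbit_to_bytes_per_s(HS_MBIT_PER_S)
exit_bytes = EXIT_BYTES + EXIT_GUARD_BYTES
fraction = external_fraction(hs_bytes, exit_bytes)

print(hs_bytes)             # 65750000
print(exit_bytes)           # 1879950444
print(round(fraction, 3))   # 0.034
```

Because the exit figure is a lower bound, the resulting fraction is
best read as an upper bound under these assumptions.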

Second, we can calculate the hidden-service fraction of "all" traffic,
that is, the total number of bytes relayed by all relays.  Each
rendezvous circuit involves six relays, and so the total amount of
hidden-service traffic relayed is 6 * 65750000 = 394500000 bytes per
second.  The total number of bytes relayed per second by all relays
is 6462858486.  The hidden-service fraction of total traffic is thus
394500000 / 6462858486 = 0.061.
This is about twice the "external" traffic fraction because rendezvous
circuits are twice the length of exit circuits.  It is actually a bit
lower than twice the external traffic fraction because the total
traffic number includes traffic that is non-exit and
non-hidden-service (consensus fetches are likely a big component of
that).
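
The second calculation can be checked the same way.  Again this is
only a sketch with made-up names, assuming per-second byte figures and
six relays per rendezvous circuit as stated above.

```python
# Figures quoted above for 2015-01-19 (assumed to be bytes per second).
HS_BYTES = 65750000             # 526 Mbit/s of hidden-service traffic
EXIT_BYTES_TOTAL = 1879950444   # lower bound on exit traffic
TOTAL_RELAYED = 6462858486      # bytes relayed by all relays
RELAYS_PER_RENDEZVOUS = 6       # each HS byte is relayed six times

# Hidden-service bytes as counted at the relays, not at the edge.
hs_relayed = RELAYS_PER_RENDEZVOUS * HS_BYTES
total_fraction = hs_relayed / TOTAL_RELAYED

# Compare against the "external" fraction from the first approach.
external_fraction = HS_BYTES / (HS_BYTES + EXIT_BYTES_TOTAL)
ratio = total_fraction / external_fraction

print(hs_relayed)                 # 394500000
print(round(total_fraction, 3))   # 0.061
print(round(ratio, 1))            # 1.8
```

The ratio comes out a bit below 2, which matches the explanation that
the total also contains traffic that is neither exit nor
hidden-service traffic.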

tl;dr: 3.4% of client traffic is hidden-service traffic, and 6.1% of
traffic seen at a relay is hidden-service traffic.

All the best,
