Anonymity-preserving collection of usage data of a hidden service authoritative directory

Wed May 2 19:51:08 UTC 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

> Here are early results from moria1:

Wow! That was amazingly quick! :)

> The novel descriptors, and failed fetches, are high at the beginning
> because it had just started up so it didn't have any yet. Hard to
> guess what steady-state will be.

Sure, the first 10 rows or so might result from restarting the directory
node. But from there on it looks like it has stabilized. Hidden services
publish their descriptors once an hour, don't they? (Well, that was easy
to see even without looking into the spec: the number of novel
publications drops after the first four intervals, i.e. 60 minutes.) So
it's very unlikely that there will be many novel publications after the
intervals shown.

The only thing that has not stabilized (yet) is the total number of
descriptors. This should come from the fact that lease times for
descriptors are much longer than republication intervals (24 hours vs.
1 hour, right?). Doesn't that mean that the increase in total
descriptors from the fifth interval on comes only from descriptors that
have not been refreshed and probably represent offline hidden services?
That would mean that 145 (=803-658) or 18% of the descriptors in the
last interval are useless (or even more if the total number of
descriptors increases further, which is likely the case). Wouldn't it
make more sense to synchronize publication intervals and lease times?
Was that what you meant by "artifacts"? Why would a client expect that
a hidden service with a 23-hour-old descriptor is online if it knows
that the service should have republished every hour? In a decentralized
design I suggest cutting the lease time down to one hour (or maybe 1.5
hours). This saves the resources needed to replicate descriptors when
routers leave or join.
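
To spell out the arithmetic above, here is a minimal sketch. It assumes
that 803 is the total number of descriptors in the last interval and
that 658 is the number of descriptors that are still being republished;
the variable names are illustrative only.

```python
# Figures quoted in the paragraph above; their interpretation is an assumption.
total_descriptors = 803      # total descriptors held in the last interval
refreshed_descriptors = 658  # assumed: descriptors still being republished

# Descriptors that are kept only because their lease has not run out yet.
stale = total_descriptors - refreshed_descriptors
stale_share = stale / total_descriptors

print(f"{stale} stale descriptors, {stale_share:.0%} of the total")
```

If the total keeps growing while the refreshed count stays flat, the
stale share grows with it.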

> But the first thing to note is that
> the total number of fetches are really not that high.

At the very least, the number of fetches needs to be multiplied by five,
because requests should be (more or less) equally distributed among
directories. Though these numbers still are not as high as I expected,
it is very interesting to have some absolute numbers.

> The second thing
> to note is to start wondering why a few services publish so often --
> is it because their intro circuits break often, maybe because they have
> a poor network connection, so they feel the need to republish?

To be honest, I don't know yet if these numbers are really high or not.
What is high and what is low? Does low mean that all services publish
equally often, and high means that all services but one publish only
once and the remaining service publishes all the other times? I think I
need to read a good statistics book to learn how to evaluate such data.
When writing the spec, the percent-histories were just a goodie, and I
wanted to implement something more complex than a counter in C to see if
I would run into problems with the implementation. ;) But you are right,
if that number is (too) high, we should try to find out why.
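
One possible way to put a number on "low" and "high" is the share of
all publications that comes from the single busiest service. This is
only a sketch of the two extremes described above; the function name and
the counts are hypothetical and not taken from the moria1 data.

```python
def top_share(publish_counts):
    """Fraction of all publications that come from the single busiest service."""
    total = sum(publish_counts)
    return max(publish_counts) / total if total else 0.0

# Hypothetical counts: 10 services and 100 publications in total.
even = [10] * 10          # "low": every service publishes equally often
skewed = [1] * 9 + [91]   # "high": nine publish once, one does all the rest

print(top_share(even))    # 0.1
print(top_share(skewed))  # 0.91
```

Real data would fall somewhere between these two values, so the
question becomes how close to the upper end it is.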

> And the
> third is to remember that hidden services suck right now, so we shouldn't
> take the current usage pattern to be the requirements for future hidden
> services. :)

Then my question is: Why do hidden services suck right now? Do you mean
performance? Yes, that could be improved. In an earlier evaluation I
found that connection establishment after having downloaded the
descriptor takes 5.39 +- 12.4 seconds, i.e. with an acceptable mean, but
a huge variance. Afterwards, message round-trip times were 2.32 +- 1.66
seconds, i.e. acceptable after all.
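
To compare the two spreads, one can divide each deviation by its mean.
The sketch below assumes that "+-" denotes the standard deviation; the
function name is illustrative only.

```python
def coefficient_of_variation(mean, std_dev):
    """Relative spread: standard deviation divided by the mean."""
    return std_dev / mean

# Values from the paragraph above, assuming "+-" is the standard deviation.
setup = coefficient_of_variation(5.39, 12.4)  # connection establishment
rtt = coefficient_of_variation(2.32, 1.66)    # message round-trip time

print(f"connection establishment: {setup:.2f}")
print(f"round-trip time: {rtt:.2f}")
```

A value above 1 means the spread is larger than the mean itself, which
is the case for connection establishment but not for round-trip times.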

Or are there other reasons why they suck? Unclear security properties?
Too complicated setup? The need for Tor on the client side? What do you
think?

Anyway, even if the current usage pattern does not really justify
distributing the storage of rendezvous service descriptors, future
applications of hidden services might. Or the other way round, new
applications that would not be reasonable with centralized storage can
be made possible with decentralized storage. That keeps me
optimistic. :)

- --Karsten
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGOOus0M+WPffBEmURAnlbAKCY6btkPWnV3OekkzrdHmKdcOqa7QCdHBAn
b3laZQNu4f72/8SHM3yJyo8=
=m/F/
-----END PGP SIGNATURE-----