Proposal: Distributed Storage for Tor Hidden Service Descriptors

Karsten Loesing karsten.loesing at gmx.net
Tue Jun 12 22:19:44 UTC 2007


Hi Roger,

thanks for your many comments! They really help to advance the project!

Let me start by answering the comment that caused (and still causes)
most of my headaches, the black-hole problem:

> I still worry about an attacker generating keys until he gets n that
> are very close to each other, and then putting resources into making
> those n servers stable. There is suddenly a black hole in the 
> descriptor space. He can't target a particular hidden service, but a)
> he can be a nuisance, and b) for hidden services that require high 
> availability, a one-period gap may still be significant.

I agree with you that this problem is not acceptable! I simply
overlooked it, thinking that if someone cannot target an attack at a
specific service, he wouldn't bother performing it. But why not attack
the whole system, simply because it's possible?

The problem occurs when performing replication by identifier ranges,
rather than independently for every single descriptor as you propose:

> It would be neat to have a design where each hidden service 
> descriptor has its own set n of redundant locations. This approach 
> would mean that there is no way to predict how to attack *any* piece 
> of the keyspace.

Such a solution is possible. But there is another problem with your
approach: it might be too expensive. The advantage of replicating whole
identifier ranges is that all descriptors in a range can be replicated
in a single (possibly large) message. When descriptors are (seemingly
randomly) distributed over the whole ring, we need a separate message
exchange for every descriptor.

What do you think about combining both approaches?

- The DHT nodes perform replication on the "ring level" to handle node
failures and single dishonest nodes. This should be limited to e.g. 1
additional copy for every descriptor, and replication could be performed
by the DHT nodes themselves.

- Server and client (and maybe directory nodes) agree on a scheme to
generate multiple descriptor IDs for the same descriptor. Replicas could
be maintained by the server or (as you suggested) by the directory
nodes. I could imagine that 2 copies are enough for every descriptor.

This leads to 2x2=4 copies for one descriptor, or even more if more
conservative values are chosen for either replication level.
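
To make the second level more concrete, here is a minimal sketch of how
the multiple descriptor IDs could be derived. All names, the hash
function, and the exact construction are only my illustrative
assumptions here, not the final design:

  import hashlib

  def descriptor_ids(permanent_id, time_period, cookie, num_replicas=2):
      # Derive one descriptor ID per replica, so that the copies of a
      # single descriptor land at independent, unpredictable positions
      # on the ring.
      ids = []
      for replica in range(num_replicas):
          secret_part = hashlib.sha1(
              time_period.to_bytes(4, "big") + cookie + bytes([replica])
          ).digest()
          ids.append(hashlib.sha1(permanent_id + secret_part).digest())
      return ids

Each of these IDs would then be stored by its responsible directory node
plus one ring-level successor copy, giving the 2x2=4 copies mentioned
above.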

Headaches are still there, but I think I could live with such a
solution. Please correct me if I overlooked another thing!

> Not considered stable? But who is stable and who is not is a function
> of who's in the network -- as the median uptime goes up or down, the
> routers on the borderline will oscillate between being called stable
> and not.
> 
> (We're going to be working on more useful definitions of stable -- 
> see proposal 108 -- but I don't think that will change the above 
> note.)
> 
> So it might be that the HidServ flag, or whatever we call it, should
> not simply be a copy of the Stable flag. One option is to demand
> that the router has been considered Stable for the past n periods.
> But does that still have the same flaw, just shifted? Or is it all a
> matter of damping the oscillations adequately?

The idea is to require nodes to be in the Tor network for a certain time
before they can become part of the DHT. Of course, that should not
depend on the median uptime of nodes, but rather on a constant time,
e.g. one time-period. As pre-studies showed, churn is very small in the
Tor network (compared to the applications DHTs are usually used for),
and I expect it to be even smaller when we consider only nodes that have
been up for 24+ hours.

> Also, we should recognize that servers don't lose their stable flag 
> immediately -- many clients continue to use old network statuses for 
> perhaps hours after the node goes away / becomes unstable / etc.

But both server and client would realize very quickly that a router is
gone and use the next-best match in the routing table.

> We should consider how to handle the situations when different users 
> have different opinions of the list of valid authorities. This will 
> happen each time we add a new authority and not everybody has 
> upgraded, and it will also happen from time to time as authorities 
> move locations (as happened recently with moria1 and moria2).

The list of valid authorities was something I expected to be solved
somewhere else. Isn't there something like a consensus on directories?
How do Tor clients reach consensus about Tor routers for ordinary onion
routing? The idea was to simply rely on that list, and I expected this
list to be (at least very nearly) the same at every node.

>>> A HS directory node replicates descriptors for which it is 
>>> responsible by downloading them from other HS directory nodes. 
>>> Therefore, it checks its routing table periodically every 10 
>>> minutes for changes.
> 
> a) Why 10 minutes?

No special reason. If it's cheap, we can do that every 60, 10, or 1
second(s).

> b) Related to the above questions, how much overhead (again in terms
> of storage waste and in increased expected number of requests til 
> answer) do we see from a given level of churn based on the 
> oscillation of who gets the HidServ flag?

This answer depends on the solutions we find for the problems above.

>>> When setting up the hidden service at introduction points, a 
>>> hidden service provider does not pass its own public key, but the
>>> public key of a freshly generated key pair. It also includes 
>>> this public key in the hidden service descriptor together with 
>>> the other introduction point information. The reason is that the
>>> introduction point does not need to know for which hidden
>>> service it works, and should not know it to prevent it from
>>> tracking the hidden service's activity.
> 
> Nice trick.

Thanks, but I have to give the credit to Lasse and Paul. They describe
something similar in their Valet Node paper.

> But why a keypair? Why not just a secret cookie? Right now 
> introduction points hear the hash of the public key as their cookie,
> and clients tell the hash of the public key to identify which
> service they want to be introduced to, and sign it with the
> corresponding private key.

The idea was that a client needs to encrypt the INTRODUCE1 message to
the server with some key, and my concern was that someone could find out
that this encryption was done with the server's public key. But you may
be right that there is no reason for this concern. So it should be
sufficient to use a fresh introduction cookie and encrypt the INTRODUCE1
message with the server's public key.

>>> Instead of downloading descriptors from a hidden service 
>>> authoritative directory, a hidden service client downloads it 
>>> from a randomly chosen hidden service directory that is 
>>> responsible for keeping replica for the descriptor ID.
> 
> And if it doesn't get one, it tries a new random one until it has 
> tried all n?

Correct.
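
In pseudocode, roughly like this (just a sketch; the download function
is only a placeholder for the actual directory request):

  import random

  def fetch_descriptor(descriptor_id, responsible_dirs, download):
      # Ask the responsible directories in random order until one of
      # them returns the descriptor or all n of them have been tried.
      dirs = list(responsible_dirs)
      random.shuffle(dirs)
      for hs_dir in dirs:
          descriptor = download(hs_dir, descriptor_id)
          if descriptor is not None:
              return descriptor
      return None  # all n replicas failed; the service looks unreachable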

>>> descriptor-id = h(permanent-id + h(time-period + cookie))
>>> 
>>> "permanent-id" is the hashed value of the public key of the 
>>> hidden service provider, "time-period" is a periodically changing
>>> value, e.g. the current date, and "cookie" is a shared secret 
>>> between the hidden service provider and its clients. (The 
>>> "time-period" should be constructed in a way that periods do not
>>> change at the same moment for all descriptors by including the 
>>> "permanent-id" in the construction.)
> 
> Huh? I don't understand the statement in parentheses.

What I meant is that descriptor IDs should not all change at once.
Otherwise there would be a burst of work to create new descriptor copies
and distribute them, e.g. every day at 4:00am, followed by silence for
24 hours. To avoid this, I envisioned making the switch from one period
to the next depend on the public key of the service. Then every server
would have its own switching time, and the switchovers of all
descriptors would be spread over the whole day.
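
One way to do this (again only a sketch with assumed names and a
one-day period; the real construction may well differ) is to shift the
period boundary by an offset derived from the permanent-id:

  import hashlib

  SECONDS_PER_PERIOD = 24 * 60 * 60  # assuming one-day periods

  def time_period(now, permanent_id):
      # Shift the period boundary by a per-service offset, so that not
      # all services switch descriptors at the same wall-clock time.
      offset = int.from_bytes(hashlib.sha1(permanent_id).digest()[:4],
                              "big") % SECONDS_PER_PERIOD
      return (now + offset) // SECONDS_PER_PERIOD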

> (Speaking of which, what period are we thinking of? An hour? A day?)

I always assumed one day, for no special reason. Making it just one hour
would probably be quite expensive, because a service would then need to
publish a new descriptor (and all of its copies) every hour. We may need
to maintain descriptors for more than one period (at least around a
period switch) in case clients' clocks are skewed compared to the
server's clock.
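
As a minimal, purely illustrative sketch of that overlap:

  def periods_to_try(current_period):
      # Around a switchover, a directory keeps (and a client may request)
      # descriptors for both the current and the previous period, to
      # tolerate moderate clock skew between client and server.
      return [current_period, current_period - 1]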

>>> 3  The network status format needs to be extended by a new status
>>> flag to denote that a router is a hidden service directory.
> 
> We may want to call this flag HIDSERV1 or the like, so we can specify
> that it is serving version 1 of the hidden service scheme. That way
> we can transition to future schemes more easily.

Then we should rather call it HIDSERV2, because versions 0 and 1 are
already taken by the existing hidden service schemes. But sure, we
should give it a number.

> There. Those are enough questions for now.  :)

And I hope that I had some answers to those questions. Anyway, your
questions show me that there are likely even more flaws in the design.
When was it that Google expects a stable implementation of this? ;)

--Karsten