Proposal: Distributed Storage for Tor Hidden Service Descriptors

Roger Dingledine arma at mit.edu
Thu Jun 7 16:52:55 UTC 2007


On Sun, May 13, 2007 at 07:40:50PM +0200, Karsten Loesing wrote:
>   http://88.84.144.63/114-distributed-storage.txt

Hi Karsten,

Thanks for getting this started. I've put the current version at
https://tor.eff.org/svn/trunk/doc/spec/proposals/114-distributed-storage.txt
and I quote a few paragraphs here with some questions.

>  Tor clients and servers:
>
>    All participants can combine the network status lists received from
>    all directory authorities into one routing list containing only those
>    servers that store and serve hidden service descriptors and that are
>    contained in a majority of the network status lists. A participant
>    only trusts its own routing list and never learns about routing
>    information from other parties. This list should only be created
>    on demand by Tor clients and servers that are involved in the new
>    hidden service protocol, i.e. hidden service directory nodes, hidden
>    service providers, and hidden service clients.

We should consider how to handle the situations when different users
have different opinions of the list of valid authorities. This will
happen each time we add a new authority and not everybody has upgraded,
and it will also happen from time to time as authorities move locations
(as happened recently with moria1 and moria2).

I don't think it's a killer issue, since we have some robustness from
redundancy, and we can hope that the partitioning won't be so great
that none of the replicas are the right ones. But we might want to
think about how much waste there will be in storage (people who think
they're in the first n but most people think they're not) and retrieval
(what the expected number of requests will be before you're likely to
get an answer).
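
For what it's worth, here is how I picture the "majority of network
statuses" filter, just to check we mean the same thing. A rough Python
sketch; the HidServ flag name and the .routers / .flags / .fingerprint
fields are made up for illustration, not real parsing code:

    def build_hs_routing_list(network_statuses):
        # Count, for each router, how many of the statuses we hold both
        # list it and give it the hidden service directory flag.
        votes = {}
        for status in network_statuses:
            for router in status.routers:
                if "HidServ" in router.flags:
                    fp = router.fingerprint
                    votes[fp] = votes.get(fp, 0) + 1
        # Keep routers listed by more than half of the statuses, sorted
        # so everyone agrees on the order of IDs around the circle.
        needed = len(network_statuses) // 2 + 1
        return sorted(fp for fp, count in votes.items() if count >= needed)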

>    HS directory nodes accept publish and fetch requests for hidden service
>    descriptors and store/retrieve them to/from their local memory. (It is not
>    necessary to make descriptors persistent, because after disconnecting, the
>    onion router would not be accepted as a storing node anyway, because it is
>    not stable.) All requests and replies are formatted as HTTP messages.
>    Requests are directed to the router's directory port and are contained
>    within BEGIN_DIR cells. An HS directory node stores a descriptor only when
>    it thinks that it is responsible for storing that descriptor based on its
>    own routing table. Every HS directory node is responsible for the
>    descriptor IDs in the interval from its n-th predecessor in the ID circle
>    up to its own ID (n denotes the number of replicas).

Not considered stable? But who is stable and who is not is a function of
who's in the network -- as the median uptime goes up or down, the routers
on the borderline will oscillate between being called stable and not.

(We're going to be working on more useful definitions of stable -- see
proposal 108 -- but I don't think that will change the above note.)

So it might be that the HidServ flag, or whatever we call it, should
not simply be a copy of the Stable flag. One option is to demand that
the router has been considered Stable for the past n periods. But does
that still have the same flaw, just shifted? Or is it all a matter of
damping the oscillations adequately?
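
Separately, just to check that I'm reading the interval rule above
correctly: each HS directory covers the descriptor IDs from its n-th
predecessor (exclusive) up to its own ID (inclusive), wrapping around
the top of the circle. In rough Python, over a sorted routing list of
same-length hex IDs (so string comparison matches numeric order):

    def responsible_for(routing_list, my_id, descriptor_id, n):
        # n is the number of replicas, as in the proposal.
        i = routing_list.index(my_id)
        nth_pred = routing_list[(i - n) % len(routing_list)]
        if nth_pred < my_id:
            # Normal case: the interval (nth_pred, my_id] does not wrap.
            return nth_pred < descriptor_id <= my_id
        else:
            # The interval wraps around the top of the ID circle.
            return descriptor_id > nth_pred or descriptor_id <= my_id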

>    An HS directory node replicates descriptors for which it is responsible by
>    downloading them from other HS directory nodes. To that end, it checks its
>    routing table for changes every 10 minutes. Whenever it realizes that a
>    predecessor has left the network, it establishes a connection to the new
>    n-th predecessor and requests its stored descriptors in the interval
>    between its (n+1)-th predecessor and that n-th predecessor. Whenever it
>    realizes that a new onion router has joined with an ID higher than its
>    former n-th predecessor, it adds it to its predecessors and discards all
>    descriptors in the interval between its (n+1)-th and its n-th predecessor.

a) Why 10 minutes?

b) Related to the above questions, how much overhead (again in terms of
storage waste and increased expected number of requests until an answer)
do we see from a given level of churn, based on the oscillation of who
gets the HidServ flag?
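
Also, to make sure I follow the replication step itself, here is the
picture I have, collapsed into one periodic resync of the interval I
currently cover rather than the proposal's two separate join/leave
cases. The store dict and fetch_missing() are placeholders I made up,
not proposed interfaces:

    def fetch_missing(hs_dir, lo, hi, store):
        # Placeholder: fetch descriptors in (lo, hi] from hs_dir over
        # BEGIN_DIR and add any we don't already have to 'store'.
        pass

    def in_interval(x, lo, hi):
        # True if x lies in (lo, hi] on the ID circle, wrap-around included.
        return lo < x <= hi if lo < hi else (x > lo or x <= hi)

    def refresh_replicas(routing_list, my_id, n, store):
        i = routing_list.index(my_id)
        nth_pred = routing_list[(i - n) % len(routing_list)]
        # Discard descriptors I no longer cover (a new router joined
        # between my (n+1)-th and n-th predecessor)...
        for desc_id in list(store):
            if not in_interval(desc_id, nth_pred, my_id):
                del store[desc_id]
        # ...and fetch whatever is missing from my current n-th
        # predecessor (a predecessor left the network).
        fetch_missing(nth_pred, nth_pred, my_id, store)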

>    When setting up the hidden service at introduction points, a hidden service
>    provider does not pass its own public key, but the public key of a freshly
>    generated key pair. It also includes this public key in the hidden service
>    descriptor together with the other introduction point information. The
>    reason is that the introduction point does not need to know for which
>    hidden service it works, and should not know it to prevent it from
>    tracking the hidden service's activity.

Nice trick.

But why a keypair? Why not just a secret cookie? Right now introduction
points hear the hash of the public key as their cookie, and clients
tell the hash of the public key to identify which service they want to
be introduced to, and sign it with the corresponding private key.

But if it's a one-time keypair, what's the point in proving that you
know the private key that goes with it? Either you do or you don't, but I
don't see why any of the parties involved would care.

What would happen if we just made this a secret string that the client
provides? Like how we do rendezvous cookies now.
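
Concretely, I'm imagining something along these lines -- a sketch of
the cookie variant with names made up on the spot, not a spec:

    import os

    # Hidden service side: pick a fresh random cookie per introduction
    # point, hand it over when establishing the intro point, and publish
    # it in the descriptor instead of a fresh public key.
    intro_cookie = os.urandom(20)

    # Intro point side: remember the cookies it has been given and match
    # whatever a client presents in its introduction request, without
    # ever learning which hidden service is behind the cookie.
    known_cookies = {intro_cookie}

    def accept_introduction(presented_cookie):
        return presented_cookie in known_cookies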

>  Hidden service client:
>
>    Instead of downloading a descriptor from a hidden service authoritative
>    directory, a hidden service client downloads it from a randomly chosen
>    hidden service directory that is responsible for keeping a replica for
>    that descriptor ID.

And if it doesn't get one, it tries a new random one until it has tried
all n?
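
That is, something like this on the client side (a sketch; the
begin_dir_fetch callback stands in for the actual directory request):

    import random

    def fetch_descriptor(descriptor_id, responsible_dirs, begin_dir_fetch):
        # Try the responsible directories in random order until one of
        # them returns the descriptor; give up once all have been tried.
        for hs_dir in random.sample(responsible_dirs, len(responsible_dirs)):
            descriptor = begin_dir_fetch(hs_dir, descriptor_id)
            if descriptor is not None:
                return descriptor
        return None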

>      descriptor-id = h(permanent-id + h(time-period + cookie))
>
>    "permanent-id" is the hashed value of the public key of the hidden service
>    provider, "time-period" is a periodically changing value, e.g. the current
>    date, and "cookie" is a shared secret between the hidden service provider
>    and its clients. (The "time-period" should be constructed in a way that
>    periods do not change at the same moment for all descriptors by including
>    the "permanent-id" in the construction.)

Huh? I don't understand the statement in parentheses.
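
If the intent is that each service's period boundary is shifted by
something derived from its permanent-id, so that not all descriptors
rotate at the same instant, then I'd guess something like the
following. The SHA-1, the 4-byte offset, and the one-day period are my
guesses, not things the proposal pins down:

    import hashlib

    def descriptor_id(permanent_id, cookie, now, period_len=24*60*60):
        # Shift this service's period boundary by an offset derived from
        # its permanent-id, then apply the formula from the proposal:
        # descriptor-id = h(permanent-id + h(time-period + cookie)).
        offset = int.from_bytes(permanent_id[:4], "big") % period_len
        time_period = str((now + offset) // period_len).encode()
        secret = hashlib.sha1(time_period + cookie).digest()
        return hashlib.sha1(permanent_id + secret).digest()

If that's the idea, a sentence in the proposal spelling it out would
help.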

>  Attacks by hidden service directory nodes
>
>    A hidden service directory node could misuse a stored descriptor to track
>    a hidden service's activity and usage pattern by clients. Though there is
>    no countermeasure against this kind of attack, it is very expensive to
>    track a certain hidden service over time. An attacker would need to run a
>    large number of stable onion routers that work as hidden service directory
>    nodes to have a good probability of becoming responsible for its changing
>    descriptor IDs. For each period, the probability is:
>
>      1 - (N-c choose r) / (N choose r)   for N-c >= r, and 1 otherwise,
>      where N is the total number of hidden service directories, c is the
>      number of compromised nodes, and r is the number of replicas.
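
Plugging in some numbers to get a feel for that probability (the N, c,
and r values below are made up, not measurements):

    from math import comb

    def compromise_prob(N, c, r):
        # 1 - C(N-c, r) / C(N, r) for N-c >= r, and 1 otherwise.
        if N - c < r:
            return 1.0
        return 1 - comb(N - c, r) / comb(N, r)

    # e.g. 1000 HS directories, 20 of them hostile, 3 replicas:
    print(compromise_prob(1000, 20, 3))    # roughly 0.06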

I still worry about an attacker generating keys until he gets n that are
very close to each other, and then putting resources into making those n
servers stable. There is suddenly a black hole in the descriptor space.
He can't target a particular hidden service, but a) he can be a nuisance,
and b) for hidden services that require high availability, a one-period
gap may still be significant.

(Speaking of which, what period are we thinking of? An hour? A day?)

It would be neat to have a design where each hidden service descriptor
has its own set of n redundant locations. This approach would mean that
there is no way to predict how to attack *any* piece of the keyspace.

It's clear how the hidden services learn where to publish (say, they
can hash the particular descriptor-id with the strings '1' through
'n'). And people trying to fetch the descriptor can do the same process.
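
In rough Python, assuming the descriptor-id is raw bytes and the
routing list holds same-length lowercase hex IDs:

    import hashlib

    def replica_points(descriptor_id, routing_list, n):
        # Hash the descriptor-id with "1" ... "n" to get n independent
        # spots on the circle; each spot is covered by the first HS
        # directory whose ID is at or after it (wrapping around).
        points = []
        for i in range(1, n + 1):
            spot = hashlib.sha1(descriptor_id + str(i).encode()).hexdigest()
            after = [hs_dir for hs_dir in routing_list if hs_dir >= spot]
            points.append(after[0] if after else routing_list[0])
        return points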

What may be more complex is how the hidden service directories will know
what they should be mirroring. I guess one option would be to put the
descriptor-id in the descriptor, and now any hidden service directory
that sees it can compute the hash with "1"..."n" and see where it sits
and who it should publish descriptors to. But this is a 'push' approach
rather than a 'pull' approach. Is that more fragile?

Are there better approaches?

>    3  The network status format needs to be extended with a new status flag to
>    denote that a router is a hidden service directory.

We may want to call this flag HIDSERV1 or the like, so we can specify
that it is serving version 1 of the hidden service scheme. That way we
can transition to future schemes more easily.


There. Those are enough questions for now. :)

Thanks!
--Roger


