[tor-dev] [Draft Proposal] Scalable Hidden Services

Wed Oct 30 08:11:16 UTC 2013

On Mon, Oct 28, 2013 at 08:49:46PM +0000, George Kadianakis wrote:
> Christopher Baines <cbaines8 at gmail.com> writes:
> 
> > On 28/10/13 13:19, Matthew Finkel wrote:
> >> This is a proposal I wrote to implement scalable hidden services. It's
> >> by no means finished (there are some slight inconsistencies which I will
> >> be correcting later today or tomorrow) but I want to make it public in
> >> the meantime. I'm also working on some additional security measures that
> >> can be used, but those haven't been written yet.
> >
> > Great, I will try to link this in to the earlier thread for some continuity.
> >
> > It seems to me that this is a description of "Alternative 3" in Nick's
> > email. Multiple instances, with multiple sets of introduction points,
> > somehow combined in to one service descriptor? I haven't managed to
> > fully comprehend your proposal yet, but I though I would try and
> > continue the earlier discussion.
> >
> > So, going back to the goals, this "alternative" can have master nodes,
> > but, can have you can also just have this "captain" role dynamically
> > self assigned. Why did you include an alternative here, do you see these
> > being used differently? It seems like the initial mode does not fulfil
> > goal 2 or 3?
> >
> > One of the differences between the alternatives that keeps coming up, is
> > who (if anyone) can determine the number of nodes. Alternative 3 can
> > keep this secret to the service operator by publishing a combined
> > descriptor. I also discussed in the earlier thread how you could do this
> > in the "Alternative 4: Single hidden service descriptor, multiple
> > service instances per intro point." design, by having the instances
> > connect to each introduction point 1, or more times, and possibly only
> > connecting to a subset of the introduction points (possibly didn't
> > consider this in the earlier thread).
> >
> 
> So far, we have avoided defining our adversaries. Here is an example
> of some adversaries (wrt distinguishing multi-node HSes from
> single-node HSes and finding the number of nodes and their status):
> - HS Client. This is a client who knows the descriptor of an
>   HS, and hence all its IPs.
> - Introduction Point. This is an introduction point of the HS. This is
>   a naive version of the next adversary and I will not consider it in
>   the attacks below.
> - Introduction Point + Client: This is an adversary who knows the
>   descriptor of an HS, hence all its IPs, and is also an IP of the
>   Hidden Service. This is superior to the simple 'Introduction Point'
>   adversary and more realistic (since not many Introduction Points
>   will target random HSes, but if you know an HS, you will try
>   targetting it by becoming its IP).
> 
> Let's see how these adversaries are doing if they want to determine
> the number of HS-nodes:
> 
> - Alternative 3: Single hidden service descriptor, one service instance per intro point:
> -- From PoV of a client:
>    If a client suspects that an HS is multi-node, then its number of
>    nodes is simply the number of its introduction points.
> -- Same thing applies for IP+Client.

Based on the proposal it could be even easier than that, but it is
definitely easy.

> 
> - Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
> -- From PoV of an IP+Client:
>    It's trivial for an IP+Client to distinguish a multi-node HS from a
>    single-node HS, by looking at the number of introduction circuits
>    to it. Single-node HSes only have a single IP circuit (IIRC).

Yup, but I think that with the current design, we will need to
decide how much information leakage we can accept. Is it acceptable for
a client or IP to know that we are scaling without knowing the number
of nodes or does the mere fact that we are scaling reveal enough
information to an attacker? I completely understand why we want to
limit the amount of information we reveal, but maybe we should also
categorize these properties into "We can not leak this information" and
"This would be a nice property to have in a design just because the
probability that an attack is successful is proportional to the amount
of information they have about the target."

> 
>    Also, depending on how we assign HS-nodes to IPs it might be
>    possible to find the number of HS-nodes too (or at least a lower or
>    upper limit of them).
> 
> I don't see a way for a client to get the number of nodes of an HS in
> Alternative 4. However, an IP+Client is able to do so in both
> alternatives.
> 
> BTW, as Paul said, if we try to hide the number of nodes (from an
> IP+Client adversary) by establishing multiple circuits from a single
> HS-node to the IP, we should be careful because multiple "same source
> same destination" circuits might lead to nasty attacks.
> 
> Finally, if we go with the "Alternative 4: Single hidden service
> descriptor, multiple service instances per intro point." (which
> currently seems as the best idea to me), we should think of how many
> IPs each HS-node will connect to. There are at least three ways:
> a) An HS-node establishes circuits to all the IPs.
> b) An HS-node establishes circuits to a k-subset of the IPs.
> c) An HS-node establishes circuits to a random number of the IPs.
> 
> From the above, a) trivially reveals the number of nodes to all IPs
> and also establishes too many circuits which is bad for the network. I
> think b) and c) our best options here. We should think of how various
> values of 'k' change our security and availability here, and we should
> think whether randomization actually adds any useful obfuscation wrt
> the number/uptime of HS-nodes.
> 
> We should also think of how we assign HS-nodes to IPs. Lars Luthman
> started doing so in
> https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We
> should think more!

We should also think about how many IPs are selected, because as far as
I can tell, each node is blind to how many peers it has. So, if the
initial hidden service chooses three IPs, and then 5 more nodes are
added, then each of those 5 nodes connect to k out of 3 IPs. How does a
node decide that it should add an IP to the set?

> 
> > Another recurring point for comparison, is can anyone determine if a
> > particular service instance is down. Alternative 4 can get around this
> > by hiding the instances behind the introduction points, and to keep the
> > information from the introduction points, each instance (as described
> > above) can keep multiple connections open, occasionally dropping some to
> > keep the introduction point guessing. I think this would work, providing
> > that the introduction point cannot work out what connections correspond
> > with what instances. If each instance has a disjoint set of introduction
> > points, of which some subset (possibly total) is listed in the
> > descriptor, it would be possible to work out both if a instance goes
> > down, and what introduction points correspond to that instance, just by
> > repeatedly trying to connect through all the introduction points? If you
> > start failing to connect for a particular subset of the introduction
> > points, this could suggest a instance failure. Correlating this with
> > power or network outages could give away the location of that instance?
> >
> 
> Indeed.
> 
> It also seems hard to me to obfuscate the number and status of
> HS-nodes by randomly disconnecting introduction circuits from
> IPs. Especially so if we want to do it without influencing the
> performance of the HS (i.e. avoiding disconnecting circuits when
> clients are using them). Naive solutions will probably allow IPs to
> distinguish random decoy failure from an actual permanent
> power-outage-like failure of an HS-node.
> 
> I'll ignore the random-disconnects idea for now and analyze our
> alternatives with respect to recognizing the status (uptime) of
> HS-nodes:
> 
> - Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
> -- From the PoV of IP+Client:
>    An IP+Client will be able to detect changes in the status of
>    HS-nodes by monitoring its introduction circuits.
> 
> - Alternative 3: Single hidden service descriptor, one service instance per intro point.
> -- From the PoV of a client:
>    Clients can distinguish uptime of HS peers, since they know that
>    each peer has one IP (and they know all the IPs of a hidden
>    service).
> -- Same thing applies for IP+Client.
> 
> It seems to me that an IP+Client adversary is always able to find the
> number and status of HS-nodes. The proposed ways to fix this is to add
> measures like random-circuit-disconnects and connecting to IPs
> multiple times from a single HS-node. Both of these solutions seems
> easy to get wrong and hard to prove secure. We should think more about
> them!

Perhaps we should look at hybrid solutions. None of them seem
worthwhile at this point, but they are worth considering.

- Shared Introduction Points, Multiple Descriptors
-- From POV of IP+Client:
   The client can retrieve all descriptors from the responsible HSDirs
   thus revealing all of the Introduction Points. The IP will receive
   connections from a subset of the hidden service nodes. This reveals
   the fact that it is a multi-node hidden service, but does not
   directly reveal how many nodes it contains.

   Another, similar, option is for a HS to publish different descriptors
   for each period which include different IPs. Initially this will
   limit the likelihood of correlation and therefore prevent leaking the
   HS is multi-node and how many nodes. However, over time, this will be
   broken, as well, so this is only acceptable for short lived (possibly
   single period) hidden services.

   Similarly, an HS publishes multiple descriptors to the same HSDir and
   the HSDir randomly chooses one of the descriptors everytime it is
   queried. However, this suffers the same fate.

- Shared Introduction Points, Combined descriptors
-- This provides the HSDir with too much power but for the sake of
   completeness we can see that this will improve the simplicity of
   setting up and configuring the hidden services but it is no more
   secure than the Shared Introduction Point design.

The more I think about this problem the more I convince myself we need
another layer of abstraction between the client and the IP, perhaps
proxy the INTRODUCE1 cell via the HSDir that is responible for the HS's
descriptor and the client never need know the IPs. This requires a lot
more thought (more than I can give it right now, at least).

Thanks for the analysis!