[tor-dev] [Draft Proposal] Scalable Hidden Services

Mon Oct 28 20:49:46 UTC 2013

Christopher Baines <cbaines8 at gmail.com> writes:

> On 28/10/13 13:19, Matthew Finkel wrote:
>> This is a proposal I wrote to implement scalable hidden services. It's
>> by no means finished (there are some slight inconsistencies which I will
>> be correcting later today or tomorrow) but I want to make it public in
>> the meantime. I'm also working on some additional security measures that
>> can be used, but those haven't been written yet.
>
> Great, I will try to link this in to the earlier thread for some continuity.
>
> It seems to me that this is a description of "Alternative 3" in Nick's
> email. Multiple instances, with multiple sets of introduction points,
> somehow combined into one service descriptor? I haven't managed to
> fully comprehend your proposal yet, but I thought I would try and
> continue the earlier discussion.
>
> So, going back to the goals, this "alternative" can have master nodes,
> but you can also just have this "captain" role dynamically
> self-assigned. Why did you include an alternative here? Do you see these
> being used differently? It seems like the initial mode does not fulfil
> goal 2 or 3?
>
> One of the differences between the alternatives that keeps coming up is
> who (if anyone) can determine the number of nodes. Alternative 3 can
> keep this secret to the service operator by publishing a combined
> descriptor. I also discussed in the earlier thread how you could do this
> in the "Alternative 4: Single hidden service descriptor, multiple
> service instances per intro point." design, by having the instances
> connect to each introduction point one or more times, and possibly only
> connecting to a subset of the introduction points (I possibly didn't
> consider this in the earlier thread).
>

So far, we have avoided defining our adversaries. Here are some
example adversaries (wrt distinguishing multi-node HSes from
single-node HSes and finding the number of nodes and their status):
- HS Client. This is a client who knows the descriptor of an
  HS, and hence all its IPs.
- Introduction Point. This is an introduction point of the HS. This is
  a naive version of the next adversary and I will not consider it in
  the attacks below.
- Introduction Point + Client: This is an adversary who knows the
  descriptor of an HS, hence all its IPs, and is also an IP of the
  Hidden Service. This is superior to the simple 'Introduction Point'
  adversary and more realistic (since not many Introduction Points
  will target random HSes, but if you know an HS, you will try
  targeting it by becoming its IP).

Let's see how these adversaries fare if they want to determine
the number of HS-nodes:

- Alternative 3: Single hidden service descriptor, one service instance per intro point:
-- From PoV of a client:
   If a client suspects that an HS is multi-node, then its number of
   nodes is simply the number of its introduction points.
-- Same thing applies for IP+Client.

- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From PoV of an IP+Client:
   It's trivial for an IP+Client to distinguish a multi-node HS from a
   single-node HS, by looking at the number of introduction circuits
   to it. Single-node HSes only have a single IP circuit (IIRC).

   Also, depending on how we assign HS-nodes to IPs it might be
   possible to find the number of HS-nodes too (or at least a lower or
   upper bound on it).

I don't see a way for a client to get the number of nodes of an HS in
Alternative 4. However, an IP+Client is able to do so in both
alternatives.
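
To make the comparison concrete, here is a minimal Python sketch of
what each adversary can read off. The function names and the toy model
(a descriptor as a list of IP names, an IP's view as a count of open
introduction circuits) are assumptions made for illustration and do not
come from the Tor codebase. The lower bound in the second function
additionally assumes that each HS-node opens at most one circuit per IP.

```python
def client_estimate_alt3(descriptor_ips):
    """Alternative 3: one instance per intro point, so a client that
    suspects a multi-node HS just counts the IPs in the descriptor."""
    return len(set(descriptor_ips))


def ip_client_view_alt4(circuits_at_my_ip):
    """Alternative 4: an IP+Client sees how many introduction circuits
    the HS keeps open to it. More than one circuit suggests multi-node.
    ASSUMPTION: at most one circuit per HS-node per IP, in which case
    the circuit count is a lower bound on the number of nodes."""
    return {
        "looks_multi_node": circuits_at_my_ip > 1,
        "node_lower_bound": circuits_at_my_ip,
    }
```

If HS-nodes open several circuits to the same IP, the second function
stops giving a valid lower bound, which is exactly the countermeasure
discussed next.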

BTW, as Paul said, if we try to hide the number of nodes (from an
IP+Client adversary) by establishing multiple circuits from a single
HS-node to the IP, we should be careful because multiple "same source
same destination" circuits might lead to nasty attacks.

Finally, if we go with the "Alternative 4: Single hidden service
descriptor, multiple service instances per intro point." (which
currently seems like the best idea to me), we should think of how many
IPs each HS-node will connect to. There are at least three ways:
a) An HS-node establishes circuits to all the IPs.
b) An HS-node establishes circuits to a k-subset of the IPs.
c) An HS-node establishes circuits to a random number of the IPs.
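
The three options can be sketched in Python as follows. This is only a
toy model: pick_ips, the strategy names, the default k=3 and the
1..len(ips) range used for option c) are all hypothetical choices made
for illustration; nothing in the proposal fixes them.

```python
import random


def pick_ips(ips, strategy, k=3, rng=random):
    """Return the intro points one HS-node would connect to.

    strategy: "all"          -> option a)
              "k_subset"     -> option b)
              "random_count" -> option c)
    """
    ips = list(ips)
    if not ips:
        return []
    if strategy == "all":
        return ips
    if strategy == "k_subset":
        # Uniformly random subset of size k (capped at the number of IPs).
        return rng.sample(ips, min(k, len(ips)))
    if strategy == "random_count":
        # ASSUMPTION: the count is uniform in 1..len(ips).
        count = rng.randint(1, len(ips))
        return rng.sample(ips, count)
    raise ValueError(f"unknown strategy: {strategy}")
```

Each HS-node would call this independently, e.g.
pick_ips(descriptor_ips, "k_subset", k=2).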

From the above, a) trivially reveals the number of nodes to all IPs
and also establishes too many circuits which is bad for the network. I
think b) and c) are our best options here. We should think of how various
values of 'k' change our security and availability here, and we should
think whether randomization actually adds any useful obfuscation wrt
the number/uptime of HS-nodes.
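
As a starting point for thinking about 'k', here is a back-of-the-envelope
sketch in Python. It assumes that each of n HS-nodes independently picks
a uniformly random k-subset of the m IPs; that selection rule is an
assumption for illustration, not something the proposal specifies. Under
it, a given node picks a given IP with probability k/m, so a given IP
has no node behind it with probability (1 - k/m)^n.

```python
def k_subset_tradeoff(n_nodes, m_ips, k):
    """Circuit load and per-IP coverage when every HS-node picks a
    uniformly random k-subset of the IPs (assumed model)."""
    if not 1 <= k <= m_ips:
        raise ValueError("k must be between 1 and m_ips")
    p_pick = k / m_ips  # chance that one node picks one given IP
    return {
        "total_circuits": n_nodes * k,
        "expected_circuits_per_ip": n_nodes * p_pick,
        "p_ip_unserved": (1 - p_pick) ** n_nodes,
    }
```

With k = m this degenerates to option a): every IP sees exactly n
circuits and no IP is ever unserved. Smaller k means fewer circuits but
a higher chance that some IP listed in the descriptor has no HS-node
behind it.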

We should also think of how we assign HS-nodes to IPs. Lars Luthman
started doing so in
https://lists.torproject.org/pipermail/tor-dev/2013-October/005615.html. We
should think more!

> Another recurring point for comparison is whether anyone can determine if a
> particular service instance is down. Alternative 4 can get around this
> by hiding the instances behind the introduction points, and to keep the
> information from the introduction points, each instance (as described
> above) can keep multiple connections open, occasionally dropping some to
> keep the introduction point guessing. I think this would work, providing
> that the introduction point cannot work out what connections correspond
> with what instances. If each instance has a disjoint set of introduction
> points, of which some subset (possibly total) is listed in the
> descriptor, it would be possible to work out both if an instance goes
> down, and what introduction points correspond to that instance, just by
> repeatedly trying to connect through all the introduction points? If you
> start failing to connect for a particular subset of the introduction
> points, this could suggest an instance failure. Correlating this with
> power or network outages could give away the location of that instance?
>

Indeed.
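
The probing attack on disjoint IP sets can be sketched in a few lines
of Python. The probe log format (one dict of IP name -> reachable per
probing round) and the function name are assumptions for illustration;
how a client would actually test reachability through an IP is left
out.

```python
from collections import defaultdict


def group_ips_by_failure_round(probe_log):
    """probe_log: list of rounds, each a dict {ip_name: reachable_bool}.

    Returns {round_index: sorted IP names that were reachable in the
    previous round and unreachable in this one}. IPs that start failing
    in the same round are candidates for belonging to the same instance.
    """
    groups = defaultdict(list)
    for i in range(1, len(probe_log)):
        prev, cur = probe_log[i - 1], probe_log[i]
        for ip, ok in cur.items():
            if prev.get(ip, False) and not ok:
                groups[i].append(ip)
    return {rnd: sorted(ips) for rnd, ips in groups.items()}
```

The round index gives the adversary a timestamp to correlate with
power or network outages, as described in the quoted text.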

It also seems hard to me to obfuscate the number and status of
HS-nodes by randomly disconnecting introduction circuits from
IPs. Especially so if we want to do it without influencing the
performance of the HS (i.e. avoiding disconnecting circuits when
clients are using them). Naive solutions will probably allow IPs to
distinguish random decoy failure from an actual permanent
power-outage-like failure of an HS-node.

I'll ignore the random-disconnects idea for now and analyze our
alternatives with respect to recognizing the status (uptime) of
HS-nodes:

- Alternative 4: Single hidden service descriptor, multiple service instances per intro point.
-- From the PoV of IP+Client:
   An IP+Client will be able to detect changes in the status of
   HS-nodes by monitoring its introduction circuits.

- Alternative 3: Single hidden service descriptor, one service instance per intro point.
-- From the PoV of a client:
   Clients can distinguish uptime of HS peers, since they know that
   each peer has one IP (and they know all the IPs of a hidden
   service).
-- Same thing applies for IP+Client.

It seems to me that an IP+Client adversary is always able to find the
number and status of HS-nodes. The proposed ways to fix this are to add
measures like random-circuit-disconnects and connecting to IPs
multiple times from a single HS-node. Both of these solutions seem
easy to get wrong and hard to prove secure. We should think more about
them!