Re: [tor-dev] Hidden Service Scaling

9 May 2014

On 09/05/14 20:05, waldo wrote:
...
On 04/05/14 07:42, Christopher Baines wrote:
...
On 04/05/14 11:43, waldo wrote:
...
On 02/05/14 02:34, Christopher Baines wrote:
...
On 02/05/14 00:45, waldo wrote:
...
I am worried about an attack from an evil IP based on forcibly
disconnecting the HS from the IP. I don't know if this is possible,
but I am worried that picking a new circuit at random could be
highly problematic. Let's say I am the NSA, I own 10% of the
routers, and I keep disconnecting your HS from an IP I control. If
you select a new circuit randomly each time, then even if the
probabilities are low it is only a matter of time until I force you
onto a specific circuit, one of the many convenient to me, that
leaks your original IP address as metadata through cooperating
routers I own, and that does away with the anonymity of the hidden
service.
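A rough way to put numbers on the "matter of time" argument above. This is only a sketch under loud assumptions that are not from the thread: every forced reconnection is treated as an independent draw, relays are picked uniformly (real Tor weights by bandwidth and pins entry guards), and `p_single` is simply the chance that one draw lands on a relay the attacker controls. The function names are made up for illustration.

```python
import math


def prob_hit_within(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent picks is bad."""
    return 1.0 - (1.0 - p_single) ** attempts


def attempts_needed(p_single: float, target: float) -> int:
    """Smallest number of picks whose cumulative hit chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))
```

Under these assumptions, with 10% of the relays (`p_single = 0.1`), `attempts_needed(0.1, 0.95)` is 29, so a few dozen forced reconnections would already make a hit on a given hop very likely. Guards change this picture for the first hop, which is why the question below matters.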
Yeah, that does appear plausible. Are there guard nodes used
for hidden service circuits (I forget)?
No idea. According to these docs,
https://www.torproject.org/docs/hidden-services.html.en, there
are no guards in the circuits to the IP in step one (none are
mentioned). They are definitely used in step five to protect
against a timing attack by a corrupt entry node.
Even if they are used, I still see some problems. It looks
convenient to try to reconnect to the same IP, but in real life you
are going to find nodes that fail a lot. If you picked an IP with
bad connectivity, reconnecting to it is not going to help the
scalability or availability of your HS; on the contrary.
I don't think a minority of bad IPs will do much to hurt a hidden
service.
Hi Christopher. You are correct that a minority can't do much harm,
but they don't contribute either. What's the point of keeping them? I
don't mean to be rude, but "minority" is also relative. Can you please
tell us the total number of IPs? I ask because you have been working
on this, so you likely know better. If there are 3, then one bad node
means 33% of connections fail; if there are 50, one is only 2%.
I agree that it would be good if the service could detect and avoid
"bad" IPs, but I don't yet see a good method for doing so (that fits
within the rest of the design).

Regarding the number of IPs, unfortunately I also don't know. This is
possible to look up though, as you could modify a node running in the
real network to log the number of nodes it considers choosing for an IP
(I just haven't got time to do that atm). There might also be an easier
way to do it with some existing Tor network stats tool.
...
...
...
Maybe a good idea would be to try to reconnect and, if that fails
too often, select another IP.
It currently does do this, but probably over a shorter time period
than you are suggesting. It keeps a count of connection failures
while trying to reconnect, but this is reset once a new connection
is established.
Yes, I meant measuring over a longer time period and over different
circuits, as the cause of the disconnection could have been the
circuit failing and not the IP.
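To make the difference between the two counting schemes concrete, here is a minimal sketch of the longer-term variant: failures are kept in a sliding time window, are not reset by a successful reconnect, and only count against the IP once they were seen over more than one circuit. The class name, the thresholds and the circuit identifiers are all hypothetical; none of this is taken from the Tor code.

```python
from collections import deque
from typing import Deque, Tuple


class IntroPointHealth:
    """Track failures for one introduction point over a sliding window."""

    def __init__(self, window_seconds: float = 3600.0,
                 max_failures: int = 5, min_circuits: int = 2) -> None:
        self.window_seconds = window_seconds
        self.max_failures = max_failures
        self.min_circuits = min_circuits
        # (timestamp, circuit id) pairs, oldest first
        self._failures: Deque[Tuple[float, str]] = deque()

    def record_failure(self, now: float, circuit_id: str) -> None:
        # Deliberately NOT cleared on a successful reconnect.
        self._failures.append((now, circuit_id))

    def _prune(self, now: float) -> None:
        # Drop failures that have fallen out of the window.
        while self._failures and now - self._failures[0][0] > self.window_seconds:
            self._failures.popleft()

    def should_replace(self, now: float) -> bool:
        self._prune(now)
        circuits = {cid for _, cid in self._failures}
        # Require failures over several circuits, so that one bad
        # circuit is not blamed on the introduction point.
        return (len(self._failures) >= self.max_failures
                and len(circuits) >= self.min_circuits)
```

This does not by itself distinguish a failing IP from the service's own connection going down, which is the case raised next.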
What happens if the HS just goes offline for a while? It keeps trying
to connect, finds that it can't connect to the IPs, and picks another
set? Are you differentiating that case?
I am unsure what you mean here, can you clarify that you do mean the
"HS", and what "It" refers to?
...
How do they coordinate which one publishes the descriptor? Which one
publishes the first descriptor?
So, starting from a state where you have no instances of a hidden
service running.

You start instance 1, it comes up and checks for a descriptor. This
fails as this service is new and has not been published before. It picks
some introduction points (does not matter how), and publishes a descriptor.

You then start instance 2, like 1, it comes up and checks for a
descriptor. This succeeds, instance 2 then connects to each of the
introduction points in the descriptor.

So in terms of coordination, there is none. You have to start the
instances one after the other (you just have to start one before the
rest in the general case).
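The start-up behaviour described above can be sketched as follows. The four callables stand in for whatever the real implementation does to fetch and publish descriptors, choose introduction points and build circuits; they and the dictionary layout of the descriptor are assumptions made for this illustration only.

```python
from typing import Callable, Dict, List, Optional

Descriptor = Dict[str, List[str]]


def start_instance(
    fetch_descriptor: Callable[[], Optional[Descriptor]],
    publish_descriptor: Callable[[Descriptor], None],
    pick_intro_points: Callable[[], List[str]],
    connect: Callable[[str], None],
) -> str:
    """Bring up one service instance without talking to other instances."""
    descriptor = fetch_descriptor()
    if descriptor is None:
        # First instance: nothing published yet, so choose the
        # introduction points and publish the descriptor ourselves.
        intro_points = pick_intro_points()
        for intro_point in intro_points:
            connect(intro_point)
        publish_descriptor({"intro_points": intro_points})
        return "published"
    # Later instance: reuse whatever the descriptor already lists.
    for intro_point in descriptor["intro_points"]:
        connect(intro_point)
    return "joined"
```

If two instances were started at the same moment, both could see no descriptor and both would publish, which is why the instances have to be started one after the other.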

Thinking through this now has also brought up another interesting
behaviour which I don't think I have tested: what happens if the
descriptor contains one or more unreachable IPs at the time the
second instance retrieves it? (Just thought I would note this here.)
...
...
This gets complicated, as you need to ensure that each instance of
the service is using the same introduction points,
I saw your answer to another person, and it seems to me it is related
to what you are saying here:
If the "service keys" (randomly generated keys per introduction
point) are used, then this would complicate/cause problems with the
multiple instances connecting to one introduction point. Only one key
would be listed in the descriptor, which would only allow one
instance to get the traffic.
What if the instances exchange keys and use the same one?
Master/slave, for example, with one of them taking the master role if
the master goes offline. Let's say the master instance creates the IPs
and sends a message to the rest to connect there.
When designing this, I chose to try and avoid any direct instance to
instance communication or master/slave relationships. This decision has
advantages and disadvantages.
...
How about changing the descriptor to hold several keys per IP, in
case the previous idea is not possible or too difficult?
That would reveal some information in the descriptor about the number of
service instances, it would also require some HSDir logic to combine the
descriptors uploaded by different instances.
...
Why does this need to be ensured? Does it break something? I
understand it could be convenient to have at least two, to avoid
correlation attacks from the IP, but why all? What would happen if one
instance goes offline? Does the whole thing break? The desirable
behavior in that case, I think, is that the other instances take over
and split the load as if nothing had happened.
If the private part of the public key included for an IP is only held by
one of the n instances, then only that one instance with the private
part of the key will get any of the clients.
...
If they use the same key, the IP could send the rendezvous message to
all instances, as all would have the same private key and could
decrypt it (if the IP is not shared by different HSs; I don't know if
that is currently possible). That way the message doesn't get lost in
a failed circuit. If the IP is shared by several HSs, some routing
could be convenient but is not necessary IMHO, as the others don't
have the key to decrypt it, and statistics gathering could be blinded
by the HS sending bogus RV messages that it later discards.
Instances could negotiate which one is going to answer, even if the
one that answers is not connected to the IP, provided the instances
talk to each other.
Let's say I have a master that receives load information from the
slaves and also receives the RV messages that all instances receive.
It then instructs the least-loaded instance to answer.
This is something interesting that this design allows, as the instances
could communicate with the IPs to dynamically allocate new clients.
...
...
it seems to me that tracking connectivity failures over the long
term, and changing IP on some threshold could break this.
Why would it break? I would create new IPs, connect to them, and keep
the old ones until the new ones become accessible (I could detect this
by monitoring whether I receive messages or by querying the HSDirs).
The IP could act uncooperatively and send me bogus messages to try to
confuse the HS and keep it from going away, so checking both is
probably a good idea. I could also test whether I start receiving
messages through the new IPs. Just ideas.
If it is not a problem with the IP, but with an instance's local
connection, that one instance would decide to switch that IP out, upload
a new descriptor, and thus break the consistency between the different
instances (the different instances would not be using the same IP).
...
...
...
If the IP is doing it on purpose, the HS is going to go away, so
the control the IP gains by disconnecting your HS is capped for any
attack, known or unknown. If it is not on purpose, the HS keeps
throwing away failing nodes until it picks a good node as IP. I
think this would cause the Tor network to rebalance and adapt to
new conditions by itself over time. For instance, an overloaded IP
(maybe due to a DoS) causes the HS to move away from it.
I would also rotate the IPs after using them for some time. I don't
think it is good to keep one IP for too long. If, for instance, I am
big daddy and know your IPs, I could go there, seize the computers,
and start gathering funny statistics about your HS. Or I could
simply censor your HS by dropping messages from clients trying to
send you the rendezvous point (is this possible? It looks like it is
if I drop introduce messages and generate fake ones). You wouldn't
even know, because I can keep you connected and receiving fake
connections. You might only notice if you test the IP by sending a
rendezvous point from your HS to your HS (this IP quality test would
be great if Tor did it periodically). I sort of do it myself
manually when I notice the HS is very hard to reach. Sometimes it
works great; sometimes, even with the server turned on and online,
it is not visible, so you have to take down Tor, restart it, and
wait again for a while.
I was thinking you could select new IPs, inform the HSDirs about the
change, and, once the new ones are known, end the circuits to the
previous IPs, thereby avoiding the overhead of the rotation.
I would rebuild circuits to the IP from time to time (originating
from the HS). Multiple connections to the same IP would allow doing
this better, since I can make a new circuit and afterwards kill a
previous one, remaining connected the whole time.
Lots of things here, generally, some things seem quite hard to do
in an uncoordinated, distributed manner (e.g. IP rotation).
Why uncoordinated?
Simply because that is how I have chosen to approach the issue. There
will be advantages and disadvantages.
...
It looks to me like it would be convenient if instances could talk:
load balancing, taking over from failed instances, etc. It would take
work to do that, for sure, but it doesn't seem impossible to me. I
guess the HSDir code would have to be modified to be able to host new
signed information coming from the same HS while maintaining the old
one. Maybe it could follow signed commands from the HS: delete these
IPs, add these other IPs (with some limit, to keep the HS from
attacking HSDirs by flooding them with bogus info to store). New
question here: could anyone flood an HSDir by posting a zillion
descriptors for a zillion bogus HSs? The HS would have to select new
IPs, publish them, and wait until they become available before
dropping the circuits to the old ones.
I think it is possible to load balance new clients without needing
direct instance to instance communication.
...
...
And I am not too sure that things like IP rotation and rebuilding
circuits to IPs will even help with anonymity issues.
Regarding entry guards, this is one article about them (I don't know
if it is up to date):
https://blog.torproject.org/category/tags/entry-guards
It seems they are always used for all sorts of circuits, including HS
ones. So if you reused the code, they are being selected.
I am still concerned that if things stay one way for too long, big
players (antidemocratic governments, for instance) could do things.
Keep in mind that the more running instances of your HS you have, the
greater the chance of locating one of them, since I only have to
locate one of your instances to know who you are.
There are also other factors that make it harder to locate any instance
of a service with multiple instances. For example, it becomes harder to
correlate data center power failures with hidden service failures if
that service is hosted in multiple physical locations.
...
OK, take a look at this attack, and correct me if I am wrong and some
step is not possible (I invite anyone to prove me wrong). As I said, I
appreciate your work, but it needs to be challenged to be accepted by
the community, so it doesn't stay in a limbo of "I don't know"; it is
better to patch every possible hole before it becomes mainstream.
Suppose I am a totalitarian government and you are a dissident
running an HS over Tor in the same country.
1 - I start introducing high-availability, high-bandwidth corrupt
nodes into the network across the globe (I could rent servers in case
you decide to connect to nodes offshore, or simply deploy nodes in
another country). The more I add, the higher the chances of being a
stone in your anonymity path. To lower the budget, I could host
several Tor routers on one computer with several network interfaces
and a fast CPU or crypto hardware for OpenSSL.
2 - I see you are using some IPs (I can query the HSDirs to get some),
so if I am not lucky enough to be your HS's IP at the start, I flood
those you select to take them offline, forcing you to switch until you
eventually pick one of my corrupt nodes as IP. It is not clear to me
whether choosing them in a deterministic way hinders or helps this. If
it is deterministic, I could precalculate how many nodes I have to
force offline until you pick one of mine, so I could at least shut
down those nodes that will never be selected and lower my budget. I
could bribe some ISP to rent me servers with specific IP addresses (I
don't know if you select on this), or bribe the IP operator if he/she
publishes a contact email. You can't even get suspicious if you don't
know the operator of the flooded IP, as a node going offline is
totally normal. So no dust is raised (is there a way for a Tor router
to publicly announce that it was attacked, so other people can be
warned?).
With the code I published currently, being deterministic helps, as you
could create nodes with identities in the right regions (just like the
attack against HSDirs).
...
3 - I become at least one of your IPs. This is a good achievement.
From now on I know that when I disconnect you, you are going to
connect back to me using some circuit that may or may not contain
corrupt nodes. When you connect back to me your counter is reset, so I
can disconnect you as often as I want.
4 - I know your last node, so if it is not mine I disconnect your
circuit until you select a circuit that has one of my corrupt nodes as
the last one.
5 - When you do, I can see the previous node in your circuit. If it
is not mine, I disconnect you and go to step 4. If it is, I learn one
of your guards. I keep going back to step 4 for a while to enumerate
your guards. The more guards you have, the longer it takes; the fewer,
the easier for me. But I can continue the attack as long as I know one
of the guards, to gain time and see if I have success. I could flood
your guard to force you to select another guard and accelerate the
process, or globally block access to the guard.
6 - Once I learn all your guards, I can do several things with that
knowledge.
- Since your instance is going to connect to those nodes for a long
while, I could censor your instance by flooding those nodes, at least
until you notice and select new guard nodes (I can be insidious here,
repeating the attack over and over again and for each instance). I
could wait until I have enumerated all the guard nodes of all of your
instances, since they all connect to my IP.
- Since I am a big government and I control the ISP you are using, I
could monitor incoming connections to your guard nodes. I record all
network IP addresses connecting to those nodes for a while. Maybe I
could filter some nodes here with heuristics (keeping nodes that only
connect to those guards, since the probability of another node
connecting to exactly those guards should be low), but not
necessarily, as I am going to filter later. Notice that HSs stay
connected for long time periods, so a server's connection time
should be longer than a client's, and I could discard nodes using
that information.
Once I have enough information, I disconnect you from the net for a
short time, short enough that you don't leave my IP, leaving some room
for randomly failed circuits. I can tell the ISPs to do this for me. A
disconnection is totally normal, so no dust is raised. I can take my
time, too: disconnect you today, wait some time, and disconnect you
again.
I could do some things in the case of an HS hosted offshore: bribe
ISP employees, DoS ISPs or individual computers, sell backdoored
routers, backdoor router firmware. But that is less realistic and
harder. It can't be discarded IMHO, but I am going for the easy case
here.
Notice that from now on I don't care which circuit you use to
reconnect to me, even if you select new guards. I could also monitor
whether your HS answers RV messages, using several preselected RV
points.
If, after I disconnect those nodes, I don't see any instance go away,
I discard those nodes as candidates for hosting the HS.
If I see some of the instances go away, then your node is in that
subset.
I perform a binary search here, disconnecting half of the nodes each
time, so the number of disconnections it takes me is O(log(n)), where
n is the total number of nodes I see connecting to those guards.
I repeat each of these steps several times to filter out, with
statistics, the occasional random circuit failure between your HS and
my IP.
If two or more instances use the same IP, I would still see some
instances staying or going, so it seems this doesn't protect against
the attack at all, even if they are indistinguishable. If you close
circuits and reopen new ones from time to time I would get noise here,
but maybe I could filter it out with statistics.
- I could in some cases seize a guard node or bribe the operator.
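The O(log(n)) figure in the halving step above is just the cost of a binary search, which a few lines can check. This is an abstract, noise-free sketch: the candidates are plain integers, exactly one of them is the `target`, and each round gives a perfect answer about which half contains it. The function name is made up, and the repetitions needed to filter random circuit failures are not modelled.

```python
def rounds_to_isolate(n_candidates: int, target: int) -> int:
    """Count the halving rounds needed to narrow n candidates to one."""
    candidates = list(range(n_candidates))
    rounds = 0
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        rounds += 1
        # Keep whichever half still contains the target.
        if target in first_half:
            candidates = first_half
        else:
            candidates = candidates[middle:]
    return rounds
```

For 1000 candidates this never takes more than 10 rounds, so in this idealised model the number of candidate nodes matters far less than the number of repetitions needed to beat the noise.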
If there are no flaws so far, I can spy on you to find out where you
are hosting computers and seize them without giving you time to turn
them off, use plausible-deniability crypto software (TrueCrypt), and
be able to claim you were routing instead of hosting the HS (by
cloning the router without the HS data).
With multiple instances, it seems to me it now becomes desirable to
host at least one router per instance, to be able to deny you were
hosting and claim you were routing, since it looks like it won't be
possible to correlate the router going down with the HS going down
(other instances would hide that if they are indistinguishable and
take over when one instance goes down).
Now, if you instead rotate the IPs from time to time, I would be
forced to go back to step 2 of the attack, but on the other hand your
chances of selecting a corrupt IP or a corrupt node in your circuits
increase. As the host of one of your IPs, I would have less control,
since it is not going to last forever and would be time limited.
Changing circuits could introduce noise in step 6 to some extent; on
the other hand, it increases the chances of selecting a corrupt node
in your circuit.
So all of this would probably have to be studied with statistics and
the current Tor network size.
It looks to me like this could have more implications in other areas.
I didn't go through that very thoroughly, but it sounds reasonable.
...
...
...
In some previous messages on the subject I saw that HSDirs provide
all of an HS's IPs. I don't like this way of doing things. Let's say
my HS has 6 IPs, available to everyone. To DoS your HS, it seems to
me all I have to do is DoS the IPs. And there is no need for
everyone to know all the IPs of an HS all the time. All a user needs
in order to connect is a few, maybe for redundancy, but not all of
them.
Is there some way to provide only part of the IPs of an HS to each
user? To avoid enumeration? Maybe distribute partial information to
HSDirs? I don't know, just thinking. Maybe "abuse" some caching
effect on HSDirs and publish partial IP information at one end and
partial information at another, so that it only reaches all users in
its entirety over time.
As the set of IP's is so small,
Again, small is relative. Earlier you mentioned that some nodes
failing was not going to affect the service too much, so this seems
contradictory to me. Can you please give numbers? Can't this number be
increased?
Sorry, I should have been more specific. The set of IPs I am referring
to here are those used by a service. The number is determined by an
algorithm that adjusts the number based on the service's load. I think
that this is around the 3 to 10 range (but this is a guess).

If this is roughly correct, it becomes very hard to distribute strict
subsets of the 10 IPs, in such a way that no one can learn about all 10.
...
...
I cannot think of any practical way to do this without it being
trivial to break.
This is not directly related to your work but could be worth
discussing. I was thinking that one property that maybe could be
exploited could be the fact that the whole Tor network has lots of
computational power that is hard to match by a single player (unless
the player is really big). This is a rough idea that could contain
flaws and maybe could be improved.
What if, let's say, the HS encrypts the IP information, doesn't
provide the key, and makes Tor clients "brute-force" it to open the
encrypted message containing the IP? All IP keys would be scattered
through a keyspace that could be larger or smaller depending on how
much time I want you to spend searching, so any IP would have an equal
chance of being found. The key would be passed through an
ASIC-resistant function and the result used for encryption, so big
players would have to use GPUs at most, and differences in CPU power
would be somewhat equalized by memory bandwidth. All Tor clients start
searching at a different random position in the keyspace until they
find a key that decrypts one IP.
From there they start to communicate with the IP, if available, and
keep looking for the rest of the IPs (to be able to reconnect if the
RV and the initially used IP go offline). Maybe reconnection to the RV
could be desirable too, to some extent, to improve HS availability in
case the circuit fails. I don't know if that is currently possible.
Maybe add 1 bit of the key for another encrypted IP inside each
encrypted message, so that finding it by following a chain of
decryptions is more feasible than starting from zero (this could be a
good idea or not). This forces everyone to follow a decryption path
that depends on where they start decrypting IP information.
To explain the idea better, let's say I have 3 introduction points,
A, B and C, though I don't see why there couldn't be more, especially
since the bulk of the traffic goes through the rendezvous point and
not through the IP, since IPs could be shared by several HSs, and
since it is harder to DoS many nodes than just a few.
Let's say one Tor client starts searching at a random position and
finds the key for B (now it can connect to B, if available, and pass
the RV message). Once decrypted, it gets one bit of the key for C, so
it starts looking for the key of C, as that is easier to find than the
key for A. Then it gets C, and that gives it bits for A.
Another Tor client starts at a random position, finds A, and gets 1
bit for B (now it can connect to A, if available, and pass the RV
message). It goes on to B, gets the bit for C, and finds C.
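A toy version of the search for a single IP, to show the moving parts. Everything specific here is an assumption made for illustration: scrypt (from Python's `hashlib`) stands in for the unnamed memory-hard function, the XOR keystream plus HMAC tag is not a vetted encryption scheme, the parameters are far too small to cost anyone anything, and the hinting of key bits from one IP to the next is left out.

```python
import hashlib
import hmac
import os
import secrets
from typing import Optional, Tuple


def _derive(candidate: int, salt: bytes) -> bytes:
    # Memory-hard derivation of 64 bytes from one candidate key.
    return hashlib.scrypt(candidate.to_bytes(4, "big"), salt=salt,
                          n=2 ** 8, r=8, p=1, dklen=64)


def seal(intro_point: bytes, key_bits: int) -> Tuple[bytes, bytes, bytes]:
    """Service side: hide one intro point behind a random small key."""
    if len(intro_point) > 32:
        raise ValueError("toy scheme handles at most 32 bytes")
    salt = os.urandom(16)
    secret = secrets.randbelow(2 ** key_bits)
    derived = _derive(secret, salt)
    # First 32 bytes act as keystream, last 32 bytes as the MAC key.
    ciphertext = bytes(a ^ b for a, b in zip(intro_point, derived))
    tag = hmac.new(derived[32:], ciphertext, hashlib.sha256).digest()
    return salt, ciphertext, tag


def solve(salt: bytes, ciphertext: bytes, tag: bytes,
          key_bits: int) -> Optional[bytes]:
    """Client side: search the keyspace from a random starting position."""
    space = 2 ** key_bits
    start = secrets.randbelow(space)
    for offset in range(space):
        derived = _derive((start + offset) % space, salt)
        expected = hmac.new(derived[32:], ciphertext, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            return bytes(a ^ b for a, b in zip(ciphertext, derived))
    return None
```

Here `key_bits` is the difficulty knob: each extra bit doubles the expected work for every client, honest or not, which is the trade-off discussed further down.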
After a while the HS rotates IPs and the process starts again (not
necessarily all at once; it could be a flow in which some get replaced
while part of the old ones are kept). So anyone trying to cause a DoS
would have to perform all of that work again, and be very quick to
find all of the IPs before the HS rotates them again, in order to
flood all of the IPs. So at most all they could do is make the HS
intermittent (given enough CPU power and enough bandwidth). On the
other hand, the Tor swarm would be very effective at finding them all.
The big question that remains for me here is how much this makes a
big player waste a load of resources without pushing small players
(mobile devices, for instance) out of the network. Memory-hard
functions equalize devices somewhat through the memory bandwidth
limit, but large differences can remain. It seems to me that the more
IPs are selected for an HS, the more this can be achieved.
One option I was thinking of to fix that problem: the HS could
function normally while everything is running smoothly, and switch to
a protective mode if it detects that it can't create circuits to the
IPs. That way things work as usual when there is no attack, and the
system changes to defensive mode when there is one.
Let me explain better. I, as an HS, work as normal. Suddenly I see
that none of the circuits to my IPs can be established (or the router
operators of my IPs publish that they are being attacked). I create
new ones and see that eventually I can't connect again. That probably
means someone is attacking my IPs. I switch to defensive mode and
publish encrypted IPs. Clients notice they are encrypted and each
starts looking for the IP keys on its own.
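The mode switch just described could look roughly like this. It is a sketch only: a "round" means one attempt to reach the current set of IPs, the trigger of two consecutive fully failed rounds is an arbitrary choice, and the rule for going back to normal mode is left out because nothing above specifies it. All names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"        # publish plain introduction points
    DEFENSIVE = "defensive"  # publish encrypted introduction points


class ModeController:
    def __init__(self, failed_rounds_to_trigger: int = 2) -> None:
        self.failed_rounds_to_trigger = failed_rounds_to_trigger
        self.consecutive_failed_rounds = 0
        self.mode = Mode.NORMAL

    def observe_round(self, reachable_intro_points: int) -> Mode:
        """Feed in how many IPs were reachable in the latest round."""
        if reachable_intro_points == 0:
            self.consecutive_failed_rounds += 1
        else:
            self.consecutive_failed_rounds = 0
        if self.consecutive_failed_rounds >= self.failed_rounds_to_trigger:
            # Losing every IP twice in a row is treated as an attack.
            self.mode = Mode.DEFENSIVE
        return self.mode
```

A single bad round does not flip the mode, so an ordinary outage of one set of IPs keeps the service in normal operation.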
There could be degrees here, increasing the computational complexity
to fend off the attack. Let's say that maybe only mobile devices get
pushed off the net, while CPU and GPU nodes can still connect, so the
attack only succeeds in part.
One problem that could appear here is the time it takes for
information published to HSDirs to reach clients. I don't know much
about that part, and I believe it is under active research ATM.
The scheme doesn't push away big players with large computational
power and bandwidth, but it can push away some medium players. Also,
it doesn't protect against other attacks, for instance flooding
through an RV (could puzzle solving be applied here? Let's say the
harder the puzzle you solve, the more bandwidth I give you).
It's an interesting approach, I might try reading it again in a bit.