Hi all,
We are a university research group and following is a proposal for possible optimization on Tor's rendezvous circuit design. It's not written as a formal Tor proposal yet because I want to have a discussion with the community first, get some feedbacks and make sure it does not introduce new vulnerabilities.
So please feel free to comment if you have any thoughts or find any holes :)
PS: a txt version of the proposal is attached in this mail as well.
------------------------------------------------------
The changes described in this document is focused on speeding up hidden service while at least keeping the same level of anonymity as current version provides.
Naming conventions:
HSv3: current version of hidden service protocol HS5h: hidden service protocol that uses 5 hops HS: hidden service client: The onion proxy that tries to communicate with HS HSDir: hidden service directory RP: rendezvous point
1. The 5 hops rendezvous circuit
The setup on HS side is the same as HSv3. HS picks couple introduction points and uploads the descriptor to HSDir. The main difference is how to pick the RP. In HS5h, Instead of letting the client pick whatever node it wants, both client and HS use negotiation protocol to pick a RP collaboratively.
Following is a high level description for the rendezvous circuit setup process in HS5h:
(0) client fetches descriptor for a hidden service. (1) client connects to introduction point. (2) since client and HS are connected via introduction point, they can negotiate a random number using this channel. (For more details, see [RAND_NEGO]) (3) both client and HS maps that random number to a random onion router using the same scheme, so they end up with the same node. This is the candidate RP. (4) both client and HS create a 3 hops circuit using RP as last hop. (5) RP joins the circuit originates from HS to the circuit originates from client. (6) now client and HS are connected. Because their original circuits share the same endpoint(the RP), the length of the path is 5 hops.
Improvement:
(1) Reduction in circuit setup time. Both the client and HS only build 3 hops circuit now. While In current protocol(HSv3), HS needs to build a 4 hops circuit. (2) A shorter circuit means less transmission delay. It uses only one more hop compared to HSv3's Tor2Web mode. Also the circuit length for HS5h's Tor2Web mode can be further reduced to 3 hops. (3) Less setup time on client side for the first connection. In HSv3, a client has to first setup a 3 hops circuit to RP before sending INTRODUCTION1 cell to the introduction point. While in HS5h, this process is removed.
Possible drawback:
(1) An extra step is needed: random number negotiation. This result ins more communication between two nodes and Introduction point needs to remember states for the negotiation.
We think this negotiation step will not introduce a noticeable delay to the whole process. Firstly, it reuses the connection to introduction point for both sides so it requires no extra circuits build up. Secondly, the bottle neck is circuit setup, cell/stream transmission delay is actually pretty low.
2. Negotiate random numbers between two nodes [RAND_NEGO]
H(a) here means digest of message a.
(1) client generates a random number R_a and sends the digest H(R_a) to HS (2) HS remembers H(R_a), generates a random number R_h and sends it back to the client (3) client receives R_h and sends back R_a to HS (4) HS checks R_a against H(R_a) to make sure it's not generated after client receives R_h. (5) Now both client and HS compute a shared random number using R_a and R_c. (for instance, simply add up these two number)
The negotiation protocol can be very simple here because only two parties are involved and the random number does not need to be remained secret.
Note that at step 2), if HS is able to recover R_a from H(R_a), it can take control over R_c. So to mitigate this, we can use a variant of Diffie-Hellman handshake:
(1) client generates a public key g^x and sends the digest H(g^x) to HS (2) HS remembers H(g^x), generates a public key g^y and sends it back to the client (3) client receives g^y and sends back g^x to HS (4) HS checks g^x against H(g^x) to make sure it's not generated after client receives g^y. (5) Now both client and HS compute a shared random number: R_c = g^(xy)
3. Rough anonymity analysis for HSv3 and HS5h
3.1. Assumptions, attack rules and constraints
In this section, we assume 3 hops being the minimal circuit length to keep the origin node of a circuit anonymous from the destination.
Two simple attack rules are used to remove nodes from a circuit:
1. If node A is picked by the attacker, then he can pick himself as node A. 2. If the identity of node A is known by the attacker, then he can connect directly to it.
To defense against these attack rules, following constraints need to be enforced for a circuit:
1. No one except the circuit originator can have total control over choice of all nodes on the circuit. Note that it's OK even if the originator does not have total control.
2. Identity for all nodes on the circuit, except the last hop, must remain secret. i.e. only known by circuit originator.
3.2. Why 6 hops is the minimal circuit length for HSv3
First let's begin with an ASCII graph for HSv3 rendezvous circuit:
Client -> (a1) -> (a2) -> (RP) <- (b3) <- (b2) <- (b1) <- HS
Client chooses 3 ORs: a1, a2, RP. HS chooses the other 3 ORs: b3, b2, b1.
Now consider HS as the malicious side. In order to get closer to client, the best it can do is to connect directly to RP, which reduces the circuit length to 3 hops:
Client -> (a1) -> (a2) -> (RP) <- HS
If client is the malicious side, the best it can do is to choose itself as RP, which also reduces the circuit length to 3 hops:
Client <- (b3) <- (b2) <- (b1) <- HS
Taking any node from the original 6 hops circuit will result in 2 hops circuit in either case. As noted above, 3 hops is assumed to be the minimal circuit length.
3.3. Why HS5h only needs 5 hops
HS5h makes use of the fact that the last hop of a circuit is different from all previous hops. Since last hop will be in direct contact with the destination host, its identity does not need to be remain secret. We only have to make sure the last hop cannot be determined by any malicious host at will. Otherwise the malicious side can just pick himself as last hop and thus shorten the circuit.
This is where hop negotiation come into play. A negotiated hop is guaranteed to be a random node and cannot be determined by anyone. So it can be used as the last hop for both client and HS. Since the last hop for both sides overlaps, it's effectively acting as a rendezvous point. In comparison, in HSv3, the rendezvous point is picked and controlled by the client. Therefore it can not be used as the last hop for HSs' rendezvous circuit.
Again, following is an ASCII graph for a HS5h rendezvous circuit:
Client -> (a1) -> (a2) -> (RP) <- (b2) <- (b1) <- HS
Now if the client is malicious, it can only connects directly to the RP, i.e. pick the second last hop as himself. Therefore HS can still maintain a 3 hops distance from the malicious client:
Client -> (RP) <- (b2) <- (b1) <- HS
While in HSv3, the client can pick himself as the RP.
And because this is a symmetric scheme, the same argument goes for malicious HS:
Client -> (a1) -> (a2) -> (RP) <- HS
Either cases, we maintained a 3 hops circuit.
3.4. Summary
We took 3 as the minimal circuit length because that's the default setup in Tor. The analysis should work for any number.
To generalize it, if N is the minimal circuit length, then:
1) without node negotiation, the minimal rendezvous circuit length is 2N. 2) with node negotiation, the minimal rendezvous circuit length is 2N-1.
4. Other possible speed optimization under HS5h
4.1 Optimistic rendezvous mode
Similar to optimistic_data for AP connections, we can implement optimistic_rendezvous mode under HS5h. The basic idea is to optimistically assume any picked candidate RP will be willing to act as a RP.
Since now we need to maintain state in introduction point to help with node negotiation, we can use it to communicate with HS when establishing the rendezvous circuit. So both client and HS can launch circuits to newly negotiated RP simultaneously. If this candidate RP refuses to be a rendezvous point to one side, the other side can be notified via introduction point. Then client and HS can go through the negotiation process again to pick a new candidate RP.
This mode can also be implemented under HSv3 as well. In HSv3, wait for RENDEZVOUS_ESTABLISHED cell and sending INTRODUCE1 cell has to be a sequential operation. With optimistic_rendezvous mode, this can be done in parallel.
NOTE: we need to test RENDEZVOUS_ESTABLISHED rate in real tor network. If the rate is low, then this mode will slow down the setup process instead.
To make this mode work even better, we can have all OR explicitly mark whether it is willing to be a rendezvous point in it's descriptor. This should greatly increase the success rate for receiving RENDEZVOUS_ESTABLISHED.
5. Hazards
a) How to design the scheme for mapping a random number to the same node between client and server?
This will be trivia if it's easy to synchronize node list between two onion proxy. The problem is: how much overhead will be caused by synchronizing node list (or consensus).
b) How much overhead will be introduced by maintaining state for node negotiation?
Since in HS5h, both introduction point and hidden service need to maintain state for node negotiation.
To do so, the client can send something like a negotiation identifier in the initial introduction request. Then for introduction point, it can mark the circuit from the client with this ID. For HS, it can hash the negotiation state structure into a hash table with ID as the key.
Theoretically, this overhead should be low, because the "negotiation ID" is just like rendezvous cookie.
BTW, we might be able reduce number of hops to 4 or even 3 if entry guard is not required. Because entry guard is basically not negotiable.
Qingping Hou dave2008713@gmail.com writes:
Hi all,
We are a university research group and following is a proposal for possible optimization on Tor's rendezvous circuit design. It's not written as a formal Tor proposal yet because I want to have a discussion with the community first, get some feedbacks and make sure it does not introduce new vulnerabilities.
I only quickly skimmed the proposal but it looks interesting.
Particularly so as I'm not aware of many proposals that improve the performance of HSes.
A few comments below.
So please feel free to comment if you have any thoughts or find any holes :)
<snip>
Negotiate random numbers between two nodes [RAND_NEGO]
H(a) here means digest of message a.
(1) client generates a random number R_a and sends the digest H(R_a) to HS (2) HS remembers H(R_a), generates a random number R_h and sends it back to the client (3) client receives R_h and sends back R_a to HS (4) HS checks R_a against H(R_a) to make sure it's not generated after client receives R_h. (5) Now both client and HS compute a shared random number using R_a and R_c. (for instance, simply add up these two number)
The negotiation protocol can be very simple here because only two parties are involved and the random number does not need to be remained secret.
This seems to suffer from the same early-abort issue that the #8244 commit-and-reveal solution suffers from (https://lists.torproject.org/pipermail/tor-dev/2013-November/005878.html).
For example, in this specific protocol, the attacker establishes a channel to the HS, sends her H(R_a) and receives R_h. The attacker then calculates 'R_a + R_h' and if the result does not point to an attacker controlled RP, she aborts the protocol. Then she reconnects and does the same thing again, till she gets a result that is convenient for her.
Note that at step 2), if HS is able to recover R_a from H(R_a), it can take control over R_c. So to mitigate this, we can use a variant of Diffie-Hellman handshake:
Yes, this is indeed a problem if the random numbers are small (for example, in [0, <number of ORs>]). A key exchange makes sense here. I I wonder if there are cheaper solutions that don't involve "expensive" operations like modular exponentiations
(1) client generates a public key g^x and sends the digest H(g^x) to HS (2) HS remembers H(g^x), generates a public key g^y and sends it back to the client (3) client receives g^y and sends back g^x to HS (4) HS checks g^x against H(g^x) to make sure it's not generated after client receives g^y. (5) Now both client and HS compute a shared random number: R_c = g^(xy)
- Rough anonymity analysis for HSv3 and HS5h
3.1. Assumptions, attack rules and constraints
In this section, we assume 3 hops being the minimal circuit length to keep the origin node of a circuit anonymous from the destination. Two simple attack rules are used to remove nodes from a circuit: 1. If node A is picked by the attacker, then he can pick himself as node A. 2. If the identity of node A is known by the attacker, then he can connect directly to it. To defense against these attack rules, following constraints need to be enforced for a circuit: 1. No one except the circuit originator can have total control over choice of all nodes on the circuit. Note that it's OK even if the originator does not have total control. 2. Identity for all nodes on the circuit, except the last hop, must remain secret. i.e. only known by circuit originator.
3.2. Why 6 hops is the minimal circuit length for HSv3
First let's begin with an ASCII graph for HSv3 rendezvous circuit: Client -> (a1) -> (a2) -> (RP) <- (b3) <- (b2) <- (b1) <- HS Client chooses 3 ORs: a1, a2, RP. HS chooses the other 3 ORs: b3, b2, b1. Now consider HS as the malicious side. In order to get closer to client, the best it can do is to connect directly to RP, which reduces the circuit length to 3 hops: Client -> (a1) -> (a2) -> (RP) <- HS If client is the malicious side, the best it can do is to choose itself as RP, which also reduces the circuit length to 3 hops: Client <- (b3) <- (b2) <- (b1) <- HS Taking any node from the original 6 hops circuit will result in 2 hops circuit in either case. As noted above, 3 hops is assumed to be the minimal circuit length.
3.3. Why HS5h only needs 5 hops
HS5h makes use of the fact that the last hop of a circuit is different from all previous hops. Since last hop will be in direct contact with the destination host, its identity does not need to be remain secret. We only have to make sure the last hop cannot be determined by any malicious host at will. Otherwise the malicious side can just pick himself as last hop and thus shorten the circuit. This is where hop negotiation come into play. A negotiated hop is guaranteed to be a random node and cannot be determined by anyone. So it can be used as the last hop for both client and HS. Since the last hop for both sides overlaps, it's effectively acting as a rendezvous point. In comparison, in HSv3, the rendezvous point is picked and controlled by the client. Therefore it can not be used as the last hop for HSs' rendezvous circuit. Again, following is an ASCII graph for a HS5h rendezvous circuit: Client -> (a1) -> (a2) -> (RP) <- (b2) <- (b1) <- HS Now if the client is malicious, it can only connects directly to the RP, i.e. pick the second last hop as himself. Therefore HS can still maintain a 3 hops distance from the malicious client: Client -> (RP) <- (b2) <- (b1) <- HS While in HSv3, the client can pick himself as the RP. And because this is a symmetric scheme, the same argument goes for malicious HS: Client -> (a1) -> (a2) -> (RP) <- HS Either cases, we maintained a 3 hops circuit.
3.4. Summary
We took 3 as the minimal circuit length because that's the default setup in Tor. The analysis should work for any number. To generalize it, if N is the minimal circuit length, then: 1) without node negotiation, the minimal rendezvous circuit length is 2N. 2) with node negotiation, the minimal rendezvous circuit length is 2N-1.
- Other possible speed optimization under HS5h
4.1 Optimistic rendezvous mode
Similar to optimistic_data for AP connections, we can implement optimistic_rendezvous mode under HS5h. The basic idea is to optimistically assume any picked candidate RP will be willing to act as a RP. Since now we need to maintain state in introduction point to help with node negotiation, we can use it to communicate with HS when establishing the rendezvous circuit. So both client and HS can launch circuits to newly negotiated RP simultaneously. If this candidate RP refuses to be a rendezvous point to one side, the other side can be notified via introduction point. Then client and HS can go through the negotiation process again to pick a new candidate RP. This mode can also be implemented under HSv3 as well. In HSv3, wait for RENDEZVOUS_ESTABLISHED cell and sending INTRODUCE1 cell has to be a sequential operation. With optimistic_rendezvous mode, this can be done in parallel. NOTE: we need to test RENDEZVOUS_ESTABLISHED rate in real tor network. If the rate is low, then this mode will slow down the setup process instead. To make this mode work even better, we can have all OR explicitly mark whether it is willing to be a rendezvous point in it's descriptor. This should greatly increase the success rate for receiving RENDEZVOUS_ESTABLISHED.
Hazards
a) How to design the scheme for mapping a random number to the same node between client and server?
This will be trivia if it's easy to synchronize node list between two onion proxy. The problem is: how much overhead will be caused by synchronizing node list (or consensus).
Yes, this seems like a central problem to this proposal. Both parties will probably have to negotiate their consensus version in the beginning of the protocol to avoid ambiguities later on (since an attacker can choose between multiple valid consensus versions that might be convenient for him).
On 02/11/2014 01:53 PM, George Kadianakis wrote:
Negotiate random numbers between two nodes [RAND_NEGO]
H(a) here means digest of message a.
(1) client generates a random number R_a and sends the digest H(R_a) to HS (2) HS remembers H(R_a), generates a random number R_h and sends it back to the client (3) client receives R_h and sends back R_a to HS (4) HS checks R_a against H(R_a) to make sure it's not generated after client receives R_h. (5) Now both client and HS compute a shared random number using R_a and R_c. (for instance, simply add up these two number)
The negotiation protocol can be very simple here because only two parties are involved and the random number does not need to be remained secret.
This seems to suffer from the same early-abort issue that the #8244 commit-and-reveal solution suffers from (https://lists.torproject.org/pipermail/tor-dev/2013-November/005878.html).
For example, in this specific protocol, the attacker establishes a channel to the HS, sends her H(R_a) and receives R_h. The attacker then calculates 'R_a + R_h' and if the result does not point to an attacker controlled RP, she aborts the protocol. Then she reconnects and does the same thing again, till she gets a result that is convenient for her.
That is a good call :) And the HS can also do the same attack after received R_a. This is
Anyway, the point here is if the negotiated shared random number is random from the attacker's point of view (i.e. it's the same as generating a random number by herself), then this attack is almost the same as the following attack for HSv3:
1) attacker picks RP as herself 2) attacker sends introduction cell 3) attacker checks whether the last hop of HS circuit is controlled by her 3.a) if yes, attack succeed, circuit is shorten to 2 hops 3.b) if not, go back to 2)
while in HS5h:
1) attacker picks R_a and sends H(R_a) 2) attacker receives R_h 3) attacker computes shared random number and check to see if it maps to the node she controlled 3.a) if yes, attack succeed, circuit is shorten to 2 hops 3.b) if not, go back to 1)
Both attacks shorten the circuit to 2 hops, and I think the only difference is the speed of the attack here.
For HSv3, 3 messages need to be transmitted: 1. attacker -> introduction point [introduce1 cell] 2. introduction point -> HS [introduce2 cell] 3. HS -> RP [rendezvous cell]
NOTE: the RP is controlled by the attacker here.
For HS5h, 4 messages need to be transmitted: 1. attacker -> introduction point [H(R_a)] 2. introduction point -> HS [H(R_a)] 3. attacker -> introduction point [R_h] 4. introduction point -> attacker [R_h]
Even though HS5h needs to send one more message, the circuit build up time for HSv3 (from HS to RP) will probably take longer. So maybe the attacker can attack less frequently under HSv3.
If that's the case, then we can turn the random number generation into a computation heavy process so it takes client longer to compute R_a and HS longer to compute R_h. But then it causes extra load on HS.
Another random thought, if HS reuses the circuit (i.e. when we have virtual circuit implemented), it will take attacker longer to complete an attacker under HS5h.
Note that at step 2), if HS is able to recover R_a from H(R_a), it can take control over R_c. So to mitigate this, we can use a variant of Diffie-Hellman handshake:
Yes, this is indeed a problem if the random numbers are small (for example, in [0, <number of ORs>]). A key exchange makes sense here. I I wonder if there are cheaper solutions that don't involve "expensive" operations like modular exponentiations
(1) client generates a public key g^x and sends the digest H(g^x) to HS (2) HS remembers H(g^x), generates a public key g^y and sends it back to the client (3) client receives g^y and sends back g^x to HS (4) HS checks g^x against H(g^x) to make sure it's not generated after client receives g^y. (5) Now both client and HS compute a shared random number: R_c = g^(xy)
- Rough anonymity analysis for HSv3 and HS5h
3.1. Assumptions, attack rules and constraints
In this section, we assume 3 hops being the minimal circuit length to keep the origin node of a circuit anonymous from the destination. Two simple attack rules are used to remove nodes from a circuit: 1. If node A is picked by the attacker, then he can pick himself as node A. 2. If the identity of node A is known by the attacker, then he can connect directly to it. To defense against these attack rules, following constraints need to be enforced for a circuit: 1. No one except the circuit originator can have total control over choice of all nodes on the circuit. Note that it's OK even if the originator does not have total control. 2. Identity for all nodes on the circuit, except the last hop, must remain secret. i.e. only known by circuit originator.
3.2. Why 6 hops is the minimal circuit length for HSv3
First let's begin with an ASCII graph for HSv3 rendezvous circuit: Client -> (a1) -> (a2) -> (RP) <- (b3) <- (b2) <- (b1) <- HS Client chooses 3 ORs: a1, a2, RP. HS chooses the other 3 ORs: b3, b2, b1. Now consider HS as the malicious side. In order to get closer to client, the best it can do is to connect directly to RP, which reduces the circuit length to 3 hops: Client -> (a1) -> (a2) -> (RP) <- HS If client is the malicious side, the best it can do is to choose itself as RP, which also reduces the circuit length to 3 hops: Client <- (b3) <- (b2) <- (b1) <- HS Taking any node from the original 6 hops circuit will result in 2 hops circuit in either case. As noted above, 3 hops is assumed to be the minimal circuit length.
3.3. Why HS5h only needs 5 hops
HS5h makes use of the fact that the last hop of a circuit is different from all previous hops. Since last hop will be in direct contact with the destination host, its identity does not need to be remain secret. We only have to make sure the last hop cannot be determined by any malicious host at will. Otherwise the malicious side can just pick himself as last hop and thus shorten the circuit. This is where hop negotiation come into play. A negotiated hop is guaranteed to be a random node and cannot be determined by anyone. So it can be used as the last hop for both client and HS. Since the last hop for both sides overlaps, it's effectively acting as a rendezvous point. In comparison, in HSv3, the rendezvous point is picked and controlled by the client. Therefore it can not be used as the last hop for HSs' rendezvous circuit. Again, following is an ASCII graph for a HS5h rendezvous circuit: Client -> (a1) -> (a2) -> (RP) <- (b2) <- (b1) <- HS Now if the client is malicious, it can only connects directly to the RP, i.e. pick the second last hop as himself. Therefore HS can still maintain a 3 hops distance from the malicious client: Client -> (RP) <- (b2) <- (b1) <- HS While in HSv3, the client can pick himself as the RP. And because this is a symmetric scheme, the same argument goes for malicious HS: Client -> (a1) -> (a2) -> (RP) <- HS Either cases, we maintained a 3 hops circuit.
3.4. Summary
We took 3 as the minimal circuit length because that's the default setup in Tor. The analysis should work for any number. To generalize it, if N is the minimal circuit length, then: 1) without node negotiation, the minimal rendezvous circuit length is 2N. 2) with node negotiation, the minimal rendezvous circuit length is 2N-1.
- Other possible speed optimization under HS5h
4.1 Optimistic rendezvous mode
Similar to optimistic_data for AP connections, we can implement optimistic_rendezvous mode under HS5h. The basic idea is to optimistically assume any picked candidate RP will be willing to act as a RP. Since now we need to maintain state in introduction point to help with node negotiation, we can use it to communicate with HS when establishing the rendezvous circuit. So both client and HS can launch circuits to newly negotiated RP simultaneously. If this candidate RP refuses to be a rendezvous point to one side, the other side can be notified via introduction point. Then client and HS can go through the negotiation process again to pick a new candidate RP. This mode can also be implemented under HSv3 as well. In HSv3, wait for RENDEZVOUS_ESTABLISHED cell and sending INTRODUCE1 cell has to be a sequential operation. With optimistic_rendezvous mode, this can be done in parallel. NOTE: we need to test RENDEZVOUS_ESTABLISHED rate in real tor network. If the rate is low, then this mode will slow down the setup process instead. To make this mode work even better, we can have all OR explicitly mark whether it is willing to be a rendezvous point in it's descriptor. This should greatly increase the success rate for receiving RENDEZVOUS_ESTABLISHED.
Hazards
a) How to design the scheme for mapping a random number to the same node between client and server?
This will be trivia if it's easy to synchronize node list between two onion proxy. The problem is: how much overhead will be caused by synchronizing node list (or consensus).
Yes, this seems like a central problem to this proposal. Both parties will probably have to negotiate their consensus version in the beginning of the protocol to avoid ambiguities later on (since an attacker can choose between multiple valid consensus versions that might be convenient for him).
I was thinking if the overhead of downloading consensus is not too high, we can do the following during the random number negotiation process:
1) client sends {H(R_a), C_consensus_ver} 2) HS sends {R_h, HS_consensus_ver}
Now each side know the version of consensus from the other side, they pick the newest one. The side with the older version will have to download the new consensus before moving on. This can force nodes update consensus more frequently as well.
Also, synchronizing consensus is based on the idea that we pick random RP using the same scheme that's used in router_choose_random_node. It might be possible to come up with a scheme to map a random number to a node with none synchronized node list (i.e. different version of valid consensuses). But I haven't think hard enough on this part yet ;p Will work on this in the following days.
On Tue, Feb 11, 2014 at 11:55:05AM -0500, Qingping Hou wrote:
(0) client fetches descriptor for a hidden service. (1) client connects to introduction point. (2) since client and HS are connected via introduction point, they can negotiate a random number using this channel. (For more details, see [RAND_NEGO]) (3) both client and HS maps that random number to a random onion router using the same scheme, so they end up with the same node. This is the candidate RP. (4) both client and HS create a 3 hops circuit using RP as last hop. (5) RP joins the circuit originates from HS to the circuit originates from client. (6) now client and HS are connected. Because their original circuits share the same endpoint(the RP), the length of the path is 5 hops.
Worth discussing.
to the whole process. Firstly, it reuses the connection to introduction point for both sides so it requires no extra circuits build up. Secondly, the bottle neck is circuit setup, cell/stream transmission delay is actually pretty low.
To be clear, the client is the one who learns first what the RP should be, yes? That means:
A) The problem George brought up -- the client can keep doing this dance until they agree upon an RP that the client controls, and now the HS effectively has a two-hop path to the RP. Maybe that is ok (two is still more than one), but it should be made clear.
B) The client should extend a circuit to RP first, establish a rendezvous cookie there, and only then respond to HS with its R_a and rend cookie? Otherwise there will be a race where both sides try to extend to RP, and it's unspecified what happens if HS gets there first.
Note that at step 2), if HS is able to recover R_a from H(R_a), it can take control over R_c. So to mitigate this, we can use a variant of Diffie-Hellman handshake: (1) client generates a public key g^x and sends the digest H(g^x) to HS (2) HS remembers H(g^x), generates a public key g^y and sends it back to the client (3) client receives g^y and sends back g^x to HS (4) HS checks g^x against H(g^x) to make sure it's not generated after client receives g^y. (5) Now both client and HS compute a shared random number: R_c = g^(xy)
You're making both sides do public key operations just because the hash function might be broken? I would guess the load, and DoS opportunities, introduced by public key operations on the HS side will outweigh any theoretical benefits here.
This is where hop negotiation come into play. A negotiated hop is guaranteed to be a random node and cannot be determined by anyone.
For a single run this is true, but for the repeated game it's not. This might be the killer flaw here.
a) How to design the scheme for mapping a random number to the same node between client and server?
This one will indeed be tricky, since each side can have one of several "currently valid" views of the network (i.e. consensus networkstatus documents).
--Roger
On 02/11/2014 04:51 PM, Roger Dingledine wrote:
On Tue, Feb 11, 2014 at 11:55:05AM -0500, Qingping Hou wrote:
(0) client fetches descriptor for a hidden service. (1) client connects to introduction point. (2) since client and HS are connected via introduction point, they can negotiate a random number using this channel. (For more details, see [RAND_NEGO]) (3) both client and HS maps that random number to a random onion router using the same scheme, so they end up with the same node. This is the candidate RP. (4) both client and HS create a 3 hops circuit using RP as last hop. (5) RP joins the circuit originates from HS to the circuit originates from client. (6) now client and HS are connected. Because their original circuits share the same endpoint(the RP), the length of the path is 5 hops.
Worth discussing.
to the whole process. Firstly, it reuses the connection to introduction point for both sides so it requires no extra circuits build up. Secondly, the bottle neck is circuit setup, cell/stream transmission delay is actually pretty low.
To be clear, the client is the one who learns first what the RP should be, yes? That means:
A) The problem George brought up -- the client can keep doing this dance until they agree upon an RP that the client controls, and now the HS effectively has a two-hop path to the RP. Maybe that is ok (two is still more than one), but it should be made clear.
B) The client should extend a circuit to RP first, establish a rendezvous cookie there, and only then respond to HS with its R_a and rend cookie? Otherwise there will be a race where both sides try to extend to RP, and it's unspecified what happens if HS gets there first.
No, to make it faster, it should return R_a immediately. This is a symmetric design, so both sides send the rendezvous cookie along the circuit. RP will join the two circuits once it received the same rendezvous cookies.
Note that at step 2), if HS is able to recover R_a from H(R_a), it can take control over R_c. So to mitigate this, we can use a variant of Diffie-Hellman handshake: (1) client generates a public key g^x and sends the digest H(g^x) to HS (2) HS remembers H(g^x), generates a public key g^y and sends it back to the client (3) client receives g^y and sends back g^x to HS (4) HS checks g^x against H(g^x) to make sure it's not generated after client receives g^y. (5) Now both client and HS compute a shared random number: R_c = g^(xy)
You're making both sides do public key operations just because the hash function might be broken? I would guess the load, and DoS opportunities, introduced by public key operations on the HS side will outweigh any theoretical benefits here.
Agree. will a large random number be sufficient here? We are working on the prototype ATM. Maybe we can measure this a bit after it's done :)
This is where hop negotiation come into play. A negotiated hop is guaranteed to be a random node and cannot be determined by anyone.
For a single run this is true, but for the repeated game it's not. This might be the killer flaw here.
Right, as we noted at the beginning, this proposal focuses on speeding up HS while not introducing new vulns. Current HSv3 has this problem as well. As I replied in George's mail, the attacker can keep sending introduce cell to HS and thus forcing it to create new circuits, i.e. re-pick the last hop.
On Tue 11 Feb 2014 01:53:54 PM EST, George Kadianakis wrote:
Qingping Hou dave2008713@gmail.com writes:
Hazards
a) How to design the scheme for mapping a random number to the same node between client and server?
This will be trivia if it's easy to synchronize node list between two onion proxy. The problem is: how much overhead will be caused by synchronizing node list (or consensus).
Yes, this seems like a central problem to this proposal. Both parties will probably have to negotiate their consensus version in the beginning of the protocol to avoid ambiguities later on (since an attacker can choose between multiple valid consensus versions that might be convenient for him).
On Tue 11 Feb 2014 04:51:25 PM EST, Roger Dingledine wrote:
This one will indeed be tricky, since each side can have one of several "currently valid" views of the network (i.e. consensus networkstatus documents).
OK, as I mentioned in previous mail, one possible solution is to let two parties exchange consensus version and used the latest one. This approach is simple to implement and can encourage OPs to update their network view more frequently.
I just came up with another possible solution for this problem, which is based on the selection of responsible HSDirs.
Tor clients will only consider a consensus is valid if it's not older than 1 day. We can explore the fact that normally in a small period of time, Tor network should not change drastically. So different valid consensuses should share a large amount of nodes. The RP choose scheme works as follow:
1. two parties negotiate a random number R as noted in the proposal 2. put nodes into a circular list and order them based on identity digest (FP) (this is already implemented in Tor) 3. calculate hash of random number H(R) 4. find the the relay with identity digest cloesest to H(R)
This should work with pretty high probability as long as two parties' current consensus shares a large portion of relays.
Now what if they end up picking different RPs? This is easy: in the last step of random number negotiation, Client can send the relay she picked together with the R_a. Then HS can check whether they end up with the same one, if not, a request for renegotiation can be sent along the circuit.
Noted that current rendezvous circuit design also has similar problem. For example, the client might have a newer consensus and choose a RP that's not in HS's old consensus. In that case, HS will just give up without notifying the Client. My second proposed approach even solves this problem :)
Again, this still cannot prevent the abort-if-i-don't-like attack, but it's not a new vuln for HS anyway.
-- qp
Hi,
just to repeat, my VM that holds a tor relay is at your use should you wish to test things out. It has its own ipV4 address and people from the dev team are welcome to log into it and try things out. Just be aware that if you kill the relay, you have to make it work again :D
https://metrics.torproject.org/relay-search.html?search=176.31.156.199
Regards,
Phill.
On 14 February 2014 22:13, Qingping Hou dave2008713@gmail.com wrote:
On Tue 11 Feb 2014 01:53:54 PM EST, George Kadianakis wrote:
Qingping Hou dave2008713@gmail.com writes:
Hazards
a) How to design the scheme for mapping a random number to the same
node
between client and server? This will be trivia if it's easy to synchronize node list
between two
onion proxy. The problem is: how much overhead will be caused by synchronizing node list (or consensus).
Yes, this seems like a central problem to this proposal. Both parties will probably have to negotiate their consensus version in the beginning of the protocol to avoid ambiguities later on (since an attacker can choose between multiple valid consensus versions that might be convenient for him).
On Tue 11 Feb 2014 04:51:25 PM EST, Roger Dingledine wrote:
This one will indeed be tricky, since each side can have one of several "currently valid" views of the network (i.e. consensus networkstatus documents).
OK, as I mentioned in previous mail, one possible solution is to let two parties exchange consensus version and used the latest one. This approach is simple to implement and can encourage OPs to update their network view more frequently.
I just came up with another possible solution for this problem, which is based on the selection of responsible HSDirs.
Tor clients will only consider a consensus is valid if it's not older than 1 day. We can explore the fact that normally in a small period of time, Tor network should not change drastically. So different valid consensuses should share a large amount of nodes. The RP choose scheme works as follow:
- two parties negotiate a random number R as noted in the proposal
- put nodes into a circular list and order them based on identity digest
(FP) (this is already implemented in Tor) 3. calculate hash of random number H(R) 4. find the the relay with identity digest cloesest to H(R)
This should work with pretty high probability as long as two parties' current consensus shares a large portion of relays.
Now what if they end up picking different RPs? This is easy: in the last step of random number negotiation, Client can send the relay she picked together with the R_a. Then HS can check whether they end up with the same one, if not, a request for renegotiation can be sent along the circuit.
Noted that current rendezvous circuit design also has similar problem. For example, the client might have a newer consensus and choose a RP that's not in HS's old consensus. In that case, HS will just give up without notifying the Client. My second proposed approach even solves this problem :)
Again, this still cannot prevent the abort-if-i-don't-like attack, but it's not a new vuln for HS anyway.
-- qp
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On 02/11/2014 11:55 AM, Qingping Hou wrote:
Hi all,
We are a university research group and following is a proposal for possible optimization on Tor's rendezvous circuit design. It's not written as a formal Tor proposal yet because I want to have a discussion with the community first, get some feedbacks and make sure it does not introduce new vulnerabilities.
I am working with Qingping on this design, and we actually think we might be able to get the hop count all the way down to three. Five is solid (assuming we can solve the problems that have come up already, and assuming there are no further problems) but with some slightly cleverer crypto and a more complicated handshake, it's possible for each side to prove to the other that it is *not* cheating on the connection chain, which is the primary reason we need extra hops.
First off, recall that a single anonymizing proxy --
Client -- Rendezvous -- Service
still guarantees that Client can't learn Service's IP and vice versa, and no single-point eavesdropper can learn both Client and Service's IPs. The reason we don't use single-hop circuits in Tor is that the rendezvous itself might be malicious (or subverted).
Now, here is the usual three-hop Tor circuit, but for a hidden service:
Client -- Client_Guard -- Rendezvous -- Service_Guard -- Service
The additional guard hops protect both Client and Service from a malicious third-party Rendezvous, but - please correct me if I'm wrong about this - I believe that is *all* they are doing relative to a single proxy. Therefore, if Client could be assured that Service had set up its half of the chain honestly - was *not* deliberately forfeiting (some of) its own anonymity in order to learn more about Client - and vice versa, a three-hop circuit would be sufficient.
Here is a potential way to accomplish this. It also has the nice property of *not* needing Client and Service to agree on the complete list of available relays. One unusual cryptographic primitive is required: we need a cipher that is *deterministic* and *commutative* but retains all the other desiderata of a message cipher. These special properties are defined as follows:
Deterministic: E_k(M1) == E_k(M2) iff M1 == M2 Commutative: D_q(E_k(E_q(M))) == E_k(M)
The Massey-Omura cryptosystem https://en.wikipedia.org/wiki/Three-pass_protocol#Massey-Omura_cryptosystem has these properties, but there may be others.
In what follows, I will use E_k() and D_k() to refer to this cryptosystem specifically. I will use curly braces { ... } to indicate a message which is, as a whole, signed by its sender; these messages may in practice also be encrypted, but I *think* this is not actually required in any case.
In what follows, Fx (for some x) always refers to the fingerprint of some Tor node, and
(0) In advance, Client and Service each pick a set of guards that they will use for all communication, as in the present system.
(1) Client and Service somehow agree on a shared secret S. Client and Service also each independently pick a secret which is *not* shared with the counterparty: Client's secret will be referred to as A, and Service's secret will be referred to as B.
(2) Service picks a guard that it will use for this connection, and chooses K rendezvous candidates from its list of all available relays, such that the fingerprint of each candidate satisfies a condition based on S, that the client can verify (something like FP(c) mod S < Q, where Q is a constant determined in advance). This step gives the service a certain amount of flexibility in its chosen set of rendezvous candidates: the idea is to allow an honest service to avoid selecting any of its own guards, or itself. K needs to be large enough that a *malicious* service can't reasonably control all of the candidates, but *small* enough that a malicious *client* can't practically enumerate a service's guards by repeatedly connecting to the service.
(3) Service transmits to Client:
{ E_b(Fs), E_b(Fg), E_b(F1), ..., E_b(Fn), F1, ..., Fn }_s
where Fs is the fingerprint of the service itself, Fg the fingerprint of the service's selected guard, and F1...Fn the fingerprints of the rendezvous candidates.
(4) Client verifies that none of the encrypted fingerprints are equal, and selects a rendezvous point from the list of candidates. Client transmits to Server:
{ E_a(E_b(Fs)), E_a(E_b(Fg)), E_a(Fr), E_a(Ff), E_a(Fc), Fr }_c
where Fr is the fingerprint of the selected rendezvous point, Ff is the fingerprint of the *client's* guard, and Fc is the fingerprint of the client itself. Client connects to its guard, extends a circuit to the selected rendezvous point, and notifies the RP that it expects a hidden service rendezvous from a peer with concealed fingerprint E_b(Fg).
(5) Server computes E_a(Fs) = D_b(E_a(E_b(Fs)) and E_a(Fg) = D_b(E_a(E_b(Fg))), then verifies that none of the encrypted fingerprints are equal. Server connects to its guard, extends a circuit to the selected rendezvous point, and notifies the RP that it expects a hidden service rendezvous from a peer with concealed fingerprint E_a(Ff).
(6) Client and Server each reveal their respective secrets (A, B) to the rendezvous point, enabling the RP to verify that each connection has the fingerprint that its peer expects. If this checks out, the RP links the circuits together.
This last step is necessary to ensure that neither side connects directly to the RP. Revealing A and B does not allow the RP to do anything it couldn't otherwise do, because we're done encrypting things with A and B at this stage, and a malicious RP could reveal the identity of the next hop anyway.
I *think* the way Tor defines identity fingerprints guarantees that neither side can lie about its identity (claiming to be Fs when it is *itself* Fg, for instance): as long as all the messages involved are signed as Tor does already, then only the node that really is Fs could generate those signatures.
I also think that this protocol is immune to abort-if-you-don't-like-the-answer provided that each side imposes a substantial delay between connection attempts. If there is a better way to do that I am all ears.
zw
On Tue, Feb 11, 2014 at 9:07 PM, Zack Weinberg zackw@panix.com wrote:
(3) Service transmits to Client:
{ E_b(Fs), E_b(Fg), E_b(F1), ..., E_b(Fn), F1, ..., Fn }_s
where Fs is the fingerprint of the service itself, Fg the fingerprint of the service's selected guard, and F1...Fn the fingerprints of the rendezvous candidates.
As far as I can tell, none of E_b(F1),...,E_b(Fn) is ever used again in the protocol. Is the only point to convince the client that F1,...,Fn are not Fs or Fg? Because you have a problem in that case: you need to convince the client that E_b(F1) is actually E_b(F1) and not E_b(Some other random junk). (E_b(F1) should be replaced by a zero-knowledge proof that E_b(Fs) != E_b(F1))
I also think that this protocol is immune to abort-if-you-don't-like-the-answer provided that each side imposes a substantial delay between connection attempts.
This sounds like a great way to DoS your least favorite hidden service at low cost.
On 02/11/2014 10:42 PM, Nicholas Hopper wrote:
On Tue, Feb 11, 2014 at 9:07 PM, Zack Weinberg zackw@panix.com wrote:
(3) Service transmits to Client:
{ E_b(Fs), E_b(Fg), E_b(F1), ..., E_b(Fn), F1, ..., Fn }_s
where Fs is the fingerprint of the service itself, Fg the fingerprint of the service's selected guard, and F1...Fn the fingerprints of the rendezvous candidates.
As far as I can tell, none of E_b(F1),...,E_b(Fn) is ever used again in the protocol. Is the only point to convince the client that F1,...,Fn are not Fs or Fg?
Yes.
Because you have a problem in that case: you need to convince the client that E_b(F1) is actually E_b(F1) and not E_b(Some other random junk). (E_b(F1) should be replaced by a zero-knowledge proof that E_b(Fs) != E_b(F1))
Oh, bother. I was trying to avoid having to dig into the ZKP literature, but you're right.
I also think that this protocol is immune to abort-if-you-don't-like-the-answer provided that each side imposes a substantial delay between connection attempts.
This sounds like a great way to DoS your least favorite hidden service at low cost.
So, in my head, that was "substantial delay between connection attempts to/from the *same peer*", but it now dawns on me that we may not be able to know that. In fact, we may want to *ensure* that we don't know that. Sigh. Unlinkability is hard. ;-)
Do Tor clients actually have identity keys, as relays do? Perhaps not *persistent* identity keys, and even if they did, an adversarial client could just rotate them whenever it felt like, but still I'd like to know if Fc is even a thing. I tried and failed to figure this out from the protocol spec. (Maybe I was looking in the wrong place?)
zw
On 02/12/2014 05:06 PM, Zack Weinberg wrote:
Do Tor clients actually have identity keys, as relays do? Perhaps not *persistent* identity keys, and even if they did, an adversarial client could just rotate them whenever it felt like, but still I'd like to know if Fc is even a thing. I tried and failed to figure this out from the protocol spec. (Maybe I was looking in the wrong place?)
OP does not have persistent key. Key pairs are generated at start up. See init_keys() in router.c
Hi all,
Apologies for top-posting. I'm making a brief comment about general concerns in this design. Additional apologies if something already said obviates my comments. I have purposely not looked at most of the developments in HS design changes because they are _too_ interesting to me and I have had to make sure to keep my focus on some other features of Tor where progress is needed.
The biggest concern is that no matter how you handle the commitment and the size of the flexible set, you make it fairly easy for a HS simply following this protocol precisely and with just the resource of a handful of other nodes (n) in the network to identify the client guard with certainty each time. (If he owns less than n, it becomes a likelihood rather than a certainty each time. Alternatively if he owns a few ASes or IXPs he could accomplish similar results with a judicious choice of the network location of all n.) Given the push elsewhere to use single guards for long periods, this makes guard nodes all the more likely to face subpoena or other forms of attack since the argument that this is a successful strategy to locate clients of interest is greatly strengthened. Since the HS can choose two hops that it does not object to, the client should be similarly protected, i.e., four relays in the circuit overall.
The other big concern is that this looks like there are many places to DoS or resource deplete the hidden service. Earlier designs kept per introduction connection state for the HS to a minimum. There may be ways to reduce that, but it is an important consideration.
aloha, Paul
On Tue, Feb 11, 2014 at 10:07:20PM -0500, Zack Weinberg wrote:
On 02/11/2014 11:55 AM, Qingping Hou wrote:
Hi all,
We are a university research group and following is a proposal for possible optimization on Tor's rendezvous circuit design. It's not written as a formal Tor proposal yet because I want to have a discussion with the community first, get some feedbacks and make sure it does not introduce new vulnerabilities.
I am working with Qingping on this design, and we actually think we might be able to get the hop count all the way down to three. Five is solid (assuming we can solve the problems that have come up already, and assuming there are no further problems) but with some slightly cleverer crypto and a more complicated handshake, it's possible for each side to prove to the other that it is *not* cheating on the connection chain, which is the primary reason we need extra hops.
First off, recall that a single anonymizing proxy --
Client -- Rendezvous -- Service
still guarantees that Client can't learn Service's IP and vice versa, and no single-point eavesdropper can learn both Client and Service's IPs. The reason we don't use single-hop circuits in Tor is that the rendezvous itself might be malicious (or subverted).
Now, here is the usual three-hop Tor circuit, but for a hidden service:
Client -- Client_Guard -- Rendezvous -- Service_Guard -- Service
The additional guard hops protect both Client and Service from a malicious third-party Rendezvous, but - please correct me if I'm wrong about this - I believe that is *all* they are doing relative to a single proxy. Therefore, if Client could be assured that Service had set up its half of the chain honestly - was *not* deliberately forfeiting (some of) its own anonymity in order to learn more about Client - and vice versa, a three-hop circuit would be sufficient.
Here is a potential way to accomplish this. It also has the nice property of *not* needing Client and Service to agree on the complete list of available relays. One unusual cryptographic primitive is required: we need a cipher that is *deterministic* and *commutative* but retains all the other desiderata of a message cipher. These special properties are defined as follows:
Deterministic: E_k(M1) == E_k(M2) iff M1 == M2 Commutative: D_q(E_k(E_q(M))) == E_k(M)
The Massey-Omura cryptosystem https://en.wikipedia.org/wiki/Three-pass_protocol#Massey-Omura_cryptosystem has these properties, but there may be others.
In what follows, I will use E_k() and D_k() to refer to this cryptosystem specifically. I will use curly braces { ... } to indicate a message which is, as a whole, signed by its sender; these messages may in practice also be encrypted, but I *think* this is not actually required in any case.
In what follows, Fx (for some x) always refers to the fingerprint of some Tor node, and
(0) In advance, Client and Service each pick a set of guards that they will use for all communication, as in the present system.
(1) Client and Service somehow agree on a shared secret S. Client and Service also each independently pick a secret which is *not* shared with the counterparty: Client's secret will be referred to as A, and Service's secret will be referred to as B.
(2) Service picks a guard that it will use for this connection, and chooses K rendezvous candidates from its list of all available relays, such that the fingerprint of each candidate satisfies a condition based on S, that the client can verify (something like FP(c) mod S < Q, where Q is a constant determined in advance). This step gives the service a certain amount of flexibility in its chosen set of rendezvous candidates: the idea is to allow an honest service to avoid selecting any of its own guards, or itself. K needs to be large enough that a *malicious* service can't reasonably control all of the candidates, but *small* enough that a malicious *client* can't practically enumerate a service's guards by repeatedly connecting to the service.
(3) Service transmits to Client:
{ E_b(Fs), E_b(Fg), E_b(F1), ..., E_b(Fn), F1, ..., Fn }_s
where Fs is the fingerprint of the service itself, Fg the fingerprint of the service's selected guard, and F1...Fn the fingerprints of the rendezvous candidates.
(4) Client verifies that none of the encrypted fingerprints are equal, and selects a rendezvous point from the list of candidates. Client transmits to Server:
{ E_a(E_b(Fs)), E_a(E_b(Fg)), E_a(Fr), E_a(Ff), E_a(Fc), Fr }_c
where Fr is the fingerprint of the selected rendezvous point, Ff is the fingerprint of the *client's* guard, and Fc is the fingerprint of the client itself. Client connects to its guard, extends a circuit to the selected rendezvous point, and notifies the RP that it expects a hidden service rendezvous from a peer with concealed fingerprint E_b(Fg).
(5) Server computes E_a(Fs) = D_b(E_a(E_b(Fs)) and E_a(Fg) = D_b(E_a(E_b(Fg))), then verifies that none of the encrypted fingerprints are equal. Server connects to its guard, extends a circuit to the selected rendezvous point, and notifies the RP that it expects a hidden service rendezvous from a peer with concealed fingerprint E_a(Ff).
(6) Client and Server each reveal their respective secrets (A, B) to the rendezvous point, enabling the RP to verify that each connection has the fingerprint that its peer expects. If this checks out, the RP links the circuits together.
This last step is necessary to ensure that neither side connects directly to the RP. Revealing A and B does not allow the RP to do anything it couldn't otherwise do, because we're done encrypting things with A and B at this stage, and a malicious RP could reveal the identity of the next hop anyway.
I *think* the way Tor defines identity fingerprints guarantees that neither side can lie about its identity (claiming to be Fs when it is *itself* Fg, for instance): as long as all the messages involved are signed as Tor does already, then only the node that really is Fs could generate those signatures.
I also think that this protocol is immune to abort-if-you-don't-like-the-answer provided that each side imposes a substantial delay between connection attempts. If there is a better way to do that I am all ears.
zw
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Note that Phantom allows both the service and the client to choose the number of hops under their respective control. I believe this applies in part to I2P as well. There is thus no force to accept any globally enforced maximum hopcount there. This concept may or not work for Tor.
On 02/12/2014 01:20 AM, grarpamp wrote:
Note that Phantom allows both the service and the client to choose the number of hops under their respective control. I believe this applies in part to I2P as well. There is thus no force to accept any globally enforced maximum hopcount there. This concept may or not work for Tor.
This works in Tor as well. I don't remember seeing a strict hop count in the spec. A custom Tor client can choose any number of hops it wants.
In fact, the official Tor already has similar feature built in: Tor2Web mode, in which client does a direct connect to RP.
On 02/11/2014 11:53 PM, Paul Syverson wrote:
The biggest concern is that no matter how you handle the commitment and the size of the flexible set, you make it fairly easy for a HS simply following this protocol precisely and with just the resource of a handful of other nodes (n) in the network to identify the client guard with certainty each time. (If he owns less than n, it becomes a likelihood rather than a certainty each time. Alternatively if he owns a few ASes or IXPs he could accomplish similar results with a judicious choice of the network location of all n.) Given the push elsewhere to use single guards for long periods, this makes guard nodes all the more likely to face subpoena or other forms of attack since the argument that this is a successful strategy to locate clients of interest is greatly strengthened. Since the HS can choose two hops that it does not object to, the client should be similarly protected, i.e., four relays in the circuit overall.
The other big concern is that this looks like there are many places to DoS or resource deplete the hidden service. Earlier designs kept per introduction connection state for the HS to a minimum. There may be ways to reduce that, but it is an important consideration.
So I have to say that I'm increasingly skeptical about the value of guards in general, and in this specific case - where both endpoints are emitting nothing but Tor protocol - I'm not convinced guards add anything at all, because *even if* both sides happen to pick malicious entry points that are controlled by the same adversary, the traffic will be indistinguishable from middle-relay traffic! ... Except, I suppose, by traffic correlation to recover the flow and realize that the malicious relays are in positions 1 and 3, which in turn means each is talking directly to an endpoint. And then they can do cross-flow correlation and figure out which end is the hidden service. However, this kind of long-term correlation attack is *easier* for an adversary that controls enough stable nodes that it has a reasonable chance of getting picked as a guard by at least one party. See above skepticism.
That said, these are both really good points, and together, may kill the whole concept. :-( I wonder if we can formalize the problem enough to prove one way or another whether you really must have at least four hops?
(Four hops is what I2P uses, with two chosen entirely by the client and two entirely by the server; but there appears to be nothing to guarantee that a malicious peer can't connect directly to its counterparty's two-hop chain, sacrificing some of its own anonymity but getting closer to the counterparty. I did just argue that that shouldn't matter, though...)
zw
On Wed, Feb 12, 2014 at 05:43:10PM -0500, Zack Weinberg wrote:
On 02/11/2014 11:53 PM, Paul Syverson wrote:
The biggest concern is that no matter how you handle the commitment and the size of the flexible set, you make it fairly easy for a HS simply following this protocol precisely and with just the resource of a handful of other nodes (n) in the network to identify the client guard with certainty each time. (If he owns less than n, it becomes a likelihood rather than a certainty each time. Alternatively if he owns a few ASes or IXPs he could accomplish similar results with a judicious choice of the network location of all n.) Given the push elsewhere to use single guards for long periods, this makes guard nodes all the more likely to face subpoena or other forms of attack since the argument that this is a successful strategy to locate clients of interest is greatly strengthened. Since the HS can choose two hops that it does not object to, the client should be similarly protected, i.e., four relays in the circuit overall.
The other big concern is that this looks like there are many places to DoS or resource deplete the hidden service. Earlier designs kept per introduction connection state for the HS to a minimum. There may be ways to reduce that, but it is an important consideration.
So I have to say that I'm increasingly skeptical about the value of guards in general, and in this specific case - where both endpoints are emitting nothing but Tor protocol - I'm not convinced guards add anything at all, because *even if* both sides happen to pick malicious entry points that are controlled by the same adversary, the traffic will be indistinguishable from middle-relay traffic! ... Except, I suppose, by traffic correlation to recover the flow and realize that the malicious relays are in positions 1 and 3, which in turn means each is talking directly to an endpoint. And then they can do cross-flow correlation and figure out which end is the hidden service. However, this kind of long-term correlation attack is *easier* for an adversary that controls enough stable nodes that it has a reasonable chance of getting picked as a guard by at least one party. See above skepticism.
Asymptotically guards buy you nothing. But it's not a long-term attack that motivated guards. Lasse Overlier and I showed in our 2006 paper "Locating Hidden Servers" that prior to guards, someone owning a single relay on the live Tor network of the day (c. 250 relays) could find a hidden server in a matter of minutes. We were focused on what could be done with a single relay so could only attack HS circuits. But we noted that the same effect would apply to all Tor circuits if the attacker had two or more relays. The next year Bauer et al. in "Low-Resource Routing Attacks Against Tor" showed that specifically (although in simulation rather than on the live Tor network as we showed).
In our recent CCS paper we showed that guard compromise for reasonable sized adversary was more on the order of several months than several minutes. So quite the win. The time to circuit compromise for typical usage behaviors once the adversary has a guard is fairly quick.
My point above was primarily about how a hostile hidden service could easily learn the guard of the client by simply following the suggested protocol, and then apply more directed resources on the guard for circuits of interest to later identify and watch the client and all or most of his behavior. Without guards, the client is just quickly exposed directly. Much worse.
That said, these are both really good points, and together, may kill the whole concept. :-( I wonder if we can formalize the problem enough to prove one way or another whether you really must have at least four hops?
I expect that experimental results will be more compelling than anything you can prove in this case. I also think that especially for sensitive users (subject to targeted attacks rather than trawling) the guards need to be chosen in a trust-aware way, as Lasse and I suggested and as we have been exploring in more recent papers. Now once you throw bridges into this mix...
aloha, Paul