[tor-dev] [Discussion] 5 hops rendezvous circuit design

11 Feb 2014

      Hi all,

We are a university research group and following is a proposal for possible
optimization on Tor's rendezvous circuit design. It's not written as a formal
Tor proposal yet because I want to have a discussion with the community first,
get some feedbacks and make sure it does not introduce new vulnerabilities.

So please feel free to comment if you have any thoughts or find any holes :)

PS: a txt version of the proposal is attached in this mail as well.

------------------------------------------------------

The changes described in this document is focused on speeding up hidden service
while at least keeping the same level of anonymity as current version provides.

Naming conventions:

    HSv3:
        current version of hidden service protocol
    HS5h:
        hidden service protocol that uses 5 hops
    HS:
        hidden service
    client:
        The onion proxy that tries to communicate with HS
    HSDir:
        hidden service directory
    RP:
        rendezvous point

1. The 5 hops rendezvous circuit

    The setup on HS side is the same as HSv3. HS picks couple introduction
    points and uploads the descriptor to HSDir. The main difference is how to
    pick the RP. In HS5h, Instead of letting the client pick whatever node it
    wants, both client and HS use negotiation protocol to pick a RP
    collaboratively.

    Following is a high level description for the rendezvous circuit setup
    process in HS5h:

    (0) client fetches descriptor for a hidden service.
    (1) client connects to introduction point.
    (2) since client and HS are connected via introduction point, they can
        negotiate a random number using this channel.
        (For more details, see [RAND_NEGO])
    (3) both client and HS maps that random number to a random onion router
        using the same scheme, so they end up with the same node.
        This is the candidate RP.
    (4) both client and HS create a 3 hops circuit using RP as last hop.
    (5) RP joins the circuit originates from HS to the circuit originates
        from client.
    (6) now client and HS are connected. Because their original circuits
        share the same endpoint(the RP), the length of the path is 5 hops.

    Improvement:

        (1) Reduction in circuit setup time.
            Both the client and HS only build 3 hops circuit now. While In
            current protocol(HSv3), HS needs to build a 4 hops circuit.
        (2) A shorter circuit means less transmission delay. It uses only one
            more hop compared to HSv3's Tor2Web mode.
            Also the circuit length for HS5h's Tor2Web mode can be further
            reduced to 3 hops.
        (3) Less setup time on client side for the first connection.
            In HSv3, a client has to first setup a 3 hops circuit to RP before
            sending INTRODUCTION1 cell to the introduction point. While in
            HS5h, this process is removed.

    Possible drawback:

        (1) An extra step is needed: random number negotiation.
            This result ins more communication between two nodes and
            Introduction point needs to remember states for the negotiation.

        We think this negotiation step will not introduce a noticeable delay
        to the whole process. Firstly, it reuses the connection to introduction
        point for both sides so it requires no extra circuits build up.
        Secondly, the bottle neck is circuit setup, cell/stream transmission
        delay is actually pretty low.

2. Negotiate random numbers between two nodes [RAND_NEGO]

    H(a) here means digest of message a.

    (1) client generates a random number R_a and sends the digest H(R_a) to HS
    (2) HS remembers H(R_a), generates a random number R_h and sends it back to
        the client
    (3) client receives R_h and sends back R_a to HS
    (4) HS checks R_a against H(R_a) to make sure it's not generated after
        client receives R_h.
    (5) Now both client and HS compute a shared random number using R_a
        and R_c. (for instance, simply add up these two number)

    The negotiation protocol can be very simple here because only two parties
    are involved and the random number does not need to be remained secret.

    Note that at step 2), if HS is able to recover R_a from H(R_a), it can take
    control over R_c. So to mitigate this, we can use a variant of
    Diffie-Hellman handshake:

    (1) client generates a public key g^x and sends the digest H(g^x) to HS
    (2) HS remembers H(g^x), generates a public key g^y and sends it back
        to the client
    (3) client receives g^y and sends back g^x to HS
    (4) HS checks g^x against H(g^x) to make sure it's not generated after
        client receives g^y.
    (5) Now both client and HS compute a shared random number: R_c = g^(xy)

3. Rough anonymity analysis for HSv3 and HS5h

3.1. Assumptions, attack rules and constraints

    In this section, we assume 3 hops being the minimal circuit length to keep
    the origin node of a circuit anonymous from the destination.

    Two simple attack rules are used to remove nodes from a circuit:

    1. If node A is picked by the attacker, then he can pick himself as node A.
    2. If the identity of node A is known by the attacker, then he can connect
       directly to it.

    To defense against these attack rules, following constraints need to be
    enforced for a circuit:

    1. No one except the circuit originator can have total control over choice
       of all nodes on the circuit. Note that it's OK even if the originator
       does not have total control.

    2. Identity for all nodes on the circuit, except the last hop, must
       remain secret. i.e. only known by circuit originator.

3.2. Why 6 hops is the minimal circuit length for HSv3

    First let's begin with an ASCII graph for HSv3 rendezvous circuit:

        Client -> (a1) -> (a2) -> (RP) <- (b3) <- (b2) <- (b1) <- HS

    Client chooses 3 ORs: a1, a2, RP. HS chooses the other 3 ORs: b3, b2, b1.

    Now consider HS as the malicious side. In order to get closer to client,
    the best it can do is to connect directly to RP, which reduces the circuit
    length to 3 hops:

        Client -> (a1) -> (a2) -> (RP) <- HS

    If client is the malicious side, the best it can do is to choose itself as
    RP, which also reduces the circuit length to 3 hops:

                                Client <- (b3) <- (b2) <- (b1) <- HS

    Taking any node from the original 6 hops circuit will result in 2 hops
    circuit in either case. As noted above, 3 hops is assumed to be the
    minimal circuit length.

3.3. Why HS5h only needs 5 hops

    HS5h makes use of the fact that the last hop of a circuit is different from
    all previous hops. Since last hop will be in direct contact with the
    destination host, its identity does not need to be remain secret. We only
    have to make sure the last hop cannot be determined by any malicious host
    at will. Otherwise the malicious side can just pick himself as last hop and
    thus shorten the circuit.

    This is where hop negotiation come into play. A negotiated hop is
    guaranteed to be a random node and cannot be determined by anyone. So it
    can be used as the last hop for both client and HS. Since the last hop for
    both sides overlaps, it's effectively acting as a rendezvous point. In
    comparison, in HSv3, the rendezvous point is picked and controlled by the
    client. Therefore it can not be used as the last hop for HSs' rendezvous
    circuit.

    Again, following is an ASCII graph for a HS5h rendezvous circuit:

        Client -> (a1) -> (a2) -> (RP) <- (b2) <- (b1) <- HS

    Now if the client is malicious, it can only connects directly to the RP,
    i.e. pick the second last hop as himself. Therefore HS can still maintain a
    3 hops distance from the malicious client:

                        Client -> (RP) <- (b2) <- (b1) <- HS

    While in HSv3, the client can pick himself as the RP.

    And because this is a symmetric scheme, the same argument goes for
    malicious HS:

        Client -> (a1) -> (a2) -> (RP) <- HS

    Either cases, we maintained a 3 hops circuit.

3.4. Summary

    We took 3 as the minimal circuit length because that's the default setup in
    Tor. The analysis should work for any number.

    To generalize it, if N is the minimal circuit length, then:

    1) without node negotiation, the minimal rendezvous circuit length is 2N.
    2) with node negotiation, the minimal rendezvous circuit length is 2N-1.

4. Other possible speed optimization under HS5h

4.1 Optimistic rendezvous mode

    Similar to optimistic_data for AP connections, we can implement
    optimistic_rendezvous mode under HS5h. The basic idea is to optimistically
    assume any picked candidate RP will be willing to act as a RP.

    Since now we need to maintain state in introduction point to help with node
    negotiation, we can use it to communicate with HS when establishing the
    rendezvous circuit. So both client and HS can launch circuits to newly
    negotiated RP simultaneously. If this candidate RP refuses to be a
    rendezvous point to one side, the other side can be notified via
    introduction point. Then client and HS can go through the negotiation
    process again to pick a new candidate RP.

    This mode can also be implemented under HSv3 as well. In HSv3, wait for
    RENDEZVOUS_ESTABLISHED cell and sending INTRODUCE1 cell has to be a
    sequential operation. With optimistic_rendezvous mode, this can be done in
    parallel.

    NOTE: we need to test RENDEZVOUS_ESTABLISHED rate in real tor network. If
    the rate is low, then this mode will slow down the setup process instead.

    To make this mode work even better, we can have all OR explicitly mark
    whether it is willing to be a rendezvous point in it's descriptor. This
    should greatly increase the success rate for receiving
    RENDEZVOUS_ESTABLISHED.

5. Hazards

    a) How to design the scheme for mapping a random number to the same node
       between client and server?

        This will be trivia if it's easy to synchronize node list between two
        onion proxy. The problem is: how much overhead will be caused by
        synchronizing node list (or consensus).

    b) How much overhead will be introduced by maintaining state for node
       negotiation?

        Since in HS5h, both introduction point and hidden service need to
        maintain state for node negotiation.

        To do so, the client can send something like a negotiation identifier
        in the initial introduction request. Then for introduction point, it
        can mark the circuit from the client with this ID. For HS, it can hash
        the negotiation state structure into a hash table with ID as the key.

        Theoretically, this overhead should be low, because the "negotiation
        ID" is just like rendezvous cookie.

[tor-dev] [Discussion] 5 hops rendezvous circuit design

Qingping Hou