[tor-dev] New Proposal 306: A Tor Implementation of IPv6 Happy Eyeballs

teor teor at riseup.net
Tue Jul 2 11:15:42 UTC 2019


Hi Iain,

Thanks for your review!

> On 2 Jul 2019, at 19:39, Iain Learmonth <irl at torproject.org> wrote:
> 
> Signed PGP part
> Hi,
> 
> My comments are inline.
> 
>> Filename: 306-ipv6-happy-eyeballs.txt
>> Title: A Tor Implementation of IPv6 Happy Eyeballs
>> Author: Neel Chauhan
>> Created: 25-Jun-2019
>> Supersedes: 299
>> Status: Open
>> Ticket: https://trac.torproject.org/projects/tor/ticket/29801
>> 
>> 1. Introduction
>> 
>> As IPv4 address space becomes scarce, ISPs and organizations will
>> deploy IPv6 in their networks. Right now, Tor clients connect to
>> guards using IPv4 connectivity by default.
>> 
>> When networks first transition to IPv6, both IPv4 and IPv6 will be
>> enabled on most networks in a so-called "dual-stack" configuration.
>> This is to not break existing IPv4-only applications while enabling
>> IPv6 connectivity. However, IPv6 connectivity may be unreliable and
>> clients should be able to connect to the guard using the most
>> reliable technology, whether IPv4 or IPv6.
> 
> The big problem that happy eyeballs was meant to solve was that often
> you might have something announcing an IPv6 prefix but that routing was
> not properly configured, so while the operating system thought it had
> IPv6 Internet it was actually just broken. In some cases, the IPv6
> Internet would be partitioned as there weren't enough backup routes to
> fail over to in times of outages. For most purposes, as I understand it,
> this means either IPv6 connectivity to a host is there or it's not.
> There's not really a middle ground where it sometimes works but is flaky
> (i.e. where you can maintain a connection but it has high packet loss).

You're right, I think our worst-case scenario in the current tor
implementation is 100% packet loss, which happens when a firewall is
configured to drop packets.

We should be much clearer about these two scenarios in the proposal
(IPv4/IPv6 failure, and IPv4/IPv6 timeout).

Another common scenario is very slow (DirPort) speeds, as a defence against
old clients on tor26. But the DirPort is out of scope for this proposal.

>> In ticket #27490, we introduced the option ClientAutoIPv6ORPort
>> which lets a client randomly choose between IPv4 or IPv6. However,
>> this random decision does not take into account unreliable
>> connectivity or falling back to the competing IP version should one
>> be unreliable or unavailable.
>> 
>> One way to select between IPv4 and IPv6 on a dual-stack network is a
>> so-called "Happy Eyeballs" algorithm as per RFC 8305. In one, a
>> client attempts the preferred IP family, whether IPv4 or IPv6. Should
>> it work, the client sticks with the preferred IP family. Otherwise,
>> the client attempts the alternate version. This means if a dual-stack
>> client has both IPv4 and IPv6, and IPv6 is unreliable, preferred or
>> not, the client uses IPv4, and vice versa. However, if IPv4 and IPv6
>> are both equally reliable, and IPv6 is preferred, we use IPv6.
> 
> This sounds like a good candidate for a consensus parameter, such that
> we can switch the preference for all clients at once, not just the ones
> that have updated to the version we switch the preference in.

Tor already has these IPv4 and IPv6 torrc options:
* ClientUseIPv4 - use IPv4, on by default
* ClientUseIPv6 - use IPv6, off by default, overridden by explicit bridge,
                  PT, and proxy configs
* ClientPreferIPv6ORPort - prefer IPv6, off by default

At the moment, these options work well:
* ClientUseIPv4 1
  Only use IPv4
  (other options are ignored)
* ClientPreferIPv6ORPort 1
  Try to use IPv6 as much as possible
  (overrides ClientUseIPv4 1 and ClientUseIPv6 0)
* ClientUseIPv4 0
  Only use IPv6
  (other options are ignored)
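To make the intended precedence concrete, here's a rough Python sketch of
how the three options could combine into an ordered list of families to
try. The option names mirror the torrc options, but this is only a model
of the semantics above, not tor's actual C code; where the descriptions
above conflict, I've followed the ClientPreferIPv6ORPort override:

```python
def preferred_families(use_ipv4=1, use_ipv6=0, prefer_ipv6_orport=0):
    """Ordered list of ORPort address families to try.

    A simplified model of the torrc semantics described above,
    using the current (pre-deployment) defaults.
    """
    if not use_ipv4:
        return ["IPv6"]             # "ClientUseIPv4 0": IPv6 only
    if prefer_ipv6_orport:
        # "ClientPreferIPv6ORPort 1" overrides "ClientUseIPv6 0".
        return ["IPv6", "IPv4"]
    if use_ipv6:
        return ["IPv4", "IPv6"]     # dual-stack, IPv4 preferred
    return ["IPv4"]                 # current default: IPv4 only
```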

After this proposal is fully deployed, all valid combinations of
options should work well. In particular:

* the default should be:
  ClientUseIPv4 1
  ClientUseIPv6 1
  ClientPreferIPv6ORPort 0 (for load-balancing reasons)
* tor clients should work with these defaults on IPv4-only, dual-stack,
  and IPv6-only networks (and they should continue to work on all these
  networks if ClientPreferIPv6ORPort is 1)
* we should have consensus parameters for:
  ClientUseIPv6 (emergency use)
  ClientPreferIPv6ORPort (if most of the guards have IPv6, and it's fast)

We should probably set ClientUseIPv6 to 0 in the first alpha release, and
then change the consensus parameter and torrc defaults after we've done
enough testing.

We should be clearer about these torrc options, consensus parameters,
testing, and deployment in the proposal.

> There may also be other ordering parameters for the address candidates.
> We might want to avoid using IPv6 addresses that are using 6to4 or
> Teredo as we *know* those are tunnels and thus have encapsulation
> overhead, higher latency, and funnel all the traffic through centralised
> (even if distributed) points in the network.

I'm not sure how this feature would work: most of the time, when tor is
ordering addresses, it has already chosen a relay. It has exactly one
IPv4 address, and an optional IPv6 address.

This kind of ordering of multiple IPv6 addresses requires a pool of
addresses from multiple relays. It's out of scope for this proposal, but
it could be implemented as part of our pool refactor:
https://trac.torproject.org/projects/tor/ticket/30817#comment:3

>> In Proposal 299, we attempted an IP fallback mechanism using failure
>> counters and preferring IPv4 and IPv6 based on the state of the
>> counters. However, Prop299 was not standard Happy Eyeballs, and an
>> alternative, standards-compliant proposal was requested in
>> [P299-TRAC] to avoid issues from complexity caused by randomness.
>> 
>> This proposal describes a Tor implementation of Happy Eyeballs and
>> is intended as a successor to Proposal 299.
>> 
>> 2. Address Selection
>> 
>> To be able to handle Happy Eyeballs in Tor, we will need to modify
>> the data structures used for connections to guards, namely the extend
>> info structure.
>> 
>> The extend info structure should contain both an IPv4 and an IPv6
>> address. This will allow us to try both the IPv4 and IPv6 addresses,
>> should both be available on a relay and the client be dual-stack.
> 
> The Happy Eyeballs specification doesn't just talk about having one v4
> and one v6 address. In some cases, relays may be multihomed and so may
> have multiple v4 or v6 addresses. We should be able to race all the
> candidates.

Tor relays only advertise 1 IPv4 address:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n392
and 0 or 1 IPv6 address:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n764
in their descriptor.

The consensus only contains 1 IPv4 address:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2297
and 0 or 1 IPv6 address:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2316
per relay.

Adding extra addresses is out of scope for this proposal. We could do
it in a separate proposal, but it might not be the best use of limited
space in the consensus.

(If a relay machine is down, all its addresses are down. It's rare for
a client to not be able to reach one IP address on a relay, but be
able to reach another address on the same relay in the *same* IP
family.)

>> When parsing relay descriptors and filling in the extend info data
>> structure, we need to fill in both the IPv4 and IPv6 addresses if
>> both are available. If only one family is available for a relay
>> (IPv4 or IPv6), we should fill in the address for the preferred
>> family and leave the alternate family null.
> 
> To match the IETF protocol more closely, we should have a list of
> candidate addresses and order them according to our preferences.

With the current descriptor and consensus implementation, there
will only ever be 1 or 2 addresses in the list for each relay.

(There is one extend info data structure per relay connection
request. Modifying other parts of the tor implementation is out of
scope for this proposal.)

>> 3. Connecting To A Relay
>> 
>> If there is an existing authenticated connection, we should use it
>> similarly to how we used it pre-Prop306.
>> 
>> If there is no existing authenticated connection for an extend info,
>> we should attempt to connect using the first available, allowed, and
>> preferred address.
>> 
>> We should also allow falling back to the alternate address. For
>> this, three alternate designs will be given.
>> 
>> 3.1. Proposed Designs
>> 
>> This subsection will have three proposed designs for connecting to
>> relays via IPv4 and IPv6 in a Tor implementation of Happy Eyeballs.

Here are the design tradeoffs for this section, which we should add to
the proposal:
* launching multiple TCP connections places up to 2x the socket load
  on dual-stack relays and authorities, because both connections may
  succeed,
* launching multiple TLS connections places up to 2x the CPU load on
  dual-stack relays and authorities, because both connections may
  succeed,
* increasing the delays between connections mitigates these issues,
  but reduces perceived performance, particularly at bootstrap time
  (pre-emptive circuits hide these delays after bootstrap).

>> The proposed designs are listed as follows:
>> 
>> * Section 3.1.1: First Successful Authentication
>> 
>> * Section 3.1.2: TCP Connection to Preferred Address On First
>>   Authenticated Connection
>> 
>> * Section 3.1.3: TCP Connection to Preferred Address On First TCP
>>   Success
>> 
>> 3.1.1. First Successful Authentication
>> 
>> In this design, Tor will first connect to the preferred address and
>> attempt to authenticate. After a 1.5 second delay, Tor will connect
>> to the alternate address and try to authenticate. On the first
>> successful authenticated connection, we close the other connection.
>> 
>> This design places the least connection load on the network, but
>> might add extra TLS load.
> 
> The delay seems arbitrary. OnionPerf collects data on latency in the Tor
> network, and could be used to inform better timing choices for the best
> end user performance (the happiest eyeballs).

The 1.5 second delay is based on Onionperf data, and we should reference
the Onionperf figures in the proposal.

See my previous review of an earlier draft of this proposal:

>> On 26 Jun 2019, at 13:33, teor <teor at riseup.net> wrote:
> 
>>> 
>>> Depending on their location, most tor clients authenticate to the first
>>> hop within 0.5-1.5 seconds. So I suggest we use a 1.5 second delay:
>>> https://metrics.torproject.org/onionperf-buildtimes.html
>>> 
>>> In RFC 8305, the default delay is 250 milliseconds, and the maximum
>>> delay is 2 seconds. So 1.5 seconds is reasonable for TLS and tor link
>>> authentication.
>>> https://tools.ietf.org/html/rfc8305#section-8
>>> 
>>> (This delay will mainly affect initial bootstrap, because all of Tor's
>>> other connections are pre-emptive, or re-used.)
>>> 
>>> A small number of clients may do wasted authentication.
>>> That's ok. Tor already does multiple bootstrap and guard connections.


> If we choose to take this route, we should open new connections with a
> timeout of ~250ms, and only change the condition for deciding which is
> the connection we will use.

Tor already does multiple bootstrap and guard connections over IPv4, so
I'm not sure exactly what design you're proposing. Can you give me an
example?

>> 3.1.2. TCP Connection to Preferred Address On First Authenticated
>> Connection
>> 
>> This design attempts a TCP connection to a preferred address. On a
>> failure or a 250 ms delay, we try the alternative address.
>> 
>> On the first successful TCP connection Tor attempts to authenticate
>> immediately. On the authentication failure, or a 1.5 second delay,
>> Tor closes the other connection.

Neel, that's not what I wrote in my last email:

>> On 26 Jun 2019, at 13:33, teor <teor at riseup.net> wrote:
>>> 
>>> 1. Tor connects to the preferred address and tries to authenticate.
>>>    On failure, or after a 1.5 second delay, it connects to the alternate address
>>>    and tries to authenticate.
>>>    On the first successful authentication, it closes the other connection.

A small number of clients will take longer than 1.5 seconds to
authenticate. So we should only close a connection when the other
connection to the relay successfully authenticates.
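For concreteness, here's a hypothetical asyncio sketch of the "first
successful authentication" race with that rule: the alternate attempt
starts on failure of the preferred attempt or after the delay, and a
losing connection is closed only when the other one actually
authenticates, never on the timer alone. Tor's real code is
callback-driven C, so this only models the control flow:

```python
import asyncio

CONNECTION_ATTEMPT_DELAY = 1.5  # seconds, per the Onionperf discussion


async def first_successful_auth(attempt_preferred, attempt_alternate,
                                delay=CONNECTION_ATTEMPT_DELAY):
    """Race two connect-and-authenticate attempts (design 3.1.1).

    Each attempt coroutine returns a connection object on successful
    link authentication, or raises on failure.
    """
    preferred = asyncio.ensure_future(attempt_preferred())
    # Wait up to `delay` for the preferred attempt to settle.
    done, _ = await asyncio.wait({preferred}, timeout=delay)
    if done and preferred.exception() is None:
        return await preferred  # authenticated within the delay
    # Preferred failed or is still pending: launch the alternate.
    alternate = asyncio.ensure_future(attempt_alternate())
    tasks = {t for t in (preferred, alternate) if not t.done()}
    while tasks:
        done, tasks = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None:
                for loser in tasks:
                    loser.cancel()  # close only after a real success
                return await task
    raise ConnectionError("all allowed connections failed")
```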

>> This design is the most reliable for clients, but increases the
>> connection load on dual-stack guards and authorities.
> 
> Creating TCP connections is not a huge issue,

That's not true: Tor's last connection level denial of service event
was November 2017 - February 2018. And there are occasional connection
spikes on authorities and fallbacks.

These connection DoSes need to be mentioned in the proposal.

> and we should be racing
> the connections with the ~250ms timeout anyway. All the designs will
> have this issue.

I'm not sure exactly what issue you're referring to?

>> 3.1.3. TCP Connection to Preferred Address On First TCP Success
>> 
>> In this design, we will connect via TCP to the first preferred
>> address. On a failure or after a 250 ms delay, we attempt to connect
>> via TCP to the alternate address. On a success, Tor attempts to
>> authenticate and closes the other connection.
>> 
>> This design is the closest to RFC 8305 and is similar to how Happy
>> Eyeballs is implemented in a web browser.
> 
> This is probably also the "simplest" to implement, as it means that the
> happy eyeballs algorithm is contained to the socket handling code.
> 
> I don't believe that requiring authentication to complete is going to do
> anything more than generate load on relays. Either the packet loss is
> high enough that the three way handshake fails, or there is low packet
> loss. I don't think the case where requiring an additional few packets
> make it through helps you choose a better connection is going to be that
> common.

Middleboxes that only break IPv4 TLS are rare, but they do exist:

>> On 26 Jun 2019, at 13:33, teor <teor at riseup.net> wrote:
>>> 
>>> We have talked about this design in the team over the last few months.
>>> Our key insights are that:
>>> * most failed TCP connections fail immediately in the kernel, some
>>>   fail quickly with a response from the router, and others are blackholed
>>>   and time out
>>> * it's unlikely that a client will fail to authenticate to a relay over one
>>>   IP version, but succeed over the other IP version, because the directory
>>>   authorities authenticate to each relay when they check reachability
>>> * some censorship systems only break authentication over IPv4,
>>>   but they are rare

But we still want tor to work by default on those networks, so we should
try IPv4 and IPv6 all the way up to TLS.

> Of course it is always possible to add a "PreferredAddressFamily" option
> to torrc for those that know they are on a bad IPv6 network.

Tor already has this torrc option:
* ClientPreferIPv6ORPort - prefer IPv6, off by default

>> 3.2. Recommendations for Implementation of Section 3.1 Proposals
>> 
>> We should start with implementing and testing the implementation as
>> described in Section 3.1.1 (First Successful Authentication), and
>> then doing the same for the implementations described in 3.1.2 and
>> 3.1.3 if desired or required.
> 
> I'd want to see some justification with some experimental (or even
> anecdotal) data as to why first successful authentication is the way to
> go. 3.1.3 is going to be the simpler option and, in my opinion, the best
> place to start.

It increases the risk of network-wide DoS, and fails to work around some
censored networks. But it might be good for a simple initial test
implementation.

> 3.1.3 can likely be implemented using exactly the algorithm in section 5
> of RFC 8305, excluding portions relating to DNS because we already have
> all the candidates from the server descriptor.

All supported Tor client versions use microdescriptors, not server
descriptors. Since consensus method 28 in tor 0.3.3.6, microdesc
consensuses contain IPv6 addresses. (This is important during bootstrap.)

See proposal 283 for context:
https://gitweb.torproject.org/torspec.git/tree/proposals/283-ipv6-in-micro-consensus.txt

We also intend to use this proposal to connect to the hard-coded fallbacks
and authorities, some of which have IPv6 addresses.

Ideally, we shouldn't need to change any of the code from proposal 283.

But we might need to change the relay selection logic, because otherwise
tor could choose a run of IPv4-only relays, and fail to bootstrap on an
IPv6-only network.
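One possible shape for that selection change, as a hedged Python sketch
(the field names are made up; as noted above, the consensus guarantees
at most one address per family per relay):

```python
def usable_relays(relays, can_ipv4=True, can_ipv6=True):
    """Filter relay candidates by usable address family.

    Each relay is modelled as a dict with an "ipv4" address (always
    present in the consensus) and an optional "ipv6" address.  On an
    IPv6-only network (can_ipv4=False), relays without an IPv6 ORPort
    must be excluded, otherwise a run of IPv4-only picks stalls
    bootstrap.  Sketch only, not tor's actual selection code.
    """
    out = []
    for relay in relays:
        if can_ipv4 and relay.get("ipv4"):
            out.append(relay)
        elif can_ipv6 and relay.get("ipv6"):
            out.append(relay)
    return out
```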

So we need to add another section to the proposal, I guess.

>> 4. Handling Connection Successes And Failures
>> 
>> Should a connection to a guard succeed and be authenticated via TLS,
>> we can then use the connection. In this case, we should cancel all
>> other connection timers and in-progress connections. Cancelling the
>> timers is so we don't attempt new unnecessary connections when our
>> existing connection is successful, preventing denial-of-service
>> risks.
>> 
>> However, if we fail all available and allowed connections, we should
>> tell the rest of Tor that the connection has failed. This is so we
>> can attempt another guard relay.
> 
> Some issues that come to mind:
> 
> - I wonder how many relay IPv6 addresses are actually using tunnels. At
> the levels of throughput they use, that overhead adds up. What is the
> additional bandwidth cost and what is the impact of reduced MSS?

Here's one way we can mitigate this overhead:
* tor clients prefer IPv4 by default,
* tor uses a 1.5 second delay between IPv4 and IPv6 connections

That way, most clients that can use IPv4, will end up using IPv4, and
avoid this overhead.

The clients that don't will fall into two categories:
* IPv6-only, so the overhead is a small price to pay for connectivity, or
* high-latency, so the overhead might not be noticeable anyway.

> - What are the tunables? RFC8305 has some that would be applicable, and
> probably all of them could be consensus parameters if we wanted to tune
> them:
> * First Address Family Count

This value must be fixed at 1.

Tor's code only connects to 1 relay at a time, and that relay only has
1 address from each family. Increasing the number of addresses per relay
or per "happy eyeballs" attempt is out of scope for this proposal.

> * Connection Attempt Delay

From Onionperf data, I think this should default to 1.5 seconds.

But I'm happy to modify it based on testing, or future Onionperf
measurements. Let's make it a torrc option and consensus parameter?

> * Minimum Connection Attempt Delay

Dynamically adjusting the delay per client is out of scope for this
proposal. It also carries privacy risks, unless we add some jitter.

Let's fix the minimum at 10 milliseconds as recommended in RFC
8305, and adjust it network-wide using the "Connection Attempt Delay"
consensus parameter.

> * Maximum Connection Attempt Delay

As above, but if we choose to include TLS in the delay, we should
set the maximum much higher than the RFC 8305 recommendation of
2 seconds. Let's make it 30 seconds, to match tor's existing timeout.

(Users might want to set the delay this high on very slow networks.)
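Putting those bounds together, a sketch of the clamping logic (the
constant and parameter names here are hypothetical, not agreed names):

```python
# Bounds discussed above: RFC 8305's recommended 10 ms minimum, and
# tor's existing 30 second connection timeout as the maximum.
MIN_CONNECTION_ATTEMPT_DELAY = 0.010   # seconds
MAX_CONNECTION_ATTEMPT_DELAY = 30.0    # seconds
DEFAULT_CONNECTION_ATTEMPT_DELAY = 1.5 # seconds, from Onionperf data


def connection_attempt_delay(consensus_value=None, torrc_value=None):
    """Pick the delay between the IPv4 and IPv6 connection attempts.

    The torrc option (if set) wins over the consensus parameter, and
    the result is clamped to the fixed bounds.  Sketch only.
    """
    value = DEFAULT_CONNECTION_ATTEMPT_DELAY
    if consensus_value is not None:
        value = consensus_value
    if torrc_value is not None:
        value = torrc_value
    return min(max(value, MIN_CONNECTION_ATTEMPT_DELAY),
               MAX_CONNECTION_ATTEMPT_DELAY)
```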

> - How do we know what is going on? We do not collect metrics from
> clients about their usage, but we do collect metrics from relays. Are
> there any counters we should be adding to extra info descriptors to help
> us see whether or not this is working?

We should definitely be collecting the number of IPv4 and IPv6 connections
to ORPorts. We should probably also distinguish authenticated
(relay, authority reachability) and unauthenticated (client, bridge)
connections.

We should also be including these stats in the heartbeat logs.
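A minimal sketch of what such counters might look like (the key names
and heartbeat format below are made up, not a proposed stats format):

```python
from collections import Counter


class ORConnStats:
    """Per-family, per-auth ORPort connection counters.

    Distinguishes authenticated (relay, authority reachability) from
    unauthenticated (client, bridge) connections, as suggested above.
    """

    def __init__(self):
        self.counts = Counter()

    def note_connection(self, family, authenticated):
        assert family in ("ipv4", "ipv6")
        kind = "relay" if authenticated else "client"
        self.counts[(family, kind)] += 1

    def heartbeat_line(self):
        c = self.counts
        return ("Connections since last heartbeat: "
                f"IPv4 {c[('ipv4', 'relay')] + c[('ipv4', 'client')]} "
                f"(relay {c[('ipv4', 'relay')]}), "
                f"IPv6 {c[('ipv6', 'relay')] + c[('ipv6', 'client')]} "
                f"(relay {c[('ipv6', 'relay')]})")
```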

We were going to wait for PrivCount for these stats, but we didn't manage
to implement it in the sponsored time we had available. So I don't think
it makes sense to block further stats on PrivCount at this time.

> Could clients help relays by
> reporting that a connection is being closed because they have another
> connection? (I don't know the answer, but RFC8305 does explicitly point
> out that it is a mitigation technique designed to hide problems from the
> user, which means that those problems might come back to haunt us later
> if we're not on top of them.)


Clients don't report circuit or stream close reasons to relays, to
preserve privacy and avoid information leaks.

Clients can't always report connection close reasons over the Tor
protocol, because connections can be closed below the TLS layer, at
the TCP stage. (Or at any subsequent stage, including TLS, link, or
link authentication.)

T
