I’m sure this exists somewhere, so this is more of a request-for-links, but what’s the current thinking on TCP CCA (congestion control algorithm) selection for Tor relays? While it has fairness issues (and reported long-tail issues for higher-latency links, though I can’t find good in-practice analysis of this), BBA should handle random packet loss much better than, e.g., Cubic. This is likely less of an issue for Western users, but many other parts of the world (especially China) see much higher packet loss due to regularly overloaded links. I presume it is not good practice to change the default CCA for relays/bridges, but it seems BBA/BBAv2 would be a worthwhile experiment to see whether it improves the browsing experience for non-Western Tor users.
Matt
Hi Matt
On 2020-01-09 at 6:58 AM, Matt Corallo wrote:
what’s the current thinking on TCP CCA selection for Tor relays? [...]
You can find a nice comparison of loss-less (delay-based) and loss-based congestion control here [1].
It's difficult to say whether one or the other is better for use with Tor. A single TCP connection between two Tor relays bundles multiple circuits (data flows), and those flows can have very different congestion-control needs towards their endpoints.
[1] https://heim.ifi.uio.no/davihay/hayes10__google_delay_based_tcp_conges_contr.pdf

--
Cheers, Felix
On Thu, 9 Jan 2020 00:58:36 -0500 Matt Corallo tor-lists@mattcorallo.com wrote:
BBA should handle random packet loss much better than, eg, Cubic.
Do you mean BBR? https://github.com/google/bbr
In my experience it does work very well on Tor relays, and also on servers in general (keeping in mind that these TCP congestion control algorithms only affect upload, so they matter most on hosts which do a lot of uploading, or, as in the case of Tor, both upload and download).
The next best in my tests was Illinois: https://en.wikipedia.org/wiki/TCP-Illinois I had been using it for a long time before BBR got included in the Linux kernel. Today, in some cases BBR is better, in others Illinois can be. The latter ramps up a bit more slowly on new connections, but appears to be able to achieve higher speeds after that.
These two are head and shoulders above all the other options available in the Linux kernel, including the default one (Cubic). And yes, perhaps this is indeed an area of Tor relay performance tuning that doesn't get the attention it deserves.
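For reference, the kernel exposes both the list of currently available algorithms and the system-wide default under /proc, so checking what a relay is running is straightforward. A minimal read-only sketch (Python, Linux only; changing the default is a one-line sysctl and needs root):

    from pathlib import Path

    PROC = Path("/proc/sys/net/ipv4")

    # Algorithms whose kernel modules are currently available, e.g. "reno cubic bbr".
    available = (PROC / "tcp_available_congestion_control").read_text().split()
    # System-wide default applied to new TCP sockets, e.g. "cubic".
    default = (PROC / "tcp_congestion_control").read_text().strip()

    print("available:", available)
    print("default:  ", default)

    # Switching the default (as root) is then e.g.:
    #   sysctl -w net.ipv4.tcp_congestion_control=bbr
    # (run modprobe tcp_bbr first if bbr is not listed as available).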
Cool! What did your testing rig look like?
I suppose the real question is what does the latency/loss profile of the average Tor (bridge) user look like?
On Fri, 10 Jan 2020 16:24:56 +0000 Matt Corallo tor-lists@mattcorallo.com wrote:
Cool! What did your testing rig look like?
A few years ago I got a dedicated server from one of those cheap French hosts, which appeared to have a congested uplink (low-ish upload speeds). Since support was not able to solve this, but the server was too cheap to be worth cancelling just over that, I looked for ways to utilize it better despite the congestion.
If I remember correctly, I also had a Japanese VPS at the time, so my tests were intentionally for a "difficult" case, uploading from France to Japan (with 250+ms ping).
Here are my completely unscientific scribbles of how all the various algorithms behaved. The scenario is uploading for a minute or so, observing the speed in MB/sec visually, then recording how it appeared to change during that minute (and then repeating this a couple of times to be certain).
tcp_bic.ko       -- 6...5...4
tcp_highspeed.ko -- 2
tcp_htcp.ko      -- 1.5...3...2
tcp_hybla.ko     -- 3...2...1
tcp_illinois.ko  -- 6...7...10
tcp_lp.ko        -- 2...1
tcp_scalable.ko  -- 5...4...3
tcp_vegas.ko     -- 2.5
tcp_veno.ko      -- 2.5
tcp_westwood.ko  -- <1
tcp_yeah.ko      -- 2...5...6
This was on the 3.14 kernel, which did not yet have BBR to compare against. In later comparisons, as mentioned before, it is on par with or better than Illinois.
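Not Roman's actual harness, but a rough sketch of how a similar per-algorithm upload test could be scripted; it assumes a throwaway sink already listening on the remote host (the hostname and port here are hypothetical) and root locally to switch the default between runs:

    import socket
    import time
    from pathlib import Path

    SINK = ("sink.example.net", 5001)   # hypothetical remote host that discards input
    CCA_FILE = Path("/proc/sys/net/ipv4/tcp_congestion_control")
    CHUNK = b"\0" * 65536

    def upload_rate(seconds=60):
        # Push data at the sink for `seconds` and return the average rate in MB/sec.
        s = socket.create_connection(SINK)
        sent, start = 0, time.monotonic()
        while time.monotonic() - start < seconds:
            sent += s.send(CHUNK)
        elapsed = time.monotonic() - start
        s.close()
        return sent / elapsed / 1e6

    for cca in ("cubic", "illinois", "bbr"):
        CCA_FILE.write_text(cca)   # switch the system-wide default; needs root
        print(cca, round(upload_rate(), 1), "MB/sec")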
I suppose the real question is what does the latency/loss profile of the average Tor (bridge) user look like?
I think the real question is: is there any reason to *not* use BBR or Illinois? So far I do not see a single one.
Hmm, this type of test doesn’t really seem to have much connection to the average Tor user. Middle relay <-> middle relay connections may be mostly servers, but residential/mobile connections in Russia/Iran/China likely don’t perform quite the same. Worse still, BBR can have measurable effects on packet retransmissions, and while it may require an unrealistic amount of state to track such things for many flows, it’s not a given that it won’t make Tor traffic stand out (luckily, of course, large content providers like Dropbox, Spotify, YouTube, etc. have been migrating to it, so maybe this won’t be the case in the future).
Sadly, the large scale deployments of BBR are mostly not high-latency links (as CDNs generally have a nearby datacenter for you to communicate with), and the high retransmission rates may result in more “lag” for browsing when absolute bandwidth isn’t the primary concern. On the flip side, Spotify’s measurements seem to indicate that, at least in some cases, the jitter can decrease enough to be noticeable for users.
Is there a way we could do measurements of packet loss/latency profiles of bridge users? This should enable simulation for things like this, but it sounds like there’s no good existing work in this domain?
Matt
Hi,
On 11/01/2020 05:07, Matt Corallo wrote:
Sadly, the large scale deployments of BBR are mostly not high-latency links (as CDNs generally have a nearby datacenter for you to communicate with), and the high retransmission rates may result in more “lag” for browsing when absolute bandwidth isn’t the primary concern. On the flip side, Spotify’s measurements seem to indicate that, at least in some cases, the jitter can decrease enough to be noticeable for users.
BBR is good for Netflix, but it is not so good for non-streaming traffic. You also get issues between competing flows, which don't matter for Netflix (typically you only watch one video at a time) but would matter for Tor.
We don't have good models of what Tor traffic looks like, but I strongly suspect it is different from typical Netflix/YouTube workloads.
Is there a way we could do measurements of packet loss/latency profiles of bridge users? This should enable simulation for things like this, but it sounds like there’s no good existing work in this domain?
We have two tools that build simulated/emulated Tor networks: chutney and shadow. Unfortunately, neither implements everything that would be required. We really want to see what happens when x% of the network switches congestion control algorithm and see how flows interact at large relays (either relay to relay, or guard connections).
If you have a large OpenStack cluster available, you could use your favorite orchestration tool to set up a number of VMs with emulated WAN links between them, connect a bunch of Tor clients in other VMs to that network, and perform measurements.
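The emulated WAN links mentioned above are typically built with netem; a rough sketch (not from Iain's mail) wrapping tc from Python, where the interface name is made up and root is required:

    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    IFACE = "eth1"   # hypothetical interface towards the emulated "client side" VMs

    # Approximate a lossy, high-latency bridge-user path:
    # ~250 ms RTT (125 ms each way), 20 ms jitter, 1% random packet loss.
    run("tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
        "delay", "125ms", "20ms", "loss", "1%")

    # To remove the emulation again:
    #   tc qdisc del dev eth1 root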
Last time I looked you could not switch TCP congestion control algorithm in Linux per-namespace (maybe you can now and you don't need to have multiple VMs).
Generally I would recommend *not* changing from TCP cubic unless you really understand the interactions that are going on between flows.
Thanks, Iain.
Quoting Iain Learmonth (2020-01-20 16:00:01)
Last time I looked you could not switch TCP congestion control algorithm in Linux per-namespace (maybe you can now and you don't need to have multiple VMs).
It's been allowed for about two years now [0], but you don't need it anyway. Trying out new congestion control algorithms is not exactly a new fad, so it has been possible to set the congestion control algorithm via setsockopt since, apparently, Linux 2.6.13 [1], released a good 15 years ago. You'd probably need to patch tor to do that effectively, but if you're going to all this trouble anyway, patching one program shouldn't really be a barrier.
[0] https://github.com/torvalds/linux/commit/6670e152447732ba90626f36dfc015a13fb...
[1] http://man7.org/linux/man-pages/man7/tcp.7.html
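For completeness, the per-socket setsockopt mentioned above looks roughly like this (illustrated in Python rather than tor's C; socket.TCP_CONGESTION needs Python 3.6+ on Linux, and the named algorithm's module must be available):

    import socket

    def connect_with_cca(host, port, cca="bbr"):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Per-socket override; other sockets and the system-wide default are untouched.
        # Unprivileged processes may only pick algorithms listed in
        # net.ipv4.tcp_allowed_congestion_control; otherwise this raises OSError.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, cca.encode())
        s.connect((host, port))
        return s

    # The C equivalent a tor patch would use is roughly:
    #   setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, "bbr", strlen("bbr"));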