On 2023-09-24 23:38, David Fifield wrote:
On Thu, Sep 21, 2023 at 09:26:58PM -0600, David Fifield wrote:
I made a graph of the bandwidth on the two bridges since this started happening.
The two vertical lines mark:
- 2023-09-20 14:00:00: earliest known case of domain resolving to Cloudflare
- 2023-09-21 18:00:00: change to foursquare.com in rdsys https://gitlab.torproject.org/tpo/anti-censorship/rdsys-admin/-/merge_reques...
- snowflake-02 bandwidth has dwindled to almost nothing. Seriously almost nothing: it's around 3 MB/s currently.
- There's a huge, almost instantaneous step in snowflake-01 at around 2023-09-21 13:00:00. At first, I thought this might have been a consequence of the rdsys change, but it's about 5 hours earlier than that. What could it be? Some unrelated unblocking event that just happened to coincide with this domain problem?
The non-use of snowflake-02 continues -- see the attached graph. I'm racking my brain trying to understand why that is. snowflake-01 usage has decreased a lot too -- the graph appears to be at about the same level, but you can see it's not brickwalled at the upper end of the range as it was before. Even ignoring the step anomaly at 2023-09-21 13:00:00, it didn't go to zero like snowflake-02 did.
It may be that whatever decides whether you get a Fastly or a Cloudflare edge server correlates highly with whether your client is capable of using snowflake-02. My working assumption, so far, has been that Tor Browser has had multi-bridge support since 12.0 (2022-12-07), while Orbot only has multi-bridge support in the unreleased Orbot 17 (https://github.com/guardianproject/orbot/releases/tag/17.0.0-BETA-2-tor.0.4.... is the first beta release to have it). If Tor Browser users are mostly on desktop, Orbot users are mostly on mobile/cellular networks, and DNS resolution for cdn.sstatic.net also correlates with desktop vs. mobile, then that could explain it. It would mean that ~100% of Tor Browser users are getting a Cloudflare IP address, and <100% of Orbot users are.
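The correlation hypothesis can be written down as a toy model. This is a minimal sketch under hypothetical assumptions: multi-bridge clients split load evenly between the two bridges, single-bridge clients use only snowflake-01, and a client keeps working only if it resolves cdn.sstatic.net to Fastly. The class name, population sizes, and fractions are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Population:
    """A group of clients. All numbers are hypothetical placeholders."""
    users: float            # number of clients in the group
    multi_bridge: bool      # True if the client knows both snowflake-01 and snowflake-02
    fastly_fraction: float  # fraction that still gets a (working) Fastly edge server


def bridge_load(populations):
    """Return (snowflake-01 load, snowflake-02 load) in client units."""
    load01 = load02 = 0.0
    for p in populations:
        working = p.users * p.fastly_fraction
        if p.multi_bridge:
            # Assumption: multi-bridge clients pick each bridge with equal probability.
            load01 += working / 2
            load02 += working / 2
        else:
            # Single-bridge clients only know snowflake-01.
            load01 += working
    return load01, load02


before = [
    Population(users=1000, multi_bridge=True, fastly_fraction=1.0),   # Tor Browser
    Population(users=1000, multi_bridge=False, fastly_fraction=1.0),  # Orbot 16
]
after = [
    Population(users=1000, multi_bridge=True, fastly_fraction=0.0),    # Tor Browser: all Cloudflare
    Population(users=1000, multi_bridge=False, fastly_fraction=0.25),  # Orbot 16: some Fastly
]
print(bridge_load(before))  # (1500.0, 500.0)
print(bridge_load(after))   # (250.0, 0.0)
```

Under these assumptions, snowflake-02's load comes entirely from multi-bridge clients that still work, so it can fall to zero while snowflake-01 keeps some traffic. Adding a population of multi-bridge Orbot 17 users (for example, from F-Droid) would keep snowflake-02 above zero.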
But it's not the case that Orbot 17 is totally unreleased. The Play Store currently has 16.6.3-RC-1-tor.0.4.7.10, released 2022-11-01: https://web.archive.org/web/20230925022736/https://play.google.com/store/app... But Orbot 17 betas are available (the most recent is from 2023-08-09: https://github.com/guardianproject/orbot/releases/tag/17.1.0-BETA-3-tor.0.4....), and version 17 is in F-Droid: http://meetbot.debian.net/tor-meeting/2023/tor-meeting.2023-09-21-15.57.log....

<dcf1> Orbot 17 has both bridges, but it's not released yet, except in beta, afaik. I always thought that was the cause of the low use of snowflake-02, that we were just waiting for Orbot to make a full release of v17. But maybe it is more complicated.
<meskio> mmm, I have here orbot 17, so I guess I'm using the beta...
<meskio> is in fdroid

So even if the correlation hypothesis were correct, I wouldn't expect snowflake-02 to drop as far as it has.
I looked into this a bit because I also have Orbot 17 and I was curious about how it works.
As discussed in the team meeting, Orbot 17 users have access to what Orbot calls the "Ask Tor" feature that pulls bridge lines from our circumvention settings API: https://bridges.torproject.org/moat/circumvention/map
However, when it comes to Snowflake, Orbot won't use the provided bridge lines. If this API call returns a bridge of type Snowflake, it will instead use the builtin bridges. So it seems that our update to the circumvention settings won't benefit Orbot 17 users either. I opened an issue about this: https://github.com/guardianproject/orbot/issues/983
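The Orbot behavior described above can be sketched as a bridge-selection function. This is a hypothetical Python rendering for illustration, not Orbot's actual code. The response shape (a `settings` list whose entries carry a `bridges` object with `type` and `bridge_strings`) is an assumption about the circumvention settings API, and the bridge lines are placeholders using documentation IP addresses.

```python
# Hypothetical placeholder bridge lines; real ones carry fingerprints and parameters.
BUILTIN_SNOWFLAKE = [
    "snowflake 192.0.2.3:80 BUILTIN-01",
    "snowflake 192.0.2.4:80 BUILTIN-02",
]


def bridges_from_response(response, honor_snowflake_lines):
    """Pick bridge lines from an (assumed) circumvention settings response.

    honor_snowflake_lines=False mimics the reported Orbot 17 behavior:
    any Snowflake suggestion is replaced by the builtin bridge lines.
    """
    entries = response.get("settings") or []
    if not entries:
        return []
    # Assumption: use only the first suggested setting.
    bridges = entries[0].get("bridges", {})
    if bridges.get("type") != "snowflake":
        return bridges.get("bridge_strings", [])
    if honor_snowflake_lines and bridges.get("bridge_strings"):
        return bridges["bridge_strings"]
    # Reported Orbot 17 behavior: the provided Snowflake lines are ignored.
    return BUILTIN_SNOWFLAKE
```

With `honor_snowflake_lines=False`, an updated Snowflake line from the API never reaches tor, which is why a server-side change to the circumvention settings would not help these users.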
This doesn't at all explain the lack of use of snowflake-02 before this event. snowflake-02 is even listed first in Orbot's torrc configuration for v17.
I'm curious whether the Guardian Project has any guesses about which distribution channels are most popular. I assumed that not many users would be downloading it from their F-Droid repository because I expected it to be blocked, but I just checked on OONI and it doesn't appear to be blocked in most places.
Maybe the bridge selection at the client is not as random as we intend? Even though there are two bridge lines, maybe tor systematically prefers the one that's listed first? The idea here is that maybe snowflake-02 only gets used when snowflake-01 is past its capacity and starts to fail connection attempts. With the suddenly reduced overall level of users, there's enough headroom that snowflake-02 essentially never gets used.
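The "first-listed bridge wins unless it is full" idea can be checked with a small simulation. This is a hypothetical model, not how tor actually selects bridges: snowflake-01 accepts at most `capacity01` clients and fails connection attempts beyond that, while snowflake-02 has no limit.

```python
import random


def simulate(num_clients, capacity01, first_preferred, seed=0):
    """Count how many clients end up on each bridge under a hypothetical model."""
    rng = random.Random(seed)
    counts = {"snowflake-01": 0, "snowflake-02": 0}
    for _ in range(num_clients):
        order = ["snowflake-01", "snowflake-02"]
        if not first_preferred:
            rng.shuffle(order)  # the intended behavior: a uniformly random choice
        for bridge in order:
            if bridge == "snowflake-01" and counts[bridge] >= capacity01:
                continue  # snowflake-01 is saturated; fall back to the next bridge
            counts[bridge] += 1
            break
    return counts


print(simulate(1000, 600, first_preferred=True))  # {'snowflake-01': 600, 'snowflake-02': 400}
print(simulate(200, 600, first_preferred=True))   # {'snowflake-01': 200, 'snowflake-02': 0}
```

With a first-listed preference, snowflake-02 only sees the overflow, so when the overall number of users drops below snowflake-01's capacity, snowflake-02 gets nothing. With a random choice, both bridges carry traffic at any load.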
A possible explanation for the sudden step in snowflake-01 usage at 2023-09-21 13:00:00 is that there's a population of Snowflake clients out there other than the ones we are responsible for. Whoever is distributing the clients for that population may have noticed the cdn.sstatic.net change and deployed their own mitigation, separate from anything we have done. The step only took about 15 minutes (see the second attached graph), which is a pretty fast deployment. If that other imagined deployment only knows about snowflake-01, that could explain why the step appears in snowflake-01's graph and not snowflake-02's. It still doesn't explain why, before the step, snowflake-01 took a big hit to users but did not go to zero, while snowflake-02 kept declining.
I looked at our historical Prometheus broker metrics (snapshot attached) and I don't see any artifacts that suggest something of note happened at 2023-09-21 13:00:00. If a fix had been deployed, I would have expected client polls to change at least a little (either rising because clients found that Snowflake was working again, or dropping because clients were no longer repeatedly contacting the broker).
Maybe we have an undetected bug in multi-proxy support that favors snowflake-01? Since 2022-10-03, the broker is supposed to reject proxies that do not have multi-bridge support: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla... But maybe it's not working the way it's supposed to? Maybe it's easier to get a proxy for snowflake-01 than for snowflake-02?
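One way to test the "easier to get a proxy" idea is to count, for each bridge, how many proxies could serve it. The sketch below uses hypothetical relay hostnames and a simplified reading of allowed-relay-hostname-pattern (a trailing `$` meaning "hostname ends with", otherwise a substring match). The broker's real matching code may differ.

```python
def matches(pattern, hostname):
    """Hypothetical pattern check: a trailing '$' anchors the match to the end
    of the hostname; otherwise any substring match is accepted."""
    if pattern.endswith("$"):
        return hostname.endswith(pattern[:-1])
    return pattern in hostname


def eligible_proxies(proxy_patterns, bridge_hostnames):
    """For each bridge, count proxies whose pattern allows its relay hostname.

    Proxies reporting no pattern (no multi-bridge support) are rejected outright,
    as the broker is supposed to do.
    """
    return {
        bridge: sum(1 for p in proxy_patterns if p is not None and matches(p, host))
        for bridge, host in bridge_hostnames.items()
    }


# Hypothetical relay hostnames, not the real ones.
bridges = {"snowflake-01": "01.snowflake.example", "snowflake-02": "02.snowflake.example"}
print(eligible_proxies(["snowflake.example$", "01.snowflake.example$", None], bridges))
# {'snowflake-01': 2, 'snowflake-02': 1}
```

If real proxy patterns produced a count like this, skewed toward snowflake-01, clients asking for snowflake-02 would wait longer for a match or fail more often.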
At times in the past, we've approached questions like this by introducing new metrics. We could add some broker metrics that count the different allowed-relay-hostname-pattern values provided by proxies and the bridge fingerprints that clients are requesting (or whether clients mostly provide no bridge and rely on the default, snowflake-01).
This wouldn't necessarily need to be a permanent change, just a temporary deployment of more Prometheus metrics at the broker until we figure out what's going on.
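The proposed temporary metrics amount to two labeled counters. Below is a stdlib-only Python sketch of what they would tally. The real broker is written in Go and would presumably use labeled Prometheus counters instead. The function names, labels, and fingerprint strings here are hypothetical.

```python
from collections import Counter

# Hypothetical label for clients that send no fingerprint and get the default bridge.
DEFAULT_BRIDGE_LABEL = "default (snowflake-01)"

proxy_pattern_polls = Counter()  # proxy polls, keyed by allowed-relay-hostname-pattern
client_bridge_polls = Counter()  # client polls, keyed by requested bridge fingerprint


def record_proxy_poll(allowed_relay_hostname_pattern):
    """Count one proxy poll under the pattern it reported ("none" if absent)."""
    proxy_pattern_polls[allowed_relay_hostname_pattern or "none"] += 1


def record_client_poll(fingerprint):
    """Count one client poll under the bridge it asked for, or the default."""
    client_bridge_polls[fingerprint or DEFAULT_BRIDGE_LABEL] += 1


record_proxy_poll("snowflake.example$")
record_proxy_poll(None)
record_client_poll("FP-02")  # hypothetical fingerprint
record_client_poll(None)
print(proxy_pattern_polls.most_common())
print(client_bridge_polls.most_common())
```

Comparing the share of `DEFAULT_BRIDGE_LABEL` polls against explicit snowflake-02 requests would show whether clients are actually asking for snowflake-02 at all.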
Maybe there's something wrong with the snowflake-02 bridge? I've been using snowflake-02 all day today (using AMP cache rendezvous). In the morning, it did seem to be a little screwy -- I couldn't get a YouTube video to play without frequent stops. One time, I happened to notice these messages in the log; they may be unrelated, but perhaps there is some weird interaction with Conflux:

    2023-09-24 19:03:16.550 [NOTICE] Failed to find node for hop #1 of our path. Discarding this circuit.
    2023-09-24 19:03:16.552 [NOTICE] Our circuit 0 (id: 38) died due to an invalid selected path, purpose Unlinked conflux circuit. This may be a torrc configuration issue, or a bug.
    2023-09-24 19:22:50.237 [NOTICE] Failed to find node for hop #1 of our path. Discarding this circuit.

I checked the bridge to ensure that the expected version of the server software was deployed (commit 0a6aeda9), and it was.
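To check whether these notices cluster around the periods of poor performance, they can be pulled out of a tor log with a small filter. This is a minimal sketch, assuming the timestamp format of the log lines quoted above. The classification labels are hypothetical.

```python
import re

# Matches tor log lines like:
# 2023-09-24 19:03:16.550 [NOTICE] Failed to find node for hop #1 of our path. ...
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[NOTICE\] (.*)$")


def path_failures(log_lines):
    """Return (timestamp, kind) for each path-selection failure notice.

    kind is "hop-failure" for "Failed to find node" notices, "conflux-path" for
    invalid-path deaths of conflux circuits, and "other-path" otherwise.
    """
    events = []
    for line in log_lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        ts, msg = m.groups()
        if msg.startswith("Failed to find node for hop"):
            events.append((ts, "hop-failure"))
        elif "invalid selected path" in msg:
            events.append((ts, "conflux-path" if "conflux" in msg.lower() else "other-path"))
    return events
```

Bucketing the returned timestamps by hour and lining them up against the bandwidth graph would show whether the Conflux notices coincide with the slow periods.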
While I was using Tor Browser, I let it upgrade to 13.0a5. 13.0a5 has a fix to the default bridge lines, but I use a manual bridge line so that I would only be on snowflake-02. After the upgrade, it started working better and I could watch YouTube as normal. Maybe it was just a coincidence that the upgrade to 13.0a5 seemed to improve the performance? In any case, from watching bandwidth on the bridge, it looks like I've had the bridge mostly to myself all day.
Just for good measure, I upgraded tor on snowflake-02 from 0.4.7.13-1~focal+1 to 0.4.8.6-1~focal+1 at 2023-09-24 20:48:25.