On 2025-02-12 12:26, Cecylia Bocovich wrote:
On 2025-02-11 09:26, Michael Rogers via anti-censorship-team wrote:
On 11/02/2025 12:47, Michael Rogers via anti-censorship-team wrote:
On 07/02/2025 21:22, Cecylia Bocovich via anti-censorship-team wrote:
On 2025-02-07 12:22, Michael Rogers via anti-censorship-team wrote:
Hi all,
After updating Briar's bridge config to use the current settings from Moat, we're seeing two Snowflake bridges consistently failing in our CI tests. They're the two bridges that use SQS. Here's a snippet from the log:
INFO: NOTICE Managed proxy "/builds/briar/onionwrapper/onionwrapper-java/test.tmp/35/lyrebird": offer created
Feb 04, 2025 1:12:34 PM org.briarproject.onionwrapper.AbstractTorWrapper message
INFO: NOTICE Managed proxy "/builds/briar/onionwrapper/onionwrapper-java/test.tmp/35/lyrebird": broker failure operation error SQS: GetQueueUrl, https response error StatusCode: 400, RequestID: 60e91cfa-a2a0-55db-beb0-7ce6b621d324, AWS.SimpleQueueService.NonExistentQueue: The specified queue does not exist.
Does the queue really not exist, or does this point to some other issue, like the bridges being geoIP restricted or the app needing to pass some extra information to the transport?
[snip]
Well, I should've waited before sending that message, because starting at 12:59 UTC the attempts to bootstrap via SQS bridges succeeded, with only one queue-related error being printed per bootstrapping attempt.
Did anything change in the bridge config around that time, or could the queue errors be load-dependent?
Cheers, Michael
Nothing changed as far as I'm aware. It seems likely to me that there is some external factor (like load) that is causing a lot of variation in how long it takes to create these queues.
I just remembered this open issue: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40363
I'm still not sure of the cause but it seems to be more likely to happen if two bridges are configured at the same time.
It turns out there were two main bugs with the SQS queue implementation that explain the queue creation errors you were seeing. These have both been fixed.
The first bug was a bottleneck that was preventing us from receiving messages from the broker queue quickly enough[0]. This explained why the failure rate was sometimes higher at times when there was more load on the system.
The second bug was a pointer reuse error that caused multiple simultaneous polls to be overwritten[1]. This was also likely to occur at times of increased load but could also be triggered by having more than one SQS bridge line.
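For readers unfamiliar with this class of bug, here is a minimal sketch in Go of how reusing one pointer can cause earlier entries to be overwritten. It is not the actual Snowflake code: the type pollRequest and the functions collectShared and collectFresh are hypothetical names made up for illustration, and the real fix may look quite different.

```go
package main

import "fmt"

// pollRequest is a hypothetical stand-in for a per-client poll.
type pollRequest struct {
	clientID string
}

// collectShared shows the bug pattern: one variable is declared once,
// mutated on every iteration, and its address is stored each time.
// Every slice element ends up pointing at the same memory.
func collectShared(ids []string) []*pollRequest {
	var req pollRequest // reused for every iteration
	out := make([]*pollRequest, 0, len(ids))
	for _, id := range ids {
		req.clientID = id
		out = append(out, &req) // same address every time
	}
	return out
}

// collectFresh allocates a new value per iteration, so each stored
// pointer refers to its own pollRequest.
func collectFresh(ids []string) []*pollRequest {
	out := make([]*pollRequest, 0, len(ids))
	for _, id := range ids {
		out = append(out, &pollRequest{clientID: id})
	}
	return out
}

func main() {
	ids := []string{"a", "b", "c"}

	fmt.Print("shared:")
	for _, r := range collectShared(ids) {
		fmt.Print(" ", r.clientID)
	}
	fmt.Println() // prints "shared: c c c"

	fmt.Print("fresh:")
	for _, r := range collectFresh(ids) {
		fmt.Print(" ", r.clientID)
	}
	fmt.Println() // prints "fresh: a b c"
}
```

In the shared version only the last value survives. That is consistent with the symptom described above, where several simultaneous polls overwrite one another, and it is more likely to show up when more requests are in flight at once.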
We haven't had any issue with our budget limits lately. With these fixes, SQS is now perhaps the most reliable rendezvous channel.
[0] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
[1] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...