[anti-censorship-team] Comparing broker answer and connection times across snowflake-01 and snowflake-02

David Fifield david at bamsoftware.com
Wed Oct 11 19:08:44 UTC 2023


On Tue, Sep 26, 2023 at 04:50:08PM -0400, Cecylia Bocovich wrote:
> > Maybe we have an undetected bug in multi-proxy support that favors
> > snowflake-01? The broker is supposed to reject proxies that do not have
> > multi-bridge support since 2022-10-03:
> > https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/40193
> > But maybe it's not working the way it's supposed to? Maybe it's easier
> > to get a proxy for snowflake-01 than for snowflake-02?
> 
> At times in the past, we've approached questions like this by introducing
> new metrics. We could add some broker metrics that count the different
> allowed-relay-hostname-pattern provided by proxies and the bridge
> fingerprints that clients are requesting (or whether they are mostly not
> providing a bridge and relying on the default snowflake-01).
> 
> This wouldn't necessarily need to be a permanent change, just a temporary
> deployment of more prometheus metrics at the broker until we figure out
> what's going on.

Leaving aside the original topic of this thread (namely why snowflake-02
did not recover to former levels as quickly as snowflake-01 did,
following the cdn.sstatic.net change, which is still a mystery), I did
some client experiments to see if it's easier/faster to get a proxy for
snowflake-01 versus snowflake-02.

In summary, no, it doesn't seem like snowflake-01 has an advantage over
snowflake-02, at least when the torrc is configured to use only one of
the two. The median time to get an answer from the broker, and the total
time to get a working proxy, are roughly equal and even a little lower
for snowflake-02. The poll success rate is a little higher with
snowflake-02 too.

What this suggests to me is that the imbalance in usage of the two
bridges is attributable to client choices, like we thought initially.

These are the distributions of times (in seconds) over 50 bootstraps
with each bridge. The mean time to get an answer from the broker, for
example, was 1.4745 s with snowflake-01, and 1.659 s with snowflake-02.

== snowflake-01 ==

    Distribution of poll outcomes:
    broker-timeout      connected  proxy-timeout
                 9             50             14

    Time to get answer from broker:
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     0.2780  0.6450  0.7260  1.4745  0.8692  6.4550       9

    Time to get answer from broker (successful only):
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.2780  0.6508  0.7260  1.3237  0.8648  6.4550

    Total time to connection:
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      3.521   4.021   4.601  10.568   8.770  74.889

== snowflake-02 ==

    Distribution of poll outcomes:
    broker-timeout      connected  proxy-timeout
                 2             50              9

    Time to get answer from broker:
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      0.354   0.615   0.721   1.659   1.022   7.432       2

    Time to get answer from broker (successful only):
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.3540  0.6050  0.6955  1.3836  0.8117  7.4320

    Total time to connection:
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      3.705   4.011   4.495   8.111   6.026  53.076
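
The poll success rates behind the summary above can be recomputed from the
outcome counts. This is only a quick arithmetic check, not part of the
original scripts: the function name poll_success_rate is made up for
illustration, and "success rate" is assumed to mean connected polls divided
by all polls.

```shell
# Hypothetical helper: percentage of polls that ended in "connected".
# Arguments: connected, broker-timeout, proxy-timeout counts.
poll_success_rate() {
    awk -v c="$1" -v b="$2" -v p="$3" \
        'BEGIN { printf "%.1f\n", 100 * c / (c + b + p) }'
}

poll_success_rate 50 9 14   # snowflake-01: 50 of 73 polls
poll_success_rate 50 2 9    # snowflake-02: 50 of 61 polls
```

Under that assumption, the rates come out to about 68.5% for snowflake-01
and 82.0% for snowflake-02.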


This is the loop to bootstrap with each bridge 50 times (there's a race
condition but it didn't cause a problem for me):
for x in $(seq 1 50); do
    for inst in 01 02; do
        TZ=UTC tor -f torrc-$inst &
        tail -n 0 -F tor-$inst.log | grep -l 'Bootstrapped 100'
        kill $!
    done
done
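
If the race in question is that tail may start reading only after tor has
already written to the log (one plausible reading, not something confirmed
above), a variant that records the log size before launching tor would
sidestep it. The following is a rough sketch under that assumption, not the
loop that was actually run: wait_for_line and bootstrap_once are
hypothetical helper names, and the 300-second timeout is an arbitrary
choice.

```shell
# Sketch only: hypothetical helpers, not the commands used for the
# measurements.

# Succeed once PATTERN appears in LOG beyond byte OFFSET; give up after
# roughly TIMEOUT seconds. Polls once per second rather than following.
wait_for_line() {
    local log=$1
    local offset=$2
    local pattern=$3
    local timeout=$4
    local waited=0
    while [ "$waited" -le "$timeout" ]; do
        # tail -c +N starts output at byte N, so older log lines are skipped.
        if tail -c "+$((offset + 1))" "$log" 2>/dev/null | grep -q -- "$pattern"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Run one bootstrap for instance 01 or 02, then stop tor.
bootstrap_once() {
    local inst=$1
    local log="tor-$inst.log"
    local offset=0
    # Remember how long the log is before tor starts writing to it.
    if [ -f "$log" ]; then
        offset=$(wc -c < "$log")
    fi
    TZ=UTC tor -f "torrc-$inst" &
    local pid=$!
    wait_for_line "$log" "$offset" 'Bootstrapped 100' 300
    local status=$?
    kill "$pid"
    wait "$pid" 2>/dev/null
    return "$status"
}

# Usage (commented out so that sourcing this file does not start tor):
# for x in $(seq 1 50); do for inst in 01 02; do bootstrap_once "$inst"; done; done
```

The one-second polling only affects how soon tor is killed after
bootstrapping. It should not change the measured times, assuming those are
taken from the timestamps in the tor logs.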

torrc-01 and torrc-02 are:

```
UseBridges 1
DataDirectory datadir-01
ClientTransportPlugin snowflake exec ./client
Bridge snowflake 192.0.2.3:80 2B280B23E1107BB62ABFC40DDCC8824814F80A72 \
fingerprint=2B280B23E1107BB62ABFC40DDCC8824814F80A72 \
url=https://snowflake-broker.torproject.net.global.prod.fastly.net/ \
front=foursquare.com fronts=foursquare.com,github.githubassets.com \
ice=stun:stun.l.google.com:19302,stun:stun.antisip.com:3478,stun:stun.bluesip.net:3478,stun:stun.dus.net:3478,stun:stun.epygi.com:3478,stun:stun.sonetel.com:3478,stun:stun.uls.co.za:3478,stun:stun.voipgate.com:3478,stun:stun.voys.nl:3478 \
utls-imitate=hellorandomizedalpn
SocksPort auto
LogTimeGranularity 1
Log notice stdout
Log notice file tor-01.log
```
```
UseBridges 1
DataDirectory datadir-02
ClientTransportPlugin snowflake exec ./client
Bridge snowflake 192.0.2.4:80 8838024498816A039FCBBAB14E6F40A0843051FA \
fingerprint=8838024498816A039FCBBAB14E6F40A0843051FA \
url=https://snowflake-broker.torproject.net.global.prod.fastly.net/ \
front=foursquare.com fronts=foursquare.com,github.githubassets.com \
ice=stun:stun.l.google.com:19302,stun:stun.antisip.com:3478,stun:stun.bluesip.net:3478,stun:stun.dus.net:3478,stun:stun.epygi.com:3478,stun:stun.sonetel.com:3478,stun:stun.uls.co.za:3478,stun:stun.voipgate.com:3478,stun:stun.voys.nl:3478 \
utls-imitate=hellorandomizedalpn
SocksPort auto
LogTimeGranularity 1
Log notice stdout
Log notice file tor-02.log
```

This command extracts event timestamps from the tor logs:
perl parse.pl tor-01.log tor-02.log > times.csv
(I hacked times.csv after the fact to add fingerprints to the first run
on each bridge.)

This command shows the distribution of times:
Rscript times.r
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parse.pl
Type: text/x-perl
Size: 1912 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/anti-censorship-team/attachments/20231011/fe2f8a7e/attachment-0001.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: times.csv
Type: text/csv
Size: 19652 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/anti-censorship-team/attachments/20231011/fe2f8a7e/attachment-0001.csv>
-------------- next part --------------
library(tidyverse)

options(width=120)

FINGERPRINTS <- c(
	"2B280B23E1107BB62ABFC40DDCC8824814F80A72" = "snowflake-01",
	"8838024498816A039FCBBAB14E6F40A0843051FA" = "snowflake-02"
)

x <- read_csv("times.csv")

# Print summary statistics separately for each bridge.
for (fingerprint in names(FINGERPRINTS)) {
	cat("\n")
	cat("\n")
	cat(FINGERPRINTS[[fingerprint]], "\n")
	# !!fingerprint is the loop variable, not the column of the same name.
	xx <- filter(x, fingerprint == !!fingerprint)
	# Keep only the polls that ended with a working proxy connection.
	xx_connected <- filter(xx, resolution == "connected")
	cat("\n")
	cat("Distribution of poll outcomes:")
	print(table(xx$resolution))
	cat("\n")
	# All polls; rows with no answer_ts are counted under NA's.
	cat("Time to get answer from broker:\n")
	print(summary(as.numeric(xx$answer_ts - xx$poll_ts)))
	cat("\n")
	cat("Time to get answer from broker (successful only):\n")
	print(summary(as.numeric(xx_connected$answer_ts - xx_connected$poll_ts)))
	cat("\n")
	# Measured from the start of the run (run_ts), not from the poll.
	cat("Total time to connection:\n")
	print(summary(as.numeric(xx_connected$resolved_ts - xx_connected$run_ts)))
}

