[metrics-team] How to interpret written/read bytes per second in Relay Search?

Karsten Loesing karsten at torproject.org
Wed Mar 25 08:42:48 UTC 2020


Hi David!

On 2020-03-24 22:39, David Fifield wrote:
> I'm looking at the graphs for the Snowflake bridge,
> https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB6915AB06BFB7F
> 
> For the past year and more, the "written bytes" and "read bytes" have
> matched almost identically. On 2020-02-19, they start to diverge. That
> date coincides with the limited release of a version of Snowflake that
> is able to run for a longer time, and a rebuild and restart of the
> bridge (https://bugs.torproject.org/33336#comment:8). In the past few
> days, the "read bytes" has grown to be about 10 times the "written
> bytes". I'm trying to interpret this information.
> 
> 1. Where does the data in the graph come from? I didn't find it covered
>    at https://metrics.torproject.org/reproducible-metrics.html. I looked
>    at the most recent bridge-extra-info descriptor:
> 	write-history 2020-03-24 14:50:48 (86400 s) 2308002816,3215030272,2062971904,3323116544,4469634048
> 	read-history 2020-03-24 14:50:48 (86400 s) 5265647616,8424690688,5873989632,38813471744,7116800000
> 	dirreq-write-history 2020-03-24 14:50:48 (86400 s) 95039488,88645632,70933504,96854016,57280512
> 	dirreq-read-history 2020-03-24 14:50:48 (86400 s) 2764800,1468416,1036288,2764800,2027520
>    However if I just divide the write-history and read-history numbers
>    by 86400, the numbers I get don't match the graph.
>    			write		read
>    	2020-03-20	26.7 kB/s	 60.9 kB/s
>    	2020-03-21	37.2 kB/s	 97.5 kB/s
>    	2020-03-22	23.9 kB/s	 68.0 kB/s
>    	2020-03-23	38.5 kB/s	449.2 kB/s
>    	2020-03-24	51.7 kB/s	 82.4 kB/s

You're right that this data is not described on the Reproducible Metrics
page. That page only explains where the data in the main Tor Metrics
website graphs comes from.

The data in Relay Search comes from Onionoo. Its protocol specification
is here, though it is not as detailed as the Reproducible Metrics page:

https://metrics.torproject.org/onionoo.html#bandwidth

Looking at the read-history line above, we can translate that to:

5265647616 B from 2020-03-19 14:50:48 to 2020-03-20 14:50:48
8424690688 B from 2020-03-20 14:50:48 to 2020-03-21 14:50:48
5873989632 B from 2020-03-21 14:50:48 to 2020-03-22 14:50:48
38813471744 B from 2020-03-22 14:50:48 to 2020-03-23 14:50:48
7116800000 B from 2020-03-23 14:50:48 to 2020-03-24 14:50:48
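
As a rough illustration of this translation (not the actual Onionoo
code), here is a small Python sketch. The helper name parse_history is
made up for this example, and it assumes the line has exactly the layout
shown above, with the timestamp marking the end of the newest interval:

```python
from datetime import datetime, timedelta, timezone

def parse_history(line):
    """Turn a '*-history' line into (start, end, bytes) tuples."""
    _keyword, date, time, interval, _unit, values = line.split()
    end = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    end = end.replace(tzinfo=timezone.utc)
    step = timedelta(seconds=int(interval.lstrip("(")))
    counts = [int(v) for v in values.split(",")]
    # The timestamp is the end of the newest interval, so count backwards.
    start = end - step * len(counts)
    return [(start + i * step, start + (i + 1) * step, n)
            for i, n in enumerate(counts)]

line = ("read-history 2020-03-24 14:50:48 (86400 s) "
        "5265647616,8424690688,5873989632,38813471744,7116800000")
for start, end, n in parse_history(line):
    print(f"{n} B from {start:%Y-%m-%d %H:%M:%S} to {end:%Y-%m-%d %H:%M:%S}")
```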

In the next step we cut these intervals at midnight UTC, under the
assumption that bytes were transferred at a constant rate within each
interval, and obtain:

2008259494 B from 2020-03-19 14:50:48 to 2020-03-19 24:00:00
3257388122 B from 2020-03-20 00:00:00 to 2020-03-20 14:50:48
3213083421 B from 2020-03-20 14:50:48 to 2020-03-20 24:00:00
5211607267 B from 2020-03-21 00:00:00 to 2020-03-21 14:50:48
2240274379 B from 2020-03-21 14:50:48 to 2020-03-21 24:00:00
3633715253 B from 2020-03-22 00:00:00 to 2020-03-22 14:50:48
14803026862 B from 2020-03-22 14:50:48 to 2020-03-22 24:00:00
24010444882 B from 2020-03-23 00:00:00 to 2020-03-23 14:50:48
2714268444 B from 2020-03-23 14:50:48 to 2020-03-23 24:00:00
4402531556 B from 2020-03-24 00:00:00 to 2020-03-24 14:50:48
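
The cutting step can be sketched in Python as follows. This is only an
illustration of the proportional split described above, not the Onionoo
implementation; the function name split_at_midnight and the rounding to
whole bytes are assumptions of this sketch:

```python
from datetime import datetime, timedelta, timezone

def split_at_midnight(start, end, nbytes):
    """Split one interval at UTC midnights, spreading bytes evenly over time."""
    total = (end - start).total_seconds()
    pieces = []
    cursor = start
    while cursor < end:
        midnight = datetime(cursor.year, cursor.month, cursor.day,
                            tzinfo=timezone.utc) + timedelta(days=1)
        stop = min(end, midnight)
        # Each piece gets bytes in proportion to the seconds it covers.
        share = nbytes * (stop - cursor).total_seconds() / total
        pieces.append((cursor, stop, round(share)))
        cursor = stop
    return pieces

start = datetime(2020, 3, 22, 14, 50, 48, tzinfo=timezone.utc)
end = start + timedelta(days=1)
for a, b, n in split_at_midnight(start, end, 38813471744):
    print(n, "B from", a, "to", b)
```

For the 2020-03-22 14:50:48 interval this gives the two pieces listed
above, 14803026862 B before midnight and 24010444882 B after it.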

Summing up B by UTC date we get:

2008259494 B on 2020-03-19 (only 14:50:48 to 24:00:00)
6470471543 B on 2020-03-20
7451881646 B on 2020-03-21
18436742115 B on 2020-03-22
26724713326 B on 2020-03-23
4402531556 B on 2020-03-24 (only 00:00:00 to 14:50:48)
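
For anyone who wants to redo the sums, a minimal Python sketch follows.
The list of (date, bytes) pairs is copied by hand from the midnight-cut
intervals above; the variable names are just for this example:

```python
from collections import defaultdict

# (UTC date, bytes) pairs copied from the midnight-cut intervals above.
pieces = [
    ("2020-03-19", 2008259494),
    ("2020-03-20", 3257388122), ("2020-03-20", 3213083421),
    ("2020-03-21", 5211607267), ("2020-03-21", 2240274379),
    ("2020-03-22", 3633715253), ("2020-03-22", 14803026862),
    ("2020-03-23", 24010444882), ("2020-03-23", 2714268444),
    ("2020-03-24", 4402531556),
]

daily = defaultdict(int)
for day, nbytes in pieces:
    daily[day] += nbytes

for day in sorted(daily):
    print(daily[day], "B on", day)
```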

Converted to kB/s with k = 1000:

60.9 kB/s on 2020-03-19 (only 14:50:48 to 24:00:00)
74.9 kB/s on 2020-03-20
86.2 kB/s on 2020-03-21
213.4 kB/s on 2020-03-22
309.3 kB/s on 2020-03-23
82.4 kB/s on 2020-03-24 (only 00:00:00 to 14:50:48)
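
The conversion can be checked with a few lines of Python. Note that the
two partial days are divided by the seconds they actually cover (32952 s
and 53448 s) rather than by 86400 s; that is what reproduces the numbers
above. The helper kb_per_second is a name made up for this sketch:

```python
# (UTC date, bytes, seconds covered) taken from the daily sums above.
daily = [
    ("2020-03-19", 2008259494, 32952),   # 14:50:48 to 24:00:00
    ("2020-03-20", 6470471543, 86400),
    ("2020-03-21", 7451881646, 86400),
    ("2020-03-22", 18436742115, 86400),
    ("2020-03-23", 26724713326, 86400),
    ("2020-03-24", 4402531556, 53448),   # 00:00:00 to 14:50:48
]

def kb_per_second(nbytes, seconds):
    """Average rate in kB/s, with k = 1000 (not 1024)."""
    return nbytes / seconds / 1000

rates = {day: round(kb_per_second(n, s), 1) for day, n, s in daily}
for day, rate in rates.items():
    print(f"{rate} kB/s on {day}")
```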

Note that Relay Search shows a different number for 2020-03-19, because
it also considers an earlier read-history line. It also shows slightly
different numbers for 2020-03-20 to 2020-03-23, because Onionoo
normalizes numbers to integer values from 0 to 999 to save space in its
data formats. I'm not 100% certain why 2020-03-24 is not displayed yet,
but it's probably because the data is too recent, even though I can't
find this in the code.
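
To give a feeling for how much the normalization can change a value,
here is a hedged Python sketch. It assumes the largest value is mapped
to 999 and the others are rounded to the nearest integer on that scale;
the exact rule Onionoo uses may differ, and the function name normalize
is made up for this example:

```python
def normalize(values, top=999):
    """Scale values to integers 0..top; return (factor, scaled values).

    Assumes at least one value is greater than zero.
    """
    factor = max(values) / top
    return factor, [round(v / factor) for v in values]

# Daily averages in B/s for 2020-03-20 to 2020-03-23 from above.
rates = [74889.7, 86248.6, 213388.2, 309313.8]
factor, normalized = normalize(rates)

# Multiplying by the factor restores each value only approximately.
restored = [n * factor for n in normalized]
for original, back in zip(rates, restored):
    print(f"{original / 1000:.1f} kB/s -> {back / 1000:.1f} kB/s")
```

Under this assumption each restored value is off by at most half the
factor, which for these numbers is well under 1 kB/s.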

> 2. When I look at the graphs of some default bridges, I see the written
>    and read number being almost equal always.
>    https://metrics.torproject.org/rs.html#details/5F161D2E5713C93F16FEEDD63178E37208AA78DF
>    https://metrics.torproject.org/rs.html#details/8F4541EEE3F2306B7B9FEF1795EC302F6B84DAE8
>    When I look at moria1, a directory authority, I see written being
>    much greater than read.
>    https://metrics.torproject.org/rs.html#details/9695DFC35FFEB861329B9F1AB04C46397020CE31
>    What accounts for the equality in some cases and the inequality in
>    others? What could explain the divergence in the case of the
>    Snowflake bridge?

The inequality in case of directory authorities is very likely due to
directory requests. Requesting a consensus takes just a few dozen bytes,
but responding with a consensus takes about 2.4 MiB or something like
0.5 MiB when compressed.

I can only speculate about the Snowflake bridge. Looking at the 5-year
graph on Relay Search, the increase in read bytes does not seem that
unusual. It's the divergence from written bytes that hasn't happened for
a while. But if you look at late 2017, there was a period when read
bytes outnumbered written bytes.

> 3. Roger found a case where traffic tagged with a 0.0.0.0/8 address was
>    being ignored by some part of tor's internal bandwidth accounting
>    (https://bugs.torproject.org/33693). Until recently, the Snowflake
>    bridge had a bug where, for certain clients, it reported a client
>    address of 0.0.0.0 to the tor bridge's ExtORPort (https://bugs.torproject.org/33157).
>    The bug is only partially fixed--we now report no address at all for
>    the affected clients. The fix was not deployed until 2020-02-22, so
>    it doesn't explain the divergence of read/written on its own. Do you
>    know offhand whether an apparent client address of 0.0.0.0, or no
>    address at all, would cause problems with measuring usage?

I'm afraid I don't know. I wonder if teor knows more about this, as he
recently spent some time on bandwidth statistics for IPv6 traffic.

All the best,
Karsten
