[metrics-team] Are bandwidth charts double counting?

Karsten Loesing karsten at torproject.org
Tue Oct 17 08:17:55 UTC 2017


Hi Tom,

On 2017-10-16 22:33, Tom Ritter wrote:
> I was looking at https://metrics.torproject.org/bandwidth.html and
> https://metrics.torproject.org/bandwidth-flags.html and was confused a
> little by the graph.  Is it double (or triple) counting?
> 
> The definitions page says "bandwidth history: the volume of incoming
> and/or outgoing traffic that a relay claims to have handled on behalf
> of clients."
> 
> It's the "and/or" that throws me.
> 
> If it's 'and' then the Exit bandwidth history is double counting:
> divide by two to get the bandwidth that exits the tor network.

I see the confusion there.

The purpose of the glossary is to define what a term means in the
context of Metrics. But it does so at a very high level that most
users can understand. It's not meant to be precise enough to explain
the computations behind a given graph.

Stated differently, I think it's okay that "bandwidth history" can mean
either incoming traffic only or outgoing traffic only or even both. It's
the fact that it's a number that is reported by relays that is
important, at least to me. I can be convinced otherwise though.

But I also understand how you still want to know where the numbers in a
graph come from. The good news is that that's exactly what we just
received funding for!

https://trac.torproject.org/projects/tor/wiki/org/sponsors/Sponsor13

In particular, there's:

Activity 2.3: Write specification for assessing how much traffic the Tor
network can handle and how much traffic there is.

Once such a specification exists, it will tell you that the "Bandwidth
history" line in https://metrics.torproject.org/bandwidth.html is the
sum of incoming and outgoing traffic, divided by two.
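
As a tiny illustration of that computation, here is a sketch in Python.
The function name and the byte counts are made up for this example; the
actual specification may define the inputs differently.

```python
def bandwidth_history(incoming_bytes, outgoing_bytes):
    """Sum of incoming and outgoing traffic, divided by two.

    Both arguments are hypothetical per-interval byte counts as a relay
    might report them; they are not taken from any real descriptor.
    """
    return (incoming_bytes + outgoing_bytes) / 2


# Made-up numbers, only to show the arithmetic.
example = bandwidth_history(incoming_bytes=100, outgoing_bytes=300)
```

With these made-up inputs the function returns the mean of the two
directions rather than their sum.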

But why did you ask about triple counting? How would we accidentally
triple count something here?

Feel free to ask similar questions, and I'll make sure that the various
specification documents will answer them.

> The other question I had, that I don't think we are able to calculate,
> is "How many connections does the Tor Network produce".  Obviously it
> handwaves over 'connection', but for the browser scenario I'd say
> 'connections to first party domains'.
> 
> exit_streams_opened might be the best measurement to accomplish
> something very similar though right? It'd be unique connections to
> third party domains (for ports 443 and 80) instead of first party
> domains, but that's pretty close.
> 
> How would I go about calculating it? Is it as simple as summing this
> field across all the extra info descriptors for a given time period?

Fine question. We don't have such statistics yet. But let's see what we
could do with existing data.

First, I'm not sure what exactly you mean by first party domains and
third party domains.

Regarding your idea to use "exit-streams-opened", yes, that might tell
us something. For example, I found this line in an extra-info descriptor:

exit-streams-opened 80=1761692,182=104,443=1343240,4070=1080,5000=540,5002=36,8999=2004,9696=12,51000=28,51413=2952,other=320388
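
To make such a line easier to work with, here is a minimal parsing
sketch in Python. It assumes the keyword and the comma-separated
port=count list sit on a single line separated by a space; the function
name parse_exit_streams_opened is made up for this example and is not
part of any existing library.

```python
def parse_exit_streams_opened(line):
    """Turn an "exit-streams-opened" line into a dict of port -> count.

    Ports are kept as strings because the list also contains "other".
    """
    keyword, _, values = line.strip().partition(" ")
    if keyword != "exit-streams-opened":
        raise ValueError("not an exit-streams-opened line: %r" % line)
    counts = {}
    for item in values.split(","):
        port, _, count = item.partition("=")
        counts[port] = int(count)
    return counts


line = ("exit-streams-opened 80=1761692,182=104,443=1343240,4070=1080,"
        "5000=540,5002=36,8999=2004,9696=12,51000=28,51413=2952,"
        "other=320388")
counts = parse_exit_streams_opened(line)

# Streams opened to the two web ports reported by this one relay.
web_streams = counts["80"] + counts["443"]
```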

I think if I were to calculate a network total here, I wouldn't simply
sum up all such lines for a given time frame. Only a few relays report
these numbers, so that sum would likely be much too low.

A better approach might be to extrapolate this line to a network total
for that given day, using the probability of picking this exit out of
all others. Say, if the exit probability is 0.5% (random guess), there
would be (1,761,692 + 1,343,240) / 0.5% opened streams in the network
to ports 80 and 443 on that day. I didn't look up the 0.5%, so I won't
write the result here, but you see what I mean.
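
A minimal sketch of that extrapolation in Python follows. The function
name is made up, and the 0.5% is still only the random guess from
above, not a looked-up exit probability, so the resulting number is a
placeholder and not a real estimate.

```python
def extrapolate_network_total(reported_streams, exit_probability):
    """Scale one relay's reported streams up to a network-wide total.

    exit_probability is the fraction (0 < p <= 1) of exit traffic this
    relay is expected to handle.
    """
    if not 0 < exit_probability <= 1:
        raise ValueError("exit_probability must be in (0, 1]")
    return reported_streams / exit_probability


# Streams to ports 80 and 443 from the descriptor line quoted above.
reported = 1_761_692 + 1_343_240

# ASSUMPTION: 0.5% is a random guess, not a measured exit probability.
ASSUMED_EXIT_PROBABILITY = 0.005

estimate = extrapolate_network_total(reported, ASSUMED_EXIT_PROBABILITY)
```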

That approach would produce a few dozen extrapolated network totals per
day, depending on how many usable reports we have. The median of those
extrapolations could then be a first approximation of the number you're
looking for.
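
Put together, the idea could look roughly like this in Python. The
function name and all sample values are invented for illustration; real
input would be one (reported streams, exit probability) pair per usable
report on a given day.

```python
from statistics import median


def network_estimate(reports):
    """Median of per-relay extrapolations to a network total.

    reports is an iterable of (reported_streams, exit_probability)
    pairs; reports with a non-positive probability are skipped.
    """
    extrapolations = [streams / probability
                      for streams, probability in reports
                      if probability > 0]
    if not extrapolations:
        raise ValueError("no usable reports")
    return median(extrapolations)


# ASSUMPTION: made-up sample values, not taken from real descriptors.
sample_reports = [
    (3_000_000, 0.005),
    (1_200_000, 0.002),
    (500_000, 0.001),
    (10_000_000, 0.02),
]
estimate = network_estimate(sample_reports)
```

Taking the median rather than the mean keeps a single relay with an
unusual traffic mix from dominating the result.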

Hope that helps!

> -tom

All the best,
Karsten
