Re: [tor-dev] Progress on hidserv-stats Metrics integration, request for code review

[Cc'ing tor-dev@, because why not.]

On 11/03/15 19:13, Karsten Loesing wrote:
> Please let me know if I can help *reduce* confusion somehow. :)

Looking forward, hidden-service statistics are now available on Metrics:

https://metrics.torproject.org/hidserv-data.html

I also started making some very quick graphs here:

https://people.torproject.org/~karsten/volatile/hidserv-stats-2015-03-11.pdf

The question is, what graphs do we want on Metrics? How about:

- Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
- Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
- Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)

Note that I left out "fraction of traffic", because we can't guarantee that our many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.

Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script to plot what's in hidserv.csv rather than add yet one more thing to Metrics.

By the way, I decided against using onion service terminology, because I wasn't sure when we were planning to switch. I'm not sure if Metrics should be one of the first Tor websites to switch, or whether people will just wonder what crazy Tor-unrelated stuff Metrics has statistics for. I don't feel strongly though. Thoughts?

Thanks!

All the best, Karsten
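The proposed graphs rely on a weighted interquartile mean. As a rough illustration of that idea, the sketch below averages only the middle half of the weight mass, so that extreme reports at either end do not influence the result. The function name, its signature, and the way values straddling a quartile boundary are counted partially are assumptions made for this example; this is not the code Metrics actually runs.

```python
from typing import Sequence


def weighted_interquartile_mean(values: Sequence[float],
                                weights: Sequence[float]) -> float:
    """Weighted mean of the values between the weighted 25th and 75th
    percentiles. Hypothetical helper, not taken from the Metrics code."""
    if len(values) != len(weights) or not values:
        raise ValueError("need equally long, non-empty inputs")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    lo, hi = 0.25 * total, 0.75 * total
    acc = 0.0    # weighted sum of the values inside the middle half
    mass = 0.0   # weight that fell inside the middle half
    start = 0.0  # cumulative weight before the current value
    for value, weight in sorted(zip(values, weights)):
        end = start + weight
        # Part of this value's weight that lies between the quartiles.
        overlap = max(0.0, min(end, hi) - max(start, lo))
        acc += value * overlap
        mass += overlap
        start = end
    return acc / mass


# Four equally weighted reports; the extreme one does not move the result.
print(weighted_interquartile_mean([1, 2, 3, 1000], [1, 1, 1, 1]))
```

With equal weights this reduces to the ordinary interquartile mean; unequal weights let some reports count for more than others without letting any single outlier dominate.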

Karsten Loesing <karsten@torproject.org> writes:

> [Cc'ing tor-dev@, because why not.]
>
> On 11/03/15 19:13, Karsten Loesing wrote:
>> Please let me know if I can help *reduce* confusion somehow. :)
>
> Looking forward, hidden-service statistics are now available on Metrics:

Very nice!

> I also started making some very quick graphs here:
>
> https://people.torproject.org/~karsten/volatile/hidserv-stats-2015-03-11.pdf
>
> The question is, what graphs do we want on Metrics? How about:
>
> - Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
> - Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
> - Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)

I think these are indeed the essential graphs here. Let's proceed with these for now!

> Note that I left out "fraction of traffic", because we can't guarantee that our many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.
>
> Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script to plot what's in hidserv.csv rather than add yet one more thing to Metrics.

I see what you are saying here.

At the same time, there are some technical graphs that I'd like to monitor over time and I bet other researchers would also enjoy. Unfortunately, those graphs are not particularly interesting to common people or press. Still, having them on a website and getting them updated on a daily basis would really help to monitor them in the long-term.

This might be a bit off-topic but here are some examples of such graphs:

a) Boxplot graphs with probabilities for guards/IPs/HSDirs etc. I'd like this to monitor the outliers over time. I think these graphs would reveal some interesting big probabilities and maybe reveal attacks or find bugs. I think an old version of the extrapolation tech report used to have boxplots like this for RPs and HSDirs.

b) Related to the above, I'd like to see boxplot graphs with reported bandwidth by relays. I have heard that adversarial relays sending fake high reported bandwidth is still a good way to get good probabilities during path selection.

I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.

> By the way, I decided against using onion service terminology, because I wasn't sure when we were planning to switch. I'm not sure if Metrics should be one of the first Tor websites to switch, or whether people will just wonder what crazy Tor-unrelated stuff Metrics has statistics for. I don't feel strongly though. Thoughts?
>
> Thanks!
>
> All the best, Karsten
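The suggestion of writing your own script against hidserv.csv could start from something like the sketch below, which turns a daily rend-relayed-cells estimate into Mbit/s. Several things here are assumptions for illustration only: the column names (`date`, `type`, `wiqm`, `frac`), the 512-byte cell size used for the conversion, and the sample rows, which are made up and are not real measurements. Check the actual file header before relying on any of this.

```python
import csv
import io
from typing import Dict

CELL_BYTES = 512         # assumed size of one Tor cell, in bytes
SECONDS_PER_DAY = 86400


def cells_per_day_to_mbit_s(cells: float) -> float:
    """Convert a number of cells relayed in one day to Mbit/s."""
    return cells * CELL_BYTES * 8 / (SECONDS_PER_DAY * 1e6)


def daily_traffic_mbit_s(csv_text: str) -> Dict[str, float]:
    """Map each date to estimated hidden-service traffic in Mbit/s.

    Only rows whose (assumed) 'type' column is 'rend-relayed-cells'
    are used; the (assumed) 'wiqm' column holds cells per day.
    """
    result: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] == "rend-relayed-cells":
            result[row["date"]] = cells_per_day_to_mbit_s(float(row["wiqm"]))
    return result


# Made-up sample rows in the assumed layout, not real measurements.
SAMPLE = """date,type,wiqm,frac
2015-03-10,rend-relayed-cells,16875000000,0.03
2015-03-10,dir-onions-seen,30000,0.04
"""

for day, mbit in daily_traffic_mbit_s(SAMPLE).items():
    print(day, mbit)
```

The resulting per-day values can then be handed to whatever plotting tool you prefer; keeping the parsing separate from the plotting makes it easy to try out a graph manually before deciding whether it deserves automation.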

On Thu, Mar 12, 2015 at 06:01:13PM +0000, George Kadianakis wrote:
> Karsten Loesing <karsten@torproject.org> writes:
>> The question is, what graphs do we want on Metrics? How about:
>>
>> - Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
>> - Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
>> - Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)
>
> I think these are indeed the essential graphs here. Let's proceed with these for now!

Sounds great. I'm really excited to see these graphs up on the metrics page.

That said, for the "total hidden-service traffic" one... we want to know that, but we also want to know what fraction that is of total traffic, yes? I could imagine a graph with total hidden-service traffic and also with total traffic; but then the smaller curve will be around y=0 and not easy to see. What would you all think about a graph that is estimated fraction of total traffic that is hidden-service traffic, instead of graph #1 above?

>> Note that I left out "fraction of traffic", because we can't guarantee that our many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.

Oh. Yes, this is exactly the same question. Hm. I think the "number of hidden-service related bytes" is going to go up over time, and make it really easy for people to mis-conclude "hidden-service related bytes are getting to be more of Tor's traffic" which is not what that means.

Which assumptions from the blog post do you think are going to become less right in the future? Because I'd much rather have the graph that tells us the answer to the research question.

> I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.

We could also imagine a cron job somewhere that generates the graphs somewhere (e.g. people.tp.o), and an "advanced" link from the hs metrics page to those graphs. To make it clearer that it's informal and not something we'll necessarily include forever.

--Roger

On 12/03/15 21:26, Roger Dingledine wrote:
> On Thu, Mar 12, 2015 at 06:01:13PM +0000, George Kadianakis wrote:
>> Karsten Loesing <karsten@torproject.org> writes:
>>> The question is, what graphs do we want on Metrics? How about:
>>>
>>> - Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
>>> - Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
>>> - Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)
>>
>> I think these are indeed the essential graphs here. Let's proceed with these for now!
>
> Sounds great. I'm really excited to see these graphs up on the metrics page.
>
> That said, for the "total hidden-service traffic" one... we want to know that, but we also want to know what fraction that is of total traffic, yes? I could imagine a graph with total hidden-service traffic and also with total traffic; but then the smaller curve will be around y=0 and not easy to see. What would you all think about a graph that is estimated fraction of total traffic that is hidden-service traffic, instead of graph #1 above?
>
>>> Note that I left out "fraction of traffic", because we can't guarantee that our many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.
>
> Oh. Yes, this is exactly the same question. Hm. I think the "number of hidden-service related bytes" is going to go up over time, and make it really easy for people to mis-conclude "hidden-service related bytes are getting to be more of Tor's traffic" which is not what that means.
>
> Which assumptions from the blog post do you think are going to become less right in the future? Because I'd much rather have the graph that tells us the answer to the research question.

The main assumption was that exits only handle exit traffic and that non-exits (relays without the Exit flag) don't handle exit traffic at all. Basically, if we want to make this graph available, we'll first have to come up with a reliable metric for traffic exiting the network. I left out the absolute-hidden-service-traffic graph for now, but it's not hard to add it later.

>> I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.
>
> We could also imagine a cron job somewhere that generates the graphs somewhere (e.g. people.tp.o), and an "advanced" link from the hs metrics page to those graphs. To make it clearer that it's informal and not something we'll necessarily include forever.

Let's first come up with graphs without necessarily automating them, and then let's discuss where they fit in best.

All the best, Karsten

On 12/03/15 19:01, George Kadianakis wrote:
> Karsten Loesing <karsten@torproject.org> writes:
>> Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script to plot what's in hidserv.csv rather than add yet one more thing to Metrics.
>
> I see what you are saying here.
>
> At the same time, there are some technical graphs that I'd like to monitor over time and I bet other researchers would also enjoy. Unfortunately, those graphs are not particularly interesting to common people or press. Still, having them on a website and getting them updated on a daily basis would really help to monitor them in the long-term.
>
> This might be a bit off-topic but here are some examples of such graphs:
>
> a) Boxplot graphs with probabilities for guards/IPs/HSDirs etc. I'd like this to monitor the outliers over time. I think these graphs would reveal some interesting big probabilities and maybe reveal attacks or find bugs. I think an old version of the extrapolation tech report used to have boxplots like this for RPs and HSDirs.
>
> b) Related to the above, I'd like to see boxplot graphs with reported bandwidth by relays. I have heard that adversarial relays sending fake high reported bandwidth is still a good way to get good probabilities during path selection.
>
> I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.

We briefly talked about this on IRC. There are already graphs quite similar to what you want in b):

https://metrics.torproject.org/advbwdist-perc.html
https://metrics.torproject.org/advbwdist-relay.html

The graph you describe in a) is more difficult. I haven't made up my mind entirely here, but I think this graph doesn't fit on Metrics, but rather on a site or service dedicated to monitoring the network for attacks or bugs. Philipp's Sybil attack detector comes to mind.

But more generally, when we think about adding graphs to Metrics and automating them, we should start with manually-created graphs first. If we like them enough that we want to automate them, putting them on Metrics is an option.

All the best, Karsten
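In the spirit of starting with manually-created graphs, the numbers behind a boxplot of reported relay bandwidth can be computed with the standard library before any plotting is involved. The sketch below uses the common convention of flagging points more than 1.5 interquartile ranges beyond the quartiles as outliers; that convention, the function name, and the sample values are all assumptions chosen for illustration and do not come from real relay descriptors.

```python
import statistics
from typing import Dict, Sequence


def boxplot_summary(samples: Sequence[float]) -> Dict[str, object]:
    """Quartiles plus the points a typical boxplot would draw as outliers.

    Hypothetical helper: uses the usual 1.5 * IQR fences.
    """
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in samples if x < low_fence or x > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up advertised bandwidths; one relay reports a suspiciously high value.
reported = [10, 20, 30, 40, 50, 60, 70, 80, 5000]
print(boxplot_summary(reported))
```

Running this once per day over the reported values and keeping the `outliers` list would give a simple way to watch the unusually large reports over time, which is the kind of monitoring described in b).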
participants (3)
- George Kadianakis
- Karsten Loesing
- Roger Dingledine