[Cc'ing tor-dev@, because why not.]
On 11/03/15 19:13, Karsten Loesing wrote:
Please let me know if I can help *reduce* confusion somehow. :)
Looking forward, hidden-service statistics are now available on Metrics:
https://metrics.torproject.org/hidserv-data.html
I also started making some very quick graphs here:
https://people.torproject.org/~karsten/volatile/hidserv-stats-2015-03-11.pdf
The question is, what graphs do we want on Metrics? How about:
- Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
- Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
- Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)
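For readers unfamiliar with the term, here is a minimal sketch of one way to compute a weighted interquartile mean: sort the observations, discard the lowest and highest 25% of the total weight, and take the weighted mean of what remains. The function name and the interpretation of the inputs (one extrapolated network total per reporting relay, weighted by that relay's share of the network) are assumptions for illustration, not necessarily the exact computation behind the Metrics data.

```python
def weighted_interquartile_mean(values, weights):
    """Weighted mean of the middle 50% (by weight) of the observations."""
    # Sort by value and ignore observations that carry no weight.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("need at least one observation with positive weight")

    lo, hi = 0.25 * total, 0.75 * total  # weight boundaries of the middle half
    cumulative = 0.0
    numerator = 0.0
    denominator = 0.0
    for value, weight in pairs:
        start, end = cumulative, cumulative + weight
        cumulative = end
        # Only the part of this observation's weight inside [lo, hi] counts.
        kept = max(0.0, min(end, hi) - max(start, lo))
        numerator += value * kept
        denominator += kept
    return numerator / denominator


if __name__ == "__main__":
    # The extreme value 100 falls in the discarded top quarter.
    print(weighted_interquartile_mean([1, 2, 3, 100], [1, 1, 1, 1]))
```

The point of trimming by weight rather than by count is that a single relay reporting an extreme value cannot dominate the daily estimate unless it also holds a large share of the weight.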
Note that I left out "fraction of traffic", because we can't guarantee that the many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.
Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script that plots what's in hidserv.csv than to add yet one more thing to Metrics.
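As a starting point for such a script, here is a rough sketch that pulls one daily series out of the CSV and converts relayed cells to Mbit/s. The column names `date`, `type` and `wiqm`, the assumption that the cell statistic is a per-day count, and the 512-byte cell size are all assumptions about the file layout that should be checked against the real hidserv.csv. The sample rows are made up purely to exercise the code.

```python
import csv
import io

CELL_BYTES = 512  # assumption: fixed-size cells of 512 bytes


def daily_series(csv_file, stat_type, column="wiqm"):
    """Return sorted (date, value) pairs for one statistic type."""
    reader = csv.DictReader(csv_file)
    series = [
        (row["date"], float(row[column]))
        for row in reader
        if row["type"] == stat_type and row[column]  # skip empty fields
    ]
    return sorted(series)


def cells_per_day_to_mbit_per_s(cells_per_day):
    """Convert a daily cell count to an average rate in Mbit/s."""
    return cells_per_day * CELL_BYTES * 8 / (86400 * 1e6)


# Made-up rows in the assumed layout, for illustration only.
SAMPLE = """date,type,wiqm
2015-03-02,dir-onions-seen,30000
2015-03-01,rend-relayed-cells,2000000000
2015-03-01,dir-onions-seen,29000
2015-03-02,rend-relayed-cells,2100000000
"""

if __name__ == "__main__":
    for date, cells in daily_series(io.StringIO(SAMPLE), "rend-relayed-cells"):
        print(date, round(cells_per_day_to_mbit_per_s(cells), 1), "Mbit/s")
```

With the real file, `open("hidserv.csv", newline="")` would replace the `io.StringIO(SAMPLE)` argument, and the resulting pairs can be handed to whatever plotting tool you prefer.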
By the way, I decided against using onion service terminology, because I wasn't sure when we were planning to switch. I'm not sure if Metrics should be one of the first Tor websites to switch, or whether people will just wonder what crazy Tor-unrelated stuff Metrics has statistics for. I don't feel strongly though. Thoughts?
Thanks!
All the best, Karsten
Karsten Loesing karsten@torproject.org writes:
[Cc'ing tor-dev@, because why not.]
On 11/03/15 19:13, Karsten Loesing wrote:
Please let me know if I can help *reduce* confusion somehow. :)
Looking forward, hidden-service statistics are now available on Metrics:
Very nice!
I also started making some very quick graphs here:
https://people.torproject.org/~karsten/volatile/hidserv-stats-2015-03-11.pdf
The question is, what graphs do we want on Metrics? How about:
- Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
- Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
- Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)
I think these are indeed the essential graphs here. Let's proceed with these for now!
Note that I left out "fraction of traffic", because we can't guarantee that the many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.
Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script that plots what's in hidserv.csv than to add yet one more thing to Metrics.
I see what you are saying here.
At the same time, there are some technical graphs that I'd like to monitor over time, and I bet other researchers would enjoy them too. Unfortunately, those graphs are not particularly interesting to the general public or the press. Still, having them on a website and getting them updated on a daily basis would really help to monitor them in the long term.
This might be a bit off-topic but here are some examples of such graphs:
a) Boxplot graphs with probabilities for guards/IPs/HSDirs etc. I'd like this to monitor the outliers over time. I think these graphs would reveal some interesting big probabilities and maybe reveal attacks or find bugs. I think an old version of the extrapolation tech report used to have boxplots like this for RPs and HSDirs.
b) Related to the above, I'd like to see boxplot graphs with reported bandwidth by relays. I have heard that adversarial relays sending fake high reported bandwidth is still a good way to get good probabilities during path selection.
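To make the outlier idea concrete, here is a small sketch of the statistics a boxplot is usually built from: the quartiles plus "fences" at 1.5 times the interquartile range, with anything beyond the fences flagged as an outlier. The function name, the 1.5 multiplier and the sample numbers are illustrative assumptions; real inputs would be per-relay probabilities or reported bandwidths.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (q1, q3, outliers) using fences at k times the IQR."""
    # statistics.quantiles needs Python 3.8+ and at least two data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return q1, q3, outliers


if __name__ == "__main__":
    # Made-up values; the last one stands in for a suspiciously large report.
    q1, q3, outliers = tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
    print("Q1:", q1, "Q3:", q3, "outliers:", outliers)
```

Running something like this once per day and keeping the list of outliers would give the kind of over-time view described above, even without a rendered boxplot.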
I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.
By the way, I decided against using onion service terminology, because I wasn't sure when we were planning to switch. I'm not sure if Metrics should be one of the first Tor websites to switch, or whether people will just wonder what crazy Tor-unrelated stuff Metrics has statistics for. I don't feel strongly though. Thoughts?
Thanks!
All the best, Karsten
On Thu, Mar 12, 2015 at 06:01:13PM +0000, George Kadianakis wrote:
Karsten Loesing karsten@torproject.org writes:
The question is, what graphs do we want on Metrics? How about:
- Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
- Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
- Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)
I think these are indeed the essential graphs here. Let's proceed with these for now!
Sounds great. I'm really excited to see these graphs up on the metrics page.
That said, for the "total hidden-service traffic" one... we want to know that, but we also want to know what fraction that is of total traffic, yes? I could imagine a graph with total hidden-service traffic and also with total traffic; but then the smaller curve will be around y=0 and not easy to see. What would you all think about a graph of the estimated fraction of total traffic that is hidden-service traffic, instead of graph #1 above?
Note that I left out "fraction of traffic", because we can't guarantee that the many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.
Oh. Yes, this is exactly the same question. Hm. I think the "number of hidden-service related bytes" is going to go up over time, and make it really easy for people to mis-conclude "hidden-service related bytes are getting to be more of Tor's traffic" which is not what that means.
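A tiny worked example of that pitfall, with entirely hypothetical numbers: absolute hidden-service traffic can grow while its share of total traffic shrinks, simply because total traffic grows faster.

```python
def hs_fraction(hs_mbit_s, total_mbit_s):
    """Share of total traffic that is hidden-service traffic."""
    return hs_mbit_s / total_mbit_s


# Hypothetical values in Mbit/s, chosen only to illustrate the point.
earlier = {"hs": 400.0, "total": 10000.0}
later = {"hs": 500.0, "total": 20000.0}

earlier_share = hs_fraction(earlier["hs"], earlier["total"])
later_share = hs_fraction(later["hs"], later["total"])

if __name__ == "__main__":
    print("absolute change:", later["hs"] - earlier["hs"], "Mbit/s")
    print("share earlier:", earlier_share, "share later:", later_share)
```

A graph of the absolute number alone would show the curve rising in this scenario, which is exactly the reading to guard against.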
Which assumptions from the blog post do you think are going to become less right in the future? Because I'd much rather have the graph that tells us the answer to the research question.
I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.
We could also imagine a cron job that generates the graphs somewhere (e.g. people.tp.o), and an "advanced" link from the hs metrics page to those graphs. To make it clearer that it's informal and not something we'll necessarily include forever.
--Roger
On 12/03/15 21:26, Roger Dingledine wrote:
On Thu, Mar 12, 2015 at 06:01:13PM +0000, George Kadianakis wrote:
Karsten Loesing karsten@torproject.org writes:
The question is, what graphs do we want on Metrics? How about:
- Total hidden-service traffic in Mbit/s (per day, using weighted interquartile mean, like lower graph on page 1 of the PDF)
- Unique .onion addresses (per day, using weighted interquartile mean, like upper graph on page 1 of the PDF)
- Fraction of relays reporting hidden-service statistics (containing both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)
I think these are indeed the essential graphs here. Let's proceed with these for now!
Sounds great. I'm really excited to see these graphs up on the metrics page.
That said, for the "total hidden-service traffic" one... we want to know that, but we also want to know what fraction that is of total traffic, yes? I could imagine a graph with total hidden-service traffic and also with total traffic; but then the smaller curve will be around y=0 and not easy to see. What would you all think about a graph of the estimated fraction of total traffic that is hidden-service traffic, instead of graph #1 above?
Note that I left out "fraction of traffic", because we can't guarantee that the many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.
Oh. Yes, this is exactly the same question. Hm. I think the "number of hidden-service related bytes" is going to go up over time, and make it really easy for people to mis-conclude "hidden-service related bytes are getting to be more of Tor's traffic" which is not what that means.
Which assumptions from the blog post do you think are going to become less right in the future? Because I'd much rather have the graph that tells us the answer to the research question.
The main assumption was that exits only handle exit traffic and that non-exits (relays without the Exit flag) don't handle exit traffic at all.
Basically, if we want to make this graph available, we'll first have to come up with a reliable metric for traffic exiting the network.
I left out the absolute-hidden-service-traffic graph for now, but it's not hard to add it later.
I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.
We could also imagine a cron job that generates the graphs somewhere (e.g. people.tp.o), and an "advanced" link from the hs metrics page to those graphs. To make it clearer that it's informal and not something we'll necessarily include forever.
Let's first come up with graphs without necessarily automating them, and then let's discuss where they fit in best.
All the best, Karsten
On 12/03/15 19:01, George Kadianakis wrote:
Karsten Loesing karsten@torproject.org writes:
Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script that plots what's in hidserv.csv than to add yet one more thing to Metrics.
I see what you are saying here.
At the same time, there are some technical graphs that I'd like to monitor over time, and I bet other researchers would enjoy them too. Unfortunately, those graphs are not particularly interesting to the general public or the press. Still, having them on a website and getting them updated on a daily basis would really help to monitor them in the long term.
This might be a bit off-topic but here are some examples of such graphs:
a) Boxplot graphs with probabilities for guards/IPs/HSDirs etc. I'd like this to monitor the outliers over time. I think these graphs would reveal some interesting big probabilities and maybe reveal attacks or find bugs. I think an old version of the extrapolation tech report used to have boxplots like this for RPs and HSDirs.
b) Related to the above, I'd like to see boxplot graphs with reported bandwidth by relays. I have heard that adversarial relays sending fake high reported bandwidth is still a good way to get good probabilities during path selection.
I think a new tab on metrics called "Advanced" with such research graphs would be helpful. Maybe.
We briefly talked about this on IRC. There are already graphs quite similar to what you want in b):
https://metrics.torproject.org/advbwdist-perc.html
https://metrics.torproject.org/advbwdist-relay.html
The graph you describe in a) is more difficult. I haven't made up my mind entirely here, but I think this graph doesn't belong on Metrics, but rather on a site or service dedicated to monitoring the network for attacks or bugs. Philipp's Sybil attack detector comes to mind.
But more generally, when we think about adding graphs to Metrics and automating them, we should start with manually-created graphs first. If we like them enough that we want to automate them, putting them on Metrics is an option.
All the best, Karsten