[metrics-team] Tor Primer for Data Scientists
phg at gmx.li
Thu Jan 7 16:24:42 UTC 2016
I skimmed through the State of the Onion talk, which is quite a call
for action on metrics. Getting my hands dirty with the data turned out to
be less easy than I was hoping, so I asked for hints in #tor-dev. Karsten
had the presence of mind to ask for some notes on my short journey, and
in the following I will summarize my thoughts :-)
metrics.torproject.org turned out to be very helpful, particularly the rather
hidden about page that contains a glossary and FAQs. I think making the
about page more prominent and extending it with a short, but technically
complete introduction to Tor would make it a great Tor Primer for Data
Scientists.
I think this primer should also contain a description of what is measured
and where that data can be found (preferably in documented CSV files or
similar -- the Python/Go/Java libs look a bit scary to me ...) and discuss
how data was used to detect attacks (e.g., explain the source, the axes, and
the meaning of the graph at minute 24 of the talk). Ideally, it would
conclude with a discussion of current challenges and some references for
further reading (e.g. research papers).
The metrics themselves would be much more accessible if they contained
links to the plotted data (and maybe also to the corresponding R/... files
in the git repo).
Let me finish with some questions I still have after going through
Wikipedia and the Tor Project site, as further inspiration:
- What is the difference between a node and a relay?
- How can the client know the full route while each relay only knows the
previous and next?
- Is the route changed at a certain interval or for every request?
- How do you become an authority relay?
- How are bridge relays kept secret and found when needed?
Philipp (aka qiv)