[metrics-team] Tor Primer for Data Scientists

Karsten Loesing karsten at torproject.org
Fri Jan 8 09:23:14 UTC 2016


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/01/16 17:24, Philipp Germann wrote:
> Hi there,

Hi Philipp,

> I skipped through the state of the onion talk [1], which is quite a
> call for action on metrics. Getting my hands dirty with the data
> turned out to be less easy than I was hoping, so I asked for hints
> in #tor-dev. Karsten had the presence of mind to ask for some notes
> on my short journey [2] and in the following I will summarize my
> thoughts :-)

Thanks for writing down your thoughts.  Let me try to answer your
questions below and in the next step improve the documentation to make
this easier for the next person trying to get up to speed on Tor metrics.

> metrics.torproject.org turned out very helpful, particularly the
> rather hidden about page that contains a glossary and FAQs. I think
> making the about page more prominent and extending it with a short,
> but technically complete introduction to tor would make it a great
> Tor Primer for Data Scientists.

I agree that we need to address data scientists better on that page.
We'll just have to keep in mind that 90% of visitors care about the
graphs, not about how they are made.  But I'm optimistic that we can
find a middle way that makes both audiences happy.  Adding "Make About
page more prominent" and "Add short introduction to Tor or link to
one" to the list of possible website improvements.

> I think this primer should also contain a description of what is
> measured

The first (frequently asked) questions on the About page attempt to
answer that.  I agree that the FAQ format is limited and that it might
need to be complemented by an overview of some kind.  Adding "Consider
adding an overview in addition to FAQ" to the list.

> and where that data can be found (preferably in documented csv
> files or similar -- the python/Go/Java libs look a bit scary to me
> ...)

Did you also look at the CollecTor service at
https://collector.torproject.org/ that documents raw file formats?

Regarding the request for .csv files, we're currently working on a
converter that turns Tor descriptors into .json files.  That might be
easier to use for you than the text-based descriptors, and you
wouldn't have to use a library for that.  The converter is not ready
yet, but we might hear news on that at the next team meeting.  It's
part of the Analytics Project (briefly mentioned at minute 25 in the
32C3 talk).
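
Until the converter is ready, here is a minimal Python sketch of the
general idea, assuming the simple "keyword arguments" line layout used
by text-based descriptors.  The sample descriptor is made up and heavily
shortened, the function names are hypothetical, and the resulting JSON
layout is not the format the converter will produce.

```python
import json

# Made-up, heavily shortened sample in the keyword-line style of Tor
# server descriptors; real descriptors contain many more lines.
SAMPLE_DESCRIPTOR = """\
router ExampleRelay 192.0.2.1 9001 0 9030
platform Tor 0.2.7.6 on Linux
published 2016-01-08 09:00:00
bandwidth 1048576 2097152 524288
"""


def descriptor_to_dict(text):
    """Map each line's keyword to the rest of the line (simplified).

    Assumption: every keyword occurs at most once.  Real descriptors
    repeat some keywords, in which case later lines would overwrite
    earlier ones here.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        keyword, _, arguments = line.partition(" ")
        result[keyword] = arguments
    return result


def descriptor_to_json(text):
    """Serialize the parsed descriptor as a JSON object."""
    return json.dumps(descriptor_to_dict(text), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(descriptor_to_json(SAMPLE_DESCRIPTOR))
```

For anything beyond a quick look, one of the existing parsing libraries
is likely the safer choice, because they handle repeated keywords,
signatures, and the other details this sketch ignores.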

> and discuss how data was used to detect attacks (e.g. explain the
> source, the axis and the meaning of the graph at min. 24 in the
> talk [1]).

We have been discussing putting those graphs on Tor Metrics, which
would come with a description of all those things.  It's on my todo
list somewhere here to move that one forward.

> Ideally it would conclude with a discussion of current challenges
> and some references for further reading (e.g. research papers).

That one is tricky.  I think we're already failing to keep such a list
up to date for Tor in general; the one on
https://research.torproject.org/ideas.html is somewhat outdated, and
I'm not very optimistic that we'd do a much better job for
metrics-related ideas.  The more realistic plan would be to invite
people to come to us with their preferences and skills, and then we go
find one or three possible projects for them.

> The metrics themselves would be much more accessible if they
> contained links to the plotted data (and to the corresponding R/...
> files in the git repo maybe too).

Well, links to the plotted data are contained on the graph pages.
Look at the "Related metrics" section below graphs, in particular the
links starting with "Data:".  I'll put another task "Make link to .csv
files more prominent on graph pages" on the list.
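
Once you have downloaded one of those .csv files, Python's standard
library is enough for a first look.  The following is only a sketch:
the sample data is made up, and the column names "date", "country",
and "users" are assumptions for illustration, so please check the
header line of the file you actually downloaded.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data; the column names are assumptions and may differ
# from the header of a real .csv file.
SAMPLE_CSV = """\
date,country,users
2016-01-01,de,100
2016-01-01,us,250
2016-01-02,de,110
2016-01-02,us,240
"""


def total_per_date(csv_text, value_column="users"):
    """Sum the given numeric column over all rows sharing a date."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["date"]] += int(row[value_column])
    return dict(totals)


if __name__ == "__main__":
    for date, total in sorted(total_per_date(SAMPLE_CSV).items()):
        print(date, total)
```

To use it with a real file, read the file contents into a string and
pass the name of the column you want to sum as value_column.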

As for adding links to the graphing code, I hadn't considered that.
I'll add that as another task "Add links to graphing code to graph pages".

> Let me finish with some questions I still have after going through
> Wikipedia and the Tor Project site as further inspiration:

Happy to answer these questions, but note that we're not authoring
Tor's Wikipedia page, so if your questions only arose from reading the
Wikipedia page, there's not much we can do about that (other than
telling you everything you'd want to know on *.torproject.org pages).

> - What is the difference between a node and a relay?

Depends on who you ask, but for me, a Tor node is any participant in
the Tor network, which can be a Tor relay, a Tor bridge, or a Tor
client.  Other people would dispute that Tor clients count as Tor nodes.

Adding another task "Add definition for node to frequently used terms".

> - How can the client know the full route while each relay only
> knows the previous and next?

Because the client selects relays (and possibly a bridge) for the
route, whereas the relay only receives an encrypted message from the
previous node, decrypts it, and sends another encrypted message to the
next node.  The relay doesn't have to know where the message
originally came from or where it will go in the end.
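
The layering can be illustrated with a small Python sketch.  Please
treat it as a toy model under loud assumptions: the XOR "cipher" built
from SHA-256, the pre-shared keys, the relay names, and the "next
hop|message" framing are all invented for illustration and are not the
cryptography or the cell format Tor uses.  It only shows that the
client, knowing all keys, can wrap a message once per relay, while each
relay can remove exactly one layer and learns nothing but the next hop.

```python
import hashlib


def keystream_xor(key, data):
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.

    For illustration only; this is NOT the cryptography Tor uses.
    Applying it twice with the same key restores the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(route, destination, payload):
    """Client side: wrap payload once per relay, innermost layer first.

    route is a list of (relay_name, key) pairs in path order.  Returns
    the first relay to contact and the fully wrapped message.
    """
    next_hop = destination
    message = payload
    for name, key in reversed(route):
        message = keystream_xor(key, next_hop.encode() + b"|" + message)
        next_hop = name
    return next_hop, message


def peel_layer(key, message):
    """Relay side: remove one layer and learn only the next hop."""
    plain = keystream_xor(key, message)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


if __name__ == "__main__":
    # Hypothetical three-hop route with made-up keys.
    route = [("guard", b"k1"), ("middle", b"k2"), ("exit", b"k3")]
    hop, message = build_onion(route, "example.org", b"GET /")
    for name, key in route:
        assert hop == name
        hop, message = peel_layer(key, message)
        print(name, "forwards to", hop)
    print("delivered:", message)
```

Note that the middle relay in this sketch only ever sees the names
"guard" (its sender) and "exit" (its next hop), never the client's
identity or the destination.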

> - Is the route changed at a certain interval or for every request
> (website)?

A bit of both: the client switches to new routes, called circuits,
over time, but a single circuit can also be reused for several
requests.

> - How do you become an authority relay?

By becoming a well-known member of the Tor community and then deciding
together with the other operators of authority relays that you should
be running one.

> - How are bridge relays kept secret and found when needed?

Bridges register themselves at the bridge authority, which distributes
their addresses in small batches of three via a website or an email
autoresponder to users.  This is in contrast to relays, which are
publicly listed.
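
To show why handing out small batches helps keep the full list secret,
here is a hypothetical Python sketch.  The selection rule (ranking
bridges by a hash of requester and bridge address) is my assumption
for illustration only and is not a description of how the actual
distribution is implemented; the addresses are documentation-only
placeholders.

```python
import hashlib

BATCH_SIZE = 3  # "small batches of three addresses"

# Placeholder addresses from the documentation range 192.0.2.0/24.
EXAMPLE_BRIDGES = ["192.0.2.%d:443" % i for i in range(1, 11)]


def bridges_for(requester, bridges, batch_size=BATCH_SIZE):
    """Pick a small, stable batch of bridges for one requester.

    Hypothetical rule: rank bridges by SHA-256 of requester and bridge
    address, then return the first batch_size entries.  The same
    requester always gets the same batch, so asking again reveals no
    additional bridges.
    """
    def score(bridge):
        return hashlib.sha256((requester + "|" + bridge).encode()).hexdigest()

    return sorted(bridges, key=score)[:batch_size]


if __name__ == "__main__":
    for requester in ("alice@example.com", "bob@example.com"):
        print(requester, bridges_for(requester, EXAMPLE_BRIDGES))
```

With a rule like this, a single requester only ever learns three of
the ten addresses, whereas the public relay list can be downloaded by
anyone in full.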

I think we'll have to link to the main Tor website for those last few
questions.  Adding "Include link to relevant introductory materials
about Tor in general" to the list.

Here are the tasks I mentioned above:

 - Make About page more prominent.
 - Add short introduction to Tor or link to one.
 - Consider adding an overview in addition to FAQ.
 - Make link to .csv files more prominent on graph pages.
 - Add links to graphing code to graph pages.
 - Add definition for node to frequently used terms.
 - Include link to relevant introductory materials about Tor in general.

Let me know if I should rephrase something or add something else.

> Cheers, Philipp (aka qiv)

All the best,
Karsten


> 1. https://www.youtube.com/watch?v=EXEUE__ap08
> 2. http://pastebin.com/D1k5bTZf
> 
> 
> 
> _______________________________________________ metrics-team
> mailing list metrics-team at lists.torproject.org 
> https://lists.torproject.org/cgi-bin/mailman/listinfo/metrics-team
> 

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJWj4ACAAoJEJD5dJfVqbCrItQH/RZIdvBRy3FMuacK4HzobFTt
DfJRuA6/y0zM2x+psFa7OzCiBVhV+ws4/PSDH1bOAX6WZRne4selBSNslHGrx5Xg
CxKcD+U+ffs7jAC9dZfM32VX3aEV9tOSUIAdaUOTSMmQONOG0wpEEdnSAwwqKby3
5o1qRFKWbJkyQOrgU2XwxLAHjc/tgksFVxzT/4qYbb338eejGs6ZN7MelvxrlnP8
JlU6p4lHB/12xmVwFmqmUIlMmqYSW0Gh+16jy1WoFE68bH220i3wGgHxDUCmgJ8I
v8RHS8ftpfR912XwvPgAR+rPPHQvkTE7Wp8eEhC2EsncOUAmw+xob19oLYEmSDY=
=N509
-----END PGP SIGNATURE-----

