[metrics-team] OnionStats

Anathema anathema at anche.no
Tue May 31 00:17:40 UTC 2016


Hi everyone,
I took part in the last metrics team meeting on IRC, where I
"presented" a new tool. Since I won't be able to attend the next
meeting, I'm going to present the full project here.

A little bit of background:
I was trying to find out some information about Tor node stats, like
"how many nodes from country X have been activated in the last Y months"
or "how many hosts with hostname X on platform Y are there", and so on.
I found Atlas really helpful in some respects but not so good in others;
mainly, I was not able to answer the questions above.
On top of that, Atlas has some cons:
- it's slow
- it returns limited results
- its query language doesn't allow complex and combined queries

So I started writing a tool that fulfills my requirements. When it was
almost finished, I thought: "well, this is cool, maybe the Tor community
would be interested in it". And here we are.

What's all this about? OnionStats
First, a note: the software described below could be integrated into Atlas.
I created a new one from scratch because it was easier for me, but if we
don't want to run two services we can think about integrating mine into
Atlas or Atlas into mine.

So the name is OnionStats because, you know, Tor, Onionoo, onions. (it
was TorStats but then Karsten suggested a better name :)

The software stack is as follows:
- Semantic UI + jQuery as the frontend
- Tornado as the backend
- MongoDB as the DBMS
- Elasticsearch as the search engine

Here is the link to a live instance: http://138.201.90.124:8080   (it's
a cheap VPS so it may be slow due to a lack of resources - please be
gentle and don't hammer it).

How things work:
Basically, a Python script runs in the background (via cron)
every 12h. It fetches the node information using the Onionoo protocol
and saves it into the MongoDB schema.
mongodb-collector then automatically pushes the data into Elasticsearch.
When you search through the web UI, the backend runs an Elasticsearch
query and returns the results to the web UI, which displays them.
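
To make the collection step more concrete, here is a minimal sketch of
what such a script could look like. It is not the actual OnionStats
code: the function names, the list of kept fields and the use of the
relay fingerprint as the MongoDB _id are all assumptions made for
illustration. The store() function only assumes a pymongo-style
collection object with a replace_one() method.

```python
import json
import urllib.request

# Public Onionoo details endpoint.
ONIONOO_DETAILS_URL = "https://onionoo.torproject.org/details"

# Hypothetical subset of Onionoo relay fields worth keeping for searches.
KEPT_FIELDS = (
    "nickname", "fingerprint", "country", "platform",
    "host_name", "first_seen", "last_seen", "running", "flags",
)


def fetch_details(url=ONIONOO_DETAILS_URL, timeout=60):
    """Download and parse the Onionoo details document."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def relays_to_documents(details):
    """Turn the 'relays' list of a details document into storable dicts.

    Relays without a fingerprint are skipped; the fingerprint is reused
    as '_id' so that a later run overwrites the same record (assumption).
    """
    docs = []
    for relay in details.get("relays", []):
        fingerprint = relay.get("fingerprint")
        if not fingerprint:
            continue
        doc = {field: relay[field] for field in KEPT_FIELDS if field in relay}
        doc["_id"] = fingerprint
        docs.append(doc)
    return docs


def store(collection, docs):
    """Upsert every document into a pymongo-style collection."""
    for doc in docs:
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return len(docs)
```

A cron job would then call something like
store(collection, relays_to_documents(fetch_details())) on each run,
with collection being the MongoDB collection that gets synced to
Elasticsearch.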

Easy, clean, fast.

Pros:
- it's fast. Really.
- huge results cap: I've hardcoded a limit of 2000 results per query for
testing, but it can easily be increased in production with better hardware.
- easy to audit: Atlas is built with AngularJS, which is great, but for
someone who doesn't know anything about it, there's a steep learning
curve. I think that's a bit overkill. My code is just plain jQuery and
DataTables. That's all I needed.
- complex queries: it's possible to leverage almost all of the
Elasticsearch syntax features. More information in the "Syntax" section.
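
As an illustration of the kind of combined query this allows, here is a
rough sketch of how the backend could build an Elasticsearch request
body for "nodes from country X first seen in the last Y months,
optionally on platform Z". The function name and the field names
(country, first_seen, platform) are assumptions based on the Onionoo
field names, and the range filter assumes first_seen is mapped as a date
field in Elasticsearch. It is not the actual OnionStats code.

```python
# Results cap mentioned above, hardcoded for testing.
MAX_RESULTS = 2000


def build_query(country, months, platform=None, size=MAX_RESULTS):
    """Build an Elasticsearch search body combining several conditions.

    country:  two-letter country code, e.g. "de"
    months:   only nodes first seen within this many months
    platform: optional text to match in the platform string
    size:     number of hits to return, never more than MAX_RESULTS
    """
    filters = [
        {"term": {"country": country.lower()}},
        # "now-6M" is Elasticsearch date math for "six months ago".
        {"range": {"first_seen": {"gte": "now-{}M".format(int(months))}}},
    ]
    if platform:
        filters.append({"match": {"platform": platform}})
    return {
        "size": min(max(int(size), 0), MAX_RESULTS),
        "query": {"bool": {"filter": filters}},
    }
```

For example, build_query("de", 6, platform="Linux") produces a body with
three filters that can be sent to the search endpoint of the index; the
total hit count in the response then answers the "how many" question.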

Cons:
- the data is only updated every 12h. In the IRC meeting someone told me
that I can decrease the sleep time, so it may be possible to reduce this
to every 6h or maybe 1h?

There are a few HTML glitches, for which I apologize: I'm not a frontend
coder, and I'll try to fix them ASAP.

I haven't pushed the code to my <github|bitbucket> repository yet, but if
you want to take a look at the code I'm more than happy to publish it;
just let me know which of the two you prefer, or if you prefer another
way of sharing (like a link to a tarball on the server).

I hope you like it. I'd be more than happy to help integrate it into
Atlas or integrate some of Atlas' features into OnionStats (and maybe
find a better name :)

Let me know what you think.

Thank you,
Regards


-- 
Anathema

+--------------------------------------------------------------------+
|GPG/PGP KeyID: CFF94F0A available on http://pgpkeys.mit.edu:11371/  |
|Fingerprint: 80CE EC23 2D16 143F 6B25  6776 1960 F6B4 CFF9 4F0A     |
|                                                                    |
|https://keybase.io/davbarbato                                       |
+--------------------------------------------------------------------+



