New subject: Extending Pyonionoo to provide statistics

13 Oct 2012

Hi Karsten. This is actually really good timing. As you probably
noticed from tor-commits@ I just pushed support for network status
documents (v2 and v3 votes, consensus, and microdescriptor
consensuses). It's the combination of a month of work from Ravi and a
couple more from me.

Stem's descriptor parsing functionality is one of its most well
developed features and I'm anxious for it to start to get some users.
The feature gap between stem and metrics-lib is now pretty small, so
I'd love for it to start to take over some of metrics-lib's
responsibilities (and hopefully in turn get direct involvement from
you and Sathyanarayanan so it'll better meet your needs).

> We would start with daily users, both direct and bridge users, and later add
> aggregate statistics about relays and bridges, and after that torperf
> performance statistics.

Gotcha. As I understand it the document types that stem is still missing are...

* microdescriptors
* bridge pool assignments
* exit list entries
* torperf output

So what you'll need from me is primarily the ability to parse torperf
output? Or is there another document type that I'm missing?

> I think we'd need Damian's help for the descriptor-parsing side

Happy to help, though I'm not entirely sure yet what kind of
additional parsing support you'll need.

> Sathya, Damian, can you already say how much of 2013 you will be around
> for doing Tor stuff and how much time you think you could spend on this
> project?

For my part a fair bit. This is, of course, a hobby that I do outside
of a full time job so work and life might get in the way. That said,
any project that includes collaborating with other developers to make
use of stem goes to the top of my todo list.

> I'm thinking one ticket for implementing
> usage statistics in Pyonionoo, a second ticket for the censorship
> detector integration which depends on the first ticket, and two more
> tickets for network statistics and torperf statistics integration into
> Pyonionoo.  I'd probably suggest the first two tickets only for trying
> to get funding from sponsor F year 3 and save the other two tickets for
> later.

Hmm, aren't those last two tickets the only ones where I'd be
involved? It sounds like this project mostly concerns pyonionoo so I
guess we should wait for input from Sathyanarayanan...

Cheers! -Damian

PS. Sathyanarayanan: I was gonna ping you separately but might as well
hijack this thread - would you mind giving stem's new networkstatus
module a try in pyonionoo? You can find documentation for it at...

https://stem.readthedocs.org/en/latest/stem.descriptor.html#module-stem.desc...

... let me know if you run into any issues!

PPS. Grrr, their pydocs have some rendering bugs... >:(

On Fri, Oct 12, 2012 at 9:57 AM, Karsten Loesing <karsten@torproject.org> wrote:
> Hi Sathya, hi Damian,
>
> I have been working on improving bridge usage statistics in the past
> weeks, and I discussed my results with George in the context of
> improving the censorship detector.  We concluded that we need to write a
> fair amount of new code and that it would be nice to integrate this new
> code into Pyonionoo.
>
> Let me explain in more detail: I'm not yet done with the bridge usage
> analysis, but early results are that we can calculate daily bridge users
> similar to how we calculate daily direct users.  We need a more precise
> implementation than what we have in metrics-web though.  So, we either
> need to redesign the complex beast called metrics-web, or start from
> scratch, ideally in a programming language that attracts more potential
> contributors.
>
> The censorship detector takes the output of metrics-web's daily direct
> usage numbers and tries to detect sudden drops or increases in usage.
> It would be good to also look at bridge usage for censorship events, and
> it would be quite important to have a tighter integration of the
> censorship detector into the usage-calculating process and look at raw
> numbers there.  Right now, the vast majority of events that the
> censorship detector reports are just false positives.
>
> We first thought about implementing this in a new Python project that
> has several modules: the first module would calculate daily users, the
> second would determine possible censorship events, the third would
> export data via JSON, the fourth would graph results on a website, and
> the fifth would send out email notifications.
>
> However, that codebase would share a lot of functionality with
> Onionoo/Pyonionoo.  We could instead extend Pyonionoo and Atlas/Compass
> to not only provide network status data, but also statistics about the
> Tor network and its usage.  We would start with daily users, both direct
> and bridge users, and later add aggregate statistics about relays and
> bridges, and after that torperf performance statistics.  Once we're
> there we can retire a large part of metrics-web.
>
> This project is huge, which is why I'm thinking we should apply for
> funding.  I expect the project to take at least 6 months of developer
> time, in addition to the time to make Pyonionoo a full replacement of
> Onionoo.  Before we ask for money, we should have a very rough idea
> who's going to work on this project.  Once we have the money, we'll
> actually have to do it, and I can't do this all by myself.
>
> I think we'd need Damian's help for the descriptor-parsing side and
> general help with Python and Sathya to make Pyonionoo modular enough to
> provide all these different features.  I'd either write or help write
> the user-counting code, and I think George would help with integrating
> and improving the existing censorship-detecting code.
>
> Sathya, Damian, can you already say how much of 2013 you will be around
> for doing Tor stuff and how much time you think you could spend on this
> project?  If the answer is "maybe not very much", by all means, please
> say so.
>
> If there's at least some interest in this project, I'd create Trac
> tickets in the next few days.  I'm thinking one ticket for implementing
> usage statistics in Pyonionoo, a second ticket for the censorship
> detector integration which depends on the first ticket, and two more
> tickets for network statistics and torperf statistics integration into
> Pyonionoo.  I'd probably suggest the first two tickets only for trying
> to get funding from sponsor F year 3 and save the other two tickets for
> later.
>
> Thanks,
> Karsten
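
For readers following the thread: Karsten describes the censorship detector as taking daily usage numbers and looking for sudden drops or increases. The sketch below illustrates that general idea only. It is not the actual detector's algorithm; the function name flag_sudden_changes, the 7-day lag, the ratio thresholds, and the sample numbers are all assumptions made up for this example.

```python
from typing import List, Tuple


def flag_sudden_changes(daily_users: List[float],
                        lag: int = 7,
                        low: float = 0.5,
                        high: float = 2.0) -> List[Tuple[int, float]]:
    """Return (day_index, ratio) for days whose user count changed sharply
    relative to the count `lag` days earlier.

    Hypothetical helper: the lag and thresholds are placeholders, not
    values taken from the real censorship detector.
    """
    flagged = []
    for day in range(lag, len(daily_users)):
        baseline = daily_users[day - lag]
        if baseline <= 0:
            continue  # no usable baseline to compare against
        ratio = daily_users[day] / baseline
        if ratio < low or ratio > high:
            flagged.append((day, ratio))
    return flagged


# Made-up daily user estimates for one country (not real data).
series = [1000, 1020, 990, 1010, 1005, 995, 1000, 1010, 400, 1000]
events = flag_sudden_changes(series)
```

A fixed-ratio check like this reacts to any noise in the estimates, which is in line with the false-positive problem mentioned in the message; working from the raw numbers inside the usage-calculating process, as proposed there, is one way a real implementation could do better.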

Re: [tor-dev] Extending Pyonionoo to provide statistics