[tor-dev] Extending Pyonionoo to provide statistics

Damian Johnson atagar at torproject.org
Sat Oct 13 20:30:08 UTC 2012


Hi Karsten. This is actually really good timing. As you probably
noticed from tor-commits@ I just pushed support for network status
documents (v2 and v3 votes, consensus, and microdescriptor
consensuses). It's the combination of a month of work from Ravi and a
couple more from me.

Stem's descriptor parsing is one of its best-developed features, and
I'm eager for it to start getting some users.
The feature gap between stem and metrics-lib is now pretty small, so
I'd love for it to start to take over some of metrics-lib's
responsibilities (and hopefully in turn get direct involvement from
you and Sathyanarayanan so it'll better meet your needs).

> We would start with daily users, both direct and bridge users, and later add
> aggregate statistics about relays and bridges, and after that torperf
> performance statistics.

Gotcha. As I understand it, the document types that stem is still missing are...

* microdescriptors
* bridge pool assignments
* exit list entries
* torperf output

So what you'll need from me is primarily the ability to parse torperf
output? Or is there another document type that I'm missing?

> I think we'd need Damian's help for the descriptor-parsing side

Happy to help, though I'm not entirely sure yet what kind of
additional parsing support you'll need.

> Sathya, Damian, can you already say how much of 2013 you will be around
> for doing Tor stuff and how much time you think you could spend on this
> project?

For my part, a fair bit. This is, of course, a hobby that I do outside
of a full-time job, so work and life might get in the way. That said,
any project that includes collaborating with other developers to make
use of stem goes to the top of my todo list.

> I'm thinking one ticket for implementing
> usage statistics in Pyonionoo, a second ticket for the censorship
> detector integration which depends on the first ticket, and two more
> tickets for network statistics and torperf statistics integration into
> Pyonionoo.  I'd probably suggest the first two tickets only for trying
> to get funding from sponsor F year 3 and save the other two tickets for
> later.

Hmm, aren't those last two tickets the only ones where I'd be
involved? It sounds like this project mostly concerns pyonionoo, so I
guess we should wait for input from Sathyanarayanan...

Cheers! -Damian

PS. Sathyanarayanan: I was gonna ping you separately but might as well
hijack this thread - would you mind giving stem's new networkstatus
module a try in pyonionoo? You can find documentation for it at...

https://stem.readthedocs.org/en/latest/stem.descriptor.html#module-stem.descriptor.networkstatus

... let me know if you run into any issues!
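
For anyone who wants a starting point, here is a minimal sketch of pulling
an aggregate relay statistic (how many relays carry each flag) out of a
consensus with stem. It is only an illustration, not code from pyonionoo:
the helper names count_flags and count_consensus_flags and the consensus
path are made up, and it assumes a stem release where
stem.descriptor.parse_file accepts an explicit descriptor type and yields
router status entries with a 'flags' attribute.

```python
from collections import Counter


def count_flags(entries):
    """Tally how many entries carry each flag (e.g. 'Running', 'Exit')."""
    counts = Counter()
    for entry in entries:
        counts.update(entry.flags)
    return counts


def count_consensus_flags(path):
    """Count relay flags in a v3 consensus file at a caller-supplied path."""
    # Imported here so count_flags() stays usable without stem installed.
    from stem.descriptor import parse_file

    with open(path, 'rb') as consensus_file:
        # Assumption: the default handler yields one router status entry
        # per relay listed in the consensus.
        entries = parse_file(consensus_file, 'network-status-consensus-3 1.0')
        return count_flags(entries)
```

Calling count_consensus_flags() on a locally cached consensus should then
give a Counter keyed by flag name, which is roughly the shape of data an
aggregate relay statistic would start from.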

PPS. Grrr, their pydocs have some rendering bugs... >:(

On Fri, Oct 12, 2012 at 9:57 AM, Karsten Loesing <karsten at torproject.org> wrote:
> Hi Sathya, hi Damian,
>
> I have been working on improving bridge usage statistics in the past
> weeks, and I discussed my results with George in the context of
> improving the censorship detector.  We concluded that we need to write a
> fair amount of new code and that it would be nice to integrate this new
> code into Pyonionoo.
>
> Let me explain in more detail: I'm not yet done with the bridge usage
> analysis, but early results are that we can calculate daily bridge users
> similar to how we calculate daily direct users.  We need a more precise
> implementation than what we have in metrics-web though.  So, we either
> need to redesign the complex beast called metrics-web, or start from
> scratch, ideally in a programming language that attracts more potential
> contributors.
>
> The censorship detector takes the output of metrics-web's daily direct
> usage numbers and tries to detect sudden drops or increases in usage.
> It would be good to also look at bridge usage for censorship events, and
> it would be quite important to have a tighter integration of the
> censorship detector into the usage-calculating process and look at raw
> numbers there.  Right now, the vast majority of events that the
> censorship detector reports are just false positives.
>
> We first thought about implementing this in a new Python project that
> has several modules: the first module would calculate daily users, the
> second would determine possible censorship events, the third would
> export data via JSON, the fourth would graph results on a website, and
> the fifth would send out email notifications.
>
> However, that codebase would share a lot of functionality with
> Onionoo/Pyonionoo.  We could instead extend Pyonionoo and Atlas/Compass
> to not only provide network status data, but also statistics about the
> Tor network and its usage.  We would start with daily users, both direct
> and bridge users, and later add aggregate statistics about relays and
> bridges, and after that torperf performance statistics.  Once we're
> there we can retire a large part of metrics-web.
>
> This project is huge, which is why I'm thinking we should apply for
> funding.  I expect the project to take at least 6 months of developer
> time, in addition to the time to make Pyonionoo a full replacement of
> Onionoo.  Before we ask for money, we should have a very rough idea
> who's going to work on this project.  Once we have the money, we'll
> actually have to do it, and I can't do this all by myself.
>
> I think we'd need Damian's help for the descriptor-parsing side and
> general help with Python and Sathya to make Pyonionoo modular enough to
> provide all these different features.  I'd either write or help write
> the user-counting code, and I think George would help with integrating
> and improving the existing censorship-detecting code.
>
> Sathya, Damian, can you already say how much of 2013 you will be around
> for doing Tor stuff and how much time you think you could spend on this
> project?  If the answer is "maybe not very much", by all means, please
> say so.
>
> If there's at least some interest in this project, I'd create Trac
> tickets in the next few days.  I'm thinking one ticket for implementing
> usage statistics in Pyonionoo, a second ticket for the censorship
> detector integration which depends on the first ticket, and two more
> tickets for network statistics and torperf statistics integration into
> Pyonionoo.  I'd probably suggest the first two tickets only for trying
> to get funding from sponsor F year 3 and save the other two tickets for
> later.
>
> Thanks,
> Karsten

