Hi Karsten. This is actually really good timing. As you probably noticed from tor-commits@ I just pushed support for network status documents (v2 and v3 votes, consensus, and microdescriptor consensuses). It's the combination of a month of work from Ravi and a couple more from me.
Stem's descriptor parsing functionality is one of its best-developed features, and I'm anxious for it to start getting some users. The feature gap between stem and metrics-lib is now pretty small, so I'd love for it to start taking over some of metrics-lib's responsibilities (and hopefully in turn get direct involvement from you and Sathyanarayanan so it'll better meet your needs).
We would start with daily users, both direct and bridge users, and later add aggregate statistics about relays and bridges, and after that torperf performance statistics.
Gotcha. As I understand it the document types that stem is still missing are...
* microdescriptors
* bridge pool assignments
* exit list entry
* torperf output
So what you'll need from me is primarily the ability to parse torperf output? Or is there another document type that I'm missing?
I think we'd need Damian's help for the descriptor-parsing side
Happy to help, though I'm not entirely sure yet what kind of additional parsing support you'll need.
Sathya, Damian, can you already say how much of 2013 you will be around for doing Tor stuff and how much time you think you could spend on this project?
For my part, a fair bit. This is, of course, a hobby that I do outside of a full-time job, so work and life might get in the way. That said, any project that includes collaborating with other developers to make use of stem goes to the top of my todo list.
I'm thinking one ticket for implementing usage statistics in Pyonionoo, a second ticket for the censorship detector integration which depends on the first ticket, and two more tickets for network statistics and torperf statistics integration into Pyonionoo. I'd probably suggest the first two tickets only for trying to get funding from sponsor F year 3 and save the other two tickets for later.
Hmm, aren't those last two tickets the only ones where I'd be involved? It sounds like this project mostly concerns pyonionoo, so I guess we should wait for input from Sathyanarayanan...
Cheers! -Damian
PS. Sathyanarayanan: I was gonna ping you separately but might as well hijack this thread - would you mind giving stem's new networkstatus module a try in pyonionoo? You can find documentation for it at...
https://stem.readthedocs.org/en/latest/stem.descriptor.html#module-stem.desc...
... let me know if you run into any issues!
PPS. Grrr, their pydocs have some rendering bugs... >:(
On Fri, Oct 12, 2012 at 9:57 AM, Karsten Loesing <karsten@torproject.org> wrote:
Hi Sathya, hi Damian,
I have been working on improving bridge usage statistics in the past weeks, and I discussed my results with George in the context of improving the censorship detector. We concluded that we need to write a fair amount of new code and that it would be nice to integrate this new code into Pyonionoo.
Let me explain in more detail: I'm not yet done with the bridge usage analysis, but early results are that we can calculate daily bridge users similar to how we calculate daily direct users. We need a more precise implementation than what we have in metrics-web though. So, we either need to redesign the complex beast called metrics-web, or start from scratch, ideally in a programming language that attracts more potential contributors.
The censorship detector takes the output of metrics-web's daily direct usage numbers and tries to detect sudden drops or increases in usage. It would be good to also look at bridge usage for censorship events, and it would be quite important to have a tighter integration of the censorship detector into the usage-calculating process and look at raw numbers there. Right now, the vast majority of events that the censorship detector reports are just false positives.
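To make "sudden drops or increases" concrete, here is a minimal sketch of one way to flag such days. It is not the existing censorship detector's algorithm: the function name `flag_usage_anomalies`, the trailing-median baseline, and the `window` and `threshold` parameters are all assumptions chosen for illustration.

```python
from statistics import median


def flag_usage_anomalies(daily_users, window=7, threshold=0.5):
    """Flag days whose user count deviates sharply from a trailing median.

    Hypothetical illustration only: the baseline (median of the previous
    `window` days) and the relative `threshold` are assumptions, not the
    method the real censorship detector uses.
    Returns a list of (day_index, ratio_to_baseline) tuples.
    """
    anomalies = []
    for i in range(window, len(daily_users)):
        baseline = median(daily_users[i - window:i])
        if baseline <= 0:
            # No meaningful baseline to compare against.
            continue
        ratio = daily_users[i] / baseline
        # A ratio far below 1 is a sudden drop, far above 1 a sudden increase.
        if ratio < 1 - threshold or ratio > 1 + threshold:
            anomalies.append((i, ratio))
    return anomalies


if __name__ == "__main__":
    # Made-up daily user counts with one sharp drop on day 7.
    sample = [1000, 1020, 990, 1010, 1005, 995, 1015, 400, 1000, 1010]
    for day, ratio in flag_usage_anomalies(sample):
        print("day %d: %.2f times the trailing median" % (day, ratio))
```

A simple threshold like this would flag ordinary noise in small countries just as readily as real events, which is the kind of false positive that looking at raw numbers inside the usage-calculating process is meant to reduce.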
We first thought about implementing this in a new Python project that has several modules: the first module would calculate daily users, the second would determine possible censorship events, the third would export data via JSON, the fourth would graph results on a website, and the fifth would send out email notifications.
However, that codebase would share a lot of functionality with Onionoo/Pyonionoo. We could instead extend Pyonionoo and Atlas/Compass to not only provide network status data, but also statistics about the Tor network and its usage. We would start with daily users, both direct and bridge users, and later add aggregate statistics about relays and bridges, and after that torperf performance statistics. Once we're there we can retire a large part of metrics-web.
This project is huge, which is why I'm thinking we should apply for funding. I expect the project to take at least 6 months of developer time, in addition to the time to make Pyonionoo a full replacement of Onionoo. Before we ask for money, we should have a very rough idea who's going to work on this project. Once we have the money, we'll actually have to do it, and I can't do this all by myself.
I think we'd need Damian's help for the descriptor-parsing side and general help with Python, and Sathya's help to make Pyonionoo modular enough to provide all these different features. I'd either write or help write the user-counting code, and I think George would help with integrating and improving the existing censorship-detecting code.
Sathya, Damian, can you already say how much of 2013 you will be around for doing Tor stuff and how much time you think you could spend on this project? If the answer is "maybe not very much", by all means, please say so.
If there's at least some interest in this project, I'd create Trac tickets in the next few days. I'm thinking one ticket for implementing usage statistics in Pyonionoo, a second ticket for the censorship detector integration which depends on the first ticket, and two more tickets for network statistics and torperf statistics integration into Pyonionoo. I'd probably suggest the first two tickets only for trying to get funding from sponsor F year 3 and save the other two tickets for later.
Thanks, Karsten