Damian,
I saw on the tor-dev mailing list you would be happy to have help with Python related Tor work. I'm game. I'd love to help the Tor Project.
Hi Scott. Glad you want to help! These are great questions so I'm looping in Ravi (another developer who's actively hacking on stem) and tor-dev in case other people are interested.
I looked through the Easy bugs for Arm but it seems that Stem is where the effort is going now.
Yup. Arm was experiencing really bad feature creep so I swapped my focus to stem around a year back. Here are the relevant links for stem...
* Development Wiki https://trac.torproject.org/projects/tor/wiki/doc/stem
* Gitweb https://gitweb.torproject.org/stem.git
* Bug Tracker http://tinyurl.com/stem-bugs
Should I focus on Stem? Do you have a list of easier tasks to get my feet wet with?
Sure! We're currently working to make it feature complete and prepare for its initial release. Development tasks tend to fall into a few general categories...
=================== Controller Functionality ===================
Stem is primarily a replacement for TorCtl, a controller library used by several of our other projects (arm, SoaT, TorBEL, etc). Ravi's been focusing on making the Controller class feature complete. This is mostly located in...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/control.py
Some nice introductory tasks here would be...
* Add methods for querying relay descriptors
The first bits of this are very easy, while the later parts get a bit more interesting...
- Add a get_relay() method that calls get_info("desc/id/[fingerprint]") then returns a ServerDescriptor for it.
- Write unit and integ tests for the addition.
- [tricky part] Add a get_all_relays() method that provides an iterator for all of the descriptors. The tricky bit is that the BaseController provides discrete controller messages, but in this case the "GETINFO desc/all" output is so large that we want to process it while it comes in. This means changing stem's underlying message processing system.
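To give a feel for the tricky part, here's a rough sketch of the idea, not how stem works today. It assumes we can somehow get the reply as an iterator of lines (a hypothetical hook, the BaseController doesn't offer one yet), and relies on each server descriptor starting with a "router" line as per the dir-spec. The iter_descriptors() name is made up for illustration.

```python
def iter_descriptors(lines):
    """
    Yields the raw text of each server descriptor as soon as it is complete,
    rather than buffering the whole reply.

    'lines' is any iterable of reply lines. This is a hypothetical input,
    stem's BaseController doesn't expose replies this way yet.
    """

    current = []

    for line in lines:
        line = line.rstrip("\r\n")

        if line.startswith("router "):
            # Each server descriptor begins with a 'router' line, so seeing
            # a new one means the prior descriptor is finished.
            if current:
                yield "\n".join(current)

            current = [line]
        elif current:
            # Lines before the first 'router' line (like the reply's
            # header) are skipped, everything else belongs to the
            # descriptor we're collecting.
            current.append(line)

    if current:
        yield "\n".join(current)
```

Since it's a generator the caller gets the first descriptor as soon as the second one starts arriving. A real get_all_relays() would wrap each chunk in a ServerDescriptor rather than handing back the raw text.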
* Split low and high level controller methods
The Controller class is growing to be pretty bulky, and its methods can generally be split into two categories...
- Low level methods that simply mirror what the control-spec provides.
- Higher level methods (like the get_relay() and get_all_relays() above) which are more user friendly and build on what the control spec provides.
Ravi: Do you have any thoughts on this? Do you think this would be a good way to break up the controller, or would this just make it more confusing? Alternatively we could wait until we try using stem in arm to see if this makes sense or not...
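To make the proposed split a bit more concrete, here's a minimal sketch of one way it could be layered. The class names and the 'send' callable are made up for illustration (they aren't stem's actual classes), and get_relay() just returns the raw reply where the real thing would return a ServerDescriptor.

```python
class LowLevelController(object):
    """Hypothetical low level layer that just mirrors control-spec commands."""

    def __init__(self, send):
        # 'send' stands in for the control socket: it takes a command
        # string and returns the content of tor's reply as a string.
        self._send = send

    def get_info(self, key):
        return self._send("GETINFO %s" % key)


class HighLevelController(LowLevelController):
    """Hypothetical high level layer, built only on the low level calls."""

    def get_relay(self, fingerprint):
        # The real method would parse this into a ServerDescriptor.
        return self.get_info("desc/id/%s" % fingerprint)


# Usage with a fake connection, so no tor instance is needed.
sent_commands = []

def fake_send(command):
    sent_commands.append(command)
    return "router demo 127.0.0.1 9001 0 0"

controller = HighLevelController(fake_send)
descriptor_text = controller.get_relay("ABCD")
```

Using inheritance here is only one option. The high level class could equally hold a reference to the low level one, which would keep the two sets of methods from sharing a namespace.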
=============== Descriptor Parsing ===============
This is where my focus has been for the last couple weeks. A controller needs to be able to parse Tor's descriptor content (GETINFO's desc, ns, and md responses). This has expanded to include descriptor files and other descriptor content, so stem can be used as a Python replacement for MetricsLib. This is useful for projects like Onionoo's Python counterpart.
This is a somewhat dry part of stem to work on, so unless you like reading specs and writing a parser for them, this might not be a great section to work on. That said, if detail-oriented parsing and validation sounds interesting then there's lots o' tasks here.
====== Clients ======
Stem doesn't yet have any clients, and will certainly need some before we make a release in order to work out the kinks. Our plan for this is...
* Port arm's interpreter
  Arm's control interpreter is actually a standalone script that can, with a little work, be used independently from the rest of arm. It would be interesting to see what a stem-based control interpreter would look like...
* Port arm's torTools module
  Arm makes very heavy use of the controller, and abstracts all of its TorCtl usage behind a wrapper module...
  https://gitweb.torproject.org/arm.git/blob/HEAD:/src/util/torTools.py
This abstraction should make arm reasonably easy to move to stem. Also, stem's getting many of that wrapper's features so arm's codebase could be greatly simplified after we move.
* Port TorBEL
  TorBEL would be a great candidate after we work out most of stem's rough edges by moving arm. It'll likely provide some interesting use cases, though this is still a ways out.
======= Usability =======
Stem doesn't yet have a site, nor have I posted its sphinx documentation anywhere. A few things that we should do here are...
* Make a pretty site
  Ideally I'd like to write the site itself with sphinx so we have a consistent look between the front page and the documentation.
* Site with auto-updating documentation
  This would involve setting up a site with a cron process that fetches stem and, if changed, serves up the new documentation.
* Tidy up sphinx docs
  Stem has a lot of documentation, but we haven't really looked much at the sphinx output with an eye for making it developer friendly. It would be nice to have people try to use stem, and improve the documentation for the confusing bits.
* Examples
  We should ask tor-dev@ for a list of bite sized Tor tasks that people commonly want (like "print my relay's bandwidth" or "list all of the relays with the BadExit flag"). Then write example scripts that do this, and provide them on stem's site to give example usage.
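As a taste of what one of those example scripts might boil down to, here's a minimal sketch for the BadExit case. It doesn't use stem at all (the API for this isn't settled yet), it just reads router status entries in the "r" and "s" line format from the dir-spec. The relays_with_flag() name and the sample entries are made up for illustration.

```python
def relays_with_flag(status_lines, flag):
    """
    Provides the nicknames of the relays whose 's' (flags) line includes
    the given flag.
    """

    matches = []
    nickname = None

    for line in status_lines:
        parts = line.split()

        if not parts:
            continue

        if parts[0] == "r" and len(parts) > 1:
            # 'r' lines start a new entry, the nickname is its first field
            nickname = parts[1]
        elif parts[0] == "s" and nickname is not None:
            # 's' lines list the flags for the entry we're currently in
            if flag in parts[1:]:
                matches.append(nickname)

            nickname = None

    return matches


# Made-up entries, not real relays.
SAMPLE = """\
r relayA AAAA BBBB 2012-06-01 12:00:00 10.0.0.1 9001 0
s Exit Fast Running Valid
r relayB CCCC DDDD 2012-06-01 12:00:00 10.0.0.2 9001 0
s BadExit Exit Running Valid
""".splitlines()

bad_exits = relays_with_flag(SAMPLE, "BadExit")  # ['relayB']
```

A stem based version would presumably get these entries from the controller or a cached consensus rather than a string, but the filtering would look about the same.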
If so, how hard is it to set up a dev environment?
Very, very easy. Assuming that you have python, git, and tor installed simply do the following.
* Get a copy of stem
  git clone git://git.torproject.org/stem.git
* To run its unit tests
  cd stem
  ./run_tests.py --unit
* To run its integ tests
  ./run_tests.py --integ
Does the Tor Project have a formal mentorship program? How does that work?
I'm not sure what you mean by a 'formal mentorship program'. I've mentored students through GSoC and Wesleyan's open source program if that answers your question.
Cheers! -Damian