[tor-dev] Tor Volunteer

Thu Sep 13 17:21:58 UTC 2012

> Damian,
>
> I saw on the tor-dev mailing list you would be happy to have help with
> Python related Tor work.  I'm game. I'd love to help the Tor Project.

Hi Scott. Glad you want to help! These are great questions so I'm
looping in Ravi (another developer who's actively hacking on stem) and
tor-dev in case other people are interested.

> I looked through the Easy bugs for Arm but it seems that Stem is where the
> effort is going now.

Yup. Arm was experiencing really bad feature creep so I swapped my
focus to stem around a year back. Here are the relevant links for
stem...

* Development Wiki
https://trac.torproject.org/projects/tor/wiki/doc/stem

* Gitweb
https://gitweb.torproject.org/stem.git

* Bug Tracker
http://tinyurl.com/stem-bugs

> Should I focus on Stem?
> Do you have a list of easier tasks to get my feet wet with?

Sure! We're currently working to make it feature complete and prepare
for its initial release. Development tasks tend to fall into a few
general categories...

===================
Controller Functionality
===================

Stem is primarily a replacement for TorCtl, a controller library used
by several of our other projects (arm, SoaT, TorBEL, etc.). Ravi's been
focusing on making the Controller class feature complete. This is
mostly located in...

https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/control.py

Some nice introductory tasks here would be...

* Add methods for querying relay descriptors

The first bits of this are very easy, while the later parts get a bit
more interesting...
- Add a get_relay() method that calls
get_info("desc/id/[fingerprint]") and returns a ServerDescriptor for
it.
- Write unit and integ tests for the addition.
- [tricky part] Add a get_all_relays() method that provides an
iterator for all of the descriptors. The tricky bit is that the
BaseController provides discrete controller messages, but in this case
the "GETINFO desc/all" output is so large that we want to process it
while it comes in. This means changing stem's underlying message
processing system.
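For illustration, here's a rough sketch of both methods. It's written against a hypothetical get_info callable rather than stem's actual Controller, and RelaySummary is a made-up stand-in for ServerDescriptor, so it runs without tor. The only format detail it relies on is that each server descriptor begins with a "router" line, which is what lets the iterator hand back one descriptor at a time instead of buffering the whole response.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class RelaySummary(NamedTuple):
    # Hypothetical stand-in for ServerDescriptor: only a few router line fields.
    nickname: str
    address: str
    or_port: int


def _parse(lines: List[str]) -> RelaySummary:
    # A server descriptor starts with "router <nickname> <address> <ORPort> ..."
    fields = lines[0].split()
    return RelaySummary(fields[1], fields[2], int(fields[3]))


def get_relay(get_info: Callable[[str], str], fingerprint: str) -> RelaySummary:
    # Easy part: one GETINFO query, then parse the whole response at once.
    # 'get_info' is a hypothetical callable, not stem's real method.
    return _parse(get_info("desc/id/%s" % fingerprint).splitlines())


def get_all_relays(lines: Iterable[str]) -> Iterator[RelaySummary]:
    # Tricky part: consume the bulk descriptor output line by line and yield
    # each descriptor once the next "router" line shows it is complete.
    current: List[str] = []
    for line in lines:
        if line.startswith("router ") and current:
            yield _parse(current)
            current = []
        current.append(line)
    if current:
        yield _parse(current)
```

Because get_all_relays() accepts any iterable of lines, it could be fed directly from a socket reader, which is the kind of change the message processing system would need to support.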

* Split low and high level controller methods

The Controller class is growing to be pretty bulky, and its methods
can generally be split into two categories...

- Low level methods that simply mirror what the control-spec provides.
- Higher level methods (like the get_relay() and get_all_relays()
above) which are more user friendly and build on what the control spec
provides.

Ravi: Do you have any thoughts on this? Do you think this would be a
good way to break up the controller, or would this just make it more
confusing? Alternatively we could wait until we try using stem in arm
to see if this makes sense or not...
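A minimal sketch of what the split could look like, assuming a hypothetical send callable that issues a raw control command and returns tor's reply text. The class names and get_version() are made up for illustration; the point is only that the high level class builds on the low level one and never talks to the socket itself.

```python
from typing import Callable, Tuple


class LowLevelController:
    """Hypothetical layer whose methods mirror control-spec commands one to one."""

    def __init__(self, send: Callable[[str], str]) -> None:
        # 'send' is an assumed transport: it issues a raw command and returns tor's reply.
        self._send = send

    def get_info(self, key: str) -> str:
        # A single-line GETINFO answer looks like "250-<key>=<value>" followed by "250 OK".
        first_line = self._send("GETINFO %s" % key).splitlines()[0]
        prefix = "250-%s=" % key
        if not first_line.startswith(prefix):
            raise ValueError("unexpected GETINFO reply: %r" % first_line)
        return first_line[len(prefix):]


class HighLevelController(LowLevelController):
    """Hypothetical friendlier layer built only from the low level calls."""

    def get_version(self) -> Tuple[int, ...]:
        # e.g. "0.2.3.22-rc (git-...)" becomes (0, 2, 3, 22)
        raw = self.get_info("version")
        numeric = raw.split()[0].split("-")[0]
        return tuple(int(part) for part in numeric.split("."))
```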

===============
Descriptor Parsing
===============

This is where my focus has been for the last couple of weeks. A
controller needs to be able to parse Tor's descriptor content
(GETINFO's desc, ns, and md responses). This has expanded to include
descriptor files and other descriptor content, so stem can be used as
a python replacement for MetricsLib. This is useful for projects like
Onionoo's python counterpart.

This is a somewhat dry part of stem to work on, so unless you like
reading specs and writing parsers for them, this might not be a great
section for you. That said, if detail-oriented parsing and
validation sounds interesting, then there are lots 'o tasks here.
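To give a flavor of the parsing and validation work, here's a sketch that checks a single "router" line (nickname, address, ORPort, SOCKSPort, DirPort, as laid out in Tor's dir-spec). The function and type names are hypothetical, and a real parser would need to handle every keyword line in the descriptor, not just this one.

```python
import ipaddress
import re
from typing import NamedTuple

# dir-spec: a nickname is 1 to 19 alphanumeric characters.
NICKNAME = re.compile(r"[A-Za-z0-9]{1,19}")


class RouterLine(NamedTuple):
    nickname: str
    address: str
    or_port: int
    socks_port: int
    dir_port: int


def _port(value: str) -> int:
    # Ports must be plain decimal numbers in the range 0-65535.
    if not re.fullmatch(r"[0-9]+", value) or int(value) > 65535:
        raise ValueError("invalid port: %r" % value)
    return int(value)


def parse_router_line(line: str) -> RouterLine:
    """Validate a descriptor's 'router' line, raising ValueError if it is malformed."""
    fields = line.split(" ")
    if len(fields) != 6 or fields[0] != "router":
        raise ValueError("malformed router line: %r" % line)
    _, nickname, address, or_port, socks_port, dir_port = fields
    if not NICKNAME.fullmatch(nickname):
        raise ValueError("invalid nickname: %r" % nickname)
    ipaddress.IPv4Address(address)  # raises AddressValueError (a ValueError) if invalid
    return RouterLine(nickname, address, _port(or_port), _port(socks_port), _port(dir_port))
```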

======
Clients
======

Stem doesn't yet have any clients, and will certainly need some before
we make a release in order to work out the kinks. Our plan for this
is...

* Port arm's interpreter
Arm's control interpreter is actually a standalone script that can,
with a little work, be used independently from the rest of arm. It
would be interesting to see what a stem-based control interpreter
would look like...
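Here's a rough idea of the core loop, using Python's standard cmd module. This is only an assumption about how a stem-based version might be structured, not how arm's interpreter actually works, and send is a hypothetical callable standing in for whatever stem method issues a raw command.

```python
import cmd
from typing import Callable


class ControlInterpreter(cmd.Cmd):
    """Hypothetical minimal control interpreter that passes input through to tor."""

    prompt = ">>> "
    use_rawinput = False  # read from self.stdin so input can be redirected or scripted

    def __init__(self, send: Callable[[str], str], **kwargs) -> None:
        super().__init__(**kwargs)
        # 'send' is an assumed callable that issues a raw control command and returns the reply.
        self._send = send

    def default(self, line: str) -> None:
        # Anything that isn't an interpreter command (like quit) goes straight to tor.
        self.stdout.write(self._send(line) + "\n")

    def emptyline(self) -> bool:
        # cmd.Cmd repeats the last command on a blank line; don't resend it to tor.
        return False

    def do_quit(self, _arg: str) -> bool:
        # Returning True stops cmdloop().
        return True

    # End of input (Ctrl-D or a closed pipe) also exits.
    do_EOF = do_quit
```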

* Port arm's torTools module
Arm makes very heavy use of the controller, and abstracts all of its
TorCtl usage behind a wrapper module...
https://gitweb.torproject.org/arm.git/blob/HEAD:/src/util/torTools.py

This abstraction should make arm reasonably easy to move to stem.
Also, stem's getting many of that wrapper's features so arm's codebase
could be greatly simplified after we move.

* Port TorBEL
TorBEL would be a great candidate after we work out most of stem's
rough edges by moving arm. It'll likely provide some interesting use
cases, though this is still a ways out.

=======
Usability
=======

Stem doesn't yet have a site, nor have I posted its sphinx
documentation anywhere. A few things we should do here are...

* Make a pretty site
Ideally I'd like to write the site itself with sphinx so we have a
consistent look between the front page and the documentation.

* Site with auto-updating documentation
This would involve setting up a site with a cron process that fetches
stem and, if changed, serves up the new documentation.
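As a sketch of that cron job's core logic, assuming the docs live in stem's docs/ directory and sphinx-build is on the PATH (the paths and the update_docs name are hypothetical):

```python
import os
import subprocess
from typing import Callable, List

# A runner takes a command as an argument list and returns its stdout.
Runner = Callable[[List[str]], str]


def run(args: List[str]) -> str:
    # Default runner: execute the command, raise on failure, return stdout as text.
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def update_docs(repo: str, output: str, runner: Runner = run) -> bool:
    """Pull stem and rebuild its sphinx docs only if HEAD moved.

    Returns True if the docs were rebuilt. The 'docs' source directory is an
    assumption about the repository layout.
    """
    head = ["git", "-C", repo, "rev-parse", "HEAD"]
    before = runner(head).strip()
    runner(["git", "-C", repo, "pull", "--quiet"])
    after = runner(head).strip()
    if before == after:
        return False  # nothing new, keep serving the current build
    runner(["sphinx-build", "-b", "html", os.path.join(repo, "docs"), output])
    return True
```

A crontab entry would then only need to run a small wrapper that calls update_docs() with the real checkout and web root. The runner parameter exists so the logic can be exercised without touching git.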

* Tidy up sphinx docs
Stem has a lot of documentation, but we haven't really looked much at
the sphinx output with an eye for making it developer friendly. It
would be nice to have people try to use stem, and improve the
documentation for the confusing bits.

* Examples
We should ask tor-dev@ for a list of bite sized Tor tasks that people
commonly want (like "print my relay's bandwidth" or "list all of the
relays with the BadExit flag"). Then write example scripts that do
this, and provide them on stem's site to give example usage.
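As an example of the kind of script meant here, the following sketch lists relays with the BadExit flag from network status text (for instance the output of GETINFO ns/all), where each entry starts with an "r" line and its flags follow on an "s" line. Taking the text as a plain string is an assumption so the sketch runs without a tor connection; a stem version would get it from the controller instead.

```python
from typing import List, Optional


def relays_with_flag(network_status: str, flag: str) -> List[str]:
    """Return nicknames of router status entries whose "s" line includes 'flag'."""
    matches = []
    nickname: Optional[str] = None
    for line in network_status.splitlines():
        if line.startswith("r "):
            # "r <nickname> <identity> ..." starts a new router status entry.
            nickname = line.split()[1]
        elif line.startswith("s ") and nickname is not None:
            # "s <flag> <flag> ..." lists the current entry's flags.
            if flag in line.split()[1:]:
                matches.append(nickname)
    return matches
```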

> If so, how hard is it to set up a dev environment?

Very, very easy. Assuming that you have python, git, and tor installed,
simply do the following.

* Get a copy of stem
git clone git://git.torproject.org/stem.git

* To run its unit tests
cd stem
./run_tests.py --unit

* To run its integ tests
./run_tests.py --integ

> Does the Tor Project have a formal mentorship program?  How does that work?

I'm not sure what you mean by a 'formal mentorship program'. I've
mentored students through GSoC and Wesleyan's open source program if
that answers your question.

Cheers! -Damian