[tor-dev] Google Summer of Code Proposal - PathSupport counterpart for Stem

Damian Johnson atagar1 at gmail.com
Mon Apr 2 17:11:26 UTC 2012

Hi Ravi. This is a nice first draft and please keep in mind that I'm
pretty green with PathSupport (I've never used it myself) so feel free
to push back on any suggestions.

The high level approach that you seem to be taking is to copy
PathSupport into stem, then refactor and test it. Is that right? If so
then a few questions...

* Did you get Mike's permission for that? TorCtl is under the BSD
license (I think) and stem is LGPLv3.
* Is this the design that we want? PathSupport is modeled as a narrow
object hierarchy built upon TorCtl.EventHandler. We have the
opportunity to make any API we want so, as a user, what would you find
to be the most intuitive?

My suggestion for starting tasks would be to...

1. Write a simple script to use PathSupport to, say, run wget from a
target locale ('./my_script FR http://www.torproject.org/'). See where
the pain points were in using PathSupport and what, as a user, you
would rather that it did differently.

My understanding is that PathSupport is highly focused on
experimentation since that is what Mike needed for his work. However,
that is just one consumer and I'm most interested in providing an
elegant, simple API that handles basic use cases (like the wget
example) easily and can be *extended* for experiments.
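
To make the wget example a bit more concrete, here is a rough sketch
of the one decision such a script can't avoid: picking an exit in the
target locale. Everything in it is made up for illustration. Relay and
pick_exit are hypothetical names (neither PathSupport nor stem
provides them), the country codes are assumed to have been looked up
elsewhere, and weighting by bandwidth is just one plausible policy,
not a description of what PathSupport does.

```python
import random
from collections import namedtuple

# Hypothetical relay record for illustration; not a TorCtl or stem type.
Relay = namedtuple("Relay", ["fingerprint", "country", "flags", "bandwidth"])


def pick_exit(relays, country, rng=random):
    """Return a bandwidth-weighted random exit in `country`, or None."""
    wanted = country.lower()
    candidates = [
        relay for relay in relays
        if relay.country.lower() == wanted
        and "Exit" in relay.flags
        and "BadExit" not in relay.flags
    ]
    if not candidates:
        return None

    # Avoid zero weights so that every candidate stays selectable.
    weights = [max(relay.bandwidth, 1) for relay in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

If the API we end up with makes './my_script FR <url>' much harder
than calling something like pick_exit(relays, "FR") and attaching a
stream, then that is a pain point worth writing down.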

2. Talk with the users of PathSupport to figure out their use cases.
We should either include those capabilities in our PathSupport
counterpart *or* provide what they need to easily make it themselves
(if it's a specialized use case). Only three people or places to
contact come to mind...

* Mike for SoaT and the bandwidth authorities
* Sebastian for TorBEL
* tor-dev@ for researchers and other developers using PathSupport,
Roger might have some suggestions

3. Part of why I was dubious about this being a quick and easy project
is that Stem currently lacks the controller capabilities that you
need. You mention using stem.control.BaseController at several points
which makes sense since it... well, exists. However, as its pydocs say
this is not the class you are looking for...

"Don't use this directly - subclasses provide higher level functionality."

... or they will once we have them. Part of this project would be to
start the general controller class to provide the capabilities that
you need (plus tests of course). On first glance the things that a
PathSupport copy would need are...

* Event handling for, at least, NEWCONSENSUS and NEWDESC.
* A Network Status class. This would be similar to
stem.descriptor.server_descriptor but *far* easier (there's only
around three network status lines).

These are easy and I'm happy to work on them with you. We will, of
course, need more before actually migrating any clients.
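
As a rough idea of how small the Network Status piece could be, here
is a sketch of a parser for a single entry. The RouterStatus name and
parse_router_status function are hypothetical, not part of stem. The
field layout of the 'r', 's' and 'w' lines is written from my memory
of the directory spec, so treat it as an assumption and check it
against the spec before relying on it.

```python
import base64
from collections import namedtuple

# Hypothetical container for illustration; not an existing stem class.
RouterStatus = namedtuple("RouterStatus", [
    "nickname", "fingerprint", "address", "or_port", "dir_port",
    "flags", "bandwidth",
])


def parse_router_status(text):
    """Parse the 'r', 's' and 'w' lines of one network status entry."""
    fields = None
    flags, bandwidth = (), None

    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword == "r":
            # Assumed layout: nickname identity digest date time
            # address ORPort DirPort
            fields = rest.split()
            if len(fields) < 8:
                raise ValueError("malformed 'r' line: %r" % line)
        elif keyword == "s":
            flags = tuple(rest.split())
        elif keyword == "w":
            for item in rest.split():
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    bandwidth = int(value)

    if fields is None:
        raise ValueError("entry has no 'r' line")

    # The identity is assumed to be base64 with its padding stripped.
    identity = fields[1] + "=" * (-len(fields[1]) % 4)
    fingerprint = base64.b64decode(identity).hex().upper()

    return RouterStatus(fields[0], fingerprint, fields[5],
                        int(fields[6]), int(fields[7]), flags, bandwidth)
```

Unknown line types are skipped rather than rejected, which keeps the
sketch tolerant of additions to the format.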

> Their feedback will ensure that the API will be usable.

Don't count on it. This will give a nice first draft but expect to
rewrite things quite a few times as we go along. Actually using your
API for real clients will certainly reveal some things that we could
do better. ;)

> I also will communicate with my mentor about my progress and hopefully, will have an intuitive, easy to use API design ready before the coding period starts.

I would like to see a rough first draft of an API as part of the
application, which we could then incrementally refine. Maybe a trac
subpage under stem would be the best place for this?

> Implementation implies writing the code, tests and the documentation.


> An amalgamation of the PathSupport.PathBuilder and the PathSupport.ConsensusTracker classes.

I understand why Mike made them separate. A few things to think about...

a. The ConsensusTracker is useful as a standalone class by providing
the current consensus and descriptors. I used this for a short time
with arm but stopped due to 'b'.

b. Loading all of the consensus and descriptor data is... a lot.

atagar@morrigan:~$ du -h ~/.tor/cached-consensus ~/.tor/cached-descriptors
672K	/home/atagar/.tor/cached-consensus
3.1M	/home/atagar/.tor/cached-descriptors

When I did this with arm a couple years ago it choked the application
for several seconds and caused high memory usage. I've heard that this
has improved since, but we should still figure out what is really
necessary for the PathSupport functionality that we want.
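
One way to avoid holding everything in memory is to walk the cached
data one entry at a time and keep only the fields we need. The sketch
below is a simplification under stated assumptions: iter_entries and
nicknames are hypothetical helpers, entries are assumed to begin with
a line starting with 'router ', and any annotation lines between
entries simply end up attached to the preceding entry.

```python
def iter_entries(lines, start_keyword="router"):
    """Yield one entry at a time as a list of lines.

    A new entry is assumed to begin at each line starting with
    `start_keyword` followed by a space.
    """
    prefix = start_keyword + " "
    entry = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(prefix) and entry:
            yield entry
            entry = []
        entry.append(line)
    if entry:
        yield entry


def nicknames(lines):
    """Yield only the nickname from each entry, discarding the rest."""
    for entry in iter_entries(lines):
        first = entry[0].split()
        # Skip any leading chunk that is not a real entry.
        if len(first) > 1 and first[0] == "router":
            yield first[1]
```

Since both functions are generators, passing an open file object
means only one entry is held in memory at a time.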

c. This will be moot, of course, if we go with a different design.

> TorCtl.PathSupport.PathBuilder uses a TorCtl.PathSupport.SelectionManager. A helper class for handling (router) configuration updates. I will merge a part of this into stem.path.PathController too

Not quite following. I thought that the SelectionManager was an
argument for the configuration the user wanted to run PathSupport
with. Keeping those separate conceptually seems like a good idea,
though again I haven't actually tried it in practice.

> Is a direct subclass of stem.control.BaseController


> A major change would be to make PathController fully thread-safe instead of an event/queue system.

Slight correction, stem uses almost the exact same event/queue based
model as TorCtl. The difference is that it also adds read/write locks
to provide more complete thread safety.
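
For anyone unfamiliar with the pattern, here is a minimal sketch of an
event/queue model with locking added. It is not stem's or TorCtl's
actual code. EventDispatcher is a hypothetical class, and it uses a
single re-entrant lock to guard the listener table rather than
separate read and write locks.

```python
import queue
import threading


class EventDispatcher:
    """Hypothetical illustration: events are queued by any thread and
    delivered to listeners from a single worker thread."""

    def __init__(self):
        self._events = queue.Queue()
        self._listeners = {}
        self._lock = threading.RLock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add_listener(self, event_type, callback):
        # The lock protects the listener table against concurrent edits.
        with self._lock:
            self._listeners.setdefault(event_type, []).append(callback)

    def post(self, event_type, payload):
        # Queue.put is itself thread safe, so no extra locking is needed.
        self._events.put((event_type, payload))

    def close(self):
        # A None sentinel tells the worker to stop after draining the
        # events queued ahead of it.
        self._events.put(None)
        self._worker.join()

    def _run(self):
        while True:
            item = self._events.get()
            if item is None:
                return
            event_type, payload = item
            # Copy under the lock, then call listeners without holding it.
            with self._lock:
                callbacks = list(self._listeners.get(event_type, ()))
            for callback in callbacks:
                callback(payload)
```

Calling the listeners outside the lock means a slow callback cannot
block other threads from registering listeners.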

> The following will be ported to use Stem:
> * Torflow

Woah, bad idea. Torflow = SoaT + Bandwidth Authorities. That is both
way bigger than you want to take on, and probably the last things that
will migrate (if they ever do at all). Doesn't TorBEL manually
construct circuits? If so then that would be a far better client.

That said, I see where you're getting this from and I might be
completely misunderstanding how TorBEL works...

04:34 < logan> please recommend some TorCtl clients which use the
PathSupport module
04:42 < Sebastian> logan: I think there's just torflow
04:44 < logan> what about torbel ?
04:47 < logan> and SoaT ?
04:50 < Sebastian> soat is a part of torflow
04:50 < Sebastian> torbel doesn't use it
04:51 < Sebastian> torbel uses TorCtl.Router and TorCtl.TorUtil

> There are some unimplemented parts of the general controller class that are required for the implementation of PathSupport, such as the Router class. atagar is currently working on this.

Oh, good that you spotted this. In an ideal world I'd be working on
this but, if the last couple months are any guide, I wouldn't count on
it.

> I will help with implementing these so that they will be ready before the coding period begins.

Great. The top slot on my dance card usually goes to anything that has
people actively offering to help. At the moment that's mostly around
descriptor parsing, but I'm happy to swap back to the controller if
you want to work on it with me.

> Port Torflow to use Stem. This will consume a part of week 11,

/me chokes, realizing that ten days are being allocated to this

... er, ambitious

> I have written a few patches for some Tor Project projects, #1667 (Tor), #5032 (Thandy). Two to Stem, which have been committed to the repository #5199 and #5472.

Many thanks for those, btw. :)

Do you have any standalone code samples (preferably python) that
you've written? Possibly for school?

> I have exams until the 29th of April, so I will be missing a few days of the community bonding period...

No problem.

> Stem, like all libraries implementing an API for a moving target, requires
> maintenance. I will co-maintain Stem in the future. By the time I'm done with
> the SoC program, I would've also gained familiarity with other related
> projects such as Torflow, TorBEL and Arm. I'll be in a position where I can
> help out with those if there is a need.

Great, we're always glad when people stick around after GSoC. It's
unpleasantly rare, but always good to hope for.

> I will keep people informed about my progress by sending (probably monthly,
> or as often as required) reports the mailing list.

Last year we did bi-weekly status updates. I think that I'd like to
work directly with whoever is selected rather than just having code
tossed over the fence, but we'll see if that works out (it's not
everyone's cup of tea). If you'd rather work on things more
independently then let me know.

I'm a little uncomfortable with how nebulous the individual
PathSupport tasks are. Please say more concretely what each one
includes and how you plan to approach it. Alternatively, feel free to
make this a
"semi-PathSupport and other stem tasks" proposal, taking on some
general stem tasks (like Safe Cookie, metrics-lib migration, general
controller work, etc) plus _exploratory_ work on PathSupport.

* The advantage of that approach would be better defined tasks
without the unknowns that often derail projects.
* The disadvantage is that you'd finish lots of small, useful features
rather than a big one (personally I count this as a plus, but some
people like just having a single big goal).

Completely up to you. Feel free to continue focusing your application
on PathSupport if you want; the above is just a potential alternative.

Cheers! -Damian
