Hi,
I'm a fourth-year undergraduate student of Computer Science at IIT Kharagpur, India. I'm interested in applying for GSoC this year for Tor. Specifically, I'm interested in the project on txtorcon/stem integration. I have worked with Twisted before, developing SSH and git daemons with it. I'm a newbie at Tor development, though. Could someone point me to what is required for a good application for this project? Do I need to submit a patch to Tor? I have no experience working with stem. Could someone give me pointers on where to start? PS - You can view my Twisted code at https://github.com/lonehacker/Drupal.org-Git-Daemons . My contributions are under the id [anarix]
Thanks, Anshul Singhle
Hi Anshul, glad you want to get involved!
Specifically, I'm interested in the project on txtorcon/stem integration. I have worked with Twisted before, developing SSH and git daemons with it. I'm a newbie at Tor development, though.
Great. Some twisted experience is definitely a plus for that project.
Could someone point me to what is required for a good application for this project?
If you're curious about what a good application looks like then take a peek at...
https://www.torproject.org/about/gsoc.html.en#Example
We expect a pretty thoroughly thought out plan and design as part of the application, and it's a major plus if you're already active with the community and contributing.
Do I need to submit a patch to Tor?
Meejah didn't cite it as a strict requirement for that project, but it's strongly suggested.
I have no experience working with stem. Could someone give me pointers on where to start?
See the tutorials on...
Cheers! -Damian
Anshul Singhle anshul.singhle@gmail.com writes:
I'm a newbie at tor development though. Could someone point me to what is required for a good application for this project? Do I need to submit a patch to Tor? I have no experience working with stem. Could someone give me pointers on where to start?
Hi Anshul,
I would start with checking out the txtorcon code and examples.
https://txtorcon.readthedocs.org/en/latest/ https://github.com/meejah/txtorcon
I would also familiarize yourself with Tor's control spec:
https://gitweb.torproject.org/torspec.git/blob/master:/control-spec.txt
If you're going to poke in Tor's code, the control-spec or configuration parsing code is probably the most useful. It would also probably be useful to look at users of the Python control libraries. For txtorcon right now that would be ooni, APAF or the new Twisted re-write of torperf.
The main thrust of the project would be to replace as much of the parsing code in txtorcon as possible with calls through to stem instead. The more "deleted" lines the better! ;)
Additionally, optionally providing stem instances in txtorcon callbacks would be a "nice to have". For example, currently the event callbacks just give the user back the "rest of the line". I would imagine some flag in TorControlProtocol which instead gives you back the appropriate Stem object; an immediate user here would be Karsten's re-written torperf stuff.
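A rough sketch of how such a flag might behave, purely for illustration. EventDispatcher, parse_event and the parsed= keyword are hypothetical stand-ins invented here; they are not txtorcon or stem APIs, and a real version would return a Stem object rather than a dict.

```python
from typing import Callable, Dict, List, Tuple


def parse_event(kind: str, rest: str) -> dict:
    """Hypothetical stand-in for a Stem parser; returns a plain dict."""
    return {"type": kind, "fields": rest.split()}


class EventDispatcher:
    """Toy dispatcher, not the real TorControlProtocol."""

    def __init__(self) -> None:
        # Maps an event name to a list of (callback, wants_parsed) pairs.
        self._listeners: Dict[str, List[Tuple[Callable, bool]]] = {}

    def add_event_listener(self, kind: str, callback: Callable,
                           parsed: bool = False) -> None:
        # parsed=False keeps the current behaviour: "rest of the line".
        self._listeners.setdefault(kind, []).append((callback, parsed))

    def dispatch(self, line: str) -> None:
        # Asynchronous events look like "650 CIRC 1000 BUILT ...".
        parts = line.split(" ", 2)
        if len(parts) < 2 or parts[0] != "650":
            return
        kind = parts[1]
        rest = parts[2] if len(parts) > 2 else ""
        for callback, wants_parsed in self._listeners.get(kind, []):
            callback(parse_event(kind, rest) if wants_parsed else rest)
```

With this shape, existing listeners keep receiving plain strings, and only listeners that opt in ever see a parsed object.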
Atagar is best to help with the GSoC application stuff as I have not mentored for GSoC before.
Cheers, meejah
Thanks for the reply and the useful links! I went through txtorcon and stem a bit, and also skimmed the control spec. I'll now give an overview of what I understand this project to be; please correct me if I got something wrong.
Tor has a control protocol that allows clients to connect to it; the spec is in the control-spec doc. txtorcon is a Twisted-based implementation of this protocol. The aim of this project would be to use the protocol parsing from stem instead of the "re-implementation" in txtorcon. Since the protocol parsing is synchronous anyway (AFAICT), it makes sense to do this. Therefore, it would also make sense to identify other synchronous activities being done by txtorcon and use Stem to do those too (since we will be instantiating a stem object anyway).
So as I see it, the majority of the changes will be in torcontrolprotocol.py (https://github.com/meejah/txtorcon/blob/master/txtorcon/torcontrolprotocol.p...), mainly in the TorControlProtocol and TorProtocolError parts. These classes have some functions that are used by others and some that are for internal use (I guess the ones with _ are internal, if I got the naming convention right). So we will just throw away the internal functions (should we?) and keep the external function API the same, the difference being that these external functions will now call stem instead of doing the heavy lifting themselves.
I have a question about LineOnlyReceiver: are there other types of receivers which txtorcon doesn't implement but stem does? If we are integrating the two, it would be good to optionally add APIs for these. Is this the primary reason for doing this integration? Or is it just better for Tor if all the protocol-specific code is in one place, so that it is easy to adapt to any changes in the API?
So in essence I guess my question is: what is the primary focus of this project? Is it to bring the protocol-specific code into one place, or to use Stem wherever possible? Or both?
Thanks, Regards, Anshul
Anshul Singhle anshul.singhle@gmail.com writes:
The aim of this project would be to use the protocol-parsing from stem instead of using the "re-implementation" in txtorcon. Since the protocol parsing is synchronous anyway(AFAICT) it makes sense to do this.
So far so good, yes.
One thing to note is that while "most" of the parsing is basically synchronous in txtorcon, some of the replies are huge. The one we care about being "GETINFO ns/all" which gives TorState the initial list of Router objects; there are ~3000 routers, so around 4*3000 lines to parse and that's way too long to pause the reactor. So there is some incremental parsing stuff in txtorcon (a state machine in TorState), and this would likely be the "some modifications to stem" bit: to provide a way to feed some stem object lines of a consensus and get objects back out representing the routers (which would be given to txtorcon.Router instead of the current way of passing some strings to update() in router.py).
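To make the "feed it lines, get router objects back" idea concrete, here is a minimal sketch. It is not txtorcon's state machine and not Stem's parser; RouterEntry and IncrementalRouterParser are hypothetical names, and it assumes the usual network-status line layout ("r" lines with nickname, identity, digest, publication date and time, IP, ORPort, DirPort, followed by "s", "w" and "p" lines).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RouterEntry:
    """Hypothetical container; the real project would hand back Stem objects."""
    nickname: str
    address: str
    or_port: int
    flags: List[str] = field(default_factory=list)
    bandwidth: Optional[int] = None


class IncrementalRouterParser:
    """Consumes one line at a time so the reactor is never blocked for long."""

    def __init__(self) -> None:
        self._current: Optional[RouterEntry] = None

    def feed(self, line: str) -> Optional[RouterEntry]:
        """Return the previous router once a new "r" line starts, else None."""
        finished = None
        keyword, _, rest = line.partition(" ")
        if keyword == "r":
            finished = self._current
            parts = rest.split()
            # parts: nickname identity digest date time IP ORPort DirPort
            self._current = RouterEntry(parts[0], parts[5], int(parts[6]))
        elif self._current is not None:
            if keyword == "s":
                self._current.flags = rest.split()
            elif keyword == "w":
                for item in rest.split():
                    key, _, value = item.partition("=")
                    if key == "Bandwidth":
                        self._current.bandwidth = int(value)
            # Other keywords (e.g. "p") are ignored in this sketch.
        return finished

    def finish(self) -> Optional[RouterEntry]:
        """Flush the last router after the reply ends."""
        finished, self._current = self._current, None
        return finished
```

Because each feed() call does a small, bounded amount of work, the caller can process a few lines per reactor iteration instead of parsing the whole reply in one go.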
Therefore, it would also make sense to identify other synchronous activities being done by txtorcon and use Stem to do those too (since we will be instantiating a stem object anyway).
Yes, very possibly!
So as I see it, the majority of the changes will be in torcontrolprotocol.py (https://github.com/meejah/txtorcon/blob/master/txtorcon/torcontrolprotocol.p...), mainly in the TorControlProtocol and TorProtocolError parts.
Yes. There will hopefully be some changes in TorState, to do with the Router information I mention above (TorState keeps the list of Router objects, which represent the Tor relays in the consensus).
These classes have some functions that are used by others and some that are for internal use (I guess the ones with _ are internal, if I got the naming convention right).
Yes.
So we will just throw away the internal functions (should we?) and keep the external function API the same, the difference being that these external functions will now call stem instead of doing the heavy lifting themselves.
Yes.
I have a question about LineOnlyReceiver: are there other types of receivers which txtorcon doesn't implement but stem does?
LineOnlyReceiver is a Twisted thing; it handles buffering etc. and delivers one line at a time to TorControlProtocol (via lineReceived). It could be that a different superclass is appropriate. Stem uses the socket stuff directly, in a synchronous (threaded) way, so I wouldn't expect any sharing to happen here.
Aside: note that the camelCase methods are all Twisted overrides and the_underscore_ones are txtorcon's.
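For readers unfamiliar with that buffering, here is a toy stand-in for the behaviour. It is deliberately not Twisted's code (ToyLineReceiver and CollectingProtocol are made-up names), but it keeps the camelCase dataReceived/lineReceived naming to show where a subclass hooks in.

```python
from typing import List


class ToyLineReceiver:
    """Simplified imitation of line buffering; not Twisted's implementation."""

    delimiter = b"\r\n"

    def __init__(self) -> None:
        self._buffer = b""

    def dataReceived(self, data: bytes) -> None:
        # Bytes arrive in arbitrary chunks; keep the trailing partial line
        # in the buffer and deliver only complete lines.
        *lines, self._buffer = (self._buffer + data).split(self.delimiter)
        for line in lines:
            self.lineReceived(line)

    def lineReceived(self, line: bytes) -> None:
        raise NotImplementedError


class CollectingProtocol(ToyLineReceiver):
    """Plays the role of the subclass that overrides lineReceived."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[bytes] = []

    def lineReceived(self, line: bytes) -> None:
        self.lines.append(line)
```

The subclass only ever sees whole lines, no matter how the underlying bytes were split across reads.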
So in essence I guess my question is: what is the primary focus of this project? Is it to bring the protocol-specific code into one place, or to use Stem wherever possible? Or both?
The main thrust of the project is to have one Python implementation of the protocol parsing code -- namely Stem. The reasoning here being that Stem already parses a lot more things than txtorcon anyway (and I have no intentions of re-implementing all that) and so it makes sense to leverage that in txtorcon. From atagar's perspective, he'd like more people exercising Stem code and we agree it's somewhat silly to have two Python implementations of this, even if some of it is rather simple.
If you can identify other things that make sense for txtorcon to use from Stem, that's a bonus. I would say, however, that I would foresee any users who want pieces of functionality from Stem to "just use Stem". That is, I don't want to *add* an API in txtorcon for loading + parsing consensus files, etc. -- users who want that should just use Stem (since there's no Python/Twisted async file APIs anyway). A good example of this is the Twisted version of torperf.
By the same token, there will very likely be users of txtorcon who don't care about stem functionality so txtorcon users shouldn't be *forced* to learn about stem (installing it as a dependency is fine).
I'm getting at the event stuff here: I would like it to remain optional to receive the event text itself versus a stem RouterStatusEntryV3 instance in an event callback (i.e. one registered via TorControlProtocol.add_event_listener). The Twisted version of torperf would make a fine place to "test out" this event stuff somewhere that will actually be used/released (e.g. instead of another example).
Super extra bonus points if we can a) add a zope.interface for the callback (it lacks one currently) and b) use adapters to figure out which kind of object the listener wants. I haven't thought about this, really, but it could be fun (or not possible).
Thanks for the interest, meejah
p.s. I should mention that if you're needing something from me for GSoC and I haven't responded, please don't hesitate to bug me via private mail -- sometimes things slip through the cracks, and I'm fairly busy these days.
Also, I am usually idling in #tor-dev on OFTC, and the best way for me to see a message there is to make sure it has "meejah: " at the start.