Hi, all!
Originally, I wrote this proposal for this year's GSoC, but nobody seems willing to mentor it and the idea most probably needs more time to take shape. Hence, I am posting it on this list to get some feedback and let the idea evolve. (:
Best, Robert
Problem statement
Tor's control protocol[0] facilitates communication between other programs and a locally running Tor process, regardless of their programming language. It is used by various projects: GUI controllers such as Vidalia or arm, research projects, and services for monitoring the Tor network such as bandwidth scanners. The control protocol is message-based, and using it directly requires a lot of message/string parsing. Hence, to ease the use of that interface, libraries such as JTorCtl, TorCtl, Txtorcon, and Stem were developed. Today, some of those libraries, such as JTorCtl and TorCtl, can be considered outdated, even though TorCtl is still in use. Stem and Txtorcon are state-of-the-art and heavily used; they provide thread safety, caching, and a better interface, rendering string parsing unnecessary. However, both are Python libraries, which restricts this convenient access to the Tor control protocol to Python programs. While Python is a fine and very popular scripting language, being effectively restricted to one specific programming language prevents wider use of the Tor control interface.
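To illustrate the kind of string parsing meant here, below is a minimal sketch in Python of what a client has to do by hand for a simple single-line GETINFO reply. The function name parse_getinfo_reply and the version number in the usage note are made up for illustration. The sketch deliberately ignores multi-line ("250+") replies and asynchronous events, which a real client would also have to handle.

```python
def parse_getinfo_reply(raw: str) -> dict:
    """Parse a simple GETINFO-style control reply into a {key: value} dict.

    Hypothetical helper for illustration only. It handles "250-key=value"
    lines followed by a final "250 OK" line and nothing else.
    """
    result = {}
    for line in raw.split("\r\n"):
        if not line:
            continue  # skip the empty string after the final CRLF
        status, separator, rest = line[:3], line[3:4], line[4:]
        if status != "250":
            # Any other status code (e.g. 552) signals an error.
            raise ValueError("control port returned an error: " + line)
        if separator == " ":
            break  # "250 OK" terminates the reply
        if separator != "-":
            # "250+" starts a multi-line data block, not handled here.
            raise ValueError("unsupported reply line: " + line)
        key, _, value = rest.partition("=")
        result[key] = value
    return result
```

For example, parse_getinfo_reply("250-version=0.2.3.25\r\n250 OK\r\n") yields {"version": "0.2.3.25"}. Libraries like Stem hide this step behind method calls, which is the convenience the proposal wants to make available outside of Python.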
Proposal
Therefore, I would like to propose a prototype of a next-generation Tor control interface, aiming to combine the strengths of both the present control protocol and the state-of-the-art libraries. It should provide (network) connectivity from other programs to a locally running Tor process, regardless of the programming language, while preserving thread safety, caching, and a better interface. Since coding a prototype is substantially less work than implementing this in Tor itself, I propose to start with a prototype. The prototype should consist of a daemon running on the same machine as the Tor instance. The daemon implements a backend interface to the Tor instance through the Tor control protocol (via Stem) and a frontend interface providing JSON-encoded data over HTTP (REST). That way, semi-structured data (JSON) can be accessed from any HTTP-capable programming/scripting language. HTTP compression and/or encryption can be implemented, too. I have not yet looked into Python REST frameworks in detail, but Flask looks promising for this task.
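As a rough sketch of the daemon's shape, the following uses only the Python standard library so that it runs without extra dependencies. It is not the proposed implementation: fake_get_info is a stand-in for the Stem-backed backend, http.server stands in for Flask, and the /info/<key> URL layout, port 8080, and the sample values are all assumptions made up for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_get_info(key):
    """Stand-in backend. In the proposal this would query Tor via Stem."""
    sample = {"version": "0.2.3.25", "traffic/read": "1024"}  # made-up values
    return sample[key]  # raises KeyError for unknown keys


def build_response(path, get_info):
    """Map a request path like /info/<key> to (HTTP status, JSON body)."""
    prefix = "/info/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        return 404, json.dumps({"error": "not found"}).encode("utf-8")
    key = path[len(prefix):]
    try:
        value = get_info(key)
    except KeyError:
        body = {"error": "unknown key", "key": key}
        return 404, json.dumps(body).encode("utf-8")
    return 200, json.dumps({"key": key, "value": value}).encode("utf-8")


def make_handler(get_info):
    """Build a request handler class bound to the given backend."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = build_response(self.path, get_info)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return Handler


def serve(port=8080):
    """Run the frontend on localhost only, since Tor runs on the same machine."""
    server = HTTPServer(("127.0.0.1", port), make_handler(fake_get_info))
    server.serve_forever()
```

After calling serve(), a GET request to http://127.0.0.1:8080/info/version from any language returns a small JSON object. Keeping build_response separate from the HTTP handler is a design choice for this sketch only. It means the backend (Stem) and the frontend (Flask or anything else) could be swapped independently, and error handling would need to be adapted to whatever exceptions the real backend raises.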
Until the new interface is implemented in Tor itself, the prototype will require continual upkeep and maintenance. Therefore, as a long-term goal, the new interface should be implemented in Tor itself if the prototype proves to be successful.
[0] https://gitweb.torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=control-spec.txt
A short summary of the discussion on IRC: It is generally considered a good idea to better encode data in Tor's control protocol. In this case, Protocol Buffers are probably more suitable for data serialization than JSON. Though Flask seems to be a nice prototyping framework, security concerns were raised regarding the use of HTTP as the transport protocol; ZeroMQ or raw TCP may be more suitable.
Best, Robert