[tor-dev] Searchable Tor descriptor archive - GSoC 2013 project

Kostas Jakeliunas kostas at jakeliunas.com
Wed May 29 03:38:25 UTC 2013


I'm a student who will be working on the Searchable Tor descriptor archive
as part of Google Summer of Code. Yay!

I've been following Tor development for a while and hope that this
opportunity will be my way of sneaking into the development kitchen of Tor.
Either way, I hope to stay around for a long time to come.

The original GSoC project proposal is based on one of the Tor project ideas
available [1] and is part of the Tor Metrics project [2]. The GSoC proposal
itself is also available to read [3] (plain text; if there's any interest, I
can reformat it). My primary mentor is Karsten and my secondary mentor is
Damian.

I will quote the abstract from the proposal to sum up the high-level goals
of this project:

> I'd like to create a more integrated and powerful descriptor archival
> search and browse system. (The current tools are very restrictive and the
> experience disjointed.) To do this, I'll write an archival browsing
> application wherein the results are interactive: they may act as further
> search filters. Together with a search string input tool which will have
> more filtering options, the application will provide a more cohesive
> archival browse & search experience and will be a more efficient tool.

As of now, we have an array of tools for inspecting, searching for, and
getting aggregate data about running relays (for an overview, see the Tools
page in the Metrics portal [4]). These tools include relay search, consensus
info, exit-by-IP search, and quite a few more, as well as two Onionoo-based
[5] applications: Atlas and Compass.

This project proposes to:

   - implement a more powerful backend that would allow one to search for
   all available relays since mid-2007, i.e., since v2 statuses became
   available [6] (I should have clarified this in the previous discussions;
   Karsten already includes this bit, and the starting point can also be
   discussed). "More powerful" here means, first and foremost, covering all
   (>= v2) archival data (relay descriptors and consensuses at the very
   least). Furthermore, at least per the original proposal, it means
   supporting more complex queries: minimally, combined AND/OR filters over a
   wider range of the data fields available in the archival data, and the
   ability to specify multiple date ranges. It would also be possible to refer
   to consensus-related data while searching for relays and vice versa. (The
   capabilities would therefore also include those of exoneraTor.)

   - implement a backend whose results would, as things currently stand, aim
   for Onionoo compatibility (again, see the protocol design in [5]), or
   perhaps supersede it while providing backwards compatibility (e.g.
   returning paginated lists of consensus-status-entries where a specified
   relay was

   - (as per the original proposal) implement a more powerful archival
   descriptor search & browse tool (frontend) providing a more uniform
   "looking up relays" / "searching by many criteria" / "further refining the
   search on the results page" experience: refining search results, i.e.
   adjusting filters, would be semantically the same as entering search
   criteria in the first place. The result would be a more interactive and
   more powerful search/browse tool.
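
To make the "combined AND/OR filters" and "multiple date ranges" idea a
little more concrete, here is a minimal Python sketch of how such a query
could be represented as a small filter tree and evaluated against archived
consensus status entries, with simple offset/limit paging of the results.
Everything in it is hypothetical: StatusEntry, Field, DateRanges, And, Or
and search() are illustrative names, not part of Onionoo or of any existing
Metrics tool, and a real backend would translate such a tree into database
queries rather than filter entries in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Tuple


@dataclass
class StatusEntry:
    # Hypothetical, heavily simplified view of one relay's entry in one consensus.
    nickname: str
    fingerprint: str
    address: str
    valid_after: date
    flags: frozenset


@dataclass
class Field:
    # Leaf filter: applies a predicate to one attribute of the entry.
    name: str
    test: Callable[[object], bool]

    def matches(self, entry: StatusEntry) -> bool:
        return self.test(getattr(entry, self.name))


@dataclass
class DateRanges:
    # Leaf filter: true if the entry's consensus date falls in ANY of the ranges.
    ranges: List[Tuple[date, date]]

    def matches(self, entry: StatusEntry) -> bool:
        return any(start <= entry.valid_after <= end for start, end in self.ranges)


@dataclass
class And:
    # Inner node: every child filter must match.
    parts: list

    def matches(self, entry: StatusEntry) -> bool:
        return all(part.matches(entry) for part in self.parts)


@dataclass
class Or:
    # Inner node: at least one child filter must match.
    parts: list

    def matches(self, entry: StatusEntry) -> bool:
        return any(part.matches(entry) for part in self.parts)


def search(entries, query, offset=0, limit=10):
    """Return one page of matching entries together with the total match count."""
    hits = [e for e in entries if query.matches(e)]
    return hits[offset:offset + limit], len(hits)


# Hypothetical sample data and query:
# (nickname starts with "a" OR address is 10.0.0.1) AND (in 2008 OR in 2012) AND Exit flag.
entries = [
    StatusEntry("alpha", "AAAA", "10.0.0.1", date(2008, 3, 1), frozenset({"Running", "Exit"})),
    StatusEntry("beta", "BBBB", "10.0.0.2", date(2010, 6, 1), frozenset({"Running"})),
    StatusEntry("gamma", "CCCC", "10.0.0.1", date(2012, 1, 15), frozenset({"Running", "Exit"})),
]
query = And([
    Or([Field("nickname", lambda v: v.startswith("a")),
        Field("address", lambda v: v == "10.0.0.1")]),
    DateRanges([(date(2008, 1, 1), date(2008, 12, 31)),
                (date(2012, 1, 1), date(2012, 12, 31))]),
    Field("flags", lambda v: "Exit" in v),
])
page, total = search(entries, query, offset=0, limit=1)
print(total, [e.nickname for e in page])  # prints: 2 ['alpha']
```

In this model, refining a search from the results page just means adding or
changing a node in the same tree, and an exoneraTor-style question ("was this
IP address a relay on this date?") is simply an And of an address Field and a
single-day DateRanges.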

The goals and design of the project still have to be clarified, however.
There is ongoing discussion (see e.g. another tor-dev thread [7]) about
whether the focus could instead be to create a backend that speaks the full
Onionoo protocol and could therefore replace not only relay search and
exoneraTor but other components as well: for example, all applications that
currently speak Onionoo could be made to talk to the new backend. Either way,
the overall number of components will hopefully be reduced, but ideally we
would end up with a much more integrated Tor Metrics (and maybe beyond)
ecosystem.
There are many open questions, however; see [7] again. Discussion is, of
course, very welcome!

I'm wfn on OFTC (#tor-dev, #nottor), also reachable via XMPP
<phistopheles at jabber.org>, and am very much up for any kind of chat. :)
I'll be busy with exams during the first three weeks of June, but I will
definitely find time!


[1] https://www.torproject.org/getinvolved/volunteer#metricsSearch

[2] https://metrics.torproject.org/

[3] http://kostas.mkj.lt/gsoc2013.txt

[4] https://metrics.torproject.org/tools.html

[5] https://onionoo.torproject.org/

[6] https://metrics.torproject.org/data.html#relaydesc

[7] https://lists.torproject.org/pipermail/tor-dev/2013-May/004940.html