[tor-dev] [GSoC '13] Tor status report - Searchable metrics archive

Mon Jul 22 17:01:19 UTC 2013

Hey all,

I apologize for the unusual timing of this status report; I ended up
delaying it far too long, so better now than even later, I guess. I can
repeat it, along with any updates, soon; I just figure I'm long overdue
on informing tor-dev about what's going on.

I started my project [1] later than is usual, and more or less
immediately ran into what I deemed to be a database / ORM scaling issue
(the very thing I'd been trying to avoid since writing the proposal), or
at least an ORM behaviour that was suboptimal for what we have in mind.
The goal, first and foremost, is to deliver a searchable metrics archive
backend/database which, as of the current plan, incorporates server
descriptors (relays and bridges; it turns out a single server descriptor
model can happily serve both) and server/router statuses across a
timespan of a few years (currently using v3 consensus documents only),
and which provides querying functionality that can extract relations
between the two. The 'querying with relations between the two' part
seemed to be causing trouble when tested on a broader span of data. I
ended up spending a probably inefficiently large amount of time on this
problem: rewriting the backend part and trying to optimize the queries
underlying the ORM. It turns out I didn't need to strip off the ORM
abstraction, and I learned a few things about SQLAlchemy along the way.
I will follow up with an email pointing to the current code (sorry).
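
To make the 'relations between the two' part concrete, here is a minimal
sketch of the kind of join involved. It is not the project's code: it uses
the standard-library sqlite3 module instead of SQLAlchemy so that it runs
without dependencies, and the table names, column names and helper
functions are all hypothetical. The only thing taken from the data itself
is that a status entry in a v3 consensus refers to a server descriptor by
its digest.

```python
import sqlite3


def build_archive(conn):
    """Create two hypothetical tables: descriptors and consensus status entries."""
    conn.executescript("""
        CREATE TABLE descriptor (
            digest      TEXT PRIMARY KEY,   -- descriptor digest (hypothetical column)
            fingerprint TEXT NOT NULL,
            nickname    TEXT NOT NULL,
            published   TEXT NOT NULL       -- 'YYYY-MM-DD HH:MM:SS', sorts as text
        );
        CREATE TABLE statusentry (
            id                INTEGER PRIMARY KEY,
            valid_after       TEXT NOT NULL,
            fingerprint       TEXT NOT NULL,
            descriptor_digest TEXT NOT NULL  -- points at descriptor.digest
        );
        -- Index so that 'latest status for this relay' avoids a full scan.
        CREATE INDEX statusentry_fp_va
            ON statusentry (fingerprint, valid_after);
    """)


def latest_status_with_descriptor(conn, fingerprint):
    """Return (valid_after, nickname, published) for the newest status entry
    of a relay, joined with the descriptor that entry refers to, or None."""
    return conn.execute("""
        SELECT s.valid_after, d.nickname, d.published
        FROM statusentry AS s
        JOIN descriptor AS d ON d.digest = s.descriptor_digest
        WHERE s.fingerprint = ?
        ORDER BY s.valid_after DESC
        LIMIT 1
    """, (fingerprint,)).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_archive(conn)
    fp = "A" * 40  # placeholder fingerprint, not a real relay
    conn.execute("INSERT INTO descriptor VALUES (?, ?, ?, ?)",
                 ("digest1", fp, "example", "2013-07-22 10:00:00"))
    conn.execute("INSERT INTO statusentry (valid_after, fingerprint, descriptor_digest) "
                 "VALUES (?, ?, ?)", ("2013-07-22 16:00:00", fp, "digest1"))
    print(latest_status_with_descriptor(conn, fp))
```

With several years of consensuses the statusentry table gets very large,
which is why the sketch indexes it on (fingerprint, valid_after); whether
that is the right index for the real schema is an open question here.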

  * The current iteration of the ORM model / backend (which actually is
very simple) solves this problem.
  * Mapping Stem descriptors and network statuses to the ORM works, and is
nicely (enough) integrated with the data import tools (which read from a
downloaded metrics archive), as well as with an API for making queries on
the ORM.
  * Implemented a backend that partially adheres to the Onionoo protocol
(without compression and without some fields) for ?summary and ?details
Onionoo queries.
  * Still tidying everything up. And *finally* writing a design document
outlining what we actually ended up with, and what is still required for
full Onionoo integration.
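
As a rough illustration of what the partial ?summary backend mentioned in
the list above has to produce, here is a small sketch that turns relay rows
into a summary-style JSON document. It is an assumption-laden example, not
the actual implementation: the function name and the input row layout are
hypothetical, and the short keys (n, f, a, r) and the *_published fields
follow my reading of the Onionoo summary document format, so they should be
checked against the protocol specification.

```python
import json


def summary_document(relays, published):
    """Build an Onionoo-like summary document from relay rows.

    `relays` is a list of dicts with the hypothetical keys 'nickname',
    'fingerprint', 'address' and 'running'; `published` is a timestamp
    string. Bridges are left empty in this sketch.
    """
    return {
        "relays_published": published,
        "relays": [
            {
                "n": row["nickname"],      # nickname
                "f": row["fingerprint"],   # fingerprint
                "a": [row["address"]],     # list of addresses
                "r": row["running"],       # running flag
            }
            for row in relays
        ],
        "bridges_published": published,
        "bridges": [],
    }


if __name__ == "__main__":
    rows = [{
        "nickname": "example",
        "fingerprint": "A" * 40,     # placeholder, not a real relay
        "address": "192.0.2.1",      # documentation-only address range
        "running": True,
    }]
    print(json.dumps(summary_document(rows, "2013-07-22 16:00:00"), indent=2))
```

Compression and the remaining fields are deliberately left out, matching
the 'partial' caveat above.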

Code review will happen pretty soon, and hopefully we'll have some
discussion about where to go from here. Karsten mentioned that it might be
possible to use the existing Onionoo incarnation to continue providing
bandwidth weights and similar data (basically stuff from extra-info), and
to join the two systems into an Onionoo-supporting backend which would
cover all or most of the available archives. Another (or a further)
avenue would be to continue with the initially proposed plan to extend the
query format, and to build a frontend which would make use of the extended
query format. Expect another email with links to (decent) code.

[1]: http://kostas.mkj.lt/gsoc2013/gsoc2013.html