[tor-dev] [GSoC 2013] Status report - Searchable metrics archive

Kostas Jakeliunas kostas at jakeliunas.com
Fri Sep 6 05:59:24 UTC 2013


This status update is less extensive/dramatic than the last one, but I'm
still happy to report that I'm slowly moving towards a stable searchable
archive system. In short, I've been working on more or less what I said I
would work on in the last report [1]:

> I should now move on with implementing / extending the Onionoo API, in
> particular, working on date range queries, and refining/rewriting the "list
> status entries" API point (see below). Need to carefully plan some
> things, and always keep an updated API document. (Also need to update and
> publish a separate, more detailed specification document.)

The searchable metrics backend (encompassing part of the database and the
Onionoo-like API) [2] is still happily chugging along online; quite a few
folks have run queries on it, myself included. I'm in the process of
extending my benchmark.py tool to generate realistic-looking
parallel/asynchronous traffic for different kinds of relays and API points.
From my limited parallelized benchmarking so far, everything looks good:
the bottlenecks are still confined to individual queries. I should try to
produce human-readable, reasonably rigorous benchmarking reports and
publish them.
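
As a rough illustration of what this kind of parallel benchmarking
involves, here is a minimal Python sketch. It is not benchmark.py itself:
`run_parallel_benchmark`, the request strings and the sleeping stand-in for
a real HTTP request are all hypothetical. It issues requests from a thread
pool and summarizes per-request latencies using the standard library's
`concurrent.futures` and `statistics` modules.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fn, arg):
    """Run fn(arg) once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def run_parallel_benchmark(fn, requests, workers=8):
    """Issue all requests concurrently from a thread pool; summarize latencies.

    fn performs one request (e.g. an HTTP GET against an API point);
    requests is a list of per-request arguments (e.g. query strings).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(lambda r: timed_call(fn, r), requests))
    return {
        "count": len(latencies),
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


if __name__ == "__main__":
    # Stand-in for a real API request: sleep briefly instead of hitting a server.
    def fake_request(query):
        time.sleep(0.01)

    # Hypothetical mix of API points and search terms.
    mix = ["summary?search=relay-a", "details?search=relay-b"] * 10
    print(run_parallel_benchmark(fake_request, mix, workers=4))
```

Plugging a real HTTP call in as `fn`, and a realistic mix of relay searches
and API points in as `requests`, turns this into a crude load test;
comparing the median with the max gives a first hint of whether a few slow
queries or general contention dominate.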

I briefly ran the backend on an EC2 instance again to compare benchmark
outputs and average query times. Natural database caching seems to help
quite a bit for the current online database (at ts.mkj.lt): indexes and
frequently accessed table data end up cached in memory simply through
normal use of the backend/database. I've been tinkering with a simple
system to pre-warm indexes on application start (without needing any
PostgreSQL extensions), so that query times are more uniform and the first
queries after a restart aren't disproportionately slow. Overall, though,
actual query times look OK.
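
One way to pre-warm indexes without extensions is simply to run a few
queries at startup that touch the indexes real traffic depends on, so their
pages are already cached when the first user query arrives. The sketch
below assumes a DB-API 2.0 connection (psycopg2 provides one for
PostgreSQL); the table/column names and warm-up queries are hypothetical,
and the demo uses an in-memory SQLite database only as a stand-in. Whether
PostgreSQL's planner actually picks an index scan for a given warm-up query
is not guaranteed, so checking with EXPLAIN (or setting enable_seqscan =
off for the warm-up session) would be prudent.

```python
import sqlite3
import time

# Hypothetical warm-up queries: each is meant to touch an index that real
# traffic relies on, so its pages get pulled into the database/OS cache early.
WARMUP_QUERIES = [
    "SELECT count(*) FROM statusentry WHERE fingerprint >= ''",
    "SELECT count(*) FROM statusentry WHERE validafter >= '2008-01-01'",
]


def prewarm(conn, queries=WARMUP_QUERIES):
    """Run each warm-up query once through a DB-API 2.0 connection.

    Returns (query, seconds) pairs so slow cold-cache queries are visible.
    """
    timings = []
    cur = conn.cursor()
    for sql in queries:
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()  # make sure the query actually runs to completion
        timings.append((sql, time.perf_counter() - start))
    cur.close()
    return timings


if __name__ == "__main__":
    # Demo against an in-memory SQLite database standing in for PostgreSQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statusentry (fingerprint TEXT, validafter TEXT)")
    conn.execute("CREATE INDEX se_fp ON statusentry (fingerprint)")
    conn.execute("CREATE INDEX se_va ON statusentry (validafter)")
    for sql, secs in prewarm(conn):
        print(f"{secs:.4f}s  {sql}")
```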

I've made updates to the Onionoo-like API but haven't pushed them yet (I
hoped to do so before this report, but I'm finally learning not to delay);
expect them soon, together with the corresponding updates to the Onionoo
API doc [3]. I'm now trying to incorporate Karsten's most recent feedback
regarding API points, parameters and some preliminary, simplistic caching.
Basically, once the date range parameters work nicely for all three types
of documents currently provided by the API, and once the status entry API
point returns valid-after summaries/ranges in a more intuitive document
format, the whole thing should hopefully be useful in practice to a
significant extent. Together with caching, it could then be considered an
almost-proper (if, for now, smallish) subset of the Onionoo API/backend.
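
Valid-after ranges of this kind can be computed by filtering the consensus
valid-after times a relay appears in by the requested date range, then
merging consecutive hourly timestamps into runs. A minimal sketch, assuming
hourly consensuses and hypothetical `start`/`end` parameters in
'YYYY-MM-DD[ HH:MM:SS]' format (these are not the actual torsearch
parameter names):

```python
from datetime import datetime, timedelta

CONSENSUS_INTERVAL = timedelta(hours=1)  # consensuses are normally hourly


def parse_date(value):
    """Parse 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD' (hypothetical format)."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError(f"unsupported date: {value!r}")


def in_range(valid_afters, start=None, end=None):
    """Keep timestamps with start <= t <= end; a missing bound is open-ended.

    Note: a date-only end such as '2013-09-01' means midnight of that day.
    """
    lo = parse_date(start) if start else datetime.min
    hi = parse_date(end) if end else datetime.max
    return [t for t in valid_afters if lo <= t <= hi]


def collapse_ranges(valid_afters, step=CONSENSUS_INTERVAL):
    """Collapse valid-after times into (first, last) runs without gaps."""
    ranges = []
    for t in sorted(valid_afters):
        if ranges and t - ranges[-1][1] <= step:
            ranges[-1][1] = t  # next consensus in sequence: extend the run
        else:
            ranges.append([t, t])  # gap found: start a new run
    return [(first, last) for first, last in ranges]


if __name__ == "__main__":
    base = datetime(2013, 9, 1)
    seen = [base + timedelta(hours=h) for h in [0, 1, 2, 5, 6, 9]]
    for first, last in collapse_ranges(in_range(seen, "2013-09-01 01:00:00")):
        print(first, "->", last)
```

Returning a handful of (first, last) pairs instead of every individual
valid-after timestamp keeps status entry documents small for long-lived
relays.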

I did some experiments with PostgreSQL's pg_trgm (trigram matching) module
for substring search, so that search strings matching only the middle of
some descriptor field could work. I realize that's not a priority right
now, but I was curious to see whether it would work. Nothing conclusive so
far, unfortunately.
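
For context, pg_trgm breaks strings into three-character groups (trigrams)
and compares strings by the trigrams they share; since PostgreSQL 9.1 its
GIN and GiST operator classes can also speed up LIKE '%...%' searches. The
Python approximation below follows the extraction rules described in the
pg_trgm documentation (lowercasing, ignoring non-alphanumeric characters,
padding each word with two leading spaces and one trailing space). It is an
illustration only, not PostgreSQL's C implementation, and may differ in
locale-specific details.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction for a string.

    Lowercase, split into alphanumeric words, pad each word with two leading
    spaces and one trailing space, then collect every 3-character window.
    """
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams: 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
    print(round(similarity("word", "words"), 3))  # 0.571
```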

The specification document explaining the more detailed design and
applicable use cases for the Onionoo-like API is coming along.

I should continue finalizing the Onionoo-like API into a working,
non-hacky state; keep the codebase clean and maybe do some cleanup
refactoring; keep observing database performance; write more
documentation; and, if all is well, expand the list of fields contained in
the three status documents. Besides all this, I still need to:

   - update the database to the latest consensuses and descriptors;
   - turn on the cronjob for rsyncing and importing the newest archives;
   - add caching;
   - hopefully, later on, integrate with the bandwidth and weights
   documents in Onionoo proper;
   - import the 2008 data, etc.
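
As for caching, a preliminary, simplistic cache could be as small as an
in-process map from request query strings to generated documents, with
entries expiring after a fixed time-to-live. A minimal sketch under those
assumptions (the class and its method names are hypothetical, not part of
torsearch or Onionoo):

```python
import time


class TTLCache:
    """Tiny in-process cache for API responses, keyed by query string.

    Entries expire ttl seconds after they are stored. Not thread-safe and
    unbounded: a sketch of simplistic caching, not a production cache.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self.clock())

    def get_or_compute(self, key, compute):
        """Return the cached value, or call compute() and cache its result."""
        value = self.get(key)
        if value is None:
            value = compute()
            self.put(key, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl=300)
    doc = cache.get_or_compute("summary?search=relay-a", lambda: {"relays": []})
    print(doc)
```

Since new consensuses normally arrive hourly, one option is a TTL well
under an hour so cached documents stay reasonably fresh. Note that `None`
is treated as "not cached", so computed documents must not be `None`.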

These couple of weeks coincide with a (not quite planned for) apartment
move, but I *really* hope to have the vast majority of the things mentioned
above working in a decent state by the end of next week. That would leave
us some wiggle room to make the resulting system as stable as possible
before the official end of the coding period. I'll probably be available
throughout the weekends, just in case.

Cheers to you all

[1]: https://lists.torproject.org/pipermail/tor-dev/2013-August/005310.html
[2]: https://lists.torproject.org/pipermail/tor-dev/2013-August/005311.html
[3]: https://github.com/wfn/torsearch/blob/master/docs/onionoo_api.md