[tor-dev] Onionoo protocol/implementation nuances / Onionoo-connected metrics project stuff

Wed Jul 31 21:32:22 UTC 2013

A couple of thoughts to add from my experience working with DB's:

-Consider a partitioning strategy for the data if it makes sense. Queries
which will hit only a single partition will be much faster in general.
-Check out advanced indexing strategies like Solr. this can sometimes speed
up certain queries by orders of magnitude, and works well with batch
systems. Often it is easier to setup & maintain than advanced DB tuning.
-I second Karsten's comment on multi-column indexes. Designing the right
indexes & columns to include isn't simple though - you have to know what is
going to be searched.
-If you know in advance what kind's of where clauses you will see, consider
implementing partial indexes.

If you have specific problem queries w/ explain plans, feel free to post
them and I would be glad to spend a few minutes looking them over.

Charlie

On Tue, Jul 30, 2013 at 3:35 AM, Karsten Loesing <karsten at torproject.org>wrote:

> On 7/29/13 10:15 PM, Kostas Jakeliunas wrote:
> > It should also be possible to do efficient *estimated* COUNTs (using
> > reltuples [1, 2], provided the DB can be regularly VACUUMed + ANALYZEd
> > (postgres-specific awesomeness)) - i.e. if everything is set up right,
> > doing COUNTs would be efficient. This would be nice not only because one
> > could run very quick queries asking e.g. "how many consensuses include
> > nickname LIKE %moo% between [daterange1, daterange2]?" (if e.g. full text
> > search is set up) but also, if we have to resort to sometimes returning
> an
> > arbitrary subset of results (or sorted however we wish, but the sorting
> > being done already on a small subset of results, if that makes sense),
> we'd
> > be able to also supply info how many other results matching these
> > particular criteria there are, and so on. The usefulness of all this
> really
> > depends on intended use cases, and I suppose here some discussion could
> be
> > had who / how would an Onionoo system covering all / most of all the
> > descriptor+consensus archives and hopefully having an extended set of
> > filter / result options be used?
>
> I can see how estimated counts could be valuable information.  Or not.
> Do you want to first specify what type of queries you're planning to
> support?
>
> (I didn't spot anything in the other two mails that requires a reply.
> If there are still open questions, please let me know.)
>
> Best,
> Karsten
>
>
> >
> > [1]: http://www.varlena.com/GeneralBits/120.php
> > [2]: http://wiki.postgresql.org/wiki/Slow_Counting
> >
>
> _______________________________________________
> tor-dev mailing list
> tor-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20130731/cf68509e/attachment.html>