<div dir="ltr"><div><div><div><div>A couple of thoughts to add from my experience working with databases:<br><br></div>-Consider a partitioning strategy for the data if it makes sense. Queries that hit only a single partition are generally much faster.<br>

</div>-Check out advanced indexing strategies such as Solr. This can sometimes speed up certain queries by orders of magnitude, and it works well with batch systems. It is often easier to set up and maintain than advanced DB tuning.<br>

</div>-I second Karsten's comment on multi-column indexes. Designing the right indexes and choosing which columns to include isn't simple, though: you have to know what is going to be searched.<br></div>-If you know in advance what kinds of WHERE clauses you will see, consider implementing partial indexes.<br>
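<div><br>To make the last two points concrete, here is a rough sketch of a multi-column index next to a partial index. It is only an illustration: it uses Python's built-in sqlite3 module so it runs without a server, whereas the database under discussion is PostgreSQL (which accepts the same CREATE INDEX ... WHERE form, but its planner and EXPLAIN output differ). The table statusentry, its columns, the is_exit flag, and the index names are all made up for the example.<br></div>

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables would differ.
SCHEMA = """
CREATE TABLE statusentry (
    nickname   TEXT NOT NULL,
    validafter TEXT NOT NULL,   -- consensus timestamp as ISO-8601 text
    is_exit    INTEGER NOT NULL -- 1 if the relay is an exit (assumed flag)
);
-- Multi-column index: serves filters on nickname, optionally narrowed by date.
CREATE INDEX idx_nick_valid ON statusentry (nickname, validafter);
-- Partial index: only rows matching the known-in-advance predicate are indexed.
CREATE INDEX idx_exit_valid ON statusentry (validafter) WHERE is_exit = 1;
"""

# Query shaped for the multi-column index (equality on the leading column).
NICK_QUERY = (
    "SELECT * FROM statusentry "
    "WHERE nickname = ? AND validafter BETWEEN ? AND ?"
)

# Query that repeats the partial index's predicate literally, so the
# planner can prove the partial index covers every row it needs.
EXIT_QUERY = (
    "SELECT * FROM statusentry "
    "WHERE is_exit = 1 AND validafter = ?"
)


def build_demo_db():
    """Create an in-memory database with the sample schema and a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        ("moo", "2013-07-29 10:00:00", 1),
        ("moo", "2013-07-29 11:00:00", 0),
        ("cow", "2013-07-29 10:00:00", 1),
    ]
    conn.executemany("INSERT INTO statusentry VALUES (?, ?, ?)", rows)
    return conn


def plan_for(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(plan_for(conn, NICK_QUERY, ("moo", "2013-07-29", "2013-07-30")))
    print(plan_for(conn, EXIT_QUERY, ("2013-07-29 10:00:00",)))
```

<div>The point to check in the printed plans is which index each query names. The partial index only helps when the query's WHERE clause implies the index predicate, which is why it pays off mainly when the filter is known in advance.<br></div>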

<div><div><div><div><div><br></div><div>If you have specific problem queries with EXPLAIN plans, feel free to post them and I would be glad to spend a few minutes looking them over.<br><br>Charlie<br></div></div></div></div>

</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jul 30, 2013 at 3:35 AM, Karsten Loesing <span dir="ltr"><<a href="mailto:karsten@torproject.org" target="_blank">karsten@torproject.org</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 7/29/13 10:15 PM, Kostas Jakeliunas wrote:<br>

> It should also be possible to do efficient *estimated* COUNTs (using<br>

> reltuples [1, 2], provided the DB can be regularly VACUUMed + ANALYZEd<br>

> (postgres-specific awesomeness)) - i.e. if everything is set up right,<br>

> doing COUNTs would be efficient. This would be nice not only because one<br>

> could run very quick queries asking e.g. "how many consensuses include<br>

> nickname LIKE %moo% between [daterange1, daterange2]?" (if e.g. full text<br>

> search is set up) but also, if we have to resort to sometimes returning an<br>

> arbitrary subset of results (or sorted however we wish, but the sorting<br>

> being done already on a small subset of results, if that makes sense), we'd<br>

> be able to also supply info how many other results matching these<br>

> particular criteria there are, and so on. The usefulness of all this really<br>

> depends on intended use cases, and I suppose here some discussion could be<br>

> had who / how would an Onionoo system covering all / most of all the<br>

> descriptor+consensus archives and hopefully having an extended set of<br>

> filter / result options be used?<br>

<br>

</div></div>I can see how estimated counts could be valuable information.  Or not.<br>

Do you want to first specify what type of queries you're planning to<br>

support?<br>

<br>

(I didn't spot anything in the other two mails that requires a reply.<br>

If there are still open questions, please let me know.)<br>

<br>

Best,<br>

Karsten<br>

<div class="im HOEnZb"><br>

<br>

><br>

> [1]: <a href="http://www.varlena.com/GeneralBits/120.php" target="_blank">http://www.varlena.com/GeneralBits/120.php</a><br>

> [2]: <a href="http://wiki.postgresql.org/wiki/Slow_Counting" target="_blank">http://wiki.postgresql.org/wiki/Slow_Counting</a><br>

><br>

<br>

</div><div class="HOEnZb"><div class="h5">_______________________________________________<br>

tor-dev mailing list<br>

<a href="mailto:tor-dev@lists.torproject.org">tor-dev@lists.torproject.org</a><br>

<a href="https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev" target="_blank">https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev</a><br>

</div></div></blockquote></div><br></div>