Hello,<div><br></div><div>another busy benchmarking + profiling period for database querying, but this time more rigorous and awesome.</div><div><br></div><div>  * wrote a generic query analyzer which logs query statements, EXPLAIN, ANALYZE, spots and informs of particular queries that yield inefficient query plans;</div>


<div>  * wrote a very simple but rather exhaustive profiler (using python's cProfile) which logs query times, function calls, etc.; output is used to see which parts of the e.g. backend are slow during API calls; output can be easily used to construct a general query 'profile' for a particular database, etc.; [1]</div>


<div>  * benchmarked lots of different queries using these tools, recorded query times, was able to observe deviations/discrepancies;</div><div>  * uploaded the whole database and benchmarked briefly on an amazon EC2 m2.2xlarge instance; </div>


<div>  * concluded that, provided there is enough memory to cache *and hold* the indexes in cache, query times are good;</div><div>  * in particular, tested the following query scheme extensively: [2] (see comments there as well if curious); concluded that it runs well;</div>


<div>  * opted for testing raw SQL queries (from within Flask/python) - so far, translating them into ORM queries (while being careful) resulted in degraded performance; if we have to end up using raw SQL, I will create a way to encapsulate them nicely;</div>


<div>  * made sure data importing is not slowed and remains a quick-enough procedure;</div><div>  * researched PostgreSQL stuff, especially its two-layer caching; I now have an understanding of the way pgsql caches things in memory, how statistics on index usage are gathered and used for maintaining buffer_cache, etc.</div>


<div>The searchable metrics archive would work best when all of its indexes are kept in memory.</div><div>  * to this end, looked into buffer cache hibernation [3], etc.; I think pg_prewarm [4, 5] would serve our purpose well. (Apparently many business/etc. solutions do find cache prewarming relevant - pity it's not supported in stock PostgreSQL.)</div>


<div><br></div><div>The latter means that</div><div>  * I don't think we can avoid using certain postgresql extensions (if only one) - which means that deploying will always take more than apt-get && pip install, but I believe it is needed;</div>


<div> * next on my agenda is testing pg_prewarm on EC2 and, hopefully, putting our beloved database bottleneck problem to rest.</div><div><br></div><div>I planned to expose the EC2 for public tor-dev inquiry (and ended up delaying status report yet again), but I'll have to do this separately. This is possible, however. Sorry for the delayed report.</div>


<div><br></div><div>##</div><div><br></div><div>More generally,</div><div><br></div><div>I'm happy with my queer queries [2] now;</div><div>the two constraints/goals of</div><div><br></div><div>  * being able to run Onionoo-like queries on the whole descriptor / status entry database</div>


<div>  * being able to get a list of status entries for a particular relay</div><div><br></div><div>will hopefully be put to rest very soon. The former is done, provided I have no trouble setting up a database index precaching system (which will ensure that all queries of the same syntax/scheme run quick enough.)</div>


<div><br></div><div>Overall, I'm spending a bit too much time on a specific problem, but at least I have a more intimate lower-level knowledge of PostgreSQL, which turns out to be very relevant to this project. I hope to be able to soon move to extending Onionoo support and providing a clean API for getting lists of consensuses in which a particular relay was present. And maybe start with the frontend. :)</div>


<div><br></div><div>Kostas.</div><div><br></div><div>[1]: <a href="https://github.com/wfn/torsearch/commit/8e6f16a07c40f7806e98e9c71c1ce0f8e3849911">https://github.com/wfn/torsearch/commit/8e6f16a07c40f7806e98e9c71c1ce0f8e3849911</a></div>


<div>[2]: <a href="https://github.com/wfn/torsearch/blob/master/misc/nested_join.sql">https://github.com/wfn/torsearch/blob/master/misc/nested_join.sql</a></div><div>[3]: <a href="http://postgresql.1045698.n5.nabble.com/patch-for-new-feature-Buffer-Cache-Hibernation-td4370109.html">http://postgresql.1045698.n5.nabble.com/patch-for-new-feature-Buffer-Cache-Hibernation-td4370109.html</a></div>


<div>[4]: <a href="http://www.postgresql.org/message-id/CA+TgmobRrRxCO+t6gcQrw_dJw+Uf9ZEdwf9beJnu+RB5TEBjEw@mail.gmail.com">http://www.postgresql.org/message-id/CA+TgmobRrRxCO+t6gcQrw_dJw+Uf9ZEdwf9beJnu+RB5TEBjEw@mail.gmail.com</a></div>


<div>[5]: <a href="http://raghavt.blogspot.com/2012/04/caching-in-postgresql.html">http://raghavt.blogspot.com/2012/04/caching-in-postgresql.html</a></div>