[tor-bugs] #2923 [Metrics/Metrics website]: Improve materialized views in the metrics database

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Jul 21 16:18:54 UTC 2016


#2923: Improve materialized views in the metrics database
-------------------------------------+--------------------------
 Reporter:  karsten                  |          Owner:
     Type:  enhancement              |         Status:  assigned
 Priority:  High                     |      Milestone:
Component:  Metrics/Metrics website  |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:                           |  Actual Points:
Parent ID:                           |         Points:
 Reviewer:                           |        Sponsor:
-------------------------------------+--------------------------
Changes (by iwakeh):

 * severity:   => Normal


Old description:

> The [https://gitweb.torproject.org/metrics-
> web.git/blob/HEAD:/db/tordir.sql metrics database schema] uses
> periodically updated tables similar to materialized views for aggregating
> statistics.  When inserting data into the database, we write the dates
> that have changed to a separate updates table.  Every three hours, we
> delete the aggregates for these days and recompute them, which takes a
> few minutes.
>
> The recompute step that takes most of the time is `refresh_user_stats()`,
> which is no surprise given the complexity of that function.  We should
> try to simplify this function, possibly by pre-computing partial results
> that can be reused for other statistics.  Ideally, recomputing aggregates
> should run in under one minute, given that we want to add more
> materialized views for more aggregate statistics in the future.  In
> particular, I'd like to know which particular SQL parts slow us down in
> order to avoid them in the future.

New description:

 The [https://gitweb.torproject.org/metrics-
 web.git/tree/modules/legacy/db/tordir.sql metrics database schema] uses
 periodically updated tables similar to materialized views for aggregating
 statistics.  When inserting data into the database, we write the dates
 that have changed to a separate updates table.  Every three hours, we
 delete the aggregates for these days and recompute them, which takes a few
 minutes.

 The recompute step that takes most of the time is `refresh_user_stats()`,
 which is no surprise given the complexity of that function.  We should try
 to simplify this function, possibly by pre-computing partial results that
 can be reused for other statistics.  Ideally, recomputing aggregates
 should run in under one minute, given that we want to add more
 materialized views for more aggregate statistics in the future.  In
 particular, I'd like to know which particular SQL parts slow us down in
 order to avoid them in the future.

--

Comment:

 If this is still a problem I'd suggest using postgres analysis options in
 order to find out if adding an index here and there would help.

 (Corrected the SQL file link in the description.)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2923#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list