[tor-bugs] #5806 [Metrics Data Processor]: Investigate why metrics-db sometimes takes longer than 5 minutes to run

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Tue May 8 14:14:37 UTC 2012


#5806: Investigate why metrics-db sometimes takes longer than 5 minutes to run
------------------------------------+---------------------------------------
 Reporter:  karsten                 |          Owner:  karsten
     Type:  defect                  |         Status:  new    
 Priority:  normal                  |      Milestone:         
Component:  Metrics Data Processor  |        Version:         
 Keywords:                          |         Parent:         
   Points:  4                       |   Actualpoints:         
------------------------------------+---------------------------------------
 metrics-db runs once every hour and shouldn't take longer than a few
 minutes.  Ideally, we should run it twice per hour to catch consensuses
 published at :30 of an hour (see #5504).  That's only possible if it
 reliably finishes in under 30 minutes, ideally under 5 minutes.

 My current theory for why metrics-db takes so long is that it uses a
 rather inefficient algorithm to provide the last 3 days of data via rsync:
 enumerate all files in the current rsync/ directory and store their names
 and last-modified times in memory, iterate over our various output
 directories, copy files that are missing in rsync/, and delete the ones in
 rsync/ that are older than 3 days.  The likely bottleneck is the step
 where we iterate over the output directories: files there are not deleted
 automatically after a given time, so the directories, and the time needed
 to walk them, keep growing.  Any change needs to be implemented for relay
 descriptors, bridge descriptors, bridge pool assignments, exit lists, etc.
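
 The steps described above can be sketched roughly as follows.  This is an
 illustration in Python, not the actual metrics-db code; the function name
 sync_recent, its parameters, and the directory layout (one subdirectory
 of rsync/ per output directory) are assumptions made for the example, as
 is the choice to skip output files older than 3 days when copying.

```python
import os
import shutil
import time

THREE_DAYS = 3 * 24 * 60 * 60  # seconds


def sync_recent(output_dirs, rsync_dir, now=None, max_age=THREE_DAYS):
    """Hypothetical sketch: mirror recent files from output_dirs into rsync_dir."""
    now = time.time() if now is None else now
    cutoff = now - max_age

    # Step 1: remember name and last-modified time of every file in rsync/.
    existing = {}
    for root, _dirs, files in os.walk(rsync_dir):
        for name in files:
            path = os.path.join(root, name)
            existing[os.path.relpath(path, rsync_dir)] = os.path.getmtime(path)

    # Step 2: walk every output directory in full. Nothing is ever pruned
    # there, so this loop touches every file ever written, not just 3 days.
    copied = []
    for out_dir in output_dirs:
        base = os.path.basename(os.path.normpath(out_dir))
        for root, _dirs, files in os.walk(out_dir):
            for name in files:
                src = os.path.join(root, name)
                if os.path.getmtime(src) < cutoff:
                    continue  # assumed: too old to be offered via rsync
                rel = os.path.join(base, os.path.relpath(src, out_dir))
                if rel not in existing:
                    dst = os.path.join(rsync_dir, rel)
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.copy2(src, dst)  # copy2 keeps the source mtime
                    copied.append(rel)

    # Step 3: delete files in rsync/ that are older than 3 days.
    deleted = []
    for rel, mtime in existing.items():
        if mtime < cutoff:
            os.remove(os.path.join(rsync_dir, rel))
            deleted.append(rel)
    return copied, deleted
```

 In this sketch the cost of step 2 is proportional to the total number of
 files in the output directories, while steps 1 and 3 only depend on the
 roughly 3 days of data kept in rsync/.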

 This task took 1.5 points so far and may take another 2.5 points for a
 real fix.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5806>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

