[tor-bugs] #25161 [Metrics/CollecTor]: Fix another memory problem with the webstats bulk import

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat Feb 17 10:35:31 UTC 2018


#25161: Fix another memory problem with the webstats bulk import
-------------------------------+--------------------------
 Reporter:  karsten            |          Owner:  karsten
     Type:  defect             |         Status:  assigned
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+--------------------------
Changes (by iwakeh):

 * owner:  iwakeh => karsten
 * status:  accepted => assigned


Comment:

 Providing plenty of RAM for the import shortens the processing time quite
 a bit because less time is spent in GC.  The 85 min needed with 16G for
 the entire available archives of meronense and weschniakowii together
 (reported
 [https://trac.torproject.org/projects/tor/ticket/25100#comment:18 here])
 drop to just 65 min with 30G (of which only 22G were actually used at
 peak, 10G most of the time).  Of course, timing depends heavily on the
 number of available cores (only four here) and, to a lesser extent, on
 the type of CPU.

 If a machine with 64G is available for the import, it can simply be run
 on the entire 'out' folder of webstats.tp.o and should be fine with
 48-56G (assuming that weschniakowii is one of the hosts with the heavier
 log load).
 In case the import gets interrupted, the logs will clearly indicate which
 hosts were processed successfully.  This information should be used to
 move the already completed imports out of the import directory to save
 processing time.  It is no problem if that is forgotten: CollecTor won't
 re-add or overwrite anything, but the additional scanning may make the
 run take longer.
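
 The clean-up after an interrupted import could be sketched roughly as
 follows.  This is only an illustration: it assumes one subdirectory per
 host inside the import directory, and it takes the list of completed
 hosts as input (read from the logs by hand) rather than parsing any log
 format.  The function name and the paths are hypothetical.

```python
import shutil
from pathlib import Path


def move_completed_hosts(import_dir, done_dir, completed_hosts):
    """Move already imported per-host directories out of import_dir.

    completed_hosts is a list of host names taken manually from the logs
    (hypothetical input; no log parsing happens here).
    Returns the names of the hosts that were actually moved.
    """
    import_dir = Path(import_dir)
    done_dir = Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for host in completed_hosts:
        source = import_dir / host
        target = done_dir / host
        # Skip hosts that are not present or were moved in an earlier pass.
        if not source.is_dir() or target.exists():
            continue
        shutil.move(str(source), str(target))
        moved.append(host)
    return moved
```

 Calling it a second time with the same host list moves nothing, so it is
 safe to repeat after every interruption.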

 CollecTor properties should be set to single run and have limits turned
 off for importing the already existing sanitized logs.
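
 One way to prepare such a properties file is to patch a copy of the
 regular one, sketched below.  The key names RunOnce and WebstatsLimits
 are assumptions on my side and should be checked against the
 collector.properties actually in use; the helper itself is hypothetical
 and only rewrites plain 'Key = value' lines.

```python
def apply_overrides(properties_text, overrides):
    """Return properties_text with the given keys set to new values.

    Lines of the form 'Key = value' are replaced if Key is in overrides;
    keys that do not occur are appended at the end.  Comment lines
    (starting with '#') are left untouched.
    """
    remaining = dict(overrides)  # copy, so the caller's dict is not changed
    result = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key} = {remaining.pop(key)}"
        result.append(line)
    for key, value in remaining.items():
        result.append(f"{key} = {value}")
    return "\n".join(result) + "\n"


# Assumed key names; verify against the collector.properties in use.
IMPORT_OVERRIDES = {"RunOnce": "true", "WebstatsLimits": "false"}
```

 Working on a copy keeps the regular configuration unchanged for the
 scheduled runs after the bulk import.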

 I used metrics-lib commit 9f2db9a19 and collector commit 06d1a81d4 and
 performed some manual checks that the resulting sanitized logs stay the
 same except for the intended changes (e.g., removal of '?').  All seemed
 fine.

 Assigning to 'karsten' as the import seems ready to go.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25161#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

