[tor-bugs] #25161 [Metrics/CollecTor]: Fix another memory problem with the webstats bulk import

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Feb 20 19:25:00 UTC 2018


#25161: Fix another memory problem with the webstats bulk import
-------------------------------+--------------------------
 Reporter:  karsten            |          Owner:  iwakeh
     Type:  defect             |         Status:  assigned
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+--------------------------

Comment (by karsten):

 Looking at the stack trace and the input log files, I noticed that two log
 files are larger than 2G when decompressed:

 {{{
 3.2G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531
 584K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531.xz
 2.1G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601
 404K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601.xz
 }}}

 I just ran another bulk import with only those two files as input and ran
 into the same exception.

 It seems like we shouldn't attempt to decompress these files into a
 `byte[]` in `FileType.decompress`. Java arrays are indexed by `int`, so an
 array can hold at most `Integer.MAX_VALUE` (2^31 - 1, just under 2 GiB)
 elements, and both files above exceed that when decompressed:
 https://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays . Maybe we
 should work with streams there instead of `byte[]`.
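 A minimal sketch of the streaming idea, assuming the import can process a
 log one line at a time. It uses `java.util.zip.GZIPInputStream` as a
 stand-in for an xz decompressor so that it runs with the JDK alone; for the
 actual `.xz` files one would wrap the input with an xz stream instead (for
 example Apache Commons Compress's `XZCompressorInputStream`). The names
 `StreamingDecompressSketch`, `countLines` and `gzip` are hypothetical and
 not part of CollecTor's API.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingDecompressSketch {

  /**
   * Reads an already-decompressing stream line by line. Memory use is bounded
   * by the longest line, not by the total decompressed size, so files larger
   * than Integer.MAX_VALUE bytes never need to fit into a single byte[].
   */
  static long countLines(InputStream decompressed) throws IOException {
    long lines = 0L; // long, not int: a huge log may have many lines
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(decompressed, StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        lines++; // hypothetical per-line parsing would go here
      }
    }
    return lines;
  }

  /** Test helper: gzip-compresses a string in memory. */
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] compressed = gzip("line one\nline two\nline three\n");
    // The decompressor is layered on the raw stream; nothing is buffered whole.
    InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
    System.out.println(countLines(in)); // prints 3
  }
}
```

 The design choice is to pass the decompressing `InputStream` into the
 processing code rather than returning a fully decompressed buffer, so the
 same code path works regardless of the decompressed file size.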

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25161#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

