[tor-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Feb 21 09:01:53 UTC 2018


#25317: Enable webstats to process large (> 2G) logfiles
-----------------------------------+----------------------
     Reporter:  iwakeh             |      Owner:  iwakeh
         Type:  defect             |     Status:  assigned
     Priority:  High               |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+----------------------
 Quote from #25161, comment 12:
    Looking at the stack trace and the input log files, I noticed that two
 log files are larger than 2G when decompressed:

 {{{
 3.2G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531
 584K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531.xz
 2.1G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601
 404K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601.xz
 }}}

   I just ran another bulk import with only those two files as input and
 ran into the same exception.

   It seems like we shouldn't attempt to decompress these files into a
 `byte[]` in `FileType.decompress`, because Java arrays are indexed by
 `int` and can therefore hold at most 2^31 - 1 (roughly 2.1 billion)
 elements:
 https://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays . Maybe we
 should work with streams there instead of `byte[]`.
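
   A minimal sketch of the streaming idea, assuming we process the log
 line by line instead of materializing the whole decompressed file. This
 uses gzip from the JDK purely for a self-contained illustration (the
 actual files are xz-compressed), and the `decompressedReader` helper
 name is hypothetical, not CollecTor's real API:

 {{{
 import java.io.*;
 import java.nio.charset.StandardCharsets;
 import java.util.zip.GZIPInputStream;
 import java.util.zip.GZIPOutputStream;

 public class StreamDecompress {

   // Return a streaming reader over the decompressed content instead of
   // decompressing into a byte[]; memory use stays constant regardless
   // of how large the decompressed file is.
   static BufferedReader decompressedReader(InputStream compressed)
       throws IOException {
     return new BufferedReader(new InputStreamReader(
         new GZIPInputStream(compressed), StandardCharsets.UTF_8));
   }

   public static void main(String[] args) throws IOException {
     // Build a small gzip-compressed sample in memory for demonstration.
     ByteArrayOutputStream buf = new ByteArrayOutputStream();
     try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
       gz.write("line one\nline two\n".getBytes(StandardCharsets.UTF_8));
     }

     long lines = 0;
     try (BufferedReader r =
         decompressedReader(new ByteArrayInputStream(buf.toByteArray()))) {
       while (r.readLine() != null) {
         lines++;  // each log line is handled and then discarded
       }
     }
     System.out.println("lines=" + lines);  // prints lines=2
   }
 }
 }}}

   For the `.xz` inputs one could swap in `org.tukaani.xz.XZInputStream`
 from the XZ for Java library in place of `GZIPInputStream`; the rest of
 the pattern stays the same.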

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25317>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

