[tor-bugs] #13720 [Ooni]: Investigate possible performance improvements to the ooni-pipeline

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Jun 26 23:57:34 UTC 2015


#13720: Investigate possible performance improvements to the ooni-pipeline
-----------------------------+---------------------
     Reporter:  hellais      |      Owner:  hellais
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:
    Component:  Ooni         |    Version:
   Resolution:               |   Keywords:
Actual Points:               |  Parent ID:
       Points:               |
-----------------------------+---------------------

Comment (by dcf):

 For what it's worth, I was also struggling with the slowness of the Python
 yaml module (in the context of
 [https://lists.torproject.org/pipermail/ooni-dev/2015-June/000288.html this project]
 to find server-side blocking of Tor in OONI reports). For me,
 yaml.CSafeLoader is ''way'' faster, roughly 30×.

 These are the times to parse 1.5 GB of gzip files, consisting of
 http_requests reports between 2015-06-16 and 2015-06-24:
 {{{
 yaml.safe_load_all(f)
 real    138m29.467s
 user    138m27.808s
 sys     0m6.356s

 yaml.load_all(f, Loader=yaml.CSafeLoader)
 real    4m40.021s
 user    5m21.960s
 sys     0m7.428s
 }}}
 I had tried optimizing the HTML parsing and gzip decompression; the YAML
 decoding was the bottleneck by far.
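
 As a minimal sketch of the faster call above (not the actual script used
 for these timings): the helper name iter_report_docs and the fallback to
 yaml.SafeLoader are assumptions added for illustration. yaml.CSafeLoader
 is only available when PyYAML was built against libyaml.

```python
import gzip

import yaml

# CSafeLoader exists only when PyYAML was built with the libyaml bindings;
# fall back to the (much slower) pure-Python SafeLoader otherwise.
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def iter_report_docs(path):
    """Yield each YAML document from a gzip-compressed report file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # load_all is lazy, so documents are parsed one at a time
        # instead of loading the whole report into memory.
        for doc in yaml.load_all(f, Loader=Loader):
            yield doc
```

 Because both loaders are "safe" loaders, they should produce the same
 Python objects for ordinary report data; only the parsing speed differs.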

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13720#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

