[tor-bugs] #13720 [Ooni]: Investigate possible performance improvements to the ooni-pipeline
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Jun 26 23:57:34 UTC 2015
#13720: Investigate possible performance improvements to the ooni-pipeline
-----------------------------+---------------------
Reporter: hellais | Owner: hellais
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Ooni | Version:
Resolution: | Keywords:
Actual Points: | Parent ID:
Points: |
-----------------------------+---------------------
Comment (by dcf):
For what it's worth, I was also struggling with the slowness of the Python
yaml module (in the context of
[https://lists.torproject.org/pipermail/ooni-dev/2015-June/000288.html this project]
to find server-side blocking of Tor in OONI reports). For me,
yaml.CSafeLoader is ''way'' faster, roughly 30× in wall-clock time.
These are the times to parse 1.5 GB of gzip-compressed files, consisting of
http_requests reports between 2015-06-16 and 2015-06-24:
{{{
yaml.safe_load_all(f)
real 138m29.467s
user 138m27.808s
sys 0m6.356s
yaml.load_all(f, Loader=yaml.CSafeLoader)
real 4m40.021s
user 5m21.960s
sys 0m7.428s
}}}
I had tried optimizing the HTML parsing and gzip decompression; the YAML
decoding was the bottleneck by far.
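
A minimal sketch of the faster loading path, assuming the reports are
gzip-compressed multi-document YAML files. yaml.CSafeLoader is only present
when PyYAML was built against libyaml, so the sketch falls back to the
pure-Python yaml.SafeLoader when it is missing. The function name
iter_report_documents is hypothetical and is not part of the ooni-pipeline.

```python
import gzip

import yaml

# CSafeLoader exists only if PyYAML was compiled with libyaml support.
# Fall back to the (much slower) pure-Python SafeLoader otherwise.
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def iter_report_documents(path):
    """Yield each YAML document from one gzip-compressed report file.

    Hypothetical helper for illustration; `path` is any .gz file that
    contains one or more YAML documents.
    """
    with gzip.open(path, "rb") as f:
        # load_all is lazy, so documents are parsed one at a time
        # instead of holding the whole report in memory.
        for doc in yaml.load_all(f, Loader=Loader):
            yield doc
```

Checking `Loader is yaml.SafeLoader` after import is a quick way to notice
that the libyaml bindings are missing and the slow path is in use.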
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13720#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online