[stem/master] Tor descriptor lazy loading

commit 3dac7c51300062d78298b370b1286965652600e4
Merge: 92dd464 6484250
Author: Damian Johnson <atagar@torproject.org>
Date:   Sun Jan 25 13:57:03 2015 -0800

Tor descriptor lazy loading

I've been wanting to do this for years. Until now, reading a descriptor meant
parsing every field in it. That is necessary when validating, but usually
users don't care about validation and only want an attribute or two.

When parsing without validation we now lazy load the document, meaning we
parse fields on demand rather than everything upfront. Naturally this greatly
improves our performance when reading descriptors without validation:

  Server descriptors: 27% faster
  Extrainfo descriptors: 71% faster
  Microdescriptors: 43% faster
  Consensus: 37% faster

It comes at a small cost to our performance when reading with validation
(roughly 7-17% slower in the results below), but not enough to be a concern.
As an added benefit, this also makes our code a lot more maintainable!

https://trac.torproject.org/projects/tor/ticket/14011

--------------------------------------------------------------------------------
Benchmarking script
--------------------------------------------------------------------------------

import time

from stem.descriptor import parse_file

# first pass: read every cached descriptor with validation
start_time, fingerprints = time.time(), []

for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = True):
  fingerprints.append(desc.fingerprint)

count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors with validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))

# second pass: the same read without validation
start_time, fingerprints = time.time(), []

for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = False):
  fingerprints.append(desc.fingerprint)

count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors without validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))
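The lazy loading idea described in the commit message can be illustrated with a small, self-contained toy. This is a hypothetical sketch, not stem's actual classes: `LazyDescriptor`, its `ATTRIBUTES` table, the `_parse()` helper and the `parse_count` counter are all illustrative names, and the sample relay content is made up. In this sketch the document is split into keyword/value entries once; Python's `__getattr__` hook, which only runs when normal attribute lookup fails, parses a field the first time it is read, while `validate=True` parses and checks every field upfront.

```python
class LazyDescriptor(object):
    """
    Hypothetical toy descriptor illustrating lazy loading (not stem's API).

    The raw document is split into keyword/value entries once. Without
    validation each attribute is parsed only when first read; with
    validation every attribute is parsed, and checked, upfront.
    """

    # attribute name => (default if unparseable, function parsing it from the entries)
    ATTRIBUTES = {
        'nickname': (None, lambda entries: entries['router'].split()[0]),
        'address': (None, lambda entries: entries['router'].split()[1]),
        'fingerprint': (None, lambda entries: entries['fingerprint'].replace(' ', '')),
    }

    def __init__(self, raw_contents, validate=False):
        self.parse_count = 0  # number of attribute parses performed, for illustration
        self._entries = {}

        # cheap upfront step: map each line's keyword to the rest of the line
        for line in raw_contents.splitlines():
            keyword, _, value = line.partition(' ')
            self._entries[keyword] = value

        if validate:
            # eager path: parse everything so malformed content raises immediately
            for name in self.ATTRIBUTES:
                self._parse(name, strict=True)

    def _parse(self, name, strict):
        default, parser = self.ATTRIBUTES[name]
        self.parse_count += 1

        try:
            value = parser(self._entries)
        except (KeyError, IndexError):
            if strict:
                raise ValueError('descriptor has a missing or malformed %s' % name)

            value = default

        # store the value on the instance so later reads skip __getattr__
        self.__dict__[name] = value
        return value

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup fails, so this runs
        # the first time a lazily loaded attribute is read
        if name in type(self).ATTRIBUTES:
            return self._parse(name, strict=False)

        raise AttributeError(name)


# made-up example content (illustrative values, not a real relay)
RAW = 'router ExampleRelay 192.0.2.1 9001 0 0\nfingerprint 0123 4567 89AB CDEF 0123 4567 89AB CDEF 0123 4567'

desc = LazyDescriptor(RAW)
print(desc.parse_count)  # 0: nothing has been parsed yet
print(desc.nickname)     # parses the 'router' line on demand
print(desc.parse_count)  # 1: only the nickname was parsed
```

The design choice worth noting is caching into the instance `__dict__`: once a field is parsed, ordinary attribute lookup finds it, so the cost of lazy loading is paid at most once per field, and fields nobody reads are never parsed at all.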
--------------------------------------------------------------------------------
Results
--------------------------------------------------------------------------------

Please keep in mind these are just the results on my system. Results naturally
depend on the system and its background load, so yours will differ.

Server descriptors:

  before: read 6679 descriptors with validation, took 10.71 seconds (0.00160 seconds per descriptor)
  before: read 6679 descriptors without validation, took 4.46 seconds (0.00067 seconds per descriptor)
  after: read 6679 descriptors with validation, took 11.48 seconds (0.00172 seconds per descriptor)
  after: read 6679 descriptors without validation, took 3.25 seconds (0.00049 seconds per descriptor)

Extrainfo descriptors:

  before: read 6677 descriptors with validation, took 7.91 seconds (0.00119 seconds per descriptor)
  before: read 6677 descriptors without validation, took 7.64 seconds (0.00114 seconds per descriptor)
  after: read 6677 descriptors with validation, took 8.91 seconds (0.00133 seconds per descriptor)
  after: read 6677 descriptors without validation, took 2.22 seconds (0.00033 seconds per descriptor)

Microdescriptors:

  before: read 10526 descriptors with validation, took 2.41 seconds (0.00023 seconds per descriptor)
  before: read 10526 descriptors without validation, took 2.34 seconds (0.00022 seconds per descriptor)
  after: read 10526 descriptors with validation, took 2.74 seconds (0.00026 seconds per descriptor)
  after: read 10526 descriptors without validation, took 1.34 seconds (0.00013 seconds per descriptor)

Consensus:

  before: read 6688 descriptors with validation, took 2.11 seconds (0.00032 seconds per descriptor)
  before: read 6688 descriptors without validation, took 2.04 seconds (0.00030 seconds per descriptor)
  after: read 6688 descriptors with validation, took 2.47 seconds (0.00037 seconds per descriptor)
  after: read 6688 descriptors without validation, took 1.28 seconds (0.00019 seconds per descriptor)

 stem/descriptor/__init__.py                       |  172 ++-
 stem/descriptor/extrainfo_descriptor.py           |  974 +++++++--------
 stem/descriptor/microdescriptor.py                |  122 +-
 stem/descriptor/networkstatus.py                  | 1279 +++++++++-----------
 stem/descriptor/router_status_entry.py            |  737 +++++------
 stem/descriptor/server_descriptor.py              |  683 +++++------
 test/unit/descriptor/extrainfo_descriptor.py      |   28 +-
 .../networkstatus/directory_authority.py          |    9 +-
 test/unit/descriptor/networkstatus/document_v3.py |   36 +-
 .../descriptor/networkstatus/key_certificate.py   |   24 +-
 test/unit/descriptor/router_status_entry.py       |   13 +-
 test/unit/descriptor/server_descriptor.py         |   10 +-
 12 files changed, 1915 insertions(+), 2172 deletions(-)
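The percentages quoted in the commit message are runtime reductions for the runs without validation. The short sketch below recomputes them from the timings in the results, along with the slowdown for validated reads. The dictionary and function names are illustrative, and the timings are copied verbatim from the results.

```python
# Seconds to read each file as (before, after), copied from the results above.
WITHOUT_VALIDATION = {
    'Server descriptors': (4.46, 3.25),
    'Extrainfo descriptors': (7.64, 2.22),
    'Microdescriptors': (2.34, 1.34),
    'Consensus': (2.04, 1.28),
}

WITH_VALIDATION = {
    'Server descriptors': (10.71, 11.48),
    'Extrainfo descriptors': (7.91, 8.91),
    'Microdescriptors': (2.41, 2.74),
    'Consensus': (2.11, 2.47),
}


def runtime_change(before, after):
    """Percent change in runtime; negative means the 'after' run was faster."""
    return 100.0 * (after - before) / before


for label, timings in (('without validation', WITHOUT_VALIDATION), ('with validation', WITH_VALIDATION)):
    for name, (before, after) in timings.items():
        print('%s, %s: %+.0f%% runtime' % (name, label, runtime_change(before, after)))
```

Rounded, the unvalidated runs come out at -27%, -71%, -43% and -37%, matching the figures in the commit message, while the validated runs take between about 7% and 17% longer after the change.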