[tor-commits] [stem/master] Tor descriptor lazy loading

atagar at torproject.org
Sun Jan 25 22:37:35 UTC 2015


commit 3dac7c51300062d78298b370b1286965652600e4
Merge: 92dd464 6484250
Author: Damian Johnson <atagar at torproject.org>
Date:   Sun Jan 25 13:57:03 2015 -0800

    Tor descriptor lazy loading
    
    I've been wanting to do this for years.
    
    Until now, reading a descriptor meant parsing every field in it. This is
    necessary if we're validating it, but usually users don't care about
    validation and only want an attribute or two.
    
    When parsing without validation we now lazy load the document, meaning we
    parse fields on-demand rather than everything upfront. This greatly reduces
    the time it takes to read descriptors without validation...
    
      Server descriptors: 27% faster
      Extrainfo descriptors: 71% faster
      Microdescriptors: 43% faster
      Consensus: 37% faster
    
    This comes at a small cost when we read with validation (roughly 7% to 17%
    slower in the results below), but not enough for it to be a concern. As an
    added benefit this actually makes our code a lot more maintainable too!
    
      https://trac.torproject.org/projects/tor/ticket/14011
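
The on-demand parsing described above can be illustrated with a minimal sketch. This is not stem's actual code: the `LazyDescriptor` class, its `FIELDS` table, and the `parse_count` counter are hypothetical names made up for illustration, and the input is a toy 'keyword value' format. It assumes that a cheap first pass splits lines into keywords, and that each value is only parsed when its attribute is first read (or upfront when validating).

```python
class LazyDescriptor(object):
  """Toy descriptor of 'keyword value' lines (hypothetical, not stem's class)."""

  # attribute name => (keyword of the line it comes from, parser for its value)
  FIELDS = {
    'nickname': ('router', lambda value: value.split(' ')[0]),
    'uptime': ('uptime', int),
  }

  def __init__(self, raw_contents, validate = False):
    self._entries = {}
    self.parse_count = 0  # number of fields parsed so far, for illustration

    # Cheap first pass: only split each line into a keyword and its raw value.
    for line in raw_contents.splitlines():
      keyword, _, value = line.partition(' ')
      self._entries[keyword] = value

    if validate:
      # Validation needs every field, so parse them all upfront.
      for attr in self.FIELDS:
        getattr(self, attr)

  def __getattr__(self, name):
    # Python only calls this when normal lookup fails, which here means the
    # field has not been parsed yet.
    if name not in self.FIELDS:
      raise AttributeError(name)

    keyword, parser = self.FIELDS[name]

    if keyword not in self._entries:
      raise ValueError("descriptor is missing a '%s' line" % keyword)

    value = parser(self._entries[keyword])
    setattr(self, name, value)  # cache it so later reads skip __getattr__
    self.parse_count += 1
    return value


desc = LazyDescriptor('router caerSidi 71.35.133.197 9001 0 0\nuptime 588217')
print(desc.parse_count)  # nothing has been parsed yet
print(desc.nickname)     # parses only the 'router' line
print(desc.parse_count)
```

Because the parsed value is stored on the instance, repeated reads of the same attribute cost no more than a normal attribute lookup. Malformed fields only raise once they are read, which is why upfront parsing is kept for the validating case.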
    
    --------------------------------------------------------------------------------
    Benchmarking script
    --------------------------------------------------------------------------------
    
    import time
    
    from stem.descriptor import parse_file
    
    # Descriptor file being benchmarked
    DESCRIPTOR_PATH = '/home/atagar/.tor/cached-descriptors'
    
    # Time a full pass over the file, first with validation and then without
    for validate in (True, False):
      start_time, fingerprints = time.time(), []
    
      # Read a single attribute from each descriptor
      for desc in parse_file(DESCRIPTOR_PATH, validate = validate):
        fingerprints.append(desc.fingerprint)
    
      count, runtime = len(fingerprints), time.time() - start_time
      label = 'with' if validate else 'without'
      print('read %i descriptors %s validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, label, runtime, runtime / count))
    
    --------------------------------------------------------------------------------
    Results
    --------------------------------------------------------------------------------
    
    Please keep in mind these are just the results on my system. Yours will, of
    course, vary with your hardware and background load...
    
    Server descriptors:
    
      before: read 6679 descriptors with validation, took 10.71 seconds (0.00160 seconds per descriptor)
      before: read 6679 descriptors without validation, took 4.46 seconds (0.00067 seconds per descriptor)
    
      after: read 6679 descriptors with validation, took 11.48 seconds (0.00172 seconds per descriptor)
      after: read 6679 descriptors without validation, took 3.25 seconds (0.00049 seconds per descriptor)
    
    Extrainfo descriptors:
    
      before: read 6677 descriptors with validation, took 7.91 seconds (0.00119 seconds per descriptor)
      before: read 6677 descriptors without validation, took 7.64 seconds (0.00114 seconds per descriptor)
    
      after: read 6677 descriptors with validation, took 8.91 seconds (0.00133 seconds per descriptor)
      after: read 6677 descriptors without validation, took 2.22 seconds (0.00033 seconds per descriptor)
    
    Microdescriptors:
    
      before: read 10526 descriptors with validation, took 2.41 seconds (0.00023 seconds per descriptor)
      before: read 10526 descriptors without validation, took 2.34 seconds (0.00022 seconds per descriptor)
    
      after: read 10526 descriptors with validation, took 2.74 seconds (0.00026 seconds per descriptor)
      after: read 10526 descriptors without validation, took 1.34 seconds (0.00013 seconds per descriptor)
    
    Consensus:
    
      before: read 6688 descriptors with validation, took 2.11 seconds (0.00032 seconds per descriptor)
      before: read 6688 descriptors without validation, took 2.04 seconds (0.00030 seconds per descriptor)
    
      after: read 6688 descriptors with validation, took 2.47 seconds (0.00037 seconds per descriptor)
      after: read 6688 descriptors without validation, took 1.28 seconds (0.00019 seconds per descriptor)
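
The "faster" percentages quoted in the summary correspond to the reduction in runtime when reading without validation. As a quick check, the sketch below recomputes them from the before and after timings listed above. The `RUNTIMES` table and the `percent_reduction` helper are names introduced here for illustration.

```python
# Runtimes in seconds for reading without validation (before, after),
# copied from the results above.
RUNTIMES = [
  ('Server descriptors', 4.46, 3.25),
  ('Extrainfo descriptors', 7.64, 2.22),
  ('Microdescriptors', 2.34, 1.34),
  ('Consensus', 2.04, 1.28),
]


def percent_reduction(before, after):
  """Percentage by which the runtime dropped from 'before' to 'after'."""

  return (before - after) / before * 100


for label, before, after in RUNTIMES:
  print('%s: %i%% faster' % (label, round(percent_reduction(before, after))))
```

The same helper applied to the timings with validation gives negative values, reflecting the small slowdown in that case.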

 stem/descriptor/__init__.py                        |  172 ++-
 stem/descriptor/extrainfo_descriptor.py            |  974 +++++++--------
 stem/descriptor/microdescriptor.py                 |  122 +-
 stem/descriptor/networkstatus.py                   | 1279 +++++++++-----------
 stem/descriptor/router_status_entry.py             |  737 +++++------
 stem/descriptor/server_descriptor.py               |  683 +++++------
 test/unit/descriptor/extrainfo_descriptor.py       |   28 +-
 .../networkstatus/directory_authority.py           |    9 +-
 test/unit/descriptor/networkstatus/document_v3.py  |   36 +-
 .../descriptor/networkstatus/key_certificate.py    |   24 +-
 test/unit/descriptor/router_status_entry.py        |   13 +-
 test/unit/descriptor/server_descriptor.py          |   10 +-
 12 files changed, 1915 insertions(+), 2172 deletions(-)




