[tor-bugs] #14011 [Stem]: Implement lazy parsing for zoossh (and maybe Stem)

Tor Bug Tracker & Wiki blackhole at torproject.org
Sun Dec 21 21:32:21 UTC 2014


#14011: Implement lazy parsing for zoossh (and maybe Stem)
-----------------------------+---------------------
 Reporter:  phw              |          Owner:  phw
     Type:  enhancement      |         Status:  new
 Priority:  normal           |      Milestone:
Component:  Stem             |        Version:
 Keywords:  zoossh, parsing  |  Actual Points:
Parent ID:                   |         Points:
-----------------------------+---------------------
 Damian and I briefly discussed lazy parsing (see the log below) and how it
 could speed up processing of descriptor data. It is probably not much work
 for zoossh, so it could be worth implementing.

 {{{
 18:28 <atagar>  phw: Side note concerning zoossh, another option could be
 lazy parsing for descriptors. If I was to do stem's parsers again that's
 what I'd opt for to make them more performant. That would be a fair bit of
 work, but would both benefit all stem users and have performance just as
 fast as any Go solution (time would all be IO).
 18:29 <atagar>  That said though, Zoossh seems like a great way of
 learning the language so if that's the goal have fun. :)
 18:36 <phw>     atagar: that's actually a good idea, thanks
 18:39 <atagar>  phw: Oh! If you're interested then please open a ticket
 under the Stem component. This is something I've idly given some thought
 to for over a year but never bothered to actually jot down the idea. ;P
 18:39 <atagar>  Didn't expect you to actually think about opting for this
 route.
 18:41 <atagar>  Thought was that reading a descriptor dumps to a simple
 object that's a {keyword: [lines...]} dictionary. The getter methods then
 parse the actual content and cache the results. Upside: far, far faster
 since you only parse the fields you care about, downside: no upfront
 validation is done so malformed content would be acceptable.
 18:42 <atagar>  That said, validation is a far, far smaller concern for
 our users than performance in practice so this is a tradeoff I'd be fine
 with.
 18:42 <atagar>  We could then have a validate() method that simply calls
 all the getters to achieve the same thing we do now.
 18:45 <atagar>  Previously I thought that doing this would break backward
 compatibility which made me a little less keen on it (since we'd then need
 'descriptor v2' objects) but on reflection it doesn't. We could slip this
 in transparently. The only difference users would see would be a
 tremendous speedup if they opt to not have validation.
 }}}
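
 The design described in the log could look roughly like the following
 Python sketch. It is only an illustration under stated assumptions, not
 Stem's or zoossh's actual code: the class name LazyDescriptor, the helper
 methods, and the choice of fields are hypothetical. The field positions
 follow the "router" and "bandwidth" lines of a server descriptor.

```python
class LazyDescriptor:
    """Hypothetical lazily parsed descriptor (illustration only)."""

    def __init__(self, raw):
        # Cheap upfront pass: build the {keyword: [lines...]} dictionary
        # without interpreting any field.
        self._entries = {}
        for line in raw.splitlines():
            keyword, _, rest = line.strip().partition(" ")
            if keyword:
                self._entries.setdefault(keyword, []).append(rest)
        self._cache = {}

    def _first(self, keyword):
        # Return the whitespace-separated fields of the first such line.
        lines = self._entries.get(keyword)
        if not lines:
            raise ValueError("missing '%s' line" % keyword)
        return lines[0].split()

    def _cached(self, name, parse):
        # Parse a field on first access, then serve it from the cache.
        if name not in self._cache:
            self._cache[name] = parse()
        return self._cache[name]

    @property
    def nickname(self):
        # router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>
        return self._cached("nickname", lambda: self._first("router")[0])

    @property
    def or_port(self):
        return self._cached("or_port", lambda: int(self._first("router")[2]))

    @property
    def observed_bandwidth(self):
        # bandwidth <average> <burst> <observed>
        return self._cached(
            "observed_bandwidth", lambda: int(self._first("bandwidth")[2])
        )

    def validate(self):
        # Touch every getter so that malformed fields raise, which
        # approximates eager validation.
        for name in ("nickname", "or_port", "observed_bandwidth"):
            getattr(self, name)


RAW = """router example 192.0.2.1 9001 0 9030
bandwidth 1000 2000 not-a-number
"""

desc = LazyDescriptor(RAW)
print(desc.nickname)  # only the "router" line is interpreted here

try:
    desc.validate()  # the malformed "bandwidth" line surfaces only now
except ValueError as exc:
    print("malformed descriptor:", exc)
```

 The example shows the tradeoff from the log: reading the nickname succeeds
 even though the bandwidth line is malformed, and the error only appears
 once validate() (or the affected getter) is called.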

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/14011>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

