[tor-bugs] #17939 [Onionoo]: Optimise the construction of details documents with field constraints

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Dec 28 00:01:11 UTC 2015


#17939: Optimise the construction of details documents with field constraints
-------------------------+---------------------
 Reporter:  fmap         |          Owner:
     Type:  enhancement  |         Status:  new
 Priority:  Low          |      Milestone:
Component:  Onionoo      |        Version:
 Severity:  Minor        |     Resolution:
 Keywords:               |  Actual Points:
Parent ID:               |         Points:
  Sponsor:               |
-------------------------+---------------------

Comment (by fmap):

 > I'm in favor of taking Gson out of the loop for two reasons: it's a
 potential performance bottleneck (though I never measured that), and it's
 a maintenance nightmare because it's just too easy to miss a new details
 document field in that hacked part of the code.

 Regarding a performance bottleneck: eyeballing the
 [http://hack.rs/~vi/onionoo/flame-56e0e01.svg flame graph we've been
 discussing on the list] suggests `formatNodeStatus` spends on average ten
 times more time producing `details` documents than any other document
 type. It looks like about five percent of total CPU time over the sample,
 but there are a few too many divorced frames to be sure (and I've lost
 the raw data somewhere). I'll make another recording later and report
 back with more precise details.

 > Regarding the approach, I'd favor one that doesn't require keeping
 anything new in memory but instead processes details document contents on
 the fly.  We'll have to read a details document from disk if we want to
 include part of it in a response anyway, and once it's in memory it's
 cheap to create such an index of where fields start and end and only pick
 the ones we want.

 That sounds reasonable.
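
 To make that concrete, here's a rough sketch of the kind of scanner I
 have in mind (hypothetical code, nothing that exists in Onionoo; it
 assumes each document is a single JSON object with no whitespace around
 structural characters, which holds for the documents we currently
 write):

    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class FieldSlicer {

      /* Map each top-level field name to the [start, end) offsets of
       * its "key":value substring in the serialised document. */
      static Map<String, int[]> indexFields(String json) {
        Map<String, int[]> index = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        String fieldName = null;
        int depth = 0, fieldStart = -1;
        boolean inString = false, escaped = false;
        for (int i = 0; i < json.length(); i++) {
          char c = json.charAt(i);
          if (inString) {
            if (escaped) { escaped = false; }
            else if (c == '\\') { escaped = true; }
            else if (c == '"') { inString = false; }
            else if (depth == 1 && fieldName == null) { key.append(c); }
          } else if (c == '"') {
            inString = true;
            if (depth == 1 && fieldName == null) {  /* start of a key */
              fieldStart = i;
              key.setLength(0);
            }
          } else if (c == ':' && depth == 1 && fieldName == null) {
            fieldName = key.toString();
          } else if (c == '{' || c == '[') {
            depth++;
          } else if (c == '}' || c == ']') {
            if (--depth == 0 && fieldName != null) {  /* last field */
              index.put(fieldName, new int[] { fieldStart, i });
            }
          } else if (c == ',' && depth == 1 && fieldName != null) {
            index.put(fieldName, new int[] { fieldStart, i });
            fieldName = null;
          }
        }
        return index;
      }

      /* Reassemble a document containing only the requested fields. */
      static String pickFields(String json, List<String> fields) {
        Map<String, int[]> index = indexFields(json);
        StringBuilder out = new StringBuilder("{");
        for (String field : fields) {
          int[] span = index.get(field);
          if (span == null) { continue; }
          if (out.length() > 1) { out.append(','); }
          out.append(json, span[0], span[1]);
        }
        return out.append('}').toString();
      }

      public static void main(String[] args) {
        String doc = "{\"nickname\":\"moria1\",\"running\":true,"
            + "\"or_addresses\":[\"128.31.0.34:9101\"]}";
        System.out.println(
            pickFields(doc, Arrays.asList("nickname", "or_addresses")));
        /* {"nickname":"moria1","or_addresses":["128.31.0.34:9101"]} */
      }
    }

 One pass over the string yields the index, and assembling a response is
 then just substring copies; no JSON tree gets built on the request path.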

 > It could just be that Gson adds some overhead that we could avoid here.
 And of course the current approach has the downside of being hard to
 maintain, which we could fix.  Maybe we can try out different approaches
 and compare them with respect to performance and robustness?

 Do you mean avoiding Gson in producing a boundary index? I think there's
 more to it than the performance overhead of a redundant parse. In
 populating its result, the parser I referenced is sensitive to structure
 that JSON parsers typically aren't: the length of what the JSON spec
 calls 'structural characters' (`/[:,\[\]{}]/`), as well as that of the
 (variable-length) whitespace allowed to surround them. I don't see
 anything in the Gson user guide that would admit intelligent
 interpretation of those tokens, and they're critical (in the general case
 at least) to the precise determination of boundaries. That said, given
 that written documents don't presently include whitespace around
 structural tokens, it should be possible (assuming Gson can retain the
 initial field ordering) to derive the right coordinates from a
 serialisation into a JSON ADT. But that approach strikes me as frail and
 indirect.
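
 For what it's worth, a sketch of what that derivation might look like
 (hypothetical code; it only produces correct offsets if the stored bytes
 match Gson's compact re-serialisation exactly, escaping included, which
 is exactly where it's frail):

    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class GsonOffsets {

      /* Derive field offsets from Gson's tree model by measuring the
       * re-serialised length of each "key":value pair.  JsonObject
       * preserves insertion order, so the ordering assumption holds;
       * but any divergence between the stored bytes and Gson's own
       * serialisation (whitespace, string escaping) silently breaks
       * every offset from the first divergent field onwards. */
      static Map<String, int[]> deriveOffsets(String json) {
        JsonObject obj = new JsonParser().parse(json).getAsJsonObject();
        Map<String, int[]> index = new LinkedHashMap<>();
        int pos = 1;  /* skip the opening '{' */
        for (Map.Entry<String, JsonElement> entry : obj.entrySet()) {
          /* "key": accounts for two quotes plus the colon. */
          int length = entry.getKey().length() + 3
              + entry.getValue().toString().length();
          index.put(entry.getKey(), new int[] { pos, pos + length });
          pos += length + 1;  /* step over the ',' (or final '}') */
        }
        return index;
      }
    }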

 Though I worry I might've misread your message. Do you have other approaches
 in mind to produce a boundary index? Or perhaps you meant only to
 benchmark the proposed implementation against the existing one?

 > Bonus points: we could use this new approach to allow the `fields`
 parameter for documents other than details documents.

 Sounds good.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/17939#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

