[metrics-team] Onionoo: Availability and Performance

Karsten Loesing karsten at torproject.org
Sun Dec 27 15:43:11 UTC 2015


Hi vi,

On 26/12/15 17:22, vi wrote:
>> Look at this graph:
>> 
>> https://people.torproject.org/~karsten/volatile/onionoo-requests-2015-12-17.png
>> 
>> The issue was that we forgot to enable Apache disk caching after
>> migrating to another host.  The result was that Onionoo handled
>> each request, rather than having Apache respond to those where it
>> already knows the result.  Onionoo responses can be cached quite
>> well, so not doing so is silly.
> 
> That's really impressive! Reading the 'mod_cache' docs, this seems
> to respect Cache-Control headers, making it an ideal mitigatory
> measure.
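
Right.  On the application side the essential ingredient is a
Cache-Control header on responses, so that Apache's disk cache can
answer repeated requests without hitting the Java backend.  A minimal
sketch of that idea (a hypothetical servlet, not the actual Onionoo
code):

  import java.io.IOException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class SummaryServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request,
        HttpServletResponse response) throws IOException {
      /* Allow shared caches (including Apache's mod_cache_disk) to
       * store this response for up to five minutes. */
      response.setHeader("Cache-Control", "public, max-age=300");
      response.setContentType("application/json");
      response.getWriter().write("{\"version\":\"...\"}");
    }
  }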
> 
> I'm curious to know whether descriptions of the production setup 
> are aggregated anywhere. I imagine they would be (or perhaps are) 
> of assistance to parties trying to provide mirrors, or to build 
> representative testing environments.

Good question!  There should be such documentation, but I'm afraid
there isn't much yet.  But at least we have a document saying what
documentation should exist:

https://gitweb.torproject.org/onionoo.git/tree/CONTRIB.md#n156

>> I'm not entirely clear which fix we should apply.  Would you want
>> to take a look at the ticket and post if you like the suggested
>> fix, and if not, suggest the fix above and say why it's better?
> 
> Responded on Trac:
> 
> https://trac.torproject.org/projects/tor/ticket/16907#comment:6

Thanks!

>> Oh wow, this visualization is tough to follow.  Can you explain a
>> bit more how to read it?
> 
> Sure! There's a good description here:
> 
> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
> sha256: ea042d6dffc5f5d6b415c5962feec848e987ea95dd79033e03ac113f532afab9
> http://www.webcitation.org/6dsiv3Amx
> 
> ..but to expand:
> 
> The visualisation describes a population of sampled stack traces.
> In order to gather such a population, you have an agent observe a
> system for some time and, at some frequency, record (1) the
> function active on-CPU at that moment, together with (2) the
> sequence of nested function calls that led execution there.
> Aggregating these stacks can usefully direct attention within a
> given program's source tree:
> 
> 1. When first diving into an existing code base, it can be
> difficult to determine which parts to read first. Code frequently
> active during runtime is typically only a small fraction of a
> program's full sources, and that fraction is a good indication of
> where the significant logic resides. Aggregating frequently active
> stacks will often identify a comprehensible subset of functions
> integral to its operation.
> 
> Further, studying the structure of stacks within this reduced set 
> can reveal control flow (without having to read any code!)
> Assuming enough information to resolve function names, call
> hierarchies (either linear as recorded, or inclusive of branches
> when merging over common roots) form a high-level description of
> operation.
> 
> (I don't advocate flame graphs for this kind of analysis; the 
> frequency indication in frame widths is biased by time, which can 
> over-emphasise expensive but infrequently called functions. But I 
> thought I'd mention it as an application of the data anyway.)
> 
> 2. Similarly, when trying to make a program faster, it can be
> difficult to know where to focus one's efforts. Functions
> frequently on-CPU -- either because they operate slowly, or are
> called frequently -- make good candidates for more focused
> investigation.
> 
> Sometimes what's on CPU is not something which can be directly
> optimised; e.g. seeing that your program is waiting for a lock
> isn't as useful as knowing what is waiting for the lock. That's
> where having the full stack comes in useful: we can trace the
> ancestry down to the root cause.
> 
> The remaining exercise is in making those sampled stack traces
> legible, such that those conclusions can be drawn using naive
> visual inspection.
> 
> Flame graphs denote stack frames as rectangles arranged in columns,
> with height describing ancestry; frames at the top of the graph are
> the functions sampled directly on-CPU. Each frame is initially
> assigned a fixed width. Frames are ordered lexically along the
> x-axis, and horizontally adjacent matching frames are merged
> together, so that two frames' widths can be compared visually to
> identify relative frequency. One lower frame dividing into several
> upper frames indicates a branch in the code path. Colours are
> typically arbitrary, but in our case frames are classified by hue:
> Java functions are green, C++ is yellow, and system calls are red.
> 
> Looking again at 'Onionoo in Flames', we can see that the majority
> of the width is attributed to calls under 'buildResponse' in the
> context of GET requests. We see that its time is divided between
> (1) writing out to the network (infrequent or expensive) and (2)
> generating different document types (the majority of the time on
> CPU). Most document types are produced in relatively little time,
> by reading a file from disk and applying some kind of formatting to
> its string representation. Details documents are rather more
> expensive; the documents read are, at different times, both parsed
> from and rendered back to JSON. You already identified the reason
> for that below.
> 
> Does that help?

Thanks, very helpful!  I think I got it now.
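
To check that I read the sampling part right: conceptually it boils
down to something like the following naive in-process sketch using
Thread.getAllStackTraces().  (Real profilers do this far more cheaply
and only record threads that are actually on-CPU; this is just to
illustrate the data being collected.)

  import java.util.HashMap;
  import java.util.Map;

  public class NaiveSampler {
    public static void main(String[] args) throws InterruptedException {
      Map<String, Integer> stackCounts = new HashMap<>();
      /* Take 1,000 samples, roughly 10 ms apart. */
      for (int i = 0; i < 1000; i++) {
        for (StackTraceElement[] stack
            : Thread.getAllStackTraces().values()) {
          /* Collapse the stack into one line, root first, in the
           * "frame;frame;frame" form that flame graph scripts read. */
          StringBuilder folded = new StringBuilder();
          for (int j = stack.length - 1; j >= 0; j--) {
            if (folded.length() > 0) {
              folded.append(';');
            }
            folded.append(stack[j].getClassName()).append('.')
                .append(stack[j].getMethodName());
          }
          stackCounts.merge(folded.toString(), 1, Integer::sum);
        }
        Thread.sleep(10);
      }
      /* Print "stack count" lines, suitable as flamegraph.pl input. */
      stackCounts.forEach((s, n) -> System.out.println(s + " " + n));
    }
  }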

I wonder if it would be useful to remove red and yellow parts entirely
and then re-group similar stack traces.  And maybe it's even possible
to remove external library methods *calling* us (at the bottom),
though we'd probably want to keep external library methods *called* by
us.  That would reduce the number of flames and make it easier to spot
single methods requiring large fractions of CPU time.  What do you think?
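
Roughly what I have in mind, as a post-processing step over the
collapsed one-line-per-stack format ("frame;frame;frame count") that
the flame graph scripts consume; the isOurs() test below is made up
and would really match on package prefixes like org.torproject:

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import java.util.LinkedHashMap;
  import java.util.Map;

  public class FilterStacks {
    private static boolean isOurs(String frame) {
      return frame.contains("org.torproject.");
    }

    public static void main(String[] args) throws IOException {
      Map<String, Long> merged = new LinkedHashMap<>();
      for (String line : Files.readAllLines(Paths.get(args[0]))) {
        int space = line.lastIndexOf(' ');
        String[] frames = line.substring(0, space).split(";");
        long count = Long.parseLong(line.substring(space + 1));
        /* Drop leading (root-side) frames until we reach our own
         * code, then keep everything above that, including library
         * methods that we call into. */
        StringBuilder kept = new StringBuilder();
        boolean seenOurs = false;
        for (String frame : frames) {
          if (!seenOurs && !isOurs(frame)) {
            continue;
          }
          seenOurs = true;
          if (kept.length() > 0) {
            kept.append(';');
          }
          kept.append(frame);
        }
        if (kept.length() > 0) {
          /* Identical trimmed stacks merge again here, which is the
           * re-grouping step. */
          merged.merge(kept.toString(), count, Long::sum);
        }
      }
      merged.forEach((s, n) -> System.out.println(s + " " + n));
    }
  }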

> (Given the partial symbol table, the graph I posted maybe isn't
> the most welcoming. Sorry.)
> 
>> The only situation where this fails is when the request contains
>> the fields parameter.  Only in that case do we have to
>> deserialize, pick the fields we want, and serialize again.  I
>> could imagine that this shows up in profiles pretty badly, and
>> I'd love to fix this, but I just don't know how.
> 
> Oh! I should've noticed that Gson::toJSON is exclusively parented
> by ResponseBuilder::writeDetailsLines. I have an idea about how to
> fix this, see Trac:
> 
> https://trac.torproject.org/projects/tor/ticket/17939

Can you easily say what fraction of CPU time is spent in Gson::toJSON?
(Feel free to respond on the ticket.)
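
For context, the expensive path is conceptually just: parse the
pre-built details document, copy over the requested fields, and
serialize again.  A simplified sketch of that shape with Gson (not
the actual ResponseBuilder code):

  import com.google.gson.Gson;
  import com.google.gson.JsonObject;

  public class FieldPicker {
    public static String pickFields(String detailsJson,
        String[] fields) {
      Gson gson = new Gson();
      /* Parse the pre-built details document into a JSON tree. */
      JsonObject full = gson.fromJson(detailsJson, JsonObject.class);
      /* Copy only the requested fields. */
      JsonObject picked = new JsonObject();
      for (String field : fields) {
        if (full.has(field)) {
          picked.add(field, full.get(field));
        }
      }
      /* Serialize again; this is the part that shows up under
       * Gson::toJSON in the flame graph. */
      return gson.toJson(picked);
    }
  }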

>> I think I'll move the toLowerCase() to
>> this.parseFieldsParameter(), as in:
>> 
>> return parameter.toLowerCase().split(",");
>> 
>> The main reason is that I'm trying to avoid touching parameters
>> before they've gone through the regex.
> 
> Good catch!
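
Concretely, the ordering I mean, with the regex step first (pattern
and error handling made up here just to show the order):

  private String[] parseFieldsParameter(String parameter) {
    /* Validate the raw parameter against the regex first... */
    if (!parameter.matches("[0-9a-zA-Z_,]+")) {
      throw new IllegalArgumentException("Invalid fields parameter.");
    }
    /* ...and only lower-case and split once it has passed. */
    return parameter.toLowerCase().split(",");
  }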
> 
>> By the way, I'm not sure if this is helpful, but here are the
>> latest statistics of the Onionoo frontend: [..]
> 
> Sweet. I'll eventually update the bias terms in my URL generator
> to better reflect this distribution, but I'm still thinking about
> how to express joint probabilities.

Happy to take patches to improve the statistics.  The ones I added
were relatively cheap to keep and have the potential to give us some
first insights.

>> This is very helpful!  I like the different angles you're taking
>> to approach the load problems.  Please keep digging!
> 
> Thanks for taking the time to respond! I'll keep investigating,
> it's a decent excuse to dive into performance analysis.

Thanks again, this is very helpful!

> Merry congress,

Merry congress to you, too.  If you're around on the 29th, don't miss
the State of the Onion talk, and maybe try to find me or other Tor
people after the talk.

All the best,
Karsten


