[metrics-team] Onionoo: Availability and Performance

vi tor at vikramverma.com
Sat Dec 26 16:22:00 UTC 2015


> Look at this graph:
> 
> https://people.torproject.org/~karsten/volatile/onionoo-requests-2015-12-17.png
> 
> The issue was that we forgot to enable Apache disk caching after
> migrating to another host.  The result was that Onionoo handled each
> request, rather than having Apache respond to those where it already
> knows the result.  Onionoo responses can be cached quite well, so not
> doing so is silly.  

That's really impressive! Reading the 'mod_cache' docs, it seems to
respect Cache-Control headers, making it an ideal mitigation.
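
For concreteness, here's a minimal sketch of the backend half of that
arrangement, assuming a servlet-style handler; the class name, max-age
value, and response body are invented for illustration and aren't
Onionoo's actual code:

  import java.io.IOException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  /* Marks responses as cacheable so a fronting Apache with mod_cache
   * enabled can answer repeat requests itself rather than passing
   * every one of them on to the application. */
  public class CacheableDocumentServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request,
        HttpServletResponse response) throws IOException {
      // mod_cache honours these headers when deciding what to store.
      response.setHeader("Cache-Control", "public, max-age=300");
      response.setContentType("application/json");
      response.getWriter().write("{\"version\":\"illustrative\"}");
    }
  }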

I'm curious whether descriptions of the production setup are aggregated
anywhere. I imagine they would be (or perhaps already are) useful to
parties trying to provide mirrors, or to build representative testing
environments.
 
> I'm not entirely clear which fix we should apply.  Would you want to
> take a look at the ticket and post if you like the suggested fix, and
> if not, suggest the fix above and say why it's better?
 
Responded on Trac:
  
  https://trac.torproject.org/projects/tor/ticket/16907#comment:6

> Oh wow, this visualization is tough to follow.  Can you explain a bit
> more how to read it?
 
Sure! There's a good description here:

  http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
  sha256: ea042d6dffc5f5d6b415c5962feec848e987ea95dd79033e03ac113f532afab9
  http://www.webcitation.org/6dsiv3Amx

...but to expand:

The visualisation describes a population of sampled stack traces. To
gather such a population, you have an agent observe a system for some
time and, at some frequency, record (1) the function active on-CPU at
that instant, together with (2) the sequence of nested function calls
that led there. Aggregating these stacks can usefully direct attention
within a given program's source tree (a toy sampler of this kind is
sketched after the list below):

  1. When first diving into an established code base, it can be
     difficult to determine which parts to read first. Code frequently
     active at runtime is typically only a small fraction of a program's
     full sources, and that fraction is a good indication of where the
     significant logic resides. Aggregating frequently active stacks
     will often identify a comprehensible subset of functions integral
     to operation.
     
     Further, studying the structure of stacks within this reduced set
     can reveal control flow (without having to read any code!). Assuming
     enough information to resolve function names, call hierarchies
     (either linear as recorded, or inclusive of branches when merging
     over common roots) form a high-level description of operation.

     (I don't advocate flame graphs for this kind of analysis; the
     frequency indication in frame widths is biased by time, which can
     over-emphasise expensive but infrequently called functions. But I
     thought I'd mention it as an application of the data anyway.)
     
  2. Similarly, when trying to make a program faster, it can be difficult
     to know where to focus one's efforts. Functions frequently on-CPU
     -- either because they operate slowly, or are called frequently --
     make good candidates for more focused investigation.

     Sometimes what's on-CPU is not something which can be directly
     optimised; e.g. seeing that your program is waiting for a lock
     isn't as useful as knowing what is waiting for the lock. That's
     where having the full stack comes in useful: we can trace the
     ancestry down to its root cause.
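
To make the data-gathering step concrete, here's a toy in-process
sampler (not a real profiler, and not the tooling behind the graph I
posted): it snapshots every thread's stack at a fixed interval and
counts each distinct stack in the folded, root-first format that
flamegraph.pl consumes. It doesn't distinguish on-CPU threads from
blocked ones, so treat it purely as an illustration of the aggregation
step:

  import java.util.HashMap;
  import java.util.Map;

  public class ToyStackSampler {
    public static void main(String[] args) throws InterruptedException {
      Map<String, Long> folded = new HashMap<>();
      for (int i = 0; i < 1000; i++) {                  // ~10 seconds
        for (StackTraceElement[] stack
            : Thread.getAllStackTraces().values()) {
          if (stack.length == 0) {
            continue;
          }
          StringBuilder key = new StringBuilder();
          for (int j = stack.length - 1; j >= 0; j--) { // root first
            if (key.length() > 0) {
              key.append(';');
            }
            key.append(stack[j].getClassName())
                .append('.').append(stack[j].getMethodName());
          }
          folded.merge(key.toString(), 1L, Long::sum);
        }
        Thread.sleep(10);                               // sampling interval
      }
      // One line per distinct stack: "frame;frame;...;frame count".
      folded.forEach((stack, count) ->
          System.out.println(stack + " " + count));
    }
  }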

The remaining exercise is in making those sampled stack traces legible,
such that those conclusions can be drawn using naive visual inspection.

Flame graphs denote stack frames as rectangles arranged in columns, with
vertical position describing ancestry; frames at the top of the graph
are the functions sampled directly on-CPU. Each frame is initially
assigned a fixed width. Frames are ordered lexically along the x-axis,
and horizontally adjacent matching frames are merged together, so that
two frames' widths can be compared visually to judge relative frequency.
One lower frame dividing into several upper frames indicates a branch in
the code path. Colours are typically arbitrary, but in our case frames
are classified by hue: Java functions are green, C++ yellow, and system
calls red.
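
If it helps, the merge over common roots amounts to building a tree
keyed by frame name, where each node's count (relative to the root's)
is what its width encodes. A small sketch with invented sample stacks,
not anyone's actual renderer:

  import java.util.SortedMap;
  import java.util.TreeMap;

  public class FrameNode {
    final String name;
    long count;
    final SortedMap<String, FrameNode> children = new TreeMap<>();

    FrameNode(String name) { this.name = name; }

    // Accumulate one folded stack (root first) with its sample count.
    void add(String[] frames, int depth, long samples) {
      count += samples;
      if (depth == frames.length) {
        return;
      }
      children.computeIfAbsent(frames[depth], FrameNode::new)
          .add(frames, depth + 1, samples);
    }

    void print(String indent) {
      System.out.println(indent + name + " " + count);
      for (FrameNode child : children.values()) {
        child.print(indent + "  ");
      }
    }

    public static void main(String[] args) {
      FrameNode root = new FrameNode("all");
      String[] folded = {                 // invented example input
        "main;buildResponse;writeDetailsLines;toJson 40",
        "main;buildResponse;writeSummaryLines 10",
      };
      for (String line : folded) {
        int space = line.lastIndexOf(' ');
        root.add(line.substring(0, space).split(";"), 0,
            Long.parseLong(line.substring(space + 1)));
      }
      root.print("");
    }
  }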

Looking again at 'Onionoo in Flames', we can see that the majority of
the width is attributed to calls under 'buildResponse' in the context of
GET requests. Its time is divided between (1) writing out to the network
(infrequent or expensive) and (2) generating the different document
types (the majority of the time on-CPU). Most document types are
produced in relatively little time: a file is read from disk and some
kind of formatting is applied to its string representation. Details
documents are rather more expensive: the documents read from disk are at
different points both parsed as JSON and rendered back to JSON, for the
reason you already identified below.

Does that help?

(Given the partial symbol table, the graph I posted maybe isn't the
most welcoming. Sorry.)

> The only situation where this fails is when the request contains the
> fields parameter.  Only in that case we'll have to deserialize, pick
> the fields we want, and serialize again.  I could imagine that this
> shows up in profiles pretty badly, and I'd love to fix this, I just
> don't know how. 

Oh! I should have noticed that Gson::toJSON is exclusively parented by
ResponseBuilder::writeDetailsLines. I have an idea about how to fix
this; see Trac:

  https://trac.torproject.org/projects/tor/ticket/17939
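
For anyone following along, this is roughly the shape of the costly
path described above, sketched with Gson; the sample document and names
are invented, and this isn't Onionoo's code:

  import com.google.gson.Gson;
  import com.google.gson.JsonObject;
  import com.google.gson.JsonParser;

  public class FieldsFilterSketch {
    public static void main(String[] args) {
      // A pre-built details document, normally served verbatim.
      String detailsDocument =
          "{\"nickname\":\"example\",\"fingerprint\":\"ABCD\","
          + "\"running\":true}";
      String[] requestedFields = {"nickname", "running"};

      // With a fields parameter we have to deserialise ...
      JsonObject full = new JsonParser()
          .parse(detailsDocument).getAsJsonObject();
      // ... pick the requested members ...
      JsonObject picked = new JsonObject();
      for (String field : requestedFields) {
        if (full.has(field)) {
          picked.add(field, full.get(field));
        }
      }
      // ... and serialise again, which is the work showing up under
      // writeDetailsLines in the flame graph.
      System.out.println(new Gson().toJson(picked));
    }
  }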

> I think I'll move the toLowerCase() to this.parseFieldsParameter(), as
> in:
> 
>     return parameter.toLowerCase().split(",");
> 
> The main reason is that I'm trying to avoid touching parameters before
> they went through the regex.

Good catch!

> By the way, I'm not sure if this is helpful, but here are the latest
> statistics of the Onionoo frontend:
> [..]

Sweet. I'll eventually update the bias terms in my URL generator to
better reflect this distribution, but I'm still thinking about how to
express joint probabilities.

> This is very helpful!  I like the different angles you're taking to
> approach the load problems.  Please keep digging!

Thanks for taking the time to respond! I'll keep investigating; it's a
decent excuse to dive into performance analysis.

Merry congress,

