[tor-scaling] High level scaling goals (was Re: Summary of 5/31 meeting; next steps)

Sat Jun 8 11:51:19 UTC 2019

Mike Perry <mikeperry at torproject.org> writes:

> Notes are on the pad (feel free to update; beware any rando on the
> Internet also can do so): https://nc.riseup.net/s/AEnQ4CRH2kH3fLe
>
> In the meantime, with input from folks on this list and on the wiki
> page, I would like to add the EWMA re-tuning experiment, fill out the
> KIST tuning experiment, and flesh out the metrics section to highlight
> metrics that need new data collection. (I will start separate threads
> for this on-list as I run into questions -- I have several already).
>

Hello list,

during the recent call, I really related to Arthur's comment about the
need to clarify our planned metrics/experiments in a high-level way and
also fit them into a high-level strategy.

So I went to the wiki page [0] and tried to sort our scaling goals into
a few high-level categories, and to fit our metrics into them.

====================================================================================

== High level scaling areas:

==== Latency
     (or, how fast data flows)

     Metrics: CDF-TTFB, CDF-TTLB, CDF-DL

     Notes: Is this the same as "throughput"? Mike mentioned them as
            separate areas in the meeting.

==== Consistency / Performance variance
     (or, how surprised you might be at Tor's overall speed)

     Metrics: CDF-TTFB, CDF-TTLB, CDF-DL
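
     Sketch: "Latency" and "Consistency" list the same CDF metrics, so
             here is one possible way to tell them apart: read latency
             off the middle of the CDF (the median) and consistency off
             its spread (p90 minus p10). The sample values and the
             choice of percentiles are my assumptions, not something
             the wiki page specifies.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F, where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def quantile(samples, percent):
    """Nearest-rank quantile for an integer percent in 1..100."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be in 1..100")
    # Integer ceiling of percent * n / 100, avoiding float rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical time-to-first-byte samples, in seconds.
ttfb_seconds = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.2, 2.5, 6.0]

# "Latency": where the bulk of the distribution sits.
median = quantile(ttfb_seconds, 50)

# "Consistency": how far apart the slow and fast cases are.
spread = quantile(ttfb_seconds, 90) - quantile(ttfb_seconds, 10)

print(f"median TTFB: {median:.2f}s, p90-p10 spread: {spread:.2f}s")
```

     Two experiments could then have the same median but very different
     spreads, which would be a "Consistency" change and not a "Latency"
     one.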

==== Network capacity
     (or, how many more clients can this network fit?) 

     Metrics: Per-Flag Spare Network Capacity, Per-Relay Spare Network Capacity
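
     Sketch: To make the two capacity metrics concrete, here is a rough
             illustration. It assumes "spare capacity" means advertised
             bandwidth minus bandwidth actually used, and that a relay
             counts toward every flag it holds. Both are my simplifying
             assumptions (the wiki page has the real definitions), and
             the relay records and field names are made up.

```python
from collections import defaultdict


def relay_spare(relay):
    """Spare capacity of one relay, clamped at zero (assumed definition)."""
    return max(relay["advertised"] - relay["used"], 0)


def per_relay_spare(relays):
    """Map each relay nickname to its spare capacity."""
    return {r["nickname"]: relay_spare(r) for r in relays}


def per_flag_spare(relays):
    """Sum spare capacity per flag; a relay counts under each of its flags."""
    totals = defaultdict(int)
    for r in relays:
        for flag in r["flags"]:
            totals[flag] += relay_spare(r)
    return dict(totals)


# Hypothetical relays; bandwidth values in arbitrary units (say, KB/s).
relays = [
    {"nickname": "relayA", "flags": ["Guard"], "advertised": 100, "used": 60},
    {"nickname": "relayB", "flags": ["Guard", "Exit"], "advertised": 50, "used": 50},
    {"nickname": "relayC", "flags": ["Exit"], "advertised": 80, "used": 30},
]

print(per_relay_spare(relays))
print(per_flag_spare(relays))
```

     Note that summing the per-flag totals double-counts relays with
     several flags, so those totals should not be added up into a
     network-wide figure.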

==== Reliability/Failures
     (or, how frequently connections fail and have to retry)

     Metrics: Failure rainbow, Circuit timeout

     Notes: This might not be user-visible, but it impacts "latency" and "consistency"

====================================================================================

Some thoughts:

a) We are trying to fit metrics into high-level areas, not experiments
   into areas, right? Mike seems to have done the opposite in the latest
   meeting, so I'm not quite sure how to do this.

b) I think one of the most important things to learn here is how these
   different areas interact with each other. In particular, I think
   "Network capacity" is a super important area since we are looking at
   a huge influx of users, but we can't really look at it in isolation,
   since we are interested in seeing how "Latency" and "Consistency"
   change as more clients join the network.

   How do we model the way these areas interact with each other? And
   maybe the fact that they are not disjoint means that I have not
   modeled the areas correctly.

c) Have I missed any important areas? Are we missing metrics for any
   important areas? Is this helpful? At some point we should scribe
   these on the wiki, but I'd like some more thinking to happen first.

Cheers!~

[0]: https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceExperiments