[metrics-bugs] #29787 [Metrics/Onionperf]: Enumerate possible failure cases and include failure information in .tpf output

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Mar 14 19:38:15 UTC 2019


#29787: Enumerate possible failure cases and include failure information in .tpf
output
-----------------------------------+--------------------------
     Reporter:  karsten            |      Owner:  metrics-team
         Type:  enhancement        |     Status:  new
     Priority:  Medium             |  Milestone:
    Component:  Metrics/Onionperf  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+--------------------------
 Our current model for distinguishing failures, timeouts, and successes is
 rather simple, somewhat arbitrary, and confusing:

  - Timeout: We count any measurement with `DIDTIMEOUT=1` and/or with
 `DATACOMPLETE<1` as a timeout.
  - Failure: We count any measurement that doesn't have `DIDTIMEOUT=1`,
 has `DATACOMPLETE>=1`, and has `READBYTES<FILESIZE` as a failure.
  - Success: We count everything else as success.
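 For concreteness, here is a minimal Python sketch of the three rules above.
 It assumes each .tpf record is a single line of space-separated KEY=VALUE
 pairs and treats any missing key as 0; both are assumptions of this
 sketch, and the names `Outcome`, `parse_tpf_line`, and `classify` are
 hypothetical, not part of OnionPerf or the metrics code.

```python
from enum import Enum


class Outcome(Enum):
    TIMEOUT = "timeout"
    FAILURE = "failure"
    SUCCESS = "success"


def parse_tpf_line(line: str) -> dict[str, str]:
    """Split one .tpf record, assumed to be space-separated KEY=VALUE pairs."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    return fields


def classify(fields: dict[str, str]) -> Outcome:
    """Apply the current timeout/failure/success rules described above."""
    # Assumption: a key that is absent from the record counts as 0.
    did_timeout = int(fields.get("DIDTIMEOUT", "0")) == 1
    data_complete = float(fields.get("DATACOMPLETE", "0"))
    read_bytes = int(fields.get("READBYTES", "0"))
    file_size = int(fields.get("FILESIZE", "0"))

    # Timeout: DIDTIMEOUT=1 and/or DATACOMPLETE<1.
    if did_timeout or data_complete < 1:
        return Outcome.TIMEOUT
    # Failure: reached only if DIDTIMEOUT!=1 and DATACOMPLETE>=1,
    # but fewer bytes were read than the expected file size.
    if read_bytes < file_size:
        return Outcome.FAILURE
    # Success: everything else.
    return Outcome.SUCCESS
```

 With made-up values, `classify(parse_tpf_line("DIDTIMEOUT=0
 DATACOMPLETE=1552591095.12 FILESIZE=51200 READBYTES=51200"))` returns
 `Outcome.SUCCESS`. Note how a single `DATACOMPLETE<1` check lumps
 together every measurement that never completed, whatever the cause.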

 We're plotting timeouts and failures
 [https://metrics.torproject.org/torperf-failures.html here]. This is not
 as useful as it could be.

 It would be so much better to enumerate all possible failure cases and
 include failure information in .tpf output files. Examples:

  - It turns out that long-running tor instances sometimes fail to keep
 their directory information up to date (#29743), and as a result
 OnionPerf cannot make measurements.
  - Sometimes streams are closed with reason TIMEOUT or DESTROY (pages 1
 and 2 of the first attachment on #29744), and I bet there are several
 subcases in each of these.
  - Regarding timeouts, streams are sometimes closed by OnionPerf itself
 (pages 3 and 4 of that same attachment on #29744).
  - There are likely more, less frequent failure cases that I either did
 not include in the #29744 graphs or did not run into at all in the logs
 I looked at.

 Can we enumerate all, or at least the most typical, failure cases and
 define criteria for clearly distinguishing them from each other as well
 as from timeouts and successes?

 Can we also try to unambiguously identify these failure cases in existing
 tor/torctl/tgen logs that we process for .tpf files, so that we could
 include failure case IDs for them in the .tpf files?
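 To make the request more tangible, here is a hypothetical Python sketch
 of what an enumeration with case IDs could look like. The ticket defines
 no IDs, field names, or precedence rules, so everything here is an
 assumption: the `FailureCase` values, the `LogEvidence` fields (which
 stand in for whatever would actually be extracted from tor/torctl/tgen
 logs; that extraction is not shown), the order in which cases are
 checked, and the `FAILURECASE` key appended to a .tpf line. The cases
 themselves are the examples listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureCase(Enum):
    # Hypothetical IDs; the ticket asks for such a list but defines none.
    NONE = 0
    STALE_DIRECTORY_INFO = 1   # tor lacks up-to-date directory info (#29743)
    STREAM_CLOSED_TIMEOUT = 2  # stream closed with reason TIMEOUT (#29744)
    STREAM_CLOSED_DESTROY = 3  # stream closed with reason DESTROY (#29744)
    CLOSED_BY_ONIONPERF = 4    # OnionPerf itself closed the stream (#29744)
    OTHER = 99                 # anything not (yet) enumerated


@dataclass
class LogEvidence:
    # Hypothetical per-measurement summary of what the logs showed.
    succeeded: bool = False
    stale_directory_info: bool = False
    closed_by_onionperf: bool = False
    stream_close_reason: Optional[str] = None  # e.g. "TIMEOUT", "DESTROY"


def failure_case(ev: LogEvidence) -> FailureCase:
    """Map log evidence to exactly one case; the check order is an assumption."""
    if ev.succeeded:
        return FailureCase.NONE
    if ev.stale_directory_info:
        return FailureCase.STALE_DIRECTORY_INFO
    if ev.closed_by_onionperf:
        return FailureCase.CLOSED_BY_ONIONPERF
    if ev.stream_close_reason == "TIMEOUT":
        return FailureCase.STREAM_CLOSED_TIMEOUT
    if ev.stream_close_reason == "DESTROY":
        return FailureCase.STREAM_CLOSED_DESTROY
    return FailureCase.OTHER


def annotate_tpf_line(line: str, case: FailureCase) -> str:
    """Append a hypothetical FAILURECASE=<id> field to one .tpf record."""
    return f"{line.rstrip()} FAILURECASE={case.value}"
```

 Checking the cases in a fixed order is one way to make every measurement
 map to exactly one ID, which is what "unambiguously identify" would
 require; whether that order is right is precisely what the criteria
 asked for here would need to settle.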

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29787>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
