[tor-bugs] #3036 [Torperf]: Tweak Torperf's .mergedata format and make it the new default

Thu Feb 9 13:18:49 UTC 2012

#3036: Tweak Torperf's .mergedata format and make it the new default
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  karsten
     Type:  enhancement  |         Status:  new    
 Priority:  normal       |      Milestone:         
Component:  Torperf      |        Version:         
 Keywords:               |         Parent:         
   Points:               |   Actualpoints:         
-------------------------+--------------------------------------------------
Changes (by karsten):

 * cc: mikeperry, Sebastian, rransom, arma (added)

Comment:

 I'm picking up this ticket again because I learned a few days ago that we
 were not archiving Torperf data correctly.  Looks like we lost 2 to 4
 months of siv's data.  Oops.

 While looking into the archiving problem, I decided to work on the new
 Torperf data format, which will be a lot easier to archive than the current
 format.  As a positive side effect, the new format will be much easier to
 understand for non-core Torperf developers.  In the future, I'm planning to
 archive only the new format and stop archiving the current formats.  So, the
 new format should contain all relevant information.

 I realize that the Torperf rewrite won't happen anytime soon, so I'm going
 to implement the new Torperf format in metrics-db.  Torperf will still
 generate the old formats, but metrics-db will convert the output to the
 new format.  Whenever the Torperf rewrite happens it can output the new
 format itself.

 The suggested new format is pretty much as described in this ticket.  The
 basic idea is that there is a single line per Torperf run which is
 sufficient to learn about 1) the Tor and Torperf configuration, 2)
 measurement results, and 3) additional information that might help explain
 the results.

  1. Configuration

  - SOURCE: Configured name of the data source; required.
  - FILESIZE: Configured file size in bytes; required.
  - Other metadata describing the Tor or Torperf configuration, e.g.,
 GUARD for a custom guard choice; optional.

  2. Measurement results

  - START: Time when the connection process starts; required.
  - SOCKET: Time when the socket was created; required.
  - CONNECT: Time when the socket was connected; required.
  - NEGOTIATE: Time when SOCKS 5 authentication methods have been
 negotiated; required.
  - REQUEST: Time when the SOCKS request was sent; required.
  - RESPONSE: Time when the SOCKS response was received; required.
  - DATAREQUEST: Time when the HTTP request was written; required.
  - DATARESPONSE: Time when the first response was received; required.
  - DATACOMPLETE: Time when the payload was complete; required.
  - WRITEBYTES: Total number of bytes written; required.
  - READBYTES: Total number of bytes read; required.
  - DIDTIMEOUT: 1 if the request timed out, 0 otherwise; optional.
  - Other measurement results, e.g., START_RENDCIRC and GOT_INTROCIRC for
 hidden-service measurements.
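
 To illustrate how the measurement fields relate to each other, here is a
 rough Python sketch that derives two durations from one run.  It assumes
 that the time fields are seconds since the epoch written as decimal
 strings, which this comment does not spell out.  The function name, the
 result keys, and the sample values are made up for this example.

```python
from decimal import Decimal


def durations(fields):
    """Derive simple durations from the measurement fields of one run.

    `fields` maps field names to string values.  Assumption: time fields
    are seconds since the epoch written as decimal strings.
    """
    start = Decimal(fields["START"])
    return {
        # Time from starting the connection process to the first response.
        "time_to_first_byte": Decimal(fields["DATARESPONSE"]) - start,
        # Time from starting the connection process to the full payload.
        "time_to_complete": Decimal(fields["DATACOMPLETE"]) - start,
        # DIDTIMEOUT is optional; a missing field is treated as "0" here.
        "timed_out": fields.get("DIDTIMEOUT", "0") == "1",
    }


# Made-up sample values, not real measurement data.
sample = {
    "START": "1328793529.10",
    "DATARESPONSE": "1328793530.40",
    "DATACOMPLETE": "1328793531.20",
}
result = durations(sample)
print(result["time_to_first_byte"], result["time_to_complete"])
```

 Decimal is used instead of float so that subtracting two large timestamps
 does not introduce rounding noise in the fractional seconds.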

  3. Additional information

  - LAUNCH: Time when the circuit was launched; optional.
  - USED_AT: Time when this circuit was used; optional.
  - PATH: List of relays in the circuit, separated by commas; optional.
  - BUILDTIMES: List of times when circuit hops were built, separated by
 commas; optional.
  - TIMEOUT: Circuit build timeout that the Tor client used when building
 this circuit; optional.
  - QUANTILE: Circuit build time quantile that the Tor client uses to
 determine its circuit-build timeout; optional.
  - CIRC_ID: Circuit identifier of the circuit used for this measurement;
 optional.
  - USED_BY: Stream identifier of the stream used for this measurement;
 optional.
  - Other fields containing additional information; optional.
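
 As a sketch of how a consumer might read one such line, the Python snippet
 below parses a line and checks the fields marked as required above.  It
 assumes that a line consists of space-separated KEY=VALUE pairs; that
 encoding is an assumption of this example and is not stated in this
 comment.  The function names, the sample line, and the relay names in it
 are hypothetical.

```python
# Fields marked "required" in the lists above.
REQUIRED_KEYS = {
    "SOURCE", "FILESIZE", "START", "SOCKET", "CONNECT", "NEGOTIATE",
    "REQUEST", "RESPONSE", "DATAREQUEST", "DATARESPONSE", "DATACOMPLETE",
    "WRITEBYTES", "READBYTES",
}


def parse_torperf_line(line):
    """Parse one run into a dict of field name -> string value.

    Assumption: the line holds space-separated KEY=VALUE pairs.
    Raises ValueError on malformed tokens or missing required fields.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError("malformed token: %r" % token)
        fields[key] = value
    missing = REQUIRED_KEYS - fields.keys()
    if missing:
        raise ValueError(
            "missing required fields: " + ", ".join(sorted(missing)))
    return fields


def split_list(fields, key):
    """Return a comma-separated optional field (PATH, BUILDTIMES) as a list."""
    value = fields.get(key, "")
    return value.split(",") if value else []


# Made-up sample line, not real measurement data.
example = (
    "SOURCE=example FILESIZE=51200 START=1328793529.10 "
    "SOCKET=1328793529.10 CONNECT=1328793529.10 "
    "NEGOTIATE=1328793529.11 REQUEST=1328793529.11 "
    "RESPONSE=1328793529.90 DATAREQUEST=1328793529.90 "
    "DATARESPONSE=1328793530.40 DATACOMPLETE=1328793531.20 "
    "WRITEBYTES=75 READBYTES=51416 PATH=relayA,relayB,relayC"
)
fields = parse_torperf_line(example)
print(fields["SOURCE"], split_list(fields, "PATH"))
```

 Unknown keys are kept rather than rejected, since the format explicitly
 allows other optional fields in all three groups.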

 Note that two pieces of information from the current .extradata files are
 not included in the new Torperf data format:

  - Build timeout details: The current .extradata files contain the full
 BUILDTIMEOUT_SET events that were sent by Tor via its control port.  They
 are not part of the new format, because they mostly explain why Tor picked
 a given circuit build timeout, and the timeout itself is already part of
 the new format.  In theory, it would be possible to include some details
 of the last BUILDTIMEOUT_SET event that was received before a Torperf run
 was finished and written to the .extradata file.

  - Unused circuits: The .extradata files also contain information about
 circuits that were not used by Torperf.  There's hardly any relation to
 the Torperf measurements, so they're left out.  In theory, one could
 include aggregate information about the number of failed circuits before a
 Torperf run was finished and written to the .extradata file.

 I understand that people may find the information that was left out here
 important.  I could also imagine that people find other information
 important.  We can't put all data that was generated while performing
 Torperf measurements into this format, or we'd end up adding Tor's debug
 logs to it.  We should identify relevant information that is sufficient
 for most analyses.  For example, I can be convinced to add single fields
 or aggregated data from the build timeout events or unused circuits.  But
 if someone wants to analyze a specific aspect of Tor's performance,
 they'll need to keep Tor's logs or controller events in addition to the
 new Torperf data format.

 Please find siv's 5 MiB Torperf data in the new format attached to this
 ticket as an example.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3036#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online