[tor-bugs] #30693 [Circumvention/Snowflake]: Delete old unsanitized logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat Jun 1 00:10:57 UTC 2019


#30693: Delete old unsanitized logs
-------------------------------------+------------------------------
 Reporter:  dcf                      |          Owner:  (none)
     Type:  task                     |         Status:  needs_review
 Priority:  Medium                   |      Milestone:
Component:  Circumvention/Snowflake  |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:                           |  Actual Points:
Parent ID:                           |         Points:
 Reviewer:                           |        Sponsor:
-------------------------------------+------------------------------
Changes (by dcf):

 * status:  new => needs_review


Comment:

 I have prepared a candidate sanitized CSV file extracted from the
 sanitized logs. I've placed it in /var/log/snowflake-broker/broker.csv.xz
 for evaluation. It's 6 MB compressed but 1.8 GB uncompressed.

 The sanitized CSV looks like this:
 {{{
 timestamp,event,proxyid,clientid,additional
 2017-07-21 23:40:00,proxy-gets-none,157,,no-clients
 2017-07-21 23:40:00,proxy-polls,157,,
 2017-07-21 23:40:00,proxy-gets-none,149,,no-clients
 2017-07-21 23:40:00,proxy-polls,149,,
 2017-07-21 23:40:00,client-offers,,,
 2017-07-21 23:40:00,proxy-gets-offer,,,
 2017-07-21 23:40:00,proxy-answers,,,
 2017-07-21 23:40:00,client-gets-answer,,,
 2017-07-21 23:40:00,proxy-polls,160,,
 2017-07-21 23:40:00,proxy-gets-none,159,,no-clients
 2017-07-21 23:40:00,proxy-polls,159,,
 2017-07-21 23:40:00,proxy-gets-none,157,,no-clients
 2017-07-21 23:40:00,proxy-polls,157,,
 }}}

 Timestamps are truncated to multiples of 10 minutes. Client and proxy IDs
 are replaced by sequential integers.
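
 As a rough sketch of the two sanitization steps described above (not the
 actual script used to produce the CSV), truncation and ID replacement could
 look like this in Python. The names `truncate_timestamp` and
 `SequentialIds` are hypothetical, and numbering IDs from 1 in order of
 first appearance is an assumption.

```python
from datetime import datetime


def truncate_timestamp(ts: datetime, minutes: int = 10) -> datetime:
    """Round a timestamp down to a multiple of `minutes` within its hour."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


class SequentialIds:
    """Map each distinct raw identifier to 1, 2, 3, ... (assumed scheme)."""

    def __init__(self):
        self._ids = {}

    def get(self, raw: str) -> int:
        # Assign the next integer the first time an identifier is seen.
        if raw not in self._ids:
            self._ids[raw] = len(self._ids) + 1
        return self._ids[raw]


proxy_ids = SequentialIds()
truncated = truncate_timestamp(datetime(2017, 7, 21, 23, 47, 31))
first = proxy_ids.get("raw-proxy-id-a")   # placeholder identifiers
second = proxy_ids.get("raw-proxy-id-b")
again = proxy_ids.get("raw-proxy-id-a")   # same raw ID, same integer
```

 The mapping table has to be discarded afterwards; otherwise the integers
 could be linked back to the original identifiers.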

 The `event` column can take on these values:
  * `start` the broker was restarted.
  * `client-offers` a client connects, sends an offer, and awaits an
 answer.
  * `client-gets-answer` a client receives a proxy's answer (successful
 broker match).
  * `client-gets-none` a client disconnects without receiving an answer,
 whether because of a timeout or because there were no proxies.
  * `proxy-polls` a proxy connects in order to receive an offer.
  * `proxy-gets-none` a proxy disconnects without receiving a client offer
 (no clients).
  * `proxy-gets-offer` a proxy receives a client offer.
  * `proxy-answers` a proxy sends an answer to the broker.
  * `error` an error; the most common is "http: TLS handshake error". Other
 possibilities are "http2: server: error", "http2: received GOAWAY", or a
 bad HTTP request. The `additional` column distinguishes these cases.

 Some of these events are related to each other. For example, `proxy-polls` ≈
 `proxy-gets-none` + `proxy-gets-offer`, because every poll ends either
 without an offer or with one.
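
 One way to check a relation like this is to count events per type. The
 sketch below does so on the sample rows shown earlier, taking the second
 term to be the `proxy-gets-offer` event from the list above (my reading of
 the relation). The helper name `count_events` is hypothetical.

```python
import csv
import io
from collections import Counter

# The sample rows quoted earlier in this comment.
SAMPLE = """\
timestamp,event,proxyid,clientid,additional
2017-07-21 23:40:00,proxy-gets-none,157,,no-clients
2017-07-21 23:40:00,proxy-polls,157,,
2017-07-21 23:40:00,proxy-gets-none,149,,no-clients
2017-07-21 23:40:00,proxy-polls,149,,
2017-07-21 23:40:00,client-offers,,,
2017-07-21 23:40:00,proxy-gets-offer,,,
2017-07-21 23:40:00,proxy-answers,,,
2017-07-21 23:40:00,client-gets-answer,,,
2017-07-21 23:40:00,proxy-polls,160,,
2017-07-21 23:40:00,proxy-gets-none,159,,no-clients
2017-07-21 23:40:00,proxy-polls,159,,
2017-07-21 23:40:00,proxy-gets-none,157,,no-clients
2017-07-21 23:40:00,proxy-polls,157,,
"""


def count_events(f):
    """Count rows per value of the `event` column."""
    return Counter(row["event"] for row in csv.DictReader(f))


counts = count_events(io.StringIO(SAMPLE))
polls = counts["proxy-polls"]
poll_results = counts["proxy-gets-none"] + counts["proxy-gets-offer"]
print(polls, poll_results)  # 5 5 on this sample
```

 On the full file the two sides need not match exactly, since a poll and
 its result can fall on either side of a restart or of the ends of the log.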

 Using the sanitized CSV, I made a couple of graphs. The first shows the
 number of broker outcomes per day, where an outcome is one of four
 possibilities:
  * A client and proxy are successfully linked up.
  * A proxy connects but doesn't get a client.
  * A client connects but doesn't get a proxy.
  * Some other error occurred.

 Click to embiggen.
 [[Image(broker-interactions.png,100%)]]
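
 The per-day counts behind a graph like this could be tallied roughly as
 follows. This is a sketch, not the script that produced the image: the
 mapping from `event` values to the four outcomes is an assumption, and the
 outcome labels and function names are hypothetical.

```python
import csv
import lzma
from collections import Counter, defaultdict

# Assumed mapping from event to outcome; other events are ignored.
OUTCOME_OF_EVENT = {
    "client-gets-answer": "matched",
    "proxy-gets-none": "proxy-without-client",
    "client-gets-none": "client-without-proxy",
    "error": "error",
}


def outcomes_per_day(rows):
    """Return {day: Counter(outcome -> count)} for an iterable of CSV rows."""
    per_day = defaultdict(Counter)
    for row in rows:
        outcome = OUTCOME_OF_EVENT.get(row["event"])
        if outcome is not None:
            day = row["timestamp"][:10]  # "YYYY-MM-DD" prefix
            per_day[day][outcome] += 1
    return per_day


def read_rows(path):
    """Stream rows from an xz-compressed CSV without unpacking it to disk."""
    with lzma.open(path, "rt", newline="") as f:
        yield from csv.DictReader(f)
```

 For example, `outcomes_per_day(read_rows("broker.csv.xz"))` would stream
 the compressed file row by row, so the 1.8 GB of uncompressed data never
 has to be held in memory at once.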

 The second graph shows the estimated number of proxies. This is just 10 s ×
 the number of `proxy-polls` events per second. It's based on the
 assumption that each proxy polls every 10 s. The assumption doesn't hold
 when there are actually clients, but as you can see, the estimate is pretty
 close to 3, which is the number of fallback proxy-go instances.

 Click to embiggen.
 [[Image(broker-estimated-proxies.png,100%)]]
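
 As a worked version of this estimate: with timestamps truncated to
 10-minute buckets, each bucket spans 600 s, so the estimate for a bucket is
 10 × (polls in bucket) / 600. The sketch below assumes per-bucket
 aggregation (the graph may use a different window), and the names are
 hypothetical.

```python
from collections import Counter

POLL_INTERVAL_S = 10   # assumed time between polls of a single proxy
BUCKET_S = 600         # timestamps are truncated to 10-minute buckets


def estimate_proxies(rows):
    """Return {bucket timestamp: estimated number of proxies}."""
    polls = Counter(row["timestamp"] for row in rows
                    if row["event"] == "proxy-polls")
    return {ts: POLL_INTERVAL_S * n / BUCKET_S for ts, n in polls.items()}
```

 For instance, 180 `proxy-polls` rows in one bucket give 10 × 180 / 600 = 3
 proxies.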

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30693#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

