[metrics-bugs] #25196 [Metrics/Statistics]: Cut off recent dates from several CSV files (was: Cut off recent dates from hidserv.csv)

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Mar 7 08:38:16 UTC 2018


#25196: Cut off recent dates from several CSV files
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  karsten
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:  iwakeh              |        Sponsor:
--------------------------------+------------------------------
Changes (by karsten):

 * status:  needs_revision => needs_review


Comment:

 I set up a local metrics-web instance and modified it to run once per hour
 and not cut off any dates at all. I'm
 [https://trac.torproject.org/projects/tor/attachment/ticket/25196/cut-off-recent-dates-2018-03-06.pdf attaching a PDF file]
 showing how statistics for given dates (colors) change (y axis) over the
 UTC day of March 6 (x axis). If a colored line changes much over the day,
 we cannot reasonably include it yet and need to cut off that date. There's
 a trade-off between holding back a statistic that is still changing too
 much and delaying a statistic more than necessary and not being able to
 act on the data.
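
 One way to put a number on "changes much over the day" is to compare each
 hourly observation of a date's value with the last observation of the day.
 The following Python sketch does that. The function names, the sample
 values, and the 1% threshold are assumptions made for illustration; none
 of them come from metrics-web or from the attached PDF.

```python
def relative_change(hourly_values):
    """Largest relative deviation from the last observed value.

    hourly_values: values of one statistic for one date, as seen in
    successive hourly runs, oldest run first (hypothetical input).
    """
    final = hourly_values[-1]
    if final == 0:
        raise ValueError("final value is zero; relative change is undefined")
    return max(abs(v - final) for v in hourly_values) / abs(final)


def is_stable(hourly_values, threshold=0.01):
    # threshold is an assumed tolerance, not a value taken from the ticket.
    return relative_change(hourly_values) <= threshold


# Made-up observations for two dates over the hourly runs of a single day.
yesterday = [6100.0, 6350.0, 6480.0, 6500.0]       # still filling in
three_days_ago = [6490.0, 6495.0, 6500.0, 6500.0]  # barely moving

print(is_stable(yesterday))
print(is_stable(three_days_ago))
```

 A date whose line fails such a check would still need to be cut off; where
 to set the tolerance is exactly the trade-off described above.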

 Here's what I think we should do for all current statistics files:
  - `servers.csv`: We currently cut off 2 days (today = 2018-03-06 and the
 day before = 2018-03-05), but it would be sufficient to cut off just 1 day
 (today). The reason is that this file is based on consensuses and
 referenced server descriptors, all of which are typically available at the
 end of a day.
  - `ipv6servers.csv`: Same as `servers.csv`, except that we don't cut off
 anything yet, though I think we should, following the same rationale as
 above.
  - `advbwdist.csv`: Same as `servers.csv`, except that we already cut off
 just 1 day, so there's no need to change anything here.
  - `bandwidth.csv`: This file is based on statistics reported in
 extra-info descriptors, and those might take more time to come in. We're
 also not doing any estimates on the numbers we have so far, but we're
 simply adding up what we have. So, if 5% of statistics are still missing,
 those missing statistics will still change the end result by 5%. I suggest
 waiting 3 days. We currently cut off 4, but I think 3 should be sufficient.
 The better (long-term) solution would be to compensate for missing data by
 extrapolating what we have, but we're not there yet.
  - `connbidirect2.csv`: Same as for `bandwidth.csv`, except that we're
 providing averages where missing descriptors don't affect the result as
 much. Cutting off 2 days will be fine (today and yesterday).
  - `clients.csv` and `userstats-combined.csv`: Same as for
 `connbidirect2.csv`, except that we're being smarter about estimating
 numbers from given reports. Cutting off 2 days will be enough (today and
 yesterday).
  - `hidserv.csv`: Same as `clients.csv` et al., except we're being quite
 smart about extrapolating reported statistics, so that we might even cut
 off just 1 day. But let's do 2 days as before to be on the safe side.
  - `torperf-1.1.csv`: OnionPerf only provides completed days, so it
 depends on when we get those files and whether we get all of them at once.
 I'm less certain here, but I think we're doing okay by cutting off 2 days.
  - `webstats.csv`: I don't have good data, because webstats.tp.o has been
 down for a couple of days now. This might also change after switching to
 CollecTor's webstats module. I'd say we don't touch this now and revisit
 it after switching to CollecTor.
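
 The proposed cut-offs can be summarized as a small lookup table plus a
 date filter. This is only a Python sketch of the suggestion above, not the
 code in the branch: the names `CUTOFF_DAYS`, `last_included_date` and
 `cut_off_recent`, and the row layout (dicts with a `date` key), are
 assumptions for illustration. "Cutting off N days" is read as dropping
 today and the N-1 days before it.

```python
from datetime import date, timedelta

# Number of most recent days to drop per file, as proposed in this comment.
# webstats.csv is left out because the suggestion is not to touch it for now.
CUTOFF_DAYS = {
    "servers.csv": 1,
    "ipv6servers.csv": 1,
    "advbwdist.csv": 1,
    "bandwidth.csv": 3,
    "connbidirect2.csv": 2,
    "clients.csv": 2,
    "userstats-combined.csv": 2,
    "hidserv.csv": 2,
    "torperf-1.1.csv": 2,
}


def last_included_date(filename, today):
    """Most recent date that would still be written to the given file."""
    return today - timedelta(days=CUTOFF_DAYS[filename])


def cut_off_recent(rows, filename, today):
    """Keep only rows dated on or before the last included date.

    rows: iterable of dicts with a "date" key holding a datetime.date
    (a hypothetical in-memory layout, not the actual file handling).
    """
    last = last_included_date(filename, today)
    return [row for row in rows if row["date"] <= last]


today = date(2018, 3, 6)
rows = [{"date": date(2018, 3, d)} for d in range(1, 7)]

print(last_included_date("servers.csv", today))
print(last_included_date("bandwidth.csv", today))
print(len(cut_off_recent(rows, "hidserv.csv", today)))
```

 With today = 2018-03-06, this keeps `servers.csv` rows up to 2018-03-05,
 `hidserv.csv` rows up to 2018-03-04 and `bandwidth.csv` rows up to
 2018-03-03.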

 Please review
 [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-25196&id=450d9f1edd880a7d6d46014af6bcc0e211630af7 commit 450d9f1 in my updated task-25196 branch].
 If possible, I'd like to make changes tomorrow (Thursday).

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25196#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

