[tor-bugs] #29315 [Metrics/Website]: Write down guidelines for adding new stats

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Apr 25 10:32:14 UTC 2019


#29315: Write down guidelines for adding new stats
-------------------------------------+--------------------------------
 Reporter:  karsten                  |          Owner:  karsten
     Type:  enhancement              |         Status:  needs_revision
 Priority:  Very High                |      Milestone:
Component:  Metrics/Website          |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:  metrics-roadmap-2019-q2  |  Actual Points:
Parent ID:                           |         Points:  3
 Reviewer:  irl                      |        Sponsor:
-------------------------------------+--------------------------------

Comment (by karsten):

 Replying to [comment:14 irl]:
 > I would like for these systems to be as open/transparent as is possible.
 > The demarcation between a system that collects metrics and Tor Metrics
 > should not just be for Tor Metrics. Anyone should be able to do what Tor
 > Metrics does. This means that services publish data, and we pull from the
 > service.

 This sounds like a fine recommendation wherever it is possible. If a system
 can sanitize its own data before making it available to us and others,
 great! Let's just be clear that we're shifting complexity and maintenance
 work from Tor Metrics to services run by others. If they have the resources
 to do this, okay.
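
 Concretely, the pull side of that model is small. Here is a sketch of what
 any consumer, not just us, could run (the URL and file name are made up for
 this example; this is not CollecTor's actual fetching code):

 {{{
#!/usr/bin/env python3
# fetch_published_stats.py: illustrative sketch of the "service publishes,
# anyone pulls" model.  The URL and file names are made up for this example.

import datetime
import urllib.request

SOURCE_URL = 'https://example-service.torproject.org/stats/latest'

# Fetch whatever sanitized document the service currently publishes ...
with urllib.request.urlopen(SOURCE_URL, timeout=30) as response:
    document = response.read()

# ... and archive it under a timestamped name, like any other consumer could.
filename = datetime.datetime.utcnow().strftime('stats-%Y-%m-%d-%H-%M-%S')
with open(filename, 'wb') as archive_file:
    archive_file.write(document)
 }}}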

 But let's consider whether we want to make this a hard requirement. There
 may be services where we're glad that somebody runs them and where we
 cannot expect them to also run sanitizing code. The options in such a case
 are that we either don't get the data, or we sanitize it somewhere. And if
 we can choose where to sanitize it, we can either do it as part of a
 CollecTor module or in a separate tool run on the host that also runs the
 service. In either case we're providing the sanitized data to others who
 can then do everything that Tor Metrics does.
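
 To make the "separate tool on the service host" option concrete, here is a
 minimal sketch of what such a sanitizer could look like. All names are made
 up for this example, and this is not the actual webstats sanitizing code;
 it only illustrates how little the service operator would have to run:

 {{{
#!/usr/bin/env python3
# sanitize_access_log.py: illustrative sketch only, not the real
# CollecTor/webstats sanitizer.  Reads Apache-style access log lines on
# stdin, removes the client IP address and the time of day, and writes
# sanitized lines to stdout, where the web server can publish them.

import re
import sys

# Matches the start of a combined-log-format line:
# "<ip> <ident> <user> [<day>/<month>/<year>:<time> <zone>] <request ...>"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) (?P<identuser>\S+ \S+) '
    r'\[(?P<date>[^:]+):(?P<time>[^ ]+) (?P<zone>[^\]]+)\] (?P<rest>.*)$')

for line in sys.stdin:
    match = LINE_RE.match(line.rstrip('\n'))
    if match is None:
        continue  # drop lines we cannot parse rather than risk leaking them
    # Replace the client IP with a constant and truncate the timestamp to
    # the date, so that neither clients nor exact request times show up in
    # the published data.
    sys.stdout.write('0.0.0.0 %s [%s:00:00:00 %s] %s\n' % (
        match.group('identuser'), match.group('date'),
        match.group('zone'), match.group('rest')))
 }}}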

 However, we discussed this topic before, and it seems we still do not
 quite agree. Would it help if we made this a hard requirement with the
 caveat that, if somebody cannot run sanitizing code, ''we'' run it on a
 machine that is not officially part of Tor Metrics?

 > It does not need to be a web server. If there is not already a webserver
 > then a Gopher server or TCP port that dumps out the document are also fine
 > as far as I'm concerned, maybe karsten has other opinions.

 Gopher? My initial reaction is that we shouldn't fall into the same
 esotericism trap that already cost us the Haskell-written TorDNSEL.

 I'd say let's strongly recommend a webserver, and if that's not possible,
 talk to folks.

 > Increasingly I'm thinking that the Tor directory protocol meta format is
 > a good format to have metrics in. We already have parsers for these that
 > are fast and efficient, and it's easier to detect errors due to the strict
 > format (even if #30105 and similar things sometimes slip through). The
 > document format also provides for signing of documents, which I'd like to
 > see more of our data sources doing. #29624 is looking at defining a new
 > format for exit lists, and is using the meta format with Ed25519
 > signature.

 Sounds good to me, as a recommendation that likely works for most new
 formats, though probably not all of them. For example, keeping the
 sanitized web server logs in the Apache log format made sense, because
 existing tools could then be used to process them. But yes, for most
 formats this is a fine recommendation.
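
 For readers who haven't looked at dir-spec lately: the meta format is just
 keyword lines, one keyword per line followed by space-separated arguments,
 optionally followed by a multi-line object between "-----BEGIN ...-----"
 and "-----END ...-----" delimiters. That simplicity is why the existing
 parsers (Stem, metrics-lib) handle it so well. Purely as an illustration
 (these keywords are made up, and this is ''not'' the #29624 exit list
 format), a stats document from some hypothetical service could look like
 this:

 {{{
@type example-service-stats 1.0
example-service-stats-version 1
published 2019-04-25 00:00:00
connections-by-country us=48,de=40,fr=16
signature-ed25519
-----BEGIN ED25519 SIGNATURE-----
<base64-encoded signature over the document up to this keyword>
-----END ED25519 SIGNATURE-----
 }}}

 And reading the keyword lines takes only a few lines of code, which is part
 of the appeal (again just a sketch, not Stem or metrics-lib):

 {{{
#!/usr/bin/env python3
# parse_meta_format.py: illustrative sketch of reading keyword lines in a
# directory-meta-format style document; it does not verify signatures.

import sys

def keyword_lines(lines):
    """Yield (keyword, arguments) pairs, skipping annotations and objects."""
    in_object = False
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('-----BEGIN '):
            in_object = True
        elif line.startswith('-----END '):
            in_object = False
        elif not in_object and line and not line.startswith('@'):
            parts = line.split(' ')
            yield parts[0], parts[1:]

for keyword, arguments in keyword_lines(sys.stdin):
    print(keyword, arguments)
 }}}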

 Would you mind taking the draft and the comments above and writing an
 updated draft? I feel like if I continue owning this task, we'll need more
 review rounds. Let me know!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29315#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

