[metrics-bugs] #29624 [Metrics/Exit Scanner]: New version of exit list format

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Mar 7 20:35:38 UTC 2019


#29624: New version of exit list format
-------------------------------------+--------------------------------
 Reporter:  irl                      |          Owner:  karsten
     Type:  task                     |         Status:  needs_revision
 Priority:  Medium                   |      Milestone:
Component:  Metrics/Exit Scanner     |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:  metrics-roadmap-2019-q2  |  Actual Points:
Parent ID:  #29650                   |         Points:
 Reviewer:  irl                      |        Sponsor:
-------------------------------------+--------------------------------
Changes (by karsten):

 * status:  needs_review => needs_revision


Comment:

 Here are my notes from talking this over at today's meeting:

 Replying to [comment:8 karsten]:
 > Replying to [comment:7 notirl]:
 > > We need to work on the use of words like "may". Unless Tor already has
 something for this, let's refer to RFC2119.
 >
 > Makes sense. However, it's been a while since I wrote specs with those
 keywords, and I think I didn't get it right in all cases back then. Do you
 mind going through the spec at the end and correcting keywords
 accordingly?
 >
 > > I don't believe we need to prefix keywords with "Scanner". Was there a
 specific reason for this?
 >
 > My idea was to avoid future conflicts with keywords used in exit list
 entries, and in the header it matters the least to make keywords a bit
 longer. I don't feel strongly, though. Mild preference for keeping the
 prefix.
 >
 > > dir-spec uses kebab-case for keywords, not CamelCase.
 > >
 > > For fields that are already defined in dir-spec, like "contact" we
 should refer to those semantics instead of making up our own.
 >
 > Hmm, should we really mix CamelCase and kebab-case in a single document?
 I think I'd prefer to stay in CamelCase notation.

 We plan to use only kebab-case keywords in version 2. This means that
 version 2 won't be backward-compatible with version 1, which uses only
 CamelCase keywords. The API can still provide the same methods for
 accessing parts of an exit list, regardless of the version. Let's try
 this.

 Related to this change, we're going to say "contact" rather than
 "ScannerContact" or "scanner-contact", and we're linking to version 3 of
 dir-spec to say that we're using the format specified there.
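
 To illustrate how an API could offer the same accessors for both versions,
 here is a minimal sketch in Python. It assumes a parser that maps every
 keyword to one kebab-case lookup key. The function names and the `RENAMED`
 table are hypothetical and not part of any existing library; the only
 rename taken from these notes is "ScannerContact"/"scanner-contact" to
 "contact".

```python
import re

# Hypothetical rename table. The only entry grounded in the notes is that
# version 2 says "contact" instead of "ScannerContact"/"scanner-contact".
RENAMED = {"scanner-contact": "contact"}


def camel_to_kebab(keyword: str) -> str:
    """Convert a CamelCase keyword to kebab-case.

    A hyphen is inserted before every uppercase letter except the first
    character, then the result is lowercased. Keywords that are already
    kebab-case pass through unchanged.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "-", keyword).lower()


def canonical_keyword(keyword: str) -> str:
    """Return one lookup key for a keyword from either format version."""
    kebab = camel_to_kebab(keyword)
    return RENAMED.get(kebab, kebab)


if __name__ == "__main__":
    for name in ("ScannerContact", "scanner-contact", "contact", "Downloaded"):
        print(name, "->", canonical_keyword(name))
```

 A purely mechanical case conversion would not cover renames, which is why
 the sketch keeps an explicit table next to it.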

 > > As above, for date/time formats.
 >
 > Hmm? I copied over the format from dir-spec. The formats should be
 equivalent. Or what do you mean?

 Likewise, we're linking to dir-spec version 3.
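
 As a rough sketch of what reusing the dir-spec date/time format could look
 like in a parser, the Python below assumes the "YYYY-MM-DD HH:MM:SS" form
 with times in UTC. The helper names are made up for illustration, and the
 final spec text may word the format differently.

```python
from datetime import datetime, timezone

# Assumed format: "YYYY-MM-DD HH:MM:SS", interpreted as UTC.
DIR_SPEC_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(value: str) -> datetime:
    """Parse a date/time value into a timezone-aware UTC datetime."""
    naive = datetime.strptime(value.strip(), DIR_SPEC_TIME_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


def format_timestamp(moment: datetime) -> str:
    """Write a timezone-aware datetime back in the same format, in UTC."""
    return moment.astimezone(timezone.utc).strftime(DIR_SPEC_TIME_FORMAT)


if __name__ == "__main__":
    parsed = parse_timestamp("2019-03-07 20:35:38")
    print(parsed.isoformat())
    print(format_timestamp(parsed))
```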

 > > We should be specific on our use of country codes. There are
 extensions added by the databases we are using, and we also use our own
 extensions. Maybe we should talk to OONI and see what they are using too
 so we can be unified.
 >
 > I'm not sure what to gain from defining (or linking to) a set of allowed
 country codes. I consider this field mostly informational. But I don't
 really mind. In any case we could move forward with completing this spec
 and writing parsers, and we could later adapt the spec to define a subset
 of valid two-letter country codes.

 For now we'll allow any value matching `[A-Z][A-Z]` as a valid two-letter
 country code in the style of ISO 3166-1 alpha-2, without restricting it to
 a list of assigned codes. We're writing these in uppercase and parsing
 them case-insensitively.
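
 The rule above (write uppercase, parse case-insensitively) can be sketched
 as follows. This is only an illustration in Python; `parse_country_code`
 is a hypothetical helper, and restricting the match to ASCII letters is an
 assumption on my part.

```python
import re

# Two ASCII letters in either case. re.ASCII keeps case-insensitive
# matching from accepting non-ASCII letters that case-fold to ASCII ones.
COUNTRY_CODE = re.compile(r"[A-Z][A-Z]", re.IGNORECASE | re.ASCII)


def parse_country_code(value: str) -> str:
    """Accept a two-letter code in any case and return it in uppercase."""
    candidate = value.strip()
    if not COUNTRY_CODE.fullmatch(candidate):
        raise ValueError(f"not a two-letter country code: {value!r}")
    return candidate.upper()


if __name__ == "__main__":
    for raw in ("de", "US", "Fr"):
        print(raw, "->", parse_country_code(raw))
```

 Note that this accepts any two letters, including codes that are not
 assigned in ISO 3166-1, which matches the "for now" decision above.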

 > > How does the "Downloaded" keyword work with signed documents? How do
 you see it being used?
 >
 > Signed documents are certainly a challenge. The issue is that this
 keyword is already being used: CollecTor adds it. A better choice (back
 then) would have been to use an annotation for this. But I think the
 `Created` keyword will supersede this keyword anyway. Still, it's there,
 which is why I included it in the spec. Maybe there's a better plan?

 We might use `@downloaded-at` in CollecTor, but we're not going to specify
 a new line like this in version 2 of the exit list specification.

 > > On point 1, this sounds OK. I am starting to think of exit lists in
 the new scanner context as a derived format from the raw measurement
 results in a similar way that our current torperf files are derived from
 onionperf analysis results which are derived from tor/tgen logs.
 > >
 > > As an aside, the format we are deriving from will most likely be
 [[https://pathspider.readthedocs.io/en/latest/using.html#data-formats|PATHspider ndjson]].
 This is not important for the spec.
 >
 > Makes sense.
 >
 > > On point 2, this also sounds OK. Should we specify that an exit list
 should be used with a specific consensus in applications like ExoneraTor?
 I think no, we should always use the latest exit list and latest consensus
 to give the most up-to-date information available.
 >
 > Agreed, we should leave this up to the application.
 >
 > Changing back to needs_review for the open questions. Thanks!

 I'm going to make changes as outlined above, and then irl is going to
 adapt the MAY/MUST/etc. parts.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29624#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

