[metrics-bugs] #29624 [Metrics/ExoneraTor]: New version of exit list format

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Mar 1 20:22:18 UTC 2019


#29624: New version of exit list format
-------------------------------------------------+-------------------------
 Reporter:  irl                                  |          Owner:  karsten
     Type:  task                                 |         Status:
                                                 |  accepted
 Priority:  Medium                               |      Milestone:
Component:  Metrics/ExoneraTor                   |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  metrics-exit-list-project metrics-   |  Actual Points:
  roadmap-2019-q2                                |
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by karsten):

 Replying to [comment:2 irl]:
 > Replying to [comment:1 karsten]:
 > > > * Source ASN
 > > > * Source country
 > >
 > > I'm not sure about these. We would basically include these by having
 the source look up its IP address in a database. But then the result
 depends on which database (version) the source uses. Of course, whoever
 uses this information could as well look up the source IP address in the
 database (version) of their choice and discard these two fields. Maybe
 this means we shouldn't put too much effort into the source's ability to
 include these two fields. Or we could just omit them from the spec. Not
 sure!
 >
 > I would rather have them as optional if you think that they would not be
 required. I would expect this to be either declared by the user, who
 should know best, or looked up via RIPEstat.

 We can specify these. What format would we expect the country and ASN to
 be in?

 > > This one is tricky. I don't think that the current scanner includes
 scans that ended with unknown failures or timeouts. It includes, for each
 found exit IP address, the latest scan time of a successful run resulting
 in that IP address. It probably omits IP addresses after a given number of
 hours, but we'd have to look at the code in order to know.
 >
 > I would like to have one line per measurement, whether it succeeds, has
 a duplicate result, fails, or whatever. This helps us to understand how
 the tool is performing and doesn't hide information that would be really
 useful in debugging.

 I see your point. However, this would be a backward-incompatible change to
 the current document format where the IP address is unique for the
 `ExitAddress` lines of any given router. And it might not scale if we add
 lots and lots of scans all ending with the same result. Unclear.
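
 To illustrate the difference, here is a rough Python sketch (not code from
 TorDNSEL or metrics-lib) that collapses a hypothetical per-measurement log
 into what the current format keeps: one `ExitAddress` line per IP address
 with the latest successful scan time. The function name and the sample
 data are made up for illustration.

```python
from datetime import datetime


def latest_scan_per_address(measurements):
    """Collapse (address, scan_time) pairs to one entry per address,
    keeping only the latest scan time for each address."""
    latest = {}
    for address, scan_time in measurements:
        if address not in latest or scan_time > latest[address]:
            latest[address] = scan_time
    return latest


# Hypothetical successful measurements for a single router.
measurements = [
    ("192.0.2.1", datetime(2019, 3, 1, 10, 0, 0)),
    ("192.0.2.1", datetime(2019, 3, 1, 11, 0, 0)),
    ("192.0.2.2", datetime(2019, 3, 1, 10, 30, 0)),
]

for address, scan_time in sorted(latest_scan_per_address(measurements).items()):
    print("ExitAddress", address, scan_time.strftime("%Y-%m-%d %H:%M:%S"))
```

 A one-line-per-measurement format would keep all three input records,
 whereas the collapsed form keeps two.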

 > > > I think we should probably have one line per measurement, so IPv4
 and IPv6 results would be listed separately, not on the same line. In the
 future we may have differing transports to consider (TCP/QUIC/something
 else) so maybe we should not just have IPv4 vs IPv6 but some numeric
 identifier that is later extensible.
 > >
 > > Agreed on the IPv4/IPv6 distinction. I was thinking to simply include
 a new `ExitAddress6` line for IPv6 addresses and continue using
 `ExitAddress` for IPv4 addresses. And I'd probably simply add another
 keyword for the next transport or address version. What else do you have
 in mind?
 >
 > This could also work, but we should do it in a way that we have defined
 a generalised format for the measurement result and then we have specifics
 for IPv4 and IPv6 which should just be that the expected address format is
 different.

 Sounds good.
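
 A minimal sketch of that idea, assuming the proposed `ExitAddress6`
 keyword and the same address-plus-scan-time layout for both address
 versions (neither is specified yet). The helper name is hypothetical.

```python
import ipaddress

# Keyword per IP version; ExitAddress6 is only a proposal in this ticket.
KEYWORDS = {4: "ExitAddress", 6: "ExitAddress6"}


def exit_address_line(address, scan_time):
    """Format one result line; only the address format differs by version."""
    parsed = ipaddress.ip_address(address)
    return f"{KEYWORDS[parsed.version]} {parsed} {scan_time}"


print(exit_address_line("192.0.2.1", "2019-03-01 10:00:00"))
print(exit_address_line("2001:db8::1", "2019-03-01 10:00:00"))
```

 Adding another transport or address version would then mean adding one
 more keyword to the table, without touching the line layout.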

 > > Relatedly, I'd want to include `OrAddress` and `OrAddress6` for the
 addresses found in the consensus. Background is that I'd like to use exit
 lists as the single input document type for ExoneraTor in the future.
 >
 > Perhaps we are describing Internet Address Lists and not Exit Lists?

 Possibly. Maybe this won't scale, either. Unclear.

 > > > Exit lists are not currently included in torspec but probably should
 be. The specification should cover the existing format, and then also the
 new format. We should expect that we will later extend the new format with
 a signature. Maybe we should just figure that out now also.
 > >
 > > Turns out that specifying the existing format is not trivial. Right
 now I'm looking at metrics-lib only, but I think I'll have to look at
 other code that produces/consumes these lists. For example, it would be
 great to know whether `Published` and `LastStatus` in the current format
 are considered required or optional fields, because it would be very
 convenient to lose them in version 2. What other code should I be looking
 at?
 >
 > I've not thought about this yet, but why would it be convenient to lose
 these in version 2?

 Both are rather implementation-specific pieces of information that are not
 really relevant for exit lists. `Published` is used to avoid doing another
 scan until the next descriptor arrives, and `LastStatus` is used to decide
 when to discard a router. Both parts are contained in exit lists, because
 they're not primarily an output format but an internal state file used by
 TorDNSEL.

 It doesn't hurt to have these lines, except that they eat up space.
 However, declaring them as required means we can never remove them from
 future formats without making a backward-incompatible change. But maybe
 this ship has sailed, and we need to consider them required, because they
 have always been there.
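
 For what it's worth, a consumer that treats both lines as optional could
 look roughly like the Python sketch below. This is not how metrics-lib
 parses exit lists; it assumes that each entry starts with an `ExitNode`
 line followed by keyword lines, and the fingerprints, addresses, and
 timestamps in the sample are made up.

```python
def parse_exit_list(text):
    """Parse exit list entries, treating Published and LastStatus as optional."""
    entries = []
    current = None
    for line in text.splitlines():
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "ExitNode":  # assumed entry-start keyword
            current = {"fingerprint": rest, "published": None,
                       "last_status": None, "exit_addresses": []}
            entries.append(current)
        elif current is None:
            continue  # skip anything before the first entry
        elif keyword == "Published":
            current["published"] = rest
        elif keyword == "LastStatus":
            current["last_status"] = rest
        elif keyword == "ExitAddress":
            address, _, scan_time = rest.partition(" ")
            current["exit_addresses"].append((address, scan_time))
    return entries


# Made-up sample: the second entry omits Published and LastStatus.
SAMPLE = """\
ExitNode 0000000000000000000000000000000000000001
Published 2019-03-01 08:00:00
LastStatus 2019-03-01 09:00:00
ExitAddress 192.0.2.1 2019-03-01 09:30:00
ExitNode 0000000000000000000000000000000000000002
ExitAddress 192.0.2.2 2019-03-01 09:45:00
"""

for entry in parse_exit_list(SAMPLE):
    print(entry["fingerprint"], entry["published"], entry["exit_addresses"])
```

 A parser written this way keeps working if version 2 drops the two lines,
 which is the flexibility we lose by declaring them required.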

 P.S.: Do we need a new Metrics/* subcomponent for this exit list work?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29624#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

