[tor-bugs] #19118 [Metrics/Onionoo]: Add organization name to each relay

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat May 28 13:54:02 UTC 2016


#19118: Add organization name to each relay
-----------------------------+-----------------------------------
 Reporter:  virgil           |          Owner:  karsten
     Type:  enhancement      |         Status:  needs_information
 Priority:  Medium           |      Milestone:
Component:  Metrics/Onionoo  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  hardening        |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by virgil):

 > The CAIDA data doesn't contain IP address ranges, so we'll have to keep
 using !MaxMind data in addition to CAIDA data. Okay. But that means that
 CAIDA's comprehensiveness in terms of number of ASNs is meaningless to us,
 because we're limited to whatever ASNs are in !MaxMind data. (0)

 You're right. We still only start with the IP address, and it would be a pain to
 implement a method to learn the AS numbers. Okay, that kills any utility
 of CAIDA having more ASes.

 > !MaxMind contains 67 of its 2833 ASNs (not sure where your 53k number
 comes from) that CAIDA does not know about. Right now we'd have
 organization names for these ASNs, but once we switch over to using
 CAIDA's organization names we'd provide less information there. And I'm
 not willing to provide !MaxMind data if CAIDA doesn't have anything for a
 given ASN, because nobody will understand that, nor do I want to provide
 both organization names. This is a serious problem that I don't know how
 to work around cleanly. (-1)

 > CAIDA data is only updated every three months, !MaxMind provides a new
 update every month. It already happens that people ping me because
 !MaxMind's data is old, and that's only going to get worse with CAIDA.
 Somewhat related, !MaxMind has been providing ASN data for many years now
 without major issues whereas CAIDA apparently started providing data only
 2 years ago. (-1)

 [http://www.cidr-report.org/cgi-bin/plota?file=%2fvar%2fdata%2fbgp%2fas2.0%2fbgp-as-count%2etxt&descr=Unique%20ASes&ylabel=Unique%20ASes&with=step
 The 53k figure is actually correct]. Additionally, I would never wholly
 replace !MaxMind data with CAIDA data, because the fields convey very
 different things. !MaxMind says which organization is the registered owner,
 while CAIDA does some cleverness to learn the parent organization.
 These are very different. I would propose a new field for each relay,
 called something like !`parent_organization`, which is populated from
 CAIDA data when it exists. I claim this sets both of the above
 (-1)s to (0).

 > We'd still need to write, review, and test code to handle CAIDA's data
 format. This could become a neutral if somebody submits a good patch, but
 please only do that if that makes the overall sum positive, or that patch
 might not get accepted. (-1)

 The CAIDA format is standard CSV, which an existing library such as
 Apache Commons CSV can parse:
 https://commons.apache.org/proper/commons-csv/ (0)
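
 To make the parsing cost concrete, here is a minimal sketch of loading an
 ASN-to-parent-organization mapping from CSV. It uses Python's standard
 `csv` module rather than Commons CSV, purely for brevity. The column names
 `asn` and `parent_organization`, the sample rows, and the function name
 `load_parent_orgs` are assumptions for illustration, not CAIDA's actual
 layout.

```python
import csv
import io

# Hypothetical sample data. The column names are assumptions for this sketch
# and do not reflect CAIDA's real file layout. The ASNs are from the range
# reserved for documentation.
SAMPLE = """asn,parent_organization
64496,Example Hosting Group
64497,Example Hosting Group
64511,Another Example Org
"""


def load_parent_orgs(handle):
    """Return a dict mapping 'AS<number>' to a parent organization name."""
    parents = {}
    for row in csv.DictReader(handle):
        asn = row["asn"].strip()
        name = row["parent_organization"].strip()
        # Skip incomplete rows instead of storing empty values.
        if asn and name:
            parents["AS" + asn] = name
    return parents


parent_orgs = load_parent_orgs(io.StringIO(SAMPLE))
```

 A real loader would need to match whatever header and delimiter the
 published file actually uses.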

 > Operating an Onionoo server becomes a bit harder with an additional data
 source to update. We want more people to run Onionoo servers at some
 point, so we should make that process easier not harder. (-1)

 This is indeed an issue. It seems entirely reasonable to me that if
 someone doesn't want to handle the CAIDA data, their server simply won't
 have the !`parent_organization` field. Totally cool with that. (0?)
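
 A rough sketch of how the optional field could behave, in Python for
 brevity. It assumes a relay record is a dict with an `as_number` entry such
 as "AS64496". The function name `add_parent_organization` and the sample
 values are hypothetical and are not Onionoo code.

```python
def add_parent_organization(relay, parent_orgs=None):
    """Return a copy of relay, adding parent_organization when it is known.

    parent_orgs maps AS numbers to parent organization names. It is None or
    empty when the operator has not loaded any CAIDA data.
    """
    enriched = dict(relay)
    if parent_orgs:
        parent = parent_orgs.get(relay.get("as_number"))
        if parent is not None:
            enriched["parent_organization"] = parent
    return enriched


# Hypothetical sample values, for illustration only.
relay = {"nickname": "ExampleRelay", "as_number": "AS64496"}
caida = {"AS64496": "Example Hosting Group"}

with_caida = add_parent_organization(relay, caida)
without_caida = add_parent_organization(relay)
```

 With no CAIDA data the record comes back unchanged, so a server without
 the extra data source would just omit the field.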

 > !MaxMind indeed contains similar but not equivalent organization names
 which should be exactly the same. However, the actual number is lower than
 what your pairwise comparison implies, and somebody measuring organization
 diversity could always use a similarity metric as yours when looking at
 these strings. Anyway, CAIDA is indeed better here than !MaxMind. (1)

 So I actually low-balled this for you.

 Here are the actual numbers.

  * # of ASNs for which MM's organizations are different, yet CAIDA's
 'parent organizations' are the same: 3299
  * # of ASNs for which MM's organizations are _very_ different, yet CAIDA's
 'parent organizations' are the same: 1935
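
 One way such counts could be produced, as a sketch and not the method
 actually used for the numbers above: compare every pair of ASNs that share
 a CAIDA parent organization and score how similar their !MaxMind names are.
 The similarity measure (`difflib.SequenceMatcher`), the 0.5 cutoff for
 "very different", the function name `count_split_names`, and the sample
 data are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical cutoff: names with a similarity ratio below this value are
# treated as "very different".
VERY_DIFFERENT_BELOW = 0.5


def count_split_names(maxmind_names, caida_parents, cutoff=VERY_DIFFERENT_BELOW):
    """Count ASN pairs with the same CAIDA parent but differing MaxMind names.

    Returns (different, very_different), where very_different counts the
    subset of pairs whose name similarity falls below the cutoff.
    """
    different = 0
    very_different = 0
    # Only ASNs present in both data sets can be compared.
    shared = sorted(set(maxmind_names) & set(caida_parents))
    for a, b in combinations(shared, 2):
        if caida_parents[a] != caida_parents[b]:
            continue
        name_a, name_b = maxmind_names[a], maxmind_names[b]
        if name_a == name_b:
            continue
        different += 1
        ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if ratio < cutoff:
            very_different += 1
    return different, very_different


# Hypothetical sample data, using ASNs reserved for documentation.
maxmind = {
    "AS64496": "Example Hosting GmbH",
    "AS64497": "Example Hosting Ltd",
    "AS64498": "Northwind Telecom",
    "AS64499": "Unrelated Net",
}
caida = {
    "AS64496": "Example Group",
    "AS64497": "Example Group",
    "AS64498": "Example Group",
    "AS64499": "Other Org",
}

counts = count_split_names(maxmind, caida)
```

 The pairwise loop is quadratic in the number of ASNs, which should be
 acceptable for a few thousand entries. Grouping by parent organization
 first would reduce the work if it were not.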



 I attach a list of those 1935 pairs as
 ''[https://trac.torproject.org/projects/tor/attachment/ticket/19118/MMs_very_diff.txt
 MMs_very_diff.txt]''.

 Two AS-ORG names being similar is neither sufficient nor necessary for two
 ASes to be correctly grouped under the same parent organization. We totally
 tried to learn these relationships from the !MaxMind data, and failed. I was
 in the process of deriving my own method from the academic literature
 until I found the CAIDA data, which did everything I needed.

 I have no stake in this. We tried to use something like !MaxMind for
 Roster, failed, but then discovered CAIDA worked. You then requested that
 we move as much functionality into Onionoo as possible. So this is me trying
 to do that. It's of course totally fine to say that this is too niche a
 need to be worth including in Onionoo. In which case, Roster will just
 continue to use its own database for this, which is totally cool. I'm
 just trying to, as you requested, upload the goods we found to
 the Onionoo Mothership. This is me making an effort to be a good uploader
 of candidate good things to Onionoo.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19118#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

