[tor-bugs] #25685 [Core Tor/Tor]: Tor relays publish a new descriptor but authorities drop it because they think it's only cosmetically different, and then the relay waits 18 more hours to publish, thus falling out of the consensus

Tor Bug Tracker & Wiki blackhole at torproject.org
Sun Apr 1 23:17:56 UTC 2018


#25685: Tor relays publish a new descriptor but authorities drop it because they
think it's only cosmetically different, and then the relay waits 18 more
hours to publish, thus falling out of the consensus
------------------------------+----------------------------------
     Reporter:  arma          |      Owner:  (none)
         Type:  defect        |     Status:  new
     Priority:  Medium        |  Milestone:
    Component:  Core Tor/Tor  |    Version:
     Severity:  Normal        |   Keywords:  034-roadmap-proposed
Actual Points:                |  Parent ID:
       Points:                |   Reviewer:
      Sponsor:                |
------------------------------+----------------------------------
 We have a design flaw, or at least an impedance mismatch, in our
 descriptor publishing algorithm.

 Relays publish a new descriptor when they think something has sufficiently
 changed (e.g. bandwidth, IP address, or exit policy) or when 18 hours
 have passed.

 Directory authorities accept the new descriptor when *they* think it has
 sufficiently changed. If they think it hasn't, they quietly drop it:
 {{{
     log_info(LD_DIRSERV,
              "Not replacing descriptor from %s (source: %s); "
              "differences are cosmetic.",
              router_describe(ri), source);
 }}}

 The trouble comes when things get out of sync: the relay thinks it
 published recently so it is still early in its 18 hour timer, but the
 authorities discarded that descriptor. Then when the "current" descriptor
 becomes 24 hours old, it gets discarded, and the relay falls out of the
 consensus.
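
 To make the timing concrete, here is a minimal sketch of the window in
 which the authority holds no usable descriptor. It uses only the 18 hour
 and 24 hour figures from this ticket. The function name is hypothetical,
 nothing here is taken from Tor's source, and it ignores the bandaid
 described below.

```c
#include <assert.h>

/* Figures from the ticket text, not read from Tor's source. */
#define RELAY_REPUBLISH_HOURS 18  /* relay's periodic republish timer */
#define AUTH_MAX_AGE_HOURS    24  /* age at which the authority's copy is discarded */

/* Hypothetical helper: hours during which the authority has no usable
 * descriptor for the relay.
 *   accepted_at: hour the authority last accepted a descriptor
 *   dropped_at:  hour the relay uploaded one that was dropped as cosmetic
 * The relay restarts its 18 hour timer at dropped_at, while the authority
 * still counts the 24 hour age from accepted_at. */
int
uncovered_hours(int accepted_at, int dropped_at)
{
  assert(dropped_at >= accepted_at);
  int authority_expiry = accepted_at + AUTH_MAX_AGE_HOURS;
  int next_upload = dropped_at + RELAY_REPUBLISH_HOURS;
  int gap = next_upload - authority_expiry;
  return gap > 0 ? gap : 0;
}
```

 Under these assumptions, a dropped upload 10 hours after the accepted one
 leaves 4 uncovered hours (28 - 24), while one dropped 5 hours after leaves
 none. In this simplified model, any dropped upload more than 6 hours after
 the accepted one opens a hole.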

 I don't have stats on how often relays and authorities actually get out of
 sync, but it's enough to have tickets filed about it (#23638) and it's
 enough to have confused/sad posts from relay operators about it every month:
 https://lists.torproject.org/pipermail/tor-dev/2018-March/013030.html
 https://lists.torproject.org/pipermail/tor-relays/2018-March/014764.html

 We deployed a bandaid in 0.2.3.4-alpha (commit 1f4b694, #3327) that makes
 relays look in the consensus and publish a new descriptor more
 aggressively if they find they're not listed. That hack is apparently
 needed quite often: in #21642 I said "So 426 of our ~7300 relays stayed in
 the consensus in the last 12.5 hours because of this hack."
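
 As a rough illustration of the relay-side decision with the bandaid in
 place, here is a sketch. The struct, enum, and function names are
 hypothetical, and the staleness cutoff is a made-up parameter rather than
 a value from Tor's source. Only the 18 hour timer and the "not listed"
 check come from this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RELAY_REPUBLISH_HOURS 18  /* periodic timer, as stated in the ticket */

/* Hypothetical snapshot of what a relay knows about itself. */
struct relay_view {
  int hours_since_publish;       /* since the relay last uploaded */
  bool listed_in_consensus;      /* does the current consensus list us? */
  int consensus_desc_age_hours;  /* age of the descriptor the consensus lists */
};

enum republish_reason {
  REPUBLISH_NONE,
  REPUBLISH_TIMER,       /* 18 hours have passed */
  REPUBLISH_NOT_LISTED,  /* bandaid: the consensus does not list us */
  REPUBLISH_LISTED_OLD   /* bandaid: the listed descriptor is quite old */
};

/* Decide whether to republish; stale_cutoff_hours is an assumed tunable. */
enum republish_reason
republish_reason_for(const struct relay_view *v, int stale_cutoff_hours)
{
  assert(v != NULL);
  if (!v->listed_in_consensus)
    return REPUBLISH_NOT_LISTED;
  if (v->consensus_desc_age_hours > stale_cutoff_hours)
    return REPUBLISH_LISTED_OLD;
  if (v->hours_since_publish >= RELAY_REPUBLISH_HOURS)
    return REPUBLISH_TIMER;
  return REPUBLISH_NONE;
}
```

 The consensus checks come before the timer check on purpose: the relay's
 own timer is exactly the thing that can be wrong when an authority has
 silently dropped an upload.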

 But I think we haven't actually explored whether the bandaid helps all of
 the relays stay in the consensus all of the time, or if there are still
 "holes" in it that mean some relays fall out sometimes. The reports above
 make me think that yes there are still holes.

 Potential ways forward:

 * Match up the descriptor upload timings, as seen by a dir auth, with the
 appearance of relays in the consensus. See how many of the relays
 publishing for reason "version listed in consensus is quite old" are
 missing any hours in the consensus.

 * If there are some that fall out of the consensus entirely, think about
 ways to make the republish more aggressive and earlier, or if it is
 already more aggressive and earlier, figure out why it isn't sticking.

 * Think about ways to make our relay-side decisions about "is it different
 enough" synchronize better with our dirauth-side decisions. Now that we're
 doing hourly consensus documents, can the dir auths be more lenient about
 similar-ish descriptors, because there's only one "winner" descriptor
 each hour? This poor synchronization is part of why we couldn't implement
 proposal 275 when we wanted to.
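
 For the first item above, the per-relay measurement could look something
 like the sketch below. It assumes we have already reduced the consensus
 archive to one boolean per hour for a given relay ("was it listed?").
 That input format and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Summary of the hours in which a relay was absent from the consensus. */
struct presence_gaps {
  size_t missing_hours;  /* total hours not listed */
  size_t longest_gap;    /* longest run of consecutive missing hours */
};

/* listed[i] is true if the relay appeared in the consensus for hour i. */
struct presence_gaps
consensus_gaps(const bool *listed, size_t n_hours)
{
  assert(listed != NULL || n_hours == 0);
  struct presence_gaps g = { 0, 0 };
  size_t run = 0;
  for (size_t i = 0; i < n_hours; i++) {
    if (listed[i]) {
      run = 0;
    } else {
      g.missing_hours++;
      run++;
      if (run > g.longest_gap)
        g.longest_gap = run;
    }
  }
  return g;
}
```

 Running this over the relays whose uploads carry the reason "version
 listed in consensus is quite old" would show whether the bandaid leaves
 them with zero missing hours or only shortens the gap.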

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25685>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online