[metrics-bugs] #27076 [Metrics/CollecTor]: Reconfigure collector2.tp.o to do less

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Aug 8 08:48:06 UTC 2018


#27076: Reconfigure collector2.tp.o to do less
-----------------------------------+--------------------------
     Reporter:  karsten            |      Owner:  metrics-team
         Type:  task               |     Status:  new
     Priority:  Medium             |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+--------------------------
 We have two CollecTor instances: collector.tp.o on colchicifolium and
 collector2.tp.o on corsicum. Reasons for having two instances instead of
 one are related to failure tolerance:

  1. Whenever collector.tp.o fails, it doesn't fetch consensuses and votes
 from the directory authorities, and those are only available there for an
 hour. If collector.tp.o fails for a couple of hours, it can later fetch the
 missing descriptors from collector2.tp.o.
  2. While collector.tp.o is down, Onionoo can fetch relay descriptors from
 collector2.tp.o and continue to provide recent data.
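
 The second property can be sketched from the client side as a simple
 ordered fallback. This is a minimal Python sketch, not Onionoo's actual
 code: the function name `fetch_with_fallback` and the injected `fetch`
 callable are hypothetical, and the base URLs assume that "tp.o" expands to
 torproject.org.

```python
from typing import Callable, Iterable

# Assumed base URLs; "tp.o" in the ticket is shorthand for torproject.org.
INSTANCES = (
    "https://collector.torproject.org",
    "https://collector2.torproject.org",
)


def fetch_with_fallback(
    path: str,
    fetch: Callable[[str], bytes],
    instances: Iterable[str] = INSTANCES,
) -> bytes:
    """Try each instance in order and return the first successful response.

    `fetch` is a hypothetical callable that takes a URL and returns the
    response body, raising OSError (for example urllib.error.URLError,
    which is a subclass of OSError) when the instance is unreachable.
    """
    errors = []
    for base in instances:
        url = f"{base}/{path.lstrip('/')}"
        try:
            return fetch(url)
        except OSError as exc:
            # Remember the failure and move on to the next instance.
            errors.append(f"{url}: {exc}")
    raise OSError("all instances failed: " + "; ".join(errors))
```

 A real caller might pass something like
 `lambda url: urllib.request.urlopen(url, timeout=30).read()` as `fetch`.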

 However, I think we went a bit too far when configuring collector2.tp.o to
 also sync descriptors from collector.tp.o. It does that with bridge
 descriptors and sanitized web logs.

 Here's how the two instances are currently configured:

 {{{
 collector.tp.o/colchicifolium:
 RelaySources = Cache, Remote, Sync, Local
 BridgeSources = Local
 ExitlistSources = Remote
 OnionPerfSources = Remote
 WebstatsSources = Local

 collector2.tp.o/corsicum:
 RelaySources = Remote
 BridgeSources = Sync
 ExitlistSources = Remote
 OnionPerfSources = Remote
 WebstatsSources = Sync
 }}}

 The entries in question are the two `"Sync"` entries in the
 collector2.tp.o block: `BridgeSources` and `WebstatsSources`. I think we
 mainly put them in so that the respective sync code would get exercised,
 too, and we would notice any issues with it.

 I now believe that these entries are not helpful and potentially harmful,
 for two reasons:

  1. The sync mode of the bridgedescs module does not clean up the
 `recent/` directory after placing descriptors there, whereas the local
 mode does. The effect is that bridge descriptors in `recent/` pile up and
 fill up disk space. Even worse, Onionoo fetches everything contained in
 that directory, so bootstrapping a new Onionoo instance downloads vast
 amounts of data these days.
  2. I don't yet know what happened in #27055, but it seems that
 simplifying the configuration of collector2.tp.o should make that issue at
 least less likely to happen again.
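
 For illustration, the kind of cleanup that the sync mode skips could look
 roughly like the following. This is a hedged Python sketch and not
 CollecTor's actual cleanup code: the function name `prune_recent` is
 hypothetical, and the 72-hour retention window is an assumption about how
 long descriptors should stay in `recent/`.

```python
import time
from pathlib import Path


def prune_recent(recent_dir, max_age_hours=72.0, now=None):
    """Delete regular files under recent_dir that are older than the cutoff.

    Age is judged by file modification time. The 72-hour default is an
    assumption for this sketch. Returns the sorted list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    # Materialize the listing first so deletions don't disturb iteration.
    for path in list(Path(recent_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return sorted(removed)
```

 Run periodically against the bridge descriptor subtree of `recent/`, a
 function like this would keep the directory bounded regardless of whether
 the descriptors arrived via local or sync mode.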

 I could imagine reconfiguring collector2.tp.o to only perform the
 following tasks:

 {{{
 collector2.tp.o/corsicum:
 RelaySources = Remote
 ExitlistSources = Remote
 }}}

 The effect would be that we'd still keep our failure tolerance properties
 and nothing more.
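
 To double-check what the proposal changes, the two collector2.tp.o
 configurations quoted above can be compared mechanically. The following is
 a small Python sketch; the helper `parse_sources` is hypothetical and only
 understands the `Key = value, value` lines shown in this ticket, not
 CollecTor's full properties format.

```python
def parse_sources(text):
    """Parse 'Key = a, b' lines into {key: [values]}; skip all other lines."""
    config = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return config


# collector2.tp.o/corsicum as currently configured (copied from the ticket).
CURRENT = """
RelaySources = Remote
BridgeSources = Sync
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Sync
"""

# collector2.tp.o/corsicum as proposed (copied from the ticket).
PROPOSED = """
RelaySources = Remote
ExitlistSources = Remote
"""

current = parse_sources(CURRENT)
proposed = parse_sources(PROPOSED)

# Settings that disappear entirely under the proposal.
dropped = sorted(set(current) - set(proposed))
# Settings that stay exactly as they are today.
unchanged = sorted(k for k in proposed if current.get(k) == proposed[k])

print("dropped:", dropped)
print("unchanged:", unchanged)
```

 Note that, as written, the proposed block drops `OnionPerfSources` in
 addition to the two `"Sync"` entries, while `RelaySources` and
 `ExitlistSources` stay as they are.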

 Does that make sense? Did I miss anything important here?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/27076>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

