[tor-bugs] #5212 [Metrics Website]: Write up wiki page with requirements, questions etc for a basic monitoring infrastructure

Mon Feb 27 10:42:04 UTC 2012

#5212: Write up wiki page with requirements, questions etc for a basic monitoring
infrastructure
-----------------------------+----------------------------------------------
 Reporter:  runa             |          Owner:  runa 
     Type:  task             |         Status:  new  
 Priority:  normal           |      Milestone:       
Component:  Metrics Website  |        Version:       
 Keywords:                   |         Parent:  #4407
   Points:                   |   Actualpoints:       
-----------------------------+----------------------------------------------

Comment(by karsten):

 Replying to [comment:1 atagar]:
 > From the dev meeting, here are the thoughts that I recall...
 >
 >  * Ideally this would replace my consensus tracker script, Karsten's
 consensus health checker, and SoaT. Those all would be plugins for this
 system.

 My understanding was that the alarming framework would be ''passive'': it
 would only look at descriptors or other data sources and wouldn't make any
 ''active'' requests.  We didn't discuss this explicitly, but I think that
 if active requests are in scope, then the scope is too broad.  I'm worried
 that we'll fall into the kitchen-sink trap again.

 Your consensus tracker script should work fine by passively looking at the
 current descriptors and maybe server descriptors.  Part of the
 consensus-health script will be fine with that, too, except for the parts
 where we check whether a directory authority serves a recent consensus.
 But SoaT is mostly based on actively measuring whether exits are evil;
 that's something we cannot learn passively from looking at descriptors.
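
 As a rough illustration of what a ''passive'' check could look like, here
 is a minimal Python sketch that only reads a consensus document that is
 already on disk and never contacts the network.  The function name
 `summarize_consensus` is made up for this example and is not taken from any
 of the scripts mentioned above; it relies only on the `valid-after`, `r`,
 and `s` lines of the consensus format.

```python
from collections import Counter
from datetime import datetime


def summarize_consensus(text):
    """Passively summarize one consensus document given as a string.

    Returns (valid_after, relay_count, flag_counts).  Hypothetical helper:
    it only inspects 'valid-after', 'r' and 's' lines and ignores the rest.
    """
    valid_after = None
    relay_count = 0
    flag_counts = Counter()
    for line in text.splitlines():
        if line.startswith("valid-after "):
            # Header line, e.g. "valid-after 2012-02-27 10:00:00"
            stamp = line[len("valid-after "):].strip()
            valid_after = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        elif line.startswith("r "):
            # Each relay entry starts with an "r" line.
            relay_count += 1
        elif line.startswith("s "):
            # The "s" line lists the flags assigned to the preceding relay.
            flag_counts.update(line.split()[1:])
    return valid_after, relay_count, flag_counts
```

 A check built on this could, for example, compare the number of relays
 with the Running flag between two consecutive consensuses and raise an
 alarm on a sudden drop.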

 >  * I suggested using the onionoo protocol for this, in which case the
 first step would be to make an onionoo client in whatever language the
 alarming framework would use.

 I still disagree that this is a good application for the Onionoo protocol.
 That protocol is meant for applications that want to learn about the
 status of individual relays or bridges that were running in the past week.
 Onionoo clients should be able to find all information they care about in
 the latest documents they download.  They shouldn't care about past
 Onionoo documents.  That's why Onionoo contains bandwidth history objects,
 for example, instead of expecting Onionoo clients to collect their own
 histories.  Applications that want to create a history of relays or
 bridges should look at the original descriptors.

 Of course, I'm not going to stop you from using Onionoo for anything.  But
 I can't promise not to break it in the future for applications that don't
 use it the way it was [http://kloesing.github.com/Onionoo/ designed] to
 be used.  Nor can I promise to add lots of stuff to the formats that is
 useless for the main purpose and that makes documents unnecessarily big.

 I suggest rsync'ing the metrics-recent directory, or parts of it, from
 metrics once per hour.  That's also what the Onionoo server does.  Once
 the alarming framework is deployed on a Tor VM, that's just a local
 connection from one VM to another VM on the same physical host or in the
 same LAN.
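
 A minimal sketch of such an hourly sync, assuming it is driven from cron
 or a similar scheduler.  The rsync source `metrics.example.org::metrics-recent/`
 and the subdirectory name used below are placeholders, since the real host
 and module names are not stated here; only standard rsync options are
 used.

```python
import subprocess

# Placeholder source: the real rsync host and module are assumptions.
DEFAULT_SOURCE = "metrics.example.org::metrics-recent/"


def build_rsync_command(dest, source=DEFAULT_SOURCE, part="", delete=False):
    """Build the rsync argument list for mirroring `part` of `source`.

    `part` is a path below the source, e.g. a hypothetical
    "relay-descriptors/"; an empty string mirrors everything.
    """
    cmd = ["rsync", "--archive", "--compress"]
    if delete:
        # Also remove local files that no longer exist on the source.
        cmd.append("--delete")
    cmd.append(source + part)
    cmd.append(dest)
    return cmd


def sync_once(dest, **kwargs):
    """Run one sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(dest, **kwargs), check=True)
```

 Leaving `delete` off keeps descriptors locally after they drop out of the
 recent window on the source, which may or may not be what the framework
 wants.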

 >  * I asked Roger for thoughts on other things that we could monitor and
 he suggested...
 >   * Entropy of bandwidth authority weights, so we know when the
 authority heuristics radically change.
 >   * Tor weather notice for when people should get a shirt. Ideally we'd
 then reach out to them to figure out how their experience as a relay
 operator was going.

 Note that the original Tor Weather use case of notifying operators when a
 node goes down can be implemented using the Onionoo protocol just fine.
 That only requires information from the past week.  The t-shirt thing is
 something that ''could'' be implemented using the Onionoo protocol, but
 it's not what Onionoo was designed for.
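
 On the entropy suggestion above: one way to read it is as the Shannon
 entropy of the normalized bandwidth weights, compared between two
 consecutive sets of weights.  The sketch below assumes that reading; the
 function names and the threshold of 0.5 bits are made up for illustration
 and would need tuning against real data.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, after normalizing."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


def entropy_changed(previous, current, threshold=0.5):
    """True if the entropy moved by more than `threshold` bits (assumed value)."""
    return abs(shannon_entropy(current) - shannon_entropy(previous)) > threshold
```

 Evenly spread weights give the highest entropy, so a sharp drop would
 suggest that weight has become concentrated on a few relays.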

 >   * Notice about new, especially large relays.
 >  * After some more thought I realized that we should also look around to
 see what sort of alarming frameworks already exist. That might save us a
 lot of work and maybe provide a nice UI too.

 Using something existing would be good.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5212#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online