[tor-bugs] #4407 [Metrics Website]: Create a basic monitoring infrastructure for large scale events

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Tue Nov 8 08:02:03 UTC 2011


#4407: Create a basic monitoring infrastructure for large scale events
-----------------------------+----------------------------------------------
 Reporter:  atagar           |          Owner:  karsten
     Type:  task             |         Status:  new    
 Priority:  normal           |      Milestone:         
Component:  Metrics Website  |        Version:         
 Keywords:                   |         Parent:         
   Points:                   |   Actualpoints:         
-----------------------------+----------------------------------------------

Comment(by karsten):

 Replying to [comment:2 atagar]:
 > > My suggestion is to continue using a Java application that gets
 > > executed once per hour by cron.
 >
 > The first question that comes to mind is: do we need monitors to have
 > historical data? This was the reason I avoided the metrics codebase for
 > my consensus tracker script. Once you add its java and DB prereqs the
 > installation and complexity of the system gets much worse with, I think,
 > little benefit.

 We don't need historical data for the monitoring infrastructure.  Or
 rather, we'll want to keep our own state files, but we don't really need
 to have access to past descriptors.  I agree with you that the monitoring
 infrastructure should be independent of the metrics database.
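
 As a rough illustration of what "our own state files" could mean, here is
 a minimal Python sketch.  The JSON format, the function names, and the
 file name in the usage note are assumptions made for this example, not
 anything the ticket specifies.  The monitor reads the state left by the
 previous hourly run and writes a new one, without touching the metrics
 database or past descriptors.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the state saved by the previous run, or {} on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_state(path, state):
    """Write the state to a temporary file, then rename it into place.

    The rename is atomic on the same filesystem, so an interrupted run
    does not leave a half-written state file behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)
```

 An hourly run would call load_state("monitor-state.json") at startup,
 compare the result against the current consensus, and finish with
 save_state().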

 I came to a similar conclusion a few weeks ago, but for a slightly
 different reason.  We had a single cronjob to download descriptors, import
 them into the metrics database, and run the consensus-health script.  This
 approach turned out to be terribly error-prone.  Whenever the database
 import got stuck, the download stopped and the consensus-health script
 didn't work anymore.  That's why I made the consensus-health script a
 separate component that is independent of the metrics database.

 But hey, Java is not a prereq, it's a programming language.  Whether we
 require a certain JVM and Java libraries or a certain Python version and
 Python APIs makes no difference.  Well, apart from personal developer
 preferences, which do influence development speed.

 > What I'd like to see is for the alarm infrastructure to use a metrics
 > service API, but itself be a separate and distinct component.

 I like the idea of such a metrics service API.  For way too many months
 I've had a TODO list item to extract the common parts of metrics-web
 and metrics-db that handle relay descriptors and put them in a separate
 API.  In the meantime, ExoneraTor copies that code, the consensus-health
 script copies it, the extra-info descriptor health script would copy it,
 and the monitoring infrastructure is going to copy it, too.  Let's finally
 make an API.  I'm going to open a ticket today once I have a rough idea
 of what the API could look like.  I'll post the ticket number here.

 > That said, this decision is really up to whoever codes it. If it's
 > something like the above then I'd be happy to mentor, and if it's an
 > expansion of the metrics codebase then guess that ball's in your court.
 > If no one gets to it first then I might hack on it later as a client
 > for stem.

 We could also discuss what the API is supposed to do, and then implement
 it both in Java and Python.  There are a few Java metrics programs that
 would make use of it, and I think you have a few Python applications which
 would use it, too.

 > > But I already know 1 person who won't like that suggestion. ;)
 >
 > Bold accusation! Actually, if you'd proposed a java project when I
 > first joined the community I would have been all over it - I have far
 > more java development experience than python.

 Doh! ;)

 > > We can implement trivial things like "we just lost more than 25% of
 > > the relays in one hour." But what we really need is someone to sit
 > > down with the descriptor archives and look what are expected changes
 > > and what changes would be unusual.
 >
 > Right. What I'd like to see first is alarms for when the sky is
 > falling. After that it becomes a question of tuning and pattern
 > matching which could then easily lead to interesting research projects
 > - hint hint, researchy people. :)

 Agreed.  This research project might even turn out to be quite
 interesting!
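
 The trivial "sky is falling" check quoted above could look roughly like
 the Python sketch below.  It assumes each consensus has already been
 reduced to a set of relay fingerprints, and it reads "lost" as "listed
 an hour ago but missing now"; a net drop in relay count would be another
 valid reading.  The function names are made up for this example, and the
 0.25 default simply mirrors the 25% figure from the quote.  It is not a
 tuned threshold.

```python
def relay_loss_fraction(previous, current):
    """Fraction of relays in `previous` that no longer appear in `current`.

    Both arguments are sets of relay fingerprints (strings).
    """
    if not previous:
        return 0.0  # nothing to compare against, e.g. on the first run
    lost = previous - current
    return len(lost) / len(previous)


def sky_is_falling(previous, current, threshold=0.25):
    """True if more than `threshold` of the previous relays disappeared."""
    return relay_loss_fraction(previous, current) > threshold
```

 For example, if the previous consensus listed relays A, B, C, and D and
 the current one lists A, B, and E, then two of four relays are gone and
 the check fires.  Deciding which changes are expected and which are
 unusual is the part that still needs someone to look at the descriptor
 archives.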

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4407#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

