[tor-dev] Network Health Monitoring

teor teor at riseup.net
Tue May 21 08:14:49 UTC 2019


> On 15 May 2019, at 22:40, David Goulet <dgoulet at torproject.org> wrote:
>> On 08 May (13:27:31), Iain Learmonth wrote:
>> Hi All,
>> I'm working on #28322 to improve the monitoring of Tor Metrics services,
>> but this also has the side effect of monitoring network health. For
>> example, we'd like to know when Onionoo messes up and starts reporting
>> zero relays, but we also get to learn for free in the same check how
>> many relays we have and alert if that number does something weird.
>> What would be the most useful checks to add here?
>> * Range of expected total relays
>> * Range of expected relays with Guard flag
>> * Range of expected relays with Exit flag
>> * Range of expected consensus weight in each position
> For all of them, what could be reported is when a large fraction disappears all
> of a sudden.
> Losing for instance 500 relays at once is something worth our attention imo.
> Same goes with Exit relays... if we drop from 900 to 500, it is scary.
> For the consensus weight, I would report the outliers. Maybe someone is gaming
> us, and so a HUGE value compared to our usual top 10 means something is up.
> As for what the good values are, I don't know, but I think you can probably
> figure out the average number of relays we lose/gain every day and scale that
> by something like 3 for a warning?

Maybe it's also worth checking how many times each rule would trigger in the
past year?
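
A rough sketch of such a backtest, assuming you already have one relay count
per day for the past year as a plain list. The function name, the input format
and the "3 times the average daily change" rule are only illustrative (the rule
is taken from David's suggestion above); nothing here is an existing Onionoo or
Tor Metrics API.

```python
from statistics import mean

def count_triggers(daily_counts, scale=3.0):
    """Count how often a 'sudden drop' rule would have fired on past data.

    daily_counts: hypothetical list of relay counts, one per day, oldest first.
    scale: multiple of the average absolute daily change that counts as a warning.
    """
    # Day-to-day changes: negative means relays were lost.
    changes = [b - a for a, b in zip(daily_counts, daily_counts[1:])]
    if not changes:
        return 0
    # Typical size of a daily change over the whole period.
    typical = mean(abs(c) for c in changes)
    threshold = scale * typical
    # Only drops count as triggers; gains are ignored in this sketch.
    return sum(1 for c in changes if -c > threshold)
```

Running this over a year of history for each rule (total relays, Guard, Exit)
gives a quick idea of how noisy a given threshold would have been. Note that
the threshold here is computed from the same data it is tested on, so a large
past incident also raises the threshold a little.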

If the statistics are normally distributed, you could alert at 4 standard
deviations from the mean, so that each rule (falsely) triggers about once a year.
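
To sanity-check a threshold like that, here is a small sketch of the
arithmetic. It assumes the values really are normally distributed and
independent between checks, and the number of checks per year is an assumption
too (8760 would be one check per hourly consensus; the check interval is not
stated in this thread).

```python
import math

def expected_false_triggers(k_sigma, checks_per_year):
    """Expected false alarms per year for a two-sided k-sigma rule.

    Assumes normally distributed, independent observations.
    """
    # Two-sided tail probability of a standard normal: P(|Z| > k).
    tail = math.erfc(k_sigma / math.sqrt(2))
    return tail * checks_per_year

if __name__ == "__main__":
    for k in (3, 4, 5):
        print(k, expected_false_triggers(k, 8760))
```

With hourly checks, 4 standard deviations comes out at well under one false
trigger per year, while 3 standard deviations would fire a couple of dozen
times, so the result depends a lot on how often the check runs.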

