Thanks for this email. I think exporting more metrics on the control port is a great idea. I've wanted to have that for a while.
Great to hear, so we have a realistic chance that it actually gets implemented :)
There are safety questions to ask ourselves here before blindly exporting many stats.
Sure.
Exporting many stats to the control port unfortunately means that every relay operator could create fancy graphs
Making non-public graphs and alerts is the goal.
and make them public
Public graphs should result in the rejection of the affected relays. I'll be submitting a few to bad-relays@ soon, since enn.lu apparently does not care when asked to remove their public stats and XML data.
which, depending on the stat, can be harmful.
Furthermore, graphing stats can also mean that, over time, the relay operator stores historical data of everything that happened within the relay, and that data can be used in many ways to pull off attacks (e.g., law enforcement subpoenaing access to such a database).
Yes, acceptable and unacceptable retention times and granularity should be defined and documented. I'd propose a maximum retention time of two weeks.
The Heartbeat log has a minimum period of 30 minutes but a default of 6 hours.
Current tor has no restrictions on Heartbeat granularity: you can ask tor to write the data to the logs every other second by issuing "SIGNAL HEARTBEAT" on the control port.
Whatever stats we end up exporting, I think keeping delays like that is a strong requirement: we would "bin" the aggregated stats over a "long enough period" instead of providing a very fine-grained stream of stats that would make it trivial to spot spikes down to the minute.
A granularity of 30 or 60 minutes seems reasonable.
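To make the binning argument concrete, here is a minimal Python sketch. It assumes a hypothetical per-minute series of (unix_timestamp, value) samples, 60-minute bins and the two-week retention proposed earlier in the thread; none of the names below exist in tor or its control protocol, they are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical parameters, taken from the values floated in this thread.
BIN_SECONDS = 60 * 60                    # 60-minute bins
MAX_RETENTION_SECONDS = 14 * 24 * 3600   # two-week maximum retention


def bin_samples(samples: Iterable[Tuple[int, int]],
                bin_seconds: int = BIN_SECONDS) -> Dict[int, int]:
    """Sum (unix_timestamp, value) samples into fixed-width bins.

    Each bin is keyed by its start time, so a spike inside a bin is only
    visible as part of that bin's total.
    """
    bins: Dict[int, int] = defaultdict(int)
    for ts, value in samples:
        bins[ts - ts % bin_seconds] += value
    return dict(sorted(bins.items()))


def exportable_bins(samples: Iterable[Tuple[int, int]], now: int,
                    bin_seconds: int = BIN_SECONDS,
                    retention: int = MAX_RETENTION_SECONDS) -> Dict[int, int]:
    """Return only bins that have fully elapsed and fall within retention.

    The still-open bin is withheld: if it were exported, a poller could diff
    successive reads and recover minute-level changes anyway.
    """
    cutoff = now - retention
    return {start: total
            for start, total in bin_samples(samples, bin_seconds).items()
            if start + bin_seconds <= now and start >= cutoff}


if __name__ == "__main__":
    per_minute = [(m * 60, 1) for m in range(120)]  # two hours of 1-minute samples
    per_minute[17] = (17 * 60, 50)                  # a spike at minute 17
    print(exportable_bins(per_minute, now=5000))    # {0: 109}
```

The spike at minute 17 only shows up as part of the first hour's total (109), and the second hour is not exported at all until it has fully elapsed. Withholding the open bin matters as much as the bin width itself.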
Some of the stats below are safe in my opinion, like memory usage, but most of them need to be reviewed for safety.
Yes, please.