Explanation on (negative) spikes in number of relays
Around 2 weeks ago, I posted the following concerns regarding "negative spikes" for the number of relays measured by metrics.torproject.org [1] in forum.torproject.org [2] (and earlier on r/TOR [3]):
1. Are there any good explanations on why these spikes occurred?
2. Might those outages imply some centralized infrastructure? If so, how is that not a major concern?
So far, I have not received any definite explanations.
If those spikes happen to be measurement errors, I imagine that there should be logs somewhere to support this explanation.
3. Does such evidence exist publicly?
4. If not (3), where would that evidence be stored?
Sources:
[1] http://metrics.torproject.org/networksize.html?start=2022-01-01&end=2026-03-...
[2] https://forum.torproject.org/t/explanation-on-negative-spikes-in-number-of-r...
[3] https://www.reddit.com/r/TOR/comments/1s8wx24/explanation_on_spikes_in_numbe...
I appreciate any clues, hints, or pointers on where to ask further.
Best,
ttlns
Hi,
metrics.torproject.org has been suffering load issues for a while now and sometimes produces erroneous graphs. We are in the process of replacing it, but it is going to take us some more time.
The main reason we sometimes see spikes that do not reflect actual data is simply memory errors from the Java process that does daily aggregation of network data. The kernel kills it and data is left in an inconsistent state. We do not have public logs, but you would be able to verify the number of relays running on the network by parsing consensus documents that are publicly available at collector.torproject.org. The details on how to compute those statistics are also publicly available on our metrics website [1].
I hope this answers your question.
Cheers,
-hiro
[1] https://metrics.torproject.org/reproducible-metrics.html#servers
On 14/4/26 14:57, ttlns via network-health wrote:
> Around 2 weeks ago, I posted the following concerns regarding "negative spikes" [...]
_______________________________________________
network-health mailing list -- network-health@lists.torproject.org
To unsubscribe send an email to network-health-leave@lists.torproject.org
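The check hiro suggests can be tried on a single consensus file. What follows is a minimal sketch, not the code behind metrics.torproject.org: it assumes that each relay entry in a consensus has one status line beginning with "s " that lists its flags, and that a relay counts as running when that line contains Running. The function names and the shortened SAMPLE document are made up for illustration; the authoritative procedure is the one described on the reproducible-metrics page cited above.

```python
from collections import Counter


def count_flags(consensus_text):
    """Count how many relay entries carry each flag in one consensus document."""
    counts = Counter()
    for line in consensus_text.splitlines():
        # Assumption: a relay's flags are on its status line, "s Flag1 Flag2 ...".
        if line.startswith("s "):
            counts.update(line.split()[1:])
    return counts


def valid_after(consensus_text):
    """Return the 'valid-after' timestamp string of the consensus, or None."""
    prefix = "valid-after "
    for line in consensus_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Invented, heavily shortened stand-in for a consensus document.
# Real "r" lines carry more fields; they are elided here on purpose.
SAMPLE = """\
network-status-version 3
valid-after 2026-01-01 00:00:00
r relayA ...
s Fast Running Stable Valid
r relayB ...
s Fast HSDir Running Valid
r relayC ...
s Valid
"""

if __name__ == "__main__":
    flags = count_flags(SAMPLE)
    print(valid_after(SAMPLE), "running:", flags["Running"], "hsdir:", flags["HSDir"])
```

With the sample, count_flags(SAMPLE)["Running"] is 2 and count_flags(SAMPLE)["HSDir"] is 1. For a real file from collector.torproject.org, pass its decoded text instead of SAMPLE; counting per flag also makes it possible to compare against the relay flags graph.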
Hi,
I forgot to mention a couple of other things. There might be legitimate reasons for those negative spikes. One that I am thinking of is that when there is an operating system update, relays are rebooted, and this also causes them to lose their flags: https://metrics.torproject.org/relayflags.html
So if you see a negative spike and you know there was an update to one of the libraries used by little-t-tor, it might be what caused it.
In February, this might have also affected our relay count: https://krebsonsecurity.com/2026/02/kimwolf-botnet-swamps-anonymity-network-...
So yes, there are a few reasons, and at this moment we cannot always exclude issues with our metrics infrastructure either.
I hope this gives you more context, and sorry for the rushed reply from earlier today.
Cheers,
-hiro
On 15/4/26 12:36, Silvia Puglisi [Hiro] wrote:
> metrics.torproject.org has been suffering load issues for a while now and sometimes produces erroneous graphs. [...]
Hey Hiro,
Thank you for the elaborate answer!
First of all, I would like to apologize for the initial duplicate messages which got caught by review. Unfortunately, mailman3 [1] did not show any confirmation that a thread had been or would be created, and redirected (302 Found) me directly to [2]. If duplicate messages from new users are a frequent event, I suppose that adding some confirmation might help.
> you would be able to verify the number of relays running on the network by parsing consensus documents that are publicly available at collector.torproject.org
I will give that a try and post an update to this thread. Thank you for the suggestion!
> There might be legitimate reasons for those negative spikes.
I would suppose that there should not be any reasons for many relays to go offline at the same time.
1. Am I correctly assuming that a significant drop in the number of relays would create a vulnerable state for the overall network?
> So if you see a negative spike and you know there was an update to one of the libraries used by little-t-tor, it might be what caused it
2. Am I correctly assuming that the relay operators are responsible for updates of their own system?
3. If (2), how is, or how can, the integrity of those updates be ensured? Whilst the underlying software is OSS, I would doubt that many relay operators review the code thoroughly. E.g., I imagine that ideally there could/should be some voting system (possibly over the directory servers?) for verifying the integrity of an update?
[1] https://lists.torproject.org/mailman3
[2] https://lists.torproject.org/mailman3/hyperkitty/list/network-health@lists.torproject.org/2026/4/?
Best,
ttlns
On 15/4/26 22:55, ttlns via network-health wrote:
>> There might be legitimate reasons for those negative spikes.
> I would suppose that there should not be any reasons for many relays to go offline at the same time.
> 1. Am I correctly assuming that a significant drop in the number of relays would create a vulnerable state for the overall network?
Hey there,
Let me elaborate a little. The negative spikes in the relay counts are probably an issue with our data pipeline. This is because data is aggregated daily, so reboots should be averaged out in those metrics.
As far as the flags are concerned, if a relay updates its OS packages, either the VM reboots or the tor daemon restarts. In this case some of the flags (usually the HSDir one) might be lost temporarily; that is why you see the flags fluctuating (or dropping and then picking up again).
Does this answer your doubts?
Relay operators are responsible for updating their systems, yes. What is your concern in this case?
Cheers,
-hiro
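To see why a short reboot window barely moves a daily figure, here is a toy calculation. The numbers are invented, and the plain arithmetic mean over 24 hourly values is an assumption made for illustration; the actual aggregation on the metrics side may differ in detail.

```python
def daily_mean(hourly_counts):
    """Arithmetic mean of the hourly relay counts of one day."""
    return sum(hourly_counts) / len(hourly_counts)


def relative_drop(value, baseline):
    """Fraction by which `value` lies below `baseline`."""
    return (baseline - value) / baseline


# Invented example: 8000 relays all day, except one hour where 2000 are missing.
BASELINE = 8000
hourly = [BASELINE] * 24
hourly[13] = 6000

# The dip as seen in the single affected hour versus in the daily aggregate.
hourly_dip = relative_drop(min(hourly), BASELINE)
daily_dip = relative_drop(daily_mean(hourly), BASELINE)

if __name__ == "__main__":
    print(f"hourly dip: {hourly_dip:.1%}, daily dip: {daily_dip:.1%}")
```

In this example a 25% dip lasting one hour shrinks to roughly 1% in the daily mean, which is why a deep drop in a daily graph is more likely to come from missing or inconsistent input data than from relays restarting.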
Hey Hiro,
I have now written a project [1] to collect the relay count from [2]. My observations so far:
1. Most spikes at [3] appear (as you suggested) to be measurement errors.
2. There are 4 spikes visible at [4]:
   a) Positive spike in 12/2014, lasting around 11 hours
   b) Negative spike in 12/2021, lasting only one hour
   c) Negative spike in 01/2026, lasting only one hour
   d) Positive spike in 01/2026, lasting around 11 hours
   I will have to check whether the respective consensus files for (b) and (c) show signs of corruption (or a failed consensus). The relevant logic for parsing the count of running relays can be found at [8].
3. The relay count shows some 24-hour periodicity (as well as 12h and 30d) [5]. I'm guessing that this could be due to individuals who operate relay servers on their personal machines. Or maybe certain relay servers crash under high traffic?
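Both kinds of observation above (isolated spikes and periodicity) can be checked on an hourly series of relay counts with very little code. The sketch below is not taken from the project at [1]; the function names, the 12-hour window, and the 10% threshold are arbitrary assumptions chosen for illustration, and the test series are synthetic.

```python
from statistics import median


def find_spikes(counts, window=12, threshold=0.10):
    """Return (index, value, baseline) for points that deviate by more than
    `threshold` (as a fraction) from the median of their neighbours."""
    spikes = []
    for i, value in enumerate(counts):
        lo = max(0, i - window)
        hi = min(len(counts), i + window + 1)
        # Neighbours on both sides, excluding the point itself.
        neighbours = counts[lo:i] + counts[i + 1:hi]
        if not neighbours:
            continue
        baseline = median(neighbours)
        if baseline and abs(value - baseline) / baseline > threshold:
            spikes.append((i, value, baseline))
    return spikes


def autocorrelation(counts, lag):
    """Sample autocorrelation of the series at the given lag (in samples)."""
    n = len(counts)
    if lag <= 0 or lag >= n:
        return 0.0
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts)
    if variance == 0:
        return 0.0
    covariance = sum(
        (counts[i] - mean) * (counts[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


if __name__ == "__main__":
    # Synthetic series: flat at 7000 with a one-hour negative spike.
    flat = [7000] * 48
    flat[20] = 5000
    print(find_spikes(flat))

    # Synthetic series: 12 hours slightly above, 12 hours slightly below 7000.
    periodic = [7000 + (100 if h % 24 < 12 else -100) for h in range(240)]
    print(autocorrelation(periodic, 24), autocorrelation(periodic, 12))
```

A median baseline is used instead of a mean so that a single outlier does not shift the reference value; with a 12-hour window on each side, a spike lasting around 11 hours is still outnumbered by normal neighbours. A clearly positive autocorrelation at lag 24 on hourly data is consistent with a daily cycle, though it does not by itself explain the cause.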
> Relay operators are responsible for updating their system yes. What is your concern in this case?
My concern is a "trust bottleneck" in the release process. Tor releases appear to be signed by at most 3 developers [6], with the signatures published on [7]. I imagine that the risk of supply chain attacks could be reduced if there were a way to verify a release through the directory authorities. That would require the builds to be reproducible and the DA operators to have the capacity to review the source code.
Best,
ttlns
[1] https://codeberg.org/ttlns/torcollector_relays_stats_test
[2] https://collector.torproject.org/archive/relay-descriptors/consensuses/
[3] https://metrics.torproject.org/networksize.html?start=2010-01-01
[4] https://codeberg.org/ttlns/torcollector_relays_stats_test/src/branch/main/as...
[5] https://codeberg.org/ttlns/torcollector_relays_stats_test/src/branch/main/as...
[6] https://support.torproject.org/little-t-tor/getting-started/verifying/#bsd-l...
[7] https://www.torproject.org/download/tor/
[8] https://codeberg.org/ttlns/torcollector_relays_stats_test/src/branch/main/sr...
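The idea of verifying a release through several independent parties can be illustrated with a simple quorum check over build digests. This is purely a hypothetical sketch of the proposal in this message: no such mechanism is claimed to exist in Tor, the attester names are invented, and it only makes sense if builds are reproducible, so that independent builders obtain byte-identical artifacts.

```python
import hashlib


def sha256_hex(data):
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def release_is_attested(artifact, attested_digests, quorum):
    """Return True if at least `quorum` attesters published a digest that
    matches the digest of the artifact we actually downloaded.

    `attested_digests` maps a (hypothetical) attester name to the hex digest
    that attester obtained from its own independent build.
    """
    digest = sha256_hex(artifact)
    agreeing = sum(1 for d in attested_digests.values() if d == digest)
    return agreeing >= quorum


if __name__ == "__main__":
    artifact = b"stand-in for a release tarball"
    good = sha256_hex(artifact)
    # Invented attesters: two agree with our download, one reports something else.
    attestations = {"attester-a": good, "attester-b": good, "attester-c": "00" * 32}
    print(release_is_attested(artifact, attestations, quorum=2))
```

The point of the quorum is that a single compromised signer or build machine is not enough to get a tampered artifact accepted; how attesters would publish and authenticate their digests is left open here.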
participants (2)
- Silvia Puglisi [Hiro]
- ttlns