[tor-dev] How long does it take for 25/50/75% of relays to update to a new Tor version?

Karsten Loesing karsten at torproject.org
Fri Sep 18 10:00:39 UTC 2015

Hello Sebastian and list,

Sebastian again suggested a fun task for the latest measurement team
1-1-1 task exchange round [0] that I picked up.  He asked the
following question:

> When a new Tor version is released, how long does it take for 25, 50,
> 75% of all relays to update? How often do relays update in general? Is
> there an effect when for example Debian updates its packages? It would
> help me to get an idea what I need to answer these questions :-)
> "update" means you're running, then you go down, then within a few
> minutes you're back up again with a changed version

Here's what I came up with:

I suggest we use two data sets as input for this task.  The first data
set consists of archived consensuses, available at:


We'll need "valid-after" and "server-versions" from the header and "v"
and "w" lines from all contained entries.  Example:

valid-after 2015-09-18 08:00:00
v Tor
w Bandwidth=450
v Tor
w Bandwidth=23

We can use those lines to *count* the number of running relays with a
given version string, and we can sum up the *fraction* of consensus
weights of relays running a given version string.  Both can be useful,
though I suspect the latter to be more meaningful.
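To make the counting concrete, here is a minimal Python sketch of the per-relay tallying described above.  The version strings and bandwidth values are invented for illustration; real input would come from parsing archived consensuses:

```python
from collections import defaultdict

# Hypothetical consensus excerpt: each relay contributes a "v" line with
# its version string and a "w" line with its consensus weight.
consensus_lines = [
    "v Tor 0.2.6.10",
    "w Bandwidth=450",
    "v Tor 0.2.7.2-alpha",
    "w Bandwidth=23",
    "v Tor 0.2.6.10",
    "w Bandwidth=100",
]

counts = defaultdict(int)   # number of running relays per version
weights = defaultdict(int)  # summed consensus weight per version
current_version = None
for line in consensus_lines:
    if line.startswith("v "):
        current_version = line[2:]
    elif line.startswith("w Bandwidth=") and current_version:
        counts[current_version] += 1
        weights[current_version] += int(line.split("=", 1)[1].split()[0])
        current_version = None

total_relays = sum(counts.values())
total_weight = sum(weights.values())
for version in sorted(counts):
    print(version,
          counts[version] / total_relays,
          weights[version] / total_weight)
```

The same loop structure works for both metrics; only the denominator differs (relay count vs. total consensus weight).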

We should aggregate these consensus parts into daily statistics to
make them easier to handle.  We can do that by keeping counters per
(date, version) tuple and per (date) for a) number of relays and b)
consensus weight, and then we divide the (date, version) numbers by
the (date) numbers to obtain averages.  We'd also determine whether a
version is recommended, whether it's the latest recommended version in
its series, and whether it's the latest recommended stable version.
The result would be a .csv file like the following (example data!):


Read this as: "On 2015-09-18, version, which was both
recommended and the latest in its series, was run by 38.7% of relays
by count or 45.1% by consensus weight." (again, example data!)
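A rough sketch of the daily aggregation, with made-up per-consensus numbers standing in for the parsed data; in practice each day would contribute up to 24 hourly consensuses:

```python
from collections import defaultdict

# Hypothetical per-consensus observations:
# (valid-after date, version, relay count, summed consensus weight).
observations = [
    ("2015-09-18", "0.2.6.10", 2500, 9000),
    ("2015-09-18", "0.2.7.2-alpha", 300, 1200),
    ("2015-09-18", "0.2.6.10", 2550, 9100),
    ("2015-09-18", "0.2.7.2-alpha", 310, 1250),
]

# Counters per (date, version) tuple and per date, for a) relay count
# and b) consensus weight, as described above.
count_by_dv = defaultdict(int)
weight_by_dv = defaultdict(int)
count_by_d = defaultdict(int)
weight_by_d = defaultdict(int)
for date, version, relays, weight in observations:
    count_by_dv[(date, version)] += relays
    weight_by_dv[(date, version)] += weight
    count_by_d[date] += relays
    weight_by_d[date] += weight

# Divide (date, version) numbers by (date) numbers to obtain averages,
# emitted as .csv rows.
for (date, version) in sorted(count_by_dv):
    frac_count = count_by_dv[(date, version)] / count_by_d[date]
    frac_weight = weight_by_dv[(date, version)] / weight_by_d[date]
    print(f"{date},{version},{frac_count:.3f},{frac_weight:.3f}")
```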

You could then draw graphs similar to
https://metrics.torproject.org/versions.html, but with a percent scale
on y.  Though I'm not sure how readable those graphs would be with a
few dozen lines in them.  Still, they might be useful to explore the
data set.

The second data set that would be useful here is events related to
versions.  The following events come to mind:

 - tagged as version in Git (use a command similar to [1]),

 - tagged as alpha, beta, rc, or stable (look at version string),

 - first/last recommended by directory authorities for relays (parse
from consensuses),

 - last recommended as newest version in a series or newest stable
version (also parse from consensuses), or

 - first released in Debian stable (unclear how to obtain those dates).
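The Git-tag events from the first bullet could be extracted by parsing the output of the command in [1].  A sketch with invented output lines (the dates and tag names shown are not real release data):

```python
import re

# Hypothetical output lines from
# `git log --tags --simplify-by-decoration --pretty="format:%ci %d"`
# run in a tor.git checkout.
git_log_lines = [
    "2015-07-12 10:00:00 +0200  (tag: tor-0.2.6.10)",
    "2015-07-27 12:00:00 +0200  (tag: tor-0.2.7.2-alpha)",
]

events = []
for line in git_log_lines:
    m = re.search(r"^(\d{4}-\d{2}-\d{2}).*tag: (tor-[\w.-]+)", line)
    if m:
        events.append((m.group(1), m.group(2), "tagged"))

for date, version, event in events:
    print(f"{date},{version},{event}")
```

The other event types (first/last recommended, Debian uploads) would append rows of the same (date, version, event) shape from their respective sources.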

A possible data format would be:


For exploratory purposes, you could add these as colored vertical
lines to the graphs above.  Of course, adding more elements to a
probably already overloaded graph doesn't exactly make it easier to
understand.  But if you limit the x axis to just a month or two, it
might be useful.

So, regarding your original question: "How long does it take for
25/50/75% of relays to update to a new Tor version?"  I'm not sure
whether that's really the best question to ask/answer here.  Some
versions will never be deployed on 75%, 50%, or even 25% of relays,
because a new version was released shortly after.  But that doesn't
mean that the previous version was bad.

This question also weights all versions the same, though in reality
some releases are more important than others.  If you get one data
point for each version and then aggregate them by taking the average,
what exactly does that tell you?

I think that question could be rephrased to: "How long does it take
for 25/50/75% of relays to update to a new Tor version *series*?"  And
I think that's something you could answer with the first data set above.
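As a sketch of how the rephrased question could be answered, assuming a per-day fraction series for one version series derived from the first data set (all dates and fractions invented):

```python
from datetime import date

# Hypothetical daily fractions of relays running a given version series.
series_fractions = [
    (date(2015, 9, 1), 0.05),
    (date(2015, 9, 5), 0.20),
    (date(2015, 9, 10), 0.30),
    (date(2015, 9, 20), 0.55),
    (date(2015, 10, 15), 0.80),
]

# Hypothetical reference point, e.g. the first consensus recommending
# the series.
release_date = date(2015, 9, 1)

for threshold in (0.25, 0.50, 0.75):
    crossed = next((d for d, f in series_fractions if f >= threshold), None)
    if crossed:
        print(f"{int(threshold * 100)}%: {(crossed - release_date).days} days")
    else:
        print(f"{int(threshold * 100)}%: never reached")
```

The choice of reference point (Git tag, first recommendation, Debian upload) changes the answer, so it should be stated alongside the result.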

But there are other interesting questions you could answer with the
two data sets.  For example,

 - What fraction of relays (by number or consensus weight) is running
(un-)recommended versions?

 - What fraction of relays (by number or consensus weight) is (not)
running the latest version in a series or latest stable version?

Those could again be answered by a graph that uses the first data set,
dates on x and fractions on y, and possibly events from the second
data set as vertical lines.  Explore, explore.
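For the recommended-versions question, a minimal sketch that combines the consensus's "server-versions" header with per-version relay counts from the first data set (all version strings and counts invented):

```python
# Hypothetical inputs: versions listed in the consensus's
# "server-versions" header, and relay counts per running version.
recommended = {"0.2.6.10", "0.2.7.2-alpha"}
relays_per_version = {
    "0.2.6.10": 2500,
    "0.2.7.2-alpha": 300,
    "0.2.5.12": 400,  # not in server-versions
}

total = sum(relays_per_version.values())
rec = sum(n for v, n in relays_per_version.items() if v in recommended)
print(f"recommended: {rec / total:.3f}, "
      f"unrecommended: {(total - rec) / total:.3f}")
```

Swapping the relay counts for consensus weights gives the weighted variant of the same statistic.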

There, the hour is over ("It took an hour to write, I thought it would
take an hour to read.", Fry, Futurama).  Hope this is useful in any
way.  And maybe others have good/better ideas for you as well?  If you
come up with some interesting answers, please post them here.

All the best,
Karsten

[0] 1-1-1 task exchange: you get 1 minute to describe a task that
would take somebody else roughly 1 hour and that they will do for you
within 1 week (review a document, write some analysis code, fix a
small bug, etc.; better come prepared to get the most out of this;
give 1, take 1)

[1] `git log --tags --simplify-by-decoration --pretty="format:%ci %d"
| grep "tag: tor" | less`