[tor-dev] Proposal 303: When and how to remove support for protocol versions

David Goulet dgoulet at torproject.org
Thu May 23 18:49:22 UTC 2019

On 22 May (18:57:06), Mike Perry wrote:
> Nick Mathewson:
> > Filename: 303-protover-removal-policy.txt
> > Title: When and how to remove support for protocol versions
> > Author: Nick Mathewson
> > Created: 21 May 2019
> > Status: Draft
> > 
> > 1. Background
> > 
> >    With proposal 264, we added support for "subprotocol versions" -- a
> >    means to declare which features are required for participation in the
> >    Tor network.  We also created a mechanism (refined later in proposal
> >    297) for telling Tor clients and relays that they cannot participate
> >    effectively in the Tor network, and they need to shut down.
> > 
> >    In this document, we describe a policy according to which these
> >    decisions should be made in practice.
> > 
> > 2. Recommending features (for clients and relays)
> > 
> >    A subprotocol version SHOULD become recommended soon after all
> >    release series that did not provide it become unsupported (within a
> >    month or so).
> > 
> >    For example, the current oldest LTS release series is 0.2.9; when it
> >    becomes unsupported in 2020, the oldest supported release series will
> >    be 0.3.5.  Suppose that 0.2.9 supports a subprotocol Cupcake=1, and
> >    that all stable 0.3.5.x versions support Cupcake=1-3.  Around one
> >    month after the end of 0.2.9 support, Cupcake=3 should become a
> >    _recommended_ protocol for clients and relays.
> > 
> >    Additionally, a feature can become _recommended_ because of security
> >    reasons.  If we believe that it is a terrible idea to run an old
> >    protocol, we can make it _recommended_ for relays or clients or both.
> >    We should not do this lightly, since it will be annoying.
>
> To be clear, "_recommended_" and "_required_" terms here are from
> Proposal #264, Section 4, right? Aka the consensus lines?
> These only affect WARN-vs-exit behavior by clients and relays that lack
> support, right? Clients and relays will still *negotiate* and use
> protocol versions that they both have, even if they are not listed as
> either recommended or required?

Afaiu, you can only negotiate what you know, which is the protover list you
support (the one advertised by relays).

For instance, if "Cupcake=1-3" is what you support as a client but the
recommended is "Cupcake=2-3", you can still do "1" but you will be warned.

If _required_ is, let's say, "Cupcake=3" but the client is "Cupcake=1-2", then
the client does _not_ join the network. If _required_ is "Cupcake=1-3" for both
the relay and client, then yes, they can use version "1" instead of "3" if I'm
not mistaken; otherwise "Cupcake=3" should be used.
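To make the WARN-vs-exit distinction concrete, here is a minimal sketch in Python. It is not Tor's implementation. It assumes the reading from the quoted question: lacking a _recommended_ version only warns, lacking a _required_ one means not joining the network. The function names and the "Cupcake" protocol are hypothetical.

```python
def parse_versions(spec):
    """Parse a version list such as "1-3" or "1,3-4" into a set of ints."""
    versions = set()
    for part in spec.split(","):
        low, _, high = part.partition("-")
        versions.update(range(int(low), int(high or low) + 1))
    return versions


def parse_protovers(line):
    """Parse "Cupcake=1-3 Muffin=2" into {"Cupcake": {1, 2, 3}, "Muffin": {2}}."""
    result = {}
    for entry in line.split():
        name, _, spec = entry.partition("=")
        result[name] = parse_versions(spec)
    return result


def check_support(supported, recommended, required):
    """Return "exit", "warn" or "ok" (hypothetical helper, assumed semantics).

    "exit": at least one required version is not supported.
    "warn": everything required is supported, but a recommended version is not.
    """
    have = parse_protovers(supported)
    for level, line in (("exit", required), ("warn", recommended)):
        for name, versions in parse_protovers(line).items():
            if not versions <= have.get(name, set()):
                return level
    return "ok"
```

With this sketch, `check_support("Cupcake=1-2", "", "Cupcake=3")` gives "exit", which is the "does _not_ join the network" case above, while the same client with "Cupcake=3" only in the recommended list gets "warn".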

> Are there cases where they don't/won't negotiate to use a new protover
> field, such as for anonymity fragmentation reasons? How do we handle
> those?

As an example, for prop289 (authenticated SENDMEs), we handle that with
consensus parameters that flip knobs at once to avoid partitioning problems as
much as possible. _And_ then the protover is moved into the _recommended_ or
_required_ field depending on where we are.

> (I am trying to gauge the impact of this proposal on our ability to roll
> out new features that clients can use right away vs ensure that old
> clients and relays can still work. It seems to focus on the latter,
> and I want to get a handle on at what expense).
>
> > 3. Requiring features (for relays)
> > 
> >    We regularly update the directory authorities to require relays to
> >    run certain versions of Tor or later.  We generally do this after a
> >    short outreach campaign to get as many relays as possible to upgrade.
> > 
> >    We MAY make a feature required for relays one month after every
> >    version without it is obsolete and unsupported, though it is better
> >    to wait three months if possible.
> > 
> >    We SHOULD make a feature required for relays within 12 months after
> >    every version without it is obsolete and unsupported.
>
> As a cultural signaling thing, I think it is better to say to relay
> operators, "keep your relay's operating system and its Tor up to date,
> or please don't run it anymore (aka we'll shut it down for you)."
> I think its bad culturally if we signal to people that we need relays so
> badly that it doesn't matter if they are unpatched, or if the OS is
> unpatched, or if they accidentally publish their relay and ssh keys to a
> public github repo. (Relays running on a system that hasn't received any
> patches or security updates in 12 months is the administrator diligence
> equivalent of publishing admin keys to public github, IMO, if not its
> actual functional equivalent).
> Not only does it encourage a sloppy mindset about paying attention to
> relay systems, it also slows down our development of new protocols, and
> impedes major network upgrades.

I'm very much agreeing with this. We do "force-ask" the directory authorities
to follow the latest stable up to at worst 2 stable behind. There are reasons
for that: maintenance, but also security.

Relays have to be sharp at upgrading... A relay that is not, that we end up
excluding from the consensus because the version is too dangerous (remember
Heartbleed), can be considered in my opinion more a liability than a useful

Having capacity in my opinion is as important as having relays that are up to
date. Every release, we roll out very important features that, if not deployed
network-wide, give us no benefit for years to come (basically until the
previous LTS is EOL...).

And that forces us into a position of sometimes backporting big blocks of code
(the DoS subsystem is one example).

Even today, there are still 1000+ relays (on 0.2.9) that can't be used for
Onion Service v3... That is 1/6th of the network, and we released relay
support 2 years ago... And we have _specific_ code to avoid picking those
relays, so all these edge cases also accumulate in the code over time.
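As a rough illustration of that kind of special-casing, here is a small Python sketch. It is not Tor's path-selection code. The relay records, the field names and the choice of "HSDir=2" as the version standing in for v3 support are all assumptions made for the example.

```python
def supports(protovers, name, version):
    """True if a protover string like "HSDir=1-2 Link=1-5" lists name=version."""
    for entry in protovers.split():
        key, _, spec = entry.partition("=")
        if key != name:
            continue
        for part in spec.split(","):
            low, _, high = part.partition("-")
            if int(low) <= version <= int(high or low):
                return True
    return False


def usable_relays(relays, name, version):
    """Keep only the relays that advertise the given subprotocol version."""
    return [r for r in relays if supports(r["protovers"], name, version)]


# Hypothetical relay records: one old relay, one current relay.
relays = [
    {"nickname": "old", "protovers": "HSDir=1 Link=1-4"},
    {"nickname": "new", "protovers": "HSDir=1-2 Link=1-5"},
]
picked = [r["nickname"] for r in usable_relays(relays, "HSDir", 2)]
```

Every feature that old relays lack needs a filter of this shape somewhere, and each one stays in the code until those relays are gone from the network.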

> (As an aside, I would like to take a hard look at the LTS series, and
> brainstorm how much it would cost us to provide official, reproducibly
> built repos for every distribution whose LTS policies we find expensive
> and cumbersome to support.. Or at least do some analysis of which changes
> have been or will be extremely expensive or impossible to roll out due
> to being blocked on needing to maintain the LTS).

If we could convince Debian to consider an EOL version a "security issue" and
thus agree to pull the next supported stable into their stable
package...... that would be grand, because then even Debian LTS relay operators
could still benefit from getting newer versions, improving the network and
thus the security of all on Tor.

I know I know, challenges and sometimes a bad idea but with this proposal, it
might be a good time to also take a hard look at how things are and change
paradigm even if it means a painful transition.

> > 4. Requiring features (for clients)
> > 
> >    Clients take the longest time to update, and are often the least
> >    able to fetch upgrades. Because of this, we should be very careful
> >    about making subprotocol versions required on clients, and should
> >    only do so for fairly compelling reasons.
>
> Is this true? From our Tor Browser metrics (which could use some kind of
> totaling), it looks like most Tor Browser users upgrade pretty quickly:
> https://metrics.torproject.org/webstats-tb.html
> What kinds of clients don't upgrade? I got the impression that it was
> mostly things like old botnet cruft that didn't..

My gut feeling is that relays actually take longer...

> >    We SHOULD NOT make a feature required for clients until it has been
> >    _recommended_ for clients for at least 9 months.
> > 
> >    We SHOULD make a feature required for clients if it has been
> >    _recommended_ for clients for at least 18 months.
>
> I guess since we're talking about causing clients to exit() in both
> these cases, it might be OK to be conservative here...

Honestly, a client exit()ing is indeed a pain point, but we get to that
situation because it is not safe anymore for the client to join the network.
I find that less worrying than relays starting to exit() all of a sudden
because we've pushed a required protover and we end up with 3000 dead relays...

I would be for reducing those values much more. As an example, again with
prop289 (authenticated SENDMEs), we are talking about a deployment plan that
spans almost 5 years...

We can't publish the FlowCtrl=1 protover until 035 is EOL, which is in 3 years,
and then once we have that, we have another 9 months to go for _recommended_
and then 18 months before we can force it into _required_.

This means 4.5 years of deployment for a _security_ feature that overall
helps the network and mitigates specific attacks... I think we can do much
better and we should.
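A back-of-the-envelope version of that timeline, as a Python sketch. It takes one possible reading of the proposal, namely that the 9-month and 18-month windows in section 4 both count from the moment the protocol becomes _recommended_. All figures come from this thread and the quoted proposal text; none are official dates, and the constant names are made up for the example.

```python
# Figures taken from this thread; the names are illustrative only.
MONTHS_UNTIL_035_EOL = 36       # "035 is EOL which is in 3 years"
MONTHS_EOL_TO_RECOMMENDED = 1   # section 2: recommended "within a month or so"
MIN_MONTHS_RECOMMENDED = 9      # section 4: SHOULD NOT be required before this
MAX_MONTHS_RECOMMENDED = 18     # section 4: SHOULD be required by this


def months_until_required(months_recommended):
    """Months from today until the protover can be required for clients."""
    return (MONTHS_UNTIL_035_EOL
            + MONTHS_EOL_TO_RECOMMENDED
            + months_recommended)


earliest = months_until_required(MIN_MONTHS_RECOMMENDED)
latest = months_until_required(MAX_MONTHS_RECOMMENDED)
print(f"earliest: {earliest} months ({earliest / 12:.1f} years)")
print(f"latest:   {latest} months ({latest / 12:.1f} years)")
```

Under those assumptions the window is 46 to 55 months, so roughly 3.8 to 4.6 years, which is in the same range as the 4.5 years mentioned above.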

And maybe that comes with relaxing our backport policy or rethinking our LTS?
I'm not entirely sure...

Historically, we do have a quick transition when a version is EOL and the
packages follow, see the drops here:


As long as packages follow, usually the majority of relays do upgrade to them
in a matter of months. Which reinforces my point about Debian + packaging.

> But again, I am really worried about future network scalability and
> performance upgrades getting stalled because we don't want to change
> things that fragment client anonymity.. Does that mean that for some
> kinds of new features, we can't flip a switch because we're trying to
> give clients another 1.5 years *past the EOL of the last LTS* to
> upgrade?

As more examples here, we were forced to backport the DoS subsystem down many
versions, that was some work!... Testing all relay versions, I had to
sometimes wait weeks before my relay could get the Guard flag again... The
pain was real at that time.

If we talk in terms of tor scaling, we'll start doing pretty big drastic
changes to the protocol or even just how tor the binary operates. If every
step takes _years_, we'll fail this "tor game" in my opinion over time.

I know pastly has some results about how different cell scheduling (KIST,
KISTLite, Vanilla) between relays is actually badly affecting the network...
and there is NO way to change that quickly until all our EOL dies out and

I'm almost at the point of proposing "remote relay upgrades" like Tor Browser
does ... :P. I know, hard, but at least we would be extremely agile at going
forward *but* also at rolling back anything that f*** the network (and it
happened before, we had to roll out parameters). Tor Browser did insane work
there, so we could cherry-pick from that imo.

> I would enjoy a session in Stockholm that walked through how we would
> use this proposal and proposal 264 to roll out a handful of involved
> changes, such as walking onions, onion service DoS protections, conflux,
> explicit congestion control signaling, full datagram Tor, etc. 


> It would be awesome if such a session could result in a proposal like
> this one, but the flip side: explaining how to use protovers to roll out
> involved features so that clients adopt them quickly and safely (and
> what sorts of changes can be done quickly, and what sorts of changes
> require waiting 4 years for LTS to EOL + 1.5 more years for clients to
> update so as not to fragment anonymity).

Yes, something concrete, something that after Stockholm we can be happy with
and apply. Not just a brainstorm after which this whole thing dies off.

But yes, overall I'm in favor of thinking in terms of reducing the waiting
time for anything to be rolled out on the network, instead of waiting years for
one single feature to be fully deployed. Some of that comes from us changing
our policy a bit, but a lot also comes from our relay operators being good
operators and upgrading to our stables much faster.

