On 22 May (18:57:06), Mike Perry wrote:
Nick Mathewson:
Filename: 303-protover-removal-policy.txt
Title: When and how to remove support for protocol versions
Author: Nick Mathewson
Created: 21 May 2019
Status: Draft
Background
With proposal 264, we added support for "subprotocol versions": a means to declare which features are required for participation in the Tor network. We also created a mechanism (refined later in proposal 297) for telling Tor clients and relays that they cannot participate effectively in the Tor network and need to shut down.
In this document, we describe a policy according to which these decisions should be made in practice.
Recommending features (for clients and relays)
A subprotocol version SHOULD become recommended soon after all release series that did not provide it become unsupported (within a month or so).
For example, the current oldest LTS release series is 0.2.9; when it becomes unsupported in 2020, the oldest supported release series will be 0.3.5. Suppose that 0.2.9 supports a subprotocol Cupcake=1, and that all stable 0.3.5.x versions support Cupcake=1-3. Around one month after the end of 0.2.9 support, Cupcake=3 should become a _recommended_ protocol for clients and relays.
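To make the timing rule and the Cupcake example concrete, here is a minimal Python sketch. It assumes a simplified model in which each still-supported release series advertises one version range per subprotocol, and it picks the newest version that every remaining series supports, mirroring the Cupcake=3 example. The 0.4.0 entry and the EOL date are hypothetical, and the helper names are illustrative only, not part of tor.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by whole months; the day is clamped to 28 so the result is always valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def parse_range(spec: str) -> set:
    """Parse a version list such as "1-3" or "1,3-4" into a set of ints."""
    versions = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        versions.update(range(int(lo), int(hi or lo) + 1))
    return versions


def recommendable(supported_by_series: dict) -> set:
    """Versions that every still-supported release series provides."""
    sets = [parse_range(spec) for spec in supported_by_series.values()]
    return set.intersection(*sets)


# Hypothetical data: once 0.2.9 is unsupported, the remaining series
# all provide at least Cupcake=1-3.
still_supported = {"0.3.5": "1-3", "0.4.0": "1-4"}
eol_029 = date(2020, 1, 1)  # hypothetical EOL date, for illustration only

print(max(recommendable(still_supported)))  # 3
print(add_months(eol_029, 1))               # 2020-02-01
```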
Additionally, a feature can become _recommended_ because of security reasons. If we believe that it is a terrible idea to run an old protocol, we can make it _recommended_ for relays or clients or both. We should not do this lightly, since it will be annoying.
To be clear, the "_recommended_" and "_required_" terms here are the ones from Proposal #264, Section 4, right? I.e., the consensus lines?
These only affect WARN-vs-exit behavior by clients and relays that lack support, right? Clients and relays will still *negotiate* and use protocol versions that they both have, even if they are not listed as either recommended or required?
AFAIU, you can only negotiate what you know, which is the protover list you support (the one advertised by relays).
For instance, if "Cupcake=1-3" is what you support as a client but the recommended line is "Cupcake=2-3", you can still use "1", but you will be warned.
If _required_ is, let's say, "Cupcake=3" but the client only supports "Cupcake=1-2", then the client does _not_ join the network. If _required_ is "Cupcake=1-3" for both relays and clients, then yes, they can use version "1" instead of "3" if I'm not mistaken; otherwise, "Cupcake=3" should be used.
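The required/recommended behavior can be modeled with a short sketch. It assumes that a node must support every version listed on the _required_ line (otherwise it exits) and every version on the _recommended_ line (otherwise it warns), and that two peers pick the highest version they both support, as the link protocol VERSIONS handshake does. This is a simplified illustration, not tor's protover code; all function names are hypothetical.

```python
from __future__ import annotations


def parse_protocols(line: str) -> dict[str, set[int]]:
    """Parse a line such as "Cupcake=1-3 Link=4-5" into {name: {versions}}."""
    protocols: dict[str, set[int]] = {}
    for entry in line.split():
        name, _, spec = entry.partition("=")
        versions: set[int] = set()
        for part in spec.split(","):
            lo, _, hi = part.partition("-")
            versions.update(range(int(lo), int(hi or lo) + 1))
        protocols[name] = versions
    return protocols


def unsupported(supported: dict[str, set[int]], listed: dict[str, set[int]]) -> dict[str, set[int]]:
    """Versions listed in the consensus that the local node does not support."""
    gaps = {}
    for name, versions in listed.items():
        lacking = versions - supported.get(name, set())
        if lacking:
            gaps[name] = lacking
    return gaps


def consensus_action(supported_line: str, recommended_line: str, required_line: str) -> str:
    """Return "exit", "warn", or "ok" under the simplified model above."""
    supported = parse_protocols(supported_line)
    if unsupported(supported, parse_protocols(required_line)):
        return "exit"  # missing a required version: do not join the network
    if unsupported(supported, parse_protocols(recommended_line)):
        return "warn"  # missing a recommended version: keep running, but warn
    return "ok"


def negotiate(ours: set[int], theirs: set[int]) -> int | None:
    """Pick the highest version both sides support, or None if there is none."""
    common = ours & theirs
    return max(common) if common else None


print(consensus_action("Cupcake=1-2", "", "Cupcake=3"))  # exit
print(consensus_action("Cupcake=1-2", "Cupcake=3", ""))  # warn
print(negotiate({1, 2, 3}, {1, 2, 3}))                   # 3
```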
Are there cases where they don't/won't negotiate to use a new protover field, such as for anonymity fragmentation reasons? How do we handle those?
As an example, for prop289 (authenticated SENDMEs), we handle that with consensus parameters that flip the knobs all at once, to avoid partitioning problems as much as possible. _And_ then the protover is moved into the _recommended_ or _required_ field depending on where we are.
(I am trying to gauge the impact of this proposal on our ability to roll out new features that clients can use right away vs ensure that old clients and relays can still work. It seems to focus on the latter, and I want to get a handle on at what expense).
Requiring features (for relays)
We regularly update the directory authorities to require relays to run certain versions of Tor or later. We generally do this after a short outreach campaign to get as many relays as possible to upgrade.
We MAY make a feature required for relays one month after every version without it is obsolete and unsupported, though it is better to wait three months if possible.
We SHOULD make a feature required for relays within 12 months after every version without it is obsolete and unsupported.
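The relay-side timing rules reduce to simple date arithmetic. This sketch assumes we know the date on which the last release series lacking the feature became unsupported; that EOL date is hypothetical, and `add_months` clamps the day of the month to 28 for simplicity.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by whole months; the day is clamped to 28 so the result is always valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def relay_requirement_window(last_eol: date) -> dict:
    """Dates for making a feature required for relays, per the policy above."""
    return {
        "may": add_months(last_eol, 1),        # MAY require one month after EOL
        "preferred": add_months(last_eol, 3),  # better to wait three months
        "should": add_months(last_eol, 12),    # SHOULD require within 12 months
    }


# Hypothetical: the last series without the feature went EOL on 1 January 2020.
for label, when in relay_requirement_window(date(2020, 1, 1)).items():
    print(f"{label}: {when}")
```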
As a cultural signaling thing, I think it is better to say to relay operators, "keep your relay's operating system and its Tor up to date, or please don't run it anymore (aka we'll shut it down for you)."
I think it's bad culturally if we signal to people that we need relays so badly that it doesn't matter if they are unpatched, if the OS is unpatched, or if they accidentally publish their relay and SSH keys to a public GitHub repo. (A relay running on a system that hasn't received any patches or security updates in 12 months is the administrator-diligence equivalent of publishing admin keys to a public GitHub repo, IMO, if not its actual functional equivalent.)
Not only does it encourage a sloppy mindset about paying attention to relay systems, it also slows down our development of new protocols, and impedes major network upgrades.
I very much agree with this. We do "force-ask" the directory authorities to track the latest stable, at worst 2 stables behind. There are reasons for that: maintenance, but also security.
Relays have to be sharp about upgrading... A relay that isn't, and that we end up excluding from the consensus because its version is too dangerous (remember Heartbleed), can in my opinion be considered more of a liability than a useful piece.
In my opinion, having capacity is as important as having relays that are up to date. With every release we roll out very important features, and if they are not deployed network-wide, we don't get their benefit for years to come (basically until the previous LTS is EOL...).
And that sometimes forces us into backporting big blocks of code (the DoS subsystem is one example).
Even today, there are still 1000+ relays (on 0.2.9) that can't be used for Onion Service v3... That is 1/6th of the network, and we released relay support 2 years ago... And we have _specific_ code to avoid picking those relays, so all these edge cases also accumulate in the code over time.
(As an aside, I would like to take a hard look at the LTS series and brainstorm how much it would cost us to provide official, reproducibly built repos for every distribution whose LTS policies we find expensive and cumbersome to support. Or at least do some analysis of which changes have been, or will be, extremely expensive or impossible to roll out because they are blocked on maintaining the LTS.)
If we could convince Debian to consider an EOL version a "security issue" and thus agree to pull the next supported stable into their stable package... that would be grand, because then even Debian LTS relay operators could still benefit from getting newer versions, improving the network and thus the security of everyone on Tor.
I know, I know: there are challenges, and it is sometimes a bad idea, but with this proposal, it might be a good time to also take a hard look at how things are and change paradigms, even if it means a painful transition.
Requiring features (for clients)
Clients take the longest time to update, and are often the least able to fetch upgrades. Because of this, we should be very careful about making subprotocol versions required on clients, and should only do so for fairly compelling reasons.
Is this true? From our Tor Browser metrics (which could use some kind of totaling), it looks like most Tor Browser users upgrade pretty quickly: https://metrics.torproject.org/webstats-tb.html
What kinds of clients don't upgrade? I got the impression that it was mostly things like old botnet cruft that didn't.
My gut feeling is that relays actually take longer...
We SHOULD NOT make a feature required for clients until it has been _recommended_ for clients for at least 9 months.
We SHOULD make a feature required for clients if it has been _recommended_ for clients for at least 18 months.
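Similarly, the client rules define a window measured from the date a version became _recommended_ for clients. A minimal sketch, assuming a hypothetical recommendation date and the same simplified month arithmetic (day clamped to 28):

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by whole months; the day is clamped to 28 so the result is always valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def client_requirement_status(recommended_on: date, today: date) -> str:
    """Classify `today` against the 9-month and 18-month client thresholds."""
    if today < add_months(recommended_on, 9):
        return "too early"       # SHOULD NOT require yet
    if today < add_months(recommended_on, 18):
        return "allowed"         # may be required now
    return "should require"      # SHOULD already be required


recommended_on = date(2020, 2, 1)  # hypothetical recommendation date
for today in (date(2020, 6, 1), date(2021, 1, 1), date(2021, 9, 1)):
    print(today, client_requirement_status(recommended_on, today))
```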
I guess since we're talking about causing clients to exit() in both these cases, it might be OK to be conservative here...
Honestly, a client exit()ing is indeed a pain point, but we only get to that situation because it is no longer safe for the client to join the network. I find that less worrying than relays suddenly starting to exit() because we've pushed a required protover, and ending up with 3000 dead relays...
I would be for reducing those values much more. As an example, again with prop289 (authenticated SENDMEs), we are talking about a deployment plan that spans almost 5 years...
We can't publish the FlowCtrl=1 protover until 035 is EOL, which is in 3 years; once we have that, we have another 9 months to go for _recommended_ and then 18 months before we can force it into _required_.
This means 4.5 years of deployment for a _security_ feature that helps the network overall and defends against specific attacks... I think we can do much better, and we should.
And maybe that comes with relaxing our backport policy or rethinking our LTS? I'm not entirely sure...
Historically, we do see a quick transition when a version is EOL and the packages follow; see the drops here:
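The arithmetic behind these figures, as a quick sketch. It reads the 9 and 18 months as the proposal's minimum and maximum waits after _recommended_ (rather than two consecutive periods), and adds the 3 years until 0.3.5 is EOL plus the roughly one-month delay before recommending. The inputs are approximations taken from this thread.

```python
def months_until_required(months_until_eol: int, recommend_delay: int, wait_after_recommended: int) -> int:
    """Total months from now until a feature can be made required for clients."""
    return months_until_eol + recommend_delay + wait_after_recommended


MONTHS_UNTIL_035_EOL = 36  # "which is in 3 years"
RECOMMEND_DELAY = 1        # "within a month or so" after EOL

earliest = months_until_required(MONTHS_UNTIL_035_EOL, RECOMMEND_DELAY, 9)
latest = months_until_required(MONTHS_UNTIL_035_EOL, RECOMMEND_DELAY, 18)
print(f"earliest: {earliest} months ({earliest / 12:.1f} years)")  # earliest: 46 months (3.8 years)
print(f"latest: {latest} months ({latest / 12:.1f} years)")        # latest: 55 months (4.6 years)
```

The 18-month case lands between 4.5 and 5 years, in line with the estimates above.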
https://metrics.torproject.org/versions.html?start=2018-02-22&end=2019-0...
As long as the packages follow, the majority of relays usually upgrade within a matter of months. Which reinforces my point about Debian + packaging ;).
But again, I am really worried about future network scalability and performance upgrades getting stalled because we don't want to change things that fragment client anonymity.. Does that mean that for some kinds of new features, we can't flip a switch because we're trying to give clients another 1.5 years *past the EOL of the last LTS* to upgrade?
As more examples here: we were forced to backport the DoS subsystem down many versions, and that was some work!... While testing all the relay versions, I sometimes had to wait weeks before my relay could get the Guard flag again... The pain was real at that time.
When it comes to scaling tor, we'll start making pretty big, drastic changes to the protocol, or even just to how the tor binary operates. If every step takes _years_, we'll lose this "tor game" over time, in my opinion.
I know pastly has some results about how running different cell schedulers (KIST, KISTLite, Vanilla) across relays is actually hurting the network... and there is NO way to change that quickly until all our EOL versions die out and the network transitions...
I'm almost at the point of proposing "remote relay upgrades" like Tor Browser does... :P. I know, it's hard, but at least we would be extremely agile at going forward *but* also at rolling back anything that f***s up the network (it has happened before; we had to roll out parameters). Tor Browser did insane work there, so we could cherry-pick from that, IMO.
I would enjoy a session in Stockholm that walked through how we would use this proposal and proposal 264 to roll out a handful of involved changes, such as walking onions, onion service DoS protections, conflux, explicit congestion control signaling, full datagram Tor, etc.
+1
It would be awesome if such a session could result in a proposal like this one, but for the flip side: explaining how to use protovers to roll out involved features so that clients adopt them quickly and safely (and what sorts of changes can be done quickly, and what sorts of changes require waiting 4 years for the LTS to EOL + 1.5 more years for clients to update so as not to fragment anonymity).
Yes, something concrete: something that, after Stockholm, we can be happy with and apply. Not just a brainstorm, after which this whole thing dies off.
But yes, overall, I'm in favor of thinking in terms of reducing the waiting time for anything to be rolled out on the network, instead of waiting years for one single feature to be fully deployed. Some of that comes from us changing our policy a bit, but a lot also comes from our relay operators being good operators and upgrading to our stables much faster.
Cheers! David