Mike Perry mikeperry@torproject.org wrote:
It's occurred to me that we have yet to provide any official recommendations with respect to best practices for operating Tor relays.
While you seem to be focused on improving the security, in my opinion a best practices document should also mention which practices will guarantee that the relay gets a bad exit flag if they become known.
Including parts of or linking to: https://trac.torproject.org/projects/tor/wiki/doc/badRelays probably wouldn't hurt.
I only became aware of the page recently myself. I was pleasantly surprised, as I previously was under the impression that running an intercepting HTTP cache on an exit relay "to save traffic" was still considered acceptable.
Attack Vector #2: Advanced Persistent Threat Key Theft
I'm confused by the use of the buzzword APT. In my experience it's commonly used to describe an imaginary and nearly omnipotent attacker who has a more or less unlimited budget and isn't limited to actually using the network to achieve its goal.
This definition is great if the main goal is spreading FUD to increase one's budget or getting votes, but it makes coming up with effective defences kind of hard.
This might explain why the definition is often used by government agencies, political parties and snake-oil vendors, but it doesn't explain why you would want to use it in a technical document.
I assume "your" APT is somehow less capable than "my" APT, but without knowing your definition I can't really tell if the proposed defenses are effective against it.
Adding your definition to the document would help, but personally I would prefer it if the term APT weren't used at all.
If one-time methods fail or are beyond reach, the adversary has to resort to persistent machine compromise to retain access to node key material.
The APT attacker can use the same vector as #1 or perhaps an external vector such as daemon compromise, but they then must also plant a backdoor that would do something like trawl through the RAM of a machine, sniff out the keys (perhaps even grabbing the ephemeral TLS keys directly), and transmit them offsite for collection.
This is a significantly more expensive position for the adversary to maintain, because it is possible to notice upon a thorough forensic investigation during a perhaps unrelated incident, and it may trigger firewall warnings or other common least privilege defense alarms inadvertently.
I think this attack would be a lot more expensive than motivating the right Debian developers to compromise a significant part of the interesting Tor relays the next time they get updated.
This attack would not only be harder to defend against, it also sounds cool if we call it the apt(8)-based APT attack.
"My" APT could do that, but I assume "yours" can't?
Unfortunately, it is also a more expensive attack to defend against, because it requires extensive auditing and assurance mechanisms on the part of the relay operator.
I wonder how many relay operators are even motivated to protect against a highly capable attacker. For example "my" APT is unlikely to be after me and a compromise of my (or any other) relays is unlikely to significantly affect me personally.
While I'm not intentionally using an insecure system configuration and I don't expect my relays to be less secure than the average relay, I certainly expect them to be less secure than the system I'm using to write this message.
Defenses
It seems clear that the above indicates that at minimum relays should protect against one-time key compromise. Some further thought shows that it is possible to make the APT adversary's task harder as well, albeit with significantly more effort.
That's not clear to me at all. Are you saying that a relay operator who doesn't want to follow the "minimum" best practices document (once it exists) shouldn't be running a relay (or at least be embarrassed)?
Once you start your tor process(es), you will want to copy your identity key offsite, and then remove it. Tor does not need it to remain on disk after startup, and removing it ensures that an attacker must deploy a kernel exploit to obtain it from memory. While you should not re-use the identity key after unexplained reboots, you may want to retain a copy for planned reboots and tor maintenance.
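For concreteness, the copy-then-remove step quoted above might be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `backup_dir` stands in for whatever offsite or removable location the operator mounts, and the key is assumed to live at `keys/secret_id_key` below the relay's DataDirectory. Note that a plain file removal does not securely erase the underlying disk blocks.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def stash_identity_key(key_path, backup_dir):
    """Copy key_path into backup_dir, verify the copy, then delete the original.

    Hypothetical helper, not part of tor. Returns the path of the backup.
    If the copy does not match, the original is left in place.
    """
    os.makedirs(backup_dir, mode=0o700, exist_ok=True)
    backup_path = os.path.join(backup_dir, os.path.basename(key_path))

    shutil.copy2(key_path, backup_path)
    os.chmod(backup_path, 0o600)  # readable by the owner only

    # Only remove the original once the backup is known to be identical.
    if sha256_of(key_path) != sha256_of(backup_path):
        raise RuntimeError("backup does not match original; key not removed")

    # Assumption: a plain unlink is acceptable here. It does NOT wipe
    # the old data from the disk.
    os.remove(key_path)
    return backup_path
```

It would be called after the tor process has started, for example as `stash_identity_key("/var/lib/tor/keys/secret_id_key", "/mnt/offsite")`, where both paths are examples and depend on the system.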
How often can a relay regenerate the identity key without becoming a burden to the network?
I reused the identity keys after unexplained reboots in the past as I assumed the cost of a new key (unknown to me) would be higher than the cost of a compromise (unknown) multiplied by the likelihood of the occurrence (also unknown to me, but estimated to be rather low compared to other possible reboot causes).
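The trade-off described above is a plain expected-cost comparison, which can be written down as a minimal sketch. The function names are hypothetical, and since all three inputs are unknown in practice, any numbers plugged in are guesses rather than measurements.

```python
def expected_compromise_cost(cost_of_compromise, p_compromise):
    """Cost of a compromise weighted by how likely it is to have happened."""
    return cost_of_compromise * p_compromise


def should_regenerate_key(cost_of_new_key, cost_of_compromise, p_compromise):
    """Return True if a new identity key is cheaper than the expected
    cost of continuing with a possibly compromised one."""
    return cost_of_new_key < expected_compromise_cost(
        cost_of_compromise, p_compromise
    )
```

With a low estimated likelihood of compromise, the comparison favours keeping the old key unless the cost of a compromise is far larger than the cost of a new key.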
In cases where a reboot is assumed to have been caused by a system compromise, I wouldn't consider merely regenerating the key without re-installing the whole system from known-good media "best practice" anyway.
Ok, that's it. What do people think? Personally, I think that if we can require a kernel exploit and/or weird memory gymnastics for key compromise, that would be a *huge* improvement. Do the above recommendations actually accomplish that?
Are "weird memory gymnastics" really that much more effort than getting the relevant keys through ptrace directly?
I suspect getting the keys through either mechanism might be trivial compared to getting the infrastructure in place to use the keys for a non-theoretical attack that is cost-effective.
I think your proposed measures might be useful for a relay operator with a compatible system who is interested in spending more time on his relay's security than he already is.
It's not clear to me, though, that they improve the security of the Tor network significantly enough to be worth requiring them or even calling them best practices (which could demotivate operators who can't or don't want to implement them).
Trying to require the steps or shaming operators into following them might reduce the number of relay operators (or limit their growth) significantly enough to make the attacks you seem to be concerned about cheaper ...
Having said that, I don't see anything wrong with putting your suggestions in a section that starts with a paragraph like:
| Here are a couple of things you could do to improve your
| relay's security some more. Whether or not you consider
| them worthwhile is up to you and if you decide against some
| or all of them or if they don't work on your system, your
| relay is still appreciated.
Even if the APT defenses end up not working out, I would sleep a lot better at night if most relays deployed only the defenses to one-time key theft... Thoughts on that?
I'm not too worried about the APT.
Fabian