Hello list,
hope everyone is safe and doing well!
I present you an initial draft of a proposal on PoW-based defences for onion services under DoS.
The proposal is not finished yet and it needs tuning and fixing. There are many places marked with XXX and TODO around the proposal that should be addressed.
The important part is that looking at the numbers it does seem like this proposal can work as a concept and serve its intended purpose. The most handwavey parts of the proposal right now are [INTRO_QUEUE] and [POW_SECURITY] and if this thing fails in the end, it's probably gonna be something that slipped over there. Hence, we should polish these sections before we proceed with any sort of engineering here.
In any case, I decided to send it to the list even in premature form, so that it can serve as a stable point of reference in subsequent discussions. It can also be found in my git repo: https://github.com/asn-d6/torspec/tree/pow-over-intro
Cheers and stay safe!
---
Filename: xxx-pow-over-intro-v1
Title: A First Take at PoW Over Introduction Circuits
Author: George Kadianakis
Created: 2 April 2020
Status: Draft
0. Abstract
This proposal aims to thwart introduction flooding DoS attacks by introducing a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
1. Motivation
So far our attempts at limiting the impact of introduction flooding DoS attacks on onion services have been focused on horizontal scaling with Onionbalance, optimizing the CPU usage of Tor and applying congestion control using rate limiting. While these measures move the goalpost forward, a core problem with onion service DoS is that building rendezvous circuits is a costly procedure both for the service and for the network. If we ever hope to have truly reachable global onion services, we need to make it harder for attackers to overload the service with introduction requests.
This proposal achieves this by allowing onion services to specify an optional dynamic proof-of-work scheme that their clients need to participate in if they want to get served.
With the right parameters, this proof-of-work scheme acts as a gatekeeper to block amplification attacks by attackers while letting legitimate clients through.
1.1. Threat model [THREAT_MODEL]
1.1.1. Attacker profiles [ATTACKER_MODEL]
This proposal is written to thwart specific attackers. A simple PoW proposal cannot defend against each and every DoS attack on the Internet, but there are adversary models we can defend against.
Let's start with some adversary profiles:
"The script-kiddie"
The script-kiddie has a single computer and pushes it to its limits. Perhaps they also have a VPS and a pwned server. We are talking about an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We consider the total cost for this attacker to be $0.
"The small botnet"
The small botnet is a bunch of computers lined up to do an introduction flooding attack. Assuming 500 medium-range computers, we are talking about an attacker with total access to 10 THz of CPU and 10 TB of RAM. We consider the upfront cost for this attacker to be about $400.
"The large botnet"
The large botnet is a serious operation with many thousands of computers organized to do this attack. Assuming 100k medium-range computers, we are talking about an attacker with total access to 200 THz of CPU and 200 TB of RAM. The upfront cost for this attacker is about $36k.
We hope that this proposal can help us defend against the script-kiddie attacker and small botnets. To defend against a large botnet we would need more tools at our disposal (see [FUTURE_WORK]).
{XXX: Do the above make sense? What other attackers do we care about? What other metrics do we care about? Network speed? I got the botnet costs from [REF_BOTNET]. Back up our claims of defence.}
1.1.2. User profiles [USER_MODEL]
We have attackers and we have users. Here are a few user profiles:
"The standard web user"
This is a standard laptop/desktop user who is trying to browse the web. They don't know how these defences work and they don't care to configure or tweak them. They are gonna use the default values and if the site doesn't load, they are gonna close their browser and be sad at Tor. They run a 2 GHz computer with 4 GB of RAM.
"The motivated user"
This is a user that really wants to reach their destination. They don't care about the journey; they just want to get there. They know what's going on; they are willing to tweak the default values and make their computer do expensive multi-minute PoW computations to get where they want to be.
"The mobile user"
This is a motivated user on a mobile phone. Even tho they want to read the news article, they don't have much leeway on stressing their machine to do more computation.
We hope that this proposal will allow the motivated user to always connect where they want to connect to, and also give more chances to the other user groups to reach the destination.
1.1.3. The DoS Catch-22 [CATCH22]
This proposal is not perfect and it does not cover all the use cases. Still, we think that by covering some use cases and giving reachability to the people who really need it, we will severely demotivate the attackers from continuing the DoS attacks and hence stop the DoS threat altogether. Furthermore, by increasing the cost to launch a DoS attack, a big class of DoS attackers will disappear from the map, since the expected ROI will decrease.
2. System Overview
2.1. Tor protocol overview
                                          +----------------------------------+
                                          |                                  |
   +-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
   |Client |-------->|Intro Point|------->|  PoW   |-----------+             |
   +-------+         +-----------+        |Verifier|           |             |
                                          +--------+           |             |
                                          |                    |             |
                                          |                    |             |
                                          |         +----------v---------+   |
                                          |         |Intro Priority Queue|   |
                                          +---------+--------------------+---+
                                                           |  |  |
                                                Rendezvous |  |  |
                                                  circuits |  |  |
                                                           v  v  v
The proof-of-work scheme specified in this proposal takes place during the introduction phase of the onion service protocol. It's an optional mechanism that only occurs if the service requires it. It can be enabled and disabled either through its torrc or through the control port.
In summary, the following steps are taken for the protocol to complete:
  1) Service encodes PoW parameters in descriptor [DESC_POW]
  2) Client fetches descriptor and computes PoW [CLIENT_POW]
  3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
  4) Service verifies PoW and queues introduction based on PoW effort [SERVICE_VERIFY]
2.2. Proof-of-work overview
2.2.1. Primitives
For our proof-of-work scheme we want to minimize the spread of resources between a motivated attacker and legitimate clients. This means that we are looking to minimize any benefits that GPUs or ASICs can offer to an attacker.
For this reason we chose argon2 [REF_ARGON2] as the hash function for our proof-of-work scheme since it's well audited and GPU-resistant and to some extent ASIC-resistant as well.
As a password hash function, argon2 by default outputs 32 bytes of hash, and takes as primary input a message and a nonce/salt. For the purposes of this specification we will define an argon2() function as: 'uint8_t hash_output[32] = argon2(uint8_t *message, uint8_t *nonce)'.
See section [ARGON_PARAMS] for more information on the secondary inputs of argon2.
2.2.2. Dynamic PoW
DoS is a dynamic problem where the attacker's capabilities constantly change, and hence we want our proof-of-work system to be dynamic and not stuck with a static difficulty setting. Hence, instead of forcing clients to go below a static target like in Bitcoin to be successful, we ask clients to "bid" using their PoW effort. Effectively, a client gets higher priority the higher effort they put into their proof-of-work. This is similar to how proof-of-stake works but instead of staking coins, you stake work.
The benefit here is that legitimate clients who really care about getting access can put a large amount of effort into their PoW computation, which should guarantee access to the service given reasonable adversary models. See [POW_SECURITY] for more details about these guarantees and tradeoffs.
3. Protocol specification
3.1. Service encodes PoW parameters in descriptor [DESC_POW]
This whole protocol starts with the service encoding the PoW parameters in the 'encrypted' (inner) part of the v3 descriptor, as follows:
"pow-params" SP type SP seed-b64 SP expiration-time NL
[At most once]
type: The type of PoW system used. We call the one specified here "v1"
seed-b64: A random seed that should be used as the input to the PoW hash function. Should be 32 random bytes encoded in base64 without trailing padding.
expiration-time: A timestamp after which the above seed expires and is no longer valid as the input for PoW. It's needed so that the size of our replay cache does not grow infinitely. It should be set to an hour in the future (+- some randomness). {TODO: PARAM_TUNING}
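To make the line format above concrete, here is a rough sketch of how a service could emit a "pow-params" line and how a client could parse it. The proposal does not pin down the encoding of expiration-time, so the integer Unix timestamp used here is an assumption, and the function names are made up for illustration.

```python
import base64
from typing import Tuple


def encode_pow_params(seed: bytes, expiration_time: int) -> str:
    """Build a "pow-params" descriptor line for the "v1" scheme."""
    if len(seed) != 32:
        raise ValueError("seed must be 32 bytes")
    # Base64 without trailing '=' padding, as the field description asks.
    seed_b64 = base64.b64encode(seed).decode("ascii").rstrip("=")
    # ASSUMPTION: expiration-time is written as an integer Unix timestamp.
    return f"pow-params v1 {seed_b64} {expiration_time}\n"


def parse_pow_params(line: str) -> Tuple[str, bytes, int]:
    """Split a "pow-params" line into (type, seed, expiration_time)."""
    keyword, pow_type, seed_b64, expiration = line.rstrip("\n").split(" ")
    if keyword != "pow-params":
        raise ValueError("not a pow-params line")
    # Restore the padding that was stripped before decoding.
    seed = base64.b64decode(seed_b64 + "=" * (-len(seed_b64) % 4))
    if len(seed) != 32:
        raise ValueError("seed must decode to 32 bytes")
    return pow_type, seed, int(expiration)
```

A 32-byte seed always encodes to 43 base64 characters once the single padding character is dropped, which is why the parser can restore the padding from the length alone.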
{XXX: Expiration time makes us even more susceptible to clock skews, but it's needed so that our replay cache refreshes. How to fix this? See [CLIENT_BEHAVIOR] for more details.}
3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
If a client receives a descriptor with "pow-params", it should assume that the service is expecting a PoW input as part of the introduction protocol.
In such cases, the client should have been configured with a specific PoW 'target' (which is a 32-byte integer similar to the 'target' of Bitcoin [REF_TARGET]). See [POW_SECURITY] for more information on how such a target should be set. For the purposes of this section, we will assume that the target has been set automatically by Tor, or the user configured it manually.
Now the client parses the descriptor and extracts the PoW parameters. It makes sure that the expiration-time has not expired and if it has, it needs to fetch a new descriptor.
To complete the PoW the client follows the following logic:
  a) Client generates 'nonce' as 32 random bytes.
  b) Client derives 'seed' by decoding 'seed-b64'.
  c) Client computes hash_output = argon2(seed, nonce)
  d) Client interprets hash_output as a 32-byte big-endian integer.
  e) Client checks if int(hash_output) <= target.
    e1) If yes, success! The client uses 'hash_output' as the hash and 'nonce' and 'seed' as its inputs.
    e2) If no, fail! The client interprets 'nonce' as a big-endian integer, increments it by one, and goes back to step (c).
At the end of the above procedure, the client should have a triplet (hash_output, seed, nonce) that can be used as the answer to the PoW puzzle. How quickly this happens depends solely on the 'target' parameter.
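The client loop above can be sketched roughly as follows. To keep the example runnable with only the standard library, pow_hash() uses SHA-256 as a stand-in for argon2(seed, nonce); SHA-256 is not memory-hard and is not what this proposal specifies. The max_tries cap and the wrap-around of the nonce on overflow are also assumptions made for the sketch.

```python
import hashlib
import os
from typing import Optional, Tuple


def pow_hash(seed: bytes, nonce: bytes) -> bytes:
    # STAND-IN for argon2(seed, nonce): SHA-256 is NOT memory-hard.
    return hashlib.sha256(seed + nonce).digest()


def solve_pow(seed: bytes, target: int,
              max_tries: int = 1_000_000) -> Optional[Tuple[bytes, bytes, bytes]]:
    """Return (hash_output, seed, nonce), or None if max_tries is exhausted."""
    # Step (a): start from 32 random bytes, kept as a big-endian integer.
    nonce_int = int.from_bytes(os.urandom(32), "big")
    for _ in range(max_tries):
        nonce = nonce_int.to_bytes(32, "big")
        hash_output = pow_hash(seed, nonce)                # step (c)
        if int.from_bytes(hash_output, "big") <= target:   # steps (d) and (e)
            return hash_output, seed, nonce                # step (e1)
        # Step (e2): increment the nonce; wrapping at 2^256 is an assumption.
        nonce_int = (nonce_int + 1) % (1 << 256)
    return None
```

If the hash output behaves like a uniform 256-bit integer, each attempt succeeds with probability (target + 1) / 2^256, so the expected number of attempts is about 2^256 / (target + 1). For example, a target of 2^252 - 1 needs about 16 attempts on average.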
3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
Now that the client has an answer to the puzzle it's time to encode it into an INTRODUCE1 cell. To do so the client adds an extension to the encrypted portion of the INTRODUCE1 cell by using the EXTENSIONS field (see [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the INTRODUCE1 cell only gets read by the onion service and is ignored by the introduction point.
We propose a new EXT_FIELD_TYPE value:
[01] -- PROOF_OF_WORK
The EXT_FIELD content format is:
   POW_VERSION    [1 byte]
   POW_SEED       [32 bytes]
   POW_NONCE      [32 bytes]
   POW_OUTPUT     [32 bytes]
where:
   POW_VERSION is 1 for the protocol specified in this proposal
   POW_SEED is 'seed' from the section above
   POW_NONCE is 'nonce' from the section above
   POW_OUTPUT is 'hash_output' from the section above
{XXX: do we need POW_VERSION? Perhaps we can use EXT_FIELD_TYPE as version}
{XXX: do we need to encode the SEED? Perhaps we can omit it since the service already knows it. But what happens in cases of desynch, if client has diff seed from service?}
{XXX: Do we need to include the output? Probably not. The service has to compute it anyway during verification. What's the use?}
This will increase the INTRODUCE1 payload size by 99 bytes: the extension type and length are 2 extra bytes and the EXT_FIELD is 97 bytes, while the N_EXTENSIONS field is always present (currently set to 0) and so adds nothing new. According to ticket #33650, INTRODUCE1 cells currently have more than 200 bytes available.
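As a sanity check on the sizes above, here is a small sketch that serializes the extension and confirms the 99-byte figure. It assumes the usual one-byte type and one-byte length header for extension fields; the function name is hypothetical.

```python
import struct

EXT_FIELD_TYPE_POW = 0x01   # PROOF_OF_WORK, as proposed above
POW_VERSION = 1


def encode_pow_extension(seed: bytes, nonce: bytes, hash_output: bytes) -> bytes:
    """Serialize the PROOF_OF_WORK extension (type, length, then EXT_FIELD)."""
    for name, value in (("seed", seed), ("nonce", nonce), ("hash_output", hash_output)):
        if len(value) != 32:
            raise ValueError(f"{name} must be 32 bytes")
    # EXT_FIELD: POW_VERSION | POW_SEED | POW_NONCE | POW_OUTPUT = 97 bytes.
    ext_field = struct.pack("!B32s32s32s", POW_VERSION, seed, nonce, hash_output)
    # ASSUMPTION: one byte of type and one byte of length precede the field.
    return struct.pack("!BB", EXT_FIELD_TYPE_POW, len(ext_field)) + ext_field
```

The result is 2 + 97 = 99 bytes long, which matches the payload increase computed above.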
3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it should check its configuration on whether proof-of-work is required to complete the introduction. If it's not required, the extension SHOULD be ignored. If it is required, the service follows the procedure detailed in this section.
3.4.1. PoW verification
To verify the client's proof-of-work the service extracts (hash_output, seed, nonce) from the INTRODUCE1 cell and MUST do the following steps:
  1) Make sure that the client's seed is identical to the active seed.
  2) Check the client's nonce for replays (see [REPLAY_PROTECTION] section).
  3) Verify that 'hash_output =?= argon2(seed, nonce)'.
If any of these steps fails, the service MUST ignore this introduction request and abort the protocol.
If all the steps passed, then the circuit is added to the introduction queue as detailed in section [INTRO_QUEUE].
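The three verification steps could look roughly like this on the service side. As before, pow_hash() is a SHA-256 stand-in for argon2 so that the sketch runs with the standard library only, and the plain Python set used as a replay cache is just the simplest option for illustration.

```python
import hashlib
from typing import Set, Tuple


def pow_hash(seed: bytes, nonce: bytes) -> bytes:
    # STAND-IN for argon2(seed, nonce): SHA-256 is NOT memory-hard.
    return hashlib.sha256(seed + nonce).digest()


def verify_pow(hash_output: bytes, seed: bytes, nonce: bytes,
               active_seed: bytes,
               replay_cache: Set[Tuple[bytes, bytes]]) -> bool:
    """Return True if the introduction may be added to the queue."""
    # Step 1: the client's seed must match the active seed.
    if seed != active_seed:
        return False
    # Step 2: reject (seed, nonce) tuples that were already accepted.
    if (seed, nonce) in replay_cache:
        return False
    # Step 3: recompute the hash and compare it with the client's claim.
    if hash_output != pow_hash(seed, nonce):
        return False
    # Record the tuple only once the hash has been verified.
    replay_cache.add((seed, nonce))
    return True
```

Recording the tuple only after step 3 is a choice made in this sketch, so that requests with bogus hashes cannot fill up the replay cache; the proposal itself does not say when the entry should be added.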
3.4.1.1. Replay protection [REPLAY_PROTECTION]
The service MUST NOT accept introduction requests with the same (seed, nonce) tuple. For this reason a replay protection mechanism must be employed.
The simplest way is to use a simple hash table to check whether a (seed, nonce) tuple has been used before for the active duration of a seed. Depending on how long a seed stays active this might be a viable solution with reasonable memory/time overhead.
If there is a worry that we might get too many introductions during the lifetime of a seed, we can use a Bloom filter as our replay cache mechanism. The probabilistic nature of Bloom filters means that sometimes we will flag some connections as replays even if they are not, with this false positive probability increasing as the number of entries increases. However, with the right parameter tuning this probability should be negligible and well handled by clients. {TODO: PARAM_TUNING}
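For illustration, a Bloom-filter replay cache could be sketched as below. The filter size, the number of hash functions and the use of SHA-256 with double hashing to derive bit positions are all assumptions picked for the example, not tuned values.

```python
import hashlib
from typing import List


class BloomReplayCache:
    """Probabilistic replay cache: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, seed: bytes, nonce: bytes) -> List[int]:
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(seed + nonce).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def check_and_add(self, seed: bytes, nonce: bytes) -> bool:
        """Return True if (seed, nonce) looks like a replay, then record it."""
        positions = self._positions(seed, nonce)
        seen = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return seen
```

With m bits, k hash functions and n recorded entries, the false positive probability is approximately (1 - e^(-k*n/m))^k, which is the quantity the parameter tuning would need to keep negligible over the lifetime of a seed. The filter would be reset whenever the seed rotates.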
3.4.2. The Introduction Queue [INTRO_QUEUE]
3.4.2.1. Adding introductions to the introduction queue
When PoW is enabled and a verified introduction comes through, the service, instead of jumping straight into rendezvous, queues it and prioritizes it based on how much effort was devoted by the client to PoW. This means that introduction requests with high effort should be prioritized over those with low effort.
To do so, the service maintains an "introduction priority queue" data structure. Each element in that priority queue is an introduction request, and its priority is the effort put into its PoW:
When a verified introduction comes through, the service interprets the PoW hash as a 32-byte big-endian integer 'hash_int' and based on that integer it inserts it into the right position of the priority_queue: The smallest 'hash_int' goes forward in the queue. If two elements have the same value, the older one has priority over the newer one. {XXX: Is this operation with 32-bytes integers expensive? How to make cheaper?}
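The ordering rule above (smallest 'hash_int' first, older request first on ties) maps naturally onto a binary heap. Here is a minimal sketch using Python's heapq; the class name is hypothetical and it ignores the open questions about maximum queue size and trimming.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class IntroQueue:
    """Introduction priority queue ordered by PoW effort, then by arrival."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def push(self, hash_output: bytes, request: Any) -> None:
        # Interpret the PoW hash as a 32-byte big-endian integer.
        hash_int = int.from_bytes(hash_output, "big")
        # Smaller hash_int means more effort; the arrival counter makes the
        # older of two equal-effort requests come out first.
        heapq.heappush(self._heap, (hash_int, next(self._arrival), request))

    def pop(self) -> Any:
        """Remove and return the highest-priority introduction request."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Insertion and removal are both O(log n) in the number of queued requests, and comparing two 256-bit integers is cheap next to an argon2 call, so the ordering itself is unlikely to be the bottleneck.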
{TODO: PARAM_TUNING: If the priority queue is only ordered based on the effort what attacks can happen in various scenarios? Do we want to order on time+effort? Which scenarios and attackers should we examine here?}
{TODO: PARAM_TUNING: What's the max size of the queue? How do we trim it? Can we use WRED usefully?}
3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
The service should handle introductions by pulling from the introduction queue.
Similar to how our cell scheduler works, the onion service subsystem will poll the priority queue every 100ms tick and process the first 20 cells from the priority queue (if they exist). The service will perform the rendezvous and the rest of the onion service protocol as normal.
With this tempo, we can process 200 introduction cells per second. {XXX: Is this good?}
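A strawman version of this tick handler is sketched below. It assumes the queue is a heap of (hash_int, arrival, request) tuples and that handle_intro is whatever callback launches the rendezvous; both names are made up for the example.

```python
import heapq
from typing import Any, Callable, List, Tuple

TICK_MS = 100          # poll interval from the text above
INTROS_PER_TICK = 20   # introductions handled per tick


def process_tick(heap: List[Tuple[int, int, Any]],
                 handle_intro: Callable[[Any], None],
                 batch_size: int = INTROS_PER_TICK) -> int:
    """Pop up to batch_size highest-priority requests and handle them."""
    handled = 0
    while heap and handled < batch_size:
        _, _, request = heapq.heappop(heap)
        handle_intro(request)  # rendezvous and the rest of the protocol
        handled += 1
    return handled


# Upper bound on throughput implied by these two constants.
MAX_INTROS_PER_SECOND = INTROS_PER_TICK * 1000 // TICK_MS
```

With the values above MAX_INTROS_PER_SECOND comes out to 200, matching the figure in the text; anything left in the heap simply waits for the next tick.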
{TODO: PARAM_TUNING: STRAWMAN: This needs hella tuning. Processing 20 cells per 100ms is probably unsustainable, since each cell is quite expensive: handling it involves path selection, crypto and making circuits. We will need to profile this procedure and see how we can do this scheduling better.}
{XXX: This might be a nice place to promote multithreading. Queues and pools are nice objects to do multithreading since you can have multiple threads pull from the queue, or leave stuff on the queue. Not sure if this should be in the proposal tho.}
4. Attacker strategies [ATTACK_META]
Now that we defined our protocol we need to start tweaking the various knobs. But before we can do that, we first need to understand a few high-level attacker strategies to see what we are fighting against.
4.1. Total overwhelm strategy
Given the way the introduction queue works (see [HANDLE_QUEUE]), a very effective strategy for the attacker is to totally overwhelm the queue processing by sending more high-effort introductions than the onion service can handle at any given tick.
To do so, the attacker would have to send at least 20 high-effort introduction cells every 100ms, where high-effort is a PoW which is above the estimated level of "the motivated user" (see [USER_MODEL]).
An easier attack for the adversary is the same strategy but with introduction cells that are all above the comfortable level of "the standard web user" (see [USER_MODEL]). This would block out all standard users and only allow motivated users to pass.
{XXX: What other attack strategies should we care about?}
5. Parameter tuning [POW_SECURITY]
There are various parameters in this system that need to be tuned.
We will first start by tuning the default difficulty of our PoW system. That's gonna define an expected time for attackers and clients to succeed.
We are then gonna tune the parameters of the argon2 hash function. That will define the resources that an attacker needs to spend to overwhelm the onion service, the resources that the service needs to spend to verify introduction requests, and the resources that legitimate clients need to spend to get to the onion service.
5.1. PoW Difficulty settings
The difficulty setting of our PoW basically dictates how difficult it should be to get a success in our PoW system. In classic PoW systems, "success" is defined as getting a hash output below the "target". However, since our system is dynamic, we define "success" as an abstract high-effort computation.
Even tho our system is dynamic, we still need default difficulty settings that will define the metagame. The client and attacker can still aim higher or lower, but for UX purposes and for analysis purposes we do need to define some difficulties.
We hence created the table (see [REF_TABLE]) below which shows how much time a legitimate client with a single machine should expect to burn before they get a single success. The x-axis is how many successes we want the attacker to be able to do per second: the more successes we allow the adversary, the more they can overwhelm our introduction queue. The y-axis is how many machines the adversary has at her disposal, ranging from just 5 to 1000.
              ===============================================================
              |   Expected Time (in seconds) Per Success For One Machine    |
=============================================================================
|                                                                           |
| Attacker Successes        1       5       10      20      30      50      |
|     per second                                                            |
|                                                                           |
|           5               5       1       0       0       0       0       |
|           50              50      10      5       2       1       1       |
|           100             100     20      10      5       3       2       |
| Attacker  200             200     40      20      10      6       4       |
|  Boxes    300             300     60      30      15      10      6       |
|           400             400     80      40      20      13      8       |
|           500             500     100     50      25      16      10      |
|           1000            1000    200     100     50      33      20      |
|                                                                           |
=============================================================================
Here is how you can read the table above:
- If an adversary has a botnet with 1000 boxes, and we want to limit her to 1 success per second, then a legitimate client with a single box should be expected to spend 1000 seconds getting a single success.
- If an adversary has a botnet with 1000 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 200 seconds getting a single success.
- If an adversary has a botnet with 500 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 100 seconds getting a single success.
- If an adversary has access to 50 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 10 seconds getting a single success.
- If an adversary has access to 5 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 1 second getting a single success.
With the above table we can create some profiles for default values of our PoW difficulty. So for example, we can use the last case as the default parameter for Tor Browser, and then create three more profiles for more expensive cases, scaling up to the first case, which would be the hardest since the client is expected to spend 1000 seconds (about 17 minutes) on a single introduction.
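The arithmetic behind the table can be reproduced with a few lines. This is not the script from [REF_TABLE], just a sketch that yields the same numbers under the assumption that the client's machine is as fast as one attacker box: if N boxes together may produce at most S successes per second, each box produces S/N successes per second, so one box needs N/S seconds per success (rounded down here, as in the table).

```python
from typing import Dict, List

ATTACKER_BOXES = [5, 50, 100, 200, 300, 400, 500, 1000]
SUCCESSES_PER_SECOND = [1, 5, 10, 20, 30, 50]


def expected_seconds_per_success(attacker_boxes: int, successes_per_second: int) -> int:
    """Seconds one machine needs per success, rounded down to whole seconds."""
    return attacker_boxes // successes_per_second


def build_table() -> Dict[int, List[int]]:
    """Map each attacker size to its row of expected times."""
    return {
        boxes: [expected_seconds_per_success(boxes, rate) for rate in SUCCESSES_PER_SECOND]
        for boxes in ATTACKER_BOXES
    }
```

The zeros in the first row are a rounding artifact: with 5 boxes and 10 allowed successes per second the unrounded value is 0.5 seconds.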
{TODO: PARAM_TUNING You can see that this section is completely CPU/memory agnostic, and it does not take into account potential optimizations that can come from GPU/ASICs. This is intentional so that we don't put more variables into this equation right now, but as this proposal moves forward we will need to put more concrete values here.}
5.2. Argon2 parameters [ARGON_PARAMS]
We now need to define the secondary argon2 parameters as defined in [REF_ARGON2]. This includes the number of lanes 'h', the memory size 'm', the number of iterations 't'. Section 9 of [REF_ARGON2] recommends an approach of how to tune these parameters.
To tune these parameters we are looking to *minimize* the verification time of an onion service, while *maximizing* the scarce resources spent by an adversary trying to overwhelm the service using [ATTACK_META].
When it comes to verification speed, to verify a single introduction cell the service needs to do a single argon2 call: so the service will need to do hundreds of those per second as INTRODUCE2 cells arrive. The service will have to do this verification step even for very cheap zero-effort PoW received, so this has to be a cheap procedure so that it doesn't become a DoS vector of its own. Hence each individual argon2 call must be cheap enough to be done comfortably and plentifully by an onion service with a single host (or horizontally scaled with Onionbalance).
At the same time, the adversary will have to do thousands of these calls if she wants to make high-effort PoW, so it's this asymmetry that we are looking to exploit here. Right now, the most expensive resource for adversaries is the RAM size, and that's why we chose argon2 which is memory-hard.
To minmax this game we will need
{TODO: PARAM_TUNING: I've had a hard time minmaxing this game for argon2. Even argon2 invocations with a small memory parameter will take multiple milliseconds to run on my machine, and the parameters recommended in section 8 of the paper all take many hundreds of milliseconds. This is just not practical for our use case, since we want to process hundreds of such PoW per second... I also did not manage to find a benchmark of argon2 calls for different CPU/GPU/FPGA configurations.}
5. Client behavior [CLIENT_BEHAVIOR]
This proposal introduces a bunch of new ways where a legitimate client can fail to reach the onion service.
Furthermore, there is currently no end-to-end way for the onion service to inform the client that the introduction failed. The INTRO_ACK cell is not end-to-end (it's from the introduction point to the client) and hence it does not allow the service to inform the client that the rendezvous is never gonna occur.
Let's examine a few such cases:
5.1. Timeout issues
Alice can fail to reach the onion service if her introduction request falls off the priority queue, or if the priority queue is so big that the connection times out.
Is building a new introduction circuit sufficient here? Or do we need to build an end-to-end mechanism over the introduction circuit to inform her? {XXX}
How should timeout values change here since the priority queue will cause bigger delays than usual to rendezvous? Can there be some feedback mechanism to inform the client of its queue position or ETA?
5.2. Seed expiration issues
As mentioned in [DESC_POW], the expiration timestamp on the PoW seed can cause issues with clock-skewed clients. Furthermore, even clients without clock skew can encounter TOCTOU-style race conditions here.
How should this be handled? Should we have multiple active seeds at the same time similar to how we have overlapping descriptors and time periods in v3? This would solve the problem but it grows the complexity of the system substantially. {XXX}
5.3. Other descriptor issues
Another race condition here is if the service enables PoW, while a client has a cached descriptor. How will the client notice that PoW is needed? Does it need to fetch a new descriptor? Should there be another feedback mechanism? {XXX}
5. Discussion
5.1. UX
This proposal has user facing UX consequences. Here are a few UX approaches with increasing engineering difficulty:
a) Tor Browser needs a "range field" which the user can use to specify how much effort they want to spend in PoW if this ever occurs while they are browsing. The ranges could be from "Easy" to "Difficult", or we could try to estimate time using an average computer. This setting is in the Tor Browser settings and users need to find it.
b) We start with a default effort setting, and then we use the new onion errors (see #19251) to estimate when an onion service connection has failed because of DoS, and only then we present the user a "range field" which they can set dynamically. Detecting when an onion service connection has failed because of DoS can be hard because of the lack of feedback (see [CLIENT_BEHAVIOR]).
c) We start with a default effort setting, and if things fail we automatically try to figure out an effort setting that will work for the user by doing some trial-and-error connections with different effort values. Until the connection succeeds we present a "Service is overwhelmed, please wait" message to the user.
For this proposal to work initially we need at least (a), and then we can start thinking of how far we want to take it.
5.2. Future directions [FUTURE_WORK]
This is just the beginning in DoS defences for Tor and there are various future avenues that we can investigate. Here is a brief summary of these:
"More advanced PoW schemes" -- We could use more advanced memory-hard PoW schemes like MTP-argon2 or Itsuku to make it even harder for adversaries to create successful PoWs. Unfortunately these schemes have much bigger proof sizes, and they won't fit in INTRODUCE1 cells. See #31223 for more details.
"Third-party anonymous credentials" -- We can use anonymous credentials and a third-party token issuance server on the clearnet to issue tokens based on PoW or CAPTCHA and then use those tokens to get access to the service. See [REF_CREDS] for more details.
"PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas where we present a hard puzzle to the user when connecting to the onion service, and if they solve it we then give the user a bunch of anonymous tokens that can be used in the future. This can all happen between the client and the service without a need for a third party.
All of the above approaches are much more complicated than this proposal, and hence we want to start easy before we get into more serious projects.
5.3. Environment
We love the environment! We are concerned about how PoW schemes can waste energy by doing useless hash iterations. Here are a few reasons we still decided to pursue a PoW approach here:
"We are not making things worse" -- DoS attacks are already happening and attackers are already burning energy to carry them out on the attacker side, on the service side and on the network side. We think that asking legitimate clients to carry out PoW computations is not gonna affect the equation too much, since an attacker right now can very quickly cause the same damage that hundreds of legitimate clients do in a whole day.
"We hope to make things better" -- The hope is that proposals like this will make the DoS actors go away and hence the PoW system will not be used. As long as DoS is happening there will be a waste of energy, but if we manage to demotivate them with technical means, the network as a whole will be less wasteful. Also see [CATCH22] for a similar argument.
6. References
[REF_ARGON2]: https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf
              https://password-hashing.net/#argon2
[REF_TABLE]: The table is based on the script below plus some manual editing for readability:
             https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
[REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01...
[REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
[REF_TARGET]: https://en.bitcoin.it/wiki/Target
On 02 Apr (18:54:59), George Kadianakis wrote:
Hello list,
hope everyone is safe and doing well!
I present you an initial draft of a proposal on PoW-based defences for onion services under DoS.
The proposal is not finished yet and it needs tuning and fixing. There are many places marked with XXX and TODO around the proposal that should be addressed.
The important part is that looking at the numbers it does seem like this proposal can work as a concept and serve its intended purpose. The most handwavey parts of the proposal right now are [INTRO_QUEUE] and [POW_SECURITY] and if this thing fails in the end, it's probably gonna be something that slipped over there. Hence, we should polish these sections before we proceed with any sort of engineering here.
In any case, I decided to send it to the list even in premature form, so that it can serve as a stable point of reference in subsequent discussions. It can also be found in my git repo: https://github.com/asn-d6/torspec/tree/pow-over-intro
Cheers and stay safe!
Filename: xxx-pow-over-intro-v1 Title: A First Take at PoW Over Introduction Circuits Author: George Kadianakis Created: 2 April 2020 Status: Draft
- Abstract
This proposal aims to thwart introduction flooding DoS attacks by introducing a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
- Motivation
So far our attempts at limiting the impact of introduction flooding DoS attacks on onion services has been focused on horizontal scaling with Onionbalance, optimizing the CPU usage of Tor and applying congestion control using rate limiting. While these measures move the goalpost forward, a core problem with onion service DoS is that building rendezvous circuits is a costly procedure both for the service and for the network. If we ever hope to have truly reachable global onion services, we need to make it harder for attackers to overload the service with introduction requests.
This proposal achieves this by allowing onion services to specify an optional dynamic proof-of-work scheme that its clients need to participate in if they want to get served.
With the right parameters, this proof-of-work scheme acts as a gatekeeper to block amplification attacks by attackers while letting legitimate clients through.
1.1. Threat model [THREAT_MODEL]
1.1.1. Attacker profiles [ATTACKER_MODEL]
This proposal is written to thwart specific attackers. A simple PoW proposal cannot defend against all and every DoS attack on the Internet, but there are adverary models we can defend against.
Let's start with some adversary profiles:
"The script-kiddie"
The script-kiddie has a single computer and pushes it to its limits. Perhaps it also has a VPS and a pwned server. We are talking about an attacker with total access to 10 Ghz of CPU and 10 GBs of RAM. We consider the total cost for this attacker to be zero $.
"The small botnet"
The small botnet is a bunch of computers lined up to do an introduction flooding attack. Assuming 500 medium-range computers, we are talking about an attacker with total access to 10 Thz of CPU and 10 TB of RAM. We consider the upfront cost for this attacker to be about $400.
"The large botnet"
The large botnet is a serious operation with many thousands of computers organized to do this attack. Assuming 100k medium-range computers, we are talking about an attacker with total access to 200 THz of CPU and 200 TB of RAM. The upfront cost for this attacker is about $36k.
We hope that this proposal can help us defend against the script-kiddie attacker and small botnets. To defend against a large botnet we would need more tools at our disposal (see [FUTURE_WORK]).
{XXX: Do the above make sense? What other attackers do we care about? What other metrics do we care about? Network speed? I got the botnet costs from here [REF_BOTNET] Back up our claims of defence.}
1.1.2. User profiles [USER_MODEL]
We have attackers and we have users. Here are a few user profiles:
"The standard web user"
This is a standard laptop/desktop user who is trying to browse the web. They don't know how these defences work and they don't care to configure or tweak them. They are gonna use the default values and if the site doesn't load, they are gonna close their browser and be sad at Tor. They run a 2 GHz computer with 4 GB of RAM.
"The motivated user"
This is a user that really wants to reach their destination. They don't care about the journey; they just want to get there. They know what's going on; they are willing to tweak the default values and make their computer do expensive multi-minute PoW computations to get where they want to be.
"The mobile user"
This is a motivated user on a mobile phone. Even tho they want to read the news article, they don't have much leeway on stressing their machine to do more computation.
We hope that this proposal will allow the motivated user to always connect where they want to connect to, and also give more chances to the other user groups to reach the destination.
1.1.3. The DoS Catch-22 [CATCH22]
This proposal is not perfect and it does not cover all the use cases. Still, we think that by covering some use cases and giving reachability to the people who really need it, we will severely demotivate the attackers from continuing the DoS attacks and hence stop the DoS threat altogether. Furthermore, by increasing the cost to launch a DoS attack, a big class of DoS attackers will disappear from the map, since the expected ROI will decrease.
2. System Overview
2.1. Tor protocol overview
                                       +----------------------------------+
                                       |                                  |
+-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
|Client |-------->|Intro Point|------->|  PoW   |-----------+             |
+-------+         +-----------+        |Verifier|           |             |
                                       +--------+           |             |
                                       |                    |             |
                                       |                    |             |
                                       |         +----------v---------+   |
                                       |         |Intro Priority Queue|   |
                                       +---------+--------------------+---+
                                                         |  |  |
                                              Rendezvous |  |  |
                                                circuits |  |  |
                                                         v  v  v
The proof-of-work scheme specified in this proposal takes place during the introduction phase of the onion service protocol. It's an optional mechanism that only occurs if the service requires it. It can be enabled and disabled either through its torrc or through the control port.
In summary, the following steps are taken for the protocol to complete:
- Service encodes PoW parameters in descriptor [DESC_POW]
- Client fetches descriptor and computes PoW [CLIENT_POW]
- Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
- Service verifies PoW and queues introduction based on PoW effort [SERVICE_VERIFY]
2.2. Proof-of-work overview
2.2.1. Primitives
For our proof-of-work scheme we want to minimize the spread of resources between a motivated attacker and legitimate clients. This means that we are looking to minimize any benefits that GPUs or ASICs can offer to an attacker.
For this reason we chose argon2 [REF_ARGON2] as the hash function for our proof-of-work scheme since it's well audited and GPU-resistant and to some extent ASIC-resistant as well.
As a password hash function, argon2 by default outputs 32 bytes of hash, and takes as primary input a message and a nonce/salt. For the purposes of this specification we will define an argon2() function as: 'uint8_t hash_output[32] = argon2(uint8_t *message, uint8_t *nonce)'.
See section [ARGON_PARAMS] for more information on the secondary inputs of argon2.
2.2.2. Dynamic PoW
DoS is a dynamic problem where the attacker's capabilities constantly change, and hence we want our proof-of-work system to be dynamic and not stuck with a static difficulty setting. Hence, instead of forcing clients to go below a static target like in Bitcoin to be successful, we ask clients to "bid" using their PoW effort. Effectively, a client gets higher priority the higher effort they put into their proof-of-work. This is similar to how proof-of-stake works but instead of staking coins, you stake work.
So this means that desktop users will be prioritized over mobile users basically unless I make my phone use X% of battery?
The benefit here is that legitimate clients who really care about getting access can spend a big amount of effort into their PoW computation, which should guarantee access to the service given reasonable adversary models. See [POW_SECURITY] for more details about these guarantees and tradeoffs.
3. Protocol specification
3.1. Service encodes PoW parameters in descriptor [DESC_POW]
This whole protocol starts with the service encoding the PoW parameters in the 'encrypted' (inner) part of the v3 descriptor. As follows:
"pow-params" SP type SP seed-b64 SP expiration-time NL [At most once] type: The type of PoW system used. We call the one specified here "v1" seed-b64: A random seed that should be used as the input to the PoW hash function. Should be 32 random bytes encoded in base64 without trailing padding. expiration-time: A timestamp after which the above seed expires and is no longer valid as the input for PoW. It's needed so that the size of our replay cache does not grow infinitely. It should be set to an hour in the future (+- some randomness). {TODO: PARAM_TUNING}
Format is?
{XXX: Expiration time makes us even more susceptible to clock skews, but it's needed so that our replay cache refreshes. How to fix this? See [CLIENT_BEHAVIOR] for more details.}
Would probably allow some room like +/- 1 or 2 hours ... something like that unless this would fill our replay cache?
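As a rough illustration of the "pow-params" line described in [DESC_POW], here is a minimal parsing sketch in Python. The function name parse_pow_params is hypothetical. Since the format of expiration-time is still an open question, the sketch keeps it as an opaque string instead of guessing a timestamp format.

```python
import base64


def parse_pow_params(line: str):
    """Parse one '"pow-params" SP type SP seed-b64 SP expiration-time NL' line.

    Returns (type, seed_bytes, expiration_time_string). The expiration time is
    left unparsed because the proposal does not fix its format yet.
    """
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4 or parts[0] != "pow-params":
        raise ValueError("not a well-formed pow-params line")
    _, pow_type, seed_b64, expiration_time = parts
    if pow_type != "v1":
        raise ValueError("unknown PoW type: " + pow_type)
    # The seed is base64 without trailing padding, so restore the padding
    # before handing it to the decoder.
    padded = seed_b64 + "=" * (-len(seed_b64) % 4)
    seed = base64.b64decode(padded, validate=True)
    if len(seed) != 32:
        raise ValueError("seed must decode to 32 bytes")
    return pow_type, seed, expiration_time
```

A client would call this on the inner descriptor line and fall back to its current behavior if the line is absent.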
3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
If a client receives a descriptor with "pow-params", it should assume that the service is expecting a PoW input as part of the introduction protocol.
What happens with clients _without_ PoW support? They basically won't be able to connect I suppose? Or be put in the prio queue at the service at the very end with work done = 0 ?
In such cases, the client should have been configured with a specific PoW 'target' (which is a 32-byte integer similar to the 'target' of Bitcoin [REF_TARGET]). See [POW_SECURITY] for more information on how such a target should be set. For the purposes of this section, we will assume that the target has been set automatically by Tor, or the user configured it manually.
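To make the link between the 32-byte 'target' and client effort concrete, here is a small sketch. It assumes that hash outputs are uniformly distributed over the 2^256 possible values, so that a single attempt succeeds with probability (target + 1) / 2^256. The helper names are hypothetical and not part of the proposal.

```python
HASH_SPACE = 1 << 256  # number of possible 32-byte hash outputs


def success_probability(target: int) -> float:
    """Chance that one hash attempt lands at or below the target."""
    return (target + 1) / HASH_SPACE


def expected_attempts(target: int) -> float:
    """Average number of hash attempts needed for one success."""
    return HASH_SPACE / (target + 1)


def target_for_expected_attempts(attempts: int) -> int:
    """Pick a target so that a success takes about `attempts` hashes on average."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return HASH_SPACE // attempts - 1
```

Multiplying expected_attempts() by the measured time of one hash call gives a rough expected time per success for a given machine.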
Now the client parses the descriptor and extracts the PoW parameters. It makes sure that the expiration-time has not expired and if it has, it needs to fetch a new descriptor.
To complete the PoW the client follows the following logic:
a) Client generates 'nonce' as 32 random bytes.
b) Client derives 'seed' by decoding 'seed-b64'.
c) Client computes hash_output = argon2(seed, nonce)
d) Client interprets hash_output as a 32-byte big-endian integer.
e) Client checks if int(hash_output) <= target.
   e1) If yes, success! The client uses 'hash_output' as the hash and 'nonce' and 'seed' as its inputs.
   e2) If no, fail! The client interprets 'nonce' as a big-endian integer, increments it by one, and goes back to step (c).
At the end of the above procedure, the client should have a triplet (hash_output, seed, nonce) that can be used as the answer to the PoW puzzle. How quickly this happens depends solely on the 'target' parameter.
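The search loop in steps (a) to (e2) can be sketched as follows. This is only an illustration: argon2 is not in the Python standard library, so pow_hash() below uses SHA-256 as a stand-in that is NOT memory-hard, and a real client would plug in argon2 with the parameters from [ARGON_PARAMS]. Wrapping the nonce around at 2^256 is also an assumption, since the text does not say what happens on overflow.

```python
import hashlib
import os


def pow_hash(seed: bytes, nonce: bytes) -> bytes:
    """Stand-in for argon2(seed, nonce). SHA-256 is NOT memory-hard; it only
    keeps this sketch runnable with the standard library."""
    return hashlib.sha256(seed + nonce).digest()


def solve_pow(seed: bytes, target: int, hash_fn=pow_hash, start_nonce=None):
    """Search for a nonce whose hash, read as a big-endian integer, is <= target.

    Returns the triplet (hash_output, seed, nonce).
    """
    # (a) start from 32 random bytes unless a starting nonce is supplied
    if start_nonce is None:
        start_nonce = os.urandom(32)
    nonce_int = int.from_bytes(start_nonce, "big")
    while True:
        nonce = nonce_int.to_bytes(32, "big")
        hash_output = hash_fn(seed, nonce)                 # (c)
        if int.from_bytes(hash_output, "big") <= target:   # (d) and (e)
            return hash_output, seed, nonce                # (e1)
        # (e2) increment the nonce; the wrap-around is an assumption
        nonce_int = (nonce_int + 1) % (1 << 256)
```

With a target of 2^248 the loop needs about 256 attempts on average, which is enough to try the sketch out quickly.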
3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
Now that the client has an answer to the puzzle it's time to encode it into an INTRODUCE1 cell. To do so the client adds an extension to the encrypted portion of the INTRODUCE1 cell by using the EXTENSIONS field (see [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the INTRODUCE1 cell only gets read by the onion service and is ignored by the introduction point.
We propose a new EXT_FIELD_TYPE value:
[01] -- PROOF_OF_WORK
The EXT_FIELD content format is:
POW_VERSION [1 byte]
POW_SEED    [32 bytes]
POW_NONCE   [32 bytes]
POW_OUTPUT  [32 bytes]

where:

POW_VERSION is 1 for the protocol specified in this proposal
POW_SEED is 'seed' from the section above
POW_NONCE is 'nonce' from the section above
POW_OUTPUT is 'hash_output' from the section above
{XXX: do we need POW_VERSION? Perhaps we can use EXT_FIELD_TYPE as version}
I would still keep it for the cost of 1 byte. Reason is that I think EXT_FIELD_TYPE should denote a "type of extension" and in this case anything related to PoW is 0x01. Then what comes next, depends on the POW_VERSION.
{XXX: do we need to encode the SEED? Perhaps we can omit it since the service already knows it. But what happens in cases of desynch, if client has diff seed from service?}
Service has no way of notifying back the client that the PoW validation failed... so the service should just use the seed it has meaning not needed?
{XXX: Do we need to include the output? Probably not. The service has to compute it anyway during verification. What's the use?}
Same reason I would say. The only way I could see both POW_SEED and POW_OUTPUT being "useful" is if they let the service avoid doing validation by just comparing these params?
This will increase the INTRODUCE1 payload size by 99 bytes since the extension type and length is 2 extra bytes, the N_EXTENSIONS field is always present and currently set to 0 and the EXT_FIELD is 97 bytes. According to ticket #33650, INTRODUCE1 cells currently have more than 200 bytes available.
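The size arithmetic above can be checked with a short encoding sketch. It assumes, as the paragraph states, that the extension header is one type byte plus one length byte. The constant and function names are hypothetical.

```python
POW_VERSION = 1            # version byte for the protocol in this proposal
EXT_FIELD_TYPE_POW = 0x01  # proposed PROOF_OF_WORK extension type


def encode_pow_ext_field(seed: bytes, nonce: bytes, hash_output: bytes) -> bytes:
    """Build the EXT_FIELD body: POW_VERSION | POW_SEED | POW_NONCE | POW_OUTPUT."""
    for name, value in (("seed", seed), ("nonce", nonce), ("hash_output", hash_output)):
        if len(value) != 32:
            raise ValueError(name + " must be exactly 32 bytes")
    return bytes([POW_VERSION]) + seed + nonce + hash_output


def encode_pow_extension(seed: bytes, nonce: bytes, hash_output: bytes) -> bytes:
    """Prefix the body with the 1-byte extension type and 1-byte length."""
    body = encode_pow_ext_field(seed, nonce, hash_output)
    return bytes([EXT_FIELD_TYPE_POW, len(body)]) + body
```

The body comes out at 1 + 32 + 32 + 32 = 97 bytes and the full extension at 99 bytes, matching the numbers above.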
3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it should check its configuration on whether proof-of-work is required to complete the introduction. If it's not required, the extension SHOULD be ignored. If it is required, the service follows the procedure detailed in this section.
3.4.1. PoW verification
To verify the client's proof-of-work the service extracts (hash_output, seed, nonce) from the INTRODUCE1 cell and MUST do the following steps:
- Make sure that the client's seed is identical to the active seed.
- Check the client's nonce for replays (see [REPLAY_PROTECTION] section).
- Verify that 'hash_output =?= argon2(seed, nonce)'
So wait, the service also has to do the PoW for each client by computing the Argon2 hash for each cell? Or am I mis-understanding?
If any of these steps fail the service MUST ignore this introduction request and abort the protocol.
If all the steps passed, then the circuit is added to the introduction queue as detailed in section [INTRO_QUEUE].
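The three verification steps can be sketched like this. As before, pow_hash() is a SHA-256 stand-in for argon2 so that the example runs with the standard library only. The replay cache is a plain Python set here, and recording the (seed, nonce) pair only after a successful verification is a design choice of this sketch, not something the proposal specifies.

```python
import hashlib
import hmac


def pow_hash(seed: bytes, nonce: bytes) -> bytes:
    """Stand-in for argon2(seed, nonce); NOT memory-hard."""
    return hashlib.sha256(seed + nonce).digest()


def verify_pow(hash_output: bytes, seed: bytes, nonce: bytes,
               active_seed: bytes, replay_cache: set, hash_fn=pow_hash) -> bool:
    """Return True only if all three checks from the list above pass."""
    # 1. The client's seed must be identical to the active seed.
    if seed != active_seed:
        return False
    # 2. The (seed, nonce) tuple must not have been seen before.
    if (seed, nonce) in replay_cache:
        return False
    # 3. Recompute the hash once and compare it with the client's claim.
    if not hmac.compare_digest(hash_fn(seed, nonce), hash_output):
        return False
    replay_cache.add((seed, nonce))
    return True
```

Note that the service runs the hash function exactly once per request, while the client had to run it many times to find the nonce.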
3.4.1.1. Replay protection [REPLAY_PROTECTION]
The service MUST NOT accept introduction requests with the same (seed, nonce) tuple. For this reason a replay protection mechanism must be employed.
The simplest way is to use a simple hash table to check whether a (seed, nonce) tuple has been used before for the active duration of a seed. Depending on how long a seed stays active this might be a viable solution with reasonable memory/time overhead.
If there is a worry that we might get too many introductions during the lifetime of a seed, we can use a Bloom filter as our replay cache mechanism. The probabilistic nature of Bloom filters means that sometimes we will flag some connections as replays even if they are not; with this false positive probability increasing as the number of entries increases. However, with the right parameter tuning this probability should be negligible and well handled by clients. {TODO: PARAM_TUNING}
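For illustration, a Bloom filter replay cache could look roughly like the sketch below. The class name, the default sizes (num_bits, num_hashes) and the use of salted BLAKE2b to derive bit positions are all assumptions made for the example; none of them are tuned values from the proposal.

```python
import hashlib


class BloomReplayCache:
    """Probabilistic replay cache keyed on (seed, nonce) tuples."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, seed: bytes, nonce: bytes):
        # Derive num_hashes bit positions by salting BLAKE2b with the index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(seed + nonce, digest_size=8,
                                     salt=i.to_bytes(2, "big")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def check_and_add(self, seed: bytes, nonce: bytes) -> bool:
        """Return True if the tuple was (probably) seen before, then record it."""
        positions = list(self._positions(seed, nonce))
        seen = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return seen
```

A fresh filter would be created whenever the seed rotates, which is what keeps the false positive rate bounded over time.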
3.4.2. The Introduction Queue [INTRO_QUEUE]
3.4.2.1. Adding introductions to the introduction queue
When PoW is enabled and a verified introduction comes through, the service instead of jumping straight into rendezvous, queues it and prioritizes it based on how much effort was devoted by the client to PoW. This means that introduction requests with high effort should be prioritized over those with low effort.
To do so, the service maintains an "introduction priority queue" data structure. Each element in that priority queue is an introduction request, and its priority is the effort put into its PoW:
When a verified introduction comes through, the service interprets the PoW hash as a 32-byte big-endian integer 'hash_int' and based on that integer it inserts it into the right position of the priority_queue: The smallest 'hash_int' goes forward in the queue. If two elements have the same value, the older one has priority over the newer one. {XXX: Is this operation with 32-bytes integers expensive? How to make cheaper?}
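A minimal sketch of this ordering, using Python's heapq with an arrival counter as the tie-breaker so that older requests win among equal hash values. The class name IntroPriorityQueue is hypothetical, and the sketch leaves the queue unbounded since the maximum size is still an open question.

```python
import heapq
import itertools


class IntroPriorityQueue:
    """Smallest hash_int first; among equal values, the older request first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def push(self, hash_output: bytes, request) -> None:
        hash_int = int.from_bytes(hash_output, "big")
        heapq.heappush(self._heap, (hash_int, next(self._arrival), request))

    def pop(self):
        _hash_int, _order, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)
```

Python compares arbitrary-size integers natively, so the 32-byte comparison is not a concern in this sketch; a C implementation could compare the raw big-endian byte strings with memcmp() instead.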
{TODO: PARAM_TUNING: If the priority queue is only ordered based on the effort what attacks can happen in various scenarios? Do we want to order on time+effort? Which scenarios and attackers should we examine here?}
{TODO: PARAM_TUNING: What's the max size of the queue? How do we trim it? Can we use WRED usefully?}
I think you'll be bound by the amount of data a connection inbuf can take which has an upper bound of 32 cells each read event.
Then tor will have to empty at once the inbuf, queue all INTRODUCE2 cells (at most 32) in that priority queue and once done, we would process it until we return to handling the connection inbuf.
In other words, the queue size, with tor's architecture, is bound to the number of cells upper bound you can get when doing a recv() pass which is 32 cells.
Nevertheless, that limit is weirdly hardcoded in tor so you should definitely think of a way to upper bound the queue and just drop the rest. A good starting point would be that 32 cells number?
3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
The service should handle introductions by pulling from the introduction queue.
Similar to how our cell scheduler works, the onion service subsystem will poll the priority queue every 100ms tick and process the first 20 cells from the priority queue (if they exist). The service will perform the rendezvous and the rest of the onion service protocol as normal.
With this tempo, we can process 200 introduction cells per second.
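The tick-based handling and its throughput can be sketched as below. The queue is represented as a plain heapq list of (hash_int, arrival_order, request) tuples, and the names drain_tick, TICK_MS and CELLS_PER_TICK are hypothetical. The numbers are the strawman values from the text, not tuned parameters.

```python
import heapq

TICK_MS = 100        # strawman polling interval from the text
CELLS_PER_TICK = 20  # strawman batch size from the text


def drain_tick(heap: list, handle, limit: int = CELLS_PER_TICK) -> int:
    """Process up to `limit` highest-priority requests; return how many ran."""
    handled = 0
    while heap and handled < limit:
        _hash_int, _order, request = heapq.heappop(heap)
        handle(request)  # in tor this would start the rendezvous
        handled += 1
    return handled


def cells_per_second(tick_ms: int = TICK_MS, per_tick: int = CELLS_PER_TICK) -> int:
    """Upper bound on processed introductions per second at this tempo."""
    return per_tick * 1000 // tick_ms
```

With the strawman values, cells_per_second() gives 20 * (1000 / 100) = 200.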
As I described above, I think we might want to do something like that for simplicity at first which is "empty inbuf by priority queuing all INTRODUCE2" and once done, process them.
Thus, it won't be like the cell scheduler that accumulates until a certain tick (10msec) and then processes it all.
{XXX: Is this good?}
{TODO: PARAM_TUNING: STRAWMAN: This needs hella tuning. Processing 20 cells per 100ms is probably unsustainable, since each cell is quite expensive: doing so involves path selection, crypto and making circuits. We will need to profile this procedure and see how we can do this scheduling better.}
With the above, we should be within the same performance as we have right now since we are just deferring the processing of INTRODUCE2 cells until after the inbuf is emptied.
{XXX: This might be a nice place to promote multithreading. Queues and pools are nice objects to do multithreading since you can have multiple threads pull from the queue, or leave stuff on the queue. Not sure if this should be in the proposal tho.}
I would _love_ to but could be too early for that if we consider that we are still unsure that this defense will be useful or not (according to Mike as a discussion on IRC).
4. Attacker strategies [ATTACK_META]
Now that we defined our protocol we need to start tweaking the various knobs. But before we can do that, we first need to understand a few high-level attacker strategies to see what we are fighting against.
4.1.1. Total overwhelm strat
Given the way the introduction queue works (see [HANDLE_QUEUE]), a very effective strategy for the attacker is to totally overwhelm the queue processing by sending more high-effort introductions than the onion service can handle at any given tick.
To do so, the attacker would have to send at least 20 high-effort introduction cells every 100ms, where high-effort is a PoW which is above the estimated level of "the motivated user" (see [USER_MODEL]).
An easier attack for the adversary is the same strategy but with introduction cells that are all above the comfortable level of "the standard user" (see [USER_MODEL]). This would block out all standard users and only allow motivated users to pass.
{XXX: What other attack strategies should we care about?}
5. Parameter tuning [POW_SECURITY]
There are various parameters in this system that need to be tuned.
We will first start by tuning the default difficulty of our PoW system. That's gonna define an expected time for attackers and clients to succeed.
We are then gonna tune the parameters of the argon2 hash function. That will define the resources that an attacker needs to spend to overwhelm the onion service, the resources that the service needs to spend to verify introduction requests, and the resources that legitimate clients need to spend to get to the onion service.
5.1. PoW Difficulty settings
The difficulty setting of our PoW basically dictates how difficult it should be to get a success in our PoW system. In classic PoW systems, "success" is defined as getting a hash output below the "target". However, since our system is dynamic, we define "success" as an abstract high-effort computation.
Even tho our system is dynamic, we still need default difficulty settings that will define the metagame. The client and attacker can still aim higher or lower, but for UX purposes and for analysis purposes we do need to define some difficulties.
We hence created the table (see [REF_TABLE]) below which shows how much time a legitimate client with a single machine should expect to burn before they get a single success. The x-axis is how many successes we want the attacker to be able to do per second: the more successes we allow the adversary, the more they can overwhelm our introduction queue. The y-axis is how many machines the adversary has at her disposal, ranging from just 5 to 1000.
            ===============================================================
            |    Expected Time (in seconds) Per Success For One Machine   |
===========================================================================
|                                                                         |
|   Attacker Successes    1       5       10      20      30      50      |
|       per second                                                        |
|                                                                         |
|           5             5       1       0       0       0       0       |
|           50            50      10      5       2       1       1       |
|           100           100     20      10      5       3       2       |
| Attacker  200           200     40      20      10      6       4       |
|  Boxes    300           300     60      30      15      10      6       |
|           400           400     80      40      20      13      8       |
|           500           500     100     50      25      16      10      |
|           1000          1000    200     100     50      33      20      |
|                                                                         |
===========================================================================
Here is how you can read the table above:
If an adversary has a botnet with 1000 boxes, and we want to limit her to 1 success per second, then a legitimate client with a single box should be expected to spend 1000 seconds getting a single success.
If an adversary has a botnet with 1000 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 200 seconds getting a single success.
If an adversary has a botnet with 500 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 100 seconds getting a single success.
If an adversary has access to 50 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 10 seconds getting a single success.
If an adversary has access to 5 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 1 second getting a single success.
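The table entries follow from a simple ratio: if an adversary with N boxes is limited to S successes per second in total, each box gets S/N successes per second, so a single comparable box needs N/S seconds per success. The sketch below reproduces the table under the assumption that client and attacker boxes are equally fast and that values are rounded down. It is not the script from [REF_TABLE], and the names are hypothetical.

```python
ATTACKER_BOXES = [5, 50, 100, 200, 300, 400, 500, 1000]
ATTACKER_SUCCESSES_PER_SECOND = [1, 5, 10, 20, 30, 50]


def expected_seconds(attacker_boxes: int, successes_per_second: int) -> int:
    """Seconds one box needs per success, rounded down as in the table."""
    return attacker_boxes // successes_per_second


def build_table() -> dict:
    """Map each botnet size to its row of expected times."""
    return {
        boxes: [expected_seconds(boxes, rate) for rate in ATTACKER_SUCCESSES_PER_SECOND]
        for boxes in ATTACKER_BOXES
    }


if __name__ == "__main__":
    for boxes, row in build_table().items():
        print(str(boxes).ljust(8) + "".join(str(v).ljust(8) for v in row))
```

The same function can be used to explore other botnet sizes, such as the 100k boxes of the large botnet profile.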
With the above table we can create some profiles for default values of our PoW difficulty. So for example, we can use the last case as the default parameter for Tor Browser, and then create three more profiles for more expensive cases, scaling up to the first case which could be hardest since the client is expected to spend about 17 minutes (1000 seconds) for a single introduction.
{TODO: PARAM_TUNING You can see that this section is completely CPU/memory agnostic, and it does not take into account potential optimizations that can come from GPU/ASICs. This is intentional so that we don't put more variables into this equation right now, but as this proposal moves forward we will need to put more concrete values here.}
5.2. Argon2 parameters [ARGON_PARAMS]
We now need to define the secondary argon2 parameters as defined in [REF_ARGON2]. This includes the number of lanes 'h', the memory size 'm', the number of iterations 't'. Section 9 of [REF_ARGON2] recommends an approach of how to tune these parameters.
To tune these parameters we are looking to *minimize* the verification time of an onion service, while *maximizing* the scarce resources spent by an adversary trying to overwhelm the service using [ATTACK_META].
When it comes to verification speed, to verify a single introduction cell the service needs to do a single argon2 call: so the service will need to do hundreds of those per second as INTRODUCE2 cells arrive. The service will have to do this verification step even for very cheap zero-effort PoW received, so this has to be a cheap procedure so that it doesn't become a DoS vector of its own. Hence each individual argon2 call must be cheap enough to be done comfortably and plentifully by an onion service with a single host (or horizontally scaled with Onionbalance).
At the same time, the adversary will have to do thousands of these calls if she wants to make high-effort PoW, so it's this asymmetry that we are looking to exploit here. Right now, the most expensive resource for adversaries is the RAM size, and that's why we chose argon2 which is memory-hard.
To minmax this game we will need
{TODO: PARAM_TUNING: I've had a hard time minmaxing this game for argon2. Even argon2 invocations with a small memory parameter will take multiple milliseconds to run on my machine, and the parameters recommended in section 8 of the paper all take many hundreds of milliseconds. This is just not practical for our use case, since we want to process hundreds of such PoW per second... I also did not manage to find a benchmark of argon2 calls for different CPU/GPU/FPGA configurations.}
6. Client behavior [CLIENT_BEHAVIOR]
This proposal introduces a bunch of new ways where a legitimate client can fail to reach the onion service.
Furthermore, there is currently no end-to-end way for the onion service to inform the client that the introduction failed. The INTRO_ACK cell is not end-to-end (it's from the introduction point to the client) and hence it does not allow the service to inform the client that the rendezvous is never gonna occur.
Let's examine a few such cases:
6.1. Timeout issues
Alice can fail to reach the onion service if her introduction request falls off the priority queue, or if the priority queue is so big that the connection times out.
Is building a new introduction circuit sufficient here? Or do we need to build an end-to-end mechanism over the introduction circuit to inform her? {XXX}
How should timeout values change here since the priority queue will cause bigger delays than usual to rendezvous? Can there be some feedback mechanism to inform the client of its queue position or ETA?
I don't see this proposal adding new delays for the rendezvous circuit because as of now, if you as a client get in the queue 32nd, you will be handled by the service after 32 cells, and if you get in the priority queue 32nd, it's the same situation.
The only way to inform the client that I see would be an ACK from service to IP.
6.2. Seed expiration issues
As mentioned in [DESC_POW], the expiration timestamp on the PoW seed can cause issues with clock skewed clients. Furthermore, even clients without clock skew can encounter TOCTOU-style race conditions here.
How should this be handled? Should we have multiple active seeds at the same time similar to how we have overlapping descriptors and time periods in v3? This would solve the problem but it grows the complexity of the system substantially. {XXX}
6.3. Other descriptor issues
Another race condition here is if the service enables PoW, while a client has a cached descriptor. How will the client notice that PoW is needed? Does it need to fetch a new descriptor? Should there be another feedback mechanism? {XXX}
I assume current behavior would kick in that is failing to introduce, ditch descriptor, refetch and succeed.
Without a feedback from the service, not much we can do there :S.
7. Discussion
7.1. UX
This proposal has user facing UX consequences. Here are a few UX approaches with increasing engineering difficulty:
a) Tor Browser needs a "range field" which the user can use to specify how much effort they want to spend in PoW if this ever occurs while they are browsing. The ranges could be from "Easy" to "Difficult", or we could try to estimate time using an average computer. This setting is in the Tor Browser settings and users need to find it.
b) We start with a default effort setting, and then we use the new onion errors (see #19251) to estimate when an onion service connection has failed because of DoS, and only then we present the user a "range field" which they can set dynamically. Detecting when an onion service connection has failed because of DoS can be hard because of the lack of feedback (see [CLIENT_BEHAVIOR]).
c) We start with a default effort setting, and if things fail we automatically try to figure out an effort setting that will work for the user by doing some trial-and-error connections with different effort values. Until the connection succeeds we present a "Service is overwhelmed, please wait" message to the user.
For this proposal to work initially we need at least (a), and then we can start thinking of how far we want to take it.
This is not a simple concept for non technical users. A default value will be used 99.9% of the time so I would strongly consider making it hard on ourselves to find a good value instead of the other way. And possibly never exposing that "range of effort" to the user, could be done all under the hood.
7.2. Future directions [FUTURE_WORK]
This is just the beginning in DoS defences for Tor and there are various future avenues that we can investigate. Here is a brief summary of these:
"More advanced PoW schemes" -- We could use more advanced memory-hard PoW schemes like MTP-argon2 or Itsuku to make it even harder for adversaries to create successful PoWs. Unfortunately these schemes have much bigger proof sizes, and they won't fit in INTRODUCE1 cells. See #31223 for more details.
"Third-party anonymous credentials" -- We can use anonymous credentials and a third-party token issuance server on the clearnet to issue tokens based on PoW or CAPTCHA and then use those tokens to get access to the service. See [REF_CREDS] for more details.
"PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas where we present a hard puzzle to the user when connecting to the onion service, and if they solve it we then give the user a bunch of anonymous tokens that can be used in the future. This can all happen between the client and the service without a need for a third party.
All of the above approaches are much more complicated than this proposal, and hence we want to start easy before we get into more serious projects.
7.3. Environment
We love the environment! We are concerned about how PoW schemes can waste energy by doing useless hash iterations. Here are a few reasons we still decided to pursue a PoW approach here:
"We are not making things worse" -- DoS attacks are already happening and attackers are already burning energy to carry them out on the attacker side, on the service side and on the network side. We think that asking legitimate clients to carry out PoW computations is not gonna affect the equation too much, since an attacker right now can very quickly cause the same damage that hundreds of legitimate clients do in a whole day.
"We hope to make things better" -- The hope is that proposals like this will make the DoS actors go away and hence the PoW system will not be used. As long as DoS is happening there will be a waste of energy, but if we manage to demotivate them with technical means, the network as a whole will be less wasteful. Also see [CATCH22] for a similar argument.
8. References
[REF_ARGON2]: https://password-hashing.net/#argon2
[REF_TABLE]: The table is based on the script below plus some manual editing for readability:
             https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
[REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01...
[REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
[REF_TARGET]: https://en.bitcoin.it/wiki/Target
Good stuff asn!!!
Cheers! David
Trimming to stuff that I just want to reply to; I otherwise agree.
Note: in a couple places I replied directly to asn's OP, because I noticed some more questions that I could answer.
On 4/2/20 2:30 PM, David Goulet wrote:
On 02 Apr (18:54:59), George Kadianakis wrote:
2.2.2. Dynamic PoW
DoS is a dynamic problem where the attacker's capabilities constantly change, and hence we want our proof-of-work system to be dynamic and not stuck with a static difficulty setting. Hence, instead of forcing clients to go below a static target like in Bitcoin to be successful, we ask clients to "bid" using their PoW effort. Effectively, a client gets higher priority the higher effort they put into their proof-of-work. This is similar to how proof-of-stake works but instead of staking coins, you stake work.
So this means that desktop users will be prioritized over mobile users basically unless I make my phone use X% of battery?
Yes. We should be clear that this is not meant to be on all the time, and that yes, it is likely to sacrifice access by mobile users, depending on the current attack volume/difficulty level.
The Tor Browser Android UI could inform users the service is under attack and direct them try on their desktop instead.
If a client receives a descriptor with "pow-params", it should assume that the service is expecting a PoW input as part of the introduction protocol.
What happens with clients _without_ PoW support? They basically won't be able to connect I suppose? Or be put in the prio queue at the service at the very end with work done = 0 ?
The work_done=0 will be better. Fake work with actual work_done=0 is just as easy to create as omitting work, for an attacker.
3.4.1. PoW verification
To verify the client's proof-of-work the service extracts (hash_output, seed, nonce) from the INTRODUCE1 cell and MUST do the following steps:
- Make sure that the client's seed is identical to the active seed.
- Check the client's nonce for replays (see [REPLAY_PROTECTION] section).
- Verify that 'hash_output =?= argon2(seed, nonce)'
So wait, the service also has to do the PoW for each client by computing the Argon2 hash for each cell? Or am I mis-understanding?
Yes, but the service side only has to run the hash once, which should be fast.
The client/attacker is hashing many times, to search for a nonce that will satisfy the target hash value comparison.
But oops! We forgot to list that directly above, which might have caused the confusion:
0) Check that hash_output <= target_level
3.4.2. The Introduction Queue [INTRO_QUEUE]
3.4.2.1. Adding introductions to the introduction queue
When PoW is enabled and a verified introduction comes through, the service instead of jumping straight into rendezvous, queues it and prioritizes it based on how much effort was devoted by the client to PoW. This means that introduction requests with high effort should be prioritized over those with low effort.
To do so, the service maintains an "introduction priority queue" data structure. Each element in that priority queue is an introduction request, and its priority is the effort put into its PoW:
When a verified introduction comes through, the service interprets the PoW hash as a 32-byte big-endian integer 'hash_int' and based on that integer it inserts it into the right position of the priority_queue: The smallest 'hash_int' goes forward in the queue. If two elements have the same value, the older one has priority over the newer one. {XXX: Is this operation with 32-bytes integers expensive? How to make cheaper?}
One option: either subtract or and-off the target, so we're comparing difficulty relative to the target - ie a much smaller integer space.
In blockchain world, typically the difficulty target is expressed as a bitmask anyway, probably for reasons like this.
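As a rough sketch of the ordering described above (smallest hash_int first, older request first on ties), here is a small heap-based queue. The class name IntroQueue and its methods are made up for illustration and are not part of tor.

```python
import heapq
import itertools

class IntroQueue:
    """Priority queue where the smallest PoW hash is served first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: older request first

    def push(self, hash_output: bytes, request):
        hash_int = int.from_bytes(hash_output, "big")
        # The unique arrival number means 'request' is never compared.
        heapq.heappush(self._heap, (hash_int, next(self._arrival), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

On the cost question: for equal-length big-endian values, comparing the raw bytes lexicographically gives the same order as comparing the integers, so a C implementation could likely get away with a 32-byte memcmp() per comparison.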
{TODO: PARAM_TUNING: If the priority queue is only ordered based on the effort what attacks can happen in various scenarios? Do we want to order on time+effort? Which scenarios and attackers should we examine here?}
{TODO: PARAM_TUNING: What's the max size of the queue? How do we trim it? Can we use WRED usefully?}
I think you'll be bound by the amount of data a connection inbuf can take which has an upper bound of 32 cells each read event.
Then tor will have to empty at once the inbuf, queue all INTRODUCE2 cells (at most 32) in that priority queue and once done, we would process it until we return to handling the connection inbuf.
In other words, the queue size, with tor's architecture, is bound to the number of cells upper bound you can get when doing a recv() pass which is 32 cells.
Nevertheless, that limit is weirdly hardcoded in tor so you should definitely think of a way to upper bound the queue and just drop the rest. A good starting point would be that 32 cells number?
dgoulet and I think the following will work better than always doing 32 cells at a time. Basically, the idea is to split our INTRO2 handling into "top-half" and "bottom-half" handlers.
Top-half handling:
  1) read 32 cells off inbuf as usual
  2) do AES relay cell decryption as usual
  3) Parse relay headers, handle all cells as usual, except:
     a) in hs_service_receive_introduce2(), add to pqueue and return without further processing of those
  4) Return to rest of mainloop
Then, separately, also in mainloop, do the bottom half. (TODO: What group priority?)
Bottom-half handling:
  I) pop a single intro2 off of pqueue (ie: max difficulty in queue)
  II) Compare this difficulty to desc difficulty. If lower, lower desc difficulty
  III) Parse it and launch RP circuit (as per current bottom half of hs_service_receive_introduce2())
  IV) trim pqueue elements, if queue "too big" (TODO: how to trim?)
  V) Compare trim point difficulty to descriptor difficulty, if trim point was higher than descriptor value, raise desc difficulty
  VI) return to libevent/mainloop again
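To make the split concrete, here is a toy model of the two halves. Everything in it is an assumption for illustration: effort is a plain integer where higher means more work, the trim step is a naive O(n log n) sort, and IntroHandler, desc_effort and launched are invented names, not tor internals.

```python
import heapq

class IntroHandler:
    def __init__(self, max_queue=4):
        self.pqueue = []        # min-heap on (-effort, arrival_no, cell)
        self.arrivals = 0
        self.desc_effort = 0    # difficulty hint published in the descriptor
        self.max_queue = max_queue
        self.launched = []      # stands in for launching RP circuits

    def top_half(self, inbuf):
        """Read up to 32 cells; queue INTRODUCE2 cells without processing."""
        for kind, effort, cell in inbuf[:32]:
            if kind == "INTRODUCE2":
                heapq.heappush(self.pqueue, (-effort, self.arrivals, cell))
                self.arrivals += 1
            # other cell types would be handled as usual here
        del inbuf[:32]

    def bottom_half(self):
        """Handle one queued introduction, then trim the queue."""
        if not self.pqueue:
            return
        neg_effort, _, cell = heapq.heappop(self.pqueue)           # I
        self.desc_effort = min(self.desc_effort, -neg_effort)      # II
        self.launched.append(cell)                                 # III
        if len(self.pqueue) > self.max_queue:                      # IV
            ordered = sorted(self.pqueue)
            self.pqueue = ordered[:self.max_queue]  # sorted list is a valid heap
            trim_effort = -ordered[self.max_queue][0]
            self.desc_effort = max(self.desc_effort, trim_effort)  # V
```

Each call to bottom_half() corresponds to one mainloop callback; step VI is simply the return.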
The astute reader will note that even without PoW, the above can provide almost the exact same functionality as the naive rate limiting currently done at intropoints - just cut the queue arbitrarily. Basically PoW and the pqueue just gives us a smarter way to decide who to reply to.
However, it still has the following potential issues:
  A) AES will bottleneck us at ~100Mbit-300Mbit at #2 in top-half above
  B) Extra mainloop() iterations for INTRO2s may be expensive (or not?)
For A, onionbalance will still help by adding new back-end instances providing intropoints via separate back-end Tor daemons, either on the same box or different boxes.
But it will only help up to a point. A HSDesc maxes out at 30k, so at some point we'll run out of space to list more intropoints in a single descriptor. At that point, we can still list different intropoints at each HSDir position, but after that, we are screwed.
We can alternatively avoid the AES bottleneck by moving this whole system to the intropoint tor relay. Basically, we split hs_intro_received_introduce1() into a top and bottom half in almost exactly the same way we split hs_service_receive_introduce2(), and we use the unencrypted extension field instead of the encrypted one.
This is a very straight-forward v1.5 change, but it requires intropoints on the network to upgrade for it to work.
3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
The service should handle introductions by pulling from the introduction queue.
Similar to how our cell scheduler works, the onion service subsystem will poll the priority queue every 100ms tick and process the first 20 cells from the priority queue (if they exist). The service will perform the rendezvous and the rest of the onion service protocol as normal.
With this tempo, we can process 200 introduction cells per second.
As I described above, I think we might want to do something like that for simplicity at first which is "empty inbuf by priority queuing all INTRODUCE2" and once done, process them.
Thus, it won't be like the cell scheduler, which accumulates until a certain tick (10msec) and then processes it all.
{XXX: Is this good?}
{TODO: PARAM_TUNING: STRAWMAN: This needs hella tuning. Processing 20 cells per 100ms is probably unmaintainable, since each cell is quite expensive: doing so involves path selection, crypto and making circuits. We will need to profile this procedure and see how we can do this scheduling better.}
With the above, we should be within the same performance as we have right now, since we are just deferring the processing of INTRODUCE2 cells until after the inbuf is emptied.
{XXX: This might be a nice place to promote multithreading. Queues and pools are nice objects to do multithreading since you can have multiple threads pull from the queue, or leave stuff on the queue. Not sure if this should be in the proposal tho.}
I would _love_ to, but it could be too early for that if we consider that we are still unsure whether this defense will be useful or not (according to Mike in a discussion on IRC).
As described above, multithreading still provides a multiplier in the AES bottleneck case, even over onionbalance.
But, there may be more bottlenecks than just AES crypto, so this is a further argument for not jumping the gun just yet, and trying v1 first (or even a simple prototype without pow, that just cuts the queue arbitrarily), and getting some more profiling data.
Next steps (not necessarily in order):
  a) pqueue plan review + more detailed description
  b) Figure out pqueue trim mechanism - can we do better than O(n)?
  c) test using randomx as a hash function in various ways, esp wrt key usage, cache setup, and VM infos
  d) test pqueue refactoring, maybe without pow involved yet
  e) specify a v1.5 that works at intropoint (to avoid AES bottleneck)
  f) Merge this thread back into a single proposal document
  g) other stuff we forgot, XXX's, TODOs, etc.
On 06 Apr (17:08:12), Mike Perry wrote:
[snip]
I think you'll be bound by the amount of data a connection inbuf can take which has an upper bound of 32 cells each read event.
Then tor will have to empty at once the inbuf, queue all INTRODUCE2 cells (at most 32) in that priority queue and once done, we would process it until we return to handling the connection inbuf.
In other words, the queue size, with tor's architecture, is bound to the number of cells upper bound you can get when doing a recv() pass which is 32 cells.
Nevertheless, that limit is weirdly hardcoded in tor so you should definitely think of a way to upper bound the queue and just drop the rest. A good starting point would be that 32 cells number?
dgoulet and I think the following will work better than always doing 32 cells at a time. Basically, the idea is to split our INTRO2 handling into "top-half" and "bottom-half" handlers.
Top-half handling:
- read 32 cells off inbuf as usual
- do AES relay cell decryption as usual
- Parse relay headers, handle all cells as usual, except: a) in hs_service_receive_introduce2(), add to pqueue and return without further processing of those
- Return to rest of mainloop
Agree with the above. It is trivial to do this today so very low engineering cost.
Then, separately, also in mainloop, do the bottom half. (TODO: What group priority?)
Bottom-half handling:
  I) pop a single intro2 off of pqueue (ie: max difficulty in queue)
  II) Compare this difficulty to desc difficulty. If lower, lower desc difficulty
  III) Parse it and launch RP circuit (as per current bottom half of hs_service_receive_introduce2())
  IV) trim pqueue elements, if queue "too big" (TODO: how to trim?)
  V) Compare trim point difficulty to descriptor difficulty, if trim point was higher than descriptor value, raise desc difficulty
  VI) return to libevent/mainloop again
I would maybe try to convince you that we could dequeue more than 1 cell here because this behavior is changing quite a bit the current state of HS.
Right now, we would get 32 cells out of the inbuf and one at a time, process it then go back to mainloop.
This new algorithm means that we would process 1 single cell at each mainloop event instead of 32. This is quite a decrease. OK, it is not that exact ratio, because dequeuing the inbuf without processing the decrypted INTRO2 may be fast, but a full mainloop run per cell is still clearly slower than what we do right now.
We need some sort of performance measurements here to make an informed decision, but my gut feeling tells me that we might want to process, I don't know, 5 or 10 cells instead of 1 per mainloop round.
We _should_ run timing measurement here to see how much delaying INTRO2 processing to another mainloop event affects the overall rate of introduction.
But let's say a full mainloop run takes 100msec, we will process 50 introductions per second... that looks quite low? But it could already be what we do now, unknown.
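The arithmetic behind this kind of estimate is simple enough to write down. The sketch below assumes the hypothetical 100 msec mainloop run from the paragraph above and ignores any other per-round cost; the function name is invented for illustration.

```python
def intros_per_second(cells_per_round: int, mainloop_ms: float) -> float:
    """Introductions handled per second if each mainloop round
    takes mainloop_ms and dequeues cells_per_round cells."""
    return cells_per_round * 1000.0 / mainloop_ms

for cells in (1, 5, 10, 32):
    print(cells, intros_per_second(cells, 100))
```

Under that assumption, 1 cell per round gives 10 introductions per second, 5 cells per round gives 50, and the strawman of 20 cells per 100ms gives 200. One reading of the 50 per second figure is therefore that it assumes 5 cells per round.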
The astute reader will note that even without PoW, the above can provide almost the exact same functionality as the naive rate limiting currently done at intropoints - just cut the queue arbitrarily. Basically PoW and the pqueue just gives us a smarter way to decide who to reply to.
However, it still has the following potential issues:
  A) AES will bottleneck us at ~100Mbit-300Mbit at #2 in top-half above
  B) Extra mainloop() iterations for INTRO2s may be expensive (or not?)
Possibly, from the above, some analysis should happen. I can easily do that once we get the tracing API upstream.
For A, onionbalance will still help by adding new back-end instances providing intropoints via separate back-end Tor daemons, either on the same box or different boxes.
But it will only help up to a point. A HSDesc maxes out at 30k, so at some point we'll run out of space to list more intropoints in a single descriptor. At that point, we can still list different intropoints at each HSDir position, but after that, we are screwed.
Small correction. HSDesc maxes out at 50k for v3 and 20k for v2. But let's just consider v3 for the foreseeable future :D.
[snip]
I would _love_ to, but it could be too early for that if we consider that we are still unsure whether this defense will be useful or not (according to Mike in a discussion on IRC).
As described above, multithreading still provides a multiplier in the AES bottleneck case, even over onionbalance.
But, there may be more bottlenecks than just AES crypto, so this is a further argument for not jumping the gun just yet, and trying v1 first (or even a simple prototype without pow, that just cuts the queue arbitrarily), and getting some more profiling data.
As an initial step, I agree. Onionbalance provides an easy way for service to outsource client introduction to more CPUs.
But, in my opinion, onionbalance is a power user solution, and thus usually only a small percentage of our .onion users can take advantage of it. As .onion moves more and more into the mobile sphere and client-to-client applications (onionshare, ricochet), it ain't much of an option :S.
Without using more CPUs in Tor, I have a _hard_ time seeing tor scale over time especially for services. As long as we keep that in mind with our designs, I'm good :).
Next steps (not necessarily in order):
  a) pqueue plan review + more detailed description
  b) Figure out pqueue trim mechanism - can we do better than O(n)?
Initially, we could simply go with an upper limit and just drop cells as you queue them if you are above the limit? As in drop back() if you reach the limit every time you queue?
Else, we can get into more complicated schemes with queueing rate versus processing rate and come down with a golden number to strike a memory and CPU balance...?
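The simple "drop back() when above the limit" idea could look roughly like this. It is a sketch under assumptions: the queue is kept as a sorted list (so insertion is O(n) because of element shifting), and BoundedIntroQueue and its method names are invented for illustration.

```python
import bisect

class BoundedIntroQueue:
    def __init__(self, limit):
        self.limit = limit
        self._items = []     # sorted by (hash_int, arrival_no); front = best
        self._arrivals = 0

    def push(self, hash_int, request):
        """Insert a request; return the request that got dropped, if any."""
        bisect.insort(self._items, (hash_int, self._arrivals, request))
        self._arrivals += 1
        if len(self._items) > self.limit:
            # drop back(): the lowest-effort (largest hash_int) entry goes
            return self._items.pop()[2]
        return None

    def pop_front(self):
        return self._items.pop(0)[2]
```

A newly queued request is itself the one dropped when it carries the least effort in a full queue, which keeps memory bounded without a separate trim pass.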
  c) test using randomx as a hash function in various ways, esp wrt key usage, cache setup, and VM infos
  d) test pqueue refactoring, maybe without pow involved yet
  e) specify a v1.5 that works at intropoint (to avoid AES bottleneck)
  f) Merge this thread back into a single proposal document
We could put the engineering details in an Annex of the proposal if we don't want to lose track of them and have them dispersed on a mailing list :)?
I'm happy to help with this, let me know.
g) other stuff we forgot, XXX's, TODOs, etc.
Cheers! David
Phew! This is loooonnnng but excellent! Comments in-line!
On 4/2/20 10:54 AM, George Kadianakis wrote:
Hello list,
hope everyone is safe and doing well!
I present you an initial draft of a proposal on PoW-based defences for onion services under DoS.
The proposal is not finished yet and it needs tuning and fixing. There are many places marked with XXX and TODO around the proposal that should be addressed.
The important part is that looking at the numbers it does seem like this proposal can work as a concept and serve its intended purpose. The most handwavey parts of the proposal right now are [INTRO_QUEUE] and [POW_SECURITY] and if this thing fails in the end, it's probably gonna be something that slipped over there. Hence, we should polish these sections before we proceed with any sort of engineering here.
In any case, I decided to send it to the list even in premature form, so that it can serve as a stable point of reference in subsequent discussions. It can also be found in my git repo: https://github.com/asn-d6/torspec/tree/pow-over-intro
Cheers and stay safe!
Filename: xxx-pow-over-intro-v1 Title: A First Take at PoW Over Introduction Circuits Author: George Kadianakis Created: 2 April 2020 Status: Draft
0. Abstract
This proposal aims to thwart introduction flooding DoS attacks by introducing a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
1. Motivation
So far our attempts at limiting the impact of introduction flooding DoS attacks on onion services have been focused on horizontal scaling with Onionbalance, optimizing the CPU usage of Tor and applying congestion control using rate limiting. While these measures move the goalpost forward, a core problem with onion service DoS is that building rendezvous circuits is a costly procedure both for the service and for the network. If we ever hope to have truly reachable global onion services, we need to make it harder for attackers to overload the service with introduction requests.
This proposal achieves this by allowing onion services to specify an optional dynamic proof-of-work scheme that its clients need to participate in if they want to get served.
With the right parameters, this proof-of-work scheme acts as a gatekeeper to block amplification attacks by attackers while letting legitimate clients through.
1.1. Threat model [THREAT_MODEL]
1.1.1. Attacker profiles [ATTACKER_MODEL]
This proposal is written to thwart specific attackers. A simple PoW proposal cannot defend against each and every DoS attack on the Internet, but there are adversary models we can defend against.
Let's start with some adversary profiles:
"The script-kiddie"
The script-kiddie has a single computer and pushes it to its limits. Perhaps it also has a VPS and a pwned server. We are talking about an attacker with total access to 10 Ghz of CPU and 10 GBs of RAM. We consider the total cost for this attacker to be zero $.
"The small botnet"
The small botnet is a bunch of computers lined up to do an introduction flooding attack. Assuming 500 medium-range computers, we are talking about an attacker with total access to 10 Thz of CPU and 10 TB of RAM. We consider the upfront cost for this attacker to be about $400.
"The large botnet"
The large botnet is a serious operation with many thousands of computers organized to do this attack. Assuming 100k medium-range computers, we are talking about an attacker with total access to 200 Thz of CPU and 200 TB of RAM. The upfront cost for this attacker is about $36k.
1.1.2. User profiles [USER_MODEL]
We have attackers and we have users. Here are a few user profiles:
"The standard web user"
This is a standard laptop/desktop user who is trying to browse the web. They don't know how these defences work and they don't care to configure or tweak them. They are gonna use the default values and if the site doesn't load, they are gonna close their browser and be sad at Tor. They run a 2Ghz computer with 4GB of RAM.
"The motivated user"
This is a user that really wants to reach their destination. They don't care about the journey; they just want to get there. They know what's going on; they are willing to tweak the default values and make their computer do expensive multi-minute PoW computations to get where they want to be.
"The mobile user"
This is a motivated user on a mobile phone. Even tho they want to read the news article, they don't have much leeway on stressing their machine to do more computation.
We hope that this proposal will allow the motivated user to always connect where they want to connect to, and also give more chances to the other user groups to reach the destination.
1.1.3. The DoS Catch-22 [CATCH22]
This proposal is not perfect and it does not cover all the use cases. Still, we think that by covering some use cases and giving reachability to the people who really need it, we will severely demotivate the attackers from continuing the DoS attacks and hence stop the DoS threat all together. Furthermore, by increasing the cost to launch a DoS attack, a big class of DoS attackers will disappear from the map, since the expected ROI will decrease.
2. System Overview
2.1. Tor protocol overview
                                          +----------------------------------+
                                          |                                  |
   +-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
   |Client |-------->|Intro Point|------->|  PoW   |-----------+             |
   +-------+         +-----------+        |Verifier|           |             |
                                          +--------+           |             |
                                          |                    |             |
                                          |                    |             |
                                          |         +----------v---------+   |
                                          |         |Intro Priority Queue|   |
                                          +---------+--------------------+---+
                                                           |  |  |
                                                Rendezvous |  |  |
                                                  circuits |  |  |
                                                           v  v  v
The proof-of-work scheme specified in this proposal takes place during the introduction phase of the onion service protocol. It's an optional mechanism that only occurs if the service requires it. It can be enabled and disabled either through its torrc or through the control port.
In summary, the following steps are taken for the protocol to complete:
- Service encodes PoW parameters in descriptor [DESC_POW]
- Client fetches descriptor and computes PoW [CLIENT_POW]
- Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
- Service verifies PoW and queues introduction based on PoW effort [SERVICE_VERIFY]
2.2. Proof-of-work overview
2.2.1. Primitives
For our proof-of-work scheme we want to minimize the spread of resources between a motivated attacker and legitimate clients. This means that we are looking to minimize any benefits that GPUs or ASICs can offer to an attacker.
For this reason we chose argon2 [REF_ARGON2] as the hash function for our proof-of-work scheme since it's well audited and GPU-resistant and to some extent ASIC-resistant as well.
FWIW, I think we should also consider https://github.com/tevador/RandomX, which is based on argon2 plus some additional sauce, and comes as a library with C exports, pretty much tuned for our usecase.
The downside is that it is C++ itself, but if we use it as an optional external build dep (only Tor Browser and onion services need this thing.. relays do not), that should be fine.
As a password hash function, argon2 by default outputs 32 bytes of hash, and takes as primary input a message and a nonce/salt. For the purposes of this specification we will define an argon2() function as: 'uint8_t hash_output[32] = argon2(uint8_t *message, uint8_t *nonce)'.
See section [ARGON_PARAMS] for more information on the secondary inputs of argon2.
2.2.2. Dynamic PoW
DoS is a dynamic problem where the attacker's capabilities constantly change, and hence we want our proof-of-work system to be dynamic and not stuck with a static difficulty setting. Hence, instead of forcing clients to go below a static target like in Bitcoin to be successful, we ask clients to "bid" using their PoW effort. Effectively, a client gets higher priority the higher effort they put into their proof-of-work. This is similar to how proof-of-stake works but instead of staking coins, you stake work.
The benefit here is that legitimate clients who really care about getting access can spend a big amount of effort into their PoW computation, which should guarantee access to the service given reasonable adversary models. See [POW_SECURITY] for more details about these guarantees and tradeoffs.
3. Protocol specification
3.1. Service encodes PoW parameters in descriptor [DESC_POW]
This whole protocol starts with the service encoding the PoW parameters in the 'encrypted' (inner) part of the v3 descriptor. As follows:
  "pow-params" SP type SP seed-b64 SP expiration-time NL
  [At most once]
  type: The type of PoW system used. We call the one specified here "v1"
  seed-b64: A random seed that should be used as the input to the PoW hash function. Should be 32 random bytes encoded in base64 without trailing padding.
  expiration-time: A timestamp after which the above seed expires and is no longer valid as the input for PoW. It's needed so that the size of our replay cache does not grow infinitely. It should be set to an hour in the future (+- some randomness). {TODO: PARAM_TUNING}
  {XXX: Expiration time makes us even more susceptible to clock skews, but it's needed so that our replay cache refreshes. How to fix this? See [CLIENT_BEHAVIOR] for more details.}
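As an illustration of the descriptor line above, here is a small encoder and parser. The proposal does not pin down the timestamp format, so this sketch assumes an integer Unix timestamp; the function names are hypothetical.

```python
import base64

def format_pow_params(seed: bytes, expiration: int) -> str:
    # 32 random bytes, base64 without trailing padding
    seed_b64 = base64.b64encode(seed).decode().rstrip("=")
    return f"pow-params v1 {seed_b64} {expiration}\n"

def parse_pow_params(line: str):
    keyword, pow_type, seed_b64, expiration = line.rstrip("\n").split(" ")
    if keyword != "pow-params" or pow_type != "v1":
        raise ValueError("unsupported pow-params line")
    # Restore the padding that was stripped before decoding.
    seed = base64.b64decode(seed_b64 + "=" * (-len(seed_b64) % 4))
    if len(seed) != 32:
        raise ValueError("seed must be 32 bytes")
    return seed, int(expiration)
```

A client would then compare the parsed expiration against its own clock and refetch the descriptor if the seed has expired.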
We should also include the lowest difficulty successfully serviced from the queue recently (last N seconds of time), in this field, as a hint of the difficulty level that clients should shoot for, as a minimum.
This will save them from waiting for quite so many timeouts, and from doing too much needless work.
3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
If a client receives a descriptor with "pow-params", it should assume that the service is expecting a PoW input as part of the introduction protocol.
In such cases, the client should have been configured with a specific PoW 'target' (which is a 32-byte integer similar to the 'target' of Bitcoin [REF_TARGET]). See [POW_SECURITY] for more information of how such a target should be set. For the purposes of this section, we will assume that the target has been set automatically by Tor, or the user configured it manually.
Now the client parses the descriptor and extracts the PoW parameters. It makes sure that the expiration-time has not expired and if it has, it needs to fetch a new descriptor.
To complete the PoW the client follows the following logic:
  a) Client generates 'nonce' as 32 random bytes.
  b) Client derives 'seed' by decoding 'seed-b64'.
  c) Client computes hash_output = argon2(seed, nonce)
  d) Client interprets hash_output as a 32-byte big-endian integer.
  e) Client checks if int(hash_output) <= target.
     e1) If yes, success! The client uses 'hash_output' as the hash and 'nonce' and 'seed' as its inputs.
     e2) If no, fail! The client interprets 'nonce' as a big-endian integer, increments it by one, and goes back to step (c).
At the end of the above procedure, the client should have a triplet (hash_output, seed, nonce) that can be used as the answer to the PoW puzzle. How quickly this happens depends solely on the 'target' parameter.
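The client-side search can be sketched as a short loop. As before, pow_hash() is a cheap BLAKE2b stand-in and not argon2; wrapping the nonce around at 2^256 is an assumption, since the text does not say what happens on overflow.

```python
import hashlib
import os

def pow_hash(seed: bytes, nonce: bytes) -> bytes:
    # Stand-in for argon2(seed, nonce); NOT memory-hard, illustration only.
    return hashlib.blake2b(seed + nonce, digest_size=32).digest()

def solve_pow(seed: bytes, target: int):
    """Search for a nonce whose hash, read as a big-endian int, is <= target."""
    nonce_int = int.from_bytes(os.urandom(32), "big")          # (a)
    while True:
        nonce = nonce_int.to_bytes(32, "big")
        hash_output = pow_hash(seed, nonce)                    # (c)
        if int.from_bytes(hash_output, "big") <= target:       # (d), (e)
            return hash_output, seed, nonce                    # (e1)
        nonce_int = (nonce_int + 1) % 2**256                   # (e2), wraps
```

For a uniformly distributed hash, the expected number of attempts is roughly 2^256 / target, which is why the running time depends on the 'target' parameter.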
Nice. Clarification (as per dgoulet's verification question):
The hard part here is finding the nonce that matches the target. Verification (running the hash once) should be easy.
RandomX does require (relatively) a lot of setup for the VM, etc, so we will need to be careful about preserving the right pieces of setup there. But, if we do that right, we're fine.
3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
Now that the client has an answer to the puzzle it's time to encode it into an INTRODUCE1 cell. To do so the client adds an extension to the encrypted portion of the INTRODUCE1 cell by using the EXTENSIONS field (see [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the INTRODUCE1 cell only gets read by the onion service and is ignored by the introduction point.
We propose a new EXT_FIELD_TYPE value:
[01] -- PROOF_OF_WORK
The EXT_FIELD content format is:
  POW_VERSION  [1 byte]
  POW_SEED     [32 bytes]
  POW_NONCE    [32 bytes]
  POW_OUTPUT   [32 bytes]
where:
  POW_VERSION is 1 for the protocol specified in this proposal
  POW_SEED is 'seed' from the section above
  POW_NONCE is 'nonce' from the section above
  POW_OUTPUT is 'hash_output' from the section above
{XXX: do we need POW_VERSION? Perhaps we can use EXT_FIELD_TYPE as version}
{XXX: do we need to encode the SEED? Perhaps we can omit it since the service already knows it. But what happens in cases of desynch, if client has diff seed from service?}
{XXX: Do we need to include the output? Probably not. The service has to compute it anyway during verification. What's the use?}
This will increase the INTRODUCE1 payload size by 99 bytes since the extension type and length is 2 extra bytes, the N_EXTENSIONS field is always present and currently set to 0 and the EXT_FIELD is 97 bytes. According to ticket #33650, INTRODUCE1 cells currently have more than 200 bytes available.
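The byte count can be checked with a small encoder. This sketch assumes the extension header is one type byte followed by one length byte, which matches the "2 extra bytes" above; the function names are illustrative only.

```python
import struct

EXT_FIELD_TYPE_POW = 0x01   # [01] -- PROOF_OF_WORK
POW_VERSION = 1

def encode_pow_extension(seed: bytes, nonce: bytes, hash_output: bytes) -> bytes:
    if not (len(seed) == len(nonce) == len(hash_output) == 32):
        raise ValueError("seed, nonce and hash_output must be 32 bytes each")
    body = struct.pack("!B32s32s32s", POW_VERSION, seed, nonce, hash_output)
    # type (1 byte) + length (1 byte) + EXT_FIELD (97 bytes)
    return struct.pack("!BB", EXT_FIELD_TYPE_POW, len(body)) + body

def decode_pow_extension(data: bytes):
    ext_type, ext_len = struct.unpack_from("!BB", data)
    if ext_type != EXT_FIELD_TYPE_POW or ext_len != 97 or len(data) != 99:
        raise ValueError("not a well-formed PROOF_OF_WORK extension")
    return struct.unpack_from("!B32s32s32s", data, 2)
```

The encoded extension comes out at 2 + 97 = 99 bytes, in line with the figure quoted above.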
3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it should check its configuration on whether proof-of-work is required to complete the introduction. If it's not required, the extension SHOULD BE ignored. If it is required, the service follows the procedure detailed in this section.
3.4.1. PoW verification
To verify the client's proof-of-work the service extracts (hash_output, seed, nonce) from the INTRODUCE1 cell and MUST do the following steps:
- Make sure that the client's seed is identical to the active seed.
- Check the client's nonce for replays (see [REPLAY_PROTECTION] section).
- Verify that 'hash_output =?= argon2(seed, nonce)'
If any of these steps fail the service MUST ignore this introduction request and abort the protocol.
If all the steps passed, then the circuit is added to the introduction queue as detailed in section [INTRO_QUEUE].
3.4.1.1. Replay protection [REPLAY_PROTECTION]
The service MUST NOT accept introduction requests with the same (seed, nonce) tuple. For this reason a replay protection mechanism must be employed.
The simplest way is to use a hash table to check whether a (seed, nonce) tuple has been used before during the active lifetime of a seed. Depending on how long a seed stays active this might be a viable solution with reasonable memory/time overhead.
If there is a worry that we might get too many introductions during the lifetime of a seed, we can use a Bloom filter as our replay cache mechanism. The probabilistic nature of Bloom filters means that sometimes we will flag some connections as replays even if they are not; with this false positive probability increasing as the number of entries increase. However, with the right parameter tuning this probability should be negligible and well handled by clients. {TODO: PARAM_TUNING}
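A Bloom-filter replay cache along these lines might look as follows. The sizes (2^20 bits, 4 hash functions) are placeholders rather than tuned values, and ReplayBloom is an invented name.

```python
import hashlib

class ReplayBloom:
    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits = n_bits            # should be a multiple of 8
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, seed: bytes, nonce: bytes):
        # Derive n_hashes bit positions by salting BLAKE2b differently.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(seed + nonce, digest_size=8,
                                     salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def check_and_add(self, seed: bytes, nonce: bytes) -> bool:
        """Return True if (seed, nonce) was possibly seen before; record it."""
        seen = True
        for pos in self._positions(seed, nonce):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen
```

A True result may be a false positive, never a false negative. When the seed rotates, the service can simply start over with an empty filter.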
3.4.2. The Introduction Queue [INTRO_QUEUE]
3.4.2.1. Adding introductions to the introduction queue
When PoW is enabled and a verified introduction comes through, the service instead of jumping straight into rendezvous, queues it and prioritizes it based on how much effort was devoted by the client to PoW. This means that introduction requests with high effort should be prioritized over those with low effort.
To do so, the service maintains an "introduction priority queue" data structure. Each element in that priority queue is an introduction request, and its priority is the effort put into its PoW:
When a verified introduction comes through, the service interprets the PoW hash as a 32-byte big-endian integer 'hash_int' and based on that integer it inserts it into the right position of the priority_queue: The smallest 'hash_int' goes forward in the queue. If two elements have the same value, the older one has priority over the newer one. {XXX: Is this operation with 32-bytes integers expensive? How to make cheaper?}
{TODO: PARAM_TUNING: If the priority queue is only ordered based on the effort what attacks can happen in various scenarios? Do we want to order on time+effort? Which scenarios and attackers should we examine here?}
{TODO: PARAM_TUNING: What's the max size of the queue? How do we trim it? Can we use WRED usefully?}
We should record the lowest difficulty level that was successfully serviced from the priority queue, and post it in the descriptor.
{TODO: PARAM_TUNING: Is lowest enough? Do we want to timebound that? How does it combine with options above? }
3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
The service should handle introductions by pulling from the introduction queue.
Similar to how our cell scheduler works, the onion service subsystem will poll the priority queue every 100ms tick and process the first 20 cells from the priority queue (if they exist). The service will perform the rendezvous and the rest of the onion service protocol as normal.
With this tempo, we can process 200 introduction cells per second. {XXX: Is this good?}
{TODO: PARAM_TUNING: STRAWMAN: This needs hella tuning. Processing 20 cells per 100ms is probably unmaintainable, since each cell is quite expensive: doing so involves path selection, crypto and making circuits. We will need to profile this procedure and see how we can do this scheduling better.}
{XXX: This might be a nice place to promote multithreading. Queues and pools are nice objects to do multithreading since you can have multiple threads pull from the queue, or leave stuff on the queue. Not sure if this should be in the proposal tho.}
I think we should only do multi-threading in v1 if it still fits in a single Tor release cycle. Otherwise slide it to a v1.5 or v2.
Onionbalance is a backstop option instead of multithreading. The traffic analysis characteristics are not as good for this case, so if it helps in practice, we should be sure to do multithreading ASAP (but still maybe not v1? It really depends on how deep a rabbit-hole it is for this case).
4. Attacker strategies [ATTACK_META]
Now that we defined our protocol we need to start tweaking the various knobs. But before we can do that, we first need to understand a few high-level attacker strategies to see what we are fighting against.
4.1.1. Total overwhelm strat
Given the way the introduction queue works (see [HANDLE_QUEUE]), a very effective strategy for the attacker is to totally overwhelm the queue processing by sending more high-effort introductions than the onion service can handle at any given tick.
To do so, the attacker would have to send at least 20 high-effort introduction cells every 100ms, where high-effort is a PoW which is above the estimated level of "the motivated user" (see [USER_MODEL]).
If the queue generates a libevent callback event when it has N entries, is this better? (Is such a callback hard to create?)
An easier attack for the adversary, is the same strategy but with introduction cells that are all above the comfortable level of "the standard user" (see [USER_MODEL]). This would block out all standard users and only allow motivated users to pass.
{XXX: What other attack strategies we should care about?}
5. Parameter tuning [POW_SECURITY]
There are various parameters in this system that need to be tuned.
We will first start by tuning the default difficulty of our PoW system. That's gonna define an expected time for attackers and clients to succeed.
We are then gonna tune the parameters of the argon2 hash function. That will define the resources that an attacker needs to spend to overwhelm the onion service, the resources that the service needs to spend to verify introduction requests, and the resources that legitimate clients need to spend to get to the onion service.
5.1. PoW Difficulty settings
The difficulty setting of our PoW basically dictates how difficult it should be to get a success in our PoW system. In classic PoW systems, "success" is defined as getting a hash output below the "target". However, since our system is dynamic, we define "success" as an abstract high-effort computation.
Even tho our system is dynamic, we still need default difficulty settings that will define the metagame. The client and attacker can still aim higher or lower, but for UX purposes and for analysis purposes we do need to define some difficulties.
We hence created the table (see [REF_TABLE]) below which shows how much time a legitimate client with a single machine should expect to burn before they get a single success. The x-axis is how many successes we want the attacker to be able to do per second: the more successes we allow the adversary, the more they can overwhelm our introduction queue. The y-axis is how many machines the adversary has at her disposal, ranging from just 5 to 1000.
  ===========================================================================
  |         Expected Time (in seconds) Per Success For One Machine          |
  ===========================================================================
  |                                                                         |
  |   Attacker Successes       1       5       10      20      30      50   |
  |       per second                                                        |
  |                                                                         |
  |            5               5       1       0       0       0       0    |
  |            50              50      10      5       2       1       1    |
  |            100             100     20      10      5       3       2    |
  | Attacker   200             200     40      20      10      6       4    |
  |  Boxes     300             300     60      30      15      10      6    |
  |            400             400     80      40      20      13      8    |
  |            500             500     100     50      25      16      10   |
  |            1000            1000    200     100     50      33      20   |
  |                                                                         |
  ===========================================================================
Here is how you can read the table above:
If an adversary has a botnet with 1000 boxes, and we want to limit her to 1 success per second, then a legitimate client with a single box should be expected to spend 1000 seconds getting a single success.
If an adversary has a botnet with 1000 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 200 seconds getting a single success.
If an adversary has a botnet with 500 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 100 seconds getting a single success.
If an adversary has access to 50 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 10 seconds getting a single success.
If an adversary has access to 5 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 1 second getting a single success.
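The readings above all follow the same arithmetic, which can be sketched in a few lines. This is only a sketch under an assumption: each table entry is taken to be the number of attacker boxes divided by the allowed attacker successes per second, rounded down. That matches every entry in the table, but the script in [REF_TABLE] may compute it differently. The function and constant names are made up for illustration.

```python
# Assumption: entry = floor(attacker_boxes / attacker_successes_per_second).
# Intuition: if one machine needs T seconds per success, N identical machines
# produce N / T successes per second, so capping the attacker at S successes
# per second means T = N / S.

BOXES = [5, 50, 100, 200, 300, 400, 500, 1000]   # rows of the table
RATES = [1, 5, 10, 20, 30, 50]                   # columns of the table


def expected_client_seconds(attacker_boxes: int, attacker_successes_per_sec: int) -> int:
    """Seconds a single-machine client should expect to spend per success."""
    return attacker_boxes // attacker_successes_per_sec


def build_table() -> dict:
    """Map each attacker box count to its row of expected times."""
    return {b: [expected_client_seconds(b, r) for r in RATES] for b in BOXES}


if __name__ == "__main__":
    for boxes, row in build_table().items():
        print(f"{boxes:>5} boxes: " + " ".join(f"{v:>5}" for v in row))
```

Note that this treats every attacker box as equal to the client's machine, which is the same simplification the table makes.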
With the above table we can create some profiles for default values of our PoW difficulty. So for example, we can use the last case as the default parameter for Tor Browser, and then create three more profiles for more expensive cases, scaling up to the first case, which would be the hardest since the client is expected to spend about 17 minutes (1000 seconds) on a single introduction.
{TODO: PARAM_TUNING You can see that this section is completely CPU/memory agnostic, and it does not take into account potential optimizations that can come from GPU/ASICs. This is intentional so that we don't put more variables into this equation right now, but as this proposal moves forward we will need to put more concrete values here.}
This is excellent analysis!
Do we know how many "successes per second" (ie: INTRO2+rend response+nginx) a typically spec'ed HS can serve? That would be a useful stat for comparison. Is 50/second unreasonable to expect to survive, on the typical service side?
Related: At what point do people need onionbalance, typically? And how far does that get you, in req/sec handling on a single machine? On multiple machines?
5.2. Argon2 parameters [ARGON_PARAMS]
We now need to define the secondary argon2 parameters as defined in [REF_ARGON2]. This includes the number of lanes 'h', the memory size 'm', and the number of iterations 't'. Section 9 of [REF_ARGON2] recommends an approach for tuning these parameters.
To tune these parameters we are looking to *minimize* the verification cost for an onion service, while *maximizing* the scarce resources spent by an adversary trying to overwhelm the service using [ATTACK_META].
When it comes to verification speed, to verify a single introduction cell the service needs to do a single argon2 call: so the service will need to do hundreds of those per second as INTRODUCE2 cells arrive. The service will have to do this verification step even for very cheap zero-effort PoW received, so this has to be a cheap procedure so that it doesn't become a DoS vector of its own. Hence each individual argon2 call must be cheap enough to be able to be done comfortably and plentifully by an onion service with a single host (or horizontally scaled with Onionbalance).
At the same time, the adversary will have to do thousands of these calls if she wants to make high-effort PoW, so it's this asymmetry that we are looking to exploit here. Right now, the most expensive resource for adversaries is the RAM size, and that's why we chose argon2 which is memory-hard.
To minmax this game we will need
{TODO: PARAM_TUNING: I've had a hard time minmaxing this game for argon2. Even argon2 invocations with a small memory parameter will take multiple milliseconds to run on my machine, and the parameters recommended in section 8 of the paper all take many hundreds of milliseconds. This is just not practical for our use case, since we want to process hundreds of such PoW per second... I also did not manage to find a benchmark of argon2 calls for different CPU/GPU/FPGA configurations.}
TODO: We should write something similar for RandomX.
- Client behavior [CLIENT_BEHAVIOR]
This proposal introduces a bunch of new ways in which a legitimate client can fail to reach the onion service.
Furthermore, there is currently no end-to-end way for the onion service to inform the client that the introduction failed. The INTRO_ACK cell is not end-to-end (it's from the introduction point to the client) and hence it does not allow the service to inform the client that the rendezvous is never gonna occur.
Let's examine a few such cases:
5.1. Timeout issues
Alice can fail to reach the onion service if her introduction request falls off the priority queue, or if the priority queue is so big that the connection times out.
Is building a new introduction circuit sufficient here? Or do we need to build an end-to-end mechanism over the introduction circuit to inform her? {XXX}
How should timeout values change here since the priority queue will cause bigger delays than usual to rendezvous? Can there be some feedback mechanism to inform the client of its queue position or ETA?
Clients could estimate this time based on the published descriptor difficulty (ie: lowest-needed-to-service), and how long such a difficulty takes on their platform. They could record their own history for stats and UX reporting.
5.2. Seed expiration issues
As mentioned in [DESC_POW], the expiration timestamp on the PoW seed can cause issues with clock skewed clients. Furthermore, even clients without clock skew can encounter TOCTOU-style race conditions here.
How should this be handled? Should we have multiple active seeds at the same time similar to how we have overlapping descriptors and time periods in v3? This would solve the problem but it grows the complexity of the system substantially. {XXX}
5.3. Other descriptor issues
Another race condition here is if the service enables PoW, while a client has a cached descriptor. How will the client notice that PoW is needed? Does it need to fetch a new descriptor? Should there be another feedback mechanism? {XXX}
- Discussion
5.1. UX
This proposal has user facing UX consequences. Here are a few UX approaches with increasing engineering difficulty:
a) Tor Browser needs a "range field" which the user can use to specify how much effort they want to spend in PoW if this ever occurs while they are browsing. The ranges could be from "Easy" to "Difficult", or we could try to estimate time using an average computer. This setting is in the Tor Browser settings and users need to find it.
If clients can estimate based on the difficulty, this could be a notice instead of a config option: "This site will take about X seconds to access, as it is under attack. Please be patient, or give up." There is no need to config anything. You decide to give up on a site-by-site and visit-by-visit basis, depending on how important that site is to you at that time.
b) We start with a default effort setting, and then we use the new onion errors (see #19251) to estimate when an onion service connection has failed because of DoS, and only then we present the user a "range field" which they can set dynamically. Detecting when an onion service connection has failed because of DoS can be hard because of the lack of feedback (see [CLIENT_BEHAVIOR]).
c) We start with a default effort setting, and if things fail we automatically try to figure out an effort setting that will work for the user by doing some trial-and-error connections with different effort values. Until the connection succeeds we present a "Service is overwhelmed, please wait" message to the user.
For this proposal to work initially we need at least (a), and then we can start thinking of how far we want to take it.
5.2. Future directions [FUTURE_WORK]
This is just the beginning in DoS defences for Tor and there are various future avenues that we can investigate. Here is a brief summary of these:
"More advanced PoW schemes" -- We could use more advanced memory-hard PoW schemes like MTP-argon2 or Itsuku to make it even harder for adversaries to create successful PoWs. Unfortunately these schemes have much bigger proof sizes, and they won't fit in INTRODUCE1 cells. See #31223 for more details.
"Third-party anonymous credentials" -- We can use anonymous credentials and a third-party token issuance server on the clearnet to issue tokens based on PoW or CAPTCHA and then use those tokens to get access to the service. See [REF_CREDS] for more details.
"PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas where we present a hard puzzle to the user when connecting to the onion service, and if they solve it we then give the user a bunch of anonymous tokens that can be used in the future. This can all happen between the client and the service without a need for a third party.
All of the above approaches are much more complicated than this proposal, and hence we want to start easy before we get into more serious projects.
5.3. Environment
We love the environment! We are concerned about how PoW schemes can waste energy by doing useless hash iterations. Here are a few reasons we still decided to pursue a PoW approach here:
"We are not making things worse" -- DoS attacks are already happening and attackers are already burning energy to carry them out on the attacker side, on the service side and on the network side. We think that asking legitimate clients to carry out PoW computations is not gonna affect the equation too much, since an attacker right now can very quickly cause the same damage that hundreds of legitimate clients do in a whole day.
"We hope to make things better" -- The hope is that proposals like this will make the DoS actors go away and hence the PoW system will not be used. As long as DoS is happening there will be a waste of energy, but if we manage to demotivate them with technical means, the network as a whole will be less wasteful. Also see [CATCH22] for a similar argument.
- References
[REF_ARGON2]: https://password-hashing.net/#argon2
[REF_TABLE]: The table is based on the script below plus some manual editing for readability: https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
[REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01...
[REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
[REF_TARGET]: https://en.bitcoin.it/wiki/Target
Hello again,
many thanks for all the thoughtful feedback!
At the end of this email I inline a new version of the proposal addressing various issues discussed over IRC and on this thread. Here is a rough changelog:
- Specifying some features we might want from "v1.5".
- Adding suggested-effort to the descriptor.
- Specifying the effort() function.
- Specifying the format of the expiration time.
- Adding a protocol-specific label to the PoW computation.
- Removing the seed and output values from the INTRODUCE1 cell.
- Specifying what happens when a client does not send a PoW token when PoW is enabled.
- Revamping the UX section.
- Added Mike and David in the authors list.
I'm also pushing the spec to my git repo so that you can see a diff: https://github.com/asn-d6/torspec/tree/pow-over-intro
Now, before going into the proposal, here are the three big topics currently under discussion in the thread:
== How the scheduler should work ==
I'm not gonna touch on this, since David is writing an initial draft of a scheduler design soon, so let's wait for that email before we discuss this further.
== Should there be a target difficulty on the descriptor? ==
I have made changes in the proposal to this effect. See sections [EFFORT_ESTIMATION] and [CLIENT_TIMEOUT] for more information.
While there is no hard-target difficulty, the descriptor now contains a suggested difficulty that clients should aim at. The service will still add requests with lower effort than the suggested one in the priority queue. That's to make the system more resilient to attacks in cases where the client cannot get the latest descriptor (and hence latest suggested effort) due to the descriptor upload/fetch rate-limiting restrictions in place.
== Which PoW function should we use? ==
The proposal suggests argon2, and Mike has been looking at RandomX. However, after further consideration and speaking with some people (props to Alex Biryukov), it seems like those two functions are not well fitted for this purpose, since they are memory-hard both for the client and the service. And since we are trying to minimize the verification overhead, so that the service can do hundreds of verifications per second, they don't seem like good fits.
In particular, slimming down argon2 to the point that services can do hundreds of those verifications per second results in an argon2 without any memory-hardness. And RandomX is even heavier, since it uses argon2 under the hood and also does extra stuff. In particular, from some preliminary computations, it seems like the top-half of the cell processing takes about 2ms, whereas RandomX takes at least 17ms on my computer, which means that it adds roughly 850% overhead (17ms on top of 2ms) to the top-half processing of a single introduction.
This means that asymmetric PoW schemes like Equihash and family are what we should be looking at next. These schemes aim to have small proof sizes, and be memory-hard for the prover, but lightweight for the verifier. They are currently used by Zcash so there is quite some literature and improvements.
In particular, Equihash has two important parameters (n,k). These parameters together control the proof size (so for example, Equihash<144,5> has a 100B proof, and Equihash<200,9> has a 1344B proof), and the 'k' parameter controls the verification speed (the verifier has to do 2^k hash invocations to do the verification). Also see this for more details:
   https://forum.bitcoingold.org/t/our-new-equihash-equihash-btg/1512
   https://www.cryptolux.org/images/b/b9/Equihash.pdf
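To make the effect of (n,k) concrete, here is a small sketch that reproduces the two proof sizes quoted above. It assumes the usual Equihash solution encoding of 2^k indices of n/(k+1)+1 bits each; the function names are made up for illustration and nothing here is tied to a particular Equihash library.

```python
def equihash_proof_bytes(n: int, k: int) -> int:
    """Proof size in bytes, assuming 2^k indices of (n/(k+1) + 1) bits each."""
    if n % (k + 1) != 0:
        raise ValueError("n must be divisible by k + 1")
    bits = (2 ** k) * (n // (k + 1) + 1)
    return (bits + 7) // 8  # round up to whole bytes


def equihash_verifier_hashes(k: int) -> int:
    """Hash invocations the verifier performs, per the 2^k figure in the text."""
    return 2 ** k


if __name__ == "__main__":
    for n, k in [(144, 5), (200, 9)]:
        print(f"Equihash<{n},{k}>: {equihash_proof_bytes(n, k)} byte proof, "
              f"{equihash_verifier_hashes(k)} verifier hashes")
```

The tradeoff is visible right away: a smaller k gives a smaller proof and a cheaper verification, which is what we care about for fitting into an INTRODUCE1 cell.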
The good thing here is that these parameters look good and offer good security. Furthermore, Equihash is used by big and friendly projects like Zcash.
The negative thing is that because Equihash is widely used there is already ASIC hardware for it, so we would need to look at the parameters we pick and how ASIC-friendly they are. Furthermore, an attacker who buys an Equihash ASIC can also use it for coin mining which makes it an easier investment.
IMO, we should look more into Equihash and other asymmetric types of PoW, as well as speak with people who know Equihash well.
Finally, our proposal has a big benefit over the blockchain use cases: it's much more agile. We can deploy changes to the PoW algorithm without having to hard-fork, and we can do this even through the consensus for maximum agility. This means that we should try to use this agility to our advantage.
Looking forward to more feedback!
=============
And here comes the updated proposal:
Filename: xxx-pow-over-intro-v1
Title: A First Take at PoW Over Introduction Circuits
Author: George Kadianakis, Mike Perry, David Goulet
Created: 2 April 2020
Status: Draft
0. Abstract
This proposal aims to thwart introduction flooding DoS attacks by introducing a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
1. Motivation
So far our attempts at limiting the impact of introduction flooding DoS attacks on onion services have been focused on horizontal scaling with Onionbalance, optimizing the CPU usage of Tor and applying congestion control using rate limiting. While these measures move the goalpost forward, a core problem with onion service DoS is that building rendezvous circuits is a costly procedure both for the service and for the network. For more information on the limitations of rate-limiting when defending against DDoS, see [REF_TLS_1].
If we ever hope to have truly reachable global onion services, we need to make it harder for attackers to overload the service with introduction requests. This proposal achieves this by allowing onion services to specify an optional dynamic proof-of-work scheme that its clients need to participate in if they want to get served.
With the right parameters, this proof-of-work scheme acts as a gatekeeper to block amplification attacks by attackers while letting legitimate clients through.
1.1. Related work
For a similar concept, see the three internet drafts that have been proposed for defending against TLS-based DDoS attacks using client puzzles [REF_TLS].
1.2. Threat model [THREAT_MODEL]
1.2.1. Attacker profiles [ATTACKER_MODEL]
This proposal is written to thwart specific attackers. A simple PoW proposal cannot defend against each and every DoS attack on the Internet, but there are adversary models we can defend against.
Let's start with some adversary profiles:
"The script-kiddie"
The script-kiddie has a single computer and pushes it to its limits. Perhaps they also have a VPS and a pwned server. We are talking about an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We consider the total cost for this attacker to be zero $.
"The small botnet"
The small botnet is a bunch of computers lined up to do an introduction flooding attack. Assuming 500 medium-range computers, we are talking about an attacker with total access to 10 THz of CPU and 10 TB of RAM. We consider the upfront cost for this attacker to be about $400.
"The large botnet"
The large botnet is a serious operation with many thousands of computers organized to do this attack. Assuming 100k medium-range computers, we are talking about an attacker with total access to 200 THz of CPU and 200 TB of RAM. The upfront cost for this attacker is about $36k.
We hope that this proposal can help us defend against the script-kiddie attacker and small botnets. To defend against a large botnet we would need more tools at our disposal (see [FUTURE_DESIGNS]).
{XXX: Do the above make sense? What other attackers do we care about? What other metrics do we care about? Network speed? I got the botnet costs from here [REF_BOTNET]. Back up our claims of defence.}
1.2.2. User profiles [USER_MODEL]
We have attackers and we have users. Here are a few user profiles:
"The standard web user"
This is a standard laptop/desktop user who is trying to browse the web. They don't know how these defences work and they don't care to configure or tweak them. They are gonna use the default values and if the site doesn't load, they are gonna close their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
"The motivated user"
This is a user that really wants to reach their destination. They don't care about the journey; they just want to get there. They know what's going on; they are willing to tweak the default values and make their computer do expensive multi-minute PoW computations to get where they want to be.
"The mobile user"
This is a motivated user on a mobile phone. Even tho they want to read the news article, they don't have much leeway on stressing their machine to do more computation.
We hope that this proposal will allow the motivated user to always connect where they want to connect to, and also give more chances to the other user groups to reach the destination.
1.2.3. The DoS Catch-22 [CATCH22]
This proposal is not perfect and it does not cover all the use cases. Still, we think that by covering some use cases and giving reachability to the people who really need it, we will severely demotivate the attackers from continuing the DoS attacks and hence stop the DoS threat altogether. Furthermore, by increasing the cost to launch a DoS attack, a big class of DoS attackers will disappear from the map, since the expected ROI will decrease.
2. System Overview
2.1. Tor protocol overview
                                          +----------------------------------+
                                          |                                  |
   +-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
   |Client |-------->|Intro Point|------->|  PoW   |-----------+             |
   +-------+         +-----------+        |Verifier|           |             |
                                          +--------+           |             |
                                          |                    |             |
                                          |                    |             |
                                          |         +----------v---------+   |
                                          |         |Intro Priority Queue|   |
                                          +---------+--------------------+---+
                                                           |  |  |
                                                Rendezvous |  |  |
                                                  circuits |  |  |
                                                           v  v  v
The proof-of-work scheme specified in this proposal takes place during the introduction phase of the onion service protocol.
The system described in this proposal is not meant to be on all the time, and should only be enabled by services when under duress. The percentage of clients receiving puzzles can also be configured based on the load of the service.
In summary, the following steps are taken for the protocol to complete:
  1) Service encodes PoW parameters in descriptor [DESC_POW]
  2) Client fetches descriptor and computes PoW [CLIENT_POW]
  3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
  4) Service verifies PoW and queues introduction based on PoW effort [SERVICE_VERIFY]
2.2. Proof-of-work overview
2.2.1. Primitives
For our proof-of-work scheme we want to minimize the spread of resources between a motivated attacker and legitimate clients. This means that we are looking to minimize any benefits that GPUs or ASICs can offer to an attacker.
For this reason we chose argon2 [REF_ARGON2] as the hash function for our proof-of-work scheme since it's well audited and GPU-resistant and to some extent ASIC-resistant as well.
As a password hash function, argon2 by default outputs 32 bytes of hash, and takes as primary input a message and a nonce/salt. For the purposes of this specification we will define an argon2() function as:
   uint8_t hash_output[32] = argon2(uint8_t *message, uint8_t *nonce)
See section [ARGON_PARAMS] for more information on the secondary inputs of argon2.
2.2.2. Dynamic PoW
DoS is a dynamic problem where the attacker's capabilities constantly change, and hence we want our proof-of-work system to be dynamic and not stuck with a static difficulty setting. Hence, instead of forcing clients to go below a static target like in Bitcoin to be successful, we ask clients to "bid" using their PoW effort. Effectively, a client gets higher priority the higher effort they put into their proof-of-work. This is similar to how proof-of-stake works but instead of staking coins, you stake work.
The benefit here is that legitimate clients who really care about getting access can spend a big amount of effort into their PoW computation, which should guarantee access to the service given reasonable adversary models. See [PARAM_TUNING] for more details about these guarantees and tradeoffs.
As a way to improve reachability and UX, the service tries to estimate the effort needed for clients to get access at any given time and places it in the descriptor. See [EFFORT_ESTIMATION] for more details.
2.2.3. PoW effort
For our dynamic PoW system to work, we will need to be able to compare PoW tokens with each other. To do so we define a function: unsigned effort(uint8_t *token) which takes as its argument a hash output token, and returns the number of leading zero bits on it.
So for example effort(0000000110001010110100101) == 7.
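As a quick illustration, the effort() function can be sketched in Python as below. This is just a sketch of the definition above (count the leading zero bits of the token), taking the token as a bytes object; the byte-oriented input is an assumption of this sketch.

```python
def effort(token: bytes) -> int:
    """Return the number of leading zero bits in the hash output token."""
    count = 0
    for byte in token:
        if byte == 0:
            # A fully zero byte contributes 8 leading zero bits; keep going.
            count += 8
            continue
        # First nonzero byte: its leading zeros are 8 minus its bit length.
        count += 8 - byte.bit_length()
        break
    return count


if __name__ == "__main__":
    # The bits 00000001 10001010 11010010 from the example above.
    print(effort(bytes([0x01, 0x8A, 0xD2])))
```

Since each extra leading zero bit halves the chance of a random hash qualifying, an effort of E corresponds to roughly 2^E hash attempts on average.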
3. Protocol specification
3.1. Service encodes PoW parameters in descriptor [DESC_POW]
This whole protocol starts with the service encoding the PoW parameters in the 'encrypted' (inner) part of the v3 descriptor, as follows:
"pow-params" SP type SP seed-b64 SP suggested-effort SP expiration-time NL
[At most once]
type: The type of PoW system used. We call the one specified here "v1"
seed-b64: A random seed that should be used as the input to the PoW hash function. Should be 32 random bytes encoded in base64 without trailing padding.
suggested-effort: An unsigned integer specifying an effort value that clients should aim for when contacting the service. See [EFFORT_ESTIMATION] for more details here.
expiration-time: A timestamp in "YYYY-MM-DD SP HH:MM:SS" format after which the above seed expires and is no longer valid as the input for PoW. It's needed so that the size of our replay cache does not grow infinitely. It should be set to three hours in the future (+- some randomness). {TODO: PARAM_TUNING}
{XXX: Expiration time makes us even more susceptible to clock skews, but it's needed so that our replay cache refreshes. How to fix this? See [CLIENT_BEHAVIOR] for more details.}
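To make the descriptor line concrete, here is a rough sketch of how it could be encoded and parsed. It assumes the fields appear in the order they are described above (type, seed-b64, suggested-effort, expiration-time) and that timestamps are UTC; both are assumptions of this sketch. The function names are hypothetical and are not part of any Tor codebase.

```python
import base64
from datetime import datetime

TIME_FMT = "%Y-%m-%d %H:%M:%S"  # "YYYY-MM-DD SP HH:MM:SS"


def encode_pow_params(seed: bytes, suggested_effort: int, expiration: datetime) -> str:
    """Build a "pow-params" line for PoW type v1 (field order is an assumption)."""
    if len(seed) != 32:
        raise ValueError("seed must be 32 bytes")
    # Base64 without trailing padding, as the seed-b64 field requires.
    seed_b64 = base64.b64encode(seed).decode("ascii").rstrip("=")
    return "pow-params v1 {} {} {}\n".format(
        seed_b64, suggested_effort, expiration.strftime(TIME_FMT))


def parse_pow_params(line: str):
    """Parse a "pow-params" line into (seed, suggested_effort, expiration)."""
    parts = line.rstrip("\n").split(" ")
    # The expiration time contains a space, so we expect six tokens in total.
    if len(parts) != 6 or parts[0] != "pow-params" or parts[1] != "v1":
        raise ValueError("malformed pow-params line")
    _, _, seed_b64, effort_str, date_str, time_str = parts
    # Restore the stripped base64 padding before decoding.
    seed = base64.b64decode(seed_b64 + "=" * (-len(seed_b64) % 4))
    if len(seed) != 32:
        raise ValueError("seed must decode to 32 bytes")
    expiration = datetime.strptime(date_str + " " + time_str, TIME_FMT)
    return seed, int(effort_str), expiration
```

A client would compare the parsed expiration against its own clock before starting any PoW computation, which is exactly where the clock skew concern above comes in.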
3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
If a client receives a descriptor with "pow-params", it should assume that the service is expecting a PoW input as part of the introduction protocol.
The client parses the descriptor and extracts the PoW parameters. It makes sure that the <expiration-time> has not expired and if it has, it needs to fetch a new descriptor.
The client should then extract the <suggested-effort> field to configure its PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values that will cause an infinite PoW computation. {XXX: How to enforce this?}
To complete the PoW the client follows the following logic:
  a) Client generates 'nonce' as 32 random bytes.
  b) Client derives 'seed' by decoding 'seed-b64'.
  c) Client derives 'labeled_seed = seed + "TorV1PoW"'
  d) Client computes hash_output = argon2(labeled_seed, nonce)
  e) Client checks if effort(hash_output) >= target.
    e1) If yes, success! The client uses 'hash_output' as the puzzle solution and 'nonce' and 'seed' as its inputs.
    e2) If no, fail! The client interprets 'nonce' as a big-endian integer, increments it by one, and goes back to step (d).
At the end of the above procedure, the client should have a triplet (hash_output, seed, nonce) that can be used as the answer to the PoW puzzle. How quickly this happens depends solely on the 'target' parameter.
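The client loop above can be sketched as follows. Big caveat: this sketch uses BLAKE2b from Python's hashlib as a STAND-IN for argon2(), purely so that the example runs with the standard library; it has none of argon2's memory-hardness and is not what the proposal specifies. The names pow_hash() and solve() are hypothetical.

```python
import hashlib
import os

LABEL = b"TorV1PoW"


def pow_hash(message: bytes, nonce: bytes) -> bytes:
    """STAND-IN for argon2(message, nonce): a 32-byte BLAKE2b digest."""
    return hashlib.blake2b(message + nonce, digest_size=32).digest()


def effort(token: bytes) -> int:
    """Number of leading zero bits in the token."""
    count = 0
    for byte in token:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()
        break
    return count


def solve(seed: bytes, target: int):
    """Search for a nonce whose hash reaches 'target'; returns (hash_output, nonce)."""
    labeled_seed = seed + LABEL                          # step (c)
    nonce = int.from_bytes(os.urandom(32), "big")        # step (a)
    while True:
        nonce_bytes = nonce.to_bytes(32, "big")
        hash_output = pow_hash(labeled_seed, nonce_bytes)  # step (d)
        if effort(hash_output) >= target:                # step (e1)
            return hash_output, nonce_bytes
        # Step (e2): increment the nonce as a big-endian integer,
        # wrapping around so it always fits in 32 bytes.
        nonce = (nonce + 1) % (1 << 256)
```

With a target of T the loop needs about 2^T iterations on average, so the running time is governed by the target and by how expensive a single hash call is.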
3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
Now that the client has an answer to the puzzle it's time to encode it into an INTRODUCE1 cell. To do so the client adds an extension to the encrypted portion of the INTRODUCE1 cell by using the EXTENSIONS field (see [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the INTRODUCE1 cell only gets read by the onion service and is ignored by the introduction point.
We propose a new EXT_FIELD_TYPE value:
[01] -- PROOF_OF_WORK
The EXT_FIELD content format is:
   POW_VERSION    [1 byte]
   POW_NONCE      [32 bytes]
where:
   POW_VERSION is 1 for the protocol specified in this proposal
   POW_NONCE is 'nonce' from the section above
This will increase the INTRODUCE1 payload size by 35 bytes: the extension type and length are 2 extra bytes, and the EXT_FIELD is 33 bytes (1 byte of POW_VERSION plus 32 bytes of POW_NONCE). The N_EXTENSIONS field is always present and currently set to 0, so it adds no extra bytes. According to ticket #33650, INTRODUCE1 cells currently have more than 200 bytes available.
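For illustration, here is a sketch of how the extension could be serialized. It assumes the generic extension layout of a 1-byte EXT_FIELD_TYPE followed by a 1-byte EXT_FIELD_LEN and then the EXT_FIELD itself; the helper names are hypothetical and this is not taken from the tor source.

```python
import struct

EXT_FIELD_TYPE_PROOF_OF_WORK = 0x01
POW_VERSION = 1
POW_NONCE_LEN = 32


def encode_pow_extension(nonce: bytes) -> bytes:
    """Serialize type, length and EXT_FIELD (POW_VERSION || POW_NONCE)."""
    if len(nonce) != POW_NONCE_LEN:
        raise ValueError("nonce must be 32 bytes")
    ext_field = bytes([POW_VERSION]) + nonce
    header = struct.pack("!BB", EXT_FIELD_TYPE_PROOF_OF_WORK, len(ext_field))
    return header + ext_field


def decode_pow_extension(data: bytes) -> bytes:
    """Parse an encoded PROOF_OF_WORK extension and return the nonce."""
    if len(data) < 2:
        raise ValueError("extension too short")
    ext_type, ext_len = struct.unpack("!BB", data[:2])
    ext_field = data[2:2 + ext_len]
    if ext_type != EXT_FIELD_TYPE_PROOF_OF_WORK or len(ext_field) != ext_len:
        raise ValueError("not a well-formed PROOF_OF_WORK extension")
    if ext_len != 1 + POW_NONCE_LEN or ext_field[0] != POW_VERSION:
        raise ValueError("unsupported PoW version or length")
    return ext_field[1:]
```

Counting the fields listed above, the encoded extension comes out at 1 + 1 + 1 + 32 bytes.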
3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it should check its configuration on whether proof-of-work is required to complete the introduction. If it's not required, the extension SHOULD BE ignored. If it is required, the service follows the procedure detailed in this section.
If the service requires the PROOF_OF_WORK extension but received an INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD consider this cell as a zero-effort introduction for the purposes of the priority queue (see section [INTRO_QUEUE]).
3.4.1. PoW verification [POW_VERIFY]
To verify the client's proof-of-work the service extracts (hash_output, seed, nonce) from the INTRODUCE1 cell and MUST do the following steps:
  1) Make sure that the client's seed is identical to the active seed.
  2) Check the client's nonce for replays (see [REPLAY_PROTECTION] section).
  3) Verify that 'hash_output =?= argon2(labeled_seed, nonce)', where 'labeled_seed' is derived as in [CLIENT_POW].
If any of these steps fail the service MUST ignore this introduction request and abort the protocol.
If all the steps passed, then the circuit is added to the introduction queue as detailed in section [INTRO_QUEUE].
3.4.1.1. Replay protection [REPLAY_PROTECTION]
The service MUST NOT accept introduction requests with the same (seed, nonce) tuple. For this reason a replay protection mechanism must be employed.
The simplest way is to use a simple hash table to check whether a (seed, nonce) tuple has been used before for the active duration of a seed. Depending on how long a seed stays active this might be a viable solution with reasonable memory/time overhead.
If there is a worry that we might get too many introductions during the lifetime of a seed, we can use a Bloom filter as our replay cache mechanism. The probabilistic nature of Bloom filters means that sometimes we will flag some connections as replays even if they are not; with this false positive probability increasing as the number of entries increase. However, with the right parameter tuning this probability should be negligible and well handled by clients. {TODO: PARAM_TUNING}
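The hash table variant can be sketched as below, using a Python set as the hash table. The class and method names are hypothetical; the sketch assumes a single active seed at a time, so rotating the seed simply empties the cache. The Bloom filter variant is not shown.

```python
class ReplayCache:
    """Remembers the (seed, nonce) tuples seen during the active seed's lifetime."""

    def __init__(self, active_seed: bytes):
        self.active_seed = active_seed
        self._seen = set()

    def rotate_seed(self, new_seed: bytes) -> None:
        """Switch to a new seed; entries for the old seed are no longer needed."""
        self.active_seed = new_seed
        self._seen.clear()

    def check_and_add(self, seed: bytes, nonce: bytes) -> bool:
        """Return True and record the tuple if it is fresh, False otherwise."""
        if seed != self.active_seed:
            return False  # stale or unknown seed
        key = (seed, nonce)
        if key in self._seen:
            return False  # replay
        self._seen.add(key)
        return True
```

This is also why the seed needs an expiration time: the set only grows while a seed is active, and rotation is the one point where it shrinks.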
3.4.2. The Introduction Queue [INTRO_QUEUE]
3.4.2.1. Adding introductions to the introduction queue [ADD_QUEUE]
When PoW is enabled and a verified introduction comes through, the service, instead of jumping straight into rendezvous, queues it and prioritizes it based on how much effort was devoted by the client to PoW. This means that introduction requests with high effort should be prioritized over those with low effort.
To do so, the service maintains an "introduction priority queue" data structure. Each element in that priority queue is an introduction request, and its priority is the effort put into its PoW:
When a verified introduction comes through, the service uses the effort() function with hash_output as its input, and uses the output to place requests into the right position of the priority_queue: The bigger the effort, the more priority it gets in the queue. If two elements have the same effort, the older one has priority over the newer one.
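The ordering rule above (higher effort first, older first on ties) can be sketched with Python's heapq module. The IntroQueue name and its methods are hypothetical; since heapq is a min-heap, the sketch stores the negated effort and breaks ties with an arrival counter.

```python
import heapq
import itertools


class IntroQueue:
    """Priority queue of introduction requests ordered by PoW effort."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing arrival order

    def push(self, effort: int, request) -> None:
        # Negate the effort so the highest effort pops first; the arrival
        # counter makes the older of two equal-effort requests win.
        heapq.heappush(self._heap, (-effort, next(self._arrival), request))

    def pop(self):
        """Remove and return (effort, request) for the highest-priority entry."""
        neg_effort, _, request = heapq.heappop(self._heap)
        return -neg_effort, request

    def __len__(self) -> int:
        return len(self._heap)
```

For example, after pushing requests with efforts 5, 9, 5 and 0, pop() returns the effort-9 request first, then the two effort-5 requests in arrival order, then the zero-effort one.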
{TODO: PARAM_TUNING: If the priority queue is only ordered based on the effort what attacks can happen in various scenarios? Do we want to order on time+effort? Which scenarios and attackers should we examine here?}
3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
The service should handle introductions by pulling from the introduction queue.
Similar to how our cell scheduler works, the onion service subsystem will poll the priority queue every 100ms tick and process the first 20 cells from the priority queue (if they exist). The service will perform the rendezvous and the rest of the onion service protocol as normal.
With this tempo, we can process 200 introduction cells per second. {XXX: Is this good?}
After the introduction request is handled from the queue, the service trims the priority queue if the queue is too big. {TODO: PARAM_TUNING: What's the max size of the queue? How do we trim it? Can we use WRED usefully?}
{TODO: PARAM_TUNING: STRAWMAN: This needs hella tuning. Processing 20 cells per 100ms is probably unsustainable, since each cell is quite expensive: doing so involves path selection, crypto and making circuits. We will need to profile this procedure and see how we can do this scheduling better.}
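The strawman tick handling above could look roughly like the sketch below. The 20 cells and 100ms tick come from the text; MAX_QUEUE_SIZE and the choice to trim the lowest-effort tail are assumptions of this sketch, since the proposal leaves both open. The function names are hypothetical.

```python
import bisect
import itertools

CELLS_PER_TICK = 20      # from the text above
TICK_SECONDS = 0.1       # 100ms tick, from the text above
MAX_QUEUE_SIZE = 1000    # hypothetical; the proposal leaves this open

_arrival = itertools.count()


def enqueue(queue, effort, request):
    """Insert keeping the list sorted: highest effort first, oldest first on ties."""
    bisect.insort(queue, (-effort, next(_arrival), request))


def handle_tick(queue, do_rendezvous):
    """Process up to CELLS_PER_TICK entries, then trim the low-effort tail.

    Returns the list of (effort, request) entries dropped by the trim.
    """
    batch, queue[:] = queue[:CELLS_PER_TICK], queue[CELLS_PER_TICK:]
    for _neg_effort, _seq, request in batch:
        do_rendezvous(request)
    trimmed, queue[:] = queue[MAX_QUEUE_SIZE:], queue[:MAX_QUEUE_SIZE]
    return [(-neg_effort, request) for neg_effort, _seq, request in trimmed]
```

With these constants the upper bound is CELLS_PER_TICK / TICK_SECONDS = 200 introductions per second, matching the figure above.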
3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
During its operation the service continuously keeps track of the received PoW cell efforts to inform its clients of the effort they should put in their introduction to get service. The service informs the clients by using the <suggested-effort> field in the descriptor.
In particular, the service starts with a default suggested-effort value of 15.
Every time the service handles an introduction request from the priority queue in [HANDLE_QUEUE], the service compares the request's effort to the current suggested-effort value. If the new request's effort is lower than the suggested-effort, set the suggested-effort equal to the effort of the new request.
Every time the service trims the priority queue in [HANDLE_QUEUE], the service compares the request at the trim point against the current suggested-effort value. If the trimmed request's effort is higher than the suggested-effort, set the suggested-effort equal to the effort of the trimmed request.
The above two operations are meant to balance the suggested effort based on the requests currently waiting in the priority queue. If the priority queue is filled with high-effort requests, make the suggested effort higher. And when all the high-effort requests get handled and the priority queue is back to normal operation, relax the suggested effort to lower levels.
The suggested-effort is not a hard limit to the efforts that are accepted by the service, and it's only meant to serve as a guideline for clients to reduce the number of unsuccessful requests that get to the service. The service still adds requests with lower effort than suggested-effort to the priority queue in [ADD_QUEUE].
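The two update rules of this section can be sketched as below. The class and method names are hypothetical; only the default value of 15 and the two comparisons come from the text.

```python
DEFAULT_SUGGESTED_EFFORT = 15  # default from the text above


class EffortEstimator:
    """Tracks the suggested-effort value advertised in the descriptor."""

    def __init__(self):
        self.suggested_effort = DEFAULT_SUGGESTED_EFFORT

    def on_request_handled(self, effort):
        """A request was served: relax the suggestion if it got in with less."""
        if effort < self.suggested_effort:
            self.suggested_effort = effort

    def on_request_trimmed(self, effort):
        """A request was trimmed: raise the suggestion if even this was not enough."""
        if effort > self.suggested_effort:
            self.suggested_effort = effort
```

Note that neither rule rejects anything: the estimator only changes what is advertised, and lower-effort requests are still queued as described in [ADD_QUEUE].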
{XXX: What attacks are possible here?}
3.4.3.1. Updating descriptor with new suggested effort
When a service changes its suggested-effort value, it SHOULD upload a new descriptor with the new value.
The service should avoid uploading descriptors too often to avoid overwhelming the HSDirs. The service SHOULD NOT upload descriptors more often than 'hs-pow-desc-upload-rate-limit' seconds (which is controlled through a consensus parameter and has a default value of 300 seconds).
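One way to respect this rate limit is to remember only the latest suggested-effort and upload it once enough time has passed. The sketch below is an illustration under assumptions: the class name, the upload callback and the injectable clock are hypothetical.

```python
import time

HS_POW_DESC_UPLOAD_RATE_LIMIT = 300  # seconds; default from the text above


class DescriptorUploader:
    def __init__(self, upload, clock=time.monotonic):
        self._upload = upload          # hypothetical callback publishing a descriptor
        self._clock = clock
        self._last_upload = None
        self._pending_effort = None

    def effort_changed(self, suggested_effort):
        """Remember the newest value and upload it if the rate limit allows."""
        self._pending_effort = suggested_effort
        self.maybe_upload()

    def maybe_upload(self):
        """Upload the pending value unless we uploaded too recently.

        Meant to be called periodically as well; returns True on upload.
        """
        if self._pending_effort is None:
            return False
        now = self._clock()
        if (self._last_upload is not None
                and now - self._last_upload < HS_POW_DESC_UPLOAD_RATE_LIMIT):
            return False
        self._upload(self._pending_effort)
        self._last_upload = now
        self._pending_effort = None
        return True
```

Intermediate values that change during the rate-limit window are simply overwritten, so only the most recent suggested-effort reaches the HSDirs.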
{XXX: Is this too often? Or too rare? Perhaps we can set different limits for when the difficulty goes up and different for when it goes down. It's more important to update the descriptor when the difficulty goes up.}
{XXX: What attacks are possible here? Can the attacker intentionally hit this rate-limit and then influence the suggested effort so that clients do not learn about the new effort? The service will still accept efforts lower than the suggested effort so the attack is not so serious, but it still can be a problem.}
4. Client behavior [CLIENT_BEHAVIOR]
This proposal introduces a bunch of new ways where a legitimate client can fail to reach the onion service.
Furthermore, there is currently no end-to-end way for the onion service to inform the client that the introduction failed. The INTRO_ACK cell is not end-to-end (it's from the introduction point to the client) and hence it does not allow the service to inform the client that the rendezvous is never gonna occur.
For this reason we need to define some client behaviors to work around these issues.
4.1. Clients handling timeouts [CLIENT_TIMEOUT]
Alice can fail to reach the onion service if her introduction request gets trimmed off the priority queue in [HANDLE_QUEUE], or if the service does not get through its priority queue in time and the connection times out.
{XXX: How should timeout values change here since the priority queue will cause bigger delays than usual to rendezvous?}
This section presents a heuristic method for the client getting service even in such scenarios.
If the rendezvous request times out, the client SHOULD fetch a new descriptor for the service to make sure that it's using the right suggested-effort for the PoW and the right PoW seed. The client SHOULD NOT fetch service descriptors more often than every 'hs-pow-desc-fetch-rate-limit' seconds (which is controlled through a consensus parameter and has a default value of 600 seconds).
{XXX: Is this too rare? Too often?}
When the client fetches a new descriptor, it should try connecting to the service with the new suggested-effort and PoW seed. If that doesn't work, it should double the effort and retry. The client should keep on doubling-and-retrying until it manages to get service, or it is able to fetch a new descriptor again.
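The doubling-and-retrying heuristic above might be sketched like this. Both callbacks, the clock argument and the optional max_attempts cap are hypothetical; the proposal itself does not yet say whether retries should ever stop.

```python
HS_POW_DESC_FETCH_RATE_LIMIT = 600  # seconds; default from the text above


def connect_with_pow(fetch_descriptor, try_introduce, clock, max_attempts=None):
    """Retry with doubling effort, refetching the descriptor when allowed.

    fetch_descriptor() -> (seed, suggested_effort)    [hypothetical callback]
    try_introduce(seed, effort) -> bool                [hypothetical callback;
        False stands for a rendezvous timeout]
    Returns the effort that succeeded, or None if max_attempts ran out.
    """
    seed, effort = fetch_descriptor()
    last_fetch = clock()
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if try_introduce(seed, effort):
            return effort
        if clock() - last_fetch >= HS_POW_DESC_FETCH_RATE_LIMIT:
            # Allowed to refetch: restart from the fresh seed and suggestion.
            seed, effort = fetch_descriptor()
            last_fetch = clock()
        else:
            # Not allowed to refetch yet: double the effort and try again.
            effort = max(1, effort * 2)
    return None
```

With max_attempts left as None the loop spins until it succeeds, which is exactly the behavior questioned in the note that follows.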
{XXX: This means that the client will keep on spinning and doubling-and-retrying for a service under this situation. There will never be a "Client connection timed out" page for the user. Is this good? Is this bad? Should we stop doubling-and-retrying after some iterations? Or should we throw a custom error page to the user, and ask the user to stop spinning whenever they want?}
4.2. Seed expiration issues
As mentioned in [DESC_POW], the expiration timestamp on the PoW seed can cause issues with clock-skewed clients. Furthermore, even clients whose clocks are not skewed can encounter TOCTOU-style race conditions here.
The client descriptor refetch logic of [CLIENT_TIMEOUT] should take care of such seed-expiration issues, since the client will refetch the descriptor.
{XXX: Is this sufficient? Should we have multiple active seeds at the same time similar to how we have overlapping descriptors and time periods in v3? This would solve the problem but it grows the complexity of the system substantially.}
4.3. Other descriptor issues
Another race condition here is if the service enables PoW, while a client has a cached descriptor. How will the client notice that PoW is needed? Does it need to fetch a new descriptor? Should there be another feedback mechanism? {XXX}
5. Attacker strategies [ATTACK_META]
Now that we defined our protocol we need to start tweaking the various knobs. But before we can do that, we first need to understand a few high-level attacker strategies to see what we are fighting against.
5.1.1. Total overwhelm strat
Given the way the introduction queue works (see [HANDLE_QUEUE]), a very effective strategy for the attacker is to totally overwhelm the queue processing by sending more high-effort introductions than the onion service can handle at any given tick.
To do so, the attacker would have to send at least 20 high-effort introduction cells every 100ms, where high-effort is a PoW which is above the estimated level of "the motivated user" (see [USER_MODEL]).
An easier attack for the adversary is the same strategy but with introduction cells that are all above the comfortable level of "the standard user" (see [USER_MODEL]). This would block out all standard users and only allow motivated users to pass.
{XXX: What other attack strategies should we care about?}
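As a rough back-of-the-envelope sketch of what this strategy costs, assume (hypothetically) that every attacker box is as fast as the machine used to define the target effort level. The function name and the example numbers are illustrative only.

```python
CELLS_PER_TICK = 20     # from [HANDLE_QUEUE]
TICKS_PER_SECOND = 10   # one tick every 100ms


def boxes_to_overwhelm(seconds_per_success):
    """Machines needed to produce enough successes to fill every tick.

    seconds_per_success: time one machine needs for a single PoW at the
    effort level the attacker wants to exceed.
    """
    required_rate = CELLS_PER_TICK * TICKS_PER_SECOND  # successes per second
    return required_rate * seconds_per_success
```

For instance, if an effort level takes one machine 10 seconds per success, filling every tick at that level needs about 200 * 10 = 2000 such machines.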
6. Parameter tuning [PARAM_TUNING]
There are various parameters in this system that need to be tuned.
We will first start by tuning the default difficulty of our PoW system. That's gonna define an expected time for attackers and clients to succeed.
We are then gonna tune the parameters of the argon2 hash function. That will define the resources that an attacker needs to spend to overwhelm the onion service, the resources that the service needs to spend to verify introduction requests, and the resources that legitimate clients need to spend to get to the onion service.
6.1. PoW Difficulty settings
The difficulty setting of our PoW basically dictates how difficult it should be to get a success in our PoW system. In classic PoW systems, "success" is defined as getting a hash output below the "target". However, since our system is dynamic, we define "success" as an abstract high-effort computation.
Even tho our system is dynamic, we still need default difficulty settings that will define the metagame. The client and attacker can still aim higher or lower, but for UX purposes and for analysis purposes we do need to define some difficulties.
We hence created the table (see [REF_TABLE]) below which shows how much time a legitimate client with a single machine should expect to burn before they get a single success. The x-axis is how many successes we want the attacker to be able to do per second: the more successes we allow the adversary, the more they can overwhelm our introduction queue. The y-axis is how many machines the adversary has at her disposal, ranging from just 5 to 1000.

       ===============================================================
       |    Expected Time (in seconds) Per Success For One Machine   |
 ===========================================================================
 |                                                                          |
 |   Attacker Successes       1       5       10      20      30      50    |
 |       per second                                                         |
 |                                                                          |
 |            5               5       1       0       0       0       0     |
 |            50              50      10      5       2       1       1     |
 |            100             100     20      10      5       3       2     |
 | Attacker   200             200     40      20      10      6       4     |
 |  Boxes     300             300     60      30      15      10      6     |
 |            400             400     80      40      20      13      8     |
 |            500             500     100     50      25      16      10    |
 |            1000            1000    200     100     50      33      20    |
 |                                                                          |
 ===========================================================================
Here is how you can read the table above:
- If an adversary has a botnet with 1000 boxes, and we want to limit her to 1 success per second, then a legitimate client with a single box should be expected to spend 1000 seconds getting a single success.
- If an adversary has a botnet with 1000 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 200 seconds getting a single success.
- If an adversary has a botnet with 500 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 100 seconds getting a single success.
- If an adversary has access to 50 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 10 seconds getting a single success.
- If an adversary has access to 5 boxes, and we want to limit her to 5 successes per second, then a legitimate client with a single box should be expected to spend 1 second getting a single success.
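The numbers in the table above are consistent with a simple model in which every box (attacker or client) is equally fast, so the per-machine time is the number of attacker boxes divided by the allowed successes per second, rounded down. The sketch below reproduces the table under that assumption; it is not the script from [REF_TABLE], and the names are hypothetical.

```python
ATTACKER_BOXES = [5, 50, 100, 200, 300, 400, 500, 1000]
SUCCESSES_PER_SECOND = [1, 5, 10, 20, 30, 50]


def expected_seconds_per_success(boxes, successes_per_second):
    """Seconds one machine needs per success, rounded down as in the table."""
    return boxes // successes_per_second


table = {
    boxes: [expected_seconds_per_success(boxes, rate)
            for rate in SUCCESSES_PER_SECOND]
    for boxes in ATTACKER_BOXES
}

# Print one row per attacker size, columns in SUCCESSES_PER_SECOND order.
for boxes, row in table.items():
    print(f"{boxes:>5} boxes: " + " ".join(f"{t:>5}" for t in row))
```

The zeros in the first row mean "less than one second", since the division is rounded down.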
With the above table we can create some profiles for default values of our PoW difficulty. So for example, we can use the last case as the default parameter for Tor Browser, and then create three more profiles for more expensive cases, scaling up to the first case which could be the hardest since the client is expected to spend about 17 minutes (1000 seconds) for a single introduction.
{TODO: PARAM_TUNING You can see that this section is completely CPU/memory agnostic, and it does not take into account potential optimizations that can come from GPU/ASICs. This is intentional so that we don't put more variables into this equation right now, but as this proposal moves forward we will need to put more concrete values here.}
6.2. Argon2 parameters [ARGON_PARAMS]
We now need to define the secondary argon2 parameters as defined in [REF_ARGON2]. This includes the number of lanes 'h', the memory size 'm', the number of iterations 't'. Section 9 of [REF_ARGON2] recommends an approach of how to tune these parameters.
To tune these parameters we are looking to *minimize* the verification time of an onion service, while *maximizing* the scarce resources spent by an adversary trying to overwhelm the service using [ATTACK_META].
When it comes to verification speed, to verify a single introduction cell the service needs to do a single argon2 call: so the service will need to do hundreds of those per second as INTRODUCE2 cells arrive. The service will have to do this verification step even for very cheap zero-effort PoW received, so this has to be a cheap procedure so that it doesn't become a DoS vector of its own. Hence each individual argon2 call must be cheap enough to be able to be done comfortably and plentifully by an onion service with a single host (or horizontally scaled with Onionbalance).
At the same time, the adversary will have to do thousands of these calls if she wants to make high-effort PoW, so it's this asymmetry that we are looking to exploit here. Right now, the most expensive resource for adversaries is the RAM size, and that's why we chose argon2 which is memory-hard.
To minmax this game we will need
{TODO: PARAM_TUNING: I've had a hard time minmaxing this game for argon2. Even argon2 invocations with a small memory parameter will take multiple milliseconds to run on my machine, and the parameters recommended in section 8 of the paper all take many hundreds of milliseconds. This is just not practical for our use case, since we want to process hundreds of such PoW per second... I also did not manage to find a benchmark of argon2 calls for different CPU/GPU/FPGA configurations.}
7. Discussion
7.1. UX
This proposal has user facing UX consequences.
Here are some UX improvements that don't need user-input:
- Primarily, there should be a way for Tor Browser to display to users that additional time (and resources) will be needed to access a service that is under attack. Depending on the design of the system, it might even be possible to estimate how much time it will take.
And here are a few UX approaches that will need user-input and have an increasing engineering difficulty. Ideally this proposal will not need user-input and the default behavior should work for almost all cases.
a) Tor Browser needs a "range field" which the user can use to specify how much effort they want to spend in PoW if this ever occurs while they are browsing. The ranges could be from "Easy" to "Difficult", or we could try to estimate time using an average computer. This setting is in the Tor Browser settings and users need to find it.
b) We start with a default effort setting, and then we use the new onion errors (see #19251) to estimate when an onion service connection has failed because of DoS, and only then we present the user a "range field" which they can set dynamically. Detecting when an onion service connection has failed because of DoS can be hard because of the lack of feedback (see [CLIENT_BEHAVIOR]).
c) We start with a default effort setting, and if things fail we automatically try to figure out an effort setting that will work for the user by doing some trial-and-error connections with different effort values. Until the connection succeeds we present a "Service is overwhelmed, please wait" message to the user.
7.2. Future work [FUTURE_WORK]
7.2.1. Incremental improvements to this proposal
There are various improvements that can be done in this proposal, and while we are trying to keep this v1 version simple, we need to keep the design extensible so that we build more features into it. In particular:
- End-to-end introduction ACKs
This proposal suffers from various UX issues because there is no end-to-end mechanism for an onion service to inform the client about its introduction request. If we had end-to-end introduction ACKs many of the problems from [CLIENT_BEHAVIOR] would be alleviated. The problem here is that end-to-end ACKs require modifications on the introduction point code and a network update which is a lengthy process.
- Multithreading scheduler
Our scheduler is pretty limited by the fact that Tor has a single-threaded design. If we improve our multithreading support we could handle a much greater amount of introduction requests per second.
7.2.2. Future designs [FUTURE_DESIGNS]
This is just the beginning in DoS defences for Tor and there are various future designs and schemes that we can investigate. Here is a brief summary of these:
"More advanced PoW schemes" -- We could use more advanced memory-hard PoW schemes like MTP-argon2 or Itsuku to make it even harder for adversaries to create successful PoWs. Unfortunately these schemes have much bigger proof sizes, and they won't fit in INTRODUCE1 cells. See #31223 for more details.
"Third-party anonymous credentials" -- We can use anonymous credentials and a third-party token issuance server on the clearnet to issue tokens based on PoW or CAPTCHA and then use those tokens to get access to the service. See [REF_CREDS] for more details.
"PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas where we present a hard puzzle to the user when connecting to the onion service, and if they solve it we then give the user a bunch of anonymous tokens that can be used in the future. This can all happen between the client and the service without a need for a third party.
All of the above approaches are much more complicated than this proposal, and hence we want to start easy before we get into more serious projects.
7.3. Environment
We love the environment! We are concerned about how PoW schemes can waste energy by doing useless hash iterations. Here are a few reasons we still decided to pursue a PoW approach here:
"We are not making things worse" -- DoS attacks are already happening and they already burn energy on the attacker side, on the service side and on the network side. We think that asking legitimate clients to carry out PoW computations is not gonna affect the equation too much, since an attacker right now can very quickly cause the same damage that hundreds of legitimate clients do in a whole day.
"We hope to make things better" -- The hope is that proposals like this will make the DoS actors go away and hence the PoW system will not be used. As long as DoS is happening there will be a waste of energy, but if we manage to demotivate them with technical means, the network as a whole will be less wasteful. Also see [CATCH22] for a similar argument.
8. References
[REF_ARGON2]: https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf
              https://password-hashing.net/#argon2
[REF_TABLE]: The table is based on the script below plus some manual editing for readability:
             https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
[REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01...
[REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
[REF_TARGET]: https://en.bitcoin.it/wiki/Target
[REF_TLS]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
           https://tools.ietf.org/id/draft-nir-tls-puzzles-00.html
           https://tools.ietf.org/html/draft-ietf-ipsecme-ddos-protection-10
[REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt