Hi all,
On Mon, Jun 22, 2020 at 4:52 PM George Kadianakis desnacked@riseup.net wrote:
Also, tevador let me know if you'd like me to add you as a co-author on the proposal based on all your great feedback so far.
Thanks for the update. Yes, you can add me as a co-author.
During our performance measurements in [TOR_MEASUREMENTS] we learned that the "top half" takes about 0.26 msecs in average, without doing any sort of PoW verification.
Interesting. This confirms the need for fast PoW verification. Equi-X takes about 0.05 ms to verify, so the top-half throughput should still be over 3000 requests per second.
Our scheduler is pretty limited by the fact that Tor has a single-threaded design.
If both the top- and bottom- halves are processed by the same thread, this opens up the possibility for a "hybrid" attack. Given the performance figures for the bottom half (0.31 ms/req.) and the top half (5.5 ms/req.), the attacker can optimally deny service by submitting 91 high-effort requests and 1520 invalid requests per second. This will completely saturate the main loop because:
0.31*(1520+91) ~ 0.5 sec. 5.5*91 ~ 0.5 sec.
This attack only has half the bandwidth requirement of [ATTACK_TOP_HALF] and half the compute requirement of [ATTACK_BOTTOM_HALF].
Alternatively, the attacker can adjust the ratio between invalid and high-effort requests depending on their bandwidth and compute capabilities.
{TODO: BLOCKER: What's the max size of the queue? Do we trim it, or do we just stop adding new requests when it reaches max size? Can we use WRED? Trimming is currently used EFFORT_ESTIMATION, so if we don't do it we need to find a different way to estimate effort. See tevador's [REF_TEVADOR_2] email.}
The simplest approach is to have a "soft" maximum size of the queue. All requests with valid PoW can be added to the queue, but once per second, the queue is trimmed to the "soft" max size by removing timed-out and low-effort requests. I used this approach in my simulations (see below).
The EXT_FIELD content format is: POW_VERSION [1 byte] POW_NONCE [16 bytes] POW_SEED [4 bytes]
If we want to use Equi-X, you also need to add the solution (16 bytes) and I also recommend adding the client's target effort.
POW_VERSION [1 byte] POW_NONCE [16 bytes] POW_EFFORT [4 bytes] POW_SEED [4 bytes] POW_SOLUTION [16 bytes]
43 bytes total including the extension type and length.
The client's algorithm in section 3.2 should be modified to:
a) Client selects a target effort E. b) Client generates a random 16-byte nonce N. c) Client derives seed C by decoding 'seed-b64'. d) Client calculates S = equix_solve(C || N || E) e) Client calculates R = blake2b(C || N || E || S) f) Client checks if R * E <= UINT32_MAX. f1) If yes, success! The client can submit N, E, the first 4 bytes of C and S. f2) If no, fail! The client interprets N as a 16-byte little-endian integer, increments it by 1 and goes back to step d).
We could also add the server algorithm in 3.4:
a) Find a valid seed C that starts with POW_SEED. Fail if no such seed exists. b) Fail if E = POW_EFFORT is lower than the minimum effort. c) Fail if N = POW_NONCE is present in the replay cache. d) Calculate R = blake2b(C || N || E || S) e) Fail if R * E > UINT32_MAX f) Fail if equix_verify(C || N || E, S) != EQUIX_OK g) Put the request in the queue with a priority of E
3.4.3. PoW effort estimation [EFFORT_ESTIMATION] {XXX: BLOCKER: Figure out of this system makes sense}
I wrote a simple simulation in Python to test different ways of adjusting the suggested effort. The results are here: https://github.com/tevador/scratchpad/blob/master/tor-pow/effort_sim.md
In summary, I suggest to use MIN_EFFORT = 1000 and the following algorithm to calculate the suggested effort:
1. Sum the effort of all valid requests that have been received since the last HS descriptor update. This includes all handled requests, trimmed requests and requests still in the queue. 2. Divide the sum by the max. number of requests that the service could have handled during that time (SVC_BOTTOM_CAPACITY * HS_UPDATE_PERIOD). 3. Suggested effort = max(MIN_EFFORT, result)
This algorithm can both increase and reduce the suggested effort.
My simulations also show that bottom-half attacks are not feasible (even "The large botnet" cannot completely deny access to the service). I think further research and testing should focus on top-half attacks (or hybrid attacks).
{XXX: Does this mean that this system can auto-enable and auto-disable the DoS subsystem with reasonable accuracy?}
I tried to make the effort estimation algorithm disable PoW automatically (by setting suggested effort to 0), but it led to oscillating behavior in the case of a sustained attack (i.e. once the suggested effort is set to 0, clients cannot connect anymore because the attacker can easily saturate the service). Having an acceptably low minimum effort could work better. MIN_EFFORT = 1000 as I suggested will take around 1 second on a quad core CPU. Mobile clients can simply ignore the suggested effort and always try to connect without PoW.
The algorithm can be easily modified to auto-enable PoW in case of an attack (but will not auto-disable it). This can be done by keeping suggested effort zero as long as no requests have been trimmed from the queue. The PoW subsystem could be disabled manually by the HS operator if needed.
{XXX: BLOCKER: How should timeout values change here since the priority queue will cause bigger delays than usual to rendezvous?}
I think the timeout should stay the same. Attackers don't care about timeouts and longer timeout values would increase users' time-to-connect when the service is under attack and the suggested effort hasn't been updated yet.