Re: [tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits

1 Jul 2020

      Hi all,

On Mon, Jun 22, 2020 at 4:52 PM George Kadianakis <desnacked@riseup.net> wrote:
...
Also, tevador let me know if you'd like me to add you as a co-author on the
proposal based on all your great feedback so far.
Thanks for the update. Yes, you can add me as a co-author.
...
During our performance measurements in [TOR_MEASUREMENTS] we learned that
the "top half" takes about 0.26 msecs in average, without doing any sort
of PoW verification.
Interesting. This confirms the need for fast PoW verification. Equi-X
takes about 0.05 ms to verify, so the top-half throughput should still
be over 3000 requests per second.
...
Our scheduler is pretty limited by the fact that Tor has a single-threaded
design.
If both the top- and bottom- halves are processed by the same thread,
this opens up the possibility for a "hybrid" attack. Given the
performance figures for the bottom half (0.31 ms/req.) and the top
half (5.5 ms/req.), the attacker can optimally deny service by
submitting 91 high-effort requests and 1520 invalid requests per
second. This will completely saturate the main loop because:

0.31*(1520+91) ~ 0.5 sec.
5.5*91         ~ 0.5 sec.

This attack only has half the bandwidth requirement of
[ATTACK_TOP_HALF] and half the compute requirement of
[ATTACK_BOTTOM_HALF].

Alternatively, the attacker can adjust the ratio between invalid and
high-effort requests depending on their bandwidth and compute
capabilities.
...
{TODO: BLOCKER: What's the max size of the queue? Do we trim it, or do
we just stop adding new requests when it reaches max size? Can we use WRED?
Trimming is currently used EFFORT_ESTIMATION, so if we don't do it we need
to find a different way to estimate effort. See tevador's [REF_TEVADOR_2]
email.}
The simplest approach is to have a "soft" maximum size of the queue.
All requests with valid PoW can be added to the queue, but once per
second, the queue is trimmed to the "soft" max size by removing
timed-out and low-effort requests. I used this approach in my
simulations (see below).
...
The EXT_FIELD content format is:
    POW_VERSION    [1 byte]
    POW_NONCE      [16 bytes]
    POW_SEED       [4 bytes]
If we want to use Equi-X, you also need to add the solution (16 bytes)
and I also recommend adding the client's target effort.

    POW_VERSION    [1 byte]
    POW_NONCE      [16 bytes]
    POW_EFFORT     [4 bytes]
    POW_SEED       [4 bytes]
    POW_SOLUTION   [16 bytes]

43 bytes total including the extension type and length.

The client's algorithm in section 3.2 should be modified to:

  a) Client selects a target effort E.
  b) Client generates a random 16-byte nonce N.
  c) Client derives seed C by decoding 'seed-b64'.
  d) Client calculates S = equix_solve(C || N || E)
  e) Client calculates R = blake2b(C || N || E || S)
  f) Client checks if R * E <= UINT32_MAX.
    f1) If yes, success! The client can submit N, E, the first 4 bytes of C
        and S.
    f2) If no, fail! The client interprets N as a 16-byte little-endian
        integer, increments it by 1 and goes back to step d).

We could also add the server algorithm in 3.4:

  a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
     exists.
  b) Fail if E = POW_EFFORT is lower than the minimum effort.
  c) Fail if N = POW_NONCE is present in the replay cache.
  d) Calculate R = blake2b(C || N || E || S)
  e) Fail if R * E > UINT32_MAX
  f) Fail if equix_verify(C || N || E, S) != EQUIX_OK
  g) Put the request in the queue with a priority of E
...
3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
{XXX: BLOCKER: Figure out of this system makes sense}
I wrote a simple simulation in Python to test different ways of
adjusting the suggested effort. The results are here:
https://github.com/tevador/scratchpad/blob/master/tor-pow/effort_sim.md

In summary, I suggest to use MIN_EFFORT = 1000 and the following
algorithm to calculate the suggested effort:

1. Sum the effort of all valid requests that have been received since the
   last HS descriptor update. This includes all handled requests, trimmed
   requests and requests still in the queue.
2. Divide the sum by the max. number of requests that the service could have
   handled during that time (SVC_BOTTOM_CAPACITY * HS_UPDATE_PERIOD).
3. Suggested effort = max(MIN_EFFORT, result)

This algorithm can both increase and reduce the suggested effort.

My simulations also show that bottom-half attacks are not feasible
(even "The large botnet" cannot completely deny access to the
service). I think further research and testing should focus on
top-half attacks (or hybrid attacks).
...
{XXX: Does this mean that this system can auto-enable and auto-disable the
DoS subsystem with reasonable accuracy?}
I tried to make the effort estimation algorithm disable PoW
automatically (by setting suggested effort to 0), but it led to
oscillating behavior in the case of a sustained attack (i.e. once the
suggested effort is set to 0, clients cannot connect anymore because
the attacker can easily saturate the service). Having an acceptably
low minimum effort could work better. MIN_EFFORT = 1000 as I suggested
will take around 1 second on a quad core CPU. Mobile clients can
simply ignore the suggested effort and always try to connect without
PoW.

The algorithm can be easily modified to auto-enable PoW in case of an
attack (but will not auto-disable it). This can be done by keeping
suggested effort zero as long as no requests have been trimmed from
the queue. The PoW subsystem could be disabled manually by the HS
operator if needed.
...
{XXX: BLOCKER: How should timeout values change here since the priority
queue will cause bigger delays than usual to rendezvous?}
I think the timeout should stay the same. Attackers don't care about
timeouts and longer timeout values would increase users'
time-to-connect when the service is under attack and the suggested
effort hasn't been updated yet.