[tor-commits] [torspec/master] Note potential memory exhaustion DoS.

isis at torproject.org
Fri Sep 25 11:35:03 UTC 2015


commit 2b0eb5172f7d3effae6d3aabb2e15efd36926b1d
Author: Mike Perry <mikeperry-git at torproject.org>
Date:   Fri Sep 11 17:09:28 2015 -0700

    Note potential memory exhaustion DoS.
    
    Also clarify terminology and address some formatting issues.
---
 proposals/xxx-padding-negotiation.txt |  184 ++++++++++++++++++++-------------
 1 file changed, 113 insertions(+), 71 deletions(-)

diff --git a/proposals/xxx-padding-negotiation.txt b/proposals/xxx-padding-negotiation.txt
index 005045a..4202540 100644
--- a/proposals/xxx-padding-negotiation.txt
+++ b/proposals/xxx-padding-negotiation.txt
@@ -26,19 +26,23 @@ relays, or request that relays not send them padding to conserve
 bandwidth. This proposal aims to create a mechanism for clients to do
 both of these.
 
+It also establishes consensus parameters to limit the amount of padding
+that relays will send, to prevent custom wingnut clients from requesting
+too much.
+
 
 2. Link-level padding
 
-Padding is most urgently needed to defend against a malicious or
+Padding is most useful if it can defend against a malicious or
 compromised guard node. However, link-level padding is still useful to
 defend against an adversary that can merely observe a Guard node
-externally. Right now, the only case where link-level padding is known
-to defend against any realistic attacker is for low-resolution
-netflow-based attacks (see Proposal 251[1]).
+externally, such as for low-resolution netflow-based attacks (see
+Proposal 251[1]).
 
-In that scenario, the primary mechanism we need is a way for mobile
-clients to tell their Guards to stop padding, or to pad less often. The
-following Trunnel payloads should cover the needed parameters:
+In that scenario, the primary negotiation mechanism we need is a way for
+mobile clients to tell their Guards to stop padding, or to pad less
+often. The following Trunnel payloads should cover the needed
+parameters:
 
     const CELL_PADDING_COMMAND_STOP = 1;
     const CELL_PADDING_COMMAND_START = 2;
@@ -49,9 +53,9 @@ following Trunnel payloads should cover the needed parameters:
       u8 command IN [CELL_PADDING_COMMAND_STOP];
     };
 
-    /* This command tells the relay to alter its min and max timeout
-       range values, and send padding at that rate (resuming if
-       stopped). */
+    /* This command tells the relay to alter its min and max netflow
+       timeout range values, and send padding at that rate (resuming
+       if stopped). */
     struct cell_padding_start {
       u8 command IN [CELL_PADDING_COMMAND_START];
 
@@ -63,25 +67,28 @@ following Trunnel payloads should cover the needed parameters:
       u16 ito_high_ms;
     };
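
As a purely illustrative sketch, a battery-conscious mobile client might fill
in this command roughly as follows. The C struct here is a hypothetical mirror
of the Trunnel definition, and the ito_low_ms field name is an assumption
inferred from the min/max netflow timeout comment and the ito_high_ms field
above:

    #include <stdint.h>
    #include <string.h>

    #define CELL_PADDING_COMMAND_START 2

    /* Hypothetical in-memory mirror of cell_padding_start; not
     * Trunnel-generated code. */
    typedef struct {
      uint8_t command;
      uint16_t ito_low_ms;   /* low end of the inactivity timeout range */
      uint16_t ito_high_ms;  /* high end of the inactivity timeout range */
    } cell_padding_start_t;

    /* Ask the Guard to pad only after 30..60 seconds of inactivity,
     * rather than at whatever the netflow-oriented defaults are. */
    static void
    build_low_power_padding_request(cell_padding_start_t *out)
    {
      memset(out, 0, sizeof(*out));
      out->command = CELL_PADDING_COMMAND_START;
      out->ito_low_ms = 30000;
      out->ito_high_ms = 60000;
    }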
 
-More complicated forms of link padding can still be specified using
-the primitives in Section 3, by using "leaky pipe" topology to send
-the RELAY commands to the Guard node instead of to later nodes in the
-circuit.
+More complicated forms of link-level padding can still be specified
+using the primitives in Section 3, by using "leaky pipe" topology to
+send the RELAY commands to the Guard node instead of to later nodes in
+the circuit.
 
 
 3. End-to-end circuit padding
 
-For end-to-end padding, we need two types of additional features: the
+For circuit-level padding, we need two types of additional features: the
 ability to schedule additional incoming cells at one or more fixed
 points in the future, and the ability to schedule a statistical
 distribution of arbitrary padding to overlay on top of non-padding
 traffic (aka "Adaptive Padding").
 
 In both cases, these messages will be sent from clients to middle nodes
-using "leaky pipe" property of the 'recognized' field of RELAY cells,
-allowing padding to originate from middle nodes on a circuit in a way
-that is not detectable from the Guard node. This same mechanism can also
-be used to request padding from the Guard node itself.
+using the "leaky pipe" property of the 'recognized' field of RELAY
+cells, allowing padding to originate from middle nodes on a circuit in a
+way that is not detectable from the Guard node.
+
+This same mechanism can also be used to request padding from the Guard
+node itself, to achieve link-level padding without the additional
+overhead requirements on middle nodes.
 
 3.1. Fixed-schedule padding message (RELAY_COMMAND_PADDING_SCHEDULE)
 
@@ -92,7 +99,8 @@ fixed time points in the future to send cells.
 XXX: 80 timers is a lot to allow every client to create. We may want to
 have something that checks this structure to ensure it actually
 schedules no more than N in practice, until we figure out how to
-optimize either libevent or timer scheduling/packet delivery.
+optimize either libevent or timer scheduling/packet delivery. See also
+Section 4.3.
 
 The RELAY_COMMAND_PADDING_SCHEDULE body is specified in Trunnel as
 follows:
@@ -121,8 +129,9 @@ with 100 cells in 3*MAX_INT microseconds from the receipt of this cell.
 The following message is a generalization of the Adaptive Padding
 defense specified in "Timing Attacks and Defenses"[2].
 
-The message encodes either one or two state machines, each of which
-contain two histograms ("Burst" and "Gap") governing their behavior.
+The message encodes either one or two state machines, each of which can
+contain one or two histograms ("Burst" and "Gap") governing their
+behavior.
 
 The "Burst" histogram specifies the delay probabilities for sending a
 padding packet after the arrival of a non-padding data packet.
@@ -141,25 +150,34 @@ allows a client to specify the types of incoming packets that cause the
 state machine to decide to schedule padding cells (and/or when to cease
 scheduling them).
 
-Note that our generalization of the Adaptive Padding state machine
-actually gives clients full control over the state transition events,
-even allowing them to specify a single-state state machine if desired.
-See Sections 3.2.1 and 3.2.2 for details.
+The client also maintains its own local histogram state machine(s), for
+reacting to traffic on its end.
+
+Note that our generalization of the Adaptive Padding state machine also
+gives clients full control over the state transition events, even
+allowing them to specify a single-state Burst-only state machine if
+desired. See Sections 3.2.1 and 3.2.2 for details.
 
 The histograms and the associated state machine packet layout is
 specified in Trunnel as follows:
 
     /* These constants form a bitfield to specify the types of events
-     * that can cause padding to start or stop from a given state. */
+     * that can cause transitions between state machine states.
+     *
+     * Note that SENT and RECV are relative to this endpoint. For
+     * relays, SENT means packets destined towards the client and
+     * RECV means packets destined towards the relay. On the client,
+     * SENT means packets destined towards the relay, whereas RECV
+     * means packets destined towards the client.
+     */
     const RELAY_PADDING_TRANSITION_EVENT_NONPADDING_RECV = 1;
     const RELAY_PADDING_TRANSITION_EVENT_NONPADDING_SENT = 2;
     const RELAY_PADDING_TRANSITION_EVENT_PADDING_SENT = 4;
     const RELAY_PADDING_TRANSITION_EVENT_PADDING_RECV = 8;
 
-    /*
-      This encodes a histogram delay distribution representing the
-      probability of sending a single RELAY_DROP cell after a given
-      delay in response to a non-padding cell.
+    /* This payload encodes a histogram delay distribution representing
+     * the probability of sending a single RELAY_DROP cell after a
+     * given delay in response to a non-padding cell.
      */
     struct burst_state {
       u8 histogram_len IN [2..51];
@@ -167,35 +185,31 @@ specified in Trunnel as follows:
       u32 start_usec;
       u16 max_sec;
 
-      /*
-         This is a bitfield that specifies which direction and types
-         of traffic that cause us to abort our scheduled packet and
-         return to waiting for another event from transition_burst_events.
+      /* This is a bitfield that specifies which direction and types
+       * of traffic cause us to abort our scheduled packet and
+       * return to waiting for another event from transition_burst_events.
        */
       u8 transition_start_events;
 
-      /*
-         This is a bitfield that specifies which direction and types
-         of traffic that cause us to remain in the burst state: Cancel the
-         pending padding packet (if any), and schedule another padding
-         packet from our histogram.
+      /* This is a bitfield that specifies which direction and types
+       * of traffic cause us to remain in the burst state: Cancel the
+       * pending padding packet (if any), and schedule another padding
+       * packet from our histogram.
        */
       u8 transition_reschedule_events;
 
-      /*
-         This is a bitfield that specifies which direction and types
-         of traffic that cause us to transition to the Gap state.
-       */
+      /* This is a bitfield that specifies which direction and types
+       * of traffic cause us to transition to the Gap state. */
       u8 transition_gap_events;
 
-      /* Should we remove tokens from the histogram as packets are sent? */
+      /* If true, remove tokens from the histogram upon padding and
+       * non-padding activity. */
       u8 remove_toks IN [0,1];
     };
 
-    /*
-      This histogram encodes a delay distribution representing the
-      probability of sending a single additional padding packet after
-      sending a padding packet that originated at this hop.
+    /* This histogram encodes a delay distribution representing the
+     * probability of sending a single additional padding packet after
+     * sending a padding packet that originated at this hop.
      */
     struct gap_state {
       u8 histogram_len IN [2..51];
@@ -204,31 +218,31 @@ specified in Trunnel as follows:
       u16 max_sec;
 
       /* This is a bitfield which specifies which direction and types
-         of traffic should cause us to transition back to the start
-         state (ie: abort scheduling packets completely). */
+       * of traffic should cause us to transition back to the start
+       * state (ie: abort scheduling packets completely). */
       u8 transition_start_events;
 
       /* This is a bitfield which specifies which direction and types
-         of traffic should cause us to transition back to the burst
-         state (and schedule a packet from the burst histogram). */
+       * of traffic should cause us to transition back to the burst
+       * state (and schedule a packet from the burst histogram). */
       u8 transition_burst_events;
 
-      /*
-         This is a bitfield that specifies which direction and types
-         of traffic that cause us to remain in the gap state: Cancel the
-         pending padding packet (if any), and schedule another padding
-         packet from our histogram.
+      /* This is a bitfield that specifies which direction and types
+       * of traffic cause us to remain in the gap state: Cancel the
+       * pending padding packet (if any), and schedule another padding
+       * packet from our histogram.
        */
       u8 transition_reschedule_events;
 
-      /* Should we remove tokens from the histogram as packets are sent? */
+      /* If true, remove tokens from the histogram upon padding and
+       * non-padding activity. */
       u8 remove_toks IN [0,1];
     };
 
     struct adaptive_padding_machine {
       /* This is a bitfield which specifies which direction and types
-         of traffic should cause us to transition to the burst
-         state (and schedule a packet from the burst histogram). */
+       * of traffic should cause us to transition to the burst
+       * state (and schedule a packet from the burst histogram). */
        u8 transition_burst_events;
 
        struct burst_state burst;
@@ -236,7 +250,7 @@ specified in Trunnel as follows:
     };
 
     /* This is the full payload of a RELAY_COMMAND_PADDING_ADAPTIVE
-       cell. */
+     * cell. */
     struct relay_command_padding_adaptive {
        u8 num_machines IN [1,2];
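
To make the use of these transition bitfields concrete, the following C sketch
shows the decision an endpoint in the Burst state might make when a traffic
event occurs. The state enum, the function itself, and the precedence chosen
among overlapping bitfields are assumptions for illustration, not part of the
specification:

    #include <stdint.h>

    typedef enum { AP_STATE_START, AP_STATE_BURST, AP_STATE_GAP } ap_state_t;

    /* event_bit is one of the RELAY_PADDING_TRANSITION_EVENT_* values
     * defined earlier in this section; the other arguments are the
     * bitfields of struct burst_state. */
    static ap_state_t
    burst_handle_event(uint8_t event_bit, uint8_t start_events,
                       uint8_t reschedule_events, uint8_t gap_events)
    {
      if (event_bit & start_events) {
        /* Abort the scheduled padding packet and return to waiting for
         * an event listed in transition_burst_events. */
        return AP_STATE_START;
      }
      if (event_bit & gap_events) {
        /* Enter the Gap state and schedule from the Gap histogram. */
        return AP_STATE_GAP;
      }
      if (event_bit & reschedule_events) {
        /* Cancel any pending padding packet and re-sample a delay from
         * the Burst histogram; remain in Burst. */
        return AP_STATE_BURST;
      }
      /* Events not named in any bitfield leave the state unchanged. */
      return AP_STATE_BURST;
    }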
 
@@ -307,16 +321,24 @@ single-state machine if desired.
 
 Clients are expected to maintain their own local version of the state
 machines, for reacting to their own locally generated traffic, in
-addition to sending one or more state machines to the middle relay.
+addition to sending one or more state machines to the middle relay. The
+histograms that the client uses locally will differ from the ones it
+sends to the upstream relay.
 
-The histograms that the client uses locally will likely differ from the
-ones it sends to the upstream relay.
+On the client, the "SENT" direction means packets destined towards the
+upstream, whereas "RECV" means packets destined towards the client.
+However, on the relay, the "SENT" direction means packets destined
+towards the client, whereas "RECV" means packets destined towards the
+relay.
 
 3.2.2. The original Adaptive Padding algorithm
 
 As we have noted, the state machines above represent a generalization of
 the original Adaptive Padding algorithm. To implement the original
-behavior, the following flags should be set:
+behavior, the following flags should be set in both the client and
+the relay state machines:
+
+ num_machines = 1;
 
  machines[0].transition_burst_events =
     RELAY_PADDING_TRANSITION_EVENT_NONPADDING_SENT;
@@ -338,6 +360,10 @@ The rest of the transition fields would be 0.
 Adding additional transition flags will either increase or decrease the
 amount of padding sent, depending on their placement.
 
+The second machine slot is provided in the event that it proves useful
+to have separate state machines reacting to both sent and received
+traffic.
+
 3.2.3. Histogram decoding/representation
 
 Each of the histograms' fields represent a probability distribution that
@@ -413,11 +439,18 @@ Actual optimal histogram and state transition construction for different
 traffic types is expected to be a topic for further research.
 
 Intuitively, the burst state is used to detect when the line is idle
-(and should therefore have few or no tokens in low histogram bins), and
-the gap state is used to fill in otherwise idle periods with artificial
+(and should therefore have few or no tokens in low histogram bins). The
+lack of tokens in the low histogram bins causes the system to remain in
+the burst state until the actual application traffic either slows,
+stalls, or has a gap.
+
+The gap state is used to fill in otherwise idle periods with artificial
 payloads from the server (and should have many tokens in low bins, and
-possibly some also at higher bins). However, more complicated
-interactions are also possible.
+possibly some also at higher bins).
+
+It should be noted that due to our generalization of these states and
+their transition possibilities, more complicated interactions are also
+possible.
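
To see why an empty low-delay region keeps the Burst state quiet while real
traffic is flowing, consider this C sketch of delay sampling. The uint16_t
token array and the linear mapping of bins onto the delay range are
assumptions for illustration; the actual decoding rules are those of Section
3.2.3:

    #include <stdint.h>
    #include <stdlib.h>

    /* Draw a padding delay from a histogram such as burst_state. */
    static uint64_t
    sample_padding_delay_usec(const uint16_t *tokens, uint8_t histogram_len,
                              uint32_t start_usec, uint16_t max_sec)
    {
      uint32_t total = 0;
      for (uint8_t i = 0; i < histogram_len; i++)
        total += tokens[i];
      if (total == 0)
        return UINT64_MAX; /* no tokens left: do not schedule padding */

      /* Pick a bin with probability proportional to its token count.
       * (rand() stands in for a proper CSPRNG.) */
      uint32_t choice = (uint32_t)rand() % total;
      uint8_t bin = 0;
      while (choice >= tokens[bin]) {
        choice -= tokens[bin];
        bin++;
      }

      /* Map the chosen bin onto [start_usec, max_sec].  With few or no
       * tokens in the low bins, the sampled delay is almost always long,
       * so padding only fires once the line has actually gone idle. */
      uint64_t max_usec = (uint64_t)max_sec * 1000000;
      if (max_usec <= start_usec)
        return start_usec;
      return start_usec + ((max_usec - start_usec) * bin) / (histogram_len - 1);
    }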
 
 
 4. Security considerations and mitigations
@@ -425,7 +458,7 @@ interactions are also possible.
 The risks from this proposal are primarily DoS/resource exhaustion, and
 side channels.
 
-4.1. Traffic handling
+4.1. Rate limiting
 
 Fully client-requested padding introduces a vector for resource
 amplification attacks and general network overload due to
@@ -509,6 +542,15 @@ rather than from the expected interior node, clients should alert the
 user of the possibility of that circuit endpoint introducing a
 side-channel attack, and/or close the circuit.
 
+4.5. Memory exhaustion
+
+Because interior nodes do not have information on the current circuits'
+SENDME windows, it is possible for malicious clients to consume the
+buffers of relays by specifying padding, and then not reading from the
+associated circuits.
+
+XXX: This is bad. We need to add padding-level flow control windows :(
+
 -------------------
 
 1. https://gitweb.torproject.org/torspec.git/tree/proposals/251-netflow-padding.txt




