commit 17df80f84b3425b13c71eb8ab29d23e3a59863d3
Author: David Goulet <dgoulet(a)torproject.org>
Date: Tue Nov 3 11:29:38 2020 -0500
prop328: Initial draft
Signed-off-by: Mike Perry <mikeprry(a)torproject.org>
Signed-off-by: David Goulet <dgoulet(a)torproject.org>
---
proposals/000-index.txt | 2 +
proposals/328-relay-overload-report.md | 199 +++++++++++++++++++++++++++++++++
proposals/BY_INDEX.md | 1 +
proposals/README.md | 1 +
4 files changed, 203 insertions(+)
diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index f6ff770..6851cff 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -248,6 +248,7 @@ Proposals by number:
325 Packed relay cells: saving space on small commands [OPEN]
326 The "tor-relay" Well-Known Resource Identifier [OPEN]
327 A First Take at PoW Over Introduction Circuits [DRAFT]
+328 Make Relays Report When They Are Overloaded [DRAFT]
Proposals by status:
@@ -257,6 +258,7 @@ Proposals by status:
294 TLS 1.3 Migration
316 FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
327 A First Take at PoW Over Introduction Circuits
+ 328 Make Relays Report When They Are Overloaded
NEEDS-REVISION:
212 Increase Acceptable Consensus Age [for 0.2.4.x+]
219 Support for full DNS and DNSSEC resolution in Tor [for 0.2.5.x]
diff --git a/proposals/328-relay-overload-report.md b/proposals/328-relay-overload-report.md
new file mode 100644
index 0000000..b05d289
--- /dev/null
+++ b/proposals/328-relay-overload-report.md
@@ -0,0 +1,199 @@
+```
+Filename: 328-relay-overload-report.md
+Title: Make Relays Report When They Are Overloaded
+Author: David Goulet, Mike Perry
+Created: November 3rd 2020
+Status: Draft
+```
+
+# 0. Introduction
+
+Many relays are likely sometimes under heavy load in terms of memory, CPU or
+network resources which in turns diminishes their ability to efficiently relay
+data through the network.
+
+Having the capability of learning if a relay is overloaded would allow us to
+make better informed load balancing decisions. For instance, we can make our
+bandwidth scanners more intelligent on how they allocate bandwidth based on
+such metrics from relays.
+
+We could furthermore improve our network health monitoring and pinpoint relays
+possibly misbehaving or under DDoS attack.
+
+# 1. Metrics to Report
+
+We propose that relays start collecting several metrics (see section 2)
+reflecting their loads from different component of tor.
+
+Then, we propose that 3 new lines be added to the extra-info document (see
+dir-spec.txt, section 2.1.2) if only the overload case arrise.
+
+This following describes a series of metrics to collect but more might come in
+the future and thus this is not an exhaustive list.
+
+# 1.1. General Overload
+
+The general overload line indicates that a relay has reached an "overloaded
+state" which can be one or many of the following load metrics:
+
+ - Any OOMkiller invocation due to memory pressure
+ - Any onionskins are dropped
+ - CPU utilization of Tor's mainloop CPU core above 90% for 60 sec
+ - TCP port exhaustion
+
+The format of the overloaded line added in the extra-info document is as
+follow:
+
+```
+"overload-reached" YYYY-MM-DD HH:MM:SS NL
+ [At most once.]
+```
+
+The timestamp is when a at least one metrics was detected. It should always be
+at the hour and thus, as an example, "2020-01-10 13:00:00" is an expected
+timestamp. Because this is a binary state, if the line is present, we consider
+that it was hit at the very least once somewhere between the provided
+timestamp and the "published" timestamp of the document which is when the
+document was generated.
+
+The overload field should remain in place for 72 hours since last triggered.
+If the limits are reached again in this period, the timestamp is updated, and
+this 72 hour period restarts.
+
+# 1.2. Token bucket size
+
+Relays should report the 'BandwidthBurst' and 'BandwidthRate' limits in their
+descriptor, as well as the number of times these limits were reached, for read
+and write, in the past 24 hours starting at the provided timestamp rounded
+down to the hour.
+
+```
+"overload-ratelimits" SP YYYY-MM-DD SP HH:MM:SS
+ SP rate-limit SP burst-limit
+ SP read-rate-count SP read-burst-count
+ SP write-rate-count SP write-burst-count NL
+ [At most once.]
+```
+
+The "rate-limit" and "burst-limit" are the raw values from the BandwidthRate
+and BandwidthBurst found in the torrc configuration file.
+
+The "{read|write}-rate-count" and "{read|write}-burst-count" are the counts of
+how many times the reported limits were exhausted and thus the maximum between
+the read and write count occurances.
+
+# 1.3. File Descriptor Exhaustion
+
+Not having enough file descriptors in this day of age is really a
+misconfiguration or a too old operation system. That way, we can very quickly
+notice which relay has a value too small and we can notify them.
+
+This should be published in this format:
+
+```
+"overload-fd-exhausted" YYYY-MM-DD HH:MM:SS NL
+ [At most once.]
+```
+
+As the overloaded line, the timestamp indicates that the maximum was reached
+between the this timestamp and the "published" timestamp of the document.
+
+This overload field should remain in place for 72 hours since last triggered.
+If the limits are reached again in this period, the timestamp is updated, and
+this 72 hour period restarts.
+
+# 2. Load Metrics
+
+This section proposes a series of metrics that should be collected and
+reported to the MetricsPort. The Prometheus format (only one supported for
+now) is described for each metrics but each of them are prefixed with the
+following in order to have a proper namespace for "load" events:
+
+`tor_load_`
+
+## 2.1 Out-Of-Memory (OOM) Invocation
+
+Tor's OOM manages caches and queues of all sorts. Relays have many of them and
+so any invocation of the OOM should be reported.
+
+```
+# HELP Total number of bytes the OOM has cleaned up
+# TYPE counter
+tor_load_oom_bytes_total{<LABEL>} <VALUE>
+```
+
+Running counter of how many bytes were cleaned up by the OOM for a tor
+component identified by a label (see list below). To make sense, this should
+be visualized with the rate() function.
+
+Possible LABELs for which the OOM was triggered:
+ - `cell`: Circuit cell queue
+ - `dns`: DNS resolution cache
+ - `geoip`: GeoIP cache
+ - `hsdir`: Onion service descriptors
+
+## 2.2 Onionskin Queues
+
+Onionskins handling is one of the few items that tor processes in parallel but
+they can be dropped for various reasons when under load. For this metrics to
+make sense, we also need to gather how many onionskins are we processing and
+thus one can provide a total processed versus dropped ratio:
+
+```
+# HELP Total number of onionskins
+# TYPE counter
+tor_load_onionskin_total{<LABEL>} <NUM>
+```
+
+Possible LABELs are:
+ - `processed`: Indicating how many were processed.
+ - `dropped`: Indicating how many were dropped due to load.
+
+## 2.3 File Descriptor Exhaustion
+
+Relays can reach a "ulimit" (on Linux) cap that is the number of allowed
+opened file descriptors. In Tor's use case, this is mostly sockets. File
+descriptors should be reported as follow:
+
+```
+# HELP Total number of file descriptors
+# TYPE gauge
+tor_load_fd_total{<LABEL>} <NUM>
+```
+
+Possible LABELs are:
+ - `remaining`: How many file descriptors remains that is can be opened.
+
+Note: since tor does track that value in order to reserve a block for critical
+port such as the Control Port, that value can easily be exported.
+
+## 2.4 TCP Port Exhaustion
+
+TCP protocol is capped at 65535 ports and thus if the relay ever is unable to
+open more outbound sockets, that is an overloaded state. It should be
+reported:
+
+```
+# HELP Total number of opened outbound connections.
+# TYPE gauge
+tor_load_socket_total{<LABEL>} <NUM>
+```
+
+Possible LABELs are:
+ - `outbound`: Sockets used for outbound connections.
+
+## 2.5 Connection Bucket Limit
+
+Rate limited connections track bandwidth using a bucket system. Once the
+bucket is filled and tor wants to send more, it pauses until it is refilled a
+second later. Once that is hit, it should be reported:
+
+```
+# HELP Total number of global connection bucket limit reached
+# TYPE counter
+tor_load_global_rate_limit_reached_total{<LABEL>} <NUM>
+```
+
+Possible LABELs are:
+ - `read`: Read side of the global rate limit bucket.
+ - `write`: Write side of the global rate limit bucket.
diff --git a/proposals/BY_INDEX.md b/proposals/BY_INDEX.md
index 90a151d..3d80c1c 100644
--- a/proposals/BY_INDEX.md
+++ b/proposals/BY_INDEX.md
@@ -245,4 +245,5 @@ Below are a list of proposals sorted by their proposal number. See
* [`325-packed-relay-cells.md`](/proposals/325-packed-relay-cells.md): Packed relay cells: saving space on small commands [OPEN]
* [`326-tor-relay-well-known-uri-rfc8615.md`](/proposals/326-tor-relay-well-known-uri-rfc8615.md): The "tor-relay" Well-Known Resource Identifier [OPEN]
* [`327-pow-over-intro.txt`](/proposals/327-pow-over-intro.txt): A First Take at PoW Over Introduction Circuits [DRAFT]
+* [`328-relay-overload-report.md`](/proposals/328-relay-overload-report.md): Make Relays Report When They Are Overloaded [DRAFT]
diff --git a/proposals/README.md b/proposals/README.md
index 904009e..44996ad 100644
--- a/proposals/README.md
+++ b/proposals/README.md
@@ -108,6 +108,7 @@ discussion.
* [`294-tls-1.3.txt`](/proposals/294-tls-1.3.txt): TLS 1.3 Migration
* [`316-flashflow.md`](/proposals/316-flashflow.md): FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
* [`327-pow-over-intro.txt`](/proposals/327-pow-over-intro.txt): A First Take at PoW Over Introduction Circuits
+* [`328-relay-overload-report.md`](/proposals/328-relay-overload-report.md): Make Relays Report When They Are Overloaded
## NEEDS-REVISION proposals: ideas that we can't implement as-is