[tor-commits] [torspec/main] prop328: Update it to reflect implementation

dgoulet at torproject.org
Mon Nov 1 19:03:51 UTC 2021


commit d2b9010b2ebbf293377516af98810c6597687292
Author: David Goulet <dgoulet at torproject.org>
Date:   Mon Nov 1 14:53:51 2021 -0400

    prop328: Update it to reflect implementation
    
    Signed-off-by: David Goulet <dgoulet at torproject.org>
---
 proposals/328-relay-overload-report.md | 39 ++++++++++++++++------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/proposals/328-relay-overload-report.md b/proposals/328-relay-overload-report.md
index 5f4c6fd..9348c80 100644
--- a/proposals/328-relay-overload-report.md
+++ b/proposals/328-relay-overload-report.md
@@ -39,9 +39,15 @@ state" which can be one or many of the following load metrics:
    - Any OOM invocation due to memory pressure
    - Any ntor onionskins are dropped
    - TCP port exhaustion
-   - DNS timeout reached
+   - DNS timeout reached (X% of timeouts over Y seconds).
    - CPU utilization of Tor's mainloop CPU core above 90% for 60 sec
+     [Never implemented]
    - Control port overload (too many messages queued)
+     [Never implemented]
+
+For DNS timeouts, the X and Y are consensus parameters
+(overload_dns_timeout_scale_percent and overload_dns_timeout_period_secs)
+defined in param-spec.txt.
 
 The format of the overloaded line added in the server descriptor document is
 as follows:
@@ -64,9 +70,7 @@ this 72 hour period restarts.
 
 The 'version' field is set to '1' for the initial implementation of this
 proposal which includes all the above overload metrics except from the CPU and
-control port overload. The first version also uses a primitive logic for
-detecting DNS timeouts (only if libevent failed a set of 3 DNS requests/retries
-in a row).
+control port overload. 
 
 # 1.2. Token bucket size
 
@@ -121,10 +125,7 @@ proposal which detects fd exhaustion only when a socket open fails.
 
 This section proposes a series of metrics that should be collected and
 reported to the MetricsPort. The Prometheus format (only one supported for
-now) is described for each metrics but each of them are prefixed with the
-following in order to have a proper namespace for "load" events:
-
-`tor_load_`
+now) is described for each metrics.
 
 ## 2.1 Out-Of-Memory (OOM) Invocation
 
@@ -134,7 +135,7 @@ so any invocation of the OOM should be reported.
 ```
 # HELP Total number of bytes the OOM has cleaned up
 # TYPE counter
-tor_load_oom_bytes_total{<LABEL>} <VALUE>
+tor_relay_load_oom_bytes_total{<LABEL>} <VALUE>
 ```
 
 Running counter of how many bytes were cleaned up by the OOM for a tor
@@ -157,7 +158,7 @@ thus one can provide a total processed versus dropped ratio:
 ```
 # HELP Total number of onionskins
 # TYPE counter
-tor_load_onionskin_total{<LABEL>} <NUM>
+tor_relay_load_onionskins_total{<LABEL>} <NUM>
 ```
 
 Possible LABELs are:
@@ -173,14 +174,14 @@ opened file descriptors. In Tor's use case, this is mostly sockets. File
 descriptors should be reported as follow:
 
 ```
-# HELP Total number of file descriptors
+# HELP Total number of sockets
 # TYPE gauge
-tor_load_fd_total{<LABEL>} <NUM>
+tor_relay_load_socket_total{<LABEL>} <NUM>
 ```
 
 Possible LABELs are:
-  - `state=total`: Maximum number of file descriptors allowed open
-  - `state=opened`: How many file descriptors are opened.
+  - <none>: How many available sockets.
+  - `state=opened`: How many sockets are opened.
 
 Note: since tor does track that value in order to reserve a block for critical
 port such as the Control Port, that value can easily be exported.
@@ -192,15 +193,11 @@ open more outbound sockets, that is an overloaded state. It should be
 reported:
 
 ```
-# HELP Total number of opened outbound connections.
+# HELP Total number of times we ran out of TCP ports
 # TYPE gauge
-tor_load_socket_total{<LABEL>} <NUM>
+tor_relay_load_tcp_exhaustion_total <NUM>
 ```
 
-Possible LABELs are:
-  - `state=total`: Maximum number of sockets allowed open
-  - `state=opened`: How sockets are opened.
-
 ## 2.5 Connection Bucket Limit
 
 Rate limited connections track bandwidth using a bucket system. Once the
@@ -210,7 +207,7 @@ second later. Once that is hit, it should be reported:
 ```
 # HELP Total number of global connection bucket limit reached
 # TYPE counter
-tor_load_global_rate_limit_reached_total{<LABEL>} <NUM>
+tor_relay_load_global_rate_limit_reached_total{<LABEL>} <NUM>
 ```
 
 Possible LABELs are:
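
Note on the renames above: the hunks move every metric from the proposed
`tor_load_` prefix to `tor_relay_load_`, and two names change meaning as well
(`tor_load_fd_total` becomes `tor_relay_load_socket_total`, and the old
`tor_load_socket_total` becomes `tor_relay_load_tcp_exhaustion_total`). The
following is a minimal sketch, not part of the commit, of how a script written
against the old names could be updated. The function names
`migrate_metric_name` and `migrate_sample_line` are hypothetical; only the
name pairs in the table come from the diff.

```python
# Old -> new metric names, copied from the -/+ pairs in the diff above.
RENAMED_METRICS = {
    "tor_load_oom_bytes_total": "tor_relay_load_oom_bytes_total",
    "tor_load_onionskin_total": "tor_relay_load_onionskins_total",
    "tor_load_fd_total": "tor_relay_load_socket_total",
    "tor_load_socket_total": "tor_relay_load_tcp_exhaustion_total",
    "tor_load_global_rate_limit_reached_total":
        "tor_relay_load_global_rate_limit_reached_total",
}


def migrate_metric_name(name: str) -> str:
    """Return the post-commit name, or the input unchanged if not renamed."""
    return RENAMED_METRICS.get(name, name)


def migrate_sample_line(line: str) -> str:
    """Rewrite the metric name at the start of one Prometheus sample line.

    Comment lines (# HELP, # TYPE), blank lines, and lines that begin with
    whitespace are returned unchanged. Labels and values are not touched.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    # The metric name runs up to the first '{' or whitespace character.
    end = len(line)
    for i, ch in enumerate(line):
        if ch == "{" or ch.isspace():
            end = i
            break
    return migrate_metric_name(line[:end]) + line[end:]


if __name__ == "__main__":
    print(migrate_sample_line('tor_load_fd_total{state="opened"} 120'))
    # prints: tor_relay_load_socket_total{state="opened"} 120
```

The mapping only rewrites names. The diff also changes labels (the
`state=total` label is dropped and `tor_relay_load_tcp_exhaustion_total`
carries no label), so queries that depend on those labels still need to be
reviewed by hand.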
