tor-commits
March 2021
- 17 participants
- 2169 discussions

[translation/communitytpo-contentspot] https://gitweb.torproject.org/translation.git/commit/?h=communitytpo-contentspot
by translation@torproject.org 30 Mar '21
commit 2b18e81f9f30ddb718c327e1d61b379c720a4ea6
Author: Translation commit bot <translation(a)torproject.org>
Date: Tue Mar 30 19:45:13 2021 +0000
https://gitweb.torproject.org/translation.git/commit/?h=communitytpo-conten…
---
contents+hu.po | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/contents+hu.po b/contents+hu.po
index bb28fd80fe..dc7e46f033 100644
--- a/contents+hu.po
+++ b/contents+hu.po
@@ -5,6 +5,7 @@
# erinm, 2021
# Gus, 2021
# István Dávid <istvandavid(a)icloud.com>, 2021
+# Emma Peel, 2021
#
msgid ""
msgstr ""
@@ -12,7 +13,7 @@ msgstr ""
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-03-18 19:45+CET\n"
"PO-Revision-Date: 2019-12-11 10:50+0000\n"
-"Last-Translator: István Dávid <istvandavid(a)icloud.com>, 2021\n"
+"Last-Translator: Emma Peel, 2021\n"
"Language-Team: Hungarian (https://www.transifex.com/otf/teams/1519/hu/)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
@@ -16400,7 +16401,7 @@ msgstr "Az oldal közreműködői:"
#: templates/outreach-talk.html:77 templates/two-columns-page.html:30
msgid "Back to previous page: "
-msgstr ""
+msgstr "Visszatérés az előző oldalra:"
#: templates/outreach-talk.html:77 templates/two-columns-page.html:30
msgid "Edit this page"

[translation/gettor-website-contentspot] https://gitweb.torproject.org/translation.git/commit/?h=gettor-website-contentspot
by translation@torproject.org 30 Mar '21
commit ba9b54ca48d780cac5e968b4126adcf92f40d693
Author: Translation commit bot <translation(a)torproject.org>
Date: Tue Mar 30 17:45:17 2021 +0000
https://gitweb.torproject.org/translation.git/commit/?h=gettor-website-cont…
---
contents+zh-TW.po | 2 ++
1 file changed, 2 insertions(+)
diff --git a/contents+zh-TW.po b/contents+zh-TW.po
index 0b936d23f4..1fc98d564d 100644
--- a/contents+zh-TW.po
+++ b/contents+zh-TW.po
@@ -141,6 +141,8 @@ msgid ""
"to use bridges. To use pluggable transports, click 'Configure' in the Tor "
"Launcher window that appears when you first run Tor Browser."
msgstr ""
+"當你第一次使用Tor瀏覽器的時候,軟件會問你是否想用橋接中繼站。請你在第一次啟動Tor瀏覽器時,在Tor啟動器開啟的視窗中選擇 “設定” "
+"以使用可插式傳輸。"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid ""

[translation/tor-launcher-properties] https://gitweb.torproject.org/translation.git/commit/?h=tor-launcher-properties
by translation@torproject.org 30 Mar '21
commit d855f24dd52deeb5ad25d09fea220875469e9ba9
Author: Translation commit bot <translation(a)torproject.org>
Date: Tue Mar 30 17:17:49 2021 +0000
https://gitweb.torproject.org/translation.git/commit/?h=tor-launcher-proper…
---
zh-TW/torlauncher.properties | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/zh-TW/torlauncher.properties b/zh-TW/torlauncher.properties
index 47a0ca04b2..398eecb578 100644
--- a/zh-TW/torlauncher.properties
+++ b/zh-TW/torlauncher.properties
@@ -1,7 +1,7 @@
### Copyright (c) 2020, The Tor Project, Inc.
### See LICENSE for licensing information.
-torlauncher.error_title=洋蔥路由啟動工具
+torlauncher.error_title=Tor啟動工具
torlauncher.tor_exited_during_startup=洋蔥瀏覽器在啟動時意外終止。這可能是因為 torrc 設定檔的錯誤、洋蔥瀏覽器本身、您系統上其他程式的漏洞或是硬體故障所造成。必須先解決潛在的問題並重新啟動洋蔥瀏覽器,本程式才能正常運作。
torlauncher.tor_exited=洋蔥瀏覽器意外終止了。這可能是因為洋蔥瀏覽器本身的程式錯誤、您系統上的其它程式或是硬體故障所造成。除非您重新啟動洋蔥瀏覽器,否則洋蔥瀏覽器將無法連線到任何網站。如果此問題持續發生,請寄送您的洋蔥瀏覽器錯誤報告給支援團隊。

[translation/gettor-website-contentspot] https://gitweb.torproject.org/translation.git/commit/?h=gettor-website-contentspot
by translation@torproject.org 30 Mar '21
commit ab31d3ebf7c302ff4d27447f512175bc037fa0cf
Author: Translation commit bot <translation(a)torproject.org>
Date: Tue Mar 30 17:15:21 2021 +0000
https://gitweb.torproject.org/translation.git/commit/?h=gettor-website-cont…
---
contents+zh-TW.po | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/contents+zh-TW.po b/contents+zh-TW.po
index 7108264b70..0b936d23f4 100644
--- a/contents+zh-TW.po
+++ b/contents+zh-TW.po
@@ -66,15 +66,15 @@ msgstr "- 第四步:如果需要的話,你可以使用橋接器!"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid "## GetTor Responder on Twitter"
-msgstr ""
+msgstr "## Twitter上的GetTor回覆器"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid "GetTor is currently not working on Twitter."
-msgstr ""
+msgstr "GetTor暫時未有在Twitter上運作。"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid "## How to verify a digital signature"
-msgstr ""
+msgstr "## 如何驗證數位簽章"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid ""
@@ -86,7 +86,7 @@ msgstr "簽署數位簽章是用來確保套裝軟體是來自其開發者且沒
msgid ""
"In GetTor emails we provide a link to a file with the same name as the "
"package and the extension \".asc\". These .asc files are OpenPGP signatures."
-msgstr ""
+msgstr "在GetTor發出的電子郵件中,我們會附上一個和套裝軟體同名,副檔名為「.asc」的檔案。這些 .asc 檔案是OpenPGP數位簽章。 "
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid ""
@@ -99,16 +99,20 @@ msgid ""
"For example, `torbrowser-install-win64-8.5.4_en-US.exe` is accompanied by "
"`torbrowser-install-win64-8.5.4_en-US.exe.asc`."
msgstr ""
+"例如,「torbrowser-install-win64-8.5.4_en-US.exe.asc」會隨「torbrowser-install-"
+"win64-8.5.4_en-US.exe」被一併下載。"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid ""
"Check [how to verify a digital signature](https://support.torproject.org/tbb"
"/how-to-verify-signature/)."
msgstr ""
+"你可以看看[如何驗證數位簽章](https://support.torproject.org/tbb/how-to-verify-"
+"signature/)。"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid "## How to get bridges"
-msgstr ""
+msgstr "## 怎樣可以取得橋接中繼站"
#: https//gettor.torproject.org/ (content/contents+en.lrpage.body)
msgid ""
commit f66a457cb27202931d276f8b38ee92223344bd0d
Author: Mike Perry <mikeperry-git(a)torproject.org>
Date: Thu Mar 25 20:56:48 2021 +0000
Add Acknowledgements to Proposal 324.
---
proposals/324-rtt-congestion-control.txt | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/proposals/324-rtt-congestion-control.txt b/proposals/324-rtt-congestion-control.txt
index 91e1254..807471b 100644
--- a/proposals/324-rtt-congestion-control.txt
+++ b/proposals/324-rtt-congestion-control.txt
@@ -1053,8 +1053,15 @@ also be used as a side channel. So we must limit its use to a couple of
cells per circuit, at most.
https://blog.torproject.org/tor-security-advisory-relay-early-traffic-confi…
+9. Acknowledgements
-9. [CITATIONS]
+Immense thanks to Toke Høiland-Jørgensen for considerable input into all
+aspects of the TCP congestion control background material for this proposal,
+as well as review of our versions of the algorithms.
+
+
+
+10. [CITATIONS]
1. Options for Congestion Control in Tor-Like Networks.
https://lists.torproject.org/pipermail/tor-dev/2020-January/014140.html

commit 9f39565e61b65bb5f65d37ed779d17fe1c596179
Author: David Goulet <dgoulet(a)torproject.org>
Date: Wed Feb 6 17:24:01 2019 -0500
Proposal 329: Conflux traffic splitting
---
proposals/329-traffic-splitting.txt | 843 ++++++++++++++++++++++++++++++++++++
1 file changed, 843 insertions(+)
diff --git a/proposals/329-traffic-splitting.txt b/proposals/329-traffic-splitting.txt
new file mode 100644
index 0000000..746c6c4
--- /dev/null
+++ b/proposals/329-traffic-splitting.txt
@@ -0,0 +1,843 @@
+Filename: 329-traffic-splitting.txt
+Title: Overcoming Tor's Bottlenecks with Traffic Splitting
+Author: David Goulet, Mike Perry
+Created: 2020-11-25
+Status: Draft
+
+0. Status
+
+ This proposal describes the Conflux [CONFLUX] system developed by Mashael
+ AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. It aims at improving
+ Tor client network performance by dynamically splitting traffic between two
+ circuits.
+
+
+1. Overview
+
+1.1. Multipath TCP Design Space
+
+ In order to understand our improvements to Conflux, it is important to
+ properly conceptualize what is involved in the design of multipath
+ algorithms in general.
+
+ The design space is broken into two orthogonal parts: congestion
+ control algorithms that apply to each path, and traffic scheduling
+ algorithms that decide when to send packets on each path.
+
+ MPTCP specifies 'coupled' congestion control (see [COUPLED]). Coupled
+ congestion control updates single-path congestion control algorithms to
+ account for shared bottlenecks between the paths, so that the combined
+ congestion control algorithms do not overwhelm any bottlenecks that
+ happen to be shared between the multiple paths. Various ways of
+ accomplishing this have been proposed and implemented in the Linux kernel.
+
+ Because Tor's congestion control only concerns itself with bottlenecks in Tor
+ relay queues, and not with any other bottlenecks (such as intermediate
+ Internet routers), we can avoid this complexity merely by specifying that
+ any paths that are constructed should not share any relays. In this way, we
+ can proceed to use the exact same congestion control as specified in Proposal
+ 324, for each path.
+
+ For this reason, this proposal will focus on the traffic scheduling
+ algorithms, rather than coupling. We propose three candidate algorithms
+ that have been studied in the literature, and will compare their
+ performance using simulation and consensus parameters.
+
+1.2. Divergence from the initial Conflux design
+
+ The initial [CONFLUX] paper doesn't provide any indication of how to handle
+ the size of the out-of-order cell queue, which we consider a potentially dangerous
+ memory DoS vector (see [MEMORY_DOS]). It also used RTT as the sole heuristic
+ for selecting which circuit to send on, which may vary depending on the
+ geographical locations of the participant relays, without considering their
+ actual available circuit capacity (which will be available to us via Proposal
+ 324). Additionally, since the publication of [CONFLUX], more modern
+ packet scheduling algorithms have been developed, which aim to reduce
+ out-of-order queue size.
+
+ We propose mitigations for these issues using modern scheduling algorithms,
+ as well as implementation options for avoiding the out-of-order queue at
+ Exit relays. Additionally, we consider resumption, side channel, and traffic
+ analysis risks and benefits in [RESUMPTION], [SIDE_CHANNELS] and
+ [TRAFFIC_ANALYSIS].
+
+
+2. Design
+
+ The following section describes the Conflux design. Each sub-section is a
+ building block to the multipath design that Conflux proposes.
+
+ The circuit construction is as follows:
+
+ Primary Circuit (lower RTT)
+ +-------+ +--------+
+ |Guard 1|----->|Middle 1|----------+
+ +---^---+ +--------+ |
+ +-----+ | +--v---+
+ | OP +------+ | Exit |--> ...
+ +-----+ | +--^---+
+ +---v---+ +--------+ |
+ |Guard 2|----->|Middle 2|----------+
+ +-------+ +--------+
+ Secondary Circuit (higher RTT)
+
+ Both circuits are built using current Tor path selection, however they
+ SHOULD NOT share the same Guard relay, or middle relay. By avoiding
+ using the same relays in these positions in the path, we ensure
+ additional path capacity, and eliminate the need to use more complicated
+ 'coupled' congestion control algorithms from the MPTCP literature[COUPLED].
+ This both simplifies design, and improves performance.
+
+ Then, the OP needs to link the two circuits together, as described in
+ [LINKING_CIRCUITS], [LINKING_EXIT], and [LINKING_SERVICE].
+
+ For ease of explanation, the primary circuit is the circuit with lower RTT,
+ and the secondary circuit is the circuit with higher RTT. Initial RTT
+ is measured during circuit linking, as described in [LINKING_CIRCUITS].
+ RTT is continually measured using SENDME timing, as in Proposal 324.
+ This means that during use, the primary circuit and secondary circuit may
+ switch roles, depending on unrelated network congestion caused by other
+ Tor clients.
+
+ We also support linking onion service circuits together. In this case,
+ only two rendezvous circuits are linked. Each of these RP circuits will be
+ constructed separately, and then linked. However, the same path constraints
+ apply to each half of the circuits (no shared relays between the legs).
+ Should, by chance, the service and the client sides end up sharing some
+ relays, this is not catastrophic. Multipath TCP researchers we have
+ consulted believe Tor's congestion control from Proposal 324 to be
+ sufficient in this rare case.
+
+ Only two circuits SHOULD be linked together. However, implementations
+ SHOULD make it easy for researchers to *test* more than two paths, as this
+ has been shown to assist in traffic analysis resistance[WTF_SPLIT]. At
+ minimum, this means not hardcoding only two circuits in the implementation.
+
+ If the number of circuits exceeds the current number of guard relays,
+ guard relays MAY be re-used, but implementations SHOULD use the same
+ number of Guards as paths.
+
+ Linked circuits MUST NOT be extended further once linked (ie:
+ 'cannibalization' is not supported).
+
+2.1. Advertising support for conflux
+
+ We propose a new protocol version in order to advertise support for
+ circuit linking on the relay side:
+
+ "Relay=4" -- Relay supports an 2 byte sequence number in a RELAY cell
+ header used for multipath circuit which are linked with the
+ new RELAY_CIRCUIT_LINK relay cell command.
+
+ XXX: Advertise this in onion service descriptor.
+ XXX: Onion service descriptor can advertise more than two circuits?
+
+ The next section describes how the circuits are linked together.
+
+2.2. Linking circuits [LINKING_CIRCUITS]
+
+ To link circuits, we propose new relay commands that are sent on both
+ circuits, as well as a response to confirm the join, and an ack of this
+ response. These commands create a 3way handshake, which allows each
+ endpoint to measure the initial RTT of each leg upon link, without
+ needing to wait for any data.
+
+ All three stages of this handshake are sent on *each* circuit leg to be
+ linked.
+
+ To save round trips, these cells SHOULD be combined with the initial
+ RELAY_BEGIN cell on the faster circuit leg, using Proposal 325. See
+ [LINKING_EXIT] and [LINKING_SERVICE] for more details on setup in each case.
+
+ There are other ways to do this linking that we have considered, but they
+ seem not to be significantly better than this method, especially since we
+ can use Proposal 325 to eliminate the RTT cost of this setup before sending
+ data. For those other ideas, see [ALTERNATIVE_LINKING] and
+ [ALTERNATIVE_RTT], in the appendix.
+
+ The first two parts of the handshake establish the link, and enable
+ resumption:
+
+ 16 -- RELAY_CIRCUIT_LINK
+
+ Sent from the OP to the exit/service in order to link
+ circuits together at the end point.
+
+ 17 -- RELAY_CIRCUIT_LINKED
+
+ Sent from the exit/service to the OP, to confirm the circuits
+ were linked.
+
+ These cells have the following contents:
+
+ VERSION [1 byte]
+ PAYLOAD [variable, up to end of relay payload]
+
+ The VERSION tells us which circuit linking mechanism to use. At this point
+ in time, only 0x01 is recognized and is the one described by the Conflux
+ design.
+
+ For version 0x01, the PAYLOAD contains:
+
+ NONCE [32 bytes]
+ LAST_SEQNO_SENT [8 bytes]
+ LAST_SEQNO_RECV [8 bytes]
+
+ XXX: Should we let endpoints specify their preferred [SCHEDULING] alg
+ here, to override consensus params? This has benefits: eg low-memory
+ mobile clients can ask for an alg that is better for their reorder
+ queues. But it also has complexity risk, if the other endpoint does
+ not want to support it, because of its own memory issues.
+
+ The NONCE contains a random 256-bit secret, used to associate the two
+ circuits together. The nonce must not be shared outside of the circuit
+ transmission, or data may be injected into TCP streams. This means it
+ MUST NOT be logged to disk.
+
+ The two sequence number fields are 0 upon initial link, but non-zero in the
+ case of a resumption attempt (See [RESUMPTION]).
+
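+ As a non-normative illustration, a version 0x01 payload could be packed
+ as follows (Python sketch; the helper name is an assumption, and network
+ byte order is assumed for the integer fields):
+
+     import os
+     import struct
+
+     def pack_link_payload_v1(last_seqno_sent=0, last_seqno_recv=0):
+         # VERSION (1 byte) || NONCE (32 bytes) ||
+         # LAST_SEQNO_SENT (8 bytes) || LAST_SEQNO_RECV (8 bytes)
+         nonce = os.urandom(32)  # 256-bit secret; MUST NOT be logged
+         payload = struct.pack("!B32sQQ", 0x01, nonce,
+                               last_seqno_sent, last_seqno_recv)
+         return payload, nonce
+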
+ If either circuit does not receive a RELAY_CIRCUIT_LINKED response, both
+ circuits MUST be closed.
+
+ The third stage of the handshake exists to help the exit/service measure
+ initial RTT, for use in [SCHEDULING]:
+
+ 18 -- RELAY_CIRCUIT_LINKED_RTT_ACK
+
+ Sent from the OP to the exit/service, to provide initial RTT
+ measurement for the exit/service.
+
+ For timeout of the handshake, clients should use the normal SOCKS/stream
+ timeout already in use for RELAY_BEGIN.
+
+ These three relay commands (RELAY_CIRCUIT_LINK, RELAY_CIRCUIT_LINKED,
+ and RELAY_CIRCUIT_LINKED_RTT_ACK) are sent on *each* leg, to allow each
+ endpoint to measure the initial RTT of each leg.
+
+2.3. Linking Circuits from OP to Exit [LINKING_EXIT]
+
+ To link exit circuits, two circuits to the same exit are built. The
+ client records the circuit build time of each.
+
+ If the circuits are being built on-demand, for immediate use, the
+ circuit with the lower build time SHOULD use Proposal 325 to append its
+ first RELAY cell to the RELAY_CIRCUIT_LINK cell. The exit MUST respond
+ on this same leg.
+ After that, actual RTT measurements MUST be used to determine
+ future transmissions, as specified in [SCHEDULING].
+
+ The RTT times between RELAY_CIRCUIT_LINK and RELAY_CIRCUIT_LINKED are
+ measured by the client, to determine each circuit's RTT, both for
+ primary vs secondary circuit selection and for packet scheduling.
+ Similarly, the exit measures the RTT times between RELAY_CIRCUIT_LINKED
+ and RELAY_CIRCUIT_LINKED_RTT_ACK, for the same purpose.
+
+2.4. Linking circuits to an onion service [LINKING_SERVICE]
+
+ For onion services, we will only concern ourselves with linking
+ rendezvous circuits.
+
+ To join rendezvous circuits, clients make two introduce requests to a
+ service's intropoint, causing it to create two rendezvous circuits, to
+ meet the client at two separate rendezvous points. These introduce
+ requests MUST be sent to the same intropoint (due to potential use of
+ onionbalance), and SHOULD be sent back-to-back on the same intro
+ circuit. They MAY be combined with Proposal 325.
+
+ The first rendezvous circuit to get joined SHOULD use Proposal 325
+ to append the RELAY_BEGIN command, and the service MUST answer
+ on this circuit, until RTT can be measured.
+
+ Once both circuits are linked and RTT is measured, packet scheduling
+ should be used, as per [SCHEDULING].
+
+2.5. Congestion Control Application [CONGESTION_CONTROL]
+
+ The SENDMEs for congestion control are performed per-leg. As data arrives,
+ regardless of its ordering, it is counted towards SENDME delivery. In this
+ way, 'cwnd - package_window' of each leg always reflects the available
+ data to send on each leg. This is important for [SCHEDULING].
+
+ The congestion control stream XON/XOFF can be sent on either leg, and
+ applies to the stream's transmission on both legs.
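+
+ As a small illustrative sketch (names assumed; cwnd and package_window
+ follow Proposal 324 terminology), the data available to send on a given
+ leg is simply:
+
+     def available_window(leg):
+         # Cells we may still package on this leg before its cwnd is full.
+         return max(0, leg.cwnd - leg.package_window)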
+
+2.6. Sequencing [SEQUENCING]
+
+ With multiple paths for data, the problem of data re-ordering appears. In
+ other words, cells can arrive out of order from the two circuits where cell
+ N + 1 arrives before cell N.
+
+ Handling this reordering operates after congestion control for each circuit
+ leg, but before relay cell command processing or stream data delivery.
+
+ For the receiver to be able to reorder the receiving cells, a sequencing
+ scheme needs to be implemented. However, because Tor does not drop or
+ reorder packets inside of a circuit, this sequence number can be very
+ small. It only has to signal that a cell comes after those arriving
+ on another circuit.
+
+ To achieve this, we add a small sequence number to the common relay
+ header for all relay cells on linked circuits. This sequence number
+ is meant to signal the number of cells sent on the *other* leg, so
+ that each endpoint knows how many cells are still in-flight on
+ another leg. It is different from the absolute sequence number used
+ in [LINKING_CIRCUITS] and [RESUMPTION], but can be derived from that
+ number, using relative arithmetic.
+
+ Relay command [1 byte]
+ Recognized [2 bytes]
+ StreamID [2 bytes]
+ Digest [4 bytes]
+ Length [2 bytes]
+ > LongSeq [1 bit] # If this bit is set, use 31 bits for Seq
+ > Sequencing [7 or 31 bits]
+ Data [Remainder]
+
+ The sequence number is only set for the first cell after the endpoint
+ switches legs. In this case, LongSeq is set to 1, and the Sequencing
+ field extends to 31 bits. Otherwise, the field is a single zero byte.
+
+ These fields MUST be present on ALL end-to-end relay cells on each leg
+ that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
+
+ They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
+ RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays,
+ as opposed to the endpoint, to support padding.
+
+ When an endpoint switches legs, on the first cell in a new leg,
+ LongSeq is set to 1, and the following 31 bits represent the *total*
+ number of cells sent on the *other* leg, before the switch. The receiver
+ must wait for that number of cells to arrive from the previous leg
+ before delivering that cell.
+
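+ To make the bit layout concrete, here is a hedged decoding sketch for the
+ LongSeq/Sequencing field (this helper is an illustration, not a normative
+ parser; LongSeq is taken as the high bit of the first byte):
+
+     def parse_seq_field(buf):
+         # Returns (cells in-flight on the other leg, bytes consumed).
+         if buf[0] & 0x80:  # LongSeq set: 31-bit count follows
+             return int.from_bytes(buf[:4], "big") & 0x7fffffff, 4
+         return buf[0] & 0x7f, 1  # short form: 7 bits (0 when not switching)
+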
+ XXX: In the rare event that we send more than 2^31 cells (~1TB) on a
+ single leg, do we force a switch of legs, or expand the field further?
+
+ An alternative method of sequencing, that assumes that the endpoint
+ knows when it is going to switch, the cell before it switches, is
+ specified in [ALTERNATIVE_SEQUENCING]. Note that that method requires
+ only 1 byte for sequence number and switch signaling, but requires that
+ the sender know that it is planning to switch, the cell before it switches.
+ (This is possible with [BLEST_TOR], but [LOWRTT_TOR] can switch based on
+ RTT change, so it may be one cell late in that case).
+
+2.7. Resumption [RESUMPTION]
+
+ In the event that a circuit leg is destroyed, it MAY be resumed.
+
+ Resumption is achieved by re-using the NONCE and method to the same endpoint
+ (either [LINKING_EXIT] or [LINKING_SERVICE]). The resumed path need not
+ use the same middle and guard relays, but should not share any relays
+ with any existing leg(s).
+
+ To provide resumption, endpoints store an absolute 64-bit cell counter of
+ the last cell they have sent on a conflux pair (their LAST_SEQNO_SENT),
+ as well as the last sequence number they have delivered in-order to edge
+ connections corresponding to a conflux pair (their LAST_SEQNO_RECV).
+ Additionally, endpoints MAY store the entire contents of unacked
+ inflight cells (ie the 'package_window' from proposal 324), for each
+ leg, along with information corresponding to those cells' absolute
+ sequence numbers.
+
+ These 64 bit absolute counters can wrap without issue, as congestion
+ windows will never grow to 2^64 cells until well past the Singularity.
+ However, it is possible that extremely long, bulk circuits could
+ exceed 2^64 total sent or received cells, so endpoints SHOULD handle
+ wrapped sequence numbers for purposes of computing retransmit
+ information. (But even this case is unlikely to happen within the next
+ decade or so).
+
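+ A minimal sketch of wrap-safe comparison for these counters, using
+ serial-number style arithmetic over the 64-bit space (an implementation
+ assumption, not a normative rule):
+
+     def seqno_precedes(a, b):
+         # True if 64-bit counter a comes strictly before b, allowing wrap.
+         return a != b and ((b - a) & ((1 << 64) - 1)) < (1 << 63)
+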
+ Upon resumption, the LAST_SEQNO_SENT and LAST_SEQNO_RECV fields are used to
+ convey the sequence numbers of the last cell the relay sent and received on
+ that leg. The other endpoint can use these sequence numbers to determine if
+ it received the in-flight data or not, or sent more data since that point,
+ up to and including this absolute sequence number. If LAST_SEQNO_SENT
+ has not been received, the endpoint MAY transmit the missing data, if it
+ still has it buffered.
+
+ Because both endpoints get information about the other side's absolute
+ SENT sequence number, they will know exactly how many re-transmitted
+ packets to expect, should the circuit stay open. Re-transmitters
+ should not re-increment their absolute sent fields while re-transmitting.
+
+ If an endpoint no longer has this missing data due to memory pressure, it
+ should destroy *both* legs, as this represents unrecoverable data loss.
+
+ Otherwise, the new circuit can be re-joined, and its RTT can be compared
+ to the remaining circuit to determine if the new leg is primary or
+ secondary.
+
+ It is even possible to resume conflux circuits where both legs have
+ been collapsed using this scheme, if endpoints continue to buffer their
+ unacked package_window data for some time after this close. However,
+ see [TRAFFIC_ANALYSIS] for more details on the full scope of this
+ issue.
+
+ If endpoints are buffering package_window data, such data should be
+ given priority to be freed in any oomkiller invocation. See
+ [MEMORY_DOS] for more oomkiller information.
+
+
+3. Traffic Scheduling [SCHEDULING]
+
+ In order to load balance the traffic between the two circuits, the original
+ conflux paper used only RTT. However, with Proposal 324, we will have
+ accurate information on the instantaneous available bandwidth of each
+ circuit leg, as 'cwnd - package_window' (see Section 3 of Proposal 324).
+
+ Some additional RTT optimizations are also useful, to improve
+ responsiveness and minimize out-of-order queue sizes.
+
+ We specify two traffic schedulers from the multipath literature and adapt
+ them to Tor: [LOWRTT_TOR] and [BLEST_TOR]. [LOWRTT_TOR] also has three
+ variants, with different trade offs.
+
+ However, see the [TRAFFIC_ANALYSIS] sections of this proposal for important
+ details on how this selection can be changed, to reduce website traffic
+ fingerprinting.
+
+3.1. LowRTT Scheduling [LOWRTT_TOR]
+
+ This scheduling algorithm is based on the original [CONFLUX] paper, with
+ ideas from [MPTCP]'s minRTT/LowRTT scheduler.
+
+ In this algorithm, endpoints send cells on the circuit with lower RTT
+ (primary circuit). This continues while the congestion window on the
+ circuit has available room: ie whenever cwnd - package_window > 0.
+
+ Whenever the primary circuit's congestion window becomes full, the
+ secondary circuit is used. We stop reading on the send window source
+ (edge connection) when both congestion windows become full.
+
+ In this way, unlike original conflux, we switch to the secondary circuit
+ without causing congestion on the primary circuit. This improves both
+ load times, and overall throughput.
+
+ This behavior matches minRTT from [MPTCP], sometimes called LowRTT.
+
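+ As a non-normative sketch, this basic variant could look as follows
+ (circuit objects exposing cwnd, package_window, and currRTT as in
+ Proposal 324 are assumed; names are illustrative):
+
+     def lowrtt_send(source, legs):
+         # Prefer the leg with lower RTT whenever its cwnd has room;
+         # stop reading the edge connection when both windows are full.
+         while source.has_data_to_send():
+             open_legs = [c for c in legs if c.cwnd - c.package_window > 0]
+             if not open_legs:
+                 break  # both cwnds full: wait for SENDME acks
+             best = min(open_legs, key=lambda c: c.currRTT)
+             best.send(source.get_packet())
+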
+ It may be better to stop reading on the edge connection when the primary
+ congestion window becomes full, rather than switch to the secondary
+ circuit as soon as the primary congestion window becomes full. (Ie: only
+ switch if the RTTs themselves change which circuit is primary). This is
+ what was done in the original Conflux paper. This behavior effectively
+ causes us to optimize for responsiveness and congestion avoidance, rather
+ than throughput. For evaluation, we should control this switching behavior
+ with a consensus parameter (see [CONSENSUS]).
+
+ Because of potential side channel risk (see [SIDE_CHANNELS]), a third
+ variant of this algorithm, where the primary circuit is chosen during the
+ [LINKING_CIRCUITS] handshake and never changed, should also be possible
+ to control via consensus parameter.
+
+3.2. BLEST Scheduling [BLEST_TOR]
+
+ [BLEST] attempts to predict the availability of the primary circuit, and
+ use this information to reorder transmitted data, to minimize head-of-line
+ blocking in the recipient (and thus minimize out-of-order queues there).
+
+ BLEST_TOR uses the primary circuit until the congestion window is full.
+ Then, it uses the relative RTT times of the two circuits to calculate
+ how much data can be sent on the secondary circuit faster than if we
+ just waited for the primary circuit to become available.
+
+ This is achieved by computing two variables at the sender:
+
+ rtts = secondary.currRTT / primary.currRTT
+ primary_limit = (primary.cwnd + (rtts-1)/2)*rtts
+
+ Note: This (rtts-1)/2 factor represents anticipated congestion window
+ growth over this period. It may be different for Tor, depending on CC alg.
+
+ If primary_limit < secondary.cwnd - (secondary.package_window + 1), then
+ there is enough space on the secondary circuit to send data faster than
+ if we waited for the primary circuit to become available.
+
+ XXX: Note that BLEST uses total_send_window where we use secondary.cwnd in
+ this check. total_send_window is min(recv_win, CWND). But since Tor does
+ not use receive windows and instead uses stream XON/XOFF, we only use CWND. There
+ is some concern this may alter BLEST's buffer minimization properties,
+ but since receive window should only matter if the application is slower
+ than Tor, and XON/XOFF should cover that case, hopefully this is fine.
+
+ Otherwise, if the primary_limit condition is not hit, cease reading
+ on source edge connections until SENDME acks come back.
+
+ Here is the pseudocode for this:
+
+     while source.has_data_to_send():
+       if primary.cwnd > primary.package_window:
+         primary.send(source.get_packet())
+         continue
+
+       rtts = secondary.currRTT / primary.currRTT
+       primary_limit = (primary.cwnd + (rtts-1)/2)*rtts
+
+       if primary_limit < secondary.cwnd - (secondary.package_window+1):
+         secondary.send(source.get_packet())
+       else:
+         break # done for now; wait for an ACK to free up CWND space
+
+ Note that BLEST also has a parameter lambda that is updated whenever HoL
+ blocking occurs. Because it is expensive and takes significant time to
+ signal this over Tor, we omit this.
+ XXX: See [REORDER_SIGNALING] section if we want this lambda feedback.
+
+3.3. Reorder queue signaling [REORDER_SIGNALING]
+
+ Reordering should be a fairly simple task. Using the sequence number
+ field in [SEQUENCING], endpoints can know how many cells are still
+ in flight on the other leg.
+
+ To reorder them properly, a buffer of out of order cells needs to be kept.
+ On the Exit side, this can quickly become overwhelming, considering tens
+ of thousands of circuits can be held open, leading to gigabytes of
+ memory being used. This is a clear potential memory DoS vector, which means
+ that a Tor implementation should be able to limit the size of those queues.
+
+ Luckily, [BLEST_TOR] and the form of [LOWRTT_TOR] that only uses the
+ primary circuit will minimize or eliminate this out-of-order buffer.
+
+ XXX: The remainder of this section may be over-complicating things... We
+ only need these concepts if we want to use BLEST's lambda feedback.
+
+ The default for this queue size is governed by the 'cflx_reorder_client'
+ and 'cflx_reorder_srv' consensus parameters (see [CONSENSUS]).
+ 'cflx_reorder_srv' applies to Exits and onion services. Both parameters
+ can be overridden by Torrc, to larger or smaller than the consensus
+ parameter. (Low memory clients may want to lower it; SecureDrop onion
+ services or other high-upload services may want to raise it).
+
+ When the reorder queue hits this size, a RELAY_CONFLUX_XOFF is sent down
+ the circuit leg that has data waiting in the queue and use of that leg must
+ cease, until it drains to half of this value, at which point a
+ RELAY_CONFLUX_XON is sent. Note that this is different from the stream
+ XON/XOFF from Proposal 324.
+
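+ A hedged sketch of this threshold signaling (the leg object, queue, and
+ send_relay_command helper are illustrative assumptions):
+
+     def on_reorder_cell_queued(leg, queue, limit):
+         # XOFF the leg whose cells are waiting in the reorder queue.
+         if len(queue) >= limit and not leg.xoff_sent:
+             leg.send_relay_command("RELAY_CONFLUX_XOFF")
+             leg.xoff_sent = True
+
+     def on_reorder_cell_delivered(leg, queue, limit):
+         # Resume the leg once the queue drains to half the limit.
+         if leg.xoff_sent and len(queue) <= limit // 2:
+             leg.send_relay_command("RELAY_CONFLUX_XON")
+             leg.xoff_sent = False
+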
+ XXX: [BLEST] actually does not cease use of a path in this case, but
+ instead uses this signal to adjust the lambda parameter, which biases
+ traffic away from that leg.
+
+
+4. Security Considerations
+
+4.1. Memory Denial of Service [MEMORY_DOS]
+
+ Both reorder queues and retransmit buffers inherently represent a memory
+ denial of service condition.
+
+ For [RESUMPTION] retransmit buffers, endpoints that support this
+ feature SHOULD free retransmit information as soon as they get close
+ to memory pressure. This prevents resumption while data is in flight,
+ but will not otherwise harm operation.
+
+ For reorder buffers, adversaries can potentially impact this at any
+ point, but most obviously and most severely from the client position.
+
+ In particular, clients can lie about sequence numbers, sending cells
+ with sequence numbers such that the next expected sequence number
+ is never sent. They can do this repeatedly on many circuits, to exhaust
+ memory at exits.
+
+ One option is to only allow actual traffic splitting in the downstream
+ direction, towards clients, and always use the primary circuit for
+ everything in the upstream direction. However, the ability to support
+ conflux from the client to the exit shows promise against traffic
+ analysis (see [WTF_SPLIT]).
+
+ The other option is to use [BLEST_TOR] from clients to exits, as it has
+ predictable interleaved cell scheduling, and minimizes reorder queues
+ at exits. If the ratios prescribed by that algorithm are not followed
+ within some bounds, the other endpoint can close both circuits, and
+ free the queue memory.
+
+ This still leaves the possibility that intermediate relays may block
+ a leg, allowing cells to traverse only one leg, thus still accumulating
+ at the reorder queue. Clients can also spoof sequence numbers similarly,
+ to make it appear that they are following [BLEST_TOR], without
+ actually sending any data on one of the legs.
+
+ To handle either of these cases, when a relay is under memory pressure,
+ the circuit OOM killer SHOULD free and close circuits with the oldest
+ reorder queue data, first. This heuristic was shown to be best during
+ the [SNIPER] attack OOM killer iteration cycle.
+
+4.2. Side Channels [SIDE_CHANNELS]
+
+ Two potential side channels may be introduced by the use of Conflux:
+ 1. RTT leg-use bias by altering SENDME latency
+ 2. Location info leaks through the use of both legs' latencies
+
+ For RTT and leg-use bias, Guard relays could delay legs to introduce
+ a pattern into the delivery of cells at the exit relay, by varying the
+ latency of SENDME cells (every 100th cell) to change the distribution
+ of traffic to send information. This attack could be performed in either
+ direction of traffic, to bias traffic load off of a particular Guard.
+ If an adversary controls both Guards, it could in theory send a binary
+ signal more easily, by alternating delays on each.
+
+ However, this risk weighs against the potential benefits against traffic
+ fingerprinting, as per [WTF_SPLIT]. Additionally, even ignoring
+ cryptographic tagging attacks, this side channel provides significantly
+ lower information over time than inter-packet-delay based side channels
+ that are already available to Guards and routers along the path to the
+ Guard.
+
+ Tor currently provides no defenses against already existing
+ single-circuit delay-based side channels, though both circuit padding
+ and [BACKLIT] are potential options it could conceivably deploy. The
+ [BACKLIT] paper also has an excellent review of the various methods
+ that have been studied for such single circuit side channels, and
+ the [BACKLIT] style RTT monitoring could be used to protect against
+ these conflux side channels as well. Circuit padding can also help
+ to obscure which cells are SENDMEs, since circuit padding is not
+ counted towards SENDME totals.
+
+ The second class of side channel is where the Exit relay may be able
+ to use the two legs to further infer more information about client
+ location. See [LATENCY_LEAK] for more details. It is unclear at
+ this time how much more severe this is for two paths than just one.
+
+ We should preserve the ability to disable conflux to and from Exit
+ relays, should these side channels prove more severe, or should
+ it prove possible to mitigate single-circuit side channels, but
+ not conflux side channels.
+
+ In all cases, all of these side channels appear less severe for onion
+ service traffic, due to the higher path variability from relay
+ selection, as well as the end-to-end nature of conflux in that case.
+ This indicates that our ability to enable/disable conflux for services
+ should be separate from Exits.
+
+4.3. Traffic analysis [TRAFFIC_ANALYSIS]
+
+ Even though conflux shows benefits against traffic analysis in [WTF_SPLIT],
+ these gains may be moot if the adversary is able to perform packet counting
+ and timing analysis at guards to guess which specific circuits are linked.
+ In particular, the 3-way handshake in [LINKING_CIRCUITS] may be quite
+ noticeable.
+
+ As one countermeasure, it may be possible to eliminate the third stage
+ (RELAY_CIRCUIT_LINKED_RTT_ACK) by computing the exit/service RTT via measuring
+ the time between CREATED/REND_JOINED and RELAY_CIRCUIT_LINK, but this
+ will introduce cross-component complexity into Tor's protocol that
+ could quickly become unwieldy and fragile.
+
+ Additionally, the conflux handshake may make onion services stand out
+ more, regardless of the number of stages in the handshake. For this
+ reason, it may be more wise to simply address these issues with circuit
+ padding machines during circuit setup (see padding-spec.txt).
+
+ Additional traffic analysis considerations arise when combining conflux
+ with padding, for purposes of mitigating traffic fingerprinting. For this,
+ it seems wise to treat the packet schedulers as another piece of a combined
+ optimization problem in tandem with optimizing padding machines, perhaps
+ introducing randomness or fudge factors into their scheduling, as a parameterized
+ distribution. For details, see
+ https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDev…
+
+ Finally, conflux may exacerbate forms of confirmation-based traffic
+ analysis that close circuits to determine concretely if they were in use,
+ since closing either leg might cause resumption to fail. TCP RST
+ injection can perform this attack on the side, without surveillance
+ capability. [RESUMPTION] with buffering of the inflight unacked
+ package_window data, for retransmit, is a partial mitigation, if
+ endpoints buffer this data for retransmission for a brief time even
+ if both legs close. This seems more feasible for onion services,
+ which are more vulnerable to this attack. However, if the adversary
+ controls the client, they will notice the resumption re-link, and
+ still obtain confirmation that way.
+
+ It seems the only way to fully mitigate these kinds of attacks is with
+ the Snowflake pluggable transport, which provides its own resumption
+ and retransmit behavior. Additionally, Snowflake's use of UDP DTLS also
+ protects against TCP RST injection, which we suspect to be the main
+ vector for such attacks.
+
+ In the future, a DTLS or QUIC transport for Tor such as MASQUE could
+ provide similar RST injection resistance, and resumption at
+ Guard/Bridge nodes, as well.
+
+
+5. System Interactions
+
+ - congestion control
+ - EWMA and KIST
+ - CBT and number of guards
+ - Onion service circ obfuscation
+ - Future UDP (it may increase the need for UDP to buffer before dropping)
+ - Padding (no sequence numbers on padding cells, as per [SEQUENCING])
+ - Also, any padding machines may need re-tuning
+ - No 'cannibalization' of linked circuits
+
+
+6. Consensus and Torrc Parameters [CONSENSUS]
+
+ - conflux_circs
+ - Number of conflux circuits
+
+ - conflux_sched_exits, conflux_sched_clients, conflux_sched_service
+ - Three forms of LOWRTT_TOR, and BLEST_TOR
+
+ - ConfluxOnionService
+ - ConfluxOnionCircs
+
+
+7. Tuning Experiments [EXPERIMENTS]
+
+ - conflux_sched & conflux_exits
+ - Exit reorder queue size
+ - Responsiveness vs throughput tradeoff?
+ - Congestion control
+ - EWMA and KIST
+ - num guards & conflux_circs
+
+
+Appendix A [ALTERNATIVES]
+
+A.1 BEGIN/END sequencing [ALTERNATIVE_SEQUENCING]
+
+ In this method of signaling, we increment the sequence number by 1
+ only when we switch legs, and use BEGIN/END "bookends" to know that
+ all data on a leg has been received.
+
+ To achieve this, we add a small sequence number to the common relay
+ header for all relay cells on linked circuits, as well as a field to
+ signal the beginning of a sequence, intermediate data, and the end
+ of a sequence.
+
+ Relay command [1 byte]
+ Recognized [2 bytes]
+ StreamID [2 bytes]
+ Digest [4 bytes]
+ Length [2 bytes]
+ > Switching [2 bits] # 01 = BEGIN, 00 = CONTINUE, 10 = END
+ > Sequencing [6 bits]
+ Data [PAYLOAD_LEN - 12 - Length bytes]
+
+ These fields MUST be present on ALL end-to-end relay cells on each leg
+ that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
+
+ They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
+ RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays,
+ as opposed to the endpoint, to support padding.
+
+ Sequence numbers are incremented by one when an endpoint switches legs
+ to transmit a cell. This number will wrap; implementations should treat
+ 0 as the next sequence after 2^6-1. Because we do not expect to support
+ significantly more than 2 legs, and certainly far fewer than 63, this is not an
+ issue.
+
+ The first cell on a new circuit MUST use the BEGIN code for switching.
+ Cells are delivered from that circuit until an END switching signal is
+ received, even if cells arrive first on another circuit with the next
+ sequence number before and END switching field. Recipients MUST only
+ deliver cells with a BEGIN, if their Sequencing number is one more than
+ the last END.
+
+A.2 Alternative Link Handshake [ALTERNATIVE_LINKING]
+
+ The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
+ ntor onionskin extension fields, similar to those used by v3 onions.
+
+ This approach has at least four problems:
+ i). For onion services, since the onionskins traverse the intro circuit
+ and then return on the rend circuit, this handshake cannot measure
+ RTT there.
+ ii). Since these onionskins are larger, and have no PFS, an adversary
+ at the middle relay knows that the onionskin is for linking, and
+ can potentially try to obtain the onionskin key for attacks on
+ the link.
+ iii). It makes linking circuits more fragile, since they could timeout
+ due to CBT, or other issues during construction.
+ iv). The overhead in processing this onionskin through onionskin queues
+ adds additional time for linking, even in the Exit case, making
+ that RTT potentially noisy.
+
+ Additionally, it is not clear that this approach actually saves us
+ anything in terms of setup time, because we can optimize away the
+ linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
+ with RELAY_CIRCUIT_LINK.
+
+A.3. Alternative RTT measurement [ALTERNATIVE_RTT]
+
+ Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
+ PING/PONG cells, whose sole purpose is to allow endpoints to measure
+ RTT.
+
+ This was rejected for several reasons. First, during circuit use, we
+ already have SENDMEs to measure RTT. Every 100 cells (or
+ 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
+ on the time between that Nth cell and the SENDME ack. So we only need
+ PING/PONG to measure initial circuit RTT.
+
+ If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
+ we might be able to specify a PING/PONG/PING handshake solely for
+ measuring initial RTT, especially for onion service circuits.
+
+ The reason for not making a dedicated PING/PONG for this purpose is that
+ it is context-free. Even if we were able to use onionskins for linking
+ and resumption, to avoid additional data in handshake that just measures
+ RTT, we would have to enforce that this PING/PONG/PING only follows the
+ exact form needed by this proposal, at the expected time, and at no
+ other points.
+
+ If we do not enforce this specific use of PING/PONG/PING, it becomes
+ another potential side channel, for use in attacks such as [DROPMARK].
+
+ In general, Tor is planning to remove current forms of context-free and
+ semantic-free cells from its protocol:
+ https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
+
+ We should not add more.
+
+
+Appendix B: Acknowledgments
+
+ Thanks to Per Hurtig for helping us with the framing of the MPTCP
+ problem space.
+
+ Thanks to Simone Ferlin for clarifications on the [BLEST]
+ paper, and for pointing us at the Linux kernel implementation.
+
+ Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
+ immensely towards our understanding of how the BLEST condition relates
+ to edge connection pushback, and for clearing up many other
+ misconceptions we had.
+
+ Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
+ Goldberg, for the original [CONFLUX] paper!
+
+
+References:
+
+[CONFLUX]
+ https://freehaven.net/anonbib/papers/pets2013/paper_65.pdf
+
+[BLEST]
+ https://olivier.mehani.name/publications/2016ferlin_blest_blocking_estimati…
+ https://opus.lib.uts.edu.au/bitstream/10453/140571/2/08636963.pdf
+ https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.95/net/mptcp/mptcp_ble…
+
+[WTF_SPLIT]
+ https://www.comsys.rwth-aachen.de/fileadmin/papers/2020/2020-delacadena-tra…
+
+[COUPLED]
+ https://datatracker.ietf.org/doc/html/rfc6356
+ https://www.researchgate.net/profile/Xiaoming_Fu2/publication/230888515_Del…
+ http://staff.ustc.edu.cn/~kpxue/paper/ToN-wwj-2020.04.pdf
+ https://www.thinkmind.org/articles/icn_2019_2_10_30024.pdf
+ https://arxiv.org/pdf/1308.3119.pdf
+
+[BACKLIT]
+ https://www.freehaven.net/anonbib/cache/acsac11-backlit.pdf
+
+[LATENCY_LEAK]
+ https://www.freehaven.net/anonbib/cache/ccs07-latency-leak.pdf
+ https://www.robgjansen.com/publications/howlow-pets2013.pdf
+
+[SNIPER]
+ https://www.freehaven.net/anonbib/cache/sniper14.pdf
+
+[DROPMARK]
+ https://www.freehaven.net/anonbib/cache/sniper14.pdf
1
0
commit d30a5a2fff4451374954fce7dcc94444f57013fc
Author: Mike Perry <mikeperry-git(a)torproject.org>
Date: Fri Mar 26 15:20:32 2021 +0000
Fix dropmark citation
---
proposals/329-traffic-splitting.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/proposals/329-traffic-splitting.txt b/proposals/329-traffic-splitting.txt
index 8d73a8c..1412f5d 100644
--- a/proposals/329-traffic-splitting.txt
+++ b/proposals/329-traffic-splitting.txt
@@ -851,4 +851,4 @@ References:
https://www.freehaven.net/anonbib/cache/sniper14.pdf
[DROPMARK]
- https://www.freehaven.net/anonbib/cache/sniper14.pdf
+ https://www.petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf
1
0
commit d73bdd1e0b44b0f9c6ad8da216cdba0a9be2f456
Author: Mike Perry <mikeperry-git(a)torproject.org>
Date: Fri Mar 26 14:49:55 2021 +0000
Reformat Prop329.
---
proposals/329-traffic-splitting.txt | 1407 ++++++++++++++++++-----------------
1 file changed, 709 insertions(+), 698 deletions(-)
diff --git a/proposals/329-traffic-splitting.txt b/proposals/329-traffic-splitting.txt
index 746c6c4..8d73a8c 100644
--- a/proposals/329-traffic-splitting.txt
+++ b/proposals/329-traffic-splitting.txt
@@ -6,657 +6,668 @@ Status: Draft
0. Status
- This proposal describes the Conflux [CONFLUX] system developed by Mashael
- AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. It aims at improving
- Tor client network performance by dynamically splitting traffic between two
- circuits.
+ This proposal describes the Conflux [CONFLUX] system developed by
+ Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. It aims at
+ improving Tor client network performance by dynamically splitting
+ traffic between two circuits.
1. Overview
1.1. Multipath TCP Design Space
- In order to understand our improvements to Conflux, it is important to
- properly conceptualize what is involved in the design of multipath
- algorithms in general.
-
- The design space is broken into two orthogonal parts: congestion
- control algorithms that apply to each path, and traffic scheduling
- algorithms that decide when to send packets on each path.
-
- MPTCP specifies 'coupled' congestion control (see [COUPLED]). Coupled
- congestion control updates single-path congestion control algorithms to
- account for shared bottlenecks between the paths, so that the combined
- congestion control algorithms do not overwhelm any bottlenecks that
- happen to be shared between the multiple paths. Various ways of
- accomplishing this have been proposed and implemented in the Linux kernel.
-
- Because Tor's congestion control only concerns itself with bottlenecks in Tor
- relay queues, and not with any other bottlenecks (such as intermediate
- Internet routers), we can avoid this complexity merely by specifying that
- any paths that are constructed should not share any relays. In this way, we
- can proceed to use the exact same congestion control as specified in Proposal
- 324, for each path.
-
- For this reason, this proposal will focus on the traffic scheduling
- algorithms, rather than coupling. We propose three candidate algorithms
- that have been studied in the literature, and will compare their
- performance using simulation and consensus parameters.
+ In order to understand our improvements to Conflux, it is important to
+ properly conceptualize what is involved in the design of multipath
+ algorithms in general.
+
+ The design space is broken into two orthogonal parts: congestion control
+ algorithms that apply to each path, and traffic scheduling algorithms
+ that decide when to send packets on each path.
+
+ MPTCP specifies 'coupled' congestion control (see [COUPLED]). Coupled
+ congestion control updates single-path congestion control algorithms to
+ account for shared bottlenecks between the paths, so that the combined
+ congestion control algorithms do not overwhelm any bottlenecks that
+ happen to be shared between the multiple paths. Various ways of
+ accomplishing this have been proposed and implemented in the Linux
+ kernel.
+
+ Because Tor's congestion control only concerns itself with bottlenecks in
+ Tor relay queues, and not with any other bottlenecks (such as
+ intermediate Internet routers), we can avoid this complexity merely by
+ specifying that any paths that are constructed should not share any
+ relays. In this way, we can proceed to use the exact same congestion
+ control as specified in Proposal 324, for each path.
+
+ For this reason, this proposal will focus on the traffic scheduling
+ algorithms, rather than coupling. We propose three candidate algorithms
+ that have been studied in the literature, and will compare their
+ performance using simulation and consensus parameters.
1.2. Divergence from the initial Conflux design
- The initial [CONFLUX] paper doesn't provide any indication of how to handle
- the size of the out-of-order cell queue, which we consider a potentially dangerous
- memory DoS vector (see [MEMORY_DOS]). It also used RTT as the sole heuristic
- for selecting which circuit to send on, which may vary depending on the
- geographical locations of the participant relays, without considering their
- actual available circuit capacity (which will be available to us via Proposal
- 324). Additionally, since the publication of [CONFLUX], more modern
- packet scheduling algorithms have been developed, which aim to reduce
- out-of-order queue size.
-
- We propose mitigations for these issues using modern scheduling algorithms,
- as well as implementation options for avoiding the out-of-order queue at
- Exit relays. Additionally, we consider resumption, side channel, and traffic
- analysis risks and benefits in [RESUMPTION], [SIDE_CHANNELS] and
- [TRAFFIC_ANALYSIS].
+ The initial [CONFLUX] paper doesn't provide any indication of how to
+ handle the size of the out-of-order cell queue, which we consider a
+ potentially dangerous memory DoS vector (see [MEMORY_DOS]). It also used
+ RTT as the sole heuristic for selecting which circuit to send on, which
+ may vary depending on the geographical locations of the participant
+ relays, without considering their actual available circuit capacity
+ (which will be available to us via Proposal 324). Additionally, since
+ the publication of [CONFLUX], more modern packet scheduling algorithms
+ have been developed, which aim to reduce out-of-order queue size.
+
+ We propose mitigations for these issues using modern scheduling
+ algorithms, as well as implementation options for avoiding the
+ out-of-order queue at Exit relays. Additionally, we consider resumption,
+ side channel, and traffic analysis risks and benefits in [RESUMPTION],
+ [SIDE_CHANNELS] and [TRAFFIC_ANALYSIS].
2. Design
- The following section describes the Conflux design. Each sub-section is a
- building block to the multipath design that Conflux proposes.
-
- The circuit construction is as follows:
-
- Primary Circuit (lower RTT)
- +-------+ +--------+
- |Guard 1|----->|Middle 1|----------+
- +---^---+ +--------+ |
- +-----+ | +--v---+
- | OP +------+ | Exit |--> ...
- +-----+ | +--^---+
- +---v---+ +--------+ |
- |Guard 2|----->|Middle 2|----------+
- +-------+ +--------+
- Secondary Circuit (higher RTT)
-
- Both circuits are built using current Tor path selection, however they
- SHOULD NOT share the same Guard relay, or middle relay. By avoiding
- using the same relays in these positions in the path, we ensure
- additional path capacity, and eliminate the need to use more complicated
- 'coupled' congestion control algorithms from the MPTCP literature[COUPLED].
- This both simplifies design, and improves performance.
-
- Then, the OP needs to link the two circuits together, as described in
- [LINKING_CIRCUITS], [LINKING_EXIT], and [LINKING_SERVICE].
-
- For ease of explanation, the primary circuit is the circuit with lower RTT,
- and the secondary circuit is the circuit with higher RTT. Initial RTT
- is measured during circuit linking, as described in [LINKING_CIRCUITS].
- RTT is continually measured using SENDME timing, as in Proposal 324.
- This means that during use, the primary circuit and secondary circuit may
- switch roles, depending on unrelated network congestion caused by other
- Tor clients.
-
- We also support linking onion service circuits together. In this case,
- only two rendezvous circuits are linked. Each of these RP circuits will be
- constructed separately, and then linked. However, the same path constraints
- apply to each half of the circuits (no shared relays between the legs).
- Should, by chance, the service and the client sides end up sharing some
- relays, this is not catastrophic. Multipath TCP researchers we have
- consulted believe Tor's congestion control from Proposal 324 to be
- sufficient in this rare case.
-
- Only two circuits SHOULD be linked together. However, implementations
- SHOULD make it easy for researchers to *test* more than two paths, as this
- has been shown to assist in traffic analysis resistance[WTF_SPLIT]. At
- minimum, this means not hardcoding only two circuits in the implementation.
-
- If the number of circuits exceeds the current number of guard relays,
- guard relays MAY be re-used, but implementations SHOULD use the same
- number of Guards as paths.
-
- Linked circuits MUST NOT be extended further once linked (ie:
- 'cannibalization' is not supported).
  The following section describes the Conflux design. Each sub-section is
  a building block of the multipath design that Conflux proposes.

  The circuit construction is as follows:

                        Primary Circuit (lower RTT)
              +-------+      +--------+
              |Guard 1|----->|Middle 1|------+
              +---^---+      +--------+      |
  +-----+         |                       +--v---+
  | OP  +---------+                       | Exit |--> ...
  +-----+         |                       +--^---+
              +---v---+      +--------+      |
              |Guard 2|----->|Middle 2|------+
              +-------+      +--------+
                        Secondary Circuit (higher RTT)

  Both circuits are built using current Tor path selection; however, they
  SHOULD NOT share the same Guard or middle relay. By avoiding the use of
  the same relays in these positions in the path, we ensure additional
  path capacity, and eliminate the need to use the more complicated
  'coupled' congestion control algorithms from the MPTCP
  literature[COUPLED]. This both simplifies the design and improves
  performance.

  Then, the OP needs to link the two circuits together, as described in
  [LINKING_CIRCUITS], [LINKING_EXIT], and [LINKING_SERVICE].

  For ease of explanation, the primary circuit is the circuit with lower
  RTT, and the secondary circuit is the circuit with higher RTT. Initial
  RTT is measured during circuit linking, as described in
  [LINKING_CIRCUITS]. RTT is continually measured using SENDME timing, as
  in Proposal 324. This means that during use, the primary circuit and
  secondary circuit may switch roles, depending on unrelated network
  congestion caused by other Tor clients.

  We also support linking onion service circuits together. In this case,
  only two rendezvous circuits are linked. Each of these RP circuits will
  be constructed separately, and then linked. However, the same path
  constraints apply to each half of the circuits (no shared relays between
  the legs). Should, by chance, the service and the client sides end up
  sharing some relays, this is not catastrophic. Multipath TCP researchers
  we have consulted believe Tor's congestion control from Proposal 324 to
  be sufficient in this rare case.

  Only two circuits SHOULD be linked together. However, implementations
  SHOULD make it easy for researchers to *test* more than two paths, as
  this has been shown to assist in traffic analysis resistance[WTF_SPLIT].
  At minimum, this means not hardcoding only two circuits in the
  implementation.

  If the number of circuits exceeds the current number of guard relays,
  guard relays MAY be re-used, but implementations SHOULD use the same
  number of Guards as paths.

  Linked circuits MUST NOT be extended further once linked (ie:
  'cannibalization' is not supported).

2.1. Advertising support for conflux

  We propose a new protocol version in order to advertise support for
  circuit linking on the relay side:

    "Relay=4" -- Relay supports a 2-byte sequence number in the RELAY
                 cell header, used for multipath circuits which are
                 linked with the new RELAY_CIRCUIT_LINK relay cell
                 command.

  XXX: Advertise this in the onion service descriptor.
  XXX: Can an onion service descriptor advertise more than two circuits?
  The next section describes how the circuits are linked together.

2.2. Linking circuits [LINKING_CIRCUITS]

  To link circuits, we propose new relay commands that are sent on both
  circuits, as well as a response to confirm the join, and an ack of this
  response. These commands create a 3-way handshake, which allows each
  endpoint to measure the initial RTT of each leg upon link, without
  needing to wait for any data.

  All three stages of this handshake are sent on *each* circuit leg to be
  linked.

  To save round trips, these cells SHOULD be combined with the initial
  RELAY_BEGIN cell on the faster circuit leg, using Proposal 325. See
  [LINKING_EXIT] and [LINKING_SERVICE] for more details on setup in each
  case.

  There are other ways to do this linking that we have considered, but
  they seem not to be significantly better than this method, especially
  since we can use Proposal 325 to eliminate the RTT cost of this setup
  before sending data. For those other ideas, see [ALTERNATIVE_LINKING]
  and [ALTERNATIVE_RTT], in the appendix.

  The first two parts of the handshake establish the link, and enable
  resumption:

    16 -- RELAY_CIRCUIT_LINK

          Sent from the OP to the exit/service in order to link
          circuits together at the end point.

    17 -- RELAY_CIRCUIT_LINKED

          Sent from the exit/service to the OP, to confirm the circuits
          were linked.

  These cells have the following contents:

    VERSION   [1 byte]
    PAYLOAD   [variable, up to end of relay payload]

  The VERSION tells us which circuit linking mechanism to use. At this
  point in time, only 0x01 is recognized, and it is the mechanism
  described by the Conflux design.

  For version 0x01, the PAYLOAD contains:

    NONCE              [32 bytes]
    LAST_SEQNO_SENT    [8 bytes]
    LAST_SEQNO_RECV    [8 bytes]

  XXX: Should we let endpoints specify their preferred [SCHEDULING] alg
  here, to override consensus params? This has benefits: eg, low-memory
  mobile clients can ask for an alg that is better for their reorder
  queues. But it also has complexity risk, if the other endpoint does not
  want to support it, because of its own memory issues.

  The NONCE contains a random 256-bit secret, used to associate the two
  circuits together. The nonce must not be shared outside of the circuit
  transmission, or data may be injected into TCP streams. This means it
  MUST NOT be logged to disk.

  The two sequence number fields are 0 upon initial link, but non-zero in
  the case of a resumption attempt (see [RESUMPTION]).

  If either circuit does not receive a RELAY_CIRCUIT_LINKED response, both
  circuits MUST be closed.
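
  As a concrete illustration, a version 0x01 cell body could be packed as
  below. This is a minimal sketch in the style of this proposal's
  pseudocode; the field layout is from this section, but the function
  name, the use of Python's struct module, and the assumption of network
  byte order are our own.

    import os
    import struct

    def encode_link_cell_v1(last_seqno_sent=0, last_seqno_recv=0):
        # NONCE is a fresh 256-bit secret; it must never be logged.
        nonce = os.urandom(32)
        # VERSION || NONCE || LAST_SEQNO_SENT || LAST_SEQNO_RECV
        # (byte order assumed to be network order here)
        return struct.pack("!B32sQQ", 0x01, nonce,
                           last_seqno_sent, last_seqno_recv)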

  The third stage of the handshake exists to help the exit/service measure
  initial RTT, for use in [SCHEDULING]:

    18 -- RELAY_CIRCUIT_LINKED_RTT_ACK

          Sent from the OP to the exit/service, to provide an initial RTT
          measurement for the exit/service.

  For the timeout of the handshake, clients should use the normal
  SOCKS/stream timeout already in use for RELAY_BEGIN.

  These three relay commands (RELAY_CIRCUIT_LINK, RELAY_CIRCUIT_LINKED,
  and RELAY_CIRCUIT_LINKED_RTT_ACK) are sent on *each* leg, to allow each
  endpoint to measure the initial RTT of each leg.
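
  For illustration, measuring RTT during this handshake amounts to
  timestamping one's own outbound handshake cell and the corresponding
  response on each leg. A minimal sketch, with class and attribute names
  of our own invention:

    import time

    class LegTimer:
        # Client: stamp at RELAY_CIRCUIT_LINK, read at RELAY_CIRCUIT_LINKED.
        # Exit/service: stamp at RELAY_CIRCUIT_LINKED, read at
        # RELAY_CIRCUIT_LINKED_RTT_ACK.
        def on_send(self):
            self.sent_at = time.monotonic()

        def on_response(self):
            self.init_rtt = time.monotonic() - self.sent_at

    def pick_primary(leg_a, leg_b):
        # The leg with lower initial RTT becomes the primary circuit.
        return leg_a if leg_a.init_rtt <= leg_b.init_rtt else leg_b
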
2.3. Linking Circuits from OP to Exit [LINKING_EXIT]

  To link exit circuits, two circuits to the same exit are built. The
  client records the circuit build time of each.

  If the circuits are being built on-demand, for immediate use, the
  circuit with the lower build time SHOULD use Proposal 325 to append its
  first RELAY cell to its RELAY_CIRCUIT_LINK cell. The exit MUST respond
  on this same leg. After that, actual RTT measurements MUST be used to
  determine future transmissions, as specified in [SCHEDULING].

  The RTT times between RELAY_CIRCUIT_LINK and RELAY_CIRCUIT_LINKED are
  measured by the client, to determine each circuit's RTT, both to select
  the primary vs secondary circuit and for packet scheduling. Similarly,
  the exit measures the RTT times between RELAY_CIRCUIT_LINKED and
  RELAY_CIRCUIT_LINKED_RTT_ACK, for the same purposes.

2.4. Linking circuits to an onion service [LINKING_SERVICE]

  For onion services, we will only concern ourselves with linking
  rendezvous circuits.

  To join rendezvous circuits, clients make two introduce requests to a
  service's intropoint, causing it to create two rendezvous circuits, to
  meet the client at two separate rendezvous points. These introduce
  requests MUST be sent to the same intropoint (due to potential use of
  onionbalance), and SHOULD be sent back-to-back on the same intro
  circuit. They MAY be combined with Proposal 325.

  The first rendezvous circuit to get joined SHOULD use Proposal 325 to
  append the RELAY_BEGIN command, and the service MUST answer on this
  circuit, until RTT can be measured.

  Once both circuits are linked and RTT is measured, packet scheduling
  should be used, as per [SCHEDULING].

2.5. Congestion Control Application [CONGESTION_CONTROL]

  The SENDMEs for congestion control are performed per-leg. As data
  arrives, regardless of its ordering, it is counted towards SENDME
  delivery. In this way, 'cwnd - package_window' of each leg always
  reflects the available data that can be sent on that leg. This is
  important for [SCHEDULING].

  The congestion control stream XON/XOFF can be sent on either leg, and
  applies to the stream's transmission on both legs.

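  To make the accounting concrete, here is a sketch of the per-leg
  capacity computation that the schedulers in [SCHEDULING] rely on (the
  helper names are ours; 'cwnd' and 'package_window' are from Proposal
  324):

    def available(leg):
        # Cells this leg may send right now without exceeding its window.
        return max(0, leg.cwnd - leg.package_window)

    def total_capacity(primary, secondary):
        # Combined instantaneous send capacity across both legs.
        return available(primary) + available(secondary)
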
2.6. Sequencing [SEQUENCING]

  With multiple paths for data, the problem of data re-ordering appears.
  In other words, cells can arrive out of order from the two circuits,
  such that cell N+1 arrives before cell N.

  Handling this reordering operates after congestion control for each
  circuit leg, but before relay cell command processing or stream data
  delivery.

  For the receiver to be able to reorder the received cells, a sequencing
  scheme needs to be implemented. However, because Tor does not drop or
  reorder packets inside of a circuit, this sequence number can be very
  small. It only has to signal that a cell comes after those arriving on
  another circuit.

  To achieve this, we add a small sequence number to the common relay
  header for all relay cells on linked circuits. This sequence number is
  meant to signal the number of cells sent on the *other* leg, so that
  each endpoint knows how many cells are still in-flight on another leg.
  It is different from the absolute sequence number used in
  [LINKING_CIRCUITS] and [RESUMPTION], but can be derived from that
  number, using relative arithmetic.

    Relay command   [1 byte]
    Recognized      [2 bytes]
    StreamID        [2 bytes]
    Digest          [4 bytes]
    Length          [2 bytes]
    > LongSeq       [1 bit]   # If this bit is set, use 31 bits for Seq
    > Sequencing    [7 or 31 bits]
    Data            [Remainder]

  The sequence number is only set for the first cell after the endpoint
  switches legs. In this case, LongSeq is set to 1, and the Sequencing
  field occupies a further 31 bits. Otherwise, the entire field is a
  single zero byte.

  These fields MUST be present on ALL end-to-end relay cells on each leg
  that come from the endpoint, following a RELAY_CIRCUIT_LINK command.

  They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
  RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays, as
  opposed to the endpoint, to support padding.

  When an endpoint switches legs, on the first cell in a new leg, LongSeq
  is set to 1, and the following 31 bits represent the *total* number of
  cells sent on the *other* leg, before the switch. The receiver must wait
  for that number of cells to arrive from the previous leg before
  delivering that cell.
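
  As an illustration of the receiver side of this rule, here is a minimal
  sketch that handles one pending leg switch at a time (class and field
  names are ours, not from any tor implementation):

    class ConfluxReceiver:
        def __init__(self):
            self.recv = [0, 0]    # total cells received, per leg
            self.need = 0         # old-leg cells required before release
            self.old_leg = None   # leg we are still draining, if any
            self.queue = []       # held cells from the new leg

        def on_cell(self, leg, cell):
            self.recv[leg] += 1
            if cell.long_seq and self.recv[1 - leg] < cell.sequencing:
                # Switch signal arrived before the old leg drained:
                # hold this cell (and subsequent new-leg cells).
                self.need, self.old_leg = cell.sequencing, 1 - leg
                self.queue = [cell]
                return []
            if self.old_leg is not None and leg != self.old_leg:
                self.queue.append(cell)   # behind the held switch cell
                return []
            out = [cell]
            if self.old_leg is not None and self.recv[self.old_leg] >= self.need:
                out += self.queue         # old leg drained: release held cells
                self.old_leg, self.queue = None, []
            return out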

  XXX: In the rare event that we send more than 2^31 cells (~1TB) on a
  single leg, do we force a switch of legs, or expand the field further?

  An alternative method of sequencing, which assumes that the endpoint
  knows that it is going to switch one cell before it switches, is
  specified in [ALTERNATIVE_SEQUENCING]. Note that that method requires
  only 1 byte for sequence number and switch signaling, but requires that
  the sender know, one cell in advance, that it is planning to switch.
  (This is possible with [BLEST_TOR], but [LOWRTT_TOR] can switch based on
  an RTT change, so it may be one cell late in that case.)

2.7. Resumption [RESUMPTION]

  In the event that a circuit leg is destroyed, it MAY be resumed.

  Resumption is achieved by re-using the NONCE, with the same linking
  method (either [LINKING_EXIT] or [LINKING_SERVICE]), to the same
  endpoint. The resumed path need not use the same middle and guard
  relays, but should not share any relays with any existing leg(s).

  To provide resumption, endpoints store an absolute 64-bit cell counter
  of the last cell they have sent on a conflux pair (their
  LAST_SEQNO_SENT), as well as the last sequence number they have
  delivered in-order to edge connections corresponding to a conflux pair
  (their LAST_SEQNO_RECV). Additionally, endpoints MAY store the entire
  contents of unacked inflight cells (ie: the 'package_window' from
  Proposal 324), for each leg, along with information corresponding to
  those cells' absolute sequence numbers.

  These 64-bit absolute counters can wrap without issue, as congestion
  windows will never grow to 2^64 cells until well past the Singularity.
  However, it is possible that extremely long, bulk circuits could exceed
  2^64 total sent or received cells, so endpoints SHOULD handle wrapped
  sequence numbers for purposes of computing retransmit information. (But
  even this case is unlikely to happen within the next decade or so.)

  Upon resumption, the LAST_SEQNO_SENT and LAST_SEQNO_RECV fields are used
  to convey the sequence numbers of the last cell the relay sent and
  received on that leg. The other endpoint can use these sequence numbers
  to determine if it received the in-flight data or not, or sent more data
  since that point, up to and including this absolute sequence number. If
  LAST_SEQNO_SENT has not been received, the endpoint MAY transmit the
  missing data, if it still has it buffered.

  Because both endpoints get information about the other side's absolute
  SENT sequence number, they will know exactly how many re-transmitted
  packets to expect, should the circuit stay open. Re-transmitters should
  not re-increment their absolute sent fields while re-transmitting.

  If the endpoint does not have this missing data, due to memory pressure,
  it should destroy *both* legs, as this represents unrecoverable data
  loss.

  Otherwise, the new circuit can be re-joined, and its RTT can be compared
  to the remaining circuit to determine if the new leg is primary or
  secondary.

  It is even possible to resume conflux circuits where both legs have been
  collapsed using this scheme, if endpoints continue to buffer their
  unacked package_window data for some time after this close. However, see
  [TRAFFIC_ANALYSIS] for more details on the full scope of this issue.

  If endpoints are buffering package_window data, such data should be
  given priority to be freed in any oomkiller invocation. See [MEMORY_DOS]
  for more oomkiller information.
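
  For illustration, the reconciliation step could look like the following
  sketch (the names are ours; wrap-around of the 64-bit counters is
  ignored for brevity):

    def cells_to_retransmit(our_last_sent, peer_last_recv, retransmit_buf):
        # Cells the peer never received lie in the absolute range
        # (peer_last_recv, our_last_sent]; retransmit_buf maps absolute
        # sequence numbers to buffered cells.
        missing = range(peer_last_recv + 1, our_last_sent + 1)
        if all(seq in retransmit_buf for seq in missing):
            return [retransmit_buf[seq] for seq in missing]
        return None  # unrecoverable loss: destroy *both* legs
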
3. Traffic Scheduling [SCHEDULING]

  In order to load balance the traffic between the two circuits, the
  original conflux paper used only RTT. However, with Proposal 324, we
  will have accurate information on the instantaneous available bandwidth
  of each circuit leg, as 'cwnd - package_window' (see Section 3 of
  Proposal 324).

  Some additional RTT optimizations are also useful, to improve
  responsiveness and minimize out-of-order queue sizes.

  We specify two traffic schedulers from the multipath literature and
  adapt them to Tor: [LOWRTT_TOR] and [BLEST_TOR]. [LOWRTT_TOR] also has
  three variants, with different trade-offs.

  However, see the [TRAFFIC_ANALYSIS] sections of this proposal for
  important details on how this selection can be changed, to reduce
  website traffic fingerprinting.

3.1. LowRTT Scheduling [LOWRTT_TOR]

  This scheduling algorithm is based on the original [CONFLUX] paper, with
  ideas from [MPTCP]'s minRTT/LowRTT scheduler.

  In this algorithm, endpoints send cells on the circuit with lower RTT
  (the primary circuit). This continues while the congestion window on the
  circuit has available room: ie, whenever cwnd - package_window > 0.

  Whenever the primary circuit's congestion window becomes full, the
  secondary circuit is used. We stop reading on the send window source
  (edge connection) when both congestion windows become full.

  In this way, unlike original conflux, we switch to the secondary circuit
  without causing congestion on the primary circuit. This improves both
  load times, and overall throughput.

  This behavior matches minRTT from [MPTCP], sometimes called LowRTT.

  It may be better to stop reading on the edge connection when the primary
  congestion window becomes full, rather than switch to the secondary
  circuit as soon as the primary congestion window becomes full. (Ie: only
  switch if the RTTs themselves change which circuit is primary.) This is
  what was done in the original Conflux paper. This behavior effectively
  causes us to optimize for responsiveness and congestion avoidance,
  rather than throughput. For evaluation, we should control this switching
  behavior with a consensus parameter (see [CONSENSUS_PARAMETERS]).

  Because of potential side channel risk (see [SIDE_CHANNELS]), a third
  variant of this algorithm, where the primary circuit is chosen during
  the [LINKING_CIRCUITS] handshake and never changed, should also be
  possible to control via consensus parameter.
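
  For clarity, here is a minimal sketch of the first variant, in the style
  of the [BLEST_TOR] pseudocode below (function and attribute names are
  ours):

    def lowrtt_send(source, legs):
        while source.has_data_to_send():
            # The primary is whichever leg currently has the lower RTT.
            primary, secondary = sorted(legs, key=lambda l: l.currRTT)
            if primary.cwnd - primary.package_window > 0:
                primary.send(source.get_packet())
            elif secondary.cwnd - secondary.package_window > 0:
                secondary.send(source.get_packet())
            else:
                break  # both windows full: stop reading the edge connection
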
3.2. BLEST Scheduling [BLEST_TOR]

  [BLEST] attempts to predict the availability of the primary circuit, and
  uses this information to reorder transmitted data, to minimize
  head-of-line blocking in the recipient (and thus minimize out-of-order
  queues there).

  BLEST_TOR uses the primary circuit until the congestion window is full.
  Then, it uses the relative RTT times of the two circuits to calculate
  how much data can be sent on the secondary circuit faster than if we
  just waited for the primary circuit to become available.

  This is achieved by computing two variables at the sender:

    rtts = secondary.currRTT / primary.currRTT
    primary_limit = (primary.cwnd + (rtts-1)/2)*rtts

  Note: This (rtts-1)/2 factor represents anticipated congestion window
  growth over this period. It may be different for Tor, depending on the
  congestion control algorithm.

  If primary_limit < secondary.cwnd - (secondary.package_window + 1), then
  there is enough space on the secondary circuit to send data faster than
  we could by waiting for the primary circuit to become available.

  XXX: Note that BLEST uses total_send_window where we use secondary.cwnd
  in this check. total_send_window is min(recv_win, CWND). But since Tor
  does not use receive windows and instead uses stream XON/XOFF, we only
  use CWND. There is some concern this may alter BLEST's buffer
  minimization properties, but since the receive window should only matter
  if the application is slower than Tor, and XON/XOFF should cover that
  case, hopefully this is fine.

  Otherwise, if the primary_limit condition is not hit, cease reading on
  source edge connections until SENDME acks come back.

  Here is the pseudocode for this:

    while source.has_data_to_send():
      # Primary leg has congestion window room: always prefer it.
      if primary.cwnd > primary.package_window:
        primary.send(source.get_packet())
        continue

      rtts = secondary.currRTT / primary.currRTT
      primary_limit = (primary.cwnd + (rtts-1)/2)*rtts

      # Secondary leg is only worth using if it beats waiting on primary.
      if primary_limit < secondary.cwnd - (secondary.package_window+1):
        secondary.send(source.get_packet())
      else:
        break # done for now, wait for SENDME to free up CWND and restart

  Note that BLEST also has a parameter lambda that is updated whenever
  head-of-line blocking occurs. Because it is expensive and takes
  significant time to signal this over Tor, we omit this.

  XXX: See the [REORDER_SIGNALING] section if we want this lambda
  feedback.

3.3. Reorder queue signaling [REORDER_SIGNALING]

  Reordering should be a fairly simple task. By using the sequence number
  field in [SEQUENCING], endpoints can know how many cells are still in
  flight on the other leg.

  To reorder them properly, a buffer of out-of-order cells needs to be
  kept. On the Exit side, this can quickly become overwhelming,
  considering that tens of thousands of circuits can be held open, leading
  to gigabytes of memory use. This is a clear potential memory DoS vector,
  which means that a Tor implementation should be able to limit the size
  of these queues.

  Luckily, [BLEST_TOR] and the form of [LOWRTT_TOR] that only uses the
  primary circuit will minimize or eliminate this out-of-order buffer.

  XXX: The remainder of this section may be over-complicating things... We
  only need these concepts if we want to use BLEST's lambda feedback.

  The default for this queue size is governed by the 'cflx_reorder_client'
  and 'cflx_reorder_srv' consensus parameters (see [CONSENSUS_PARAMS]).
  'cflx_reorder_srv' applies to Exits and onion services. Both parameters
  can be overridden by Torrc, to larger or smaller than the consensus
  parameter. (Low-memory clients may want to lower it; SecureDrop onion
  services or other high-upload services may want to raise it.)

  When the reorder queue hits this size, a RELAY_CONFLUX_XOFF is sent down
  the circuit leg that has data waiting in the queue, and use of that leg
  must cease, until it drains to half of this value, at which point a
  RELAY_CONFLUX_XON is sent. Note that this is different from the stream
  XON/XOFF of Proposal 324.

  XXX: [BLEST] actually does not cease use of a path in this case, but
  instead uses this signal to adjust the lambda parameter, which biases
  traffic away from that leg.
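
  A minimal sketch of this watermark behavior (class and method names are
  ours; 'limit' stands in for the relevant consensus parameter):

    class ReorderQueue:
        def __init__(self, limit, leg):
            self.limit, self.leg = limit, leg
            self.cells, self.xoff_sent = [], False

        def enqueue(self, cell):
            self.cells.append(cell)
            if not self.xoff_sent and len(self.cells) >= self.limit:
                self.leg.send_command("RELAY_CONFLUX_XOFF")  # stop this leg
                self.xoff_sent = True

        def dequeue(self):
            cell = self.cells.pop(0)
            # Resume the leg once drained to half the limit.
            if self.xoff_sent and len(self.cells) <= self.limit // 2:
                self.leg.send_command("RELAY_CONFLUX_XON")
                self.xoff_sent = False
            return cell
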
4. Security Considerations
4.1. Memory Denial of Service [MEMORY_DOS]

  Both reorder queues and retransmit buffers inherently represent a memory
  denial of service condition.

  For [RESUMPTION] retransmit buffers, endpoints that support this feature
  SHOULD free retransmit information as soon as they get close to memory
  pressure. This prevents resumption while data is in flight, but will not
  otherwise harm operation.

  For reorder buffers, adversaries can potentially impact this at any
  point, but most obviously and most severely from the client position.

  In particular, clients can lie about sequence numbers, sending cells
  with sequence numbers such that the next expected sequence number is
  never sent. They can do this repeatedly on many circuits, to exhaust
  memory at exits.

  One option is to only allow actual traffic splitting in the downstream
  direction, towards clients, and always use the primary circuit for
  everything in the upstream direction. However, the ability to support
  conflux from the client to the exit shows promise against traffic
  analysis (see [WTF_SPLIT]).

  The other option is to use [BLEST_TOR] from clients to exits, as it has
  predictable interleaved cell scheduling, and minimizes reorder queues at
  exits. If the ratios prescribed by that algorithm are not followed
  within some bounds, the other endpoint can close both circuits, and free
  the queue memory.

  This still leaves the possibility that intermediate relays may block a
  leg, allowing cells to traverse only one leg, thus still accumulating in
  the reorder queue. Clients can also spoof sequence numbers similarly, to
  make it appear that they are following [BLEST_TOR], without actually
  sending any data on one of the legs.

  To handle either of these cases, when a relay is under memory pressure,
  the circuit OOM killer SHOULD free and close circuits with the oldest
  reorder queue data first. This heuristic was shown to be best during
  the [SNIPER] attack OOM killer iteration cycle.
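
  As an illustration of this heuristic, a sketch (the method names are
  ours; a circuit with no queued data is assumed to sort last):

    def oom_free_circuits(circuits, bytes_needed):
        freed = 0
        # Free circuits whose oldest queued reorder cell is oldest, first.
        for circ in sorted(circuits,
                           key=lambda c: c.oldest_reorder_cell_time()):
            if freed >= bytes_needed:
                break
            freed += circ.reorder_queue_bytes()
            circ.close_and_free()
        return freed
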
4.2. Side Channels [SIDE_CHANNELS]

  Two potential side channels may be introduced by the use of Conflux:
    1. RTT leg-use bias by altering SENDME latency
    2. Location info leaks through the use of both legs' latencies

  For RTT and leg-use bias, Guard relays could delay legs to introduce a
  pattern into the delivery of cells at the exit relay, by varying the
  latency of SENDME cells (every 100th cell) to change the distribution of
  traffic, to send information. This attack could be performed in either
  direction of traffic, to bias traffic load off of a particular Guard.
  If an adversary controls both Guards, it could in theory send a binary
  signal more easily, by alternating delays on each.

  However, this risk must be weighed against the potential benefits
  against traffic fingerprinting, as per [WTF_SPLIT]. Additionally, even
  ignoring cryptographic tagging attacks, this side channel provides
  significantly lower information over time than the inter-packet-delay
  based side channels that are already available to Guards and routers
  along the path to the Guard.

  Tor currently provides no defenses against already existing
  single-circuit delay-based side channels, though both circuit padding
  and [BACKLIT] are potential options it could conceivably deploy. The
  [BACKLIT] paper also has an excellent review of the various methods that
  have been studied for such single-circuit side channels, and
  [BACKLIT]-style RTT monitoring could be used to protect against these
  conflux side channels as well. Circuit padding can also help to obscure
  which cells are SENDMEs, since circuit padding is not counted towards
  SENDME totals.

  The second class of side channel is where the Exit relay may be able to
  use the two legs to further infer more information about client
  location. See [LATENCY_LEAK] for more details. It is unclear at this
  time how much more severe this is for two paths than for just one.

  We should preserve the ability to disable conflux to and from Exit
  relays, should these side channels prove more severe, or should it prove
  possible to mitigate single-circuit side channels, but not conflux side
  channels.

  In all cases, all of these side channels appear less severe for onion
  service traffic, due to the higher path variability from relay
  selection, as well as the end-to-end nature of conflux in that case.
  This indicates that our ability to enable/disable conflux for services
  should be separate from that for Exits.

4.3. Traffic analysis [TRAFFIC_ANALYSIS]

  Even though conflux shows benefits against traffic analysis in
  [WTF_SPLIT], these gains may be moot if the adversary is able to perform
  packet counting and timing analysis at guards to guess which specific
  circuits are linked. In particular, the 3-way handshake in
  [LINKING_CIRCUITS] may be quite noticeable.

  As one countermeasure, it may be possible to eliminate the third leg
  (RELAY_CIRCUIT_LINKED_RTT_ACK) by computing the exit/service RTT via
  measuring the time between CREATED/REND_JOINED and RELAY_CIRCUIT_LINK,
  but this will introduce cross-component complexity into Tor's protocol
  that could quickly become unwieldy and fragile.

  Additionally, the conflux handshake may make onion services stand out
  more, regardless of the number of stages in the handshake. For this
  reason, it may be wiser to simply address these issues with circuit
  padding machines during circuit setup (see padding-spec.txt).

  Additional traffic analysis considerations arise when combining conflux
  with padding, for purposes of mitigating traffic fingerprinting. For
  this, it seems wise to treat the packet schedulers as another piece of a
  combined optimization problem, in tandem with optimizing padding
  machines, perhaps introducing randomness or fudge factors into their
  scheduling, as a parameterized distribution. For details, see
  https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDev…

  Finally, conflux may exacerbate forms of confirmation-based traffic
  analysis that close circuits to determine concretely if they were in
  use, since closing either leg might cause resumption to fail. TCP RST
  injection can perform this attack on the side, without surveillance
  capability. [RESUMPTION] with buffering of the inflight unacked
  package_window data, for retransmit, is a partial mitigation, if
  endpoints buffer this data for retransmission for a brief time even if
  both legs close. This seems more feasible for onion services, which are
  more vulnerable to this attack. However, if the adversary controls the
  client, they will notice the resumption re-link, and still obtain
  confirmation that way.

  It seems the only way to fully mitigate these kinds of attacks is with
  the Snowflake pluggable transport, which provides its own resumption and
  retransmit behavior. Additionally, Snowflake's use of UDP DTLS also
  protects against TCP RST injection, which we suspect to be the main
  vector for such attacks.

  In the future, a DTLS or QUIC transport for Tor, such as MASQUE, could
  provide similar RST injection resistance, and resumption at Guard/Bridge
  nodes, as well.

5. System Interactions
- EWMA and KIST
- CBT and number of guards
- Onion service circ obfuscation
- Future UDP (may increase need for UDP to buffer before dropping)
- Padding (no sequence numbers on padding cells, as per [SEQUENCING])
- Also, any padding machines may need re-tuning
- No 'cannibalization' of linked circuits

Appendix A [ALTERNATIVES]

A.1. BEGIN/END sequencing [ALTERNATIVE_SEQUENCING]
+ In this method of signaling, we increment the sequence number by 1 only
+ when we switch legs, and use BEGIN/END "bookends" to know that all data
+ on a leg has been received.
+
+ To achieve this, we add a small sequence number to the common relay
+ header for all relay cells on linked circuits, as well as a field to
+ signal the beginning of a sequence, intermediate data, and the end of a
+ sequence.
+
+ Relay command [1 byte]
+ Recognized [2 bytes]
+ StreamID [2 bytes]
+ Digest [4 bytes]
+ Length [2 bytes]
+ > Switching [2 bits] # 01 = BEGIN, 00 = CONTINUE, 10 = END
+ > Sequencing [6 bits]
+ Data [PAYLOAD_LEN - 12 - Length bytes]
+
+ These fields MUST be present on ALL end-to-end relay cells on each leg
+ that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
+
+ They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
+ RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays, as
+ opposed to the endpoint, to support padding.
+
+ Sequence numbers are incremented by one when an endpoint switches legs
+ to transmit a cell. This number will wrap; implementations should treat
+ 0 as the next sequence after 2^6-1. Because we do not expect to support
+ significantly more than 2 legs, and much fewer than 63, this is not an
+ issue.
+
+ The first cell on a new circuit MUST use the BEGIN code for switching.
+ Cells are delivered from that circuit until an END switching signal is
+ received, even if cells arrive first on another circuit with the next
+ sequence number before and END switching field. Recipients MUST only
+ deliver cells with a BEGIN, if their Sequencing number is one more than
+ the last END.
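
  As an illustration of the layout above, the Switching and Sequencing
  fields share a single header byte. A minimal sketch in C of packing,
  unpacking, and wrap handling; all names here are hypothetical, not
  part of this proposal:

    #include <stdint.h>

    #define SWITCH_CONTINUE 0x0  /* 00 */
    #define SWITCH_BEGIN    0x1  /* 01 */
    #define SWITCH_END      0x2  /* 10 */

    /* Pack the 2-bit Switching code and 6-bit Sequencing counter. */
    static inline uint8_t
    conflux_seq_pack(uint8_t switching, uint8_t seq)
    {
      return (uint8_t)(((switching & 0x3) << 6) | (seq & 0x3f));
    }

    static inline uint8_t
    conflux_seq_switching(uint8_t b) { return (b >> 6) & 0x3; }

    static inline uint8_t
    conflux_seq_number(uint8_t b) { return b & 0x3f; }

    /* Sequence numbers wrap modulo 2^6: the successor of 63 is 0. */
    static inline uint8_t
    conflux_seq_next(uint8_t seq) { return (seq + 1) & 0x3f; }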
A.2 Alternative Link Handshake [ALTERNATIVE_LINKING]
- The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
- ntor onionskin extension fields, similar to those used by v3 onions.
-
- This approach has at least four problems:
- i). For onion services, since the onionskins traverse the intro circuit
- and then return on the rend circuit, this handshake cannot measure
- RTT there.
- ii). Since these onionskins are larger, and have no PFS, an adversary
- at the middle relay knows that the onionskin is for linking, and
- can potentially try to obtain the onionskin key for attacks on
- the link.
- iii). It makes linking circuits more fragile, since they could timeout
- due to CBT, or other issues during construction.
- iv). The overhead in processing this onionskin through onionskin queues
- adds additional time for linking, even in the Exit case, making
- that RTT potentially noisy.
-
- Additionally, it is not clear that this approach actually saves us
- anything in terms of setup time, because we can optimize away the
- linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
- with RELAY_CIRCUIT_LINK.
+ The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
+ ntor onionskin extension fields, similar to those used by v3 onions.
+
+ This approach has at least four problems:
+ i). For onion services, since onionskins traverse the intro circuit
+ and return on the rend circuit, this handshake cannot measure
+ RTT there.
+ ii). Since these onionskins are larger, and have no PFS, an adversary
+ at the middle relay knows that the onionskin is for linking, and
+ can potentially try to obtain the onionskin key for attacks on
+ the link.
+ iii). It makes linking circuits more fragile, since they could time out
+ due to CBT, or other issues during construction.
+ iv). The overhead in processing this onionskin in onionskin queues
+ adds additional time for linking, even in the Exit case, making
+ that RTT potentially noisy.
+
+ Additionally, it is not clear that this approach actually saves us
+ anything in terms of setup time, because we can optimize away the
+ linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
+ with RELAY_CIRCUIT_LINK.
A.3. Alternative RTT measurement [ALTERNATIVE_RTT]
- Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
- PING/PONG cells, whose sole purpose is to allow endpoints to measure
- RTT.
-
- This was rejected for several reasons. First, during circuit use, we
- already have SENDMEs to measure RTT. Every 100 cells (or
- 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
- on the time between that Nth cell and the SENDME ack. So we only need
- PING/PONG to measure initial circuit RTT.
-
- If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
- we might be able to specify a PING/PONG/PING handshake solely for
- measuring initial RTT, especially for onion service circuits.
-
- The reason for not making a dedicated PING/PONG for this purpose is that
- it is context-free. Even if we were able to use onionskins for linking
- and resumption, to avoid additional data in handshake that just measures
- RTT, we would have to enforce that this PING/PONG/PING only follows the
- exact form needed by this proposal, at the expected time, and at no
- other points.
-
- If we do not enforce this specific use of PING/PONG/PING, it becomes
- another potential side channel, for use in attacks such as [DROPMARK].
-
- In general, Tor is planning to remove current forms of context-free and
- semantic-free cells from its protocol:
- https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
-
- We should not add more.
+ Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
+ PING/PONG cells, whose sole purpose is to allow endpoints to measure
+ RTT.
+
+ This was rejected for several reasons. First, during circuit use, we
+ already have SENDMEs to measure RTT. Every 100 cells (or
+ 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
+ on the time between that Nth cell and the SENDME ack. So we only need
+ PING/PONG to measure initial circuit RTT.
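
  Concretely, this re-measurement amounts to timestamping every Nth data
  cell and differencing against its SENDME ack. A hedged sketch in C,
  with hypothetical names, assuming 'circwindow_inc' behaves as in
  Proposal 324:

    #include <stdint.h>
    #include <time.h>

    typedef struct {
      uint64_t cells_since_sendme; /* data cells sent since last SENDME */
      struct timespec mark;        /* when the Nth cell was sent */
      uint64_t rtt_usec;           /* latest RTT estimate */
    } leg_rtt_t;

    static void
    note_data_cell_sent(leg_rtt_t *leg, uint64_t circwindow_inc)
    {
      if (++leg->cells_since_sendme == circwindow_inc)
        clock_gettime(CLOCK_MONOTONIC, &leg->mark); /* start timing */
    }

    static void
    note_sendme_received(leg_rtt_t *leg)
    {
      struct timespec now;
      clock_gettime(CLOCK_MONOTONIC, &now);
      int64_t usec = (int64_t)(now.tv_sec - leg->mark.tv_sec) * 1000000
                     + (now.tv_nsec - leg->mark.tv_nsec) / 1000;
      leg->rtt_usec = (uint64_t)usec;
      leg->cells_since_sendme = 0; /* window resets on the ack */
    }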
+
+ If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
+ we might be able to specify a PING/PONG/PING handshake solely for
+ measuring initial RTT, especially for onion service circuits.
+
+ The reason for not making a dedicated PING/PONG for this purpose is that
+ it is context-free. Even if we were able to use onionskins for linking
+ and resumption, to avoid additional data in the handshake that just measures
+ RTT, we would have to enforce that this PING/PONG/PING only follows the
+ exact form needed by this proposal, at the expected time, and at no
+ other points.
+
+ If we do not enforce this specific use of PING/PONG/PING, it becomes
+ another potential side channel, for use in attacks such as [DROPMARK].
+
+ In general, Tor is planning to remove current forms of context-free and
+ semantic-free cells from its protocol:
+ https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
+
+ We should not add more.
Appendix B: Acknowledgments
- Thanks to Per Hurtig for helping us with the framing of the MPTCP
- problem space.
-
- Thanks to Simone Ferlin for clarifications on the [BLEST]
- paper, and for pointing us at the Linux kernel implementation.
-
- Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
- immensely towards our understanding of how the BLEST condition relates
- to edge connection pushback, and for clearing up many other
- misconceptions we had.
-
- Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
- Goldberg, for the original [CONFLUX] paper!
+ Thanks to Per Hurtig for helping us with the framing of the MPTCP
+ problem space.
+
+ Thanks to Simone Ferlin for clarifications on the [BLEST] paper, and for
+ pointing us at the Linux kernel implementation.
+
+ Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
+ immensely towards our understanding of how the BLEST condition relates
+ to edge connection pushback, and for clearing up many other
+ misconceptions we had.
+
+ Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
+ Goldberg, for the original [CONFLUX] paper!
References:
[torspec/master] Prop329: Note new comments on receive windows from Simone and Toke
by asn@torproject.org 30 Mar '21
commit b2cfbecd3b8e0e2786c32edfb8b1e7bc763b7c68
Author: Mike Perry <mikeperry-git(a)torproject.org>
Date: Fri Mar 26 15:54:02 2021 +0000
Prop329: Note new comments on receive windows from Simone and Toke
---
proposals/329-traffic-splitting.txt | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/proposals/329-traffic-splitting.txt b/proposals/329-traffic-splitting.txt
index 1412f5d..6ef9ed7 100644
--- a/proposals/329-traffic-splitting.txt
+++ b/proposals/329-traffic-splitting.txt
@@ -467,7 +467,9 @@ Status: Draft
use CWND. There is some concern this may alter BLEST's buffer
minimization properties, but since receive window should only matter if
the application is slower than Tor, and XON/XOFF should cover that case,
- hopefully this is fine.
+ hopefully this is fine. If we need to, we could turn [REORDER_SIGNALING]
+ into a receive window indication of some kind, to indicate remaining
+ buffer size.
Otherwise, if the primary_limit condition is not hit, cease reading on
source edge connections until SENDME acks come back.
@@ -510,7 +512,10 @@ Status: Draft
primary circuit will minimize or eliminate this out-of-order buffer.
XXX: The remainder of this section may be over-complicating things... We
- only need these concepts if we want to use BLEST's lambda feedback.
+ only need these concepts if we want to use BLEST's lambda feedback. Though
+ turning this into some kind of receive window that indicates remaining
+ reorder buffer size may also help with the total_send_window noted
+ in BLEST_TOR.
The default for this queue size is governed by the 'cflx_reorder_client'
and 'cflx_reorder_srv' consensus parameters (see [CONSENSUS_PARAMS]).
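
If [REORDER_SIGNALING] were repurposed this way, the signal would simply
advertise remaining reorder-buffer space. A minimal sketch, with
hypothetical names; the limit stands in for 'cflx_reorder_client' or
'cflx_reorder_srv':

    #include <stddef.h>
    #include <stdint.h>

    /* Remaining reorder space, in cells, given the current queue
     * length and the consensus-driven queue limit. */
    static inline uint32_t
    reorder_window_remaining(size_t queue_len, size_t queue_limit)
    {
      return (queue_len >= queue_limit)
               ? 0 : (uint32_t)(queue_limit - queue_len);
    }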
commit 660a75f34d82e7b9ea2a632dd1f0e06ad17b00d5
Author: Mike Perry <mikeperry-git(a)torproject.org>
Date: Fri Mar 26 21:05:59 2021 +0000
Tighten up some terms and phrasing.
---
proposals/329-traffic-splitting.txt | 76 +++++++++++++++++++------------------
1 file changed, 39 insertions(+), 37 deletions(-)
diff --git a/proposals/329-traffic-splitting.txt b/proposals/329-traffic-splitting.txt
index 6ef9ed7..292f51c 100644
--- a/proposals/329-traffic-splitting.txt
+++ b/proposals/329-traffic-splitting.txt
@@ -35,7 +35,7 @@ Status: Draft
Because Tor's congestion control only concerns itself with bottlenecks in
Tor relay queues, and not with any other bottlenecks (such as
intermediate Internet routers), we can avoid this complexity merely by
- specifying that any paths that are constructed should not share any
+ specifying that any paths that are constructed SHOULD NOT share any
relays. In this way, we can proceed to use the exact same congestion
control as specified in Proposal 324, for each path.
@@ -105,10 +105,10 @@ Status: Draft
only two rendezvous circuits are linked. Each of these RP circuits will
be constructed separately, and then linked. However, the same path
constraints apply to each half of the circuits (no shared relays between
- the legs). Should, by chance, the service and the client sides end up
+ the legs). If, by chance, the service and the client sides end up
sharing some relays, this is not catastrophic. Multipath TCP researchers
- we have consulted believe Tor's congestion control from Proposal 324 to
- be sufficient in this rare case.
+ we have consulted (see [ACKNOWLEDGEMENTS]) believe Tor's congestion
+ control from Proposal 324 to be sufficient in this rare case.
Only two circuits SHOULD be linked together. However, implementations
SHOULD make it easy for researchers to *test* more than two paths, as
@@ -194,7 +194,7 @@ Status: Draft
want to support it, because of its own memory issues.
The NONCE contains a random 256-bit secret, used to associate the two
- circuits together. The nonce must not be shared outside of the circuit
+ circuits together. The nonce MUST NOT be shared outside of the circuit
transmission, or data may be injected into TCP streams. This means it
MUST NOT be logged to disk.
@@ -212,11 +212,11 @@ Status: Draft
Sent from the OP to the exit/service, to provide initial RTT
measurement for the exit/service.
- For timeout of the handshake, clients should use the normal SOCKS/stream
+ For timeout of the handshake, clients SHOULD use the normal SOCKS/stream
timeout already in use for RELAY_BEGIN.
These three relay commands (RELAY_CIRCUIT_LINK, RELAY_CIRCUIT_LINKED,
- and RELAY_CIRCUIT_LINKED_ACK) are send on *each* leg, to allow each
+ and RELAY_CIRCUIT_LINKED_RTT_ACK) are sent on *each* leg, to allow each
endpoint to measure the initial RTT of each leg.
2.2. Linking Circuits from OP to Exit [LINKING_EXIT]
@@ -235,7 +235,7 @@ Status: Draft
measured by the client, to determine each circuit RTT to determine
primary vs secondary circuit use, and for packet scheduling. Similarly,
the exit measures the RTT times between RELAY_COMMAND_LINKED and
- RELAY_COMMAND_LINKED_ACK, for the same purpose.
+ RELAY_COMMAND_LINKED_RTT_ACK, for the same purpose.
2.3. Linking circuits to an onion service [LINKING_SERVICE]
@@ -254,7 +254,7 @@ Status: Draft
circuit, until RTT can be measured.
Once both circuits are linked and RTT is measured, packet scheduling
- should be used, as per [SCHEDULING].
+ MUST be used, as per [SCHEDULING].
2.4. Congestion Control Application [CONGESTION_CONTROL]
@@ -313,7 +313,7 @@ Status: Draft
When an endpoint switches legs, on the first cell in a new leg, LongSeq
is set to 1, and the following 31 bits represent the *total* number of
- cells sent on the *other* leg, before the switch. The receiver must wait
+ cells sent on the *other* leg, before the switch. The receiver MUST wait
for that number of cells to arrive from the previous leg before
delivering that cell.
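
To illustrate the receiver side of this rule, the switch cell is held
until the advertised count of cells from the previous leg has been
delivered. A hedged sketch with hypothetical names:

    #include <stdint.h>

    typedef struct {
      uint64_t delivered_prev_leg; /* cells delivered from the old leg */
      uint64_t expected_prev_leg;  /* 31-bit total carried after LongSeq */
    } switch_wait_t;

    /* True once the first cell of the new leg may be delivered. */
    static inline int
    may_deliver_switch_cell(const switch_wait_t *w)
    {
      return w->delivered_prev_leg >= w->expected_prev_leg;
    }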
@@ -332,10 +332,10 @@ Status: Draft
In the event that a circuit leg is destroyed, it MAY be resumed.
- Resumption is achieved by re-using the NONCE and method to the same
- endpoint (either [LINKING_EXIT] or [LINKING_SERVICE]). The resumed path
- need not use the same middle and guard relays, but should not share any
- relays with any existing legs(s).
+ Resumption is achieved by re-using the NONCE to the same endpoint
+ (either [LINKING_EXIT] or [LINKING_SERVICE]). The resumed path need
+ not use the same middle and guard relays as the destroyed leg(s), but
+ SHOULD NOT share any relays with any existing leg(s).
To provide resumption, endpoints store an absolute 64-bit cell counter of
the last cell they have sent on a conflux pair (their LAST_SEQNO_SENT),
@@ -363,11 +363,13 @@ Status: Draft
Because both endpoints get information about the other side's absolute
SENT sequence number, they will know exactly how many re-transmitted
- packets to expect, should the circuit stay open. Re-transmitters should
- not re-increment their absolute sent fields while re-transmitting.
+ packets to expect, if the circuit is successfully resumed.
+
+ Re-transmitters MUST NOT re-increment their absolute sent fields
+ while re-transmitting.
If it does not have this missing data due to memory pressure, that
- endpoint should destroy *both* legs, as this represents unrecoverable
+ endpoint MUST destroy *both* legs, as this represents unrecoverable
data loss.
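
A sketch of the bookkeeping this implies; the struct and names are
illustrative, not tor's actual data structures:

    #include <stdint.h>

    typedef struct {
      uint8_t  nonce[32];            /* 256-bit link nonce; never logged */
      uint64_t last_seqno_sent;      /* absolute count of cells sent */
      uint64_t last_seqno_delivered; /* absolute count delivered upward */
    } conflux_pair_t;

    static void
    note_cell_sent(conflux_pair_t *cfx, int is_retransmit)
    {
      /* Re-transmitters MUST NOT re-increment the absolute counter. */
      if (!is_retransmit)
        cfx->last_seqno_sent++;
    }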
Otherwise, the new circuit can be re-joined, and its RTT can be compared
@@ -428,13 +430,13 @@ Status: Draft
switch if the RTTs themselves change which circuit is primary). This is
what was done in the original Conflux paper. This behavior effectively
causes us to optimize for responsiveness and congestion avoidance,
- rather than throughput. For evaluation, we should control this switching
+ rather than throughput. For evaluation, we will control this switching
behavior with a consensus parameter (see [CONSENSUS_PARAMETERS]).
Because of potential side channel risk (see [SIDE_CHANNELS]), a third
variant of this algorithm, where the primary circuit is chosen during
- the [LINKING_CIRCUITS] handshake and never changed, should also be
- possible to control via consensus parameter.
+ the [LINKING_CIRCUITS] handshake and never changed, is also possible
+ to control via consensus parameter.
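
For concreteness, the variants differ only in whether the sender may
re-evaluate which leg is primary. A hedged sketch of the selection
logic, with hypothetical names; the boolean stands in for the consensus
parameter:

    #include <stdint.h>

    typedef struct {
      uint64_t rtt_usec;  /* current RTT estimate for this leg */
      uint64_t inflight;  /* cells sent but not yet acked */
      uint64_t cwnd;      /* congestion window from Proposal 324 */
      int is_primary;     /* fixed at [LINKING_CIRCUITS] time */
    } leg_t;

    static leg_t *
    choose_send_leg(leg_t *a, leg_t *b, int allow_switching)
    {
      if (!allow_switching)            /* third variant: never switch */
        return a->is_primary ? a : b;
      leg_t *low = (a->rtt_usec <= b->rtt_usec) ? a : b;
      leg_t *other = (low == a) ? b : a;
      /* Prefer the lower-RTT leg while it has congestion window room. */
      return (low->inflight < low->cwnd) ? low : other;
    }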
3.2. BLEST Scheduling [BLEST_TOR]
@@ -465,8 +467,8 @@ Status: Draft
in this check. total_send_window is min(recv_win, CWND). But since Tor
does not use receive windows and intead uses stream XON/XOFF, we only
use CWND. There is some concern this may alter BLEST's buffer
- minimization properties, but since receive window should only matter if
- the application is slower than Tor, and XON/XOFF should cover that case,
+ minimization properties, but since the receive window only matters if
+ the application is slower than Tor, and XON/XOFF will cover that case,
hopefully this is fine. If we need to, we could turn [REORDER_SIGNALING]
into a receive window indication of some kind, to indicate remaining
buffer size.
@@ -497,17 +499,17 @@ Status: Draft
3.3. Reorder queue signaling [REORDER_SIGNALING]
- Reordering should be fairly simple task. By following using the sequence
+ Reordering is a fairly simple task. By using the sequence
number field in [SEQUENCING], endpoints can know how many cells are
still in flight on the other leg.
To reorder them properly, a buffer of out of order cells needs to be
- kept. On the Exit side, this can quickly become overwhelming
+ kept. On the Exit side, this can quickly become overwhelming
considering tens of thousands of possible circuits can be held open
leading to gigabytes of memory being used. There is a clear potential
- memory DoS vector which means that a tor implementation should be able
- to limit the size of those queues.
-
+ memory DoS vector in this case, covered in more detail in
+ [MEMORY_DOS].
+
Luckily, [BLEST_TOR] and the form of [LOWRTT_TOR] that only uses the
primary circuit will minimize or eliminate this out-of-order buffer.
@@ -526,7 +528,7 @@ Status: Draft
When the reorder queue hits this size, a RELAY_CONFLUX_XOFF is sent down
the circuit leg that has data waiting in the queue and use of that leg
- must cease, until it drains to half of this value, at which point an
+ SHOULD cease, until it drains to half of this value, at which point a
RELAY_CONFLUX_XON is sent. Note that this is different than the stream
XON/XOFF from Proposal 324.
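
A small sketch of that hysteresis, with hypothetical names; the
XOFF/XON senders are assumed to exist elsewhere:

    #include <stddef.h>

    typedef struct reorder_q_t reorder_q_t;
    void send_relay_conflux_xoff(reorder_q_t *q); /* assumed elsewhere */
    void send_relay_conflux_xon(reorder_q_t *q);  /* assumed elsewhere */

    struct reorder_q_t {
      size_t len;     /* cells currently queued out-of-order */
      size_t limit;   /* consensus-driven queue size */
      int xoff_sent;  /* 1 while the lagging leg is quenched */
    };

    static void
    reorder_queue_check(reorder_q_t *q)
    {
      if (!q->xoff_sent && q->len >= q->limit) {
        send_relay_conflux_xoff(q);  /* quench the leg with queued data */
        q->xoff_sent = 1;
      } else if (q->xoff_sent && q->len <= q->limit / 2) {
        send_relay_conflux_xon(q);   /* resume once drained to half */
        q->xoff_sent = 0;
      }
    }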
@@ -614,16 +616,16 @@ Status: Draft
location. See [LATENCY_LEAK] for more details. It is unclear at this
time how much more severe this is for two paths than just one.
- We should preserve the ability to disable conflux to and from Exit
- relays, should these side channels prove more severe, or should it prove
- possible to mitigate single-circuit side channels, but not conflux side
- channels.
+ We preserve the ability to disable conflux to and from Exit relays
+ using consensus parameters, if these side channels prove more severe,
+ or if it proves possible to mitigate single-circuit side
+ channels, but not conflux side channels.
In all cases, all of these side channels appear less severe for onion
service traffic, given the higher path variability due to relay
selection, as well as the end-to-end nature of conflux in that case.
- This indicates that our ability to enable/disable conflux for services
- should be separate from Exits.
+ Thus, we separate our ability to enable/disable conflux for onion
+ services from Exits.
4.3. Traffic analysis [TRAFFIC_ANALYSIS]
@@ -634,7 +636,7 @@ Status: Draft
[LINKING_CIRCUITS] may be quite noticeable.
As one countermeasure, it may be possible to eliminate the third leg
- (RELAY_CIRCUIT_LINKED_ACK) by computing the exit/service RTT via
+ (RELAY_CIRCUIT_LINKED_RTT_ACK) by computing the exit/service RTT via
measuring the time between CREATED/REND_JOINED and RELAY_CIRCUIT_LINK,
but this will introduce cross-component complexity into Tor's protocol
that could quickly become unwieldy and fragile.
@@ -739,7 +741,7 @@ A.1 BEGIN/END sequencing [ALTERNATIVE_SEQUENCING]
opposed to the endpoint, to support padding.
Sequence numbers are incremented by one when an endpoint switches legs
- to transmit a cell. This number will wrap; implementations should treat
+ to transmit a cell. This number will wrap; implementations MUST treat
0 as the next sequence after 2^6-1. Because we do not expect to support
significantly more than 2 legs, let alone 63, this is not an
issue.
@@ -808,7 +810,7 @@ A.3. Alternative RTT measurement [ALTERNATIVE_RTT]
We should not add more.
-Appendix B: Acknowledgments
+Appendix B: Acknowledgments [ACKNOWLEDGEMENTS]
Thanks to Per Hurtig for helping us with the framing of the MPTCP
problem space.