[tor-dev] Proposal 316: FlashFlow: A Secure Speed Test for Tor (Parent Proposal)

Sat Apr 25 02:25:12 UTC 2020

Hi Matt,

Thanks for the quick response!

I've trimmed the conversation to the comments that need further
discussion.

> On 25 Apr 2020, at 06:46, Matt Traudt <pastly at torproject.org> wrote:
> 
> On 4/23/20 21:05, teor wrote:
>> ...
>> 
>>> - msm_duration  [1 byte]
>> 
>> What are the minimum and maximum valid values for this field?
>> 1..255 ?
>> 
>> Do we want to limit measurements to 4 minutes at a protocol level?
>> 
>> In general, protocols should make invalid states impossible to represent.
>> But do we want a 4 minute hard limit here?
>> 
> 
> This document suggests a measurement duration of 30 seconds. We see no
> reason to ever go above 1 minute. If there's a byte to spare, then sure
> let's make this a uint16.

I've thought about this a bit more, and from a user experience perspective,
we also want a 30 second limit. (Most users will give up on a slow
connection after 30 seconds.)

So as long as there is a documented limit in the protocol, we should be
fine with 2 bytes.
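
Here's a rough sketch of how a documented limit could be enforced when
the field is parsed. The 2-byte big-endian encoding, the 60 second cap,
and the function names are my assumptions, not part of the proposal:

```python
import struct

# Assumed protocol limit, in seconds. The proposal suggests 30 second
# measurements, and sees no reason to go above 1 minute.
MAX_MSM_DURATION = 60


def encode_msm_duration(seconds: int) -> bytes:
    """Encode a measurement duration as a 2-byte big-endian integer."""
    if not 1 <= seconds <= MAX_MSM_DURATION:
        raise ValueError(f"msm_duration out of range: {seconds}")
    return struct.pack("!H", seconds)


def decode_msm_duration(field: bytes) -> int:
    """Decode a 2-byte msm_duration field, rejecting out-of-range values."""
    if len(field) != 2:
        raise ValueError("msm_duration must be exactly 2 bytes")
    (seconds,) = struct.unpack("!H", field)
    if not 1 <= seconds <= MAX_MSM_DURATION:
        raise ValueError(f"msm_duration out of range: {seconds}")
    return seconds
```

The wire format can still represent larger values, so the receiver has
to reject them explicitly.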

>>> second is the number of seconds since the measurement began. MSM_BG
>>> cells are sent once per second from the relay to the FlashFlow
>>> coordinator. The first cell will have this set to 1, and each
>>> subsequent cell will increment it by one. sent_bg_bytes is the number of
>>> background traffic bytes sent in the last second (since the last MSM_BG
>>> cell). recv_bg_bytes is the same but for received bytes.
>>> 
>>> The payload of MSM_ERR cells:
>>> 
>>> - err_code [1 byte]
>>> - err_str  [possibly zero-len null-terminated string]
>> 
>> We don't have strings in any other tor protocol cells.
>> 
>> If you need extensible error information, can I suggest using
>> ext-type-length-value fields:
>> 
>>     N_EXTENSIONS     [1 byte]
>>     N_EXTENSIONS times:
>>        EXT_FIELD_TYPE [1 byte]
>>        EXT_FIELD_LEN  [1 byte]
>>        EXT_FIELD      [EXT_FIELD_LEN bytes]
>> 
>> https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n1518
>> 
>> If strings are necessary, please specify a character encoding
>> (ASCII or UTF-8), and an allowed set of characters.
>> 
>> If we don't whitelist characters, we risk logging terminal escape
>> sequences, or other arbitrary data.
> 
> I seem to remember strings used between directory authority /directory
> mirror relays and clients to communicate certain errors (clock skew?),
> but what's probably reality is a *code* is communicated and what I'm
> thinking of is merely the Tor client interpreting the code for logging
> purposes.

There are two sources for clock skew warnings:
* A binary time field in NETINFO cells
* An HTTP header on directory documents

The header is text, but it's very structured, and at a different
protocol layer.

In another part of the directory protocol, when authorities reject a
relay descriptor upload, they send a rejection reason to the relay.
That's unstructured text in an HTTP response. (But we do escape it before
logging.)

> Regardless, we probably don't really need a string. It occurs to me we
> might want *something* that carries more information than a code; for
> example, a MSM_ERR cell with a code stating "I'm refusing to be measured
> because I've been measured too recently" would benefit from a field
> stating either time till measurement allowed again or time since last
> measurement.

Yes, I think a code and ext-type-length-value fields for any additional
info would work here.
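
As a sketch only, an MSM_ERR payload with a code followed by
ext-type-length-value fields could be built and parsed like this. The
error code and extension type values below are hypothetical, as are the
function names. The layout follows the N_EXTENSIONS format quoted above:

```python
import struct

# Hypothetical values, for illustration only.
ERR_MEASURED_TOO_RECENTLY = 1
EXT_RETRY_AFTER_SECS = 1


def encode_msm_err(err_code: int, extensions: list[tuple[int, bytes]]) -> bytes:
    """Build err_code, N_EXTENSIONS, then (type, len, value) triples."""
    if len(extensions) > 255:
        raise ValueError("too many extensions")
    out = bytearray(struct.pack("!BB", err_code, len(extensions)))
    for ext_type, value in extensions:
        if len(value) > 255:
            raise ValueError("extension value too long")
        out += struct.pack("!BB", ext_type, len(value)) + value
    return bytes(out)


def decode_msm_err(payload: bytes) -> tuple[int, list[tuple[int, bytes]]]:
    """Parse an MSM_ERR payload. Trailing cell padding is ignored."""
    if len(payload) < 2:
        raise ValueError("truncated MSM_ERR payload")
    err_code, n_ext = payload[0], payload[1]
    pos = 2
    extensions = []
    for _ in range(n_ext):
        if pos + 2 > len(payload):
            raise ValueError("truncated extension header")
        ext_type, ext_len = payload[pos], payload[pos + 1]
        pos += 2
        if pos + ext_len > len(payload):
            raise ValueError("truncated extension value")
        extensions.append((ext_type, payload[pos:pos + ext_len]))
        pos += ext_len
    return err_code, extensions
```

For the "measured too recently" case, the relay could send
encode_msm_err(ERR_MEASURED_TOO_RECENTLY, [(EXT_RETRY_AFTER_SECS,
struct.pack("!I", 3600))]). Unknown extension types can be skipped by
length, and there are no strings to escape.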

>>> At this point, the relay
>>> 
>>> - Starts a repeating 1s timer on which it will report the amount of
>>>   background traffic to the coordinator over the coordinator's
>>>   connection.
>>> - Enters "measurement mode" and limits the amount of background
>>>   traffic it handles according to the torrc option/consensus
>>>   parameter.
>>> 
>>> The relay decrypts and echoes back all MSM_ECHO cells it receives on
>>> measurement connections
>> 
>> Are MSM_ECHO cells relay cells?
>> How much of the relay protocol does the measurer implement?
>> 
>> The references to decrypting cells suggest that MSM_ECHO cells are
>> relay (circuit-level) cells. But earlier sections suggest that they are
>> link cells.
>> 
>> If they are link cells, what key material is used for decryption?
>> How do the measurer and relay agree on this key material?
>> 
>> If they are relay cells, do they use the ntor handshake?
>> https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1132
>> 
> 
> MEASURE cells are cells like CREATE, CREATED, RELAY, etc.

In the Tor protocol specification, we call these "commands", and the
cells are sent at the link level.

> MSM_ECHO, MSM_PARAMS, etc. cells are MEASURE cells in the same way
> RELAY_BEGIN, RELAY_DATA, RELAY_SENDME, etc. are RELAY cells.

For relay cells, we call these "relay commands", and the cells are
sent at the circuit level.

So it might be helpful to say "measure commands" here. It might also be
helpful to distinguish "control" and "data" cells, in a similar way to
the relay cell spec:

https://github.com/torproject/torspec/blob/master/tor-spec.txt#L1572

> The relay needs to do AES on the MSM_ECHO cells (or for simplicity, the
> MSM_ECHO cell payload) like it does AES on relay cells. As for key
> material necessary to do that, that's an oversight. We've neglected to
> specify how it's derived.
> 
> Suggestions? Not a cryptographer, but off the top of my head, the
> measurer could simply tell the relay to use $key_i for MSM_ECHO cells on
> $connection_i. We just want the CPU load on the relay; we're not after
> security properties here (other than verifying the relay is actually
> doing the crypto, as discussed elsewhere).

I suggest that the client opens a real single-hop circuit, and sends
RELAY_ECHO cells (a new relay command) on that circuit.

As part of this design:
* RELAY_ECHO cells are only allowed from valid measurers
* flow control is disabled on circuits from valid measurers
  (I think that's what you want here, but best to be explicit)

This design has a few advantages:
* The design and coding are much simpler
* FlashFlow automatically uses the latest relay crypto
* The key material is automatically derived for you
* The decryption and some verification is automatically performed for you
* You can verify the cell contents using a simple memcmp()

There's a slight disadvantage:
* When you skip decrypting a cell, the digest gets out of sync, so future
  cells have less validation. But I don't think that matters for single-hop
  circuits.

It's likely that your measurers will be network-bound, rather than
CPU-bound. So you may be able to just use unmodified circuit crypto.

There are also security advantages to using unmodified relay crypto. If tor
adds extra modes that skip decryption or verification, then it's easier to
accidentally trigger those modes. (Via bugs or exploits.)

If we use unmodified relay crypto, then it's much harder to get tor into an
insecure mode.

Here's what the cells would look like in detail:

16 -- RELAY_ECHO   [forward]             [control]
17 -- RELAY_ECHOED [backward]            [control]

I think these should be control cells (circuit-level cells) rather than
stream-level cells, because they are like RELAY_DROP:

10 -- RELAY_DROP   [forward or backward] [control]

I don't have a strong opinion about the rest of the measure commands. They
can stay as link-level cells. But if it turns out that it's easier to code
them as circuit-level cells, we could add a new RELAY_MEASURE command.

>>> until it has reported its amount of background
>>> traffic the same number of times as there are seconds in the measurement
>>> (e.g. 30 per-second reports for a 30 second measurement). After sending
>>> the last MSM_BG cell, the relay drops all buffered MSM_ECHO cells,
>>> closes all measurement connections, and exits measurement mode.
>> 
>> To be more precise here, can we say:
>> 
>> "the relay drops all inbound and outbound MSM_ECHO cells from measurers
>> associated with the completed measurement"
>> 
>> Can we avoid assuming that there is always only one measurement happening
>> at one time?
>> 
> 
> I think it's safe/smart/necessary to assume that, for a given relay,
> there is always only zero or one measurements happening.
> 
> - Measurements are scheduled s.t. coordinators won't try to measure a
> relay at the same time.
> - A coordinator trying to start a measurement while another one is
> ongoing can simply be sent a MSM_ERR cell stating as such.

You're right, the relay can resolve clashes. It's important that we make
that explicit.
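
To make the rule concrete, here's a minimal sketch of the relay-side
check. The class and method names are made up, and the specific MSM_ERR
code the relay would send is not defined anywhere yet:

```python
class MeasurementSlot:
    """Tracks the single measurement a relay is willing to run at once."""

    def __init__(self):
        self.active_coordinator = None

    def try_start(self, coordinator_id):
        """Return True to start measuring, False to reply with MSM_ERR."""
        if self.active_coordinator is not None:
            return False
        self.active_coordinator = coordinator_id
        return True

    def finish(self, coordinator_id):
        """Free the slot, but only for the coordinator that holds it."""
        if self.active_coordinator == coordinator_id:
            self.active_coordinator = None
```

The point of checking the coordinator in finish() is that a rejected
coordinator can't end someone else's measurement.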

>>> During the measurement the relay targets a ratio of background traffic
>>> to measurement traffic as specified by a consensus parameter/torrc
>>> option. For a given ratio r, if the relay has handled x cells of
>>> measurement traffic recently, Tor then limits itself to y = xr/(1-r)
>>> cells of non-measurement traffic this scheduling round. The target will
>>> enforce that a minimum of 10 Mbit/s of measurement traffic is recorded
>>> since the last background traffic scheduling round to ensure it always
>>> allows some minimum amount of background traffic.
>> 
>> Do you mean "a maximum of 10 Mbit/s of measurement traffic" ?
> 
> No. When getting ready to handle background traffic, if there has been
> less than 10 Mbit/s of measurement traffic recently, Tor will limit
> background traffic as if there was indeed 10 Mbit/s of measurement traffic.
> 
> This way the relay can always send at least some background traffic, and
> a malfunctioning/malicious FlashFlow deployment cannot stop all
> background client traffic going through a relay for 30 seconds by not
> sending it (very much) measurement traffic.

I'm still a bit confused here.

When you say:
"The target will enforce that a minimum of 10 Mbit/s of measurement traffic
is recorded"

I think you mean:
"... regardless of the actual traffic sent by the measurer."

But that raises another concern:

What about relays with very low bandwidths?
Will they reserve all their traffic for users, and none for the measurer?

Using the suggested r=25%, the maximum non-measurement traffic is:

y = (10 Mbit/s)(0.25)/(1-0.25)
  = 3.3 Mbit/s

So a relay with an actual capacity of 3.3 Mbit/s, which is fully loaded
with user traffic, will send no measurement traffic.

That seems... unexpected.

At the moment, tor directory authorities default to:

AuthDirFastGuarantee 100 Kbytes
AuthDirGuardBWGuarantee 2 Mbytes

So maybe we should derive the limit based on these values?
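
Here's the calculation as I understand it, as a small sketch. The
function name and the idea of passing the floor in as a parameter are
mine. The formula and the 10 Mbit/s floor are from the proposal text:

```python
MIN_MEASUREMENT_MBITS = 10.0  # floor from the proposal, in Mbit/s


def background_limit(measured_mbits: float, r: float,
                     floor_mbits: float = MIN_MEASUREMENT_MBITS) -> float:
    """Return y = x*r/(1-r), the allowed background traffic in Mbit/s.

    x is the recent measurement traffic, raised to floor_mbits if the
    measurer has sent less than that.
    """
    if not 0 <= r < 1:
        raise ValueError("r must be in [0, 1)")
    x = max(measured_mbits, floor_mbits)
    return x * r / (1 - r)
```

With r=0.25, any measurement traffic at or below the floor gives the
same 3.3 Mbit/s background allowance. A floor derived from
AuthDirFastGuarantee (100 KBytes per second, roughly 0.8 Mbit/s) could
be passed as floor_mbits to see how much smaller the allowance gets.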

>>> 3.2.1 FlashFlow Coordinator
>>> 
>>> The coordinator is responsible for scheduling measurements, aggregating
>>> results, and producing v3bw files. It needs continuous access to new
>>> consensus files, which it can obtain by running an accompanying Tor
>>> process in client mode.
>> 
>> Recent tor versions go dormant when they haven't built circuits for a
>> while. There are options that prevent dormancy, but they are only designed
>> for interactive applications.
>> 
>> Is the FlashFlow coordinator going to use tor to implement the tor link
>> protocol?
>> 
>> If the coordinator uses tor, then it can use the same tor client instance
>> that's downloading its consensuses.
>> 
>> Otherwise, you might just be better off using a small stem script, and a
>> download timer.
>> 
>> If you use a timer, you can download each new consensus, shortly after
>> it is created. (Clients often have consensuses that are 1-2 hours old,
>> unless specifically configured to fetch from directory authorities.
>> Even then, they can take up to an hour to download a new consensus.)
>> 
> 
> As described elsewhere, the coordinator uses a Tor client in order to
> avoid implementing the tor link protocol itself. If there is not already
> a way to make a Tor client download every new consensus (e.g. a torrc
> option or an hourly control port command), we'll want to add that.

If the coordinator is constantly sending network traffic to relays, then it
shouldn't go dormant.

Here are the torrc options you might want to set on the coordinator:

# Set this to your maximum expected gap between relay measurements,
# including network downtime and other emergencies.
# Particularly important during the initial deployment.
DormantClientTimeout 1 week

# You may also need to set
FetchUselessDescriptors 1

# Get new relays as fast as possible.
FetchDirInfoEarly 1
FetchDirInfoExtraEarly 1

This is starting to look like the sbws config. You probably want most of
these options on coordinators and measurers:
https://github.com/torproject/sbws/blob/master/sbws/globals.py#L20

>> What if the capacity is limited at some other point on the internet?
>> 
>> For example:
>> * an intermediate transit provider between the measurer and all the chosen
>>  relays
>> * the chosen relays are all on the same local network
>> 
> 
> Ideally a single FlashFlow deployment's measurers are diverse to help
> mitigate the first point.
> 
> For the second, I don't have a good idea at this time. That shouldn't
> happen regularly. It will happen sometimes though, so perhaps this
> motivates a modification in how the coordinator chooses the weight for a
> relay. Instead of the result of the latest measurement, perhaps the
> highest result from the last X measurements.

That seems like a good idea.

It might also help to measure relays in each family in separate slots. You
might also want to do the same thing with relays that are in:
* the same IPv4 /24
* the same IPv6 /48

Or at the very least, relays on the same IP address.
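
A sketch of the network part of that rule, assuming the coordinator has
each relay's ORPort address. The function names and the greedy slot
assignment are my own illustration. Families are left out, and a real
scheduler also has to respect measurer capacity:

```python
import ipaddress


def network_key(address: str) -> str:
    """Return the IPv4 /24 or IPv6 /48 that contains address."""
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def assign_slots(relays: dict[str, str]) -> list[list[str]]:
    """Greedily place relays so no slot holds two relays from one network.

    relays maps a relay identifier to its ORPort address.
    """
    slots: list[tuple[set[str], list[str]]] = []
    for relay, address in sorted(relays.items()):
        key = network_key(address)
        for used, members in slots:
            if key not in used:
                used.add(key)
                members.append(relay)
                break
        else:
            # Every existing slot already has a relay from this network.
            slots.append(({key}, [relay]))
    return [members for _, members in slots]
```

Relays on the same IP address always share a /24 or /48, so they end up
in different slots too.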

>>> Relays without existing capacity estimates are assumed to have the 75th
>>> percentile capacity of the current network.
>>> 
>>> If a relay is not online when it's scheduled to be measured, it doesn't
>>> get measured that day.
>> 
>> Online in the consensus, or listening via its ORPort?
>> (There's a delay of up to 3 hours here, whenever the relay goes up or
>> down.)
>> 
>> What bandwidth weight does an offline relay get?
>> sbws has had issues because it drops offline relays.
>> 
> 
> Online as in both, I think.
> 
> I'm not up to speed or have forgotten why continuing to give weight to
> offline relays is important (and this may not be the place to enlighten
> me). Naively I'd say zero. Assuming that's stupid, I **think** whatever
> weight FlashFlow would give it were it online is smarter than some
> minimum weight value. Suggestions?

You'll need relays to be in the consensus to do connection crypto, and
listening on their ORPort to actually connect. Then you can measure.

Relays sometimes drop out of the consensus between their measurement,
and the creation of the v3bw file. So don't check if they are online
when you create that file.

Using the median of the past few measurements is a good idea anyway:
tor has a daily user bandwidth cycle. And it helps deal with missing
measurements.
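
For example, something like the following, where None stands for a
missed measurement. The window of 5 and the function name are
assumptions, not values from the proposal:

```python
from statistics import median


def relay_weight(measurements, window=5):
    """Return the median of the most recent measurements, or None.

    measurements is ordered oldest to newest. Entries that are None
    (the relay was not measured that day) are skipped.
    """
    recent = [m for m in measurements[-window:] if m is not None]
    return median(recent) if recent else None
```

A relay that misses one day keeps a weight based on its other recent
results, instead of being dropped from the v3bw file.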

>>> 3.2.1.2.1 Example
>>> 
>>> ...
>>> 
>>> 3.2.1.3 Generating V3BW files
>>> 
>>> Every hour the FF coordinator produces a v3bw file in which it stores
>>> the latest capacity estimate for every relay it has measured in the last
>>> week. The coordinator will create this file on the host's local file
>>> system. Previously-generated v3bw files will not be deleted by the
>>> coordinator.
>> 
>> Seems risky, we've seen Torflow fail in the past, because it filled up
>> the disk with bandwidth files.
>> 
>> What's the required disk capacity for a few years of bandwidth files?
>> 
> 
> We can ship a script or provide a parameter to keep the last X v3bw
> files if that would be preferable to relying on bwauths using logrotate
> themselves or otherwise finding an archival/deletion strategy that fits
> their needs.

If you provide a default maximum age, then you can document the disk
capacity that's required to keep that many files.

If operators have more or less disk, they can change the defaults.
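
Here's a rough sketch of an age-based cleanup, plus the matching disk
estimate. The .v3bw suffix, the function names, and using file mtime as
the age are my assumptions. The 24 files per day comes from "every hour"
in the proposal:

```python
import os
import time


def prune_v3bw_files(directory, max_age_days, now=None):
    """Delete *.v3bw files older than max_age_days. Return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    with os.scandir(directory) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if not (entry.is_file() and entry.name.endswith(".v3bw")):
            continue
        if entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


def disk_bytes_needed(max_age_days, file_bytes, files_per_day=24):
    """Estimate the disk space needed to keep max_age_days of files."""
    return max_age_days * files_per_day * file_bytes
```

The documented requirement is then just disk_bytes_needed() for the
default age and a typical file size, which someone would need to
measure on a real deployment.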

>>> 3.2.2 FlashFlow Measurer
>>> 
>>> The measurers take commands from the coordinator
>> 
>> The command protocol is not specified in this proposal.
>> 
>> For example, does the coordinator send the IPv4 and IPv6 addresses of
>> the relay to the measurers?
>> 
>> Which deployment parameters are sent via the protocol, and which are
>> hard-coded in configurations?
>> 
> 
> A Tor proposal did not seem the place for some of these protocols,
> options, etc. existing entirely outside little-t tor. We can certainly
> elaborate better if that's wrong.

I'm not sure. It's helpful to have a design overview somewhere, and to
reference it from the proposal.

I don't have strong opinions about exactly where it is located. Most
similar documentation is external, but the BridgeDB spec is part of the
Tor specifications:
https://github.com/torproject/torspec/blob/master/bridgedb-spec.txt

One helpful question is:

Who will maintain this software over the long term?

Ask them how they want it to be specified.

T
