Hi! We had a meeting about FlashFlow today, including several of the authors. Here are the notes we wound up with for ideas and next steps.
Easy changes:
* Just use a PRNG; assume we can make them arbitrarily fast. (Example candidates: chacha8, shake128.)
* Use relay identities as the identifiers for measurers, so that we won't need a novel authentication scheme.
* We can't call the list of measurer IDs a "network parameter", since technically speaking network parameters have to be integers. It will have to be a different part of the consensus header.
* Make sure that all of the declared ranges for network parameters are as wide as they could possibly be; widening a parameter's range later is hard.
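To make the PRNG item concrete, here is a minimal sketch of using SHAKE-128 (one of the candidates above) as a seeded byte generator, via Python's standard hashlib. The function name and the seed value are hypothetical; the notes don't say how the seed is chosen or where the output is used.

```python
import hashlib


def prng_bytes(seed: bytes, n: int) -> bytes:
    """Return n pseudorandom bytes derived from seed.

    SHAKE-128 is an extendable-output function, so the same seed
    always yields the same stream, at whatever length is requested.
    """
    return hashlib.shake_128(seed).digest(n)


# Hypothetical seed, for illustration only.
stream = prng_bytes(b"hypothetical per-measurement seed", 64)
```

Since the output is a deterministic function of the seed, anyone who knows the seed can regenerate the same bytes instead of storing them.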
Trickier but straightforward:
* Describe how to avoid collisions with multiple coordinators
  - idea: exactly how it's specified in the paper ;) but a simpler idea ...
  - idea: coord 1 measures on day 1, c2 on d2, ... etc. for all coords, then repeat
* Describe how to aggregate all background measurements over the full 30 seconds, and how to use that data. (This may lower accuracy a little, but makes some kinds of the analysis harder.) Idea: relay reports *once* at end of measurement the total amount of bg traffic and the coord simply divides that by the length of the measurement to have a per-second average.
* Mention whether relays should reserve sockets in case they get measured
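A rough sketch of two of the ideas above: the day-by-day coordinator rotation, and the single end-of-measurement background report divided by the measurement length. Function names, 1-based numbering, and byte units are assumptions for illustration, not anything from the FlashFlow code.

```python
def coordinator_for_day(day: int, num_coordinators: int) -> int:
    """Coordinator 1 measures on day 1, coordinator 2 on day 2, ..., then repeat.

    Both day and the returned coordinator number are 1-based.
    """
    if day < 1 or num_coordinators < 1:
        raise ValueError("day and num_coordinators must be >= 1")
    return (day - 1) % num_coordinators + 1


def average_background_rate(total_bg_bytes: int, duration_seconds: float) -> float:
    """Per-second average of the background traffic the relay reported once."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_bg_bytes / duration_seconds


# Example: a relay reports 300 MB of background traffic for a 30 second measurement.
rate = average_background_rate(300_000_000, 30)
```

With three coordinators, days 1, 2, 3 go to coordinators 1, 2, 3, and day 4 goes back to coordinator 1.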
More thinking may be needed:
* Summarize ideas for how multiple coordinators don't have to share full schedules with one another. Possibly divide up the network by days? [e.g., Coordinator 1 measures nodes in set X on Monday]
* Would it work if we declare a maximum measurement fraction (e.g. 75% of bandwidth) but measurers only use that fraction in a few measurements once in a while, and mostly they do less (e.g. 10% of bandwidth)?
* Discuss migration: how do we use this data when not all relays support being measured in this way?
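One possible reading of the "divide up the network by days" idea, as a sketch only: split relays into as many sets as there are coordinators by hashing their fingerprints, and rotate which set each coordinator measures each day. The hash-based split, the function names, and the 0-based numbering are all assumptions made for illustration.

```python
import hashlib


def relay_set(fingerprint: str, num_sets: int) -> int:
    """Assign a relay to a set based on a hash of its fingerprint."""
    digest = hashlib.sha256(fingerprint.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_sets


def set_for_coordinator(coordinator: int, day: int, num_coordinators: int) -> int:
    """Which set a (0-based) coordinator measures on a (0-based) day."""
    return (coordinator + day) % num_coordinators


def relays_to_measure(fingerprints, coordinator, day, num_coordinators):
    """Relays this coordinator would measure on this day."""
    target = set_for_coordinator(coordinator, day, num_coordinators)
    return [fp for fp in fingerprints
            if relay_set(fp, num_coordinators) == target]
```

Each coordinator only needs its own index, the number of coordinators, and the day to work out its share. On any given day the coordinators' sets are disjoint, and after as many days as there are coordinators, each one has covered every set.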
In terms of implementation:
- identify the Python parts that differ from sbws, create sbws subpackages "ff measurer" and "ff coordinator", and add a config option to run in one mode or the other, so that we don't have yet another code base to maintain
In terms of deployment:
- we currently don't have any automatic way to ensure the net is still "working properly", only some mostly-manual ways and some one-time experiments. This has made some relay operators unhappy, and it has taken quite some time to figure out the problems and solve them
In terms of coordination: We're deploying sbws only one dirauth at a time and trying to ensure the net is still "working properly". If we deploy ff before we have deployed sbws on all bwauths and ensured the net is still "working properly", it will be hard to tell whether a problem is an sbws bug, an ff bug, or both
FlashFlow, the Python code for coordinator, measurer, etc. https://gitlab.torproject.org/pastly/flashflow
The rendered documentation for/from the above https://flashflow.pastly.xyz/
Tor repo with branch https://gitlab.torproject.org/pastly/tor/-/tree/ff-v2
The ticket with the concerning graphs attributable to "Rob's speedtest thing" https://trac.torproject.org/projects/tor/ticket/33076