On Thu, Apr 23, 2020 at 2:48 PM Matt Traudt <pastly@torproject.org> wrote:
Hi! I've got some comments on the FlashFlow proposal; I'll start with the ones that I think are most important, so that we can try to get them out of the way.
First off, I'm concerned about the approach where measurers get to consume a certain amount of bandwidth, with only a set fraction left to devote to the background traffic. It seems like a hostile set of measurers could use this authority to introduce traffic patterns on the network to assist in traffic analysis. In general, regular, scheduled, and visible changes in relay capacity seem to me like they'd help out traffic analysis a good deal.
Second, the "MSM_BG" information type also seems like a serious traffic analysis risk. It is, literally, telling the measurers how much traffic was sent each second on other connections. Previously we decided that a much coarser summary than this was too much information to publish in bandwidth-history lines, and I'm worried that I don't see any analysis of this risk here.
(In both of the above cases we might say, "well, an attacker could do that anyway!" But to get the traffic information, an attacker would need to compromise the upstream connection, and to introduce traffic spikes the attacker would need to risk detection. This proposal as written would make both of these traffic analysis opportunities an expected part of the infrastructure, which seems not-so-good to me.)
Third, I don't understand why we're using cell crypto here but we aren't using RELAY cells or (apparently?) circuits. Since TLS is already in play, we'll already be measuring the relays' encryption performance. But if we do decide that cell crypto is needed, then it's way easier to get that crypto happening if there are circuits involved. I think there's been some discussion of that on IRC; I'd suggest that we try to make that work if we can.
Fourth, this approach to authenticating echo cell contents seems needlessly complicated. Instead of using random contents and remembering a fraction of cells, it would make more sense for measurers to use a keyed pseudorandom stream function to generate the cells, and to verify the contents of all the cells as they come back in. (AES128-CTR and ChaCha8 and SHAKE128 all have nice properties here.)
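To make the keyed-stream idea concrete, here is a minimal sketch, not anything from the proposal. It assumes SHAKE128 is used as a keyed PRF over (key, cell index), a fixed 32-byte per-measurement key, a 509-byte cell payload, and that echoed cells come back in the order they were sent on a single connection. The names `cell_contents` and `EchoVerifier` are hypothetical. Because every payload can be regenerated from the key and a counter, the measurer stores no cell contents at all and can check every echoed cell, not just a sampled fraction.

```python
import hashlib
import hmac
import os

KEY_LEN = 32             # assumed: fixed key length, so key || index is unambiguous
CELL_PAYLOAD_LEN = 509   # assumed: payload size of a fixed-length Tor cell


def cell_contents(key: bytes, index: int) -> bytes:
    """Derive the payload for echo cell number `index` from a secret key.

    SHAKE128 absorbs key || index (8-byte big-endian) and squeezes out
    CELL_PAYLOAD_LEN bytes; the same (key, index) always gives the same bytes.
    """
    if len(key) != KEY_LEN:
        raise ValueError("key must be %d bytes" % KEY_LEN)
    xof = hashlib.shake_128()
    xof.update(key)
    xof.update(index.to_bytes(8, "big"))
    return xof.digest(CELL_PAYLOAD_LEN)


class EchoVerifier:
    """Hypothetical measurer-side state: two counters and a key, no stored cells."""

    def __init__(self, key: bytes = None):
        self.key = key if key is not None else os.urandom(KEY_LEN)
        self.next_send = 0  # index of the next cell to generate
        self.next_recv = 0  # index we expect the next echoed cell to have

    def next_cell(self) -> bytes:
        payload = cell_contents(self.key, self.next_send)
        self.next_send += 1
        return payload

    def check_echo(self, payload: bytes) -> bool:
        # Assumes in-order delivery; regenerate the expected payload and compare.
        expected = cell_contents(self.key, self.next_recv)
        self.next_recv += 1
        return hmac.compare_digest(expected, payload)
```

For example, a measurer would call `next_cell()` for each outgoing cell and `check_echo()` on each cell the relay sends back; a relay that skips the work of echoing real contents would fail verification on the first cell it fakes.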
Fifth, using IP addresses for identification is NOT something we do on the production network. I think we should authenticate measurers by identity key, not by IPv4 address (as is happening here, unless I misunderstand).
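As a rough illustration of the difference, here is a sketch of a relay-side check that authorizes measurers by identity key and ignores the source address entirely. It assumes the link handshake has already authenticated the peer's 32-byte Ed25519 identity key, and that the relay is configured with a list of trusted measurer keys; `PeerInfo` and `MeasurerAuthorizer` are hypothetical names, not existing Tor code.

```python
from dataclasses import dataclass

ED25519_KEY_LEN = 32


@dataclass(frozen=True)
class PeerInfo:
    """Hypothetical view of a peer after the link handshake has completed."""
    address: str              # where the connection came from (not trusted)
    ed25519_identity: bytes   # identity key proven during the handshake (assumed)


class MeasurerAuthorizer:
    def __init__(self, trusted_keys):
        self.trusted = frozenset(trusted_keys)
        for k in self.trusted:
            if len(k) != ED25519_KEY_LEN:
                raise ValueError("trusted keys must be 32-byte Ed25519 keys")

    def is_measurer(self, peer: PeerInfo) -> bool:
        # Deliberately ignores peer.address: addresses can change or be shared
        # (e.g. behind NAT), while the identity key is what the handshake proves.
        return peer.ed25519_identity in self.trusted
```

With this design, a measurer that moves to a new address keeps working without any configuration change, and an unrelated host that happens to get a measurer's old address gains nothing.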
yrs,