Hello,
Thank you for your helpful review, teor.
I updated the proposal from most of your comments (see attached .txt) and I respond inline to add some precisions relative to a few of your questions.
Btw, I've mirrored my private repo to github
https://github.com/frochet/Waterfilling, such that you have the
proper commit history.
Hi Florentin,
I have copied your proposal below, so I can respond to it inline.
1 Overview
The current Tor path selection algorithm is designed to satisfy twoimportant properties:
1) Each relay should handle a fraction of connections that isproportional to its bandwidth.
2) The nodes in each circuit position (entry-middle-exit) shouldbe able to handle the same volume of connections.
What about onion service circuits?
They consist of entry - middle - middle, and for the purposes of thisanalysis, make up about 4% of the network.(2% of traffic at rend points, going through 2 x 3-hop circuits.)
Hence, in addition to select paths in a probabilistic manner, thepath selection algorithm is responsible for balancing the network,that is, making sure that enough bandwidth is made available in allthe positions. The current balance of the network is decided by thebandwidth-weights (see dir-spec.txt section 3.8.3. or/and theWaterfilling PETS2017 paperThis balance does not achieve maximum diversity in end-positions ofuser paths: the same network throughput could be achieved bydecreasing the use of high-bandwidth relays and increasing the useof lower-bandwidth relays in the guard position, instead of usingthese relays in a way that is just proportional to their bandwidth.
When you say "bandwidth", it's not clear whether you mean consensusweight (measured bandwidth) or advertised bandwidth (bandwidthcapacity). They're not the same.
I'm going to assume consensus weight from now on.Please fix all uses of bandwidth in the rest of the proposal.
<skip>
2 Design
Correlation attacks require to control guard and exit nodes, but thescarcity of exit bandwidth is such that there is no real freedom inthe way to use it.
As a result, the Waterfilling proposal focuseson the guard position. However, it could be easily extended to theexit position if, someday, nodes in that position happen not to beexploited to their full capacity.
No, exit bandwidth is only exploited to its full capacity onhigh-bandwidth exits in the northern EU and North America.
Select "Consensus Weight vs Bandwidth" on this map:All the exits in all the purple countries are probably under-loaded.And some exits elsewhere are under-loaded.(That's why we let Exits be fallback directories.)
So the network might actually benefit the most from a reallocationof Exit bandwidth. But you'd have to use the advertised bandwidthrather than Wee and Wed.
What would it look like if we used waterfilling on the advertisedbandwidths of Exits?
Is there a way to do this that avoids gaming the system byincreasing advertised bandwidth?Does the feedback loop with bandwidth authority measurementsmitigate this risk?
_Recall_: Tor currently computes bandwidth-weights in order to balancethe bandwidth made available by nodes between the different pathpositions. In particular the Wgg weight indicates to each guard whichproportion of its bandwidth should be used for entry traffic (the restbeing normally devoted to the middle position). This proportion isthe same for all guards.
Wgg indicates to *clients* how often they should select guardsin the guard position in *circuits*.
It doesn't influence relays themselves, and it only indirectlyaffects bandwidth.
Please fix similar wording in the rest of the proposal.
<skip>Compute a "pivot" N and the weight Wgg_i assigned torelay_i in such a way that:(a) Wgg_i * BW_i == Wgg_i+1 * BW_i+1 forall i in [1, N]
subscripts in brackets, otherwise it's ambiguous:
Wgg_(i+1) * BW_(i+1)
(b) Wgg_i == 1 forall i in [N+1, K](c) sum_{i=1}^{K} Wgg_i*BW_i = Wgg*G (Wgg is provided by Tor)
These equations are under-specified, because they allow solutions with:Wgg*G > 0Wgg_1 == 0That is, no guard selections for high-bandwidth relays.
From the descriptions, I think the missing condition is:Wgg_N * BW_N >= Wgg_(N+1) * BW_(N+1)
Also, Wgg is provided by the Tor directory authorities based onconsensus weights from the bandwidth authorities.
And what happens to any remainder in the calculations?
(This is most important for small, low bandwidth test networks.)
For example, if:G = 10Wgg = 0.6BW_i = 6, 2, 2What are the final weighted bandwidths?2, 2, 2?
<skip>
As a result of this process, each guard ranked before the pivot Ndedicates the same bandwidth to its guard role (equation (a)) -- wesay that these guards achieve the water level, while each guardranked after the pivot N dedicates its full bandwidth to the guardrole (equation (b)) -- they are below the water level. Equation (c)makes sure that the pivot and the water level are positioned in away that guarantees that the total amount of bandwidth dedicated tothe guard position is the same as before. In practice, the value ofN can be efficiently computed by dichotomic search on Equation (c),
Is this the same as a binary search?Does it require any division?Because division is slow on some Tor client architectures.
<skip>2.1 Going further by tweaking original bandwidth-weights computation
As explained above, our Waterfilling equations are based on: 1) theWgg weight computed by Tor 2) the assumption that the bandwidthavailable in exit is scarce, i.e., it is lower than the oneavailable for guard (and middle) positions.The second point is satisfied most of the time in Tor, and we do takeit for granted here.We, however, observe that Waterfilling could be made even moreeffective by applying a minor change in the way Tor computes theWgg. For the moment, Tor computes Wgg in such a way that the samebandwidth is allocated to the guard and middle positions. As aresult, both positions are in excess compared to the exit position.
The water level could be decreased and, as a result, the uniformityof the guard selection process could be improved, by computing Wggin a way that allocates the same bandwidth to the guard and exitpositions, putting the middle position as the only position inexcess.
No, this would slow down Guards, and everything that isn't an exit circuit:* directory fetches (3% of total bandwidth to guard position)* onion services (rend is 4% of total bandwidth to guard and middle)* HSDir is unweighted, and we don't know how much bandwidth it uses* FallbackDir is unweighted, but mostly Guards, and small
We show in the performance section of the Waterfilling paper thatscarcity on two positions does not reduce performance compared tovanilla bandwidth-weights.
What about guards that have low consensus weight due to latency,rather than available bandwidth?
I think this could also cause you huge latency issues as you push morebandwidth away from fast relays. I'm not sure if shadow captures thisaccurately.
3 Security implications
An analysis of the security impact of the Waterfilling proposal ismade in Section 6 of the Waterfilling paper. It studies theexpectation of the number of relays that an adversary needs tocontrol in order to mount a successful correlation attack at a giventime, as well as an analysis of the evolution of the time untilfirst path compromise, based on TorPS.
Given that the distribution produced by Waterfilling is closer tothe uniform distribution, the use of Waterfilling increases theexpectation of the number of relays that an adversary needs tocompromise in order mount a successful correlationattack. Concretely, and based on real data from 2015, thisexpectation increases by approximately 150 relays.
What percentage is this?
<skip>
I'm sorry, I can't read this, there are build files everywhere.I can't even find which files were changed.Can you provide clean commits in a fork, based on the Tor master branch?
Also, I think you will need a new consensus method for this change.We don't use consensus parameters or torrc options to modify thecontent of the consensus, we use consensus methods.
These are redundant, and could significantly expand the size of theconsensus, and consensus diffs. (Consensus weights are one of thelargest contributors to consensus diff size.)
Why not calculate wmg on clients?Why not calculate both wgg and wmg on clients?
<skip>
Unanswered questions:
What about the feedback loop between this new allocation systemand the bandwidth authorities?
Should bandwidth authority clients use the new system?
How do we report on the new system?How can we determine whether it is better for security in practice?How can we determine if it is faster or slower in practice?How can we work out if someone is trying to game the system?
T
_______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev