<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Re: [tor-dev] Proposal Waterfilling</title>
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hello,</p>
<p>Thank you for your helpful review, teor.</p>
<p> I updated the proposal from most of your comments (see attached
.txt) and I respond inline to add some precisions relative to a
few of your questions.</p>
<p>Btw, I've mirrored my private repo to github
<a class="moz-txt-link-freetext" href="https://github.com/frochet/Waterfilling">https://github.com/frochet/Waterfilling</a>, such that you have the
proper commit history.<br>
</p>
<div class="moz-cite-prefix">On 2018-01-11 14:47, teor wrote:<br>
</div>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div>Hi Florentin,<br>
<div><br>
</div>
<div>I have copied your proposal below, so I can respond to it
inline.</div>
<div><br>
</div>
<div>On 11 Jan 2018, at 21:00, Florentin Rochet <<a
href="mailto:florentin.rochet@uclouvain.be"
moz-do-not-send="true">florentin.rochet@uclouvain.be</a>>
wrote:<br>
<br>
<skip></div>
</div>
</blockquote>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>
<div>1 Overview</div>
<div><br>
</div>
<div> The current Tor path selection algorithm is designed to
satisfy two</div>
<div> important properties:</div>
<div><br>
</div>
<div> 1) Each relay should handle a fraction of connections
that is</div>
<div> proportional to its bandwidth.</div>
<div><br>
</div>
<div> 2) The nodes in each circuit position (entry-middle-exit)
should</div>
<div> be able to handle the same volume of connections.</div>
</div>
<div><br>
</div>
<div>What about onion service circuits?</div>
<div><br>
</div>
<div>They consist of entry - middle - middle, and for the purposes
of this</div>
<div>analysis, make up about 4% of the network.</div>
<div>(2% of traffic at rend points, going through 2 x 3-hop
circuits.)</div>
<div><br>
</div>
<div><a
href="https://metrics.torproject.org/hidserv-rend-relayed-cells.html"
moz-do-not-send="true">https://metrics.torproject.org/hidserv-rend-relayed-cells.html</a></div>
<br>
</blockquote>
<br>
That's a good point. Waterfilling uses the current bandwidth-weights
logic as a basis and they doesn't account for onion service circuit,
hence it also ignore this sort of traffic. Prop 265 tries to address
that problem when producing the bandwidth-weights; Since our method
achieves the same total consensus weight balance between position as
the one produced by bandwidth-weights, Waterfilling would directly
inherit Prop 265's properties if this proposal is merged. <br>
<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<blockquote type="cite">
<div>
<div> Hence, in addition to select paths in a probabilistic
manner, the</div>
<div> path selection algorithm is responsible for balancing
the network,</div>
<div> that is, making sure that enough bandwidth is made
available in all</div>
<div> the positions. The current balance of the network is
decided by the</div>
<div> bandwidth-weights (see dir-spec.txt section 3.8.3.
or/and the</div>
</div>
</blockquote>
<blockquote type="cite">
<div> Waterfilling PETS2017 paper</div>
<div> <a
href="https://petsymposium.org/2017/papers/issue2/paper05-2017-2-source.pdf"
moz-do-not-send="true">https://petsymposium.org/2017/papers/issue2/paper05-2017-2-source.pdf</a>).</div>
<div> This balance does not achieve maximum diversity in
end-positions of</div>
<div> user paths: the same network throughput could be achieved
by</div>
<div> decreasing the use of high-bandwidth relays and
increasing the use</div>
<div> of lower-bandwidth relays in the guard position, instead
of using</div>
<div> these relays in a way that is just proportional to their
bandwidth.</div>
</blockquote>
<div><br>
</div>
<div>When you say "bandwidth", it's not clear whether you mean
consensus</div>
<div>weight (<span style="background-color: rgba(255, 255, 255,
0);">measured bandwidth) </span>or advertised bandwidth
(bandwidth</div>
<div>capacity). They're not the same.</div>
<div><br>
</div>
<div>I'm going to assume consensus weight from now on.</div>
<div>Please fix all uses of bandwidth in the rest of the proposal.</div>
</blockquote>
<br>
Yes, we mean consensus weight :) I did s/bandwidth/consensus weight
within the proposal<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div><br>
</div>
<skip>
<blockquote type="cite">
<div>
<div>2 Design</div>
<div><br>
</div>
<div> Correlation attacks require to control guard and exit
nodes, but the</div>
<div> scarcity of exit bandwidth is such that there is no
real freedom in</div>
<div> the way to use it.</div>
</div>
</blockquote>
<div>
<blockquote type="cite"><br>
</blockquote>
</div>
<blockquote type="cite">
<div>As a result, the Waterfilling proposal focuses</div>
<div> on the guard position. However, it could be easily
extended to the</div>
<div> exit position if, someday, nodes in that position happen
not to be</div>
<div> exploited to their full capacity.</div>
</blockquote>
<div><br>
</div>
<div>
<div><span style="background-color: rgba(255, 255, 255, 0);">No,
exit bandwidth is only exploited to its full capacity on</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">high-bandwidth
exits in the </span><span style="background-color: rgba(255,
255, 255, 0);">northern EU and North America.</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
</div>
</blockquote>
<br>
Well, I changed my wording here to avoid ambiguity. I was talking
about relays flagged as Exits being used only in the exit position
(Wee and Wed have the max value), which means that we cannot apply
our method over those relays with the current state of the Tor
network. <br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>
<div><span style="background-color: rgba(255, 255, 255, 0);">Select
"Consensus Weight vs Bandwidth" on this map:</span></div>
<div><a href="https://atlas.torproject.org/#map/flag:Exit"
style="background-color: rgba(255, 255, 255, 0);"
moz-do-not-send="true"><font color="#000000">https://atlas.torproject.org/#map/flag:Exit</font></a></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">All
the exits in all the purple countries are probably
under-loaded.</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">And
some exits elsewhere are under-loaded.</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">(That's
why we let Exits be fallback directories.)</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">So
the network might actually benefit the most from a
reallocation</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">of
Exit bandwidth. But you'd have to use the advertised
bandwidth</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">rather
than Wee and Wed.</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">What
would it look like if we used waterfilling on the advertised</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">bandwidths
of Exits?</span></div>
</div>
</blockquote>
<br>
The problem your mention is more a measurement problem that would
not exist if the bwauths were perfect, within a perfect network. I
believe that research such as Peerflow[0] is the right path to track
down such issue, and this is compatible to our proposal.<br>
<br>
[0] <cite class="_Rm"><a class="moz-txt-link-abbreviated" href="http://www.robgjansen.com/publications/peerflow-popets2017.pdf">www.robgjansen.com/publications/peerflow-popets2017.pdf</a></cite>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>
<div><span style="background-color: rgba(255, 255, 255, 0);">Is
there a way to do this that avoids gaming the system by</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">increasing
advertised bandwidth?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">Does
the feedback loop with bandwidth authority measurements</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">mitigate
this risk?</span></div>
</div>
<br>
<blockquote type="cite">
<div> _Recall_: Tor currently computes bandwidth-weights in
order to balance</div>
<div> the bandwidth made available by nodes between the
different path</div>
<div> positions. In particular the Wgg weight indicates to each
guard which</div>
<div> proportion of its bandwidth should be used for entry
traffic (the rest</div>
<div> being normally devoted to the middle position). This
proportion is</div>
<div> the same for all guards.</div>
</blockquote>
<div><br>
</div>
<div>Wgg indicates to *clients* how often they should select
guards</div>
<div>in the guard position in *circuits*.</div>
<div><br>
</div>
<div>It doesn't influence relays themselves, and it only
indirectly</div>
<div>affects bandwidth.</div>
<div><br>
</div>
<div>Please fix similar wording in the rest of the proposal.</div>
</blockquote>
<br>
Yep, this is basically what we tried to say. In fact, what I wrote
was the description of the *consequence* of bandwidth-weights. I
tried to re-word with your suggestion, thank you.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com"><skip>
<blockquote type="cite">
<div>
<div>Compute a "pivot" N and the weight Wgg_i assigned to</div>
<div> relay_i in such a way that:</div>
<div> </div>
<div> (a) Wgg_i * BW_i == Wgg_i+1 * BW_i+1 forall i in [1,
N]</div>
</div>
</blockquote>
<div><br>
</div>
<div>subscripts in brackets, otherwise it's ambiguous:</div>
<div><br>
</div>
<div>
<blockquote type="cite"><font color="#000000"><span
style="background-color: rgba(255, 255, 255, 0);">Wgg_(i+1)
* BW_(i+1)</span></font></blockquote>
</div>
<br>
<blockquote type="cite">
<div>
<div> (b) Wgg_i == 1 forall i in [N+1, K] </div>
<div> (c) sum_{i=1}^{K} Wgg_i*BW_i = Wgg*G (Wgg is
provided by Tor)</div>
</div>
</blockquote>
<div><br>
</div>
<div>These equations are under-specified, because they allow
solutions with:</div>
<div><span style="background-color: rgba(255, 255, 255, 0);">Wgg*G >
0</span></div>
<div>Wgg_1 == 0</div>
<div>That is, no guard selections for high-bandwidth relays.</div>
<div><br>
</div>
<div>From the descriptions, I think the missing condition is:</div>
<div>
<div><font color="#000000"><span style="background-color:
rgba(255, 255, 255, 0);">Wgg_N * BW_N >= Wgg_(N+1) *
BW_(N+1)</span></font></div>
</div>
</blockquote>
<br>
Yes, this can be added. But I think that this condition is
redundant, since BW_i are sorted in decreasing order.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>
<div><br>
</div>
</div>
<div>Also, Wgg is provided by the Tor directory authorities based
on</div>
<div>consensus weights from the bandwidth authorities.</div>
<div><br>
</div>
<div>And what happens to any remainder in the calculations?</div>
</blockquote>
<br>
This is a very good question. Currently in my implementation, I
ignore the remainder. This is negligible for large network but can
be weird for small one (of a few relays).<br>
<br>
A possible solution would be to use floating precision for consensus
weights. <br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>(This is most important for small, low bandwidth test
networks.)</div>
<div><br>
</div>
<div>For example, if:</div>
<div>G = 10</div>
<div>Wgg = 0.6</div>
<div><span style="background-color: rgba(255, 255, 255, 0);">BW_i
= 6, 2, 2</span></div>
<div>What are the final weighted bandwidths?</div>
<div>2, 2, 2?</div>
</blockquote>
<br>
From my note, my current implementation would crash if the water
level reaches the smallest relay. Since it was prototype code, I
didn't mind to think about it, and I let it that way.<br>
<br>
I think that below a fixed size of the network, it does not make
sense to use this proposal. In this example, the remainder accounts
for a large part of the network capacity and would just be wasted.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com"><skip><br>
<blockquote type="cite">
<div>
<div> As a result of this process, each guard ranked before
the pivot N</div>
<div> dedicates the same bandwidth to its guard role
(equation (a)) -- we</div>
<div> say that these guards achieve the water level, while
each guard</div>
<div> ranked after the pivot N dedicates its full bandwidth
to the guard</div>
<div> role (equation (b)) -- they are below the water level.
Equation (c)</div>
<div> makes sure that the pivot and the water level are
positioned in a</div>
<div> way that guarantees that the total amount of bandwidth
dedicated to</div>
<div> the guard position is the same as before. In practice,
the value of</div>
<div> N can be efficiently computed by dichotomic search on
Equation (c),</div>
</div>
</blockquote>
<div><br>
</div>
<div>Is this the same as a binary search?</div>
<div>Does it require any division?</div>
<div>Because division is slow on some Tor client architectures.</div>
</blockquote>
<br>
Yes, binary search. It does require division. However, waterfilling
is designed to be executed in the authority side and called only
when the consensus document is produced. Moreover, my tests
indicates that the computation consumes a few ms.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com"><skip>
<blockquote type="cite">
<div>
<div>2.1 Going further by tweaking original bandwidth-weights
computation</div>
<div><br>
</div>
<div> As explained above, our Waterfilling equations are
based on: 1) the</div>
<div> Wgg weight computed by Tor 2) the assumption that the
bandwidth</div>
<div> available in exit is scarce, i.e., it is lower than the
one</div>
<div> available for guard (and middle) positions.</div>
<div> </div>
<div> The second point is satisfied most of the time in Tor,
and we do take</div>
<div> it for granted here.</div>
<div> </div>
<div> We, however, observe that Waterfilling could be made
even more</div>
<div> effective by applying a minor change in the way Tor
computes the</div>
<div> Wgg. For the moment, Tor computes Wgg in such a way
that the same</div>
<div> bandwidth is allocated to the guard and middle
positions. As a</div>
<div> result, both positions are in excess compared to the
exit position.</div>
<div><br>
</div>
<div> The water level could be decreased and, as a result,
the uniformity</div>
<div> of the guard selection process could be improved, by
computing Wgg</div>
<div> in a way that allocates the same bandwidth to the guard
and exit</div>
<div> positions, putting the middle position as the only
position in</div>
<div> excess.</div>
</div>
</blockquote>
<div><br>
</div>
<div>No, this would slow down Guards, and everything that isn't an
exit circuit:</div>
<div>* directory fetches (3% of total bandwidth to guard position)</div>
<div>* onion services (rend is 4% of total bandwidth to guard and
middle)</div>
<div>* HSDir is unweighted, and we don't know how much bandwidth
it uses</div>
<div>* FallbackDir is unweighted, but mostly Guards, and small</div>
<div><br>
</div>
</blockquote>
<br>
That's difficult to predict, I cannot be sure if it is better or
worse for that type of traffic since internal circuits use at least
2 middle relays + the guard and sometimes, even not the guard. Hence
we might also think that pushing a bit more to the middle position
could be a good thing to do. Moreover, middle relays are unstable
and often at the edge of the internet, while guard are stable and
most of them within the core of the internet. Hence, a little more
of them within the middle position *could* be a good thing,
especially if it makes entry's selection probability more uniform.
Anyway, I don't have any proof to assert this, as well that I don't
have any proof to assert that this optimization could be bad. What I
got, is that, for exit circuits, it does not slow down anything.<br>
<br>
This optimization is not mandatory, and could also be
enabled/disabled at will by the directory auths.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<blockquote type="cite">
<div>
<div> We show in the performance section of the Waterfilling
paper that</div>
<div> scarcity on two positions does not reduce performance
compared to</div>
<div> vanilla bandwidth-weights.</div>
</div>
</blockquote>
<div><br>
</div>
<div>
<div><span style="background-color: rgba(255, 255, 255, 0);">What
about guards that have low consensus weight due to latency,</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">rather
than available bandwidth?</span></div>
</div>
<div><br>
</div>
<div>I think this could also cause you huge latency issues as you
push more</div>
<div>bandwidth away from fast relays. I'm not sure if shadow
captures this</div>
<div>accurately.</div>
<br>
</blockquote>
<br>
If it happens that any bandwidth is pushed away from fast relays
within the entry position and make the entry position slower, at
average, then it will make the middle position faster (because it
got that bandwidth pushed away). Since the latency of your traffic
flow just care about the global latency of the circuit, this will
not appear to be slower or faster, on average. This is exactly what
we observe in Shadow, and yes, it captures latency accurately. At
least, better than any other simulator.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<blockquote type="cite">
<div>
<div>3 Security implications</div>
<div><br>
</div>
<div> An analysis of the security impact of the Waterfilling
proposal is</div>
<div> made in Section 6 of the Waterfilling paper. It studies
the</div>
<div> expectation of the number of relays that an adversary
needs to</div>
<div> control in order to mount a successful correlation
attack at a given</div>
<div> time, as well as an analysis of the evolution of the
time until</div>
<div> first path compromise, based on TorPS.</div>
<div><br>
</div>
<div> Given that the distribution produced by Waterfilling is
closer to</div>
<div> the uniform distribution, the use of Waterfilling
increases the</div>
<div> expectation of the number of relays that an adversary
needs to</div>
<div> compromise in order mount a successful correlation</div>
<div> attack. Concretely, and based on real data from 2015,
this</div>
<div> expectation increases by approximately 150 relays.</div>
</div>
</blockquote>
<div><br>
</div>
<div>What percentage is this?</div>
</blockquote>
<br>
~ 25 %<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com"><br>
<skip>
<div><br>
</div>
<div>I'm sorry, I can't read this, there are build files
everywhere.</div>
<div>I can't even find which files were changed.</div>
<div>Can you provide clean commits in a fork, based on the Tor
master branch?</div>
<div><br>
</div>
<div>Also, I think you will need a new consensus method for this
change.</div>
<div>We don't use consensus parameters or torrc options to modify
the</div>
<div>content of the consensus, we use consensus methods.</div>
<br>
</blockquote>
<br>
Sorry about this. I've updated the repo with a proper commit history
based on my fork. The code gives an idea about the easiness to
plug-in this method to the current path selection.<br>
<br>
<skip><br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">These
are redundant, and could significantly expand the size of the
<div>consensus, and consensus diffs. (Consensus weights are one of
the</div>
<div>largest contributors to consensus diff size.)</div>
</blockquote>
<br>
True, each time the consensus weights are modified, those
waterfilling weights need to be recomputed. It adds one line per
guard, which is about 2~3% of the size of the full consensus. Is
such a thing a problem?<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div><br>
</div>
<div>Why not calculate wmg on clients?</div>
<div>Why not calculate both wgg and wmg on clients?</div>
<div>
<div><br>
</div>
</div>
</blockquote>
<br>
This is again a very good question: for such a critical feature
(path selection), it is important that the directory auths have full
power over the weight computation. If it happens that some change
are needed, then the Tor network is updated in a straightforward
way. This is not the case if those weights are computed in client
code. In fact, I believe that one of the strength of this proposal
is the oriented design towards the dirauths.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div><skip></div>
<div>
<div><br>
</div>
<div>Unanswered questions:</div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">What
about the feedback loop between this new allocation system</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">and
the bandwidth authorities?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
</div>
</blockquote>
<br>
I am sorry, I don't really understand why a feedback loop is needed.
Measuring bandwidth and producing bandwidth-weights seems orthogonal
to me.<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>
<div><span style="background-color: rgba(255, 255, 255, 0);">Should
bandwidth authority clients use the new system?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">How
do we report on the new system?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">How
can we determine whether it is better for security in
practice?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">How
can we determine if it is faster or slower in practice?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">How
can we work out if someone is trying to game the system?</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);"><br>
</span></div>
<div><span style="background-color: rgba(255, 255, 255, 0);">T</span></div>
<div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
I've added a section within the proposal for all upcoming questions.<br>
<br>
Best,<br>
Florentin<br>
<br>
<blockquote type="cite"
cite="mid:8D5F3E15-CB26-4DFC-A5C1-456049241C38@gmail.com">
<div>
<div>
<div><br>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
tor-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:tor-dev@lists.torproject.org">tor-dev@lists.torproject.org</a>
<a class="moz-txt-link-freetext" href="https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev">https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev</a>
</pre>
</blockquote>
<br>
</body>
</html>