<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><b class=""><br class=""></b></div><div class=""><b class=""><a href="mailto:info@tvdw.eu" class="">info@tvdw.eu</a> wrote:</b></div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">Hi Alec,</div></blockquote><div class=""><br class=""></div><div class="">Hi Tom! I love your proposal, BTW. :-)</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">Most of what you said sounds right, and I agree that caching needs TTLs (not just here, all caches need to have them, always).</div></blockquote><div class=""><br class=""></div><div class="">Thank you!</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">However, you mention that one DC going down could cause a bad experience for users. In most HA/DR setups I've seen there should be enough capacity if something fails, is that not the case for you? Can a single data center not serve all Tor traffic?</div></blockquote><div class=""><br class=""></div><div class="">It's not the datacentre which worries me - we already know how to deal with those - it's the failure-based resource contention for the limited introduction-point space that is afforded by a maximum (?) of six descriptors, each of which cites 10 introduction points. </div><div class=""><br class=""></div><div class="">A cap of 60 introduction points is a clear protocol bottleneck which - even with your excellent idea - could break a service deployment.</div><div class=""><br class=""></div><div class="">Yes, in the meantime the proper solution is to split the service three ways, or even four, but that's an administrative burden which less well-resourced organisations might struggle with. 
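<br class=""><br class="">A minimal Python sketch of the trade-off, assuming the 60-slot cap above (six descriptors of 10 introduction points each; I am not certain six is the real maximum) is split evenly across N sites, and that clients keep using a descriptor that lists a dead site's introduction points until it expires. All names are hypothetical; this is illustrative arithmetic, not OnionBalance code.
<pre class="">
```python
# Hypothetical names; the figures are the ones quoted in this thread.
DESCRIPTORS = 6                   # maximum (?) distinct descriptors published
INTRO_POINTS_PER_DESCRIPTOR = 10  # introduction points listed per descriptor
CAP = DESCRIPTORS * INTRO_POINTS_PER_DESCRIPTOR  # 60 introduction points in total


def slots_per_site(n_sites: int) -> int:
    """Introduction-point slots per site if the cap is split evenly."""
    return CAP // n_sites


def usable_fraction_after_outage(n_sites: int, sites_down: int = 1) -> float:
    """Fraction of advertised introduction points still reachable while
    clients keep using descriptors that list the dead sites' points
    (assumes an even split and no early refetch)."""
    return (n_sites - sites_down) / n_sites


if __name__ == "__main__":
    for n in (2, 3, 4, 10):
        print(f"{n} sites: {slots_per_site(n)} slots each, "
              f"{usable_fraction_after_outage(n):.0%} usable after one outage")
```
</pre>
Splitting further shrinks the share lost to a single outage, but it also shrinks each site's slice of the cap, down to six introduction points per site at N = 10.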
</div><div class=""><br class=""></div><div class="">Many (most?) will have a primary site and a single failover site, and it seems perverse that they could bounce just ONE of those sites and automatically lose 50% of their Onion capacity for up to 24 hours UNLESS they also take down the OTHER site for long enough to invalidate the OnionBalance descriptors. </div><div class=""><br class=""></div><div class="">Such is not the description of a high-availability (HA) service, and it might put people off.</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">If that is a problem, I would suggest adding more data centers to the pool. That way if one fails, you don't lose half of the capacity, but a third (if N=3) or even a tenth (if N=10).</div></blockquote><div class=""><br class=""></div><div class="">...but you lose it for 1..24 hours, even if you simply reboot the Tor daemon.</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">Anyway, such a thing is probably off-topic. To get back to the point about TTLs, I just want to note that retrying failed nodes until all fail is scary: </div></blockquote><div class=""><br class=""></div><div class="">I find that worrying, also. I'm not sure what I think about it yet, though.</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">what will happen if all ten nodes get a 'rolling restart' throughout the day? Wouldn't you eventually end up with all the traffic on a single node, as it's the only one that hadn't been restarted yet?</div></blockquote><div class=""><br class=""></div><div class="">Precisely.</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">As far as I can see the only thing that can avoid holes like that is a TTL, either hard coded to something like an hour, or just specified in the descriptor. 
Then, if you do a rolling restart, make sure you don't do it all within one TTL length, but at least two or three depending on capacity.</div></blockquote><div class=""><br class=""></div><div class="">Concur.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class=""><a href="mailto:desnacked@riseup.net" class="">desnacked@riseup.net</a> wrote:</b></div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">Please see rend_client_get_random_intro_impl(). Clients will pick a random intro point from the descriptor which seems to be the proper behavior here.</div></blockquote><div class=""><br class=""></div><div class="">That looks great!</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">I can see how a TTL might be useful in high availability scenarios like the one you described. However, it does seem like something with potential security implications (like, set TTL to 1 second for all your descriptors, and now you have your clients keep on making directory circuits to fetch your descs).</div></blockquote><div class=""><br class=""></div><div class="">Okay, so, how about:</div><div class=""><br class=""></div><div class=""><b class="">IDEA: if a connection to ANY of the descriptor's introduction points fails AND the descriptor's TTL has been exceeded THEN refetch the descriptor before trying again?</b></div><div class=""><br class=""></div><div class="">It strikes me (though I may be wrong) that the degenerate case for this would be an onion operator killing their own introduction points in order to force users to refetch the descriptor - which I think would happen anyway? </div><div class=""><br class=""></div><div class="">At the very least this proposal would add a work factor. 
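<br class=""><br class="">A minimal Python sketch of how a client might apply the IDEA, under stated assumptions: the descriptor carries a hypothetical TTL field; fetch and try_intro are hypothetical stand-ins for the HSDir fetch and the introduction attempt; a refetch once every listed introduction point has failed is assumed to mirror today's behaviour; and at most one refetch is allowed per connection attempt. It is not Tor's code or API.
<pre class="">
```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Descriptor:
    intro_points: List[str]
    ttl: float  # hypothetical publisher-chosen TTL, in seconds


@dataclass
class CachedDescriptor:
    desc: Descriptor
    fetched_at: float
    failed: Set[str] = field(default_factory=set)

    def expired(self, now: float) -> bool:
        return now - self.fetched_at >= self.desc.ttl


class IntroPointPicker:
    """Hypothetical client-side logic, not Tor's actual code."""

    def __init__(self, fetch: Callable[[], Descriptor],
                 try_intro: Callable[[str], bool],
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        self.fetch = fetch          # fetches a fresh descriptor from an HSDir
        self.try_intro = try_intro  # True if the introduction succeeded
        self.clock = clock
        self.rng = rng or random.Random()
        self.cache: Optional[CachedDescriptor] = None
        self.attempts = 0           # introduction attempts, for illustration

    def refresh(self) -> None:
        self.cache = CachedDescriptor(self.fetch(), self.clock())

    def connect(self) -> Optional[str]:
        """Return the introduction point reached, or None on failure."""
        if self.cache is None:
            self.refresh()
        refetched = False  # assumption: at most one refetch per connection
        while True:
            remaining = [ip for ip in self.cache.desc.intro_points
                         if ip not in self.cache.failed]
            if remaining:
                # Random choice, as George describes for
                # rend_client_get_random_intro_impl().
                ip = self.rng.choice(remaining)
                self.attempts += 1
                if self.try_intro(ip):
                    return ip
                self.cache.failed.add(ip)
                # The IDEA: a failure on a descriptor past its TTL forces a refetch.
                stale = self.cache.expired(self.clock())
            else:
                stale = True  # every listed point failed: refetch (assumed current behaviour)
            if stale and not refetched:
                self.refresh()
                refetched = True
            elif not remaining:
                return None
```
</pre>
Because a refetch needs both an expired TTL and a failed introduction, a 1-second TTL on its own would not make clients hammer the directories; the one-refetch-per-attempt bound is only my guess at what the work factor might look like.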
</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">For this reason I'd be interested to see this specified in a formal Tor proposal (or even as a patch to prop224). It shouldn't be too big! :)</div></blockquote><div class=""><br class=""></div><div class="">I would hesitate to add it to Prop 224, which strikes me as rather large and distant. I'd love to see this by Christmas :-P</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class=""><a href="mailto:teor2345@gmail.com" class="">teor2345@gmail.com</a> wrote:</b></div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">Do we connect to introduction points in the order they are listed in the descriptor? If so, that's not ideal, there are surely benefits to a random choice (such as load balancing).</div></blockquote><div class=""><br class=""></div><div class="">Apparently not (re: George) :-)</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">That said, we believe that rendezvous points are the bottleneck in the rendezvous protocol, not introduction points.</div></blockquote><div class=""><br class=""></div><div class="">Currently, and in most current deployments, yes.</div><div class=""><br class=""></div><div class=""></div><blockquote type="cite" class=""><div class="">However, if you were to use proposal #255 to split the introduction and rendezvous to separate tor instances, you would then be limited to:</div></blockquote><div class=""></div><blockquote type="cite" class=""><div class="">- 6*10*N tor introduction points, where there are 6 HSDirs, each receiving 10 different introduction points from different tor instances, and N failover instances of this infrastructure competing to post descriptors. 
(Where N = 1, 2, 3.)</div></blockquote><div class=""></div><blockquote type="cite" class=""><div class="">- a virtually unlimited number of tor servers doing the rendezvous and exchanging data (say 1 server per M clients, where M is perhaps 100 or so, but ideally dynamically determined based on load/response time).</div></blockquote><div class=""></div><blockquote type="cite" class=""><div class="">In this scenario, you could potentially overload the introduction points.</div></blockquote><div class=""><br class=""></div><div class="">Exactly my concern, especially when combined with overlong lifetimes of mostly-zombie descriptors.</div><div class=""><br class=""></div><div class="">- alec</div><div class=""><br class=""></div></body></html>