Attached is a draft document describing proposed changes to BridgeDB to accommodate the new or-address spec (186-multiple-orports.txt) and IPv6 bridges.
I am especially interested in comments on sections tagged #XXX.
Thanks!
--Aaron
Aaron aagbsn@extc.org wrote Mon, 5 Dec 2011 16:38:49 -0800:
| IPv6 Addresses are stored as strings, the same way as IPv4 addresses.
| #XXX: is this better than using the ipaddr.IPAddress class?
What kind of database is this? If the rest of the database can be used by a program written in a language without (the exact same implementation of) this particular version of Python ipaddr, a string representation might have value. If not, is it for easier debugging? Unclear to me.
| Parameters may be repeated to select multiple classes, e.g.
|
| q=ipv4&q=ipv6 - Request both IPv4 and IPv6 bridges.
|
| When no parameters are set, by default BridgeDB must return addresses
| of the same class as the client. This default may promote IPv6 use
| where possible.
This might cause confusion in cases where the equipment used for getting a bridge address is not the same as the equipment which is going to use it. Very few computer users know if they're using IPv4 or IPv6.
Is it worth it?
| How does someone end up at bridgesv6.torproject.org?
|
| BridgeDB should include a message at the end of its response.
| e.g.
|
| "Get IPv4 bridges from https://bridges.torproject.org"
| "Get IPv6 bridges from https://bridgesv6.torproject.org"
| "You must have IPv6 for these bridges to work."
| #XXX: will users understand what this means?
I'd like to stress the case where a user fetches a bridge address on a computer which is not the consumer(s) of the address and suggest "These bridges will work only on computers with functional IPv6." or similar.
Thanks for your feedback!
On Tue, Dec 6, 2011 at 1:45 AM, Linus Nordberg linus@nordberg.se wrote:
Aaron aagbsn@extc.org wrote Mon, 5 Dec 2011 16:38:49 -0800:
| IPv6 Addresses are stored as strings, the same way as IPv4 addresses.
| #XXX: is this better than using the ipaddr.IPAddress class?
What kind of database is this? If the rest of the database can be used by a program written in a language without (the exact same implementation of) this particular version of Python ipaddr, a string representation might have value. If not, is it for easier debugging? Unclear to me.
The backend database (sqlite3) will still store the addresses as strings. The question is what happens once a Bridge object is constructed (loaded from the database or from a bridge descriptor): should the address representation (e.g. Bridge.ip or Bridge.or-addresses[n]) continue to be stored as a string, or as the more convenient ipaddr.IPAddress class? ipaddr.IPAddress.__str__() will return the string representation (for example, when storing Bridges to the database).
The advantage of using ipaddr is that type(Bridge.ip) will return either ipaddr.IPv4Address or ipaddr.IPv6Address. It is convenient, it works, and it is already written.
The disadvantage is that for every Bridge object, at least one ipaddr.IPAddress object will be created. The overhead (if any, depending on how compact the packed addresses are and on Python object size) may not be significant (depending on your perspective), but it is worth thinking about, as we want BridgeDB to scale to 10k bridges or more.
The alternative approach is to always store addresses as strings and detect the address class when necessary (e.g. sorting Bridges). If BridgeDB needs to sort or filter Bridges frequently then this approach could present performance issues.
I don't want to optimize prematurely either. Either approach should be replaceable without too much effort.
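To make the "strings plus detection on demand" alternative concrete, here is a minimal sketch. It is not BridgeDB code: it uses the `ipaddress` module from the Python 3 standard library as a stand-in for the `ipaddr` package discussed above, and the function names `address_class` and `split_by_class` are made up for illustration.

```python
import ipaddress


def address_class(addr):
    """Return 'ipv4' or 'ipv6' for an address stored as a string.

    The string is parsed every time this is called, which is the cost
    of not keeping a parsed object around on each Bridge.
    """
    parsed = ipaddress.ip_address(addr)
    return "ipv6" if parsed.version == 6 else "ipv4"


def split_by_class(addresses):
    """Group address strings by class (hypothetical helper)."""
    buckets = {"ipv4": [], "ipv6": []}
    for addr in addresses:
        buckets[address_class(addr)].append(addr)
    return buckets


def to_storage(addr):
    """Round-trip through a parsed object to get a canonical string."""
    return str(ipaddress.ip_address(addr))
```

If filtering by class turns out to be frequent, the parsed object (or just the detected class) could be cached on the Bridge; either way the database column stays a plain string.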
| Parameters may be repeated to select multiple classes, e.g.
|
| q=ipv4&q=ipv6 - Request both IPv4 and IPv6 bridges.
|
| When no parameters are set, by default BridgeDB must return addresses
| of the same class as the client. This default may promote IPv6 use
| where possible.
This might cause confusion in cases where the equipment used for getting a bridge address is not the same as the equipment which is going to use it. Very few computer users know if they're using IPv4 or IPv6.
Is it worth it?
The scenario where this hurts us is when a user with IPv6 who needs IPv4 bridges lands on the https://bridgesv6.tpo page, and doesn't click the link for IPv4 bridges. That scenario likely leads to a support request, but I suspect it will not be a very common one -- or at least not until IPv6 is adopted more widely. Our strategy in the future should change as we get feedback.
For example, in a 'mostly-ipv6' world, perhaps https://bridges.tpo will be dual-stack, and the https://bridgesv4.tpo page will give out IPv4 bridges. As long as we have the flexibility to change our approach, we can make the best decision when we have better data about the types of requests https://bridges.tpo sees.
The selectors described above are proposed because some clients may not set a 'Host' header, and adding support for parameters in the request string is easy. If a client does not specify anything (Host header, or query string) then BridgeDB could instead default to giving out IPv4 addresses. We want people to use IPv6 (right?) so promoting it to users who can take advantage of it makes sense to me.
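A rough sketch of the selection rule as described: repeated `q=` parameters pick the classes, and with no parameters the class of the client's own address is used. The function name `requested_classes` and the decision to ignore unrecognized values are assumptions for illustration, not part of the draft; parsing uses the Python 3 standard library.

```python
import ipaddress
from urllib.parse import parse_qs

KNOWN_CLASSES = ("ipv4", "ipv6")


def requested_classes(query_string, client_ip):
    """Return the list of address classes to hand out for a request."""
    params = parse_qs(query_string)
    # Keep only recognized values of the repeated 'q' parameter.
    wanted = [q for q in params.get("q", []) if q in KNOWN_CLASSES]
    if wanted:
        # Drop duplicates while preserving the order given by the client.
        return list(dict.fromkeys(wanted))
    # No usable selector: default to the class of the client's address.
    version = ipaddress.ip_address(client_ip).version
    return ["ipv6" if version == 6 else "ipv4"]
```

Switching the fallback to always return IPv4 would be a one-line change, which is the flexibility argued for above.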
| How does someone end up at bridgesv6.torproject.org?
|
| BridgeDB should include a message at the end of its response.
| e.g.
|
| "Get IPv4 bridges from https://bridges.torproject.org"
| "Get IPv6 bridges from https://bridgesv6.torproject.org"
| "You must have IPv6 for these bridges to work."
| #XXX: will users understand what this means?
I'd like to stress the case where a user fetches a bridge address on a computer which is not the consumer(s) of the address and suggest "These bridges will work only on computers with functional IPv6." or similar.
I agree. I do think that the warning string is a good first approach to reducing the confusion.
On 2011-12-06, Aaron aagbsn@extc.org wrote:
How does IPv6 affect address datamining of https distribution? A user may be allocated a /128, or a /64. An adversary may control a /32 or perhaps larger.
Proposal: Enable reCAPTCHA support by default.
How much would it cost China to have 1000 (or even 10000) CAPTCHAs solved? How much of our bridge pool would such an attack obtain?
How do IPv6 addresses work with the IPBasedDistributor?
#XXX: I need feedback on this
# do we use all 128 bits here?
# upper N bits? lower N bits? random or specific N bits?
I doubt that a single prefix length would be appropriate for all networks. There is no point in using a fixed bitmask other than a prefix; even if we do not publish the mask, an attacker can easily determine which bits within the suffix that it controls are used to select a portion of the bridge pool. A more complex mapping of IP addresses to bridge pool locations might work.
Robert Ransom
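For illustration only, the simplest prefix-based mapping might look like the sketch below. The prefix lengths (/24 for IPv4, /32 for IPv6), the use of HMAC-SHA256, and the names `client_area` and `ring_position` are all assumptions made for this example, not BridgeDB's actual IPBasedDistributor. It shows the effect being discussed: every address inside one prefix maps to the same pool position, so the choice of a single prefix length decides how much address space an adversary needs to see a new part of the pool.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical prefix lengths per IP version; choosing these is the open question.
PREFIX_LEN = {4: 24, 6: 32}


def client_area(addr):
    """Truncate an address to its covering prefix, e.g. '2001:db8::/32'."""
    ip = ipaddress.ip_address(addr)
    net = ipaddress.ip_network(f"{ip}/{PREFIX_LEN[ip.version]}", strict=False)
    return str(net)


def ring_position(addr, key):
    """Map a client address to a 64-bit position using a keyed hash of its area."""
    digest = hmac.new(key, client_area(addr).encode("ascii"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")
```

With these assumed values, an adversary holding a whole /32 sees one position, while one holding a shorter prefix sees many; a more complex mapping would replace `client_area`.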
On Dec 10, 2011, at 4:07 PM, Robert Ransom wrote:
On 2011-12-06, Aaron aagbsn@extc.org wrote:
How does IPv6 affect address datamining of https distribution? A user may be allocated a /128, or a /64. An adversary may control a /32 or perhaps larger.
Proposal: Enable reCAPTCHA support by default.
How much would it cost China to have 1000 (or even 10000) CAPTCHAs solved? How much of our bridge pool would such an attack obtain?
Apparently prices are as low as USD 2.00 for 1000 CAPTCHAs (solved by humans):
Assuming those prices, depleting Tor's bridge pool is cheaper than a night out on the town…
Cheers, Ralf
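As a quick check of the arithmetic, using only the USD 2.00 per 1000 figure quoted above (the function name is made up, and how many bridges each solved CAPTCHA yields is deliberately left out, since that depends on the distributor):

```python
PRICE_PER_1000_USD = 2.00  # figure quoted in this thread


def solving_cost_usd(num_captchas):
    """Cost in USD of having num_captchas solved at the quoted rate."""
    return num_captchas * PRICE_PER_1000_USD / 1000


for n in (1000, 10000):
    print(f"{n} CAPTCHAs: USD {solving_cost_usd(n):.2f}")
```

At that rate the 1000 and 10000 CAPTCHAs asked about earlier come to USD 2 and USD 20 respectively.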
On Sat, Dec 10, 2011 at 12:19 PM, Ralf-Philipp Weinmann ralf@coderpunks.org wrote:
On Dec 10, 2011, at 4:07 PM, Robert Ransom wrote:
On 2011-12-06, Aaron aagbsn@extc.org wrote:
How does IPv6 affect address datamining of https distribution? A user may be allocated a /128, or a /64. An adversary may control a /32 or perhaps larger.
Proposal: Enable reCAPTCHA support by default.
How much would it cost China to have 1000 (or even 10000) CAPTCHAs solved? How much of our bridge pool would such an attack obtain?
If China controls enough geographically diverse addresses, presumably most or all of the bridges assigned to the https distributor. CAPTCHA is not the limiting factor, it seems.
Apparently prices are as low as USD 2.00 for 1000 CAPTCHAs (solved by humans):
Assuming those prices, depleting Tor's bridge pool is cheaper than a night out on the town…
Cheers, Ralf
Unfortunately that is the reality given any adversary with a large budget. I don't know if that means we should give up on CAPTCHA; it is still an incremental improvement that forces attackers to adapt and spend resources with a low cost to us and our users. CAPTCHA is widely deployed and understood, and we stand to benefit from any future improvements made in the anti-spam arms race. And it's worth pointing out that CAPTCHA does rate-limit the requests to some degree.
That said, perhaps we should save CAPTCHA for a rainy day; it might buy us a window of a week or two when we most need it. If we enable CAPTCHA by default and it is quickly broken, we end up inconveniencing our users and adding another point of failure.
--Aaron