commit 414f29fd2f2083e44908ceebda44955cb22f0ddb Author: Nick Mathewson nickm@torproject.org Date: Wed Mar 2 01:02:36 2011 -0500
Move xxx-bridge-disbursement to ideas/old: bridgedb spec supersedes it. --- proposals/ideas/old/xxx-bridge-disbursement.txt | 174 +++++++++++++++++++++++ proposals/ideas/xxx-bridge-disbursement.txt | 174 ----------------------- 2 files changed, 174 insertions(+), 174 deletions(-)
diff --git a/proposals/ideas/old/xxx-bridge-disbursement.txt b/proposals/ideas/old/xxx-bridge-disbursement.txt new file mode 100644 index 0000000..6c9a3c7 --- /dev/null +++ b/proposals/ideas/old/xxx-bridge-disbursement.txt @@ -0,0 +1,174 @@ + +How to hand out bridges. + +Divide bridges into 'strategies' as they come in. Do this uniformly +at random for now. + +For each strategy, we'll hand out bridges in a different way to +clients. This document describes two strategies: email-based and +IP-based. + +0. Notation: + + HMAC(k,v) : an HMAC of v using the key k. + + A|B: The string A concatenated with the string B. + + +1. Email-based. + + Goal: bootstrap based on one or more popular email service's sybil + prevention algorithms. + + + Parameters: + HMAC -- an HMAC function + P -- a time period + K -- the number of bridges to send in a period. + + Setup: Generate two nonces, N and M. + + As bridges arrive, put them into a ring according to HMAC(N,ID) + where ID is the bridges's identity digest. + + Divide time into divisions of length P. + + When we get an email: + + If it's not from a supported email service, reject it. + + If we already sent a response to that email address (normalized) + in this period, send _exactly_ the same response. + + If it is from a supported service, generate X = HMAC(M,PS|E) where E + is the lowercased normalized email address for the user, and + where PS is the start of the currrent period. Send + the first K bridges in the ring after point X. + + [If we want to make sure that repeat queries are given exactly the + same results, then we can't let the ring change during the + time period. For a long time period like a month, that's quite a + hassle. How about instead just keeping a replay cache of addresses + that have been answered, and sending them a "sorry, you already got + your addresses for the time period; perhaps you should try these + other fine distribution strategies while you wait?" response? This + approach would also resolve the "Make sure you can't construct a + distinct address to match an existing one" note below. -RD] + + [I think, if we get a replay, we need to send back the same + answer as we did the first time, not say "try again." + Otherwise we need to worry that an attacker can keep people + from getting bridges by preemtively asking for them, + or that an attacker may force them to prove they haven't + gotten any bridges by asking. -NM] + + [While we're at it, if we do the replay cache thing and don't need + repeatable answers, we could just pick K random answers from the + pool. Is it beneficial that a bridge user who knows about a clump of + nodes will be sharing them with other users who know about a similar + (overlapping) clump? One good aspect is against an adversary who + learns about a clump this way and watches those bridges to learn + other users and discover *their* bridges: he doesn't learn about + as many new bridges as he might if they were randomly distributed. + A drawback is against an adversary who happens to pick two email + addresses in P that include overlapping answers: he can measure + the difference in clumps and estimate how quickly the bridge pool + is growing. -RD] + + [Random is one more darn thing to implement; rings are already + there. -NM] + + [If we make the period P be mailbox-specific, and make it a random + value around some mean, then we make it harder for an attacker to + know when to try using his small army of gmail addresses to gather + another harvest. But we also make it harder for users to know when + they can try again. -RD] + + [Letting the users know about when they can try again seems + worthwhile. Otherwise users and attackers will all probe and + probe and probe until they get an answer. No additional + security will be achieved, but bandwidth will be lost. -NM] + + To normalize an email address: + Start with the RFC822 address. Consider only the mailbox {???} + portion of the address (username@domain). Put this into lowercase + ascii. + + Questions: + What to do with weird character encodings? Look up the RFC. + + Notes: + Make sure that you can't force a single email address to appear + in lots of different ways. IOW, if nickm@freehaven.net and + NICKM@freehaven.net aren't treated the same, then I can get lots + more bridges than I should. + + Make sure you can't construct a distinct address to match an + existing one. IOW, if we treat nickm@X and nickm@Y as the same + user, then anybody can register nickm@Z and use it to tell which + bridges nickm@X got (or would get). + + Make sure that we actually check headers so we can't be trivially + used to spam people. + + +2. IP-based. + + Goal: avoid handing out all the bridges to users in a similar IP + space and time. + + Parameters: + + T_Flush -- how long it should take a user on a single network to + see a whole cluster of bridges. + + N_C + + K -- the number of bridges we hand out in response to a single + request. + + Setup: using an AS map or a geoip map or some other flawed input + source, divide IP space into "areas" such that surveying a large + collection of "areas" is hard. For v0, use /24 address blocks. + + Group areas into N_C clusters. + + Generate secrets L, M, N. + + Set the period P such that P*(bridges-per-cluster/K) = T_flush. + Don't set P to greater than a week, or less than three hours. + + When we get a bridge: + + Based on HMAC(L,ID), assign the bridge to a cluster. Within each + cluster, keep the bridges in a ring based on HMAC(M,ID). + + [Should we re-sort the rings for each new time period, so the ring + for a given cluster is based on HMAC(M,PS|ID)? -RD] + + When we get a connection: + + If it's http, redirect it to https. + + Let area be the incoming IP network. Let PS be the current + period. Compute X = HMAC(N, PS|area). Return the next K bridges + in the ring after X. + + [Don't we want to compute C = HMAC(key, area) to learn what cluster + to answer from, and then X = HMAC(key, PS|area) to pick a point in + that ring? -RD] + + + Need to clarify that some HMACs are for rings, and some are for + partitions. How rings scale is clear. How do we grow the number of + partitions? Looking at successive bits from the HMAC output is one way. + +3. Open issues + + Denial of service attacks + A good view of network topology + +at some point we should learn some reliability stats on our bridges. when +we say above 'give out k bridges', we might give out 2 reliable ones and +k-2 others. we count around the ring the same way we do now, to find them. + diff --git a/proposals/ideas/xxx-bridge-disbursement.txt b/proposals/ideas/xxx-bridge-disbursement.txt deleted file mode 100644 index 6c9a3c7..0000000 --- a/proposals/ideas/xxx-bridge-disbursement.txt +++ /dev/null @@ -1,174 +0,0 @@ - -How to hand out bridges. - -Divide bridges into 'strategies' as they come in. Do this uniformly -at random for now. - -For each strategy, we'll hand out bridges in a different way to -clients. This document describes two strategies: email-based and -IP-based. - -0. Notation: - - HMAC(k,v) : an HMAC of v using the key k. - - A|B: The string A concatenated with the string B. - - -1. Email-based. - - Goal: bootstrap based on one or more popular email service's sybil - prevention algorithms. - - - Parameters: - HMAC -- an HMAC function - P -- a time period - K -- the number of bridges to send in a period. - - Setup: Generate two nonces, N and M. - - As bridges arrive, put them into a ring according to HMAC(N,ID) - where ID is the bridges's identity digest. - - Divide time into divisions of length P. - - When we get an email: - - If it's not from a supported email service, reject it. - - If we already sent a response to that email address (normalized) - in this period, send _exactly_ the same response. - - If it is from a supported service, generate X = HMAC(M,PS|E) where E - is the lowercased normalized email address for the user, and - where PS is the start of the currrent period. Send - the first K bridges in the ring after point X. - - [If we want to make sure that repeat queries are given exactly the - same results, then we can't let the ring change during the - time period. For a long time period like a month, that's quite a - hassle. How about instead just keeping a replay cache of addresses - that have been answered, and sending them a "sorry, you already got - your addresses for the time period; perhaps you should try these - other fine distribution strategies while you wait?" response? This - approach would also resolve the "Make sure you can't construct a - distinct address to match an existing one" note below. -RD] - - [I think, if we get a replay, we need to send back the same - answer as we did the first time, not say "try again." - Otherwise we need to worry that an attacker can keep people - from getting bridges by preemtively asking for them, - or that an attacker may force them to prove they haven't - gotten any bridges by asking. -NM] - - [While we're at it, if we do the replay cache thing and don't need - repeatable answers, we could just pick K random answers from the - pool. Is it beneficial that a bridge user who knows about a clump of - nodes will be sharing them with other users who know about a similar - (overlapping) clump? One good aspect is against an adversary who - learns about a clump this way and watches those bridges to learn - other users and discover *their* bridges: he doesn't learn about - as many new bridges as he might if they were randomly distributed. - A drawback is against an adversary who happens to pick two email - addresses in P that include overlapping answers: he can measure - the difference in clumps and estimate how quickly the bridge pool - is growing. -RD] - - [Random is one more darn thing to implement; rings are already - there. -NM] - - [If we make the period P be mailbox-specific, and make it a random - value around some mean, then we make it harder for an attacker to - know when to try using his small army of gmail addresses to gather - another harvest. But we also make it harder for users to know when - they can try again. -RD] - - [Letting the users know about when they can try again seems - worthwhile. Otherwise users and attackers will all probe and - probe and probe until they get an answer. No additional - security will be achieved, but bandwidth will be lost. -NM] - - To normalize an email address: - Start with the RFC822 address. Consider only the mailbox {???} - portion of the address (username@domain). Put this into lowercase - ascii. - - Questions: - What to do with weird character encodings? Look up the RFC. - - Notes: - Make sure that you can't force a single email address to appear - in lots of different ways. IOW, if nickm@freehaven.net and - NICKM@freehaven.net aren't treated the same, then I can get lots - more bridges than I should. - - Make sure you can't construct a distinct address to match an - existing one. IOW, if we treat nickm@X and nickm@Y as the same - user, then anybody can register nickm@Z and use it to tell which - bridges nickm@X got (or would get). - - Make sure that we actually check headers so we can't be trivially - used to spam people. - - -2. IP-based. - - Goal: avoid handing out all the bridges to users in a similar IP - space and time. - - Parameters: - - T_Flush -- how long it should take a user on a single network to - see a whole cluster of bridges. - - N_C - - K -- the number of bridges we hand out in response to a single - request. - - Setup: using an AS map or a geoip map or some other flawed input - source, divide IP space into "areas" such that surveying a large - collection of "areas" is hard. For v0, use /24 address blocks. - - Group areas into N_C clusters. - - Generate secrets L, M, N. - - Set the period P such that P*(bridges-per-cluster/K) = T_flush. - Don't set P to greater than a week, or less than three hours. - - When we get a bridge: - - Based on HMAC(L,ID), assign the bridge to a cluster. Within each - cluster, keep the bridges in a ring based on HMAC(M,ID). - - [Should we re-sort the rings for each new time period, so the ring - for a given cluster is based on HMAC(M,PS|ID)? -RD] - - When we get a connection: - - If it's http, redirect it to https. - - Let area be the incoming IP network. Let PS be the current - period. Compute X = HMAC(N, PS|area). Return the next K bridges - in the ring after X. - - [Don't we want to compute C = HMAC(key, area) to learn what cluster - to answer from, and then X = HMAC(key, PS|area) to pick a point in - that ring? -RD] - - - Need to clarify that some HMACs are for rings, and some are for - partitions. How rings scale is clear. How do we grow the number of - partitions? Looking at successive bits from the HMAC output is one way. - -3. Open issues - - Denial of service attacks - A good view of network topology - -at some point we should learn some reliability stats on our bridges. when -we say above 'give out k bridges', we might give out 2 reliable ones and -k-2 others. we count around the ring the same way we do now, to find them. -