commit 6501e1e80a6eb44aa1ff089ced2870b6728865a8
Author: Nick Mathewson <nickm@torproject.org>
Date:   Wed Mar 2 11:20:33 2011 -0500

Close proposal 166 and make xxx-geoip-survey-plan obsolete
Karsten confirms that 166 is implemented, and xxx-geoip-survey-plan is superseded by this tech report:
https://metrics.torproject.org/papers/countingusers-2010-11-30.pdf
---
 proposals/000-index.txt                       |    4 +-
 proposals/166-statistics-extra-info-docs.txt  |    2 +-
 proposals/ideas/old/xxx-geoip-survey-plan.txt |  137 +++++++++++++++++++++++++
 proposals/ideas/xxx-geoip-survey-plan.txt     |  137 -------------------------
 4 files changed, 140 insertions(+), 140 deletions(-)

diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index 48ec6a8..91c2f27 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -86,7 +86,7 @@ Proposals by number:
 163  Detecting whether a connection comes from a client [OPEN]
 164  Reporting the status of server votes [OPEN]
 165  Easy migration for voting authority sets [OPEN]
-166  Including Network Statistics in Extra-Info Documents [ACCEPTED]
+166  Including Network Statistics in Extra-Info Documents [CLOSED]
 167  Vote on network parameters in consensus [CLOSED]
 168  Reduce default circuit window [OPEN]
 169  Eliminate TLS renegotiation for the Tor connection handshake [SUPERSEDED]
@@ -137,7 +137,6 @@ Proposals by status:
    140  Provide diffs between consensuses [for 0.2.2.x]
    147  Eliminate the need for v2 directories in generating v3 directories [for 0.2.1.x]
    157  Make certificate downloads specific [for 0.2.1.x]
-   166  Including Network Statistics in Extra-Info Documents [for 0.2.2]
    172  GETINFO controller option for circuit information
    173  GETINFO Option Expansion
    174  Optimistic Data for Tor: Server Side
@@ -179,6 +178,7 @@ Proposals by status:
    148  Stream end reasons from the client side should be uniform [in 0.2.1.9-alpha]
    150  Exclude Exit Nodes from a circuit [in 0.2.1.3-alpha]
    152  Optionally allow exit from single-hop circuits [in 0.2.1.6-alpha]
+   166  Including Network Statistics in Extra-Info Documents [for 0.2.2]
    167  Vote on network parameters in consensus [in 0.2.2]
  SUPERSEDED:
    112  Bring Back Pathlen Coin Weight
diff --git a/proposals/166-statistics-extra-info-docs.txt b/proposals/166-statistics-extra-info-docs.txt
index ab2716a..8b0c6a1 100644
--- a/proposals/166-statistics-extra-info-docs.txt
+++ b/proposals/166-statistics-extra-info-docs.txt
@@ -3,7 +3,7 @@ Title: Including Network Statistics in Extra-Info Documents
 Author: Karsten Loesing
 Created: 21-Jul-2009
 Target: 0.2.2
-Status: Accepted
+Status: Closed
 
 Change history:
 
diff --git a/proposals/ideas/old/xxx-geoip-survey-plan.txt b/proposals/ideas/old/xxx-geoip-survey-plan.txt
new file mode 100644
index 0000000..49c6615
--- /dev/null
+++ b/proposals/ideas/old/xxx-geoip-survey-plan.txt
@@ -0,0 +1,137 @@
+
+
+Abstract
+
+   This document explains how to tell about how many Tor users there
+   are, and how many there are in which country.  Statistics are
+   involved.
+
+Motivation
+
+   There are a few reasons we need to keep track of which countries
+   Tor users (in aggregate) are coming from:
+
+     - Resource allocation.  Knowing about underserved countries with
+       lots of users can let us know about where we need to direct
+       translation and outreach efforts.
+
+     - Anticensorship.  Sudden drops in usage on a national basis can
+       indicate the arrival of a censorious firewall.
+
+     - Sponsor outreach and self-evalutation.  Many people and
+       organizations who are interested in funding The Tor Project's
+       work want to know that we're successfully serving parts of the
+       world they're interested in, and that efforts to expand our
+       userbase are actually succeeding.  So do we.
+
+Goals
+
+   We want to know approximately how many Tor users there are, and which
+   countries they're in, even in the presence of a hypothetical
+   "directory guard" feature.  Some uncertainty is okay, but we'd like
+   to be able to put a bound on the uncertainty.
+
+   We need to make sure this information isn't exposed in a way that
+   helps an adversary.
+
+Methods for current clients:
+
+   Every client downloads network status documents.  There are
+   currently three methods (one hypothetical) for clients to get them.
+     - 0.1.2.x clients (and earlier) fetch a v2 networkstatus
+       document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
+       minutes].
+
+     - 0.2.0.x clients fetch a v3 networkstatus consensus document
+       at a random interval between when their current document is no
+       longer freshest, and when their current document is about to
+       expire.
+
+       [In both of the above cases, clients choose a running
+       directory cache at random with odds roughly proportional to
+       its bandwidth.  If they're just starting, they know a XXXX FIXME -NM]
+
+     - In some future version, clients will choose directory caches
+       to serve as their "directory guards" to avoid profiling
+       attacks, similarly to how clients currently start all their
+       circuits at guard nodes.
+
+   We assume that a directory cache can tell which of these three
+   categories a client is in by the format of its status request.
+
+   A directory cache can be made to count distinct client IP
+   addresses that make a certain request of it in a given timeframe,
+   and total requests made to it over that timeframe.  For the first
+   two cases, a cache can get a picture of the overall
+   number and countries of users in the network by dividing the IP
+   count by the probability with which they (as a cache) would be
+   chosen.  Assuming that our listed bandwidth is such that we expect
+   to be chosen with probability P for any given request, and we've
+   been counting IPs for long enough that we expect the average
+   client to have made N requests, they will have visited us at least
+   once with probability P' = 1-(1-P)^N, and so we divide the IP
+   counts we've seen by P' for our estimate.  To estimate total
+   number of clients of a given type, determine how many requests a
+   client of that type will make over that time, and assume we'll
+   have seen P of them.
+
+   Both of these numbers are useful: the IP counts will give the
+   total number of IPs connecting to the network, and the request
+   counts will give the total number of users on the network at any
+   given time.
+
+   Notes:
+     - [Over H hours, the N for V2 clients is 2*H, and the N for V3
+       clients is currently around H/2 or H/3.]
+
+     - (We should only count requests that we actually intend to answer;
+       503 requests shouldn't count.)
+
+     - These measurements should also be taken at a directory
+       authority if possible: their picture of the network is skewed
+       by clients that fetch from them directly.  These clients,
+       however, are all the clients that are just bootstrapping
+       (assuming that the fallback-consensus feature isn't yet used
+       much).
+
+     - These measurements also overestimate the V2 download rate if
+       some downloads fail and clients retry them later after backing
+       off.
+
+Methods for directory guards:
+
+   If directory guards are in use, directory guards get a picture of
+   all those users who chose them as a guard when they were listed
+   as a good choice for a guard, and who are also on the network
+   now.  The cleanest data here will come from nodes that were listed
+   as good new-guards choices for a while, and have not been so for a
+   while longer (to study decay rates); nodes that have been listed
+   as good new-guard choices consistently for a long time (to get a
+   sample of the network); and nodes that have been listed as good
+   new-guard choices only recently (to get a sample of new users and
+   users whose guards have died out.)
+
+   Since directory guards are currently unspecified, we'll need to
+   make some guesses about how they'll turn out to work.  Here are
+   a couple of approaches that could work.
+     - We could have clients pick completely new directory guards on
+       a rolling basis every two months or so.  This would ensure
+       that staying as a guard for a while would be sufficient to
+       see a sample of users.  This is potentially advantageous for
+       load-balancing the network as well, though it might lose some
+       of the benefits of directory guard.  We need to quantify the
+       impact of this; it might not actually make stuff worse in
+       practice, if most guards don't stay good guards for a month
+       or two.
+
+     - We could try to collect statistics at several directory
+       guards and combine their statisics, but we would need to make
+       sure that for all time, at least one of the directory guards
+       had been recommended as a good choice for new guards.  By
+       looking at new-IP rates for guards, we could get an idea of
+       user uptake; for looking at old-IP decay rates, we could get
+       an idea of turnover.  This approach would entail significant
+       complexity, and we'd probably need to record more information
+       than we'd really like to.
+
+
diff --git a/proposals/ideas/xxx-geoip-survey-plan.txt b/proposals/ideas/xxx-geoip-survey-plan.txt
deleted file mode 100644
index 49c6615..0000000
--- a/proposals/ideas/xxx-geoip-survey-plan.txt
+++ /dev/null
@@ -1,137 +0,0 @@
-
-
-Abstract
-
-   This document explains how to tell about how many Tor users there
-   are, and how many there are in which country.  Statistics are
-   involved.
-
-Motivation
-
-   There are a few reasons we need to keep track of which countries
-   Tor users (in aggregate) are coming from:
-
-     - Resource allocation.  Knowing about underserved countries with
-       lots of users can let us know about where we need to direct
-       translation and outreach efforts.
-
-     - Anticensorship.  Sudden drops in usage on a national basis can
-       indicate the arrival of a censorious firewall.
-
-     - Sponsor outreach and self-evalutation.  Many people and
-       organizations who are interested in funding The Tor Project's
-       work want to know that we're successfully serving parts of the
-       world they're interested in, and that efforts to expand our
-       userbase are actually succeeding.  So do we.
-
-Goals
-
-   We want to know approximately how many Tor users there are, and which
-   countries they're in, even in the presence of a hypothetical
-   "directory guard" feature.  Some uncertainty is okay, but we'd like
-   to be able to put a bound on the uncertainty.
-
-   We need to make sure this information isn't exposed in a way that
-   helps an adversary.
-
-Methods for current clients:
-
-   Every client downloads network status documents.  There are
-   currently three methods (one hypothetical) for clients to get them.
-     - 0.1.2.x clients (and earlier) fetch a v2 networkstatus
-       document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
-       minutes].
-
-     - 0.2.0.x clients fetch a v3 networkstatus consensus document
-       at a random interval between when their current document is no
-       longer freshest, and when their current document is about to
-       expire.
-
-       [In both of the above cases, clients choose a running
-       directory cache at random with odds roughly proportional to
-       its bandwidth.  If they're just starting, they know a XXXX FIXME -NM]
-
-     - In some future version, clients will choose directory caches
-       to serve as their "directory guards" to avoid profiling
-       attacks, similarly to how clients currently start all their
-       circuits at guard nodes.
-
-   We assume that a directory cache can tell which of these three
-   categories a client is in by the format of its status request.
-
-   A directory cache can be made to count distinct client IP
-   addresses that make a certain request of it in a given timeframe,
-   and total requests made to it over that timeframe.  For the first
-   two cases, a cache can get a picture of the overall
-   number and countries of users in the network by dividing the IP
-   count by the probability with which they (as a cache) would be
-   chosen.  Assuming that our listed bandwidth is such that we expect
-   to be chosen with probability P for any given request, and we've
-   been counting IPs for long enough that we expect the average
-   client to have made N requests, they will have visited us at least
-   once with probability P' = 1-(1-P)^N, and so we divide the IP
-   counts we've seen by P' for our estimate.  To estimate total
-   number of clients of a given type, determine how many requests a
-   client of that type will make over that time, and assume we'll
-   have seen P of them.
-
-   Both of these numbers are useful: the IP counts will give the
-   total number of IPs connecting to the network, and the request
-   counts will give the total number of users on the network at any
-   given time.
-
-   Notes:
-     - [Over H hours, the N for V2 clients is 2*H, and the N for V3
-       clients is currently around H/2 or H/3.]
-
-     - (We should only count requests that we actually intend to answer;
-       503 requests shouldn't count.)
-
-     - These measurements should also be taken at a directory
-       authority if possible: their picture of the network is skewed
-       by clients that fetch from them directly.  These clients,
-       however, are all the clients that are just bootstrapping
-       (assuming that the fallback-consensus feature isn't yet used
-       much).
-
-     - These measurements also overestimate the V2 download rate if
-       some downloads fail and clients retry them later after backing
-       off.
-
-Methods for directory guards:
-
-   If directory guards are in use, directory guards get a picture of
-   all those users who chose them as a guard when they were listed
-   as a good choice for a guard, and who are also on the network
-   now.  The cleanest data here will come from nodes that were listed
-   as good new-guards choices for a while, and have not been so for a
-   while longer (to study decay rates); nodes that have been listed
-   as good new-guard choices consistently for a long time (to get a
-   sample of the network); and nodes that have been listed as good
-   new-guard choices only recently (to get a sample of new users and
-   users whose guards have died out.)
-
-   Since directory guards are currently unspecified, we'll need to
-   make some guesses about how they'll turn out to work.  Here are
-   a couple of approaches that could work.
-     - We could have clients pick completely new directory guards on
-       a rolling basis every two months or so.  This would ensure
-       that staying as a guard for a while would be sufficient to
-       see a sample of users.  This is potentially advantageous for
-       load-balancing the network as well, though it might lose some
-       of the benefits of directory guard.  We need to quantify the
-       impact of this; it might not actually make stuff worse in
-       practice, if most guards don't stay good guards for a month
-       or two.
-
-     - We could try to collect statistics at several directory
-       guards and combine their statisics, but we would need to make
-       sure that for all time, at least one of the directory guards
-       had been recommended as a good choice for new guards.  By
-       looking at new-IP rates for guards, we could get an idea of
-       user uptake; for looking at old-IP decay rates, we could get
-       an idea of turnover.  This approach would entail significant
-       complexity, and we'd probably need to record more information
-       than we'd really like to.
-
-
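
Reviewer note (not part of the patch): the estimate described under "Methods for current clients" in the moved file can be sketched as follows. This is only an illustration of the arithmetic in that text, assuming a cache already knows its selection probability P and its observed counts. All function and parameter names are hypothetical and do not come from Tor's code. The v3 request rate is given in the text only as "around H/2 or H/3", so the divisor is left as a parameter.

```python
def visit_probability(p, n):
    """P' = 1 - (1 - P)^N: chance a client picked this cache at least once.

    p: probability this cache is chosen for any single request (assumed known).
    n: number of requests the average client made during the counting period.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - p) ** n


def requests_per_client(hours, client_type, v3_divisor=2.0):
    """N over H hours, per the Notes: 2*H for v2, about H/2 or H/3 for v3."""
    if client_type == "v2":
        return 2.0 * hours
    if client_type == "v3":
        return hours / v3_divisor
    raise ValueError("client_type must be 'v2' or 'v3'")


def estimate_total_ips(observed_ips, p, n):
    """Scale the distinct-IP count seen at this cache by 1 / P'."""
    return observed_ips / visit_probability(p, n)


def estimate_clients_from_requests(observed_requests, p, n):
    """Assume this cache saw a fraction P of all requests, N per client."""
    return observed_requests / (p * n)


if __name__ == "__main__":
    # Hypothetical numbers: a cache chosen 1% of the time counts v2 clients
    # for 24 hours and sees 3000 distinct IPs and 50000 requests.
    n_v2 = requests_per_client(24, "v2")
    print(estimate_total_ips(3000, 0.01, n_v2))
    print(estimate_clients_from_requests(50000, 0.01, n_v2))
```

The two estimators answer different questions, as the text says: the IP-based one approximates distinct addresses seen over the whole period, while the request-based one approximates how many clients were active. Neither accounts for the caveats in the Notes (503 responses, retries after failed downloads, clients fetching directly from authorities).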