tor-commits
January 2014
[torspec/master] Move XXX-bridgedb-database-improvements.txt and history into torspec.git.
by nickm@torproject.org 30 Jan '14
commit 4aee321ebd0b4f11f3339a52a9045cc7caef6724
Author: Isis Lovecruft <isis@torproject.org>
Date: Thu Jan 30 16:25:03 2014 +0000
Move XXX-bridgedb-database-improvements.txt and history into torspec.git.
The command used to filter the file and its commit history from
bridgedb.git was:
$ git filter-branch -f --index-filter \
'git rm --cached -qr -- . && git reset -q $GIT_COMMIT -- doc/proposals/XXX-bridgedb-database-improvements.txt' \
--prune-empty \
--parent-filter 'ruby /home/isis/scripts/git-rewrite-parents.rb $@' \
--tag-name-filter cat -- --all
with the `git-rewrite-parents.rb` script, taken from
http://www.spinics.net/lists/git/msg177988.html, containing:
old_parents = gets.chomp.gsub('-p ', ' ')
if old_parents.empty? then
new_parents = []
else
new_parents = `git show-branch --independent #{old_parents}`.split
end
puts new_parents.map{|p| '-p ' + p}.join(' ')
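The Ruby above only deduplicates the rewritten commit's parents. A rough
Python equivalent, given purely to illustrate what a --parent-filter reads on
stdin and must print on stdout (the subprocess call is an assumption, not part
of the commit), might look like:

    #!/usr/bin/env python
    # Read "-p <sha> -p <sha> ..." for the commit being rewritten, drop
    # redundant parents, and print the survivors back in "-p <sha>" form.
    import subprocess
    import sys

    old_parents = sys.stdin.readline().strip().replace('-p ', ' ').split()
    if old_parents:
        out = subprocess.check_output(
            ['git', 'show-branch', '--independent'] + old_parents)
        new_parents = out.decode().split()
    else:
        new_parents = []
    print(' '.join('-p ' + p for p in new_parents))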
---
.../XXX-bridgedb-database-improvements.txt | 260 --------------------
proposals/XXX-bridgedb-database-improvements.txt | 260 ++++++++++++++++++++
2 files changed, 260 insertions(+), 260 deletions(-)
diff --git a/doc/proposals/XXX-bridgedb-database-improvements.txt b/doc/proposals/XXX-bridgedb-database-improvements.txt
deleted file mode 100644
index 2d25bd2..0000000
--- a/doc/proposals/XXX-bridgedb-database-improvements.txt
+++ /dev/null
@@ -1,260 +0,0 @@
-# -*- coding: utf-8 ; mode: org -*-
-
-Filename: XXX-bridgedb-database-improvements.txt
-Title: "Scalability and Stability Improvements to BridgeDB: Switching to a
- Distributed Database System and RDBMS"
-Author: Isis Agora Lovecruft
-Created: 12 Oct 2013
-Related Proposals: XXX-social-bridge-distribution.txt
-Status: Open
-
-* I. Overview
-
- BridgeDB is Tor's Bridge Distribution system, which currently has two major
- Bridge Distribution mechanisms: the HTTPS Distributor and an Email
- Distributor. [0]
-
- BridgeDB is written largely in Twisted Python, and uses Python2's builtin
- sqlite3 as its database backend. Unfortunately, this backend system is
- already showing strain through increased times for queries, and sqlite's
- memory usage is not up-to-par with modern, more efficient, NoSQL databases.
-
- In order to better facilitate the implementation of newer, more complex
- Bridge Distribution mechanisms, several improvements should be made to the
- underlying database system of BridgeDB. Additionally, I propose that a
- clear distinction in terms, as well as a modularisation of the codebase, be
- drawn between the mechanisms for Bridge Distribution versus the backend
- Bridge Database (BridgeDB) storage system.
-
- This proposal covers the design and implementation of a scalable NoSQL ―
- Document-Based and Key-Value Relational ― database backend for storing data
- on Tor Bridge relays, in an efficient manner that is amenable to
- interfacing with the Twisted Python asynchronous networking code of current
- and future Bridge Distribution mechanisms.
-
-* II. Terminology
-
- BridgeDistributor := A program which decides when and how to hand out
- information on a Tor Bridge relay, and to whom.
-
- BridgeDB := The backend system of databases and object-relational mapping
- servers, which interfaces with the BridgeDistributor in order
- to hand out bridges to clients, and to obtain and process new,
- incoming ``@type bridge-server-descriptors``,
- ``@type bridge-networkstatus`` documents, and
- ``@type bridge-extrainfo`` descriptors. [3]
-
- BridgeFinder := A client-side program for an Onion Proxy (OP) which handles
- interfacing with a BridgeDistributor in order to obtain new
- Bridge relays for a client. A BridgeFinder also interfaces
- with a local Tor Controller (such as TorButton or ARM) to
- handle automatic, transparent Bridge configuration (no more
- copy+pasting into a torrc) without being given any
- additional privileges over the Tor process, [1] and relies
- on the Tor Controller to interface with the user for
- control input and displaying up-to-date information
- regarding available Bridges, Pluggable Transport methods,
- and potentially Invite Tickets and Credits (a cryptographic
- currency without fiat value which is generated
- automatically by clients whose Bridges remain largely
- uncensored, and is used to purchase new Bridges), should a
- Social Bridge Distributor be implemented. [2]
-
-* III. Databases
-** III.A. Scalability Requirements
-
- Databases SHOULD be implemented in a manner which is amenable to using a
- distributed storage system; this is necessary because many potential
- datatypes required by future BridgeDistributors MUST be stored permanently.
- For example, in the designs for the Social Bridge Distributor, the list of
- hash digests of spent Credits, and the list of hash digests of redeemed
- Invite Tickets MUST be stored forever to prevent either from being replayed
- ― or double-spent ― by a malicious user who wishes to block bridges faster.
- Designing the BridgeDB backend system such that additional nodes may be
- added in the future will allow the system to freely scale in relation to
- the storage requirements of future BridgeDistributors.
-
- Additionally, requiring that the implementation allow for distributed
- database backends promotes modularisation of the components of BridgeDB, such
- that BridgeDistributors can be separated from the backend storage system,
- BridgeDB, as all queries will be issued through a simplified, common API,
- regardless of the number of nodes in the system, or the design of future
- BridgeDistributors.
-
-*** 1. Distributed Database System
-
- A distributed database system SHOULD be used for BridgeDB, in order to
- scale resources as the number of Tor bridge users grows. This database
- system is hereafter referred to as the DDBS.
-
- The DDBS MUST be capable of working within Twisted's asynchronous
- framework. If possible, an Object-Relational Mapper (ORM) SHOULD be used to
- abstract the database backend's structure and query syntax from the Twisted
- Python classes which interact with it, so that the type of database may be
- swapped out for another with less code refactoring.
-
- The DDBS SHALL be used for persistent storage of complex data structures
- such as the bridges, which MAY include additional information from both the
- `@type bridge-server-descriptor`s and the `@type bridge-extra-info`
- descriptors. [3]
-
-**** 1.a. Choice of DDBS
-
- CouchDB is chosen for its simple HTTP API, ease of use, speed, and official
- support for Twisted Python applications. [4] Additionally, its
- document-based data model is very similar to the current architecture of
- tor's Directory Server/Mirror system, in that an HTTP API is used to
- retrieve data stored within virtual directories. Internally, it uses JSON
- to store data and JavaScript as its query language, both of which are
- likely friendlier to various other components of the Tor Metrics
- infrastructure which sanitise and analyse portions of the Bridge
- descriptors. At the very least, friendlier than hardcoding raw SQL queries
- as Python strings.
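As an illustration of how little plumbing that HTTP API needs, a minimal
sketch of storing and fetching one bridge document follows; the database name,
host, document id, and the use of the synchronous `requests` library (rather
than a Twisted client) are assumptions made only for brevity:

    import requests

    COUCH = 'http://localhost:5984/bridges'     # hypothetical database URL

    doc_id = 'bridge-0001'                      # hypothetical id, e.g. a fingerprint
    doc = {'address': '203.0.113.7', 'or_port': 443,
           'flags': ['Running', 'Stable']}

    # PUT /db/docid creates the document; GET /db/docid returns it as JSON.
    requests.put('%s/%s' % (COUCH, doc_id), json=doc)
    print(requests.get('%s/%s' % (COUCH, doc_id)).json())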
-
-**** 1.b. Data Structures which should be stored in a DDBS:
-
- - RedactedDB - The Database of Blocked Bridges
-
- The RedactedDB will hold entries of bridges which have been discovered to
- be unreachable from BridgeDB's network vantage point, or have been reported
- unreachable by clients.
-
- - BridgeDB - The Database of Bridges
-
- BridgeDB holds information on available Bridges, obtained via bridge
- descriptors and networkstatus documents from the BridgeAuthority. Because
- a Bridge may have multiple `ORPort`s and multiple
- `ServerTransportListenAddress`es, attaching additional data to each of
- these addresses which MAY include the following information on a blocking
- event:
- - Geolocational country code of the reported blocking event
- - Timestamp for when the blocking event was first reported
- - The method used for discovery of the block
- - and the believed mechanism which is causing the block
- would quickly become unwieldy, the RedactedDB and BridgeDB SHOULD be kept
- separate.
-
- - User Credentials
-
- For the Social BridgeDistributor, these are rather complex,
- increasingly-growing, concatenations (or tuples) of several datatypes,
- including Non-Interactive Proofs-of-Knowledge (NIPK) of Commitments to
- k-TAA Blind Signatures, and NIPK of Commitments to a User's current
- number of Credits and timestamps of requests for Invite Tickets.
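To make the shape of such documents concrete, a blocked-bridge entry in the
RedactedDB carrying the blocking-event fields listed above might look like the
following sketch (every field name here is illustrative, not a schema):

    blocked_entry = {
        '_id': 'bridge-0001',                    # e.g. the bridge fingerprint
        'country_code': 'XZ',                    # where the block was reported
        'first_reported': '2014-01-30T16:25:03Z',
        'discovery_method': 'client-report',
        'suspected_mechanism': 'tcp-rst-injection',
    }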
-
-*** 2. Key-Value Relational Database Mapping Server
-
- For simpler data structures which must be persistently stored, such as the
- list of hashes of previously seen Invite Tickets, or the list of
- previously spent Tokens, a Relational Database Mapping Server (RDBMS)
- SHALL be used for optimisation of queries.
-
- Redis and Memcached are two examples of RDBMS which are well tested and
- are known to work well with Twisted. The major difference between the two
- is that Memcached stores data only within volatile memory, while Redis
- additionally supports commands for transferring objects into persistent,
- on-disk storage.
-
- There are several support modules for interfacing with both Memcached and
- Redis from Twisted Python, see Twisted's MemCacheProtocol class [5] [6] or
- txyam [7] for Memcached, and txredis [8] or txredisapi [9] for
- Redis. Additionally, numerous big name projects both use Redis as part of
- their backend systems, and also provide helpful documentation on their own
- experience of the process of switching over to the new systems. [17] For
- non-Twisted Python Redis APIs, there is redis-py, which provides a
- connection pool that could likely be interfaced with from Twisted Python
- without too much difficulty. [10] [11]
-
-**** 2.a. Data Structures which should be stored in a RDBMS
-
- Simple, mostly-flat datatypes, and data which must be frequently indexed
- should be stored in a RDBMS, such as large lists of hashes, or arbitrary
- strings with assigned point-values (i.e. the "Uniform Mapping" for the
- current HTTPS BridgeDistributor).
-
- For the Social BridgeDistributor, hash digests of the following datatypes
- SHOULD be stored in the RDBMS, in order to prevent double-spending and
- replay attacks:
-
- - Invite Tickets
-
- These are anonymous, unlinkable, unforgeable, and verifiable tokens
- which are occasionally handed out to well-behaved Users by the Social
- BridgeDistributor to permit new Users to be invited into the system.
- When they are redeemed, the Social BridgeDistributor MUST store a hash
- digest of their contents to prevent replayed Invite Tickets.
-
- - Spent Credits
-
- These are Credits which have already been redeemed for new Bridges.
- The Social BridgeDistributor MUST also store a hash digest of Spent
- Credits to prevent double-spending.
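A minimal sketch of that replay/double-spend check, using the non-Twisted
redis-py client mentioned in [10] (the key names and the choice of SHA-256 are
assumptions for illustration only):

    import hashlib
    import redis

    r = redis.StrictRedis(host='localhost', port=6379)

    def redeem(kind, token_bytes):
        """Return True if the token was fresh, False if it was seen before."""
        digest = hashlib.sha256(token_bytes).hexdigest()
        # SADD returns 1 only when the digest was not already in the set.
        return r.sadd('seen:%s' % kind, digest) == 1

    redeem('invite-ticket', b'serialized ticket')
    redeem('spent-credit', b'serialized credit')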
-
-*** 3. Bloom Filters and Other Database Optimisations
-
- In order to further decrease the need for lookups in the backend
- databases, Bloom Filters can be used to eliminate extraneous
- queries. However, this optimisation would only be beneficial for string
- lookups, i.e. querying for a User's Credential, and SHOULD NOT be used for
- queries within any of the hash lists, i.e. the list of hashes of
- previously seen Invite Tickets. [14]
-
-**** 3.a. Bloom Filters within Redis
-
- It might be possible to use Redis' GETBIT and SETBIT commands to store a
- Bloom Filter within a Redis cache system; [15] doing so would offload the
- severe memory requirements of loading the Bloom Filter into memory in
- Python when inserting new entries, reducing the time complexity from some
- polynomial time complexity that is proportional to the integral of the
- number of bridge users over the rate of change of bridge users over time,
- to a time complexity of order O(1).
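A minimal Bloom filter kept entirely in Redis bits, along the lines sketched
above (the filter size, number of hash functions, and key name are assumed
values, not recommendations):

    import hashlib
    import redis

    r = redis.StrictRedis()
    M = 2 ** 24                    # filter size in bits
    K = 4                          # number of hash functions
    KEY = 'bloom:credentials'

    def _offsets(item):
        for i in range(K):
            h = hashlib.sha1(('%d:%s' % (i, item)).encode()).hexdigest()
            yield int(h, 16) % M

    def bloom_add(item):
        for off in _offsets(item):
            r.setbit(KEY, off, 1)

    def bloom_might_contain(item):
        # False means "definitely absent"; True only means "possibly present".
        return all(r.getbit(KEY, off) for off in _offsets(item))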
-
-**** 3.b. Expiration of Stale Data
-
- Some types of data SHOULD be safe to expire, such as User Credentials
- which have not been updated within a certain timeframe. This idea should
- be further explored to assess the safety and potential drawbacks to
- removing old data.
-
- If there is data which SHOULD expire, the PEXPIREAT command provided by
- Redis for the key datatype would allow the RDBMS itself to handle cleanup
- of stale data automatically. [16]
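A small sketch of that cleanup with redis-py (the key name and the 30-day
lifetime are assumptions):

    import time
    import redis

    r = redis.StrictRedis()
    r.set('credential:user-0001', 'serialized credential blob')
    # Ask Redis itself to delete the key 30 days from now, in ms since the epoch.
    r.pexpireat('credential:user-0001', int((time.time() + 30 * 86400) * 1000))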
-
-**** 4. Other potential uses of the improved Bridge database system
-
- Redis provides mechanisms for evaluations to be made on data by calling
- the sha1 for a serverside Lua script. [15] While not required in the
- slightest, it is a rather neat feature, as it would allow Tor's Metrics
- infrastructure to offload some of the computational overhead of gathering
- data on Bridge usage to BridgeDB (as well as diminish the security
- implications of storing Bridge descriptors).
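For example, loading a tiny Lua script once and then invoking it by its SHA-1
(the Lua body and key name are placeholders, not anything Metrics actually
runs):

    import redis

    r = redis.StrictRedis()
    # SCRIPT LOAD returns the SHA-1; EVALSHA runs the cached script server-side.
    sha = r.script_load("return redis.call('SCARD', KEYS[1])")
    print(r.evalsha(sha, 1, 'seen:invite-ticket'))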
-
- Also, if Twisted's IProducer and IConsumer interfaces do not provide
- needed interface functionality, or it is desired that other components of
- the Tor software ecosystem be capable of scheduling jobs for BridgeDB,
- there are well-tested mechanisms for using Redis as a message
- queue/scheduling system. [16]
-
-* References
-
-[0]: https://bridges.torproject.org
- mailto:bridges@bridges.torproject.org
-[1]: See proposals 199-bridgefinder-integration.txt at
- https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/199-bridgefi…
-[2]: See XXX-social-bridge-distribution.txt at
- https://gitweb.torproject.org/user/isis/bridgedb.git/blob/refs/heads/featur…
-[3]: https://metrics.torproject.org/formats.html#descriptortypes
-[4]: https://github.com/couchbase/couchbase-python-client#twisted-api
-[5]: https://twistedmatrix.com/documents/current/api/twisted.protocols.memcache.…
-[6]: http://stackoverflow.com/a/5162203
-[7]: http://findingscience.com/twisted/python/memcache/2012/06/09/txyam:-yet-ano…
-[8]: https://pypi.python.org/pypi/txredis
-[9]: https://github.com/fiorix/txredisapi
-[10]: https://github.com/andymccurdy/redis-py/
-[11]: http://degizmo.com/2010/03/22/getting-started-redis-and-python/
-[12]: http://www.dr-josiah.com/2012/03/why-we-didnt-use-bloom-filter.html
-[13]: http://redis.io/topics/data-types §"Strings"
-[14]: http://redis.io/commands/pexpireat
-[15]: http://redis.io/commands/evalsha
-[16]: http://www.restmq.com/
-[17]: https://www.mediawiki.org/wiki/Redis
diff --git a/proposals/XXX-bridgedb-database-improvements.txt b/proposals/XXX-bridgedb-database-improvements.txt
new file mode 100644
index 0000000..2d25bd2
--- /dev/null
+++ b/proposals/XXX-bridgedb-database-improvements.txt
@@ -0,0 +1,260 @@
+# -*- coding: utf-8 ; mode: org -*-
+
+Filename: XXX-bridgedb-database-improvements.txt
+Title: "Scalability and Stability Improvements to BridgeDB: Switching to a
+ Distributed Database System and RDBMS"
+Author: Isis Agora Lovecruft
+Created: 12 Oct 2013
+Related Proposals: XXX-social-bridge-distribution.txt
+Status: Open
+
+* I. Overview
+
+ BridgeDB is Tor's Bridge Distribution system, which currently has two major
+ Bridge Distribution mechanisms: the HTTPS Distributor and an Email
+ Distributor. [0]
+
+ BridgeDB is written largely in Twisted Python, and uses Python2's builtin
+ sqlite3 as its database backend. Unfortunately, this backend system is
+ already showing strain through increased times for queries, and sqlite's
+ memory usage is not up-to-par with modern, more efficient, NoSQL databases.
+
+ In order to better facilitate the implementation of newer, more complex
+ Bridge Distribution mechanisms, several improvements should be made to the
+ underlying database system of BridgeDB. Additionally, I propose that a
+ clear distinction in terms, as well as a modularisation of the codebase, be
+ drawn between the mechanisms for Bridge Distribution versus the backend
+ Bridge Database (BridgeDB) storage system.
+
+ This proposal covers the design and implementation of a scalable NoSQL ―
+ Document-Based and Key-Value Relational ― database backend for storing data
+ on Tor Bridge relays, in an efficient manner that is amenable to
+ interfacing with the Twisted Python asynchronous networking code of current
+ and future Bridge Distribution mechanisms.
+
+* II. Terminology
+
+ BridgeDistributor := A program which decides when and how to hand out
+ information on a Tor Bridge relay, and to whom.
+
+ BridgeDB := The backend system of databases and object-relational mapping
+ servers, which interfaces with the BridgeDistributor in order
+ to hand out bridges to clients, and to obtain and process new,
+ incoming ``@type bridge-server-descriptors``,
+ ``@type bridge-networkstatus`` documents, and
+ ``@type bridge-extrainfo`` descriptors. [3]
+
+ BridgeFinder := A client-side program for an Onion Proxy (OP) which handles
+ interfacing with a BridgeDistributor in order to obtain new
+ Bridge relays for a client. A BridgeFinder also interfaces
+ with a local Tor Controller (such as TorButton or ARM) to
+ handle automatic, transparent Bridge configuration (no more
+ copy+pasting into a torrc) without being given any
+ additional privileges over the Tor process, [1] and relies
+ on the Tor Controller to interface with the user for
+ control input and displaying up-to-date information
+ regarding available Bridges, Pluggable Transport methods,
+ and potentially Invite Tickets and Credits (a cryptographic
+ currency without fiat value which is generated
+ automatically by clients whose Bridges remain largely
+ uncensored, and is used to purchase new Bridges), should a
+ Social Bridge Distributor be implemented. [2]
+
+* III. Databases
+** III.A. Scalability Requirements
+
+ Databases SHOULD be implemented in a manner which is amenable to using a
+ distributed storage system; this is necessary because many potential
+ datatypes required by future BridgeDistributors MUST be stored permanently.
+ For example, in the designs for the Social Bridge Distributor, the list of
+ hash digests of spent Credits, and the list of hash digests of redeemed
+ Invite Tickets MUST be stored forever to prevent either from being replayed
+ ― or double-spent ― by a malicious user who wishes to block bridges faster.
+ Designing the BridgeDB backend system such that additional nodes may be
+ added in the future will allow the system to freely scale in relation to
+ the storage requirements of future BridgeDistributors.
+
+ Additionally, requiring that the implementation allow for distributed
+ database backends promotes modularisation of the components of BridgeDB, such
+ that BridgeDistributors can be separated from the backend storage system,
+ BridgeDB, as all queries will be issued through a simplified, common API,
+ regardless of the number of nodes in the system, or the design of future
+ BridgeDistributors.
+
+*** 1. Distributed Database System
+
+ A distributed database system SHOULD be used for BridgeDB, in order to
+ scale resources as the number of Tor bridge users grows. This database
+ system is hereafter referred to as the DDBS.
+
+ The DDBS MUST be capable of working within Twisted's asynchronous
+ framework. If possible, an Object-Relational Mapper (ORM) SHOULD be used to
+ abstract the database backend's structure and query syntax from the Twisted
+ Python classes which interact with it, so that the type of database may be
+ swapped out for another with less code refactoring.
+
+ The DDBS SHALL be used for persistent storage of complex data structures
+ such as the bridges, which MAY include additional information from both the
+ `@type bridge-server-descriptor`s and the `@type bridge-extra-info`
+ descriptors. [3]
+
+**** 1.a. Choice of DDBS
+
+ CouchDB is chosen for its simple HTTP API, ease of use, speed, and official
+ support for Twisted Python applications. [4] Additionally, its
+ document-based data model is very similar to the current architecture of
+ tor's Directory Server/Mirror system, in that an HTTP API is used to
+ retrieve data stored within virtual directories. Internally, it uses JSON
+ to store data and JavaScript as its query language, both of which are
+ likely friendlier to various other components of the Tor Metrics
+ infrastructure which sanitise and analyse portions of the Bridge
+ descriptors. At the very least, friendlier than hardcoding raw SQL queries
+ as Python strings.
+
+**** 1.b. Data Structures which should be stored in a DDBS:
+
+ - RedactedDB - The Database of Blocked Bridges
+
+ The RedactedDB will hold entries of bridges which have been discovered to
+ be unreachable from BridgeDB's network vantage point, or have been reported
+ unreachable by clients.
+
+ - BridgeDB - The Database of Bridges
+
+ BridgeDB holds information on available Bridges, obtained via bridge
+ descriptors and networkstatus documents from the BridgeAuthority. Because
+ a Bridge may have multiple `ORPort`s and multiple
+ `ServerTransportListenAddress`es, attaching additional data to each of
+ these addresses which MAY include the following information on a blocking
+ event:
+ - Geolocational country code of the reported blocking event
+ - Timestamp for when the blocking event was first reported
+ - The method used for discovery of the block
+ - and the believed mechanism which is causing the block
+ would quickly become unwieldy, the RedactedDB and BridgeDB SHOULD be kept
+ separate.
+
+ - User Credentials
+
+ For the Social BridgeDistributor, these are rather complex,
+ increasingly-growing, concatenations (or tuples) of several datatypes,
+ including Non-Interactive Proofs-of-Knowledge (NIPK) of Commitments to
+ k-TAA Blind Signatures, and NIPK of Commitments to a User's current
+ number of Credits and timestamps of requests for Invite Tickets.
+
+*** 2. Key-Value Relational Database Mapping Server
+
+ For simpler data structures which must be persistently stored, such as the
+ list of hashes of previously seen Invite Tickets, or the list of
+ previously spent Tokens, a Relational Database Mapping Server (RDBMS)
+ SHALL be used for optimisation of queries.
+
+ Redis and Memcached are two examples of RDBMS which are well tested and
+ are known to work well with Twisted. The major difference between the two
+ is that Memcached stores data only within volatile memory, while Redis
+ additionally supports commands for transferring objects into persistent,
+ on-disk storage.
+
+ There are several support modules for interfacing with both Memcached and
+ Redis from Twisted Python, see Twisted's MemCacheProtocol class [5] [6] or
+ txyam [7] for Memcached, and txredis [8] or txredisapi [9] for
+ Redis. Additionally, numerous big name projects both use Redis as part of
+ their backend systems, and also provide helpful documentation on their own
+ experience of the process of switching over to the new systems. [17] For
+ non-Twisted Python Redis APIs, there is redis-py, which provides a
+ connection pool that could likely be interfaced with from Twisted Python
+ without too much difficulty. [10] [11]
+
+**** 2.a. Data Structures which should be stored in a RDBMS
+
+ Simple, mostly-flat datatypes, and data which must be frequently indexed
+ should be stored in a RDBMS, such as large lists of hashes, or arbitrary
+ strings with assigned point-values (i.e. the "Uniform Mapping" for the
+ current HTTPS BridgeDistributor).
+
+ For the Social BridgeDistributor, hash digests of the following datatypes
+ SHOULD be stored in the RDBMS, in order to prevent double-spending and
+ replay attacks:
+
+ - Invite Tickets
+
+ These are anonymous, unlinkable, unforgeable, and verifiable tokens
+ which are occasionally handed out to well-behaved Users by the Social
+ BridgeDistributor to permit new Users to be invited into the system.
+ When they are redeemed, the Social BridgeDistributor MUST store a hash
+ digest of their contents to prevent replayed Invite Tickets.
+
+ - Spent Credits
+
+ These are Credits which have already been redeemed for new Bridges.
+ The Social BridgeDistributor MUST also store a hash digest of Spent
+ Credits to prevent double-spending.
+
+*** 3. Bloom Filters and Other Database Optimisations
+
+ In order to further decrease the need for lookups in the backend
+ databases, Bloom Filters can be used to eliminate extraneous
+ queries. However, this optimisation would only be beneficial for string
+ lookups, i.e. querying for a User's Credential, and SHOULD NOT be used for
+ queries within any of the hash lists, i.e. the list of hashes of
+ previously seen Invite Tickets. [14]
+
+**** 3.a. Bloom Filters within Redis
+
+ It might be possible to use Redis' GETBIT and SETBIT commands to store a
+ Bloom Filter within a Redis cache system; [15] doing so would offload the
+ severe memory requirements of loading the Bloom Filter into memory in
+ Python when inserting new entries, reducing the time complexity from some
+ polynomial time complexity that is proportional to the integral of the
+ number of bridge users over the rate of change of bridge users over time,
+ to a time complexity of order O(1).
+
+**** 3.b. Expiration of Stale Data
+
+ Some types of data SHOULD be safe to expire, such as User Credentials
+ which have not been updated within a certain timeframe. This idea should
+ be further explored to assess the safety and potential drawbacks to
+ removing old data.
+
+ If there is data which SHOULD expire, the PEXPIREAT command provided by
+ Redis for the key datatype would allow the RDBMS itself to handle cleanup
+ of stale data automatically. [16]
+
+**** 4. Other potential uses of the improved Bridge database system
+
+ Redis provides mechanisms for evaluations to be made on data by calling
+ the sha1 for a serverside Lua script. [15] While not required in the
+ slightest, it is a rather neat feature, as it would allow Tor's Metrics
+ infrastructure to offload some of the computational overhead of gathering
+ data on Bridge usage to BridgeDB (as well as diminish the security
+ implications of storing Bridge descriptors).
+
+ Also, if Twisted's IProducer and IConsumer interfaces do not provide
+ needed interface functionality, or it is desired that other components of
+ the Tor software ecosystem be capable of scheduling jobs for BridgeDB,
+ there are well-tested mechanisms for using Redis as a message
+ queue/scheduling system. [16]
+
+* References
+
+[0]: https://bridges.torproject.org
+ mailto:bridges@bridges.torproject.org
+[1]: See proposals 199-bridgefinder-integration.txt at
+ https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/199-bridgefi…
+[2]: See XXX-social-bridge-distribution.txt at
+ https://gitweb.torproject.org/user/isis/bridgedb.git/blob/refs/heads/featur…
+[3]: https://metrics.torproject.org/formats.html#descriptortypes
+[4]: https://github.com/couchbase/couchbase-python-client#twisted-api
+[5]: https://twistedmatrix.com/documents/current/api/twisted.protocols.memcache.…
+[6]: http://stackoverflow.com/a/5162203
+[7]: http://findingscience.com/twisted/python/memcache/2012/06/09/txyam:-yet-ano…
+[8]: https://pypi.python.org/pypi/txredis
+[9]: https://github.com/fiorix/txredisapi
+[10]: https://github.com/andymccurdy/redis-py/
+[11]: http://degizmo.com/2010/03/22/getting-started-redis-and-python/
+[12]: http://www.dr-josiah.com/2012/03/why-we-didnt-use-bloom-filter.html
+[13]: http://redis.io/topics/data-types §"Strings"
+[14]: http://redis.io/commands/pexpireat
+[15]: http://redis.io/commands/evalsha
+[16]: http://www.restmq.com/
+[17]: https://www.mediawiki.org/wiki/Redis
[torspec/master] Merge branch 'db-spec_1' into bridgedb-database-improvements_1
by nickm@torproject.org 30 Jan '14
commit c995f6217a93e2927d049a578262670a6763c6b7
Merge: 2cb5c4c 4aee321
Author: Isis Lovecruft <isis@torproject.org>
Date: Thu Jan 30 16:33:57 2014 +0000
Merge branch 'db-spec_1' into bridgedb-database-improvements_1
proposals/XXX-bridgedb-database-improvements.txt | 260 ++++++++++++++++++++++
1 file changed, 260 insertions(+)
[torspec/master] Merge remote-tracking branch 'bridgedb/bdb-spec' into bridgedb/1606-bridgedb-spec
by nickm@torproject.org 30 Jan '14
commit 44876cd32c55c8a59649c68414e205624bb0468b
Merge: 2cb5c4c 38d3292
Author: Isis Lovecruft <isis@torproject.org>
Date: Thu Jan 30 16:51:09 2014 +0000
Merge remote-tracking branch 'bridgedb/bdb-spec' into bridgedb/1606-bridgedb-spec
bridgedb-spec.txt | 391 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 391 insertions(+)
[torspec/master] Move bridgedb.git:/doc/bridge-db-spec.txt → torspec.git:/bridgedb-spec.txt.
by nickm@torproject.org 30 Jan '14
commit 38d3292b94edcf4fbc52da05b95aa1420cad5a82
Author: Isis Lovecruft <isis@torproject.org>
Date: Thu Jan 30 16:48:00 2014 +0000
Move bridgedb.git:/doc/bridge-db-spec.txt → torspec.git:/bridgedb-spec.txt.
The git filter-branch command used on the bridgedb.git repo was:
$ git filter-branch -f --index-filter \
'git rm --cached -qr -- . && git reset -q $GIT_COMMIT -- doc/bridge-db-spec.txt' \
--prune-empty \
--parent-filter 'ruby /home/isis/scripts/git-rewrite-parents.rb $@' \
--tag-name-filter cat -- --all
---
bridgedb-spec.txt | 391 ++++++++++++++++++++++++++++++++++++++++++++++++
doc/bridge-db-spec.txt | 391 ------------------------------------------------
2 files changed, 391 insertions(+), 391 deletions(-)
diff --git a/bridgedb-spec.txt b/bridgedb-spec.txt
new file mode 100644
index 0000000..c897226
--- /dev/null
+++ b/bridgedb-spec.txt
@@ -0,0 +1,391 @@
+
+ BridgeDB specification
+
+ Karsten Loesing
+ Nick Mathewson
+
+0. Preliminaries
+
+ This document specifies how BridgeDB processes bridge descriptor files
+ to learn about new bridges, maintains persistent assignments of bridges
+ to distributors, and decides which bridges to give out upon user
+ requests.
+
+ Some of the decisions here may be suboptimal: this document is meant to
+ specify current behavior as of August 2013, not to specify ideal
+ behavior.
+
+1. Importing bridge network statuses and bridge descriptors
+
+ BridgeDB learns about bridges by parsing bridge network statuses,
+ bridge descriptors, and extra info documents as specified in Tor's
+ directory protocol. BridgeDB parses one bridge network status file
+ first and at least one bridge descriptor file and potentially one extra
+ info file afterwards.
+
+ BridgeDB scans its files on sighup.
+
+ BridgeDB does not validate signatures on descriptors or networkstatus
+ files: the operator needs to make sure that these documents have come
+ from a Tor instance that did the validation for us.
+
+1.1. Parsing bridge network statuses
+
+ Bridge network status documents contain the information of which bridges
+ are known to the bridge authority and which flags the bridge authority
+ assigns to them.
+ We expect bridge network statuses to contain at least the following two
+ lines for every bridge in the given order (format fully specified in Tor's
+ directory protocol):
+
+ "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort
+ SP DirPort NL
+ "a" SP address ":" port NL (no more than 8 instances)
+ "s" SP Flags NL
+
+ BridgeDB parses the identity and the publication timestamp from the "r"
+ line, the OR address(es) and ORPort(s) from the "a" line(s), and the
+ assigned flags from the "s" line, specifically checking the assignment
+ of the "Running" and "Stable" flags.
+ BridgeDB memorizes all bridges that have the Running flag as the set of
+ running bridges that can be given out to bridge users.
+ BridgeDB memorizes assigned flags if it wants to ensure that sets of
+ bridges given out should contain at least a given number of bridges
+ with these flags.
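A simplified sketch of pulling those "r"/"a"/"s" fields out of a status file
(real parsing does far more validation than this; the field handling here is
deliberately naive):

    def parse_networkstatus(lines):
        bridges, current = {}, None
        for line in lines:
            parts = line.split()
            if parts and parts[0] == 'r':
                # r nickname identity digest date time IP ORPort DirPort
                current = {'identity': parts[2],
                           'published': ' '.join(parts[4:6]),
                           'address': parts[6], 'or_port': int(parts[7]),
                           'or_addresses': [], 'flags': set()}
                bridges[parts[2]] = current
            elif parts and parts[0] == 'a' and current is not None:
                current['or_addresses'].append(parts[1])    # "address:port"
            elif parts and parts[0] == 's' and current is not None:
                current['flags'] = set(parts[1:])
        # Only bridges with the Running flag are candidates for distribution.
        return {k: b for k, b in bridges.items() if 'Running' in b['flags']}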
+
+1.2. Parsing bridge descriptors
+
+ BridgeDB learns about a bridge's most recent IP address and OR port
+ from parsing bridge descriptors.
+ In theory, both IP address and OR port of a bridge are also contained
+ in the "r" line of the bridge network status, so there is no mandatory
+ reason for parsing bridge descriptors. But the functionality described
+ in this section is still implemented in case we need data from the
+ bridge descriptor in the future.
+
+ Bridge descriptor files may contain one or more bridge descriptors.
+ We expect a bridge descriptor to contain at least the following lines in
+ the stated order:
+
+ "@purpose" SP purpose NL
+ "router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
+ "published" SP timestamp
+ ["opt" SP] "fingerprint" SP fingerprint NL
+ "router-signature" NL Signature NL
+
+ BridgeDB parses the purpose, IP, ORPort, nickname, and fingerprint
+ from these lines.
+ BridgeDB skips bridge descriptors if the fingerprint is not contained
+ in the bridge network status parsed earlier or if the bridge does not
+ have the Running flag.
+ BridgeDB discards bridge descriptors which have a different purpose
+ than "bridge". BridgeDB can be configured to only accept descriptors
+ with another purpose or not discard descriptors based on purpose at
+ all.
+ BridgeDB memorizes the IP addresses and OR ports of the remaining
+ bridges.
+ If there is more than one bridge descriptor with the same fingerprint,
+ BridgeDB memorizes the IP address and OR port of the most recently
+ parsed bridge descriptor.
+ If BridgeDB does not find a bridge descriptor for a bridge contained in
+ the bridge network status parsed before, it does not add that bridge
+ to the set of bridges to be given out to bridge users.
+
+1.3. Parsing extra-info documents
+
+ BridgeDB learns if a bridge supports a pluggable transport by parsing
+ extra-info documents.
+ Extra-info documents contain the name of the bridge (but only if it is
+ named), the bridge's fingerprint, the type of pluggable transport(s) it
+ supports, and the IP address and port number on which each transport
+ listens, respectively.
+
+ Extra-info documents may contain zero or more entries per bridge. We expect
+ an extra-info entry to contain the following lines in the stated order:
+
+ "extra-info" SP name SP fingerprint NL
+ "transport" SP transport SP IP ":" PORT ARGS NL
+
+ BridgeDB parses the fingerprint, transport type, IP address, port and any
+ arguments that are specified on these lines. BridgeDB skips the name. If
+ the fingerprint is invalid, BridgeDB skips the entry. BridgeDB memorizes
+ the transport type, IP address, port number, and any arguments that are
+ provided and then it assigns them to the corresponding bridge based on the
+ fingerprint. Arguments are comma-separated and are of the form k=v,k=v.
+ Bridges that do not have an associated extra-info entry are not invalid.
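A small sketch of splitting one "transport" line into its method, address,
port, and optional k=v arguments (the transport name and values below are
made up):

    def parse_transport_line(line):
        # e.g.  "transport examplept 198.51.100.4:443 k1=v1,k2=v2"
        parts = line.split()
        method, addrport = parts[1], parts[2]
        address, port = addrport.rsplit(':', 1)
        args = {}
        if len(parts) > 3:
            for kv in ' '.join(parts[3:]).split(','):
                k, _, v = kv.partition('=')
                args[k] = v
        return method, address, int(port), args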
+
+2. Assigning bridges to distributors
+
+ A "distributor" is a mechanism by which bridges are given (or not
+ given) to clients. The current distributors are "email", "https",
+ and "unallocated".
+
+ BridgeDB assigns bridges to distributors based on an HMAC hash of the
+ bridge's ID and a secret and makes these assignments persistent.
+ Persistence is achieved by using a database to map node ID to
+ distributor.
+ Each bridge is assigned to exactly one distributor (including
+ the "unallocated" distributor).
+ BridgeDB may be configured to support only a non-empty subset of the
+ distributors specified in this document.
+ BridgeDB may be configured to use different probabilities for assigning
+ new bridges to distributors.
+ BridgeDB does not change existing assignments of bridges to
+ distributors, even if probabilities for assigning bridges to
+ distributors change or distributors are disabled entirely.
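A minimal sketch of that hashed assignment; the secret, the relative weights,
and the bucketing scheme are illustrative assumptions (the real BridgeDB also
persists the result in its database, as described above):

    import hashlib
    import hmac

    SECRET = b'assignment-secret'                    # hypothetical HMAC key
    WEIGHTS = [('https', 10), ('email', 10), ('unallocated', 2)]

    def assign(bridge_id):
        h = hmac.new(SECRET, bridge_id.encode(), hashlib.sha1).digest()
        slot = int.from_bytes(h, 'big') % sum(w for _, w in WEIGHTS)
        for name, weight in WEIGHTS:
            if slot < weight:
                return name
            slot -= weight

    print(assign('BRIDGEFINGERPRINT01'))   # stable for a given bridge and secret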
+
+3. Giving out bridges upon requests
+
+ Upon receiving a client request, a BridgeDB distributor provides a
+ subset of the bridges assigned to it.
+ BridgeDB only gives out bridges that are contained in the most recently
+ parsed bridge network status and that have the Running flag set (see
+ Section 1).
+ BridgeDB may be configured to give out a different number of bridges
+ (typically 4) depending on the distributor.
+ BridgeDB may define an arbitrary number of rules. These rules may
+ specify the criteria by which a bridge is selected. Specifically,
+ the available rules restrict the IP address version, OR port number,
+ transport type, bridge relay flag, or country in which the bridge
+ should not be blocked.
+
+4. Selecting bridges to be given out based on IP addresses
+
+ BridgeDB may be configured to support one or more distributors which
+ give out bridges based on the requestor's IP address. Currently, this
+ is how the HTTPS distributor works.
+ The goal is to avoid handing out all the bridges to users in a similar
+ IP space and time.
+# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
+# to see if this section is missing relevant pieces from it. -KL
+
+ BridgeDB fixes the set of bridges to be returned for a defined time
+ period.
+ BridgeDB considers all IP addresses coming from the same /24 network
+ as the same IP address and returns the same set of bridges. From here on,
+ this non-unique address will be referred to as the IP address's 'area'.
+ BridgeDB divides the IP address space equally into a small number of
+# Note, changed term from "areas" to "disjoint clusters" -MF
+ disjoint clusters (typically 4) and returns different results for requests
+ coming from addresses that are placed into different clusters.
+# I found that BridgeDB is not strict in returning only bridges for a
+# given area. If a ring is empty, it considers the next one. Is this
+# expected behavior? -KL
+#
+# This does not appear to be the case, anymore. If a ring is empty, then
+# BridgeDB simply returns an empty set of bridges. -MF
+#
+# I also found that BridgeDB does not make the assignment to areas
+# persistent in the database. So, if we change the number of rings, it
+# will assign bridges to other rings. I assume this is okay? -KL
+ BridgeDB maintains a list of proxy IP addresses and returns the same
+ set of bridges to requests coming from these IP addresses.
+ The bridges returned to proxy IP addresses do not come from the same
+ set as those for the general IP address space.
+
+ BridgeDB can be configured to include bridge fingerprints in replies
+ along with bridge IP addresses and OR ports.
+ BridgeDB can be configured to display a CAPTCHA which the user must solve
+ prior to returning the requested bridges.
+
+ The current algorithm is as follows. An IP-based distributor splits
+ the bridges uniformly into a set of "rings" based on an HMAC of their
+ ID. Some of these rings are "area" rings for parts of IP space; some
+ are "category" rings for categories of IPs (like proxies). When a
+ client makes a request from an IP, the distributor first sees whether
+ the IP is in one of the categories it knows. If so, the distributor
+ returns an IP from the category rings. If not, the distributor
+ maps the IP into an "area" (that is, a /24), and then uses an HMAC to
+ map the area to one of the area rings.
+
+ When the IP-based distributor determines from which area ring it is handing
+ out bridges, it identifies which rules it will use to choose appropriate
+ bridges. Using this information, it searches its cache of rings for one
+ that already adheres to the criteria specified in this request. If one
+ exists, then BridgeDB maps the current "epoch" (N-hour period) and the
+ IP's area (/24) to a point on the ring based on HMAC, and hands out
+ bridges at that point. If a ring does not already exist which satisfies this
+ request, then a new ring is created and filled with bridges that fulfill
+ the requirements. This ring is then used to select bridges as described.
+
+ "Mapping X to Y based on an HMAC" above means one of the following:
+ - We keep all of the elements of Y in some order, with a mapping
+ from all 160-bit strings to positions in Y.
+ - We take an HMAC of X using some fixed string as a key to get a
+ 160-bit value. We then map that value to the next position of Y.
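One way to realise that mapping is to hash X to a 160-bit integer and bisect
into a sorted ring of positions; the key string and the choice to position the
ring elements by hashing them as well are assumptions of this sketch:

    import bisect
    import hashlib
    import hmac

    KEY = b'ring-key'     # the "fixed string" used as the HMAC key (assumed)

    def position(x):
        return int(hmac.new(KEY, x.encode(), hashlib.sha1).hexdigest(), 16)

    def map_to_ring(x, ring_items):
        ring = sorted((position(item), item) for item in ring_items)
        points = [p for p, _ in ring]
        i = bisect.bisect_left(points, position(x)) % len(ring)  # "next position"
        return ring[i][1]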
+
+ When giving out bridges based on a position in a ring, BridgeDB first
+ looks at flag requirements and port requirements. For example,
+ BridgeDB may be configured to "Give out at least L bridges with port
+ 443, and at least M bridges with Stable, and at most N bridges
+ total." To do this, BridgeDB combines to the results:
+ - The first L bridges in the ring after the position that have the
+ port 443, and
+ - The first M bridges in the ring after the position that have the
+ flag stable and that it has not already decided to give out, and
+ - The first N-L-M bridges in the ring after the position that it
+ has not already decided to give out.
+
+ After BridgeDB selects appropriate bridges to return to the requestor, it
+ then prioritises the ordering of them in a list so that as many criteria
+ are fulfilled as possible within the first few bridges. This list is then
+ truncated to N bridges, if possible. N is currently defined as a
+ piecewise function of the number of bridges in the ring such that:
+
+ /
+ | 1, if len(ring) < 20
+ |
+ N = | 2, if 20 <= len(ring) <= 100
+ |
+ | 3, if 100 <= len(ring)
+ \
+
+ The bridges in this sublist, containing no more than N bridges, are the
+ bridges returned to the requestor.
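Read as a function (taking the top-most matching branch for the boundary case
of exactly 100 bridges), that definition of N is simply:

    def num_bridges_to_return(ring_len):
        if ring_len < 20:
            return 1
        elif ring_len <= 100:
            return 2
        else:
            return 3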
+
+5. Selecting bridges to be given out based on email addresses
+
+ BridgeDB can be configured to support one or more distributors that
+ give out bridges based on the requestor's email address. Currently,
+ this is how the email distributor works.
+ The goal is to bootstrap based on one or more popular email services'
+ sybil prevention algorithms.
+# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
+# to see if this section is missing relevant pieces from it. -KL
+
+ BridgeDB rejects email addresses containing characters other than the
+ ones that RFC2822 allows.
+ BridgeDB may be configured to reject email addresses containing other
+ characters it might not process correctly.
+# I don't think we do this, is it worthwhile? -MF
+ BridgeDB rejects email addresses coming from domains other than a
+ configured set of permitted domains.
+ BridgeDB normalizes email addresses by removing "." characters and by
+ removing parts after the first "+" character.
+ BridgeDB can be configured to discard requests that do not have the
+ value "pass" in their X-DKIM-Authentication-Result header or does not
+ have this header. The X-DKIM-Authentication-Result header is set by
+ the incoming mail stack that needs to check DKIM authentication.
+
+ BridgeDB does not return a new set of bridges to the same email address
+ until a given time period (typically a few hours) has passed.
+# Why don't we fix the bridges we give out for a global 3-hour time period
+# like we do for IP addresses? This way we could avoid storing email
+# addresses. -KL
+# The 3-hour value is probably much too short anyway. If we take longer
+# time values, then people get new bridges when bridges show up, as
+# opposed to then we decide to reset the bridges we give them. (Yes, this
+# problem exists for the IP distributor). -NM
+# I'm afraid I don't fully understand what you mean here. Can you
+# elaborate? -KL
+#
+# Assuming an average churn rate, if we use short time periods, then a
+# requestor will receive new bridges based on rate-limiting and will (likely)
+# eventually work their way around the ring; eventually exhausting all bridges
+# available to them from this distributor. If we use a longer time period,
+# then each time the period expires there will be more bridges in the ring
+# thus reducing the likelihood of all bridges being blocked and increasing
+# the time and effort required to enumerate all bridges. (This is my
+# understanding, not from Nick) -MF
+# Also, we presently need the cache to prevent replays and because if a user
+# sent multiple requests with different criteria in each then we would leak
+# additional bridges otherwise. -MF
+ BridgeDB can be configured to include bridge fingerprints in replies
+ along with bridge IP addresses and OR ports.
+ BridgeDB can be configured to sign all replies using a PGP signing key.
+ BridgeDB periodically discards old email-address-to-bridge mappings.
+ BridgeDB rejects too frequent email requests coming from the same
+ normalized address.
+
+ To map previously unseen email addresses to a set of bridges, BridgeDB
+ proceeds as follows:
+ - It normalizes the email address as above, by stripping out dots,
+ removing all of the localpart after the +, and putting it all
+ in lowercase. (Example: "John.Doe+bridges@example.COM" becomes
+ "johndoe@example.com".)
+ - It maps an HMAC of the normalized address to a position on its ring
+ of bridges.
+ - It hands out bridges starting at that position, based on the
+ port/flag requirements, as specified at the end of section 4.
+
+ See section 4 for the details of how bridges are selected from the ring
+ and returned to the requestor.
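The normalisation steps above fit in a few lines; this sketch applies the dot
and "+" handling to the localpart only, then lowercases the whole address:

    def normalize_email(addr):
        localpart, _, domain = addr.partition('@')
        localpart = localpart.split('+', 1)[0].replace('.', '')
        return ('%s@%s' % (localpart, domain)).lower()

    assert normalize_email('John.Doe+bridges@example.COM') == 'johndoe@example.com'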
+
+6. Selecting unallocated bridges to be stored in file buckets
+
+# Kaner should have a look at this section. -NM
+
+ BridgeDB can be configured to reserve a subset of bridges and not give
+ them out via one of the distributors.
+ BridgeDB assigns reserved bridges to one or more file buckets of fixed
+ sizes and writes these file buckets to disk for manual distribution.
+ BridgeDB ensures that a file bucket always contains the requested
+ number of running bridges.
+ If the requested number of bridges in a file bucket is reduced or the
+ file bucket is not required anymore, the unassigned bridges are
+ returned to the reserved set of bridges.
+ If a bridge stops running, BridgeDB replaces it with another bridge
+ from the reserved set of bridges.
+# I'm not sure if there's a design bug in file buckets. What happens if
+# we add a bridge X to file bucket A, and X goes offline? We would add
+# another bridge Y to file bucket A. OK, but what if A comes back? We
+# cannot put it back in file bucket A, because it's full. Are we going to
+# add it to a different file bucket? Doesn't that mean that most bridges
+# will be contained in most file buckets over time? -KL
+#
+# This should be handled the same as if the file bucket is reduced in size.
+# If X returns, then it should be added to the appropriate distributor. -MF
+
+7. Displaying Bridge Information
+
+ After bridges are selected using one of the methods described in
+ Sections 4 - 6, they are output in one of two formats. Bridges are
+ formatted as:
+
+ <address:port> NL
+
+ Pluggable transports are formatted as:
+
+ <transportname> SP <address:port> [SP arglist] NL
+
+ where arglist is an optional space-separated list of key-value pairs in
+ the form of k=v.
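A small formatting helper matching those two reply formats (the address,
port, transport name, and arguments are placeholders):

    def bridge_line(address, port, transport=None, args=None):
        line = '%s:%d' % (address, port)
        if transport:
            argstr = ' '.join('%s=%s' % kv for kv in (args or {}).items())
            line = ('%s %s %s' % (transport, line, argstr)).strip()
        return line

    print(bridge_line('203.0.113.7', 443))
    print(bridge_line('203.0.113.7', 443, 'examplept', {'k1': 'v1'}))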
+
+ Previously, each line was prepended with the "bridge" keyword, such as
+
+ "bridge" SP <address:port> NL
+
+ "bridge" SP <transportname> SP <address:port> [SP arglist] NL
+
+# We don't do this anymore because Vidalia and TorLauncher don't expect it.
+# See the commit message for b70347a9c5fd769c6d5d0c0eb5171ace2999a736.
+
+8. Writing bridge assignments for statistics
+
+ BridgeDB can be configured to write bridge assignments to disk for
+ statistical analysis.
+ The start of a bridge assignment is marked by the following line:
+
+ "bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS NL
+
+ YYYY-MM-DD HH:MM:SS is the time, in UTC, when BridgeDB has completed
+ loading new bridges and assigning them to distributors.
+
+ For every running bridge there is a line with the following format:
+
+ fingerprint SP distributor (SP key "=" value)* NL
+
+ The distributor is one out of "email", "https", or "unallocated".
+
+ Both "email" and "https" distributors support adding keys for "port",
+ "flag" and "transport". Respectively, the port number, flag name, and
+ transport types are the values. These are used to indicate that
+ a bridge matches certain port, flag, or transport criteria of requests.
+
+ The "https" distributor also allows the key "ring" with a number as
+ value to indicate to which IP address area the bridge is returned.
+
+ The "unallocated" distributor allows the key "bucket" with the file
+ bucket name as value to indicate which file bucket a bridge is assigned
+ to.
+
diff --git a/doc/bridge-db-spec.txt b/doc/bridge-db-spec.txt
deleted file mode 100644
index c897226..0000000
--- a/doc/bridge-db-spec.txt
+++ /dev/null
@@ -1,391 +0,0 @@
-
- BridgeDB specification
-
- Karsten Loesing
- Nick Mathewson
-
-0. Preliminaries
-
- This document specifies how BridgeDB processes bridge descriptor files
- to learn about new bridges, maintains persistent assignments of bridges
- to distributors, and decides which bridges to give out upon user
- requests.
-
- Some of the decisions here may be suboptimal: this document is meant to
- specify current behavior as of August 2013, not to specify ideal
- behavior.
-
-1. Importing bridge network statuses and bridge descriptors
-
- BridgeDB learns about bridges by parsing bridge network statuses,
- bridge descriptors, and extra info documents as specified in Tor's
- directory protocol. BridgeDB parses one bridge network status file
- first and at least one bridge descriptor file and potentially one extra
- info file afterwards.
-
- BridgeDB scans its files on sighup.
-
- BridgeDB does not validate signatures on descriptors or networkstatus
- files: the operator needs to make sure that these documents have come
- from a Tor instance that did the validation for us.
-
-1.1. Parsing bridge network statuses
-
- Bridge network status documents contain the information of which bridges
- are known to the bridge authority and which flags the bridge authority
- assigns to them.
- We expect bridge network statuses to contain at least the following two
- lines for every bridge in the given order (format fully specified in Tor's
- directory protocol):
-
- "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort
- SP DirPort NL
- "a" SP address ":" port NL (no more than 8 instances)
- "s" SP Flags NL
-
- BridgeDB parses the identity and the publication timestamp from the "r"
- line, the OR address(es) and ORPort(s) from the "a" line(s), and the
- assigned flags from the "s" line, specifically checking the assignment
- of the "Running" and "Stable" flags.
- BridgeDB memorizes all bridges that have the Running flag as the set of
- running bridges that can be given out to bridge users.
- BridgeDB memorizes assigned flags if it wants to ensure that sets of
- bridges given out should contain at least a given number of bridges
- with these flags.
-
-1.2. Parsing bridge descriptors
-
- BridgeDB learns about a bridge's most recent IP address and OR port
- from parsing bridge descriptors.
- In theory, both IP address and OR port of a bridge are also contained
- in the "r" line of the bridge network status, so there is no mandatory
- reason for parsing bridge descriptors. But the functionality described
- in this section is still implemented in case we need data from the
- bridge descriptor in the future.
-
- Bridge descriptor files may contain one or more bridge descriptors.
- We expect a bridge descriptor to contain at least the following lines in
- the stated order:
-
- "@purpose" SP purpose NL
- "router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
- "published" SP timestamp
- ["opt" SP] "fingerprint" SP fingerprint NL
- "router-signature" NL Signature NL
-
- BridgeDB parses the purpose, IP, ORPort, nickname, and fingerprint
- from these lines.
- BridgeDB skips bridge descriptors if the fingerprint is not contained
- in the bridge network status parsed earlier or if the bridge does not
- have the Running flag.
- BridgeDB discards bridge descriptors which have a different purpose
- than "bridge". BridgeDB can be configured to only accept descriptors
- with another purpose or not discard descriptors based on purpose at
- all.
- BridgeDB memorizes the IP addresses and OR ports of the remaining
- bridges.
- If there is more than one bridge descriptor with the same fingerprint,
- BridgeDB memorizes the IP address and OR port of the most recently
- parsed bridge descriptor.
- If BridgeDB does not find a bridge descriptor for a bridge contained in
- the bridge network status parsed before, it does not add that bridge
- to the set of bridges to be given out to bridge users.
-
-1.3. Parsing extra-info documents
-
- BridgeDB learns if a bridge supports a pluggable transport by parsing
- extra-info documents.
- Extra-info documents contain the name of the bridge (but only if it is
- named), the bridge's fingerprint, the type of pluggable transport(s) it
- supports, and the IP address and port number on which each transport
- listens, respectively.
-
- Extra-info documents may contain zero or more entries per bridge. We expect
- an extra-info entry to contain the following lines in the stated order:
-
- "extra-info" SP name SP fingerprint NL
- "transport" SP transport SP IP ":" PORT ARGS NL
-
- BridgeDB parses the fingerprint, transport type, IP address, port and any
- arguments that are specified on these lines. BridgeDB skips the name. If
- the fingerprint is invalid, BridgeDB skips the entry. BridgeDB memorizes
- the transport type, IP address, port number, and any arguments that are
- provided and then it assigns them to the corresponding bridge based on the
- fingerprint. Arguments are comma-separated and are of the form k=v,k=v.
- Bridges that do not have an associated extra-info entry are not invalid.
-
-2. Assigning bridges to distributors
-
- A "distributor" is a mechanism by which bridges are given (or not
- given) to clients. The current distributors are "email", "https",
- and "unallocated".
-
- BridgeDB assigns bridges to distributors based on an HMAC hash of the
- bridge's ID and a secret and makes these assignments persistent.
- Persistence is achieved by using a database to map node ID to
- distributor.
- Each bridge is assigned to exactly one distributor (including
- the "unallocated" distributor).
- BridgeDB may be configured to support only a non-empty subset of the
- distributors specified in this document.
- BridgeDB may be configured to use different probabilities for assigning
- new bridges to distributors.
- BridgeDB does not change existing assignments of bridges to
- distributors, even if probabilities for assigning bridges to
- distributors change or distributors are disabled entirely.
-
-3. Giving out bridges upon requests
-
- Upon receiving a client request, a BridgeDB distributor provides a
- subset of the bridges assigned to it.
- BridgeDB only gives out bridges that are contained in the most recently
- parsed bridge network status and that have the Running flag set (see
- Section 1).
- BridgeDB may be configured to give out a different number of bridges
- (typically 4) depending on the distributor.
- BridgeDB may define an arbitrary number of rules. These rules may
- specify the criteria by which a bridge is selected. Specifically,
- the available rules restrict the IP address version, OR port number,
- transport type, bridge relay flag, or country in which the bridge
- should not be blocked.
-
-4. Selecting bridges to be given out based on IP addresses
-
- BridgeDB may be configured to support one or more distributors which
- give out bridges based on the requestor's IP address. Currently, this
- is how the HTTPS distributor works.
- The goal is to avoid handing out all the bridges to users in a similar
- IP space and time.
-# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
-# to see if this section is missing relevant pieces from it. -KL
-
- BridgeDB fixes the set of bridges to be returned for a defined time
- period.
- BridgeDB considers all IP addresses coming from the same /24 network
- as the same IP address and returns the same set of bridges. From here on,
- this non-unique address will be referred to as the IP address's 'area'.
- BridgeDB divides the IP address space equally into a small number of
-# Note, changed term from "areas" to "disjoint clusters" -MF
- disjoint clusters (typically 4) and returns different results for requests
- coming from addresses that are placed into different clusters.
-# I found that BridgeDB is not strict in returning only bridges for a
-# given area. If a ring is empty, it considers the next one. Is this
-# expected behavior? -KL
-#
-# This does not appear to be the case, anymore. If a ring is empty, then
-# BridgeDB simply returns an empty set of bridges. -MF
-#
-# I also found that BridgeDB does not make the assignment to areas
-# persistent in the database. So, if we change the number of rings, it
-# will assign bridges to other rings. I assume this is okay? -KL
- BridgeDB maintains a list of proxy IP addresses and returns the same
- set of bridges to requests coming from these IP addresses.
- The bridges returned to proxy IP addresses do not come from the same
- set as those for the general IP address space.
-
- BridgeDB can be configured to include bridge fingerprints in replies
- along with bridge IP addresses and OR ports.
- BridgeDB can be configured to display a CAPTCHA which the user must solve
- prior to returning the requested bridges.
-
- The current algorithm is as follows. An IP-based distributor splits
- the bridges uniformly into a set of "rings" based on an HMAC of their
- ID. Some of these rings are "area" rings for parts of IP space; some
- are "category" rings for categories of IPs (like proxies). When a
- client makes a request from an IP, the distributor first sees whether
- the IP is in one of the categories it knows. If so, the distributor
- returns an IP from the category rings. If not, the distributor
- maps the IP into an "area" (that is, a /24), and then uses an HMAC to
- map the area to one of the area rings.
-
- When the IP-based distributor determines from which area ring it is handing
- out bridges, it identifies which rules it will use to choose appropriate
- bridges. Using this information, it searches its cache of rings for one
- that already adheres to the criteria specified in this request. If one
- exists, then BridgeDB maps the current "epoch" (N-hour period) and the
- IP's area (/24) to a point on the ring based on HMAC, and hands out
- bridges at that point. If a ring does not already exist which satisfies this
- request, then a new ring is created and filled with bridges that fulfill
- the requirements. This ring is then used to select bridges as described.
-
- "Mapping X to Y based on an HMAC" above means one of the following:
- - We keep all of the elements of Y in some order, with a mapping
- from all 160-bit strings to positions in Y.
- - We take an HMAC of X using some fixed string as a key to get a
- 160-bit value. We then map that value to the next position of Y.
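
The following sketch illustrates this mapping, assuming the ring holds bridge
identifiers and positions are the hex digests of HMAC-SHA1 values (illustrative
only, not BridgeDB's implementation):

  import bisect
  import hashlib
  import hmac

  def position(key, value):
      """160-bit position as a fixed-width hex string (key and value are bytes)."""
      return hmac.new(key, value, hashlib.sha1).hexdigest()

  class Ring(object):
      def __init__(self, key):
          self.key = key
          self.positions = []      # sorted positions
          self.bridges = {}        # position -> bridge identifier

      def insert(self, bridge_id):
          pos = position(self.key, bridge_id)
          bisect.insort(self.positions, pos)
          self.bridges[pos] = bridge_id

      def bridges_after(self, x, n):
          """Up to n bridges clockwise from the position that x maps to."""
          if not self.positions:
              return []
          start = bisect.bisect_left(self.positions, position(self.key, x))
          return [self.bridges[self.positions[(start + i) % len(self.positions)]]
                  for i in range(min(n, len(self.positions)))]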
-
- When giving out bridges based on a position in a ring, BridgeDB first
- looks at flag requirements and port requirements. For example,
- BridgeDB may be configured to "Give out at least L bridges with port
- 443, and at least M bridges with Stable, and at most N bridges
- total." To do this, BridgeDB combines the following results:
- - The first L bridges in the ring after the position that have the
- port 443, and
- - The first M bridges in the ring after the position that have the
- flag stable and that it has not already decided to give out, and
- - The first N-L-M bridges in the ring after the position that it
- has not already decided to give out.
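
A sketch of that combination rule, assuming a ring_walk() generator that yields
each bridge once, clockwise from the selected position, and bridge objects with
illustrative or_port and flags attributes:

  def select_bridges(ring_walk, n, l, m):
      """Combine the three rules above into one answer of at most n bridges."""
      chosen = []
      def take(count, wanted):
          added = 0
          for bridge in ring_walk():
              if added >= count:
                  break
              if bridge not in chosen and wanted(bridge):
                  chosen.append(bridge)
                  added += 1
      take(l, lambda b: b.or_port == 443)         # first L with port 443
      take(m, lambda b: 'Stable' in b.flags)      # first M with the Stable flag
      take(n - l - m, lambda b: True)             # first N-L-M others
      return chosen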
-
- After BridgeDB selects appropriate bridges to return to the requestor, it
- then orders them in a list so that as many criteria as possible
- are fulfilled within the first few bridges. This list is then
- truncated to N bridges, if possible. N is currently defined as a
- piecewise function of the number of bridges in the ring such that:
-
- /
- | 1, if len(ring) < 20
- |
- N = | 2, if 20 <= len(ring) <= 100
- |
- | 3, if 100 < len(ring)
- \
-
- The bridges in this sublist, containing no more than N bridges, are the
- bridges returned to the requestor.
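
For clarity, the piecewise definition of N above can be written as a small
function:

  def answer_size(ring_size):
      if ring_size < 20:
          return 1
      elif ring_size <= 100:
          return 2
      else:
          return 3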
-
-5. Selecting bridges to be given out based on email addresses
-
- BridgeDB can be configured to support one or more distributors that
- give out bridges based on the requestor's email address. Currently,
- this is how the email distributor works.
- The goal is to bootstrap based on one or more popular email services'
- Sybil prevention algorithms.
-# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
-# to see if this section is missing relevant pieces from it. -KL
-
- BridgeDB rejects email addresses containing characters other than
- those allowed by RFC 2822.
- BridgeDB may be configured to reject email addresses containing other
- characters it might not process correctly.
-# I don't think we do this, is it worthwhile? -MF
- BridgeDB rejects email addresses coming from other domains than a
- configured set of permitted domains.
- BridgeDB normalizes email addresses by removing "." characters and by
- removing parts after the first "+" character.
- BridgeDB can be configured to discard requests that do not have the
- value "pass" in their X-DKIM-Authentication-Result header or that do
- not have this header at all. The X-DKIM-Authentication-Result header is
- set by the incoming mail stack, which needs to check DKIM authentication.
-
- BridgeDB does not return a new set of bridges to the same email address
- until a given time period (typically a few hours) has passed.
-# Why don't we fix the bridges we give out for a global 3-hour time period
-# like we do for IP addresses? This way we could avoid storing email
-# addresses. -KL
-# The 3-hour value is probably much too short anyway. If we take longer
-# time values, then people get new bridges when bridges show up, as
-# opposed to then we decide to reset the bridges we give them. (Yes, this
-# problem exists for the IP distributor). -NM
-# I'm afraid I don't fully understand what you mean here. Can you
-# elaborate? -KL
-#
-# Assuming an average churn rate, if we use short time periods, then a
-# requestor will receive new bridges based on rate-limiting and will (likely)
-# eventually work their way around the ring; eventually exhausting all bridges
-# available to them from this distributor. If we use a longer time period,
-# then each time the period expires there will be more bridges in the ring
-# thus reducing the likelihood of all bridges being blocked and increasing
-# the time and effort required to enumerate all bridges. (This is my
-# understanding, not from Nick) -MF
-# Also, we presently need the cache to prevent replays and because if a user
-# sent multiple requests with different criteria in each then we would leak
-# additional bridges otherwise. -MF
- BridgeDB can be configured to include bridge fingerprints in replies
- along with bridge IP addresses and OR ports.
- BridgeDB can be configured to sign all replies using a PGP signing key.
- BridgeDB periodically discards old email-address-to-bridge mappings.
- BridgeDB rejects too frequent email requests coming from the same
- normalized address.
-
- To map previously unseen email addresses to a set of bridges, BridgeDB
- proceeds as follows:
- - It normalizes the email address as above, by stripping out dots,
- removing all of the localpart after the +, and putting it all
- in lowercase. (Example: "John.Doe+bridges(a)example.COM" becomes
- "johndoe(a)example.com".)
- - It maps an HMAC of the normalized address to a position on its ring
- of bridges.
- - It hands out bridges starting at that position, based on the
- port/flag requirements, as specified at the end of section 4.
-
- See section 4 for the details of how bridges are selected from the ring
- and returned to the requestor.
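
A sketch of the normalization and mapping steps, using HMAC-SHA1 as in
Section 4 (the function names are illustrative, and the key is a bytes secret):

  import hashlib
  import hmac

  def normalize_email(address):
      localpart, domain = address.lower().split('@', 1)
      localpart = localpart.split('+', 1)[0].replace('.', '')
      return '%s@%s' % (localpart, domain)

  def email_position(key, address):
      """160-bit ring position for the normalized form of an address."""
      return hmac.new(key, normalize_email(address).encode(), hashlib.sha1).hexdigest()

  # normalize_email('John.Doe+bridges@example.COM') == 'johndoe@example.com'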
-
-6. Selecting unallocated bridges to be stored in file buckets
-
-# Kaner should have a look at this section. -NM
-
- BridgeDB can be configured to reserve a subset of bridges and not give
- them out via one of the distributors.
- BridgeDB assigns reserved bridges to one or more file buckets of fixed
- sizes and writes these file buckets to disk for manual distribution.
- BridgeDB ensures that a file bucket always contains the requested
- number of running bridges.
- If the requested number of bridges in a file bucket is reduced or the
- file bucket is not required anymore, the unassigned bridges are
- returned to the reserved set of bridges.
- If a bridge stops running, BridgeDB replaces it with another bridge
- from the reserved set of bridges.
-# I'm not sure if there's a design bug in file buckets. What happens if
-# we add a bridge X to file bucket A, and X goes offline? We would add
-# another bridge Y to file bucket A. OK, but what if X comes back? We
-# cannot put it back in file bucket A, because it's full. Are we going to
-# add it to a different file bucket? Doesn't that mean that most bridges
-# will be contained in most file buckets over time? -KL
-#
-# This should be handled the same as if the file bucket is reduced in size.
-# If X returns, then it should be added to the appropriate distributor. -MF
-
-7. Displaying Bridge Information
-
- After bridges are selected using one of the methods described in
- Sections 4 - 6, they are output in one of two formats. Bridges are
- formatted as:
-
- <address:port> NL
-
- Pluggable transports are formatted as:
-
- <transportname> SP <address:port> [SP arglist] NL
-
- where arglist is an optional space-separated list of key-value pairs in
- the form of k=v.
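
A small formatter illustrating both output formats (illustrative code, not
taken from BridgeDB):

  def format_bridge_line(address, port, transport=None, args=None):
      addrport = '%s:%d' % (address, port)
      if transport is None:
          return addrport
      arglist = (' ' + ' '.join('%s=%s' % kv for kv in args.items())) if args else ''
      return '%s %s%s' % (transport, addrport, arglist)

  # format_bridge_line('203.0.113.5', 443)            -> '203.0.113.5:443'
  # format_bridge_line('203.0.113.5', 443, 'obfs3')   -> 'obfs3 203.0.113.5:443'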
-
- Previously, each line was prepended with the "bridge" keyword, such as
-
- "bridge" SP <address:port> NL
-
- "bridge" SP <transportname> SP <address:port> [SP arglist] NL
-
-# We don't do this anymore because Vidalia and TorLauncher don't expect it.
-# See the commit message for b70347a9c5fd769c6d5d0c0eb5171ace2999a736.
-
-8. Writing bridge assignments for statistics
-
- BridgeDB can be configured to write bridge assignments to disk for
- statistical analysis.
- The start of a bridge assignment is marked by the following line:
-
- "bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS NL
-
- YYYY-MM-DD HH:MM:SS is the time, in UTC, when BridgeDB has completed
- loading new bridges and assigning them to distributors.
-
- For every running bridge there is a line with the following format:
-
- fingerprint SP distributor (SP key "=" value)* NL
-
- The distributor is one out of "email", "https", or "unallocated".
-
- Both "email" and "https" distributors support adding keys for "port",
- "flag", and "transport", whose values are the port number, flag name,
- and transport type, respectively. These are used to indicate that
- a bridge matches certain port, flag, or transport criteria in requests.
-
- The "https" distributor also allows the key "ring" with a number as
- value to indicate to which IP address area the bridge is returned.
-
- The "unallocated" distributor allows the key "bucket" with the file
- bucket name as value to indicate which file bucket a bridge is assigned
- to.
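
A sketch of a writer for this file format (the function and its arguments are
illustrative, not BridgeDB's actual interface):

  from datetime import datetime, timezone

  def write_assignments(path, assignments):
      """assignments: iterable of (fingerprint, distributor, dict_of_keys)."""
      now = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
      with open(path, 'a') as fh:                 # append one block per reload
          fh.write('bridge-pool-assignment %s\n' % now)
          for fingerprint, distributor, keys in assignments:
              extra = ''.join(' %s=%s' % (k, v) for k, v in sorted(keys.items()))
              fh.write('%s %s%s\n' % (fingerprint, distributor, extra))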
-
[torspec/master] Merge remote-tracking branch 'isis/bridgedb-database-improvements_1'
by nickm@torproject.org 30 Jan '14
commit e645344de84fdb0f7d438aa3a34d3406742a7310
Merge: 44876cd c995f62
Author: Nick Mathewson <nickm(a)torproject.org>
Date: Thu Jan 30 16:00:12 2014 -0500
Merge remote-tracking branch 'isis/bridgedb-database-improvements_1'
proposals/XXX-bridgedb-database-improvements.txt | 260 ++++++++++++++++++++++
1 file changed, 260 insertions(+)
commit 2eec5b4e3e073a2a27d51e2e2f8ab1fe752ee65d
Author: Nick Mathewson <nickm(a)torproject.org>
Date: Thu Jan 30 16:01:17 2014 -0500
Give proposal 226 a number
---
proposals/000-index.txt | 2 +
proposals/226-bridgedb-database-improvements.txt | 258 +++++++++++++++++++++
proposals/XXX-bridgedb-database-improvements.txt | 260 ----------------------
3 files changed, 260 insertions(+), 260 deletions(-)
diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index 0d79a82..96deb8d 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -146,6 +146,7 @@ Proposals by number:
223 Ace: Improved circuit-creation key exchange [OPEN]
224 Next-Generation Hidden Services in Tor [DRAFT]
225 Strawman proposal: commit-and-reveal shared rng [DRAFT]
+226 "Scalability and Stability Improvements to BridgeDB: Switching to a Distributed Database System and RDBMS" [OPEN]
Proposals by status:
@@ -194,6 +195,7 @@ Proposals by status:
212 Increase Acceptable Consensus Age [for 0.2.4.x+]
215 Let the minimum consensus method change with time
223 Ace: Improved circuit-creation key exchange
+ 226 "Scalability and Stability Improvements to BridgeDB: Switching to a Distributed Database System and RDBMS"
ACCEPTED:
140 Provide diffs between consensuses
147 Eliminate the need for v2 directories in generating v3 directories [for 0.2.4.x]
diff --git a/proposals/226-bridgedb-database-improvements.txt b/proposals/226-bridgedb-database-improvements.txt
new file mode 100644
index 0000000..d52f7f2
--- /dev/null
+++ b/proposals/226-bridgedb-database-improvements.txt
@@ -0,0 +1,258 @@
+Filename: 226-bridgedb-database-improvements.txt
+Title: "Scalability and Stability Improvements to BridgeDB: Switching to a
+ Distributed Database System and RDBMS"
+Author: Isis Agora Lovecruft
+Created: 12 Oct 2013
+Related Proposals: XXX-social-bridge-distribution.txt
+Status: Open
+
+* I. Overview
+
+ BridgeDB is Tor's Bridge Distribution system, which currently has two major
+ Bridge Distribution mechanisms: the HTTPS Distributor and an Email
+ Distributor. [0]
+
+ BridgeDB is written largely in Twisted Python, and uses Python2's builtin
+ sqlite3 as its database backend. Unfortunately, this backend system is
+ already showing strain through increased query times, and sqlite's
+ memory usage is not up to par with modern, more efficient NoSQL databases.
+
+ In order to better facilitate the implementation of newer, more complex
+ Bridge Distribution mechanisms, several improvements should be made to the
+ underlying database system of BridgeDB. Additionally, I propose that a
+ clear distinction in terms, as well as a modularisation of the codebase, be
+ drawn between the mechanisms for Bridge Distribution and the backend
+ Bridge Database (BridgeDB) storage system.
+
+ This proposal covers the design and implementation of a scalable NoSQL ―
+ Document-Based and Key-Value Relational ― database backend for storing data
+ on Tor Bridge relays, in an efficient manner that is amenable to
+ interfacing with the Twisted Python asynchronous networking code of current
+ and future Bridge Distribution mechanisms.
+
+* II. Terminology
+
+ BridgeDistributor := A program which decides when and how to hand out
+ information on a Tor Bridge relay, and to whom.
+
+ BridgeDB := The backend system of databases and object-relational mapping
+ servers, which interfaces with the BridgeDistributor in order
+ to hand out bridges to clients, and to obtain and process new,
+ incoming ``@type bridge-server-descriptors``,
+ ``@type bridge-networkstatus`` documents, and
+ ``@type bridge-extrainfo`` descriptors. [3]
+
+ BridgeFinder := A client-side program for an Onion Proxy (OP) which handles
+ interfacing with a BridgeDistributor in order to obtain new
+ Bridge relays for a client. A BridgeFinder also interfaces
+ with a local Tor Controller (such as TorButton or ARM) to
+ handle automatic, transparent Bridge configuration (no more
+ copy+pasting into a torrc) without being given any
+ additional privileges over the Tor process, [1] and relies
+ on the Tor Controller to interface with the user for
+ control input and displaying up-to-date information
+ regarding available Bridges, Pluggable Transport methods,
+ and potentially Invite Tickets and Credits (a cryptographic
+ currency without fiat value which is generated
+ automatically by clients whose Bridges remain largely
+ uncensored, and is used to purchase new Bridges), should a
+ Social Bridge Distributor be implemented. [2]
+
+* III. Databases
+** III.A. Scalability Requirements
+
+ Databases SHOULD be implemented in a manner which is amenable to using a
+ distributed storage system; this is necessary because many potential
+ datatypes required by future BridgeDistributors MUST be stored permanently.
+ For example, in the designs for the Social Bridge Distributor, the list of
+ hash digests of spent Credits, and the list of hash digests of redeemed
+ Invite Tickets MUST be stored forever to prevent either from being replayed
+ ― or double-spent ― by a malicious user who wishes to block bridges faster.
+ Designing the BridgeDB backend system such that additional nodes may be
+ added in the future will allow the system to freely scale in relation to
+ the storage requirements of future BridgeDistributors.
+
+ Additionally, requiring that the implementation allow for distributed
+ database backends promotes modularisation of the components of BridgeDB, such
+ that BridgeDistributors can be separated from the backend storage system,
+ BridgeDB, as all queries will be issued through a simplified, common API,
+ regardless of the number of nodes in the system, or the design of future
+ BridgeDistributors.
+
+*** 1. Distributed Database System
+
+ A distributed database system SHOULD be used for BridgeDB, in order to
+ scale resources as the number of Tor bridge users grows. This database
+ system is hereafter referred to as the DDBS.
+
+ The DDBS MUST be capable of working within Twisted's asynchronous
+ framework. If possible, an Object-Relational Mapper (ORM) SHOULD be used to
+ abstract the database backend's structure and query syntax from the Twisted
+ Python classes which interact with it, so that the type of database may be
+ swapped out for another with less code refactoring.
+
+ The DDBS SHALL be used for persistent storage of complex data structures
+ such as the bridges, which MAY include additional information from both the
+ `@type bridge-server-descriptor`s and the `@type bridge-extra-info`
+ descriptors. [3]
+
+**** 1.a. Choice of DDBS
+
+ CouchDB is chosen for its simple HTTP API, ease of use, speed, and official
+ support for Twisted Python applications. [4] Additionally, its
+ document-based data model is very similar to the current architecture of
+ tor's Directory Server/Mirror system, in that an HTTP API is used to
+ retrieve data stored within virtual directories. Internally, it uses JSON
+ to store data and JavaScript as its query language, both of which are
+ likely friendlier to various other components of the Tor Metrics
+ infrastructure which sanitise and analyse portions of the Bridge
+ descriptors. At the very least, friendlier than hardcoding raw SQL queries
+ as Python strings.
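
As a rough illustration of how the HTTP API could be used, the following sketch
stores and fetches a bridge document with the `requests` library; the server
URL, database name, and document layout are assumptions made for this example,
not part of the proposal:

  import requests

  COUCH = 'http://127.0.0.1:5984'                 # assumed local CouchDB instance

  def store_bridge(fingerprint, doc):
      # PUT /<db>/<docid> creates a JSON document (updating an existing one
      # additionally requires its current _rev); the 'bridges' db must exist.
      r = requests.put('%s/bridges/%s' % (COUCH, fingerprint), json=doc)
      r.raise_for_status()
      return r.json()                             # contains the id and revision

  def fetch_bridge(fingerprint):
      r = requests.get('%s/bridges/%s' % (COUCH, fingerprint))
      r.raise_for_status()
      return r.json()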
+
+**** 1.b. Data Structures which should be stored in a DDBS:
+
+ - RedactedDB - The Database of Blocked Bridges
+
+ The RedactedDB will hold entries of bridges which have been discovered to
+ be unreachable from BridgeDB network vantage point, or have been reported
+ unreachable by clients.
+
+ - BridgeDB - The Database of Bridges
+
+ BridgeDB holds information on available Bridges, obtained via bridge
+ descriptors and networkstatus documents from the BridgeAuthority. Because
+ a Bridge may have multiple `ORPort`s and multiple
+ `ServerTransportListenAddress`es, attaching additional data to each of
+ these addresses which MAY include the following information on a blocking
+ event:
+ - Geolocational country code of the reported blocking event
+ - Timestamp for when the blocking event was first reported
+ - The method used for discovery of the block
+ - and the believed mechanism which is causing the block
+ would quickly become unwieldy, the RedactedDB and BridgeDB SHOULD be kept
+ separate.
+
+ - User Credentials
+
+ For the Social BridgeDistributor, these are rather complex,
+ increasingly-growing, concatenations (or tuples) of several datatypes,
+ including Non-Interactive Proofs-of-Knowledge (NIPK) of Commitments to
+ k-TAA Blind Signatures, and NIPK of Commitments to a User's current
+ number of Credits and timestamps of requests for Invite Tickets.
+
+*** 2. Key-Value Relational Database Mapping Server
+
+ For simpler data structures which must be persistently stored, such as the
+ list of hashes of previously seen Invite Tickets, or the list of
+ previously spent Tokens, a Relational Database Mapping Server (RDBMS)
+ SHALL be used for optimisation of queries.
+
+ Redis and Memcached are two examples of RDBMS which are well tested and
+ are known to work well with Twisted. The major difference between the two
+ is that Memcached stores data only in volatile memory, while Redis
+ additionally supports commands for transferring objects into persistent,
+ on-disk storage.
+
+ There are several support modules for interfacing with both Memcached and
+ Redis from Twisted Python, see Twisted's MemCacheProtocol class [5] [6] or
+ txyam [7] for Memcached, and txredis [8] or txredisapi [9] for
+ Redis. Additionally, numerous big name projects both use Redis as part of
+ their backend systems, and also provide helpful documentation on their own
+ experience of the process of switching over to the new systems. [17] For
+ non-Twisted Python Redis APIs, there is redis-py, which provides a
+ connection pool that could likely be interfaced with from Twisted Python
+ without too much difficulty. [10] [11]
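
As a rough sketch of the kind of usage intended here, the following redis-py
snippet records hash digests of redeemed tokens in a Redis set in order to
detect replays; the key names and digest choice are illustrative, and the
datatypes it would apply to are listed in section 2.a below:

  import hashlib
  import redis

  r = redis.StrictRedis(host='localhost', port=6379)

  def redeem(kind, token):
      """Return True the first time a token (bytes) is seen, False on replay."""
      digest = hashlib.sha256(token).hexdigest()
      # SADD returns the number of members actually added: 0 means a replay.
      return r.sadd('%s:spent' % kind, digest) == 1

  # redeem('invite-ticket', serialized_ticket_bytes)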
+
+**** 2.a. Data Structures which should be stored in a RDBMS
+
+ Simple, mostly-flat datatypes, and data which must be frequently indexed
+ should be stored in a RDBMS, such as large lists of hashes, or arbitrary
+ strings with assigned point-values (i.e. the "Uniform Mapping" for the
+ current HTTPS BridgeDistributor).
+
+ For the Social BridgeDistributor, hash digests of the following datatypes
+ SHOULD be stored in the RDBMS, in order to prevent double-spending and
+ replay attacks:
+
+ - Invite Tickets
+
+ These are anonymous, unlinkable, unforgeable, and verifiable tokens
+ which are occasionally handed out to well-behaved Users by the Social
+ BridgeDistributor to permit new Users to be invited into the system.
+ When they are redeemed, the Social BridgeDistributor MUST store a hash
+ digest of their contents to prevent replayed Invite Tickets.
+
+ - Spent Credits
+
+ These are Credits which have already been redeemed for new Bridges.
+ The Social BridgeDistributor MUST also store a hash digest of Spent
+ Credits to prevent double-spending.
+
+*** 3. Bloom Filters and Other Database Optimisations
+
+ In order to further decrease the need for lookups in the backend
+ databases, Bloom Filters can be used to eliminate extraneous
+ queries. However, this optimization would only be beneficial for string
+ lookups, i.e. querying for a User's Credential, and SHOULD NOT be used for
+ queries within any of the hash lists, i.e. the list of hashes of
+ previously seen Invite Tickets. [12]
+
+**** 3.a. Bloom Filters within Redis
+
+ It might be possible to use Redis' GETBIT and SETBIT commands to store a
+ Bloom Filter within a Redis cache system; [13] doing so would offload the
+ severe memory requirements of loading the Bloom Filter into memory in
+ Python when inserting new entries, reducing the time complexity from some
+ polynomial time complexity that is proportional to the integral of the
+ number of bridge users over the rate of change of bridge users over time,
+ to a time complexity of order O(1).
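
A sketch of what such a Redis-backed Bloom filter might look like with
redis-py; the filter size, hash count, and key handling are assumptions for
illustration only:

  import hashlib
  import redis

  r = redis.StrictRedis()
  M = 2 ** 24                                     # filter size in bits
  K = 4                                           # number of hash functions

  def _offsets(item):
      """Derive K bit offsets for a bytes item."""
      for i in range(K):
          h = hashlib.sha1(str(i).encode() + b':' + item).digest()
          yield int.from_bytes(h[:8], 'big') % M

  def bloom_add(name, item):
      for off in _offsets(item):
          r.setbit(name, off, 1)

  def bloom_might_contain(name, item):
      # False positives are possible; false negatives are not.
      return all(r.getbit(name, off) for off in _offsets(item))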
+
+**** 3.b. Expiration of Stale Data
+
+ Some types of data SHOULD be safe to expire, such as User Credentials
+ which have not been updated within a certain timeframe. This idea should
+ be further explored to assess the safety and potential drawbacks to
+ removing old data.
+
+ If there is data which SHOULD expire, the PEXPIREAT command provided by
+ Redis for the key datatype would allow the RDBMS itself to handle cleanup
+ of stale data automatically. [14]
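
For example, a credential key could be given an absolute expiry with PEXPIREAT
roughly as follows (the key name and lifetime are illustrative):

  import time
  import redis

  r = redis.StrictRedis()

  def store_credential(user_id, blob, lifetime_days=90):
      key = 'credential:%s' % user_id
      r.set(key, blob)
      expire_at_ms = int((time.time() + lifetime_days * 86400) * 1000)
      r.pexpireat(key, expire_at_ms)              # Redis deletes the key then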
+
+*** 4. Other potential uses of the improved Bridge database system
+
+ Redis provides mechanisms for evaluations to be made on data by calling
+ the sha1 for a serverside Lua script. [15] While not required in the
+ slightest, it is a rather neat feature, as it would allow Tor's Metrics
+ infrastructure to offload some of the computational overhead of gathering
+ data on Bridge usage to BridgeDB (as well as diminish the security
+ implications of storing Bridge descriptors).
+
+ Also, if Twisted's IProducer and IConsumer interfaces do not provide
+ needed interface functionality, or it is desired that other components of
+ the Tor software ecosystem be capable of scheduling jobs for BridgeDB,
+ there are well-tested mechanisms for using Redis as a message
+ queue/scheduling system. [16]
+
+* References
+
+[0]: https://bridges.torproject.org
+ mailto:bridges@bridges.torproject.org
+[1]: See proposals 199-bridgefinder-integration.txt at
+ https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/199-bridgefi…
+[2]: See XXX-social-bridge-distribution.txt at
+ https://gitweb.torproject.org/user/isis/bridgedb.git/blob/refs/heads/featur…
+[3]: https://metrics.torproject.org/formats.html#descriptortypes
+[4]: https://github.com/couchbase/couchbase-python-client#twisted-api
+[5]: https://twistedmatrix.com/documents/current/api/twisted.protocols.memcache.…
+[6]: http://stackoverflow.com/a/5162203
+[7]: http://findingscience.com/twisted/python/memcache/2012/06/09/txyam:-yet-ano…
+[8]: https://pypi.python.org/pypi/txredis
+[9]: https://github.com/fiorix/txredisapi
+[10]: https://github.com/andymccurdy/redis-py/
+[11]: http://degizmo.com/2010/03/22/getting-started-redis-and-python/
+[12]: http://www.dr-josiah.com/2012/03/why-we-didnt-use-bloom-filter.html
+[13]: http://redis.io/topics/data-types §"Strings"
+[14]: http://redis.io/commands/pexpireat
+[15]: http://redis.io/commands/evalsha
+[16]: http://www.restmq.com/
+[17]: https://www.mediawiki.org/wiki/Redis
diff --git a/proposals/XXX-bridgedb-database-improvements.txt b/proposals/XXX-bridgedb-database-improvements.txt
deleted file mode 100644
index 2d25bd2..0000000
--- a/proposals/XXX-bridgedb-database-improvements.txt
+++ /dev/null
@@ -1,260 +0,0 @@
[metrics-db/master] Don't remove ipv6-policy lines from bridge descriptors.
by karsten@torproject.org 30 Jan '14
commit e92794a24acd57e98ddb167954675936f98d1ea0
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Jan 30 20:17:39 2014 +0100
Don't remove ipv6-policy lines from bridge descriptors.
These lines only contain exit policy summaries, not IP addresses.
---
src/org/torproject/ernie/db/bridgedescs/SanitizedBridgesWriter.java | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/org/torproject/ernie/db/bridgedescs/SanitizedBridgesWriter.java b/src/org/torproject/ernie/db/bridgedescs/SanitizedBridgesWriter.java
index dcdfb87..275e155 100644
--- a/src/org/torproject/ernie/db/bridgedescs/SanitizedBridgesWriter.java
+++ b/src/org/torproject/ernie/db/bridgedescs/SanitizedBridgesWriter.java
@@ -745,7 +745,8 @@ public class SanitizedBridgesWriter extends Thread {
|| line.equals("opt caches-extra-info")
|| line.equals("caches-extra-info")
|| line.equals("opt allow-single-hop-exits")
- || line.equals("allow-single-hop-exits")) {
+ || line.equals("allow-single-hop-exits")
+ || line.startsWith("ipv6-policy ")) {
scrubbed.append(line + "\n");
/* Replace node fingerprints in the family line with their hashes
commit ad6367cfffd37b1f1a587418ab66f70056af6bca
Author: David Fifield <david(a)bamsoftware.com>
Date: Thu Jan 30 11:06:39 2014 -0800
Update docs for new appengine SDK.
---
facilitator/doc/appspot-howto.txt | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/facilitator/doc/appspot-howto.txt b/facilitator/doc/appspot-howto.txt
index 1b743bc..458f1c0 100644
--- a/facilitator/doc/appspot-howto.txt
+++ b/facilitator/doc/appspot-howto.txt
@@ -13,7 +13,8 @@ this purpose, rather than a personal or organisation account. See
email-howto.txt for how to do that.
Download the SDK:
-https://developers.google.com/appengine/docs/go/gettingstarted/devenvironment
+https://developers.google.com/appengine/downloads#Google_App_Engine_SDK_for_Go
+This guide was written for version 1.8.9 of the SDK.
Find your facilitator appengine installation, probably in reg-appspot/
in your flashproxy config dir. Edit config.go to point to the address of
@@ -24,14 +25,14 @@ https://developers.google.com/appengine/docs/go/gettingstarted/uploading
Enter an application ID and create the application.
To run locally using the development server:
-$ ~/google_appengine/dev_appserver.py reg-appspot/
+$ ~/go_appengine/goapp serve reg-appspot/
You are advised to do this on a non-production machine, away from the main
facilitator.
Use the appcfg.py program to upload the program. It should look
something like this:
-$ torify ./google_appengine/appcfg.py --no_cookies -A <YOUR_APP_ID> update reg-appspot/
+$ torify ./go_appengine/goapp --no_cookies -A <YOUR_APP_ID> update reg-appspot/
07:25 PM Host: appengine.google.com
07:25 PM Application: application-id; version: 1
07:25 PM
commit 8749453896f9e1aa55f8c212f42c2190e307251d
Author: David Fifield <david(a)bamsoftware.com>
Date: Thu Jan 30 09:45:32 2014 -0800
Typo.
---
facilitator/doc/appspot-howto.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/facilitator/doc/appspot-howto.txt b/facilitator/doc/appspot-howto.txt
index 8ed1284..1b743bc 100644
--- a/facilitator/doc/appspot-howto.txt
+++ b/facilitator/doc/appspot-howto.txt
@@ -24,7 +24,7 @@ https://developers.google.com/appengine/docs/go/gettingstarted/uploading
Enter an application ID and create the application.
To run locally using the development server:
-$ ~/go_appengine/dev_appserver.py reg-appspot/
+$ ~/google_appengine/dev_appserver.py reg-appspot/
You are advised to do this on a non-production machine, away from the main
facilitator.
[translation/torbirdy_completed] Update translations for torbirdy_completed
by translation@torproject.org 30 Jan '14
commit 216a8edd6bb98033457b82d0323699b4e68378ed
Author: Translation commit bot <translation(a)torproject.org>
Date: Thu Jan 30 15:45:27 2014 +0000
Update translations for torbirdy_completed
---
de/torbirdy.dtd | 42 +++++++++++++++++++++---------------------
1 file changed, 21 insertions(+), 21 deletions(-)
diff --git a/de/torbirdy.dtd b/de/torbirdy.dtd
index aacc1ce..f146147 100644
--- a/de/torbirdy.dtd
+++ b/de/torbirdy.dtd
@@ -13,9 +13,9 @@
<!ENTITY torbirdy.prefs.save.key "s">
<!ENTITY torbirdy.prefs.cancel.button "Abbrechen">
<!ENTITY torbirdy.prefs.extra2.button "Auf Vorgaben zurücksetzen">
-<!ENTITY torbirdy.prefs.extra2.key "d">
-<!ENTITY torbirdy.prefs.testproxy.button "Proxyeinstellungen Testen">
-<!ENTITY torbirdy.prefs.testproxy.key "n">
+<!ENTITY torbirdy.prefs.extra2.key "z">
+<!ENTITY torbirdy.prefs.testproxy.button "Vermittlungsservereinstellungen testen">
+<!ENTITY torbirdy.prefs.testproxy.key "t">
<!ENTITY torbirdy.prefs.proxy.label "Proxy">
<!ENTITY torbirdy.prefs.privacy.label "Privatsphäre">
<!ENTITY torbirdy.prefs.enigmail.label "Enigmail">
@@ -33,27 +33,27 @@
<!ENTITY torbirdy.prefs.torification.label "Transparente Torification (Achtung: erfordert benutzerdefinierten transproxy oder TOR-Router)">
<!ENTITY torbirdy.prefs.torification.key "T">
<!ENTITY torbirdy.prefs.global "Global">
-<!ENTITY torbirdy.prefs.imap.label "Push E-Mail Support für IMAP-Konten aktivieren [default: deaktiviert]">
-<!ENTITY torbirdy.prefs.imap.key "p">
-<!ENTITY torbirdy.prefs.startup_folder.label "Beim Start zum zuletzt aktiven Mail-Ordner wechseln [default: deaktiviert]">
-<!ENTITY torbirdy.prefs.startup_folder.key "l">
-<!ENTITY torbirdy.prefs.timezone.label "Setzen Sie Thunderbirds Zeitzone nicht auch UTC [standard: auf UTC setzen]">
-<!ENTITY torbirdy.prefs.timezone.key "z">
-<!ENTITY torbirdy.prefs.enigmail_throwkeyid.label "Die Empfänger-Schlüssel-ID nicht in Verschlüsselte Nachrichten einbinden [default: mit einbinden]">
-<!ENTITY torbirdy.prefs.enigmail_throwkeyid.key "r">
-<!ENTITY torbirdy.prefs.confirmemail.label "Vor dem absenden einer Email überprüfen, ob Enigmail aktiviert ist [standard: nicht überprüfen]">
-<!ENTITY torbirdy.prefs.confirmemail.key "b">
-<!ENTITY torbirdy.prefs.emailwizard.label "Thunderbirds automatischen E-Mail-Konfigurationsassistenten aktivieren [standard: deaktiviert]">
+<!ENTITY torbirdy.prefs.imap.label "Push-E-Mail-Unterstützung für IMAP-Konten aktivieren [Vorgabe: deaktiviert]">
+<!ENTITY torbirdy.prefs.imap.key "P">
+<!ENTITY torbirdy.prefs.startup_folder.label "Beim Start zum letzten aktiven Nachrichtenordner wechseln [Vorgabe: deaktiviert]">
+<!ENTITY torbirdy.prefs.startup_folder.key "B">
+<!ENTITY torbirdy.prefs.timezone.label "Thunderbirds Zeitzone nicht auf UTC einstellen [Vorgabe: auf UTC eingestellt]">
+<!ENTITY torbirdy.prefs.timezone.key "T">
+<!ENTITY torbirdy.prefs.enigmail_throwkeyid.label "Die Empfängerschlüsselkennung nicht in verschlüsselte Nachrichten einbinden [Vorgabe: mit einbinden]">
+<!ENTITY torbirdy.prefs.enigmail_throwkeyid.key "D">
+<!ENTITY torbirdy.prefs.confirmemail.label "Vor dem Versenden einer E-Mail überprüfen, ob Enigmail aktiviert ist [Vorgabe: nicht überprüfen]">
+<!ENTITY torbirdy.prefs.confirmemail.key "V">
+<!ENTITY torbirdy.prefs.emailwizard.label "Thunderbirds automatischen E-Mail-Konfigurationsassistenten aktivieren [Vorgabe: deaktiviert]">
<!ENTITY torbirdy.prefs.emailwizard.key "T">
-<!ENTITY torbirdy.prefs.automatic.label "Prüfe automatisch nach neuen Nachrichten für alle Accounts [Standard: deaktiviert]">
-<!ENTITY torbirdy.prefs.automatic.key "f">
-<!ENTITY torbirdy.prefs.renegotiation.label "Verbindungen zu Servern erlauben, die kein SSL/TLS mit sicherer Renegotiation unterstützen [standard: nicht erlauben]">
-<!ENTITY torbirdy.prefs.renegotiation.key "r">
-<!ENTITY torbirdy.prefs.account_specific "Konten-spezifisch">
+<!ENTITY torbirdy.prefs.automatic.label "Automatisch auf neue Nachrichten für alle Konten prüfen [Vorgabe: deaktiviert]">
+<!ENTITY torbirdy.prefs.automatic.key "A">
+<!ENTITY torbirdy.prefs.renegotiation.label "Verbindungen zu Servern erlauben, die kein SSL/TLS mit sicherer Neuverhandlung unterstützen [Vorgabe: nicht erlauben]">
+<!ENTITY torbirdy.prefs.renegotiation.key "V">
+<!ENTITY torbirdy.prefs.account_specific "Bestimmtes Konto">
<!ENTITY torbirdy.prefs.select_account.key "K">
<!ENTITY torbirdy.prefs.select_account.label "Konto auswählen: ">
-<!ENTITY torbirdy.prefs.enigmail.keyserver.label "Folgende Keyserver benutzen:">
-<!ENTITY torbirdy.prefs.enigmail.keyserver.key "k">
+<!ENTITY torbirdy.prefs.enigmail.keyserver.label "Folgende(n) Schlüssel-Server benutzen:">
+<!ENTITY torbirdy.prefs.enigmail.keyserver.key "F">
<!ENTITY torbirdy.panel.usetor.label "Tor Onion Router benutzen">
<!ENTITY torbirdy.panel.usejondo.label "JonDo (Premium) nutzen">