[tor-dev] Scaling tor for a global population
isis at torproject.org
Tue Sep 30 05:04:57 UTC 2014
Andrew Lewman transcribed 1.8K bytes:
> The last research paper I see directly addressing scalability is Torsk
> (http://www.freehaven.net/anonbib/bibtex.html#ccs09-torsk) or PIR-Tor
Using Private Information Retrieval (PIR) to retrieve a 50 KB/s relay would
likely make the current network completely unusable.
First, the PIR-Tor paper makes a few naïve assumptions which grossly affect
the analysis of the overhead for directory fetches:
1. The authors assume that a client's three [sic] Guard Nodes will be the
directory mirrors for that client, accepting PIR queries for descriptors.
2. There is zero handling of cases where multiple directory mirrors have
differing copies of the database (multi-server PIR requires identical copies).
3. There is zero handling of colluding directory mirrors.
4. The *increase* to descriptor size is not well factored in to the analysis.
In the PIR-Tor paper, §4.2:
| "Even if all three guard relays are compromised, they cannot actively
| manipulate elements in the PIR database since they are signed by the
| directory authorities[...]"
Single descriptors are *not* signed; the set of all descriptors is.
Therefore, if a client wanted to actually check that all (or a majority)
of the Directory Authorities had signed a descriptor, that client would
need to either:
4.a. Download all of them and check, which negates the whole point of
this PIR thing; or
4.b. Have all of the Directory Authorities sign each and every
descriptor individually. This would result in the current
microdescriptors, which range from roughly 900 bytes up to around
2.5 KB for the larger ones, increasing in size by roughly 4 KB
(that's the size of the DirAuth signatures on the current consensus).
Result: Each microdescriptor is roughly 4X larger, and the Directory
Authorities need to do several orders of magnitude more signatures.
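As a quick sanity check on 4.b, here is a back-of-the-envelope sketch; the ~4 KB of per-descriptor signature overhead is the assumption above, taken from the size of the DirAuth signatures on the current consensus:

```python
# Rough per-descriptor blowup if every DirAuth signed each microdescriptor
# individually.  All sizes are the approximations given above.
SIG_OVERHEAD_BYTES = 4096              # ~4 KB of DirAuth signatures (assumed)

for desc_bytes in (900, 2500):         # small and large microdescriptors
    total = desc_bytes + SIG_OVERHEAD_BYTES
    print("%4d B -> %4d B (%.1fx)" % (desc_bytes, total,
                                      total / float(desc_bytes)))
```

The blowup brackets the "roughly 4X larger" figure: about 5.6x for a small descriptor and about 2.6x for a large one.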
Second, because the PIR-Tor paper doesn't seem to be what we would actually
want to implement, the following is an analysis.
Using some safe assumptions about what security guarantees we would likely
want to require from any PIR scheme we used... That is, we would likely want
a 1-roundtrip (RT), b-private, b-Byzantine, k-out-of-m databases, PIR
scheme. (Meaning that there are m total directory mirrors, only k of those
actually answer a set of your queries, and you want the privacy of your
queries and the authenticity of the answers to your queries to be protected
even if b of the directory mirrors are malicious and colluding with one
another.)
Then, with those assumptions in mind, from Theorem 7.7 in §7.1 of this review
on PIR schemes (which is, admittedly, a bit dated) used in one of Dan
Boneh's classes, we have a size complexity of
O((k/3b) n^(1/[(k-1)/3]) m log_2 m)
where b < k/3. (Meaning that the number of malicious, colluding directory
mirrors which answered is less than one third of the total number of
mirrors which answered.)
>>> import math
>>> m = 3000  # total number of directory mirrors
>>> k = 30    # mirrors which answer a set of queries
>>> b = 4     # malicious, colluding mirrors
>>> n = 160   # the number of bits in a relay fingerprint
>>> print(((float(k) / (float(3) * float(b)))
...        * (n**(1.0/((float(k)-1.0)/float(3))))
...        * float(m)) * math.log(float(m), 2.0))
the above space complexity comes out to O(146449) bits per lookup. (Where, by
"lookup", I mean "the set of all queries pertaining to fetching a single relay
descriptor", since, in PIR, a "query" is usually for a single bit.) This would
mean that adding any sane PIR scheme for directory requests would result in
something like a 900X increase in the bytes used for each request.
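For intuition on where a figure like 900X comes from, here is a hedged sketch: assume the non-private baseline request only needs to name the wanted relay by its 160-bit fingerprint (a simplifying assumption; a real directory request carries more than that), and compare that against the per-lookup cost computed above:

```python
import math

m, k, b, n = 3000, 30, 4, 160          # same parameters as above
# Per-lookup cost from the Theorem 7.7 complexity, with t = 1:
pir_bits = (k / (3.0 * b)) * n ** (1.0 / ((k - 1) / 3.0)) * m * math.log(m, 2)
baseline_bits = n                      # just naming one fingerprint (assumed)
print(pir_bits / baseline_bits)        # roughly a 900x increase per request
```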
While one of the benefits of PIR would be that clients would no longer need to
request the descriptors for all relays listed in the consensus, this benefit
actually doesn't help as much as it would seem, because switching to a PIR
scheme would require a client to make three of these roughly O(146449)-bit
requests, every ten minutes or so, when the client wants a new circuit.
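Putting those numbers together, a rough sketch of the per-client cost under these assumptions (146449 bits is the per-lookup estimate from above; three lookups per circuit and one circuit per ten minutes are the figures from this paragraph):

```python
LOOKUP_BITS = 146449        # estimated PIR cost of one descriptor lookup
LOOKUPS_PER_CIRCUIT = 3     # one lookup per relay in a three-hop circuit

circuit_bits = LOOKUPS_PER_CIRCUIT * LOOKUP_BITS
kib = circuit_bits / 8.0 / 1024.0
print("%.1f KiB of PIR traffic per new circuit, every ten minutes or so" % kib)
```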
Don't get me wrong; PIR is awesome. But the current Tor network is likely
*not* the correct real-world application of any of the current PIR schemes. At
least, it isn't until we have some HUGE number of nodes in the network, which
by #1 and #2 in my original reply to this thread, we shouldn't.
: Note, I said "the current network". An imaginary future Tor network which
has 10,000,000 relays would be different. And then note the very last
paragraph where I state that we probably shouldn't ever have 10,000,000
relays.
: I assume that we need replication in Tor's use case. There are papers,
such as the following:
Kushilevitz, Eyal, and Rafail Ostrovsky.
"Replication is not needed: Single database, computationally-private
information retrieval."
Proceedings of the 38th Annual Symposium on Foundations of Computer
Science. IEEE Computer Society, 1997.
for which the research doesn't apply because it was aimed at
computationally-hiding PIR schemes, and obviously Tor aims for
information theoretic security. Other than the PIR-Tor paper, I haven't
found any literature which analyses PIR for anything close to Tor's use
case. (Although I'd be stoked to hear about any such papers.)
: Gasarch, William. "A survey on private information retrieval."
Bulletin of the EATCS 82 (2004).
: I'm completely ignoring the fact that I'm supposed to do polynomial
interpolation for this complexity because I just want an estimate and
don't feel like messing around with Lagrange polynomials or doing
Gaussian elimination on a Vandermonde matrix of an ill-conditioned
system of equations or anything. Consequently, I removed the `t` variable
from the original complexity equation, effectively setting it to t=1.
♥Ⓐ isis agora lovecruft
Current Keys: https://blog.patternsinthevoid.net/isis.txt