[tor-dev] Scaling tor for a global population
isis at torproject.org
Tue Sep 30 05:04:57 UTC 2014
Andrew Lewman transcribed 1.8K bytes:
> The last research paper I see directly addressing scalability is Torsk
> (http://www.freehaven.net/anonbib/bibtex.html#ccs09-torsk) or PIR-Tor
Using Private Information Retrieval (PIR) to retrieve a 50 KB/s relay would
likely make the current network completely unusable.
First, the PIR-Tor paper makes a few naïve assumptions which grossly affect
the analysis of the overhead for directory fetches:
1. The authors assume that a client's three [sic] Guard Nodes will be the
directory mirrors for that client, accepting PIR queries for descriptors.
2. There is zero handling of cases where multiple directory mirrors have
differing copies of the database (multi-server PIR requires identical copies).
3. There is zero handling of colluding directory mirrors.
4. The *increase* to descriptor size is not well factored in to the analysis.
In the PIR-Tor paper, §4.2:
| "Even if all three guard relays are compromised, they cannot actively
| manipulate elements in the PIR database since they are signed by the
| directory authorities[...]"
Single descriptors are *not* signed; the set of all descriptors is.
Therefore, if a client wanted to actually check that all (or a majority)
of the Directory Authorities had signed a descriptor, that client would
need to either:
4.a. Download all of them and check, which negates the whole point of
this PIR thing; or
4.b. Have all of the Directory Authorities sign each and every
descriptor individually. This would result in the current
microdescriptors, which range from roughly 900 bytes up to around
2.5 KB for the larger ones, increasing in size by roughly 4 KB
(that's the size of the DirAuth signatures on the current consensus).
Result: Each microdescriptor is roughly 4X larger, and the Directory
Authorities need to do several orders of magnitude more signatures.
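As a quick sanity check on 4.b, here is a back-of-the-envelope sketch; the ~4 KB of per-descriptor signature overhead is the assumption above, taken from the size of the DirAuth signatures on the current consensus:

```python
# Rough per-descriptor blowup if every DirAuth signed each microdescriptor
# individually.  All sizes are the approximations given above.
SIG_OVERHEAD_BYTES = 4096              # ~4 KB of DirAuth signatures (assumed)

for desc_bytes in (900, 2500):         # small and large microdescriptors
    total = desc_bytes + SIG_OVERHEAD_BYTES
    print("%4d B -> %4d B (%.1fx)" % (desc_bytes, total,
                                      total / float(desc_bytes)))
```

The blowup brackets the "roughly 4X larger" figure: about 5.6x for a small descriptor and about 2.6x for a large one.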
Second, because the PIR-Tor paper doesn't seem to be what we would actually
want to implement, the following is an analysis.
Using some safe assumptions about what security guarantees we would likely
want to require from any PIR scheme we used... That is, we would likely want
a 1-roundtrip (RT), b-private, b-Byzantine, k-out-of-m databases, PIR
scheme. (Meaning that there are m total directory mirrors, only k of those
actually answer a set of your queries, and you want the privacy of your
queries and the authenticity of the answers to your queries to be protected
even if b of the directory mirrors are malicious and colluding with one
another.)
Then, with those assumptions in mind, from Theorem 7.7 in §7.1 of this review
on PIR schemes (which is, admittedly, a bit dated) used in one of Dan
Boneh's classes, we have a size complexity of
O((k/3b) n^(1/[(k-1)/3]) m log_2 m)
where b < k/3. (Meaning that the number of malicious, colluding directory
mirrors which answered is less than one third of the total number of
mirrors which answered.)
>>> import math
>>> m = 3000  # total number of directory mirrors
>>> k = 30    # mirrors which answer a set of queries
>>> b = 4     # malicious, colluding mirrors
>>> n = 160   # the number of bits in a relay fingerprint
>>> print(((float(k) / (float(3) * float(b)))
...        * (n**(1.0/((float(k)-1.0)/float(3))))
...        * float(m)) * math.log(float(m), 2.0))
the above space complexity comes out to O(146449) bits per lookup. (Where, by
"lookup", I mean "the set of all queries pertaining to fetching a single relay
descriptor", since, in PIR, a "query" is usually for a single bit.) This would
mean that adding any sane PIR scheme for directory requests would result in
something like a 900X increase in the bytes used for each request.
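For intuition on where a figure like 900X comes from, here is a hedged sketch: assume the non-private baseline request only needs to name the wanted relay by its 160-bit fingerprint (a simplifying assumption; a real directory request carries more than that), and compare that against the per-lookup cost computed above:

```python
import math

m, k, b, n = 3000, 30, 4, 160          # same parameters as above
# Per-lookup cost from the Theorem 7.7 complexity, with t = 1:
pir_bits = (k / (3.0 * b)) * n ** (1.0 / ((k - 1) / 3.0)) * m * math.log(m, 2)
baseline_bits = n                      # just naming one fingerprint (assumed)
print(pir_bits / baseline_bits)        # roughly a 900x increase per request
```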
While one of the benefits of PIR would be that clients would no longer need to
request the descriptors for all relays listed in the consensus, this benefit
actually doesn't help as much as it would seem, because switching to a PIR
scheme would require a client to make three of these roughly O(146449)-bit
requests, every ten minutes or so, when the client wants a new circuit.
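Putting those numbers together, a rough sketch of the per-client cost under these assumptions (146449 bits is the per-lookup estimate from above; three lookups per circuit and one circuit per ten minutes are the figures from this paragraph):

```python
LOOKUP_BITS = 146449        # estimated PIR cost of one descriptor lookup
LOOKUPS_PER_CIRCUIT = 3     # one lookup per relay in a three-hop circuit

circuit_bits = LOOKUPS_PER_CIRCUIT * LOOKUP_BITS
kib = circuit_bits / 8.0 / 1024.0
print("%.1f KiB of PIR traffic per new circuit, every ten minutes or so" % kib)
```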
Don't get me wrong; PIR is awesome. But the current Tor network is likely
*not* the correct real-world application of any of the current PIR schemes. At
least, it isn't until we have some HUGE number of nodes in the network, which
by #1 and #2 in my original reply to this thread, we shouldn't.
: Note, I said "the current network". An imaginary future Tor network which
has 10,000,000 relays would be different. And then note the very last
paragraph where I state that we probably shouldn't ever have 10,000,000
relays.
: I assume that we need replication in Tor's use case. There are papers,
such as the following:
Kushilevitz, Eyal, and Rafail Ostrovsky.
"Replication is not needed: Single database, computationally-private
information retrieval."
Proceedings of the 38th Annual Symposium on Foundations of Computer
Science. IEEE Computer Society, 1997.
for which the research doesn't apply because it was aimed at
computationally-hiding PIR schemes, and obviously Tor aims for
information theoretic security. Other than the PIR-Tor paper, I haven't
found any literature which analyses PIR for anything close to Tor's use
case. (Although I'd be stoked to hear about any such papers.)
: Gasarch, William. "A survey on private information retrieval."
Bulletin of the EATCS 82 (2004).
: I'm completely ignoring the fact that I'm supposed to do polynomial
interpolation for this complexity because I just want an estimate and
don't feel like messing around with Lagrange polynomials or doing
Gaussian elimination on a Vandermonde matrix of an ill-conditioned
system of equations or anything. Consequently, I removed the `t` variable
from the original complexity equation, effectively setting it to t=1.
♥Ⓐ isis agora lovecruft
Current Keys: https://blog.patternsinthevoid.net/isis.txt