[or-cvs] r17785: {tor} Document our Bloom filter parameter choices. (tor/trunk/src/common)

nickm at seul.org
Fri Dec 26 17:35:19 UTC 2008


Author: nickm
Date: 2008-12-26 12:35:18 -0500 (Fri, 26 Dec 2008)
New Revision: 17785

Modified:
   tor/trunk/src/common/container.c
Log:
Document our Bloom filter parameter choices.

Modified: tor/trunk/src/common/container.c
===================================================================
--- tor/trunk/src/common/container.c	2008-12-26 17:35:12 UTC (rev 17784)
+++ tor/trunk/src/common/container.c	2008-12-26 17:35:18 UTC (rev 17785)
@@ -1233,6 +1233,16 @@
 digestset_t *
 digestset_new(int max_elements)
 {
+  /* The probability of false positives is about P=(1 - exp(-kn/m))^k, where k
+   * is the number of hash functions per entry, m is the number of bits in the
+   * array, and n is the number of elements inserted.  For us, k==4,
+   * n<=max_elements, and m==n_bits= approximately max_elements*32.  This gives
+   *   P<(1-exp(-4*n/(32*n)))^4 == (1-exp(-1/8))^4 == .00019
+   *
+   * It would be more space-efficient to get this false positive rate with
+   * k==13 and m==18.5n, but we also want to conserve CPU, and k==13 is
+   * pretty big.
+   */
   int n_bits = 1u << (tor_log2(max_elements)+5);
   digestset_t *r = tor_malloc(sizeof(digestset_t));
   r->mask = n_bits - 1;


