[tor-bugs] #11350 [Onionoo]: Extend Onionoo's lookup parameter to give out relays/bridges that haven't been running in the past week

Sat Apr 19 23:28:41 UTC 2014

#11350: Extend Onionoo's lookup parameter to give out relays/bridges that haven't
been running in the past week
-----------------------------+---------------------
     Reporter:  karsten      |      Owner:  karsten
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:
    Component:  Onionoo      |    Version:
   Resolution:               |   Keywords:
Actual Points:               |  Parent ID:
       Points:               |
-----------------------------+---------------------

Comment (by wfn):

 This is not really relevant to the relay challenge task per se, so anyone
 can safely skip this comment.

 Maybe orthogonal, but can't hurt, so fwiw, re:

 {{{
 +    /* TODO This is an evil hack to support looking up relays or bridges
 +     * that haven't been running for a week without having to load
 +     * 500,000 NodeStatus instances into memory.  Maybe there's a better
 +     * way?  Or do we need to switch to a real database for this? */
 }}}

 Karsten, *if* you decide to do some benchmarking using a database (with
 whatever database schema is appropriate), I'd strongly advise reading
 the following document/tutorial:

 https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

 Note that this is not considered to be any kind of 'postgres hacking';
 this can be done in a purely wheezy/stable setting, and is completely
 normal practice. The postgres defaults on linux systems are somewhat
 conservative, e.g. raising `effective_cache_size` to as much as 75% of the
 overall system's memory is normal. The `shared_buffers` default on linux
 is usually 32MB or so. (To raise it, you do need to change
 `/etc/sysctl.conf` (to raise `SHMMAX`), but again, this should not be
 considered fringe/esoteric practice; if this is not done, postgres
 assumes it can't pre-allocate more than 32MB of memory, and that's not a
 lot of memory.)
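
 As a rough sketch of the sizing above: the small Python helper below turns
 a machine's total RAM into starting values for the two settings. The 75%
 figure for `effective_cache_size` is the upper bound mentioned above; the
 25% figure for `shared_buffers` is an assumption (a common rule of thumb),
 and the function names are hypothetical. Treat the output as a starting
 point for benchmarking, not as tuned values.

```python
# Rule-of-thumb starting values only. The fractions are assumptions and
# not measured optima for any particular workload.
MB = 1024 * 1024


def suggest_postgres_memory(total_ram_bytes,
                            shared_fraction=0.25,   # assumed rule of thumb
                            cache_fraction=0.75):   # upper bound from the text
    """Return hypothetical starting values, in MB, for two settings."""
    return {
        "shared_buffers": int(total_ram_bytes * shared_fraction) // MB,
        "effective_cache_size": int(total_ram_bytes * cache_fraction) // MB,
    }


def render_conf(settings):
    """Format the values as postgresql.conf lines (MB units)."""
    return "\n".join(f"{name} = {value}MB" for name, value in settings.items())


if __name__ == "__main__":
    # Example: a machine with 8 GB of RAM.
    print(render_conf(suggest_postgres_memory(8 * 1024 * MB)))
```

 On setups where `SHMMAX` limits shared memory, the kernel limit has to be
 raised above the chosen `shared_buffers` value before postgres will start
 with it.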

 You once mentioned cases of indexes not fitting into memory. Besides not
 using partial/functional indexes (LOWER(), SUBSTR()) and keeping redundant
 indexes, the primary reason for this is (as I've somewhat painfully
 discovered) not allowing postgres to actually use enough memory. (fwiw,
 using pre-allocated shared memory is faster, too, though I'd need to dig
 up references.)

 Sorry for the detour, but in case someone *does* end up experimenting with
 less hacky database-based solutions, don't forget to take a good look at
 your postgres configuration. :) (or, maybe you've already done that, and
 this was all redundant!)

 Also, it makes sense to use intermediary tables[1], e.g. a 'fingerprint'
 table for unique fingerprint lookups, which is then joined with status
 entries / whatnot. The fingerprints can just as well reside in memory of
 course, if they can be efficiently persisted, and so on. In-house
 partial-nosql solutions. :)

 [1]: e.g. https://github.com/wfn/torsearch/blob/master/db/db_create.sql
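
 A minimal sketch of the intermediary-table idea, using Python's built-in
 sqlite3 module instead of postgres so that it runs without a server. The
 table and column names are hypothetical and are not taken from Onionoo or
 from the torsearch schema in [1]; the point is only the shape: one small
 table with a unique fingerprint per row, joined to the larger table of
 status entries.

```python
import sqlite3

# Hypothetical two-table layout. Names are illustrative only.
SCHEMA = """
CREATE TABLE fingerprint (
    id INTEGER PRIMARY KEY,
    fingerprint TEXT NOT NULL UNIQUE   -- UNIQUE also creates the lookup index
);
CREATE TABLE status_entry (
    fingerprint_id INTEGER NOT NULL REFERENCES fingerprint(id),
    valid_after TEXT NOT NULL,
    nickname TEXT NOT NULL
);
CREATE INDEX status_entry_fp ON status_entry (fingerprint_id, valid_after);
"""


def add_status(conn, fingerprint, valid_after, nickname):
    """Store one status entry, creating the fingerprint row if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO fingerprint (fingerprint) VALUES (?)",
        (fingerprint,))
    (fp_id,) = conn.execute(
        "SELECT id FROM fingerprint WHERE fingerprint = ?",
        (fingerprint,)).fetchone()
    conn.execute(
        "INSERT INTO status_entry VALUES (?, ?, ?)",
        (fp_id, valid_after, nickname))


def latest_status(conn, fingerprint):
    """Look up the fingerprint first, then join to its newest status entry."""
    return conn.execute(
        """SELECT s.valid_after, s.nickname
           FROM fingerprint f
           JOIN status_entry s ON s.fingerprint_id = f.id
           WHERE f.fingerprint = ?
           ORDER BY s.valid_after DESC
           LIMIT 1""",
        (fingerprint,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_status(conn, "A" * 40, "2014-04-01 00:00:00", "example1")
add_status(conn, "A" * 40, "2014-04-10 00:00:00", "example1")
add_status(conn, "B" * 40, "2014-04-19 00:00:00", "example2")
```

 With this layout, looking up a relay that has not been running recently
 only touches the small fingerprint table plus the few status rows that
 belong to it, e.g. `latest_status(conn, "A" * 40)`.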

 (hopefully this was not painful to read! Just wanted to share what I've
 learned.)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11350#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online