[tor-bugs] #15844 [Onionoo]: Develop database schema to support Onionoo's search parameter efficiently

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed May 13 14:47:57 UTC 2015


#15844: Develop database schema to support Onionoo's search parameter efficiently
-----------------------------+-----------------
     Reporter:  karsten      |      Owner:
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:
    Component:  Onionoo      |    Version:
   Resolution:               |   Keywords:
Actual Points:               |  Parent ID:
       Points:               |
-----------------------------+-----------------

Comment (by karsten):

 Ueland, search engines might help us here, but AFAIK nobody has
 investigated that yet.  Note that both our data and the possible searches
 are highly structured, so databases might be a better fit than search
 engines; that's why I'm more optimistic about databases.  But I'm not
 ruling anything out, and if you want to look into search engines for this
 purpose, that could be quite interesting.  If so, please take a look at
 the [https://onionoo.torproject.org/protocol.html#methods protocol
 specification] for the searches that a search engine would have to
 support.  Also note the possible simplifications to that protocol
 mentioned on this ticket.

 teor, leeroy, thanks a lot for your comments.  I have started writing a
 response several times now, but was overwhelmed by the amount of detail
 in this thread and gave up.  But I really want to respond, because I feel
 we're getting somewhere.  I hope the following response makes sense, even
 though it doesn't address all of your suggestions.  All the good ideas
 remain in this thread, and whenever we start writing code, we should
 revisit the entire thread.

 We don't need to support patterns like `%foo%bar%`.  We should validate
 all user input and reject searches containing `%` or other special
 characters before passing them to the database.
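 A minimal sketch of such an input check, in Python: rather than escaping
 wildcards, it rejects any search term containing characters outside a
 whitelist.  The whitelist itself (letters, digits, `$`, `:`, `.`, `+`, `/`,
 `[`, `]`, at most 64 characters) is an assumption meant to roughly cover
 nicknames, fingerprints, base64 fingerprints and IP addresses; it would
 have to be checked against the protocol specification.  The function name
 `validate_search_term` is hypothetical.

```python
import re

# Hypothetical whitelist (an assumption, not taken from the Onionoo spec):
# letters and digits for nicknames and hex fingerprints, "$" for
# $-prefixed fingerprints, "+" and "/" for base64 fingerprints, and
# ".", ":", "[", "]" for IPv4/IPv6 addresses.  The SQL LIKE wildcards
# "%" and "_" and the escape character "\" are deliberately NOT included.
ALLOWED_SEARCH_TERM = re.compile(r"[A-Za-z0-9$:.+/\[\]]{1,64}")


def validate_search_term(term: str) -> str:
    """Return term unchanged if it is safe to search for, else raise ValueError."""
    if ALLOWED_SEARCH_TERM.fullmatch(term) is None:
        raise ValueError(f"unsupported search term: {term!r}")
    return term


if __name__ == "__main__":
    print(validate_search_term("Unnamed"))
    for bad in ("%foo%bar%", "a_b", "x' OR '1'='1"):
        try:
            validate_search_term(bad)
        except ValueError as err:
            print(err)
```

 Terms that pass the check should still reach the database only as bound
 parameters (e.g. `cursor.execute(sql, (term,))`), so the whitelist is
 defense in depth rather than the only protection against SQL injection.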

 If substring searches are just too hard to implement, let me suggest
 another simplification: how about we sacrifice the `contact` parameter
 together with substring searches in the `nickname` field?  I think that
 people use the `contact` parameter to obtain all relays run by the same
 entity, but they could just as well use the `family` parameter for that.
 If the choice is between continuing to support that parameter and
 offering searches over the entire history, I'd prefer the latter.  Again,
 this is not something to decide quickly, but it's worth considering, and
 I should ask people on tor-dev@ before we make a final decision.

 At the same time, maybe we shouldn't restrict searches to 3+ characters,
 for the reasons you state above.  Scratch that idea; it probably wouldn't
 help much anyway.

 Example: with prefix-only nickname matching, a search for "u" would still
 return all relays called "Unnamed" or "u", but no longer "default".
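 To illustrate prefix-only, case-insensitive nickname matching in a
 database, here is a minimal sketch using SQLite from Python's standard
 library.  The `relay` table, its lower-cased `nickname_lower` column and
 the function names are hypothetical.  The search is written as a range
 predicate (`>= 'u' AND < 'v'`) instead of `LIKE '%u%'`, so the database
 can answer it with an ordinary B-tree index scan rather than reading
 every row.

```python
import sqlite3
from typing import Iterable, List


def build_relay_db(nicknames: Iterable[str]) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical relay table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relay (nickname TEXT NOT NULL, nickname_lower TEXT NOT NULL)"
    )
    # Index the lower-cased nickname so prefix searches become range scans.
    conn.execute("CREATE INDEX relay_nickname_lower_idx ON relay (nickname_lower)")
    conn.executemany(
        "INSERT INTO relay (nickname, nickname_lower) VALUES (?, ?)",
        [(n, n.lower()) for n in nicknames],
    )
    return conn


def nickname_prefix_search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Case-insensitive prefix match: all nicknames starting with term."""
    if not term:
        raise ValueError("empty search term")
    low = term.lower()
    # Exclusive upper bound of the range: bump the last character by one
    # code point ("u" -> "v", "de" -> "df").  Every string in [low, high)
    # starts with low, and every string starting with low is in that range.
    high = low[:-1] + chr(ord(low[-1]) + 1)
    rows = conn.execute(
        "SELECT nickname FROM relay"
        " WHERE nickname_lower >= ? AND nickname_lower < ?"
        " ORDER BY nickname",
        (low, high),
    )
    return [nickname for (nickname,) in rows]


conn = build_relay_db(["Unnamed", "u", "default"])
print(nickname_prefix_search(conn, "u"))   # ['Unnamed', 'u']
print(nickname_prefix_search(conn, "de"))  # ['default']
```

 A trailing-wildcard pattern such as `LIKE 'u%'` expresses the same match,
 but whether SQLite or PostgreSQL turns it into an index range scan
 depends on collation and configuration, which is why the sketch spells
 out the range explicitly.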

 Regarding performance, I'm much less concerned about the mean than about
 the variance.  If a request takes 100 ms on average, that's fine by me,
 but if some requests take 60 s, then something's very wrong.  I also
 worry about a potential lack of scalability, either to thousands of
 concurrent clients or to two or three times as many entries in the
 database.  I'd prefer a solution that has reasonable performance for most
 use cases over one that is highly optimized for one use case and might
 perform badly for use cases we didn't think of.  Of course, it's not a
 hard requirement that the database performs better than the current
 in-memory search.
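 Since variance matters more than the mean here, performance measurements
 should report tail latencies alongside the average.  A minimal sketch in
 Python, assuming a hypothetical `run_query` callable that executes one
 search: it times each request with `time.perf_counter()` and summarizes
 the samples using the nearest-rank percentile method.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable, List


def measure(run_query: Callable[[str], object], terms: Iterable[str]) -> List[float]:
    """Run one query per search term and return each latency in milliseconds."""
    latencies = []
    for term in terms:
        start = time.perf_counter()
        run_query(term)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample, for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Mean plus median, 99th percentile and worst case."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# 98 fast requests and 2 pathological ones: the median is 100 ms and the
# mean is about 1.3 s, but p99 and max expose the 60 s outliers.
samples = [100.0] * 98 + [60000.0] * 2
print(summarize(samples))
```

 This only measures sequential requests; saying anything about the
 scalability concern would require running `measure` under concurrent
 load, for example from several threads or processes.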

 How do we proceed?  leeroy, did you want to write some database code and
 run some performance measurements?  If so,
 [https://people.torproject.org/~karsten/volatile/summary.xz here's some
 sample data] (the same data I mentioned above).

 Thanks everyone!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/15844#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

