[tor-bugs] #15844 [Onionoo]: Develop database schema to support Onionoo's search parameter efficiently

Wed May 6 20:52:42 UTC 2015

#15844: Develop database schema to support Onionoo's search parameter efficiently
-----------------------------+-----------------
     Reporter:  karsten      |      Owner:
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:
    Component:  Onionoo      |    Version:
   Resolution:               |   Keywords:
Actual Points:               |  Parent ID:
       Points:               |
-----------------------------+-----------------

Comment (by leeroy):

 The problem '''is '''the `LIKE '%foo%'` . That's the only problem. All the
 other queries will be fast because they are index-able using prefix
 search, `LIKE 'foo%'` . Looking for a schema or de-normalizing from 3NF
 won't fix it. It's not going to be solved by a trigram-match either.

 Check out the documentation and look at how the trigram-match is
 implemented. The trigram is faster only because it allows a form of
 indexing, but the implementation is computation heavy. I was wondering--
 what is the importance of a similarity metric? The trigram-match computes
 similarity for ranking results. How important (or relevant) is this if
 searching for example in fingerprints? If similarity isn't that important
 maybe a user-defined function would work better.

 Another possibility would be to opportunistically try to complete the
 query using a prefix search first.

 You might also do for nickname, and ip, what you do for fingerprint. Make
 it a separate query type in the protocol and do search_fp, search_nn,
 search_ip. Not ideal, I know, but it would avoid a multi-table search
 combined with a scan of each. This would also make possible pattern length
 adjustments. I mean a single character search might make sense for
 nickname search, but does it equally make sense for ip or fingerprint?

 I'm really interested in the user-defined function possibility because
 that wouldn't require any compromise. Especially if similarity isn't that
 important a metric. I'm sure there are other well known algorithms that
 are suitable replacements.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/15844#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online