[tor-project] [TSoP 2018] Ahmia status update ~ report #6

Stelios Barberakis chefarov at gmail.com
Mon Aug 6 12:40:44 UTC 2018


Hello all,

This is the biweekly status update for Ahmia development. It arrives a
bit late, since I should have sent it on Friday.

Over the last two weeks I have been working on:

=== Ahmia-Site ===

* Added a "Did you mean" feature. It uses Elasticsearch's fuzziness
<https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzziness.html>
support, and more specifically phrase suggesters
<https://www.elastic.co/guide/en/elasticsearch/reference/6.3/search-suggesters-phrase.html>,
to suggest the likely intended terms when a query may have been
misspelled. For example:
https://ahmia.fi/search/?q=snodwen or https://ahmia.fi/search/?q=tor+netork
[1]

* Changed the criteria of the Elasticsearch queries by 1) using weighted
fields
<https://www.elastic.co/blog/multi-field-search-just-got-better>
and 2) also searching the 'anchor' and 'content' fields. This has improved
the top results in some cases, and also increases the total number of
results returned. [2]

* Performed an overall HTML refactoring to improve the code structure, fix
some unmatched tags, etc. [3]

* Made a minor improvement to the "add onion" page: the response message is
clearer, and the page no longer redirects to a new page. [4]

* [Ongoing] I have been integrating the PageRank algorithm in order to
improve result sorting based on website popularity. For each page we take
into account the backlinks from the rest of the onion addresses to calculate
its PageRank coefficient. An appropriate formula still needs to be devised
to combine this metric with the existing Elasticsearch relevance score. [5]
To be committed soon.
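
To illustrate the PageRank item above, here is a minimal sketch of a
power-iteration PageRank over a backlink graph. It is not the code that will
be committed: the function name, the damping factor of 0.85, the iteration
count and the example domains are all assumptions made for illustration.
Combining the result with the Elasticsearch relevance score is still open,
so it is not attempted here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a link graph.

    `links` maps each onion domain to the domains it links to.
    Self-links are ignored, so only backlinks from other domains count.
    The damping factor and iteration count are illustrative defaults.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}

    # Outgoing links per node, with self-links dropped.
    out = {node: set(links.get(node, ())) - {node} for node in nodes}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical crawl data: domain -> domains it links to.
example_links = {
    "a.onion": {"b.onion"},
    "b.onion": {"c.onion"},
    "c.onion": {"a.onion", "b.onion"},
    "d.onion": {"b.onion"},
}

ranks = pagerank(example_links)
for domain in sorted(ranks, key=ranks.get, reverse=True):
    print(domain, round(ranks[domain], 3))
```

In this toy graph the domain with the most backlinks ends up with the
highest coefficient, and the one nobody links to ends up with the lowest.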

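For the "Did you mean" and weighted-fields items above, the request body
could look roughly like the sketch below. It only builds plain dictionaries
in the shape the Elasticsearch search API accepts and does not contact a
server. The 'title' field, the boost values, the suggester name and the
max_errors setting are assumptions; only 'anchor' and 'content' are taken
from this report.

```python
def build_search_body(query, size=10):
    """Build a search request body with weighted fields and a phrase suggester."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                # "field^boost" weights a field; these boosts are hypothetical.
                "fields": ["title^6", "anchor^3", "content"],
            }
        },
        "suggest": {
            "text": query,
            # "did_you_mean" is an arbitrary name chosen for this sketch.
            "did_you_mean": {
                "phrase": {
                    "field": "content",
                    "size": 1,
                    "max_errors": 2,
                }
            },
        },
    }


def first_suggestion(response, name="did_you_mean"):
    """Return the first suggested phrase from a search response, or None."""
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            return option["text"]
    return None


body = build_search_body("tor netork")

# Hand-written response fragment, used only to exercise first_suggestion().
fake_response = {
    "suggest": {
        "did_you_mean": [
            {"text": "tor netork", "offset": 0, "length": 10,
             "options": [{"text": "tor network", "score": 0.5}]}
        ]
    }
}
print(first_suggestion(fake_response))
```

When the suggester returns no options, first_suggestion() returns None and
the "Did you mean" line would simply not be shown.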

=== Ahmia-Index ===

* Changed the bulk update request on ES aliases to an iterative one, so that
an error in one of the first requests no longer disrupts the rest of the
requests. [6]

* Separated the *add* and *remove* alias functionality, to make the first
crawls of each month available from the beginning of the month. [7]
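
The idea behind the iterative alias update can be sketched as follows: each
alias action is sent in its own request, so a failure in one of them is
recorded but does not block the others. This is an illustration under
assumptions, not the ahmia-index code: `update_aliases` stands for any
callable that sends one request body to the Elasticsearch aliases API, and
the index and alias names are made up.

```python
def apply_alias_actions(update_aliases, actions):
    """Send each alias action in its own request and collect the failures.

    `update_aliases` is a hypothetical callable taking a request body of the
    form {"actions": [...]}. A bulk request would pass all actions at once,
    so one bad action could fail the whole update.
    """
    failures = []
    for action in actions:
        try:
            update_aliases({"actions": [action]})
        except Exception as exc:  # keep going; report the failure afterwards
            failures.append((action, exc))
    return failures


# Stand-in for a real client, so the sketch runs without a server.
applied = []

def fake_update_aliases(body):
    action = body["actions"][0]
    if "remove" in action and action["remove"]["index"] == "missing-index":
        raise RuntimeError("index not found")
    applied.append(action)


# Hypothetical index and alias names.
actions = [
    {"remove": {"index": "missing-index", "alias": "latest"}},
    {"add": {"index": "crawl-2018-08", "alias": "latest"}},
]

failures = apply_alias_actions(fake_update_aliases, actions)
print(len(applied), "applied,", len(failures), "failed")
```

Keeping the add and remove actions as separate requests also means that a
failed remove does not prevent the new month's index from being added.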


[1] https://github.com/ahmia/ahmia-site/issues/25
[2] https://github.com/ahmia/ahmia-site/issues/29
[3]
https://github.com/ahmia/ahmia-site/commit/ddcf1a32321a8506f99eaf8c402c0aec604fd41a
[4] https://github.com/ahmia/ahmia-site/issues/27
[5] https://github.com/ahmia/ahmia-site/issues/30
[6]
https://github.com/ahmia/ahmia-index/commit/08952b6609c7d0d8ee82afec4e64a1c610c204ef
[7] https://github.com/ahmia/ahmia-index/issues/6