[tor-dev] [GSOC 16] Ahmia status update #1
zma at riseup.net
Fri Jun 3 18:41:34 UTC 2016
I'm working on ahmia.fi, the hidden service search engine, and you're
reading status update #1.
During the last two weeks I've been working on several things:
1/ Settle on a new structure for ahmia source code.
The official repository contains all the code related to ahmia. Some
of this code is deprecated (solr is not used anymore) and the
documentation was out of date, so it needed a bit of cleanup anyway.
A structure with two repositories was chosen:
- ahmia-site is going to contain the django website, configuration
to use it in production (apache, nginx, uwsgi) and documentation on how
to get the project running.
- ahmia-crawler is going to contain scrapy bots, configuration +
documentation (elasticsearch, polipo)
I tried to keep all past commits when creating these repositories.
2/ Update documentation
See  and .
3/ Start to refactor the django project
The django project is going to be composed of two apps:
- search is going to be the search engine frontend + future API endpoints
- trends is going to be the statistics visualization frontend + future
Some logic is also going to move from the website source code to the
indexer part of the search engine (ex: removal of fake/banned domains).
You can see this work on the ahmia-site repository.
Note: The trends app is not yet done so it isn't visible online.
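
To illustrate the kind of logic that could move to the indexer, here is
a minimal sketch of dropping banned domains before documents are
indexed. It is not the actual ahmia code: BANNED_DOMAINS, is_banned and
filter_items are hypothetical names, and items are assumed to be dicts
with a "url" key.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; where the real list is stored is not covered here.
BANNED_DOMAINS = {"fakeexample1234567.onion"}


def is_banned(url, banned=BANNED_DOMAINS):
    """Return True if the URL's host is a banned domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in banned)


def filter_items(items, banned=BANNED_DOMAINS):
    """Yield only the crawled items whose 'url' is not on the blocklist."""
    for item in items:
        if not is_banned(item["url"], banned):
            yield item
```

Filtering at this stage would mean the website never sees banned
documents, instead of having to hide them at query time.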
4/ Implement continuous integration with Travis CI
Tests are going to be run automatically on Travis CI.
I am also considering displaying test code coverage with coveralls.io,
but I fear that people would focus on improving the coverage percentage
at all costs, which is not very good.
This work is going to be pushed during the weekend.
5/ Start to write a proposal with details on how to improve search
I have yet to write a much more readable document, but here are a couple
- Regroup all data related to domains, stats, content into elasticsearch
so we can use it for search or insights
- What about a pagerank-like algorithm to estimate a webpage's popularity
instead of tor2web popularity?
- Improve search with human language thanks to elasticsearch
- Use static boosting with popularity (or pagerank) field
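
To make the last ideas more concrete, here is a rough sketch, under
several assumptions, of how a pagerank-like score could be computed
from a link graph and then used as a static boost. The function names,
the "content" and "pagerank" field names and the damping value are all
illustrative, not part of ahmia. The query body uses elasticsearch's
function_score query with field_value_factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simple power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def boosted_query(text, field="pagerank"):
    """Build a search body that adds a static boost from a numeric field.

    'content' and the default 'pagerank' field are assumed names.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"content": text}},
                "field_value_factor": {
                    "field": field,
                    "modifier": "log1p",
                    "missing": 0,
                },
                # Add the boost to the text relevance score.
                "boost_mode": "sum",
            }
        }
    }
```

The score would be computed offline and stored in each document, so
the boost costs almost nothing at query time.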
We have a meeting planned on Tuesday with all of ahmia's contributors.
I hope to have a clean proposal by then to discuss with them.
During the next two weeks, I plan to continue working on the same
things. I want to finish 1/ to 4/ as quickly as possible to start
working on search quality.
See you in two weeks :)