Rémi remi.py@yandex.com writes:
Hi,
I am currently a master's student focusing on natural language processing, machine learning, information retrieval and data mining.
The Tor website lists a number of project ideas, one of which is "Search Engine for Hidden Services"[1]. This project suits my education and skill set well, and I would really enjoy working on it. Does tor-dev think this would be a good project? Many search engines for hidden services already exist, but none of them are open source.
I completed two smaller information retrieval projects at university this year, and I have a strong background in search engine algorithms. The components of the system I am currently considering are:
- An index and document features stored in a NoSQL database (possibly CodernityDB).
- A hidden service crawler.
- Simple search using BM25, while recording click-through data and many features other than BM25.
- A basic front-end.
- A 'learning to rank' component based on more features, to be used once there is significant click-through data. This should be an easy-to-use program that tunes the search engine's ranking.
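To make the "simple search using BM25" component concrete, here is a minimal sketch of an inverted index with Okapi BM25 scoring. It assumes a plain in-memory dict in place of the NoSQL store (CodernityDB or otherwise), a naive whitespace tokenizer, the common defaults k1=1.2 and b=0.75, and a Lucene-style IDF that stays positive; the class and method names are hypothetical and not taken from any existing project.

```python
import math
from collections import Counter, defaultdict


class BM25Index:
    """Hypothetical in-memory inverted index with Okapi BM25 scoring.

    A dict stands in for the NoSQL store; persistence is out of scope.
    """

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1                       # term-frequency saturation
        self.b = b                         # document-length normalisation
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    @staticmethod
    def tokenize(text):
        # Naive lowercase whitespace tokenizer (an assumption for the sketch).
        return text.lower().split()

    def add(self, doc_id, text):
        tokens = self.tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def idf(self, term):
        n = len(self.doc_len)
        df = len(self.postings.get(term, {}))
        # Lucene-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)), always > 0.
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query, k=10):
        if not self.doc_len:
            return []
        avgdl = sum(self.doc_len.values()) / len(self.doc_len)
        scores = defaultdict(float)
        for term in self.tokenize(query):
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        # Highest-scoring documents first.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For example, after indexing a few crawled pages with `idx.add("a.onion", "tor hidden service search engine")`, a call such as `idx.search("tor search")` returns `(doc_id, score)` pairs. Keeping the BM25 score as one feature among many is what would later let the learning-to-rank component combine it with others.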
Click-through is recorded so that the engine can learn to rank results better over time. This matters because no known search ranker gives excellent results out of the box. Click-through can be recorded by storing only feature weights. I would work in Python because I am very comfortable with it.
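One way the "record only features, then learn to rank" idea could fit together is sketched below. This is an illustration under stated assumptions, not the proposal's design: it assumes the "clicked > skipped above" heuristic common in the learning-to-rank literature (a clicked result is preferred over every result shown above it that was not clicked) and a simple pairwise perceptron producing a linear ranker. `ClickLog`, `train_pairwise` and `rank` are hypothetical names, and the log stores only feature vectors, never query text or user data.

```python
from dataclasses import dataclass, field


@dataclass
class ClickLog:
    # Hypothetical log: keeps only (preferred, other) feature-vector pairs.
    pairs: list = field(default_factory=list)

    def record(self, shown, clicked_index):
        """shown: feature vectors of results in displayed order.

        Assumed 'clicked > skipped above' heuristic: the clicked result is
        preferred over each result ranked above it.
        """
        clicked = shown[clicked_index]
        for skipped in shown[:clicked_index]:
            self.pairs.append((clicked, skipped))


def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Pairwise perceptron: nudges w until w.preferred > w.other for each pair."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            margin = sum(wi * (b - o) for wi, b, o in zip(w, better, worse))
            if margin <= 0:  # pair mis-ordered (or tied): update towards 'better'
                w = [wi + lr * (b - o) for wi, b, o in zip(w, better, worse)]
    return w


def rank(w, candidates):
    """Return candidate indices sorted by descending linear score w.x."""
    def score(i):
        return sum(wi * x for wi, x in zip(w, candidates[i]))
    return sorted(range(len(candidates)), key=score, reverse=True)
```

As a usage example, with two assumed features per result (say, a BM25 score and a normalised link count), `log.record([[2.0, 0.1], [1.5, 0.9], [1.0, 0.2]], clicked_index=1)` records that the user skipped the first result and clicked the second; `train_pairwise(log.pairs, n_features=2)` then learns weights that move that result to the top. A pairwise linear model is chosen here only because it is short; RankSVM or gradient-boosted rankers are common, heavier alternatives.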
What are your thoughts?
You look like a reasonable candidate for this project.
The summer doesn't look like enough time to implement all of the above from scratch. You will probably need to use and extend some existing tools.
Feel free to submit an application for this project. However, be warned that we've already received 4+ applications for this project, so it's going to be a tough competition. You are encouraged to apply to other projects too.