[tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services

Juha Nurmi juha.nurmi at ahmia.fi
Sun Apr 27 07:15:00 UTC 2014

On 25.04.2014 17:27, George Kadianakis wrote:
> Juha Nurmi <juha.nurmi at ahmia.fi> writes:
>> On 22.04.2014 17:35, George Kadianakis wrote:
>>> Enjoy GSoC :)
>> I will :)
>>> BTW, looking again at your proposal, I see that you are going
>>> to do both popularity tracking and backlinks.
>> Yes, another crawler gathers backlinks from the public WWW and I
>> will start gathering the URL clicks from the users.
>>> How are these two technologies going to interact with each
>>> other? That is, how will the indexer consider the output of
>>> those two features?
>> The Django front-end re-sorts the answers from the YaCy back-end.
>> See https://ahmia.fi/static/gsoc/re_sort.jpg
>> I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py
>> The results are sorted by a scaled combination of the YaCy result
>> index, the number of backlinks and the number of clicks.
>> Note the scaling: p_info.backlinks = 1 / (float(index) + 1)
>> etc.
>> sum_function = 3.0*self.yacy + 2.0*self.backlinks + 1.0*self.clicks
>> where 3, 2 and 1 are test coefficients. I will optimize these and
>> make a better model if necessary. However, clicks are easily
>> spoofed, so their coefficient has to be small.
> That makes sense.
> BTW, what is the 'yacy' score? Is it just the order that YaCy's 
> indexer chose for each result? Or does YaCy actually expose a
> score for each result? How is the score derived? Or do you treat it
> as a black box and assume it's more accurate than backlinks and
> popularity?

I am using only the order information.
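The re-sort discussed above, using only YaCy's order, can be sketched as follows. This is a minimal illustration, not the contents of sorter.py: the PageInfo class, the rank_scale and re_sort names, and the assumption that backlink and click counts are first turned into rank positions before the 1 / (index + 1) scaling are all hypothetical. Only the scaling formula and the 3/2/1 test coefficients come from the thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PageInfo:
    # Hypothetical container for one search result
    url: str
    backlink_count: int
    click_count: int
    yacy: float = 0.0
    backlinks: float = 0.0
    clicks: float = 0.0

    def sum_function(self) -> float:
        # Test coefficients from the thread: YaCy order weighs most,
        # clicks least (they are easily spoofed)
        return 3.0 * self.yacy + 2.0 * self.backlinks + 1.0 * self.clicks


def rank_scale(index: int) -> float:
    # 1 / (index + 1): position 0 -> 1.0, position 1 -> 0.5, ...
    return 1 / (float(index) + 1)


def re_sort(results: list[PageInfo]) -> list[PageInfo]:
    """Re-sort results that arrive in YaCy's order (assumed ranking scheme)."""
    # YaCy exposes only the order, so its score is the scaled position
    for index, p_info in enumerate(results):
        p_info.yacy = rank_scale(index)
    # Assumption: backlinks and clicks are also converted to rank positions
    by_backlinks = sorted(results, key=lambda p: p.backlink_count, reverse=True)
    for index, p_info in enumerate(by_backlinks):
        p_info.backlinks = rank_scale(index)
    by_clicks = sorted(results, key=lambda p: p.click_count, reverse=True)
    for index, p_info in enumerate(by_clicks):
        p_info.clicks = rank_scale(index)
    return sorted(results, key=lambda p: p.sum_function(), reverse=True)


if __name__ == "__main__":
    results = [PageInfo("a", 0, 0), PageInfo("b", 10, 50), PageInfo("c", 3, 5)]
    print([p.url for p in re_sort(results)])
```

With these made-up inputs, "b" (second in YaCy's order but first in both backlinks and clicks) scores 1.5 + 2 + 1 = 4.5 and overtakes "a" at 3 + 2/3 + 1/3 = 4.0, so the script prints ['b', 'a', 'c']. Because Python's sort is stable, results with equal backlink or click counts keep their relative YaCy order when ranks are assigned.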

BTW, we are migrating the YaCy servers (Mikko installed new servers)
and took down the old system. There should be a working crawler and
fresh full-text search results soon :)

-Juha