[tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services

George Kadianakis desnacked at riseup.net
Fri Apr 25 14:27:14 UTC 2014


Juha Nurmi <juha.nurmi at ahmia.fi> writes:

> On 22.04.2014 17:35, George Kadianakis wrote:
>> Enjoy GSoC :)
>
> I will :)
>
>> BTW, looking again at your proposal, I see that you are going to
>> do both popularity tracking and backlinks.
>
> Yes, another crawler gathers backlinks from the public WWW and I will
> start gathering the URL clicks from the users.
>
>> How are these two technologies going to interact with each other?
>> That is, how will the indexer consider the output of those two
>> features?
>
> The Django front-end re-sorts the answers from the YaCy back-end.
>
> See https://ahmia.fi/static/gsoc/re_sort.jpg
>
> I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py
>
> The results are sorted according to the YaCy result index, the number
> of backlinks and the number of clicks, each of which is scaled.
>
> Note the scaling:  p_info.backlinks = 1 / (float(index) + 1) etc.
>
> sum_function = 3.0*self.yacy + 2.0*self.backlinks + 1.0*self.clicks
>
> where 3, 2 and 1 are test coefficients. I will optimize these and make
> a better model if necessary. However, clicks are easily spoofed, so
> they need a small coefficient.
>

That makes sense.
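
For reference, here is a rough sketch of how I read the re-sorting
described above. It is not the contents of sorter.py. The names
PageInfo, rank_scale and re_sort are hypothetical, and I am assuming
that each signal is first turned into a 0-based rank and then scaled
with 1 / (float(index) + 1), as in the quoted note. The 3/2/1 weights
are the test coefficients from the quoted mail.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-result record; field names are assumptions."""
    url: str
    yacy: float = 0.0       # scaled score from the YaCy result order
    backlinks: float = 0.0  # scaled score from the backlink ranking
    clicks: float = 0.0     # scaled score from the click ranking

    def sum_function(self) -> float:
        # Test coefficients quoted in the thread: 3, 2 and 1.
        return 3.0 * self.yacy + 2.0 * self.backlinks + 1.0 * self.clicks


def rank_scale(index: int) -> float:
    """Scale a 0-based rank into (0, 1]; the top-ranked item gets 1.0."""
    return 1 / (float(index) + 1)


def re_sort(yacy_order, backlink_counts, click_counts):
    """Re-sort YaCy results by a weighted sum of three rank-scaled signals.

    yacy_order: unique URLs in the order returned by YaCy (best first).
    backlink_counts, click_counts: dicts mapping URL -> raw count
    (assumed inputs; URLs missing from a dict count as 0).
    """
    pages = {url: PageInfo(url) for url in yacy_order}

    # YaCy signal: position in the result list as returned by the back-end.
    for index, url in enumerate(yacy_order):
        pages[url].yacy = rank_scale(index)

    # Backlink signal: rank by raw count, highest first (ties keep YaCy order).
    by_backlinks = sorted(
        yacy_order, key=lambda u: backlink_counts.get(u, 0), reverse=True
    )
    for index, url in enumerate(by_backlinks):
        pages[url].backlinks = rank_scale(index)

    # Click signal: same treatment as backlinks.
    by_clicks = sorted(
        yacy_order, key=lambda u: click_counts.get(u, 0), reverse=True
    )
    for index, url in enumerate(by_clicks):
        pages[url].clicks = rank_scale(index)

    # Highest combined score first.
    return sorted(pages.values(), key=PageInfo.sum_function, reverse=True)
```

Calling re_sort(["a", "b", "c"], {"b": 10, "c": 5}, {"c": 100})
returns PageInfo objects ordered by the combined score. With these
weights the YaCy order dominates unless a result does clearly better
on both of the other two signals.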

BTW, what is the 'yacy' score? Is it just the order that YaCy's
indexer chose for each result? Or does YaCy actually expose a score
for each result? How is the score derived? Or do you treat it as a
black box and assume it's more accurate than backlinks and
popularity?

Thanks!

