[tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services

Juha Nurmi juha.nurmi at ahmia.fi
Thu Apr 24 06:00:21 UTC 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 22.04.2014 17:35, George Kadianakis wrote:
> Enjoy GSoC :)

I will :)

> BTW, looking again at your proposal, I see that you are going to
> do both popularity tracking and backlinks.

Yes. A separate crawler gathers backlinks from the public WWW, and I
will start recording the URL clicks from the users.

> How are these two technologies going to interact with each other?
> That is, how will the indexer consider the output of those two
> features?

The Django front-end re-sorts the results returned by the YaCy back-end.

See https://ahmia.fi/static/gsoc/re_sort.jpg

I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py

The results are sorted according to the YaCy result index, the number
of backlinks and the number of clicks, each of which is scaled.

Note the scaling, e.g. p_info.backlinks = 1 / (float(index) + 1), and
similarly for the other values.

sum_function = 3.0*self.yacy + 2.0*self.backlinks + 1.0*self.clicks

where 3, 2 and 1 are test coefficients. I will optimize these and make
a better model if necessary. However, clicks are easily spoofed, so
their coefficient has to be small.
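
To make the idea concrete, here is a minimal sketch of the re-sorting
step. It is not the contents of sorter.py. It assumes that "index" is a
result's position in a list ordered by one signal, and that all three
signals are scaled the same way. The names PageInfo, rank_scale and
re_sort, and the input dictionaries, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Scaled scores for one search result (hypothetical container)."""
    url: str
    yacy: float = 0.0
    backlinks: float = 0.0
    clicks: float = 0.0

    def sum_function(self):
        # Test coefficients from the mail; clicks get the smallest weight
        # because they are the easiest signal to spoof.
        return 3.0 * self.yacy + 2.0 * self.backlinks + 1.0 * self.clicks


def rank_scale(index):
    """Map a 0-based rank position to a score in (0, 1]."""
    return 1 / (float(index) + 1)


def re_sort(yacy_urls, backlink_counts, click_counts):
    """Re-sort YaCy results using backlink and click counts.

    yacy_urls is the result list in YaCy order (assumed to hold no
    duplicates). The two dicts map URL -> raw count; missing URLs
    count as 0.
    """
    pages = {url: PageInfo(url) for url in yacy_urls}

    # YaCy signal: position in the original result list.
    for index, url in enumerate(yacy_urls):
        pages[url].yacy = rank_scale(index)

    # Backlink signal: position when ordered by backlink count.
    by_backlinks = sorted(
        yacy_urls, key=lambda u: backlink_counts.get(u, 0), reverse=True
    )
    for index, url in enumerate(by_backlinks):
        pages[url].backlinks = rank_scale(index)

    # Click signal: position when ordered by click count.
    by_clicks = sorted(
        yacy_urls, key=lambda u: click_counts.get(u, 0), reverse=True
    )
    for index, url in enumerate(by_clicks):
        pages[url].clicks = rank_scale(index)

    return sorted(pages.values(), key=lambda p: p.sum_function(), reverse=True)


results = re_sort(
    ["a.onion", "b.onion", "c.onion"],
    backlink_counts={"c.onion": 10, "b.onion": 5},
    click_counts={"c.onion": 100},
)
for page in results:
    print(page.url, round(page.sum_function(), 3))
```

Because every signal is reduced to a rank position before weighting, a
huge raw click count cannot earn more than the score of the first
position, which limits how far spoofed clicks can move a result.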

> Also, with your newly acquired knowledge about backlinks, how long
> is it going to take you to incorporate them in ahmia? Are you
> actually going to do it during the "Use an another crawler to
> search .onion pages from the public Internet" phase?

We can test it once the popularity tracking and the backlinks crawler
are working.

- -Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTWKhsAAoJELGTs54GL8vA+WAH/1i4sCvvcwotn5b39Ox8yldn
Wv6mBxqlIiaoeBj1Eeu+A92QfGvvpxdWDb7Kn3+3u0IO0wXcZlf0SrIri11IgprW
1f8x5BMDYiaFl12dVO/3jfXSmdfKQ24AdKknfK9wuD63266L2Tks/DVURHQKrYaM
zTfYJKZNWJtOPxUj45lHknHxDWVzRlmqiksRn1aPwx2EW5dpKCCVkV9ySnJdZW74
DWs1es1rLKj6UVmVl6w88PJ/C1COWhMQspXtYIZ8paZQfMHtEgDxLuifITIHgdBh
TdGLUEVteUl5wyCNjDh1Q+ZEkdbMvcpNZuP5D3lUYweHz0cMMOGHC0oaLlJS4KE=
=48jK
-----END PGP SIGNATURE-----

