
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 22.04.2014 17:35, George Kadianakis wrote:
Enjoy GSoC :)
I will :)
BTW, looking again at your proposal, I see that you are going to do both popularity tracking and backlinks.
Yes, another crawler gathers backlinks from the public WWW and I will start gathering the URL clicks from the users.
How are these two technologies going to interact with each other? That is, how will the indexer consider the output of those two features?
The Django front-end re-sorts the answers from the YaCy back-end. See https://ahmia.fi/static/gsoc/re_sort.jpg

I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py

The result is sorted according to the YaCy result index, the number of backlinks and the number of clicks, each of which is scaled. Note the scaling:

    p_info.backlinks = 1 / (float(index) + 1)

etc., and

    sum_function = 3.0*self.yacy + 2.0*self.backlinks + 1.0*self.clicks

where 3, 2 and 1 are test coefficients. I will optimize these and make a better model if necessary. However, clicks are easily spoofed, so their coefficient has to be small.
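
To make the scaling and the weighted sum concrete, here is a minimal sketch of how such a re-sort could look. It is not the contents of sorter.py. The `PageInfo` class, the `backlink_count` and `click_count` fields, and the `scale` and `re_sort` helpers are hypothetical names for illustration. It also assumes that `index` means the position of a result when the results are ordered by the signal in question.

```python
from dataclasses import dataclass

# Test coefficients quoted in the message; they are expected to be tuned later.
W_YACY, W_BACKLINKS, W_CLICKS = 3.0, 2.0, 1.0


@dataclass
class PageInfo:
    url: str
    backlink_count: int = 0  # hypothetical field: raw number of backlinks
    click_count: int = 0     # hypothetical field: raw number of user clicks
    yacy: float = 0.0        # scaled scores, filled in by re_sort()
    backlinks: float = 0.0
    clicks: float = 0.0

    def sum_function(self) -> float:
        # Weighted sum of the three scaled scores.
        return (W_YACY * self.yacy
                + W_BACKLINKS * self.backlinks
                + W_CLICKS * self.clicks)


def scale(index: int) -> float:
    # Position 0 maps to 1.0, position 1 to 0.5, position 2 to 0.333..., etc.
    return 1 / (float(index) + 1)


def re_sort(results):
    """Re-sort results that arrive in YaCy order (assumed: best first)."""
    for index, page in enumerate(results):
        page.yacy = scale(index)
    by_backlinks = sorted(results, key=lambda p: p.backlink_count, reverse=True)
    for index, page in enumerate(by_backlinks):
        page.backlinks = scale(index)
    by_clicks = sorted(results, key=lambda p: p.click_count, reverse=True)
    for index, page in enumerate(by_clicks):
        page.clicks = scale(index)
    # sorted() is stable, so ties keep their YaCy order.
    return sorted(results, key=lambda p: p.sum_function(), reverse=True)
```

Because only positions are scaled, not raw counts, a single page with a huge number of clicks cannot gain more than the click coefficient allows, which is one reason to keep that coefficient small.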
Also, with your newly acquired knowledge about backlinks, how long is it going to take you to incorporate them in ahmia? Are you actually going to do it during the "Use an another crawler to search .onion pages from the public Internet" phase?
We can test it once the popularity tracking and the backlinks crawler are working.

- -Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTWKhsAAoJELGTs54GL8vA+WAH/1i4sCvvcwotn5b39Ox8yldn
Wv6mBxqlIiaoeBj1Eeu+A92QfGvvpxdWDb7Kn3+3u0IO0wXcZlf0SrIri11IgprW
1f8x5BMDYiaFl12dVO/3jfXSmdfKQ24AdKknfK9wuD63266L2Tks/DVURHQKrYaM
zTfYJKZNWJtOPxUj45lHknHxDWVzRlmqiksRn1aPwx2EW5dpKCCVkV9ySnJdZW74
DWs1es1rLKj6UVmVl6w88PJ/C1COWhMQspXtYIZ8paZQfMHtEgDxLuifITIHgdBh
TdGLUEVteUl5wyCNjDh1Q+ZEkdbMvcpNZuP5D3lUYweHz0cMMOGHC0oaLlJS4KE=
=48jK
-----END PGP SIGNATURE-----