Replying to some new additions in the proposal:
Thanks asn!

"Ask help from organizations that are crawling"
Today I emailed DuckDuckGo and asked whether there is an easy way to search for new .onion addresses using their search engine.

"Checking out the backlinks from public WWW"
Given a known onion address, it is possible to estimate its popularity by checking the number of search results:
https://duckduckgo.com/?q=%22http%3A%2F%2Fjlve2y45zacpbz6s.onion%22
https://www.google.com/#q=%22http:%2F%2Fjlve2y45zacpbz6s.onion%22
https://www.google.com/#q=link:http:%2F%2Fjlve2y45zacpbz6s.onion

This way I will get a list that ranks addresses by popularity, according to links from the public WWW:

onion address    number of WWW sites linking to it
xyz.onion        123
abc.onion        90
uio.onion        24
mre.onion        17

Today I asked YaCy's developer how I could use this information.

"Commenting features"
I agree that commenting might be a can of worms, because people might just write some random crap there. Technically, this would be built with the Django framework. Note that the priority of this task is low (10). We could leave the commenting feature for the very last task or skip it.
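The backlink check in the reply above boils down to two steps: build an exact-phrase search URL for each onion address, then sort the addresses by how many linking sites were found. A minimal Python sketch, assuming the counts have already been collected somehow (fetching and parsing result pages is not shown). The function names are hypothetical, and the counts in the usage line are the illustrative numbers from the list above, not real measurements.

```python
from urllib.parse import quote


def duckduckgo_query_url(onion: str) -> str:
    """Build a DuckDuckGo URL that searches for the exact phrase "http://<onion>"."""
    phrase = '"http://' + onion + '"'
    # safe="" makes quote() also escape ':' and '/', matching the URLs above.
    return "https://duckduckgo.com/?q=" + quote(phrase, safe="")


def rank_by_backlinks(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Sort onion addresses by number of linking WWW sites, most linked first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


print(duckduckgo_query_url("jlve2y45zacpbz6s.onion"))
print(rank_by_backlinks({"abc.onion": 90, "xyz.onion": 123, "uio.onion": 24, "mre.onion": 17}))
```

The first call reproduces the DuckDuckGo URL quoted above. The second returns the addresses in the order xyz.onion, abc.onion, uio.onion, mre.onion, which is exactly the raw-count ranking discussed in the rest of this thread.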
ACK wrt commenting.
As far as backlinks are concerned, while I appreciate how rapid and easy your solution is, you might want to make it a bit more robust.
The way you did it, you treat the 123 references to 'xyz.onion' as strictly better than the 90 references to 'abc.onion'. This is not how it works on the real web: the 123 references to 'xyz.onion' might be SEO spam, and they might be coming from xyz.onion itself or from related websites.
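One cheap step in that direction is to count distinct linking hosts instead of raw references, and to ignore links that come from the target onion itself. The Python sketch below assumes a hypothetical input of (source page URL, target onion) pairs. It only catches self-links and repeated links from the same host; links from "related websites" run by the same owner on a different host would still get through.

```python
from urllib.parse import urlparse


def distinct_backlinks(links: list[tuple[str, str]]) -> dict[str, int]:
    """Count distinct linking hosts per target onion, ignoring self-links.

    `links` holds (source_page_url, target_onion) pairs; this input format
    is a hypothetical assumption.
    """
    hosts_per_target: dict[str, set[str]] = {}
    for source_url, target in links:
        target = target.lower()
        hosts = hosts_per_target.setdefault(target, set())
        # .hostname is lower-cased by urlparse and is None when there is no host.
        host = urlparse(source_url).hostname or ""
        # A page on the target (or one of its subdomains) is not an independent vote.
        if not host or host == target or host.endswith("." + target):
            continue
        hosts.add(host)
    return {target: len(hosts) for target, hosts in hosts_per_target.items()}
```

With this counting, a hundred pages on one site that all link to xyz.onion count as a single backlink, and pages on xyz.onion itself count as none.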
Proper search engines assign a weight to each backlink, according to how legitimate the search engine believes the linking site to be. This depends on how many backlinks the linker itself has, how legitimate its HTML content looks, etc. You can find more heuristics that search engines use by skimming an SEO book or an SEO forum.
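PageRank is the textbook example of weighting a backlink by how well linked the linker is. The Python sketch below uses power iteration and assumes the link graph has already been collected into a dict mapping each site to the sites it links to. The graph in the example is made up for illustration. This is not what any particular search engine actually runs, and it ignores the content-based signals mentioned above.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simple PageRank by power iteration; scores sum to 1."""
    # Collect every site, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    # Deduplicate links and drop self-links so a site cannot vote for itself.
    out_links = {node: set(graph.get(node, [])) - {node} for node in nodes}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Sites without outgoing links spread their score evenly over all sites.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            # Each linker splits its (damped) score among the sites it links to.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Made-up graph: abc.onion has one backlink from a well-linked hub,
# xyz.onion has two backlinks from pages that nobody links to.
web = {
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"],
    "hub": ["abc.onion"],
    "s1": ["xyz.onion"], "s2": ["xyz.onion"],
}
scores = pagerank(web)
print(scores["abc.onion"] > scores["xyz.onion"])
```

In this toy graph, abc.onion ends up ahead of xyz.onion despite having fewer raw backlinks. This is the effect that plain link counting misses.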
It's up to you how deep you want to go into backlinking during GSoC, but IMO backlinking is a more reliable heuristic than popularity tracking. Up to you anyway!