Hello
I found a couple of ideas on the Ideas Page interesting. I was a GSoC student for SimpleCV last year. In the past I've programmed in C, C++, Java, and Python.
Here are my queries.
1. Search for Hidden Services
I apologize in advance if there is something obviously wrong with my idea. The Dark Web consists of information that cannot be crawled because it doesn't appear as hyperlinks on other pages. But someone somewhere will always have access to this information, whether by entering search queries, through subscriptions, or by logging in. What if we could index all the pages a browser visits? Users could voluntarily install a plugin in their browsers and enable or disable it. This plugin would process (and maybe index) pages locally and upload its data to servers holding the global index. A rough sketch of what that could look like follows, after my second query.
2. Develop a Censorship Analyzer
Will this be part of any existing Tor projects? What is a student required to do to be considered suitable for it?
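Here is that rough Python sketch of the plugin idea from point 1. Everything specific in it (the aggregator URL, the JSON payload, the word-count "index") is made up purely for illustration; it only shows the shape of what I have in mind, not a real design.

import json
import re
import urllib.request
from collections import Counter

# Hypothetical endpoint of the server holding the global index.
AGGREGATOR_URL = "https://example.org/submit"

def index_page(url, html):
    # Reduce a visited page to a tiny local "index": its most common words.
    words = re.findall(r"[a-z]{3,}", html.lower())
    return {"url": url, "terms": Counter(words).most_common(20)}

def upload(entry):
    # Push the local index entry to the (hypothetical) global-index server.
    data = json.dumps(entry).encode("utf-8")
    req = urllib.request.Request(AGGREGATOR_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status

A browser extension would call something like index_page() on every page load and upload() in the background; enabling or disabling the plugin would simply toggle whether those calls happen.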
Thanks, Vighnesh
On Tue, Feb 25, 2014 at 01:57:11PM +0530, Vighnesh Birodkar wrote:
- Develop a Censorship Analyzer
Will this be part of any existing Tor projects? What is a student required to do to be considered suitable for it?
It is not yet clear whether this project will be part of GSoC. It requires very good skills in Python, asynchronous programming (in particular using Twisted), and networking.
See this thread for more information: https://lists.torproject.org/pipermail/tor-dev/2014-February/006171.html
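Just to illustrate what "asynchronous programming using Twisted" means in practice, here is a minimal, entirely generic example (the URL is only a placeholder): fetch a page without blocking the reactor and handle the result in callbacks.

from twisted.internet import reactor
from twisted.web.client import Agent, readBody

def main():
    agent = Agent(reactor)
    # request() returns immediately with a Deferred; the reactor keeps running.
    d = agent.request(b"GET", b"http://example.com/")
    d.addCallback(readBody)                        # fires with the response body
    d.addCallback(lambda body: print(len(body)))   # ...then with its length
    d.addBoth(lambda _: reactor.stop())            # stop the reactor either way

reactor.callWhenRunning(main)
reactor.run()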
Cheers, Philipp
Vighnesh Birodkar <vighneshbirodkar@gmail.com> writes:
Hello
I found a couple of ideas on the Ideas Page interesting. I was a GSoC student for SimpleCV last year. In the past I've programmed in C, C++, Java, and Python.
Here are my queries.
- Search for Hidden Services
I apologize in advance if there is something obviously wrong with my idea. The Dark Web consists of information that cannot be crawled because it doesn't appear as hyperlinks on other pages. But someone somewhere will always have access to this information, whether by entering search queries, through subscriptions, or by logging in. What if we could index all the pages a browser visits? Users could voluntarily install a plugin in their browsers and enable or disable it. This plugin would process (and maybe index) pages locally and upload its data to servers holding the global index.
I'm not sure what you mean by 'Dark Web', but if you mean 'Tor Hidden Services', it _is_ possible to crawl and index onion addresses. For example, if you google for ".onion" and check through the first few result pages, you can find dozens of onion addresses. If you then crawl those pages, you will get even more onion addresses.
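As a rough illustration of that harvesting loop (nothing official; the seed URL, the Tor SOCKS proxy settings, and the 16-character address pattern are just assumptions), something like this would do the "find addresses, fetch them, find more" part:

import re
import requests  # plus requests[socks] for the Tor proxy below

ONION_RE = re.compile(r"\b[a-z2-7]{16}\.onion\b")  # classic onion addresses
TOR_PROXY = {"http": "socks5h://127.0.0.1:9050",
             "https": "socks5h://127.0.0.1:9050"}  # assumes a local Tor client

def harvest(url, use_tor):
    # Fetch one page and return every onion address mentioned on it.
    try:
        resp = requests.get(url, proxies=TOR_PROXY if use_tor else None,
                            timeout=30)
        return set(ONION_RE.findall(resp.text))
    except requests.RequestException:
        return set()

def crawl(seed_urls, max_pages=50):
    # Breadth-first: clear-web seeds first, then the onion pages we discover.
    found, queue = set(), list(seed_urls)
    while queue and max_pages > 0:
        url = queue.pop(0)
        max_pages -= 1
        new = harvest(url, use_tor=".onion" in url) - found
        found |= new
        queue.extend("http://%s/" % onion for onion in new)
    return found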
Then the question is how you present those onion addresses to the user of the search engine. Users should be able to search for terms and get accurate results (popularity tracking, backlinks, etc. should be used to reduce phishing). The search engine should also be able to give a short description of each hidden service (e.g. by scraping its contents, or by the community editing the description, or by using official descriptions [0], or...).
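A toy sketch of that ranking side (again, just an illustration, not an existing design): an inverted index over the crawled pages, with results ordered by how many other hidden services link to them, so that an established service outranks a freshly set-up phishing clone.

from collections import defaultdict

class OnionIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> onion addresses mentioning it
        self.backlinks = defaultdict(set)  # onion -> onions that link to it
        self.descriptions = {}             # onion -> short description

    def add_page(self, onion, text, outlinks, description=""):
        for term in text.lower().split():
            self.postings[term].add(onion)
        for target in outlinks:
            self.backlinks[target].add(onion)
        if description:
            self.descriptions[onion] = description

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings[t] for t in terms))
        ranked = sorted(hits, key=lambda o: len(self.backlinks[o]), reverse=True)
        return [(o, len(self.backlinks[o]), self.descriptions.get(o, ""))
                for o in ranked]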
Assuming that all of the above is solved, we might get to the point where we have indexed all the potentially visible onion addresses, and that's where your browser extension idea might be useful. However, we are currently quite far away from that situation. I also doubt that many users of hidden services would install a browser extension to index Hidden Services that have been intentionally kept secret (and hence not found by conventional crawling).
I am sorry. Where you mentioned "clear web", I mistook hidden services to be the Deep Web [1]. That's what I meant by Dark Web.
[1]: http://en.wikipedia.org/wiki/Deep_Web