<div dir="ltr">I am sorry. When you mentioned the "clear web", I mistook hidden services for the Deep Web [1]. That's what I meant by "Dark Web".<br><br><br>[1]: <a href="http://en.wikipedia.org/wiki/Deep_Web">http://en.wikipedia.org/wiki/Deep_Web</a><br>


<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Feb 25, 2014 at 9:41 PM, George Kadianakis <span dir="ltr">&lt;<a href="mailto:desnacked@riseup.net" target="_blank">desnacked@riseup.net</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">Vighnesh Birodkar &lt;<a href="mailto:vighneshbirodkar@gmail.com">vighneshbirodkar@gmail.com</a>&gt; writes:<br>


<br>

> Hello<br>

><br>

> I found a couple of ideas from the Ideas Page interesting. I was a GSoC<br>

> student for SimpleCV last year. In the past I've programmed in C, C++, Java<br>

> and Python.<br>

><br>

> Following are my queries.<br>

><br>

> 1. Search for Hidden Services.<br>

><br>

> I apologize in advance if there is something obviously wrong with my idea.<br>

> The Dark Web consists of information that cannot be crawled because it doesn't<br>

> appear as hyperlinks in other pages. But someone somewhere will always<br>

> have access to this information, either by entering search queries,<br>

> through subscriptions or by logging in. What if we could index all the pages a<br>

> browser visits? Users can voluntarily install and enable or disable a<br>

> plugin in their browsers. This plugin would process (and maybe index)<br>

> pages locally and upload its data to servers which will hold the global<br>

> index.<br>

><br>

<br>

</div>I'm not sure what you mean by 'Dark Web', but if you mean 'Tor Hidden<br>

Services' it _is_ possible to crawl and index onion addresses. For<br>

example, if you google for ".onion" and check through the first few<br>

result pages you can find dozens of onion addresses. If you then crawl<br>

those pages you will get even more onion addresses.<br>

<br>

Then the question is how you present those onion addresses to the user<br>

of the search engine. Users should be able to search for terms and get<br>

accurate results (popularity tracking, backlinks, etc. should be used<br>

to reduce phishing). The search engine should also be able to give a<br>

short description of each hidden service (e.g. by scraping its<br>

contents, or by the community editing the description, or by using<br>

official descriptions [0], or...).<br>

<br>

Assuming that all the above are solved, we might get to the point where<br>

we have indexed all the potentially visible onion addresses and that's<br>

where your browser extension idea might be useful. However we are<br>

currently quite far away from that situation. I also doubt that many<br>

users of hidden services would install a browser extension to index<br>

Hidden Services that have been intentionally kept secret (and hence<br>

not found by conventional crawling).<br>

<br>

[0]: <a href="https://ahmia.fi/documentation/descriptionProposal/" target="_blank">https://ahmia.fi/documentation/descriptionProposal/</a><br>

<div class="HOEnZb"><div class="h5">_______________________________________________<br>

tor-dev mailing list<br>

<a href="mailto:tor-dev@lists.torproject.org">tor-dev@lists.torproject.org</a><br>

<a href="https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev" target="_blank">https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev</a><br>

</div></div></blockquote></div><br></div>