<div dir="ltr">Hello everyone! <div><br></div><div><font face="arial, helvetica, sans-serif" color="#000000" style="background-color:rgb(255,255,255)">I am Pushkar, one of the GSoC17 accepted students. </font><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">I am a third-year undergraduate at the International Institute of Information Technology, Hyderabad (India). </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">I have been working on 'Ahmia - Hidden Service Search' [0][1] for some time now and will be extending my contribution through GSoC this summer. I am being mentored by J</span><span style="font-size:12.800000190734863px">uha </span><span style="font-size:12.800000190734863px">Nurmi (numes) and George (asn).</span></div><div><font face="arial, helvetica, sans-serif" color="#000000" style="background-color:rgb(255,255,255)"><br></font></div><div><font face="arial, helvetica, sans-serif" color="#000000" style="background-color:rgb(255,255,255)"><span style="letter-spacing:0.01em">Ahmia is a search engine that indexes, searches, and catalogs content published on Tor Hidden Services. Furthermore, it is a medium to share meaningful insights, statistics, and news about the</span></font><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif"> Tor network itself. Ahmia requires several improvements and upgrades.</span></div><div>
                
        
        
                <div class="gmail-page" title="Page 1">
                        <div class="gmail-section">
                                <div class="gmail-layoutArea">
                                        <div class="gmail-column">
                                                <p><span style="color:rgb(67,67,67)"><font face="arial, helvetica, sans-serif"><b>>>Tasks
</b></font></span></p>
                                                <span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">●  </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Automate Blacklisting<br></span></div></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 1"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Fetch a list of child abuse media sites and remove these sites from Elasticsearch.
Also add MD5 checksums of child abuse websites to the banned database for others to
check.</span></div></div></div></div></div></blockquote><div><div class="gmail-page" title="Page 1"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><ul style="list-style-type:none">
                                                        </ul><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">●  </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Add Hidden Services page<br></span></div></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 1"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Improve the existing Add page so that adding a website stores the data to SQL
database under '/onionsadded'. From there, the crawler can crawl these websites once
a day. Remove the entries after 1 week so that the list is fresh</span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif"> </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">.</span></div></div></div></div></div></blockquote><div><div class="gmail-page" title="Page 1"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><ul style="list-style-type:none">
                                                        </ul><p><font color="#000000"><font face="arial, helvetica, sans-serif">●  Data visualization<br></font></font></p></div></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 1"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><p><font color="#000000"><font face="arial, helvetica, sans-serif">Graphs need to be plotted for various statistics on the Statistics page. Some examples </font><span style="font-family:arial,helvetica,sans-serif">include:</span></font></p></div></div></div></div></div></blockquote><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 1"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><p><font color="#000000"><font face="arial, helvetica, sans-serif">○  Linking structure between sites and keyword-based labeling for onions in
the graph</font></font></p></div></div></div></div></div></blockquote><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><ul style="list-style-type:none">
                                                        <li>
                                                                <p><font face="arial, helvetica, sans-serif" color="#000000">○  Popularity of domains according to backlinks and search clicks. I plan to use either Google Charts or D3.js to plot these graphs.
</font></p>
                                                        </li>
                                                </ul>
                                                <span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">●  </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Replace Polipo with Tor Socks5 proxy in ahmia-crawler<br></span></div></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">As of now, the Ahmia crawler uses Polipo as an HTTP proxy to direct Tor traffic. But
since Polipo is no longer maintained and torsocks can provide better
functionality, the crawler code needs to be updated to use torsocks. Modules like
socksipy can be used to connect crawler to torsocks.</span></div></div></div></div></div></blockquote><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><ul style="list-style-type:none">
                                                        </ul><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">●  </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Upgrade support from Elastic 2.4.0 to 5.X<br></span></div></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Ahmia settings should be adjusted accordingly to support Elastic 5.X. It will
require a full cluster restart since rolling upgrades are not supported for major
version upgrades. Upgrading includes replacing Groovy scripts with Painless.
Painless is a sandboxed scripting language built for Elasticsearch, which
replaced Groovy in Elastic 5.0.0.</span></div></div></div></div></div></blockquote><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><ul style="list-style-type:none">
                                                        </ul><p><font face="arial, helvetica, sans-serif" color="#000000">●  Detailed documentation and updated software dependencies<br></font></p></div></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><p><font face="arial, helvetica, sans-serif" color="#000000">Write detailed documentation at <a href="http://ahmia.fi">ahmia.fi</a>[0] as well as on the GitHub page[1].</font></p></div></div></div></div></div></blockquote><div><div class="gmail-page" title="Page 2"><div class="gmail-section"><div class="gmail-layoutArea"><div class="gmail-column"><p><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">●  </span><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Advanced search options</span></p><ul style="list-style-type:none"><li><p><font face="arial, helvetica, sans-serif" color="#000000">The advanced search options listed below can be incorporated into the search bar to
allow more customisable searches.
</font></p>
                                                                <p><font face="arial, helvetica, sans-serif" color="#000000">○ Double quotes (""): returns pages that contain exactly "term" (case
sensitive).
</font></p>
                                                                <p><font face="arial, helvetica, sans-serif" color="#000000">○ AND operator (&&): logical AND, i.e. it returns all the pages that
contain all the terms separated by ‘&&’.
</font></p>
                                                                <p><font face="arial, helvetica, sans-serif" color="#000000">○ OR operator (||): logical OR, i.e. it returns all the pages that contain
at least one of the terms separated by ‘||’.
</font></p>
                                                                <p><font face="arial, helvetica, sans-serif" color="#000000">This is one of the optional tasks I have included. If any of the features mentioned
above is not completed in the given timeline, this feature will be dropped and
priority will be given to the uncompleted task. </font></p>
                                                        </li>
                                                </ul>
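<p>The advanced search operators described above could be translated into an Elasticsearch query roughly as follows. This is only a minimal sketch: the function names and the field name <code>content</code> are assumptions for illustration, not part of Ahmia's code. Quoted terms are mapped to <code>match_phrase</code>, whose case sensitivity depends on how the field is analyzed, and operators that appear inside quotes are not handled.</p>

```python
def term_clause(term, field="content"):
    """Build one Elasticsearch clause for a single search term.

    A term wrapped in double quotes becomes an exact-phrase match;
    anything else becomes an ordinary full-text match.
    """
    term = term.strip()
    if len(term) >= 2 and term[0] == '"' and term[-1] == '"':
        return {"match_phrase": {field: term[1:-1]}}
    return {"match": {field: term}}


def build_query(text, field="content"):
    """Translate 'a && b || c' into a bool query.

    '||' separates alternatives (should), '&&' separates terms that
    must all be present within one alternative (must).
    """
    alternatives = []
    for group in text.split("||"):
        must = [term_clause(t, field) for t in group.split("&&") if t.strip()]
        if must:
            alternatives.append({"bool": {"must": must}})
    return {
        "query": {
            "bool": {"should": alternatives, "minimum_should_match": 1}
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_query('tor && "hidden service" || onion'), indent=2))
```

<p>Here '&amp;&amp;' binds more tightly than '||', so the example query matches pages that contain both "tor" and the phrase "hidden service", or pages that contain "onion".</p>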
                                        </div>
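<p>For the blacklisting task, the MD5 check could look something like the sketch below. It assumes that the checksum is computed over the lowercase onion host name; that choice, like the function names and the placeholder address <code>example.onion</code>, is an assumption made for illustration.</p>

```python
import hashlib


def normalize_onion(address):
    """Reduce a URL or bare address to its lowercase host name."""
    host = address.strip().lower()
    if "://" in host:
        host = host.split("://", 1)[1]  # drop the scheme
    return host.split("/", 1)[0]        # drop any path


def banned_digest(address):
    """MD5 hex digest of the normalized host name (assumed convention)."""
    return hashlib.md5(normalize_onion(address).encode("utf-8")).hexdigest()


def is_banned(address, banned_digests):
    """Check an address against a set of published MD5 digests."""
    return banned_digest(address) in banned_digests


if __name__ == "__main__":
    banned = {banned_digest("example.onion")}
    print(is_banned("http://Example.onion/some/page", banned))
    print(is_banned("other.onion", banned))
```

<p>Publishing only the digests lets others check an address against the list without the list itself revealing the banned addresses in plain text.</p>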
                                </div>
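<p>For the Add page task, the one-week expiry of submitted onions could be sketched as below. The data layout (a list of address and timestamp pairs) and the function name are hypothetical; the real implementation would run the equivalent filter against the SQL database.</p>

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(weeks=1)  # entries older than this are dropped


def prune_added_onions(entries, now):
    """Keep only the (onion, added_at) pairs added within the last week."""
    return [
        (onion, added_at)
        for onion, added_at in entries
        if now - added_at < MAX_AGE
    ]


if __name__ == "__main__":
    now = datetime(2017, 5, 15)
    entries = [
        ("fresh.onion", datetime(2017, 5, 14)),
        ("stale.onion", datetime(2017, 5, 1)),
    ]
    print(prune_added_onions(entries, now))
```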
                        </div>
                </div></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">>>Timeline </span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif"><br></span></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">Week 1 - Automating blacklisting of onions with child abuse content</span></font></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">Week 2 - Tweaking the 'Add' page to save added onions under '/onionsadded'</span></font></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">Week 3 - Replace Polipo with Tor SOCKS5 proxy in ahmia-crawler</span></font></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 4 - 1st Evaluation</span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 5+6 - Data visualization of statistics</span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 7 - Upgrade support from Elastic 2.4.0 to Elastic 5.X</span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 8 - 2nd Evaluation</span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 9 - Updating dependencies and documentation</span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 10 - Adding advanced search options like "", || and &&</span></div><div><span style="letter-spacing:0.01em;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif">Week 11 - Catch up and bug fixes</span></div><div><span 
style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;letter-spacing:0.01em"><br></span></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">I will be mailing biweekly status reports to this list. Feel free to contact me if you have any suggestions or doubts.</span></font></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px"><br></span></font></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;letter-spacing:0.12999999523162842px">IRC: mdhash</span><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px"><br></span></font></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px"><br></span></font></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">I would like to thank Juha and the Tor team for their constant support and guidance. 
It has been a great experience for me to contribute to the Tor Project and I look forward to being a core member of the community.</span></font></div><div><span style="letter-spacing:0.12999999523162842px;color:rgb(0,0,0);font-family:arial,helvetica,sans-serif"> </span><br></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">Thanks,</span></font></div><div><font color="#000000" face="arial, helvetica, sans-serif"><span style="letter-spacing:0.12999999523162842px">Pushkar Pathak</span></font></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;letter-spacing:0.01em"> </span><br></div><div><span style="color:rgb(0,0,0);font-family:arial,helvetica,sans-serif;letter-spacing:0.01em">[0]: </span><a href="https://ahmia.fi">https://ahmia.fi</a></div><div>[1]: <a href="https://github.com/ahmia">https://github.com/ahmia</a></div></div>