[tor-dev] GSoC 2021 - Alexa Top Sites Captcha and Tor Block Monitoring #Update

David Fifield david at bamsoftware.com
Tue Jul 20 16:11:50 UTC 2021


On Mon, Jul 12, 2021 at 05:01:35PM +0530, Apratim Ranjan Chakrabarty wrote:
> ** Looking forward for suggestions and comments as to how to improve on it.
> Also materials like research paper in this domain would be helpful **

Section IV-C of the ICLab paper has a discussion of block page detection.
The first pass is a regex match against known block pages, but there is
also clustering by similar HTML structure and text.
https://censorbib.nymity.ch/#Niaki2020a
https://github.com/net4people/bbs/issues/52
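
A rough sketch of that two-pass idea, in case it helps as a starting
point. This is not the ICLab implementation: the patterns in
KNOWN_BLOCK_PATTERNS are made-up placeholders, the 0.8 threshold is
arbitrary, and the clustering here is a simple greedy pass that compares
HTML tag sequences with difflib, swapped in for whatever similarity
measure the paper actually uses.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical fingerprints, for illustration only. A real list would be
# curated from block pages actually observed in the scans.
KNOWN_BLOCK_PATTERNS = [
    re.compile(r"access denied", re.IGNORECASE),
    re.compile(r"captcha", re.IGNORECASE),
]


class TagCollector(HTMLParser):
    """Records the sequence of opening tags as a crude structure summary."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def matches_known_block_page(html):
    """First pass: does the page match any known block page pattern?"""
    return any(p.search(html) for p in KNOWN_BLOCK_PATTERNS)


def structure_similarity(html_a, html_b):
    """Similarity in [0, 1] between the tag sequences of two pages."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()


def cluster_by_structure(pages, threshold=0.8):
    """Second pass: greedy clustering by HTML structure.

    Each page joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for page in pages:
        for cluster in clusters:
            if structure_similarity(cluster[0], page) >= threshold:
                cluster.append(page)
                break
        else:
            clusters.append([page])
    return clusters
```

Pages that miss every regex but land in a large cluster fetched from
many unrelated sites are good candidates for manual review as new block
or CAPTCHA page templates.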

The 2016 "Do You See What I See?" study seems to be in line with your
project. "The second-class treatment of anonymous users ranges from
outright rejection to ... imposing hurdles such as CAPTCHA-solving....
Our study draws upon ... scans of the home pages of top-1,000 Alexa
websites through every Tor exit..." Section V-A covers the scans of
top-ranked sites.
https://www.ndss-symposium.org/wp-content/uploads/2017/09/do-you-see-what-i-see-differential-treatment-anonymous-users.pdf
https://archive.org/details/ndss16doyousee

