Re: [tor-dev] GSoC 2021 - Alexa Top Sites Captcha and Tor Block Monitoring #Update

20 Jul 2021

On Mon, Jul 12, 2021 at 05:01:35PM +0530, Apratim Ranjan Chakrabarty wrote:
> ...
> ** Looking forward for suggestions and comments as to how to improve on it.
> Also materials like research paper in this domain would be helpful **
Section IV-C of the ICLab paper discusses block page detection. The
first pass matches known block pages with regular expressions; pages
are also clustered by similarity of HTML structure and text.
https://censorbib.nymity.ch/#Niaki2020a
https://github.com/net4people/bbs/issues/52
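A minimal sketch of that two-stage idea follows. It is not ICLab's
implementation. The regex signatures in KNOWN_BLOCK_PATTERNS are
hypothetical placeholders. Clustering here uses a stand-in technique:
difflib similarity over the sequence of HTML start tags, grouped
greedily with an assumed 0.9 threshold. Clustering by page text is left
out.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical signatures of known block pages (placeholders, not a real list).
KNOWN_BLOCK_PATTERNS = [
    re.compile(r"access denied", re.IGNORECASE),
    re.compile(r"attention required!.*cloudflare", re.IGNORECASE | re.DOTALL),
]


class TagCollector(HTMLParser):
    """Records start-tag names in order, as a rough proxy for HTML structure."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def matches_known_block_page(html):
    # First pass: does any known block-page regex match?
    return any(p.search(html) for p in KNOWN_BLOCK_PATTERNS)


def cluster_by_structure(pages, threshold=0.9):
    """Greedy clustering: a page joins the first cluster whose representative
    tag sequence has a similarity ratio >= threshold, else starts a new one."""
    clusters = []  # list of (representative tag sequence, [page ids])
    for page_id, html in pages.items():
        seq = tag_sequence(html)
        for rep, members in clusters:
            if SequenceMatcher(None, rep, seq).ratio() >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((seq, [page_id]))
    return [members for _, members in clusters]


def classify(pages):
    """Flag regex matches, then cluster the remaining pages for manual review."""
    flagged = [pid for pid, html in pages.items() if matches_known_block_page(html)]
    rest = {pid: html for pid, html in pages.items() if pid not in flagged}
    return flagged, cluster_by_structure(rest)


pages = {
    "a": "<html><body><h1>Access Denied</h1></body></html>",
    "b": "<html><body><div><p>Hello</p></div></body></html>",
    "c": "<html><body><div><p>World</p></div></body></html>",
    "d": "<html><head><title>x</title></head><body><table><tr><td>1</td></tr></table></body></html>",
}
flagged, clusters = classify(pages)
print(flagged, clusters)  # ['a'] [['b', 'c'], ['d']]
```

Pages "b" and "c" share identical structure and land in one cluster.
The idea is that a large cluster of structurally identical pages across
unrelated sites is a candidate for an unknown block page.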

The 2016 "Do You See What I See?" study seems to be in line with your
project. "The second-class treatment of anonymous users ranges from
outright rejection to ... imposing hurdles such as CAPTCHA-solving....
Our study draws upon ... scans of the home pages of top-1,000 Alexa
websites through every Tor exit..." Section V-A has to do with scans of
top-ranked sites.
https://www.ndss-symposium.org/wp-content/uploads/2017/09/do-you-see-what-i-...
https://archive.org/details/ndss16doyousee
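The distinction between outright rejection and CAPTCHA hurdles can be
sketched as a comparison between a fetch through a Tor exit and a
control fetch without Tor. This is a hypothetical heuristic, not the
methodology of the study. The CAPTCHA_MARKERS strings and the Fetch type
are assumptions for illustration, and the fetching itself is left out.
The CAPTCHA check runs first because a challenge page can arrive with an
error status.

```python
from dataclasses import dataclass

# Hypothetical substrings suggesting a CAPTCHA page; real pages vary widely.
CAPTCHA_MARKERS = ("captcha", "g-recaptcha", "h-captcha", "are you a robot")


@dataclass
class Fetch:
    status: int  # HTTP status code
    body: str    # response body


def has_captcha(body: str) -> bool:
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def classify_tor_response(control: Fetch, via_tor: Fetch) -> str:
    """Label how the Tor fetch differs from the non-Tor control fetch."""
    # A CAPTCHA that appears only over Tor counts as a hurdle.
    if has_captcha(via_tor.body) and not has_captcha(control.body):
        return "captcha"
    # An error status over Tor, but not in the control, counts as rejection.
    if via_tor.status >= 400 and control.status < 400:
        return "rejected"
    return "same"


ok = Fetch(200, "<html>Welcome</html>")
print(classify_tor_response(ok, Fetch(403, "Forbidden")))  # rejected
```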

David Fifield