[ooni-dev] Blockpage Detection

Arturo Filastò art at torproject.org
Wed Nov 27 10:26:59 UTC 2013

On 11/25/13, 4:23 PM, Ben Jones wrote:
> Hello,

Hi Ben,

Thanks for your interest in OONI :).

> I am a PhD student at Georgia Tech and I am collaborating with
> researchers at Stony Brook University to find an effective means of
> detecting block pages. We have access to a very robust set of both real
> pages and blocked pages (~2.4 million pages) which we are using to
> evaluate block page detection metrics.

This is an extremely valuable dataset. Under what sort of license would
you be able to release it? Would it be possible to ship the dataset, for
example, as part of the ooni-probe Debian package?

> We already have two measures to detect block pages and would like to
> evaluate your DOM similarity measure alongside our own metrics. Since we
> are planning on publishing our results, may we include your DOM
> similarity measure in our evaluation? Also, I would like to look more
> into this similarity measure, is there a paper that I can read?

Ah yes, I wrote that quite some time ago, but never wrote a paper. Keep
in mind that I don't have a lot of experience with machine learning and
did this just as a personal pet project to try out some of the things I
learned studying mathematics at university.

I wrote up a brief description of how the method works here:

I would be very interested in checking out what method you have applied
to DOM similarity measurement. I also have a feeling there is some way
of proving that an eigenvalue approach is somehow correlated with
computing a labelled tree distance between the DOM pages.

~ Art.
