[metrics-team] metrics-web detect script update and question

Karsten Loesing karsten at torproject.org
Fri Nov 27 09:58:15 UTC 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23/11/15 01:48, seamus tuohy wrote:
> Hey Karsten,

Hi Seamus,

> I decided on getting a web-page up first since I see very little
> time on my horizon for cleaning up the back-end for new developers
> as I hoped to.
> 
> The page is not fully done yet (I don't have filter by date
> implemented) but, it does the basic visualization of where
> censorship and spikes occur. I also added cute little date-range
> specific twitter and google news searches when you click on one of
> the censorship or spike anomalies.
> 
> Here is where the viz lives: http://seamustuohy.com/tor_anomaly/
> 
> The code lives here:
> https://github.com/elationfoundation/tor_anomaly
> 
> You can find the website code on the "gh-pages" branch and the data
> on the "data" branch.

This is really cool!  Thanks for picking this up, and sorry for the
delay in responding; things are really busy here.  And sorry for not
looking at the code yet.

Here's some feedback:

 - Filtering by date is indeed one of the missing features I'd put on
top of the list.  One data point per day is much too fine-grained for
showing four years of data at once.  One needs to be able to zoom into
a given month or so to say whether a given series of green or red dots
is justified or not.  Still, this visualization is so much nicer than
what's currently on Tor Metrics!

 - It's fair enough that you didn't find the time to clean up your
code yet, but maybe you don't have to: how about you give developers
an interface for developing their own detection algorithms?  For
example: "here's your input file, here's what your output file should
look like to be included in the website, use any scripting language
you want, make sure the script finishes within 1 minute".  Maybe
you'll have to negotiate some of those requirements, but in theory,
developers should be able to work with that.  And once you include
their data, users can choose which algorithm they want to use for
their green and red dots.

 - How about you show bridge user numbers in a second graph, right
below the one you already have for directly connecting users?  Sure,
you won't have green or red dots in it, but it's still valuable
information to have, especially in comparison to directly connecting
users.  And it's mostly a shortcoming of the detection algorithm that
it doesn't support bridge user numbers, and maybe future algorithms
will do that.

 - How about you invite non-developers to help with getting some
ground truth about censorship events?  Users could click on a green or
red dot and put in a comment saying what they think happened there.
What you could learn is whether a finding was a true positive ("yes,
there was an event, it's related to XYZ") or false positive ("nothing
to see here, just a glitch in the graph or detection algorithm").
Also, you could let users comment on any date in the graph that
doesn't have a green or red dot yet to learn about false negatives
("something happened there, but either the graph doesn't show it or
the detection algorithm missed it").  There's a potential for inviting
spammers, but if you only allow, say, 100 characters of text, no URLs,
and no more than 10 comments per minute, you may be fine.  You
could provide a download link for all comments, so that people can
read and work on them offline.  Or if you're too afraid of spammers,
maybe define a data format and accept submissions from users
containing this information.  Could be a fun experiment with
potentially valuable output.
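
To make the detection-algorithm interface from the second item above
more concrete, here is a minimal sketch of a runner that hands a
script an input file and an output file and enforces the 1-minute
limit.  The function name run_detector and the convention of passing
both paths as the last two command-line arguments are assumptions for
illustration; the actual file formats would still need to be defined.

```python
import subprocess
from pathlib import Path

TIME_LIMIT_SECONDS = 60  # "make sure the script finishes within 1 minute"


def run_detector(command, input_path, output_path, time_limit=TIME_LIMIT_SECONDS):
    """Run one detection script and report whether its output is usable.

    Hypothetical convention: the script may be written in any language and
    receives the input and output file paths as its last two arguments.
    Returns True only if the script exited with status 0 within the time
    limit and the output file exists afterwards.
    """
    try:
        result = subprocess.run(
            [*command, str(input_path), str(output_path)],
            timeout=time_limit,  # the child is killed when this expires
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and Path(output_path).is_file()
```

A call could then look like run_detector(["Rscript", "detect.R"],
"users.csv", "anomalies.csv"), where the script and file names are
placeholders.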
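
The anti-spam limits from the last item above (100 characters, no
URLs, no more than 10 comments per minute) could be checked roughly
like this.  This is only a sketch: the class name CommentGate is made
up, the URL check is a crude heuristic, and treating the rate limit as
site-wide rather than per user is an assumption.

```python
import re
import time
from collections import deque

MAX_LENGTH = 100       # "100 characters of text"
MAX_PER_MINUTE = 10    # "no more than 10 comments per minute"
# Crude heuristic for "no URLs"; a real filter would likely need more cases.
URL_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)


class CommentGate:
    """Accept or reject comments under the three limits above."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._accepted = deque()  # timestamps of comments accepted in the last 60 s

    def accept(self, text):
        now = self._clock()
        # Forget accepted comments that are older than one minute.
        while self._accepted and now - self._accepted[0] >= 60:
            self._accepted.popleft()
        if not text.strip() or len(text) > MAX_LENGTH:
            return False
        if URL_PATTERN.search(text):
            return False
        if len(self._accepted) >= MAX_PER_MINUTE:
            return False
        self._accepted.append(now)
        return True
```

Rejected comments do not count against the rate limit here, so a burst
of spam would not lock out legitimate comments.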

This is definitely something I'd want to link from Tor Metrics.  I
hope to do that once things are a little less busy.  I have a few
other things that I want to add, and maybe I can do that all at once.

Thanks!

All the best,
Karsten
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJWWCk3AAoJEJD5dJfVqbCrv/4IAMg2tYRJe8MpqDANtISqq95x
u2VoaBiHNhpUPHDw68yyF29edOb/nJoYQsB3du7LU74vRoCj8tMd4w93D4R439EB
IXVF+tKUI95Yqb1YcnqYy2+u1ScFcH6szNanHhy6FtWePFRgGLWAOEFowy1ey78O
A/Zj2zpPIFlFhmpv3jwsarQwmhk6EVjDNPA3T46sO13bKb2+QuUH5EUhtELQDNRG
fDWUZL8dxAu2V+Y2odevAlal3rEu5uXXwIknRUfDJd69sDGezWDDDscT9EimcBww
Ejy3L0Iu/DeQDdkQrYiyF06L43VCwWz6Oh+45DGhAE+XZZ0KMJjYZXti9OK75fU=
=zzqp
-----END PGP SIGNATURE-----

