[metrics-team] metrics-web detect script update and question

seamus tuohy stuohy at internews.org
Fri Nov 27 23:01:46 UTC 2015


Hey,

Karsten Loesing <karsten at torproject.org> writes:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 23/11/15 01:48, seamus tuohy wrote:
>> Hey Karsten,
>
> Hi Seamus,
>
>> I decided on getting a web-page up first since I see very little
>> time on my horizon for cleaning up the back-end for new developers
>> as I hoped to.
>>
>> The page is not fully done yet (I don't have filter by date
>> implemented) but, it does the basic visualization of where
>> censorship and spikes occur. I also added cute little date-range
>> specific twitter and google news searches when you click on one of
>> the censorship or spike anomalies.
>>
>> Here is where the viz lives: http://seamustuohy.com/tor_anomaly/
>>
>> The code lives here:
>> https://github.com/elationfoundation/tor_anomaly
>>
>> You can find the website code on the "gh-pages" branch and the data
>> on the "data" branch.
>
> This is really cool!  Thanks for picking this up, and sorry for the
> delay in responding, things are really busy here.  And sorry for not
> looking at the code yet.

I am glad you like it. Don't worry about the delay. There is no rush for
you to look into the code since it will be dormant until a random excited
statistician decides they need a project. I will keep an ear open and
try to be available in case someone reaches out to me about how to use
the code to implement something better. I think that this may be the
extent of my capacity for (at least) the near future.

>
> Here's some feedback:
>
>  - Filtering by date is indeed one of the missing features I'd put on
> top of the list.  The data resolution of one data point per day is
> much too high for showing four years of data.  One needs to be able to
> zoom into a given month or so to say whether a given series of green
> or red dots is justified or not.  Still, this visualization is so much
> nicer than what's currently on Tor Metrics!

Filtering is done. I added a simple brush filter that lets users zoom in
on a time range.

>
>  - It's fair enough that you didn't find the time to clean up your
> code yet, but maybe you don't have to: how about you give developers
> an interface for developing their own detection algorithms?  For
> example: "here's your input file, here's how your output file should
> look like to be included in the website, use any scripting language
> you want, make sure the script finishes within 1 minute".  Maybe
> you'll have to negotiate some of those requirements, but in theory,
> developers should be able to work with that.  And once you include
> their data, users can choose which algorithm they want to use for
> their green and red dots.

The current HTML source is static and hosted on GitHub. Anyone who
wishes to update the existing source code to use different algorithms
simply needs to fork the repo, change a few lines in the update script,
point the index at the repository the data should be pulled from, and
then run the update_data scripts.

I think this minimal setup is enough to let others build a cleaner
implementation of the existing detection code.
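
To make the "input file in, output file out" idea concrete, here is a
rough Python sketch of what a swapped-in detector could look like. It is
not the detection code in the repo: the column names ("date", "users",
"kind"), the function names, and the rolling-median rule are all
assumptions chosen for illustration.

```python
import csv
import io
from statistics import median


def detect_anomalies(rows, window=7, factor=2.0):
    """Flag days that deviate strongly from the preceding days.

    rows is a list of (date, users) tuples sorted by date. This is a
    hypothetical rule, not the algorithm used by the existing script.
    """
    flagged = []
    for i in range(window, len(rows)):
        date, users = rows[i]
        # Baseline is the median of the previous `window` days.
        baseline = median(u for _, u in rows[i - window:i])
        if baseline <= 0:
            continue
        if users > baseline * factor:
            flagged.append((date, "spike"))
        elif users < baseline / factor:
            flagged.append((date, "drop"))
    return flagged


def run(input_csv_text):
    """Read CSV text with assumed 'date' and 'users' columns and return
    CSV text with one 'date,kind' row per flagged day."""
    reader = csv.DictReader(io.StringIO(input_csv_text))
    rows = [(r["date"], float(r["users"])) for r in reader]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "kind"])
    writer.writerows(detect_anomalies(rows))
    return out.getvalue()
```

Anything that produces the same output columns could be dropped in
instead, which is the point of agreeing on the file format rather than
on the algorithm.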

>
>  - How about you show bridge user numbers in a second graph, right
> below the one you already have for directly connecting users?  Sure,
> you won't have green or red dots in them, but it's still valuable
> information to have, especially in comparison to directly connecting
> users.  And it's mostly a shortcoming of the detection algorithm that
> it doesn't support bridge user numbers, and maybe future algorithms
> will do that.

The Tor Metrics site supports basic bridge user number graphs. I am wary
of adding any further functionality to this demonstration. If a future
developer is interested in extending the visualization in their own
implementation to cover bridges, it would only take minor experience
with Python and D3 to get it working.

>
>  - How about you invite non-developers to help with getting some
> ground truth about censorship events?  Users could click on a green or
> red dot and put in a comment saying what they think happened there.
> What you could learn is whether a finding was a true positive ("yes,
> there was an event, it's related to XYZ") or false positive ("nothing
> to see here, just a glitch in the graph or detection algorithm").
> Also, you could let users comment on any date in the graph that
> doesn't have a green or red dot yet to learn about false negatives
> ("something happened there, but either the graph doesn't show it or
> the detection algorithm missed it").  There's a potential for inviting
> spammers, but if you only allow, say, 100 characters of text, no URLs,
> and no more than 10 comments per minute, you may be doing fine.  You
> could provide a download link for all comments, so that people can
> read and work on them offline.  Or if you're too afraid of spammers,
> maybe define a data format and accept submissions from users
> containing this information.  Could be a fun experiment with
> potentially valuable output.

This is a very interesting idea. Sadly, this cannot be implemented with
the current static nature of the website.

The current implementation pulls its data from some CSV files kept in a
separate branch in the GitHub repo that are manually updated. (Soon I
will put up a cron job on one device or another to update the data every
few days or so, but until that point it also relies on me manually
running the update data script.)
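
If someone later hosts this behind a non-static backend, the limits
Karsten suggests above (100 characters, no URLs, at most 10 comments per
minute) could be checked roughly like the sketch below. Nothing like
this exists in the repo. The class name, the URL pattern, and treating
the rate limit as global rather than per user are all assumptions.

```python
import re
import time
from collections import deque

MAX_CHARS = 100        # limit suggested in the quoted feedback
MAX_PER_MINUTE = 10    # assumed to be a global limit, not per user
# Crude URL check (an assumption); a real deployment may need more.
URL_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)


class CommentGate:
    """Hypothetical gatekeeper for user comments on a data point."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._accepted = deque()  # timestamps of accepted comments

    def check(self, text):
        """Return (accepted, reason) for a submitted comment."""
        now = self._clock()
        # Forget accepted comments older than 60 seconds.
        while self._accepted and now - self._accepted[0] >= 60:
            self._accepted.popleft()
        if len(text) > MAX_CHARS:
            return False, "too long"
        if URL_PATTERN.search(text):
            return False, "contains a URL"
        if len(self._accepted) >= MAX_PER_MINUTE:
            return False, "rate limit reached"
        self._accepted.append(now)
        return True, "ok"
```

Rejected comments do not count against the rate limit here, so a burst
of spam does not lock out legitimate submissions.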

>
> This is definitely something I'd want to link from Tor Metrics.  I
> hope to do that once things are a little less busy.  I have a few
> other things that I want to add, and maybe I can do that all at once.
>

If you have any difficulty playing with the JavaScript code (it was
hacked together quickly) I would be happy to help you make sense of
it.

Best,
s2e

> Thanks!
>
> All the best,
> Karsten
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJWWCk3AAoJEJD5dJfVqbCrv/4IAMg2tYRJe8MpqDANtISqq95x
> u2VoaBiHNhpUPHDw68yyF29edOb/nJoYQsB3du7LU74vRoCj8tMd4w93D4R439EB
> IXVF+tKUI95Yqb1YcnqYy2+u1ScFcH6szNanHhy6FtWePFRgGLWAOEFowy1ey78O
> A/Zj2zpPIFlFhmpv3jwsarQwmhk6EVjDNPA3T46sO13bKb2+QuUH5EUhtELQDNRG
> fDWUZL8dxAu2V+Y2odevAlal3rEu5uXXwIknRUfDJd69sDGezWDDDscT9EimcBww
> Ejy3L0Iu/DeQDdkQrYiyF06L43VCwWz6Oh+45DGhAE+XZZ0KMJjYZXti9OK75fU=
> =zzqp
> -----END PGP SIGNATURE-----

--
seamus tuohy | Sr. Technologist - Internet Initiatives
stuohy at internews.org
Skype/XMPP on request
PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf

INTERNEWS | Local Voices. Global Change.
www.internews.org | @internews

