[metrics-team] metrics-web detect script update and question

seamus tuohy stuohy at internews.org
Mon Nov 23 00:48:26 UTC 2015


Hey Karsten,

I decided to get a web page up first, since I see very little time on
my horizon for cleaning up the back end for new developers as I had hoped.

The page is not fully done yet (filtering by date is not implemented),
but it does provide the basic visualization of where censorship events and
spikes occur. I also added cute little date-range-specific Twitter and
Google News searches, available when you click on one of the censorship
or spike anomalies.

Here is where the viz lives: http://seamustuohy.com/tor_anomaly/

The code lives here: https://github.com/elationfoundation/tor_anomaly

You can find the website code on the "gh-pages" branch and the data on
the "data" branch.

Best,
s2e

Karsten Loesing <karsten at torproject.org> writes:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Seamus,
>
> do the two links I pasted below not contain what you're looking for?
>
> https://metrics.torproject.org/clients-data.html
>
> https://metrics.torproject.org/stats/clients.csv
>
> Otherwise, it's true that CollecTor only has the raw data, and I
> wouldn't recommend re-implementing the aggregation algorithm from scratch.
>
> I might be able to provide you with intermediate data, as a snapshot,
> that is produced as part of the current aggregation.  But those
> numbers are pretty hard to understand, much harder than what's in
> clients.csv.  Is this what you're asking for?
>
> Let me know how I can help. :)
>
> All the best,
> Karsten
>
>
> On 22/11/15 03:16, seamus tuohy wrote:
>>
>> Hello Karsten,
>>
>> Is there a place I can grab the daily/weekly data instead of the
>> historic data dump? I checked on CollecTor, but it only seems to
>> have the raw data, a few steps removed.
>>
>> Best, s2e
>>
>>
>> Karsten Loesing <karsten at torproject.org> writes:
>>
>> Hello Seamus,
>>
>> On 17/11/15 21:49, seamus tuohy wrote:
>>>>> Howdy Karsten,
>>>>>
>>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>>>
>>>>> On 16/11/15 22:57, seamus tuohy wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>
>>>>>> Hello Seamus,
>>>>>>
>>>>>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>>>>>>
>>>>>>>>> Great to see your interest in making the censorship
>>>>>>>>> detector better!
>>>>>>>>>
>>>>>>>>> So, I have been thinking about your plan to submit a
>>>>>>>>> pull request for the rewrite of the current
>>>>>>>>> functionality, and I think I'd want to suggest a
>>>>>>>>> different plan:
>>>>>>>>>
>>>>>>>>> How about you deploy your rewritten code on a
>>>>>>>>> minimal website that visualizes the output of your
>>>>>>>>> rewritten censorship detection script, possibly
>>>>>>>>> comparing it to other algorithms, and we link that
>>>>>>>>> website from the Metrics website?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sadly, my expertise is not in statistical analysis
>>>>>>>> but in open source software development.  This is why I
>>>>>>>> focused on making the existing code cleaner, better
>>>>>>>> documented, and better structured.  It would be a much
>>>>>>>> more significant task for me to compare the original
>>>>>>>> algorithm to others.
>>>>>>>>
>>>>>>>>> Let me explain this plan a bit more: what we really
>>>>>>>>> want is a better censorship detection algorithm that
>>>>>>>>> doesn't produce as many false positives.  Your
>>>>>>>>> rewrite can be a great starting point for that.  But
>>>>>>>>> there's no need to merge code directly into Metrics
>>>>>>>>> until we're sure we found an algorithm we like better
>>>>>>>>> than the current one, and maybe that requires making
>>>>>>>>> two or three attempts to get it right. For now, I'd
>>>>>>>>> rather add a link to your results.  We can
>>>>>>>>> always discuss replacing the script in Metrics with
>>>>>>>>> a new one later, but there are really no requirements
>>>>>>>>> other than that it can read a .csv in the provided
>>>>>>>>> format and write a new .csv in the expected format.
>>>>>>>>>
>>>>>>>>> If you're not sure what I mean by link, here are two
>>>>>>>>> examples for external links on Tor Metrics:
>>>>>>>>>
>>>>>>>>> https://metrics.torproject.org/oxford-anonymous-internet.html
>>>>>>>>>
>>>>>>>>> https://metrics.torproject.org/uncharted-data-flow.html
>>>>>>>>>
>>>>>>>>> Does that plan make sense to you?  It's really great
>>>>>>>>> that you're picking up this topic.  Thanks for that!
>>>>>>>>>
>>>>>>>>
>>>>>>>> I will keep the code available if anyone wants to use
>>>>>>>> it as a base to implement a better algorithm, but if
>>>>>>>> this code will not serve any functional purpose, I see
>>>>>>>> no value in putting any additional work into it.
>>>>>>
>>>>>> That's not at all what I wanted to achieve here!  I do see
>>>>>> this code serving a purpose when it runs on a dedicated
>>>>>> website (which can probably run on a tiny VM) built to
>>>>>> compare detection algorithms.  And this doesn't mean that
>>>>>> you would have to implement those other algorithms
>>>>>> yourself, that could easily be done by others.  To be
>>>>>> clear, I think that your rewrite makes it more likely that
>>>>>> others start working on different algorithms, so that's
>>>>>> already a benefit.
>>>>>>
>>>>>> I'm just hesitant to add this code directly to Tor
>>>>>> Metrics yet, because that would cause quite a bit of overhead for
>>>>>> you and me without providing an immediate benefit.  It's a
>>>>>> non-trivial amount of work for me to review your code and
>>>>>> make sure it does the exact same thing as the current code,
>>>>>> because I didn't write the current code and only reviewed
>>>>>> it once many years ago.
>>>>>
>>>>>
>>>>> Thanks, that makes sense. With this in mind, I am going to
>>>>> get an axe out and restructure aggressively to make the
>>>>> functions and flow support easy implementation of other
>>>>> algorithms.
>>
>> Sounds great!  I think it would be good to have your rewrite
>> produce the same output as the original, but it's certainly a good
>> idea to make the rewritten code as readable as possible.
>>
>>>>> On that note, the inputs of the current code rely on the
>>>>> output of other scripts in metrics-web. If I am going to host
>>>>> this separately, I would like to directly query the correct
>>>>> public API/directory/store so that it can collect updated
>>>>> daily data automatically instead of manually. What is the
>>>>> proper place to query?
>>
>> You can download the latest client number estimates here:
>>
>> https://metrics.torproject.org/clients-data.html
>>
>> https://metrics.torproject.org/stats/clients.csv
>>
>>>>>> I would rather promote your website by adding a
>>>>>> link to Metrics and by making a call for help on the Tor
>>>>>> development mailing list.  Let me know if you'd want that.
>>>>>
>>>>> Once I get something up and running, a call to see what
>>>>> restructuring would be needed for models to be more easily
>>>>> tested and implemented within this code base would allow me
>>>>> to make any structural changes while I still have it in my
>>>>> head.
>>
>> (I have difficulty parsing that sentence.  Can you rephrase that?)
>>
>>
>> In any case, glad to hear that you're going to keep working on
>> this. Thanks!
>>
>> All the best, Karsten
>>
>>
>>>>>
>>>>> Best, s2e
>>>>>
>>>>>
>>>>>
>>>>> Really hope that makes sense.
>>>>>
>>>>> All the best, Karsten
>>>>>
>>>>>
>>>>>>>> Best, s2e
>>>>>>>>
>>>>>>>>
>>>>>>>>> All the best, Karsten
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best, s2e
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> seamus tuohy | Sr. Technologist - Internet Initiatives
>>>>>>>>>> stuohy at internews.org
>>>>>>>>>> Skype/XMPP on request
>>>>>>>>>> PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
>>>>>>>>>> MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf
>>>>>>>>>>
>>>>>>>>>> INTERNEWS | Local Voices. Global Change.
>>>>>>>>>> www.internews.org | @internews
>>>>>>>>>> _______________________________________________
>>>>>>>>>> metrics-team mailing list
>>>>>>>>>> metrics-team at lists.torproject.org
>>>>>>>>>> https://lists.torproject.org/cgi-bin/mailman/listinfo/metrics-team
>
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJWUXN9AAoJEJD5dJfVqbCrIUAIAMERBYrI54ZQGo+GjcjK9cQB
> IUbC0nm1RUiPmrupg77osydb/Kcr+AXkLm/tezSi05VWfP7MJ6T4F/pPPuZD5Y6T
> EeAIkFs/NCedupY86EgXC5hT9gYWFp5u0mcvZ2A/GGAC8qPOCuESnuo9NClRbc/O
> glTGgzVEXM3WOC0pxDci1W3BmF9UKtBPNPAzbK7BoVSDA1BLB6vkYBpFfRP9Fnb/
> NdvfTiUqe7ZMuZhjrrmP+5UKwS+kOX0BaHjnPv5DWmzzph+1e/DU4F5fSuDvwFlA
> wQkYdF2vUzabQz7N4tq1BdxxSmlKynBE2cr7Q+Q5SOLqer2Y1qIcav9h3Wz7krk=
> =fV2h
> -----END PGP SIGNATURE-----

--
seamus tuohy | Sr. Technologist - Internet Initiatives
stuohy at internews.org
Skype/XMPP on request
PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf

INTERNEWS | Local Voices. Global Change.
www.internews.org | @internews