[metrics-team] metrics-web detect script update and question

seamus tuohy stuohy at internews.org
Wed Nov 18 14:37:17 UTC 2015


Karsten Loesing <karsten at torproject.org> writes:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Seamus,
>
> On 17/11/15 21:49, seamus tuohy wrote:
>> Howdy Karsten,
>> 
>> Karsten Loesing <karsten at torproject.org> writes:
>> 
>> On 16/11/15 22:57, seamus tuohy wrote:
>>>>> 
>>>>> Hello,
>>> 
>>> Hello Seamus,
>>> 
>>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>>> 
>>>>>> Great to see your interest in making the censorship
>>>>>> detector better!
>>>>>> 
>>>>>> So, I have been thinking about your plan to submit a pull
>>>>>> request for the rewrite of the current functionality, and I
>>>>>> think I'd want to suggest a different plan:
>>>>>> 
>>>>>> How about you deploy your rewritten code on a minimal
>>>>>> website that visualizes the output of your rewritten
>>>>>> censorship detection script, possibly comparing it to other
>>>>>> algorithms, and we link that website from the Metrics
>>>>>> website?
>>>>>> 
>>>>> 
>>>>> Sadly, my expertise is not in statistical analysis, but
>>>>> in open source software development. This is why I focused on
>>>>> making the existing code cleaner and more cleanly documented
>>>>> and structured. It would be a much more significant task for
>>>>> me to compare the original algorithm to others.
>>>>> 
>>>>>> Let me explain this plan a bit more: what we really want is
>>>>>> a better censorship detection algorithm that doesn't
>>>>>> produce as many false positives.  Your rewrite can be a
>>>>>> great starting point for that.  But there's no need to
>>>>>> merge code directly into Metrics until we're sure we found
>>>>>> an algorithm we like better than the current one, and maybe
>>>>>> that requires making two or three attempts to get it right.
>>>>>> For now, I'd rather want to add a link to your results.  We
>>>>>> can always discuss replacing the script in Metrics with a
>>>>>> new one later, but there are really no requirements other
>>>>>> than that it can read a .csv in the provided format and
>>>>>> write a new .csv in the expected format.
>>>>>> 
>>>>>> If you're not sure what I mean by link, here are two
>>>>>> examples for external links on Tor Metrics:
>>>>>> 
>>>>>> https://metrics.torproject.org/oxford-anonymous-internet.html
>>>>>>
>>>>>> https://metrics.torproject.org/uncharted-data-flow.html
>>>>>> 
>>>>>> Does that plan make sense to you?  It's really great that
>>>>>> you're picking up this topic.  Thanks for that!
>>>>>> 
>>>>> 
>>>>> I will keep the code available if anyone wants to use it as a
>>>>> base to implement a better algorithm, but if this code will
>>>>> not serve any functional purpose I see no value in putting
>>>>> any additional work into it.
>>> 
>>> That's not at all what I wanted to achieve here!  I do see this
>>> code serving a purpose when it runs on a dedicated website (which
>>> can probably run on a tiny VM) built to compare detection
>>> algorithms.  And this doesn't mean that you would have to
>>> implement those other algorithms yourself, that could easily be
>>> done by others.  To be clear, I think that your rewrite makes it
>>> more likely that others start working on different algorithms, so
>>> that's already a benefit.
>>> 
>>> I'm just hesitant to add this code directly to Tor Metrics for
>>> now, because that causes quite some overhead for you and me
>>> without providing an immediate benefit.  It's a non-trivial
>>> amount of work for me to review your code and make sure it does
>>> the exact same thing as the current code, because I didn't write
>>> the current code and only reviewed it once many years ago.
>> 
>> 
>> Thanks, that makes sense. With this in mind, I am going to get an
>> axe out and restructure aggressively to make the functions and flow
>> support easy implementation of other algorithms.
>
> Sounds great!  I think it would be good to have your rewrite produce
> the same output as the original, but it's certainly a good idea to
> make the rewritten code as readable as possible.
>
>> On that note, the inputs of the current code rely on the output of
>> other scripts in metrics-web. If I am going to host this separately,
>> I would like to directly query the correct public
>> API/directory/store so that it can collect updated daily data
>> automatically instead of manually. What is the proper place to
>> query?
>
> You can download the latest client number estimates here:
>
> https://metrics.torproject.org/clients-data.html
>
> https://metrics.torproject.org/stats/clients.csv
>
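
A minimal sketch of fetching that file automatically, assuming Python 3
and only the standard library. The URL is the one given above; the
helper names, the "country" column name, and the skipping of '#'
comment lines are assumptions for illustration, not a description of
the file's actual layout or of the existing detector script.

```python
import csv
import urllib.request

# URL quoted in the thread above.
CLIENTS_CSV_URL = "https://metrics.torproject.org/stats/clients.csv"


def fetch_clients_csv(url=CLIENTS_CSV_URL, timeout=60):
    """Download the CSV and return its contents as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_clients_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    Blank lines and lines starting with '#' are skipped, in case the
    file carries comment lines (an assumption; harmless if it does not).
    """
    lines = [line for line in text.splitlines()
             if line.strip() and not line.startswith("#")]
    return list(csv.DictReader(lines))


def rows_for_country(rows, country, country_column="country"):
    """Keep the rows for one country (the column name is an assumption)."""
    return [row for row in rows if row.get(country_column) == country]
```

Calling parse_clients_csv(fetch_clients_csv()) from a daily scheduled
job would replace the manual download step. Keeping the download and
the parsing separate means the parser can be exercised on a saved copy
of the file without touching the network.
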
>>> I would rather want to promote your website by adding a link to 
>>> Metrics and by making a call for help on the Tor development
>>> mailing list.  Let me know if you'd want that.
>> 
>> Once I get something up and running a call to see what
>> restructuring would be needed for models to be more easily tested
>> and implemented within this code base would allow me to make any
>> structural changes while I still have it in my head.
>
> (I have difficulty parsing that sentence.  Can you rephrase that?)
>

Sorry. That sounds like a good idea. It will be easier for me to help
people use the code sooner rather than later. The longer I go without
working on a codebase, the less helpful I am to others who need guidance.

Best,
s2e

>
> In any case, glad to hear that you're going to keep working on this.
> Thanks!
>
> All the best,
> Karsten
>
>
>> 
>> Best, s2e
>> 
>> 
>> 
>> Really hope that makes sense.
>> 
>> All the best, Karsten
>> 
>> 
>>>>> Best, s2e
>>>>> 
>>>>> 
>>>>>> All the best, Karsten
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Best, s2e
>>>>>>> 
>>>>>>> -- seamus tuohy | Sr. Technologist - Internet
>>>>>>> Initiatives stuohy at internews.org Skype/XMPP on request
>>>>>>> PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
>>>>>>> MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf
>>>>>>> 
>>>>>>> INTERNEWS | Local Voices. Global Change.
>>>>>>> www.internews.org | @internews
>>>>>>> _______________________________________________ 
>>>>>>> metrics-team mailing list
>>>>>>> metrics-team at lists.torproject.org 
>>>>>>> https://lists.torproject.org/cgi-bin/mailman/listinfo/metrics-team
>
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJWTEDyAAoJEJD5dJfVqbCrl2UH/33EuK9vMflcbyhfG0TU6Rf3
> ENomvMmCCfs2rIGPh9lRPmnqFWix446sJiU2byecKvArFTGwDOsAeX9U67Bf6lBZ
> 8y8/3a6Zw27ZKVqtd6Hgqf28F95ls3fhNIvjWbRx4nRQ4bxPWdxgHEHUe7Qtczq+
> SUCMz18u5U5e0UsOSjkMi5mOvuUdb9/szTSCm0Zw31xesQbdG+6VSmmjWyxJwWoR
> VJmNRcCPwDs4aakqW9jKyv6dVenr5PI0fLAYulNnAXxJ0LdiUR8B6aDzMOdKF3+b
> /e9h5hHRm9puQXZ75HsXFFot6C7SlK1EvsTIVAxuitWa3keWgrGvCiJA6hE2Ovo=
> =WSAT
> -----END PGP SIGNATURE-----

-- 
seamus tuohy | Sr. Technologist - Internet Initiatives
stuohy at internews.org
Skype/XMPP on request
PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf

INTERNEWS | Local Voices. Global Change.
www.internews.org | @internews

