[metrics-team] metrics-web detect script update and question

Karsten Loesing karsten at torproject.org
Wed Nov 18 09:12:18 UTC 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Seamus,

On 17/11/15 21:49, seamus tuohy wrote:
> Howdy Karsten,
> 
> Karsten Loesing <karsten at torproject.org> writes:
> 
> On 16/11/15 22:57, seamus tuohy wrote:
>>>> 
>>>> Hello,
>> 
>> Hello Seamus,
>> 
>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>> 
>>>>> Great to see your interest in making the censorship
>>>>> detector better!
>>>>> 
>>>>> So, I have been thinking about your plan to submit a pull
>>>>> request for the rewrite of the current functionality, and I
>>>>> think I'd want to suggest a different plan:
>>>>> 
>>>>> How about you deploy your rewritten code on a minimal
>>>>> website that visualizes the output of your rewritten
>>>>> censorship detection script, possibly comparing it to other
>>>>> algorithms, and we link that website from the Metrics
>>>>> website?
>>>>> 
>>>> 
>>>> Sadly, my expertise is not in the statistical analysis, but
>>>> in open source software development. This is why I focused on
>>>> making the existing code cleaner and more cleanly documented
>>>> and structured. It would be a much more significant task for
>>>> me to compare the original algorithm to others.
>>>> 
>>>>> Let me explain this plan a bit more: what we really want is
>>>>> a better censorship detection algorithm that doesn't
>>>>> produce as many false positives.  Your rewrite can be a
>>>>> great starting point for that.  But there's no need to
>>>>> merge code directly into Metrics until we're sure we found
>>>>> an algorithm we like better than the current one, and maybe
>>>>> that requires making two or three attempts to get it right.
>>>>> For now, I'd rather want to add a link to your results.  We
>>>>> can always discuss replacing the script in Metrics with a
>>>>> new one later, but there are really no requirements other
>>>>> than that it can read a .csv in the provided format and
>>>>> write a new .csv in the expected format.
>>>>> 
>>>>> If you're not sure what I mean by link, here are two
>>>>> examples for external links on Tor Metrics:
>>>>> 
>>>>> https://metrics.torproject.org/oxford-anonymous-internet.html
>>>>>
>>>>> https://metrics.torproject.org/uncharted-data-flow.html
>>>>> 
>>>>> Does that plan make sense to you?  It's really great that
>>>>> you're picking up this topic.  Thanks for that!
>>>>> 
>>>> 
>>>> I will keep the code available if anyone wants to use it as a
>>>> base to implement a better algorithm, but if this code will
>>>> not serve any functional purpose I see no value in putting
>>>> any additional work into it.
>> 
>> That's not at all what I wanted to achieve here!  I do see this
>> code serving a purpose when it runs on a dedicated website (which
>> can probably run on a tiny VM) built to compare detection
>> algorithms.  And this doesn't mean that you would have to
>> implement those other algorithms yourself, that could easily be
>> done by others.  To be clear, I think that your rewrite makes it
>> more likely that others start working on different algorithms, so
>> that's already a benefit.
>> 
>> I'm just hesitant to add this code directly to Tor Metrics
>> yet, because that causes quite a bit of overhead for you and me
>> without providing an immediate benefit.  It's a non-trivial
>> amount of work for me to review your code and make sure it does
>> the exact same thing as the current code, because I didn't write
>> the current code and only reviewed it once many years ago.
> 
> 
> Thanks, that makes sense. With this in mind, I am going to get an
> axe out and restructure aggressively to make the functions and flow
> support easy implementation of other algorithms.

Sounds great!  I think it would be good to have your rewrite produce
the same output as the original, but it's certainly a good idea to
make the rewritten code as readable as possible.
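
A minimal sketch of how such an output check might look, assuming
Python 3 and that both scripts write plain .csv text.  The function
name diff_outputs is hypothetical and not part of metrics-web, and
treating row order as irrelevant is an assumption of this sketch:

```python
import csv
import io


def diff_outputs(original_text, rewritten_text):
    """Return (rows only in the original, rows only in the rewrite).

    Rows are compared as tuples of strings, so "1.0" and "1.00" count
    as different.  Row order and duplicate rows are ignored, which is
    an assumption of this sketch, not a documented requirement.
    """
    original = {tuple(row) for row in csv.reader(io.StringIO(original_text)) if row}
    rewritten = {tuple(row) for row in csv.reader(io.StringIO(rewritten_text)) if row}
    return sorted(original - rewritten), sorted(rewritten - original)
```

Two empty lists would mean both scripts wrote the same set of rows
for the same input.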

> On that note, the inputs of the current code rely on the output of
> other scripts in metrics-web. If I am going to host this separately
> I would like to directly query the correct public
> API/directory/Store so that it can collect updated daily data
> automatically instead of manually. What is the proper place to
> query?

You can download the latest client number estimates here:

https://metrics.torproject.org/clients-data.html

https://metrics.torproject.org/stats/clients.csv
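
A minimal sketch of fetching that file automatically, assuming
Python 3 and only the standard library.  The function names are
hypothetical, the sketch makes no assumption about column names
beyond a header row, and skipping lines that start with '#' is a
precaution rather than something the file format is known to need:

```python
import csv
import urllib.request

CLIENTS_CSV_URL = "https://metrics.torproject.org/stats/clients.csv"


def parse_clients_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # Assumption: blank lines and lines starting with '#' carry no data.
    lines = [line for line in text.splitlines()
             if line and not line.startswith("#")]
    return list(csv.DictReader(lines))


def fetch_clients_csv(url=CLIENTS_CSV_URL, timeout=60):
    """Download the CSV at `url` and return its parsed rows."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_clients_csv(response.read().decode("utf-8"))
```

Running fetch_clients_csv() once a day, for example from cron, would
replace the manual download step.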

>> I would rather want to promote your website by adding a link to 
>> Metrics and by making a call for help on the Tor development
>> mailing list.  Let me know if you'd want that.
> 
> Once I get something up and running a call to see what
> restructuring would be needed for models to be more easily tested
> and implemented within this code base would allow me to make any
> structural changes while I still have it in my head.

(I have difficulty parsing that sentence.  Can you rephrase that?)


In any case, glad to hear that you're going to keep working on this.
Thanks!

All the best,
Karsten


> 
> Best, s2e
> 
> 
> 
> Really hope that makes sense.
> 
> All the best, Karsten
> 
> 
>>>> Best, s2e
>>>> 
>>>> 
>>>>> All the best, Karsten
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Best, s2e
>>>>>> 
>>>>>> -- seamus tuohy | Sr. Technologist - Internet
>>>>>> Initiatives stuohy at internews.org Skype/XMPP on request
>>>>>> PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
>>>>>> MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf
>>>>>> 
>>>>>> INTERNEWS | Local Voices. Global Change.
>>>>>> www.internews.org | @internews

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJWTEDyAAoJEJD5dJfVqbCrl2UH/33EuK9vMflcbyhfG0TU6Rf3
ENomvMmCCfs2rIGPh9lRPmnqFWix446sJiU2byecKvArFTGwDOsAeX9U67Bf6lBZ
8y8/3a6Zw27ZKVqtd6Hgqf28F95ls3fhNIvjWbRx4nRQ4bxPWdxgHEHUe7Qtczq+
SUCMz18u5U5e0UsOSjkMi5mOvuUdb9/szTSCm0Zw31xesQbdG+6VSmmjWyxJwWoR
VJmNRcCPwDs4aakqW9jKyv6dVenr5PI0fLAYulNnAXxJ0LdiUR8B6aDzMOdKF3+b
/e9h5hHRm9puQXZ75HsXFFot6C7SlK1EvsTIVAxuitWa3keWgrGvCiJA6hE2Ovo=
=WSAT
-----END PGP SIGNATURE-----

