[metrics-team] metrics-web detect script update and question

seamus tuohy stuohy at internews.org
Sun Nov 22 02:16:09 UTC 2015


Hello Karsten,

Is there a place where I can grab the daily/weekly data instead of the
historical data dump? I checked CollecTor, but it seems to have only
the raw data, which is a few steps removed.
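
In the meantime, one possible workaround is to take the full
clients.csv linked in the quoted message below and keep only the most
recent days. A minimal sketch, assuming the file has a header row with
a `date` column in YYYY-MM-DD format (I have not verified the column
layout); the helper name `recent_rows` and the sample rows are made up
for illustration:

```python
import csv
import io
from datetime import date, timedelta


def recent_rows(csv_text, days, today):
    """Return rows whose 'date' column falls within the last `days` days.

    Assumes (unverified) a header row containing a 'date' column in
    YYYY-MM-DD format; all other columns are passed through untouched.
    """
    cutoff = today - timedelta(days=days)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if date.fromisoformat(row["date"]) > cutoff]


# Made-up sample data, only to show the call; not real Tor Metrics output.
sample = (
    "date,country,clients\n"
    "2015-11-01,de,100\n"
    "2015-11-20,de,120\n"
    "2015-11-21,de,130\n"
)
week = recent_rows(sample, days=7, today=date(2015, 11, 22))
print([row["date"] for row in week])
```

The CSV text could presumably be fetched with urllib.request.urlopen
before being passed in, but that still downloads the whole file each
time, which is why I am asking about a smaller export.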

Best,
s2e


Karsten Loesing <karsten at torproject.org> writes:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Seamus,
>
> On 17/11/15 21:49, seamus tuohy wrote:
>> Howdy Karsten,
>>
>> Karsten Loesing <karsten at torproject.org> writes:
>>
>> On 16/11/15 22:57, seamus tuohy wrote:
>>>>>
>>>>> Hello,
>>>
>>> Hello Seamus,
>>>
>>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>>>
>>>>>> Great to see your interest in making the censorship
>>>>>> detector better!
>>>>>>
>>>>>> So, I have been thinking about your plan to submit a pull
>>>>>> request for the rewrite of the current functionality, and I
>>>>>> think I'd want to suggest a different plan:
>>>>>>
>>>>>> How about you deploy your rewritten code on a minimal
>>>>>> website that visualizes the output of your rewritten
>>>>>> censorship detection script, possibly comparing it to other
>>>>>> algorithms, and we link that website from the Metrics
>>>>>> website?
>>>>>>
>>>>>
>>>>> Sadly, my expertise is not in statistical analysis but in
>>>>> open source software development. This is why I focused on
>>>>> making the existing code cleaner, better documented, and
>>>>> better structured. It would be a much more significant task
>>>>> for me to compare the original algorithm to others.
>>>>>
>>>>>> Let me explain this plan a bit more: what we really want is
>>>>>> a better censorship detection algorithm that doesn't
>>>>>> produce as many false positives.  Your rewrite can be a
>>>>>> great starting point for that.  But there's no need to
>>>>>> merge code directly into Metrics until we're sure we found
>>>>>> an algorithm we like better than the current one, and maybe
>>>>>> that requires making two or three attempts to get it right.
>>>>>> For now, I'd rather want to add a link to your results.  We
>>>>>> can always discuss replacing the script in Metrics with a
>>>>>> new one later, but there are really no requirements other
>>>>>> than that it can read a .csv in the provided format and
>>>>>> write a new .csv in the expected format.
>>>>>>
>>>>>> If you're not sure what I mean by link, here are two
>>>>>> examples for external links on Tor Metrics:
>>>>>>
>>>>>> https://metrics.torproject.org/oxford-anonymous-internet.html
>>>>>>
>>>>>> https://metrics.torproject.org/uncharted-data-flow.html
>>>>>>
>>>>>> Does that plan make sense to you?  It's really great that
>>>>>> you're picking up this topic.  Thanks for that!
>>>>>>
>>>>>
>>>>> I will keep the code available if anyone wants to use it as a
>>>>> base to implement a better algorithm, but if this code will
>>>>> not serve any functional purpose I see no value in putting
>>>>> any additional work into it.
>>>
>>> That's not at all what I wanted to achieve here!  I do see this
>>> code serving a purpose when it runs on a dedicated website (which
>>> can probably run on a tiny VM) built to compare detection
>>> algorithms.  And this doesn't mean that you would have to
>>> implement those other algorithms yourself, that could easily be
>>> done by others.  To be clear, I think that your rewrite makes it
>>> more likely that others start working on different algorithms, so
>>> that's already a benefit.
>>>
>>> I'm just hesitant to add this code directly to Tor Metrics right
>>> now, because that would cause quite a bit of overhead for you and
>>> me without providing an immediate benefit.  It's a non-trivial
>>> amount of work for me to review your code and make sure it does
>>> the exact same thing as the current code, because I didn't write
>>> the current code and only reviewed it once many years ago.
>>
>>
>> Thanks, that makes sense. With this in mind, I am going to get an
>> axe out and restructure aggressively so that the functions and flow
>> support easy implementation of other algorithms.
>
> Sounds great!  I think it would be good to have your rewrite produce
> the same output as the original, but it's certainly a good idea to
> make the rewritten code as readable as possible.
>
>> On that note, the inputs of the current code rely on the output of
>> other scripts in metrics-web. If I am going to host this separately,
>> I would like to query the correct public API/directory/store
>> directly so that it can collect updated daily data automatically
>> instead of manually. What is the proper place to query?
>
> You can download the latest client number estimates here:
>
> https://metrics.torproject.org/clients-data.html
>
> https://metrics.torproject.org/stats/clients.csv
>
>>> I would rather want to promote your website by adding a link to
>>> Metrics and by making a call for help on the Tor development
>>> mailing list.  Let me know if you'd want that.
>>
>> Once I get something up and running a call to see what
>> restructuring would be needed for models to be more easily tested
>> and implemented within this code base would allow me to make any
>> structural changes while I still have it in my head.
>
> (I have difficulty parsing that sentence.  Can you rephrase that?)
>
>
> In any case, glad to hear that you're going to keep working on this.
> Thanks!
>
> All the best,
> Karsten
>
>
>>
>> Best, s2e
>>
>>
>>
>> Really hope that makes sense.
>>
>> All the best, Karsten
>>
>>
>>>>> Best, s2e
>>>>>
>>>>>
>>>>>> All the best, Karsten
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Best, s2e
>>>>>>>
>>>>>>> -- seamus tuohy | Sr. Technologist - Internet
>>>>>>> Initiatives stuohy at internews.org Skype/XMPP on request
>>>>>>> PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
>>>>>>> MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf
>>>>>>>
>>>>>>> INTERNEWS | Local Voices. Global Change.
>>>>>>> www.internews.org | @internews
>
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJWTEDyAAoJEJD5dJfVqbCrl2UH/33EuK9vMflcbyhfG0TU6Rf3
> ENomvMmCCfs2rIGPh9lRPmnqFWix446sJiU2byecKvArFTGwDOsAeX9U67Bf6lBZ
> 8y8/3a6Zw27ZKVqtd6Hgqf28F95ls3fhNIvjWbRx4nRQ4bxPWdxgHEHUe7Qtczq+
> SUCMz18u5U5e0UsOSjkMi5mOvuUdb9/szTSCm0Zw31xesQbdG+6VSmmjWyxJwWoR
> VJmNRcCPwDs4aakqW9jKyv6dVenr5PI0fLAYulNnAXxJ0LdiUR8B6aDzMOdKF3+b
> /e9h5hHRm9puQXZ75HsXFFot6C7SlK1EvsTIVAxuitWa3keWgrGvCiJA6hE2Ovo=
> =WSAT
> -----END PGP SIGNATURE-----

--
seamus tuohy | Sr. Technologist - Internet Initiatives
stuohy at internews.org
Skype/XMPP on request
PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf

INTERNEWS | Local Voices. Global Change.
www.internews.org | @internews

