[metrics-team] metrics-web detect script update and question

Karsten Loesing karsten at torproject.org
Sun Nov 22 07:49:18 UTC 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Seamus,

do the two links I pasted below not contain what you're looking for?

https://metrics.torproject.org/clients-data.html

https://metrics.torproject.org/stats/clients.csv
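To pull those daily numbers automatically instead of by hand, a minimal Python sketch might look like the following. It assumes clients.csv has a header row with columns named date, node, country, transport, version and clients. It also assumes that rows with empty transport and version fields hold the per-country totals. Both are assumptions to check against the description on clients-data.html. The function names fetch_clients_csv and daily_series are hypothetical.

```python
import csv
import io
import urllib.request

# URL taken from this thread; availability and format may have changed.
CLIENTS_CSV_URL = "https://metrics.torproject.org/stats/clients.csv"


def fetch_clients_csv(url=CLIENTS_CSV_URL, timeout=60):
    """Download the aggregated client estimates as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def daily_series(csv_text, country, node="relay"):
    """Return [(date, clients), ...] for one country.

    ASSUMPTION: columns are named date, node, country, transport,
    version and clients; rows with empty transport/version fields are
    treated as per-country totals. Lines starting with '#' are skipped
    in case the file carries comments.
    """
    lines = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    series = []
    for row in csv.DictReader(lines):
        if (row["node"] == node and row["country"] == country
                and not row.get("transport") and not row.get("version")
                and row.get("clients")):
            series.append((row["date"], float(row["clients"])))
    return series
```

Usage would then be something like `daily_series(fetch_clients_csv(), "ir")`, run once a day from cron. Keeping the download separate from the parsing makes the parser easy to test against a saved copy of the file.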

Otherwise, it's true that CollecTor only has the raw data, and I
wouldn't recommend re-implementing the aggregation algorithm from scratch.

I might be able to provide you with intermediate data, as a snapshot,
that is produced as part of the current aggregation.  But those
numbers are pretty hard to understand, much harder than what's in
clients.csv.  Is this what you're asking for?

Let me know how I can help. :)

All the best,
Karsten


On 22/11/15 03:16, seamus tuohy wrote:
> 
> Hello Karsten,
> 
> Is there a place where I can grab the daily/weekly data instead of the
> historic data dump? I checked CollecTor, but it seems to have only
> the raw data, a few steps removed.
> 
> Best, s2e
> 
> 
> Karsten Loesing <karsten at torproject.org> writes:
> 
> Hello Seamus,
> 
> On 17/11/15 21:49, seamus tuohy wrote:
>>>> Howdy Karsten,
>>>> 
>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>> 
>>>> On 16/11/15 22:57, seamus tuohy wrote:
>>>>>>> 
>>>>>>> Hello,
>>>>> 
>>>>> Hello Seamus,
>>>>> 
>>>>>>> Karsten Loesing <karsten at torproject.org> writes:
>>>>>>> 
>>>>>>>> Great to see your interest in making the censorship 
>>>>>>>> detector better!
>>>>>>>> 
>>>>>>>> So, I have been thinking about your plan to submit a
>>>>>>>> pull request for the rewrite of the current
>>>>>>>> functionality, and I think I'd want to suggest a
>>>>>>>> different plan:
>>>>>>>> 
>>>>>>>> How about you deploy your rewritten code on a
>>>>>>>> minimal website that visualizes the output of your
>>>>>>>> rewritten censorship detection script, possibly
>>>>>>>> comparing it to other algorithms, and we link that
>>>>>>>> website from the Metrics website?
>>>>>>>> 
>>>>>>> 
>>>>>>>>> Sadly, my expertise is not in statistical analysis
>>>>>>>>> but in open source software development. This is why I
>>>>>>>>> focused on making the existing code cleaner and better
>>>>>>>>> documented and structured. It would be a much
>>>>>>>>> more significant task for me to compare the original
>>>>>>>>> algorithm to others.
>>>>>>> 
>>>>>>>> Let me explain this plan a bit more: what we really
>>>>>>>> want is a better censorship detection algorithm that
>>>>>>>> doesn't produce as many false positives.  Your
>>>>>>>> rewrite can be a great starting point for that.  But
>>>>>>>> there's no need to merge code directly into Metrics
>>>>>>>> until we're sure we found an algorithm we like better
>>>>>>>> than the current one, and maybe that requires making
>>>>>>>>>> two or three attempts to get it right. For now, I'd
>>>>>>>>>> rather add a link to your results.  We can
>>>>>>>> always discuss replacing the script in Metrics with
>>>>>>>> a new one later, but there are really no requirements
>>>>>>>> other than that it can read a .csv in the provided
>>>>>>>> format and write a new .csv in the expected format.
>>>>>>>> 
>>>>>>>> If you're not sure what I mean by link, here are two 
>>>>>>>> examples for external links on Tor Metrics:
>>>>>>>> 
>>>>>>>> https://metrics.torproject.org/oxford-anonymous-internet.html
>>>>>>>>>> 
>>>>>>>>>> https://metrics.torproject.org/uncharted-data-flow.html
>>>>>>>> 
>>>>>>>> Does that plan make sense to you?  It's really great
>>>>>>>> that you're picking up this topic.  Thanks for that!
>>>>>>>> 
>>>>>>> 
>>>>>>> I will keep the code available if anyone wants to use
>>>>>>> it as a base to implement a better algorithm, but if
>>>>>>> this code will not serve any functional purpose I see
>>>>>>> no value in putting any additional work into it.
>>>>> 
>>>>> That's not at all what I wanted to achieve here!  I do see
>>>>> this code serving a purpose when it runs on a dedicated
>>>>> website (which can probably run on a tiny VM) built to
>>>>> compare detection algorithms.  And this doesn't mean that
>>>>> you would have to implement those other algorithms
>>>>> yourself, that could easily be done by others.  To be
>>>>> clear, I think that your rewrite makes it more likely that
>>>>> others start working on different algorithms, so that's
>>>>> already a benefit.
>>>>> 
>>>>> I'm just wary of adding this code directly to Tor
>>>>> Metrics at this point, because that causes considerable overhead for
>>>>> you and me without providing an immediate benefit.  It's a
>>>>> non-trivial amount of work for me to review your code and
>>>>> make sure it does the exact same thing as the current code,
>>>>> because I didn't write the current code and only reviewed
>>>>> it once many years ago.
>>>> 
>>>> 
>>>> Thanks, that makes sense. With this in mind, I am going to
>>>> get an axe out and restructure aggressively so that the
>>>> functions and flow support easy implementation of other
>>>> algorithms.
> 
> Sounds great!  I think it would be good to have your rewrite
> produce the same output as the original, but it's certainly a good
> idea to make the rewritten code as readable as possible.
> 
>>>> On that note, the inputs of the current code rely on the
>>>> output of other scripts in metrics-web. If I am going to host
>>>> this separately, I would like to directly query the correct
>>>> public API/directory/store so that it can collect updated
>>>> daily data automatically instead of manually. What is the
>>>> proper place to query?
> 
> You can download the latest client number estimates here:
> 
> https://metrics.torproject.org/clients-data.html
> 
> https://metrics.torproject.org/stats/clients.csv
> 
>>>>> I would rather promote your website by adding a
>>>>> link to Metrics and by making a call for help on the Tor
>>>>> development mailing list.  Let me know if you'd want that.
>>>> 
>>>> Once I get something up and running a call to see what 
>>>> restructuring would be needed for models to be more easily
>>>> tested and implemented within this code base would allow me
>>>> to make any structural changes while I still have it in my
>>>> head.
> 
> (I have difficulty parsing that sentence.  Can you rephrase that?)
> 
> 
> In any case, glad to hear that you're going to keep working on
> this. Thanks!
> 
> All the best, Karsten
> 
> 
>>>> 
>>>> Best, s2e
>>>> 
>>>> 
>>>> 
>>>> Really hope that makes sense.
>>>> 
>>>> All the best, Karsten
>>>> 
>>>> 
>>>>>>> Best, s2e
>>>>>>> 
>>>>>>> 
>>>>>>>> All the best, Karsten
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best, s2e
>>>>>>>>> 
>>>>>>>>> -- seamus tuohy | Sr. Technologist - Internet 
>>>>>>>>> Initiatives stuohy at internews.org Skype/XMPP on
>>>>>>>>> request PGP: 36AC 272E B7CF EDD5 F907 E488 B619
>>>>>>>>> 3EC7 3CF0 7AA7 MiniLock:
>>>>>>>>> 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf
>>>>>>>>> 
>>>>>>>>> INTERNEWS | Local Voices. Global Change. 
>>>>>>>>> www.internews.org | @internews 
>>>>>>>>> _______________________________________________ 
>>>>>>>>> metrics-team mailing list 
>>>>>>>>> metrics-team at lists.torproject.org 
>>>>>>>>> https://lists.torproject.org/cgi-bin/mailman/listinfo/metrics-team

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJWUXN9AAoJEJD5dJfVqbCrIUAIAMERBYrI54ZQGo+GjcjK9cQB
IUbC0nm1RUiPmrupg77osydb/Kcr+AXkLm/tezSi05VWfP7MJ6T4F/pPPuZD5Y6T
EeAIkFs/NCedupY86EgXC5hT9gYWFp5u0mcvZ2A/GGAC8qPOCuESnuo9NClRbc/O
glTGgzVEXM3WOC0pxDci1W3BmF9UKtBPNPAzbK7BoVSDA1BLB6vkYBpFfRP9Fnb/
NdvfTiUqe7ZMuZhjrrmP+5UKwS+kOX0BaHjnPv5DWmzzph+1e/DU4F5fSuDvwFlA
wQkYdF2vUzabQz7N4tq1BdxxSmlKynBE2cr7Q+Q5SOLqer2Y1qIcav9h3Wz7krk=
=fV2h
-----END PGP SIGNATURE-----

