[metrics-team] metrics-web detect script update and question

Sat Nov 14 17:04:24 UTC 2015

Hello All,

I have been updating the detector scripts in metrics-web with the goal
of making it easier for others (hopefully with more statistical
knowledge than I) to work with and build on the code. It has been a
substantial rewrite that relies heavily on the Python pandas library. I
have just reached the point where I can accurately duplicate the
functionality of the original code as it is called in the
80-run-clients-stats.sh file. The new code also removes the need for
pre-processing the data as done by the userstats-detector.R script.

-- THE ACTUAL QUESTION --

The original script has data-visualization functions (e.g. plot_all())
that don't seem to be called from within metrics-web, and I would like
some guidance on whether I should re-implement them. If so, where should
I be looking to make sure that they work seamlessly with the existing
system?

-- FLUFF FOR THE INTERESTED --

Here is the current code:
https://github.com/elationfoundation/metrics-web/blob/master/modules/clients/detector.py

Below is a quick overview of the changes in output that may impact other
programs or consumers of this information. I will write up a much more
in-depth overview of functionality when I submit the actual pull
request. I am thinking of adding basic PT anomaly detection before I
submit the pull request. This should be much easier with the new code.

Here is a comparison of the old and new output:
https://gist.github.com/elationfoundation/1714e0f1e9f8728eddb1

NEW_ranges_file_SUBSET.csv
OLD_ranges_file_SUBSET.csv

- The output from the write_all function [now called
  write_censorship_analysis()] has had some fields added to it. The old
  code had some duplicate processing built into it. The new code
  identifies the censorship and spike events the first time it runs
  through the time series so that the other functions can just read from
  the ranges output.
- I have also changed the names of some of the fields. This will impact
  any code that is currently parsing this output. I can either change
  the field names back or write a separate file that only has the
  currently formatted data and headings in it for further processing, or
  whatever code processes this output can be updated to parse it
  properly.
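
As a rough sketch of the "separate file" option above: the snippet below
copies the new ranges output into a second CSV that keeps only the old
columns under their old headings. The field names in NEW_TO_OLD and the
function name write_legacy_ranges() are placeholders made up for
illustration, not the names used in either file, and it uses the
standard csv module rather than pandas to stay self-contained.

```python
import csv
import io

# Hypothetical mapping from new field names to the old headings.
# These names are NOT taken from the real output; substitute the real ones.
NEW_TO_OLD = {
    "date": "date",
    "country": "country",
    "min_range": "minusers",
    "max_range": "maxusers",
}


def write_legacy_ranges(new_csv, legacy_csv):
    """Copy rows from the new ranges output, keeping only the legacy
    columns and restoring their old headings."""
    reader = csv.DictReader(new_csv)
    writer = csv.DictWriter(
        legacy_csv,
        fieldnames=list(NEW_TO_OLD.values()),
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Fields that exist only in the new output are dropped here.
        writer.writerow({old: row[new] for new, old in NEW_TO_OLD.items()})


# Made-up sample row with one extra field that the old format lacked.
new_output = io.StringIO(
    "date,country,min_range,max_range,event\n"
    "2015-11-01,xx,10,20,spike\n"
)
legacy_output = io.StringIO()
write_legacy_ranges(new_output, legacy_output)
print(legacy_output.getvalue())
```

With something like this, existing parsers could keep reading the legacy
file unchanged while newer consumers move to the extended output.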

NEW_short_censorship_report.txt
OLD_short_censorship_report.txt

- I have slightly modified the short censorship report produced by
  write_ml_report(), which is now called write_short_report(). The
  changes are merely cosmetic, but I think there is a lot that can be
  done to eventually make the short report a more useful document (e.g.
  putting it in a structured format that will allow others to scrape it
  and incorporate it into a threat feed).
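
To illustrate what a structured short report might look like, here is a
minimal sketch that serialises a list of events as JSON. The function
name short_report_as_json(), the field names, and the sample events are
all assumptions made up for this example; the current report is plain
text and has no such layout.

```python
import json


def short_report_as_json(events):
    """Return the short report as a JSON string, one object per event."""
    # sort_keys keeps the key order stable between runs.
    return json.dumps({"events": events}, indent=2, sort_keys=True)


# Made-up sample events; field names and values are for illustration only.
sample_events = [
    {"country": "xx", "date": "2015-11-01", "type": "censorship"},
    {"country": "yy", "date": "2015-11-02", "type": "spike"},
]
print(short_report_as_json(sample_events))
```

A format like this can be read back with json.loads(), so anyone
building a feed would not need to scrape the text layout.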

Best,
s2e

--
seamus tuohy | Sr. Technologist - Internet Initiatives
stuohy at internews.org
Skype/XMPP on request
PGP: 36AC 272E B7CF EDD5 F907 E488 B619 3EC7 3CF0 7AA7
MiniLock: 2G3JmRWRYB3B7rthZqkzomcRe8GwJvPtSooA748XMsTBdf

INTERNEWS | Local Voices. Global Change.
www.internews.org | @internews