[tor-bugs] #21588 [- Select a component]: Rewrite the censorship detector used by the Tor Metrics website in Java

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Mar 1 14:51:06 UTC 2017


#21588: Rewrite the censorship detector used by the Tor Metrics website in Java
--------------------------------------+-----------------
     Reporter:  karsten               |      Owner:
         Type:  enhancement           |     Status:  new
     Priority:  Medium                |  Milestone:
    Component:  - Select a component  |    Version:
     Severity:  Normal                |   Keywords:
Actual Points:                        |  Parent ID:
       Points:                        |   Reviewer:
      Sponsor:                        |
--------------------------------------+-----------------
 The censorship detector written by George Danezis in 2011 is the only part
 of the Tor Metrics website that is written in Python.  We should consider
 rewriting it in Java in order to integrate it more closely into the rest
 of the Tor Metrics website code.  This is also related to #19754.

 iwakeh, want to comment on whether this makes sense or not, before
 somebody else comes and picks this up?

 (The following thoughts depend on whether we reach consensus in the
 metrics team that this is even a good idea.)

 The first step of this rewrite should be to create a minimal setup for
 running the Python code that doesn't require setting up your own instance
 of the Tor Metrics website.  I'll attach a compressed version of the input
 file `userstats-detector.csv` to this ticket.  Running the Python version
 should be as simple as downloading that attachment and the two Python
 files `detector.py` and `country_info.py` from
 [https://gitweb.torproject.org/metrics-web.git/tree/modules/clients
 metrics-web's clients module] and running:

 {{{
 unxz userstats-detector.csv.xz
 python detector.py
 }}}

 That command should run for a few minutes and produce a couple of files
 including `userstats-ranges.csv`, which is the only output file we care
 about:

 {{{
 date,country,minusers,maxusers
 2011-09-08,a1,559.698186453,1399.64885163
 2011-09-09,a1,469.497090181,1451.46081727
 2011-09-11,a1,639.857484235,1457.19233381
 2011-09-12,a1,597.260782974,1312.46735446
 [...]
 }}}
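
 As a rough illustration of that output format, here is a minimal Python
 sketch that parses such a file and checks its header.  The function name
 `parse_ranges` and the check that `minusers` does not exceed `maxusers`
 are assumptions made for illustration; they are not taken from
 `detector.py`.

```python
import csv
import io

# Header as shown in the sample output of userstats-ranges.csv.
EXPECTED_HEADER = ["date", "country", "minusers", "maxusers"]


def parse_ranges(text):
    """Parse userstats-ranges.csv content into (date, country, min, max) tuples.

    Raises ValueError on an unexpected header or if minusers > maxusers.
    The latter is an assumed sanity check, not something detector.py does.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError("unexpected header: %r" % (header,))
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        date, country, minusers, maxusers = row
        low, high = float(minusers), float(maxusers)
        if low > high:
            raise ValueError("minusers > maxusers for %s/%s" % (date, country))
        rows.append((date, country, low, high))
    return rows


# Two rows copied from the sample output above.
SAMPLE = (
    "date,country,minusers,maxusers\n"
    "2011-09-08,a1,559.698186453,1399.64885163\n"
    "2011-09-09,a1,469.497090181,1451.46081727\n"
)

rows = parse_ranges(SAMPLE)
print(len(rows), rows[0])
```

 A check like this could serve as a quick smoke test for the minimal setup
 before any rewriting starts.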

 Step two could be to remove any code that is not required to produce this
 output file.  Ideally, this would happen in one or more separate commits.

 Step three would be to look at which external dependencies are required to
 rewrite the remaining code in Java.  I haven't looked at this at all yet, so
 maybe this is doable without adding external dependencies, which would be best.
 But if external dependencies are necessary, maybe there's something in
 Apache Commons that we can use here.  In any case, adding external
 dependencies requires discussion on this ticket.

 Step four would be to do the rewrite and to verify that it produces
 roughly the same results (we're cutting off decimal places, for example).
 There's a guide on coding style
 [wiki:org/teams/MetricsTeam/MetricsJavaStyleGuide#CodingStyle here].
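
 One way to check "roughly the same results" is to compare the two output
 files row by row with a numeric tolerance.  The following is only a sketch:
 the function names `load_ranges` and `compare_ranges`, the `abs_tol`
 parameter, and its value are hypothetical and would need to be adapted to
 however many decimal places the Java version keeps.  The "new" sample
 values are made up for the demonstration.

```python
import csv
import io
import math


def load_ranges(text):
    """Map (date, country) to (minusers, maxusers) from CSV content."""
    return {
        (row["date"], row["country"]):
            (float(row["minusers"]), float(row["maxusers"]))
        for row in csv.DictReader(io.StringIO(text))
    }


def compare_ranges(expected, actual, abs_tol):
    """Return a list of human-readable differences between two outputs.

    abs_tol is an assumed absolute tolerance that allows for values whose
    decimal places were cut off.
    """
    diffs = []
    for key in sorted(expected.keys() | actual.keys()):
        if key not in actual:
            diffs.append("missing in new output: %s,%s" % key)
        elif key not in expected:
            diffs.append("unexpected in new output: %s,%s" % key)
        else:
            names = ("minusers", "maxusers")
            for name, old, new in zip(names, expected[key], actual[key]):
                if not math.isclose(old, new, rel_tol=0.0, abs_tol=abs_tol):
                    diffs.append("%s,%s: %s differs (%r vs. %r)"
                                 % (key[0], key[1], name, old, new))
    return diffs


# Output of the Python version (two rows from the sample above).
OLD = (
    "date,country,minusers,maxusers\n"
    "2011-09-08,a1,559.698186453,1399.64885163\n"
    "2011-09-09,a1,469.497090181,1451.46081727\n"
)
# Made-up output of a rewritten version with truncated decimals and one
# deliberately wrong value.
NEW = (
    "date,country,minusers,maxusers\n"
    "2011-09-08,a1,559.698,1399.648\n"
    "2011-09-09,a1,470.5,1451.46\n"
)

diffs = compare_ranges(load_ranges(OLD), load_ranges(NEW), abs_tol=0.01)
for line in diffs:
    print(line)
```

 An absolute tolerance is used here rather than a relative one because
 cutting off decimal places changes each value by a bounded absolute amount.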

 Step five would be to review the new code and integrate it into
 metrics-web.

 All in all, I could imagine that steps 1 to 4 might be an interesting task
 for a new volunteer.  Optimistically adding the `metrics-help` keyword.

 But let's first discuss whether this rewrite makes sense, or whether
 there's a better plan to do it!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21588>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

