[tor-bugs] #2718 [Metrics]: Analyze Tor usage data for ways to automatically detect country-wide blockings

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Sat Mar 12 07:16:42 UTC 2011


#2718: Analyze Tor usage data for ways to automatically detect country-wide
blockings
---------------------+------------------------------------------------------
 Reporter:  karsten  |          Owner:  karsten
     Type:  task     |         Status:  new    
 Priority:  normal   |      Milestone:         
Component:  Metrics  |        Version:         
 Keywords:           |         Parent:         
   Points:           |   Actualpoints:         
---------------------+------------------------------------------------------
 Every now and then, there are country-wide blockings of Tor.  In most
 cases we learn about these events from users telling us that Tor has
 stopped working for them.  This may work okay, but given that we already
 have usage data per country, we should be able to detect blockings
 ourselves, preferably automatically and with as few false positives as
 possible.

 I already spent some time on a censorship detector that takes our usage
 data as input and tells us whenever the usage on a given day falls outside
 an expected interval.  But I'm afraid I don't know enough math to push
 this further, at least not without reading more about time series
 analysis.  Maybe someone wants to pick this up?

 Here's where I am:

 We take our estimated daily user numbers as input.  Our goal is to give
 out a warning whenever the estimated user number from a given country
 drops below a predicted value.  This predicted value is not static, but
 should depend on previous values, therefore we should use time series
 analysis.  We want to model the user numbers for days 1..n-1, predict a
 value for day n, and warn if the actual value for day n is lower than the
 predicted value minus some error.

 I read some stuff about time series analysis and came up with the ARIMA
 model.  Thankfully, the ARIMA model is already implemented in R.
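
 The predict-and-warn idea above can be sketched in R roughly as follows.
 This is only an illustration, not the contents of detect-censorship.R: the
 function name warn_on_drop, the AR(1) order c(1, 0, 0), the error margin
 of two standard errors, and the synthetic input data are all assumptions
 made for the example.  Only arima() and predict() from R's stats package
 are real library calls.

```r
# Hypothetical helper, not taken from detect-censorship.R.
# Fits an ARIMA model to days 1..n-1, predicts day n, and warns if the
# actual value for day n is below the prediction minus z standard errors.
warn_on_drop <- function(users, order = c(1, 0, 0), z = 2) {
  n <- length(users)
  history <- users[1:(n - 1)]            # days 1..n-1
  fit <- arima(history, order = order)   # assumed order, needs tuning
  fc <- predict(fit, n.ahead = 1)        # one-step-ahead forecast for day n
  predicted <- as.numeric(fc$pred[1])
  lower <- predicted - z * as.numeric(fc$se[1])
  list(predicted = predicted, lower = lower, actual = users[n],
       warn = users[n] < lower)
}

# Synthetic data for illustration only: about 1000 users per day plus noise.
set.seed(1)
normal <- 1000 + rnorm(60, sd = 30)

print(warn_on_drop(c(normal, 1010))$warn)  # an ordinary day
print(warn_on_drop(c(normal, 300))$warn)   # a sharp drop
```

 Real per-country user numbers have trends and weekly patterns that this
 toy series lacks, so both the model order and the margin z would need to
 be chosen against actual data to keep false positives low.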

 I'm going to upload some R code to the
 [http://gitweb.torproject.org/metrics-tasks.git metrics-tasks] repository
 once I have a ticket number (see comment below).  The R code generates a
 PDF that shows on which days we'd receive a warning.  I'm also going to
 attach the PDF to this ticket.  Here's how you can run the R code yourself:

 {{{
 $ wget https://metrics.torproject.org/csv/direct-users.csv
 $ R --slave -f detect-censorship.R
 }}}

 Possible next steps are a) finding good parameters for the ARIMA model, b)
 trying other time series models, and c) extending the approach to bridge
 users.  Once we have a useful approach for estimated daily user numbers,
 we should d) try to get rid of day-based statistics, which have a delay of
 1--2 days, and make the approach work for directory request stats and
 connecting bridge user stats to get results more quickly.  The final step
 is to e) integrate the R code with the metrics website and execute it
 every few hours.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2718>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

