[tor-bugs] #5047 [Obfsproxy]: Implement basic usage statistics in obfsproxy

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Wed Feb 8 14:43:42 UTC 2012


#5047: Implement basic usage statistics in obfsproxy
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  asn
     Type:  enhancement  |         Status:  new
 Priority:  normal       |      Milestone:     
Component:  Obfsproxy    |        Version:     
 Keywords:               |         Parent:     
   Points:               |   Actualpoints:     
-------------------------+--------------------------------------------------
 We should implement some basic usage statistics in obfsproxy to learn
 about usage as long as Tor doesn't have support for obfsproxy statistics
 (#5040).  Once Tor supports these statistics, the implementation in
 obfsproxy can be removed.  Both Tor's and obfsproxy's statistics should be
 equivalent or at least easily comparable.

 The idea is to have obfsproxy log incoming connections in a privacy-aware
 way and provide a simple script to convert these logs into a format that
 can be published without issues.  Bridge operators can periodically run
 the script and send the output to the Tor developers, who publish and
 analyze it.  The implementation in obfsproxy should be quite simple in
 order not to break too much stuff.  The conversion script should be dead
 simple, so that bridge operators can understand what's going on.

 Here's a possible approach:

 We want to count daily connections by country and daily unique IP
 addresses by country.  Similar to other statistics in Tor, we want to
 aggregate data over 24-hour periods, resolve IP addresses to country
 codes, and round up frequencies to multiples of 8.
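
 As a quick check of the rounding rule, here is a minimal bash sketch.  The
 function name `round_up_8` is made up for illustration; it uses the same
 integer arithmetic as the awk expression in the conversion script further
 down.

```bash
#!/bin/bash
# Hypothetical helper: round a non-negative count up to the next multiple of 8.
round_up_8() {
  echo $(( ($1 + 7) / 8 * 8 ))
}

round_up_8 1    # 8
round_up_8 8    # 8
round_up_8 9    # 16
```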

  1. When obfsproxy starts, it does three things: a) generate a secret
 string S that it only keeps in memory; b) note the timestamp TS when it
 started; c) create a buffer B with a capacity of 100 log messages.

  2. Whenever obfsproxy receives a client connection, it runs steps 3 to 5:

  3. It checks whether at least 24 hours have passed since TS.  If so, it
 flushes all log messages from buffer B, shuffles them, and appends them to
 a file on disk.  It also increments TS in 24-hour steps until TS is less
 than 24 hours in the past.

  4. It checks whether B is full, i.e., contains 100 messages.  If so, it
 flushes B and appends messages to a file on disk in random order.

  5. It creates a new log message containing a) timestamp TS (which is NOT
 the current timestamp!), b) the country code of the connecting IP as
 resolved by a GeoIP database, c) the hashed IP address using secret S,
 i.e., `H(IP || S)` with a cryptographic hash function of the implementor's
 choice.  An example log message would be `"2012-02-07 14:01:04 de
 1234567890123456789012345678901234567890"`.

  6. When obfsproxy stops, it does NOT flush the contents of B to disk.  It
 forgets about S, possibly in a cryptographically secure manner.
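
 Steps 1 to 6 could be sketched in bash roughly as follows.  This only
 illustrates the logic and is not proposed obfsproxy code: all function and
 variable names are hypothetical, SHA-1 via `sha1sum` is an arbitrary pick
 for H, the country code is assumed to be resolved by the caller, and GNU
 coreutils (`shuf`, `sha1sum`, `date -d`) are assumed to be available.

```bash
#!/bin/bash
# Sketch of steps 1-6.  All names here are hypothetical.
BUFFER_CAPACITY=100
DAY=86400
LOGFILE="${LOGFILE:-obfsproxy-stats.log}"

# Step 1.  $1 = start time in seconds since the epoch.
stats_init() {
  # Secret S lives only in this shell variable, never on disk.
  S=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
  TS=$1
  B=()
}

# Append all buffered messages to the log file in random order.
stats_flush() {
  if [ "${#B[@]}" -gt 0 ]; then
    printf '%s\n' "${B[@]}" | shuf >> "$LOGFILE"
  fi
  B=()
}

# Steps 2-5.  $1 = client IP, $2 = country code (assumed already resolved
# via GeoIP), $3 = current time in seconds since the epoch.
stats_on_connection() {
  local ip=$1 cc=$2 now=$3 hash
  # Step 3: a 24-hour interval has ended; flush and advance TS.
  if [ $((now - TS)) -ge "$DAY" ]; then
    stats_flush
    while [ $((now - TS)) -ge "$DAY" ]; do
      TS=$((TS + DAY))
    done
  fi
  # Step 4: buffer is full.
  if [ "${#B[@]}" -ge "$BUFFER_CAPACITY" ]; then
    stats_flush
  fi
  # Step 5: log TS (not the current time), country, and H(IP || S).
  hash=$(printf '%s%s' "$ip" "$S" | sha1sum | cut -d' ' -f1)
  B+=("$(date -u -d "@$TS" '+%Y-%m-%d %H:%M:%S') $cc $hash")
}

# Step 6: drop B and S without writing anything to disk.
stats_shutdown() {
  B=()
  unset S
}
```

 A caller would run `stats_init "$(date +%s)"` once at startup and then
 `stats_on_connection "$ip" "$cc" "$(date +%s)"` for every client
 connection.  Note that `unset` does not wipe memory in any
 cryptographically secure sense; a real implementation would have to do
 better there.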

 The buffer has two functions here.  First, it removes the original order
 of connections, which may still be meaningful if it contains connections
 from countries with few connections.  Second, the buffer protects the
 timing of single client connections that occur when obfsproxy is
 terminated and restarted shortly after a 24-hour interval ends.  The
 buffer size of 100 was arbitrarily chosen to avoid memory problems on
 heavily used bridges.  Higher numbers are preferred, but if that makes
 things more complicated, 100 should be a large enough number.

 The log messages still reveal too much information to be published.  They
 shouldn't contain IP hashes, and frequencies still need to be rounded up
 to the next multiple of 8.  The following bash script, which probably
 requires a lot more comments, converts a log message file into a format
 that can be published by bridge operators.

 {{{
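 #!/bin/bash
 # Input: file "data" with lines "<date> <time> <country> <hashed IP>".
 # All lines of one 24-hour interval share the same timestamp TS, so
 # grouping by fields 1-3 groups by interval and country.
 echo "Daily rounded total requests by country"
 # Drop the hash, count lines per group, round up to a multiple of 8.
 cut -d" " -f1-3 data | sort | uniq -c | \
 awk '{printf "%s %s %s %d\n", $2, $3, $4, 8*(int(($1+7)/8))}'
 echo "Daily rounded unique IPs by country"
 # Deduplicate full lines first so each hashed IP counts once per group.
 sort data | uniq | cut -d" " -f1-3 | uniq -c | \
 awk '{printf "%s %s %s %d\n", $2, $3, $4, 8*(int(($1+7)/8))}'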
 }}}
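
 To try the script without real logs, the two pipelines can be wrapped in
 functions and run on made-up data.  In this sketch the function names, the
 temporary file, and the shortened hashes `aaaa`, `bbbb`, and `cccc` are
 placeholders, not real log content.

```bash
#!/bin/bash
# Hypothetical wrappers around the two pipelines; $1 is the log file.
total_requests() {
  cut -d" " -f1-3 "$1" | sort | uniq -c |
    awk '{printf "%s %s %s %d\n", $2, $3, $4, 8*(int(($1+7)/8))}'
}

unique_ips() {
  sort "$1" | uniq | cut -d" " -f1-3 | uniq -c |
    awk '{printf "%s %s %s %d\n", $2, $3, $4, 8*(int(($1+7)/8))}'
}

# Made-up sample: 9 connections from 2 hashed IPs in "de",
# and 1 connection from 1 hashed IP in "us".
sample=$(mktemp)
for i in 1 2 3 4 5; do echo "2012-02-07 14:01:04 de aaaa"; done >  "$sample"
for i in 1 2 3 4;   do echo "2012-02-07 14:01:04 de bbbb"; done >> "$sample"
echo "2012-02-07 14:01:04 us cccc" >> "$sample"

total_requests "$sample"
# 2012-02-07 14:01:04 de 16
# 2012-02-07 14:01:04 us 8

unique_ips "$sample"
# 2012-02-07 14:01:04 de 8
# 2012-02-07 14:01:04 us 8
```

 The 9 "de" requests are rounded up to 16 and the single "us" request to 8,
 so a published file never reveals exact counts.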

 Note that the approach taken here was designed to keep the changes to
 obfsproxy small.  Of course, we could implement everything in obfsproxy
 and write nice files that bridge operators can mail to the Tor devs
 directly.  That would be an implementation similar to what Tor does for
 the various statistics.  The buffered logging approach seemed to be a good
 compromise between not logging sensitive data and not adding too much
 code.  Whether that is true is a question for the obfsproxy developers.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5047>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

