[tor-bugs] #4463 [Website]: Set up web log analysis tool

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Thu Dec 1 11:45:43 UTC 2011


#4463: Set up web log analysis tool
---------------------+------------------------------------------------------
 Reporter:  runa     |          Owner:  runa                        
     Type:  project  |         Status:  assigned                    
 Priority:  normal   |      Milestone:  Sponsor Z: December 31, 2011
Component:  Website  |        Version:                              
 Keywords:           |         Parent:                              
   Points:           |   Actualpoints:                              
---------------------+------------------------------------------------------

Comment(by runa):

 I looked at four different web log analysis tools. Here's what I found:

 [http://piwik.org/ Piwik] looks great, but is not available in Ubuntu or
 Debian. Setting it up manually is pretty straightforward, but you will
 not be able to import Apache logs without using some third-party script.
 Last time I checked, that third-party script had some issues with our
 sanitized log format.

 [http://awstats.sourceforge.net/ AWStats] is easy to set up and easy to
 use, but incredibly slow when importing logs. I set up AWStats on an
 Ubuntu EC2 instance and pulled only the sanitized logs for January and
 February 2010, because you only get 8 GB of storage. The import of
 wiki.torproject.org-access.log was pretty quick, and we have some
 [http://107.22.86.235/statistics/awstats.pl?month=01&year=2010&output=main&config=wiki&framename=index
 preliminary results]. However, the import of www.torproject.org-access.log
 does not complete at all. Maybe it's because I tried to do all this in the
 cloud, or maybe it's just AWStats.

 [http://www.webalizer.org/ Webalizer] is just as easy to set up and use as
 AWStats. It doesn't look as pretty, but it's a lot faster when it comes to
 importing existing logs. I managed to set it up and import the Jan+Feb
 www.torproject.org-access.log without any problems.

 [http://www.splunk.com/ Splunk] was recommended to me by someone on
 Twitter, so I figured I'd look into it. The free version of Splunk allows
 you to index only 500 megabytes of data per day; we probably want more
 than that.

 Another option is to write our own parser and use R to create graphs
 similar to what we have on metrics.tpo. Writing our own parser will take
 some time, so maybe we should just go with Webalizer for now.
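
 As a rough idea of what the first step of our own parser could look like,
 here is a minimal Python sketch that counts requests per day. It assumes
 the sanitized logs still follow Apache's Common Log Format, which may not
 be the case. The names LOG_LINE, parse_line and count_requests_per_day are
 made up for illustration, and the per-day counts would still have to be
 written out (for example as CSV) before R could graph them.

```python
import re
from collections import Counter
from datetime import datetime

# ASSUMPTION: lines follow Apache's Common Log Format, e.g.
#   host ident user [01/Jan/2010:00:00:00 +0000] "GET /path HTTP/1.1" 200 1234
# The real sanitized format may differ and would need its own pattern.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month abbreviations (%b) are parsed according to the current locale.
    timestamp = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["day"] = timestamp.date().isoformat()
    return record


def count_requests_per_day(lines):
    """Count successfully parsed requests per calendar day (YYYY-MM-DD)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["day"]] += 1
    return counts
```

 Lines that do not match the pattern are skipped silently, so a real parser
 would probably want to count or log them as well.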

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4463#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

