[tor-dev] Sanitizing and publishing our web server logs

Brian Szymanski ski at allafrica.com
Fri Sep 2 14:08:37 UTC 2011

What exactly are we hoping to gain from the analysis of the (hopefully 
correctly) stripped logs?

On 09/02/2011 09:06 AM, Sebastian Hahn wrote:
> On Sep 2, 2011, at 2:46 PM, Karsten Loesing wrote:
>> Hi Andrew,
>> On 9/2/11 2:18 AM, Andrew Lewman wrote:
>>> On Thursday, August 25, 2011 04:08:00 Karsten Loesing wrote:
>>>> we have been discussing sanitizing and publishing our web server logs
>>>> for quite a while now.  The idea is to remove all potentially sensitive
>>>> parts from the logs, publish them in monthly tarballs on the metrics
>>>> website, and analyze them for top visited pages, top downloaded
>>>> packages, etc.  See the tickets #1641 and #2489 for details.
>>> My concern is that we have the data at all.  We shouldn't have any
>>> sensitive information logged on the webservers. Therefore sanitizing the
>>> logs should not be necessary.
>> My concern is that we remove details from the logs and learn in a few
>> months that we wanted to analyze them.  I'd like to sanitize the
>> existing logs first, make them available for people to analyze, and only
>> change the Apache configuration once we're really sure we found the
>> level of detail that we want.  There's no rush in changing the Apache
>> configuration now, right?
> So, if we decide in a few months that we need more detail, we can
> change the logging then. Sure, we won't have history, but that just
> means that the graphs we make start in 2012 instead of 2007.
>> Finally, we'll have to find a way to encode the country code in the logs
>> and still keep Apache's Combined Log Format.  And do we still care about
>> the HTTP vs. HTTPS bit?  Because if we use the IP column for the country
>> code, we'll have to encode the HTTP/HTTPS thing somewhere else.
> IP addresses have plenty of bits for a country code and http/https
> encoding; we could, for example, use the first bytes for the country code.
>> So, it should be possible to implement GeoIP lookups in the future.  I'd
>> like to consider that a separate task from sanitizing the existing web
>> logs, though.
> It's separate, but without on-the-fly GeoIP lookups we won't have any
> country data, because the sanitizing process doesn't get it magically.
> All the best
> Sebastian
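
For illustration, the idea quoted above of reusing the IP column for a
country code plus the HTTP/HTTPS bit could look roughly like the sketch
below. The byte layout (two bytes of country code, one byte for the scheme,
one unused byte) and all function names are assumptions made for this
example, not something the thread agreed on. The country code is passed in
by the caller, since the sanitizing step has no GeoIP data of its own.

```python
from typing import Tuple


def encode_pseudo_ip(country_code: str, is_https: bool) -> str:
    """Pack a two-letter country code and the scheme into a dotted quad.

    Assumed layout (hypothetical): bytes 1-2 are the ASCII values of the
    upper-cased country code, byte 3 is 1 for HTTPS and 0 for HTTP, and
    byte 4 is left as 0.
    """
    cc = country_code.upper()
    if len(cc) != 2 or not cc.isascii() or not cc.isalpha():
        raise ValueError("expected a two-letter ASCII country code")
    return f"{ord(cc[0])}.{ord(cc[1])}.{int(is_https)}.0"


def decode_pseudo_ip(pseudo_ip: str) -> Tuple[str, bool]:
    """Recover (country_code, is_https) from an encoded dotted quad."""
    first, second, scheme, _unused = (int(part) for part in pseudo_ip.split("."))
    return chr(first) + chr(second), scheme == 1


def rewrite_client_field(line: str, country_code: str, is_https: bool) -> str:
    """Replace the first field (client address) of a Combined Log Format line.

    The rest of the line is left untouched, so the result still parses as
    Combined Log Format.
    """
    _old_ip, rest = line.split(" ", 1)
    return f"{encode_pseudo_ip(country_code, is_https)} {rest}"


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address (RFC 5737), used as a stand-in.
    sample = ('192.0.2.1 - - [02/Sep/2011:14:08:37 +0000] '
              '"GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"')
    print(rewrite_client_field(sample, "de", True))
```

Because the column still holds a syntactically valid IPv4 address, existing
log analyzers should keep working, though any tool that resolves or
geolocates that column would produce meaningless results.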
