[tor-project] Plan to double-check Tor Browser initial download numbers

Ian Goldberg tor at cypherpunks.ca
Mon Jul 17 23:54:14 UTC 2017

On Mon, Jul 17, 2017 at 08:05:30PM +0200, Karsten Loesing wrote:
> Hello list,
> it's been almost two years since we started collecting sanitized Apache
> web server logs. During this time the number of Tor Browser initial
> downloads rarely went below 70,000 per day.
> https://metrics.torproject.org/webstats-tb.html
> Either there must be a steady demand for fresh binaries, or there is a
> non-zero number of bots downloading the Tor Browser binary several times
> per day.
> I already double-checked our aggregation code that takes sanitized web
> server logs as input and produces daily totals as output. It looks okay
> to me.
> I'd also like to double-check whether there's anything unexpected
> happening before the sanitizing step. For example, could it be that
> there are a few IP addresses making hundreds or thousands of requests?
> Or are there lots of requests with the same referrers or common user agents
> indicating bots?
> My plan is to ask our admins to temporarily add a second Apache log file
> on one of the dist.torproject.org hosts with the default Apache log file
> format without the sanitizing that is usually applied.
> A snapshot of 15 or 30 minutes would likely be sufficient as sample. I'd
> analyze this log file on the server, delete it, and report my findings here.
> This message has two purposes:
>  1. Is this approach acceptable? If not, are there more acceptable
> approaches yielding similar results?
>  2. Are there any theories about what might keep the numbers from dropping
> below those 70,000 requests per day? What should I be looking for?
> Thanks!
> All the best,
> Karsten

Any chance you (i.e. a script) could replace the IP address with
HASH(IP||salt) for a randomly chosen salt that you don't know, and which
is deleted when the 30 minutes are up, before you get access to the log?
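
To make the idea concrete, here is a minimal sketch in Python of what such
a script might do. Everything in it is an assumption for illustration: the
function names are hypothetical, SHA-256 stands in for HASH, and the client
IP is assumed to be the first space-separated field, as in the default
Apache log formats. The salt exists only in the memory of the logging
process, so it is gone once that process exits.

```python
import hashlib
import secrets
from collections import Counter


def make_pseudonymizer():
    """Return a function that maps an IP string to HASH(IP || salt).

    The salt is drawn at random and kept only inside this closure. It is
    never printed or written to disk, so it disappears when the process
    ends (hypothetical design, not an existing Tor tool).
    """
    salt = secrets.token_bytes(32)

    def pseudonymize(ip: str) -> str:
        digest = hashlib.sha256(ip.encode("utf-8") + salt).hexdigest()
        # A truncated digest is enough to tell clients apart in a sample.
        return digest[:16]

    return pseudonymize


def sanitize_line(line: str, pseudonymize) -> str:
    """Replace the leading IP field of one log line with its pseudonym."""
    ip, sep, rest = line.partition(" ")
    return pseudonymize(ip) + sep + rest


def requests_per_client(lines, pseudonymize) -> Counter:
    """Count requests per pseudonymized client over an iterable of lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            client = sanitize_line(line, pseudonymize).split(" ", 1)[0]
            counts[client] += 1
    return counts
```

With something like this, requests_per_client(...).most_common(10) would
still show whether a handful of clients account for hundreds or thousands
of requests, without the raw addresses ever reaching the log file. The
salt has to stay secret and be long: the IPv4 space is small enough that
an unsalted or known-salt hash could be reversed by brute force.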
