[tor-project] Plan to double-check Tor Browser initial download numbers

Karsten Loesing karsten at torproject.org
Wed Aug 2 18:50:53 UTC 2017


On 2017-07-19 18:50, Karsten Loesing wrote:
> On 2017-07-17 20:05, Karsten Loesing wrote:
>> Hello list,
>>
>> it's been almost two years since we started collecting sanitized Apache
>> web server logs. During this time the number of Tor Browser initial
>> downloads rarely went below 70,000 per day.
>>
>> https://metrics.torproject.org/webstats-tb.html
>>
>> Either there must be a steady demand for fresh binaries, or there is a
>> non-zero number of bots downloading the Tor Browser binary several times
>> per day.
>>
>> I already double-checked our aggregation code that takes sanitized web
>> server logs as input and produces daily totals as output. It looks okay
>> to me.
>>
>> I'd also like to double-check whether there's anything unexpected
>> happening before the sanitizing step. For example, could it be that
>> there are a few IP addresses making hundreds or thousands of requests?
>>
>> Or are there lots of requests with same referrers or common user agents
>> indicating bots?
>>
>> My plan is to ask our admins to temporarily add a second Apache log file
>> on one of the dist.torproject.org hosts with the default Apache log file
>> format without the sanitizing that is usually applied.
>>
>> A snapshot of 15 or 30 minutes would likely be sufficient as sample. I'd
>> analyze this log file on the server, delete it, and report my findings here.
> 
> Based on the discussion here, my amended plan is to use the default
> Apache log file format but leave out timestamps and IP addresses.
> 
> I'll ask our sysadmins to produce such a log file some time tomorrow,
> unless there are further concerns/ideas.

So, I did ask the admins for some logs without timestamps and without IP
addresses, but those logs did not reveal anything unusual.
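
A check for repeated referrers or user agents in such a log could be
sketched as follows. This is an illustration only, not the script used
for the analysis. It assumes each line keeps the quoted fields of
Apache's "combined" format (request, referrer, user agent) in that order;
the function name and the sample lines are made up.

```python
import re
from collections import Counter

# Matches one double-quoted field, allowing backslash-escaped characters.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def top_referrers_and_agents(lines, n=10):
    """Count referrers and user agents in log lines (hypothetical helper).

    Assumption: the last two quoted fields on each line are the referrer
    and the user agent, as in Apache's "combined" log format.
    """
    referrers, agents = Counter(), Counter()
    for line in lines:
        fields = QUOTED.findall(line)
        if len(fields) < 3:
            continue  # skip lines that do not look like combined-format entries
        referrers[fields[-2]] += 1
        agents[fields[-1]] += 1
    return referrers.most_common(n), agents.most_common(n)

if __name__ == "__main__":
    # Made-up sample lines, not real log data.
    sample = [
        '"GET /torbrowser/x.exe HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
        '"GET /torbrowser/x.exe HTTP/1.1" 200 1234 "-" "curl/7.52.1"',
        '"GET /torbrowser/x.exe HTTP/1.1" 200 1234 "-" "curl/7.52.1"',
    ]
    refs, uas = top_referrers_and_agents(sample)
    print(refs)
    print(uas)
```

A single user agent or referrer accounting for a large share of requests
would be a hint towards automated downloads.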

It might have helped to include IP addresses, but I wanted to keep this
analysis simple and decided to instead ask the admins for a log with
only timestamps and no further request details.

iwakeh and I looked at a few days of these timestamp-only logs. There is
a daily pattern: request numbers fall towards UTC midnight and rise
towards UTC noon.
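
A daily pattern like this can be made visible by bucketing requests per
UTC hour. The following is a minimal sketch under assumptions, not the
code used here: it assumes one timestamp per line in Apache's default
format, e.g. [02/Aug/2017:18:50:53 +0000], and the function name is
hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

def requests_per_utc_hour(lines):
    """Return a list of 24 request counts, indexed by UTC hour.

    Assumption: each non-empty line holds exactly one timestamp in
    Apache's default format, optionally wrapped in square brackets.
    """
    counts = Counter()
    for line in lines:
        stamp = line.strip().strip("[]")
        if not stamp:
            continue  # ignore blank lines
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        # Normalize to UTC so that hosts logging in local time still line up.
        counts[when.astimezone(timezone.utc).hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]

if __name__ == "__main__":
    # Made-up timestamps, not real log data.
    sample = [
        "[02/Aug/2017:00:10:00 +0000]",
        "[02/Aug/2017:12:00:01 +0000]",
        "[02/Aug/2017:14:30:00 +0200]",
    ]
    for hour, count in enumerate(requests_per_utc_hour(sample)):
        print(f"{hour:02d}:00 UTC  {count}")
```

Summing the per-hour lists of several days smooths out single-day noise
before comparing the midnight and noon buckets.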

All in all, we did not find any hints that these download numbers are
wrong. That doesn't mean they're right, but that is practically
impossible to prove.

Thanks to our friendly sysadmins for helping with this quick analysis!

All the best,
Karsten
