[tor-project] Plan to double-check Tor Browser initial download numbers

teor teor2345 at gmail.com
Mon Jul 17 23:46:12 UTC 2017


> On 18 Jul 2017, at 04:05, Karsten Loesing <karsten at torproject.org> wrote:
> 
> Hello list,
> 
> it's been almost two years since we started collecting sanitized Apache
> web server logs. During this time the number of Tor Browser initial
> downloads rarely went below 70,000 per day.
> 
> https://metrics.torproject.org/webstats-tb.html
> 
> Either there must be a steady demand for fresh binaries, or there is a
> non-zero number of bots downloading the Tor Browser binary several times
> per day.
> 
> I already double-checked our aggregation code that takes sanitized web
> server logs as input and produces daily totals as output. It looks okay
> to me.
> 
> I'd also like to double-check whether there's anything unexpected
> happening before the sanitizing step. For example, could it be that
> there are a few IP addresses making hundreds or thousands of requests?
> 
> Or are there lots of requests with the same referrers or common user
> agents indicating bots?
> 
> My plan is to ask our admins to temporarily add a second Apache log file
> on one of the dist.torproject.org hosts with the default Apache log file
> format without the sanitizing that is usually applied.
> 
> A snapshot of 15 or 30 minutes would likely be sufficient as a sample.
> I'd analyze this log file on the server, delete it, and report my
> findings here.
> 
> This message has two purposes:
> 
> 1. Is this approach acceptable? If not, are there more acceptable
> approaches yielding similar results?

Can you get similar results with a default Apache log file, with the
following changes:
* remove timestamps
* sort lines to destroy the original order

Without precise timing information, the data would be a lot less
sensitive.
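
As a rough sketch of those two changes, assuming the log uses Apache's
common or combined format with a bracketed timestamp field (the function
name and the "[-]" placeholder are made up for illustration):

```python
import re

# Matches the bracketed timestamp field of Apache's common/combined log
# format, e.g. [18/Jul/2017:04:05:06 +0000]. Assumes the default %t layout.
TIMESTAMP = re.compile(
    r"\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]"
)

def strip_and_sort(lines):
    """Replace each timestamp with a placeholder, then sort the lines.

    Sorting destroys the original arrival order, so the output no longer
    reveals when (or in what sequence) requests were made.
    """
    cleaned = [
        TIMESTAMP.sub("[-]", line.rstrip("\n"), count=1) for line in lines
    ]
    return sorted(cleaned)
```

Because the client address is the first field, sorting also groups the
lines by address, which makes heavy hitters easy to spot.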

It might also be useful to know the distribution of requests over
a 24-hour period, without any other details. This might help you
work out how the activity is being triggered.
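
A minimal sketch of that hourly distribution, again assuming the default
bracketed timestamp format (the function name is hypothetical, and hours
are taken as logged, without converting time zones):

```python
import re
from collections import Counter

# Captures only the hour from a timestamp like [18/Jul/2017:04:05:06 +0000].
HOUR = re.compile(
    r"\[\d{2}/[A-Za-z]{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\]"
)

def requests_per_hour(lines):
    """Return a list of 24 request counts, indexed by hour of day."""
    counts = Counter()
    for line in lines:
        match = HOUR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    # Hours with no requests are reported as 0.
    return [counts[hour] for hour in range(24)]
```

Only the 24 totals would need to leave the server. A flat profile would
point towards automated activity, while a day/night pattern would look
more like people.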

> 2. Are there any theories about what might keep the numbers from
> dropping below those 70,000 requests per day? What should I be looking
> for?

There are 86,400 seconds in a day, which means that we're getting a
little under 1 request per second (70,000 / 86,400 is about 0.8). This
could be a single bot caught in a loop.

Are you only counting GET requests?
Do you count incomplete downloads?
(A continually failing automated download process could cause this.)
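
One way to check both questions on the sample would be to tally requests
by method and status code. This is only a sketch: the function name is
made up, it assumes the default quoted request line followed by the
status code, and a 206 status only indicates a range request, not
necessarily a failed download.

```python
import re
from collections import Counter

# Matches the request line and status, e.g. "GET /path HTTP/1.1" 200
REQUEST = re.compile(r'"([A-Z]+) \S+ [^"]*" (\d{3}) ')

def tally_method_status(lines):
    """Count log lines per (HTTP method, status code) pair."""
    tally = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            tally[(match.group(1), int(match.group(2)))] += 1
    return tally
```

A large share of anything other than ("GET", 200) would suggest that the
daily totals include more than completed downloads.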

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org