commit 76688eec38bc02a2813ae9bf9a72f1f1c2c239c3
Author: iwakeh <iwakeh@torproject.org>
Date: Tue Nov 14 08:39:29 2017 +0000
Explain sorting more prominently.
Also make the point that normal web log analyzers can operate on
sanitized logs. Improvements were suggested by Sebastian, cf. ticket-23243.
---
 .../src/main/resources/spec/web-server-logs.xml    | 17 +++++++------
 .../main/resources/web/WEB-INF/web-server-logs.jsp | 29 +++++++++++++---------
 2 files changed, 27 insertions(+), 19 deletions(-)
diff --git a/website/src/main/resources/spec/web-server-logs.xml b/website/src/main/resources/spec/web-server-logs.xml
index d8efe53..13cfad7 100644
--- a/website/src/main/resources/spec/web-server-logs.xml
+++ b/website/src/main/resources/spec/web-server-logs.xml
@@ -20,7 +20,7 @@
 </front>
 <middle>
 <section title="Purpose of this document">
- <t>BETA: As of November 8, 2017, this document is still under
+ <t>BETA: As of November 14, 2017, this document is still under
 discussion and subject to change without prior notice. Feel free to
 <eref target="/about.html#contact">contact us</eref> for questions or
 concerns regarding this document.</t>
@@ -174,6 +174,12 @@ mod_log_config module</eref>.</t>
 <section title="Re-assembling log files">
 <t>Rewritten log lines are re-assembled into sanitized log files based
 on physical host, virtual host, and request start date.</t>
+ <t>All rewritten log lines are sorted alphabetically, so that request
+ order cannot be inferred from sanitized log files.</t>
+ <t>Many of the sanitized log lines will now be identical.
+ But in order to not remove too much useful information we keep the
+ identical log lines and thus enable typical web log analyzers to
+ operate on the sanitized log files. </t>
 <t>The naming convention for sanitized log files is:
 <list>
 <t><virtual-host>_<physical-host>_access.log_YYYYMMDD[.xz]</t>
@@ -190,12 +196,9 @@ mod_log_config module</eref>.</t>
 'dist.torproject.org', are more familiar to the public and were
 therefore chosen to be the first naming component.
 </t>
- <t>As last and certainly not least important sanitizing step, all
- rewritten log lines are sorted alphabetically, so that request order
- cannot be inferred from sanitized log files.</t>
- <t>Sanitized log files are typically compressed before publication. In
- particular the sorting step allows for highly efficient compression
- rates. We typically use XZ for compression, which is indicated by
+ <t>Sanitized log files are typically compressed before publication.
+ The sorting step also allows for highly efficient compression rates.
+ We typically use XZ for compression, which is indicated by
 appending ".xz" to log file names, but this is subject to change.</t>
 </section>
 </section>
diff --git a/website/src/main/resources/web/WEB-INF/web-server-logs.jsp b/website/src/main/resources/web/WEB-INF/web-server-logs.jsp
index b1505df..5e9cc79 100644
--- a/website/src/main/resources/web/WEB-INF/web-server-logs.jsp
+++ b/website/src/main/resources/web/WEB-INF/web-server-logs.jsp
@@ -22,7 +22,7 @@
 "#rfc.section.1">1.</a> <a href=
 "#n-purpose-of-this-document">Purpose of this document</a></h2>
 <div id="rfc.section.1.p.1">
-<p>BETA: As of November 8, 2017, this document is still under
+<p>BETA: As of November 14, 2017, this document is still under
 discussion and subject to change without prior notice. Feel free to
 <a href="/about.html#contact">contact us</a> for questions or
 concerns regarding this document.</p>
@@ -254,6 +254,16 @@ of processing that format.</p>
 based on physical host, virtual host, and request start date.</p>
 </div>
 <div id="rfc.section.4.3.p.2">
+<p>All rewritten log lines are sorted alphabetically, so that
+request order cannot be inferred from sanitized log files.</p>
+</div>
+<div id="rfc.section.4.3.p.3">
+<p>Many of the sanitized log lines will now be identical. But in
+order to not remove too much useful information we keep the
+identical log lines and thus enable typical web log analyzers to
+operate on the sanitized log files.</p>
+</div>
+<div id="rfc.section.4.3.p.4">
 <p>The naming convention for sanitized log files is:</p>
 <ul class="empty">
 <li>
@@ -262,7 +272,7 @@ based on physical host, virtual host, and request start date.</p>
 <p>The underscore is a separator symbol between the various parts
 of the filename.</p>
 </div>
-<div id="rfc.section.4.3.p.3">
+<div id="rfc.section.4.3.p.5">
 <p>Sanitized log files may additionally be sorted into directories
 by virtual host and date as in:</p>
 <ul class="empty">
@@ -273,17 +283,12 @@ by virtual host and date as in:</p>
 'dist.torproject.org', are more familiar to the public and were
 therefore chosen to be the first naming component.</p>
 </div>
-<div id="rfc.section.4.3.p.4">
-<p>As last and certainly not least important sanitizing step, all
-rewritten log lines are sorted alphabetically, so that request
-order cannot be inferred from sanitized log files.</p>
-</div>
-<div id="rfc.section.4.3.p.5">
+<div id="rfc.section.4.3.p.6">
 <p>Sanitized log files are typically compressed before publication.
-In particular the sorting step allows for highly efficient
-compression rates. We typically use XZ for compression, which is
-indicated by appending ".xz" to log file names, but this is subject
-to change.</p>
+The sorting step also allows for highly efficient compression
+rates. We typically use XZ for compression, which is indicated by
+appending ".xz" to log file names, but this is subject to
+change.</p>
 </div>
 </section>
 </div> <!-- container -->
tor-commits@lists.torproject.org
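
Note on the re-assembly step described in the patch: the spec text says that rewritten lines are sorted alphabetically and that identical lines are kept. A minimal Python sketch of that behavior follows. It is not taken from the project's code: the function name `reassemble`, the sample log lines, and the choice of plain code-point ordering for "alphabetically" are all assumptions made for illustration.

```python
from collections import Counter


def reassemble(rewritten_lines):
    """Sort rewritten log lines and keep identical lines.

    Hypothetical helper for illustration only; the name and signature
    are not from the actual code base.
    """
    # sorted() compares strings by code point (an assumed reading of
    # "alphabetically") and never drops duplicates, so per-line counts
    # survive while the original request order is discarded.
    return sorted(rewritten_lines)


# Made-up sample lines, listed in the order the requests arrived.
line_a = '0.0.0.0 - - [14/Nov/2017:00:00:00 +0000] "GET /a.html HTTP/1.1" 200 100'
line_b = '0.0.0.0 - - [14/Nov/2017:00:00:00 +0000] "GET /b.html HTTP/1.1" 200 200'
arrival_order = [line_b, line_a, line_b, line_a]

result = reassemble(arrival_order)

# Request counts per line, which a typical web log analyzer would rely
# on, are the same before and after sorting.
counts_before = Counter(arrival_order)
counts_after = Counter(result)
```

Returning a sorted list rather than a set is the point of the sketch: deduplicating would hide how often each request occurred, whereas sorting alone only removes the ordering information.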