[tor-commits] [metrics-lib/master] Fix a bug in recognizing bandwidth files.

karsten at torproject.org karsten at torproject.org
Fri May 3 06:40:58 UTC 2019


commit 016d49f5142561476185105ef770006d9635f91e
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Thu May 2 20:54:53 2019 +0200

    Fix a bug in recognizing bandwidth files.
    
    We're using a regular expression on the first 100 characters of a
    descriptor to recognize bandwidth files. More specifically, if a
    descriptor starts with ten digits followed by a newline, we parse it
    as a bandwidth file. (This is ugly, but the legacy bandwidth file
    format doesn't give us much of a choice.)
    
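    This regular expression is broken. `matches()` only returns true if
    the pattern matches the whole string, so we need a pattern that
    matches all of the first 100 characters of a descriptor. Ours only
    matched a string consisting of exactly ten digits and a newline.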
    
    More detailed explanation of the code change:
    
     - We don't need to start the pattern with `^`, because `matches()`
       requires the pattern to match the whole string anyway.
     - The `(?s)` part enables the dotall mode. Quoting the Javadoc of
       `java.util.regex.Pattern.DOTALL`: "In dotall mode, the expression
       . matches any character, including a line terminator. By default
       this expression does not match line terminators. Dotall mode can
       also be enabled via the embedded flag expression (?s). (The s is
       a mnemonic for "single-line" mode, which is what this is called
       in Perl.)"
     - We need to end the pattern with `.*` to match any characters
       following the first newline, which also includes newlines due to
       the previously enabled dotall mode.
    
    Fixes #30369.
---
 CHANGELOG.md                                                        | 6 ++++++
 .../java/org/torproject/descriptor/impl/DescriptorParserImpl.java   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6a62528..aee65ea 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,9 @@
+# Changes in version 2.6.1 - 2019-05-??
+
+ * Medium changes
+   - Fix a bug in recognizing descriptors as bandwidth files.
+
+
 # Changes in version 2.6.0 - 2019-04-29
 
  * Medium changes
diff --git a/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java b/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
index 119fe09..08ac909 100644
--- a/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
+++ b/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
@@ -132,7 +132,7 @@ public class DescriptorParserImpl implements DescriptorParser {
           sourceFile);
     } else if (fileName.contains(LogDescriptorImpl.MARKER)) {
       return LogDescriptorImpl.parse(rawDescriptorBytes, sourceFile, fileName);
-    } else if (firstLines.matches("^[0-9]{10}\\n")) {
+    } else if (firstLines.matches("(?s)[0-9]{10}\\n.*")) {
       /* Identifying bandwidth files by a 10-digit timestamp in the first line
        * breaks with files generated before 2002 or after 2286 and when the next
        * descriptor identifier starts with just a timestamp in the first line

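The effect of the pattern change can be reproduced outside metrics-lib
with a minimal sketch like the one below. The class name, method names,
and sample inputs are made up for illustration; only the two pattern
strings are taken from the diff above. A third, hypothetical variant
without `(?s)` is included to show why dotall mode is needed.

```java
public class BandwidthFileRegexDemo {

  /* Pattern before the fix. matches() must consume the whole input, so
   * this only accepts exactly ten digits followed by one newline. */
  static boolean matchesOldPattern(String firstLines) {
    return firstLines.matches("^[0-9]{10}\\n");
  }

  /* Pattern after the fix. (?s) lets . match line terminators, so .*
   * consumes everything after the first newline. */
  static boolean matchesNewPattern(String firstLines) {
    return firstLines.matches("(?s)[0-9]{10}\\n.*");
  }

  /* Hypothetical variant without (?s), not part of the commit. Here .*
   * stops at the next newline, so multi-line input is rejected. */
  static boolean matchesWithoutDotall(String firstLines) {
    return firstLines.matches("[0-9]{10}\\n.*");
  }

  public static void main(String[] args) {
    /* Made-up inputs standing in for the first characters of a file. */
    String timestampOnly = "1556830493\n";
    String multiLine = "1556830493\nfirst made-up line\nsecond made-up line\n";
    String noTimestamp = "some other descriptor\n1556830493\n";

    for (String input : new String[] {timestampOnly, multiLine, noTimestamp}) {
      System.out.println("old=" + matchesOldPattern(input)
          + " new=" + matchesNewPattern(input)
          + " withoutDotall=" + matchesWithoutDotall(input));
    }
  }
}
```

In this sketch the old pattern accepts only the timestamp-only input,
while the new pattern also accepts the multi-line input, which is the
case that matters for real files.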

