commit 016d49f5142561476185105ef770006d9635f91e
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Thu May 2 20:54:53 2019 +0200
Fix a bug in recognizing bandwidth files.
We're using a regular expression on the first 100 characters of a descriptor to recognize bandwidth files. More specifically, if a descriptor starts with ten digits followed by a newline, we parse it as a bandwidth file. (This is ugly, but the legacy bandwidth file format doesn't give us much of a choice.)
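This regular expression is broken. `String.matches()` only returns true if the pattern matches the entire string, so we need a pattern that matches all of the first 100 characters of a descriptor. Ours didn't: it only matched a string consisting of exactly ten digits and a newline, with nothing after it.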
More detailed explanation of the code change:
- We don't need to start the pattern with `^`, because the regular expression needs to match the whole string anyway.
- The `(?s)` part enables the dotall mode: "In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators. Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)"
- We need to end the pattern with `.*` to match any characters following the first newline, which also includes newlines due to the previously enabled dotall mode.
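
The three points above can be checked with a small standalone sketch. It is an illustration only: the class name, the method names and the sample input are made up and are not part of metrics-lib. The `lookingAt()` variant is an alternative that this commit does not use; it is included only to show that a prefix match would behave the same way on this input.

```java
import java.util.regex.Pattern;

/* Illustration only: class, methods and sample data are hypothetical. */
class BandwidthFilePatternDemo {

  /* Pattern before the fix. String.matches() must consume the whole
   * string, so this only accepts exactly ten digits plus one newline. */
  static boolean matchesOld(String firstLines) {
    return firstLines.matches("^[0-9]{10}\n");
  }

  /* Pattern after the fix: ten digits, a newline, then any remaining
   * characters, including further newlines thanks to (?s). */
  static boolean matchesNew(String firstLines) {
    return firstLines.matches("(?s)[0-9]{10}\n.*");
  }

  /* Same as the fixed pattern but without (?s): '.' stops at the next
   * line terminator, so multi-line input is rejected. */
  static boolean matchesWithoutDotall(String firstLines) {
    return firstLines.matches("[0-9]{10}\n.*");
  }

  /* Alternative not used in the commit: lookingAt() anchors at the
   * start but does not require the whole input to match. */
  static boolean matchesPrefix(String firstLines) {
    return Pattern.compile("[0-9]{10}\n").matcher(firstLines).lookingAt();
  }

  public static void main(String[] args) {
    /* Invented sample resembling the start of a bandwidth file. */
    String sample = "1556823293\nnode_id=$AAAA bw=100\nnode_id=$BBBB bw=200\n";
    System.out.println("old pattern:    " + matchesOld(sample));
    System.out.println("new pattern:    " + matchesNew(sample));
    System.out.println("without (?s):   " + matchesWithoutDotall(sample));
    System.out.println("lookingAt():    " + matchesPrefix(sample));
  }
}
```

For the sample input, only the old pattern and the variant without `(?s)` return false, which matches the explanation above: the old pattern leaves everything after the first newline unmatched, and without dotall mode `.*` cannot cross the second newline.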
Fixes #30369.
---
 CHANGELOG.md | 6 ++++++
 .../java/org/torproject/descriptor/impl/DescriptorParserImpl.java | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6a62528..aee65ea 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,9 @@
+# Changes in version 2.6.1 - 2019-05-??
+
+ * Medium changes
+   - Fix a bug in recognizing descriptors as bandwidth files.
+
+
 # Changes in version 2.6.0 - 2019-04-29
 
  * Medium changes
diff --git a/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java b/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
index 119fe09..08ac909 100644
--- a/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
+++ b/src/main/java/org/torproject/descriptor/impl/DescriptorParserImpl.java
@@ -132,7 +132,7 @@ public class DescriptorParserImpl implements DescriptorParser {
           sourceFile);
     } else if (fileName.contains(LogDescriptorImpl.MARKER)) {
       return LogDescriptorImpl.parse(rawDescriptorBytes, sourceFile, fileName);
-    } else if (firstLines.matches("^[0-9]{10}\n")) {
+    } else if (firstLines.matches("(?s)[0-9]{10}\n.*")) {
       /* Identifying bandwidth files by a 10-digit timestamp in the first line
        * breaks with files generated before 2002 or after 2286 and when the next
        * descriptor identifier starts with just a timestamp in the first line