[tor-bugs] #20525 [Metrics/metrics-lib]: Be more careful when deleting extraneous local descriptor files

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Nov 1 17:01:43 UTC 2016


#20525: Be more careful when deleting extraneous local descriptor files
-------------------------------------+---------------------
     Reporter:  karsten              |      Owner:  karsten
         Type:  defect               |     Status:  new
     Priority:  Medium               |  Milestone:
    Component:  Metrics/metrics-lib  |    Version:
     Severity:  Normal               |   Keywords:
Actual Points:                       |  Parent ID:
       Points:                       |   Reviewer:
      Sponsor:                       |
-------------------------------------+---------------------
 As found in
 [https://trac.torproject.org/projects/tor/ticket/18910#comment:95 this
 ticket], `DescriptorIndexCollector` deletes descriptor files from a
 previous or concurrent collect run if it doesn't collect those files
 itself.  This is unexpected behavior and differs from what
 `DescriptorCollectorImpl` does.  Let's fix this.

 Here's a minimal example (taken from that other ticket):

 {{{
     DescriptorCollector dc =
         DescriptorSourceFactory.createDescriptorCollector();
     dc.collectDescriptors("https://collector.torproject.org",
         new String[] { "/recent/torperf/" }, 0L, new File("sync"), true);
     dc.collectDescriptors("https://collector.torproject.org",
         new String[] { "/recent/exit-lists/" }, 0L, new File("sync"),
 true);
 }}}

 This is the (annotated) output of running this code (1) with the default
 `DescriptorIndexCollector`, (2) with `DescriptorCollectorImpl`, and (3) a
 fixed version of `DescriptorIndexCollector`:

 {{{
 $ java -cp bin/:lib/descriptor-1.5.0.jar:lib/slf4j-api-1.7.7.jar:lib
 /commons-compress-1.9.jar:lib/gson-2.2.4.jar Main
 $ du -hs sync/recent/*
  14M    sync/recent/exit-lists
   0B    sync/recent/torperf        # <- deleted, bad!
 $ rm -rf sync/
 $ java -cp bin/:lib/descriptor-1.5.0.jar:lib/slf4j-api-1.7.7.jar:lib
 /commons-compress-1.9.jar:lib/gson-2.2.4.jar
 -Ddescriptor.collector=org.torproject.descriptor.impl.DescriptorCollectorImpl
 Main
 $ du -hs sync/recent/*
  14M    sync/recent/exit-lists
 3.0M    sync/recent/torperf        # <- not deleted, good!
 $ java -cp bin/:lib/descriptor-1.5.0-dev.jar:lib/slf4j-api-1.7.7.jar:lib
 /commons-compress-1.9.jar:lib/gson-2.2.4.jar Main
 $ du -hs sync/recent/*
  14M    sync/recent/exit-lists
 3.0M    sync/recent/torperf        # <- not deleted, good!
 }}}

 Possible patch used in (3) above:

 {{{
 diff --git
 a/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
 b/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
 index d32f811..daef379 100644
 ---
 a/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
 +++
 b/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
 @@ -70,7 +70,8 @@ public class DescriptorIndexCollector implements
 DescriptorCollector {
      this.fetchRemoteFiles(index.path, remoteFiles, minLastModified,
          localDirectory, localFiles);
      if (deleteExtraneousLocalFiles) {
 -      this.deleteExtraneousLocalFiles(remoteFiles, localDirectory,
 localFiles);
 +      deleteExtraneousLocalFiles(remoteDirectories, remoteFiles,
 localDirectory,
 +          localFiles);
      }
    }

 @@ -108,12 +109,16 @@ public class DescriptorIndexCollector implements
 DescriptorCollector {
      }
    }

 -  static void deleteExtraneousLocalFiles(
 +  static void deleteExtraneousLocalFiles(String[] remoteDirectories,
        SortedMap<String, FileNode> remoteFiles,
        File localDir, SortedMap<String, Long> locals) {
      for (String localPath : locals.keySet()) {
 -      if (!remoteFiles.containsKey(localPath)) {
 -        new File(localDir, localPath).delete();
 +      for (String remoteDirectory : remoteDirectories) {
 +        if (localPath.startsWith(remoteDirectory)) {
 +          if (!remoteFiles.containsKey(localPath)) {
 +            new File(localDir, localPath).delete();
 +          }
 +        }
        }
      }
    }
 }}}

 I briefly thought about an alternative fix where we look at the actual
 `index.json` contents to decide whether a local descriptor file that we
 didn't care about in the current collect run would be safe to delete or
 not.  After all, we have that information, so we could as well use it.

 The two arguments against it are (1) that we'd change the previous
 behavior of `DescriptorCollector` in a somewhat backward-incompatible way
 and (2) that this code might be smarter than developers might think it is
 and hence confuse them in a bad way.

 I think I'd rather go with something like the patch above.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20525>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list