[tor-commits] [metrics-web/master] Allow importing descriptors from a local Tor directory.

karsten at torproject.org
Wed Jul 13 15:42:56 UTC 2011


commit d19126aeab5974db35a7a087ff21984ea6085682
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Wed Jul 13 17:38:56 2011 +0200

    Allow importing descriptors from a local Tor directory.
    
    The `KeepDirectoryArchiveImportHistory` option allows us to keep a history
    of imported directory archive files to know which files have been imported
    before.  This history can be useful when importing from a changing source
    to avoid importing descriptors over and over again, but it can be
    confusing to users who don't know about it.
    
    However, a history that records only filenames does not work for
    importing cached descriptors from a local Tor directory.  There,
    descriptors are stored in files like cached-descriptors, whose names
    never change; only their contents do.
    
    The new approach is to also keep the file modification time in our history
    and check whether files have been modified after we last read them.  This
    lets us import descriptors from a local Tor directory, too.
    
    Also update the README.
---
 ChangeLog                                        |    1 +
 README                                           |   27 ++++++++-----------
 src/org/torproject/ernie/cron/ArchiveReader.java |   31 ++++++++++++++++-----
 3 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 387e813..42032ca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,7 @@ Metrics database and website change log:
 
 Changes in version 0.0.2 - 2011-??-??
   - Let users download votes from a given valid-after time.
+  - Allow importing descriptors from a local Tor directory.
 
 Changes in version 0.0.1 - 2011-06-08
   - Initial release
diff --git a/README b/README
index 0786852..111543a 100644
--- a/README
+++ b/README
@@ -25,7 +25,7 @@ delivery service, and others.
 
 
 1.1. Preparing the operating system
------------------------------------
+===================================
 
 This README describes the steps for installing metrics-web on a Debian
 GNU/Linux Squeeze server.  Instructions for other operating systems may
@@ -151,22 +151,25 @@ can process.  Look for the config option WriteRelayDescriptorsRawFiles in
 /srv/metrics-web/config.template for more information on this experimental
 feature.
 
+In a future version of metrics-web it may also be possible to update local
+relay descriptor tarballs from the official metrics server via rsync and
+import only the changes into the metrics database.  The idea is to simply
+rsync the data/ directory from the metrics server and have all information
+available.  However, this feature is not implemented yet.
+
 
 1.4. Importing relay descriptors from a local Tor data directory
 ================================================================
 
-WARNING: The functions described in this section are not implemented yet!
-
-In a future version of metrics-web, the metrics database importer will be
-able to import the cached descriptors from a local Tor data directory.
-(A special case of importing descriptors from a continuously updated
-directory is when both metrics-db and metrics-web are run on the same
-machine, but this shouldn't be the general case.)
+In order to keep the data in the metrics database up-to-date, the metrics
+database importer can import the cached descriptors from a local Tor data
+directory.
 
 Configure a local Tor client to fetch all known descriptors as early as
 possible by adding these config options to its torrc file:
 
 FetchUselessDescriptors 1
+FetchDirInfoEarly 1
 FetchDirInfoExtraEarly 1
 
 Tell the metrics database importer where to find the cached descriptor
@@ -181,14 +184,6 @@ Add a crontab entry for the database importer to run once per hour:
 
 15 * * * * cd /srv/metrics-web/ && ./run.sh
 
-In a future version of metrics-web it may also be possible to update local
-relay descriptor tarballs from the official metrics server via rsync and
-import only the changes into the metrics database.  The idea is to simply
-rsync the data/ directory from the metrics server and have all information
-available.  But until this is implemented, the recommended way to keep the
-metrics website up-to-date would be the one described above in this
-section.
-
 
 1.5. Importing GeoIP information
 ================================
diff --git a/src/org/torproject/ernie/cron/ArchiveReader.java b/src/org/torproject/ernie/cron/ArchiveReader.java
index b21233d..4d4fe64 100644
--- a/src/org/torproject/ernie/cron/ArchiveReader.java
+++ b/src/org/torproject/ernie/cron/ArchiveReader.java
@@ -21,8 +21,9 @@ public class ArchiveReader {
 
     int parsedFiles = 0, ignoredFiles = 0;
     Logger logger = Logger.getLogger(ArchiveReader.class.getName());
-    SortedSet<String> lastArchivesImportHistory = new TreeSet<String>();
-    SortedSet<String> newArchivesImportHistory = new TreeSet<String>();
+    SortedMap<String, Long>
+        lastArchivesImportHistory = new TreeMap<String, Long>(),
+        newArchivesImportHistory = new TreeMap<String, Long>();
     File archivesImportHistoryFile = new File(statsDirectory,
         "archives-import-history");
     if (keepImportHistory && archivesImportHistoryFile.exists()) {
@@ -31,7 +32,15 @@ public class ArchiveReader {
             archivesImportHistoryFile));
         String line = null;
         while ((line = br.readLine()) != null) {
-          lastArchivesImportHistory.add(line);
+          String[] parts = line.split(",");
+          if (parts.length < 2) {
+            logger.warning("Archives import history file does not "
+                + "contain timestamps. Skipping.");
+            break;
+          }
+          long lastModified = Long.parseLong(parts[0]);
+          String filename = parts[1];
+          lastArchivesImportHistory.put(filename, lastModified);
         }
         br.close();
       } catch (IOException e) {
@@ -53,14 +62,17 @@ public class ArchiveReader {
           }
         } else {
           try {
+            long lastModified = pop.lastModified();
+            String filename = pop.getName();
             if (keepImportHistory) {
-              newArchivesImportHistory.add(pop.getName());
+              newArchivesImportHistory.put(filename, lastModified);
             }
             if (keepImportHistory &&
-                lastArchivesImportHistory.contains(pop.getName())) {
+                lastArchivesImportHistory.containsKey(filename) &&
+                lastArchivesImportHistory.get(filename) >= lastModified) {
               ignoredFiles++;
               continue;
-            } else if (pop.getName().endsWith(".tar.bz2")) {
+            } else if (filename.endsWith(".tar.bz2")) {
               logger.warning("Cannot parse compressed tarball "
                   + pop.getAbsolutePath() + ". Skipping.");
               continue;
@@ -99,6 +111,7 @@ public class ArchiveReader {
             break;
           }
         }
+        logger.warning(sb.toString());
       }
     }
     if (keepImportHistory) {
@@ -106,8 +119,10 @@ public class ArchiveReader {
         archivesImportHistoryFile.getParentFile().mkdirs();
         BufferedWriter bw = new BufferedWriter(new FileWriter(
             archivesImportHistoryFile));
-        for (String line : newArchivesImportHistory) {
-          bw.write(line + "\n");
+        for (Map.Entry<String, Long> historyEntry :
+            newArchivesImportHistory.entrySet()) {
+          bw.write(String.valueOf(historyEntry.getValue()) + ","
+              + historyEntry.getKey() + "\n");
         }
         bw.close();
       } catch (IOException e) {
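
The history check introduced by this patch can be sketched in isolation as
follows.  This is a minimal illustration, not the code in ArchiveReader.java:
the class ImportHistorySketch and its method names are hypothetical, and it
assumes the "<lastModified>,<filename>" line format that the patch writes.
Unlike the patch, the sketch skips malformed lines instead of stopping at the
first one, and it splits on the first comma only, so that a comma in a
filename would not truncate it.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not part of metrics-web.
class ImportHistorySketch {

  /** Parses history lines of the form "<lastModified>,<filename>". */
  static SortedMap<String, Long> parseHistory(Iterable<String> lines) {
    SortedMap<String, Long> history = new TreeMap<String, Long>();
    for (String line : lines) {
      // Split on the first comma only, so the filename keeps any commas.
      String[] parts = line.split(",", 2);
      if (parts.length < 2) {
        // Line without a timestamp (old history format): skip it.
        continue;
      }
      try {
        history.put(parts[1], Long.parseLong(parts[0]));
      } catch (NumberFormatException e) {
        // Timestamp is not a number: skip this line.
      }
    }
    return history;
  }

  /**
   * Returns true if the file was imported before and has not been
   * modified since, which is the condition the patch uses to ignore it.
   */
  static boolean shouldSkip(SortedMap<String, Long> lastHistory,
      String filename, long lastModified) {
    Long previous = lastHistory.get(filename);
    return previous != null && previous >= lastModified;
  }

  /** Formats one history entry in the same order the patch writes it. */
  static String formatEntry(String filename, long lastModified) {
    return lastModified + "," + filename;
  }
}
```

With such a history, a file like cached-descriptors is read again only when
its modification time is newer than the recorded one, while a file that has
never been seen is always imported.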


