[metrics-bugs] #21354 [Metrics/Onionoo]: Make Onionoo more memory-efficient

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Jan 30 21:32:13 UTC 2017


#21354: Make Onionoo more memory-efficient
---------------------------------+--------------------------
     Reporter:  karsten          |      Owner:  metrics-team
         Type:  enhancement      |     Status:  new
     Priority:  Medium           |  Milestone:
    Component:  Metrics/Onionoo  |    Version:
     Severity:  Normal           |   Keywords:
Actual Points:                   |  Parent ID:
       Points:                   |   Reviewer:
      Sponsor:                   |
---------------------------------+--------------------------
 When I re-imported 2 years of bridge network statuses to fix #20994, I
 noticed that this is only possible when importing descriptors in batches
 of a few months.  This is pretty bad, because it also means that we
 wouldn't be able to start a new Onionoo instance from scratch.  And a new
 operator might not even figure out what's wrong and assume that Onionoo
 contains a serious bug.  (Maybe not a totally unreasonable assumption.)

 Today I found the reason for this problem: `UptimeStatusUpdater` stores
 sets of `Long` values of bridge network status publication times, and that
 simply does not scale for more than a few months of statuses.
 Unfortunately, we need to store all these publication times in memory,
 because opening and updating status files for each fingerprint contained
 in a status also does not scale, for a different reason.

 But we can do better here.  We can apply the same trick that we're using
 for storing relay flag strings, that is, resolve strings to indexes using
 a single, static map, and store indexes in a bitset rather than strings in
 a set.  Here's a possible patch:

 {{{
 diff --git
 a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
 b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
 index 2b5f5fc..6280902 100644
 ---
 a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
 +++
 b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
 @@ -86,14 +86,18 @@ public class UptimeStatusUpdater implements
 DescriptorListener,
      }
    }

 +  private static Map<Long, Integer> dateHourMillisToIndex = new
 HashMap<>();
 +
 +  private static Map<Integer, Long> dateHourIndexToMillis = new
 HashMap<>();
 +
    private SortedMap<Long, Flags> newRelayStatuses = new TreeMap<>();

    private SortedMap<String, SortedMap<Long, Flags>>
        newRunningRelays = new TreeMap<>();

 -  private SortedSet<Long> newBridgeStatuses = new TreeSet<>();
 +  private BitSet newBridgeStatuses = new BitSet();

 -  private SortedMap<String, SortedSet<Long>>
 +  private SortedMap<String, BitSet>
        newRunningBridges = new TreeMap<>();

    private void processRelayNetworkStatusConsensus(
 @@ -123,13 +127,18 @@ public class UptimeStatusUpdater implements
 DescriptorListener,
      if (!fingerprints.isEmpty()) {
        long dateHourMillis = (status.getPublishedMillis()
            / DateTimeHelper.ONE_HOUR) * DateTimeHelper.ONE_HOUR;
 +      if (!dateHourMillisToIndex.containsKey(dateHourMillis)) {
 +        dateHourIndexToMillis.put(dateHourIndexToMillis.size(),
 dateHourMillis);
 +        dateHourMillisToIndex.put(dateHourMillis,
 dateHourMillisToIndex.size());
 +      }
 +      int dateHourIndex = dateHourMillisToIndex.get(dateHourMillis);
        for (String fingerprint : fingerprints) {
          if (!this.newRunningBridges.containsKey(fingerprint)) {
 -          this.newRunningBridges.put(fingerprint, new TreeSet<Long>());
 +          this.newRunningBridges.put(fingerprint, new BitSet());
          }
 -        this.newRunningBridges.get(fingerprint).add(dateHourMillis);
 +        this.newRunningBridges.get(fingerprint).set(dateHourIndex);
        }
 -      this.newBridgeStatuses.add(dateHourMillis);
 +      this.newBridgeStatuses.set(dateHourIndex);
      }
    }

 @@ -140,17 +149,19 @@ public class UptimeStatusUpdater implements
 DescriptorListener,
        this.updateStatus(true, e.getKey(), e.getValue());
      }
      this.updateStatus(true, null, this.newRelayStatuses);
 -    for (Map.Entry<String, SortedSet<Long>> e :
 -        this.newRunningBridges.entrySet()) {
 +    for (Map.Entry<String, BitSet> e : this.newRunningBridges.entrySet())
 {
 +      BitSet dateHourIndexes = e.getValue();
        SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
 -      for (long dateHourMillis : e.getValue()) {
 -        dateHourMillisNoFlags.put(dateHourMillis, null);
 +      for (int i = dateHourIndexes.nextSetBit(0); i >= 0;
 +          i = dateHourIndexes.nextSetBit(i + 1)) {
 +        dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
        }
        this.updateStatus(false, e.getKey(), dateHourMillisNoFlags);
      }
      SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
 -    for (long dateHourMillis : this.newBridgeStatuses) {
 -      dateHourMillisNoFlags.put(dateHourMillis, null);
 +    for (int i = this.newBridgeStatuses.nextSetBit(0); i >= 0;
 +        i = this.newBridgeStatuses.nextSetBit(i + 1)) {
 +      dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
      }
      this.updateStatus(false, null, dateHourMillisNoFlags);
    }

 }}}

 But let's not merge this yet, as it only solves this particular problem of
 importing lots of bridge network statuses, and maybe there's an ever
 better way to fix this.

 Let's rather look at this (possible) improvement together with earlier
 improvements towards memory efficiency, and let's think about what remains
 to be done.

 For example, should we avoid certain data structures (like
 `TreeSet/TreeMap`) and replace them with more memory-efficient data
 structures?

 Are there other tricks we should use more consistently?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21354>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the metrics-bugs mailing list