tor-commits
November 2019: 20 participants, 2923 discussions

[metrics-web/release] Document versions graph better.
by karsten@torproject.org 09 Nov '19
commit 05a8b0e437b9793fda01c8f0da633f74039f77f7
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Thu Nov 22 11:17:45 2018 +0100
Document versions graph better.
Fixes #28462.
---
src/main/resources/web/json/metrics.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/main/resources/web/json/metrics.json b/src/main/resources/web/json/metrics.json
index 502b0e6..527f13f 100644
--- a/src/main/resources/web/json/metrics.json
+++ b/src/main/resources/web/json/metrics.json
@@ -26,7 +26,7 @@
"id": "versions",
"title": "Relays by tor version",
"type": "Graph",
- "description": "<p>This graph shows the number of running <a href=\"glossary.html#relay\">relays</a> by tor software version. Relays report their tor software version when they announce themselves in the network. More details on when these versions were declared stable or unstable can be found on the <a href=\"https://www.torproject.org/download/download.html\">download page</a> and in the <a href=\"https://gitweb.torproject.org/tor.git/tree/ChangeLog\">changes file</a>.</p>",
+ "description": "<p>This graph shows the number of running <a href=\"glossary.html#relay\">relays</a> by tor software version. Relays report their tor software version when they announce themselves in the network. New major versions are added to the graph as soon as they are first recommended by the directory authorities. More details on when these versions were declared stable or unstable can be found on the <a href=\"https://www.torproject.org/download/download.html\">download page</a> and in the <a href=\"https://gitweb.torproject.org/tor.git/tree/ChangeLog\">changes file</a>.</p>",
"function": "versions",
"parameters": [
"start",

[metrics-web/release] Update Reproducible Metrics document.
by karsten@torproject.org 09 Nov '19
commit d4b17e906f55b553ac16bcf902967157e07a234d
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Fri Nov 30 16:54:17 2018 +0100
Update Reproducible Metrics document.
Reflects code changes made in #28116 and #28305.
---
.../resources/web/jsps/reproducible-metrics.jsp | 51 +++++-----------------
1 file changed, 11 insertions(+), 40 deletions(-)
diff --git a/src/main/resources/web/jsps/reproducible-metrics.jsp b/src/main/resources/web/jsps/reproducible-metrics.jsp
index 939b42e..b6df6c3 100644
--- a/src/main/resources/web/jsps/reproducible-metrics.jsp
+++ b/src/main/resources/web/jsps/reproducible-metrics.jsp
@@ -15,15 +15,6 @@
<div class="container">
-<div class="panel panel-danger">
-<div class="panel-heading">
-<h5 class="panel-title">Work in progress notice</h5>
-</div>
-<div class="panel-body">
-<p>As of July 2018, this page is still a work in progress. Handle with care!</p>
-</div>
-</div>
-
<h1>Reproducible Metrics
<a href="#reproducible-metrics" name="reproducible-metrics" class="anchor">#</a></h1>
@@ -103,7 +94,7 @@ Split observations to the covered UTC dates by assuming a linear distribution of
<h4>Step 3: Estimate fraction of reported directory-request statistics</h4>
<p>The next step after parsing descriptors is to estimate the fraction of reported directory-request statistics on a given day.
-This fraction, a value between <var>0%</var> and <var>100%</var>, will be used in the next step to extrapolate observed request numbers to expected network totals.
+This fraction will be used in the next step to extrapolate observed request numbers to expected network totals.
For further background on the following calculation method, refer to the technical report titled <a href="https://research.torproject.org/techreports/counting-daily-bridge-users-201…">"Counting daily bridge users"</a> which also applies to relay users.
In the following, we're using the term server instead of relay or bridge, because the estimation method is exactly the same for relays and bridges.</p>
@@ -139,7 +130,7 @@ This approach also works with <var>r(R)</var> being the sum of requests from <em
<pre>r(N) = floor(r(R) / frac / 10)</pre>
<p>A client that is connected 24/7 makes about 15 requests per day, but not all clients are connected 24/7, so we picked the number 10 for the average client. We simply divide directory requests by 10 and consider the result as the number of users. Another way of looking at it, is that we assume that each request represents a client that stays online for one tenth of a day, so 2 hours and 24 minutes.</p>
-<p>Skip dates where <var>frac</var> is smaller than 10% and hence too low for a robust estimate, or where <var>frac</var> is greater than 100%, which would indicate an issue in the previous step.</p>
+<p>Skip dates where <var>frac</var> is smaller than 10% and hence too low for a robust estimate. Also skip dates where <var>frac</var> is greater than 110%, which would indicate an issue in the previous step. We picked 110% as upper bound, not 100%, because there can be relays reporting statistics that temporarily didn't make it into the consensus, and we accept up to 10% of those additional statistics. However, there needs to be some upper bound to exclude obvious outliers with fractions of 120%, 150%, or even 200%.</p>
<h4>Step 5: Compute ranges of expected clients per day to detect potential censorship events</h4>
@@ -278,14 +269,12 @@ Refer to the <a href="https://gitweb.torproject.org/torspec.git/tree/dir-spec.tx
<li>Relay flags: Parse relay flags from the <code>"s"</code> line. If there is no <code>"Running"</code> flag, skip this consensus entry. This ensures that we only consider running relays. Also parse any other relay flags from the <code>"s"</code> line that the relay had assigned.</li>
</ul>
-<p>If a consensus contains zero running relays, we skip it in the <a href="/relays-ipv6.html">Relays by IP version</a> graph, but not in the other graphs (simply because we didn't get around to changing those graphs).
+<p>If a consensus contains zero running relays, we skip it.
This is mostly to rule out a rare edge case when only a minority of <a href="/glossary.html#directory-authority">directory authorities</a> voted on the <code>"Running"</code> flag.
In those cases, such a consensus would skew the average, even though relays were likely running.</p>
<h4>Step 2: Parse relay server descriptors</h4>
-<p>Parsing relay server descriptors is an optional step. You only need to do this if you want to break down the number of running relays by something that relays report in their server descriptors. This includes, among other things, the relay's platform string containing tor software version and operating system and whether the relay announced an IPv6 OR address or permitted exiting to IPv6 targets.</p>
-
<p>Obtain relay server descriptors from <a href="/collector.html#type-server-descriptor">CollecTor</a>.
Again, refer to the <a href="https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt">Tor directory protocol, version 3</a> for details on the descriptor format.</p>
@@ -307,21 +296,16 @@ If the platform line is missing, we skip this descriptor, which later leads to n
<h4>Step 3: Compute daily averages</h4>
-<p>Optionally, match consensus entries with server descriptors by SHA-1 digest.
+<p>Match consensus entries with server descriptors by SHA-1 digest.
Every consensus entry references exactly one server descriptor, and a server descriptor may be referenced from an arbitrary number of consensus entries.
-We handle missing server descriptors differently in the graphs covered in this section:</p>
-
-<ul>
-<li><a href="/versions.html">Relays by tor version</a> and <a href="/platforms.html">Relays by platform</a>: If a referenced server descriptor is missing, we also skip the consensus entry. We are aware that this is slightly wrong, because we should either exclude a consensus with too few matching server descriptors from the overall result, or at least count these relays as unknown tor version or unknown platform.</li>
-<li><a href="/relays-ipv6.html">Relays by IP version</a>: If at least 0.1% of referenced server descriptors are missing, we skip the consensus. We chose this threshold as low, because missing server descriptors may easily skew the results. However, a small number of missing server descriptors per consensus is acceptable and also unavoidable.</li>
-</ul>
+If at least 0.1% of referenced server descriptors are missing, we skip the consensus. We chose this threshold as low, because missing server descriptors may easily skew the results. However, a small number of missing server descriptors per consensus is acceptable and also unavoidable.</p>
<p>Go through all previously processed consensuses by valid-after UTC date.
Compute the arithmetic mean of running relays, possibly broken down by relay flag, tor version, platform, or IPv6 capabilities, as the sum of all running relays divided by the number of consensuses.
Round down to the next integer number.</p>
<p>Skip the last day of the results if it matches the current UTC date, because those averages may still change throughout the day.
-For the <a href="/relays-ipv6.html">Relays by IP version</a> graph we further skip days for which fewer than 12 consensuses are known. The goal is to avoid over-representing a few consensuses during periods when the directory authorities had trouble producing a consensus for at least half of the day.</p>
+Further skip days for which fewer than 12 consensuses are known. The goal is to avoid over-representing a few consensuses during periods when the directory authorities had trouble producing a consensus for at least half of the day.</p>
<h3 id="running-bridges" class="hover">Running bridges
<a href="#running-bridges" class="anchor">#</a>
@@ -360,9 +344,6 @@ This timestamp is used to uniquely identify the status while processing, and the
<h4>Step 2: Parse bridge server descriptors.</h4>
-<p>Parsing bridge server descriptors is an optional step. You only need to do this if you want to break down the number of running bridges by something that bridges report in their server descriptors.
-This includes, among other things, whether the bridge announced an IPv6 OR address.</p>
-
<p>Obtain bridge server descriptors from <a href="/collector.html#type-bridge-server-descriptor">CollecTor</a>.
As above, refer to the <a href="/bridge-descriptors.html">Tor bridge descriptors page</a> for details on the descriptor format.</p>
@@ -375,21 +356,16 @@ As above, refer to the <a href="/bridge-descriptors.html">Tor bridge descriptors
<h4>Step 3: Compute daily averages</h4>
-<p>Optionally, match status entries with server descriptors by SHA-1 digest.
+<p>Match status entries with server descriptors by SHA-1 digest.
Every status entry references exactly one server descriptor, and a server descriptor may be referenced from an arbitrary number of status entries.
If at least 0.1% of referenced server descriptors are missing, we skip the status.
We chose this threshold as low, because missing server descriptors may easily skew the results.
However, a small number of missing server descriptors per status is acceptable and also unavoidable.</p>
-<p>We compute averages differently in the graphs covered in this section:</p>
-
-<ul>
-<li><a href="/networksize.html">Relays and bridges</a>: For each bridge authority, compute the arithmetic mean of running bridges as the sum of all running bridges divided by the number of statuses; sum up averages for all bridge authorities per day and round down to the next integer number.</li>
-<li><a href="/bridges-ipv6.html">Bridges by IP version</a>: Compute the arithmetic mean of running bridges as the sum of all running bridges divided by the number of statuses and round down to the next integer number. We are aware that this approach does not correctly reflect that bridges typically register at a single bridge authority only.</li>
-</ul>
+<p>Compute the arithmetic mean of running bridges as the sum of all running bridges divided by the number of statuses and round down to the next integer number. We are aware that this approach does not correctly reflect that bridges typically register at a single bridge authority only.</p>
<p>Skip the last day of the results if it matches the current UTC date, because those averages may still change throughout the day.
-For the <a href="/bridges-ipv6.html">Bridges by IP version</a> graph we further skip days for which fewer than 12 statuses are known.
+Further skip days for which fewer than 12 statuses are known.
The goal is to avoid over-representing a few statuses during periods when the bridge directory authority had trouble producing a status for at least half of the day.</p>
<h3 id="consensus-weight" class="hover">Consensus weight
@@ -483,12 +459,7 @@ We consider a relay with the <code>"Guard"</code> flag as guard and a relay with
<p>In order to compute these averages, first match consensus entries with server descriptors by SHA-1 digest.
Every consensus entry references exactly one server descriptor, and a server descriptor may be referenced from an arbitrary number of consensus entries.
-We handle missing server descriptors differently in the graphs covered in this section:</p>
-
-<ul>
-<li><a href="/bandwidth.html">Total relay bandwidth</a> and <a href="/bandwidth-flags.html">Advertised and consumed bandwidth by relay flag</a>: If a referenced server descriptor is missing, we also skip the consensus entry. We are aware that this is slightly wrong, because we should rather exclude a consensus with too few matching server descriptors from the overall result than including it with an advertised bandwidth sum that is too low.</li>
-<li><a href="/advbw-ipv6.html">Advertised bandwidth by IP version</a>: If at least 0.1% of referenced server descriptors are missing, we skip the consensus. We chose this threshold as low, because missing server descriptors may easily skew the results. However, a small number of missing server descriptors per consensus is acceptable and also unavoidable.</li>
-</ul>
+If at least 0.1% of referenced server descriptors are missing, we skip the consensus. We chose this threshold as low, because missing server descriptors may easily skew the results. However, a small number of missing server descriptors per consensus is acceptable and also unavoidable.</p>
<p>Go through all previously processed consensuses by valid-after UTC date.
Compute the arithmetic mean of advertised bandwidth as the sum of all advertised bandwidth values divided by the number of consensuses.
@@ -497,7 +468,7 @@ Round down to the next integer number.</p>
<p>Break down numbers by guards and/or exits by taking into account which <a href="/glossary.html#relay-flag">relay flags</a> a consensus entry had that referenced a server descriptor.</p>
<p>Skip the last day of the results if it matches the current UTC date, because those averages may still change throughout the day.
-For the <a href="/advbw-ipv6.html">Advertised bandwidth by IP version</a> graph we further skip days for which fewer than 12 consensuses are known.
+Further skip days for which fewer than 12 consensuses are known.
The goal is to avoid over-representing a few consensuses during periods when the directory authorities had trouble producing a consensus for at least half of the day.</p>
<h4>Step 4: Compute ranks and percentiles</h4>
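A minimal sketch of the Step 4 estimate documented in the diff above: r(N) = floor(r(R) / frac / 10), skipping dates with an unusable frac. The class and method names here are hypothetical, not part of metrics-web.

/** Sketch of the user estimate: r(N) = floor(r(R) / frac / 10). */
final class UserEstimateSketch {

  /** Returns the estimated user count r(N), or null if the date must be
   * skipped: frac below 10% is too low for a robust estimate, and frac
   * above 110% indicates an issue in the previous step. */
  static Long estimateUsers(long summedRequests, double frac) {
    if (frac < 0.1 || frac > 1.1) {
      return null; // skip this date
    }
    // Divide by 10: each request is taken to represent a client that is
    // online for one tenth of a day, so 2 hours and 24 minutes.
    return (long) Math.floor(summedRequests / frac / 10.0);
  }

  public static void main(String[] args) {
    System.out.println(estimateUsers(1_200_000L, 0.8));  // 150000
    System.out.println(estimateUsers(1_200_000L, 0.05)); // null (skipped)
  }
}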

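Along the same lines, a sketch of the daily-average rules from Step 3 of the "Running relays" section above; again with hypothetical names, assuming counts of running relays per consensus are already available.

import java.util.List;

/** Sketch of the daily-average rules: skip a consensus if at least 0.1%
 * of referenced server descriptors are missing, skip a day with fewer
 * than 12 consensuses, otherwise take the floored arithmetic mean. */
final class DailyAverageSketch {

  /** True if a consensus should be skipped because at least 0.1% of its
   * referenced server descriptors are missing. */
  static boolean skipConsensus(int referencedDescriptors, int missingDescriptors) {
    return missingDescriptors * 1000L >= referencedDescriptors;
  }

  /** Returns the floored mean of running relays per consensus, or null
   * if fewer than 12 consensuses are known for the day. */
  static Long dailyAverage(List<Integer> runningRelaysPerConsensus) {
    if (runningRelaysPerConsensus.size() < 12) {
      return null; // avoid over-representing a few consensuses
    }
    long sum = 0L;
    for (int running : runningRelaysPerConsensus) {
      sum += running;
    }
    return sum / runningRelaysPerConsensus.size(); // rounds down
  }

  public static void main(String[] args) {
    System.out.println(skipConsensus(7000, 7)); // true: exactly 0.1% missing
    System.out.println(dailyAverage(List.of(
        6500, 6510, 6490, 6505, 6498, 6502,
        6511, 6507, 6493, 6489, 6503, 6500))); // 6500
  }
}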
[metrics-web/release] Replace emptyNull() by more meaningful method.
by karsten@torproject.org 09 Nov '19
commit cdab81bdd17f8de3f22c28dd6392e697f823b846
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Wed Oct 31 10:12:01 2018 +0100
Replace emptyNull() by more meaningful method.
---
.../torproject/metrics/stats/onionperf/Main.java | 23 +++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/src/main/java/org/torproject/metrics/stats/onionperf/Main.java b/src/main/java/org/torproject/metrics/stats/onionperf/Main.java
index c54ee6f..02b70af 100644
--- a/src/main/java/org/torproject/metrics/stats/onionperf/Main.java
+++ b/src/main/java/org/torproject/metrics/stats/onionperf/Main.java
@@ -248,8 +248,8 @@ public class Main {
statistics.add(String.format("%s,%d,%s,%s,%.0f,%.0f,%.0f,%d,%d,%d",
dateFormat.format(rs.getDate("date", calendar)),
rs.getInt("filesize"),
- emptyNull(rs.getString("source")),
- emptyNull(rs.getString("server")),
+ getStringFromResultSet(rs, "source"),
+ getStringFromResultSet(rs, "server"),
getDoubleFromResultSet(rs, "q1"),
getDoubleFromResultSet(rs, "md"),
getDoubleFromResultSet(rs, "q3"),
@@ -276,7 +276,7 @@ public class Main {
while (rs.next()) {
statistics.add(String.format("%s,%s,%d,%d,%d,%d",
dateFormat.format(rs.getDate("date", calendar)),
- emptyNull(rs.getString("source")),
+ getStringFromResultSet(rs, "source"),
rs.getInt("position"),
rs.getInt("q1"),
rs.getInt("md"),
@@ -301,7 +301,7 @@ public class Main {
while (rs.next()) {
statistics.add(String.format("%s,%s,%s,%d,%d,%d",
dateFormat.format(rs.getDate("date", calendar)),
- emptyNull(rs.getString("source")),
+ getStringFromResultSet(rs, "source"),
rs.getString("server"),
rs.getInt("q1"),
rs.getInt("md"),
@@ -311,8 +311,13 @@ public class Main {
return statistics;
}
- private static String emptyNull(String text) {
- return null == text ? "" : text;
+ /** Retrieves the <code>String</code> value of the designated column in the
+ * current row of the given <code>ResultSet</code> object, or returns the
+ * empty string if the retrieved value was <code>NULL</code>. */
+ private static String getStringFromResultSet(ResultSet rs, String columnLabel)
+ throws SQLException {
+ String result = rs.getString(columnLabel);
+ return null == result ? "" : result;
}
/** Retrieves the <code>double</code> value of the designated column in the
@@ -322,11 +327,7 @@ public class Main {
private static Double getDoubleFromResultSet(ResultSet rs, String columnLabel)
throws SQLException {
double result = rs.getDouble(columnLabel);
- if (rs.wasNull()) {
- return null;
- } else {
- return result;
- }
+ return rs.wasNull() ? null : result;
}
static void writeStatistics(Path webstatsPath, List<String> statistics)

[metrics-web/release] Remove long unused code from legacy module.
by karsten@torproject.org 09 Nov '19
commit ca5fa45df0cfb14801e3556caa93f2cc6d26d790
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Tue Nov 13 18:10:18 2018 +0100
Remove long unused code from legacy module.
This includes the lock file, the option to write raw output files for
importing into the database, and a couple boolean config options that
have always been true.
Required changes to existing legacy.config (removals):
ImportDirectoryArchives
KeepDirectoryArchiveImportHistory
WriteRelayDescriptorDatabase
WriteRelayDescriptorsRawFiles
RelayDescriptorRawFilesDirectory
Part of #28116.
---
.../metrics/stats/servers/Configuration.java | 46 +-------
.../torproject/metrics/stats/servers/LockFile.java | 61 ----------
.../org/torproject/metrics/stats/servers/Main.java | 33 +-----
.../servers/RelayDescriptorDatabaseImporter.java | 126 +--------------------
src/main/resources/legacy.config.template | 20 ----
5 files changed, 10 insertions(+), 276 deletions(-)
diff --git a/src/main/java/org/torproject/metrics/stats/servers/Configuration.java b/src/main/java/org/torproject/metrics/stats/servers/Configuration.java
index 76788df..b6ee397 100644
--- a/src/main/java/org/torproject/metrics/stats/servers/Configuration.java
+++ b/src/main/java/org/torproject/metrics/stats/servers/Configuration.java
@@ -24,21 +24,11 @@ public class Configuration {
private static Logger log = LoggerFactory.getLogger(Configuration.class);
- private boolean importDirectoryArchives = false;
-
private List<File> directoryArchivesDirectories = new ArrayList<>();
- private boolean keepDirectoryArchiveImportHistory = false;
-
- private boolean writeRelayDescriptorDatabase = false;
-
private String relayDescriptorDatabaseJdbc =
"jdbc:postgresql://localhost/tordir?user=metrics&password=password";
- private boolean writeRelayDescriptorsRawFiles = false;
-
- private String relayDescriptorRawFilesDirectory = "pg-import/";
-
/** Initializes this configuration class. */
public Configuration() {
@@ -51,24 +41,10 @@ public class Configuration {
String line = null;
try (BufferedReader br = new BufferedReader(new FileReader(configFile))) {
while ((line = br.readLine()) != null) {
- if (line.startsWith("ImportDirectoryArchives")) {
- this.importDirectoryArchives = Integer.parseInt(
- line.split(" ")[1]) != 0;
- } else if (line.startsWith("DirectoryArchivesDirectory")) {
+ if (line.startsWith("DirectoryArchivesDirectory")) {
this.directoryArchivesDirectories.add(new File(line.split(" ")[1]));
- } else if (line.startsWith("KeepDirectoryArchiveImportHistory")) {
- this.keepDirectoryArchiveImportHistory = Integer.parseInt(
- line.split(" ")[1]) != 0;
- } else if (line.startsWith("WriteRelayDescriptorDatabase")) {
- this.writeRelayDescriptorDatabase = Integer.parseInt(
- line.split(" ")[1]) != 0;
} else if (line.startsWith("RelayDescriptorDatabaseJDBC")) {
this.relayDescriptorDatabaseJdbc = line.split(" ")[1];
- } else if (line.startsWith("WriteRelayDescriptorsRawFiles")) {
- this.writeRelayDescriptorsRawFiles = Integer.parseInt(
- line.split(" ")[1]) != 0;
- } else if (line.startsWith("RelayDescriptorRawFilesDirectory")) {
- this.relayDescriptorRawFilesDirectory = line.split(" ")[1];
} else if (!line.startsWith("#") && line.length() > 0) {
log.error("Configuration file contains unrecognized "
+ "configuration key in line '{}'! Exiting!", line);
@@ -93,10 +69,6 @@ public class Configuration {
}
}
- public boolean getImportDirectoryArchives() {
- return this.importDirectoryArchives;
- }
-
/** Returns directories containing archived descriptors. */
public List<File> getDirectoryArchivesDirectories() {
if (this.directoryArchivesDirectories.isEmpty()) {
@@ -108,24 +80,8 @@ public class Configuration {
}
}
- public boolean getKeepDirectoryArchiveImportHistory() {
- return this.keepDirectoryArchiveImportHistory;
- }
-
- public boolean getWriteRelayDescriptorDatabase() {
- return this.writeRelayDescriptorDatabase;
- }
-
public String getRelayDescriptorDatabaseJdbc() {
return this.relayDescriptorDatabaseJdbc;
}
-
- public boolean getWriteRelayDescriptorsRawFiles() {
- return this.writeRelayDescriptorsRawFiles;
- }
-
- public String getRelayDescriptorRawFilesDirectory() {
- return this.relayDescriptorRawFilesDirectory;
- }
}
diff --git a/src/main/java/org/torproject/metrics/stats/servers/LockFile.java b/src/main/java/org/torproject/metrics/stats/servers/LockFile.java
deleted file mode 100644
index c6063d1..0000000
--- a/src/main/java/org/torproject/metrics/stats/servers/LockFile.java
+++ /dev/null
@@ -1,61 +0,0 @@
-/* Copyright 2011--2018 The Tor Project
- * See LICENSE for licensing information */
-
-package org.torproject.metrics.stats.servers;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.BufferedReader;
-import java.io.BufferedWriter;
-import java.io.File;
-import java.io.FileReader;
-import java.io.FileWriter;
-import java.io.IOException;
-
-public class LockFile {
-
- private File lockFile;
-
- private static Logger log = LoggerFactory.getLogger(LockFile.class);
-
- public LockFile() {
- this.lockFile = new File("lock");
- }
-
- /** Acquires the lock by checking whether a lock file already exists,
- * and if not, by creating one with the current system time as
- * content. */
- public boolean acquireLock() {
- log.debug("Trying to acquire lock...");
- try {
- if (this.lockFile.exists()) {
- BufferedReader br = new BufferedReader(new FileReader("lock"));
- long runStarted = Long.parseLong(br.readLine());
- br.close();
- if (System.currentTimeMillis() - runStarted
- < 23L * 60L * 60L * 1000L) {
- return false;
- }
- }
- BufferedWriter bw = new BufferedWriter(new FileWriter("lock"));
- bw.append("").append(String.valueOf(System.currentTimeMillis()))
- .append("\n");
- bw.close();
- log.debug("Acquired lock.");
- return true;
- } catch (IOException e) {
- log.warn("Caught exception while trying to acquire "
- + "lock!");
- return false;
- }
- }
-
- /** Releases the lock by deleting the lock file, if present. */
- public void releaseLock() {
- log.debug("Releasing lock...");
- this.lockFile.delete();
- log.debug("Released lock.");
- }
-}
-
diff --git a/src/main/java/org/torproject/metrics/stats/servers/Main.java b/src/main/java/org/torproject/metrics/stats/servers/Main.java
index 4d349bc..1454418 100644
--- a/src/main/java/org/torproject/metrics/stats/servers/Main.java
+++ b/src/main/java/org/torproject/metrics/stats/servers/Main.java
@@ -24,38 +24,15 @@ public class Main {
// Initialize configuration
Configuration config = new Configuration();
- // Use lock file to avoid overlapping runs
- LockFile lf = new LockFile();
- if (!lf.acquireLock()) {
- log.error("Warning: ERNIE is already running or has not exited "
- + "cleanly! Exiting!");
- System.exit(1);
- }
-
// Define stats directory for temporary files
File statsDirectory = new File("stats");
// Import relay descriptors
- if (config.getImportDirectoryArchives()) {
- RelayDescriptorDatabaseImporter rddi =
- config.getWriteRelayDescriptorDatabase()
- || config.getWriteRelayDescriptorsRawFiles()
- ? new RelayDescriptorDatabaseImporter(
- config.getWriteRelayDescriptorDatabase()
- ? config.getRelayDescriptorDatabaseJdbc() : null,
- config.getWriteRelayDescriptorsRawFiles()
- ? config.getRelayDescriptorRawFilesDirectory() : null,
- config.getDirectoryArchivesDirectories(),
- statsDirectory,
- config.getKeepDirectoryArchiveImportHistory()) : null;
- if (null != rddi) {
- rddi.importRelayDescriptors();
- rddi.closeConnection();
- }
- }
-
- // Remove lock file
- lf.releaseLock();
+ RelayDescriptorDatabaseImporter rddi = new RelayDescriptorDatabaseImporter(
+ config.getRelayDescriptorDatabaseJdbc(),
+ config.getDirectoryArchivesDirectories(), statsDirectory);
+ rddi.importRelayDescriptors();
+ rddi.closeConnection();
log.info("Terminating ERNIE.");
}
diff --git a/src/main/java/org/torproject/metrics/stats/servers/RelayDescriptorDatabaseImporter.java b/src/main/java/org/torproject/metrics/stats/servers/RelayDescriptorDatabaseImporter.java
index 2d1ae47..d1ae43c 100644
--- a/src/main/java/org/torproject/metrics/stats/servers/RelayDescriptorDatabaseImporter.java
+++ b/src/main/java/org/torproject/metrics/stats/servers/RelayDescriptorDatabaseImporter.java
@@ -10,15 +10,10 @@ import org.torproject.descriptor.ExtraInfoDescriptor;
import org.torproject.descriptor.NetworkStatusEntry;
import org.torproject.descriptor.RelayNetworkStatusConsensus;
-import org.postgresql.util.PGbytea;
-
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-import java.io.BufferedWriter;
import java.io.File;
-import java.io.FileWriter;
-import java.io.IOException;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
@@ -95,21 +90,6 @@ public final class RelayDescriptorDatabaseImporter {
= LoggerFactory.getLogger(RelayDescriptorDatabaseImporter.class);
/**
- * Directory for writing raw import files.
- */
- private String rawFilesDirectory;
-
- /**
- * Raw import file containing status entries.
- */
- private BufferedWriter statusentryOut;
-
- /**
- * Raw import file containing bandwidth histories.
- */
- private BufferedWriter bwhistOut;
-
- /**
* Date format to parse timestamps.
*/
private SimpleDateFormat dateTimeFormat;
@@ -126,30 +106,24 @@ public final class RelayDescriptorDatabaseImporter {
*/
private Set<String> insertedStatusEntries = new HashSet<>();
- private boolean importIntoDatabase;
-
- private boolean writeRawImportFiles;
+ private boolean importIntoDatabase = true;
private List<File> archivesDirectories;
private File statsDirectory;
- private boolean keepImportHistory;
-
/**
* Initialize database importer by connecting to the database and
* preparing statements.
*/
public RelayDescriptorDatabaseImporter(String connectionUrl,
- String rawFilesDirectory, List<File> archivesDirectories,
- File statsDirectory, boolean keepImportHistory) {
+ List<File> archivesDirectories, File statsDirectory) {
if (archivesDirectories == null || statsDirectory == null) {
throw new IllegalArgumentException();
}
this.archivesDirectories = archivesDirectories;
this.statsDirectory = statsDirectory;
- this.keepImportHistory = keepImportHistory;
if (connectionUrl != null) {
try {
@@ -175,18 +149,11 @@ public final class RelayDescriptorDatabaseImporter {
this.psU = conn.prepareStatement("INSERT INTO scheduled_updates "
+ "(date) VALUES (?)");
this.scheduledUpdates = new HashSet<>();
- this.importIntoDatabase = true;
} catch (SQLException e) {
log.warn("Could not connect to database or prepare statements.", e);
}
}
- /* Remember where we want to write raw import files. */
- if (rawFilesDirectory != null) {
- this.rawFilesDirectory = rawFilesDirectory;
- this.writeRawImportFiles = true;
- }
-
/* Initialize date format, so that we can format timestamps. */
this.dateTimeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
this.dateTimeFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
@@ -278,53 +245,6 @@ public final class RelayDescriptorDatabaseImporter {
this.importIntoDatabase = false;
}
}
- if (this.writeRawImportFiles) {
- try {
- if (this.statusentryOut == null) {
- new File(rawFilesDirectory).mkdirs();
- this.statusentryOut = new BufferedWriter(new FileWriter(
- rawFilesDirectory + "/statusentry.sql"));
- this.statusentryOut.write(" COPY statusentry (validafter, "
- + "nickname, fingerprint, descriptor, published, address, "
- + "orport, dirport, isauthority, isbadExit, "
- + "isbaddirectory, isexit, isfast, isguard, ishsdir, "
- + "isnamed, isstable, isrunning, isunnamed, isvalid, "
- + "isv2dir, isv3dir, version, bandwidth, ports, rawdesc) "
- + "FROM stdin;\n");
- }
- this.statusentryOut.write(
- this.dateTimeFormat.format(validAfter) + "\t" + nickname
- + "\t" + fingerprint.toLowerCase() + "\t"
- + descriptor.toLowerCase() + "\t"
- + this.dateTimeFormat.format(published) + "\t" + address
- + "\t" + orPort + "\t" + dirPort + "\t"
- + (flags.contains("Authority") ? "t" : "f") + "\t"
- + (flags.contains("BadExit") ? "t" : "f") + "\t"
- + (flags.contains("BadDirectory") ? "t" : "f") + "\t"
- + (flags.contains("Exit") ? "t" : "f") + "\t"
- + (flags.contains("Fast") ? "t" : "f") + "\t"
- + (flags.contains("Guard") ? "t" : "f") + "\t"
- + (flags.contains("HSDir") ? "t" : "f") + "\t"
- + (flags.contains("Named") ? "t" : "f") + "\t"
- + (flags.contains("Stable") ? "t" : "f") + "\t"
- + (flags.contains("Running") ? "t" : "f") + "\t"
- + (flags.contains("Unnamed") ? "t" : "f") + "\t"
- + (flags.contains("Valid") ? "t" : "f") + "\t"
- + (flags.contains("V2Dir") ? "t" : "f") + "\t"
- + (flags.contains("V3Dir") ? "t" : "f") + "\t"
- + (version != null ? version : "\\N") + "\t"
- + (bandwidth >= 0 ? bandwidth : "\\N") + "\t"
- + (ports != null ? ports : "\\N") + "\t");
- this.statusentryOut.write(PGbytea.toPGString(rawDescriptor)
- .replaceAll("\\\\", "\\\\\\\\") + "\n");
- } catch (IOException e) {
- log.warn("Could not write network status "
- + "consensus entry to raw database import file. We won't "
- + "make any further attempts to write raw import files in "
- + "this execution.", e);
- this.writeRawImportFiles = false;
- }
- }
}
/**
@@ -552,26 +472,6 @@ public final class RelayDescriptorDatabaseImporter {
this.importIntoDatabase = false;
}
}
- if (this.writeRawImportFiles) {
- try {
- if (this.bwhistOut == null) {
- new File(rawFilesDirectory).mkdirs();
- this.bwhistOut = new BufferedWriter(new FileWriter(
- rawFilesDirectory + "/bwhist.sql"));
- }
- this.bwhistOut.write("SELECT insert_bwhist('" + fingerprint
- + "','" + lastDate + "','" + readIntArray.toString()
- + "','" + writtenIntArray.toString() + "','"
- + dirreadIntArray.toString() + "','"
- + dirwrittenIntArray.toString() + "');\n");
- } catch (IOException e) {
- log.warn("Could not write bandwidth "
- + "history to raw database import file. We won't make "
- + "any further attempts to write raw import files in "
- + "this execution.", e);
- this.writeRawImportFiles = false;
- }
- }
readArray = writtenArray = dirreadArray = dirwrittenArray = null;
}
if (historyLine.equals("EOL")) {
@@ -628,9 +528,7 @@ public final class RelayDescriptorDatabaseImporter {
reader.setMaxDescriptorsInQueue(10);
File historyFile = new File(statsDirectory,
"database-importer-relay-descriptor-history");
- if (keepImportHistory) {
- reader.setHistoryFile(historyFile);
- }
+ reader.setHistoryFile(historyFile);
for (Descriptor descriptor : reader.readDescriptors(
this.archivesDirectories.toArray(
new File[this.archivesDirectories.size()]))) {
@@ -641,9 +539,7 @@ public final class RelayDescriptorDatabaseImporter {
this.addExtraInfoDescriptor((ExtraInfoDescriptor) descriptor);
}
}
- if (keepImportHistory) {
- reader.saveHistoryFile(historyFile);
- }
+ reader.saveHistoryFile(historyFile);
}
log.info("Finished importing relay descriptors.");
@@ -728,20 +624,6 @@ public final class RelayDescriptorDatabaseImporter {
log.warn("Could not close database connection.", e);
}
}
-
- /* Close raw import files. */
- try {
- if (this.statusentryOut != null) {
- this.statusentryOut.write("\\.\n");
- this.statusentryOut.close();
- }
- if (this.bwhistOut != null) {
- this.bwhistOut.write("\\.\n");
- this.bwhistOut.close();
- }
- } catch (IOException e) {
- log.warn("Could not close one or more raw database import files.", e);
- }
}
}
diff --git a/src/main/resources/legacy.config.template b/src/main/resources/legacy.config.template
index 5475c1e..e2e0dac 100644
--- a/src/main/resources/legacy.config.template
+++ b/src/main/resources/legacy.config.template
@@ -1,28 +1,8 @@
-## Import directory archives from disk, if available
-#ImportDirectoryArchives 0
-#
## Relative paths to directories to import directory archives from
#DirectoryArchivesDirectory /srv/metrics.torproject.org/metrics/shared/in/recent/relay-descriptors/consensuses/
#DirectoryArchivesDirectory /srv/metrics.torproject.org/metrics/shared/in/recent/relay-descriptors/server-descriptors/
#DirectoryArchivesDirectory /srv/metrics.torproject.org/metrics/shared/in/recent/relay-descriptors/extra-infos/
#
-## Keep a history of imported directory archive files to know which files
-## have been imported before. This history can be useful when importing
-## from a changing source to avoid importing descriptors over and over
-## again, but it can be confusing to users who don't know about it.
-#KeepDirectoryArchiveImportHistory 0
-#
-## Write relay descriptors to the database
-#WriteRelayDescriptorDatabase 0
-#
## JDBC string for relay descriptor database
#RelayDescriptorDatabaseJDBC jdbc:postgresql://localhost/tordir?user=metrics&password=password
#
-## Write relay descriptors to raw text files for importing them into the
-## database using PostgreSQL's \copy command
-#WriteRelayDescriptorsRawFiles 0
-#
-## Relative path to directory to write raw text files; note that existing
-## files will be overwritten!
-#RelayDescriptorRawFilesDirectory pg-import/
-#

[metrics-web/release] Schedule changes related to #28603.
by karsten@torproject.org 09 Nov '19
commit b14dbc36ef2df218fc1f91b9e58c458347088935
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Wed Dec 5 11:53:45 2018 +0100
Schedule changes related to #28603.
In roughly two weeks from now we're going to remove source parameters
and output rows with aggregates over all sources from all
Torperf/OnionPerf related graphs.
Announce this change now, so that any folks using our CSV files get
the chance to update their tools.
---
src/main/resources/web/css/style.css | 9 +++++++--
src/main/resources/web/jsps/stats.jsp | 17 +++++++++--------
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/src/main/resources/web/css/style.css b/src/main/resources/web/css/style.css
index 590b31f..c24691e 100644
--- a/src/main/resources/web/css/style.css
+++ b/src/main/resources/web/css/style.css
@@ -897,9 +897,14 @@ body.noscript #navbar-toggle-checkbox:checked ~ .collapse {
.onionoo th.method { width:10%; }
.onionoo th.url { width:40%; }
.onionoo th.description { width:50%; }
-.onionoo span.red {
+
+
+/* Styles used on our various protocol specification pages. */
+
+span.red {
color:#d9534f;
}
-.onionoo span.blue {
+span.blue {
color:#337ab7;
}
+
diff --git a/src/main/resources/web/jsps/stats.jsp b/src/main/resources/web/jsps/stats.jsp
index 3482320..17a49ac 100644
--- a/src/main/resources/web/jsps/stats.jsp
+++ b/src/main/resources/web/jsps/stats.jsp
@@ -48,6 +48,7 @@ https://metrics.torproject.org/identifier.csv
<li><b>August 15, 2018:</b> Made the first batch of changes to per-graph CSV files.</li>
<li><b>September 15, 2018:</b> Removed all pre-aggregated CSV files.</li>
<li><b>October 28, 2018:</b> Added and/or removed columns to <a href="#webstats-tb-platform">Tor Browser downloads and updates by platform</a> and <a href="#webstats-tb-locale">Tor Browser downloads and updates by locale</a> graphs.</li>
+<li><b>December 20, 2018:</b> Remove source parameters and output rows with aggregates over all sources from <a href="#torperf">Time to download files over Tor</a>, <a href="#torperf-failures">Timeouts and failures of downloading files over Tor</a>, <a href="#onionperf-buildtimes">Circuit build times</a>, <a href="#onionperf-latencies">Circuit round-trip latencies</a> graphs.</li>
</ul>
</div>
@@ -520,7 +521,7 @@ Performance <a href="#performance" name="performance" class="anchor">#</a></h2>
<ul>
<li><b>start:</b> First UTC date (YYYY-MM-DD) to include in the file.</li>
<li><b>end:</b> Last UTC date (YYYY-MM-DD) to include in the file.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service. <span class="red">This parameter is going to be removed after December 20, 2018.</span></li>
<li><b>server:</b> Either <b>"public"</b> for requests to a server on the public internet, or <b>"onion"</b> for requests to a version 2 onion server.</li>
<li><b>filesize:</b> Size of the downloaded file in bytes, with pre-defined possible values: <b>"50kb"</b>, <b>"1mb"</b>, or <b>"5mb"</b>.</li>
</ul>
@@ -530,7 +531,7 @@ Performance <a href="#performance" name="performance" class="anchor">#</a></h2>
<ul>
<li><b>date:</b> UTC date (YYYY-MM-DD) when download performance was measured.</li>
<li><b>filesize:</b> Size of the downloaded file in bytes.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them. <span class="red">Output rows with aggregates over all sources are going to be removed after December 20, 2018.</span></li>
<li><b>server:</b> Either <b>"public"</b> if the request was made to a server on the public internet, or <b>"onion"</b> if the request was made to a version 2 onion server.</li>
<li><b>q1:</b> First quartile of time in milliseconds until receiving the last byte.</li>
<li><b>md:</b> Median of time in milliseconds until receiving the last byte.</li>
@@ -547,7 +548,7 @@ Performance <a href="#performance" name="performance" class="anchor">#</a></h2>
<ul>
<li><b>start:</b> First UTC date (YYYY-MM-DD) to include in the file.</li>
<li><b>end:</b> Last UTC date (YYYY-MM-DD) to include in the file.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service. <span class="red">This parameter is going to be removed after December 20, 2018.</span></li>
<li><b>server:</b> Either <b>"public"</b> for requests to a server on the public internet, or <b>"onion"</b> for requests to a version 2 onion server.</li>
<li><b>filesize:</b> Size of the downloaded file in bytes, with pre-defined possible values: <b>"50kb"</b>, <b>"1mb"</b>, or <b>"5mb"</b>.</li>
</ul>
@@ -557,7 +558,7 @@ Performance <a href="#performance" name="performance" class="anchor">#</a></h2>
<ul>
<li><b>date:</b> UTC date (YYYY-MM-DD) when download performance was measured.</li>
<li><b>filesize:</b> Size of the downloaded file in bytes.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them. <span class="red">Output rows with aggregates over all sources are going to be removed after December 20, 2018.</span></li>
<li><b>server:</b> Either <b>"public"</b> if the request was made to a server on the public internet, or <b>"onion"</b> if the request was made to a version 2 onion server.</li>
<li><b>timeouts:</b> Fraction of requests that timed out when attempting to download the static file over Tor.</li>
<li><b>failures:</b> Fraction of requests that failed when attempting to download the static file over Tor.</li>
@@ -573,14 +574,14 @@ Performance <a href="#performance" name="performance" class="anchor">#</a></h2>
<ul>
<li><b>start:</b> First UTC date (YYYY-MM-DD) to include in the file.</li>
<li><b>end:</b> Last UTC date (YYYY-MM-DD) to include in the file.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service. <span class="red">This parameter is going to be removed after December 20, 2018.</span></li>
</ul>
<h4>Columns</h4>
<ul>
<li><b>date:</b> UTC date (YYYY-MM-DD) when download performance was measured.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them. <span class="red">Output rows with aggregates over all sources are going to be removed after December 20, 2018.</span></li>
<li><b>position:</b> Position in the circuit, from first to third hop.</li>
<li><b>q1:</b> First quartile of time in milliseconds until successfully extending the circuit to the given position.</li>
<li><b>md:</b> Median of time in milliseconds until successfully extending the circuit to the given position.</li>
@@ -597,14 +598,14 @@ Performance <a href="#performance" name="performance" class="anchor">#</a></h2>
<ul>
<li><b>start:</b> First UTC date (YYYY-MM-DD) to include in the file.</li>
<li><b>end:</b> Last UTC date (YYYY-MM-DD) to include in the file.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements, or <b>"all"</b> for measurements performed by any service. <span class="red">This parameter is going to be removed after December 20, 2018.</span></li>
</ul>
<h4>Columns</h4>
<ul>
<li><b>date:</b> UTC date (YYYY-MM-DD) when download performance was measured.</li>
-<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them.</li>
+<li><b>source:</b> Name of the OnionPerf or Torperf service performing measurements. If this column contains the empty string, all measurements are included, regardless of which service performed them. <span class="red">Output rows with aggregates over all sources are going to be removed after December 20, 2018.</span></li>
<li><b>server:</b> Either <b>"public"</b> if the request was made to a server on the public internet, or <b>"onion"</b> if the request was made to a version 2 onion server.</li>
<li><b>q1:</b> First quartile of time in milliseconds between sending the HTTP request and receiving the HTTP response header.</li>
<li><b>md:</b> Median of time in milliseconds between sending the HTTP request and receiving the HTTP response header.</li>
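For anyone consuming these per-graph CSV files, a sketch of a request using the parameters documented in the stats.jsp diff above, assuming the torperf identifier and omitting the source parameter in line with the announced December 20, 2018 removal; the exact query values are illustrative.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Fetches the "Time to download files over Tor" CSV file with the
 * start, end, server, and filesize parameters documented in stats.jsp. */
public class FetchTorperfCsv {
  public static void main(String[] args) throws Exception {
    URI uri = URI.create("https://metrics.torproject.org/torperf.csv"
        + "?start=2018-11-01&end=2018-11-30&server=public&filesize=50kb");
    HttpResponse<String> response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(uri).GET().build(),
        HttpResponse.BodyHandlers.ofString());
    // Print the first lines of the CSV, starting with the documented
    // column header (date, filesize, source, server, q1, md, q3, ...).
    response.body().lines().limit(5).forEach(System.out::println);
  }
}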

[metrics-web/release] Update module name for tordir.sql.
by karsten@torproject.org 09 Nov '19
commit bfcbed7d19594f523623ee4d708f453f3bb511b1
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Thu Nov 22 15:59:30 2018 +0100
Update module name for tordir.sql.
Part of #28116.
---
src/main/sql/{legacy => bwhist}/tordir.sql | 0
1 file changed, 0 insertions(+), 0 deletions(-)
diff --git a/src/main/sql/legacy/tordir.sql b/src/main/sql/bwhist/tordir.sql
similarity index 100%
rename from src/main/sql/legacy/tordir.sql
rename to src/main/sql/bwhist/tordir.sql

[metrics-web/release] Stop generating servers.csv in legacy module.
by karsten@torproject.org 09 Nov '19
commit 6765b8ec054bfac54bff188087031e4cf6a46d64
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date: Mon Nov 12 10:28:06 2018 +0100
Stop generating servers.csv in legacy module.
Also stop importing bridge network size statistics into the database.
Required changes to existing legacy.config (removals):
ImportSanitizedBridges
SanitizedBridgesDirectory
KeepSanitizedBridgesImportHistory
WriteBridgeStats
Required schema changes to live tordir databases:
DROP VIEW stats_servers;
CREATE OR REPLACE FUNCTION refresh_all() [...]
DROP TABLE bridge_network_size;
DROP FUNCTION refresh_relay_versions();
DROP FUNCTION refresh_relay_platforms();
DROP FUNCTION refresh_network_size();
DROP TABLE relay_versions;
DROP TABLE relay_platforms;
DROP TABLE relay_countries;
DROP TABLE network_size;
Part of #28116.
---
.../metrics/stats/servers/Configuration.java | 35 --
.../stats/servers/ConsensusStatsFileHandler.java | 398 ---------------------
.../org/torproject/metrics/stats/servers/Main.java | 17 -
src/main/resources/legacy.config.template | 15 -
src/main/sql/legacy/tordir.sql | 284 ---------------
5 files changed, 749 deletions(-)
diff --git a/src/main/java/org/torproject/metrics/stats/servers/Configuration.java b/src/main/java/org/torproject/metrics/stats/servers/Configuration.java
index 8435b90..c4597bc 100644
--- a/src/main/java/org/torproject/metrics/stats/servers/Configuration.java
+++ b/src/main/java/org/torproject/metrics/stats/servers/Configuration.java
@@ -30,12 +30,6 @@ public class Configuration {
private boolean keepDirectoryArchiveImportHistory = false;
- private boolean importSanitizedBridges = false;
-
- private String sanitizedBridgesDirectory = "in/bridge-descriptors/";
-
- private boolean keepSanitizedBridgesImportHistory = false;
-
private boolean writeRelayDescriptorDatabase = false;
private String relayDescriptorDatabaseJdbc =
@@ -45,8 +39,6 @@ public class Configuration {
private String relayDescriptorRawFilesDirectory = "pg-import/";
- private boolean writeBridgeStats = false;
-
/** Initializes this configuration class. */
public Configuration() {
@@ -67,14 +59,6 @@ public class Configuration {
} else if (line.startsWith("KeepDirectoryArchiveImportHistory")) {
this.keepDirectoryArchiveImportHistory = Integer.parseInt(
line.split(" ")[1]) != 0;
- } else if (line.startsWith("ImportSanitizedBridges")) {
- this.importSanitizedBridges = Integer.parseInt(
- line.split(" ")[1]) != 0;
- } else if (line.startsWith("SanitizedBridgesDirectory")) {
- this.sanitizedBridgesDirectory = line.split(" ")[1];
- } else if (line.startsWith("KeepSanitizedBridgesImportHistory")) {
- this.keepSanitizedBridgesImportHistory = Integer.parseInt(
- line.split(" ")[1]) != 0;
} else if (line.startsWith("WriteRelayDescriptorDatabase")) {
this.writeRelayDescriptorDatabase = Integer.parseInt(
line.split(" ")[1]) != 0;
@@ -85,9 +69,6 @@ public class Configuration {
line.split(" ")[1]) != 0;
} else if (line.startsWith("RelayDescriptorRawFilesDirectory")) {
this.relayDescriptorRawFilesDirectory = line.split(" ")[1];
- } else if (line.startsWith("WriteBridgeStats")) {
- this.writeBridgeStats = Integer.parseInt(
- line.split(" ")[1]) != 0;
} else if (!line.startsWith("#") && line.length() > 0) {
log.error("Configuration file contains unrecognized "
+ "configuration key in line '{}'! Exiting!", line);
@@ -136,18 +117,6 @@ public class Configuration {
return this.writeRelayDescriptorDatabase;
}
- public boolean getImportSanitizedBridges() {
- return this.importSanitizedBridges;
- }
-
- public String getSanitizedBridgesDirectory() {
- return this.sanitizedBridgesDirectory;
- }
-
- public boolean getKeepSanitizedBridgesImportHistory() {
- return this.keepSanitizedBridgesImportHistory;
- }
-
public String getRelayDescriptorDatabaseJdbc() {
return this.relayDescriptorDatabaseJdbc;
}
@@ -159,9 +128,5 @@ public class Configuration {
public String getRelayDescriptorRawFilesDirectory() {
return this.relayDescriptorRawFilesDirectory;
}
-
- public boolean getWriteBridgeStats() {
- return this.writeBridgeStats;
- }
}
diff --git a/src/main/java/org/torproject/metrics/stats/servers/ConsensusStatsFileHandler.java b/src/main/java/org/torproject/metrics/stats/servers/ConsensusStatsFileHandler.java
deleted file mode 100644
index 960069c..0000000
--- a/src/main/java/org/torproject/metrics/stats/servers/ConsensusStatsFileHandler.java
+++ /dev/null
@@ -1,398 +0,0 @@
-/* Copyright 2011--2018 The Tor Project
- * See LICENSE for licensing information */
-
-package org.torproject.metrics.stats.servers;
-
-import org.torproject.descriptor.BridgeNetworkStatus;
-import org.torproject.descriptor.Descriptor;
-import org.torproject.descriptor.DescriptorReader;
-import org.torproject.descriptor.DescriptorSourceFactory;
-import org.torproject.descriptor.NetworkStatusEntry;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.BufferedReader;
-import java.io.BufferedWriter;
-import java.io.File;
-import java.io.FileReader;
-import java.io.FileWriter;
-import java.io.IOException;
-import java.sql.Connection;
-import java.sql.DriverManager;
-import java.sql.PreparedStatement;
-import java.sql.ResultSet;
-import java.sql.SQLException;
-import java.sql.Statement;
-import java.text.ParseException;
-import java.text.SimpleDateFormat;
-import java.util.HashMap;
-import java.util.Map;
-import java.util.SortedMap;
-import java.util.TimeZone;
-import java.util.TreeMap;
-
-/**
- * Generates statistics on the average number of relays and bridges per
- * day. Accepts parse results from {@code RelayDescriptorParser} and
- * {@code BridgeDescriptorParser} and stores them in intermediate
- * result files {@code stats/consensus-stats-raw} and
- * {@code stats/bridge-consensus-stats-raw}. Writes final results to
- * {@code stats/consensus-stats} for all days for which at least half
- * of the expected consensuses or statuses are known.
- */
-public class ConsensusStatsFileHandler {
-
- /**
- * Intermediate results file holding the number of running bridges per
- * bridge status.
- */
- private File bridgeConsensusStatsRawFile;
-
- /**
- * Number of running bridges in a given bridge status. Map keys are the bridge
- * status time formatted as "yyyy-MM-dd HH:mm:ss", a comma, and the bridge
- * authority nickname, map values are lines as read from
- * {@code stats/bridge-consensus-stats-raw}.
- */
- private SortedMap<String, String> bridgesRaw;
-
- /**
- * Average number of running bridges per day. Map keys are dates
- * formatted as "yyyy-MM-dd", map values are the remaining columns as written
- * to {@code stats/consensus-stats}.
- */
- private SortedMap<String, String> bridgesPerDay;
-
- private static Logger log = LoggerFactory.getLogger(
- ConsensusStatsFileHandler.class);
-
- private int bridgeResultsAdded = 0;
-
- /* Database connection string. */
- private String connectionUrl;
-
- private SimpleDateFormat dateTimeFormat;
-
- private File bridgesDir;
-
- private File statsDirectory;
-
- private boolean keepImportHistory;
-
- /**
- * Initializes this class, including reading in intermediate results
- * files {@code stats/consensus-stats-raw} and
- * {@code stats/bridge-consensus-stats-raw} and final results file
- * {@code stats/consensus-stats}.
- */
- public ConsensusStatsFileHandler(String connectionUrl,
- File bridgesDir, File statsDirectory,
- boolean keepImportHistory) {
-
- if (bridgesDir == null || statsDirectory == null) {
- throw new IllegalArgumentException();
- }
- this.bridgesDir = bridgesDir;
- this.statsDirectory = statsDirectory;
- this.keepImportHistory = keepImportHistory;
-
- /* Initialize local data structures to hold intermediate and final
- * results. */
- this.bridgesPerDay = new TreeMap<>();
- this.bridgesRaw = new TreeMap<>();
-
- /* Initialize file names for intermediate and final results files. */
- this.bridgeConsensusStatsRawFile = new File(
- "stats/bridge-consensus-stats-raw");
-
- /* Initialize database connection string. */
- this.connectionUrl = connectionUrl;
-
- this.dateTimeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
- this.dateTimeFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
-
- /* Read in number of running bridges per bridge status. */
- if (this.bridgeConsensusStatsRawFile.exists()) {
- log.debug("Reading file {}...",
- this.bridgeConsensusStatsRawFile.getAbsolutePath());
- try (BufferedReader br = new BufferedReader(new FileReader(
- this.bridgeConsensusStatsRawFile))) {
- String line;
- while ((line = br.readLine()) != null) {
- if (line.startsWith("date")) {
- /* Skip headers. */
- continue;
- }
- String[] parts = line.split(",");
- if (parts.length < 2 || parts.length > 4) {
- log.warn("Corrupt line '{}' in file {}! Aborting to read this "
- + "file!", line,
- this.bridgeConsensusStatsRawFile.getAbsolutePath());
- break;
- }
- /* Assume that all lines without authority nickname are based on
- * Tonga's network status, not Bifroest's. */
- String key = parts[0] + "," + (parts.length < 4 ? "Tonga" : parts[1]);
- String value = null;
- if (parts.length == 2) {
- value = key + "," + parts[1] + ",0";
- } else if (parts.length == 3) {
- value = key + "," + parts[1] + "," + parts[2];
- } else if (parts.length == 4) {
- value = key + "," + parts[2] + "," + parts[3];
- } /* No more cases as we already checked the range above. */
- this.bridgesRaw.put(key, value);
- }
- log.debug("Finished reading file {}.",
- this.bridgeConsensusStatsRawFile.getAbsolutePath());
- } catch (IOException e) {
- log.warn("Failed to read file {}!",
- this.bridgeConsensusStatsRawFile.getAbsolutePath(), e);
- }
- }
- }
-
- /**
- * Adds the intermediate results of the number of running bridges in a
- * given bridge status to the existing observations.
- */
- public void addBridgeConsensusResults(long publishedMillis,
- String authorityNickname, int running, int runningEc2Bridges) {
- String publishedAuthority = dateTimeFormat.format(publishedMillis) + ","
- + authorityNickname;
- String line = publishedAuthority + "," + running + "," + runningEc2Bridges;
- if (!this.bridgesRaw.containsKey(publishedAuthority)) {
- log.debug("Adding new bridge numbers: {}", line);
- this.bridgesRaw.put(publishedAuthority, line);
- this.bridgeResultsAdded++;
- } else if (!line.equals(this.bridgesRaw.get(publishedAuthority))) {
- log.warn("The numbers of running bridges we were just given ({}) are "
- + "different from what we learned before ({})! Overwriting!", line,
- this.bridgesRaw.get(publishedAuthority));
- this.bridgesRaw.put(publishedAuthority, line);
- }
- }
-
- /** Imports sanitized bridge descriptors. */
- public void importSanitizedBridges() {
- if (bridgesDir.exists()) {
- log.debug("Importing files in directory {}/...", bridgesDir);
- DescriptorReader reader =
- DescriptorSourceFactory.createDescriptorReader();
- File historyFile = new File(statsDirectory,
- "consensus-stats-bridge-descriptor-history");
- if (keepImportHistory) {
- reader.setHistoryFile(historyFile);
- }
- for (Descriptor descriptor : reader.readDescriptors(bridgesDir)) {
- if (descriptor instanceof BridgeNetworkStatus) {
- String descriptorFileName = descriptor.getDescriptorFile().getName();
- String authority = null;
- if (descriptorFileName.contains(
- "4A0CCD2DDC7995083D73F5D667100C8A5831F16D")) {
- authority = "Tonga";
- } else if (descriptorFileName.contains(
- "1D8F3A91C37C5D1C4C19B1AD1D0CFBE8BF72D8E1")) {
- authority = "Bifroest";
- } else if (descriptorFileName.contains(
- "BA44A889E64B93FAA2B114E02C2A279A8555C533")) {
- authority = "Serge";
- }
- if (authority == null) {
- log.warn("Did not recognize the bridge authority that generated "
- + "{}. Skipping.", descriptorFileName);
- continue;
- }
- this.addBridgeNetworkStatus(
- (BridgeNetworkStatus) descriptor, authority);
- }
- }
- if (keepImportHistory) {
- reader.saveHistoryFile(historyFile);
- }
- log.info("Finished importing bridge descriptors.");
- }
- }
-
- private void addBridgeNetworkStatus(BridgeNetworkStatus status,
- String authority) {
- int runningBridges = 0;
- int runningEc2Bridges = 0;
- for (NetworkStatusEntry statusEntry
- : status.getStatusEntries().values()) {
- if (statusEntry.getFlags().contains("Running")) {
- runningBridges++;
- if (statusEntry.getNickname().startsWith("ec2bridge")) {
- runningEc2Bridges++;
- }
- }
- }
- this.addBridgeConsensusResults(status.getPublishedMillis(), authority,
- runningBridges, runningEc2Bridges);
- }
-
- /**
- * Aggregates the raw observations on relay and bridge numbers and
- * writes both raw and aggregate observations to disk.
- */
- public void writeFiles() {
-
- /* Go through raw observations and put everything into nested maps by day
- * and bridge authority. */
- Map<String, Map<String, int[]>> bridgesPerDayAndAuthority = new HashMap<>();
- for (String bridgesRawLine : this.bridgesRaw.values()) {
- String[] parts = bridgesRawLine.split(",");
- int brunning = Integer.parseInt(parts[2]);
- if (brunning <= 0) {
- /* Skip this status which contains zero bridges with the Running
- * flag. */
- continue;
- }
- String date = bridgesRawLine.substring(0, 10);
- bridgesPerDayAndAuthority.putIfAbsent(date, new TreeMap<>());
- String authority = parts[1];
- bridgesPerDayAndAuthority.get(date).putIfAbsent(authority, new int[3]);
- int[] bridges = bridgesPerDayAndAuthority.get(date).get(authority);
- bridges[0] += brunning;
- bridges[1] += Integer.parseInt(parts[3]);
- bridges[2]++;
- }
-
- /* Sum up average numbers of running bridges per day reported by all bridge
- * authorities and add these averages to final results. */
- for (Map.Entry<String, Map<String, int[]>> perDay
- : bridgesPerDayAndAuthority.entrySet()) {
- String date = perDay.getKey();
- int brunning = 0;
- int brunningEc2 = 0;
- for (int[] perAuthority : perDay.getValue().values()) {
- int statuses = perAuthority[2];
- if (statuses < 12) {
- /* Only write results if we have seen at least a dozen statuses. */
- continue;
- }
- brunning += perAuthority[0] / statuses;
- brunningEc2 += perAuthority[1] / statuses;
- }
- String line = "," + brunning + "," + brunningEc2;
- /* Are our results new? */
- if (!this.bridgesPerDay.containsKey(date)) {
- log.debug("Adding new average bridge numbers: {}{}", date, line);
- this.bridgesPerDay.put(date, line);
- } else if (!line.equals(this.bridgesPerDay.get(date))) {
- log.debug("Replacing existing average bridge numbers ({} with new "
- + "numbers: {}", this.bridgesPerDay.get(date), line);
- this.bridgesPerDay.put(date, line);
- }
- }
-
- /* Write raw numbers of running bridges to disk. */
- log.debug("Writing file {}...",
- this.bridgeConsensusStatsRawFile.getAbsolutePath());
- this.bridgeConsensusStatsRawFile.getParentFile().mkdirs();
- try (BufferedWriter bw = new BufferedWriter(
- new FileWriter(this.bridgeConsensusStatsRawFile))) {
- bw.append("datetime,authority,brunning,brunningec2");
- bw.newLine();
- for (String line : this.bridgesRaw.values()) {
- bw.append(line);
- bw.newLine();
- }
- log.debug("Finished writing file {}.",
- this.bridgeConsensusStatsRawFile.getAbsolutePath());
- } catch (IOException e) {
- log.warn("Failed to write file {}!",
- this.bridgeConsensusStatsRawFile.getAbsolutePath(), e);
- }
-
- /* Add average number of bridges per day to the database. */
- if (connectionUrl != null) {
- try {
- Map<String, String> updateRows = new HashMap<>();
- Map<String, String> insertRows = new HashMap<>(this.bridgesPerDay);
- Connection conn = DriverManager.getConnection(connectionUrl);
- conn.setAutoCommit(false);
- Statement statement = conn.createStatement();
- ResultSet rs = statement.executeQuery(
- "SELECT date, avg_running, avg_running_ec2 "
- + "FROM bridge_network_size");
- while (rs.next()) {
- String date = rs.getDate(1).toString();
- if (insertRows.containsKey(date)) {
- String insertRow = insertRows.remove(date);
- String[] parts = insertRow.substring(1).split(",");
- long newAvgRunning = Long.parseLong(parts[0]);
- long newAvgRunningEc2 = Long.parseLong(parts[1]);
- long oldAvgRunning = rs.getLong(2);
- long oldAvgRunningEc2 = rs.getLong(3);
- if (newAvgRunning != oldAvgRunning
- || newAvgRunningEc2 != oldAvgRunningEc2) {
- updateRows.put(date, insertRow);
- }
- }
- }
- rs.close();
- PreparedStatement psU = conn.prepareStatement(
- "UPDATE bridge_network_size SET avg_running = ?, "
- + "avg_running_ec2 = ? WHERE date = ?");
- for (Map.Entry<String, String> e : updateRows.entrySet()) {
- java.sql.Date date = java.sql.Date.valueOf(e.getKey());
- String[] parts = e.getValue().substring(1).split(",");
- long avgRunning = Long.parseLong(parts[0]);
- long avgRunningEc2 = Long.parseLong(parts[1]);
- psU.clearParameters();
- psU.setLong(1, avgRunning);
- psU.setLong(2, avgRunningEc2);
- psU.setDate(3, date);
- psU.executeUpdate();
- }
- PreparedStatement psI = conn.prepareStatement(
- "INSERT INTO bridge_network_size (avg_running, "
- + "avg_running_ec2, date) VALUES (?, ?, ?)");
- for (Map.Entry<String, String> e : insertRows.entrySet()) {
- java.sql.Date date = java.sql.Date.valueOf(e.getKey());
- String[] parts = e.getValue().substring(1).split(",");
- long avgRunning = Long.parseLong(parts[0]);
- long avgRunningEc2 = Long.parseLong(parts[1]);
- psI.clearParameters();
- psI.setLong(1, avgRunning);
- psI.setLong(2, avgRunningEc2);
- psI.setDate(3, date);
- psI.executeUpdate();
- }
- conn.commit();
- conn.close();
- } catch (SQLException e) {
- log.warn("Failed to add average bridge numbers to database.", e);
- }
- }
-
- /* Write stats. */
- StringBuilder dumpStats = new StringBuilder("Finished writing "
- + "statistics on bridge network statuses to disk.\nAdded "
- + this.bridgeResultsAdded + " bridge network status(es) in this "
- + "execution.");
- long now = System.currentTimeMillis();
- SimpleDateFormat dateTimeFormat =
- new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
- dateTimeFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
- if (this.bridgesRaw.isEmpty()) {
- dumpStats.append("\nNo bridge status known yet.");
- } else {
- dumpStats.append("\nLast known bridge status was published ")
- .append(this.bridgesRaw.lastKey()).append(".");
- try {
- if (now - 6L * 60L * 60L * 1000L > dateTimeFormat.parse(
- this.bridgesRaw.lastKey()).getTime()) {
- log.warn("Last known bridge status is more than 6 hours old: {}",
- this.bridgesRaw.lastKey());
- }
- } catch (ParseException e) {
- log.warn("Can't parse the timestamp? Reason: {}", e);
- }
- }
- log.info(dumpStats.toString());
- }
-}
-
diff --git a/src/main/java/org/torproject/metrics/stats/servers/Main.java b/src/main/java/org/torproject/metrics/stats/servers/Main.java
index 080b6e4..4d349bc 100644
--- a/src/main/java/org/torproject/metrics/stats/servers/Main.java
+++ b/src/main/java/org/torproject/metrics/stats/servers/Main.java
@@ -54,23 +54,6 @@ public class Main {
}
}
- // Prepare consensus stats file handler (used for stats on running
- // bridges only)
- ConsensusStatsFileHandler csfh = config.getWriteBridgeStats()
- ? new ConsensusStatsFileHandler(
- config.getRelayDescriptorDatabaseJdbc(),
- new File(config.getSanitizedBridgesDirectory()),
- statsDirectory, config.getKeepSanitizedBridgesImportHistory())
- : null;
-
- // Import sanitized bridges and write updated stats files to disk
- if (csfh != null) {
- if (config.getImportSanitizedBridges()) {
- csfh.importSanitizedBridges();
- }
- csfh.writeFiles();
- }
-
// Remove lock file
lf.releaseLock();
diff --git a/src/main/resources/legacy.config.template b/src/main/resources/legacy.config.template
index afa8f2d..5475c1e 100644
--- a/src/main/resources/legacy.config.template
+++ b/src/main/resources/legacy.config.template
@@ -12,18 +12,6 @@
## again, but it can be confusing to users who don't know about it.
#KeepDirectoryArchiveImportHistory 0
#
-## Import sanitized bridges from disk, if available
-#ImportSanitizedBridges 0
-#
-## Relative path to directory to import sanitized bridges from
-#SanitizedBridgesDirectory /srv/metrics.torproject.org/metrics/shared/in/recent/bridge-descriptors/
-#
-## Keep a history of imported sanitized bridge descriptors. This history
-## can be useful when importing from a changing data source to avoid
-## importing descriptors more than once, but it can be confusing to users
-## who don't know about it.
-#KeepSanitizedBridgesImportHistory 0
-#
## Write relay descriptors to the database
#WriteRelayDescriptorDatabase 0
#
@@ -38,6 +26,3 @@
## files will be overwritten!
#RelayDescriptorRawFilesDirectory pg-import/
#
-## Write bridge stats to disk
-#WriteBridgeStats 0
-#
diff --git a/src/main/sql/legacy/tordir.sql b/src/main/sql/legacy/tordir.sql
index 16e7166..f1d6767 100644
--- a/src/main/sql/legacy/tordir.sql
+++ b/src/main/sql/legacy/tordir.sql
@@ -104,53 +104,6 @@ CREATE TABLE consensus (
CONSTRAINT consensus_pkey PRIMARY KEY (validafter)
);
--- TABLE network_size
-CREATE TABLE network_size (
- date DATE NOT NULL,
- avg_running INTEGER NOT NULL,
- avg_exit INTEGER NOT NULL,
- avg_guard INTEGER NOT NULL,
- avg_fast INTEGER NOT NULL,
- avg_stable INTEGER NOT NULL,
- avg_authority INTEGER NOT NULL,
- avg_badexit INTEGER NOT NULL,
- avg_baddirectory INTEGER NOT NULL,
- avg_hsdir INTEGER NOT NULL,
- avg_named INTEGER NOT NULL,
- avg_unnamed INTEGER NOT NULL,
- avg_valid INTEGER NOT NULL,
- avg_v2dir INTEGER NOT NULL,
- avg_v3dir INTEGER NOT NULL,
- CONSTRAINT network_size_pkey PRIMARY KEY(date)
-);
-
--- TABLE relay_countries
-CREATE TABLE relay_countries (
- date DATE NOT NULL,
- country CHARACTER(2) NOT NULL,
- relays INTEGER NOT NULL,
- CONSTRAINT relay_countries_pkey PRIMARY KEY(date, country)
-);
-
--- TABLE relay_platforms
-CREATE TABLE relay_platforms (
- date DATE NOT NULL,
- avg_linux INTEGER NOT NULL,
- avg_darwin INTEGER NOT NULL,
- avg_bsd INTEGER NOT NULL,
- avg_windows INTEGER NOT NULL,
- avg_other INTEGER NOT NULL,
- CONSTRAINT relay_platforms_pkey PRIMARY KEY(date)
-);
-
--- TABLE relay_versions
-CREATE TABLE relay_versions (
- date DATE NOT NULL,
- version CHARACTER(5) NOT NULL,
- relays INTEGER NOT NULL,
- CONSTRAINT relay_versions_pkey PRIMARY KEY(date, version)
-);
-
-- TABLE bandwidth_flags
CREATE TABLE bandwidth_flags (
date DATE NOT NULL,
@@ -299,157 +252,6 @@ $$ LANGUAGE plpgsql;
-- They find what new data has been entered or updated based on the
-- updates table.
--- FUNCTION refresh_network_size()
-CREATE OR REPLACE FUNCTION refresh_network_size() RETURNS INTEGER AS $$
- DECLARE
- min_date TIMESTAMP WITHOUT TIME ZONE;
- max_date TIMESTAMP WITHOUT TIME ZONE;
- BEGIN
-
- min_date := (SELECT MIN(date) FROM updates);
- max_date := (SELECT MAX(date) + 1 FROM updates);
-
- DELETE FROM network_size
- WHERE date IN (SELECT date FROM updates);
-
- EXECUTE '
- INSERT INTO network_size
- (date, avg_running, avg_exit, avg_guard, avg_fast, avg_stable,
- avg_authority, avg_badexit, avg_baddirectory, avg_hsdir,
- avg_named, avg_unnamed, avg_valid, avg_v2dir, avg_v3dir)
- SELECT date,
- isrunning / count AS avg_running,
- isexit / count AS avg_exit,
- isguard / count AS avg_guard,
- isfast / count AS avg_fast,
- isstable / count AS avg_stable,
- isauthority / count as avg_authority,
- isbadexit / count as avg_badexit,
- isbaddirectory / count as avg_baddirectory,
- ishsdir / count as avg_hsdir,
- isnamed / count as avg_named,
- isunnamed / count as avg_unnamed,
- isvalid / count as avg_valid,
- isv2dir / count as avg_v2dir,
- isv3dir / count as avg_v3dir
- FROM (
- SELECT DATE(validafter) AS date,
- COUNT(*) AS isrunning,
- COUNT(NULLIF(isexit, FALSE)) AS isexit,
- COUNT(NULLIF(isguard, FALSE)) AS isguard,
- COUNT(NULLIF(isfast, FALSE)) AS isfast,
- COUNT(NULLIF(isstable, FALSE)) AS isstable,
- COUNT(NULLIF(isauthority, FALSE)) AS isauthority,
- COUNT(NULLIF(isbadexit, FALSE)) AS isbadexit,
- COUNT(NULLIF(isbaddirectory, FALSE)) AS isbaddirectory,
- COUNT(NULLIF(ishsdir, FALSE)) AS ishsdir,
- COUNT(NULLIF(isnamed, FALSE)) AS isnamed,
- COUNT(NULLIF(isunnamed, FALSE)) AS isunnamed,
- COUNT(NULLIF(isvalid, FALSE)) AS isvalid,
- COUNT(NULLIF(isv2dir, FALSE)) AS isv2dir,
- COUNT(NULLIF(isv3dir, FALSE)) AS isv3dir
- FROM statusentry
- WHERE isrunning = TRUE
- AND validafter >= ''' || min_date || '''
- AND validafter < ''' || max_date || '''
- AND DATE(validafter) IN (SELECT date FROM updates)
- GROUP BY DATE(validafter)
- ) b
- NATURAL JOIN relay_statuses_per_day';
-
- RETURN 1;
- END;
-$$ LANGUAGE plpgsql;
-
--- FUNCTION refresh_relay_platforms()
-CREATE OR REPLACE FUNCTION refresh_relay_platforms() RETURNS INTEGER AS $$
- DECLARE
- min_date TIMESTAMP WITHOUT TIME ZONE;
- max_date TIMESTAMP WITHOUT TIME ZONE;
- BEGIN
-
- min_date := (SELECT MIN(date) FROM updates);
- max_date := (SELECT MAX(date) + 1 FROM updates);
-
- DELETE FROM relay_platforms
- WHERE date IN (SELECT date FROM updates);
-
- EXECUTE '
- INSERT INTO relay_platforms
- (date, avg_linux, avg_darwin, avg_bsd, avg_windows, avg_other)
- SELECT date,
- linux / count AS avg_linux,
- darwin / count AS avg_darwin,
- bsd / count AS avg_bsd,
- windows / count AS avg_windows,
- other / count AS avg_other
- FROM (
- SELECT DATE(validafter) AS date,
- SUM(CASE WHEN platform LIKE ''%Linux%'' THEN 1 ELSE 0 END)
- AS linux,
- SUM(CASE WHEN platform LIKE ''%Darwin%'' THEN 1 ELSE 0 END)
- AS darwin,
- SUM(CASE WHEN platform LIKE ''%BSD%'' THEN 1 ELSE 0 END)
- AS bsd,
- SUM(CASE WHEN platform LIKE ''%Windows%'' THEN 1 ELSE 0 END)
- AS windows,
- SUM(CASE WHEN platform NOT LIKE ''%Windows%''
- AND platform NOT LIKE ''%Darwin%''
- AND platform NOT LIKE ''%BSD%''
- AND platform NOT LIKE ''%Linux%'' THEN 1 ELSE 0 END)
- AS other
- FROM descriptor
- RIGHT JOIN statusentry
- ON statusentry.descriptor = descriptor.descriptor
- WHERE isrunning = TRUE
- AND validafter >= ''' || min_date || '''
- AND validafter < ''' || max_date || '''
- AND DATE(validafter) IN (SELECT date FROM updates)
- GROUP BY DATE(validafter)
- ) b
- NATURAL JOIN relay_statuses_per_day';
-
- RETURN 1;
- END;
-$$ LANGUAGE plpgsql;
-
--- FUNCTION refresh_relay_versions()
-CREATE OR REPLACE FUNCTION refresh_relay_versions() RETURNS INTEGER AS $$
- DECLARE
- min_date TIMESTAMP WITHOUT TIME ZONE;
- max_date TIMESTAMP WITHOUT TIME ZONE;
- BEGIN
-
- min_date := (SELECT MIN(date) FROM updates);
- max_date := (SELECT MAX(date) + 1 FROM updates);
-
- DELETE FROM relay_versions
- WHERE date IN (SELECT date FROM updates);
-
- EXECUTE '
- INSERT INTO relay_versions
- (date, version, relays)
- SELECT date, version, relays / count AS relays
- FROM (
- SELECT DATE(validafter),
- CASE WHEN platform LIKE ''Tor 0._._%'' THEN
- SUBSTRING(platform, 5, 5) ELSE ''Other'' END AS version,
- COUNT(*) AS relays
- FROM descriptor RIGHT JOIN statusentry
- ON descriptor.descriptor = statusentry.descriptor
- WHERE isrunning = TRUE
- AND platform IS NOT NULL
- AND validafter >= ''' || min_date || '''
- AND validafter < ''' || max_date || '''
- AND DATE(validafter) IN (SELECT date FROM updates)
- GROUP BY 1, 2
- ) b
- NATURAL JOIN relay_statuses_per_day';
-
- RETURN 1;
- END;
-$$ LANGUAGE plpgsql;
-
CREATE OR REPLACE FUNCTION refresh_bandwidth_flags() RETURNS INTEGER AS $$
DECLARE
min_date TIMESTAMP WITHOUT TIME ZONE;
@@ -581,20 +383,6 @@ CREATE OR REPLACE FUNCTION refresh_user_stats() RETURNS INTEGER AS $$
END;
$$ LANGUAGE plpgsql;
--- non-relay statistics
--- The following tables contain pre-aggregated statistics that are not
--- based on relay descriptors or that are not yet derived from the relay
--- descriptors in the database.
-
--- TABLE bridge_network_size
--- Contains average number of running bridges.
-CREATE TABLE bridge_network_size (
- "date" DATE NOT NULL,
- avg_running INTEGER NOT NULL,
- avg_running_ec2 INTEGER NOT NULL,
- CONSTRAINT bridge_network_size_pkey PRIMARY KEY(date)
-);
-
-- Refresh all statistics in the database.
CREATE OR REPLACE FUNCTION refresh_all() RETURNS INTEGER AS $$
BEGIN
@@ -605,12 +393,6 @@ CREATE OR REPLACE FUNCTION refresh_all() RETURNS INTEGER AS $$
INSERT INTO updates SELECT * FROM scheduled_updates;
RAISE NOTICE '% Refreshing relay statuses per day.', timeofday();
PERFORM refresh_relay_statuses_per_day();
- RAISE NOTICE '% Refreshing network size.', timeofday();
- PERFORM refresh_network_size();
- RAISE NOTICE '% Refreshing relay platforms.', timeofday();
- PERFORM refresh_relay_platforms();
- RAISE NOTICE '% Refreshing relay versions.', timeofday();
- PERFORM refresh_relay_versions();
RAISE NOTICE '% Refreshing total relay bandwidth.', timeofday();
PERFORM refresh_bandwidth_flags();
RAISE NOTICE '% Refreshing bandwidth history.', timeofday();
@@ -630,72 +412,6 @@ CREATE OR REPLACE FUNCTION refresh_all() RETURNS INTEGER AS $$
END;
$$ LANGUAGE plpgsql;
--- View for exporting server statistics.
-CREATE VIEW stats_servers AS
- (SELECT date, NULL AS flag, NULL AS country, NULL AS version,
- NULL AS platform, TRUE AS ec2bridge, NULL AS relays,
- avg_running_ec2 AS bridges FROM bridge_network_size
- WHERE date < current_date)
-UNION ALL
- (SELECT COALESCE(network_size.date, bridge_network_size.date) AS date,
- NULL AS flag, NULL AS country, NULL AS version, NULL AS platform,
- NULL AS ec2bridge, network_size.avg_running AS relays,
- bridge_network_size.avg_running AS bridges FROM network_size
- FULL OUTER JOIN bridge_network_size
- ON network_size.date = bridge_network_size.date
- WHERE COALESCE(network_size.date, bridge_network_size.date) <
- current_date)
-UNION ALL
- (SELECT date, 'Exit' AS flag, NULL AS country, NULL AS version,
- NULL AS platform, NULL AS ec2bridge, avg_exit AS relays,
- NULL AS bridges FROM network_size WHERE date < current_date)
-UNION ALL
- (SELECT date, 'Guard' AS flag, NULL AS country, NULL AS version,
- NULL AS platform, NULL AS ec2bridge, avg_guard AS relays,
- NULL AS bridges FROM network_size WHERE date < current_date)
-UNION ALL
- (SELECT date, 'Fast' AS flag, NULL AS country, NULL AS version,
- NULL AS platform, NULL AS ec2bridge, avg_fast AS relays,
- NULL AS bridges FROM network_size WHERE date < current_date)
-UNION ALL
- (SELECT date, 'Stable' AS flag, NULL AS country, NULL AS version,
- NULL AS platform, NULL AS ec2bridge, avg_stable AS relays,
- NULL AS bridges FROM network_size WHERE date < current_date)
-UNION ALL
- (SELECT date, 'HSDir' AS flag, NULL AS country, NULL AS version,
- NULL AS platform, NULL AS ec2bridge, avg_hsdir AS relays,
- NULL AS bridges FROM network_size WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, CASE WHEN country != 'zz' THEN country
- ELSE '??' END AS country, NULL AS version, NULL AS platform,
- NULL AS ec2bridge, relays, NULL AS bridges FROM relay_countries
- WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, NULL AS country, version, NULL AS platform,
- NULL AS ec2bridge, relays, NULL AS bridges FROM relay_versions
- WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, NULL AS country, NULL AS version,
- 'Linux' AS platform, NULL AS ec2bridge, avg_linux AS relays,
- NULL AS bridges FROM relay_platforms WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, NULL AS country, NULL AS version,
- 'Darwin' AS platform, NULL AS ec2bridge, avg_darwin AS relays,
- NULL AS bridges FROM relay_platforms WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, NULL AS country, NULL AS version,
- 'BSD' AS platform, NULL AS ec2bridge, avg_bsd AS relays,
- NULL AS bridges FROM relay_platforms WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, NULL AS country, NULL AS version,
- 'Windows' AS platform, NULL AS ec2bridge, avg_windows AS relays,
- NULL AS bridges FROM relay_platforms WHERE date < current_date)
-UNION ALL
- (SELECT date, NULL AS flag, NULL AS country, NULL AS version,
- 'Other' AS platform, NULL AS ec2bridge, avg_other AS relays,
- NULL AS bridges FROM relay_platforms WHERE date < current_date)
-ORDER BY date, flag, country, version, platform, ec2bridge;
-
-- View for exporting bandwidth statistics.
CREATE VIEW stats_bandwidth AS
(SELECT COALESCE(bandwidth_flags.date, bwhist_flags.date) AS date,

[metrics-web/release] Only include Running relays in totalcw graph.
by karsten@torproject.org 09 Nov '19
09 Nov '19
commit 9df357886ebfca86da41fb7835282a3b57d249c0
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Nov 29 08:45:10 2018 +0100
Only include Running relays in totalcw graph.
Previously we included measured bandwidths of all relays in a vote in
the totalcw graph. Now we only include relays with the Running flag in
the vote.
Implements #28137.
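A minimal sketch of the new filter condition, using the same metrics-lib
accessors that appear in the patch below (getFlags() and getMeasured());
the summing is simplified for illustration and is not the full Parser class:

import org.torproject.descriptor.NetworkStatusEntry;
import org.torproject.descriptor.RelayNetworkStatusVote;

class RunningFilterSketch {

  /* Sum measured bandwidths over Running relays only; null means the vote
   * contains no usable bandwidth measurements at all. */
  static Long sumMeasuredRunning(RelayNetworkStatusVote vote) {
    Long measuredSum = null;
    for (NetworkStatusEntry entry : vote.getStatusEntries().values()) {
      /* Skip entries without flags, without the Running flag, or without a
       * bandwidth measurement (getMeasured() is negative in that case). */
      if (null == entry.getFlags() || !entry.getFlags().contains("Running")
          || entry.getMeasured() < 0L) {
        continue;
      }
      measuredSum = (null == measuredSum ? 0L : measuredSum)
          + entry.getMeasured();
    }
    return measuredSum;
  }
}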
---
src/main/java/org/torproject/metrics/stats/totalcw/Parser.java | 3 ++-
.../metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java | 6 +++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java b/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java
index 893184c..b6a35b4 100644
--- a/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java
+++ b/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java
@@ -19,7 +19,8 @@ class Parser {
RelayNetworkStatusVote vote) {
Long measuredSum = null;
for (NetworkStatusEntry entry : vote.getStatusEntries().values()) {
- if (entry.getMeasured() < 0L) {
+ if (null == entry.getFlags() || !entry.getFlags().contains("Running")
+ || entry.getMeasured() < 0L) {
continue;
}
if (null == measuredSum) {
diff --git a/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java b/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java
index 11f931d..7c5ecc7 100644
--- a/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java
+++ b/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java
@@ -39,15 +39,15 @@ public class TotalcwRelayNetworkStatusVoteTest {
{ "2018-10-15-00-00-00-vote-27102BC123E7AF1D4741AE047E160C91ADC76B21-"
+ "049AB3179B12DACC391F06A10C2A8904E4339D33.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "bastet", "27102BC123E7AF1D4741AE047E160C91ADC76B21", 138803L },
+ "bastet", "27102BC123E7AF1D4741AE047E160C91ADC76B21", 138700L },
{ "2018-10-15-00-00-00-vote-ED03BB616EB2F60BEC80151114BB25CEF515B226-"
+ "2669AD153408F88E416CE6206D1A75EC3324A2F4.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "gabelmoo", "ED03BB616EB2F60BEC80151114BB25CEF515B226", 133441L },
+ "gabelmoo", "ED03BB616EB2F60BEC80151114BB25CEF515B226", 133370L },
{ "2018-10-15-00-00-00-vote-EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97-"
+ "38C6A19F78948B689345EE41D7119D76246C4D3E.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "Faravahar", "EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97", 158534L }
+ "Faravahar", "EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97", 158395L }
});
}

[metrics-web/release] Break down totalcw numbers by Guard/(Bad)Exit flags.
by karsten@torproject.org 09 Nov '19
09 Nov '19
commit 7b39042ecfae011d5891af963ae94bd35141fbc0
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Nov 29 10:09:28 2018 +0100
Break down totalcw numbers by Guard/(Bad)Exit flags.
Requires updating the vote table and the totalcw view in the database.
Implements #28328.
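The patch encodes each relay's flags as a two-bit index into a four-element
measuredSums array. A minimal sketch of that encoding, and of how the
database insert loop decodes it back into the two boolean columns (class and
method names here are illustrative only):

import java.util.SortedSet;

class FlagIndexSketch {

  /* Index 0 = neither flag, 1 = Guard only, 2 = (Exit and not BadExit)
   * only, 3 = both. */
  static int measuredSumsIndex(SortedSet<String> flags) {
    return (flags.contains("Guard") ? 1 : 0)
        + (flags.contains("Exit") && !flags.contains("BadExit") ? 2 : 0);
  }

  /* The insert loop in Database.java recovers the two booleans from the
   * index: have_guard_flag = (index & 1) == 1,
   * have_exit_flag = (index & 2) == 2. */
}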
---
src/main/R/rserver/graphs.R | 4 ++-
.../torproject/metrics/stats/totalcw/Database.java | 36 ++++++++++------------
.../metrics/stats/totalcw/OutputLine.java | 12 ++++++--
.../torproject/metrics/stats/totalcw/Parser.java | 17 +++++-----
.../totalcw/TotalcwRelayNetworkStatusVote.java | 6 ++--
src/main/sql/totalcw/init-totalcw.sql | 16 +++++++---
.../totalcw/TotalcwRelayNetworkStatusVoteTest.java | 18 ++++++-----
7 files changed, 66 insertions(+), 43 deletions(-)
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index df108e2..e3ac598 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -1562,7 +1562,9 @@ prepare_totalcw <- function(start_p, end_p) {
filter(if (!is.null(start_p))
valid_after_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p))
- valid_after_date <= as.Date(end_p) else TRUE)
+ valid_after_date <= as.Date(end_p) else TRUE) %>%
+ group_by(valid_after_date, nickname) %>%
+ summarize(measured_sum_avg = sum(measured_sum_avg))
}
plot_totalcw <- function(start_p, end_p, path_p) {
diff --git a/src/main/java/org/torproject/metrics/stats/totalcw/Database.java b/src/main/java/org/torproject/metrics/stats/totalcw/Database.java
index 66b0366..b6dc87c 100644
--- a/src/main/java/org/torproject/metrics/stats/totalcw/Database.java
+++ b/src/main/java/org/torproject/metrics/stats/totalcw/Database.java
@@ -66,9 +66,8 @@ class Database implements AutoCloseable {
"SELECT EXISTS (SELECT 1 FROM vote "
+ "WHERE valid_after = ? AND authority_id = ?)");
this.psVoteInsert = this.connection.prepareStatement(
- "INSERT INTO vote (valid_after, authority_id, measured_sum) "
- + "VALUES (?, ?, ?)",
- Statement.RETURN_GENERATED_KEYS);
+ "INSERT INTO vote (valid_after, authority_id, have_guard_flag, "
+ + "have_exit_flag, measured_sum) VALUES (?, ?, ?, ?, ?)");
}
/** Insert a parsed vote into the vote table. */
@@ -116,22 +115,17 @@ class Database implements AutoCloseable {
}
}
}
- int voteId = -1;
- this.psVoteInsert.clearParameters();
- this.psVoteInsert.setTimestamp(1,
- Timestamp.from(ZonedDateTime.of(vote.validAfter,
- ZoneId.of("UTC")).toInstant()), calendar);
- this.psVoteInsert.setInt(2, authorityId);
- this.psVoteInsert.setLong(3, vote.measuredSum);
- this.psVoteInsert.execute();
- try (ResultSet rs = this.psVoteInsert.getGeneratedKeys()) {
- if (rs.next()) {
- voteId = rs.getInt(1);
- }
- }
- if (voteId < 0) {
- throw new SQLException("Could not retrieve auto-generated key for new "
- + "vote entry.");
+ for (int measuredSumsIndex = 0; measuredSumsIndex < 4;
+ measuredSumsIndex++) {
+ this.psVoteInsert.clearParameters();
+ this.psVoteInsert.setTimestamp(1,
+ Timestamp.from(ZonedDateTime.of(vote.validAfter,
+ ZoneId.of("UTC")).toInstant()), calendar);
+ this.psVoteInsert.setInt(2, authorityId);
+ this.psVoteInsert.setBoolean(3, 1 == (measuredSumsIndex & 1));
+ this.psVoteInsert.setBoolean(4, 2 == (measuredSumsIndex & 2));
+ this.psVoteInsert.setLong(5, vote.measuredSums[measuredSumsIndex]);
+ this.psVoteInsert.execute();
}
}
@@ -159,6 +153,10 @@ class Database implements AutoCloseable {
outputLine.validAfterDate = rs.getDate(
OutputLine.Column.VALID_AFTER_DATE.name(), calendar).toLocalDate();
outputLine.nickname = rs.getString(OutputLine.Column.NICKNAME.name());
+ outputLine.haveGuardFlag = rs.getBoolean(
+ OutputLine.Column.HAVE_GUARD_FLAG.name());
+ outputLine.haveExitFlag = rs.getBoolean(
+ OutputLine.Column.HAVE_EXIT_FLAG.name());
outputLine.measuredSumAvg = rs.getLong(
OutputLine.Column.MEASURED_SUM_AVG.name());
statistics.add(outputLine);
diff --git a/src/main/java/org/torproject/metrics/stats/totalcw/OutputLine.java b/src/main/java/org/torproject/metrics/stats/totalcw/OutputLine.java
index 450dbac..5587e5d 100644
--- a/src/main/java/org/torproject/metrics/stats/totalcw/OutputLine.java
+++ b/src/main/java/org/torproject/metrics/stats/totalcw/OutputLine.java
@@ -13,7 +13,8 @@ class OutputLine {
/** Column names used in the database and in the first line of the output
* file. */
enum Column {
- VALID_AFTER_DATE, NICKNAME, MEASURED_SUM_AVG
+ VALID_AFTER_DATE, NICKNAME, HAVE_GUARD_FLAG, HAVE_EXIT_FLAG,
+ MEASURED_SUM_AVG
}
/** Column headers joined together with the given delimiter. */
@@ -28,6 +29,12 @@ class OutputLine {
/** Server type, which can be "relay" or "bridge". */
String nickname;
+ /** Whether contained relays all have the "Guard" flag. */
+ boolean haveGuardFlag;
+
+ /** Whether contained relays all have the "Exit" flag. */
+ boolean haveExitFlag;
+
/** Mean value of total measured bandwidths of all relays over the day. */
Long measuredSumAvg;
@@ -35,7 +42,8 @@ class OutputLine {
* file. */
@Override
public String toString() {
- return String.format("%s,%s,%d", validAfterDate, nickname, measuredSumAvg);
+ return String.format("%s,%s,%s,%s,%d", validAfterDate, nickname,
+ haveGuardFlag ? "t" : "f", haveExitFlag ? "t" : "f", measuredSumAvg);
}
}
diff --git a/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java b/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java
index b6a35b4..6070822 100644
--- a/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java
+++ b/src/main/java/org/torproject/metrics/stats/totalcw/Parser.java
@@ -17,18 +17,21 @@ class Parser {
* contain any bandwidth measurements. */
TotalcwRelayNetworkStatusVote parseRelayNetworkStatusVote(
RelayNetworkStatusVote vote) {
- Long measuredSum = null;
+ boolean containsMeasuredBandwidths = false;
+ long[] measuredSums = new long[4];
for (NetworkStatusEntry entry : vote.getStatusEntries().values()) {
if (null == entry.getFlags() || !entry.getFlags().contains("Running")
|| entry.getMeasured() < 0L) {
continue;
}
- if (null == measuredSum) {
- measuredSum = 0L;
- }
- measuredSum += entry.getMeasured();
+ containsMeasuredBandwidths = true;
+ /* Encode flags as sum of Guard = 1 and (Exit and !BadExit) = 2. */
+ int measuredSumsIndex = (entry.getFlags().contains("Guard") ? 1 : 0)
+ + (entry.getFlags().contains("Exit")
+ && !entry.getFlags().contains("BadExit") ? 2 : 0);
+ measuredSums[measuredSumsIndex] += entry.getMeasured();
}
- if (null == measuredSum) {
+ if (!containsMeasuredBandwidths) {
/* Return null, because we wouldn't want to add this vote to the database
* anyway. */
return null;
@@ -39,7 +42,7 @@ class Parser {
.atZone(ZoneId.of("UTC")).toLocalDateTime();
parsedVote.identityHex = vote.getIdentity();
parsedVote.nickname = vote.getNickname();
- parsedVote.measuredSum = measuredSum;
+ parsedVote.measuredSums = measuredSums;
return parsedVote;
}
}
diff --git a/src/main/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVote.java b/src/main/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVote.java
index ff56d91..0c5a095 100644
--- a/src/main/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVote.java
+++ b/src/main/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVote.java
@@ -19,7 +19,9 @@ class TotalcwRelayNetworkStatusVote {
* key. */
String identityHex;
- /** Sum of bandwidth measurements of all contained status entries. */
- long measuredSum;
+ /** Sums of bandwidth measurements of all contained status entries with four
+ * entries: 0 = neither Exit nor Guard, 1 = only Guard, 2 = only Exit, and
+ * 3 = both Guard and Exit. */
+ long[] measuredSums;
}
diff --git a/src/main/sql/totalcw/init-totalcw.sql b/src/main/sql/totalcw/init-totalcw.sql
index d723adb..cdba275 100644
--- a/src/main/sql/totalcw/init-totalcw.sql
+++ b/src/main/sql/totalcw/init-totalcw.sql
@@ -31,21 +31,27 @@ CREATE TABLE vote (
-- Numeric identifier uniquely identifying the authority generating this vote.
authority_id INTEGER REFERENCES authority (authority_id),
+ -- Whether contained relays had the Guard flag assigned.
+ have_guard_flag BOOLEAN NOT NULL,
+
+ -- Whether contained relays had the Exit flag assigned.
+ have_exit_flag BOOLEAN NOT NULL,
+
-- Sum of bandwidth measurements of all contained status entries.
measured_sum BIGINT NOT NULL,
- UNIQUE (valid_after, authority_id)
+ UNIQUE (valid_after, authority_id, have_guard_flag, have_exit_flag)
);
-- View on aggregated total consensus weight statistics in a format that is
-- compatible for writing to an output CSV file. Votes are only included in the
-- output if at least 12 votes are known for a given authority and day.
CREATE OR REPLACE VIEW totalcw AS
-SELECT DATE(valid_after) AS valid_after_date, nickname,
- FLOOR(AVG(measured_sum)) AS measured_sum_avg
+SELECT DATE(valid_after) AS valid_after_date, nickname, have_guard_flag,
+ have_exit_flag, FLOOR(AVG(measured_sum)) AS measured_sum_avg
FROM vote NATURAL JOIN authority
-GROUP BY DATE(valid_after), nickname
+GROUP BY DATE(valid_after), nickname, have_guard_flag, have_exit_flag
HAVING COUNT(vote_id) >= 12
AND DATE(valid_after) < (SELECT MAX(DATE(valid_after)) FROM vote)
-ORDER BY DATE(valid_after), nickname;
+ORDER BY DATE(valid_after), nickname, have_guard_flag, have_exit_flag;
diff --git a/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java b/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java
index 7c5ecc7..189b3b7 100644
--- a/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java
+++ b/src/test/java/org/torproject/metrics/stats/totalcw/TotalcwRelayNetworkStatusVoteTest.java
@@ -4,6 +4,7 @@
package org.torproject.metrics.stats.totalcw;
import static junit.framework.TestCase.assertEquals;
+import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertNull;
import org.torproject.descriptor.Descriptor;
@@ -35,19 +36,22 @@ public class TotalcwRelayNetworkStatusVoteTest {
{ "2018-10-15-00-00-00-vote-0232AF901C31A04EE9848595AF9BB7620D4C5B2E-"
+ "55A38ED50848BE1F13C6A35C3CA637B0D962C2EF.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "dannenberg", "0232AF901C31A04EE9848595AF9BB7620D4C5B2E", -1L },
+ "dannenberg", "0232AF901C31A04EE9848595AF9BB7620D4C5B2E", null },
{ "2018-10-15-00-00-00-vote-27102BC123E7AF1D4741AE047E160C91ADC76B21-"
+ "049AB3179B12DACC391F06A10C2A8904E4339D33.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "bastet", "27102BC123E7AF1D4741AE047E160C91ADC76B21", 138700L },
+ "bastet", "27102BC123E7AF1D4741AE047E160C91ADC76B21",
+ new long[] { 13700L, 47220L, 17080L, 60700L } },
{ "2018-10-15-00-00-00-vote-ED03BB616EB2F60BEC80151114BB25CEF515B226-"
+ "2669AD153408F88E416CE6206D1A75EC3324A2F4.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "gabelmoo", "ED03BB616EB2F60BEC80151114BB25CEF515B226", 133370L },
+ "gabelmoo", "ED03BB616EB2F60BEC80151114BB25CEF515B226",
+ new long[] { 18020L, 43200L, 26150L, 46000L } },
{ "2018-10-15-00-00-00-vote-EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97-"
+ "38C6A19F78948B689345EE41D7119D76246C4D3E.part",
ZonedDateTime.parse("2018-10-15T00:00:00Z").toLocalDateTime(),
- "Faravahar", "EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97", 158395L }
+ "Faravahar", "EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97",
+ new long[] { 17365L, 52030L, 35400L, 53600L } }
});
}
@@ -64,7 +68,7 @@ public class TotalcwRelayNetworkStatusVoteTest {
public String expectedIdentityHex;
@Parameter(4)
- public long expectedMeasuredSum;
+ public long[] expectedMeasuredSums;
@Test
public void testParseVote() throws Exception {
@@ -82,13 +86,13 @@ public class TotalcwRelayNetworkStatusVoteTest {
sb.toString().getBytes(), new File(this.fileName), this.fileName)) {
TotalcwRelayNetworkStatusVote parsedVote = new Parser()
.parseRelayNetworkStatusVote((RelayNetworkStatusVote) descriptor);
- if (this.expectedMeasuredSum < 0L) {
+ if (null == this.expectedMeasuredSums) {
assertNull(parsedVote);
} else {
assertEquals(this.expectedValidAfter, parsedVote.validAfter);
assertEquals(this.expectedNickname, parsedVote.nickname);
assertEquals(this.expectedIdentityHex, parsedVote.identityHex);
- assertEquals(this.expectedMeasuredSum, parsedVote.measuredSum);
+ assertArrayEquals(this.expectedMeasuredSums, parsedVote.measuredSums);
}
}
}

[metrics-web/release] Use readr to speed up drawing graphs.
by karsten@torproject.org 09 Nov '19
09 Nov '19
commit 2c44721c9ab903183558b92d7a4e17674fcb79be
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Mon Dec 17 21:03:16 2018 +0100
Use readr to speed up drawing graphs.
Over two years ago, in commit 1f90b72 from October 2016, we made our
user graphs faster by avoiding reading the large .csv file on demand.
Instead we read it once as part of the daily update, saved it to disk
as an .RData file using R's save() function, and loaded it back into
memory using R's load() function when drawing a graph.
This approach worked okay. It just had two disadvantages:
1. We had to write a small amount of R code for each graph type,
which is why we only did it for graphs with large .csv files.
2. Running these small R scripts as part of the daily update made it
harder to move away from Ant towards a Java-only execution model.
The new approach implemented in this commit uses read_csv() from the
readr package, which reads CSV files several times faster than
read.csv().
Requires installing the readr package from CRAN, which is available on
Debian in stretch-backports and later as r-cran-readr.
Implements #28799.
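A minimal sketch of the pattern, mirroring the patched prepare_webstats_tb()
below; it assumes a stats/webstats.csv with exactly the columns declared in
col_types():

library(readr)
library(dplyr)

# Read the CSV with explicit column types (much faster than read.csv()),
# skipping columns this graph does not need, then aggregate per day.
webstats <- read_csv("stats/webstats.csv",
  col_types = cols(
    log_date = col_date(format = ""),
    request_type = col_factor(),
    platform = col_skip(),
    channel = col_skip(),
    locale = col_skip(),
    incremental = col_skip(),
    count = col_double())) %>%
  filter(request_type %in% c("tbid", "tbsd", "tbup", "tbur")) %>%
  group_by(log_date, request_type) %>%
  summarize(count = sum(count))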
---
build.xml | 14 ---
src/main/R/clients/split-clients.R | 12 ---
src/main/R/rserver/graphs.R | 169 +++++++++++++++++++++++++++++--------
src/main/R/rserver/rserve-init.R | 1 +
src/main/R/webstats/write-RData.R | 16 ----
5 files changed, 136 insertions(+), 76 deletions(-)
diff --git a/build.xml b/build.xml
index 89c8b31..250417e 100644
--- a/build.xml
+++ b/build.xml
@@ -362,8 +362,6 @@
<property name="module.name" value="clients" />
<property name="localmoddir" value="${modulebase}/${module.name}" />
- <property name="rdatadir" value="${localmoddir}/RData" />
- <mkdir dir="${rdatadir}" />
<property name="statsdir"
value="${localmoddir}/stats" />
<mkdir dir="${statsdir}" />
@@ -410,10 +408,6 @@
<copy file="${localmoddir}/clients.csv" todir="${statsdir}" />
<copy file="${localmoddir}/userstats-combined.csv" todir="${statsdir}" />
-
- <antcall target="run-R" >
- <param name="module.Rscript" value="split-clients.R" />
- </antcall>
</target>
<target name="servers" >
@@ -426,13 +420,7 @@
<target name="webstats" >
<property name="module.name" value="webstats" />
- <property name="rdatadir" value="${modulebase}/${module.name}/RData" />
- <mkdir dir="${rdatadir}" />
-
<antcall target="run-java" />
- <antcall target="run-R" >
- <param name="module.Rscript" value="write-RData.R" />
- </antcall>
</target>
<target name="totalcw" >
@@ -482,8 +470,6 @@
<fileset dir="${modulebase}/totalcw/stats" includes="totalcw.csv" />
</copy>
<copy todir="${rdatadir}" >
- <fileset dir="${modulebase}/clients/RData" includes="*.RData" />
- <fileset dir="${modulebase}/webstats/RData" includes="*.RData" />
<fileset dir="${resources}/web/images/" includes="no-data-available.*" />
</copy>
</target>
diff --git a/src/main/R/clients/split-clients.R b/src/main/R/clients/split-clients.R
deleted file mode 100644
index 9f80902..0000000
--- a/src/main/R/clients/split-clients.R
+++ /dev/null
@@ -1,12 +0,0 @@
-dir.create("RData", showWarnings = FALSE)
-
-c <- read.csv("clients.csv", stringsAsFactors = FALSE)
-data <- c[c$node == 'relay', !(names(c) %in% c("node"))]
-save(data, file = "RData/clients-relay.RData")
-data <- c[c$node == 'bridge', !(names(c) %in% c("node"))]
-save(data, file = "RData/clients-bridge.RData")
-
-u <- read.csv("userstats-combined.csv", stringsAsFactors = FALSE)
-data <- u[, !(names(u) %in% c("node", "version"))]
-save(data, file = "RData/userstats-bridge-combined.RData")
-
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index 7501a95..e541c30 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -348,6 +348,9 @@ robust_call <- function(wrappee, filename) {
})
}
+# Disable readr's automatic progress bar.
+options(readr.show_progress = FALSE)
+
prepare_networksize <- function(start_p, end_p) {
read.csv(paste(stats_dir, "networksize.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
@@ -863,8 +866,19 @@ write_bandwidth_flags <- function(start_p = NULL, end_p = NULL, path_p) {
plot_userstats <- function(start_p, end_p, node_p, variable_p, value_p,
events_p, path_p) {
- load(paste(rdata_dir, "clients-", node_p, ".RData", sep = ""))
- c <- data
+ c <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_character(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_character(),
+ lower = col_double(),
+ upper = col_double(),
+ clients = col_double(),
+ frac = col_skip()),
+ na = character()) %>%
+ filter(node == node_p)
u <- c[c$date >= start_p & c$date <= end_p, c("date", "country", "transport",
"version", "lower", "upper", "clients")]
u <- rbind(u, data.frame(date = start_p,
@@ -1011,14 +1025,24 @@ plot_userstats_bridge_version <- function(start_p, end_p, version_p, path_p) {
write_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
country_p = NULL, events_p = NULL, path_p) {
- load(paste(rdata_dir, "clients-relay.RData", sep = ""))
- u <- data %>%
+ read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_character(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_character(),
+ lower = col_double(),
+ upper = col_double(),
+ clients = col_double(),
+ frac = col_double())) %>%
+ filter(node == "relay") %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(country_p))
country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
- filter(transport == "") %>%
- filter(version == "") %>%
+ filter(is.na(transport)) %>%
+ filter(is.na(version)) %>%
select(date, country, clients, lower, upper, frac) %>%
rename(users = clients) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
@@ -1026,14 +1050,24 @@ write_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
write_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
country_p = NULL, path_p) {
- load(paste(rdata_dir, "clients-bridge.RData", sep = ""))
- data %>%
+ read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_character(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_character(),
+ lower = col_double(),
+ upper = col_double(),
+ clients = col_double(),
+ frac = col_double())) %>%
+ filter(node == "bridge") %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(country_p))
country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
- filter(transport == "") %>%
- filter(version == "") %>%
+ filter(is.na(transport)) %>%
+ filter(is.na(version)) %>%
select(date, country, clients, frac) %>%
rename(users = clients) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
@@ -1041,13 +1075,23 @@ write_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
write_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
transport_p = NULL, path_p) {
- load(paste(rdata_dir, "clients-bridge.RData", sep = ""))
- u <- data %>%
+ u <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_character(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_character(),
+ lower = col_double(),
+ upper = col_double(),
+ clients = col_double(),
+ frac = col_double())) %>%
+ filter(node == "bridge") %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- filter(country == "") %>%
- filter(version == "") %>%
- filter(transport != "") %>%
+ filter(is.na(country)) %>%
+ filter(is.na(version)) %>%
+ filter(!is.na(transport)) %>%
select(date, transport, clients, frac)
if (is.null(transport_p) || "!<OR>" %in% transport_p) {
n <- u %>%
@@ -1068,12 +1112,22 @@ write_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
write_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
version_p = NULL, path_p) {
- load(paste(rdata_dir, "clients-bridge.RData", sep = ""))
- data %>%
+ read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_character(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_character(),
+ lower = col_double(),
+ upper = col_double(),
+ clients = col_double(),
+ frac = col_double())) %>%
+ filter(node == "bridge") %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- filter(country == "") %>%
- filter(transport == "") %>%
+ filter(is.na(country)) %>%
+ filter(is.na(transport)) %>%
filter(if (!is.null(version_p)) version == version_p else TRUE) %>%
select(date, version, clients, frac) %>%
rename(users = clients) %>%
@@ -1081,8 +1135,16 @@ write_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
}
prepare_userstats_bridge_combined <- function(start_p, end_p, country_p) {
- load(paste(rdata_dir, "userstats-bridge-combined.RData", sep = ""))
- data %>%
+ read_csv(file = paste(stats_dir, "userstats-combined.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_skip(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_skip(),
+ frac = col_double(),
+ low = col_double(),
+ high = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(country_p)) country == country_p else TRUE)
@@ -1135,7 +1197,7 @@ prepare_advbwdist_perc <- function(start_p, end_p, p_p) {
filter(if (!is.null(p_p)) percentile %in% as.numeric(p_p) else
percentile != "") %>%
transmute(date, percentile = as.factor(percentile),
- variable = ifelse(isexit != "t", "all", "exits"),
+ variable = ifelse(is.na(isexit), "all", "exits"),
advbw = advbw * 8 / 1e9)
}
@@ -1258,11 +1320,20 @@ write_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL,
}
prepare_webstats_tb <- function(start_p, end_p) {
- load(paste(rdata_dir, "webstats-tb.RData", sep = ""))
- data %>%
+ read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
+ col_types = cols(
+ log_date = col_date(format = ""),
+ request_type = col_factor(),
+ platform = col_skip(),
+ channel = col_skip(),
+ locale = col_skip(),
+ incremental = col_skip(),
+ count = col_double())) %>%
filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
- mutate(request_type = factor(request_type))
+ filter(request_type %in% c("tbid", "tbsd", "tbup", "tbur")) %>%
+ group_by(log_date, request_type) %>%
+ summarize(count = sum(count))
}
plot_webstats_tb <- function(start_p, end_p, path_p) {
@@ -1296,8 +1367,15 @@ write_webstats_tb <- function(start_p = NULL, end_p = NULL, path_p) {
}
prepare_webstats_tb_platform <- function(start_p, end_p) {
- read.csv(paste(stats_dir, "webstats.csv", sep = ""),
- colClasses = c("log_date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
+ col_types = cols(
+ log_date = col_date(format = ""),
+ request_type = col_factor(),
+ platform = col_factor(),
+ channel = col_skip(),
+ locale = col_skip(),
+ incremental = col_skip(),
+ count = col_double())) %>%
filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
filter(request_type %in% c("tbid", "tbup")) %>%
@@ -1337,8 +1415,15 @@ write_webstats_tb_platform <- function(start_p = NULL, end_p = NULL, path_p) {
}
plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
- d <- read.csv(paste(stats_dir, "webstats.csv", sep = ""),
- colClasses = c("log_date" = "Date", "locale" = "character"))
+ d <- read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
+ col_types = cols(
+ log_date = col_date(format = ""),
+ request_type = col_factor(),
+ platform = col_skip(),
+ channel = col_skip(),
+ locale = col_factor(),
+ incremental = col_skip(),
+ count = col_double()))
d <- d[d$log_date >= start_p & d$log_date <= end_p &
d$request_type %in% c("tbid", "tbup"), ]
levels(d$request_type) <- list(
@@ -1375,8 +1460,15 @@ plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
# plot_webstats_tb_locale needs the preliminary data frame e for its
# breaks and labels. Left as future work.
write_webstats_tb_locale <- function(start_p = NULL, end_p = NULL, path_p) {
- read.csv(paste(stats_dir, "webstats.csv", sep = ""),
- colClasses = c("log_date" = "Date", "locale" = "character")) %>%
+ read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
+ col_types = cols(
+ log_date = col_date(format = ""),
+ request_type = col_factor(),
+ platform = col_skip(),
+ channel = col_skip(),
+ locale = col_factor(),
+ incremental = col_skip(),
+ count = col_double())) %>%
filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
filter(request_type %in% c("tbid", "tbup")) %>%
@@ -1390,11 +1482,20 @@ write_webstats_tb_locale <- function(start_p = NULL, end_p = NULL, path_p) {
}
prepare_webstats_tm <- function(start_p, end_p) {
- load(paste(rdata_dir, "webstats-tm.RData", sep = ""))
- data %>%
+ read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
+ col_types = cols(
+ log_date = col_date(format = ""),
+ request_type = col_factor(),
+ platform = col_skip(),
+ channel = col_skip(),
+ locale = col_skip(),
+ incremental = col_skip(),
+ count = col_double())) %>%
filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
- mutate(request_type = factor(request_type))
+ filter(request_type %in% c("tmid", "tmup")) %>%
+ group_by(log_date, request_type) %>%
+ summarize(count = sum(count))
}
plot_webstats_tm <- function(start_p, end_p, path_p) {
diff --git a/src/main/R/rserver/rserve-init.R b/src/main/R/rserver/rserve-init.R
index b9a1d3b..f160698 100644
--- a/src/main/R/rserver/rserve-init.R
+++ b/src/main/R/rserver/rserve-init.R
@@ -5,6 +5,7 @@ library("RColorBrewer")
library("scales")
library(dplyr)
library(tidyr)
+library(readr)
source('graphs.R')
source('tables.R')
diff --git a/src/main/R/webstats/write-RData.R b/src/main/R/webstats/write-RData.R
deleted file mode 100644
index 96cc840..0000000
--- a/src/main/R/webstats/write-RData.R
+++ /dev/null
@@ -1,16 +0,0 @@
-dir.create("RData", showWarnings = FALSE)
-
-d <- read.csv("stats/webstats.csv", stringsAsFactors = FALSE)
-d <- d[d$request_type %in% c('tbid', 'tbsd', 'tbup', 'tbur'), ]
-data <- aggregate(list(count = d$count),
- by = list(log_date = as.Date(d$log_date), request_type = d$request_type),
- FUN = sum)
-save(data, file = "RData/webstats-tb.RData")
-
-d <- read.csv("stats/webstats.csv", stringsAsFactors = FALSE)
-d <- d[d$request_type %in% c('tmid', 'tmup'), ]
-data <- aggregate(list(count = d$count),
- by = list(log_date = as.Date(d$log_date), request_type = d$request_type),
- FUN = sum)
-save(data, file = "RData/webstats-tm.RData")
-