commit 46d5c6489db728676b1993be3515763d1459e046
Author: iwakeh <iwakeh@torproject.org>
Date:   Tue Jan 3 17:35:28 2017 +0100

    squash! Implements task-20596: use metrics-base and reduced build.xml, added executable bootstrap script.
    Removed obsolete DESIGN document and metrics_checks.xml.
---
 DESIGN                                      | 157 --------------------
 build.xml                                   |   2 +-
 src/main/resources/bootstrap-development.sh |   0
 src/test/resources/metrics_checks.xml       | 219 ----------------------------
 4 files changed, 1 insertion(+), 377 deletions(-)
diff --git a/DESIGN b/DESIGN
deleted file mode 100644
index 6afbdb3..0000000
--- a/DESIGN
+++ /dev/null
@@ -1,157 +0,0 @@
-Onionoo design document
-=======================
-
-This short document describes Onionoo's design in a mostly informal and
-language-independent way. The goal is to be able to discuss design
-decisions with non-Java programmers and to provide a blueprint for porting
-Onionoo to other programming languages. This document cannot describe all
-the details, but it can provide a rough overview.
-
-There are two main building blocks of Onionoo that are described here:
-
-  1) an hourly cronjob processing newly published Tor descriptors and
-
-  2) a web service component answering client requests.
-
-The interface between the two building blocks is a directory in the local
-file system that can be read and written by component 1 and can be read by
-component 2. In theory, the two components can be implemented in two
-entirely different programming languages. In a possible port from Java to
-another programming language, the two components can easily be ported
-subsequently.
-
-The purpose of the hourly batch processor is to read updated Tor
-descriptors from the metrics service and transform them to be read by the
-web service component. Answering a client request in component 2 of
-Onionoo needs to be highly efficient which is why any data aggregation
-needs to happen beforehand. Parsing descriptors on-the-fly is not an
-option.
-
-The hourly batch processor is run in a cron job at :15 every hour that
-usually takes up to five minutes and that contains the following substeps:
-
-  1.1) Rsync new Tor descriptors from metrics.
-
-  1.2) Read previously stored status data about relays and bridges that
-       have been running in the last seven days to memory. These data
-       include for each relay or bridge: nickname, fingerprint, primary
-       OR address and port, additional OR addresses and ports, exit
-       addresses, network status publication time, directory port, relay
-       flags, consensus weight, country code, host name as obtained by
-       reverse domain name lookup, and timestamp of last reverse domain
-       name lookup.
-
-  1.3) Import any new relay network status consensuses that have been
-       published since the last run.
-
-  1.4) Set the running bit for all relays that are contained in the last
-       known relay network status consensus.
-
-  1.5) Look up all relay IP addresses in a local GeoIP database and in a
-       local AS number database. Extract country codes and names, city
-       names, geo coordinates, AS name and number, etc.
-
-  1.6) Import any new bridge network statuses that have been published
-       since the last run.
-
-  1.7) Start reverse domain name lookups for all relay IP addresses. Run
-       in background, only refresh lookups for previously looked up IP
-       address every 12 hours, run up to five lookups in parallel, and
-       set timeouts for single requests and for the general lookup
-       process. In theory, this step could happen a few steps before,
-       but not before step 1.3.
-
-  1.8) Import any new relay server descriptors that have been published
-       since the last run.
-
-  1.9) Import any new exit lists that have been published since the last
-       run.
-
-  1.10) Import any new bridge server descriptors that have been published
-        since the last run.
-
-  1.11) Import any new bridge pool assignments that have been published
-        since the last run.
-
-  1.12) Make sure that reverse domain name lookups are finished or the
-        timeout for running lookups has expired. This step cannot happen
-        at any time later than step 1.13 and shouldn't happen long before.
-
-  1.13) Rewrite all details files that have changed. Details files
-        combine information from all previously imported descriptory
-        types, database lookups, and performed reverse domain name
-        lookups. The web service component needs to be able to retrieve a
-        details file for a given relay or bridge without grabbing
-        information from different data sources. It's best to write the
-        details file part for a give relay or bridge to a single file in
-        the target JSON format, saved under the relay's or bridge's
-        fingerprint. If a database is used, the raw string should be
-        saved for faster processing.
-
-  1.14) Import relays' and bridges' bandwidth histories from extra-info
-        descriptors that have been published since the last run. There
-        must be internally stored bandwidth histories for each relay and
-        bridge, regardless of whether they have been running in the last
-        seven days. The original bandwidth histories, which are available
-        on 15-minute detail, can be aggregated to longer time periods the
-        farther the interval lies in the past. The interal bandwidth
-        histories are different from the bandwidth files described in 1.15
-        which are written to be given out to clients.
-
-  1.15) Rewrite bandwidth files that have changed. Bandwidth files
-        aggregate bandwidth history information on varying levels of
-        detail, depending on how far observations lie in the past. It's
-        inevitable to write JSON-formatted bandwidth files for all relays
-        and bridges in the hourly cronjob. Any attempts to process years
-        of bandwidth data while answering a web request can only fail.
-        The previously aggregated bandwidth files are stored under the
-        relay's or bridge's fingerprint for quick lookup.
-
-  1.16) Update the summary file listing all relays and bridges that have
-        been running in the last seven days which was previously read in
-        step 1.2. This is the last step in the hourly process. The web
-        service component checks the modification time of this file to
-        decide whether it needs to reload its view on the network. If
-        this step was not the last step, the web service component might
-        list relays or bridges for which there are no details or bandwidth
-        files available yet. (With the approach taken here, it's
-        conveivable that a bandwidth file of a relay or bridge that hasn't
-        been running for a week has been deleted before step 1.16. This
-        case has been found acceptable, because it's highly unlikely. If
-        a database would have been used, steps 1.2 to 1.16 would have
-        happened in a single database transaction.)
-
-The web service component has the purpose of answering client requests.
-It uses previously prepared data from the hourly cronjob to respond to
-requests very quickly.
-
-During initialization, or whenever the hourly cronjob has finished, the
-web service component does the following substeps:
-
-  2.1) Read the summary file that was produced by the hourly cronjob in
-       step 1.16.
-
-  2.2) Keep the list of relays and bridges in memory, including all
-       information that is used for filtering or sorting results.
-
-  2.3) Prepare summary lines for all relays and bridges. The summary
-       resource is a JSON file with a single line per relay or bridge.
-       This line contains only very few fields as compared to details
-       files that a client might use for further filtering results.
-
-When responding to a request, the web service component does the following
-steps:
-
-  2.4) Parse request and its parameters.
-
-  2.5) Possibly filter relays and bridges.
-
-  2.6) Possibly re-order and limit results.
-
-  2.7) Write response or error code.
-
-Again, (and this can hardly be overstated!) steps 2.4 to 2.7 need to
-happen *extremely* fast. Any steps that go beyond file system reads or
-simple database lookups need to happen either in the hourly cronjob (1.1
-to 1.16) or in the web service component initialization (2.1 to 2.3).
-
diff --git a/build.xml b/build.xml
index 5f1d798..26620d8 100644
--- a/build.xml
+++ b/build.xml
@@ -12,7 +12,7 @@
   <property name="release.version" value="${onionoo.protocol.version}-1.0.1-dev"/>
   <property name="descriptorversion" value="1.5.0"/>
-  <property name="jetty.version" value="" />
+  <property name="jetty.version" value="-8.1.16.v20140903" />
   <property name="warfile" value="onionoo-${release.version}.war"/>
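Step 1.7 of the removed DESIGN document asks for background reverse domain name lookups with these rules:

* refresh previously looked-up addresses only every 12 hours;
* run at most five lookups in parallel;
* put a timeout on the overall lookup process, which step 1.12 then checks.

What follows is a minimal Java sketch of that scheduling, not Onionoo's implementation. The class `ReverseLookups`, its methods, and the pluggable `resolver` function are hypothetical. `InetAddress.getCanonicalHostName()` takes no timeout argument, so this sketch enforces only the overall deadline, through `ExecutorService.invokeAll` with a timeout. Per-request timeouts would need a resolver that supports them.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical scheduler for reverse lookups as described in steps 1.7 and 1.12. */
final class ReverseLookups {
  static final long REFRESH_MILLIS = 12L * 60L * 60L * 1000L; // refresh every 12 hours
  static final int MAX_PARALLEL = 5; // up to five lookups at a time

  private ReverseLookups() {}

  /** True if an address was never looked up or its last lookup is at least 12 hours old. */
  static boolean needsLookup(Long lastLookupMillis, long nowMillis) {
    return lastLookupMillis == null || nowMillis - lastLookupMillis >= REFRESH_MILLIS;
  }

  /** JDK-based resolver; it may return the textual address itself if no name is found. */
  static String jdkResolver(String address) {
    try {
      return InetAddress.getByName(address).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  /** Runs lookups on a fixed pool; lookups unfinished after totalMillis are cancelled. */
  static Map<String, String> lookUpAll(List<String> addresses,
      Function<String, String> resolver, long totalMillis) throws InterruptedException {
    List<Callable<String>> tasks = new ArrayList<>();
    for (String address : addresses) {
      tasks.add(() -> resolver.apply(address));
    }
    ExecutorService pool = Executors.newFixedThreadPool(MAX_PARALLEL);
    try {
      // invokeAll returns futures in task order and cancels whatever missed the deadline.
      List<Future<String>> futures = pool.invokeAll(tasks, totalMillis, TimeUnit.MILLISECONDS);
      Map<String, String> hostNames = new HashMap<>();
      for (int i = 0; i < futures.size(); i++) {
        Future<String> future = futures.get(i);
        if (future.isCancelled()) {
          continue; // overall timeout expired before this lookup finished
        }
        try {
          String hostName = future.get();
          if (hostName != null) {
            hostNames.put(addresses.get(i), hostName);
          }
        } catch (ExecutionException e) {
          // A failed lookup leaves the address without a host name for this run.
        }
      }
      return hostNames;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Addresses that miss the deadline simply keep no host name until the next run. That roughly matches the "finished or timed out" condition of step 1.12. Threads blocked inside the JDK resolver may not stop immediately when cancelled.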
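Step 1.14 says bandwidth histories arrive at 15-minute detail and can be aggregated to longer periods the farther they lie in the past. Below is a minimal Java sketch of one such compaction pass. It assumes a history keyed by interval start in milliseconds since the epoch, mapping to byte counts. Every interval that starts before a cutoff is summed into buckets of a caller-chosen length. The class name, constants, and bucket sizes are hypothetical, not Onionoo's actual aggregation scheme.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical compaction of a bandwidth history keyed by interval start (epoch millis). */
final class BandwidthHistoryCompactor {
  static final long FIFTEEN_MINUTES_MILLIS = 15L * 60L * 1000L;
  static final long ONE_HOUR_MILLIS = 4L * FIFTEEN_MINUTES_MILLIS;

  private BandwidthHistoryCompactor() {}

  /**
   * Sums all intervals that start before cutoffMillis into buckets of bucketMillis length;
   * intervals at or after the cutoff keep their original resolution.
   */
  static SortedMap<Long, Long> compact(SortedMap<Long, Long> history, long cutoffMillis,
      long bucketMillis) {
    SortedMap<Long, Long> compacted = new TreeMap<>();
    for (Map.Entry<Long, Long> entry : history.entrySet()) {
      long start = entry.getKey();
      // Align old intervals to the start of their bucket so that merge() adds them up.
      long key = start < cutoffMillis ? start - Math.floorMod(start, bucketMillis) : start;
      compacted.merge(key, entry.getValue(), Long::sum);
    }
    return compacted;
  }
}
```

Running several passes with older cutoffs and longer buckets (hourly, then daily, and so on) yields the "varying levels of detail" that step 1.15 writes out to clients. The raw 15-minute values are kept only for the recent past.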
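Step 1.16 makes the summary file the last thing the cronjob writes. It also says the web service compares that file's modification time to decide whether to reload its view, which covers steps 2.1 and 2.2. A minimal Java sketch of that check follows. It assumes a hypothetical `SummaryIndex` class and a summary file with one line per relay or bridge. Parsing the lines into filterable fields (step 2.3) is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical web-service view that reloads only when the summary file has changed. */
final class SummaryIndex {
  private final Path summaryFile;
  private FileTime loadedModificationTime; // null until the first successful load
  private List<String> summaryLines = Collections.emptyList();

  SummaryIndex(Path summaryFile) {
    this.summaryFile = summaryFile;
  }

  /**
   * Re-reads the summary file if its modification time differs from the one seen at the
   * last load; returns true if a reload happened.
   */
  synchronized boolean reloadIfChanged() throws IOException {
    FileTime current = Files.getLastModifiedTime(summaryFile);
    if (current.equals(loadedModificationTime)) {
      return false; // the cronjob has not finished a new run since the last load
    }
    // Steps 2.1 and 2.2: read the summary file and keep one entry per line in memory.
    summaryLines = new ArrayList<>(Files.readAllLines(summaryFile));
    loadedModificationTime = current;
    return true;
  }

  synchronized List<String> summaryLines() {
    return Collections.unmodifiableList(summaryLines);
  }
}
```

You could call `reloadIfChanged()` from a periodic timer or before serving a request. Unless the cronjob has produced a new file, each call costs a single file-metadata lookup.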
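Steps 2.4 to 2.7 may only use data that initialization has already prepared. Below is a minimal Java sketch of steps 2.5 and 2.6 (filter, re-order, limit) over such in-memory data. It assumes a hypothetical `SummaryEntry` with just three fields (nickname, running flag, consensus weight) and a hypothetical `answer` method. Request parsing (2.4) and response writing (2.7) are omitted.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory entry prepared during initialization (step 2.2). */
final class SummaryEntry {
  final String nickname;
  final boolean running;
  final long consensusWeight;

  SummaryEntry(String nickname, boolean running, long consensusWeight) {
    this.nickname = nickname;
    this.running = running;
    this.consensusWeight = consensusWeight;
  }
}

/** Hypothetical request handler core: no file parsing happens here. */
final class RequestSketch {
  private RequestSketch() {}

  /** Filters by running flag (null means no filter), sorts by weight descending, limits. */
  static List<SummaryEntry> answer(List<SummaryEntry> all, Boolean running, int limit) {
    return all.stream()
        .filter(e -> running == null || e.running == running)
        .sorted(Comparator.comparingLong((SummaryEntry e) -> e.consensusWeight).reversed())
        .limit(limit)
        .collect(Collectors.toList());
  }
}
```

Each request costs one pass over the in-memory list plus a sort. Anything more expensive belongs in the cronjob or in initialization, as the design requires.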
diff --git a/src/main/resources/bootstrap-development.sh b/src/main/resources/bootstrap-development.sh
old mode 100644
new mode 100755
diff --git a/src/test/resources/metrics_checks.xml b/src/test/resources/metrics_checks.xml
deleted file mode 100644
index 6ba415a..0000000
--- a/src/test/resources/metrics_checks.xml
+++ /dev/null
@@ -1,219 +0,0 @@
-<?xml version="1.0"?>
-<!DOCTYPE module PUBLIC
-          "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
-          "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
-
-<!--
-    Checkstyle configuration that checks the Google coding conventions from Google Java Style
-    that can be found at https://google.github.io/styleguide/javaguide.html with the following
-    modifications:
-
-    - Replaced com.google with org.torproject in import statement ordering
-      [CustomImportOrder].
-
-    - Relaxed requirement that catch parameters must be at least two
-      characters long [CatchParameterName].
-
-    Checkstyle is very configurable. Be sure to read the documentation at
-    http://checkstyle.sf.net (or in your downloaded distribution).
-
-    To completely disable a check, just comment it out or delete it from the file.
-
-    Authors: Max Vetrenko, Ruslan Diachenko, Roman Ivanov.
- -->
-
-<module name = "Checker">
-    <property name="charset" value="UTF-8"/>
-
-    <property name="severity" value="warning"/>
-
-    <property name="fileExtensions" value="java, properties, xml"/>
-    <!-- Checks for whitespace -->
-    <!-- See http://checkstyle.sf.net/config_whitespace.html -->
-    <module name="FileTabCharacter">
-        <property name="eachLine" value="true"/>
-    </module>
-
-    <module name="SuppressWarningsFilter" />
-    <module name="TreeWalker">
-        <module name="OuterTypeFilename"/>
-        <module name="IllegalTokenText">
-            <property name="tokens" value="STRING_LITERAL, CHAR_LITERAL"/>
-            <property name="format" value="\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|22|27|5(C|c))|\\(0(10|11|12|14|15|42|47)|134)"/>
-            <property name="message" value="Avoid using corresponding octal or Unicode escape."/>
-        </module>
-        <module name="AvoidEscapedUnicodeCharacters">
-            <property name="allowEscapesForControlCharacters" value="true"/>
-            <property name="allowByTailComment" value="true"/>
-            <property name="allowNonPrintableEscapes" value="true"/>
-        </module>
-        <module name="LineLength">
-            <property name="max" value="80"/>
-            <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
-        </module>
-        <module name="AvoidStarImport"/>
-        <module name="OneTopLevelClass"/>
-        <module name="NoLineWrap"/>
-        <module name="EmptyBlock">
-            <property name="option" value="TEXT"/>
-            <property name="tokens" value="LITERAL_TRY, LITERAL_FINALLY, LITERAL_IF, LITERAL_ELSE, LITERAL_SWITCH"/>
-        </module>
-        <module name="NeedBraces"/>
-        <module name="LeftCurly">
-            <property name="maxLineLength" value="100"/>
-        </module>
-        <module name="RightCurly"/>
-        <module name="RightCurly">
-            <property name="option" value="alone"/>
-            <property name="tokens" value="CLASS_DEF, METHOD_DEF, CTOR_DEF, LITERAL_FOR, LITERAL_WHILE, LITERAL_DO, STATIC_INIT, INSTANCE_INIT"/>
-        </module>
-        <module name="WhitespaceAround">
-            <property name="allowEmptyConstructors" value="true"/>
-            <property name="allowEmptyMethods" value="true"/>
-            <property name="allowEmptyTypes" value="true"/>
-            <property name="allowEmptyLoops" value="true"/>
-            <message key="ws.notFollowed"
-             value="WhitespaceAround: ''{0}'' is not followed by whitespace. Empty blocks may only be represented as '{}' when not part of a multi-block statement (4.1.3)"/>
-            <message key="ws.notPreceded"
-             value="WhitespaceAround: ''{0}'' is not preceded with whitespace."/>
-        </module>
-        <module name="OneStatementPerLine"/>
-        <module name="MultipleVariableDeclarations"/>
-        <module name="ArrayTypeStyle"/>
-        <module name="MissingSwitchDefault"/>
-        <module name="FallThrough"/>
-        <module name="UpperEll"/>
-        <module name="ModifierOrder"/>
-        <module name="EmptyLineSeparator">
-            <property name="allowNoEmptyLineBetweenFields" value="true"/>
-        </module>
-        <module name="SeparatorWrap">
-            <property name="tokens" value="DOT"/>
-            <property name="option" value="nl"/>
-        </module>
-        <module name="SeparatorWrap">
-            <property name="tokens" value="COMMA"/>
-            <property name="option" value="EOL"/>
-        </module>
-        <module name="PackageName">
-            <property name="format" value="^[a-z]+(\.[a-z][a-z0-9]*)*$"/>
-            <message key="name.invalidPattern"
-             value="Package name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="TypeName">
-            <message key="name.invalidPattern"
-             value="Type name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="MemberName">
-            <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9]*$"/>
-            <message key="name.invalidPattern"
-             value="Member name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="ParameterName">
-            <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9]*$"/>
-            <message key="name.invalidPattern"
-             value="Parameter name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="CatchParameterName">
-            <property name="format" value="^[a-z][a-zA-Z0-9]*$"/>
-            <message key="name.invalidPattern"
-             value="Catch parameter name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="LocalVariableName">
-            <property name="tokens" value="VARIABLE_DEF"/>
-            <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9]*$"/>
-            <property name="allowOneCharVarInForLoop" value="true"/>
-            <message key="name.invalidPattern"
-             value="Local variable name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="ClassTypeParameterName">
-            <property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
-            <message key="name.invalidPattern"
-             value="Class type name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="MethodTypeParameterName">
-            <property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
-            <message key="name.invalidPattern"
-             value="Method type name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="InterfaceTypeParameterName">
-            <property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
-            <message key="name.invalidPattern"
-             value="Interface type name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="NoFinalizer"/>
-        <module name="GenericWhitespace">
-            <message key="ws.followed"
-             value="GenericWhitespace ''{0}'' is followed by whitespace."/>
-            <message key="ws.preceded"
-             value="GenericWhitespace ''{0}'' is preceded with whitespace."/>
-            <message key="ws.illegalFollow"
-             value="GenericWhitespace ''{0}'' should followed by whitespace."/>
-            <message key="ws.notPreceded"
-             value="GenericWhitespace ''{0}'' is not preceded with whitespace."/>
-        </module>
-        <module name="Indentation">
-            <property name="basicOffset" value="2"/>
-            <property name="braceAdjustment" value="0"/>
-            <property name="caseIndent" value="2"/>
-            <property name="throwsIndent" value="4"/>
-            <property name="lineWrappingIndentation" value="4"/>
-            <property name="arrayInitIndent" value="2"/>
-        </module>
-        <module name="AbbreviationAsWordInName">
-            <property name="ignoreFinal" value="false"/>
-            <property name="allowedAbbreviationLength" value="1"/>
-        </module>
-        <module name="OverloadMethodsDeclarationOrder"/>
-        <module name="VariableDeclarationUsageDistance"/>
-        <module name="CustomImportOrder">
-            <property name="specialImportsRegExp" value="org.torproject"/>
-            <property name="sortImportsInGroupAlphabetically" value="true"/>
-            <property name="customImportOrderRules" value="STATIC###SPECIAL_IMPORTS###THIRD_PARTY_PACKAGE###STANDARD_JAVA_PACKAGE"/>
-        </module>
-        <module name="MethodParamPad"/>
-        <module name="OperatorWrap">
-            <property name="option" value="NL"/>
-            <property name="tokens" value="BAND, BOR, BSR, BXOR, DIV, EQUAL, GE, GT, LAND, LE, LITERAL_INSTANCEOF, LOR, LT, MINUS, MOD, NOT_EQUAL, PLUS, QUESTION, SL, SR, STAR "/>
-        </module>
-        <module name="AnnotationLocation">
-            <property name="tokens" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF"/>
-        </module>
-        <module name="AnnotationLocation">
-            <property name="tokens" value="VARIABLE_DEF"/>
-            <property name="allowSamelineMultipleAnnotations" value="true"/>
-        </module>
-        <module name="NonEmptyAtclauseDescription"/>
-        <module name="JavadocTagContinuationIndentation"/>
-        <module name="SummaryJavadoc">
-            <property name="forbiddenSummaryFragments" value="^@return the *|^This method returns |^A [{]@code [a-zA-Z0-9]+[}]( is a )"/>
-        </module>
-        <module name="JavadocParagraph"/>
-        <module name="AtclauseOrder">
-            <property name="tagOrder" value="@param, @return, @throws, @deprecated"/>
-            <property name="target" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF, VARIABLE_DEF"/>
-        </module>
-        <module name="JavadocMethod">
-            <property name="scope" value="public"/>
-            <property name="allowMissingParamTags" value="true"/>
-            <property name="allowMissingThrowsTags" value="true"/>
-            <property name="allowMissingReturnTag" value="true"/>
-            <property name="minLineCount" value="2"/>
-            <property name="allowedAnnotations" value="Override, Test"/>
-            <property name="allowThrowsTagsForSubclasses" value="true"/>
-        </module>
-        <module name="MethodName">
-            <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9_]*$"/>
-            <message key="name.invalidPattern"
-             value="Method name ''{0}'' must match pattern ''{1}''."/>
-        </module>
-        <module name="SingleLineJavadoc">
-            <property name="ignoreInlineTags" value="false"/>
-        </module>
-        <module name="EmptyCatchBlock">
-            <property name="exceptionVariableName" value="expected"/>
-        </module>
-        <module name="CommentsIndentation"/>
-        <module name="SuppressWarningsHolder" />
-    </module>
-</module>
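The header comment of the removed metrics_checks.xml notes that catch parameters were relaxed from the two-character minimum. Comparing the MemberName and CatchParameterName formats from that file makes the difference concrete. Below is a small Java sketch using `java.util.regex.Pattern`. The `NamingPatterns` class is hypothetical and only reproduces the two regular expressions, not Checkstyle's own checking logic. Both patterns are anchored with `^` and `$`, so a whole-name match is what is tested.

```java
import java.util.regex.Pattern;

/** Hypothetical helper holding two naming formats copied from the removed metrics_checks.xml. */
final class NamingPatterns {
  // MemberName (ParameterName uses the same format): at least two characters.
  static final Pattern MEMBER_NAME = Pattern.compile("^[a-z][a-z0-9][a-zA-Z0-9]*$");
  // CatchParameterName: relaxed so that a single lower-case letter such as "e" is accepted.
  static final Pattern CATCH_PARAMETER_NAME = Pattern.compile("^[a-z][a-zA-Z0-9]*$");

  private NamingPatterns() {}

  static boolean isValidMemberName(String name) {
    return MEMBER_NAME.matcher(name).matches();
  }

  static boolean isValidCatchParameterName(String name) {
    return CATCH_PARAMETER_NAME.matcher(name).matches();
  }
}
```

Under these formats, `catch (IOException e)` is accepted. A field named `e` fails MemberName because it is too short, and `mFoo` fails because its second character must be a lower-case letter or digit.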
tor-commits@lists.torproject.org