[tor-commits] [metrics-web/master] Write an actually useful README.

karsten at torproject.org
Wed Jun 8 12:46:45 UTC 2011


commit bf314c1aa2900eced290701c6664f09cc73a8825
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Wed Jun 8 14:43:33 2011 +0200

    Write an actually useful README.
---
 README         |  527 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 doc/rserve.pdf |  Bin 99059 -> 0 bytes
 doc/rserve.tex |  140 ---------------
 3 files changed, 523 insertions(+), 144 deletions(-)

diff --git a/README b/README
index 93315ee..0786852 100644
--- a/README
+++ b/README
@@ -1,7 +1,526 @@
-ERNIE is the Enhanced R-based tor Network Intelligence Engine
-         (why ERNIE? because nobody liked BIRT; sorry for misspelling Tor)
+Tor Metrics Database and Website
+================================
 
---------------------------------------------------------------------------
+The metrics database stores publicly available data about the Tor network
+which are visualized by the metrics website.
 
-Please find documentation on ERNIE in website/ernie-howto.html and doc/ .
+This software package, metrics-web, contains (1) the code to import Tor
+network data into a database, (2) the code to generate graphs and .CSV
+output, and (3) the code for a dynamic web application.  metrics-web is
+based on Java, Ant, PostgreSQL, R, Apache HTTP Server, and Apache Tomcat.
+
+This README explains all necessary steps to install metrics-web including
+the database (Section 1), the graphing engine (Section 2), and the web
+application (Section 3).  It is possible to install only the database part
+or only the database and the graphing engine, if desired.
+
+
+1. Installing the metrics database
+==================================
+
+The metrics database contains data about the Tor network coming from
+different sources, including the Tor directory authorities, Torperf
+performance measurement installations, the GetTor software package
+delivery service, and others.
+
+
+1.1. Preparing the operating system
+-----------------------------------
+
+This README describes the steps for installing metrics-web on a Debian
+GNU/Linux Squeeze server.  Instructions for other operating systems may
+vary.
+
+In the following it is assumed that root privileges are available.
+Commands requiring root privileges will be prefixed with # below.
+
+Start by adding a metrics user that will be used to execute all commands
+that do not require root privileges.  These commands will be prefixed with
+$ below.
+
+# adduser metrics
+
+The database importer and website sources will be installed in
+/srv/metrics-web/, which is created as follows:
+
+# mkdir /srv/metrics-web/
+# chmod g+ws /srv/metrics-web/
+# chown metrics:metrics /srv/metrics-web/
+
+Either extract the metrics-web source tarball...
+
+$ tar xf metrics-web-x.y.z.tar -C /srv/metrics-web/
+
+... or clone the metrics-web Git repository:
+
+$ git clone git://git.torproject.org/metrics-web /srv/metrics-web/
+
+Install Sun Java 6, Ant 1.8, and PostgreSQL 8.4, which are necessary for
+setting up the metrics database (be sure to include Debian's non-free
+repository in /etc/apt/sources.list).
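+
+For example, a sources.list entry along the following lines enables the
+non-free section.  The mirror URL is an assumption; any Debian mirror that
+carries Squeeze should work.  Refresh the package index afterwards.
+
+deb http://ftp.debian.org/debian/ squeeze main contrib non-free
+
+# apt-get update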
+
+# apt-get install sun-java6-jdk ant postgresql-8.4
+
+Make Sun's Java the default.
+
+# update-java-alternatives -s java-6-sun
+
+Check the versions of the newly installed tools.
+
+$ java -version
+java version "1.6.0_24"
+Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
+Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
+
+$ ant -version
+Apache Ant version 1.8.0 compiled on March 11 2010
+
+$ psql --version
+psql (PostgreSQL) 8.4.7
+contains support for command-line editing
+
+
+1.2. Configuring the database
+-----------------------------
+
+The first step in setting up the metrics database is to configure the
+PostgreSQL database and import a database schema.
+
+Start by creating a new metrics database user.  There is no need to give
+the metrics user superuser privileges or allow it to create databases or
+new roles.
+
+# sudo -u postgres createuser -P metrics
+
+Create a new database tordir owned by user metrics.
+
+# sudo -u postgres createdb -O metrics tordir
+
+Import the metrics database schema.
+
+$ psql -f /srv/metrics-web/db/tordir.sql tordir
+
+Confirm that the database now contains tables to hold metrics data.  In
+the following, => will be used as the database prompt.
+
+$ psql tordir
+=> \dt+
+=> \q
+
+
+1.3. Importing relay descriptor tarballs
+----------------------------------------
+
+In most cases it makes sense to populate the metrics database with
+archived relay descriptors from the official metrics website.
+
+Download the relay descriptor tarballs from the metrics website at
+https://metrics.torproject.org/data.html#relaydesc and extract them to
+/srv/metrics-web/archives/ .  The database importer can process v3 votes,
+v3 consensuses, server descriptors, and extra-infos.
+
+Edit the config file /srv/metrics-web/config (or create it if it's not
+there) to contain the following five options (be sure to remove the line
+break in the JDBC option and to insert the real password there):
+
+ImportDirectoryArchives 1
+DirectoryArchivesDirectory archives/
+KeepDirectoryArchiveImportHistory 1
+WriteRelayDescriptorDatabase 1
+RelayDescriptorDatabaseJDBC
+    jdbc:postgresql://localhost/tordir?user=metrics&password=password
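+
+With the line break removed, the last option reads as follows (still with
+the placeholder password, which has to be replaced):
+
+RelayDescriptorDatabaseJDBC jdbc:postgresql://localhost/tordir?user=metrics&password=password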
+
+Compile and run the Java database importer.
+
+$ cd /srv/metrics-web/
+$ ./run.sh
+
+The database import will take a while.  Once it's complete, check that the
+database tables now contain metrics data:
+
+$ psql tordir
+=> \dt+
+=> \q
+
+It's safe to delete the relay descriptor files in
+/srv/metrics-web/archives/ once they are imported.
+
+An alternative to importing relay descriptor tarballs directly into the
+database is to convert them into a data format that psql's \copy command
+can process.  Look for the config option WriteRelayDescriptorsRawFiles in
+/srv/metrics-web/config.template for more information on this experimental
+feature.
+
+
+1.4. Importing relay descriptors from a local Tor data directory
+----------------------------------------------------------------
+
+WARNING: The functions described in this section are not implemented yet!
+
+In a future version of metrics-web, the metrics database importer will be
+able to import the cached descriptors from a local Tor data directory.
+(A special case of importing descriptors from a continuously updated
+directory is when both metrics-db and metrics-web are run on the same
+machine, but this shouldn't be the general case.)
+
+Configure a local Tor client to fetch all known descriptors as early as
+possible by adding these config options to its torrc file:
+
+FetchUselessDescriptors 1
+FetchDirInfoExtraEarly 1
+
+Tell the metrics database importer where to find the cached descriptor
+files.  One way to achieve this is to add symbolic links to
+/srv/metrics-web/archives/ like this.  Tor's data directory is assumed to
+be /srv/tor/ here.
+
+$ cd /srv/metrics-web/archives/
+$ ln -s /srv/tor/cached-* .
+
+Add a crontab entry for the database importer to run once per hour:
+
+15 * * * * cd /srv/metrics-web/ && ./run.sh
+
+In a future version of metrics-web it may also be possible to update local
+relay descriptor tarballs from the official metrics server via rsync and
+import only the changes into the metrics database.  The idea is to simply
+rsync the data/ directory from the metrics server and have all information
+available.  But until this is implemented, the recommended way to keep the
+metrics website up-to-date would be the one described above in this
+section.
+
+
+1.5. Importing GeoIP information
+--------------------------------
+
+Some of the graphs require GeoIP information to resolve IP addresses to
+country codes.  This information is provided in MaxMind's GeoLite City
+database available at http://www.maxmind.com/app/geolitecity.
+
+Download and extract the two files GeoLiteCity-Location.csv and
+GeoLiteCity-Blocks.csv to /srv/metrics-web/.
+
+Import the two files into the metrics database.
+
+$ ant geoipdb
+
+Note that there is no easy way to update the GeoIP information in the
+metrics database yet.  The only way to do so is to manually delete and
+recreate the database table and import the new GeoIP database.
+
+
+1.6. Pre-calculating relay statistics
+-------------------------------------
+
+The relay graphs on the metrics website rely on pre-calculated statistics
+in the metrics database.  These statistics are not calculated after every
+completed import, which would usually be once per hour.  In general it's
+sufficient to pre-calculate statistics 2 or 4 times a day.
+
+Calculate statistics manually after large imports (this may take a while):
+
+$ psql tordir -c 'SELECT * FROM refresh_all();'
+
+If the metrics database gets updated automatically, write a script and add
+a crontab entry for pre-calculating statistics every 6 or 12 hours.
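+
+A minimal sketch of such a setup follows.  The script name refresh.sh and
+the chosen minute are assumptions, not part of metrics-web, and the sketch
+further assumes that the metrics user can connect to the database without
+a password prompt, e.g., via a ~/.pgpass file.
+
+Store the following two lines in /srv/metrics-web/refresh.sh and make the
+file executable:
+
+#!/bin/sh
+psql tordir -c 'SELECT * FROM refresh_all();'
+
+$ chmod +x /srv/metrics-web/refresh.sh
+
+Then add a crontab entry that runs the script every 6 hours:
+
+45 */6 * * * /srv/metrics-web/refresh.sh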
+
+
+1.7. Generating network status information
+------------------------------------------
+
+The metrics database importer can analyze the most recently parsed network
+status consensus for irregularities indicating problems with the directory
+authorities.  There are two possible outputs: the consensus-health page
+that can be found at https://metrics.torproject.org/consensus-health.html
+and a local file that can be parsed by Nagios that will be written to
+/srv/metrics-web/website/consensus-health .
+
+Edit /srv/metrics-web/config to contain either or both of the following
+options:
+
+WriteConsensusHealth 1
+WriteNagiosStatusFile 1
+
+
+1.8. Importing sanitized bridge descriptors
+-------------------------------------------
+
+The metrics database can store aggregate statistics about running bridges
+and bridge usage.  These statistics are added by parsing sanitized bridge
+descriptors available on the official metrics website.
+
+Download a sanitized bridge descriptor tarball from the metrics website at
+https://metrics.torproject.org/data.html#bridgedesc and extract it to,
+e.g., /srv/metrics-web/bridges/bridge-descriptors-2011-05/ .
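+
+For example, assuming that the downloaded tarball is a bzip2-compressed
+file named bridge-descriptors-2011-05.tar.bz2 in the current directory and
+that it unpacks into a directory of the same name, extract it like this:
+
+$ mkdir -p /srv/metrics-web/bridges/
+$ tar xjf bridge-descriptors-2011-05.tar.bz2 -C /srv/metrics-web/bridges/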
+
+Edit /srv/metrics-web/config to contain the following options:
+
+ImportSanitizedBridges 1
+SanitizedBridgesDirectory bridges/
+KeepSanitizedBridgesImportHistory 1
+WriteBridgeStats 1
+
+Note that the bridge usage statistics require parsing relay descriptors of
+the same time period in order to filter bridges that have been running as
+relays from the results.  When parsing sanitized bridge descriptors for
+the first time it may be necessary to delete the relay descriptor import
+history in /srv/metrics-web/stats/archives-import-history and import all
+relay descriptors once again.
+
+Run the database import:
+
+$ ./run.sh
+
+
+1.9. Importing Torperf performance data
+---------------------------------------
+
+Torperf measures the performance of the Tor network as users experience
+it.  Torperf's measurement data are available on the metrics website and
+can be imported into the metrics database, too.
+
+Download the Torperf measurement files from the metrics website at
+https://metrics.torproject.org/data.html#performance and put them in a
+subdirectory, e.g., /srv/metrics-web/torperf/ .
+
+Edit /srv/metrics-web/config to contain the following options:
+
+ImportWriteTorperfStats 1
+TorperfDirectory torperf/
+
+Run the database import:
+
+$ ./run.sh
+
+
+1.10. Importing GetTor statistics
+---------------------------------
+
+WARNING: The GetTor statistics are not available for download yet, so
+this section only applies to the official metrics website.
+
+GetTor is a software distribution service that allows users to fetch the
+Tor software via email.  GetTor produces daily statistics of requested
+packages that can be imported into the metrics database.
+
+Put the GetTor statistics file into /srv/metrics-web/gettor/ .
+
+Edit /srv/metrics-web/config to contain the following options:
+
+ProcessGetTorStats 1
+GetTorDirectory gettor/
+
+Run the database import:
+
+$ ./run.sh
+
+
+2. Installing the graphing engine
+=================================
+
+The metrics graphing engine generates custom graphs of Tor network data
+based on user-provided parameters.  The graphing engine requires the
+metrics database to be installed as described in the previous section.
+
+The graphing engine uses R and Rserve to generate its graphs.  Rserve is a
+TCP/IP server that makes it easy for other tools to use R without spawning
+their own R process.  Rserve also pre-loads R code and R libraries which
+saves time when processing user requests.
+
+In this configuration, Rserve will run in the context of the metrics user.
+
+Setting up the graphing engine requires installing PostgreSQL's header
+files and R 2.8 or higher.  R 2.8 or higher is required for the ggplot2
+library.
+
+# apt-get install libpq-dev r-base-dev
+
+Run R as user metrics and install required packages to ~/R/.  In the
+following, R commands will be prefixed with >.
+
+$ R
+> install.packages("Rserve")
+> install.packages("ggplot2")
+> install.packages("RPostgreSQL")
+> q()
+
+Start the Rserve daemon (the exact path of Rserve-bin.so may vary), check
+that it's working by connecting via telnet, and shut it down:
+
+$ R CMD ~/R/x86_64-pc-linux-gnu-library/2.11/Rserve/libs/Rserve-bin.so
+$ telnet 127.0.0.1 6311
+$ echo "library(Rserve); RSshutdown(RSconnect())" | R --slave
+
+Also check that a database connection can be established from within R
+(using the actual password instead of "password"):
+
+$ R
+> library(RPostgreSQL)
+> drv <- dbDriver("PostgreSQL")
+> con <- dbConnect(drv, user = "metrics", password = "password",
+    dbname = "tordir")
+> dbDisconnect(con)
+> dbUnloadDriver(drv)
+> q()
+
+Insert the database password in the Rserve initialization script in
+/srv/metrics-web/rserve/rserve-init.R.
+
+Update the workdir path in /srv/metrics-web/rserve/Rserv.conf .
+
+Start Rserve, this time with the metrics-web-specific configuration that
+includes pre-loading the graph code:
+
+$ cd /srv/metrics-web/rserve/ && ./start.sh
+
+Add a crontab entry to start Rserve on reboot:
+
+@reboot cd /srv/metrics-web/rserve/ && ./start.sh
+
+Rserve will pre-load the graph code at startup.  If changes are made to
+the graph code, Rserve must be restarted:
+
+$ cd /srv/metrics-web/rserve/
+$ ./shutdown.sh && ./start.sh
+
+
+3. Installing the metrics website
+=================================
+
+The metrics website lets web users search parts of the metrics database
+and visualizes custom graphs.  Both the metrics database and the graphing
+engine are required to set up the metrics website as described in this
+section.
+
+Note that the description here has a few specific parts that only apply to
+the official metrics website.  These parts should be changed when setting
+up a non-official metrics website.
+
+
+3.1. Configuring Apache HTTP Server
+-----------------------------------
+
+The Apache HTTP Server is used as the front-end web server that serves
+static resources itself and forwards requests for dynamic resources to
+Apache Tomcat.
+
+Start by installing Apache 2:
+
+# apt-get install apache2
+
+Disable Apache's default site.
+
+# a2dissite default
+
+Enable mod_rewrite to tell Apache where to find static resources on disk.
+Also enable mod_proxy to forward requests to Tomcat.
+
+# a2enmod rewrite proxy_http
+
+Create a new virtual host configuration and store it in a new file
+/etc/apache2/sites-available/metrics.torproject.org with the following
+content:
+
+<VirtualHost *:80>
+  ServerName metrics.torproject.org
+  ServerAdmin torproject-admin@torproject.org
+  ErrorLog /var/log/apache2/error.log
+  CustomLog /var/log/apache2/access.log combined
+  ServerSignature On
+  <IfModule mod_rewrite.c>
+    RewriteEngine On
+    RewriteRule /(data|dist|papers)/(.*) /srv/metrics-web/$1/$2 [L]
+    RewriteRule /(consensus-health.html) /srv/metrics-web/website/$1 [L]
+  </IfModule>
+  <IfModule mod_proxy.c>
+    <Proxy *>
+      Order deny,allow
+      Allow from all
+    </Proxy>
+    ProxyPass / http://127.0.0.1:8080/ernie/ retry=15
+    ProxyPassReverse / http://127.0.0.1:8080/ernie/
+    ProxyPreserveHost on
+  </IfModule>
+</VirtualHost>
+
+Create the directories containing static resources: /srv/metrics-web/data/
+contains the tarballs and other metrics data linked from data.html.
+/srv/metrics-web/dist/ contains the software packages linked from
+tools.html.  /srv/metrics-web/papers/ contains the papers and technical
+reports linked from papers.html.  Note that the only way to stop serving
+these files is to remove the corresponding links from the .html pages
+manually.
+
+Enable the new virtual host.
+
+# a2ensite metrics.torproject.org
+
+Restart Apache just to be sure that all changes are effective.
+
+# /etc/init.d/apache2 restart
+
+
+3.2. Configuring Apache Tomcat
+------------------------------
+
+Apache Tomcat will process requests for dynamic resources, including web
+pages and graphs.
+
+Install Tomcat 6:
+
+# apt-get install tomcat6
+
+Replace Tomcat's default configuration in /etc/tomcat6/server.xml with the
+following configuration:
+
+<Server port="8005" shutdown="SHUTDOWN">
+  <Service name="Catalina">
+    <Connector port="8080" maxHttpHeaderSize="8192"
+               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
+               enableLookups="false" redirectPort="8443" acceptCount="100"
+               connectionTimeout="20000" disableUploadTimeout="true"
+               compression="off" compressionMinSize="2048"
+               noCompressionUserAgents="gozilla, traviata"
+               compressableMimeType="text/html,text/xml,text/plain" />
+    <Engine name="Catalina" defaultHost="yatei.torproject.org">
+      <Host name="metrics.torproject.org" appBase="webapps"
+            unpackWARs="true" autoDeploy="true"
+            xmlValidation="false" xmlNamespaceAware="false">
+        <Alias>yatei.torproject.org</Alias>
+        <Valve className="org.apache.catalina.valves.AccessLogValve"
+               directory="logs" prefix="metrics_access_log." suffix=".txt"
+               pattern="%l %u %t %r %s %b" resolveHosts="false"/>
+      </Host>
+    </Engine>
+  </Service>
+</Server>
+
+Be sure to replace *.torproject.org with something else, unless this is
+a re-installation of the official metrics website.
+
+Update the database password in /srv/metrics-web/etc/context.xml.
+
+Update the paths starting with /srv/metrics.torproject.org/ in
+/srv/metrics-web/etc/web.xml to the correct paths in /srv/metrics-web/.
+The default paths in that file are correct for the official metrics
+website setup, which is slightly different from the one described here.
+
+Now generate the web application.
+
+$ ant make-war
+
+Create a symbolic link to the ernie.war file:
+
+# ln -s /srv/metrics-web/ernie.war /var/lib/tomcat6/webapps/
+
+Tomcat will now attempt to deploy the web application automatically.
+
+Whenever the metrics website needs to be redeployed, generate a new .war
+file and Tomcat will reload the web application automatically.
+
+Restart Tomcat to make all configuration changes effective:
+
+# /etc/init.d/tomcat6 restart
+
+The metrics website should now work.
 
diff --git a/doc/rserve.pdf b/doc/rserve.pdf
deleted file mode 100644
index 2adca75..0000000
Binary files a/doc/rserve.pdf and /dev/null differ
diff --git a/doc/rserve.tex b/doc/rserve.tex
deleted file mode 100644
index 317b475..0000000
--- a/doc/rserve.tex
+++ /dev/null
@@ -1,140 +0,0 @@
-\documentclass{article}
-\setlength{\parindent}{0in}
-\begin{document}
-\title{Tor Metrics - Rserve}
-\author{by Kevin Berry \texttt{<kevin.berry at villanova.edu>}}
-\maketitle
-\section{Overview}
-Rserve is a TCP/IP interface which allows other tools and languages to use
-the facilities of the R language. In our case, Java, Tomcat, Postgres and
-other parts of Metrics work with Rserve to generate graphs on-demand. For
-more information about the Metrics website, see the sister repository
-\emph{metrics-db}. Here, we will cover how to install R, Rserve, the R
-Postgres driver, and ggplot2. For more information about the Postgres
-setup, see manual.pdf. The database should be set up before continuing
-this.
-
-\section{Architecture}
-See the \emph{rserve} directory for the start/stop scripts and
-config. The graph code is all pre-loaded when Rserve starts, so, if any
-changes are made to the graph code, Rserve must be restarted. The database
-name, user, and password can be configured in \emph{rserve-init.R}, as well
-as the pre-loaded libraries. Rserve forks itself upon connection, so R code
-can be pre-loaded to speed things up.
-
-\section{Setup}
-\subsection{Installing R}
-Before we get started, we need to have R installed. We need to have the R
-dev package installed so we can use the add-ons.
-
-\begin{verbatim}
-$ sudo apt-get install r-base-dev
-\end{verbatim}
-
-\subsection{Installing and testing Rserve}
-There are a few different ways to install Rserve. However, the easiest and
-most direct way to install it is through R's built-in package manager and
-package network, CRAN (\emph{The Comprehensive R Archive Network -
-http://cran.r-project.org}). Unfortunately, Rserve isn't packaged currently
-for many Linux distributions, so it requires a bit of manual configuration and
-administration.
-\\
-
-R needs to be started as root so its built-in package manager can access
-the file system to install its own packages. Select the mirror through the
-Tcl/tk or command line interface and it should install.
-
-\begin{verbatim}
-$ sudo R
-> install.packages("Rserve")
-\end{verbatim}
-
-We want to start a bare server and see if it works correctly.
-
-\begin{verbatim}
-$ R CMD Rserve
-\end{verbatim}
-
-Now, test it with the built-in R connector.
-
-\begin{verbatim}
-$ R
-> library(Rserve)
-> c <- RSconnect()
-> RSshutdown(c)
-\end{verbatim}
-
-If this worked, the server is listening, so it installed and started
-correctly.
-
-\subsection{Installing ggplot2}
-ggplot2 is the second necessary R package that we need for Metrics.
-
-\begin{verbatim}
-$ R
-> install.packages("ggplot2")
-\end{verbatim}
-
-\subsection{Installing and testing the R PostgreSQL driver}
-The Postgres driver installs similarly to the other R packages.
-
-\begin{verbatim}
-$ R
-> install.packages("RPostgreSQL")
-\end{verbatim}
-
-First, make sure Postgres is started and configured correctly according to
-the Metrics specifications (see manual.pdf).
-\\
-
-Start the R console, load the driver, and connect to the database. The
-database user, password may need to be changed.
-\begin{verbatim}
-$ R
-> library(RPostgreSQL)
-> drv <- dbDriver("PostgreSQL")
-> con <- dbConnect(drv, user="ernie", password="", dbname="tordir")
-> dbDisconnect(con)
-> dbUnloadDriver(drv)
-\end{verbatim}
-
-\section{Administrating Rserve}
-Since Rserve is not standardly packaged, a few things must be done to
-ensure it runs smoothly and securely. We need to adjust permissions, add
-users, and modify groups so it works nicely with Tomcat. Feel free to do
-this differently according to your system's requirements.
-
-\subsection{Adding users and groups}
-We'll add the user 'rserve' without a shell and no home directory.
-
-\begin{verbatim}
-$ useradd rserve -s /bin/false -U
-\end{verbatim}
-
-Now, find the user id and group id of the rserve user, and edit them in
-rserve/Rserv.conf. This is so Rserve properly forks itself and runs as the
-correct user when it is started.
-
-\begin{verbatim}
-$ id rserve
-uid=1011(rserve) gid=1012(rserve) groups=1012(rserve)
-\end{verbatim}
-
-Next, we need to add the rserve user to the 'apache' group (The default
-user for Tomcat), so it can communicate correctly with Tomcat and have the
-necessary permissions for writing graphs. In this case, we will add apache
-to the rserve group.
-
-\begin{verbatim}
-$ usermod -a -G rserve apache
-\end{verbatim}
-
-\section{Start Rserve}
-Now we are ready to start Rserve! Run the script rserve/start.sh as ROOT
-(or else Rserve will not fork itself properly). Then, check the rserve log
-file (\emph{rserve.log}). The path to this log file can be changed by
-modifying the script. Do a "ps -ef | grep rserve" to see if it has started.
-Now, with Rserve installed and Postgres is running, Metrics is almost ready
-to start generating some graphs!
-
-\end{document}


