[tech-reports/master] Add raw bridge-scaling report from 2012.

commit 30c684cfdf462f1d8c7170f279477f0fe5aa4c73
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Tue Aug 7 19:30:55 2012 +0200

    Add raw bridge-scaling report from 2012.
---
 2012/bridge-scaling/.gitignore               |    3 +
 2012/bridge-scaling/bridge-scaling-graph.pdf |  Bin 0 -> 5906 bytes
 2012/bridge-scaling/bridge-scaling.tex       |  141 ++++++++++++++++++++++++++
 3 files changed, 144 insertions(+), 0 deletions(-)

diff --git a/2012/bridge-scaling/.gitignore b/2012/bridge-scaling/.gitignore
new file mode 100644
index 0000000..1eb7496
--- /dev/null
+++ b/2012/bridge-scaling/.gitignore
@@ -0,0 +1,3 @@
+bridge-scaling.pdf
+bridge-scaling-2012-03-09.pdf
+
diff --git a/2012/bridge-scaling/bridge-scaling-graph.pdf b/2012/bridge-scaling/bridge-scaling-graph.pdf
new file mode 100644
index 0000000..fc7cdbd
Binary files /dev/null and b/2012/bridge-scaling/bridge-scaling-graph.pdf differ
diff --git a/2012/bridge-scaling/bridge-scaling.tex b/2012/bridge-scaling/bridge-scaling.tex
new file mode 100644
index 0000000..6da0964
--- /dev/null
+++ b/2012/bridge-scaling/bridge-scaling.tex
@@ -0,0 +1,141 @@
+\documentclass{article}
+\usepackage{url}
+\usepackage[pdftex]{graphicx}
+\usepackage{graphics}
+\usepackage{color}
+\begin{document}
+\title{What if the Tor network had 50,000 bridges?}
+\author{Karsten Loesing\\{\tt karsten@torproject.org}}
+
+\maketitle
+
+\section{Introduction}
+
+The current bridge infrastructure relies on a central bridge authority to
+collect, distribute, and publish bridge relay descriptors.
+There are currently 1,000 bridges running in the Tor network.\footnote{%
+\url{https://metrics.torproject.org/network.html#networksize}}
+We believe the current infrastructure can handle up to 10,000 bridges.
+Potential performance bottlenecks include:
+
+\begin{itemize}
+\item the bridge authority Tonga, where all (public) bridges register and
+which performs periodic reachability tests to confirm that bridges are
+running,
+\item BridgeDB, which stores currently running bridges and hands them out
+to bridge users, and
+\item metrics-db, which sanitizes bridge descriptors for later analysis,
+such as statistics on daily connecting bridge users.
+\end{itemize}
+
+\section{Load-testing BridgeDB and metrics-db}
+
+We started this analysis by writing a small tool that generates sample
+data for load-testing BridgeDB and metrics-db.
+This tool takes the contents of one of Tonga's bridge tarballs as input,
+copies them a given number of times, and overwrites the first two bytes
+of the relay fingerprints in every copy with 0000, 0001, etc.
+The tool also fixes the references between network statuses, server
+descriptors, and extra-info descriptors.
+This is sufficient to trick BridgeDB and metrics-db into thinking that
+the bridges in the copies are distinct bridges.
+We used the tool to generate tarballs with 2, 4, 8, 16, 32, and 64 times
+as many bridge descriptors in them.
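+
+To make the sample data generation more concrete, the following is a
+minimal sketch (in Python) of what such a tool could look like.
+It is not the actual tool we used: the directory layout, the command-line
+interface, and the treatment of fingerprints as contiguous 40-character
+hex strings are simplifying assumptions for illustration only, and a real
+tool additionally has to rewrite the base64-encoded identities in network
+statuses and recompute the digests that link statuses, server
+descriptors, and extra-info descriptors.
+
+\begin{verbatim}
+# Sketch only: copy a directory of bridge descriptor files several
+# times and rewrite fingerprint prefixes so that every copy appears
+# to contain a distinct set of bridges.
+import os
+import re
+import sys
+
+# Assumption: fingerprints appear as contiguous 40-character
+# upper-case hex strings.
+FINGERPRINT_RE = re.compile(r'\b[0-9A-F]{40}\b')
+
+def multiply_descriptors(src_dir, dst_dir, copies):
+    for copy in range(copies):
+        prefix = '%04X' % copy  # 0000, 0001, 0002, ...
+        for root, _, files in os.walk(src_dir):
+            for name in files:
+                src_path = os.path.join(root, name)
+                rel_path = os.path.relpath(src_path, src_dir)
+                dst_path = os.path.join(dst_dir,
+                                        'copy-%04X' % copy, rel_path)
+                os.makedirs(os.path.dirname(dst_path), exist_ok=True)
+                with open(src_path) as infile:
+                    text = infile.read()
+                # Overwrite the first two bytes (four hex characters)
+                # of every fingerprint with the copy number.
+                rewritten = FINGERPRINT_RE.sub(
+                    lambda m: prefix + m.group(0)[4:], text)
+                with open(dst_path, 'w') as outfile:
+                    outfile.write(rewritten)
+
+if __name__ == '__main__':
+    multiply_descriptors(sys.argv[1], sys.argv[2], int(sys.argv[3]))
+\end{verbatim}
+
+Invoked with a source directory, a target directory, and a factor of,
+say, 64, this would produce data on the scale of the largest test below.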
+
+In the next step we fed the tarballs into BridgeDB and metrics-db.
+BridgeDB reads the network statuses and server descriptors from the latest
+tarball and writes them to a local database.
+metrics-db sanitizes the two half-hourly created tarballs every hour,
+establishes an internal mapping between descriptors, and writes sanitized
+descriptors with fixed references to disk.
+Figure~\ref{fig:bridgescaling} shows the results.
+
+\begin{figure}[t]
+\includegraphics[width=\textwidth]{bridge-scaling-graph.pdf}
+\caption{Results from load-testing BridgeDB and metrics-db}
+\label{fig:bridgescaling}
+\end{figure}
+
+The upper graph shows how the tarballs grow in size with more bridge
+descriptors in them.
+This growth is, unsurprisingly, linear.
+One thing to keep in mind here is that the bandwidth and storage
+requirements of the hosts transferring and storing bridge tarballs grow
+with the tarballs.
+We'll want to pay extra attention to disk space running out on those
+hosts.
+These tarballs also have substantial overlap, so once we have tens of
+thousands of descriptors, we would want to get smarter and send diffs to
+BridgeDB and metrics-db instead.\footnote{See comment at
+\url{https://trac.torproject.org/projects/tor/ticket/4499#comment:7}}
+
+The middle graph shows how long BridgeDB takes to load descriptors from a
+tarball.
+This graph is linear, too, which indicates that BridgeDB can handle an
+increase in the number of bridges pretty well.
+
+The lower graph shows how well metrics-db handles more bridges.
+The growth is slightly worse than linear.
+In any case, the absolute time required to handle 25K bridges is worrisome
+(we didn't try 50K).
+metrics-db runs in an hourly cronjob, and if that cronjob doesn't finish
+within 1 hour, we cannot start the next run and will be missing some data.
+We might have to sanitize bridge descriptors in a different thread or
+process than the one that fetches all the other metrics data.
+We can also look into Java libraries for handling .gz-compressed files
+that are faster than the one we're using.
+
+\section{Looking at concurrency in BridgeDB}
+
+While performing the load test on BridgeDB we wondered whether it can
+serve client requests while loading bridges.
+It turns out that BridgeDB's interaction with users freezes while it's
+reading a new set of data.
+This isn't much of a problem with a few hundred bridges, where unlucky
+clients have to wait 10 seconds for their bridges.
+But it becomes a problem when BridgeDB is busy for a minute or two, twice
+an hour.
+We started discussing importing bridges into BridgeDB in a separate thread
+and database transaction.\footnote{%
+\url{https://trac.torproject.org/projects/tor/ticket/5232}}
+
+\section{Scalability of the bridge authority Tonga}
+
+So far we have left out the most important part of this analysis:
+can Tonga or, more generally, a single bridge authority handle this
+increase in bridges?
+Tonga still does a reachability test on each bridge every 21~minutes or
+so.
+Eventually the number of TLS handshakes it's doing will overwhelm its
+CPU.\footnote{%
+\url{https://trac.torproject.org/projects/tor/ticket/4499#comment:7}}
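+
+As a rough back-of-envelope estimate (assuming one TLS handshake per
+reachability test and tests spread evenly over the interval), testing
+50,000 bridges every 21~minutes means roughly
+\[
+\frac{50{,}000}{21 \times 60\ \mathrm{s}} \approx 40\ \mathrm{handshakes/s},
+\]
+compared to under one handshake per second with today's 1,000 bridges.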
+
+We're not sure how to test such a setting, at least not without running
+50K bridges in a private network.
+We imagine this would require more sophisticated sample data generation,
+including getting the crypto right, and then talking to Tonga's DirPort.
+We didn't find an easy way to test this.
+
+A possible fix would be to increase the reachability test interval from
+21~minutes to some higher value.
+A long-term fix would be to come up with a design that has more than a
+single bridge authority.
+
+\section{Conclusion}
+
+In conclusion, we found that a massive increase in the number of bridges
+in the Tor network by a factor of 10 to 50 can be harmful to Tor's
+infrastructure.
+We identified several possible bottlenecks: Tonga's reachability test
+interval, bridge tarball sizes for transfer between Tonga and
+BridgeDB/metrics-db, loading bridges into BridgeDB, and sanitizing bridges
+in metrics-db.
+
+During this analysis we also discovered a design bug in BridgeDB which
+makes it freeze while reading new bridge descriptors.
+This bug should be fixed regardless of scaling to 10K--50K bridges,
+because it already affects users.
+The suggested changes to Tonga, to transferring tarballs between hosts,
+and to metrics-db can be postponed until there's an actual problem, not
+just a theoretical one.
+
+\end{document}
+