commit 864027a18c761021eee5346dc085813f05868cde
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Tue Aug 7 19:24:51 2012 +0200

    Move #4499 report sources to tech-reports.git.
---
 task-4499/README             |    4 +-
 task-4499/bridge-scaling.tex |  141 ------------------------------------------
 2 files changed, 1 insertions(+), 144 deletions(-)

diff --git a/task-4499/README b/task-4499/README
index 4bf9264..8570f16 100644
--- a/task-4499/README
+++ b/task-4499/README
@@ -50,7 +50,5 @@
 Generate the graph:
 
   $ R --slave -f bridge-scaling.R
 
-Build the PDF:
-
-  $ pdflatex bridge-scaling.tex
+Build the PDF in tech-reports.git/2012/bridge-scaling/ .
diff --git a/task-4499/bridge-scaling.tex b/task-4499/bridge-scaling.tex
deleted file mode 100644
index 6acf2d7..0000000
--- a/task-4499/bridge-scaling.tex
+++ /dev/null
@@ -1,141 +0,0 @@
-\documentclass{article}
-\usepackage{url}
-\usepackage[pdftex]{graphicx}
-\usepackage{graphics}
-\usepackage{color}
-\begin{document}
-\title{What if the Tor network had 50,000 bridges?}
-\author{Karsten Loesing\\{\tt karsten@torproject.org}}
-
-\maketitle
-
-\section{Introduction}
-
-The current bridge infrastructure relies on a central bridge authority to
-collect, distribute, and publish bridge relay descriptors.
-There are currently 1,000 bridges running in the Tor network.\footnote{%
-\url{https://metrics.torproject.org/network.html#networksize}}
-We believe the current infrastructure can handle up to 10,000 bridges.
-Potential performance bottlenecks include:
-
-\begin{itemize}
-\item the bridge authority Tonga, where all (public) bridges register and
-which performs periodic reachability tests to confirm that bridges are
-running,
-\item BridgeDB, which stores currently running bridges and hands them out
-to bridge users, and
-\item metrics-db, which sanitizes bridge descriptors for later analysis
-like statistics on daily connecting bridge users.
-\end{itemize}
-
-\section{Load-testing BridgeDB and metrics-db}
-
-We started this analysis by writing a small tool to generate sample data
-for BridgeDB and metrics-db to load-test them.
-This tool takes the contents from one of Tonga's bridge tarball as input,
-copies them a given number of times, and overwrites the first two bytes of
-relay fingerprints in every copy with 0000, 0001, etc.
-The tool also fixes references between network statuses, server
-descriptors, and extra-info descriptors.
-This is sufficient to trick BridgeDB and metrics-db into thinking that
-bridges in the copies are distinct bridges.
-We used the tool to generate tarballs with 2, 4, 8, 16, 32, and 64 times
-as many bridge descriptors in them.
-
-In the next step we fed the tarballs into BridgeDB and metrics-db.
-BridgeDB reads the network statuses and server descriptors from the latest
-tarball and writes them to a local database.
-metrics-db sanitizes two half-hourly created tarballs every hour,
-establishes an internal mapping between descriptors, and writes sanitized
-descriptors with fixed references to disk.
-Figure~\ref{fig:bridgescaling} shows the results.
-
-\begin{figure}[t]
-\includegraphics[width=\textwidth]{bridge-scaling.png}
-\caption{Results from load-testing BridgeDB and metrics-db}
-\label{fig:bridgescaling}
-\end{figure}
-
-The upper graph shows how the tarballs grow in size with more bridge
-descriptors in them.
-This growth is, unsurprisingly, linear.
-One thing to keep in mind here is that bandwidth and storage requirements
-to the hosts transferring and storing bridge tarballs are growing with the
-tarballs.
-We'll want to pay extra attention to disk space running out on those
-hosts.
-These tarballs have substantial overlap.
-If we have tens of thousands of descriptors, we would want to get smarter
-at sending diffs over to BridgeDB and metrics-db.\footnote{See comment at
-\url{https://trac.torproject.org/projects/tor/ticket/4499#comment:7}}
-
-The middle graph shows how long BridgeDB takes to load descriptors from a
-tarball.
-This graph is linear, too, which indicates that BridgeDB can handle an
-increase in the number of bridges pretty well.
-
-The lower graph shows how metrics-db can or cannot handle more bridges.
-The growth is slightly worse than linear.
-In any case, the absolute time required to handle 25K bridges is worrisome
-(we didn't try 50K).
-metrics-db runs in an hourly cronjob, and if that cronjob doesn't finish
-within 1 hour, we cannot start the next run and will be missing some data.
-We might have to sanitize bridge descriptors in a different thread or
-process than the one that fetches all the other metrics data.
-We can also look into other Java libraries to handle .gz-compressed files
-that are faster than the one we're using.
-
-\section{Looking at concurrency in BridgeDB}
-
-While performing the load-test on BridgeDB we were wondering whether it
-can serve client requests while loading bridges.
-Turns out BridgeDB's interaction with users freezes while it's reading a
-new set of data.
-This isn't that much of a problem with a few hundred bridges and unlucky
-clients having to wait 10 seconds for their bridges.
-But it becomes a problem when BridgeDB is busy for a minute or two, twice
-an hour.
-We started discussing importing bridges into BridgeDB in a separate thread
-and database transaction.\footnote{%
-\url{https://trac.torproject.org/projects/tor/ticket/5232}}
-
-\section{Scalability of the bridge authority Tonga}
-
-We left out the most important part of this analysis:
-can Tonga, or more generally, a single bridge authority handle this
-increase in bridges?
-Tonga still does a reachability test on each bridge every 21~minutes or so.
-Eventually the number of TLS handshakes it's doing will overwhelm its
-CPU.\footnote{%
-\url{https://trac.torproject.org/projects/tor/ticket/4499#comment:7}}
-
-We're not sure how to test such a setting, or at least without running 50K
-bridges in a private network.
-We could imagine this requires some more sophisticated sample data
-generation including getting the crypto right and then talking to Tonga's
-DirPort.
-We didn't find an easy way to test this.
-
-A possible fix would be to increase the reachability test interval from
-21~minutes to some higher value.
-A long-term fix would be to come up with a design that has more than one
-single bridge authority.
-
-\section{Conclusion}
-
-In conclusion, we found that a massive increase in bridges in the Tor
-network by a factor of 10 to 50 can be harmful to Tor's infrastructure.
-We identified possible bottlenecks: Tonga's reachability test interval,
-bridge tarball sizes for transfer between Tonga and BridgeDB/metrics-db,
-loading bridges into BridgeDB, and sanitizing bridges in metrics-db.
-
-During this analysis we discovered a design bug in BridgeDB which makes it
-freeze while reading new bridge descriptors.
-This bug should be fixed regardless of scaling to 10K--50K bridges,
-because it already affects users.
-The suggested changes to Tonga, transfering tarballs between hosts, and
-changes to metrics-db can be postponed until there's an actual problem,
-not just a theoretical one.
-
-\end{document}
-