I'm running Tor on a router and was wondering why the Tor daemon uses so much memory.
To clarify, do you mean "running a Tor client on a home Internet router"? What version of Tor?
I'm running version 0.2.5.12 (git-99d0579ff5e0349f). The router I use is a GL-AR150 with 64MB RAM and 16MB flash memory. More specs: http://www.gl-inet.com/ar-specifications/
Did a pmap:
pmap `pidof tor`
And got the following result:
1703: /usr/sbin/tor --PidFile /var/run/tor.pid
00400000   1024K r-x-- /usr/sbin/tor
(snip < 1MB)
00af2000  17288K rwx-- [ anon ]
7713a000   7140K r---- /tmp/lib/tor/cached-microdescs
(snip < 1MB)
779c0000   1300K r-x-- /usr/lib/libcrypto.so.1.0.0
(snip < 1MB)
total     29360K
As you can see, there is a large 17288K block, which turns out to be the heap (of course). When I dumped the block and looked inside, I found it was full of router data. It looks like it is mostly an in-memory database of the router list.
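To rank the mappings by size without reading the listing by eye, the pmap lines can be parsed with a few lines of Python. This is a minimal sketch: the sample text reuses the values from the pmap output above, and the regular expression assumes the `address sizeK mode name` column layout shown there, which may differ between pmap implementations. The function name `parse_pmap` is made up for this example.

```python
import re

# Sample lines in the layout shown above (values copied from that output).
SAMPLE = """\
00400000   1024K r-x-- /usr/sbin/tor
00af2000  17288K rwx-- [ anon ]
7713a000   7140K r---- /tmp/lib/tor/cached-microdescs
779c0000   1300K r-x-- /usr/lib/libcrypto.so.1.0.0
"""

# Assumed layout: hex address, size in KiB with a K suffix, mode, mapping name.
LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def parse_pmap(text):
    """Return (size_kib, mode, name) tuples, largest mapping first.

    Lines that do not match the assumed layout (the header line with the
    PID, the "total" line, snipped entries) are skipped.
    """
    mappings = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            size_kib = int(match.group(2))
            mappings.append((size_kib, match.group(3), match.group(4).strip()))
    return sorted(mappings, reverse=True)


if __name__ == "__main__":
    for size_kib, mode, name in parse_pmap(SAMPLE):
        print(f"{size_kib:>7}K {mode} {name}")
```

On the router itself, the text passed to `parse_pmap` would come from the pmap command shown earlier rather than from the embedded sample.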
Yes, that heap likely contains the parsed version of cached-microdescs and the parsed version of the consensus, as well as cell queues and many other in-memory data structures.
Given the amount of flash memory my device has, I realized that /tmp/lib/tor/cached-microdescs could not be on flash. It's on tmpfs, so it is stored in RAM as well.
This worries me. If in the future the router list grows, my router (and many other routers running Tor) can run out of memory. For me, it looks a little bit strange to have an in-memory database of the router list. Is there a reason for having this data in memory?
Tor selects relays at random when it builds paths. It also uses the relay list for other operations like finding hidden service directories. It's faster to select relays from a parsed data structure in memory, particularly when each relay needs to be checked (for bandwidth, or address, or any number of attributes).
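The selection step described above can be illustrated with a short Python sketch. It is a deliberate simplification and not Tor's actual path-selection code: the `Relay` fields, the `pick_relay` function, and the single "exit" filter are assumptions made for the example. What it does show is why a parsed in-memory list is convenient: each pick filters and weights every relay in the list.

```python
import random
from dataclasses import dataclass


@dataclass
class Relay:
    # Hypothetical, heavily reduced relay record for illustration only.
    nickname: str
    bandwidth: int  # weight used for selection, arbitrary units
    is_exit: bool


def pick_relay(relays, need_exit=False, rng=random):
    """Pick one relay at random, weighted by bandwidth.

    Every call walks the whole list twice: once to filter on attributes,
    once to collect the weights. Returns None if no relay qualifies.
    """
    candidates = [r for r in relays if r.is_exit or not need_exit]
    if not candidates:
        return None
    weights = [r.bandwidth for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    relays = [
        Relay("alpha", 100, False),
        Relay("bravo", 5000, True),
        Relay("charlie", 20, True),
    ]
    chosen = pick_relay(relays, need_exit=True)
    print(chosen.nickname)
```

If the same records lived on disk, each of those passes would turn into reads and parsing, which is the cost being weighed here.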
This memory usage is also an issue for mobile devices, embedded devices, and restricted environments (such as VPN network extensions) on any device.
And, can something be done about it?
Every so often, tor increases the minimum bandwidth required to be a relay. This reduces the size of the consensus, and the number of descriptors.
That would certainly limit the risk of running out of memory.
Over the longer term, there are a few design alternatives:
store the relay information on disk until needed, or
download and use fewer relays, or
keep less information in memory for each relay.
- We could store the information in a database or flat file on disk, and access it as needed. But that could be excruciatingly slow, or result in difficult-to-predict performance. Perhaps a database engine could better cache frequently-used attributes or totals. But the current design still iterates through the entire list every time it selects a relay. We'd need to think carefully about changing this.
With only 16MB of flash, storing the data on flash is not an option. But most routers have a USB connector, which can be connected to a memory stick.
- The issue with each tor client using fewer relays is that it becomes easy to identify individual tor clients. Perhaps we could split the list in two, and it would split clients into two groups. Maybe that's not too bad. But it might also enable other attacks.
I have seen designs described where tor can retrieve parts of the list of relays, while being able to prove they're part of the network-wide list. But that doesn't really help here, because you're still using fewer relays, and therefore easily distinguishable from other clients.
I like this solution. Maybe a client can download all descriptors, but only store a fixed number of (randomly selected) routers? This could be a configuration option, something like: maxDescriptorStorageCount.
The interesting question is: How does the number of stored descriptors affect the traceability of the client?
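One way to keep a fixed number of randomly selected descriptors while reading all of them, without holding the full set in memory, is reservoir sampling. The sketch below is only an illustration of that idea: `sample_descriptors` and its `max_count` parameter are hypothetical names, not Tor code or an existing Tor option, and the sampling is uniform, so it ignores bandwidth weights. It also says nothing about the traceability question, which stays open.

```python
import random


def sample_descriptors(stream, max_count, rng=random):
    """Keep at most max_count items from stream, chosen uniformly at random.

    Uses reservoir sampling (Algorithm R): memory use is bounded by
    max_count, no matter how many descriptors the stream yields.
    """
    reservoir = []
    for seen, descriptor in enumerate(stream, start=1):
        if len(reservoir) < max_count:
            # Fill the reservoir with the first max_count descriptors.
            reservoir.append(descriptor)
        else:
            # Replace a random slot with probability max_count / seen.
            slot = rng.randrange(seen)
            if slot < max_count:
                reservoir[slot] = descriptor
    return reservoir


if __name__ == "__main__":
    # Stand-in for a descriptor download: 7000 numbered placeholders.
    descriptors = (f"relay-{i}" for i in range(7000))
    kept = sample_descriptors(descriptors, max_count=100)
    print(len(kept))
```

Each client would end up with a different random subset, which is exactly the property whose effect on distinguishability would need to be analyzed.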
- We already keep less information in memory using microdescriptors, which most tor clients use by default. If there is any unnecessary information in the consensus or in microdescriptors, we'd be happy to remove it.
I think our change to ed25519 keys might do this over the longer term, but in the interim, it means an increase in memory usage.
Please feel free to let us know if this is a pressing issue for you, and we'll see what we can do.
At the moment it is not a pressing issue for me; everything works fine. But if the router list keeps growing, it will become a problem and Tor will stop working on low-memory router hardware.
There are not many users running Tor on the router, so I think it's not worth it to put much effort into a solution. I *really* like running Tor on the router, so if memory usage becomes a problem I will buy a router with more memory (or a Raspberry Pi 3).
Thank you for your detailed answer!
Note: For those who are interested, I wrote two articles about Tor on the router: