Re: [tor-dev] Memory usage of Tor daemon

20 May 2016

...
I'm running Tor on a router and was wondering why the Tor daemon uses so
much memory.
To clarify, do you mean "running a Tor client on a home Internet router"?
What version of Tor?
I'm running version 0.2.5.12 (git-99d0579ff5e0349f)
The router I use is a GL-AR150 with 64MB RAM and 16MB flash memory.
More specs: http://www.gl-inet.com/ar-specifications/
...
...
Did a pmap:
pmap `pidof tor`
And got the following result:
1703:   /usr/sbin/tor --PidFile /var/run/tor.pid
00400000   1024K r-x--  /usr/sbin/tor
(snip < 1MB)
00af2000  17288K rwx--    [ anon ]
7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
(snip < 1MB)
779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
(snip < 1MB)
total    29360K
As you can see, there is a large 17288K block, which turns out to be the
heap (of course). When I dumped the block and looked inside, I found it
was full of router data. It looks like it is mostly an in-memory
database of the router list.
Yes, that heap likely contains the parsed version of cached-microdescs
and the parsed version of the consensus, as well as cell queues and
many other in-memory data structures.
Seeing the amount of flash memory my device has, I realized
that /tmp/lib/tor/cached-microdescs could not be on flash memory. It's
on tmpfs, so it is also stored in RAM.
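
To put the figures above in proportion, here is a minimal Python sketch
that tallies the mappings from the pmap output. The sizes are copied
from the post (the snipped entries are left out); the helper names
parse_pmap and summarize are made up for this illustration. Note that
pmap reports mapped size rather than resident memory, so the result is
only an upper-bound estimate.

```python
# Mapping sizes copied from the pmap output quoted above
# (entries snipped in the original post are not included).
PMAP_LINES = """\
00400000   1024K r-x--  /usr/sbin/tor
00af2000  17288K rwx--    [ anon ]
7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
"""
PMAP_TOTAL_KB = 29360        # "total" line reported by pmap
ROUTER_RAM_KB = 64 * 1024    # GL-AR150: 64MB RAM


def parse_pmap(text):
    """Return a list of (name, size_kb), one per mapping line."""
    mappings = []
    for line in text.splitlines():
        # Fields: address, size, permissions, name (name may contain spaces).
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        size_kb = int(fields[1].rstrip("K"))
        mappings.append((fields[3].strip(), size_kb))
    return mappings


def summarize(mappings, total_kb):
    """Map each mapping name to its percentage of the pmap total."""
    return {name: 100.0 * size / total_kb for name, size in mappings}


if __name__ == "__main__":
    mappings = parse_pmap(PMAP_LINES)
    for name, pct in summarize(mappings, PMAP_TOTAL_KB).items():
        print(f"{name:40s} {pct:5.1f}% of mapped memory")

    # The heap and the tmpfs-backed cache file both live in RAM.
    in_ram_kb = sum(
        size for name, size in mappings
        if name == "[ anon ]" or name.endswith("cached-microdescs")
    )
    share = 100.0 * in_ram_kb / ROUTER_RAM_KB
    print(f"heap + cached-microdescs: {in_ram_kb}K ({share:.1f}% of 64MB)")
```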
...
...
This worries me. If the router list grows in the future, my router (and
many other routers running Tor) could run out of memory. To me, it looks
a little strange to have an in-memory database of the router list.
Is there a reason for having this data in memory?
Tor selects relays at random when it builds paths. It also uses the
relay list for other operations like finding hidden service
directories. It's faster to select relays from a parsed data structure
in memory, particularly when each relay needs to be checked (for
bandwidth, or address, or any number of attributes).
This memory usage is also an issue for mobile devices, embedded
devices, and restricted environments (such as VPN network extensions)
on any device.
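
To illustrate why a parsed in-memory list is convenient for path
selection, here is a minimal Python sketch of bandwidth-weighted random
selection with an attribute check on every relay. It is not Tor's actual
selection code: the Relay record, its fields, and pick_relay are
simplified assumptions made up for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Relay:
    """Hypothetical, heavily simplified relay record."""
    nickname: str
    bandwidth: int   # selection weight, arbitrary units
    is_exit: bool


def pick_relay(relays, need_exit=False, rng=random):
    """Pick one relay at random, weighted by bandwidth.

    Every relay is examined once per selection. That is cheap for a
    list held in memory, but would be costly if each record had to be
    read from slow storage first.
    """
    candidates = [r for r in relays if r.is_exit or not need_exit]
    total = sum(r.bandwidth for r in candidates)
    if total <= 0:
        raise ValueError("no usable relays")

    # Choose a point in [0, total) and walk the list until the running
    # sum of bandwidths passes it.
    point = rng.randrange(total)
    running = 0
    for relay in candidates:
        running += relay.bandwidth
        if point < running:
            return relay
    raise AssertionError("unreachable: running sum always reaches total")


if __name__ == "__main__":
    relays = [
        Relay("alpha", 100, False),
        Relay("bravo", 400, True),
        Relay("charlie", 250, True),
    ]
    print(pick_relay(relays).nickname)
    print(pick_relay(relays, need_exit=True).nickname)
```

The per-selection pass over the whole list is the part that makes moving
the data out of memory unattractive.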
...
And can something be done about it?
Every so often, tor increases the minimum bandwidth required to be a
relay. This reduces the size of the consensus, and the number of
descriptors.
That would certainly limit the risk of running out of memory.
...
Over the longer term, there are a few design alternatives:
1. store the relay information on disk until needed, or
2. download and use fewer relays, or
3. keep less information in memory for each relay.
1. We could store the information in a database or flat file on disk,
and access it as needed. But that could be excruciatingly slow, or
result in difficult-to-predict performance. Perhaps a database engine
could better cache frequently-used attributes or totals. But the
current design still iterates through the entire list every time it
selects a relay. We'd need to think carefully about changing this.
With only 16MB of flash, storing the data on flash is not an option.
But most routers have a USB connector, which can be connected to a
memory stick.
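
As a rough illustration of the first alternative, the sketch below keeps
relay records in an SQLite file (which could live on a USB stick) and
streams them from the database for each selection. This is an assumption
about how such a design might look, not anything Tor implements; the
table layout and the names open_store, add_relays and pick_relay_on_disk
are hypothetical.

```python
import random
import sqlite3


def open_store(path):
    """Open (or create) a hypothetical on-disk relay store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS relays ("
        " nickname TEXT PRIMARY KEY,"
        " bandwidth INTEGER NOT NULL,"
        " is_exit INTEGER NOT NULL)"
    )
    return db


def add_relays(db, rows):
    """Insert (nickname, bandwidth, is_exit) tuples."""
    db.executemany("INSERT OR REPLACE INTO relays VALUES (?, ?, ?)", rows)
    db.commit()


def pick_relay_on_disk(db, need_exit=False, rng=random):
    """Bandwidth-weighted pick that reads rows from the database.

    The list is not held in process memory, but each selection still
    scans the table: once for the total and once to find the pick.
    """
    where = "WHERE is_exit = 1" if need_exit else ""
    (total,) = db.execute(
        f"SELECT COALESCE(SUM(bandwidth), 0) FROM relays {where}"
    ).fetchone()
    if total <= 0:
        raise ValueError("no usable relays")

    point = rng.randrange(total)
    running = 0
    query = f"SELECT nickname, bandwidth FROM relays {where} ORDER BY nickname"
    for nickname, bandwidth in db.execute(query):
        running += bandwidth
        if point < running:
            return nickname
    raise AssertionError("unreachable: running sum always reaches total")


if __name__ == "__main__":
    db = open_store(":memory:")   # use a file path for real on-disk storage
    add_relays(db, [("alpha", 100, 0), ("bravo", 400, 1), ("charlie", 250, 1)])
    print(pick_relay_on_disk(db, need_exit=True))
```

The two table scans per selection show why this approach could be slow
on flash or USB storage unless the totals are cached.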
...
2. The issue with each tor client using fewer relays is that it becomes
easy to identify individual tor clients. Perhaps we could split the
list in two, and it would split clients into two groups. Maybe that's
not too bad. But it might also enable other attacks.
I have seen designs described where tor can retrieve parts of the list
of relays, while being able to prove they're part of the network-wide
list. But that doesn't really help here, because you're still using
fewer relays, and therefore easily distinguishable from other clients.
I like this solution. Maybe a client can download all descriptors, but
only store a fixed number of (randomly selected) routers? This could be
a configuration option, something like: maxDescriptorStorageCount.

The interesting question is: How does the number of stored descriptors
affect the traceability of the client?
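
The idea of downloading all descriptors but keeping only a fixed number
of randomly selected ones can be sketched with reservoir sampling
(Algorithm R), which never holds more than the limit in memory. This is
only an illustration of the proposal: maxDescriptorStorageCount is the
suggested option name from this message, not an existing Tor setting,
and sample_descriptors is a made-up helper. The sketch says nothing
about the anonymity question raised above.

```python
import random


def sample_descriptors(descriptors, max_count, rng=random):
    """Keep a uniform random sample of at most max_count descriptors.

    Descriptors are consumed one at a time, so the full list never has
    to be in memory at once.
    """
    kept = []
    for seen, desc in enumerate(descriptors, start=1):
        if len(kept) < max_count:
            # Fill the reservoir with the first max_count items.
            kept.append(desc)
        else:
            # Replace a random slot with probability max_count / seen.
            slot = rng.randrange(seen)
            if slot < max_count:
                kept[slot] = desc
    return kept


if __name__ == "__main__":
    # Stand-in for a stream of downloaded descriptors.
    stream = (f"relay{i}" for i in range(7000))
    max_descriptor_storage_count = 1000   # hypothetical option value
    kept = sample_descriptors(stream, max_descriptor_storage_count)
    print(len(kept), "descriptors kept")
```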
...
3. We already keep less information in memory using microdescriptors,
which most tor clients use by default. If there is any unnecessary
information in the consensus or in microdescriptors, we'd be happy to
remove it.
I think our change to ed25519 keys might do this over the longer term,
but in the interim, it means an increase in memory usage.
Please feel free to let us know if this is a pressing issue for you,
and we'll see what we can do.
At the moment it is not a pressing issue for me, everything works fine.
But if the router list keeps growing it will be a problem and it will
break Tor router hardware. 

There are not many users running Tor on the router, so I think it's not
worth it to put much effort into a solution. I *really* like running Tor
on the router, so if memory usage becomes a problem I will buy a router
with more memory (or a Raspberry Pi 3). 

Thank you for your detailed answer!

Rob.
https://hoevenstein.nl

Note: For those who are interested, I wrote two articles about Tor on
the router:

https://hoevenstein.nl/thoughts-on-tor-router-hardware

https://hoevenstein.nl/my-openwrt-tor-configuration

Rob van der Hoeven