[tor-dev] Memory usage of Tor daemon

Fri May 20 14:37:40 UTC 2016

> On 20 May 2016, at 06:03, Rob van der Hoeven <robvanderhoeven at ziggo.nl> wrote:
> 
> Hi,
> 
> I'm running Tor on a router and was wondering why the Tor daemon uses so
> much memory.

To clarify, do you mean "running a Tor client on a home Internet router"?
What version of Tor?

> Did a pmap:
> 
> pmap `pidof tor`
> 
> And got the following result:
> 
> 1703:   /usr/sbin/tor --PidFile /var/run/tor.pid
> 00400000   1024K r-x--  /usr/sbin/tor
> (snip < 1MB)
> 00af2000  17288K rwx--    [ anon ]
> 7713a000   7140K r----  /tmp/lib/tor/cached-microdescs
> (snip < 1MB)
> 779c0000   1300K r-x--  /usr/lib/libcrypto.so.1.0.0
> (snip < 1MB)
> total    29360K
> 
> As you can see there is a large 17288K block which turns out to be a
> heap (of course). When I dumped the block and looked inside I found it
> was full of router data. Looks like it is mostly an in-memory database
> of the router list.

Yes, that heap likely contains the parsed versions of cached-microdescs and the consensus, as well as cell queues and many other in-memory data structures.
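
If you want to rank the mappings by size rather than read the pmap output by eye, something like the following Python sketch may help. It assumes the four-column layout shown in your output (address, size in K, permissions, name); pmap output differs between implementations, so treat the parsing as an assumption. The function name largest_mappings is made up for this example.

```python
def largest_mappings(pmap_output, top=3):
    """Return the `top` largest mappings as (size_kb, name) tuples.

    Assumes lines of the form: ADDRESS SIZEK PERMS NAME
    (the layout quoted in this thread). Other lines are skipped.
    """
    rows = []
    for line in pmap_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line, the "total" line, and anything malformed.
        if len(fields) < 4 or not fields[1].endswith("K"):
            continue
        try:
            int(fields[0], 16)            # first column must be a hex address
            size_kb = int(fields[1][:-1])  # strip the trailing "K"
        except ValueError:
            continue
        rows.append((size_kb, fields[3].strip()))
    return sorted(rows, reverse=True)[:top]
```

Feeding it the output of pmap for the tor process should put the anonymous heap mapping and cached-microdescs at the top of the list.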

> This worries me. If in the future the router list grows, my router (and
> many other routers running Tor) can run out of memory. For me, it looks
> a little bit strange to have an in-memory database of the router list.
> Is there a reason for having this data in memory?

Tor selects relays at random when it builds paths. It also uses the relay list for other operations like finding hidden service directories. It's faster to select relays from a parsed data structure in memory, particularly when each relay needs to be checked (for bandwidth, or address, or any number of attributes).
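
To illustrate why a parsed in-memory list is convenient, here is a minimal Python sketch of bandwidth-weighted random selection with a per-relay check. It is not Tor's actual path selection code, which is written in C and applies many more rules. The Relay class, its fields, and pick_relay are hypothetical names made up for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Relay:
    # Hypothetical, heavily simplified relay record.
    nickname: str
    bandwidth: int   # selection weight, arbitrary units
    is_exit: bool


def pick_relay(relays, need_exit=False, rng=random):
    """Pick one relay at random, weighted by bandwidth.

    Every relay is checked against the filter, so the whole list is
    walked on each call. Returns None if no relay is usable.
    """
    candidates = [r for r in relays if r.is_exit or not need_exit]
    total = sum(r.bandwidth for r in candidates)
    if total <= 0:
        return None
    target = rng.randrange(total)  # integer in [0, total)
    running = 0
    for relay in candidates:
        running += relay.bandwidth
        if target < running:
            return relay
    return None  # not reached when total > 0
```

Each call touches every relay at least once, which is cheap when the list is already parsed in memory and much less so if each record has to be fetched and parsed first.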

This memory usage is also an issue for mobile devices, embedded devices, and restricted environments (such as VPN network extensions) on any device.

> And, can something be
> done about it?

Every so often, tor increases the minimum bandwidth required to be a relay. This reduces the size of the consensus, and the number of descriptors.

Over the longer term, there are a few design alternatives:
1. store the relay information on disk until needed, or
2. download and use fewer relays, or
3. keep less information in memory for each relay.

1. We could store the information in a database or flat file on disk, and access it as needed. But that could be excruciatingly slow, or result in difficult-to-predict performance. Perhaps a database engine could better cache frequently-used attributes or totals. But the current design still iterates through the entire list every time it selects a relay. We'd need to think carefully about changing this.
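
As a rough illustration of option 1, here is a Python sketch that keeps relay records in SQLite and walks them with a cursor, so only one row is held in memory at a time. This is only a sketch under my own assumptions: the table layout, column names, and function names are hypothetical, and nothing like this exists in tor.

```python
import sqlite3


def open_relay_store(path=":memory:"):
    """Open (or create) a hypothetical on-disk relay table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS relays ("
        "nickname TEXT PRIMARY KEY, "
        "bandwidth INTEGER NOT NULL, "
        "is_exit INTEGER NOT NULL)"
    )
    return db


def add_relays(db, rows):
    """Insert (nickname, bandwidth, is_exit) tuples."""
    db.executemany("INSERT OR REPLACE INTO relays VALUES (?, ?, ?)", rows)
    db.commit()


def total_bandwidth(db, need_exit=False):
    """Let the database compute the total weight of usable relays."""
    min_exit = 1 if need_exit else 0  # is_exit is stored as 0 or 1
    row = db.execute(
        "SELECT COALESCE(SUM(bandwidth), 0) FROM relays WHERE is_exit >= ?",
        (min_exit,),
    ).fetchone()
    return row[0]


def pick_relay_from_disk(db, target, need_exit=False):
    """Return the nickname whose cumulative bandwidth range contains target.

    `target` should be a random integer in [0, total_bandwidth).
    The table is still scanned row by row on every selection.
    """
    min_exit = 1 if need_exit else 0
    running = 0
    cursor = db.execute(
        "SELECT nickname, bandwidth FROM relays "
        "WHERE is_exit >= ? ORDER BY nickname",
        (min_exit,),
    )
    for nickname, bandwidth in cursor:
        running += bandwidth
        if target < running:
            return nickname
    return None
```

The memory footprint drops, but every selection still scans the table, which is the performance concern described above.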

2. The issue with each tor client using fewer relays is that it becomes easy to identify individual tor clients. Perhaps we could split the list in two, which would split clients into two groups. Maybe that's not too bad. But it might also enable other attacks.

I have seen designs described where tor can retrieve parts of the list of relays, while being able to prove they're part of the network-wide list. But that doesn't really help here, because you're still using fewer relays, and are therefore easily distinguishable from other clients.
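
One common way to prove that an item belongs to a larger list is a Merkle tree: the client holds only a root hash, and each item arrives with a short chain of sibling hashes. I don't know whether the designs mentioned above use this construction, so the Python sketch below is just an assumption-laden illustration of the general idea. All names in it are made up for this example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    proof is a list of (sibling_hash, sibling_is_left) pairs,
    ordered from the leaf level up to the root.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    # Prefix bytes keep leaf hashes and inner-node hashes distinct.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the last node
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        index //= 2
    return level[0], proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its sibling hashes."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            node = _h(b"\x01" + sibling + node)
        else:
            node = _h(b"\x01" + node + sibling)
    return node == root
```

A client could check one relay entry against a known root without holding the whole list. As noted above, that saves memory but does nothing for the distinguishability problem.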

3. We already keep less information in memory using microdescriptors, which most tor clients use by default. If there is any unnecessary information in the consensus or in microdescriptors, we'd be happy to remove it.

I think our change to ed25519 keys might let us keep less information per relay over the longer term, but in the interim, it means an increase in memory usage.

Please feel free to let us know if this is a pressing issue for you, and we'll see what we can do.

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B
ricochet:ekmygaiu4rzgsk6n
