[tor-bugs] #24737 [Core Tor/Tor]: oft given MaxMemInQueues advice is wrong

Sun Dec 31 17:10:08 UTC 2017

#24737: oft given MaxMemInQueues advice is wrong
----------------------------+----------------------------------
 Reporter:  starlight       |          Owner:  (none)
     Type:  defect          |         Status:  new
 Priority:  Medium          |      Milestone:  Tor: unspecified
Component:  Core Tor/Tor    |        Version:
 Severity:  Normal          |     Resolution:
 Keywords:  doc, tor-relay  |  Actual Points:
Parent ID:                  |         Points:
 Reviewer:                  |        Sponsor:
----------------------------+----------------------------------

Comment (by starlight):

 Replying to [comment:1 teor]:
 > I'm not sure what you want us to do in response to this ticket.
 > If you can write up a short wiki page with some advice, we could point
 to it rather than trying to guess the right setting.

 I suggest adding some verbiage to the Tor Manual where most people would
 look first when adjusting MaxMemInQueues.

 >
 > I don't think percentages are helpful - I think creating a table with
 free RAM to MaxMemInQueues values would be more helpful. (See below.)
 > . . .
 > To be more precise: MaxMemInQueues doesn't track destroy queues, nor
 does it track various other Tor data structures,
 > So you have to set it at a level that allows space for a few hundred
 megabytes of Tor data, and then some destroy queues.
 >
 > At 1024 MB per instance, this means 512 MB or less.
 >
 > But with 10 GB per instance, it really is ok to allow 5-7 GB in queues.
 > (I have a relay that allows the default 8 GB in queues, and it's fine.)
 >

 My observation is that when MaxMemInQueues triggers a circuit kill, the
 daemon will have consumed roughly twice the configured value in physical
 memory.  Of course YMMV on the precise amount, but this observational
 rule of thumb is far from the suggestion that 120-130% of MaxMemInQueues
 will be used.
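
 To make the arithmetic behind this rule of thumb concrete, here is a
 minimal sketch.  It assumes the roughly 2x ratio observed above; the
 function names and the reserved_mb headroom figure are hypothetical
 placeholders, not anything taken from tor itself.

```python
def suggest_max_mem_in_queues(ram_mb, reserved_mb=512, usage_ratio=2.0):
    """Return a MaxMemInQueues value in MB for one tor instance.

    ram_mb      -- physical memory available to this instance
    reserved_mb -- headroom left for the OS and other processes
                   (hypothetical placeholder, tune for your system)
    usage_ratio -- observed ratio of resident memory to MaxMemInQueues
                   at the moment circuits get killed (about 2 per the
                   observation above; your mileage may vary)
    """
    if ram_mb <= reserved_mb:
        raise ValueError("no RAM left for tor after the reserve")
    return int((ram_mb - reserved_mb) / usage_ratio)


def torrc_line(ram_mb, **kwargs):
    """Format the suggestion as a torrc option line."""
    return f"MaxMemInQueues {suggest_max_mem_in_queues(ram_mb, **kwargs)} MB"


if __name__ == "__main__":
    for ram in (1024, 4096, 10240):
        print(ram, "MB RAM ->", torrc_line(ram))
```

 Under these assumptions a 4096 MB instance would get a setting of 1792 MB,
 well below what the 120-130% estimate would suggest.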

 > > the aforementioned incorrect advice was followed in #22255 and the
 operator continues to experience OOM failures
 >
 > Are you the operator?
 > Have they tried 0.3.2.8-rc and reopened another ticket?

 I am not the operator on that ticket.  It came up in a search, and it
 seems to me that the operator's MaxMemInQueues is too high relative to RAM.

 > The tor daemon will assert and exit if malloc returns NULL.

 Ah, well then vm.overcommit_memory=2 will cause the daemon to die sooner
 rather than later, instead of responding more gracefully, for example by
 killing one circuit.  That is still better than letting the Linux OOM
 killer choose a victim to kill.

 Alternatively, my advice for hardy souls willing to expend the effort:

 1) leave the default vm.overcommit_memory=0 in effect
 2) write a script to set /proc/<pid>/task/<tid>/oom_adj to -17 for every
 process in the system
 3) have a script set oom_adj=0 for a process you would rather have die
 than the tor daemon
 3b) if one sets -17 for every process, then Linux will suspend the memory
 requester until some memory becomes available; this could result in a
 hung system, a crashed system, or a semi-graceful recovery in the case
 where socket buffer memory is freed as queues drain
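
 Steps 2 and 3 could be sketched roughly as follows.  This is an
 illustration under stated assumptions, not a tested deployment script:
 the function names are hypothetical, it writes the per-process file
 /proc/<pid>/oom_adj rather than the per-task path, it needs root to
 have any effect, and on newer kernels oom_adj is deprecated in favour of
 oom_score_adj (where -1000 is the "never kill" value), so the filename
 and value are left as parameters.

```python
from pathlib import Path

OOM_DISABLE = -17  # oom_adj value that exempts a process from the OOM killer


def set_oom_adj(pid, value, proc_root="/proc", filename="oom_adj"):
    """Write value to <proc_root>/<pid>/<filename>; return True on success."""
    path = Path(proc_root) / str(pid) / filename
    try:
        path.write_text(f"{value}\n")
        return True
    except OSError:
        # Process exited in the meantime, or we lack permission.
        return False


def protect_all_except(victim_pids, proc_root="/proc", filename="oom_adj",
                       protect_value=OOM_DISABLE):
    """Exempt every process from the OOM killer except the chosen victims.

    victim_pids -- set of PIDs that should stay killable (oom_adj = 0)
    Returns a dict mapping each PID seen to whether the write succeeded.
    """
    results = {}
    for entry in sorted(Path(proc_root).iterdir()):
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        pid = int(entry.name)
        value = 0 if pid in victim_pids else protect_value
        results[pid] = set_oom_adj(pid, value, proc_root, filename)
    return results
```

 Calling protect_all_except({1234}) as root would leave only PID 1234 (a
 made-up example) eligible for the OOM killer.  Processes started later
 inherit the value from their parent, so a real script would need to be
 rerun or hooked into process startup.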

 Additionally, one should set vm.min_free_kbytes=131072 or even =262144.
 By default Linux sets this value so low that a sudden surge in arriving
 network traffic will use up all free memory so fast that the OOM killer
 and dirty-cache writes can't keep pace, and the system will oops (hard
 crash).
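
 A quick way to check whether a machine already meets this suggestion is
 to read the value back from /proc/sys.  The sketch below is only an
 illustration; the helper names are hypothetical and the threshold is the
 lower of the two values suggested above.

```python
from pathlib import Path

RECOMMENDED_MIN_FREE_KB = 131072  # lower of the two values suggested above


def read_sysctl_int(name, sys_root="/proc/sys"):
    """Read an integer sysctl such as 'vm.min_free_kbytes' via /proc/sys."""
    path = Path(sys_root) / name.replace(".", "/")
    return int(path.read_text().split()[0])


def min_free_kbytes_ok(sys_root="/proc/sys", threshold=RECOMMENDED_MIN_FREE_KB):
    """Return True if vm.min_free_kbytes is at least the threshold."""
    return read_sysctl_int("vm.min_free_kbytes", sys_root) >= threshold


if __name__ == "__main__":
    current = read_sysctl_int("vm.min_free_kbytes")
    status = "ok" if min_free_kbytes_ok() else "below the suggested value"
    print(f"vm.min_free_kbytes = {current} ({status})")
```

 To make a new value persistent, the usual place is a line such as
 vm.min_free_kbytes=131072 in /etc/sysctl.conf.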

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24737#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online