[tor-relays] Did 'Sandbox 1' break Tor for anyone else on

Peter Gerber tor-lists at arbitrary.ch
Tue Mar 16 11:09:54 UTC 2021

Hi William

William Kane:
> Hi Peter,
>> Would be great if you could get details about the failing call.
> I already thought of gathering said details by tracing the process,
> but did not want to risk my uptime statistics, which would inevitably
> suffer if I had to restart the server and service over and over (I
> disabled tracing globally through the Yama LSM as a security measure,
> i.e. kernel.yama.ptrace_scope == 3). Recently I lost the guard flag
> multiple times because of some sort of attack that I already reported
> on this list (tor-relays): someone kept creating a huge number of
> circuits through my relay (averaging 90k per minute), causing tor to
> run out of memory and get OOM-killed by the kernel before it could
> even step in and close the circuits, if it was even trying to. It
> would make sense if the DoS mitigation code were only active on the
> first hop of the circuit, i.e. the guard; my node being a middle-only
> relay, it got completely stomped by the attack.
> After partially mitigating this attack by tweaking MaxMemInQueues,
> creating a bigger swap file and tuning vm.swappiness, I regained the
> guard flag, but then the hypervisor my KVM box runs under experienced
> some issues and had to be rebooted. Once again, I received no notice
> until the relay had already been offline for a few days, which cost
> me the guard flag again.
> Luck just doesn't seem to be on my side these days, or rather, these past weeks.

You could simply run a second Tor instance by copying the systemd unit
and the Tor configuration. You probably don't need to enable ORPort or
ControlPort to reproduce the issue.

>> You should simply see a Permission Denied if the capability is the problem.
> Here's a copy of stdout; this only happens when Sandbox is set to 1:
> Mar 15 20:15:20.000 [notice] Configured to measure statistics. Look
> for the *-stats files that will first be written to the data directory
> in 24 hours from now.
> Mar 15 20:15:21.000 [warn] fstat() on directory /var/lib/tor_debug failed.
> Mar 15 20:15:21.000 [err] Can't create/check datadirectory /var/lib/tor_debug
> Mar 15 20:15:21.000 [err] Error initializing keys; exiting
> Running it as a privileged user does not change anything, so it's not a permissions issue:
> Mar 15 20:17:24.000 [notice] Configured to measure statistics. Look
> for the *-stats files that will first be written to the data directory
> in 24 hours from now.
> Mar 15 20:17:24.000 [warn] You are running Tor as root. You don't need
> to, and you probably shouldn't.
> Mar 15 20:17:25.000 [warn] fstat() on directory /var/lib/tor_debug failed.
> Mar 15 20:17:25.000 [err] Can't create/check datadirectory /var/lib/tor_debug
> Mar 15 20:17:25.000 [err] Error initializing keys; exiting
> I've traced down the origin of the fstat() call to this piece of code:
> https://github.com/torproject/tor/blob/master/src/lib/fs/dir.c#L158
> However, looking at the code that sets up and populates the seccomp
> rules, it seems that fstat and its 64-bit counterpart are not subject
> to (parameter) filtering: seccomp_rule_add_0 is invoked with
> SCMP_ACT_ALLOW, and the seccomp_rule_add(3) manpage says: "The
> seccomp filter will have no effect on the thread calling the syscall
> if it matches the filter rule."
> References:
> https://github.com/torproject/tor/blob/master/src/lib/sandbox/sandbox.c#L148
> https://github.com/torproject/tor/blob/master/src/lib/sandbox/sandbox.c#L1595
> https://man7.org/linux/man-pages/man3/seccomp_rule_add.3.html
> So even though seccomp should technically allow these syscalls no
> matter which parameters are passed, enabling the sandbox still somehow
> breaks fstat.
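Since the warning above doesn't include the errno, one way to rule permissions in or out is to repeat an open-then-fstat on the directory as the tor user, outside the sandbox, and print the errno name. A minimal Python sketch, assuming Python 3 is available on the relay; probe_dir is a hypothetical helper, and its open flags are an approximation, not copied from Tor's dir.c:

```python
import errno
import os
import stat
import sys


def probe_dir(path: str) -> str:
    """Open `path` as a directory, fstat() the descriptor, report the outcome.

    Returns "ok", "not a directory", or "<step>: <ERRNO NAME>" so the failing
    call and its errno are visible. Hypothetical helper: it only approximates
    an open-then-fstat check and is not Tor's code.
    """
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0) | getattr(os, "O_NOFOLLOW", 0)
    try:
        fd = os.open(path, flags)
    except OSError as e:
        return "open: " + errno.errorcode.get(e.errno, str(e.errno))
    try:
        st = os.fstat(fd)  # libc decides which stat-family syscall this becomes
    except OSError as e:
        return "fstat: " + errno.errorcode.get(e.errno, str(e.errno))
    finally:
        os.close(fd)
    return "ok" if stat.S_ISDIR(st.st_mode) else "not a directory"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/tor_debug"
    print(target, "->", probe_dir(target))
```

Run as the tor user (e.g. python3 probe_dir.py /var/lib/tor_debug), "ok" means the directory itself is fine, which points at the sandbox; "open: EACCES" or "fstat: EACCES" (i.e. Permission denied) would instead point at permissions.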

fstat() in the log above refers to the fstat() function in libc, but libc
can use any of several syscalls behind the scenes to implement it. I
could find fstat, fstat64 and fstatat64, and newer kernels may offer
even more syscalls that could be used. Usually, when seccomp starts
failing, it is because a library (like libc) was updated and started
using another syscall to implement a function (like fstat()), or
because the kernel was updated and the library detected this and
started using a new, "improved" syscall. To be sure which syscall is
used, the auditd logs would be invaluable. The performance impact
should be negligible if you don't manually add any auditing rules.
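With auditd running, seccomp denials that the kernel logs show up as type=SECCOMP records whose arch= and syscall= fields identify the blocked call. A minimal Python sketch for pulling them out of the raw audit log (not the interpreted output of ausearch -i); the number-to-name table is a hand-written assumption covering only stat-family syscalls on x86_64 and i386, so verify it with ausyscall or your kernel headers:

```python
import re
import sys
from typing import Dict, Iterable, List, Tuple

# Hand-written, partial table of stat-family syscall numbers keyed by the
# audit "arch" field (c000003e = x86_64, 40000003 = i386). Treat it as an
# assumption and verify it with `ausyscall` or your kernel headers.
STAT_SYSCALLS: Dict[Tuple[str, int], str] = {
    ("c000003e", 4): "stat",
    ("c000003e", 5): "fstat",
    ("c000003e", 6): "lstat",
    ("c000003e", 262): "newfstatat",
    ("c000003e", 332): "statx",
    ("40000003", 195): "stat64",
    ("40000003", 196): "lstat64",
    ("40000003", 197): "fstat64",
    ("40000003", 300): "fstatat64",
    ("40000003", 383): "statx",
}

# key=value pairs; a value is either "quoted" or runs up to the next space.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def seccomp_denials(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (comm, syscall name) for every type=SECCOMP record in a raw log."""
    found = []
    for line in lines:
        if "type=SECCOMP" not in line:
            continue
        fields = dict(_FIELD.findall(line))
        try:
            nr = int(fields.get("syscall", ""))
        except ValueError:
            continue  # no numeric syscall field (e.g. interpreted output)
        name = STAT_SYSCALLS.get((fields.get("arch", ""), nr), f"syscall {nr}")
        found.append((fields.get("comm", "?").strip('"'), name))
    return found


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/audit/audit.log"
    with open(path, errors="replace") as log:
        for comm, name in seccomp_denials(log):
            print(comm, name)
```

auditd usually writes to /var/log/audit/audit.log. Whether a given denial is recorded at all depends on the kernel version and on how the filter was installed, so an empty result does not prove that nothing was blocked.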
