[tor-bugs] #9299 [Tor]: Dump stack traces on assertion, crash, or general trouble

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Oct 14 16:09:25 UTC 2013


#9299: Dump stack traces on assertion, crash, or general trouble
-----------------------------+---------------------------------
     Reporter:  nickm        |      Owner:
         Type:  enhancement  |     Status:  needs_review
     Priority:  normal       |  Milestone:  Tor: 0.2.5.x-final
    Component:  Tor          |    Version:
   Resolution:               |   Keywords:  tor-relay debugging
Actual Points:               |  Parent ID:
       Points:               |
-----------------------------+---------------------------------

Comment (by nickm):

 Replying to [comment:5 lunar]:
 > Ok, here's a review. I hope I'm not wasting anyone's time.
 >
 > Interesting how the new `format_dec_number_sigsafe()` and
 `format_hex_number_sigsafe()` are slightly different given they do almost
 the same thing.

 Yeah; perhaps they should share more code.

 > {{{
 > +++ b/configure.ac
 >          crt_externs.h \
 > +       execinfo.h \
 >          grp.h \
 > }}}
 >
 > Small whitespace issue.

 Fixed

 > {{{
 > +    tor_assert(0 * 0 == 1);
 > }}}
 >
 > Leaving that in the history is probably bad for future bisection, no?

 That's what the later "fixup!" in the commit message for
 62645b5e18a8994c9257078f571d4227037a894c is for.  See the "--autosquash"
 argument for git rebase.

 > `tor_log_err_sigsafe_helper()` might be better named
 `tor_log_err_sigsafe_write()`

 Renaming.

 > {{{
 > +  if (log_time_granularity >= 2000) {
 > +    int g = log_time_granularity / 1000;
 > }}}
 >
 > A comment for the magic numbers? You previously had 15 minutes. Does
 this mean that the
 > `stack_dump` file will have second granularity by default? Is that a
 problem?

 It means that the file will have the same granularity as the rest of the
 system logs.


 > {{{
 > +tor_log_get_sigsafe_err_fds(const int **out)
 > +{
 > +  *out = sigsafe_log_fds;
 > +  return n_sigsafe_log_fds;
 > }}}
 >
 > Can't we have a race here if `tor_log_get_sigsafe_err_fds()` is called
 and then `tor_log_update_sigsafe_err_fds()`? Or do the usage of the first
 mandate `LOCK_LOGS`? Or are we sure there's no way this can happen?

 Hm. During a signal handler, it's not safe to grab a lock, so we might
 need to tolerate that, or use some kind of atomic set/fetch.

 > `fd7aa1af` commit message does not say anything about the
 `found_real_stderr` thing.

 Added in a squash! commit.

 > {{{
 > +    assert(n_sigsafe_log_fds >= 2);
 > }}}
 >
 > Using `assert()` because `tor_assert()` will not be happy with an
 unconfigured crash handler, right? Might worth a comment.

 Added a comment.  The other reason is that tor_assert logs, and logging
 inside log functions is potentially a recursive nightmare.

 > {{{
 > +#ifdef PC_FROM_UCONTEXT
 > +#if defined(__linux__)
 > +  const int n = 1;
 > +#elif defined(__darwin__) || defined(__APPLE__) || defined(__OpenBSD__)
 \
 > +  || defined(__FreeBSD__)
 > +  const int n = 2;
 > +#endif
 > }}}
 >
 > Maybe add an #else #error here ? The compiler should be unhappy because
 `n` will be uninitialized, but it might save a little bit of time for a
 future port.

 Added

 > {{{
 > +    puts("Argument should be \"assert\" or \"crash\"");
 > }}}
 >
 > “or none”

 Added

 > {{{
 > +  return crash(x) + crash(x*2);
 > }}}
 >
 > What's the point of testing the reader's knowledge about C evaluation
 order? :)

 Added a comment.  The actual point is to prevent the compiler from adding
 a tail-call optimization during the first call to crash().

 >
 >
 > Running the test program with assert has the logging function in the
 trace:
 >
 > {{{
 > $ ./test-bt-cl assert
 > Oct 11 22:55:39.705 [err] tor_assertion_failed_(): Bug:
 src/test/test_bt_cl.c:36: crash: Assertion 1 == 0 failed; aborting.
 > Oct 11 22:55:39.705 [err] Bug: Assertion 1 == 0 failed in crash at
 src/test/test_bt_cl.c:36. Stack trace:
 > Oct 11 22:55:39.705 [err] Bug:     ./test-bt-cl(log_backtrace+0x35)
 [0x7f3236a5e095]
 > Oct 11 22:55:39.705 [err] Bug:     ./test-bt-
 cl(tor_assertion_failed_+0x9f) [0x7f3236a68a8f]
 > Oct 11 22:55:39.705 [err] Bug:     ./test-bt-cl(crash+0x79)
 [0x7f3236a5de69]
 > […]
 > }}}
 >
 > Is that something we want to keep?

 I'd say, "yes".

 > For the other one:
 >
 > {{{
 > $ ./test-bt-cl crash
 > ============================================================
 T=1381524943
 > Tor  died: Caught signal 11
 > ./test-bt-cl(+0x8fc7)[0x7f156871afc7]
 > ./test-bt-cl(crash+0x48)[0x7f156871ae38]
 > ./test-bt-cl(crash+0x48)[0x7f156871ae38]
 > ./test-bt-cl(oh_what+0x25)[0x7f156871ae95]
 > }}}
 >
 > There's no symbol for the first function and `crash()` is there twice.
 Is that expected?

 Odd. This seems Linux-specific. Will investigate.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9299#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list