[Repost] Printing the Current Function Call Stack in a C/C++ Program

¶Background

June 11, 2011, 小武哥

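A few days ago I helped a colleague track down a program that kept exiting mysteriously without leaving a core dump (ulimit was set to allow one, of course). Normally, a program that dies from an abnormal condition produces a core dump, whereas a program that exits normally must have called exit() or a related function, directly or indirectly. That suggested an approach: at startup, register a callback with atexit(); when the program exits through exit(), the callback runs, and if it prints the current call stack we can see where exit() was called from, which solves the problem. This approach works very well for problems of this kind. Attentive readers will have noticed the phrase "print the current call stack in the callback"; explaining how to print the current call stack from inside a program is the main subject of this article.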
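The atexit() idea described above can be sketched roughly as follows. This is a minimal sketch, assuming glibc's execinfo.h is available; the helper names dump_stack_at_exit and install_exit_trace are made up for illustration and are not from the original program.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <unistd.h>    // STDERR_FILENO
#include <cstdlib>     // std::atexit

// Hypothetical handler name. The C runtime calls it while exit() is
// running, so the frames above it include whoever called exit().
extern "C" void dump_stack_at_exit()
{
    void *frames[32];
    int depth = backtrace(frames, 32);
    // Writes one line per frame straight to stderr; nothing to free.
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}

// Hypothetical helper name. Registers the handler and reports
// whether the registration succeeded.
bool install_exit_trace()
{
    return std::atexit(dump_stack_at_exit) == 0;
}
```

Calling install_exit_trace() at the top of main() is enough to get a trace on every exit() path. Handlers registered with atexit() do not run when the process ends through _exit(), abort(), or a fatal signal, so this only helps for the "normal exit" case discussed above.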

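I previously wrote an article titled 《介绍几个关于 C/C++ 程序调试的函数》 ("Introducing a Few Functions for Debugging C/C++ Programs"). Please read it first, because this article builds on it. The implementation uses the two functions backtrace() and backtrace_symbols(). Here is a simple example that we will use to walk through the method: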

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
 
void fun1();
void fun2();
void fun3();
 
void print_stacktrace();
 
int main()
{
    fun3();
}
 
void fun1()
{
    printf("stackstrace begin:\n");
    print_stacktrace();
}
 
void fun2()
{
    fun1();
}
 
void fun3()
{
    fun2();
}
 
void print_stacktrace()
{
    int size = 16;
    void * array[16];
    int stack_num = backtrace(array, size);
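    char ** stacktrace = backtrace_symbols(array, stack_num);
    if (stacktrace == NULL)
    {
        /* backtrace_symbols() returns NULL if it cannot allocate memory */
        perror("backtrace_symbols");
        return;
    }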
    for (int i = 0; i < stack_num; ++i)
    {
        printf("%s\n", stacktrace[i]);
    }
    free(stacktrace);
}

(Note: the environment used below is Ubuntu 11.04, x86_64, gcc-4.5.2.)

Step 1. Compile and run as follows:

wuzesheng@ubuntu:~/work/test$ gcc test.cc -o test1
wuzesheng@ubuntu:~/work/test$ ./test1
stackstrace begin:
./test1() [0x400645]
./test1() [0x400607]
./test1() [0x400612]
./test1() [0x40061d]
./test1() [0x4005ed]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x7f5c59a91eff]
./test1() [0x400529]

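The output does show the call stack, but only as hexadecimal addresses, which is a little unsatisfying. We could disassemble the binary to find the function for each address, but that is tedious. There is a better way; see step 2.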

Step 2. Compile and run as follows:

wuzesheng@ubuntu:~/work/test$ gcc test.cc -rdynamic -o test2
wuzesheng@ubuntu:~/work/test$ ./test2
stackstrace begin:
./test2(_Z16print_stacktracev+0x26) [0x4008e5]
./test2(_Z4fun1v+0x13) [0x4008a7]
./test2(_Z4fun2v+0x9) [0x4008b2]
./test2(_Z4fun3v+0x9) [0x4008bd]
./test2(main+0x9) [0x40088d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x7f9370186eff]
./test2() [0x4007c9]

Now the function names are finally visible. Comparing the build in step 2 with the one in step 1, the only difference is the extra -rdynamic option. Here is what it does (from the gcc manual):

-rdynamic
  Pass the flag -export-dynamic to the ELF linker, on targets that support it. This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. This option is needed for some uses of "dlopen" or to allow obtaining backtraces from within a program.

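As the description says, its main effect is to make the linker add all symbols to the dynamic symbol table, which is where backtrace_symbols() looks names up. One problem remains: the function names are mangled and have to be demangled before the original names are readable. Readers unfamiliar with C++ name mangling can look it up in a search engine; I won't go into it here. To demangle from the command line, use the c++filt command: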

wuzesheng@ubuntu:~/work/test$ c++filt <<< "_Z16print_stacktracev"
print_stacktrace()
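The same demangling can also be done inside the program instead of piping output through c++filt. The sketch below uses abi::__cxa_demangle from cxxabi.h, which GCC and Clang provide; the wrapper name demangle is a hypothetical helper, not something from the original article.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang C++ ABI support)
#include <cstdlib>   // std::free
#include <cstring>   // std::strncmp
#include <string>

// Hypothetical helper: returns the demangled form of a mangled C++ name,
// or the input unchanged if it is not mangled or cannot be decoded.
std::string demangle(const char *name)
{
    // Mangled C++ names start with "_Z"; leave everything else
    // (for example "main") untouched.
    if (std::strncmp(name, "_Z", 2) != 0)
        return name;

    int status = 0;
    char *out = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return name;

    std::string result(out);
    std::free(out);  // the returned buffer is malloc()-allocated
    return result;
}
```

Passing the symbol part of a backtrace_symbols() line, such as _Z4fun1v, yields fun1(), the same result c++filt gives.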

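At this point most of the work is done. But the same function may be called from many places in the code, and knowing only the function name, without the call site, is sometimes not enough. This can be solved too, with the addr2line command. Let's try it on the test2 binary built in step 2 (addr2line's -f option prints the function name, and -C demangles it):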

wuzesheng@ubuntu:~/work/test$ addr2line -a 0x4008a7 -e test2 -f
0x00000000004008a7
_Z4fun1v
??:0

Oh no, why is the location printed as ??:0 instead of a file and line? Don't worry; see step 3.

Step 3. Compile and run as follows:

wuzesheng@ubuntu:~/work/test$ gcc test.cc -rdynamic -g -o test3
wuzesheng@ubuntu:~/work/test$ ./test3
stackstrace begin:
./test3(_Z16print_stacktracev+0x26) [0x4008e5]
./test3(_Z4fun1v+0x13) [0x4008a7]
./test3(_Z4fun2v+0x9) [0x4008b2]
./test3(_Z4fun3v+0x9) [0x4008bd]
./test3(main+0x9) [0x40088d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x7fa9558c1eff]
./test3() [0x4007c9]
wuzesheng@ubuntu:~/work/test$ addr2line -a 0x4008a7 -e test3 -f -C
0x00000000004008a7
fun1()
/home/wuzesheng/work/test/test.cc:20

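This time we get not only the call stack but also each function's name and the source location of each call. Done. Note that step 3 adds the -g option compared with step 2. -g generates debugging information, and source locations are part of that information. Anyone who uses gdb regularly will be familiar with this option.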
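To avoid copying addresses by hand, each output line can be split into its parts and turned into an addr2line command. The following is a rough sketch that assumes the exact line shape shown in the output above, module(symbol+offset) [address]; FrameInfo, parse_frame and addr2line_command are hypothetical names introduced only for this illustration.

```cpp
#include <string>

// Hypothetical struct for the pieces of one backtrace_symbols() line.
struct FrameInfo {
    std::string module;   // e.g. "./test3"
    std::string symbol;   // e.g. "_Z4fun1v" (empty if none was printed)
    std::string offset;   // e.g. "0x13"     (empty if none was printed)
    std::string address;  // e.g. "0x4008a7"
};

// Parses "module(symbol+offset) [address]". Returns false if the line
// does not have that shape.
bool parse_frame(const std::string &line, FrameInfo &out)
{
    const std::string::size_type npos = std::string::npos;
    std::string::size_type lparen = line.find('(');
    if (lparen == npos) return false;
    std::string::size_type rparen = line.find(')', lparen);
    if (rparen == npos) return false;
    std::string::size_type lbracket = line.find('[', rparen);
    if (lbracket == npos) return false;
    std::string::size_type rbracket = line.find(']', lbracket);
    if (rbracket == npos) return false;

    out.module = line.substr(0, lparen);
    std::string inside = line.substr(lparen + 1, rparen - lparen - 1);
    std::string::size_type plus = inside.rfind('+');
    if (plus == npos) {
        out.symbol = inside;
        out.offset.clear();
    } else {
        out.symbol = inside.substr(0, plus);
        out.offset = inside.substr(plus + 1);
    }
    out.address = line.substr(lbracket + 1, rbracket - lbracket - 1);
    return true;
}

// Builds the same addr2line invocation used in the article for one frame.
std::string addr2line_command(const FrameInfo &frame)
{
    return "addr2line -a " + frame.address + " -e " + frame.module + " -f -C";
}
```

The generated command only resolves correctly when the printed address matches the address in the executable file, as it does in the non-relocated binaries shown here; a position-independent executable or shared library would need the load base subtracted first.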

¶Printing Call Stack Information in a C/C++ Program

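GDB's backtrace command shows the call stack, but often GDB simply isn't an option. A production environment may not have GDB installed, and even if it does, we are unlikely to be allowed to debug directly on it. It is far better if the program can print its own call stack. This article introduces several functions related to the call stack.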

NAME
    backtrace, backtrace_symbols, backtrace_symbols_fd - support for application self-debugging

SYNOPSIS
    #include <execinfo.h>
    int backtrace(void **buffer, int size);
    char **backtrace_symbols(void *const *buffer, int size);
    void backtrace_symbols_fd(void *const *buffer, int size, int fd);

The above is taken from the man page for these functions.

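A brief description of each function:
backtrace: captures the current call stack into buffer and returns the number of frames stored. The size parameter caps the depth, so at most size frames are captured.
backtrace_symbols: converts the addresses obtained by backtrace into strings and returns them as an array of character pointers. size limits how many entries are converted and is normally the return value of backtrace. The array is allocated with malloc, and the caller must free it.
backtrace_symbols_fd: much like backtrace_symbols, except that instead of returning the result to the caller it writes it to the file descriptor fd.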
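Because backtrace_symbols_fd writes directly to a file descriptor and does not allocate a result array, it is often the variant chosen for crash handlers. The sketch below shows one possible arrangement; dump_stack_to_fd, on_fatal_signal and install_crash_trace are hypothetical names, and calling backtrace from a signal handler is a best-effort technique rather than something guaranteed to be safe.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <unistd.h>    // STDERR_FILENO
#include <csignal>     // std::signal, std::raise, SIGSEGV

// Hypothetical helper: writes the current call stack to fd and
// returns the number of frames captured.
int dump_stack_to_fd(int fd)
{
    void *frames[64];
    int depth = backtrace(frames, 64);
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}

// Hypothetical handler: prints the stack, then restores the default
// action and re-raises so the usual core dump can still be produced.
extern "C" void on_fatal_signal(int signo)
{
    dump_stack_to_fd(STDERR_FILENO);
    std::signal(signo, SIG_DFL);
    std::raise(signo);
}

// Hypothetical helper: installs the handler for SIGSEGV only.
bool install_crash_trace()
{
    return std::signal(SIGSEGV, on_fatal_signal) != SIG_ERR;
}
```

dump_stack_to_fd(STDERR_FILENO) can also be called from ordinary code wherever a quick trace is wanted, with no cleanup afterwards.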

The man page gives a simple example. Let's take a look:

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define  SIZE 100 

void myfunc3(void)
{
    int j, nptrs;
    void *buffer[SIZE];
    char **strings;
    nptrs = backtrace(buffer, SIZE);
    printf("backtrace() returned %d addresses\n", nptrs);

    /* The call backtrace_symbols_fd(buffer, nptrs, STDOUT_FILENO)
     * would produce similar output to the following: */
    strings = backtrace_symbols(buffer, nptrs);
    if (strings == NULL)
    {
        perror("backtrace_symbols");
        exit(EXIT_FAILURE);
    } 
 
    for (j = 0; j < nptrs; j++)
    {
        printf("%s\n", strings[j]);
    }
    free(strings);
}

/* "static" means don't export the symbol... */
static void myfunc2(void)
{
    myfunc3();
}

void myfunc(int ncalls)
{
    if (ncalls > 1)
    {
        myfunc(ncalls - 1);
    }
    else
    {
        myfunc2();
    }
}

int main(int argc, char *argv[])
{
     
    if (argc != 2)
    {
        fprintf(stderr, "%s num-calls\n", argv[0]);
        exit(EXIT_FAILURE);  
    }
    myfunc(atoi(argv[1]));
    exit(EXIT_SUCCESS);
}

Compile:

# cc prog.c -o prog

Run:

# ./prog 0
backtrace() returned 6 addresses
./prog() [0x80485a3]
./prog() [0x8048630]
./prog() [0x8048653]
./prog() [0x80486a7]

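This does print the call stack, but only as hexadecimal function addresses, which is hard to read. A closer look at the man page shows that the fix is simple: add one option at compile time.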

Recompile:

# cc -rdynamic prog.c -o prog

The gcc manual explains the option:

-rdynamic
        Pass the flag -export-dynamic to the ELF linker, on targets that support it. This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. This option is needed for some uses of "dlopen" or to allow obtaining backtraces from within a program.

Run it again:

# ./prog 0
backtrace() returned 6 addresses
./prog(myfunc3+0x1f) [0x8048763]
./prog() [0x80487f0]
./prog(myfunc+0x21) [0x8048813]
./prog(main+0x52) [0x8048867]
/lib/libc.so.6(__libc_start_main+0xe6) [0xaf9cc6]
./prog() [0x80486b1]

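This time the function names are visible. The second frame still has no name because myfunc2 is declared static and is therefore not exported to the dynamic symbol table. Pretty cool, isn't it? Go ahead and wrap it into your own debugging code.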
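One way to wrap these calls for reuse is a small function that returns the frames as strings, so callers never have to remember the free(). This is only a sketch of one possible wrapper; the name capture_stack and its default depth of 64 are assumptions made for illustration.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc)
#include <cstdlib>     // std::free
#include <string>
#include <vector>

// Hypothetical helper: returns one string per stack frame, innermost
// first, capturing at most max_frames frames.
std::vector<std::string> capture_stack(int max_frames = 64)
{
    std::vector<std::string> lines;
    if (max_frames <= 0)
        return lines;

    std::vector<void *> frames(max_frames);
    int depth = backtrace(frames.data(), max_frames);

    char **symbols = backtrace_symbols(frames.data(), depth);
    if (symbols == nullptr)  // allocation failed; return an empty trace
        return lines;

    for (int i = 0; i < depth; ++i)
        lines.push_back(symbols[i]);

    std::free(symbols);  // a single free() releases the whole array
    return lines;
}
```

The returned lines have the same format as the output shown above, so a logging layer can print them as they are or post-process them. As before, building with -rdynamic is what makes function names appear.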