SystemTap

SystemTap is not part of the Linux kernel, so it has to adapt to kernel changes: sometimes the runtime and the code generator must be updated for new kernel releases. Also, Linux kernels in most distributions are stripped, which means that debug information in DWARF format and symbol tables are removed. SystemTap supports DWARF-less tracing, but its capabilities are very limited, so we need to provide DWARF information to it.

Many distributions ship debug information in separate packages: packages with the -debuginfo suffix on RPM-based distributions and packages with the -dbg suffix on Debian-based distributions. Their files originate from the same build as the binary (this is crucial for SystemTap because it verifies the build ID of the kernel), but instead of text and data sections they contain debug sections. For example, RHEL needs the kernel-devel, kernel-debuginfo and kernel-debuginfo-common packages to make SystemTap work. Recent SystemTap versions include the stap-prep tool, which automatically installs kernel debuginfo of the correct version from the appropriate repositories.

For vanilla kernels you will need to enable the CONFIG_DEBUG_INFO option so that debug information is linked with the kernel. You will also need to set CONFIG_KPROBES to allow SystemTap to patch kernel code, CONFIG_RELAY and CONFIG_DEBUG_FS to transfer information between buffers and the consumer, and CONFIG_MODULES with CONFIG_MODULE_UNLOAD to provide module facilities. You will also need an uncompressed vmlinux file and kernel sources located in /lib/modules/$(uname -r)/build/.
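The options listed above can be checked against a kernel config file before building. The following is a minimal sketch, not an official tool: the function name check_kernel_config is hypothetical, and the config path in the usage comment is an assumption that holds on many distributions but not all.

```shell
#!/bin/sh
# Options named in the text above; "=y" and "=m" both count as enabled.
REQUIRED_OPTS="CONFIG_DEBUG_INFO CONFIG_KPROBES CONFIG_RELAY CONFIG_DEBUG_FS CONFIG_MODULES CONFIG_MODULE_UNLOAD"

# check_kernel_config FILE (hypothetical helper):
# print every required option that is not enabled in FILE;
# return 0 if all of them are enabled, 1 otherwise.
check_kernel_config() {
    cfg=$1
    missing=0
    for opt in $REQUIRED_OPTS; do
        if ! grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return $missing
}

# Usage (assumed location of the running kernel's config):
#   check_kernel_config "/boot/config-$(uname -r)"
```

If the function prints nothing and returns 0, all six options are enabled in the given file. It does not check for the vmlinux file or the kernel sources.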

SystemTap doesn't have an in-kernel VM (unlike DTrace and KTap); instead it generates kernel module source written in C and then builds it, so you will also need a compiler toolchain (make, gcc and ld). Compilation takes five phases: parse; elaborate, in which tapsets and debuginfo are linked with the script; translate, in which C code is generated; compile; and run:

image:stapprocess
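Each phase can be inspected separately, because stap(1) accepts -p NUM to stop after pass NUM. Here is a small sketch built on that option; the helper names pass_name and stop_after_pass are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# pass_name N (hypothetical helper): print the name of pass N,
# in the order the phases are listed in the text above.
pass_name() {
    case $1 in
        1) echo parse ;;
        2) echo elaborate ;;
        3) echo translate ;;
        4) echo compile ;;
        5) echo run ;;
        *) echo unknown; return 1 ;;
    esac
}

# stop_after_pass N SCRIPT (hypothetical helper):
# run stap verbosely on SCRIPT, stopping after pass N.
stop_after_pass() {
    name=$(pass_name "$1") || return 1
    echo "stopping after pass $1 ($name)" >&2
    stap -v -p"$1" "$2"
}

# Usage:
#   stop_after_pass 2 /root/test.stp
```

Stopping early is useful when a script fails: it narrows the failure to parsing, symbol resolution, code generation or module compilation.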

SystemTap uses two sets of libraries during the compilation process to provide an API that does not depend on the kernel version. Tapsets are helpers written in the SystemTap language (though some parts may be written in C), and they are plugged in during the elaborate stage. The runtime is written in C and is used during the compile stage. Because preparing and compiling the source code is complex, SystemTap is slower than DTrace. To mitigate that, it can cache compiled modules or even use compile servers.

Unlike DTrace, SystemTap has several front-end tools with different capabilities:

Warning

If the stap parent process has exited, then killall -9 stap won't stop the stapio daemon. You have to signal it with SIGTERM: killall -15 stap

stap

Like many other scripting tools, SystemTap accepts a script either inline as a command-line option (-e) or as an external file.
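Both ways of passing a script can be sketched as follows. The file name hello.stp and the function run_both are hypothetical; the probes only print a line and exit, so they do not depend on kernel debuginfo.

```shell
#!/bin/sh
# Write a one-probe script to a file (hypothetical name in the temp directory).
script=${TMPDIR:-/tmp}/hello.stp
cat > "$script" <<'EOF'
probe begin {
    println("hello from a script file");
    exit();
}
EOF

# run_both (hypothetical helper): run one probe inline with -e,
# then run the script that was written to the file above.
run_both() {
    stap -e 'probe begin { println("hello from -e"); exit(); }'
    stap "$script"
}

# Usage (as root, with SystemTap installed):
#   run_both
```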

Here are some useful stap(1) options:

When SystemTap needs to resolve an address into a symbol (for example, an instruction pointer into the corresponding function name), it doesn't look into libraries or kernel modules by default. Here are some useful command-line options that enable that:
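As a hedged illustration, the wrapper below passes three such flags documented in stap(1): --ldd (symbol data for shared libraries used by probed processes), --all-modules (symbol data for all loaded kernel modules) and -d MODULE (symbol data for one named module or binary). The assumption is that these are the kind of options meant here; the names stap_sym_flags and stap_with_symbols are hypothetical.

```shell
#!/bin/sh
# Flags that make stap load symbol data beyond the kernel image itself.
stap_sym_flags="--ldd --all-modules"

# stap_with_symbols ARGS... (hypothetical helper):
# run stap with the flags above; the caller may add "-d MODULE"
# to load symbols for a single module or binary.
stap_with_symbols() {
    # Unquoted on purpose so the two flags are split into separate arguments.
    stap $stap_sym_flags "$@"
}

# Usage:
#   stap_with_symbols /root/test.stp
#   stap_with_symbols -d /lib64/libc.so.6 /root/test.stp   # path is an example only
```

Loading symbols for every module and library makes the generated module larger, so -d with a specific name is the lighter choice when only one binary matters.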

SystemTap example

Here is a sample SystemTap script:

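#!/usr/bin/stap

# Fires on entry to every write() system call
probe syscall.write
{
    # Report only calls made by the target process (the command given with -c)
    if (pid() == target())
        printf("Written %d bytes\n", $count);
}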

Save it to test.stp and run it like this:

root@host# stap /root/test.stp -c "dd if=/dev/zero of=/dev/null count=1"

Q: Run SystemTap with the following options: # stap -vv -k -p4 /root/test.stp , then find the generated directory in /tmp and look at the generated C source.

Q: Calculate the number of probes in the syscall provider and the number of variables provided by the syscall.write probe:

# stap -l 'syscall.*' | wc -l
# stap -L 'syscall.write'

References