Kernel Debugging Guide
This guide covers essential debugging techniques for Linux kernel development. Master these tools to diagnose kernel bugs efficiently.
Quick Reference
| Problem | Tool | Command |
|---|---|---|
| Kernel panic | dmesg | dmesg \| tail -50 |
| Module won’t load | dmesg | dmesg \| grep -i error |
| Function tracing | ftrace | echo function > current_tracer |
| Syscall tracing | strace | strace -e openat ./program |
| Performance issues | perf | perf top |
| Memory issues | KASAN | Enable CONFIG_KASAN |
| Step debugging | GDB+QEMU | See section below |
1. printk() - Your First Debugging Tool
Log Levels
printk(KERN_EMERG "Emergency: %s\n", msg); /* 0 - System unusable */
printk(KERN_ALERT "Alert: %s\n", msg); /* 1 - Action required */
printk(KERN_CRIT "Critical: %s\n", msg); /* 2 - Critical condition */
printk(KERN_ERR "Error: %s\n", msg); /* 3 - Error condition */
printk(KERN_WARNING "Warning: %s\n", msg); /* 4 - Warning */
printk(KERN_NOTICE "Notice: %s\n", msg); /* 5 - Normal but significant */
printk(KERN_INFO "Info: %s\n", msg); /* 6 - Informational */
printk(KERN_DEBUG "Debug: %s\n", msg); /* 7 - Debug messages */
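Whether a message also appears on the console depends on its level relative to the console log level, which is the first of the four numbers in /proc/sys/kernel/printk: only messages with a numerically lower level are printed there. The comparison can be sketched in shell as follows; the function name reaches_console is hypothetical, and the settings string is passed as an argument so the check can be tried without a live system.

```shell
# reaches_console LEVEL "PRINTK_SETTINGS"
#   LEVEL            printk level of the message (0-7)
#   PRINTK_SETTINGS  content of /proc/sys/kernel/printk; the first field
#                    is the console log level
# Succeeds when a message at LEVEL would be printed on the console.
# (Hypothetical helper name, for illustration only.)
reaches_console() (
    level=$1
    set -- $2                  # split the settings into fields
    console_loglevel=$1
    [ "$level" -lt "$console_loglevel" ]
)
```

On a running system, `reaches_console 7 "$(cat /proc/sys/kernel/printk)" || echo "KERN_DEBUG messages are not shown on the console"` reports whether debug output is hidden. Messages that do not reach the console are still stored in the ring buffer and visible through dmesg.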
Modern Macros (Preferred)
pr_info("Module loaded\n");
pr_err("Failed to allocate memory\n");
pr_debug("Debug: value = %d\n", val); /* Compiled out unless DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set */
/* Dynamic debug (CONFIG_DYNAMIC_DEBUG) - pr_debug() call sites can be switched on at runtime */
pr_debug("count = %d\n", count);
/* Enable: echo 'module mymod +p' > /sys/kernel/debug/dynamic_debug/control */
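The enable command above can be wrapped in a small helper so that enabling and disabling use the same call. This is a minimal sketch: the function name dyndbg_module is hypothetical, and the optional third argument (an alternative control file) exists only so the helper can be tried without root.

```shell
# dyndbg_module MODULE FLAGS [CONTROL_FILE]
# Writes a dynamic debug query of the form "module <name> <flags>".
# The default path assumes debugfs is mounted at /sys/kernel/debug and
# the kernel was built with CONFIG_DYNAMIC_DEBUG.
# (Hypothetical helper name, for illustration only.)
dyndbg_module() {
    echo "module $1 $2" > "${3:-/sys/kernel/debug/dynamic_debug/control}"
}
```

`dyndbg_module mymod +p` enables every pr_debug() call site in mymod and `dyndbg_module mymod -p` disables them again. Adding the f and l flags (`+pfl`) prefixes each message with the function name and line number.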
Useful Format Specifiers
pr_info("Pointer: %px\n", ptr); /* Raw (unhashed) address; plain %p prints a hashed value */
pr_info("Symbol: %pS\n", func); /* Kernel symbol name */
pr_info("dentry: %pd\n", dentry); /* Dentry name */
pr_info("task: %s\n", current->comm); /* Current process name */
pr_info("PID: %d\n", current->pid); /* Current PID */
Viewing Kernel Messages
# View kernel log
dmesg
dmesg | tail -20
dmesg -w # Follow (like tail -f)
dmesg -T # Human-readable timestamps
dmesg -l err,warn # Filter by level
dmesg -c # Print, then clear the ring buffer
# Persistent logs (systemd)
journalctl -k # Kernel messages
journalctl -k -f # Follow kernel log
2. GDB + QEMU Debugging
Setup
Step 1: Configure Kernel
# Essential config options for debugging
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF5=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
# Disable for reliable debugging
# CONFIG_RANDOMIZE_BASE is not set (Disable KASLR)
Step 2: Start QEMU
# Start with debug stub
qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage \
-initrd initramfs.cpio.gz \
-append "console=ttyS0 nokaslr" \
-nographic \
-s -S # -s = GDB on :1234, -S = pause at start
Step 3: Connect GDB
gdb vmlinux
(gdb) target remote :1234
(gdb) continue
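Typing the same connect commands for every boot gets tedious, so they can be kept in a GDB command file and loaded with gdb -x. A minimal sketch, assuming the port from Step 2; the function name write_gdb_script and the file name kernel.gdb are hypothetical.

```shell
# write_gdb_script OUT_FILE [PORT] [BREAKPOINT]
# Writes a GDB command file that attaches to the QEMU stub, sets one
# breakpoint, and resumes the paused guest.
# (Hypothetical helper name, for illustration only.)
write_gdb_script() {
    printf 'target remote :%s\nbreak %s\ncontinue\n' \
        "${2:-1234}" "${3:-start_kernel}" > "$1"
}
```

After `write_gdb_script kernel.gdb`, start the session with `gdb -x kernel.gdb vmlinux`. GDB then stops at start_kernel once the guest reaches it.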
Essential GDB Commands
# Breakpoints
break start_kernel # Break at function
break kernel/fork.c:1234 # Break at line
break copy_process if pid == 0 # Conditional breakpoint
delete 1 # Delete breakpoint 1
info breakpoints # List breakpoints
# Execution
continue (c) # Continue execution
next (n) # Step over
step (s) # Step into
finish # Run until function returns
# Inspection
print task->pid # Print variable
print/x addr # Print in hex
print *ptr # Dereference pointer
x/10x $rsp # Examine 10 hex words at RSP
x/s str # Examine string
# Stack
backtrace (bt) # Show call stack
frame 3 # Select frame 3
info locals # Show local variables
info args # Show function arguments
# Threads/CPUs
info threads # List CPUs/threads
thread 2 # Switch to thread 2 (GDB threads are 1-based, so this is QEMU vCPU 1)
# Kernel-specific
lx-dmesg # Show kernel log
lx-ps # Show processes
lx-lsmod # Show modules
lx-symbols # Reload symbols after module load
Debugging a Kernel Module
# 1. Load the module from a shell inside the QEMU guest (not the QEMU monitor)
insmod mymodule.ko
# 2. Find module load address
(gdb) lx-lsmod
# Or: cat /sys/module/mymodule/sections/.text
# 3. Add symbol file
(gdb) add-symbol-file mymodule.ko 0xffffffffa0000000
# 4. Set breakpoints
(gdb) break my_function
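Steps 2 and 3 can be combined by reading the section addresses from sysfs inside the guest and printing a ready-made add-symbol-file command to paste into GDB on the host. This is a sketch under assumptions: the function name module_symbol_cmd is hypothetical, it must run as root in the guest (otherwise the addresses read as zero), and the optional third argument only exists so the helper can be tried against a fake sysfs tree.

```shell
# module_symbol_cmd MODULE KO_PATH [SYSFS_MODULE_ROOT]
# Prints an add-symbol-file command with the .text address and, when
# present, the .data and .bss addresses of a loaded module.
# (Hypothetical helper name, for illustration only.)
module_symbol_cmd() (
    sections="${3:-/sys/module}/$1/sections"
    text=$(cat "$sections/.text") || exit 1
    cmd="add-symbol-file $2 $text"
    for sec in .data .bss; do
        if [ -r "$sections/$sec" ]; then
            cmd="$cmd -s $sec $(cat "$sections/$sec")"
        fi
    done
    echo "$cmd"
)
```

Run `module_symbol_cmd mymodule mymodule.ko` in the guest and paste its output at the (gdb) prompt. Passing the data sections as well lets GDB resolve the module's global variables, not only its functions.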
Common Scenarios
Debugging a Crash
# After panic, examine the crash
(gdb) bt
(gdb) info registers
(gdb) print $rip
(gdb) list *$rip
Finding Where Memory Corruption Occurred
# Set hardware watchpoint
(gdb) watch *(int*)0xffff88007c0d1234
(gdb) continue
# GDB stops when value changes
3. ftrace - Kernel Function Tracer
Basic Usage
cd /sys/kernel/debug/tracing
# Enable function tracer
echo function > current_tracer
echo 1 > tracing_on
cat trace_pipe
# Disable tracing
echo 0 > tracing_on
echo nop > current_tracer
Trace Specific Functions
# Filter functions
echo 'vfs_*' > set_ftrace_filter
echo function > current_tracer
# Exclude functions
echo '!vfs_read' >> set_ftrace_filter
# Clear filters
echo > set_ftrace_filter
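The filter, enable, and disable steps above can be tied together around a single workload. A minimal sketch, assuming the tracing directory used in this section; the function name trace_command and the TRACEFS override variable are hypothetical. Note that the function tracer records matching calls from every task on the system while it is on, not only from the command being run.

```shell
# trace_command FILTER COMMAND [ARGS...]
# Traces functions matching FILTER while COMMAND runs, prints the
# trace buffer, then restores the nop tracer and clears the filter.
# Set TRACEFS to use a different tracing directory.
# (Hypothetical helper and variable names, for illustration only.)
trace_command() (
    tracefs="${TRACEFS:-/sys/kernel/debug/tracing}"
    filter=$1
    shift
    echo "$filter" > "$tracefs/set_ftrace_filter"
    echo function > "$tracefs/current_tracer"
    echo 1 > "$tracefs/tracing_on"
    status=0
    "$@" || status=$?              # run the workload, remember its exit code
    echo 0 > "$tracefs/tracing_on"
    cat "$tracefs/trace"           # snapshot of the buffer (does not block)
    echo nop > "$tracefs/current_tracer"
    echo > "$tracefs/set_ftrace_filter"
    exit "$status"
)
```

Example, run as root: `trace_command 'vfs_*' cat /etc/hostname > vfs.trace`. The helper reads trace rather than trace_pipe because trace_pipe blocks waiting for new events.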
Function Graph Tracer
echo function_graph > current_tracer
# Output shows call graph with timing:
# 0) | vfs_read() {
# 0) 0.123 us | rw_verify_area();
# 0) | __vfs_read() {
# 0) 0.456 us | new_sync_read();
# 0) 0.789 us | }
# 0) 1.234 us | }
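With a long function_graph trace, the slowest leaf calls are usually the first thing to look for. The filter below is a sketch that assumes the default output layout shown above (duration and "us" to the left of the "|", function call to the right); the function name slowest_leaf_calls is hypothetical.

```shell
# slowest_leaf_calls [N] < trace
# Prints "duration function" for the N slowest leaf calls (default 10).
# Leaf calls are the lines ending in "();"; closing braces carry a
# duration but no function name, so they are skipped.
# (Hypothetical helper name, for illustration only.)
slowest_leaf_calls() {
    awk -F'|' '
        $2 ~ /\(\);[ \t]*$/ && $1 ~ / us/ {
            n = split($1, left, " ")
            dur = left[n - 1]               # the number just before "us"
            fn = $2
            gsub(/^[ \t]+|\(\);[ \t]*$/, "", fn)
            print dur, fn
        }' | sort -rn | head -n "${1:-10}"
}
```

Example: `slowest_leaf_calls 5 < /sys/kernel/debug/tracing/trace`. Durations are inclusive wall-clock times per call, so a leaf that sleeps will rank high even if it uses little CPU.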
Tracepoints
# List available tracepoints
cat available_events
# Enable syscall tracing
echo 1 > events/syscalls/sys_enter_openat/enable
cat trace_pipe
# Filter events (here: only record calls made by a given process)
echo 'comm == "myprogram"' > events/syscalls/sys_enter_openat/filter
4. perf - Performance Analysis
Basic Profiling
# Count events
perf stat ./program
perf stat -e cycles,instructions,cache-misses ./program
# Sample call stacks
perf record -g ./program
perf report
# Live top-like view
perf top
perf top -p <pid>
Kernel Profiling
# Profile kernel functions
sudo perf top -a
# Record kernel activity
sudo perf record -a -g sleep 10
sudo perf report
# Trace specific kernel function
sudo perf probe vfs_read
sudo perf record -e probe:vfs_read -a sleep 5
sudo perf probe -d vfs_read # Remove probe
Flamegraphs
# Generate data
sudo perf record -a -g sleep 30
sudo perf script > out.perf
# Create flamegraph (requires FlameGraph tools)
stackcollapse-perf.pl out.perf > out.folded
flamegraph.pl out.folded > flamegraph.svg
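The folded file is plain text, one line per unique stack in the form "frame;frame;...;leaf count", so a quick ranking of the hottest leaf functions is possible without opening the SVG. A minimal sketch that assumes this stackcollapse-perf.pl output format; the function name top_leaf_frames is hypothetical.

```shell
# top_leaf_frames [N] < out.folded
# Sums the sample counts per leaf frame (the function that was actually
# executing) and prints the N largest as "samples function" (default 10).
# (Hypothetical helper name, for illustration only.)
top_leaf_frames() {
    awk '{
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # strip the trailing count
        n = split(stack, frames, ";")
        leaf[frames[n]] += count
    }
    END { for (f in leaf) print leaf[f], f }' | sort -rn | head -n "${1:-10}"
}
```

Example: `top_leaf_frames 20 < out.folded`. This shows self time only; the flamegraph is still the better tool for seeing which callers lead to those functions.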
5. strace - System Call Tracing
Basic Usage
strace ./program
strace -e openat,read,write ./program # Filter syscalls
strace -f ./program # Follow forks
strace -c ./program # Summary statistics
strace -T ./program # Show time in syscall
strace -o trace.log ./program # Output to file
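The -T and -o options combine well: save a timed trace, then pick out the calls that took longest. The filter below is a sketch that relies on the `<seconds>` suffix that -T appends to each completed call; the function name slow_syscalls is hypothetical.

```shell
# slow_syscalls [MIN_SECONDS] < trace.log
# Prints the lines of an "strace -T" log whose time in the syscall is at
# least MIN_SECONDS (default 0.01), slowest first.
# (Hypothetical helper name, for illustration only.)
slow_syscalls() {
    awk -v min="${1:-0.01}" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            t = substr($0, RSTART + 1, RLENGTH - 2)   # digits between < and >
            if (t + 0 >= min + 0) printf "%s\t%s\n", t, $0
        }' | sort -rn | cut -f2-
}
```

Example: `strace -T -o trace.log ./program` followed by `slow_syscalls 0.1 < trace.log`. Lines without a timing suffix, such as unfinished calls and signal notices, are ignored.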
Tracing Running Process
strace -p <pid>
strace -p <pid> -e read,write
Useful Patterns
# What files does it open?
strace -e openat ./program 2>&1 | grep -v ENOENT
# What's it waiting on?
strace -e poll,select,epoll_wait ./program
# Network activity
strace -e socket,connect,sendto,recvfrom ./program
6. Memory Debugging
KASAN (Kernel Address Sanitizer)
Detects use-after-free and out-of-bounds memory accesses at runtime.
# Enable in config
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
# KASAN will print detailed reports on violations:
# BUG: KASAN: use-after-free in my_function+0x42/0x100
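When a test run produces many reports, a per-function summary helps decide where to start. The filter below is a sketch that keys on the "BUG: KASAN: <type> in <function>+<offset>" headline shown above; the function name kasan_summary is hypothetical, and the exact bug-type strings vary between kernel versions.

```shell
# kasan_summary < kernel.log
# Prints "count bug-type function" for each distinct KASAN headline,
# most frequent first.
# (Hypothetical helper name, for illustration only.)
kasan_summary() {
    sed -n 's/.*BUG: KASAN: \([^ ]*\) in \([^+ ]*\).*/\1 \2/p' |
        awk '{ n[$0]++ } END { for (k in n) print n[k], k }' |
        sort -rn
}
```

Example: `dmesg | kasan_summary`. The full report following each headline (access stack, allocation stack, free stack) is still needed to understand the individual bug.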
KMSAN (Kernel Memory Sanitizer)
Detects uninitialized memory reads.
CONFIG_KMSAN=y
KMEMLEAK
Detects memory leaks.
CONFIG_DEBUG_KMEMLEAK=y
# Scan for leaks
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
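Each suspected leak in the kmemleak file starts with a line of the form "unreferenced object 0x... (size N):", so grouping by size gives a quick picture of which allocation is leaking repeatedly. A minimal sketch; the function name kmemleak_by_size is hypothetical.

```shell
# kmemleak_by_size [FILE...]
# Reads kmemleak output from the given files (or stdin) and prints
# "count size" per object size, most frequent first.
# (Hypothetical helper name, for illustration only.)
kmemleak_by_size() {
    sed -n 's/^unreferenced object 0x[0-9a-f]* (size \([0-9]*\)):.*/\1/p' "$@" |
        awk '{ n[$1]++ } END { for (s in n) print n[s], s }' |
        sort -rn
}
```

Example, run as root: `kmemleak_by_size /sys/kernel/debug/kmemleak`. Many objects of the same size usually point at a single leaking call site, which the backtrace under any one of them identifies.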
SLUB Debugging
# Boot with debug options
slub_debug=FZPU
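# Per-cache debugging (writable only on older kernels; on recent kernels these
# attributes are read-only, so boot with e.g. slub_debug=F,<cache> instead)
echo 1 > /sys/kernel/slab/<cache>/sanity_checks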
7. Lockdep - Lock Debugging
Enable Lock Debugging
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
What Lockdep Detects
- Deadlocks (circular dependencies)
- Lock ordering violations
- Incorrect lock usage (e.g., taking a sleeping lock such as a mutex in atomic context)
- Recursive (double) acquisition of the same lock
Lockdep Output Example
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
(&mm->mmap_lock){++++}-{3:3}, at: do_page_fault+0x123/0x456
but task is already holding lock:
(&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: vfs_read+0x78/0x90
which lock already depends on the new lock.
8. Common Debugging Scenarios
Scenario 1: Module Won’t Load
# Check for errors
dmesg | tail -20
# Common issues:
# - Version mismatch: rebuild against running kernel
# - Symbol not found: check CONFIG options
# - Invalid module format: check architecture
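The version mismatch case can be checked before loading: the first word of a module's vermagic string is the kernel release it was built for. The comparison below is a sketch; the function name vermagic_matches is hypothetical, and a matching release is necessary but not sufficient, since the kernel also compares the remaining vermagic flags.

```shell
# vermagic_matches "VERMAGIC" "KERNEL_RELEASE"
# Succeeds when the release at the start of VERMAGIC equals KERNEL_RELEASE.
# (Hypothetical helper name, for illustration only.)
vermagic_matches() {
    [ "${1%% *}" = "$2" ]      # keep only the first word of vermagic
}
```

Example: `vermagic_matches "$(modinfo -F vermagic mymodule.ko)" "$(uname -r)" || echo "rebuild against the running kernel"`.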
Scenario 2: Kernel Panic
# Get panic info
dmesg | grep -A 20 "Kernel panic"
# Decode stack trace (if symbols available)
scripts/decode_stacktrace.sh vmlinux < panic.log
# With GDB
(gdb) list *0xffffffff81234567 # Address from trace
Scenario 3: Soft Lockup
# Soft lockup = a CPU ran in kernel mode without scheduling for too long (about 20 s by default)
# Look for "watchdog: BUG: soft lockup" in dmesg
# Often caused by:
# - Infinite loop
# - Spinlock held too long
# - Preemption disabled too long (interrupts disabled too long shows up as a hard lockup instead)
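The watchdog message names the CPU, the stall duration, and the task that was running, which is usually enough to decide where to look next. The filter below is a sketch that assumes the usual "soft lockup - CPU#N stuck for Ns! [comm:pid]" wording; the function name soft_lockups is hypothetical.

```shell
# soft_lockups [FILE...]
# Extracts CPU, duration, task name, and PID from soft lockup messages
# in the given files (or stdin).
# (Hypothetical helper name, for illustration only.)
soft_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\):\([0-9]*\)].*/cpu=\1 secs=\2 comm=\3 pid=\4/p' "$@"
}
```

Example: `dmesg | soft_lockups`. The same task or CPU appearing repeatedly suggests a loop in one code path rather than general overload.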
Scenario 4: Memory Corruption
# Enable debugging
CONFIG_SLUB_DEBUG=y # or CONFIG_DEBUG_SLAB=y on older kernels built with the SLAB allocator
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_KASAN=y
# Use GDB watchpoints
(gdb) watch *(int*)0xaddress
Scenario 5: Performance Issue
# Profile with perf
sudo perf record -a -g sleep 10
sudo perf report
# Check for lock contention
sudo perf lock record
sudo perf lock report
# Check last-level cache misses (a rough indicator of memory traffic)
sudo perf stat -e LLC-load-misses,LLC-store-misses ./program
9. Debugging Checklist
Before asking for help, verify:
- dmesg output checked for errors
- Module builds without warnings (make W=1)
- Correct kernel version (uname -r matches build)
- Debug config options enabled
- Latest kernel log captured
- Reproduction steps documented
Information to Include in Bug Reports
1. Kernel version: uname -a
2. Module/code version
3. Exact steps to reproduce
4. Expected vs actual behavior
5. Full dmesg output
6. Config options (if relevant)
7. Hardware info (if relevant)
10. Additional Resources
Books
- Linux Kernel Debugging by Kaiwan N. Billimoria
- Linux Device Drivers, 3rd Ed - Chapter 4 (Debugging)
Tools
- crash utility - Kernel crash dump analyzer
- drgn - Programmable debugger
- bpftrace - Dynamic tracing
Quick Troubleshooting
| Symptom | Likely Cause | Debug Steps |
|---|---|---|
| Instant reboot | Kernel panic early | Boot with earlycon, check serial |
| System hang | Deadlock or infinite loop | Enable lockdep, use NMI watchdog |
| Random crashes | Memory corruption | Enable KASAN, SLUB debug |
| Slow performance | Lock contention | Use perf lock, check /proc/lock_stat |
| Module fails to load | Symbol/version mismatch | Check dmesg, rebuild module |