Thursday, March 12, 2015

Analyze Linux kernel crashes using kdump

A kernel panic is the action the operating system takes when a fatal error occurs in the kernel. Hardware failures and software bugs are the main causes of such errors. Several tools and mechanisms exist for seeing what happened inside the kernel and for debugging or analyzing these panics and crashes; kdump is one of them.

kdump saves the contents of the machine's memory at the time of a kernel crash to a specified location, either on a local disk or on a machine connected over the network. It is based on kexec, which can boot another kernel without rebooting the computer and without going through the BIOS. When a kernel panic or crash occurs, the kernel crash handler uses kexec to boot the dump-capture kernel, which gives access to the memory image of the crashed system kernel through /proc/vmcore so that it can be saved.

To use this,

1. Install kexec-tools
$ wget http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
$ tar xvpzf kexec-tools.tar.gz
$ ls -d kexec-tools-* #to check the VERSION number 
$ cd kexec-tools-VERSION
$ ./configure
$ make
$ sudo make install

2. Dump-capture kernel
There are two methods of using kdump.
a) Build a separate custom dump-capture kernel
b) Use system kernel binary itself as dump-capture-kernel
Most distributions already provide kdump packages. However, to build the system kernel or the dump-capture kernel yourself, the following options should be enabled in the kernel config file.

CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
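
Before rebuilding anything, it can help to check whether an existing kernel already has these options set. The following is a minimal sketch, not part of kdump itself: check_kdump_config is a hypothetical helper name, and the config path shown in the comment is an assumption (many distributions install the config as /boot/config-<release>, but not all do).

```shell
# check_kdump_config FILE
# Report whether each option kdump needs is set to "y" in a plain-text
# kernel config FILE. Returns non-zero if any option is missing.
# (Hypothetical helper written for this post.)
check_kdump_config() {
    config_file="$1"
    status=0
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            status=1
        fi
    done
    return "$status"
}

# Example call (the path is an assumption and varies by distribution):
#   check_kdump_config "/boot/config-$(uname -r)"
```

An option that appears as "# CONFIG_CRASH_DUMP is not set" is reported as missing, since only an exact "=y" line counts.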

3. Setup kdump kernel
Some memory must be reserved for the dump-capture kernel. To do that, edit the boot loader configuration file and add crashkernel=512M-2G:64M,2G-:128M to the system kernel's command line (written without spaces, because spaces separate kernel parameters). This means: if RAM is below 512 MB, reserve nothing; if RAM is between 512 MB and 2 GB, reserve 64 MB; and if RAM is 2 GB or more, reserve 128 MB for the dump-capture kernel.

a) syslinux (/boot/syslinux/syslinux.cfg)
APPEND root=/dev/sdx crashkernel=512M-2G:64M,2G-:128M

b) grub (/boot/grub/grub.conf)
linux /vmlinuz-3.x.x.x ro root=/dev/sdx crashkernel=512M-2G:64M,2G-:128M
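
To make the range syntax concrete, here is a small sketch that works out which reservation a crashkernel value selects for a given amount of RAM. It only illustrates the rule described above and is not how the kernel itself is implemented: to_mb and crashkernel_reserve are hypothetical helper names, only the M and G suffixes are handled, and the value is passed without spaces.

```shell
# to_mb SIZE
# Convert a size such as 512M or 2G to megabytes (M and G suffixes only).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# crashkernel_reserve SPEC RAM_MB
# Print the reservation in MB that SPEC (e.g. 512M-2G:64M,2G-:128M)
# selects for a machine with RAM_MB megabytes, or 0 if no range matches.
# A range matches when start <= RAM < end; a missing end means "no limit".
crashkernel_reserve() {
    spec="$1"
    ram_mb="$2"
    old_ifs="$IFS"
    IFS=','
    set -- $spec          # split SPEC into range:size entries
    IFS="$old_ifs"
    for entry in "$@"; do
        range="${entry%%:*}"
        size="${entry##*:}"
        start="${range%%-*}"
        end="${range#*-}"
        start_mb=$(to_mb "$start") || return 1
        [ "$ram_mb" -ge "$start_mb" ] || continue
        if [ -n "$end" ]; then
            end_mb=$(to_mb "$end") || return 1
            [ "$ram_mb" -lt "$end_mb" ] || continue
        fi
        to_mb "$size"
        return
    done
    echo 0
}

# Example calls:
#   crashkernel_reserve '512M-2G:64M,2G-:128M' 256    # prints 0
#   crashkernel_reserve '512M-2G:64M,2G-:128M' 1024   # prints 64
#   crashkernel_reserve '512M-2G:64M,2G-:128M' 4096   # prints 128
```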

4. Configure the kdump.conf file and enable the kdump service
The /etc/kdump.conf file describes its own options in comments, so kdump can be configured to suit your needs. Enable the kdump service and reboot the system.

$ sudo systemctl enable kdump.service

5. Test kdump
To check whether the kdump service is running,

$ systemctl status kdump.service
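
Besides the service status, the kernel reports through sysfs whether a dump-capture kernel is actually loaded and how much memory is reserved for it. The sketch below reads /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size; kdump_ready is a hypothetical helper name, and its optional directory argument exists only so the function can be tried against a copy of those files.

```shell
# kdump_ready [SYSFS_DIR]
# Print whether a crash kernel is loaded and how many bytes are reserved.
# Returns zero only if a crash kernel is loaded and memory is reserved.
# SYSFS_DIR defaults to /sys/kernel. (Hypothetical helper for this post.)
kdump_ready() {
    sysfs_dir="${1:-/sys/kernel}"
    loaded=$(cat "$sysfs_dir/kexec_crash_loaded" 2>/dev/null || echo 0)
    size=$(cat "$sysfs_dir/kexec_crash_size" 2>/dev/null || echo 0)
    echo "crash kernel loaded: $loaded"
    echo "reserved bytes: $size"
    [ "$loaded" = "1" ] && [ "$size" -gt 0 ]
}

# Example call:
#   kdump_ready && echo "kdump is armed"
```

If the reserved size is 0, the crashkernel parameter was probably not picked up at boot, which can be confirmed by looking at /proc/cmdline.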

The Linux kernel has a feature that crashes the kernel immediately for testing purposes. To use it, switch to the root user and run:

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

After kdump has finished, the crash can be analyzed from the captured dump file with a suitable tool.
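
One such tool is the crash utility, which opens a dump together with a vmlinux that has debug symbols for the crashed kernel. As a rough sketch, the helper below finds the newest dump to pass to it. latest_vmcore is a hypothetical name, the /var/crash default and the one-subdirectory-per-dump layout are assumptions (use the path set in /etc/kdump.conf), and /path/to/vmlinux is a placeholder.

```shell
# latest_vmcore [DIR]
# Print the most recently modified DIR/*/vmcore file, or nothing if none
# exists. DIR defaults to /var/crash (an assumed dump location).
latest_vmcore() {
    dump_dir="${1:-/var/crash}"
    # ls -t sorts by modification time, newest first
    ls -t "$dump_dir"/*/vmcore 2>/dev/null | head -n 1
}

# Example (not run here; /path/to/vmlinux is a placeholder for a vmlinux
# with debug symbols that matches the crashed kernel):
#   crash /path/to/vmlinux "$(latest_vmcore)"
```

Inside crash, commands such as bt (backtrace of the panicking task) and log (the kernel message buffer) are usually a good starting point.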
