kdump is a very good tool for debugging complex systems such as a cloud data center, because it makes it easy to save the kernel state when the kernel crashes.
Running kdump on ARM64 is just like running it on x86, but I met a strange problem on the ARM64 platform.
The kernel is the HWE kernel, currently version 4.18.0-25, and with it you might encounter the following problems:
$ crash /usr/lib/debug/boot/vmlinux-4.18.0-25-generic 201907111618/dump.201907111618
crash: cannot determine page size
or, after adding the page size parameter:
$ crash /usr/lib/debug/boot/vmlinux-4.18.0-25-generic 201907111618/dump.201907111618 -p 65536
please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory
please wait... (gathering task table data)
WARNING: duplicate idle tasks?
please wait... (determining panic task)
crash: invalid kernel virtual address: ffff0000100d0000 type: "64-bit KVADDR"
If you have a problem like this, please update the crash version on your Ubuntu system.
In my opinion, the crash shipped with Ubuntu 18.04 only supports kernel 4.15, not the HWE kernel or anything later.
Still, I suggest using the HWE kernel or later for better hardware compatibility.
Download the latest crash source and compile it:
vim /etc/apt/sources.list # remove the leading "#" from every "# deb-src" line
apt update
apt-get build-dep linux-crashdump
git clone https://github.com/crash-utility/crash.git
cd crash
make
You can then run the new crash build by giving its absolute path, for example:
/root/crash/crash /usr/lib/debug/boot/vmlinux-5.0.0-20-generic 201907171611/dump.201907171611
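The decision described above (stock crash for 4.15, a self-built crash for anything newer) can be sketched as a small shell helper. This is only a sketch: the function name needs_newer_crash is made up for illustration, the 4.15 threshold is simply the observation from this article, and it relies on GNU sort -V for version comparison.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the given kernel release is
# newer than 4.15, the newest kernel the stock Ubuntu 18.04 crash handled
# in the author's experience.
needs_newer_crash() {
    rel=$1
    base=${rel%%-*}                                   # 4.18.0-25-generic -> 4.18.0
    majmin=$(printf '%s\n' "$base" | cut -d. -f1,2)   # 4.18.0 -> 4.18
    # Same series as the stock kernel: the packaged crash should be enough.
    [ "$majmin" = "4.15" ] && return 1
    # GNU sort -V orders version strings; the last line is the newest one.
    newest=$(printf '%s\n%s\n' "$majmin" "4.15" | sort -V | tail -n 1)
    [ "$newest" = "$majmin" ]
}

# Example with the HWE kernel release from this article.
needs_newer_crash 4.18.0-25-generic && echo "4.18.0-25-generic: build a newer crash"
```

On a running system the same check could be made with needs_newer_crash "$(uname -r)".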
Prepare QEMU Image and Run It
Using an Ubuntu cloud image as the QEMU image is fast and easy. First download the Ubuntu 18.04 cloud image:
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-arm64.img
It can also be downloaded from a local mirror site, such as the Taiwan mirror server:
wget http://ftp.yzu.edu.tw/Linux/ubuntu-cloud-images/bionic/current/bionic-server-cloudimg-arm64.img
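Because the image may come from a mirror, it can be worth comparing its checksum before booting it. The sketch below assumes that a SHA256SUMS file is published in the same directory as the image and has been downloaded next to it; that file name and the function name verify_image are assumptions, not something stated in this article.

```shell
#!/bin/sh
# Hypothetical helper: compare the SHA-256 of an image with the entry for the
# same file name in a checksum list.
# Exit status: 0 = match, 1 = mismatch, 2 = file name not listed.
verify_image() {
    image=$1
    sums=$2
    name=$(basename "$image")
    # Each line is "<hash> <name>" or "<hash> *<name>"; strip the optional "*".
    expected=$(awk -v f="$name" '{ n = $2; sub(/^\*/, "", n); if (n == f) print $1 }' "$sums")
    [ -n "$expected" ] || return 2
    actual=$(sha256sum "$image" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}

# Example (assumes both files are in the current directory):
#   verify_image bionic-server-cloudimg-arm64.img SHA256SUMS && echo "image OK"
```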
After the download finishes, change the cloud image's default password so that you can log in.
Please refer to another article, Change Ubuntu Cloud Image Password, to change it.
Run QEMU With Cloud Image
Before running the QEMU command, install the necessary packages:
sudo apt install -y qemu-efi qemu bridge-utils
Allow the bridge devices in the QEMU bridge configuration:
mkdir -p /etc/qemu
echo "allow br0" > /etc/qemu/bridge.conf
echo "allow virbr0" >> /etc/qemu/bridge.conf
Create the UEFI firmware image and the flash images for QEMU:
dd if=/dev/zero of=flash0.img bs=1M count=64
dd if=/usr/share/qemu-efi/QEMU_EFI.fd of=flash0.img conv=notrunc
dd if=/dev/zero of=flash1.img bs=1M count=64
Configure your main LAN interface as a bridge device. Modify the settings below to fit your network environment:
ORIGNIC=enP6p1s0
ip addr flush $ORIGNIC
brctl addbr br0
brctl addif br0 $ORIGNIC
ifconfig br0 up
ifconfig br0 192.168.1.100 netmask 255.255.255.0
route add default gw 192.168.1.1
echo nameserver 8.8.8.8 >> /etc/resolv.conf
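The brctl, ifconfig and route commands above come from the older bridge-utils and net-tools packages. As a rough sketch, the same bridge setup can be expressed with iproute2. The helper below only prints the commands so they can be reviewed before being run as root; the function name print_bridge_cmds is hypothetical, and the /24 prefix is the CIDR form of netmask 255.255.255.0.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print iproute2 equivalents of the bridge setup.
# Arguments: <nic> <address/prefix> <gateway> [bridge name, default br0]
print_bridge_cmds() {
    nic=$1
    cidr=$2
    gw=$3
    bridge=${4:-br0}
    cat <<EOF
ip addr flush dev $nic
ip link add name $bridge type bridge
ip link set dev $nic master $bridge
ip link set dev $bridge up
ip addr add $cidr dev $bridge
ip route add default via $gw
EOF
}

# Same values as in the commands above.
print_bridge_cmds enP6p1s0 192.168.1.100/24 192.168.1.1
```

Once the printed commands look right for your network, the output can be piped to a root shell.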
Save the text below into a shell script and run it.
SERVERFILE=bionic-server-cloudimg-arm64.img
sudo qemu-system-aarch64 -name kdump-vm \
  -machine virt,gic_version=3,accel=kvm,usb=off \
  -cpu host -m 8192 \
  -smp 8,sockets=1,cores=8,threads=1 \
  -nographic -nodefaults \
  -drive if=pflash,format=raw,readonly,file=flash0.img \
  -drive if=pflash,format=raw,file=flash1.img \
  -drive file=$SERVERFILE,if=none,id=disk1 \
  -device virtio-blk-device,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \
  -netdev tap,id=net0,ifname=tap0 \
  -device virtio-net-device,netdev=net0 \
  -serial telnet::9001,server,nowait > vm_log.txt 2>&1 &
sleep 5
brctl addif br0 tap0
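The script waits a fixed 5 seconds before adding tap0 to the bridge. A possible alternative, sketched below, is to poll until the tap device actually shows up. The helper name wait_for_path is hypothetical; it relies on Linux exposing each network interface as a directory under /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists, polling once per second.
# Arguments: <path> [timeout in seconds, default 10]
# Exit status: 0 when the path exists, 1 when the timeout is reached.
wait_for_path() {
    path=$1
    timeout=${2:-10}
    elapsed=0
    while [ ! -e "$path" ]; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example replacement for "sleep 5" in the script above:
#   wait_for_path /sys/class/net/tap0 10 && brctl addif br0 tap0
```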
Use the command "telnet localhost 9001" to log in to the QEMU guest.
If that fails, check vm_log.txt for the reason.
If the VM shows only a blank screen, check whether qemu-efi was copied correctly into flash0.img.
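The flash0.img check mentioned above can be sketched as a small helper that confirms the image is 64 MiB and that it starts with the firmware file byte for byte. The function name check_flash is hypothetical, and the -n (byte limit) option of cmp is assumed to be the GNU diffutils one.

```shell
#!/bin/sh
# Hypothetical helper: verify that a pflash image was prepared correctly.
# Arguments: <firmware file> <flash image>
check_flash() {
    fw=$1
    flash=$2
    fw_size=$(( $(wc -c < "$fw") ))
    flash_size=$(( $(wc -c < "$flash") ))
    # The dd commands in this article create a 64 MiB image.
    if [ "$flash_size" -ne $((64 * 1024 * 1024)) ]; then
        echo "unexpected flash size: $flash_size bytes" >&2
        return 1
    fi
    # Compare only the first fw_size bytes; the rest of the image is padding.
    cmp -s -n "$fw_size" "$fw" "$flash"
}

# Example:
#   check_flash /usr/share/qemu-efi/QEMU_EFI.fd flash0.img || echo "recreate flash0.img"
```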
Running Kdump in VM
You can install openssh-server in the VM, which makes it easier to run commands remotely. With telnet in particular, vim usually cannot detect the right terminal window size and cannot wrap words correctly.
Install kdump (this is only for ARM64 QEMU; if you use the HWE kernel, you need to download the latest version of linux-crashdump):
sudo apt install linux-crashdump
For more information about installing kdump, please refer to the article Kernel Crash Dump.
I suggest increasing the memory reserved for the kdump kernel (the crashkernel parameter) to 512M instead of using the default size from Ubuntu's linux-crashdump. Edit /etc/default/grub.d/kdump-tools.cfg, find the range that matches your RAM size, and change its value to 512M. For example, my VM has 8 GB of memory, so I changed the 4G-32G entry to 4G-32G:512M:
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"
After changing the kernel parameter, update GRUB so that the new parameter is picked up on the next boot:
update-grub2
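To see which entry of the crashkernel list applies to a given machine, the range lookup can be sketched in shell. The function name crashkernel_size_for is hypothetical, and the sketch only understands range bounds written in whole gigabytes with a G suffix, as in the line above. It treats the start of each range as inclusive and the end as exclusive, which is how the kernel documentation describes the syntax.

```shell
#!/bin/sh
# Hypothetical helper: print the reservation that a crashkernel range list
# selects for a machine with the given amount of RAM.
# Arguments: <RAM in GiB> <range list such as 2G-4G:320M,4G-32G:512M>
crashkernel_size_for() {
    mem_gib=$1
    spec=$2
    old_ifs=$IFS
    IFS=,
    for entry in $spec; do
        range=${entry%%:*}      # e.g. 4G-32G
        size=${entry#*:}        # e.g. 512M
        start=${range%%-*}
        end=${range#*-}         # empty for an open-ended range such as 128G-
        start=${start%G}
        end=${end%G}
        if [ "$mem_gib" -ge "$start" ] && { [ -z "$end" ] || [ "$mem_gib" -lt "$end" ]; }; then
            IFS=$old_ifs
            printf '%s\n' "$size"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                    # no range matched (for example, less than 2G)
}

spec="2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"
crashkernel_size_for 8 "$spec"    # prints 512M
```

After rebooting, the value actually in use can be confirmed by looking for crashkernel= in /proc/cmdline.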
Install Kernel Debug Symbol
Install GPG Key
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622
Add repository config
codename=$(lsb_release -c | awk '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list <<EOF
deb http://ddebs.ubuntu.com/ ${codename} main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF
Update the package lists and install the symbols (ref. Installing Ubuntu Kernel Debugging Symbols):
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym
Trigger Kdump
Use the command
kdump-config show
to check the current kdump status.
$ kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0xdfe00000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.15.0-54-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-54-generic
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.15.0-54-generic root=UUID=6242429e-2876-4d52-a590-99654a6a91ac ro quiet splash vt.handoff=1 nr_cpus=1 systemd.unit=kdump-tools-dump.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
If everything is OK, the state should be: ready to kdump.
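For scripts, the same check can be automated by looking for the "current state" line in the output. The helper below is a minimal sketch that reads the output of kdump-config show on standard input; the name kdump_ready is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the text on stdin contains a
# "current state:" line that reports "ready to kdump".
kdump_ready() {
    grep -q '^current state:[[:space:]]*ready to kdump'
}

# Only query kdump-config when it is installed.
if command -v kdump-config >/dev/null 2>&1; then
    if kdump-config show 2>/dev/null | kdump_ready; then
        echo "kdump is armed"
    else
        echo "kdump is not ready; check the crashkernel parameter and reboot"
    fi
fi
```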
Trigger Kdump
It is easy to trigger kdump via SysRq. I suggest triggering it from the telnet console, because it shows the dump and reboot messages, which makes it easier to confirm that you got the right result.
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
The first command configures sysrq to accept all triggers.
The second command triggers a kernel crash.
$ echo c > /proc/sysrq-trigger
[ 31.855103] sysrq: SysRq : Trigger a crash
[ 31.861713] Internal error: Accessing user space memory outside uaccess.h routines: 96000044 [#1] SMP
[ 31.869136] Modules linked in: nls_iso8859_1 sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear aes_ce_blk aes_ce_cipher crc32_ce
crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net virtio_blk aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 31.879776] CPU: 1 PID: 1385 Comm: bash Not tainted 4.15.0-54-generic #58-Ubuntu
[ 31.882430] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 31.884892] pstate: 00400005 (nzcv daif +PAN -UAO)
[ 31.886641] pc : sysrq_handle_crash+0x24/0x30
[ 31.888198] lr : __handle_sysrq+0xbc/0x1c0
[ 31.889627] sp : ffff00000e513d50
[ 31.890835] x29: ffff00000e513d50 x28: ffff8001f2e93c00
[ 31.892723] x27: ffff000008b32000 x26: 0000000000000040
[ 31.894639] x25: 0000000000000124 x24: ffff0000095ee000
[ 31.896524] x23: 0000000000000004 x22: 0000000000000002
[ 31.898433] x21: 0000000000000063 x20: ffff000009550000
[ 31.900309] x19: ffff0000095ee4e0 x18: ffffffffffffffff
[ 31.902229] x17: 0000000000000000 x16: 0000000000000000
[ 31.904127] x15: ffff000009528c08 x14: ffff0000896d687f
[ 31.906046] x13: ffff0000096d688d x12: ffff00000954f000
[ 31.907954] x11: ffff000009529660 x10: ffff000008708540
[ 31.909848] x9 : 00000000ffffffd0 x8 : 0000000000000017
[ 31.911801] x7 : 6767697254203a20 x6 : ffff8001ffdc82e8
[ 31.913655] x5 : ffff8001ffdc82e8 x4 : 0000000000000000
[ 31.915569] x3 : ffff8001ffdd06c8 x2 : 068be67f3165eb00
[ 31.917430] x1 : 0000000000000000 x0 : 0000000000000001
[ 31.919312] Process bash (pid: 1385, stack limit = 0x (ptrval))
[ 31.921599] Call trace:
[ 31.922506] sysrq_handle_crash+0x24/0x30
[ 31.923964] __handle_sysrq+0xbc/0x1c0
[ 31.925311] write_sysrq_trigger+0xd8/0x120
[ 31.926851] proc_reg_write+0x80/0xc0
[ 31.928178] __vfs_write+0x48/0x80
[ 31.929399] vfs_write+0xac/0x1b0
[ 31.930618] SyS_write+0x6c/0xd8
[ 31.931777] el0_svc_naked+0x30/0x34
[ 31.933062] Code: 52800020 b90ca020 d5033e9f d2800001 (39000020)
[ 31.935269] SMP: stopping secondary CPUs
[ 31.937745] Starting crashdump kernel…
[ 31.939108] Bye!
After kdump saves the current state and the system reboots, the crash dump is stored under /var/crash. Run the crash command below to see output like this:
$ crash --mod /usr/lib/debug/lib/modules/4.15.0-54-generic/ /usr/lib/debug/boot/vmlinux-4.15.0-54-generic 201907150556/dump.201907150556
crash 7.2.1
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter “help copying” to see the conditions.
This program has absolutely no warranty. Enter “help warranty” for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “aarch64-unknown-linux-gnu”…
KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-54-generic
DUMPFILE: 201907150556/dump.201907150556 [PARTIAL DUMP]
CPUS: 12
DATE: Mon Jul 15 05:56:07 2019
UPTIME: 2135039823346 days, 00:15:35
LOAD AVERAGE: 0.41, 0.32, 0.13
TASKS: 202
NODENAME: ubuntu
RELEASE: 4.15.0-54-generic
VERSION: #58-Ubuntu SMP Mon Jun 24 10:56:40 UTC 2019
MACHINE: aarch64 (unknown Mhz)
MEMORY: 8 GB
PANIC: “sysrq: SysRq : Trigger a crash”
PID: 1444
COMMAND: “bash”
TASK: ffff8001f019ad00 [THREAD_INFO: ffff8001f019ad00]
CPU: 1
STATE: TASK_RUNNING (SYSRQ)
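Since each dump lands in a timestamped directory such as 201907150556/dump.201907150556, finding the newest dump and assembling the crash command line can be sketched as below. The helper names latest_dump and crash_command are hypothetical, and the vmlinux path follows the dbgsym layout used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the newest <timestamp>/dump.<timestamp> file
# under a crash directory, or fail when none exists.
latest_dump() {
    crash_dir=$1
    newest=
    # Globs expand in sorted order, so the last match has the latest timestamp.
    for candidate in "$crash_dir"/[0-9]*/dump.[0-9]*; do
        [ -e "$candidate" ] || continue
        newest=$candidate
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Hypothetical helper: print (not run) the crash invocation for a dump.
crash_command() {
    release=$1
    dump=$2
    printf 'crash /usr/lib/debug/boot/vmlinux-%s %s\n' "$release" "$dump"
}

# Assumes the running kernel is the same release as the one that crashed.
if dump=$(latest_dump /var/crash); then
    crash_command "$(uname -r)" "$dump"
fi
```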
As a next step, for more on how to use the crash information, please refer to the article Analyzing Linux kernel crash dumps with crash – The one tutorial that has it all.