Kdump on Ubuntu 18.04 ARM64


kdump is a very good tool for debugging complex systems such as cloud data centers, because it makes it easy to save the kernel state when a crash occurs.
Running kdump on ARM64 is just like running it on x86, but I ran into a strange problem on the ARM64 platform.

The kernel is the HWE kernel, currently version 4.18.0-25, and with it you might encounter the following problems:

$ crash /usr/lib/debug/boot/vmlinux-4.18.0-25-generic 201907111618/dump.201907111618

crash: cannot determine page size

or (after adding the page size parameter):

$ crash /usr/lib/debug/boot/vmlinux-4.18.0-25-generic 201907111618/dump.201907111618 -p 65536

please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory
please wait... (gathering task table data)
WARNING: duplicate idle tasks?
please wait... (determining panic task)
crash: invalid kernel virtual address: ffff0000100d0000 type: "64-bit KVADDR"

If you hit a problem like this, update the crash utility on your Ubuntu system.
As far as I can tell, the crash package shipped with Ubuntu 18.04 only works with the 4.15 kernel, not with the HWE kernel or later.

Even so, I suggest using the HWE kernel or later for better hardware compatibility.

Download the latest crash source and compile it:

vim /etc/apt/sources.list
# uncomment every "# deb-src" line
apt update
apt-get build-dep linux-crashdump
git clone https://github.com/crash-utility/crash.git
cd crash
make

The new build can then be run by its absolute path, for example:

/root/crash/crash /usr/lib/debug/boot/vmlinux-5.0.0-20-generic 201907171611/dump.201907171611
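
To decide quickly whether the distribution crash is likely to be enough, the kernel release can be compared against the 4.15 series. The following is only a sketch: the function name needs_newer_crash and the BASELINE value are my own, and the 4.15 limit is the observation from this article, not an official statement. It relies on sort -V from GNU coreutils.

```shell
#!/bin/sh
# Sketch: report whether a kernel release is newer than the 4.15 series that
# the stock Ubuntu 18.04 crash package handled in this article.
# BASELINE is an assumption taken from the article, not an official limit.
BASELINE="4.15"

needs_newer_crash() {
    # $1: kernel release string such as "4.18.0-25-generic"
    release_mm=$(printf '%s\n' "$1" | cut -d. -f1,2)   # keep "major.minor"
    [ "$release_mm" = "$BASELINE" ] && return 1
    # sort -V orders version strings; if the baseline sorts first,
    # the release is newer than the baseline.
    lowest=$(printf '%s\n%s\n' "$BASELINE" "$release_mm" | sort -V | head -n 1)
    [ "$lowest" = "$BASELINE" ]
}
```

For example, running needs_newer_crash "$(uname -r)" in an if statement lets a script choose between the packaged crash and the self-built one.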

Prepare QEMU Image and Run It

Using an Ubuntu cloud image as the QEMU image is fast and easy. First, download the Ubuntu 18.04 cloud image:

wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-arm64.img

It can also be downloaded from a local mirror, such as this Taiwan mirror server:

wget http://ftp.yzu.edu.tw/Linux/ubuntu-cloud-images/bionic/current/bionic-server-cloudimg-arm64.img

After the download finishes, the cloud image's default password must be changed before you can log in.
See the separate article Change Ubuntu Cloud Image Password for how to change it.

Run QEMU With Cloud Image

Before running the QEMU command, install the necessary packages:

sudo apt install -y qemu-efi qemu bridge-utils

Allow QEMU to use the bridge devices:

mkdir -p /etc/qemu
echo "allow br0" > /etc/qemu/bridge.conf
echo "allow virbr0" >> /etc/qemu/bridge.conf

Create the UEFI firmware flash image and the variable-store flash image for QEMU:

dd if=/dev/zero of=flash0.img bs=1M count=64
dd if=/usr/share/qemu-efi/QEMU_EFI.fd of=flash0.img conv=notrunc
dd if=/dev/zero of=flash1.img bs=1M count=64

Configure your main LAN interface as a bridge device. Modify the settings below to fit your network environment:

ORIGNIC=enP6p1s0
ip addr flush $ORIGNIC
brctl addbr br0
brctl addif br0 $ORIGNIC
ifconfig br0 up
ifconfig br0 192.168.1.100 netmask 255.255.255.0
route add default gw 192.168.1.1
echo nameserver 8.8.8.8 >> /etc/resolv.conf


Save the text below into a shell script and run it.

  
SERVERFILE=bionic-server-cloudimg-arm64.img

 sudo qemu-system-aarch64 -name kdump-vm \
         -machine virt,gic_version=3,accel=kvm,usb=off \
         -cpu host -m 8192 \
         -smp 8,sockets=1,cores=8,threads=1 \
         -nographic -nodefaults \
         -drive if=pflash,format=raw,readonly,file=flash0.img \
         -drive if=pflash,format=raw,file=flash1.img \
         -drive file=$SERVERFILE,if=none,id=disk1  \
         -device virtio-blk-device,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \
         -netdev tap,id=net0,ifname=tap0 \
         -device virtio-net-device,netdev=net0 \
         -serial telnet::9001,server,nowait > vm_log.txt 2>&1 &
 sleep 5
 brctl addif br0 tap0 
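
The fixed "sleep 5" in the script is a guess at how long QEMU needs to create tap0. As a sketch of an alternative, the helper below polls until the interface shows up. The name wait_for_iface and the SYSFS_NET variable are hypothetical; SYSFS_NET exists only so the function can be tried without QEMU running.

```shell
#!/bin/sh
# Sketch: poll until a network interface appears instead of sleeping a fixed time.
# SYSFS_NET is overridable only so the function can be exercised without QEMU.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_iface() {
    # $1: interface name (e.g. tap0), $2: timeout in seconds (default 10)
    iface=$1
    remaining=${2:-10}
    while [ ! -e "$SYSFS_NET/$iface" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

With this in the script, the last two lines could become: wait_for_iface tap0 10 && brctl addif br0 tap0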

Use the command "telnet localhost 9001" to log in to the QEMU guest.
If it fails, check vm_log.txt for the reason.

If the VM shows a blank screen, check whether the correct qemu-efi image was copied into flash0.img.
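
One way to check the flash image for the blank-screen case is sketched below. It assumes the image was built with the dd commands above, so it must be exactly 64 MiB and must not start with only zero bytes. The function name check_flash_image and the 4096-byte probe length are my own choices for illustration.

```shell
#!/bin/sh
# Sketch: sanity-check a pflash image built with the dd commands in this article.
# The 64 MiB size comes from those dd commands; the 4096-byte probe is an
# arbitrary choice for this illustration.
FLASH_SIZE=$((64 * 1024 * 1024))

check_flash_image() {
    # $1: path to the firmware image, e.g. flash0.img
    img=$1
    [ -f "$img" ] || { echo "missing: $img"; return 1; }
    size=$(wc -c < "$img" | tr -d ' ')
    [ "$size" -eq "$FLASH_SIZE" ] || { echo "wrong size: $size bytes"; return 1; }
    # An image that never received the firmware still starts with zero bytes.
    nonzero=$(head -c 4096 "$img" | tr -d '\000' | wc -c | tr -d ' ')
    [ "$nonzero" -gt 0 ] || { echo "image looks empty: $img"; return 1; }
    echo "ok: $img"
}
```

Running check_flash_image flash0.img before starting QEMU catches the case where the second dd command was skipped or pointed at the wrong file. It cannot tell whether the firmware itself is the right build.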

Running Kdump in VM

You can install openssh-server in the VM to make it easier to run commands remotely. In particular, over telnet, vim usually does not detect the terminal window size correctly and cannot wrap text properly.

Install kdump (ARM64 QEMU only; if you use the HWE kernel, you need to download the latest version of linux-crashdump):

sudo apt install linux-crashdump

For more information about installing kdump, refer to the article Kernel Crash Dump.

I suggest increasing the memory reserved for the kdump kernel to 512M instead of using the Ubuntu linux-crashdump default. Edit /etc/default/grub.d/kdump-tools.cfg, find the range that matches your RAM size, and change its value to 512M. For example, my VM has 8 GB of memory, so I changed the 4G-32G entry to 4G-32G:512M:

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"
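
To see which entry of the crashkernel list applies to a given machine, the selection can be sketched in shell. A range matches when the RAM size is at least the start and below the end, and an empty end means no upper bound. The function name crashkernel_for is hypothetical, and the sketch assumes every range bound is written in whole gigabytes ("NG"), as in the line above.

```shell
#!/bin/sh
# Sketch: pick the reservation that a crashkernel=range:size list gives a machine.
# Simplification (assumption): range bounds are whole gigabytes written as "NG".
crashkernel_for() {
    # $1: system RAM in GB, $2: value such as "2G-4G:320M,4G-32G:512M,128G-:4096M"
    ram_gb=$1
    old_ifs=$IFS
    IFS=,
    for entry in $2; do
        IFS=$old_ifs
        range=${entry%%:*}      # e.g. "4G-32G"
        size=${entry#*:}        # e.g. "512M"
        start=${range%%-*}; start=${start%G}
        end=${range#*-};    end=${end%G}
        # A range matches when start <= RAM < end; an empty end means no upper bound.
        if [ "$ram_gb" -ge "$start" ] && { [ -z "$end" ] || [ "$ram_gb" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For the 8 GB VM in this article, crashkernel_for 8 with the list above selects the 4G-32G entry. After rebooting, grep -o 'crashkernel=[^ ]*' /proc/cmdline shows the value the kernel actually booted with.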

After changing the kernel parameter, update GRUB so that the new parameter takes effect on the next boot:

update-grub2

Install Kernel Debug Symbol

Install GPG Key

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622

Add repository config

codename=$(lsb_release -c | awk  '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename}      main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates  main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF

Update the package lists and install the symbols (see Installing Ubuntu Kernel Debugging Symbols):

sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym

Check Kdump Status

Use the command

kdump-config show

to check the current kdump status:

$ kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0xdfe00000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.15.0-54-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-54-generic
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.15.0-54-generic root=UUID=6242429e-2876-4d52-a590-99654a6a91ac ro quiet splash vt.handoff=1 nr_cpus=1 systemd.unit=kdump-tools-dump.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

If everything is OK, the state should read: ready to kdump.
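
For scripted checks, the state line can be pulled out of the output instead of being read by eye. This is a small sketch; the function names kdump_state and kdump_is_ready are my own, and the "current state:" field name is taken from the sample output above.

```shell
#!/bin/sh
# Sketch: extract the "current state:" field from `kdump-config show` output.
# The field name is taken from the sample output in this article.
kdump_state() {
    # Reads kdump-config output on stdin and prints the state text.
    sed -n 's/^current state:[[:space:]]*//p' | sed 's/[[:space:]]*$//'
}

kdump_is_ready() {
    # Succeeds only when the state read from stdin is exactly "ready to kdump".
    [ "$(kdump_state)" = "ready to kdump" ]
}
```

Usage would look like: kdump-config show | kdump_is_ready && echo "kdump is armed"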

Trigger Kdump

Kdump is easy to trigger via sysrq. I suggest triggering it from the telnet console, because the dump and reboot messages are shown there, which makes it easier to confirm that it worked.

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

The first command configures sysrq to accept all triggers.
The second command triggers the kernel crash:

$ echo c > /proc/sysrq-trigger
[ 31.855103] sysrq: SysRq : Trigger a crash
[ 31.861713] Internal error: Accessing user space memory outside uaccess.h routines: 96000044 [#1] SMP
[ 31.869136] Modules linked in: nls_iso8859_1 sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear aes_ce_blk aes_ce_cipher crc32_ce crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net virtio_blk aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 31.879776] CPU: 1 PID: 1385 Comm: bash Not tainted 4.15.0-54-generic #58-Ubuntu
[ 31.882430] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 31.884892] pstate: 00400005 (nzcv daif +PAN -UAO)
[ 31.886641] pc : sysrq_handle_crash+0x24/0x30
[ 31.888198] lr : __handle_sysrq+0xbc/0x1c0
[ 31.889627] sp : ffff00000e513d50
[ 31.890835] x29: ffff00000e513d50 x28: ffff8001f2e93c00
[ 31.892723] x27: ffff000008b32000 x26: 0000000000000040
[ 31.894639] x25: 0000000000000124 x24: ffff0000095ee000
[ 31.896524] x23: 0000000000000004 x22: 0000000000000002
[ 31.898433] x21: 0000000000000063 x20: ffff000009550000
[ 31.900309] x19: ffff0000095ee4e0 x18: ffffffffffffffff
[ 31.902229] x17: 0000000000000000 x16: 0000000000000000
[ 31.904127] x15: ffff000009528c08 x14: ffff0000896d687f
[ 31.906046] x13: ffff0000096d688d x12: ffff00000954f000
[ 31.907954] x11: ffff000009529660 x10: ffff000008708540
[ 31.909848] x9 : 00000000ffffffd0 x8 : 0000000000000017
[ 31.911801] x7 : 6767697254203a20 x6 : ffff8001ffdc82e8
[ 31.913655] x5 : ffff8001ffdc82e8 x4 : 0000000000000000
[ 31.915569] x3 : ffff8001ffdd06c8 x2 : 068be67f3165eb00
[ 31.917430] x1 : 0000000000000000 x0 : 0000000000000001
[ 31.919312] Process bash (pid: 1385, stack limit = 0x (ptrval))
[ 31.921599] Call trace:
[ 31.922506] sysrq_handle_crash+0x24/0x30
[ 31.923964] __handle_sysrq+0xbc/0x1c0
[ 31.925311] write_sysrq_trigger+0xd8/0x120
[ 31.926851] proc_reg_write+0x80/0xc0
[ 31.928178] __vfs_write+0x48/0x80
[ 31.929399] vfs_write+0xac/0x1b0
[ 31.930618] SyS_write+0x6c/0xd8
[ 31.931777] el0_svc_naked+0x30/0x34
[ 31.933062] Code: 52800020 b90ca020 d5033e9f d2800001 (39000020)
[ 31.935269] SMP: stopping secondary CPUs
[ 31.937745] Starting crashdump kernel...
[ 31.939108] Bye!

After kdump saves the crash state and the system reboots, the crash dump is stored under /var/crash. Run the crash command below to see output like the following:

$ crash --mod /usr/lib/debug/lib/modules/4.15.0-54-generic/ /usr/lib/debug/boot/vmlinux-4.15.0-54-generic 201907150556/dump.201907150556
crash 7.2.1
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-54-generic
DUMPFILE: 201907150556/dump.201907150556 [PARTIAL DUMP]
CPUS: 12
DATE: Mon Jul 15 05:56:07 2019
UPTIME: 2135039823346 days, 00:15:35
LOAD AVERAGE: 0.41, 0.32, 0.13
TASKS: 202
NODENAME: ubuntu
RELEASE: 4.15.0-54-generic
VERSION: #58-Ubuntu SMP Mon Jun 24 10:56:40 UTC 2019
MACHINE: aarch64 (unknown Mhz)
MEMORY: 8 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 1444
COMMAND: "bash"
TASK: ffff8001f019ad00 [THREAD_INFO: ffff8001f019ad00]
CPU: 1
STATE: TASK_RUNNING (SYSRQ)
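
Since every crash creates a new timestamped directory, it can be handy to locate the newest dump automatically. The sketch below assumes the layout seen in the paths above, that is, a directory named with the timestamp that contains dump.<timestamp>. The function name latest_dump is hypothetical.

```shell
#!/bin/sh
# Sketch: find the newest dump under a crash directory laid out as
# <dir>/<timestamp>/dump.<timestamp>, as in the paths shown in this article.
latest_dump() {
    # $1: crash directory (defaults to /var/crash, the KDUMP_COREDIR shown earlier)
    coredir=${1:-/var/crash}
    newest=""
    for d in "$coredir"/[0-9]*/; do
        [ -d "$d" ] || continue
        stamp=$(basename "$d")
        [ -f "$d/dump.$stamp" ] || continue
        # Timestamps are YYYYMMDDhhmm, so the sorted glob order is chronological
        # and the last match is the newest.
        newest="${d%/}/dump.$stamp"
    done
    [ -n "$newest" ] || return 1
    echo "$newest"
}
```

It can then be combined with the crash command, for example: crash /usr/lib/debug/boot/vmlinux-$(uname -r) "$(latest_dump)"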

As a next step, for more on how to use crash, refer to the article Analyzing Linux kernel crash dumps with crash – The one tutorial that has it all.
