BUG assertion triggered when replacing a physical page in a process - linux

I modified the Linux kernel so that it can replace some of the memory pages of a specific process. In summary, the functions I wrote take a process ID and an address in that process, then replace the page at that address with a dummy page. Finally, one of the functions calls __free_page() on the original page that was replaced.
The problem is that I get the error below when the kernel later tries to reuse the original page. What is the flag it is complaining about, and how do I get rid of this error? Here are the relevant lines from syslog.
Thanks.
Nov 14 19:15:23 localhost kernel: [ 1466.949451] BUG: Bad page state in process mytestapp pfn:7d309
Nov 14 19:15:23 localhost kernel: [ 1466.949452] page:ffffea0001f4c240 count:-1 mapcount:0 mapping: (null) index:0x7fd632179
Nov 14 19:15:23 localhost kernel: [ 1466.949453] page flags: 0x100000000000000()
Nov 14 19:15:23 localhost kernel: [ 1466.949453] Modules linked in: test_module(O) acpiphp bnep rfcomm bluetooth binfmt_misc joydev hid_generic usbhid hid snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm ghash_clmulni_intel snd_seq_midi snd_rawmidi snd_seq_midi_event ppdev snd_seq aesni_intel ablk_helper cryptd aes_x86_64 snd_timer snd_seq_device psmouse microcode snd vmw_balloon acpi_memhotplug parport_pc soundcore snd_page_alloc vmwgfx ttm mac_hid drm i2c_piix4 serio_raw shpchp lp parport e1000 mptspi mptscsih mptbase floppy vmw_pvscsi vmxnet3
Nov 14 19:15:23 localhost kernel: [ 1466.949484] Pid: 15064, comm: mytestapp Tainted: G B O 3.6.11-elasticos-0.01 #31
Nov 14 19:15:23 localhost kernel: [ 1466.949485] Call Trace:
Nov 14 19:15:23 localhost kernel: [ 1466.949487] [<ffffffff8111941f>] bad_page+0xbf/0x110
Nov 14 19:15:23 localhost kernel: [ 1466.949505] [<ffffffff8111aac9>] get_page_from_freelist+0x6f9/0x810
Nov 14 19:15:23 localhost kernel: [ 1466.949508] [<ffffffff8111a702>] ? get_page_from_freelist+0x332/0x810
Nov 14 19:15:23 localhost kernel: [ 1466.949509] [<ffffffff8111b06e>] __alloc_pages_nodemask+0x48e/0x9b0
Nov 14 19:15:23 localhost kernel: [ 1466.949512] [<ffffffff8111f03a>] ? pagevec_lru_move_fn+0xea/0x110
Nov 14 19:15:23 localhost kernel: [ 1466.949514] [<ffffffff81154ec3>] alloc_pages_vma+0xb3/0x190
Nov 14 19:15:23 localhost kernel: [ 1466.949515] [<ffffffff811397cc>] handle_pte_fault+0x56c/0xb00
Nov 14 19:15:23 localhost kernel: [ 1466.949517] [<ffffffff810473f7>] ? pte_alloc_one+0x37/0x50
Nov 14 19:15:23 localhost kernel: [ 1466.949527] [<ffffffff8113afd9>] handle_mm_fault+0x259/0x340
Nov 14 19:15:23 localhost kernel: [ 1466.949538] [<ffffffff8107c218>] ? up_read+0x18/0x30
Nov 14 19:15:23 localhost kernel: [ 1466.949540] [<ffffffff816213d2>] do_page_fault+0x152/0x520
Nov 14 19:15:23 localhost kernel: [ 1466.949541] [<ffffffff8108c36d>] ? set_next_entity+0x9d/0xb0
Nov 14 19:15:23 localhost kernel: [ 1466.949543] [<ffffffff810135ca>] ? __switch_to+0x17a/0x410
Nov 14 19:15:23 localhost kernel: [ 1466.949545] [<ffffffff8161de65>] page_fault+0x25/0x30

This macro checks that the appropriate page flags are unset. As far as I can see, in your case the problem is that the PG_locked flag is set, which means that you freed a locked page. See unlock_page() to handle this, or (probably) use free_page() instead.

Soft lockup occurred in VMX testing in Linux

I am testing VMX in Linux (Ubuntu 16.04) on a VMware platform.
When the VM runs a long loop, the host Linux hits a CPU soft lockup, as follows:
Jun 11 00:01:13 ubuntu kernel: [ 5624.196130] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [guest:8297]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196134] Modules linked in: vmm(OE) rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bnep vmw_vsock_vmci_transport vsock crct10dif_pclmul vmw_balloon crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev btusb btrtl btbcm btintel snd_ens1371 input_leds serio_raw bluetooth snd_ac97_codec gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore i2c_piix4 shpchp vmw_vmci 8250_fintek mac_hid kvm_intel kvm irqbypass parport_pc ppdev lp parport autofs4 hid_generic usbhid hid vmwgfx psmouse ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi mptscsih drm ahci mptbase libahci e1000 scsi_transport_spi pata_acpi fjes [last unloaded: vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196210] CPU: 0 PID: 8297 Comm: guest Tainted: G OEL 4.4.0-127-generic #153-Ubuntu
Jun 11 00:01:13 ubuntu kernel: [ 5624.196212] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
Jun 11 00:01:13 ubuntu kernel: [ 5624.196213] task: ffff880092788000 ti: ffff88007ffb8000 task.ti: ffff88007ffb8000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196215] RIP: 0010:[<ffffffffc0648c7f>] [<ffffffffc0648c7f>] failInvalid+0x18/0xa9 [vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196221] RSP: 0018:ffff88007ffbbe60 EFLAGS: 00000246
Jun 11 00:01:13 ubuntu kernel: [ 5624.196223] RAX: 0000000000000000 RBX: 00000000006041c0 RCX: 0000000000000000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196224] RDX: 0000000000001dfa RSI: 0000000000000000 RDI: 000000000000640a
Jun 11 00:01:13 ubuntu kernel: [ 5624.196225] RBP: ffff88007ffbbe70 R08: 000007ff00000017 R09: 0000000000000010
Jun 11 00:01:13 ubuntu kernel: [ 5624.196226] R10: c093ffffffff0000 R11: 0000000000100000 R12: 00000000006041c0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196227] R13: ffff8800b37d2600 R14: 0000000040087401 R15: 00000000006041c0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196229] FS: 00007f8793426700(0000) GS:ffff8800ba600000(0000) knlGS:0000000000000000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196230] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 00:01:13 ubuntu kernel: [ 5624.196231] CR2: 0000000000910000 CR3: 00000000671f2000 CR4: 0000000000362670
Jun 11 00:01:13 ubuntu kernel: [ 5624.196237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196238] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 11 00:01:13 ubuntu kernel: [ 5624.196239] Stack:
Jun 11 00:01:13 ubuntu kernel: [ 5624.196240] 00000000006041c0 00000000006041c0 ffff88007ffbbe88 ffffffffc064a538
Jun 11 00:01:13 ubuntu kernel: [ 5624.196242] ffff8800907ab398 ffff88007ffbbe98 ffffffffc064a83c ffff88007ffbbf08
Jun 11 00:01:13 ubuntu kernel: [ 5624.196244] ffffffff81227f8f 0000000000000002 ffff8800b399e610 ffff88008cf50b10
Jun 11 00:01:13 ubuntu kernel: [ 5624.196246] Call Trace:
Jun 11 00:01:13 ubuntu kernel: [ 5624.196251] [<ffffffffc064a538>] vmm_vcpu_run+0x58/0x310 [vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196254] [<ffffffffc064a83c>] my_ioctl+0x4c/0x50 [vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196258] [<ffffffff81227f8f>] do_vfs_ioctl+0x2af/0x4b0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196260] [<ffffffff81214329>] ? vfs_write+0x149/0x1a0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196262] [<ffffffff81228209>] SyS_ioctl+0x79/0x90
Jun 11 00:01:13 ubuntu kernel: [ 5624.196265] [<ffffffff81850c88>] entry_SYSCALL_64_fastpath+0x1c/0xbb
Jun 11 00:01:13 ubuntu kernel: [ 5624.196267] Code: 00 48 8d 1c 25 74 19 65 c0 0f 78 03 8b 04 25 74 19 65 c0 41 5f 41 5e 41 5d 41 5c 41 5b 41 5a 41 59 41 58 5f 5e 5d 5a 59 5b 58 9d <48> c7 c3 68 01 65 c0 eb 19 4c 8b 23 e8 c0 c4 ff ff 41 89 04 24
The VMM enables VM exit on external interrupts, and there is a timer in the VMM that checks the VM state.
If I call touch_softlockup_watchdog() in the timer function, the lockup is no longer reported.
But I think other threads and processes still get no chance to be scheduled on the pCPU where the VM is running.
So the question is: how can the VMM make the host Linux scheduler run other entities on that pCPU instead of letting the VM take over the whole pCPU? By calling schedule(), or something else?
The VM code is loaded and started by a host application, which loads the VM image into a memory region allocated and set up by the VMM. It then sets the vCPU context (GPRs, RIP, segment registers, etc.) and asks the VMM to start the VM by calling 'VMLaunch' and 'VMResume'.
The easiest way to avoid this is to call schedule() in the VM-exit handling loop in kernel space.
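The suggestion to call schedule() in the VM-exit loop can be sketched as follows. This is a minimal model that compiles in userspace, not the asker's VMM: the hook names (enter_guest, should_yield, yield_cpu) and the simulated backend are hypothetical. In a real module the hooks would presumably wrap the VMLAUNCH/VMRESUME path, need_resched(), and schedule() (or cond_resched()).

```c
/* Hypothetical hooks. In a kernel module they would wrap the VMX entry
 * path, need_resched(), and schedule() or cond_resched(). */
struct vcpu_hooks {
	int  (*enter_guest)(void *ctx);   /* runs the guest until the next VM exit;
	                                     returns nonzero when the guest is done */
	int  (*should_yield)(void *ctx);  /* nonzero if the host wants the CPU back */
	void (*yield_cpu)(void *ctx);     /* hands the CPU to the host scheduler */
};

/* Run the guest, yielding between VM exits.
 * Returns the number of VM exits handled. */
static unsigned long vcpu_run_loop(const struct vcpu_hooks *hooks, void *ctx)
{
	unsigned long exits = 0;

	for (;;) {
		int done = hooks->enter_guest(ctx);

		exits++;
		if (done)
			break;
		/* Back on the host after a VM exit: this is the point where the
		 * host scheduler can be given a chance to run something else. */
		if (hooks->should_yield(ctx))
			hooks->yield_cpu(ctx);
	}
	return exits;
}

/* Simulated backend, for illustration only. */
struct sim_state {
	int exits_left;   /* VM exits remaining before the guest finishes */
	int tick;         /* VM exits seen so far */
	int yield_every;  /* pretend the host needs the CPU every N exits */
	int yields;       /* how many times the loop yielded */
};

static int sim_enter_guest(void *ctx)
{
	struct sim_state *s = ctx;

	s->tick++;
	return --s->exits_left <= 0;
}

static int sim_should_yield(void *ctx)
{
	struct sim_state *s = ctx;

	return s->tick % s->yield_every == 0;
}

static void sim_yield_cpu(void *ctx)
{
	struct sim_state *s = ctx;

	s->yields++;
}

static const struct vcpu_hooks sim_hooks = {
	.enter_guest  = sim_enter_guest,
	.should_yield = sim_should_yield,
	.yield_cpu    = sim_yield_cpu,
};
```

With a struct sim_state initialised to exits_left = 10 and yield_every = 3, vcpu_run_loop(&sim_hooks, &state) handles 10 exits and yields 3 times. In kernel code the yield point has to sit outside any region that runs with preemption or interrupts disabled, because schedule() must not be called from atomic context.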
I'm currently facing the exact same problem you described. I'm running Ubuntu 18 in VMware and trying to set up a kernel module that enables VMX. I'm able to run the initialization routine, which sets up the control, guest, and host state before executing the vmlaunch instruction.
vmlaunch executes successfully, and I quickly enter hypervisor mode because of wrmsr and rdmsr instructions issued by the current OS.
However, it seems that I get stuck very quickly, and my call trace mostly looks like yours. I don't fully understand what is happening, but it seems that I'm trapped in exit_to_usermode_loop(). Is this the problem you faced?

Pulling of docker image from a registry fails

I would love some help with an issue. When I tried to download an image from the repository, the pull failed and I saw the following errors in the log file. This is the syslog output I received:
Jan 4 10:22:05 <hostname> kernel: device-mapper: thin: commit failed: error = -22
Jan 4 10:22:05 <hostname> kernel: device-mapper: thin: switching pool to read-only mode
Jan 4 10:22:05 <hostname> kernel: bio: create slab <bio-2> at 2
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: thin: process_bio_read_only: dm_thin_find_block() failed: error = -15
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388592
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388592
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388606
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388606
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 0x
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 0
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 1
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388607
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388607
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388607
Jan 4 10:22:08 <hostname> kernel: device-mapper: space map common: dm_tm_shadow_block() failed
Jan 4 10:22:08 <hostname> kernel: device-mapper: space map common: dm_tm_shadow_block() failed
Jan 4 10:22:08 <hostname> kernel: device-mapper: space map metadata: unable to allocate new metadata block
Jan 4 10:22:08 <hostname> kernel: device-mapper: thin: Deletion of thin device 66 failed.
Jan 4 10:22:10 <hostname> kernel: device-mapper: table: 252:3: thin: Couldn't open thin internal device
Jan 4 10:22:10 <hostname> kernel: device-mapper: ioctl: error adding target to table
Jan 4 10:22:10 <hostname> kernel: device-mapper: thin: Deletion of thin device 66 failed.
Jan 4 10:22:10 <hostname> kernel: device-mapper: table: 252:3: thin: Couldn't open thin internal device
These are the logs from Docker:
time="2017-01-04T10:22:08Z" level=error msg="Error from V2 registry: read /dev/mapper/docker-252:0-263231-3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880: input/output error"
Error: image ims:<image_tag> not found
time="2017-01-04T10:22:09Z" level=info msg="-job pull(<iamge>, <image_tag>) = ERR (1)"
time="2017-01-04T10:22:09Z" level=info msg="POST /v1.18/containers/create?name=<container_name>"
time="2017-01-04T10:22:09Z" level=info msg="+job create(<container_name>)"No such image: <iamge>:<image_tag> (tag: <image_tag>)
time="2017-01-04T10:22:09Z" level=info msg="-job create(<container_name>) = ERR (1)"
time="2017-01-04T10:22:09Z" level=error msg="Handler for POST /containers/create returned error: No such image: <iamge>:<image_tag> (tag: <image_tag>)"
time="2017-01-04T10:22:09Z" level=error msg="HTTP Error: statusCode=404
No such image: <iamge>:<image_tag> (tag: <image_tag>)"
time="2017-01-04T10:22:09Z" level=info msg="POST /v1.18/images/create?fromImage=<image_url>"
time="2017-01-04T10:22:09Z" level=info msg="+job pull(<iamge>, <image_tag>)"
time="2017-01-04T10:22:09Z" level=info msg="+job resolve_repository(<iamge>)"
time="2017-01-04T10:22:09Z" level=info msg="-job resolve_repository(<iamge>) = OK (0)"
time="2017-01-04T10:22:09Z" level=info msg="+job trust_key_check(/xyz)"
time="2017-01-04T10:22:09Z" level=info msg="-job trust_key_check(/xyz) = OK (0)"
time="2017-01-04T10:22:10Z" level=error msg="Error from V2 registry: Driver devicemapper failed to create image rootfs 3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880: device 3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880 already exists"
Please let me know if any other data is needed. I have the backup of the old docker lib folder.

Is __init attribute used in loadable kernel modules?

The description on this page (http://www.tldp.org/LDP/lkmpg/2.4/html/x281.htm), as well as some related answers on SO (for example, the answer to "__init and __exit macros usage for built-in and loadable modules"), says:
The __init macro causes the init function to be discarded and its memory freed once the init function finishes for built-in drivers, but not loadable modules.
However, I tried inserting the following module, in which I call the init function marked __init (f1()) from a non-init function (f2()), and I get an error from the kernel, which indicates that __init has an effect on loadable modules as well.
How and where can I find reliable information about this?
My program (mentioned above):
#include <linux/module.h>
#include <linux/init.h>
static int __init f1(void)
{
        printk(KERN_ALERT "hello \n");
        return 0;
}
/* Deliberately calls the __init function f1() after module init has finished. */
static void __exit f2(void)
{
        f1();
        printk(KERN_ALERT "bye N\n");
}
module_init(f1);
module_exit(f2);
Error from the kernel:
Jul 8 08:15:51 localhost kernel: hello NOTICE
Jul 8 08:15:54 localhost kernel: [303032.948188] BUG: unable to handle kernel paging request at f9b13000
Jul 8 08:15:54 localhost kernel: [303032.949003] IP: [<f9b13000>] 0xf9b12fff
Jul 8 08:15:54 localhost kernel: [303032.949003] *pdpt = 0000000000d3c001 *pde = 000000003100b067 *pte = 0000000000000000
Jul 8 08:15:54 localhost kernel: [303032.949003] Modules linked in: hello(POF-) tcp_lp lp wacom fuse bnep bluetooth ip6t_rpfilter ip6t_REJECT cfg80211 rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw joydev snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm coretemp kvm iTCO_wdt iTCO_vendor_support ppdev r8169 mii snd_page_alloc snd_timer snd soundcore microcode serio_raw i2c_i801 lpc_ich mfd_core parport_pc parport acpi_cpufreq mperf binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc i915 ata_generic i2c_algo_bit pata_acpi drm_kms_helper drm i2c_core video [last unloaded: hello]
Jul 8 08:15:54 localhost kernel: [303032.949003] CPU: 1 PID: 11924 Comm: rmmod Tainted: PF O 3.11.10-301.fc20.i686+PAE #1
Jul 8 08:15:54 localhost kernel: [303032.949003] Hardware name: /DG41RQ, BIOS RQG4110H.86A.0013.2009.1223.1136 12/23/2009
Jul 8 08:15:54 localhost kernel: [303032.949003] task: d1bad780 ti: c33a4000 task.ti: c33a4000
Jul 8 08:15:54 localhost kernel: [303032.949003] EIP: 0060:[<f9b13000>] EFLAGS: 00010282 CPU: 1
Jul 8 08:15:54 localhost kernel: [303032.949003] EIP is at 0xf9b13000
Jul 8 08:15:54 localhost kernel: [303032.949003] EAX: f9af6000 EBX: f9af8000 ECX: c0c77270 EDX: 00000000
Jul 8 08:15:54 localhost kernel: [303032.949003] ESI: 00000000 EDI: 00000000 EBP: c33a5f3c ESP: c33a5f30
Jul 8 08:15:54 localhost kernel: [303032.949003] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 8 08:15:54 localhost kernel: [303032.949003] CR0: 8005003b CR2: f9b13000 CR3: 10e7b000 CR4: 000407f0
Jul 8 08:15:54 localhost kernel: [303032.949003] Stack:
Jul 8 08:15:54 localhost kernel: [303032.949003] f9af600b 00000000 00000000 c33a5fac c04b06a9 f4852ac0 c33a5f50 c057989d
Jul 8 08:15:54 localhost kernel: [303032.949003] 00000000 f9af8000 00000800 c33a5f50 6c6c6568 0000006f f4852ac0 f5312490
Jul 8 08:15:54 localhost kernel: [303032.949003] db5e7100 00000000 d1bad780 d1bada9c c33a5f88 c056160d c33a5f9c c046e9de
Jul 8 08:15:54 localhost kernel: [303032.949003] Call Trace:
Jul 8 08:15:54 localhost kernel: [303032.949003] [<f9af600b>] ? f2+0xb/0x1000 [hello]
Jul 8 08:15:54 localhost kernel: [303032.949003] [<c04b06a9>] SyS_delete_module+0x149/0x2a0
Jul 8 08:15:54 localhost kernel: [303032.949003] [<c057989d>] ? mntput+0x1d/0x30
Jul 8 08:15:54 localhost kernel: [303032.949003] [<c056160d>] ? ____fput+0xd/0x10
Jul 8 08:15:54 localhost kernel: [303032.949003] [<c046e9de>] ? task_work_run+0x7e/0xb0
Jul 8 08:15:54 localhost kernel: [303032.949003] [<c099ff0d>] sysenter_do_call+0x12/0x28
Jul 8 08:15:54 localhost kernel: [303032.949003] Code: Bad EIP value.
Jul 8 08:15:54 localhost kernel: [303032.949003] EIP: [<f9b13000>] 0xf9b13000 SS:ESP 0068:c33a5f30
Jul 8 08:15:54 localhost kernel: [303032.949003] CR2: 00000000f9b13000
Jul 8 08:15:54 localhost kernel: [303032.949003] ---[ end trace ca338922043618f4 ]---
Jul 8 08:15:54 localhost kernel: BUG: unable to handle kernel paging request at f9b13000
Jul 8 08:15:54 localhost kernel: IP: [<f9b13000>] 0xf9b12fff
Jul 8 08:15:54 localhost kernel: *pdpt = 0000000000d3c001 *pde = 000000003100b067 *pte = 0000000000000000
Actually, the __init attribute does affect the code of loadable modules.
This is probably a misprint in the book you refer to.
BTW, you should get a section mismatch warning when building this module.
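One way to keep shared code callable from both entry points is to leave the helper without __init, so that it stays in the module's ordinary text section. The following is a minimal sketch under stated assumptions: the names say_hello, hello_init, hello_exit, greet_count, and LOG are hypothetical, and the #else branch holds userspace stand-ins that exist only so the call structure can be compiled outside a kernel tree.

```c
#ifdef __KERNEL__
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#define LOG(msg) printk(KERN_ALERT msg)
#else
/* Userspace stand-ins (assumption): they drop the section attributes so
 * that the call structure can be compiled and exercised without a kernel. */
#include <stdio.h>
#define __init
#define __exit
#define LOG(msg) printf(msg)
#endif

static int greet_count;

/* No __init here: the helper stays mapped for the module's whole lifetime,
 * so it may be called from both the init and the exit function. */
static int say_hello(void)
{
	LOG("hello\n");
	return ++greet_count;
}

/* Marked __init: for a loadable module this code is freed once the init
 * function has returned, so nothing may call it afterwards. */
static int __init hello_init(void)
{
	say_hello();
	return 0;
}

static void __exit hello_exit(void)
{
	say_hello();   /* safe: say_hello() is not in the init section */
	LOG("bye\n");
}

#ifdef __KERNEL__
module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
#endif
```

Compared with the module in the question, the only structural change is that the exit path calls a plain helper instead of the __init function itself.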

INFO: task v8:SweeperThrea:<pid> blocked for more than 120 seconds

When running a simple node process in a container, I see this in my kernel log and the process becomes defunct:
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745710] INFO: task v8:SweeperThrea:2569 blocked for more than 120 seconds.
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745723] v8:SweeperThrea D 0000000000000000 0 2569 2470 0x00000002
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745727] ffff8801d228fca8 0000000000000246 ffff8801d0fb1740 ffff8801d228ffd8
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745730] ffff8801d228ffd8 ffff8801d228ffd8 ffffffff81c15440 ffff8801d0fb1740
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745737] ffff8801d0fb1740 ffff8801d02f8878 0000000000000002 ffff8801d0fb1740
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745741] Call Trace:
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745745] [<ffffffff816ca029>] schedule+0x29/0x70
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745748] [<ffffffff810d1365>] zap_pid_ns_processes+0x125/0x180
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745752] [<ffffffff8105e91c>] do_exit+0x85c/0x9d0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745755] [<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745759] [<ffffffff8106e571>] get_signal_to_deliver+0x1c1/0x610
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745764] [<ffffffff8101439f>] do_signal+0x3f/0x8d0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745767] [<ffffffff810f3427>] ? call_rcu_sched+0x17/0x20
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745771] [<ffffffff8108429f>] ? __put_cred+0x3f/0x50
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745775] [<ffffffff810843b9>] ? abort_creds+0x29/0x30
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745779] [<ffffffff81014cb0>] do_notify_resume+0x80/0xb0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745781] [<ffffffff816d3a9a>] int_signal+0x12/0x17
More info:
Docker sends SIGTERM and then SIGKILL to the process.
docker -d -D logs:
[debug] server.go:924 Calling POST /containers/create
2014/03/21 19:04:11 POST /v1.10/containers/create
[/var/lib/docker|90a3fa34] +job create()
[/var/lib/docker|90a3fa34] -job create() = OK (0)
[debug] server.go:924 Calling POST /containers/{name:.*}/start
2014/03/21 19:04:11 POST /v1.10/containers/074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766/start
[/var/lib/docker|90a3fa34] +job start(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
[/var/lib/docker|90a3fa34] +job allocate_interface(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
[/var/lib/docker|90a3fa34] -job allocate_interface(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766) = OK (0)
[/var/lib/docker|90a3fa34] -job start(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766) = OK (0)
[debug] server.go:924 Calling POST /containers/{name:.*}/stop
2014/03/21 19:04:16 POST /v1.10/containers/074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766/stop?t=10
[/var/lib/docker|90a3fa34] +job stop(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
2014/03/21 19:04:26 Container 074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766 failed to exit within 10 seconds of SIGTERM - using the force
2014/03/21 19:04:36 Container SIGKILL failed to exit within 10 seconds of lxc-kill 074dda12cb03 - trying direct SIGKILL
pstree:
docker(2387)─node(2532)─{node}(2569)
ps aux
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 2387 2386 0 80 0 - 153471 futex_ pts/0 00:00:07 docker
4 Z 0 2532 2387 0 80 0 - 0 exit ? 00:00:00 node <defunct>
no files in /proc/2532/fd
no files in /proc/2532//task/2569/fd/
stack from /proc/2532/stack
[<ffffffff8105e763>] do_exit+0x6a3/0x9d0
[<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[<ffffffff8105eb87>] sys_exit_group+0x17/0x20
[<ffffffff816d37dd>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
stack from /proc/2532/task/2569/stack
[<ffffffff810d1365>] zap_pid_ns_processes+0x125/0x180
[<ffffffff8105e91c>] do_exit+0x85c/0x9d0
[<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[<ffffffff8106e571>] get_signal_to_deliver+0x1c1/0x610
[<ffffffff8101439f>] do_signal+0x3f/0x8d0
[<ffffffff81014cb0>] do_notify_resume+0x80/0xb0
[<ffffffff816d3a9a>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
the repro script:
CNT=0
while true
do
    echo $CNT
    DOCK=$(sudo docker run -d -t anandkumarpatel/zombie_bug ./node index.js)
    sleep 60 && sudo docker stop $DOCK > out.log &
    sleep 1
    CNT=$(($CNT+1))
    if [[ "$CNT" == "50" ]]; then
        exit
    fi
done
An strace of the docker daemon during a failed kill can be found on pastebin:
http://pastebin.com/HxDwiRBW
My system info and versions follow. I am using a custom build of docker, but it is forked from the 0.9.0 release with 2 small patches, and this also repros on a clean 0.9.0 release.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 13.04
Release: 13.04
Codename: raring
$ sudo docker version
Client version: 0.9.0
Go version (client): go1.2.1
Git commit (client): 70f72ea
Server version: 0.9.0
Git commit (server): 70f72ea
Go version (server): go1.2.1
Last stable version: 0.9.0
$ uname -a
Linux ip-10-0-2-233 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
HOWEVER, this does not repro on every system! For some reason it only repros on our production servers and one other server. They all have similar configs and the same Ubuntu version.
Let me know if there is more info to gather and I will grab it. I have a near 100% repro rate on a test system, so I can gather whatever is needed.
related:
https://github.com/dotcloud/docker/issues/4811
Docker container refuses to get killed after run command turns into a zombie
Changing to a newer kernel makes the issue go away:
Does not repro: linux-image-3.8.0-35-generic
Repros: linux-image-3.8.0-19-generic
I will search for when this was fixed to see if it helps find the root cause.
Changing to the latest kernel fixes the issue.
I found the exact kernel difference:
REPRO: linux-image-3.8.0-31-generic
NO REPRO: linux-image-3.8.0-32-generic
I think this is the fix:
+++ linux-3.8.0/kernel/pid_namespace.c
@@ -181,6 +181,7 @@
     int nr;
     int rc;
     struct task_struct *task, *me = current;
+    int init_pids = thread_group_leader(me) ? 1 : 2;
     /* Don't allow any more processes into the pid namespace */
     disable_pid_allocation(pid_ns);
@@ -230,7 +231,7 @@
      */
     for (;;) {
         set_current_state(TASK_UNINTERRUPTIBLE);
-        if (pid_ns->nr_hashed == 1)
+        if (pid_ns->nr_hashed == init_pids)
             break;
         schedule();
     }
which came from here:
https://groups.google.com/forum/#!msg/fa.linux.kernel/u4b3n4oYDQ4/GuLrXfDIYggJ
I am going to upgrade all our servers that repro this and see if it still occurs.
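To make the effect of the init_pids change concrete, here is a small userspace model of the loop's exit test. It reflects my reading of the patch, which is an assumption rather than something confirmed in this thread: when the task running zap_pid_ns_processes() is not the thread group leader, the leader's PID stays hashed until that thread has exited, so the count cannot drop to 1 while the thread is still waiting. That would match the pstree output above, where the leader node(2532) is a zombie and thread 2569 is the one stuck in zap_pid_ns_processes(). The function names are hypothetical.

```c
/* Number of hashed PIDs the namespace init should wait for.
 * Mirrors the init_pids line in the patch: 1 if the exiting task is the
 * thread group leader, 2 if the leader's PID is still hashed as well. */
static int init_pids_for(int caller_is_group_leader)
{
	return caller_is_group_leader ? 1 : 2;
}

/* Model of the wait loop's exit test.
 * patched == 0 reproduces the old "nr_hashed == 1" comparison. */
static int wait_can_finish(int nr_hashed, int caller_is_group_leader, int patched)
{
	int target = patched ? init_pids_for(caller_is_group_leader) : 1;

	return nr_hashed == target;
}
```

In this model, a non-leader thread that sees nr_hashed == 2 never leaves the loop with the old test (wait_can_finish(2, 0, 0) is 0) but does with the patched one (wait_can_finish(2, 0, 1) is 1).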

Why does cryptsetup segfault?

While most of my project to use a chrooted Gentoo as a supplement to another Linux distribution works fine, cryptsetup unfortunately segfaults on both luksFormat and luksOpen. What is the best way to troubleshoot this?
Edit: For testing purposes I downgraded the firmware to an older version that runs a 2.6.31 kernel instead of a 3.3 one, and there the segfault does not occur. Nonetheless, I would like to get this working with the newer version, so any help with troubleshooting is appreciated.
Edit 2: Following this hint, I compared the output of grep dm_crypt /proc/kallsyms and noticed that the newer kernel lacks __initcall_dm_crypt_init6. Could this be the reason for the failure? How can that be fixed?
An strace | tail reveals
open("/dev/loop0", O_RDONLY|O_LARGEFILE) = 6
ioctl(6, BLKRAGET, 256) = 0
close(6) = 0
ioctl(3, DM_DEV_CREATE, 0x3cfb0) = 0
ioctl(3, DM_TABLE_LOAD <unfinished ...>
+++ killed by SIGSEGV +++
That ioctl(3, DM_TABLE_LOAD, ...) call seems to be responsible, where 3, according to a grep '= 3$', should stem from
open("/dev/mapper/control", O_RDWR|O_LARGEFILE) = 3
so there is some trouble with /dev/mapper/control, where /dev is bind-mounted from the host system into the chroot environment. But what exactly is the problem? How can one figure that out?
Additional output, the usefulness of which I couldn't determine yet:
cryptsetup --debug -v's output is not very informative:
# cryptsetup 1.4.3 processing "cryptsetup --debug -v luksFormat /dev/loop0"
# Running command luksFormat.
# Locking memory.
WARNING!
========
This will overwrite data on /dev/loop0 irrevocably.
Are you sure? (Type uppercase yes): YES
# Allocating crypt device /dev/loop0 context.
# Trying to open and read device /dev/loop0.
# Initialising device-mapper backend, UDEV is disabled.
# Detected dm-crypt version 1.5.1, dm-ioctl version 4.22.0.
# Timeout set to 0 miliseconds.
# Iteration time set to 1000 miliseconds.
# Interactive passphrase entry requested.
Enter LUKS passphrase:
Verify passphrase:
# Formatting device /dev/loop0 as type LUKS1.
# Crypto backend (gcrypt 1.5.3) initialized.
# Topology: IO (512/0), offset = 0; Required alignment is 1048576 bytes.
# Generating LUKS header version 1 using hash sha1, aes, cbc-essiv:sha256, MK 32 bytes
# PBKDF2: 43251 iterations per second using hash sha1.
# Data offset 4096, UUID 03dcff20-158e-4d16-b89c-90af7b176a80, digest iterations 5250
# Updating LUKS header of size 1024 on device /dev/loop0
# Reading LUKS header of size 1024 from device /dev/loop0
# Adding new keyslot -1 using volume key.
# Calculating data for key slot 0
# Key slot 0 use 21118 password iterations.
# Using hash sha1 for AF in key slot 0, 4000 stripes
# Updating key slot 0 [0x1000] area on device /dev/loop0.
# DM-UUID is CRYPT-TEMP-temporary-cryptsetup-12060
# dm create temporary-cryptsetup-12060 CRYPT-TEMP-temporary-cryptsetup-12060 OF [16384] (*1)
# dm reload temporary-cryptsetup-12060 OFW [16384] (*1)
Segmentation fault
Finally, the kernel oops trace from the host's /var/log/messages reads
Nov 14 14:02:15 host kernel: Unable to handle kernel paging request at virtual address e58c20f0
Nov 14 14:02:15 host kernel: pgd = cab58000
Nov 14 14:02:15 host kernel: [e58c20f0] *pgd=00000000
Nov 14 14:02:15 host kernel: Internal error: Oops: 5 [#1]
Nov 14 14:02:15 host kernel: Modules linked in: usblp usb_storage uhci_hcd ohci_hcd ehci_hcd usbcore usb_common
Nov 14 14:02:15 host kernel: CPU: 0 Not tainted (3.3.4-88f6281 #1)
Nov 14 14:02:15 host kernel: PC is at async_encrypt+0x44/0x50
Nov 14 14:02:15 host kernel: LR is at async_encrypt+0x48/0x50
Nov 14 14:02:15 host kernel: pc : [<c01fa374>] lr : [<c01fa378>] psr: 20000013
Nov 14 14:02:15 host kernel: sp : c10bfd60 ip : cc372580 fp : c10bfd84
Nov 14 14:02:15 host kernel: r10: d0a75160 r9 : d0a75175 r8 : 00000000
Nov 14 14:02:15 host kernel: r7 : c2f78428 r6 : 00000020 r5 : c2f783c0 r4 : e58c2040
Nov 14 14:02:15 host kernel: r3 : c94a72a0 r2 : 00000040 r1 : 00000000 r0 : c10bfd64
Nov 14 14:02:15 host kernel: Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Nov 14 14:02:15 host kernel: Control: 0005397f Table: 0ab58000 DAC: 00000015
Nov 14 14:02:15 host kernel: Process cryptsetup (pid: 5994, stack limit = 0xc10be270)
Nov 14 14:02:15 host kernel: Stack: (0xc10bfd60 to 0xc10c0000)
Nov 14 14:02:15 host kernel: fd60: c023eddc c01fab50 00000010 c01f88f8 c030e4c8 00000000 c10bfdac c10bfd88
Nov 14 14:02:15 host kernel: fd80: c030e564 c01fa340 c2f783c0 00000040 c2f783c0 d0a75175 d0a7a020 d0a75164
Nov 14 14:02:15 host kernel: fda0: c10bfdcc c10bfdb0 c030e61c c030e50c d0a75168 00000000 c2f783c0 d0a75168
Nov 14 14:02:15 host kernel: fdc0: c10bfe1c c10bfdd0 c030fbcc c030e598 d0a75160 c00d262c c10bfe0c c7752a40
Nov 14 14:02:15 host kernel: fde0: c00dd198 00000000 d0a7516e 00000000 c030feb4 00000020 cebc9200 c2f783c0
Nov 14 14:02:15 host kernel: fe00: d0a7a020 00000005 00000000 cebc9500 c10bfe64 c10bfe20 c030fedc c030f9ec
Nov 14 14:02:15 host kernel: fe20: c10bfe64 c10bfe30 c0308148 c023d270 c10bfe64 00000040 d0a7a020 000000fa
Nov 14 14:02:15 host kernel: fe40: 000000fa d0a7a020 cebc9500 d0a75150 00000000 cebc9500 c10bfe9c c10bfe68
Nov 14 14:02:15 host kernel: fe60: c0308954 c030fe7c 00000000 00000000 cebc9200 00000005 000000fa 00000000
Nov 14 14:02:15 host kernel: fe80: 00000001 d0a75000 d0a79000 c10bfeb0 c10bfee4 c10bfea0 c030b57c c03087b0
Nov 14 14:02:15 host kernel: fea0: 000000fa 00000000 d0a75160 00008004 d0a75160 d0a75138 c0308b5c 00004000
Nov 14 14:02:15 host kernel: fec0: cb637c00 d0a75000 00000000 c030b604 c10be000 00008004 c10bff0c c10bfee8
Nov 14 14:02:15 host kernel: fee0: c030b678 c030b52c 00000016 cebc9500 c10be000 00000000 0003cf80 00004000
Nov 14 14:02:15 host kernel: ff00: c10bff3c c10bff10 c030c5c4 c030b614 c00fb8b4 d0a75000 cb4580a0 0003cf80
Nov 14 14:02:15 host kernel: ff20: c138fd09 cb4580a0 c000bca8 00000000 c10bff4c c10bff40 c030c65c c030c4c8
Nov 14 14:02:15 host kernel: ff40: c10bff5c c10bff50 c00f06b4 c030c654 c10bff7c c10bff60 c00f0e10 c00f0694
Nov 14 14:02:15 host kernel: ff60: c10bff8c c10bff70 00000003 0003cf80 c10bffa4 c10bff80 c00f0fbc c00f0db0
Nov 14 14:02:15 host kernel: ff80: c10bffa4 00000000 00000001 b6f93cc8 b6e22e80 00000036 00000000 c10bffa8
Nov 14 14:02:15 host kernel: ffa0: c000bb00 c00f0f8c 00000001 b6f93cc8 00000003 c138fd09 0003cf80 b6e31450
Nov 14 14:02:15 host kernel: ffc0: 00000001 b6f93cc8 b6e22e80 00000036 0003cfb0 0003cec0 b6e22e80 0003cf80
Nov 14 14:02:15 host kernel: ffe0: b6e2f0f4 bedcbbfc b6e1e880 b6f083bc 60000010 00000003 e2433001 e5863000
Nov 14 14:02:15 host kernel: Backtrace:
Nov 14 14:02:15 host kernel: [<c01fa330>] (async_encrypt+0x0/0x50) from [<c030e564>] (crypt_setkey_allcpus+0x68/0x8c)
Nov 14 14:02:15 host kernel: r4:00000000
Nov 14 14:02:15 host kernel: [<c030e4fc>] (crypt_setkey_allcpus+0x0/0x8c) from [<c030e61c>] (crypt_set_key+0x94/0xb8)
Nov 14 14:02:15 host kernel: r8:d0a75164 r7:d0a7a020 r6:d0a75175 r5:c2f783c0 r4:00000040
Nov 14 14:02:15 host kernel: [<c030e588>] (crypt_set_key+0x0/0xb8) from [<c030fbcc>] (crypt_ctr_cipher+0x1f0/0x490)
Nov 14 14:02:15 host kernel: r6:d0a75168 r5:c2f783c0 r4:00000000
Nov 14 14:02:15 host kernel: [<c030f9dc>] (crypt_ctr_cipher+0x0/0x490) from [<c030fedc>] (crypt_ctr+0x70/0x2fc)
Nov 14 14:02:15 host kernel: [<c030fe6c>] (crypt_ctr+0x0/0x2fc) from [<c0308954>] (dm_table_add_target+0x1b4/0x350)
Nov 14 14:02:15 host kernel: [<c03087a0>] (dm_table_add_target+0x0/0x350) from [<c030b57c>] (populate_table+0x60/0xe8)
Nov 14 14:02:15 host kernel: r9:c10bfeb0 r8:d0a79000 r7:d0a75000 r6:00000001 r5:00000000
Nov 14 14:02:15 host kernel: r4:000000fa
Nov 14 14:02:15 host kernel: [<c030b51c>] (populate_table+0x0/0xe8) from [<c030b678>] (table_load+0x74/0x1f4)
Nov 14 14:02:15 host kernel: [<c030b604>] (table_load+0x0/0x1f4) from [<c030c5c4>] (ctl_ioctl+0x10c/0x18c)
Nov 14 14:02:15 host kernel: r7:00004000 r6:0003cf80 r5:00000000 r4:c10be000
Nov 14 14:02:15 host kernel: [<c030c4b8>] (ctl_ioctl+0x0/0x18c) from [<c030c65c>] (dm_ctl_ioctl+0x18/0x1c)
Nov 14 14:02:15 host kernel: [<c030c644>] (dm_ctl_ioctl+0x0/0x1c) from [<c00f06b4>] (vfs_ioctl+0x30/0x44)
Nov 14 14:02:15 host kernel: [<c00f0684>] (vfs_ioctl+0x0/0x44) from [<c00f0e10>] (do_vfs_ioctl+0x70/0x1dc)
Nov 14 14:02:15 host kernel: [<c00f0da0>] (do_vfs_ioctl+0x0/0x1dc) from [<c00f0fbc>] (sys_ioctl+0x40/0x68)
Nov 14 14:02:15 host kernel: r5:0003cf80 r4:00000003
Nov 14 14:02:15 host kernel: [<c00f0f7c>] (sys_ioctl+0x0/0x68) from [<c000bb00>] (ret_fast_syscall+0x0/0x2c)
Nov 14 14:02:15 host kernel: r7:00000036 r6:b6e22e80 r5:b6f93cc8 r4:00000001
Nov 14 14:02:15 host kernel: Code: e24b0020 e59c1024 e59c2020 e1a0e00f (e594f0b0)
Nov 14 14:02:15 host kernel: ---[ end trace 3c58c565608fcd8c ]---
Ok, so it's an "Oops: 5", the meaning of which I don't know...
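As far as I know, on 32-bit ARM the number printed after "Oops:" is the fault status register (FSR) value in hex, so it can be decoded with a lookup table. The sketch below is an illustration under stated assumptions: the names are written from memory of the kernel's two-level page table FSR table (arch/arm/mm/fsr-2level.c) and cover only the 16 entries with status bit 4 clear; the function names are hypothetical. Under that reading, 5 is a section translation fault, meaning there was no first-level page table entry for the address, which would be consistent with the "*pgd=00000000" line in the trace.

```c
/* Names for FS[3:0] with FS[4] (bit 10) clear, as recalled from
 * arch/arm/mm/fsr-2level.c; treat the exact strings as an assumption. */
static const char *const fsr_names[16] = {
	"vector exception",
	"alignment exception",
	"terminal exception",
	"alignment exception",
	"external abort on linefetch",
	"section translation fault",
	"external abort on linefetch",
	"page translation fault",
	"external abort on non-linefetch",
	"section domain fault",
	"external abort on non-linefetch",
	"page domain fault",
	"external abort on translation",
	"section permission fault",
	"external abort on translation",
	"page permission fault",
};

#define FSR_FS3_0 0x0fu        /* fault status bits 3:0 */
#define FSR_FS4   (1u << 10)   /* fault status bit 4 */

/* Map an FSR value (the number after "Oops:") to a short description. */
static const char *arm_fsr_describe(unsigned int fsr)
{
	if (fsr & FSR_FS4)
		return "unknown (FS[4] set, not covered by this sketch)";
	return fsr_names[fsr & FSR_FS3_0];
}

/* Nonzero if the fault means that no valid page table descriptor was
 * found: status 5 (first level) or 7 (second level). */
static int arm_fsr_is_translation_fault(unsigned int fsr)
{
	unsigned int fs = fsr & FSR_FS3_0;

	return !(fsr & FSR_FS4) && (fs == 5 || fs == 7);
}
```

Calling arm_fsr_describe(0x5) with the value from the log returns the "section translation fault" entry of the table.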
