INFO: task v8:SweeperThrea:<pid> blocked for more than 120 seconds - linux

When running a simple node process in a container, I see this in my kernel log and the process becomes defunct:
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745710] INFO: task v8:SweeperThrea:2569 blocked for more than 120 seconds.
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745723] v8:SweeperThrea D 0000000000000000 0 2569 2470 0x00000002
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745727] ffff8801d228fca8 0000000000000246 ffff8801d0fb1740 ffff8801d228ffd8
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745730] ffff8801d228ffd8 ffff8801d228ffd8 ffffffff81c15440 ffff8801d0fb1740
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745737] ffff8801d0fb1740 ffff8801d02f8878 0000000000000002 ffff8801d0fb1740
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745741] Call Trace:
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745745] [<ffffffff816ca029>] schedule+0x29/0x70
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745748] [<ffffffff810d1365>] zap_pid_ns_processes+0x125/0x180
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745752] [<ffffffff8105e91c>] do_exit+0x85c/0x9d0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745755] [<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745759] [<ffffffff8106e571>] get_signal_to_deliver+0x1c1/0x610
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745764] [<ffffffff8101439f>] do_signal+0x3f/0x8d0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745767] [<ffffffff810f3427>] ? call_rcu_sched+0x17/0x20
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745771] [<ffffffff8108429f>] ? __put_cred+0x3f/0x50
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745775] [<ffffffff810843b9>] ? abort_creds+0x29/0x30
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745779] [<ffffffff81014cb0>] do_notify_resume+0x80/0xb0
Mar 21 19:07:08 ip-10-0-2-233 kernel: [26336450.745781] [<ffffffff816d3a9a>] int_signal+0x12/0x17
more info:
docker is doing a SIGTERM then SIGKILL on the process.
docker -d -D logs:
[debug] server.go:924 Calling POST /containers/create
2014/03/21 19:04:11 POST /v1.10/containers/create
[/var/lib/docker|90a3fa34] +job create()
[/var/lib/docker|90a3fa34] -job create() = OK (0)
[debug] server.go:924 Calling POST /containers/{name:.*}/start
2014/03/21 19:04:11 POST /v1.10/containers/074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766/start
[/var/lib/docker|90a3fa34] +job start(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
[/var/lib/docker|90a3fa34] +job allocate_interface(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
[/var/lib/docker|90a3fa34] -job allocate_interface(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766) = OK (0)
[/var/lib/docker|90a3fa34] -job start(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766) = OK (0)
[debug] server.go:924 Calling POST /containers/{name:.*}/stop
2014/03/21 19:04:16 POST /v1.10/containers/074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766/stop?t=10
[/var/lib/docker|90a3fa34] +job stop(074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766)
2014/03/21 19:04:26 Container 074dda12cb039f688777e2cb5115dd0c5088f6b93ed21586782cfe4e57533766 failed to exit within 10 seconds of SIGTERM - using the force
2014/03/21 19:04:36 Container SIGKILL failed to exit within 10 seconds of lxc-kill 074dda12cb03 - trying direct SIGKILL
pstree:
docker(2387)─node(2532)─{node}(2569)
ps aux
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 2387 2386 0 80 0 - 153471 futex_ pts/0 00:00:07 docker
4 Z 0 2532 2387 0 80 0 - 0 exit ? 00:00:00 node <defunct>
no files in /proc/2532/fd
no files in /proc/2532//task/2569/fd/
stack from /proc/2532/stack
[<ffffffff8105e763>] do_exit+0x6a3/0x9d0
[<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[<ffffffff8105eb87>] sys_exit_group+0x17/0x20
[<ffffffff816d37dd>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
stack from /proc/2532/task/2569/stack
[<ffffffff810d1365>] zap_pid_ns_processes+0x125/0x180
[<ffffffff8105e91c>] do_exit+0x85c/0x9d0
[<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[<ffffffff8106e571>] get_signal_to_deliver+0x1c1/0x610
[<ffffffff8101439f>] do_signal+0x3f/0x8d0
[<ffffffff81014cb0>] do_notify_resume+0x80/0xb0
[<ffffffff816d3a9a>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
the repro script:
CNT=0
while true
do
    echo $CNT
    DOCK=$(sudo docker run -d -t anandkumarpatel/zombie_bug ./node index.js)
    sleep 60 && sudo docker stop $DOCK > out.log &
    sleep 1
    CNT=$(($CNT+1))
    if [[ "$CNT" == "50" ]]; then
        exit
    fi
done
An strace of the docker daemon during a failed kill can be found on pastebin:
http://pastebin.com/HxDwiRBW
My system info and versions follow. I am using a custom-built docker, forked from the 0.9.0 release with 2 small patches, but this also repros on a clean 0.9.0 release.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 13.04
Release: 13.04
Codename: raring
$ sudo docker version
Client version: 0.9.0
Go version (client): go1.2.1
Git commit (client): 70f72ea
Server version: 0.9.0
Git commit (server): 70f72ea
Go version (server): go1.2.1
Last stable version: 0.9.0
$ uname -a
Linux ip-10-0-2-233 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
HOWEVER, this does not repro on every system! For some reason it only repros on our production servers and one other server. They all have similar configs and the same Ubuntu version.
Let me know if there is more info to gather and I will grab it. I have near 100% repro rate on a test system so I can gather whatever is needed.
related:
https://github.com/dotcloud/docker/issues/4811
Docker container refuses to get killed after run command turns into a zombie
Changing to a newer kernel makes the issue go away:
does not repro: linux-image-3.8.0-35-generic
Repros with linux-image-3.8.0-19-generic
will do a search to see when this gets fixed to see if it helps find root cause.

changing to latest kernel fixes the issue
found exact kernel difference:
REPRO: linux-image-3.8.0-31-generic
NO REPRO: linux-image-3.8.0-32-generic
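To check a host against that boundary, here is a minimal sketch. It assumes the Ubuntu release string format shown in the uname output above ("3.8.0-19-generic"); the function name kernel_has_pidns_fix is made up for illustration, and the 32 cutoff comes only from the two test results above, not from any changelog.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 when an Ubuntu 3.8.0-series release string is
# at or above ABI 32 (first build that did not repro in the tests above),
# exit 1 when it is below, exit 2 when the string is from another series.
kernel_has_pidns_fix() {
    local release="$1"            # e.g. "3.8.0-19-generic"
    local abi
    if [[ "$release" =~ ^3\.8\.0-([0-9]+)- ]]; then
        abi="${BASH_REMATCH[1]}"
        (( abi >= 32 ))           # status of this test is the return value
        return
    fi
    return 2                      # untested series: no claim either way
}
```

Calling it as kernel_has_pidns_fix "$(uname -r)" on each server gives a quick way to list the ones still on an affected build.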
I think this is the fix:
+++ linux-3.8.0/kernel/pid_namespace.c
@@ -181,6 +181,7 @@
     int nr;
     int rc;
     struct task_struct *task, *me = current;
+    int init_pids = thread_group_leader(me) ? 1 : 2;
     /* Don't allow any more processes into the pid namespace */
     disable_pid_allocation(pid_ns);
@@ -230,7 +231,7 @@
      */
     for (;;) {
         set_current_state(TASK_UNINTERRUPTIBLE);
-        if (pid_ns->nr_hashed == 1)
+        if (pid_ns->nr_hashed == init_pids)
             break;
         schedule();
     }
which came from here:
https://groups.google.com/forum/#!msg/fa.linux.kernel/u4b3n4oYDQ4/GuLrXfDIYggJ
going to upgrade all our servers which repro this and see if it still occurs.
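My reading of why that one-line change matters, as a toy model rather than kernel code: in the traces above the task stuck in zap_pid_ns_processes is the thread (2569), while the group leader (2532) is already a zombie. If both stay hashed in the namespace, the count never drops to 1, so the old check can never pass. The function name and its three arguments below are made up for the sketch.

```shell
#!/bin/bash
# Toy model of the loop exit test in zap_pid_ns_processes().
#   $1 nr_hashed : pids still hashed in the pid namespace
#   $2 is_leader : 1 if the exiting init task is the thread group leader
#   $3 patched   : 1 to model the patched kernel, 0 for the old check
# Exit status 0 means the wait loop would break out.
zap_wait_finished() {
    local nr_hashed="$1" is_leader="$2" patched="$3"
    local init_pids=1
    if [[ "$patched" == 1 && "$is_leader" == 0 ]]; then
        init_pids=2   # zombie leader plus the exiting thread remain hashed
    fi
    (( nr_hashed == init_pids ))
}
```

With nr_hashed=2 and a non-leader thread, the old check never succeeds (the task sits in TASK_UNINTERRUPTIBLE, hence the hung task warning), while the patched check does.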

Related

How do I properly prevent systemd suspend using a script in /lib/systemd/system-sleep/

I'm fairly new to Linux and trying to learn. I'm using Plex Media Server and I'm trying to prevent the system from sleeping while streaming a file. I've searched the internet over the last few days and none of the solutions seem to work. One solution I feel is almost getting me there, but it's not quite working. Here is the script I've placed (and made executable) in /lib/systemd/system-sleep/ (based on this site).
#!/bin/bash
PATH=/sbin:/usr/sbin:/bin:/usr/bin
X_DISPLAY_USERNAME=myusername
plexresume()
{
    number_of_sessions=$(curl -s localhost:32400/status/sessions? | sed -n "s/.*MediaContainer size=\"\(.*\)\".*/\1/p")
    if [ ${number_of_sessions} -gt 0 ]; then
        echo "[$(date +"%Y.%m.%d-%T")] Number of streamers = ${number_of_sessions} Plex session active, cancel suspend" >> /tmp/plex_sleep_log
        return 1
    else
        echo "[$(date +"%Y.%m.%d-%T")] Number of streamers = ${number_of_sessions} Plex session -IN-active, going to sleep now zzzzzzzz........" >> /tmp/plex_sleep_log
        return 0
    fi
}
plexkeepalive()
{
    echo "[$(date +"%Y.%m.%d-%T")] Resuming!!..." >> /tmp/plex_sleep_log
    su ${X_DISPLAY_USERNAME} -c "DISPLAY=:0 /usr/bin/xdotool getmouselocation | grep 'x:1 y:1 '" > /dev/null
    if [ "$?" == "0" ]; then
        su ${X_DISPLAY_USERNAME} -c "DISPLAY=:0 /usr/bin/xdotool mousemove 9 9"
    else
        su ${X_DISPLAY_USERNAME} -c "DISPLAY=:0 /usr/bin/xdotool mousemove 1 1"
    fi
    return 0
}
case "$1" in
    pre)
        plexresume
        ;;
    post)
        plexkeepalive
        ;;
esac
I know it's executing because it's printing to the log file. But even when it prints that Plex is active, the system still suspends. I've manually run the script using sudo outside of systemd and checked the value of $? afterwards, which is 1.
When I use
journalctl -b -u systemd-suspend.service
I see the following:
Sep 26 12:47:03 systemname systemd[1]: Starting System Suspend...
Sep 26 12:47:03 systemname [141890]: /usr/lib/systemd/system-sleep/plexkeepalive failed with exit status 1.
Sep 26 12:47:03 systemname systemd-sleep[141887]: Entering sleep state 'suspend'...
One time I got a successful result, but I'm not sure how it happened:
Sep 26 10:51:45 systemname systemd-sleep[112491]: Entering sleep state 'suspend'...
Sep 26 10:52:25 systemname systemd-sleep[112491]: Failed to put system to sleep. System resumed again: Device or resource busy
Sep 26 10:52:25 systemname su[112556]: (to myusername) root on none
Sep 26 10:52:25 systemname su[112556]: pam_unix(su:session): session opened for user myusername(uid=1000) by (uid=0)
Sep 26 10:52:25 systemname su[112556]: pam_unix(su:session): session closed for user myusername
Sep 26 10:52:25 systemname su[112592]: (to myusername) root on none
Sep 26 10:52:25 systemname su[112592]: pam_unix(su:session): session opened for user myusername(uid=1000) by (uid=0)
Sep 26 10:52:25 systemname su[112592]: pam_unix(su:session): session closed for user myusername
Sep 26 10:52:25 systemname systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
Sep 26 10:52:25 systemname systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
Sep 26 10:52:25 systemname systemd[1]: Failed to start System Suspend.
Sep 26 10:52:25 systemname systemd[1]: systemd-suspend.service: Consumed 2.732s CPU time.
Any help on this issue would be appreciated. I don't understand why returning 1 from the script is not preventing systemd-suspend.service from running. Thank you!
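For what it's worth, the first journal excerpt above suggests the hook's exit status is only logged ("failed with exit status 1") and suspend proceeds anyway. One alternative that may fit better is holding a sleep inhibitor lock with systemd-inhibit while a stream is active. The sketch below assumes the same Plex URL as the script above; the function names, the "plex-guard" label and the 60 second poll interval are placeholders, and taking a block-mode lock may need root or a matching polkit rule.

```shell
#!/bin/bash
# Print the number of active sessions from Plex's /status/sessions XML
# (read on stdin); prints 0 when the size attribute is missing or empty.
plex_session_count() {
    local count
    count=$(sed -n 's/.*MediaContainer size="\([0-9]*\)".*/\1/p' | head -n 1)
    echo "${count:-0}"
}

# Hypothetical polling loop (not started automatically): while at least one
# session is active, hold a block-mode sleep inhibitor for the next 60 s.
hold_inhibitor_while_streaming() {
    while true; do
        if [ "$(curl -s localhost:32400/status/sessions | plex_session_count)" -gt 0 ]; then
            systemd-inhibit --what=sleep --mode=block \
                --who="plex-guard" --why="Plex session active" sleep 60
        else
            sleep 60
        fi
    done
}
```

Run as a small service, hold_inhibitor_while_streaming would keep a lock in place for as long as sessions are reported. Defaulting the count to 0 also avoids the "-gt" test breaking when curl returns nothing.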

How to use a Linux "platform_driver"?

There is an embedded system, and it provides functions in a struct of platform_driver:
static struct platform_driver infinity_wdt_driver = {
    .probe = infinity_wdt_probe,
    .remove = infinity_wdt_remove,
    .shutdown = infinity_wdt_shutdown,
    .driver = {
        .owner = THIS_MODULE,
        .name = "infinity-wdt",
        .of_match_table = ms_watchdog_of_match_table,
    },
};
module_platform_driver(infinity_wdt_driver);
In the infinity_wdt_probe it calls devm_kzalloc and devm_ioremap_resource:
static int infinity_wdt_probe(struct platform_device *pdev) {
    ...
    wdt = devm_kzalloc(&pdev->dev, sizeof(*wdt), GFP_KERNEL);
    ...
    wdt->reg_base = devm_ioremap_resource(&pdev->dev, res);
    ...
}
How should I use this driver? Do I need to write some C code, or does Linux provide a standard way to control it through the filesystem?
In the filesystem there is a directory under /sys/bus/platform/drivers/infinity-wdt, but it contains only a few files:
# ls -l
total 0
lrwxrwxrwx 1 root root 0 Nov 16 20:23 1f006000.watchdog -> ../../../../devices/soc0/soc/1f006000.watchdog
--w------- 1 root root 4096 Nov 16 20:23 bind
--w------- 1 root root 4096 Nov 16 20:23 uevent
--w------- 1 root root 4096 Nov 16 20:23 unbind
Is it somehow possible to use the driver through the filesystem entries above?
Some extra info: 1f006000.watchdog is a symlink to a directory which contains these:
# ls -l
total 0
lrwxrwxrwx 1 root root 0 Nov 16 20:23 driver -> ../../../../bus/platform/drivers/infinity-wdt
-rw-r--r-- 1 root root 4096 Nov 16 20:23 driver_override
-r--r--r-- 1 root root 4096 Nov 16 20:23 modalias
lrwxrwxrwx 1 root root 0 Nov 16 20:23 of_node -> ../../../../firmware/devicetree/base/soc/watchdog
lrwxrwxrwx 1 root root 0 Nov 16 20:23 subsystem -> ../../../../bus/platform
-rw-r--r-- 1 root root 4096 Nov 16 20:23 uevent
This is the kernel log:
# cat /var/log/messages | grep -i watchdog
Jan 1 04:00:02 kernel: [WatchDog]infinity_wdt_probe
Jan 1 04:00:02 kernel: [WatchDog]infinity_wdt_set_heartbeat
Jan 1 04:00:04 kernel: [WatchDog]infinity_wdt_start
Jan 1 04:00:04 kernel: [WatchDog] infinity_wdt_ping tmr_margin=a ^M
Jan 1 04:00:04 kernel: watchdog: watchdog0: watchdog did not stop!
Jan 1 04:00:04 kernel: [WatchDog] infinity_wdt_ping tmr_margin=a ^M
Jan 1 04:00:04 kernel: [WatchDog]infinity_wdt_set_timeout=60
Jan 1 04:00:04 kernel: [WatchDog]infinity_wdt_set_timeout data=3c ^M
Jan 1 04:00:04 kernel: [WatchDog] infinity_wdt_ping tmr_margin=3c ^M
Jan 1 04:00:04 kernel: [WatchDog] infinity_wdt_ping tmr_margin=3c ^M
Nov 16 21:03:11 kernel: [WatchDog] infinity_wdt_ping tmr_margin=3c ^M
Nov 16 21:03:41 kernel: [WatchDog] infinity_wdt_ping tmr_margin=3c ^M
A "platform" driver is a driver for a device that does not sit on one of the other standard buses (e.g. USB, I2C). In this case it's a watchdog driver, which is apparently supposed to reboot the embedded system if it becomes unresponsive.
The entries in sysfs are standard bookkeeping entries that the kernel creates automatically for any driver.
Since the driver has an "of_match_table", the device must be correctly described in the device tree. Given that it generally works and has no other explicit interfaces (e.g. procfs, sysfs), this should be enough to enable it.
You might also check whether a corresponding /dev/watchdog* node is created by this driver. If so, a standard userspace watchdog daemon can be used by specifying this /dev/watchdog* file in its config file.
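As a rough illustration of that last point, here is a sketch that lists the device nodes and feeds one from a shell. It relies on the generic Linux watchdog character device behaviour (opening arms the timer, any write is a ping, writing "V" before close disarms it on drivers with magic close). Whether infinity-wdt supports magic close is an assumption, and both function names are made up.

```shell
#!/bin/bash
# List watchdog device nodes under a directory (default: /dev).
find_watchdog_nodes() {
    local dir="${1:-/dev}" node
    for node in "$dir"/watchdog*; do
        [ -e "$node" ] && echo "$node"
    done
    return 0
}

# Hypothetical feeder (not started automatically): keep one descriptor open
# and ping every $2 seconds. Once opened, a missed ping reboots the board.
feed_watchdog() {
    local dev="$1" interval="$2"
    exec 3>"$dev"
    # On INT/TERM try a magic close so the timer is disarmed before exit.
    trap 'printf V >&3; exec 3>&-; exit 0' INT TERM
    while true; do
        printf '.' >&3            # any written byte counts as a ping
        sleep "$interval"
    done
}
```

The kernel log above shows "watchdog0" and a ping roughly every 30 seconds, which suggests some process is already feeding it. Many drivers allow only one opener, so a second feeder may fail with a busy error.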

Pulling of docker image from a registry fails

Would love some help with an issue. When I tried to download an image from the repository, it failed, and I could see the following errors in the log file. This is the syslog I received:
Jan 4 10:22:05 <hostname> kernel: device-mapper: thin: commit failed: error = -22
Jan 4 10:22:05 <hostname> kernel: device-mapper: thin: switching pool to read-only mode
Jan 4 10:22:05 <hostname> kernel: bio: create slab <bio-2> at 2
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: thin: process_bio_read_only: dm_thin_find_block() failed: error = -15
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388592
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388592
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388606
Jan 4 10:22:06 <hostname> kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 27706
Jan 4 10:22:06 <hostname> kernel: device-mapper: block manager: btree_node validator check failed for block 27706
Jan 4 10:22:06 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388606
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 0x
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 0
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 1
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388607
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388607
Jan 4 10:22:07 <hostname> kernel: Buffer I/O error on device dm-3, logical block 8388607
Jan 4 10:22:08 <hostname> kernel: device-mapper: space map common: dm_tm_shadow_block() failed
Jan 4 10:22:08 <hostname> kernel: device-mapper: space map common: dm_tm_shadow_block() failed
Jan 4 10:22:08 <hostname> kernel: device-mapper: space map metadata: unable to allocate new metadata block
Jan 4 10:22:08 <hostname> kernel: device-mapper: thin: Deletion of thin device 66 failed.
Jan 4 10:22:10 <hostname> kernel: device-mapper: table: 252:3: thin: Couldn't open thin internal device
Jan 4 10:22:10 <hostname> kernel: device-mapper: ioctl: error adding target to table
Jan 4 10:22:10 <hostname> kernel: device-mapper: thin: Deletion of thin device 66 failed.
Jan 4 10:22:10 <hostname> kernel: device-mapper: table: 252:3: thin: Couldn't open thin internal device
These are the logs from docker:
time="2017-01-04T10:22:08Z" level=error msg="Error from V2 registry: read /dev/mapper/docker-252:0-263231-3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880: input/output error"
Error: image ims:<image_tag> not found
time="2017-01-04T10:22:09Z" level=info msg="-job pull(<iamge>, <image_tag>) = ERR (1)"
time="2017-01-04T10:22:09Z" level=info msg="POST /v1.18/containers/create?name=<container_name>"
time="2017-01-04T10:22:09Z" level=info msg="+job create(<container_name>)"No such image: <iamge>:<image_tag> (tag: <image_tag>)
time="2017-01-04T10:22:09Z" level=info msg="-job create(<container_name>) = ERR (1)"
time="2017-01-04T10:22:09Z" level=error msg="Handler for POST /containers/create returned error: No such image: <iamge>:<image_tag> (tag: <image_tag>)"
time="2017-01-04T10:22:09Z" level=error msg="HTTP Error: statusCode=404
No such image: <iamge>:<image_tag> (tag: <image_tag>)"
time="2017-01-04T10:22:09Z" level=info msg="POST /v1.18/images/create?fromImage=<image_url>"
time="2017-01-04T10:22:09Z" level=info msg="+job pull(<iamge>, <image_tag>)"
time="2017-01-04T10:22:09Z" level=info msg="+job resolve_repository(<iamge>)"
time="2017-01-04T10:22:09Z" level=info msg="-job resolve_repository(<iamge>) = OK (0)"
time="2017-01-04T10:22:09Z" level=info msg="+job trust_key_check(/xyz)"
time="2017-01-04T10:22:09Z" level=info msg="-job trust_key_check(/xyz) = OK (0)"
time="2017-01-04T10:22:10Z" level=error msg="Error from V2 registry: Driver devicemapper failed to create image rootfs 3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880: device 3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880 already exists"
Please let me know if any other data is needed. I have the backup of the old docker lib folder.

BUG assertion triggered when replacing a physical page in a process

I modified the Linux kernel so that it modifies some of the memory pages of a specific process. In summary, the functions I wrote receive a process ID and an address in that process, then replace the page at that address with a dummy page. Finally, one of the functions calls __free_page() on the original page that was replaced.
The problem is that I get this error from the Linux kernel when it tries to reuse the original page. So, what is the flag it is complaining about, and how do I get rid of this error? Here are the relevant lines from syslog.
Thanks.
Nov 14 19:15:23 localhost kernel: [ 1466.949451] BUG: Bad page state in process mytestapp pfn:7d309
Nov 14 19:15:23 localhost kernel: [ 1466.949452] page:ffffea0001f4c240 count:-1 mapcount:0 mapping: (null) index:0x7fd632179
Nov 14 19:15:23 localhost kernel: [ 1466.949453] page flags: 0x100000000000000()
Nov 14 19:15:23 localhost kernel: [ 1466.949453] Modules linked in: test_module(O) acpiphp bnep rfcomm bluetooth binfmt_misc joydev hid_generic usbhid hid snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm ghash_clmulni_intel snd_seq_midi snd_rawmidi snd_seq_midi_event ppdev snd_seq aesni_intel ablk_helper cryptd aes_x86_64 snd_timer snd_seq_device psmouse microcode snd vmw_balloon acpi_memhotplug parport_pc soundcore snd_page_alloc vmwgfx ttm mac_hid drm i2c_piix4 serio_raw shpchp lp parport e1000 mptspi mptscsih mptbase floppy vmw_pvscsi vmxnet3
Nov 14 19:15:23 localhost kernel: [ 1466.949484] Pid: 15064, comm: mytestapp Tainted: G B O 3.6.11-elasticos-0.01 #31
Nov 14 19:15:23 localhost kernel: [ 1466.949485] Call Trace:
Nov 14 19:15:23 localhost kernel: [ 1466.949487] [<ffffffff8111941f>] bad_page+0xbf/0x110
Nov 14 19:15:23 localhost kernel: [ 1466.949505] [<ffffffff8111aac9>] get_page_from_freelist+0x6f9/0x810
Nov 14 19:15:23 localhost kernel: [ 1466.949508] [<ffffffff8111a702>] ? get_page_from_freelist+0x332/0x810
Nov 14 19:15:23 localhost kernel: [ 1466.949509] [<ffffffff8111b06e>] __alloc_pages_nodemask+0x48e/0x9b0
Nov 14 19:15:23 localhost kernel: [ 1466.949512] [<ffffffff8111f03a>] ? pagevec_lru_move_fn+0xea/0x110
Nov 14 19:15:23 localhost kernel: [ 1466.949514] [<ffffffff81154ec3>] alloc_pages_vma+0xb3/0x190
Nov 14 19:15:23 localhost kernel: [ 1466.949515] [<ffffffff811397cc>] handle_pte_fault+0x56c/0xb00
Nov 14 19:15:23 localhost kernel: [ 1466.949517] [<ffffffff810473f7>] ? pte_alloc_one+0x37/0x50
Nov 14 19:15:23 localhost kernel: [ 1466.949527] [<ffffffff8113afd9>] handle_mm_fault+0x259/0x340
Nov 14 19:15:23 localhost kernel: [ 1466.949538] [<ffffffff8107c218>] ? up_read+0x18/0x30
Nov 14 19:15:23 localhost kernel: [ 1466.949540] [<ffffffff816213d2>] do_page_fault+0x152/0x520
Nov 14 19:15:23 localhost kernel: [ 1466.949541] [<ffffffff8108c36d>] ? set_next_entity+0x9d/0xb0
Nov 14 19:15:23 localhost kernel: [ 1466.949543] [<ffffffff810135ca>] ? __switch_to+0x17a/0x410
Nov 14 19:15:23 localhost kernel: [ 1466.949545] [<ffffffff8161de65>] page_fault+0x25/0x30
This macro checks that the appropriate page flags are unset. As far as I can see, in your case the problem is that the PG_locked flag is set, which means that you freed a locked page. See unlock_page() to handle this, or (probably) use free_page() instead.
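To test an individual bit in the "page flags" word from the log, a small helper like the one below may be handy. It is only a sketch: the function name is made up, and the assumption that PG_locked is bit 0 comes from the ordering of the kernel's enum pageflags and should be checked against the 3.6.11 tree in use.

```shell
#!/bin/bash
# Succeeds when bit number $2 is set in the flags word $1 (hex or decimal).
page_flag_bit_set() {
    local flags=$(( $1 )) bit="$2"
    (( (flags >> bit) & 1 ))
}
```

Run against the value in the log, page_flag_bit_set 0x100000000000000 0 fails, i.e. bit 0 is clear and only bit 56 is set. So it may also be worth looking at the count:-1 field on the same line, which suggests the page's reference count dropped below zero before the page was handed out again.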

Why does cryptsetup segfault?

While most of my "use a chrooted Gentoo as a supplement to another Linux distribution" project works fine, unfortunately cryptsetup segfaults on both luksFormat and luksOpen. What is the best way to troubleshoot this?
Edit: For testing purposes I downgraded the firmware to an older version which runs a 2.6.31 kernel instead of a 3.3 one, and there the segfault does not occur. Nonetheless I would like to get this working with the newer version, so any help on troubleshooting this is appreciated...
Edit 2: Following this hint, I compared the output of grep dm_crypt /proc/kallsyms and noticed that the newer kernel lacks __initcall_dm_crypt_init6. Could this be the reason for the failure? How can that be fixed?
An strace | tail reveals
open("/dev/loop0", O_RDONLY|O_LARGEFILE) = 6
ioctl(6, BLKRAGET, 256) = 0
close(6) = 0
ioctl(3, DM_DEV_CREATE, 0x3cfb0) = 0
ioctl(3, DM_TABLE_LOAD <unfinished ...>
+++ killed by SIGSEGV +++
The ioctl(3, DM_TABLE_LOAD call seems to be responsible, where fd 3, according to a grep '= 3$', should stem from
open("/dev/mapper/control", O_RDWR|O_LARGEFILE) = 3
so there is some trouble with /dev/mapper/control, where /dev is bind-mounted from the host system into the chroot environment. But what exactly is the problem? How can one figure that out?
Additional output, the usefulness of which I couldn't determine yet:
cryptsetup --debug -v's output is not very informative:
# cryptsetup 1.4.3 processing "cryptsetup --debug -v luksFormat /dev/loop0"
# Running command luksFormat.
# Locking memory.
WARNING!
======== This will overwrite data on /dev/loop0 irrevocably.
Are you sure? (Type uppercase yes): YES
# Allocating crypt device /dev/loop0 context.
# Trying to open and read device /dev/loop0.
# Initialising device-mapper backend, UDEV is disabled.
# Detected dm-crypt version 1.5.1, dm-ioctl version 4.22.0.
# Timeout set to 0 miliseconds.
# Iteration time set to 1000 miliseconds.
# Interactive passphrase entry requested. Enter LUKS passphrase: Verify passphrase:
# Formatting device /dev/loop0 as type LUKS1.
# Crypto backend (gcrypt 1.5.3) initialized.
# Topology: IO (512/0), offset = 0; Required alignment is 1048576 bytes.
# Generating LUKS header version 1 using hash sha1, aes, cbc-essiv:sha256, MK 32 bytes
# PBKDF2: 43251 iterations per second using hash sha1.
# Data offset 4096, UUID 03dcff20-158e-4d16-b89c-90af7b176a80, digest iterations 5250
# Updating LUKS header of size 1024 on device /dev/loop0
# Reading LUKS header of size 1024 from device /dev/loop0
# Adding new keyslot -1 using volume key.
# Calculating data for key slot 0
# Key slot 0 use 21118 password iterations.
# Using hash sha1 for AF in key slot 0, 4000 stripes
# Updating key slot 0 [0x1000] area on device /dev/loop0.
# DM-UUID is CRYPT-TEMP-temporary-cryptsetup-12060
# dm create temporary-cryptsetup-12060 CRYPT-TEMP-temporary-cryptsetup-12060 OF [16384] (*1)
# dm reload temporary-cryptsetup-12060 OFW [16384] (*1) Segmentation fault
Finally, the kernel oops trace from the host's /var/log/messages reads
Nov 14 14:02:15 host kernel: Unable to handle kernel paging request at virtual address e58c20f0
Nov 14 14:02:15 host kernel: pgd = cab58000
Nov 14 14:02:15 host kernel: [e58c20f0] *pgd=00000000
Nov 14 14:02:15 host kernel: Internal error: Oops: 5 [#1]
Nov 14 14:02:15 host kernel: Modules linked in: usblp usb_storage uhci_hcd ohci_hcd ehci_hcd usbcore usb_common
Nov 14 14:02:15 host kernel: CPU: 0 Not tainted (3.3.4-88f6281 #1)
Nov 14 14:02:15 host kernel: PC is at async_encrypt+0x44/0x50
Nov 14 14:02:15 host kernel: LR is at async_encrypt+0x48/0x50
Nov 14 14:02:15 host kernel: pc : [<c01fa374>] lr : [<c01fa378>] psr: 20000013
Nov 14 14:02:15 host kernel: sp : c10bfd60 ip : cc372580 fp : c10bfd84
Nov 14 14:02:15 host kernel: r10: d0a75160 r9 : d0a75175 r8 : 00000000
Nov 14 14:02:15 host kernel: r7 : c2f78428 r6 : 00000020 r5 : c2f783c0 r4 : e58c2040
Nov 14 14:02:15 host kernel: r3 : c94a72a0 r2 : 00000040 r1 : 00000000 r0 : c10bfd64
Nov 14 14:02:15 host kernel: Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Nov 14 14:02:15 host kernel: Control: 0005397f Table: 0ab58000 DAC: 00000015
Nov 14 14:02:15 host kernel: Process cryptsetup (pid: 5994, stack limit = 0xc10be270)
Nov 14 14:02:15 host kernel: Stack: (0xc10bfd60 to 0xc10c0000)
Nov 14 14:02:15 host kernel: fd60: c023eddc c01fab50 00000010 c01f88f8 c030e4c8 00000000 c10bfdac c10bfd88
Nov 14 14:02:15 host kernel: fd80: c030e564 c01fa340 c2f783c0 00000040 c2f783c0 d0a75175 d0a7a020 d0a75164
Nov 14 14:02:15 host kernel: fda0: c10bfdcc c10bfdb0 c030e61c c030e50c d0a75168 00000000 c2f783c0 d0a75168
Nov 14 14:02:15 host kernel: fdc0: c10bfe1c c10bfdd0 c030fbcc c030e598 d0a75160 c00d262c c10bfe0c c7752a40
Nov 14 14:02:15 host kernel: fde0: c00dd198 00000000 d0a7516e 00000000 c030feb4 00000020 cebc9200 c2f783c0
Nov 14 14:02:15 host kernel: fe00: d0a7a020 00000005 00000000 cebc9500 c10bfe64 c10bfe20 c030fedc c030f9ec
Nov 14 14:02:15 host kernel: fe20: c10bfe64 c10bfe30 c0308148 c023d270 c10bfe64 00000040 d0a7a020 000000fa
Nov 14 14:02:15 host kernel: fe40: 000000fa d0a7a020 cebc9500 d0a75150 00000000 cebc9500 c10bfe9c c10bfe68
Nov 14 14:02:15 host kernel: fe60: c0308954 c030fe7c 00000000 00000000 cebc9200 00000005 000000fa 00000000
Nov 14 14:02:15 host kernel: fe80: 00000001 d0a75000 d0a79000 c10bfeb0 c10bfee4 c10bfea0 c030b57c c03087b0
Nov 14 14:02:15 host kernel: fea0: 000000fa 00000000 d0a75160 00008004 d0a75160 d0a75138 c0308b5c 00004000
Nov 14 14:02:15 host kernel: fec0: cb637c00 d0a75000 00000000 c030b604 c10be000 00008004 c10bff0c c10bfee8
Nov 14 14:02:15 host kernel: fee0: c030b678 c030b52c 00000016 cebc9500 c10be000 00000000 0003cf80 00004000
Nov 14 14:02:15 host kernel: ff00: c10bff3c c10bff10 c030c5c4 c030b614 c00fb8b4 d0a75000 cb4580a0 0003cf80
Nov 14 14:02:15 host kernel: ff20: c138fd09 cb4580a0 c000bca8 00000000 c10bff4c c10bff40 c030c65c c030c4c8
Nov 14 14:02:15 host kernel: ff40: c10bff5c c10bff50 c00f06b4 c030c654 c10bff7c c10bff60 c00f0e10 c00f0694
Nov 14 14:02:15 host kernel: ff60: c10bff8c c10bff70 00000003 0003cf80 c10bffa4 c10bff80 c00f0fbc c00f0db0
Nov 14 14:02:15 host kernel: ff80: c10bffa4 00000000 00000001 b6f93cc8 b6e22e80 00000036 00000000 c10bffa8
Nov 14 14:02:15 host kernel: ffa0: c000bb00 c00f0f8c 00000001 b6f93cc8 00000003 c138fd09 0003cf80 b6e31450
Nov 14 14:02:15 host kernel: ffc0: 00000001 b6f93cc8 b6e22e80 00000036 0003cfb0 0003cec0 b6e22e80 0003cf80
Nov 14 14:02:15 host kernel: ffe0: b6e2f0f4 bedcbbfc b6e1e880 b6f083bc 60000010 00000003 e2433001 e5863000
Nov 14 14:02:15 host kernel: Backtrace:
Nov 14 14:02:15 host kernel: [<c01fa330>] (async_encrypt+0x0/0x50) from [<c030e564>] (crypt_setkey_allcpus+0x68/0x8c)
Nov 14 14:02:15 host kernel: r4:00000000
Nov 14 14:02:15 host kernel: [<c030e4fc>] (crypt_setkey_allcpus+0x0/0x8c) from [<c030e61c>] (crypt_set_key+0x94/0xb8)
Nov 14 14:02:15 host kernel: r8:d0a75164 r7:d0a7a020 r6:d0a75175 r5:c2f783c0 r4:00000040
Nov 14 14:02:15 host kernel: [<c030e588>] (crypt_set_key+0x0/0xb8) from [<c030fbcc>] (crypt_ctr_cipher+0x1f0/0x490)
Nov 14 14:02:15 host kernel: r6:d0a75168 r5:c2f783c0 r4:00000000
Nov 14 14:02:15 host kernel: [<c030f9dc>] (crypt_ctr_cipher+0x0/0x490) from [<c030fedc>] (crypt_ctr+0x70/0x2fc)
Nov 14 14:02:15 host kernel: [<c030fe6c>] (crypt_ctr+0x0/0x2fc) from [<c0308954>] (dm_table_add_target+0x1b4/0x350)
Nov 14 14:02:15 host kernel: [<c03087a0>] (dm_table_add_target+0x0/0x350) from [<c030b57c>] (populate_table+0x60/0xe8)
Nov 14 14:02:15 host kernel: r9:c10bfeb0 r8:d0a79000 r7:d0a75000 r6:00000001 r5:00000000
Nov 14 14:02:15 host kernel: r4:000000fa
Nov 14 14:02:15 host kernel: [<c030b51c>] (populate_table+0x0/0xe8) from [<c030b678>] (table_load+0x74/0x1f4)
Nov 14 14:02:15 host kernel: [<c030b604>] (table_load+0x0/0x1f4) from [<c030c5c4>] (ctl_ioctl+0x10c/0x18c)
Nov 14 14:02:15 host kernel: r7:00004000 r6:0003cf80 r5:00000000 r4:c10be000
Nov 14 14:02:15 host kernel: [<c030c4b8>] (ctl_ioctl+0x0/0x18c) from [<c030c65c>] (dm_ctl_ioctl+0x18/0x1c)
Nov 14 14:02:15 host kernel: [<c030c644>] (dm_ctl_ioctl+0x0/0x1c) from [<c00f06b4>] (vfs_ioctl+0x30/0x44)
Nov 14 14:02:15 host kernel: [<c00f0684>] (vfs_ioctl+0x0/0x44) from [<c00f0e10>] (do_vfs_ioctl+0x70/0x1dc)
Nov 14 14:02:15 host kernel: [<c00f0da0>] (do_vfs_ioctl+0x0/0x1dc) from [<c00f0fbc>] (sys_ioctl+0x40/0x68)
Nov 14 14:02:15 host kernel: r5:0003cf80 r4:00000003
Nov 14 14:02:15 host kernel: [<c00f0f7c>] (sys_ioctl+0x0/0x68) from [<c000bb00>] (ret_fast_syscall+0x0/0x2c)
Nov 14 14:02:15 host kernel: r7:00000036 r6:b6e22e80 r5:b6f93cc8 r4:00000001
Nov 14 14:02:15 host kernel: Code: e24b0020 e59c1024 e59c2020 e1a0e00f (e594f0b0)
Nov 14 14:02:15 host kernel: ---[ end trace 3c58c565608fcd8c ]---
Ok, so it's an "Oops: 5", the meaning of which I don't know...
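As far as I can tell, on 32-bit ARM the number after "Oops:" is the fault status value printed in hex, and its low four bits select the fault type. The sketch below decodes a few common codes. The names follow my reading of the kernel's 2-level page table fault table and should be double-checked against the 3.3.4 source; the function name is made up.

```shell
#!/bin/bash
# Describe the fault status value printed after "Oops:" on 32-bit ARM.
# Only a handful of codes are listed; everything else falls through.
describe_arm_oops_code() {
    local fsr=$(( 16#$1 ))          # the value is printed in hex
    local status=$(( fsr & 0xf ))   # low four bits select the fault type
    case "$status" in
        5)  echo "section translation fault" ;;
        7)  echo "page translation fault" ;;
        13) echo "section permission fault" ;;
        15) echo "page permission fault" ;;
        *)  echo "other fault (status $status)" ;;
    esac
}
```

For this log, describe_arm_oops_code 5 gives "section translation fault", which fits the "*pgd=00000000" line: there is no first-level mapping for e58c20f0, and the access happened in kernel mode inside async_encrypt, reached from crypt_setkey_allcpus.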
