Call trace when loading a module in Linux

I'm writing my first Linux kernel module, which is actually a RAM disk driver with some additional features. When I tried to insmod the module, I got "Segmentation fault".
Here is the corresponding kernel log, which actually contains two kernel oops messages. After reading a lot of related tutorials, I still have some questions about this log:
In the call trace, some functions are preceded by a question mark and some are not. What is the special meaning of the question mark "?" in front of a function?
My understanding of the call trace is: every function, except the bottom one, should be called by the one below it. But for this:
[ 397.855035] [<c05a603b>] ? exact_lock+0x0/0x16
[ 397.855035] [<f787c252>] ? diag_init+0x252/0x4bd [b2bntb_diag]
[ 397.855035] [<c0451e35>] ? __blocking_notifier_call_chain+0x42/0x4d
[ 397.855035] [<f787c000>] ? diag_init+0x0/0x4bd [b2bntb_diag]
diag_init is the module init function I wrote. It does not call any function named exact_lock or __blocking_notifier_call_chain, so how come these two functions appear in the call trace here?
What is the error and how to resolve it?
BTW, the Linux kernel I'm running has version 2.6.35.6.
[ 397.850955] ------------[ cut here ]------------
[ 397.851544] WARNING: at lib/kobject.c:168 kobject_add_internal+0x3a/0x1e2()
[ 397.851601] Hardware name: VirtualBox
[ 397.851639] kobject: (f4580258): attempted to be registered with empty name!
[ 397.851678] Modules linked in: b2bntb_diag(+) fuse vboxvideo drm sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 vboxsf uinput snd_intel8x0 snd_ac97_codec vboxguest ac97_bus snd_seq snd_seq_device ppdev snd_pcm parport_pc parport microcode snd_timer joydev snd e1000 i2c_piix4 soundcore i2c_core snd_page_alloc [last unloaded: mperf]
[ 397.852707] Pid: 1958, comm: insmod Tainted: G W 2.6.35.6-45.fc14.i686 #1
[ 397.852749] Call Trace:
[ 397.852828] [<c043938d>] warn_slowpath_common+0x6a/0x7f
[ 397.852970] [<c05b054d>] ? kobject_add_internal+0x3a/0x1e2
[ 397.853130] [<c0439415>] warn_slowpath_fmt+0x2b/0x2f
[ 397.853182] [<c05b054d>] kobject_add_internal+0x3a/0x1e2
[ 397.853235] [<c05b098b>] kobject_add+0x5b/0x66
[ 397.853292] [<c064e8e3>] device_add+0xda/0x4b6
[ 397.853346] [<c05b7bc7>] ? kvasprintf+0x38/0x43
[ 397.853394] [<c05b08e0>] ? kobject_set_name_vargs+0x46/0x4c
[ 397.853467] [<c051b9bc>] register_disk+0x31/0x109
[ 397.853528] [<c05a6234>] ? blk_register_region+0x20/0x25
[ 397.853579] [<c05a6b08>] add_disk+0x9f/0xf0
[ 397.853627] [<c05a5bff>] ? exact_match+0x0/0xd
[ 397.853678] [<c05a603b>] ? exact_lock+0x0/0x16
[ 397.853731] [<f787c252>] diag_init+0x252/0x4bd [b2bntb_diag]
[ 397.853785] [<c0451e35>] ? __blocking_notifier_call_chain+0x42/0x4d
[ 397.853836] [<f787c000>] ? diag_init+0x0/0x4bd [b2bntb_diag]
[ 397.853889] [<c0401246>] do_one_initcall+0x4f/0x139
[ 397.853967] [<c0451e51>] ? blocking_notifier_call_chain+0x11/0x13
[ 397.854086] [<c04621a4>] sys_init_module+0x7f/0x19b
[ 397.854142] [<c07a7374>] syscall_call+0x7/0xb
[ 397.854177] ---[ end trace 6dc509801197bdc3 ]---
[ 397.855035] ------------[ cut here ]------------
[ 397.855035] kernel BUG at fs/sysfs/group.c:65!
[ 397.855035] invalid opcode: 0000 [#1] SMP
[ 397.855035] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/PNP0C0A:00/power_supply/BAT0/energy_full
[ 397.855035] Modules linked in: b2bntb_diag(+) fuse vboxvideo drm sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 vboxsf uinput snd_intel8x0 snd_ac97_codec vboxguest ac97_bus snd_seq snd_seq_device ppdev snd_pcm parport_pc parport microcode snd_timer joydev snd e1000 i2c_piix4 soundcore i2c_core snd_page_alloc [last unloaded: mperf]
[ 397.855035]
[ 397.855035] Pid: 1958, comm: insmod Tainted: G W 2.6.35.6-45.fc14.i686 #1 /VirtualBox
[ 397.855035] EIP: 0060:[<c0520d15>] EFLAGS: 00010246 CPU: 0
[ 397.855035] EIP is at internal_create_group+0x23/0x103
[ 397.855035] EAX: f4580258 EBX: f4580258 ECX: c09d4344 EDX: 00000000
[ 397.855035] ESI: f60521f0 EDI: c09d4344 EBP: f45b7ef0 ESP: f45b7ed0
[ 397.855035] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 397.855035] Process insmod (pid: 1958, ti=f45b6000 task=f3a68ca0 task.ti=f45b6000)
[ 397.855035] Stack:
[ 397.855035] 00000000 f45b7ee4 c05b08e0 8eecb04c f4580200 f4580200 f60521f0 f4580200
[ 397.855035] <0> f45b7ef8 c0520e1c f45b7f00 c0498de9 f45b7f18 c05a261a f4580250 f4580200
[ 397.855035] <0> 00000001 00000000 f45b7f38 c05a6b0f c05a5bff c05a603b f4580200 0fc00000
[ 397.855035] Call Trace:
[ 397.855035] [<c05b08e0>] ? kobject_set_name_vargs+0x46/0x4c
[ 397.855035] [<c0520e1c>] ? sysfs_create_group+0x11/0x15
[ 397.855035] [<c0498de9>] ? blk_trace_init_sysfs+0x10/0x12
[ 397.855035] [<c05a261a>] ? blk_register_queue+0x3b/0xac
[ 397.855035] [<c05a6b0f>] ? add_disk+0xa6/0xf0
[ 397.855035] [<c05a5bff>] ? exact_match+0x0/0xd
[ 397.855035] [<c05a603b>] ? exact_lock+0x0/0x16
[ 397.855035] [<f787c252>] ? diag_init+0x252/0x4bd [b2bntb_diag]
[ 397.855035] [<c0451e35>] ? __blocking_notifier_call_chain+0x42/0x4d
[ 397.855035] [<f787c000>] ? diag_init+0x0/0x4bd [b2bntb_diag]
[ 397.855035] [<c0401246>] ? do_one_initcall+0x4f/0x139
[ 397.855035] [<c0451e51>] ? blocking_notifier_call_chain+0x11/0x13
[ 397.855035] [<c04621a4>] ? sys_init_module+0x7f/0x19b
[ 397.855035] [<c07a7374>] ? syscall_call+0x7/0xb
[ 397.855035] Code: 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 57 56 53 83 ec 14 0f 1f 44 00 00 85 c0 89 c3 89 55 e0 89 cf 74 0a 85 d2 75 08 83 78 18 00 75 11 <0f> 0b 83 78 18 00 be ea ff ff ff 0f 84 c5 00 00 00 8b 17 85 d2
[ 397.855035] EIP: [<c0520d15>] internal_create_group+0x23/0x103 SS:ESP 0068:f45b7ed0
[ 397.865682] ---[ end trace 6dc509801197bdc4 ]---
[root#localhost ntb]#

The first oops message is actually just a warning from the kernel. The important part of the warning is this: "attempted to be registered with empty name!". It means a kobject was registered without a name string. Since register_disk() (called from add_disk()) appears in the call trace of the warning, you most likely forgot to fill in the name of the structure you passed during registration, i.e. the disk_name field of your struct gendisk, which register_disk() uses as the device name. That is the warning part.
The second oops message is an actual crash. After the failed registration, add_disk() went on to call blk_register_queue(), which tried to create a sysfs attribute group under the disk's kobject. Because that kobject never made it into sysfs, a BUG_ON() assertion in internal_create_group() (fs/sysfs/group.c:65) fired. The kobject address in EAX/EBX (f4580258) is the same one the warning complained about, so this is no doubt a consequence of the missing name.
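So this is why it is crashing. About your questions: not every entry in a call trace is a caller of the entry above it. Some of the functions you see are called from inlined functions (and/or macros) used in your code, so your code calls them even though not by name.
About the question mark: the x86 stack dumper scans the stack for every value that looks like a kernel text address. An address it can match to the frame-pointer chain is considered "reliable" and printed plainly; anything else is printed with a question mark. Such "?" entries are often stale return addresses from earlier calls, or function pointers stored on the stack. An offset of +0x0 (exact_match+0x0, exact_lock+0x0, diag_init+0x0) is a strong hint: it is the first byte of a function, so it cannot be a return address. In fact, add_disk() passes exact_match and exact_lock as arguments to blk_register_region(), and __blocking_notifier_call_chain is most likely left over from the module notifier call that sys_init_module() makes before running your init function.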
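To make the "?" and +0x0 reasoning concrete, here is a small user-space C helper (hypothetical, not a kernel or distribution tool) that splits one x86 call-trace line into its address, reliability flag, symbol, offset and size, and flags entries that are probably stale pointers rather than return addresses. It assumes the "[<addr>] [? ]symbol+0xoff/0xsize" format shown in the logs above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One decoded call-trace entry, e.g. "[<c05a603b>] ? exact_lock+0x0/0x16". */
struct trace_entry {
    unsigned long addr;     /* address inside [<...>] */
    bool reliable;          /* false when the entry carries a "?" marker */
    char symbol[64];        /* function name */
    unsigned long offset;   /* bytes into the function */
    unsigned long size;     /* length of the function */
};

/* Parse one trace line (timestamp prefix allowed). Returns true on success. */
static bool parse_trace_entry(const char *line, struct trace_entry *e)
{
    const char *p;
    int n = 0;

    assert(line != NULL && e != NULL);
    p = strstr(line, "[<");
    if (!p || sscanf(p, "[<%lx>]%n", &e->addr, &n) != 1 || n == 0)
        return false;
    p += n;
    while (*p == ' ')
        p++;
    e->reliable = (*p != '?');
    if (!e->reliable) {
        p++;                            /* skip the "?" marker */
        while (*p == ' ')
            p++;
    }
    /* "%lx" accepts the "0x" prefix, so "+0x252/0x4bd" parses directly. */
    return sscanf(p, "%63[^+]+%lx/%lx", e->symbol, &e->offset, &e->size) == 3;
}

/* Offset 0 is the first instruction of a function, which can never be a
 * return address, so an unreliable entry there is most likely a function
 * pointer (or other stale value) that happened to sit on the stack. */
static bool looks_like_stale_pointer(const struct trace_entry *e)
{
    return !e->reliable && e->offset == 0;
}
```

Feeding it the diag_init+0x252 line from the first trace yields a reliable entry, while exact_lock+0x0 is flagged as a likely stale pointer.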

Related

Additional serial ports are half working: why is that?

I have an embedded board with 2 serial ports and an additional PCI dual serial port + LPT card, for a total of 4 serial ttys, but the added ttyS2 and ttyS3 barely work.
The system runs Debian buster, with very few packages added to a minimal setup.
All the ports are recognized by the kernel:
[ +0,021002] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ +0,021110] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
[ +0,036520] 0000:04:00.0: ttyS2 at I/O 0xc030 (irq = 19, base_baud = 115200) is a ST16650V2
[ +0,035081] 0000:04:00.1: ttyS3 at I/O 0xc020 (irq = 16, base_baud = 115200) is a ST16650V2
A subsequent check with setserial gives the same result.
Note, however, that dpkg-reconfigure setserial does not write the /etc/setserial.conf file, and I have no idea why; I tried to work around it by copying the configuration by hand.
From applications like minicom, I see no result when opening the port and connecting from a remote terminal: nothing is sent and nothing is received.
From a test application using librxtx-java, it appears to send data, but when data is received, this happens:
[mar23 08:42] irq 16: nobody cared (try booting with the "irqpoll" option)
[ +0,000014] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-14-amd64 #1 Debian 4.19.171-2
[ +0,000003] Hardware name: /, BIOS 5.6.5 12/18/2018
[ +0,000002] Call Trace:
[ +0,000006] <IRQ>
[ +0,000012] dump_stack+0x66/0x81
[ +0,000008] __report_bad_irq+0x3a/0xb4
[ +0,000006] note_interrupt.cold.9+0xa/0x63
[ +0,000008] handle_irq_event_percpu+0x6d/0x80
[ +0,000006] handle_irq_event+0x3c/0x60
[ +0,000004] handle_fasteoi_irq+0xa3/0x160
[ +0,000007] handle_irq+0x1f/0x30
[ +0,000006] do_IRQ+0x49/0xe0
[ +0,000005] common_interrupt+0xf/0xf
[ +0,000003] </IRQ>
[ +0,000007] RIP: 0010:cpuidle_enter_state+0xb9/0x320
[ +0,000006] Code: e8 7c 85 b2 ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00
[ +0,000003] RSP: 0018:ffffb9e30020be90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd7
[ +0,000005] RAX: ffff9329f8122140 RBX: 000000ae53921a15 RCX: 000000000000001f
[ +0,000002] RDX: 000000ae53921a15 RSI: 0000000060062542 RDI: 0000000000000000
[ +0,000003] RBP: ffff9329f812a248 R08: 0000000000000002 R09: 0000000000021a00
[ +0,000002] R10: 000000ecc794a002 R11: ffff9329f8121128 R12: 0000000000000001
[ +0,000002] R13: ffffffffa98b70b8 R14: 0000000000000001 R15: 0000000000000000
[ +0,000011] do_idle+0x228/0x270
[ +0,000006] cpu_startup_entry+0x6f/0x80
[ +0,000005] start_secondary+0x1a4/0x200
[ +0,000006] secondary_startup_64+0xa4/0xb0
[ +0,000005] handlers:
[ +0,000009] [<00000000920e25ee>] serial8250_interrupt
[ +0,000005] Disabling IRQ #16
I read a few articles and skimmed the Serial-HOWTO, but it seems that once setserial has found the correct configuration, everything should work as intended, so I have no clue what's going on.
---- EDIT
Well, I (almost) solved it: it's all about a hardware fault.
The board has 2 serial ports and 1 parallel port, all connected to an external device. I mismatched a non-standard external connector and routed some RS-232-level signals into the parallel port, which turned out to be fatal.
The confusing part is that the controller still appeared to be working, while it isn't fully working. I'm waiting to get a new board...

How to run "invd" instruction with disabled SMP support?

I'm trying to execute the "invd" instruction from a kernel module. I previously asked a similar question, How to execute “invd” instruction?, and from @Peter Cordes's answer I understand that I can't safely run this instruction on an SMP system after boot. So shouldn't I be able to run it after booting without SMP support? Since no other core is running, there should be no chance of memory inconsistency. I have the following kernel module, compiled with the -O0 flag:
static int __init deviceDriver_init(void){
unsigned long flags;
int LEN=10;
int STEP=1;
int VALUE=1;
int arr[LEN];
int i;
unsigned long dummy;
printk(KERN_INFO "invd Driver loaded\n");
//wbinvd();
//asm volatile("cpuid\n":::);
local_irq_disable();
__asm__ __volatile__(
"wbinvd\n"
"loop:"
"movq %%rdx, (%%rbx);"
"leaq (%%rbx, %%rcx, 8), %%rbx;"
"cmpq %%rbx, %%rax;"
"jg loop;"
"invd\n"
: "=b"(dummy) // output
: "b" (arr),
"a" (arr+LEN),
"c" (STEP),
"d" (VALUE)
: "cc", "memory"
);
local_irq_enable();
//asm volatile("invd\n":::);
printk(KERN_INFO "invd execute\n");
return 0;
}
I'm still getting an error: when inserting the module, the terminal shows Segmentation fault (core dumped), and dmesg shows:
[ 2590.518614] invd Driver loaded
[ 2590.518840] general protection fault: 0000 [#5] SMP PTI
I have booted my kernel with nosmp, but I don't understand why dmesg still shows SMP PTI:
$cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-136-generic root=UUID=dbe747ff-a6a5-45cb-8553-c6db6d445d3d ro quiet splash nosmp vt.handoff=7
Update post:
As I mentioned in the comments, after disabling SGX in the BIOS I was able to run invd without any error. However, when I run the same code on a different machine with the same kernel version, I still get the same error message. It is strange and I can't explain why this is happening. As @prl mentioned in the comments, the error may be coming from the instruction following invd, and I'm beginning to think that is true, because the second-to-last line of the dmesg output is highlighted in red: [ 153.527386] RIP: loop+0xc/0xf22 [noSmp8] RSP: ffffb8d9450a7be0. So it seems the error is coming from inside the loop. I have updated the __init function according to the suggestion. I'm not good at assembly; can anyone tell me whether the inline assembly code is correct, and if not, how to fix it? My whole dmesg trace is:
[ 153.514293] invd Driver loaded
[ 153.514547] general protection fault: 0000 [#1] SMP PTI
[ 153.514656] Modules linked in: noSmp8(OE+) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables ccm arc4 intel_rapl rt2800usb rt2x00usb x86_pkg_temp_thermal intel_powerclamp rt2800lib coretemp rt2x00lib mac80211 cfg80211 kvm_intel kvm irqbypass snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf dell_smm_hwmon dell_wmi dell_smbios dcdbas intel_wmi_thunderbolt snd_hda_codec_generic dell_wmi_descriptor wmi_bmof snd_seq_midi snd_seq_midi_event
[ 153.515454] serio_raw snd_hda_intel snd_hda_codec snd_hda_core sparse_keymap snd_hwdep snd_rawmidi joydev input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore mei_me mei shpchp intel_pch_thermal mac_hid acpi_pad parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nouveau mxm_wmi ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt igb e1000e dca i2c_algo_bit ptp pps_core ahci libahci fb_sys_fops drm wmi video
[ 153.516038] CPU: 0 PID: 4024 Comm: insmod Tainted: G OE 4.15.0-136-generic #140~16.04.1-Ubuntu
[ 153.516331] Hardware name: Dell Inc. BIOS 1.3.2 01/25/2016
[ 153.516626] RIP: 0010:loop+0xc/0xf22 [noSmp8]
[ 153.516917] RSP: 0018:ffffb8d9450a7be0 EFLAGS: 00010046
[ 153.517213] RAX: ffffb8d9450a7c08 RBX: ffffb8d9450a7c08 RCX: 0000000000000001
[ 153.517513] RDX: 0000000000000001 RSI: ffffb8d9450a7be0 RDI: ffff8edaadc16490
[ 153.517814] RBP: ffffb8d9450a7c60 R08: 0000000000012c40 R09: ffffffffb39624c4
[ 153.518119] R10: ffffb8d9450a7c78 R11: 000000000000038c R12: ffffb8d9450a7c10
[ 153.518427] R13: 0000000000000000 R14: 0000000000000001 R15: ffff8eda4c6bd660
[ 153.518730] FS: 00007fd7f09cf700(0000) GS:ffff8edaadc00000(0000) knlGS:0000000000000000
[ 153.519036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 153.519346] CR2: 00005634f95fde50 CR3: 000000040dd2c001 CR4: 00000000003606f0
[ 153.519656] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 153.519980] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 153.520289] Call Trace:
[ 153.520597] ? 0xffffffffc050d000
[ 153.520899] do_one_initcall+0x55/0x1ac
[ 153.521201] ? do_one_initcall+0x55/0x1ac
[ 153.521504] ? do_init_module+0x27/0x223
[ 153.521808] ? _cond_resched+0x32/0x50
[ 153.522107] ? kmem_cache_alloc_trace+0x165/0x1c0
[ 153.522408] do_init_module+0x5f/0x223
[ 153.522710] load_module+0x188c/0x1ea0
[ 153.523016] ? ima_post_read_file+0x83/0xa0
[ 153.523320] SYSC_finit_module+0xe5/0x120
[ 153.523623] ? SYSC_finit_module+0xe5/0x120
[ 153.523927] SyS_finit_module+0xe/0x10
[ 153.524231] do_syscall_64+0x73/0x130
[ 153.524534] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 153.524838] RIP: 0033:0x7fd7f04fd599
[ 153.525144] RSP: 002b:00007ffda61c2968 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[ 153.525455] RAX: ffffffffffffffda RBX: 00005643631d7210 RCX: 00007fd7f04fd599
[ 153.525768] RDX: 0000000000000000 RSI: 0000564361c3226b RDI: 0000000000000003
[ 153.526084] RBP: 0000564361c3226b R08: 0000000000000000 R09: 00007fd7f07c2ea0
[ 153.526403] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
[ 153.526722] R13: 00005643631d7ca0 R14: 0000000000000000 R15: 0000000000000000
[ 153.527040] Code: 00 48 8b 75 c8 48 8b 45 c8 8b 55 b8 48 63 d2 48 c1 e2 02 48 01 d0 8b 4d b4 8b 55 bc 48 89 f3 48 89 13 48 8d 1c cb 48 39 d8 7f f4 <0f> 08 48 89 d8 48 89 45 d0 e8 40 ef 73 00 48 c7 c7 c7 d0 c4 c0
[ 153.527386] RIP: loop+0xc/0xf22 [noSmp8] RSP: ffffb8d9450a7be0
[ 153.530228] ---[ end trace cc9ea64985c9fe34 ]---
So, is it not possible to run invd even without SMP?
There are two questions here:
a) How to execute INVD (unsafely)
For this, you need to be running at CPL=0, and you have to make sure the CPU isn't using any "processor reserved memory protections" which are part of Intel's Software Guard Extensions (an extension to allow programs to have a shielded/private/encrypted space that the OS can't tamper with, often used for digital rights management schemes but possibly usable for enhancing security/confidentiality of other things).
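Note that SGX is supported in recent versions of Linux (upstream support was merged in Linux 5.11, so a 4.15 kernel like yours does not use it itself), but the firmware can enable SGX and its processor reserved memory independently of the kernel, so what matters is whether it's enabled or disabled in your BIOS.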
If either of these conditions isn't met (e.g. you're at CPL=3, or "processor reserved memory protections" are active), you will get a general protection fault exception.
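The question's update suspects the loop because the RIP is printed as loop+0xc, but the Code: line tells a different story. The kernel prints the byte at the faulting RIP inside <..>, and in the update's dump that is <0f> 08, the encoding of INVD (WBINVD is 0f 09, and the ud2 that BUG() uses on x86 is 0f 0b). The bytes just before it, 48 89 13 / 48 8d 1c cb / 48 39 d8 / 7f f4, are the movq, leaq, cmpq and jg of the loop, 12 (0xc) bytes in total, so loop+0xc is the first instruction after the loop; "loop" is merely the nearest symbol. A minimal user-space helper (hypothetical name, recognizing only these three two-byte opcodes) that extracts that instruction from a Code: line:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return a mnemonic for the instruction at the faulting IP of an x86 oops
 * "Code:" line, where the kernel wraps the first byte at the IP in <..>.
 * Only a few 0F xx opcodes are recognized; this is an illustration only. */
static const char *faulting_insn(const char *code_line)
{
    const char *p;
    unsigned int b0, b1;

    assert(code_line != NULL);
    p = strchr(code_line, '<');
    if (!p || sscanf(p, "<%x> %x", &b0, &b1) != 2 || b0 != 0x0f)
        return "unknown";
    switch (b1) {
    case 0x08: return "invd";
    case 0x09: return "wbinvd";
    case 0x0b: return "ud2";       /* emitted by BUG() on x86 */
    default:   return "unknown";
    }
}
```

Run on the update's Code: line, it reports invd; run on the Code: lines of the "kernel BUG at ..." oopses elsewhere on this page, it reports ud2, which is also why those are reported as "invalid opcode".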
b) How to execute INVD Safely
For this, you have to make sure that the caches (which includes "external caches" - e.g. possibly including things like eDRAM and caches built into non-volatile RAM) don't contain any modified data that will cause problems if lost. This includes data from:
IRQs. These can be disabled.
NMI and machine check exceptions. For a running OS it's mostly impossible to stop/disable these and if you can disable them then it's like crossing your fingers while ignoring critical hardware failures (an extremely bad idea).
the firmware's System Management Mode. This is a special CPU mode the firmware uses for various things (e.g. ECC scrubbing, some power management, emulation of legacy devices) that's beyond the control of the OS/kernel. It can't be disabled.
writes done by the CPU itself. This includes updating the accessed/dirty flags in page tables (which can not be disabled), plus any performance monitoring or debugging features that store data in memory (which can be "not enabled").
With these restrictions (and not forgetting the performance problems) there are only 2 cases where INVD might be sane - early firmware code that needs to determine RAM chip sizes and configure memory controllers (where it's very likely to be useful/sane), and the instant before the computer is turned off (where it's likely to be pointless).
Guesswork
I'm guessing (based on my inability to think of any other plausible reason) that you want to construct temporary shielded/private area of memory (to enhance security - e.g. so that the data you put in that area won't/can't leak into RAM). In this case (ironically) it's possible that the tool designed specifically for this job (SGX) is preventing you from doing it badly.
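The answer does not cover the question's other point, whether the inline assembly itself is correct. It does roughly what it intends, but a few details are fragile: the global label loop becomes a symbol (hence RIP: loop+0xc) and would be defined twice if the compiler ever emitted the asm statement twice; one register is described by two operands ("=b" output, "b" input) instead of a single read-write operand; each movq stores 8 bytes into an int array, so one store covers two ints; and jg compares addresses as signed numbers, where ja (unsigned) is the natural choice. Below is a sketch of the same store loop with a local numeric label and named operands. The function name is hypothetical, the privileged WBINVD and INVD are left out (they need CPL 0, so in user space they would fault), and a plain C loop is used off x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 'value' into every 'step'-th 64-bit slot of buf[0..count). */
static void fill_qwords(uint64_t *buf, size_t count, size_t step, uint64_t value)
{
    assert(buf != NULL || count == 0);
    if (count == 0 || step == 0)
        return;
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t *p = buf;
    uint64_t *end = buf + count;

    __asm__ __volatile__(
        "1:\n\t"                               /* local label, safe to duplicate */
        "movq %[val], (%[p])\n\t"              /* *p = value */
        "leaq (%[p],%[step],8), %[p]\n\t"      /* p += step (8-byte slots) */
        "cmpq %[p], %[end]\n\t"
        "ja 1b"                                /* loop while end > p (unsigned) */
        : [p] "+r"(p)                          /* one read-write operand */
        : [end] "r"(end), [step] "r"(step), [val] "r"(value)
        : "cc", "memory");
#else
    for (size_t i = 0; i < count; i += step)
        buf[i] = value;
#endif
}
```

In a kernel module, WBINVD would go before the loop and INVD after it, with the constraints above unchanged; note that, per the dump, it is INVD itself that raises the #GP.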

Virtual address to physical address and reverse in android linux kernel

I'm trying to translate a virtual address to a physical address and then map that physical address back to a virtual address in the Android Linux kernel environment.
I can modify the kernel code, so I tried the following flow:
malloc() in an Android user-space native binary (not an app)
Translate the VA returned by malloc() to a PA using the guide
Is there any API for determining the physical address from virtual address in Linux?
Pass the PA to a system call I wrote.
Re-map the received PA to a VA in kernel space using ioremap()
Read the value using readl() or ioread32()
But it's not working.
The VA-to-PA logic is in the link above; below is pseudo-code for my native binary.
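int main(void) {
    char *va = malloc(100);
    uintptr_t paddr;

    strcpy(va, "ttttt");                               /* fill the buffer with "ttttt" */
    vir_to_phys_user(&paddr, getpid(), (uintptr_t)va); /* helper from the linked answer */
    syscall(sys_readpa, (unsigned long)paddr);         /* sys_readpa: number of my syscall */
    return 0;
}
system call function
SYSCALL_DEFINE1(readpa, unsigned long, pa)
{
    void __iomem *mapped_add = ioremap(pa, PAGE_SIZE);

    if (!mapped_add)
        return -ENOMEM;
    printk("%c", readl(mapped_add));
    printk("%c", ioread32(mapped_add));
    iounmap(mapped_add);
    return 0;
}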
My code has similar logic:
I define va in user space and calculate pa from va.
I set va to "ttttt".
Pass pa to linux kernel space using syscall.
Remap this pa to va in kernel space.
Read the VA in kernel space and expect the value to be "ttttt".
I don't know whether the VA-to-PA logic is correct, but it returns an address rather than a failure.
But when the syscall is called, a kernel panic occurs (e.g. "dereference for 0000000 address"), along with other kinds of errors. I checked that the PA received by the syscall is the same as the one in user space.
The purpose of this is study. I just wonder whether this implementation is possible when I can also modify the kernel code, but I ran into an obstacle.
Please let me know what the problem is, or whether it's impossible. If needed, I'll post more detailed code or the specific error message.
I've added the detailed errors and my debug log.
My user space log
: vitrual address : 0xf079c000
: 0xf079c000 -> 0xa4a8a000
I pass 0xa4a8a000 to syscall.
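The translation in this log is consistent with the /proc/<pid>/pagemap method that the linked answer's vir_to_phys_user() presumably uses: each 8-byte pagemap entry holds the page frame number in bits 0-54 and a "present" flag in bit 63, and the physical address is PFN times the page size plus the offset within the page. With 4 KiB pages, PFN 0xa4a8a turns 0xf079c000 (page offset 0) into 0xa4a8a000. A minimal sketch with hypothetical helper names; on modern kernels the PFN field reads as 0 unless the reader has CAP_SYS_ADMIN, and the result is only meaningful while the page stays resident.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)        /* bit 63: page present in RAM */

/* Convert one pagemap entry to a physical address. Returns false if the page
 * is not present or the PFN is hidden (reads as 0 without CAP_SYS_ADMIN). */
static bool pagemap_entry_to_phys(uint64_t entry, uintptr_t vaddr, long page_size,
                                  uint64_t *paddr)
{
    uint64_t pfn = entry & PM_PFN_MASK;

    assert(page_size > 0);
    if (!(entry & PM_PRESENT) || pfn == 0)
        return false;
    *paddr = pfn * (uint64_t)page_size + (vaddr % (uintptr_t)page_size);
    return true;
}

/* Look up the physical address of vaddr in the calling process. */
static bool virt_to_phys_self(const void *vaddr, uint64_t *paddr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uintptr_t va = (uintptr_t)vaddr;
    uint64_t entry;
    off_t off = (off_t)(va / (uintptr_t)page_size) * (off_t)sizeof(entry);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return false;
    n = pread(fd, &entry, sizeof(entry), off);  /* one 8-byte entry per page */
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return false;
    return pagemap_entry_to_phys(entry, va, page_size, paddr);
}
```

A correct physical address does not make ioremap() accept it, though; as the log below shows, the failure is in the kernel-side mapping step.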
dmesg
[ 96.794448] accepted pa : 00000000a4a8a000
[ 96.794473] ------------[ cut here ]------------
[ 96.794500] WARNING: CPU: 6 PID: 11644 at arch/arm64/mm/ioremap.c:58 __ioremap_caller+0xc0/0xcc
[ 96.794519] Modules linked in:
[ 96.794552] CPU: 6 PID: 11644 Comm: mt Not tainted 4.14.113 #1
[ 96.794590] Call trace:
[ 96.794611] [<0000000000000000>] dump_backtrace+0x0/0x2b8
[ 96.794632] [<0000000000000000>] show_stack+0x18/0x24
[ 96.794655] [<0000000000000000>] dump_stack+0xa0/0xdc
[ 96.794676] [<0000000000000000>] __warn+0xbc/0x164
[ 96.794695] [<0000000000000000>] report_bug+0xac/0xdc
[ 96.794713] [<0000000000000000>] bug_handler+0x30/0x8c
[ 96.794732] [<0000000000000000>] brk_handler+0x94/0x150
[ 96.794751] [<0000000000000000>] do_debug_exception+0xd4/0x170
[ 96.794769] Exception stack(0xffffff8010fdbc10 to 0xffffff8010fdbd50)
[ 96.794787] bc00: 0000000000000000 0000000000000004
[ 96.794805] bc20: 00e8000000000f07 ffffff8008358714 000000000000000c 0000000000002d7c
[ 96.794822] bc40: ffffffc0119630e7 5b20205d38343434 0000000000000000 0000000000000001
[ 96.794839] bc60: 0000000000000001 00000000bab00000 0000000000000000 0000000080000000
[ 96.794856] bc80: ffffff800b18d000 0000000000000082 00000000000564c8 0000000000000074
[ 96.794873] bca0: 0000000000000074 00e8000000000f07 00000000a4a8a000 0000000000001000
[ 96.794890] bcc0: ffffff8008358714 0000000000000000 0000000000000011 000000000000018f
[ 96.794908] bce0: 000000000000018e ffffff8009316000 ffffffc8767edf80 ffffff8010fdbe80
[ 96.794926] bd00: ffffff80081fe124 ffffff8010fdbe50 ffffff80081fe188 0000000020400145
[ 96.794943] bd20: 0000000000000034 7cebe7b2cf849500 0000007fffffffff ffffff8009316000
[ 96.794961] bd40: ffffff8010fdbe80 ffffff80081fe188
[ 96.794978] [<0000000000000000>] el1_dbg+0x18/0x74
[ 96.794995] [<0000000000000000>] __ioremap_caller+0xc0/0xcc
[ 96.795014] [<0000000000000000>] __ioremap+0x10/0x1c
[ 96.795035] [<0000000000000000>] sys_readpa+0x78/0xfc
[ 96.795055] Exception stack(0xffffff8010fdbec0 to 0xffffff8010fdc000)
[ 96.795072] bec0: 00000000a4a8a000 0000000028bf4d08 0000000000000003 00000000f079c000
[ 96.795090] bee0: 0000000000000000 00000000a4a8a000 0000000000000000 000000000000018e
[ 96.795107] bf00: 00000000f09afd94 00000000f09d2b99 00000000ae6c9e84 00000000ae6a261e
[ 96.795124] bf20: 00000000ff921bf0 00000000ff921be0 00000000ae5f7b27 0000000000000000
[ 96.795142] bf40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 96.795159] bf60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 96.795176] bf80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 96.795195] bfa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 96.795212] bfc0: 00000000f091ce20 0000000060000010 00000000a4a8a000 000000000000018e
[ 96.795229] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 96.795247] [<0000000000000000>] __sys_trace_return+0x0/0x4
[ 96.795265] ---[ end trace 91e76f3be7c0b9bd ]---
[ 96.795418] ioremap return null
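I found a fix.
ioremap() has a check that validates the address: on arm64 it refuses to map physical addresses that belong to System RAM. That is the WARN_ON at arch/arm64/mm/ioremap.c:58 in the log, after which ioremap() returns NULL, and dereferencing that NULL pointer probably explains the panics.
ioremap() is meant for reserved (device/MMIO) addresses, but I was trying to map an address that is already mapped as normal RAM for a process.
So I modified the check logic in ioremap, and it works well.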

__stack_chk_fail when executing copy_process() in _do_fork()?

I'm working on an experimental project on the Linux kernel (4.4.52) on x86_64. One requirement is that whenever control flow leaves a specific function, the Write Protect (WP) bit in the CR0 register must always be enabled. Roughly, it looks like this (the idea comes from the nested kernel, but that isn't very relevant to my question):
DISABLE_CR0.WP_BIT
original_func()
ENABLE_CR0.WP_BIT
By doing that, the whole kernel would be executing with CR0.WP enabled. I have replaced the original native_set_pte function and native_write_cr3 function with the format above, and now the kernel crashes when booting.
Here is the log (this is the original log, although the sequence seems weird):
[ 1.403888] IP: [<ffff8800351ebbb0>] 0xffff8800351ebbb0
[ 1.403891] PGD 2876067 PUD 2877067 PMD 3500e063 PTE 80000000351eb163
[ 1.403892] Oops: 0011 [#2] SMP
[ 1.403898] Modules linked in: crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse pata_acpi floppy
[ 1.403901] CPU: 0 PID: 143 Comm: systemd-udevd Tainted: G D 4.4.52v1+ #2
[ 1.403902] Hardware name: Fedora Project OpenStack Nova, BIOS 0.5.1 01/01/2011
[ 1.403903] task: ffff8800351c0e00 ti: ffff8800351e8000 task.ti: ffff8800351e8000
[ 1.403905] RIP: 0010:[<ffff8800351ebbb0>] [<ffff8800351ebbb0>] 0xffff8800351ebbb0
[ 1.403906] RSP: 0018:ffff8800351ebba8 EFLAGS: 00211086
[ 1.403906] RAX: 000000000000000e RBX: ffff8800351ebcf8 RCX: 000000000000000e
[ 1.403907] RDX: 0000000000000000 RSI: 0000000000201092 RDI: 0000000000201092
[ 1.403908] RBP: 0000000000000003 R08: ffffffff82778d60 R09: ffff8800351ebb40
[ 1.403909] R10: 0000000000000030 R11: ffffc00000000fff R12: ffff8800351c0e00
[ 1.403909] R13: 0000000000000010 R14: 0000000000201046 R15: ffffffffffffffff
[ 1.403911] FS: 00007f1f021e38c0(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 1.403912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.403912] CR2: ffff8800351ebbb0 CR3: 00000000351cd000 CR4: 00000000001406f0
[ 1.403916] Stack:
[ 1.403918] ffffffff810b62ae ffff8800351ebbc0 000000000000006c 0000000000000000
[ 1.403920] ffff8800351ebbd8 ffffffff810b62ae ffffffff8111ce51 00000000000364a4
[ 1.403921] ffffffff82783168 000000000000005c 000000000000000c ffffffff820583b0
[ 1.403922] Call Trace:
[ 1.403928] [<ffffffff810b62ae>] ? kvm_sched_clock_read+0x1e/0x30
[ 1.403930] [<ffffffff810b62ae>] ? kvm_sched_clock_read+0x1e/0x30
[ 1.403933] [<ffffffff8111ce51>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 1.403935] [<ffffffff8111c49e>] ? down_trylock+0x2e/0x40
[ 1.403937] [<ffffffff81129959>] ? console_trylock+0x19/0x60
[ 1.403938] [<ffffffff8112af2e>] ? vprintk_emit+0x29e/0x530
[ 1.403945] [<ffffffff8115fe8e>] ? crash_kexec+0x7e/0x140
[ 1.403953] [<ffffffff81440ae5>] ? find_next_bit+0x15/0x20
[ 1.403955] [<ffffffff814390bb>] ? __const_udelay+0x2b/0x30
[ 1.403958] [<ffffffff810a2a0c>] ? native_stop_other_cpus+0x8c/0x170
[ 1.403965] [<ffffffff811dde8f>] ? panic+0xeb/0x215
[ 1.403968] [<ffffffff810d12a7>] ? copy_process+0x727/0x1b20
[ 1.403970] [<ffffffff810d32f9>] ? __stack_chk_fail+0x19/0x20
[ 1.403972] [<ffffffff810d12a7>] ? copy_process+0x727/0x1b20
[ 1.403974] [<ffffffff810d2808>] ? _do_fork+0x78/0x360
[ 1.403975] [<ffffffff810d2b99>] ? SyS_clone+0x19/0x20
[ 1.403986] [<ffffffff818694f2>] ? entry_SYSCALL_64_fastpath+0x16/0x71
[ 1.404004] Code: 00 00 00 86 10 21 00 00 00 00 00 a8 bb 1e 35 00 88 ff ff 18 00 00 00 00 00 00 00 b0 bb 1e 35 00 88 ff ff ae 62 0b 81 ff ff ff ff <c0> bb 1e 35 00 88 ff ff 6c 00 00 00 00 00 00 00 00 00 00 00 00
[ 1.404005] RIP [<ffff8800351ebbb0>] 0xffff8800351ebbb0
[ 1.404006] RSP <ffff8800351ebba8>
[ 1.404006] CR2: ffff8800351ebbb0
[ 1.404008] ---[ end trace b62acacf75e0c54f ]---
[ 1.406415] Kernel Offset: disabled
[ 1.456105] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff810d12a7
[ 1.456105]
I guess the problem is that something in copy_process causes an overflow; maybe it writes to some read-only memory? But according to Intel's documentation, the CR0.WP bit only affects supervisor mode, so does that mean the kernel is running in supervisor mode when executing copy_process?
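The oops header itself answers the supervisor-mode question. "Oops: 0011" is the x86 page-fault error code (hex 0x11), and with the bit meanings from the Intel SDM it decodes to a protection violation on a present page (bit 0), a read rather than a write (bit 1 clear), in supervisor mode (bit 2 clear), during an instruction fetch (bit 4). The PTE printed for the faulting address, 80000000351eb163, has bit 63 (NX) set, and RIP (ffff8800351ebbb0) lies just above RSP (ffff8800351ebba8). Read together, the kernel appears to have jumped to an address on its own stack, which fits a corrupted return address (the stack-protector panic at the end) better than a write to read-only memory. CR0 (80050033) also shows WP (bit 16) set at that moment. The helper names below are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed in "Oops: XXXX". */
#define PF_PROT  (1u << 0)  /* 1 = protection violation on a present page */
#define PF_WRITE (1u << 1)  /* 1 = write access, 0 = read */
#define PF_USER  (1u << 2)  /* 1 = fault taken in user mode (CPL 3) */
#define PF_RSVD  (1u << 3)  /* 1 = reserved bit set in a paging entry */
#define PF_INSTR (1u << 4)  /* 1 = instruction fetch */

#define CR0_WP (UINT64_C(1) << 16)  /* CR0 write-protect bit */
#define PTE_NX (UINT64_C(1) << 63)  /* no-execute bit in a PTE */

struct pf_info {
    bool prot_violation, write, user, rsvd, ifetch;
};

static struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info pf;

    pf.prot_violation = (code & PF_PROT) != 0;
    pf.write  = (code & PF_WRITE) != 0;
    pf.user   = (code & PF_USER) != 0;
    pf.rsvd   = (code & PF_RSVD) != 0;
    pf.ifetch = (code & PF_INSTR) != 0;
    return pf;
}

static bool cr0_wp_enabled(uint64_t cr0) { return (cr0 & CR0_WP) != 0; }
static bool pte_no_exec(uint64_t pte)    { return (pte & PTE_NX) != 0; }
```

Decoding 0x11 gives a supervisor-mode instruction fetch from a present page, and the printed PTE is non-executable.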
I tried to disassemble the kernel, but got lost in all those countless assembly instructions, so I decided to investigate with QEMU instead. However, the kernel did NOT crash in QEMU! The command I used is:
qemu-system-x86_64 -m 1G -kernel arch/x86/boot/bzImage -initrd arch/x86/boot/linux4.4.52-rootfs.img -hda vdisk.img --append "root=/dev/sda rw console=ttyS0" -nographic
I assumed that _do_fork is independent of specific devices and filesystems (correct me if I'm wrong), so whatever crashes the kernel on my VPS should crash it in QEMU as well, but it didn't.
Has anyone come across the same issue? I really need some help now.
P.S. I'm doing this on my VPS (Ubuntu 16.04.2), but I don't think that's the reason.
Please note that QEMU is not a fully accurate architectural model and does not implement every feature the architecture defines: QEMU aims for emulation speed rather than accuracy, so it only provides a workable architecture profile for running an OS.
Some features, such as paging and segmentation, QEMU models precisely and correctly, but many others it does not: there are some problems with CPUID and syscall, many instructions don't generate #GP, #SS, #PF or floating-point exceptions when they should, some instructions are not implemented (AVX, AVX2, FMA), and so on.
To catch tricky cases like this, you should use an x86 "golden model": try Bochs, or one of the simulators provided by Intel itself.

Create an Ethernet packet in a kernel module and send it

I need to create an Ethernet packet and send it from my kernel module. Can someone help me do this?
I think I need to create an skb using dev_alloc_skb, then write the Ethernet MAC header, insert the data and send it using dev_queue_xmit.
But I'm not sure if this works, or if it is the right and easiest way to do it.
EDIT1:
int sendpacket ()
{
unsigned char dest[ETH_ALEN]={0x00,0x25,0x22,0x05,0xF3,0xF0};
unsigned char src[ETH_ALEN] = {0x90,0xE6,0xBA,0x48,0x7C,0x87};
struct sk_buff * skbt =alloc_skb(ETH_FRAME_LEN,GFP_KERNEL);
//skb_reserve(skb,ETH_FRAME_LEN);
dev_hard_header(skbt,dev_eth1,ETH_P_802_3,dest,src,dev_eth1->addr_len);
if(dev_queue_xmit(skbt)!=NET_XMIT_SUCCESS)
{
printk("Not send!!\n");
}
kfree_skb(skbt);
return 0;
}
dmesg output:
[ 677.826933] Hello:I'm the hook module!!!!
[ 677.826937] 2!!!!
[ 677.826941] skb_under_panic: text:c0723608 len:14 put:14 head:f1843800 data:f18437f2 tail:0xf1843800 end:0xf1843e00 dev:<NULL>
[ 677.826959] ------------[ cut here ]------------
[ 677.826961] kernel BUG at net/core/skbuff.c:146!
[ 677.826964] invalid opcode: 0000 [#1] SMP
[ 677.826967] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[ 677.826969] Modules linked in: sendpacket(+) bluetooth rfkill vfat fat fuse sunrpc cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore atl1e snd_page_alloc iTCO_wdt iTCO_vendor_support r8169 mii i2c_i801 microcode asus_atk0110 pcspkr ata_generic pata_acpi usb_storage pata_marvell radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: sendpacket]
[ 677.827003]
[ 677.827003] Pid: 4780, comm: insmod Tainted: G W 2.6.35101 #7 P5QL PRO/P5QL PRO
[ 677.827003] EIP: 0060:[<c070a192>] EFLAGS: 00210246 CPU: 0
[ 677.827003] EIP is at skb_push+0x57/0x62
[ 677.827003] EAX: 00000088 EBX: c08f9fdc ECX: f156bf10 EDX: c093b4ca
[ 677.827003] ESI: 00000000 EDI: f51ca000 EBP: f156bf38 ESP: f156bf0c
[ 677.827003] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 677.827003] Process insmod (pid: 4780, ti=f156a000 task=f2b071a0 task.ti=f156a000)
[ 677.827003] Stack:
[ 677.827003] c093b4ca c0723608 0000000e 0000000e f1843800 f18437f2 f1843800 f1843e00
[ 677.827003] <0> c08f9fdc f156bf64 f156bf6a f156bf50 c0723608 00000001 c07235e5 f3b6c000
[ 677.827003] <0> 00835ff4 f156bf78 f7d640a8 f156bf6a f156bf64 00000006 48bae690 2500877c
[ 677.827003] Call Trace:
[ 677.827003] [<c0723608>] ? eth_header+0x23/0x93
[ 677.827003] [<c0723608>] ? eth_header+0x23/0x93
[ 677.827003] [<c07235e5>] ? eth_header+0x0/0x93
[ 677.827003] [<f7d640a8>] ? sendpacket+0x8f/0xb6 [sendpacket]
[ 677.827003] [<f7d67000>] ? hook_init+0x0/0x46 [sendpacket]
[ 677.827003] [<f7d67044>] ? hook_init+0x44/0x46 [sendpacket]
[ 677.827003] [<c0401246>] ? do_one_initcall+0x4f/0x139
[ 677.827003] [<c0451e29>] ? blocking_notifier_call_chain+0x11/0x13
[ 677.827003] [<c046210c>] ? sys_init_module+0x7f/0x19b
[ 677.827003] [<c040321f>] ? sysenter_do_call+0x12/0x28
[ 677.827003] Code: c0 85 f6 0f 45 de 53 ff b0 a8 00 00 00 ff b0 a4 00 00 00 51 ff b0 ac 00 00 00 52 ff 70 50 ff 75 04 68 ca b4 93 c0 e8 ad 4a 09 00 <0f> 0b 8d 65 f8 89 c8 5b 5e 5d c3 55 89 e5 56 53 0f 1f 44 00 00
[ 677.827116] EIP: [<c070a192>] skb_push+0x57/0x62 SS:ESP 0068:f156bf0c
[ 677.827154] ---[ end trace dee1e3278503a581 ]---
In your case you just want to use raw packets from user space instead of dealing with the complexities of kernel code.
This blog post details how to do everything you need.
At the risk of sounding like a broken record, you're learning why this should be done from user space.
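As a rough illustration of the user-space route, the sketch below builds the 14-byte Ethernet header (destination MAC, source MAC, EtherType) that eth_header() would otherwise push in the kernel, then sends the frame through an AF_PACKET/SOCK_RAW socket, which needs root or CAP_NET_RAW. The function names are hypothetical, the EtherType 0x88b5 (reserved for local experiments) is only an example, and the MAC addresses are the ones from the question's code.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>         /* htons */
#include <assert.h>
#include <linux/if_ether.h>    /* ETH_ALEN, ETH_HLEN, ETH_P_ALL, ETH_FRAME_LEN */
#include <net/if.h>            /* if_nametoindex */
#include <netpacket/packet.h>  /* struct sockaddr_ll */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write dst | src | ethertype | payload into frame.
 * Returns the frame length, or 0 if it does not fit in cap bytes. */
static size_t build_eth_frame(unsigned char *frame, size_t cap,
                              const unsigned char dst[ETH_ALEN],
                              const unsigned char src[ETH_ALEN],
                              uint16_t ethertype, const void *payload, size_t len)
{
    assert(frame != NULL && dst != NULL && src != NULL);
    if (cap < ETH_HLEN + len)
        return 0;
    memcpy(frame, dst, ETH_ALEN);                 /* bytes 0-5: destination */
    memcpy(frame + ETH_ALEN, src, ETH_ALEN);      /* bytes 6-11: source */
    frame[12] = (unsigned char)(ethertype >> 8);  /* bytes 12-13: big-endian */
    frame[13] = (unsigned char)(ethertype & 0xff);
    if (len)
        memcpy(frame + ETH_HLEN, payload, len);
    return ETH_HLEN + len;
}

/* Send a complete frame (header included) on interface ifname.
 * Returns 0 on success, -1 on any failure. */
static int send_raw_frame(const char *ifname, const unsigned char *frame, size_t len)
{
    struct sockaddr_ll addr;
    unsigned int ifindex = if_nametoindex(ifname);
    ssize_t sent;
    int fd;

    if (ifindex == 0)
        return -1;                        /* no such interface */
    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;                        /* needs root or CAP_NET_RAW */
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_ifindex = (int)ifindex;
    addr.sll_halen = ETH_ALEN;
    memcpy(addr.sll_addr, frame, ETH_ALEN);   /* destination MAC */
    sent = sendto(fd, frame, len, 0, (const struct sockaddr *)&addr, sizeof(addr));
    close(fd);
    return sent == (ssize_t)len ? 0 : -1;
}
```

With SOCK_RAW the buffer passed to sendto() must already contain the link-layer header, which is why the frame is built first; a real program would pass an interface name such as the one behind the question's dev_eth1.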
Because you seem determined to make this mistake anyway, let's try to figure out what the problem is.
It's also a good illustration of how helpful it is to have source code. The exception log tells you the problem occurred on line 146 of net/core/skbuff.c.
That's within the function skb_under_panic(), which is only used in that file (it's static after all), from within skb_push().
The skb_push() function expands the skb toward its front: it makes room in the buffer for a new header by moving the internal data pointer back toward the start of the buffer (head).
In your case, the internal data pointer is still in its original location, at the very front of the skb, so there is no room before it; the log shows data (f18437f2) ending up 14 bytes below head (f1843800). You need to reserve some room at the front of the skb first. Use skb_reserve(), pretty much just like you had. Why did you comment that out?
Also, you need to check that the allocation of the skb succeeded. Kernel allocators can (and do) return NULL sometimes.
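The pointer arithmetic described above can be modeled in a few lines of user-space C. This is a simplified model with hypothetical names, not the kernel's struct sk_buff: it tracks head, data, tail and end as offsets and mimics skb_reserve(), skb_put() and skb_push(), returning false where the kernel would call skb_under_panic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offsets into one buffer; head is always 0. */
struct skb_model {
    size_t head, data, tail, end;
};

/* Like a freshly allocated skb: no headroom, no data. */
static void model_alloc(struct skb_model *s, size_t size)
{
    assert(s != NULL);
    s->head = s->data = s->tail = 0;
    s->end = size;
}

/* Like skb_reserve(): shift data and tail forward to create headroom. */
static bool model_reserve(struct skb_model *s, size_t len)
{
    if (s->tail + len > s->end)
        return false;
    s->data += len;
    s->tail += len;
    return true;
}

/* Like skb_put(): append len bytes at the tail. */
static bool model_put(struct skb_model *s, size_t len)
{
    if (s->tail + len > s->end)
        return false;
    s->tail += len;
    return true;
}

/* Like skb_push(): prepend len bytes by moving data back toward head.
 * The kernel panics (skb_under_panic) where this returns false. */
static bool model_push(struct skb_model *s, size_t len)
{
    if (len > s->data - s->head)
        return false;
    s->data -= len;
    return true;
}
```

Pushing a 14-byte header onto a fresh buffer fails, exactly as in the log, while reserving 14 bytes first makes the same push succeed. In the kernel, the usual order is roughly: allocate room for LL_RESERVED_SPACE(dev) plus the payload, skb_reserve() that headroom, skb_put() the payload, set skb->dev (the log shows dev:<NULL>), call dev_hard_header(), then dev_queue_xmit(). Note also that dev_queue_xmit() takes ownership of the skb whether or not transmission succeeds, so the kfree_skb() after it in the question's code would free the skb a second time.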

Resources