How to debug problems in Linux kernel module `init()`?

I am using remote (k)gdb to debug a problem in a module that causes a panic when loaded, i.e. when init() is called.
The stack trace just shows that do_one_initcall(mod->init) causes the crash. In order to get the symbol file loaded in gdb, I need the address of the module text section, and to get that I need to get the module loaded.
Because the insmod in busybox (1.16.1) doesn't support -m, I'm stuck with grep modulename /proc/modules plus adding the offset from nm to figure out the address.
So I'm facing a sort of chicken-and-egg problem here - to be able to debug the module loading, I need to get the module loaded - but in order to get the module loaded, I need to debug the problem...
So I am currently thinking about two options - is there a way to get the address information either:
by printk() in the module init code
by printk() somewhere in the kernel code
all this prior to calling the mod->init() - so I could place a breakpoint there, load the symbol file, hit c and see it crash and burn...

Can you build your code into the kernel rather than as a module? That might simplify debugging the init() call.
You could also set a breakpoint at do_one_initcall() and look at the address of mod->init to get the load address.
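The address arithmetic mentioned in the question (load address from /proc/modules plus a symbol offset from nm) can be sketched in user-space C. This is only a sketch under stated assumptions: the helper names are made up for illustration, the input is assumed to follow the usual `name size refcount deps state address` layout of a /proc/modules line, and adding an nm offset to the base is only correct if the module's .text section starts at the reported load address.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one /proc/modules line and, if it describes
 * the module called `name`, store its load address in *base.
 * Returns 0 on success, -1 if the line is malformed or names another module.
 */
int module_base_from_line(const char *line, const char *name,
                          unsigned long long *base)
{
    char mod[64], deps[256], state[16];
    unsigned long size;
    int refcnt;
    unsigned long long addr;

    /* Fields: name, size, refcount, dependencies, state, load address. */
    if (sscanf(line, "%63s %lu %d %255s %15s %llx",
               mod, &size, &refcnt, deps, state, &addr) != 6)
        return -1;
    if (strcmp(mod, name) != 0)
        return -1;

    *base = addr;
    return 0;
}

/* Hypothetical helper: base address plus the offset reported by nm. */
unsigned long long symbol_address(unsigned long long base,
                                  unsigned long long nm_offset)
{
    return base + nm_offset;
}
```

For an illustrative line such as `foomod 16384 0 - Live 0xffffffffa0000000`, the base is 0xffffffffa0000000 and an nm offset of 0x24 gives 0xffffffffa0000024. Since /proc/modules only lists a module once it is loaded, this does not by itself get around the chicken-and-egg problem.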

Related

Vulkan can't find layer libs on Linux

All my Vulkan SDK paths are sourced in .profile and give the following results when echoed:
I can enumerate all layers and the application compiles without problems. However, when I run it, I get the following error messages from the debug report callback:
I'm on Ubuntu 17.10 with a GTX 1060 with the 387.42.05 drivers, which support Vulkan 1.1.
Running the application with LD_DEBUG=libs shows 2 errors:
/lib/x86_64-linux-gnu/libpthread.so.0: error: symbol lookup error: undefined symbol: pthread_setname_np, version GLIBC_2.2.5 (fatal)
/home/jesta88/Vulkan/VulkanSDK/1.1.70.1/x86_64/lib/libVkLayer_parameter_validation.so: error: symbol lookup error: undefined symbol: vkNegotiateLoaderLayerInterfaceVersion (fatal)
I have no idea what to make of these errors.

I can't completely explain the first error, although I can reproduce it. It is preceded by
calling init: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
so I suspect that the nvidia driver is probing for a symbol and fails to find it. Although this is marked as "fatal", it isn't really.
For the second error, I can see that too. I reproduced it by running the build_examples.sh script in the SDK. Then:
cd examples/build
LD_DEBUG=libs ./cube --validate -c 300 2> log
The app runs fine.
To convince myself that the validation layers are loaded and working, I created a validation error by commenting out the call to vkDestroyDescriptorPool (line 2252 in cube.c) and got the expected validation errors.
In this case, I think that the Vulkan loader is trying to look up the vkNegotiateLoaderLayerInterfaceVersion symbol in the driver and failing to find it. This is not a fatal condition either since the export of this symbol by a driver is optional. If the loader does not find the symbol, then it assumes a particular protocol between the loader and the driver. If the symbol does exist, the loader calls it to get additional information about the loader<->ICD interface that the driver supports.
Some more detail can be found in this document.
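The optional-symbol lookup described above can be illustrated with plain dlopen/dlsym. This is a sketch of the general pattern, not the Vulkan loader's actual code: the function name `probe_interface_version`, the `int (*)(void)` signature, and the fallback value are all assumptions made for illustration.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical signature for an optional "negotiate" entry point. */
typedef int (*negotiate_fn)(void);

/*
 * Look up an optional symbol in an already-opened library.
 * If the symbol is missing, that is not treated as an error:
 * the caller-supplied fallback version is assumed instead.
 */
int probe_interface_version(void *handle, const char *symbol,
                            int fallback_version)
{
    negotiate_fn fn;

    /* POSIX-recommended way to store dlsym()'s void * in a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL)
        return fallback_version; /* symbol not exported: use the default */

    return fn(); /* symbol exported: ask the library what it supports */
}
```

A lookup that fails this way is harmless to the caller, which fits the observation that the messages marked "fatal" did not stop the application. On older glibc versions a program using this needs to be linked with -ldl.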
In short, I don't think that these are actual problems.
Edit: The vkNegotiateLoaderLayerInterfaceVersion issue is really happening when the loader attempts to load a layer, and not the ICD (driver), but the same explanation still applies.
I still can't explain the messages you are getting about not finding the layers.
I suggest setting VK_LOADER_DEBUG=all to get some detailed messages about what the Vulkan loader is doing while it is looking for the layers.
Also, try running the cube demo as I outlined above to see if that app runs correctly.

export per cpu symbol for kernel module

I'm trying to export the per-cpu symbol "x86_cpu_to_logical_apicid" from the kernel so that my kernel module can access it. In "arch/x86/kernel/apic/x2apic_cluster.c", I did
//static DEFINE_PER_CPU(u32, x86_cpu_to_logical_apicid);
DEFINE_PER_CPU(u32, x86_cpu_to_logical_apicid); //I remove static
EXPORT_PER_CPU_SYMBOL(x86_cpu_to_logical_apicid); // I add this
After I recompiled the kernel, /proc/kallsyms shows
0000000000011fc0 V x86_cpu_to_logical_apicid
0000000000012288 V x86_cpu_to_node_map
ffffffff8187df50 r __ksymtab_x86_cpu_to_apicid
Then I try to access "x86_cpu_to_logical_apicid" in my kernel module, using
int apicid = per_cpu(x86_cpu_to_logical_apicid, 2);
However, the module fails to load with "Unknown symbol in module". The flag "V" means weak object, but I'm not sure whether this is the reason the export fails. Can anyone give me some suggestions? Thank you!

I realize that the OP is perhaps not interested in the answer anymore, but I had a similar issue today and thought it might help others as well.
Before using an exported per-cpu variable in a module, you have to declare it first. In your case:
DECLARE_PER_CPU(u32, x86_cpu_to_logical_apicid);
Then you can use get_cpu_var and put_cpu_var to safely access the current processor's copy of the variable; to read a specific CPU's copy, as in the question, per_cpu(x86_cpu_to_logical_apicid, cpu) can be used once the declaration is in place. You can read more here.

How are intermodule dependencies resolved when...?

How are intermodule dependencies resolved when both modules are built outside of the kernel tree and modversioning is enabled?
Modversioning is used to ensure that binary loadable modules are compatible with the kernels they are loaded upon. It is enabled with .config option CONFIG_MODVERSIONS.
We have two dynamically loaded kernel modules, one of which uses an exported symbol from the other. Although the dependent module is loaded after the one it depends on, insmod complains that it cannot resolve a dependency.
[FWIW, these particular modules would serve no useful purpose in the open source world. The people who designed these modules like to keep them out of the kernel tree for their own SCM purposes. The solution of deploying these as a kernel patch will not work.]
This is what the kernel log shows.
<4>foomod: no symbol version for bar_api
<4>foomod: Unknown symbol bar_api
However, if I cat /proc/kallsyms, the bar_api is there and shown as exported.
Another developer here suggested that we use a .conf file to get invoked from the loadmodules script that ignores this error and forces a load, something like this.
install foomod { /sbin/modprobe --ignore-install --force-modversion foomod $CMDLINE_OPTS; }
I think there has got to be a cleaner way to fix this.
I've tried modifying the Makefile to reference the Module.symvers of the module exporting the symbol. The sources for the two modules are in peer directories. It does not seem to matter, but I could be doing this wrong.
KBUILD_EXTRA_SYMBOLS := ../barmod/Module.symvers
This is the content of Module.symvers:
0x00000000 bar_api bar_api barmod
The 0x00000000 is supposed to be valid with modversioning disabled. I think if I could use modprobe like this and see the exported function, then the modprobe would be successful. However, this would only work when modversions is disabled.
# modprobe --dump-modversions foomod.ko
0x00000000 bar_api
However, copying both drivers into the kernel tree and building from within it works. This is a partial listing of the symbols referenced with checksums.
# modprobe --dump-modversions foomod.ko
0x46085e4f add_timer
0x7d11c268 jiffies
0x6a9f26c9 init_timer_key
0x7ec9bfbc strncpy
0xe43dc92b misc_register
0x3302b500 copy_from_user
0x85f8a266 copy_to_user
0xc6538cfc bar_api
0xea147363 printk
: :
Back around 2002, enabling CONFIG_MODVERSIONS caused the build to append a checksum generated by genksyms to the name of each exported kernel function, so symbols looked something like printk_R1b7d40. That was the last time I had to deal with modversioning, since all of my work since then has been with open-source code, within the stock kernel code, or with modversioning disabled. Today's builds instead use genksyms to create a checksum for each symbol and store it in a special section; at load time that section is checked for a checksum match.
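The checksum comparison can be illustrated in user-space C, using text in the format that modprobe --dump-modversions prints above. This is a sketch only: `find_symbol_crc` and `versions_match` are hypothetical helper names, and the real check is done inside the kernel's module loader, not by code like this.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a listing of "<crc> <symbol>" lines and
 * store the CRC recorded for `symbol` in *crc.
 * Returns 0 if the symbol is listed, -1 otherwise.
 */
int find_symbol_crc(const char *listing, const char *symbol,
                    unsigned int *crc)
{
    const char *p = listing;

    while (*p) {
        unsigned int value;
        char name[128];

        /* %x accepts the leading "0x" printed in the listing. */
        if (sscanf(p, "%x %127s", &value, name) == 2 &&
            strcmp(name, symbol) == 0) {
            *crc = value;
            return 0;
        }

        /* Advance to the start of the next line, if any. */
        p = strchr(p, '\n');
        if (p == NULL)
            break;
        p++;
    }
    return -1;
}

/* Hypothetical helper: the importer's and exporter's CRCs must be equal. */
int versions_match(unsigned int importer_crc, unsigned int exporter_crc)
{
    return importer_crc == exporter_crc;
}
```

With the values from this question, the in-tree build records 0xc6538cfc for bar_api while the out-of-tree Module.symvers shows 0x00000000, so a comparison of those two values would fail.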
There used to be a kernel macro called EXPORT_SYMBOL_NOVERS() that would have worked, but that has been deprecated.
The Linux kernel used is 2.6.32.
I've found these articles relevant and helpful, but inconclusive:
http://lxr.free-electrons.com/source/Documentation/kbuild/modules.txt
http://lwn.net/Articles/21393/
http://www.linuxchix.org/content/courses/kernel_hacking/lesson8
http://lwn.net/Kernel/LDD2/ch11.lwn
How do I cleanly export a function from a loadable module and allow it to be used by another, dependent loadable module when both are built outside of the Linux kernel?

specify linux kernel module dependency when using jprobe

I am building two linux kernel modules.
The second module (called the debugging module hereafter) uses jprobe to intercept calls to functions inside the first module (called the main module) and prints some state for debugging. They work pretty well, but I have one question about the debugging module's dependency on the main module.
Apparently, the debugging module depends on the main module, because when I load the debugging module without the main module loaded, I get the error
"Unknown symbol in module, or unknown parameter"
However, it looks like modules.dep does not capture this dependency. Looking at
nm -u <debugging_module.ko>
I did not find any unresolved symbol related to the main module. jprobe needs the name of the main module's function to intercept, but that name is only assigned as a string to .kp.symbol_name in the jprobe structure.
How can we specify dependency in this situation?

How to load a modified kernel module that already exists in a precompiled kernel

One way to do this is to rebuild the kernel sources with the original module configured as loadable, so that the original module can be removed and the modified one inserted. But this is a time-consuming process.
I am wondering if there is some other way to load the modified module.
I made some modifications to the MD driver and tried to load it on the precompiled kernel.
insmod failed with the following error message:
md_mod: exports duplicate symbol bitmap_close_sync (owned by kernel)
insmod: error inserting 'md-mod.ko': -1 Invalid module format
Please let me know whether this can be done. Any help would be appreciated. Thanks!

This error shows up because the bitmap_close_sync symbol is already exported by the kernel, and your md_mod module is trying to export the same symbol again. Try not exporting the symbol, then recompile and insert the module (the module should be compiled against the kernel version on which it is inserted). See also the Stack Overflow question "What will happen if two kernel module export same symbol". Hope it answers your question :-).
