TCP sendto (C++) fails on Linux but not OSX. Errno: EINVAL 22 Invalid argument

On the client side of my application, the following runs fine on OSX. But when compiled and run on Linux (Ubuntu 12 or Raspbian), sendto() always fails with EINVAL (errno 22, Invalid argument). How do I get it to run on Linux?
std::vector<uint8_t> rawVect;
// rawVect.push_back()...a bunch of bytes
const uint8_t* sendBytes = &rawVect[0]; // or rawVect.data();
size_t sendSize = rawVect.size();
if(sendSize > 0){
long numBytes = sendto(control_fd, sendBytes, sendSize, 0, res->ai_addr, _res->ai_addrlen);
}
I suspect C++ 11 libraries and std::vectors on Linux. My makefile looks similar to this.
mac:
g++ -std=c++0x myprogram.cpp
# (w/ llvm libc++)
ubuntu:
clang++-3.5 -g -std=c++11 -stdlib=libc++ myprogram.cpp
# couldn't use g++ 4.8 or prior because it didn't support std::vector::insert as I was using it elsewhere. 4.9 not avail for Ubuntu 12.
pi:
g++-4.9 -std=c++0x myprogram.cpp

man 3 sendto says that EINVAL may be returned if "The dest_len argument is not a valid length for the address family", perhaps even though the address argument is ignored for connected-mode sockets. Since you mention TCP in the title, I assume that control_fd is a connected-mode socket. Try simply using send(control_fd, sendBytes, sendSize, 0) or even write(control_fd, sendBytes, sendSize) instead.
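A minimal sketch of that suggestion in plain C (it should also compile as C++), assuming control_fd is an already-connected stream socket. The helper name send_all is hypothetical; it loops because send() may accept fewer bytes than requested.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: write the whole buffer to a connected socket.
 * Returns the number of bytes sent, or -1 with errno set on failure. */
ssize_t send_all(int fd, const uint8_t *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        /* No destination address: a connected socket already has a peer. */
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, retry */
            return -1;      /* real error, errno says which */
        }
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```

With the vector from the question, the call would look like send_all(control_fd, rawVect.data(), rawVect.size()).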

There's not enough to go on. Add print statements to reveal the values of all the parameters passed to sendto. Then print out the relevant members of res->ai_addr after casting back to sockaddr_in.
One hypothesis: the value of ai_addrlen should exactly equal sizeof(struct sockaddr_in) for IPv4, or sizeof(struct sockaddr_in6) if the socket is IPv6. Some operating systems are less forgiving if you pass in a value that's bigger than the size expected for that socket type. That would be the case if ai_addrlen were set to sizeof(sockaddr_storage).
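To test that hypothesis, a small check along these lines could be run before the sendto call. This is only a sketch; expected_addrlen is a hypothetical helper name, and it only knows about IPv4 and IPv6.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: the address length usually expected for a family.
 * Returns 0 for families this sketch does not handle. */
socklen_t expected_addrlen(int family)
{
    switch (family) {
    case AF_INET:
        return (socklen_t)sizeof(struct sockaddr_in);
    case AF_INET6:
        return (socklen_t)sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}
```

If res->ai_addrlen differs from expected_addrlen(res->ai_family), printing both values would show whether the length passed to sendto is the problem.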

Related

Kernel make generates “ld: arch/x86/entry/syscall_64.o:(.rodata+0xdc0): undefined reference to `__x64_sys_s_enable'”

OS is Ubuntu 20.10. Kernel source is linux_5.8.0-59.66.
I am porting kernel modifications from CentOS 7 / RHEL 7.9 to Ubuntu.
The original unmodified Ubuntu kernel source compiles and runs cleanly on this machine. The compiler set up seems to be functioning properly.
My current problem is related to a system call I've added. The error generated is -
LD .tmp_vmlinux.btf
ld: arch/x86/entry/syscall_64.o:(.rodata+0xdc0): undefined reference to `__x64_sys_s_enable'
BTF .btf.vmlinux.bin.o
Segmentation fault (core dumped)
LD .tmp_vmlinux.kallsyms1
.btf.vmlinux.bin.o: file not recognized: file format not recognized
make: *** [Makefile:1163: vmlinux] Error 1
I have searched for the original "undefined reference" error and found possible fixes, none of which have worked.
Here are the steps I used to add the system call, which originally worked on CentOS 7 and RHEL 7.9.
Modified /SOURCE-DIRECTORY/include/linux/syscalls.h, commenting out the original line and adding the reference to __64 (including a blank line above it) -
asmlinkage long __64_sys_s_enable(int s_enable_flag);
//asmlinkage long sys_s_enable(int s_enable_flag);
Modified /SOURCE-DIRECTORY/arch/x86/include/asm/syscalls.h adding -
440 64 s_enable sys_s_enable
The fields are delimited by TAB, and I did not add any blank lines.
Created the source directory and files - /SOURCE-DIRECTORY/s_enable containing s_enable.c. s_enable.c in its entirety is
#include <linux/kernel.h>
extern int s_enable_flag;
asmlinkage long sys_s_enable(int i)
{
// printk(KERN_INFO "In ORIGINAL SYSCALL s_enable\n");
s_enable_flag = i;
return 0;
}
And added the appropriate syscall directory to the Makefile.
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/ s_enable/
And ran "sudo make".
I'm not sure what I might be doing wrong in that the "make" works with the original kernel source, and the system call I am trying to add has worked on the other mentioned distros.
Thanks for any input you can provide.
UPDATE 07-18-2021
I made the following changes on 07-17-2021 in order to use SYSCALL_DEFINE1.
SOURCEDIR/include/linux/syscalls.h
The reference to sys_s_enable has been commented out.
//asmlinkage long sys_s_enable(int s_enable_flag);
SOURCEDIR/arch/x86/entry/syscalls/syscall_64.tbl
"64" changed to "common"
440 common s_enable sys_s_enable
SOURCEDIR/Makefile has been edited to remove SOURCEDIR/s_enable from core-y
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/
#core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/ s_enable/
Copied/edited the original s_enable.c into SOURCEDIR/kernel/sys.c using SYSCALL_DEFINE1
SYSCALL_DEFINE1(su_enable, int, i)
{
extern int s_enable_flag;
s_enable_flag = i;
return 0;
}
The compile command was sudo make -j4 and took 12-15 hours which is somewhat normal.
The error was
LD .tmp_vmlinux.btf
ld: arch/x86/entry/syscall_64.o:(.rodata+0xdc0): undefined reference to `__x64_sys_s_enable'
Thanks - Roger
To add your own system call on a newer version of Linux, the function name may need the __x64_sys_ prefix.
Here is the comment from the beginning of arch/x86/entry/syscalls/syscall_64.tbl:
The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
So the system call function name might need to start with the __x64_sys_ prefix. Here is sample code for your function:
asmlinkage long __x64_sys_s_enable(int i)
{
// printk(KERN_INFO "In ORIGINAL SYSCALL s_enable\n");
s_enable_flag = i;
return 0;
}
The declaration in include/linux/syscalls.h might then need the same prefix so that it matches the function name:
asmlinkage long __x64_sys_s_enable(int i);
The system-call entry in arch/x86/entry/syscalls/syscall_64.tbl can keep the plain function name:
440 common s_enable sys_s_enable
After following these steps, recompile the kernel; the build may then succeed.

Error 'Unknown symbol __copy_to_user' during module load

I have a Terramaster F4-210 NAS based on the arm64 Realtek RTD1296 CPU. It runs custom firmware based on OpenWrt 15.05.1 with a 4.4.18 Linux kernel. I want to build a kernel module so I can use my Zigbee stick (cdc-acm - USB Modem (CDC ACM) support) and run Home Assistant on it.
# uname -a
Linux TNAS-BA68 4.4.18-g8bcbd8a-dirty #1327 SMP Mon Aug 31 11:55:52 CST 2020 aarch64 GNU/Linux
I downloaded the appropriate kernel sources and created a config. After installing my newly compiled module, I get the following errors in the kernel log:
# modprobe cdc-acm
1 module could not be probed
- cdc-acm
# dmesg
...
cdc_acm: Unknown symbol __copy_to_user (err 0)
cdc_acm: Unknown symbol __copy_from_user (err 0)
cdc_acm: Unknown symbol _mcount (err 0)
As far as I understand, that means the module expects __copy_to_user, __copy_from_user and _mcount to be part of the kernel (or of another loaded module). But the kernel doesn't export these symbols:
# cat /proc/kallsyms | grep copy_to_user
ffffff80082923f0 T copy_to_user_page
ffffff80087d2600 T __arch_copy_to_user
ffffff80087e67b0 t kfifo_copy_to_user
ffffff8008871854 T my_copy_to_user
File arch/arm64/include/asm/uaccess.h has definition of copy_to_user:
extern unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n);
...
static inline unsigned long __must_check copy_to_user(void __user *to, const void *from, unsigned long n)
File arch/arm64/lib/copy_to_user.S contains source code of __copy_to_user:
ENTRY(__copy_to_user)
ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
...
ENDPROC(__copy_to_user)
My initial idea was that I had a bad kernel configuration or a bad toolchain. So I used the toolchain and the OpenWrt+kernel configuration from the Banana Pi W2 board, which has the same CPU. No luck, although a compile warning disappeared.
So can somebody please confirm that the problem can't be solved by applying a different kernel configuration or a proper toolchain, and that the kernel source code must be modified instead? E.g. instead of __copy_to_user, one of __arch_copy_to_user or my_copy_to_user must be used.
So my assumption is: Terramaster took the kernel sources, modified them (probably used __arch_copy_to_user instead of __copy_to_user) and then compiled them.
BTW: I also checked the kernel sources and didn't find __arch_copy_to_user. Does that mean it was introduced by modifications to the kernel sources, or could it still be present through some nasty #defines?

Need help resolving segfault in libc-2.23.so

Need help debugging shared library with gdb.
I am trying to debug a shared library and in my case it is:
libc-2.23.so
The reason is that I get these lines in dmesg:
[10081.433266] compiz[11346]: segfault at 7f30a4100010 ip 00007f309c36f44b sp 00007ffdde303aa0 error 4 in libc-2.23.so[7f309c2f1000+1bf000]
[22005.764635] compiz[16149]: segfault at 7f30e3456db0 ip 00007f30db85044b sp 00007fffaab9c0a0 error 4 in libc-2.23.so[7f30db7d2000+1bf000]
[48777.031064] compiz[25203]: segfault at 7f0b8e23b050 ip 00007f0b87edf44b sp 00007ffd51d15740 error 4 in libc-2.23.so[7f0b87e61000+1bf000]
[78850.413793] compiz[4889]: segfault at 7f60ddbf2440 ip 00007f60d598944b sp 00007ffedc5e31b0 error 4 in libc-2.23.so[7f60d590b000+1bf000]
[84583.754783] compiz[8441]: segfault at 7f5f8c3930c0 ip 00007f5f871d544b sp 00007ffc436bb5a0 error 4 in libc-2.23.so[7f5f87157000+1bf000]
[100625.457854] compiz[15619]: segfault at 7ffffa967680 ip 00007ffff722844b sp 00007fffffffdad0 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[104234.596331] compiz[19076]: segfault at 7ffffa2dc540 ip 00007ffff722844b sp 00007fffffffd810 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[112314.238115] compiz[22152]: segfault at 7ffffe232760 ip 00007ffff722844b sp 00007fffffffd810 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[130828.195732] compiz[26013]: segfault at 7ffffa966180 ip 00007ffff722844b sp 00007fffffffdad0 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
[225379.026592] compiz[19275]: segfault at 7ffff821b6d0 ip 00007ffff722844b sp 00007fffffffd7c0 error 4 in libc-2.23.so[7ffff71aa000+1bf000]
The address where libc-2.23.so is loaded does not change after time stamp 100625.457854 since I ran the command:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
In order to be able to load it under gdb.
What I have established so far is that the segfault always occurs at the same offset from the shared library's load address.
I calculated the offset by taking instruction pointer minus load address in python:
ld = ["7f309c2f1000", "7f30db7d2000", "7f0b87e61000", "7f60d590b000", "7f5f87157000", "7ffff71aa000"]
ip = ["7f309c36f44b", "7f30db85044b", "7f0b87edf44b", "7f60d598944b", "7f5f871d544b", "7ffff722844b"]
ld_val = [int(x,16) for x in ld]
ip_val=[int(x,16) for x in ip]
ip_off=[i-s for (i,s) in zip(ip_val,ld_val)]
ip_off
[517195, 517195, 517195, 517195, 517195, 517195]
So using this information I got the offending line from executing:
$ addr2line -e /lib/x86_64-linux-gnu/libc-2.23.so -fCi 0x7e44b
malloc_consolidate
/build/glibc-9tT8Do/glibc-2.23/malloc/malloc.c:4167
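The same offset calculation can also be done straight from a dmesg line. The following is a sketch in C, assuming the line has the "ip <hex> ... [<base>+<size>]" layout shown above; segfault_offset is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one kernel "segfault at ..." line and return
 * the instruction pointer minus the mapping base, or -1 if the line does
 * not parse or the ip lies outside the reported mapping. */
long long segfault_offset(const char *line)
{
    unsigned long long ip, base, size;
    const char *ip_field = strstr(line, " ip ");   /* start of the ip field */
    const char *mapping = strrchr(line, '[');      /* last "[base+size]" */

    if (ip_field == NULL || mapping == NULL)
        return -1;
    if (sscanf(ip_field, " ip %llx", &ip) != 1)
        return -1;
    if (sscanf(mapping, "[%llx+%llx]", &base, &size) != 2)
        return -1;
    if (ip < base || ip - base >= size)
        return -1;
    return (long long)(ip - base);
}
```

For the first dmesg line above this gives 0x7e44b (517195 decimal), the same value the Python session produced and the one passed to addr2line.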
Since I run Ubuntu 16.04 I installed the sources by issuing:
$ apt-get source glibc-source
Inspecting the offending line showed that it was just a comment.
malloc.c:4167
/* Slightly streamlined version of consolidation code in free() */
inside function:
static void malloc_consolidate(mstate av)
So I am assuming I am doing something wrong here.
Any pointer on how to capture this "segfault"?
So I am assuming I am doing something wrong here.
You aren't.
The symptoms you are looking at are almost certainly (99.999%) the result of heap corruption, and since this is happening in compiz, there is little you can do except file a bug report.
To make a useful bug report, it would help if you could run compiz under Valgrind. Running it under GDB will not help.
I had gdb loaded with the library and breakpoint on line 4167 but no break even if I got a new entry in dmesg.
That means you are debugging the wrong process. Perhaps compiz forks helper processes, and one of them dies?

File capabilities do not transfer to process once executed

I'm trying to write a program which requires elevated capabilities (rather than simply run it with sudo). However, none of the capabilities I set using setcap seem to transfer into the process once executed. This problem occurs across multiple executables and using different capabilities.
This code uses cap_set_file() to give the CAP_NET_RAW capability to a file passed as a CLA. (Don't ask me why I need this.)
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/capability.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#define handle_error(msg) \
do { printf("%s: %s\n", msg, strerror(errno)); exit(EXIT_FAILURE); } while (0)
void print_cap_buf(cap_t cur) {
char *buf;
buf = cap_to_text(cur, NULL);
printf("%s\n", buf);
cap_free(buf);
}
void get_and_print_cap_buf() {
cap_t cur = cap_get_proc();
print_cap_buf(cur);
cap_free(cur);
}
int main(int argc, char *argv[]) {
cap_t file_cap;
printf("Process capabilities: ");
get_and_print_cap_buf(); // Print the current process capability list.
file_cap = cap_from_text("cap_net_raw=ep");
if (file_cap == NULL) handle_error("cap_from_text");
printf("Capabilities to set in file: "); print_cap_buf(file_cap);
if (argc == 2) {
if ( cap_set_file(argv[1], file_cap) != 0) handle_error("cap_set_file");
} else printf("No file specified.\n");
cap_free(file_cap);
return 0;
}
After compiling with gcc:
gcc -Wall -pedantic -std=gnu99 test.c -o tt -lcap
I give it the capabilities with:
sudo setcap "cap_setfcap,cap_fowner,cap_net_raw=eip" tt
and using getcap tt, the output is:
$ getcap tt
tt = cap_fowner,cap_net_raw,cap_setfcap+eip
However, when I run the program, I get the following output (test-client is an executable which creates a raw Ethernet socket):
$ ./tt test-client
Process capabilities: =
Capabilities to set in file: = cap_net_raw+ep
cap_set_file: Operation not permitted
HOWEVER... when I run the program with sudo, all process capabilities come through just fine.
$ sudo ./tt test-client
Process capabilities: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,37+ep
Capabilities to set in file: = cap_net_raw+ep
and the target file "test-client" gets its capabilities set properly.
However, even with CAP_NET_RAW, the client fails on its socket() call with EPERM. I've tried setting CAP_NET_ADMIN in case it needed that as well; same issue. I've tried using CAP_SETPCAP on the program above; no dice. I'm fairly sure I've narrowed it down to some disconnect where the executable file's capabilities aren't getting into the running process.
What am I missing here?
EDIT, the next morning:
Okay, so I've done some more testing and it turns out this code works just fine on a Raspberry Pi. I'm running Lubuntu 16.04 with LXTerminal on my primary machine and that's the one that's failing. It fails inside LXTerminal and also in a text-only shell. Maybe it's an OS bug?
The Lubuntu machine (cat /proc/version):
Linux version 4.4.0-34-generic (buildd@lgw01-20) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1) ) #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
The pi:
Linux version 4.4.11-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #888 SMP Mon May 23 20:10:33 BST 2016
EDIT AGAIN: --
Tested on a different machine with the same USB key I used to install. Slightly different /proc/version:
Linux version 4.4.0-31-generic (buildd@lgw01-16) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1) ) #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016
Works fine. I'm so confused.
I finally got this to work, thanks to the information found here:
https://superuser.com/questions/865310/file-capabilities-setcap-not-being-applied-in-linux-mint-17-1
It turns out that my home directory is being mounted as nosuid, which disables all capability flags.
When running the program on a filesystem without nosuid, it works as expected.
For future readers: if you encounter this issue, make sure your filesystem is not mounted as nosuid. Using the mount command, check for the filesystem that matches where you're storing the data (in my case /home/user) and see if the nosuid flag is set.
$ mount
...
/home/.ecryptfs/user/.Private on /home/user type ecryptfs (rw,nosuid,nodev,relatime,ecryptfs_fnek_sig=***,ecryptfs_sig=***,ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_unlink_sigs)
(It's an ecryptfs system, so if you selected "Encrypt my home directory" on the Ubuntu install you'll probably have this problem. I couldn't figure out a way to mount this as suid, and probably wouldn't want to anyway.)
I ended up making a new directory /code (it's my filesystem, I can do what I want) which is mounted on a different partition without nosuid.
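The same check can be made from a program instead of reading the mount output. This is a sketch using statvfs(); is_nosuid is a hypothetical helper name, and the path you pass would be the directory that holds your executable.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: returns 1 if the filesystem holding `path` is
 * mounted nosuid, 0 if it is not, and -1 if statvfs() fails. */
int is_nosuid(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;                      /* errno is set by statvfs() */
    return (vfs.f_flag & ST_NOSUID) ? 1 : 0;
}
```

If is_nosuid() returns 1 for the directory, file capabilities set on executables stored there are ignored at exec time, which matches the behaviour described above.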
It would be nice if the man pages for capabilities referenced this fact... (edit: patch submitted, it does now :) )
Just a data point: your code works here on an older LTS machine:
$ uname -vr
3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015
$ ./tt test-client
Process capabilities: = cap_fowner,cap_net_raw,cap_setfcap+ep
Capabilities to set in file: = cap_net_raw+ep
$ cat /etc/debian_version
jessie/sid
Perhaps it has something to do with the capabilities of the user's process (the one invoking ./tt)? As capabilities(7) says, capabilities are a per-thread attribute.

Using Linux virtual mouse driver

I am trying to implement a virtual mouse driver according to the book Essential Linux Device Drivers. There is a user space application, which generates coordinates, as well as a kernel module.
See: Virtual mouse driver and userspace application code and also a step by step on how to use this driver.
1.) I compile the code of the user space application and driver.
2.) Next I checked the dmesg output and got:
input: Unspecified device as /class/input/input32
Virtual Mouse Driver Initialized
3.) The sysfs node was created properly during initialization (found in /sys/devices/platform/vms/coordinates)
4.) I know that the virtual mouse driver (input32 ) is linked to event5 by checking the following:
$ cat /proc/bus/input/devices
I: Bus=0000 Vendor=0000 Product=0000 Version=0000
N: Name=""
P: Phys=
S: Sysfs=/devices/virtual/input/input32
U: Uniq=
H: Handlers=event5
B: EV=5
B: REL=3
5.) Next I attach a GPM server to the event interface: gpm -m /dev/input/event5 -t evdev
6.) I run the user space application to generate random coordinates for the virtual mouse and observe the generated coordinates using od -x /dev/input/event5.
And nothing happens. Why?
Also, the author mentions that gdm should be stopped using /etc/init.d/gdm stop, but I get "no such service" when stopping gdm.
Here is my complete script for building and running the virtual mouse:
make -C /usr/src/kernel/2.6.35.6-45.fc14.i686/ SUBDIRS=$PWD modules
gcc -o app_userspace app_userspace.c
insmod app.ko
gpm -m /dev/input/event5 -t evdev
./app_userspace
Makefile:
obj-m+=app.o
Kernel version: 2.6.35.6
As I said before, I can't receive the result through od, but I did receive it through your program:
echo 9 19 > /sys/devices/platform/virmouse/vmevent
gives:
time 1368284298.207654 type 2 code 0 value 9
time 1368284298.207657 type 2 code 1 value 19
time 1368284298.207662 type 0 code 0 value 0
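For reference, each line above corresponds to one struct input_event from <linux/input.h>: type 2 is EV_REL, codes 0 and 1 are REL_X and REL_Y, and type 0 is the EV_SYN report that closes the packet. The following is a sketch of how such a line can be produced in C; describe_event is a hypothetical helper name and the timestamp is left out.

```c
#include <linux/input.h>
#include <stdio.h>

/* Hypothetical helper: format one event the way the output above does,
 * minus the timestamp. Returns what snprintf() returns. */
int describe_event(const struct input_event *ev, char *out, size_t n)
{
    return snprintf(out, n, "type %u code %u value %d",
                    (unsigned)ev->type, (unsigned)ev->code, (int)ev->value);
}
```

A reader loop would read() sizeof(struct input_event) bytes at a time from /dev/input/event5 and pass each record to describe_event().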
So now the question is: what is wrong with X11? I would like to stress that I tried this code under two different distributions, Ubuntu 11.04 and Fedora 14.
Maybe this will help: in Xorg.0.log I see the following:
[ 21.022] (II) No input driver/identifier specified (ignoring)
[ 272.987] (II) config/udev: Adding input device (/dev/input/event5)
[ 272.987] (II) No input driver/identifier specified (ignoring)
[ 666.521] (II) config/udev: Adding input device (/dev/input/event5)
[ 666.521] (II) No input driver/identifier specified (ignoring)
I spent a huge amount of time resolving this issue, and I would like to help other people who run into this problem. I think some other X11 feature interfered with my module. After disabling GDM (runlevel 3) it now works fine. Working code can be found here: http://fred-zone.blogspot.ru/2010/01/mouse-linux-kernel-driver.html (working distro: Ubuntu 11.04, GDM disabled)
Try replacing the below lines of code in the input device driver
set_bit(EV_REL, vms_input_dev->evbit);
set_bit(REL_X, vms_input_dev->relbit);
set_bit(REL_Y, vms_input_dev->relbit);
with
vms_input_dev->name = "Virtual Mouse";
vms_input_dev->phys = "vmd/input0"; // "vmd" is the driver's name
vms_input_dev->id.bustype = BUS_VIRTUAL;
vms_input_dev->id.vendor = 0x0000;
vms_input_dev->id.product = 0x0000;
vms_input_dev->id.version = 0x0000;
vms_input_dev->evbit[0] = BIT_MASK(EV_KEY) | BIT_MASK(EV_REL);
vms_input_dev->keybit[BIT_WORD(BTN_MOUSE)] = BIT_MASK(BTN_LEFT) | BIT_MASK(BTN_RIGHT) | BIT_MASK(BTN_MIDDLE);
vms_input_dev->relbit[0] = BIT_MASK(REL_X) | BIT_MASK(REL_Y);
vms_input_dev->keybit[BIT_WORD(BTN_MOUSE)] |= BIT_MASK(BTN_SIDE) | BIT_MASK(BTN_EXTRA);
vms_input_dev->relbit[0] |= BIT_MASK(REL_WHEEL);
It worked for me on Ubuntu 12.04.

Resources