trying to understand the sys_socketcall parameter - linux

Can anyone explain what this line exactly does:
socketcall(7,255);
I know that the command opens a port on the system, but I don't understand the parameters.
The man page says:
int socketcall(int call, unsigned long *args);
DESCRIPTION
socketcall() is a common kernel entry point for the socket system calls. call determines which socket function to invoke. args points to a block containing the actual arguments, which are passed through to the appropriate call.
User programs should call the appropriate functions by their usual names. Only standard library implementors and kernel hackers need to know about socketcall().
Ok, call 7 is sys_getpeername, but if I take a look in the man-page:
int getpeername(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
getpeername() returns the address of the peer connected to the socket sockfd, in the buffer pointed to by addr. The addrlen argument should be initialized to indicate the amount of space pointed to by addr. On return it contains the actual size of the name returned (in bytes). The name is truncated if the buffer provided is too small.
The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.
I really don't get it. The function needs three parameters. How does the function get them? What does the 255 mean? Does anyone have an idea how the function opens a port?

Although Linux has a system call that is commonly called socketcall, the C library does not expose any C function with that name. Normally the standard wrapper functions such as socket() and getpeername() should be used, which will end up calling the system call, but if for some reason it is necessary to call the system call directly then that can be done with syscall(SYS_socketcall, call, args) or using assembly.
In this case the application or a library that it uses (other than the standard C library) has most likely defined its own function called socketcall(), that is unrelated to the system call. You should check that function or its documentation to see what it does.
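As a minimal sketch of what a direct call looks like, the function below creates a socket through the raw system call where the architecture provides it (for example 32-bit x86) and otherwise falls back to socket(). The names raw_socket and CALL_SOCKET are made up for this illustration. It also shows why the second argument matters: the kernel treats it as the address of the argument block, so a literal 255 passed to the real system call would be read as a pointer, not as a value.

```c
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Call number for socket(); same value as SYS_SOCKET in <linux/net.h>.
 * The 7 in the question corresponds to SYS_GETPEERNAME in that list. */
#define CALL_SOCKET 1

/* Hypothetical helper: create a socket via the socketcall multiplexer
 * when the architecture defines it, otherwise via the normal wrapper. */
int raw_socket(int domain, int type, int protocol)
{
#ifdef SYS_socketcall
    /* The kernel reads the real arguments from this block. */
    unsigned long args[3] = {
        (unsigned long)domain,
        (unsigned long)type,
        (unsigned long)protocol,
    };
    return (int)syscall(SYS_socketcall, CALL_SOCKET, args);
#else
    /* x86_64 and some other architectures have no socketcall entry. */
    return socket(domain, type, protocol);
#endif
}
```

A caller would use it like socket(), for example raw_socket(AF_INET, SOCK_STREAM, 0), and close the returned descriptor when done.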

Related

Understanding how Linux syscall() works

I'm trying to understand what the Linux syscall() function expects to get. I'm looking at the man page for syscall and I can't figure out the number of parameters or what they represent. In the source code:
extern long int syscall (long int __sysno, ...) __THROW;
Does it mean that it can handle an unlimited number of parameters? If not, what does each parameter represent?
The second arg ... indicates a variadic function -- one that accepts a variable number of args; common examples are printf() and co. By design, while the number and types of args are unknown to any variadic function, for syscall() the correct arg-count and types are specific to each system call, which is indexed by __sysno and should be a manifest constant like SYS_exit found in a system header.
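Although a variadic function can in principle accept any number of args, each system call takes a fixed and small number of them (at most six on Linux), and there are practical limitations, performance considerations, and arch differences; in short, fewer is often better.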
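A short sketch of how the arg count follows the system call number: SYS_getpid takes no further arguments, while SYS_write takes three. The wrapper names raw_getpid and raw_write are invented for this example.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* getpid takes no arguments, so only the syscall number is passed. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* write takes three arguments: descriptor, buffer, and length. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

On failure syscall() returns -1 and sets errno, just like the usual wrappers.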
Note that variadic functions can be quite versatile. As one example: create your own (error_message + exit) variadic routine that combines an error status as the first arg followed by printf args; see man stdarg and services like vdprintf() and vfprintf().
Dual benefits include more concise source and a smaller .text segment.
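One possible shape for the (error_message + exit) routine mentioned above, as a sketch only. The names vreport, report, and die are hypothetical; the va_list handling follows man stdarg, and vfprintf() does the formatting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print "error: <formatted message>\n" to stderr.
 * Returns the number of characters written. */
int vreport(const char *fmt, va_list ap)
{
    int n = fprintf(stderr, "error: ");
    n += vfprintf(stderr, fmt, ap);
    n += fprintf(stderr, "\n");
    return n;
}

/* Report an error and keep running. */
int report(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vreport(fmt, ap);
    va_end(ap);
    return n;
}

/* Report an error, then exit with the given status. */
void die(int status, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vreport(fmt, ap);
    va_end(ap);
    exit(status);
}
```

A call such as die(2, "cannot open %s", path) then replaces a separate fprintf() and exit() pair at every error site.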

The difference between `entry_SYSCALL64_slow_path` and `entry_SYSCALL64_fast_path`

We know that a system call will call the function entry_SYSCALL_64 in entry_64.S. When I read the source code, I find there are two different types of call after the preparation of registers: one is entry_SYSCALL64_slow_path and the other is entry_SYSCALL64_fast_path. Can you tell the difference between the two functions?
Upon entry in entry_SYSCALL_64 Linux will:
1. Swap gs to get per-cpu parameters.
2. Set the stack from the parameters of above.
3. Disable the IRQs.
4. Create a partial pt_regs structure on the stack. This saves the caller context.
5. If the current task has _TIF_WORK_SYSCALL_ENTRY or _TIF_ALLWORK_MASK set, it goes to the slow path.
6. Enter the fast path otherwise.
_TIF_WORK_SYSCALL_ENTRY is defined here with a comment stating:
/*
* work to do in syscall_trace_enter(). Also includes TIF_NOHZ for
* enter_from_user_mode()
*/
_TIF_ALLWORK_MASK does not seem to be defined for x86; a definition for MIPS is here with a comment stating:
/* work to do on any return to u-space */
Fast path
Linux will:
1. Enable the IRQs.
2. Check if the syscall number is out of range (note the pt_regs struct was already created with ENOSYS for the value of rax).
3. Dispatch to the system call with an indirect jump.
4. Save the return value (rax) of the syscall into rax in the pt_regs on the stack.
5. Check again if _TIF_ALLWORK_MASK is set for the current task; if it is, jump to the slow return path.
6. Restore the caller context and issue a sysret.
Slow return path
1. Save the registers not saved before in pt_regs (rbx, rbp, r12-r15).
2. Call syscall_return_slowpath, defined here.
Note that point 2 will end up calling trace_sys_exit.
Slow path
1. Save the registers not saved before in pt_regs (see above).
2. Call do_syscall_64, defined here.
Point 2 will call syscall_trace_enter.
So the slow vs fast path has to do with ptrace. I haven't dug into the code but I suppose the whole machinery is skipped if ptrace is not needed for the caller.
This is indeed an important optimization.
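The decision described above can be modeled in a few lines of ordinary C. This is a toy model, not kernel code: the real entry code is assembly, the flag bit values below are made up, and the two-entry table stands in for the real syscall table.

```c
#include <errno.h>

/* Made-up bit values for illustration; the kernel's real masks differ. */
#define WORK_SYSCALL_ENTRY 0x1UL
#define ALLWORK_MASK       0x2UL
#define NR_SYSCALLS_MODEL  2

enum entry_path { FAST_PATH, SLOW_PATH };

/* Any pending work flag on the task sends the call to the slow path. */
enum entry_path choose_entry_path(unsigned long ti_flags)
{
    if (ti_flags & (WORK_SYSCALL_ENTRY | ALLWORK_MASK))
        return SLOW_PATH;
    return FAST_PATH;
}

typedef long (*syscall_fn)(void);

static long sys_zero(void) { return 0; }
static long sys_one(void)  { return 1; }

static const syscall_fn model_table[NR_SYSCALLS_MODEL] = { sys_zero, sys_one };

/* Fast-path body: rax starts out as -ENOSYS and is only overwritten
 * when the syscall number is in range. */
long fast_path_dispatch(unsigned long nr)
{
    long rax = -ENOSYS;

    if (nr < NR_SYSCALLS_MODEL)
        rax = model_table[nr]();   /* indirect call through the table */
    return rax;
}
```

In this model a traced task would have one of the flag bits set, so choose_entry_path() returns SLOW_PATH and the tracing hooks get a chance to run.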

Is there a way to access and modify user data from system call in Minix 3? Can I use sys_datacopy() here? Why is my attempt not working?

I want to implement a syscall in the PM server in Minix that has access to some data in the user space, and can modify it.
I am passing data to the syscall using Minix's message passing mechanism. In the message structure that is being passed, I assign one of the pointers to the address of the variable from the user space that I want to pass.
For example, in the user program,
message m;
m.m1_p1 = &var; //data to be passed
//pass it to the syscall
In the kernel, in the syscall function, I do
char *ptr = m_in.m1_p1;
However, when I try to either read or write the data, I get an error that the kernel has panicked, and needs a reboot.
I realise that this is probably because user space uses virtual addresses specific to the user process, which are not valid inside the syscall.
On searching further, I found that Linux has functions copy_from_user() and copy_to_user() to achieve this.
Is there an equivalent of this in Minix? If not, is there any other way to achieve this?
With the help of @osgx's suggestion in the comments, I have tried using sys_datacopy(). While this allows me to read and write the data in the system call, the changes I make are not reflected back into the user program that called the system call.
My latest attempt is as follows:
In the user program,
message m;
m.m1_p1 = &var; //data to be passed
printf("%c\n",*(m.m1_p1)); //gives the value in var
//pass it to the syscall
printf("%c\n",var); //gives the old value of var
Inside the syscall,
char *ptr = (char*)malloc(sizeof(char));
sys_datacopy(who_e,(vir_bytes)(m_in.m1_p1),SELF,(vir_bytes)(ptr),sizeof(char*)); //or some other version of sys_vircopy()?
printf("Read value of ptr : %c\n",*ptr); //gives correct value
*ptr = //new value
printf("New value of ptr : %c\n",*ptr); //gives modified value
Now I can access the value of var inside the syscall using ptr, and modify it inside the syscall as well. However, after returning from the syscall, I observe that the underlying value of var has not changed.
As per my understanding, what should have happened is that sys_datacopy() should have copied to ptr an equivalent virtual address of m_in.m1_p1 that lies in the address space of the syscall and points to the same physical address. So, *ptr should exactly reach var, thus modifying it.
Or is it that the data corresponding to the address is copied when I use sys_datacopy()? If this is the case, one solution I can think of is defining a message structure that allows double pointers, and passing a char** to the syscall. Then, dereferencing once will ensure that the address is copied to ptr. But then again, dereferencing ptr will attempt to dereference a virtual address that belongs to the user process's address space, which will not work.
Why is this method not working? What is the correct way to achieve this?
I am using Minix 3.2.1.
Thank you.
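The behaviour observed above is what one would expect if sys_datacopy() copies bytes rather than aliasing memory. The sketch below models that in plain C, with memcpy() standing in for sys_datacopy(); model_datacopy and handler_model are hypothetical names, and this is not Minix code. Under that assumption, the user's variable only changes if the handler copies the modified value back, presumably with a second sys_datacopy() call whose source and destination are swapped (not verified against Minix 3.2.1). Note also that the attempt above copies sizeof(char*) bytes into a one-byte allocation; the model copies a single char.

```c
#include <string.h>

/* Stand-in for sys_datacopy(): copies bytes, never aliases memory.
 * In Minix the two buffers would live in different address spaces;
 * here both are ordinary buffers in one process. */
static void model_datacopy(const void *src, void *dst, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Model of the handler: copy in, modify the local copy, and
 * optionally copy the result back to the caller's variable. */
void handler_model(char *user_var, char new_value, int copy_back)
{
    char local;

    model_datacopy(user_var, &local, sizeof local);     /* copy in, one char */
    local = new_value;                                  /* changes only the copy */
    if (copy_back)
        model_datacopy(&local, user_var, sizeof local); /* copy out */
}
```

With copy_back set to 0 the caller still sees the old value, which matches the symptom in the question; with copy_back set to 1 the new value arrives.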

Modifying IOCTL function call (where is the definition of ioctl) to flip GPIO pins

I want to know where IOCTL is defined. I want to flip the state of a GPIO pin from within the IOCTL function call. I am using Yocto Linux.
The ioctl requests are defined per driver. For the new GPIO chardev interface, they are defined in <linux/gpio.h>.
The logic by which these values are encoded is in <asm/ioctl.h>. Note that this is platform dependent (e.g. MIPS is different from x86 and x86_64).
If you are interested, here is the logic ported to Rust: https://docs.rs/nix/0.11.0/src/nix/sys/ioctl/linux.rs.html
However, in practice you shouldn't need to convert these request codes on your own. You would just include <linux/gpio.h> and then you can use the defined IOCTL request codes like GPIOHANDLE_GET_LINE_VALUES_IOCTL. Here are some example implementations: https://github.com/torvalds/linux/tree/master/tools/gpio
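As a rough sketch of that approach from user space, the function below requests one line as an output and sets its value using the handle-based request codes from <linux/gpio.h>. The function name gpio_set_line and the consumer label are made up, and the chip path and line offset depend on the board. The line is released when the handle is closed, so whether the pin holds its state afterwards depends on the driver.

```c
#include <fcntl.h>
#include <linux/gpio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: drive one line of a GPIO character device to
 * `value` (0 or 1). Returns 0 on success, -1 on failure. */
int gpio_set_line(const char *chip_path, unsigned int offset, int value)
{
    struct gpiohandle_request req;
    struct gpiohandle_data data;
    int chip_fd, rc;

    chip_fd = open(chip_path, O_RDWR);
    if (chip_fd < 0)
        return -1;

    /* Ask the chip for a handle on a single output line. */
    memset(&req, 0, sizeof req);
    req.lineoffsets[0] = offset;
    req.flags = GPIOHANDLE_REQUEST_OUTPUT;
    req.default_values[0] = (value != 0);
    req.lines = 1;
    strncpy(req.consumer_label, "gpio-example", sizeof req.consumer_label - 1);

    rc = ioctl(chip_fd, GPIO_GET_LINEHANDLE_IOCTL, &req);
    close(chip_fd);
    if (rc < 0)
        return -1;

    /* Set the line value through the handle descriptor. */
    memset(&data, 0, sizeof data);
    data.values[0] = (value != 0);
    rc = ioctl(req.fd, GPIOHANDLE_SET_LINE_VALUES_IOCTL, &data);
    close(req.fd);
    return rc < 0 ? -1 : 0;
}
```

A call might look like gpio_set_line("/dev/gpiochip0", 17, 1), where both the device path and the offset 17 are placeholders for the actual hardware.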
ioctl() is a C library function, a thin wrapper around the system call of the same name, declared in <sys/ioctl.h>. See the Linux manual page.
Here's a copy of the upper portion:
NAME
ioctl - control device
SYNOPSIS
#include <sys/ioctl.h>
int ioctl(int fd, unsigned long request, ...);
DESCRIPTION
The ioctl() function manipulates the underlying device parameters of
special files. In particular, many operating characteristics of
character special files (e.g., terminals) may be controlled with
ioctl() requests. The argument fd must be an open file descriptor.
The second argument is a device-dependent request code. The third
argument is an untyped pointer to memory. It's traditionally char
*argp (from the days before void * was valid C), and will be so named
for this discussion.
An ioctl() request has encoded in it whether the argument is an in
parameter or out parameter, and the size of the argument argp in
bytes. Macros and defines used in specifying an ioctl() request are
located in the file <sys/ioctl.h>.
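To make the encoding concrete, the sketch below unpacks a request code with the _IOC_DIR, _IOC_TYPE, _IOC_NR, and _IOC_SIZE macros, which on Linux come from <linux/ioctl.h>. The helper names are invented for this example, and not every request follows this encoding (some older terminal requests are plain numbers).

```c
#include <linux/gpio.h>
#include <linux/ioctl.h>
#include <stdio.h>

/* Size in bytes of the argument structure encoded in the request. */
unsigned long request_size(unsigned long request)
{
    return (unsigned long)_IOC_SIZE(request);
}

/* Non-zero when the kernel writes data back to the caller's buffer. */
int request_is_read(unsigned long request)
{
    return (_IOC_DIR(request) & _IOC_READ) != 0;
}

/* Print all four fields packed into a request code. */
void describe_request(unsigned long request)
{
    printf("dir=%lu type=0x%lx nr=%lu size=%lu\n",
           (unsigned long)_IOC_DIR(request),
           (unsigned long)_IOC_TYPE(request),
           (unsigned long)_IOC_NR(request),
           (unsigned long)_IOC_SIZE(request));
}
```

For example, describe_request(GPIO_GET_CHIPINFO_IOCTL) reports a size equal to sizeof(struct gpiochip_info), since that request is declared with that structure as its argument type.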

socket descriptor vs file descriptor

read(2) and write(2) work on socket descriptors as well as on file descriptors. In the case of a file descriptor, the kernel goes from the user file descriptor table to the file table and finally to the inode table, where it checks the file type (regular file/char/block) and reads accordingly. In the case of a char special file, it gets the function pointers based on the major number of the file from the char device switch and calls the appropriate read/write routines registered for the device.
Similarly, the appropriate read/write routine is called for a block special file by getting the function pointers from the block device switch.
Could you please let me know what exactly happens when read/write is called on a socket descriptor? If read/write works on a socket descriptor, why can't we use open instead of socket to get the descriptor?
As far as I know, the in-memory file descriptor contains a flag that identifies the file-system type of the fd. The kernel invokes the corresponding handler function depending on the file-system type. You can see the source in read_write.c in the Linux kernel.
In brief, the kernel does the following:
In read_write.c, there is a file_system_wrapper function that calls the corresponding handler function depending on the fd's file type (ext2, ext3, socket, ...).
In socket.c, there is a socket_type_wrapper function that calls the corresponding socket handler function depending on the socket's type (ipv4, ipv6, atm, others).
In socket_ipv4.c, there is a protocol_type wrapper function that calls the corresponding protocol handler function depending on the protocol type (udp, tcp).
In tcp_ip4.c, there is tcp_sendmsg, and this function is called on a write to an fd of tcp ipv4 type.
Hope this is clear,
thanks,
Houcheng
Socket descriptors are associated with file structures too, but the set of file_operations functions for those structures differs from the usual one. Initialization and use of those descriptors are therefore different. The read and write parts of the kernel-level interface just happen to be exactly equivalent.
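The idea of one read entry point with a per-descriptor set of operations can be sketched as a table of function pointers. This is a toy model in user-space C; the struct and function names are invented and only loosely mirror the kernel's file_operations.

```c
#include <stddef.h>
#include <string.h>

struct model_file;

/* Per-type operations, loosely modeled on file_operations. */
struct model_file_ops {
    long (*read)(struct model_file *f, char *buf, size_t n);
};

struct model_file {
    const struct model_file_ops *ops;
    const char *data;
};

/* "Regular file": hands back as much data as fits in the buffer. */
static long regular_read(struct model_file *f, char *buf, size_t n)
{
    size_t len = strlen(f->data);

    if (len > n)
        len = n;
    memcpy(buf, f->data, len);
    return (long)len;
}

/* "Socket": pretends only one byte has arrived so far. */
static long socket_read(struct model_file *f, char *buf, size_t n)
{
    if (n == 0 || f->data[0] == '\0')
        return 0;
    buf[0] = f->data[0];
    return 1;
}

const struct model_file_ops regular_ops = { regular_read };
const struct model_file_ops socket_ops  = { socket_read };

/* The single entry point: dispatches through the descriptor's ops. */
long model_read(struct model_file *f, char *buf, size_t n)
{
    return f->ops->read(f, buf, n);
}
```

The caller of model_read() never needs to know which kind of object it holds, which is the property that lets read and write work on both files and sockets.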
read and write are valid for some types of sockets in some states; this all depends on the various structs which are passed around inside the kernel.
In principle, open() could create a socket descriptor, but the BSD sockets API was never defined that way.
There are some other (somewhat Linux-specific) types of file descriptor which are opened by system calls other than open(), for example epoll_create or timerfd_create. These work the same.
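A small sketch of read() and write() on descriptors that did not come from open(): the pair below is created with socketpair(), and plain write() and read() move the data. The function name echo_through_socket is made up for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send msg into one end of a connected socket pair and read it back
 * from the other end. Returns the number of bytes read, or -1. */
long echo_through_socket(const char *msg, char *out, size_t out_size)
{
    int sv[2];
    long n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], msg, strlen(msg)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = read(sv[1], out, out_size);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The same read() call would work unchanged on a descriptor from open(), which is the point made above about the shared interface.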
