What is the callee token, actual argc and descriptor (on the stack)? - spidermonkey

I often see the stack drawn like this (the exact form varies, but the essence is the same):
...
args
...
thisv
actual argc
callee token
descriptor
return address
What is the callee token, descriptor and actual argc?

Related

The difference between `entry_SYSCALL64_slow_path` and `entry_SYSCALL64_fast_path`

We know that a system call enters the function entry_SYSCALL_64 in entry_64.S. Reading the source code, I find two different paths after the registers are prepared: one is entry_SYSCALL64_slow_path and the other is entry_SYSCALL64_fast_path. What is the difference between the two?
Upon entry in entry_SYSCALL_64 Linux will:
Swap gs to get per-cpu parameters.
Set the stack pointer from those per-CPU parameters.
Disable the IRQs.
Create a partial pt_regs structure on the stack. This saves the caller context.
If the current task has _TIF_WORK_SYSCALL_ENTRY or _TIF_ALLWORK_MASK set, it goes to the slow path.
Enter the fast path otherwise.
_TIF_WORK_SYSCALL_ENTRY is defined here with a comment stating:
/*
* work to do in syscall_trace_enter(). Also includes TIF_NOHZ for
* enter_from_user_mode()
*/
_TIF_ALLWORK_MASK does not seem to be defined for x86; a definition for MIPS is here with a comment stating:
/* work to do on any return to u-space */
Fast path
Linux will:
Enable the IRQs.
Check if the syscall number is out of range (note the pt_regs struct was already created with ENOSYS for the value of rax).
Dispatch to the system call with an indirect jump.
Save the return value (rax) of the syscall into rax in the pt_regs on the stack.
Check again whether _TIF_ALLWORK_MASK is set for the current task; if it is, jump to the slow return path.
Restore the caller context and issue a sysret.
Slow return path
Save the registers not saved before in pt_regs (rbx, rbp, r12-r15).
Call syscall_return_slowpath, defined here.
Note that point 2 will end up calling trace_sys_exit.
Slow path
Save the registers not saved before in pt_regs (see above)
Call do_syscall_64, defined here.
Point 2 will call syscall_trace_enter.
So the slow vs fast path has to do with ptrace. I haven't dug into the code but I suppose the whole machinery is skipped if ptrace is not needed for the caller.
This is indeed an important optimization.
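The decision described above can be sketched in plain C. This is only an illustration, not kernel code: the flag bits, the two-entry syscall table, and every `demo_` name are hypothetical, and the real logic lives in assembly in entry_64.S. It shows the two points the answer makes: pending trace/audit work forces the slow path, and the saved rax starts out as -ENOSYS so an out-of-range syscall number simply returns that value.

```c
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. The real definitions
 * live in the kernel's thread_info.h and vary between versions. */
#define DEMO_TIF_SYSCALL_TRACE  (1UL << 0)
#define DEMO_TIF_SYSCALL_AUDIT  (1UL << 1)
#define DEMO_WORK_SYSCALL_ENTRY (DEMO_TIF_SYSCALL_TRACE | DEMO_TIF_SYSCALL_AUDIT)

#define DEMO_ENOSYS 38 /* value of ENOSYS on Linux */

enum demo_path { DEMO_FAST_PATH, DEMO_SLOW_PATH };

/* Entry decision: any pending tracing/audit work forces the slow path. */
enum demo_path demo_choose_path(unsigned long thread_flags)
{
    return (thread_flags & DEMO_WORK_SYSCALL_ENTRY) ? DEMO_SLOW_PATH
                                                    : DEMO_FAST_PATH;
}

typedef long (*demo_syscall_fn)(long arg);

static long demo_sys_double(long arg) { return 2 * arg; }
static long demo_sys_negate(long arg) { return -arg; }

/* Stand-in for the syscall table (two made-up entries). */
static const demo_syscall_fn demo_table[] = { demo_sys_double, demo_sys_negate };

/* Fast-path dispatch: the saved rax starts out as -ENOSYS and is only
 * overwritten when the syscall number is within range. */
long demo_fast_dispatch(unsigned long nr, long arg)
{
    long saved_rax = -DEMO_ENOSYS;

    if (nr < sizeof(demo_table) / sizeof(demo_table[0]))
        saved_rax = demo_table[nr](arg); /* indirect call through the table */
    return saved_rax;
}
```

Calling `demo_choose_path(0)` models an untraced task, while passing `DEMO_TIF_SYSCALL_TRACE` models a task being traced with ptrace.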

Is there a way to access and modify user data from system call in Minix 3? Can I use sys_datacopy() here? Why is my attempt not working?

I want to implement a syscall in the PM server in Minix that has access to some data in the user space, and can modify it.
I am passing data to the syscall using Minix's message passing mechanism. In the message structure that is being passed, I assign one of the pointers to the address of the variable from the user space that I want to pass.
For example, in the user program,
message m;
m.m1_p1 = &var; //data to be passed
//pass it to the syscall
In the kernel, in the syscall function, I do
char *ptr = m_in.m1_p1;
However, when I try to either read or write the data, I get an error that the kernel has panicked, and needs a reboot.
I realise that this is probably because user space uses virtual addresses specific to the user process, which are not valid inside the syscall.
On searching further, I found that Linux has functions copy_from_user() and copy_to_user() to achieve this.
Is there an equivalent of this in Minix? If not, is there any other way to achieve this?
With the help of @osgx's suggestion in the comments, I have tried using sys_datacopy(). While this allows me to read and write the data in the system call, the changes I make are not reflected back into the user program that called the system call.
My latest attempt is as follows:
In the user program,
message m;
m.m1_p1 = &var; //data to be passed
printf("%c\n",*(m.m1_p1)); //gives the value in var
//pass it to the syscall
printf("%c\n",var); //gives the old value of var
Inside the syscall,
char *ptr = (char*)malloc(sizeof(char));
sys_datacopy(who_e,(vir_bytes)(m_in.m1_p1),SELF,(vir_bytes)(ptr),sizeof(char*)); //or some other version of sys_vircopy()?
printf("Read value of ptr : %c\n",*ptr); //gives correct value
*ptr = //new value
printf("New value of ptr : %c\n",*ptr); //gives modified value
Now I can access the value of var inside the syscall using ptr, and modify it inside the syscall as well. However, after returning from the syscall, I observe that the underlying value of var has not changed.
As per my understanding, what should have happened is that sys_datacopy() should have copied to ptr an equivalent virtual address of m_in.m1_p1 that lies in the address space of the syscall and points to the same physical address. So, *ptr should reach exactly var, thus modifying it.
Or is it that the data corresponding to the address is copied when I use sys_datacopy()? If this is the case, one solution I can think of is defining a message structure that allows double pointers, and passing a char** to the syscall. Then, dereferencing once will ensure that the address is copied to ptr. But then again, dereferencing ptr will attempt to dereference a virtual address that belongs to the user process's address space, which will not work.
Why is this method not working? What is the correct way to achieve this?
I am using Minix 3.2.1.
Thank you.
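A portable sketch of the second reading in the question (the data is copied, not the mapping) may help. It is a simulation, not Minix code: two byte arrays stand in for the two address spaces, and `demo_copy` and `demo_handle_request` are hypothetical helpers, not the sys_datacopy() signature. Under that reading, writing through ptr only changes the server's private copy, and the caller sees the change only after a second copy in the opposite direction. In Minix terms that would presumably be a second sys_datacopy() call with source and destination swapped; that is an assumption, not something tested on 3.2.1. Note also that the attempt above copies sizeof(char*) bytes into a one-byte buffer, where sizeof(char) looks like the intended length.

```c
#include <string.h>

/* Two byte arrays stand in for two separate address spaces. */
struct demo_space { unsigned char mem[64]; };

/* Stand-in for a kernel-mediated copy: it moves bytes and never shares
 * memory. Hypothetical helper, not the Minix sys_datacopy() signature. */
int demo_copy(const struct demo_space *src, size_t src_off,
              struct demo_space *dst, size_t dst_off, size_t len)
{
    if (src_off > sizeof(src->mem) || len > sizeof(src->mem) - src_off)
        return -1;
    if (dst_off > sizeof(dst->mem) || len > sizeof(dst->mem) - dst_off)
        return -1;
    memmove(dst->mem + dst_off, src->mem + src_off, len);
    return 0;
}

/* Server-side handler: copy one byte in, modify the local copy, and
 * optionally copy it back out to the caller's address space. */
int demo_handle_request(struct demo_space *user, size_t user_off,
                        struct demo_space *server, int copy_back)
{
    if (demo_copy(user, user_off, server, 0, 1) != 0)
        return -1;
    server->mem[0] = 'Z'; /* changes the server's copy only */
    if (copy_back)
        return demo_copy(server, 0, user, user_off, 1);
    return 0;
}
```

With `copy_back` set to 0 the caller's byte keeps its old value, which matches the behaviour reported above; with `copy_back` set to 1 the caller sees the new value.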

trying to understand the sys_socketcall parameter

Can anyone explain what this line exactly does:
socketcall(7,255);
I know that the command opens a port on the system, but I don't understand the parameters.
The man page says:
int socketcall(int call, unsigned long *args);
DESCRIPTION
socketcall() is a common kernel entry point for the socket system calls. call determines which socket function to invoke. args points to a block containing the actual arguments, which are passed through to the appropriate call.
User programs should call the appropriate functions by their usual names. Only standard library implementors and kernel hackers need to know about socketcall().
OK, call 7 is sys_getpeername, but if I take a look at its man page:
int getpeername(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
getpeername() returns the address of the peer connected to the socket sockfd, in the buffer pointed to by addr. The addrlen argument should be initialized to indicate the amount of space pointed to by addr. On return it contains the actual size of the name returned (in bytes). The name is truncated if the buffer provided is too small.
The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.
I really don't get it. The function needs three parameters. How does the function get them? What does the 255 mean? Does anyone have an idea how this call opens a port?
Although Linux has a system call that is commonly called socketcall, the C library does not expose any C function with that name. Normally the standard wrapper functions such as socket() and getpeername() should be used, which will end up calling the system call, but if for some reason it is necessary to call the system call directly then that can be done with syscall(SYS_socketcall, call, args) or using assembly.
In this case the application or a library that it uses (other than the standard C library) has most likely defined its own function called socketcall(), that is unrelated to the system call. You should check that function or its documentation to see what it does.
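For comparison, here is a minimal sketch of what a real call through the socketcall system call looks like for getpeername. The `demo_` names are hypothetical; the value 7 is the call number mentioned in the question. The three arguments of getpeername() are packed into a block of unsigned long values, and a pointer to that block is the second argument. On architectures whose headers do not define SYS_socketcall (x86-64, for example), the sketch falls back to the ordinary wrapper.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DEMO_SYS_GETPEERNAME 7 /* socketcall number for getpeername */

/* Pack the three getpeername() arguments into the block that the
 * socketcall system call expects to find behind its args pointer. */
void demo_pack_getpeername_args(unsigned long args[3], int sockfd,
                                struct sockaddr *addr, socklen_t *addrlen)
{
    args[0] = (unsigned long)sockfd;
    args[1] = (unsigned long)addr;
    args[2] = (unsigned long)addrlen;
}

/* Call getpeername through socketcall where that system call exists,
 * and through the normal libc wrapper everywhere else. */
long demo_getpeername(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
{
#ifdef SYS_socketcall
    unsigned long args[3];

    demo_pack_getpeername_args(args, sockfd, addr, addrlen);
    return syscall(SYS_socketcall, DEMO_SYS_GETPEERNAME, args);
#else
    return getpeername(sockfd, addr, addrlen);
#endif
}
```

Seen this way, a second argument of 255 would be read as a pointer to the argument block, which is not a plausible address; that is consistent with the socketcall() in the question being a different, application-defined function.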

Decoding ptrace Registers

I'm wondering where in the contents/members of
`struct user_regs_struct ur`
which is filled in by a call to
ptrace(PTRACE_GETREGS, pid, 0, &ur); // get registers
I can extract the information about whether a traced child process is currently entering or exiting a syscall. After some testing, my conclusion so far is that the lower 8 bits of the struct member ax are equal to 0xda for exits. I'm unsure whether this is a correct general conclusion, however, so I'm asking whether this is documented anywhere.
In the strace sources I have found the bitmask define TCB_INSYSCALL which is probably what I'm looking for but I can't decode where the source of the information (some member of user_regs_struct?) is read.
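One observation that may explain the 0xda value, offered as a hedged guess rather than documented behaviour: 0xda is the low byte of -ENOSYS (ENOSYS is 38 on Linux), and the syscall entry notes earlier on this page mention that the saved rax in pt_regs is preset to ENOSYS before dispatch. The small sketch below only checks that arithmetic; the `demo_` names are hypothetical and it does not call ptrace. Such a test cannot be a reliable entry/exit indicator on its own, because a syscall may legitimately return -ENOSYS.

```c
#include <stdint.h>

#define DEMO_LINUX_ENOSYS 38 /* value of ENOSYS on Linux */

/* Low 8 bits of a register value, as examined in the question. */
unsigned demo_low_byte(uint64_t reg)
{
    return (unsigned)(reg & 0xffu);
}

/* True when a 64-bit register value equals -ENOSYS in two's complement. */
int demo_is_minus_enosys(uint64_t reg)
{
    return (int64_t)reg == -(int64_t)DEMO_LINUX_ENOSYS;
}
```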

Initial state of program registers and stack on Linux ARM

I'm currently playing with ARM assembly on Linux as a learning exercise. I'm using 'bare' assembly, i.e. no libcrt or libgcc. Can anybody point me to information about what state the stack pointer and other registers will be in at the start of the program, before the first instruction is executed? Obviously pc/r15 points at _start, and the rest appear to be initialised to 0, with two exceptions: sp/r13 points to an address far outside my program, and r1 points to a slightly higher address.
So to some solid questions:
What is the value in r1?
Is the value in sp a legitimate stack allocated by the kernel?
If not, what is the preferred method of allocating a stack; using brk or allocate a static .bss section?
Any pointers would be appreciated.
Since this is Linux, you can look at how it is implemented by the kernel.
The registers seem to be set by the call to start_thread at the end of load_elf_binary (if you are using a modern Linux system, it will almost always be using the ELF format). For ARM, the registers seem to be set as follows:
r0 = first word in the stack
r1 = second word in the stack
r2 = third word in the stack
sp = address of the stack
pc = binary entry point
cpsr = endianness, thumb mode, and address limit set as needed
Clearly you have a valid stack. I think the values of r0-r2 are junk, and you should instead read everything from the stack (you will see why I think this later). Now, let's look at what is on the stack. What you will read from the stack is filled by create_elf_tables.
One interesting thing to notice here is that this function is architecture-independent, so the same things (mostly) will be put on the stack on every ELF-based Linux architecture. The following is on the stack, in the order you would read it:
The number of parameters (this is argc in main()).
One pointer to a C string for each parameter, followed by a zero (this is the contents of argv in main(); argv would point to the first of these pointers).
One pointer to a C string for each environment variable, followed by a zero (this is the contents of the rarely-seen envp third parameter of main(); envp would point to the first of these pointers).
The "auxiliary vector", which is a sequence of pairs (a type followed by a value), terminated by a pair with a zero (AT_NULL) in the first element. This auxiliary vector has some interesting and useful information, which you can see (if you are using glibc) by running any dynamically-linked program with the LD_SHOW_AUXV environment variable set to 1 (for instance LD_SHOW_AUXV=1 /bin/true). This is also where things can vary a bit depending on the architecture.
Since this structure is the same for every architecture, you can look for instance at the drawing on page 54 of the SYSV 386 ABI to get a better idea of how things fit together (note, however, that the auxiliary vector type constants on that document are different from what Linux uses, so you should look at the Linux headers for them).
Now you can see why the contents of r0-r2 are garbage. The first word in the stack is argc, the second is a pointer to the program name (argv[0]), and the third probably was zero for you because you called the program with no arguments (it would be argv[1]). I guess they are set up this way for the older a.out binary format, which as you can see at create_aout_tables puts argc, argv, and envp in the stack (so they would end up in r0-r2 in the order expected for a call to main()).
Finally, why was r0 zero for you instead of one (argc should be one if you called the program with no arguments)? I am guessing something deep in the syscall machinery overwrote it with the return value of the system call (which would be zero since the exec succeeded). You can see in kernel_execve (which does not use the syscall machinery, since it is what the kernel calls when it wants to exec from kernel mode) that it deliberately overwrites r0 with the return value of do_execve.
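The stack layout described above can be walked with a few lines of C. This is a sketch under stated assumptions: the `demo_` names are hypothetical, every slot is assumed to be one machine word wide, and the function takes a pointer to the first word (the value sp has at _start) instead of reading a real stack, so it can be tried on an array laid out the same way.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_AT_NULL   0 /* terminates the auxiliary vector */
#define DEMO_AT_PAGESZ 6 /* AT_PAGESZ in Linux's <elf.h> */

struct demo_startup {
    uintptr_t argc;
    const uintptr_t *argv; /* argc pointers, then a zero word */
    const uintptr_t *envp; /* environment pointers, then a zero word */
    const uintptr_t *auxv; /* (type, value) pairs ending with AT_NULL */
};

/* Split the initial stack into argc, argv, envp and auxv. */
void demo_parse_stack(const uintptr_t *sp, struct demo_startup *out)
{
    const uintptr_t *p;

    out->argc = sp[0];
    out->argv = sp + 1;
    out->envp = out->argv + out->argc + 1; /* skip argv's zero terminator */
    for (p = out->envp; *p != 0; p++)
        ;                                  /* walk to envp's zero terminator */
    out->auxv = p + 1;
}

/* Return the value stored for one auxiliary vector type, or fallback. */
uintptr_t demo_auxv_lookup(const uintptr_t *auxv, uintptr_t type,
                           uintptr_t fallback)
{
    for (; auxv[0] != DEMO_AT_NULL; auxv += 2)
        if (auxv[0] == type)
            return auxv[1];
    return fallback;
}
```

In a real bare `_start`, the pointer passed to `demo_parse_stack` would be the initial value of sp.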
Here's what I use to get a Linux/ARM program started with my compiler:
/** The initial entry point.
 */
asm(
" .text\n"
" .globl _start\n"
" .align 2\n"
"_start:\n"
" sub lr, lr, lr\n" // Clear the link register.
" ldr r0, [sp]\n" // Get argc...
" add r1, sp, #4\n" // ... and argv ...
" add r2, r1, r0, LSL #2\n" // ... and compute &argv[argc] ...
" add r2, r2, #4\n" // ... then skip argv's NULL terminator to reach environ.
" bl _estart\n" // Let's go!
" b .\n" // Never gets here.
" .size _start, .-_start\n"
);
As you can see, I just get the argc, argv, and environ stuff from the stack at [sp].
A little clarification: the stack pointer points to a valid area in the process's memory. r0, r1, and r2 hold the first three parameters of the function being called. I populate them with argc, argv, and environ, respectively.
Here's the uClibc crt. It seems to suggest that all registers are undefined except r0 (which contains a function pointer to be registered with atexit()) and sp which contains a valid stack address.
So, the value you see in r1 is probably not something you can rely on.
Some data are placed on the stack for you.
I've never used ARM Linux, but I suggest you either look at the source for the libcrt and see what it does, or use gdb to step into an existing executable. You shouldn't need the source code; just step through the assembly code.
Everything you need to find out should happen within the very first code executed by any binary executable.
Hope this helps.
Tony

Resources