Linux equivalent of FreeBSD's cpu_set_syscall_retval() - linux

The title pretty much says it all. Looking for the Linux equivalent of cpu_set_syscall_retval() found in /usr/src/sys/amd64/amd64/vm_machdep.c. Not sure if there is even such a thing in Linux but I thought I'd ask anyway.
void
cpu_set_syscall_retval(struct thread *td, int error)
{
    switch (error) {
    case 0:
        td->td_frame->tf_rax = td->td_retval[0];
        td->td_frame->tf_rdx = td->td_retval[1];
        td->td_frame->tf_rflags &= ~PSL_C;
        break;
    case ERESTART:
        /*
         * Reconstruct pc, we know that 'syscall' is 2 bytes,
         * lcall $X,y is 7 bytes, int 0x80 is 2 bytes.
         * We saved this in tf_err.
         * %r10 (which was holding the value of %rcx) is restored
         * for the next iteration.
         * %r10 restore is only required for freebsd/amd64 processes,
         * but shall be innocent for any ia32 ABI.
         */
        td->td_frame->tf_rip -= td->td_frame->tf_err;
        td->td_frame->tf_r10 = td->td_frame->tf_rcx;
        break;
    case EJUSTRETURN:
        break;
    default:
        if (td->td_proc->p_sysent->sv_errsize) {
            if (error >= td->td_proc->p_sysent->sv_errsize)
                error = -1; /* XXX */
            else
                error = td->td_proc->p_sysent->sv_errtbl[error];
        }
        td->td_frame->tf_rax = error;
        td->td_frame->tf_rflags |= PSL_C;
        break;
    }
}

There's no way to do the equivalent in Linux. The return value of a system call is propagated as the return value of whatever functions are called internally to implement the system call, all the way back to user mode. The general convention is that a non-negative return value means success and a negative value indicates an error (with the errno being the negated return value: for example, a "-2" indicates an error with an errno value of 2 [ENOENT]).
You could look up the stored register values that will be popped on return to user-mode and replace one of them (what the BSD code here is doing), but the critical one that contains the return value will just be overwritten by the normal return-from-system-call path anyway, just prior to returning to user mode.
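To make the return convention concrete, here is a minimal user-space sketch of what a C library wrapper does with the raw value a Linux system call hands back. The helper name decode_syscall_result is hypothetical, and this is not glibc's actual code; the 4095 bound is the usual limit on errno values encoded in a return value.

```c
#include <errno.h>

/* Largest errno value that is encoded in a raw system call return value. */
#define MAX_ERRNO 4095L

/*
 * Hypothetical helper: translate a raw kernel return value into the
 * user-space convention (-1 plus errno on failure, the value itself
 * on success).
 */
long decode_syscall_result(long raw)
{
    if (raw < 0 && raw >= -MAX_ERRNO) {
        errno = (int)-raw;   /* e.g. raw == -2 becomes errno == ENOENT */
        return -1;
    }
    return raw;              /* success: pass the value through unchanged */
}
```

So a kernel function that returns -ENOENT is seen by the caller as a -1 return with errno set to ENOENT, while any non-negative value is passed through untouched.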

Related

mutex unlocking and request_module() behaviour

I've observed the following code pattern in the Linux kernel, for example net/sched/act_api.c or many other places as well :
rtnl_lock();
rtnetlink_rcv_msg(skb, ...);
replay:
ret = process_msg(skb);
...
/* try to obtain symbol which is in module. */
/* if fail, try to load the module, otherwise use the symbol */
a = get_symbol();
if (a == NULL) {
rtnl_unlock();
request_module();
rtnl_lock();
/* now verify that we can obtain symbols from requested module and return EAGAIN.*/
a = get_symbol();
module_put();
return -EAGAIN;
}
...
if (ret == -EAGAIN)
goto replay;
...
rtnl_unlock();
After request_module() has succeeded, the symbol we are interested in becomes available in kernel memory space, and we can use it. However, I don't understand why the code returns EAGAIN and re-reads the symbol. Why can't it just continue right after request_module()?
If you look at the current implementation in the Linux kernel, there is a comment right after the second lookup (the equivalent of get_symbol() in your code above; it is tc_lookup_action_n()) that explains exactly why:
rtnl_unlock();
request_module("act_%s", act_name);
rtnl_lock();
a_o = tc_lookup_action_n(act_name);
/* We dropped the RTNL semaphore in order to
* perform the module load. So, even if we
* succeeded in loading the module we have to
* tell the caller to replay the request. We
* indicate this using -EAGAIN.
*/
if (a_o != NULL) {
err = -EAGAIN;
goto err_mod;
}
Even though the module could be requested and loaded, the semaphore was dropped in order to load it, because loading is an operation that can sleep (and is not the "standard way" this function is executed). Anything that was checked before the lock was released may have changed in the meantime, so the function returns EAGAIN to signal that the request must be replayed.
EDIT for clarification:
If we look at the call sequence when a new action is added (which could cause a required module to be loaded) we have this sequence: tc_ctl_action() -> tcf_action_add() -> tcf_action_init() -> tcf_action_init_1().
Now if we follow the EAGAIN error back up to tc_ctl_action(), we see in the case RTM_NEWACTION: branch that when the return value is EAGAIN the call to tcf_action_add() is repeated.
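The replay logic can be sketched in plain user-space C. Everything here is a hypothetical stand-in: fake_lock()/fake_unlock() model rtnl_lock()/rtnl_unlock(), setting module_loaded models request_module(), and lookup_action() models tc_lookup_action_n(). It only illustrates the control flow, not the kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins for the kernel objects. */
static bool lock_held;        /* models the RTNL lock */
static bool module_loaded;    /* models whether the action module is present */
static int  attempts;         /* how many times the request was processed */

static void fake_lock(void)   { lock_held = true; }
static void fake_unlock(void) { lock_held = false; }

/* Returns a non-NULL token once the "module" is available. */
static const char *lookup_action(void)
{
    return module_loaded ? "act_example" : NULL;
}

/* One pass over the request; must be entered with the lock held. */
static int process_request(void)
{
    attempts++;
    if (lookup_action() == NULL) {
        fake_unlock();            /* loading may sleep, so drop the lock */
        module_loaded = true;     /* stands in for request_module() */
        fake_lock();
        /* State checked before the unlock may be stale: ask for a replay. */
        return lookup_action() ? -EAGAIN : -ENOENT;
    }
    return 0;
}

/* Caller side: take the lock once and replay while -EAGAIN comes back. */
int handle_request(void)
{
    int ret;

    fake_lock();
    do {
        ret = process_request();
    } while (ret == -EAGAIN);
    fake_unlock();
    return ret;
}

/* Number of passes made so far, for inspection by the caller. */
int request_attempts(void)
{
    return attempts;
}
```

The first pass loads the "module" and returns -EAGAIN; the second pass starts from the top with the lock held throughout, finds the symbol, and completes normally.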

How can I get the value of a registry key in c++ without an access violation?

Hey, I'm new to C++ and I am trying to find out if a specified registry key exists. I have to check multiple locations because the software may be running on a 64-bit machine, in which case the entry is under the WOW6432Node key instead of the usual position. When RegQueryValueExA is run (I'm using Visual C++ 6.0 on XP, so I can't use a newer function), it should return a Boolean of true if the key exists (I'll deal with getting the value of the key later). However, when run it generates access violation 0xC0000005. Any ideas what's gone wrong?
bool FindAndRemoveUninstall(string path){
bool result;
result = RegQueryValueExA(HKEY_LOCAL_MACHINE,
TEXT("SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\ABC"), NULL, NULL, NULL, (unsigned long *)MAX_PATH);
if (result= ERROR_SUCCESS){
cout <<" is a 32 bit program\n";
//path= Value in key
}
result = RegQueryValueEx(HKEY_LOCAL_MACHINE,
TEXT("SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\ABC"), NULL, NULL, NULL, (unsigned long *)MAX_PATH);
if (result= ERROR_SUCCESS){
cout << " is 64 bit program\n";
//path= Value in key
}
return true;
}
You have multiple problems.
The last parameter to RegQueryValueExA is documented as
lpcbData [in, out, optional]
A pointer to a variable that specifies the size of the buffer pointed to by the lpData parameter,
But you are not passing a pointer to a variable. You are passing (unsigned long *)MAX_PATH, which is a garbage pointer. When the operating system tries to store the result into the pointer, it takes an access violation. You need to pass a pointer to a variable, like the documentation says.
The next problem is that you are calling the A function (explicit ANSI) but using the TEXT macro (adaptive character set). Make up your mind which model you are using (ANSI or adaptive) and stick to it. Let's assume you want explicit ANSI.
The next problem is that you didn't specify an output buffer, so you don't actually retrieve the path.
Another problem is that the RegQueryValueExA function does not return a bool; it returns an error code.
Yet another problem is that your if test contains an assignment, so it does not actually test anything.
Another problem is that you didn't specify a way for the function to return the path to the caller. Let's assume you want the result to be returned in the path parameter.
Yet another problem is that you have the 32-bit and 64-bit cases reversed.
Also, you are using '\n' instead of std::endl.
The eighth problem is that your function returns true even if it didn't do anything.
And the ninth problem is that the function says FindAndRemove, and it finds, but doesn't remove.
bool FindUninstall(string& path){ // parameter passed by reference, fix function name
LONG result; // change variable type
char buffer[MAX_PATH]; // provide an output buffer
DWORD bufferSize = MAX_PATH; // and a variable to specify the buffer size / receive the data size
result = RegQueryValueExA(HKEY_LOCAL_MACHINE,
"SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\ABC", NULL, NULL, (LPBYTE)buffer, &bufferSize); // remove TEXT macro, pass the buffer and buffer size
if (result== ERROR_SUCCESS){ // fix comparison
cout <<" is a 64 bit program" << std::endl; // fix message
path = buffer;
return true; // stop once we have an answer
}
bufferSize = MAX_PATH; // reset for next query
result = RegQueryValueExA(HKEY_LOCAL_MACHINE,
"SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\ABC", NULL, NULL, (LPBYTE)buffer, &bufferSize); // remove TEXT macro, pass the buffer and buffer size
if (result== ERROR_SUCCESS){ // fix comparison
cout << " is 32 bit program" << std::endl; // fix message
path = buffer;
return true; // stop once we have an answer
}
return false; // nothing found
}
Since you are new to C++, I would recommend that you get some experience with C++ doing simpler projects before diving into more complicated things like this.

How to set errno in Linux device driver?

I am designing a Linux character device driver. I want to set errno when an error occurs in the ioctl() system call.
long my_own_ioctl(struct file *file, unsigned int req, unsigned long arg)
{
long ret = 0;
BOOL isErr = FALSE;
// some operation
// ...
if (isErr) {
// set errno
// ... <--- What should I do?
ret = -1;
}
return ret;
}
What should I do to achieve that? Thank you in advance!
Please allow me to explain my application with more detail.
My device is located in /dev/myCharDev. My user space application is like this:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define _COMMAND (1)
#define _ERROR_COMMAND_PARAMETER (-1)
int main()
{
    int fd = open("/dev/myCharDev", O_RDONLY);
    int errnoCopy;
    if (fd >= 0) { // open() returns -1 on failure, and 0 is a valid descriptor
        if (ioctl(fd, _COMMAND, _ERROR_COMMAND_PARAMETER) < 0) { // should cause error in ioctl()
            errnoCopy = errno;
            printf("Oops, error occurred: %s\n", strerror(errnoCopy)); // I want this "errno" printed correctly
        }
        close(fd);
    }
    return 0;
}
As I mentioned in the comments above, how should I set "errno" in my own device driver code and make it readable by the user space application?
Nice question!
Ok, you could think of errno as a global variable (to be honest, it is an extern int). errno.h provides plenty of predefined macros for error codes. You can have a look here. It is very likely that one of these error codes describes what you want to show. Pick the right one, set it as if it were a variable you defined, and (important!) exit immediately!
You may ask yourself, though, whether setting errno is the right approach to your problem. You can always define an (int *) and develop your own error codes and error handling mechanism. errno's purpose is to show and explain system errors. Do you consider your code part of the "system" (as far as I can see you are developing your own system call, so this might be the case)? If so, go on and use errno to explain your "system error".
Edit (on question update): OK, more info. As I said, errno is an extern int. The value errno is set to comes straight from the return value of the system call: the kernel returns a negative error code, and the C library negates it and stores it in errno, using the codes defined in errno.h. So you report an error simply by returning the negative error code you want from your system call (EBUSY is just an example; you can use any of the predefined error codes). Example:
return -EBUSY;
Hope it helps
Return the negative error number from the ioctl. The C library interprets this, gives a -1 return code, and sets errno to the positive error number. For instance, your original example returns -1, so it will set errno to 1 (EPERM).
As an aside your prototype for an ioctl function in the kernel looks wrong. Which kernel version are you using?
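The user-space half of this contract can be checked without writing a driver. The sketch below (the function name failing_ioctl_errno is made up for illustration) issues an ioctl on an invalid descriptor, so the kernel itself returns -EBADF; a driver handler that returns, say, -EINVAL would surface in exactly the same way, as a -1 return with errno set to EINVAL.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Issue an ioctl that is certain to fail (fd -1 is never valid) and report
 * the errno the C library derived from the kernel's negative return value.
 * Returns the saved errno, or 0 if the call unexpectedly succeeded.
 */
int failing_ioctl_errno(void)
{
    int saved;

    if (ioctl(-1, 0, 0) == -1) {
        saved = errno;   /* copy first: later library calls may overwrite it */
        fprintf(stderr, "ioctl failed: %s\n", strerror(saved));
        return saved;
    }
    return 0;
}
```

Note that errno is copied before any other library call is made, which is the same thing the errnoCopy variable in the question is for.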
if (isErr)
{
printk(KERN_ALERT "Error %d: your description\n", errno);
ret = errno;
}
where errno is the return value of some function.
Your device driver should always return a status for a request it received.
It is advocated you always use enumerated return codes as well as normal return
codes. Returning 0 = pass 1 or -1 = failed is vague and could be misleading.
Read section 3.1, Efficient error handling, reporting and recovery, for more information.

How does seccomp-bpf filter syscalls?

I'm investigating the implementation details of seccomp-bpf, the syscall filtering mechanism that was introduced in Linux 3.5.
I looked into the source code of kernel/seccomp.c from Linux 3.10 and want to ask some questions about it.
From seccomp.c, it seems that seccomp_run_filters() is called from __secure_computing() to test the syscall called by the current process.
But looking into seccomp_run_filters(), the syscall number that is passed as an argument is not used anywhere.
It seems that sk_run_filter() is the implementation of BPF filter machine, but sk_run_filter() is called from seccomp_run_filters() with the first argument (the buffer to run the filter on) NULL.
My question is: how can seccomp_run_filters() filter syscalls without using the argument?
The following is the source code of seccomp_run_filters():
/**
* seccomp_run_filters - evaluates all seccomp filters against @syscall
* @syscall: number of the current system call
*
* Returns valid seccomp BPF response codes.
*/
static u32 seccomp_run_filters(int syscall)
{
struct seccomp_filter *f;
u32 ret = SECCOMP_RET_ALLOW;
/* Ensure unexpected behavior doesn't result in failing open. */
if (WARN_ON(current->seccomp.filter == NULL))
return SECCOMP_RET_KILL;
/*
* All filters in the list are evaluated and the lowest BPF return
* value always takes priority (ignoring the DATA).
*/
for (f = current->seccomp.filter; f; f = f->prev) {
u32 cur_ret = sk_run_filter(NULL, f->insns);
if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
ret = cur_ret;
}
return ret;
}
When a user process enters the kernel, the register set is stored to a kernel variable.
The function sk_run_filter implements the interpreter for the filter language. The relevant instruction for seccomp filters is BPF_S_ANC_SECCOMP_LD_W. Each instruction has a constant k, and in this case it specifies the index of the word to be read.
#ifdef CONFIG_SECCOMP_FILTER
case BPF_S_ANC_SECCOMP_LD_W:
A = seccomp_bpf_load(fentry->k);
continue;
#endif
The function seccomp_bpf_load uses the current register set of the user thread to determine the system call information.
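Seen from user space, the "word to be read" is an offset into struct seccomp_data, and the filter reads it with an ordinary absolute load. As far as I can tell from the 3.10 sources, that load is rewritten to BPF_S_ANC_SECCOMP_LD_W when the filter is checked at attach time. The sketch below only builds such a filter and does not install it; the choice of getpid and the names example_filter and example_filter_insns are illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/*
 * A tiny filter: load the syscall number from struct seccomp_data, fail
 * getpid with EPERM, and allow everything else.
 */
static struct sock_filter example_filter[] = {
    /* A = seccomp_data.nr; k is the offset of the word to load */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* if (A == __NR_getpid) run the next instruction, otherwise skip it */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hand the instruction array and its length to a caller. */
const struct sock_filter *example_filter_insns(size_t *count)
{
    *count = sizeof(example_filter) / sizeof(example_filter[0]);
    return example_filter;
}
```

This is why no buffer has to be passed to sk_run_filter(): the data the filter loads from is not a packet but the syscall number, architecture and arguments of the current thread. A real filter would normally also check the arch field before trusting nr.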

Linux Terminal Problem with Non-Canonical Terminal I/O app

I have a small app written in C designed to run on Linux. Part of the app accepts user-input from the keyboard, and it uses non-canonical terminal mode so that it can respond to each keystroke.
The section of code that accepts input is a simple function which is called repeatedly in a loop:
char get_input()
{
char c = 0;
int res = read(input_terminal, &c, 1);
if (res == 0) return 0;
if (res == -1) { /* snip error handling */ }
return c;
}
This reads a single character from the terminal. If no input is received within a certain timeframe, (specified by the c_cc[VTIME] value in the termios struct), read() returns 0, and get_input() is called again.
This all works great, except I recently discovered that if you run this app in a terminal window, and then close the terminal window without terminating the app, the app does not exit but launches into a CPU intensive infinite loop, where read() continuously returns 0 without waiting.
So how can I have the app exit gracefully if it is run from a terminal window, and then the terminal window is closed? The problem is that read() never returns -1, so the error condition is indistinguishable from a normal case where read() returns 0. So the only solution I see is to put in a timer, and assume there is an error condition if read() returns 0 faster than the time specified in c_cc[VTIME]. But that solution seems hacky at best, and I was hoping there is some better way to handle this situation.
Any ideas or suggestions?
Are you catching signals and resetting things before your program exits? I think SIGHUP is the one you need to focus on. Possibly set a flag in the signal handler; if the flag is set when read() returns, clean up and exit.
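A minimal sketch of that idea, assuming the input loop checks the flag after every read(). The names got_hangup, install_hangup_handler and hangup_seen are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is safe to write in signal context. */
static volatile sig_atomic_t got_hangup;

static void on_hangup(int signo)
{
    (void)signo;
    got_hangup = 1;
}

/* Install the SIGHUP handler; returns 0 on success, -1 on failure. */
int install_hangup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_hangup;
    sigemptyset(&sa.sa_mask);
    /* No SA_RESTART: a blocked read() should return with EINTR. */
    return sigaction(SIGHUP, &sa, NULL);
}

/* The input loop would call this after every read() and exit if it is set. */
int hangup_seen(void)
{
    return got_hangup != 0;
}
```

The handler only sets a flag; the actual cleanup (restoring the terminal settings, for instance) happens in normal program context once the loop notices it.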
You should handle the timeout with select() rather than with terminal settings. If the terminal is configured without a timeout, read() will never return 0 except on EOF.
Select gives you the timeout, and read gives you the 0 on close.
rc = select(...);
if(rc > 0) {
char c = 0;
int res = read(input_terminal, &c, 1);
if (res == 0) {/* EOF detected, close your app ?*/}
if (res == -1) { /* snip error handling */ }
return c;
} else if (rc == 0) {
/* timeout */
return 0;
} else {
/* handle select error */
}
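Filled in, the fragment above might look like the following sketch. The enum and the function name read_key are made up for this example, and it assumes the descriptor is in non-canonical mode with no VTIME timeout, so that a 0 from read() can only mean EOF.

```c
#include <sys/select.h>
#include <unistd.h>

enum key_status { KEY_READ = 1, KEY_TIMEOUT = 0, KEY_EOF = -1, KEY_ERROR = -2 };

/*
 * Wait up to timeout_ms for one byte on fd and store it in *out.
 * Hypothetical helper; the caller decides what to do on KEY_EOF.
 */
enum key_status read_key(int fd, int timeout_ms, char *out)
{
    fd_set readable;
    struct timeval tv;
    int rc;
    ssize_t n;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readable, NULL, NULL, &tv);
    if (rc == 0)
        return KEY_TIMEOUT;       /* nothing arrived in time */
    if (rc < 0)
        return KEY_ERROR;         /* errno says why (possibly EINTR) */

    n = read(fd, out, 1);
    if (n == 1)
        return KEY_READ;
    return n == 0 ? KEY_EOF : KEY_ERROR;   /* 0 bytes after select means EOF */
}
```

With this split, a timeout and a closed terminal are no longer the same return value, so the caller can keep polling on KEY_TIMEOUT and shut down on KEY_EOF.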
Read should return 0 on EOF. I.e. it will read nothing successfully.
Your function will return 0 in that case!
What you should do is compare value returned from read with 1 and process exception.
I.e. you asked for one, but did you get one?
You will probably want to handle errno==EINTR if -1 is returned.
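char get_input()
{
    char c = 0;
    int res = read(input_terminal, &c, 1);
    switch (res) {
    case 1:
        return c;
    case 0:
        /* EOF: handle it here (for example, clean up and exit) */
        break;
    case -1:
        /* error: check errno; EINTR just means the read should be retried */
        break;
    }
    return 0; /* no character available */
}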
