ff_replay substructure in ff_effect empty - linux

I am developing a force feedback driver (Linux) for a so far unsupported gamepad.
Whenever an application in userspace requests a force feedback effect (e.g. rumbling), a function in my driver is called:
static int foo_ff_play(struct input_dev *dev, void *data, struct ff_effect *effect)
this is set by the following code inside my init function:
input_set_capability(dev, EV_FF, FF_RUMBLE);
input_ff_create_memless(dev, NULL, foo_ff_play);
I'm accessing the ff_effect struct (which is passed to my foo_ff_play function) like this:
static int foo_ff_play(struct input_dev *dev, void *data, struct ff_effect *effect)
{
	u16 length;

	length = effect->replay.length;
	printk(KERN_DEBUG "length: %u\n", length);
	return 0;
}
The problem is that the reported length (in ff_effect->replay) is always zero.
That's confusing, since I am running fftest on my device, and fftest definitely sets the length attribute: https://github.com/flosse/linuxconsole/blob/master/utils/fftest.c (line 308)
/* a strong rumbling effect */
effects[4].type = FF_RUMBLE;
effects[4].id = -1;
effects[4].u.rumble.strong_magnitude = 0x8000;
effects[4].u.rumble.weak_magnitude = 0;
effects[4].replay.length = 5000;
effects[4].replay.delay = 1000;
Does this have something to do with the "memlessness"? Why does the data in ff_replay read as zero when userspace clearly set it?
Thanks in advance.

Why is the replay struct empty?
Taking a look at https://elixir.free-electrons.com/linux/v4.4/source/drivers/input/ff-memless.c#L406 we find:
static void ml_play_effects(struct ml_device *ml)
{
	struct ff_effect effect;
	DECLARE_BITMAP(handled_bm, FF_MEMLESS_EFFECTS);

	memset(handled_bm, 0, sizeof(handled_bm));

	while (ml_get_combo_effect(ml, handled_bm, &effect))
		ml->play_effect(ml->dev, ml->private, &effect);

	ml_schedule_timer(ml);
}
ml_get_combo_effect sets up the effect by calling ml_combine_effects, but ml_combine_effects simply does not copy replay.length into the ff_effect struct that is passed to our foo_ff_play (at least not if the effect type is FF_RUMBLE): https://elixir.free-electrons.com/linux/v4.4/source/drivers/input/ff-memless.c#L286
That's why we cannot read the ff_replay data in our foo_ff_play function.
Okay, replay is empty - how can we determine how long we have to play the effect (e.g. FF_RUMBLE) then?
It looks like the replay structure is something we do not even need to care about. Yes, fftest sets the length and then uploads the effect to the driver, but if we take a look at ml_ff_upload (https://elixir.free-electrons.com/linux/v4.4/source/drivers/input/ff-memless.c#L481), we can see the following:
	if (test_bit(FF_EFFECT_STARTED, &state->flags)) {
		__clear_bit(FF_EFFECT_PLAYING, &state->flags);
		state->play_at = jiffies +
			msecs_to_jiffies(state->effect->replay.delay);
		state->stop_at = state->play_at +
			msecs_to_jiffies(state->effect->replay.length);
		state->adj_at = state->play_at;
		ml_schedule_timer(ml);
	}
That means the duration is already handled by the input subsystem: it starts the effect and also stops it as needed.
Furthermore, we can see at https://elixir.free-electrons.com/linux/v4.4/source/include/uapi/linux/input.h#L279 that
/*
* All duration values are expressed in ms. Values above 32767 ms (0x7fff)
* should not be used and have unspecified results.
*/
That means we simply have to make our effect play for at least 32767 ms, the longest duration userspace may request. Everything else (stopping the effect earlier) is up to the memless core's timer - which is not our part. :D
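In practice, our play callback can therefore ignore replay entirely: switch the motors on with the requested magnitudes, and switch them off again when the memless core calls us with zero magnitudes. A minimal sketch, assuming a hypothetical struct foo_device and a hypothetical foo_send_rumble() helper that transmits the motor command to the gamepad:

static int foo_ff_play(struct input_dev *dev, void *data,
		       struct ff_effect *effect)
{
	struct foo_device *foo = input_get_drvdata(dev);

	if (effect->type != FF_RUMBLE)
		return 0;

	/* The ff-memless core calls this function again with both
	 * magnitudes set to zero when the effect has to stop, so no
	 * duration handling is needed here. */
	return foo_send_rumble(foo,
			       effect->u.rumble.strong_magnitude,
			       effect->u.rumble.weak_magnitude);
}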

Related

Dynamic memory in a struct

So, my struct is like this:
struct player{
char name[20];
int time;
}s[50];
I don't know how many players I am going to add to the struct, and I also have to use dynamic memory for this. So how can I allocate, and later reallocate, more space when I add a player to my struct?
I am an inexperienced programmer, but I have been googling this for a long time and I also don't fully understand structs.
I assume you are programming in C/C++.
Your struct player has statically sized fields, so when you use malloc on it, you are asking for space for a 20-byte char array and for one integer.
My suggestion is to store in a variable (or #define a symbol) the initial number of structures you can accept. Then use malloc to allocate an array of that many structures.
You also have to think about a strategy for storing new players as they arrive. The simplest one is to keep an index variable holding the next free position and to add new players at that position.
A short example follows:
#include <stdlib.h>
#include <string.h>

#define INIT_CAP 50

struct player {
	char name[20];
	int time;
};

int main(void) {
	int i;
	struct player *players;

	players = (struct player *) malloc(INIT_CAP * sizeof(struct player));
	if (players == NULL)
		return 1;

	for (i = 0; i < INIT_CAP; i++) {
		strcpy(players[i].name, "peppe");
		players[i].time = i;
	}

	free(players);
	return 0;
}
At this point you should also think about reallocating memory if the number of players you get at runtime exceeds your initial capacity. You can use:
players = (struct player *) realloc(players, 2 * INIT_CAP * sizeof(struct player));
in order to double the initial capacity (check the return value: a failing realloc returns NULL and leaves the original block untouched).
At the end, always remember to free the requested memory.
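To make the growth pattern concrete, here is a small sketch of an append function that doubles the capacity when it runs out of room; add_player, count, and cap are names invented for this example:

/* Appends a player, doubling the array when full. Returns the
 * (possibly moved) array, or NULL if reallocation failed. */
struct player *add_player(struct player *players, int *count, int *cap,
			  const char *name, int time)
{
	if (*count == *cap) {
		struct player *tmp =
			realloc(players, 2 * (*cap) * sizeof(struct player));
		if (tmp == NULL)
			return NULL;	/* old array is still valid */
		players = tmp;
		*cap *= 2;
	}
	strncpy(players[*count].name, name, sizeof(players[*count].name) - 1);
	players[*count].name[sizeof(players[*count].name) - 1] = '\0';
	players[*count].time = time;
	(*count)++;
	return players;
}

Starting from int count = 0; and int cap = INIT_CAP;, a call would then look like players = add_player(players, &count, &cap, "peppe", 42);.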

Linux: How to mmap a sequence of physically contiguous areas into user space?

In my driver I have a certain number of physically contiguous DMA buffers (e.g. 4 MB each) to receive data from a device. They are handled by hardware using an SG list. As the received data will undergo intensive processing, I don't want to switch off caching, and I will use dma_sync_single_for_cpu after each buffer is filled by DMA.
To simplify data processing, I want those buffers to appear as a single huge, contiguous, circular buffer in the user space.
In case of a single buffer I simply use remap_pfn_range or dma_mmap_coherent. However, I can't use those functions multiple times to map consecutive buffers.
Of course, I can implement the fault operation in the vm_operations so that it finds the pfn of the corresponding page in the right buffer, and inserts it into the vma with vm_insert_pfn.
The acquisition will be really fast, so I can't handle mapping when the real data arrive. But this can be solved easily. To have all mapping ready before the data acquisition starts, I can simply read the whole mmapped buffer in my application before starting the acquisition, so that all pages are already inserted when the first data arrive.
The fault-based trick should work, but maybe there is something more elegant? Just a single function that may be called multiple times to build the whole mapping incrementally?
Additional difficulty is that the solution should be applicable (with minimal adjustments) to kernels starting from 2.6.32 to the newest one.
PS. I have seen that annoying post. Is there a danger that, if the application attempts to write something to the mmapped buffer (just doing in-place processing of the data), my carefully built mapping will be destroyed by COW?
Below is my solution that works for buffers allocated with dmam_alloc_noncoherent.
Allocation of the buffers:
[...]
for (i = 0; i < DMA_NOFBUFS; i++) {
	my_dev->buf_addr[i] = dmam_alloc_noncoherent(&my_dev->dev, DMA_BUFLEN,
						     &my_dev->buf_dma_t[i], GFP_USER);
	if (my_dev->buf_addr[i] == NULL) {
		res = -ENOMEM;
		goto err1;
	}
	/* Make the buffer ready for filling by the device */
	dma_sync_single_range_for_device(&my_dev->dev, my_dev->buf_dma_t[i],
					 0, DMA_BUFLEN, DMA_FROM_DEVICE);
}
[...]
Mapping of the buffers:
void swz_mmap_open(struct vm_area_struct *vma)
{
}
void swz_mmap_close(struct vm_area_struct *vma)
{
}
static int swz_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	long offset;
	char *buffer = NULL;
	int buf_num = 0;

	/* Calculate the offset (according to the info in
	 * https://lxr.missinglinkelectronics.com/linux+v2.6.32/drivers/gpu/drm/i915/i915_gem.c#L1195
	 * it is better not to use vmf->pgoff) */
	offset = (unsigned long)(vmf->virtual_address - vma->vm_start);
	buf_num = offset / DMA_BUFLEN;
	if (buf_num >= DMA_NOFBUFS) {	/* >= : buf_num is a zero-based index */
		printk(KERN_ERR "Access outside the buffer\n");
		/* a .fault handler must return a VM_FAULT_* code, not -errno */
		return VM_FAULT_SIGBUS;
	}
	offset = offset - buf_num * DMA_BUFLEN;
	buffer = my_dev->buf_addr[buf_num];
	vm_insert_pfn(vma, (unsigned long)(vmf->virtual_address),
		      virt_to_phys(&buffer[offset]) >> PAGE_SHIFT);
	return VM_FAULT_NOPAGE;
}
struct vm_operations_struct swz_mmap_vm_ops =
{
	.open = swz_mmap_open,
	.close = swz_mmap_close,
	.fault = swz_mmap_fault,
};

static int char_sgdma_wz_mmap(struct file *file, struct vm_area_struct *vma)
{
	vma->vm_ops = &swz_mmap_vm_ops;
	vma->vm_flags |= VM_IO | VM_RESERVED | VM_CAN_NONLINEAR | VM_PFNMAP;
	swz_mmap_open(vma);
	return 0;
}
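To pre-populate the mapping before acquisition starts (as described above), the application can simply touch one byte per page of the mmapped region. A hedged userspace sketch, where /dev/swz and BUF_TOTAL are assumptions standing in for the real device node and DMA_NOFBUFS * DMA_BUFLEN:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_TOTAL (4UL * 4 * 1024 * 1024)	/* hypothetical total size */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd = open("/dev/swz", O_RDWR);	/* hypothetical device node */
	volatile char *buf;
	unsigned long i;
	char sink = 0;

	if (fd < 0)
		return 1;
	buf = mmap(NULL, BUF_TOTAL, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		return 1;

	/* Touch one byte per page so every fault (and thus every
	 * vm_insert_pfn) happens now, before acquisition begins. */
	for (i = 0; i < BUF_TOTAL; i += page)
		sink ^= buf[i];

	(void)sink;
	/* ...start the acquisition here... */
	return 0;
}

On newer kernels, passing MAP_POPULATE to mmap may achieve the same pre-faulting without the manual loop.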

linux wake_up_interruptible() having no effect

I am writing a "sleepy" device driver for an Operating Systems class.
The way it works is that the user accesses the device via read()/write().
When the user writes to the device like this: write(fd, &wait, size), the device is put to sleep for wait seconds. If the wait time expires, the driver's write method returns 0 and the program finishes. But if the user reads from the driver while a process is sleeping on its wait queue, the driver's write method returns immediately with the number of seconds the sleeping process had left to wait before the timeout would have occurred on its own.
Another catch is that 10 instances of the device are created, and each of the 10 devices must be independent of the others. So a read on device 1 must only wake up processes sleeping on device 1.
Much code has been provided, and I have been charged with the task of mainly writing the read() and write() methods for the driver.
The way I have tried to keep the devices independent of each other is to include two global static arrays of size 10: one of type wait_queue_head_t, and one of type int (boolean flags). Both of these arrays are initialized once, when I open the device via open(). The problem is that when I call wake_up_interruptible(), nothing happens, and the program terminates upon timeout. Here is my write method:
ssize_t sleepy_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
	struct sleepy_dev *dev = (struct sleepy_dev *)filp->private_data;
	ssize_t retval = 0;
	int mem_to_be_copied = 0;

	if (mutex_lock_killable(&dev->sleepy_mutex))
	{
		return -EINTR;
	}
	// check size
	if (count != 4) // user must provide 4 byte int
	{
		return EINVAL; // = 22
	}
	// else if the user provided valid sized input...
	else
	{
		if ((mem_to_be_copied = copy_from_user(&long_buff[0], buf, count)))
		{
			return -EFAULT;
		}
		// check for negative wait time entered by user
		if (long_buff[0] > -1) // "long_buff[]" is global, for now only holds 1 value
		{
			proc_read_flags[MINOR(dev->cdev.dev)] = 0; // ****** flag array
			retval = wait_event_interruptible_timeout(wqs[MINOR(dev->cdev.dev)],
					proc_read_flags[MINOR(dev->cdev.dev)] == 1,
					long_buff[0] * HZ) / HZ;
			proc_read_flags[MINOR(dev->cdev.dev)] = 0;
			// MINOR numbers for each device correspond to array
			// indices (devices 0 - 9); "wqs" is the array of wait queues
		}
		else
		{
			printk(KERN_INFO "user entered negative value for sleep time\n");
		}
	}
	mutex_unlock(&dev->sleepy_mutex);
	return retval;
}
Unlike the many examples on this topic, I am switching the flag back to zero immediately before the call to wait_event_interruptible_timeout(), because flag values seem to linger between subsequent runs of the program. Here is the code for my read method:
ssize_t sleepy_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
	struct sleepy_dev *dev = (struct sleepy_dev *)filp->private_data;
	ssize_t retval = 0;

	if (mutex_lock_killable(&dev->sleepy_mutex))
		return -EINTR;
	// switch the flag (again, device minor numbers correspond to array indices)
	proc_read_flags[MINOR(dev->cdev.dev)] = 1;
	// TODO: this is not waking up the process in write!
	// wake up the queue
	wake_up_interruptible(&wqs[MINOR(dev->cdev.dev)]);
	mutex_unlock(&dev->sleepy_mutex);
	return retval;
}
The way I am trying to test the program is to have two main.c's, one for writing to the device and one for reading from it, and I just run the two a.outs in separate consoles in my Ubuntu installation in VirtualBox. Another thing: the way it is set up now, neither the writing nor the reading a.out returns until the timeout occurs. I'm not sure exactly what is going on here, so any help would be much appreciated. Thanks!
Your write method holds sleepy_mutex while it waits for the event. So the read method blocks in mutex_lock_killable(&dev->sleepy_mutex) until the writer unlocks the mutex, which happens only when the writer's timeout expires and its write method returns. That is the behaviour you observe.
Usually, wait_event* is executed outside of any critical section. That can be achieved by using the _lock-suffixed variants of those macros, or by simply wrapping the cond argument of the macro in a spinlock acquire/release pair:
int check_cond()
{
	int res;

	spin_lock(&lock);
	res = <cond>;
	spin_unlock(&lock);
	return res;
}
...
wait_event_interruptible(wq, check_cond());
Unfortunately, the wait_event family of macros cannot be used when the condition check must be protected by a mutex. In that case you can use the wait_woken() function with manual condition-checking code, or rewrite your code so that no mutex lock/unlock is needed around the condition check.
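A hedged sketch of that wait_woken() pattern (available in newer kernels; dev->wq, dev->lock, and dev->cond are hypothetical names):

DEFINE_WAIT_FUNC(wait, woken_wake_function);
long timeout = msecs_to_jiffies(5000);

add_wait_queue(&dev->wq, &wait);
for (;;) {
	int done;

	mutex_lock(&dev->lock);
	done = dev->cond;	/* condition checked under the mutex */
	mutex_unlock(&dev->lock);
	if (done)
		break;

	/* Sleeps until woken or until the timeout runs out; the mutex
	 * is NOT held while sleeping. */
	timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);
	if (!timeout || signal_pending(current))
		break;
}
remove_wait_queue(&dev->wq, &wait);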
To achieve "the reader wakes the writer if it is sleeping" functionality, you can adapt the code from this answer: https://stackoverflow.com/a/29765695/3440745.
Writer code:
// Declare a local variable at the beginning of the function
int cflag;
...
// Outside of any critical section (after mutex_unlock())
cflag = proc_read_flags[MINOR(dev->cdev.dev)];
wait_event_interruptible_timeout(wqs[MINOR(dev->cdev.dev)],
		proc_read_flags[MINOR(dev->cdev.dev)] != cflag,
		long_buff[0] * HZ);
Reader code:
// Holding the mutex protects this increment from concurrent increments.
proc_read_flags[MINOR(dev->cdev.dev)]++;
wake_up_interruptible_all(&wqs[MINOR(dev->cdev.dev)]);

issue with copy_from_user in kernel

I'm trying to use this function to copy a buffer from user space into one in the kernel.
Both buffers were allocated. I'm using a while loop in case not all the bytes were copied on the first try, but for some reason nothing is copied and the program is stuck in the while loop.
What can be the reasons for that?
void my_copy_from_user(const char *source_buff, char *dest_buff, int size_to_copy)
{
	int not_copied = size_to_copy;
	int left = size_to_copy;

	while (not_copied) {
		not_copied = copy_from_user(dest_buff, source_buff, left);
		dest_buff += (left - not_copied);
		source_buff += (left - not_copied);
		left = not_copied;
	}
}
It is possible that it is legitimately failing for reasons that you cannot recover from.
Please look at: http://lxr.free-electrons.com/source/arch/x86/lib/usercopy_32.c#L681
unsigned long _copy_from_user(void *to, const void __user *from, unsigned n)
{
	if (access_ok(VERIFY_READ, from, n))
		n = __copy_from_user(to, from, n);
	else
		memset(to, 0, n);
	return n;
}
This is the underlying implementation of copy_from_user for Linux on x86 processors. It first checks access_ok; if access is not allowed, it fails and immediately returns n (the number of bytes you requested to copy). In your loop that means not_copied never decreases, which causes the infinite loop.
Two points:
I do not think you should invoke copy_from_user in a loop like that. If it fails to copy in kernel mode, there is a reason for it. This is a different beast from read()ing from sockets, etc., where you are encouraged to call read() in a loop.
Are you sure that you are passing in the correct dest_buff to copy_from_user?
Tips:
Printk all the values and see what's happening. Is left being changed or not? It is likely not.
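For comparison, the usual idiom is a single call that treats any partial copy as an error. A minimal sketch, assuming dest_buff is an already-allocated kernel buffer of at least count bytes:

static ssize_t my_write(struct file *filp, const char __user *buf,
			size_t count, loff_t *f_pos)
{
	/* One copy_from_user() call; any nonzero return means some
	 * bytes could not be copied, so bail out instead of retrying. */
	if (copy_from_user(dest_buff, buf, count))
		return -EFAULT;
	return count;
}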

Why is accessing pthread keys' sequence number not synchronized in glibc's NPTL implementation?

Recently, when looking into how thread-local storage is implemented in glibc, I found the following code, which implements the API pthread_key_create():
int
__pthread_key_create (key, destr)
     pthread_key_t *key;
     void (*destr) (void *);
{
  /* Find a slot in __pthread_kyes which is unused.  */
  for (size_t cnt = 0; cnt < PTHREAD_KEYS_MAX; ++cnt)
    {
      uintptr_t seq = __pthread_keys[cnt].seq;

      if (KEY_UNUSED (seq) && KEY_USABLE (seq)
	  /* We found an unused slot.  Try to allocate it.  */
	  && ! atomic_compare_and_exchange_bool_acq (&__pthread_keys[cnt].seq,
						     seq + 1, seq))
	{
	  /* Remember the destructor.  */
	  __pthread_keys[cnt].destr = destr;

	  /* Return the key to the caller.  */
	  *key = cnt;

	  /* The call succeeded.  */
	  return 0;
	}
    }

  return EAGAIN;
}
__pthread_keys is a global array accessed by all threads. I don't understand why the read of its member seq is not synchronized, as in the following:
uintptr_t seq = __pthread_keys[cnt].seq;
although it is synchronized when it is modified later.
FYI, __pthread_keys is an array of type struct pthread_key_struct, which is defined as follows:
/* Thread-local data handling.  */
struct pthread_key_struct
{
  /* Sequence numbers.  Even numbers indicated vacant entries.  Note
     that zero is even.  We use uintptr_t to not require padding on
     32- and 64-bit machines.  On 64-bit machines it helps to avoid
     wrapping, too.  */
  uintptr_t seq;

  /* Destructor for the data.  */
  void (*destr) (void *);
};
Thanks in advance.
In this case, the loop avoids an expensive lock acquisition. The atomic compare-and-swap operation done later (atomic_compare_and_exchange_bool_acq) makes sure only one thread can successfully increment the sequence value and return the key to the caller. Other threads that read the same value in the first step will keep looping, since the CAS can succeed for only a single thread.
This works because the sequence value alternates between even (vacant) and odd (in use). Incrementing the value to odd prevents other threads from acquiring the slot.
A plain read typically costs fewer cycles than a CAS instruction, so it makes sense to peek at the value before attempting the CAS.
There are many wait-free and lock-free algorithms that take advantage of the CAS instruction to achieve low-overhead synchronization.
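The same peek-then-CAS pattern can be sketched with C11 atomics (a simplified illustration, not glibc's actual code):

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NKEYS 1024
static _Atomic uintptr_t seqs[NKEYS];	/* even = vacant, odd = in use */

/* Returns an allocated slot index, or -1 if all slots are taken. */
static int alloc_slot(void)
{
	for (size_t i = 0; i < NKEYS; i++) {
		/* Cheap peek first: skip the CAS if the slot looks taken. */
		uintptr_t seq = atomic_load_explicit(&seqs[i],
						     memory_order_relaxed);
		if ((seq & 1) == 0 &&
		    /* CAS to odd; only one thread can win this race. */
		    atomic_compare_exchange_strong_explicit(&seqs[i], &seq,
							    seq + 1,
							    memory_order_acquire,
							    memory_order_relaxed))
			return (int)i;
	}
	return -1;
}

alloc_slot() mirrors the structure of the glibc loop: the relaxed load is the cheap peek, and only a promising slot is attacked with the acquire-CAS.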
