I try to follow the path of parameters setting from linux user space (arecord/aplay) down to kernel driver. Let's take arecords --period-size as an example.
It all starts in set_params function in aplay.c:
if (period_time > 0)
err = snd_pcm_hw_params_set_period_time_near(handle, params, &period_time, 0);
else
err = snd_pcm_hw_params_set_period_size_near(handle, params, &period_frames, 0);
The function snd_pcm_hw_params_set_period_size_near() is defined in [pcm.c : 5186](alsa-lib https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm.c#L5186), and here my headache starts... This function starts a chain of calls to other functions which doesn't make much sense to me and doesn't seem to be leading to any end call of driver.
There is _end label so I skipped all calls like snd_pcm_hw_param_set_min() or snd_pcm_hw_param_set_max() and went to snd_pcm_hw_param_set_last() hoping for some driver invocation like:
drv->hw_params_set(...);
but instead I found an end call to:
MASK_INLINE unsigned int snd_mask_min(const snd_mask_t *mask)
{
int i;
assert(!snd_mask_empty(mask));
for (i = 0; i < MASK_SIZE; i++) {
if (mask->bits[i])
return ffs(mask->bits[i]) - 1 + (i << 5);
}
return 0;
}
where return values shall be the parameter set.
So to summarize, I found alsa-lib to be very difficult to read and understand. Maybe I am lacking some knowledge though. My question is simple, how is user space parameter passed to kernel driver. Can you provide a software path showing interfaces called?
Thanks.
The hw_params structure contains a configuration space, which is a description of all possible configurations that the device can support. Numeric parameters are described as intervals (i.e., min and max), access and format as bitmasks.
When you change one parameter, the library calls the kernel driver (SNDRV_PCM_IOCTL_HW_REFINE) to adjust all the other parameters in the hw_params structure that depend on the changed parameter.
After you have reduced the configuration space to the configuration you actually want, you call snd_pcm_hw_params() (→ SNDRV_PCM_IOCTL_HW_PARAMS) to actually configure the device for those parameters. (If some parameter has not been reduced to a single value, snd_pcm_hw_params() will choose a random one.)
snd_pcm_hw_params_set_xxx_near() is more complex because there is no SET_NEAR ioctl. This function tries to adjust the interval so that either its maximum or its minimum is the desired value, and then check whether the actual maximum or minimum is nearer.
For example, assume a device that supports period sizes of 1024, 2048, 4096, and 8192 frames. Initially, the interval is described as [1024, 8192]. When you call snd_pcm_hw_params_set_period_size_near(4000), the snd_pcm_hw_param_set_near() helper function calls set_min(4000) and set_max(4000) (on separate copies of the hw_params structure), so the intervals are [1024, 4000] and [4000, 8192]; after refining, the driver returns the intervals [1024, 2048] and [4096, 8192]. snd_pcm_hw_param_set_near() then sees that 4096 is nearest to the desired calue, so it then calls set_first on the second interval, which results in [4096, 4096].
Related
I have a kernel which uses a global uint array, and I want to access and change entries in that array from all threads using the atom_or function in OpenCL.
The code:
void SetMove(int c, uint m, volatile __global uint *prevmove)
{
uint idx = c >> 4;
uint mask = m << (2 * (c & 15));
atom_or(&prevmove[idx], mask);
}
I am implementing a BFS algorithm using OpenCL. This is done in 'waves', where in every wave, an array in of states is analyzed. Each GPU thread evaluates a single state, finds it's successors, and places them in a different array out. At the end of the wave the contents of out are places in in.
The prevmove array is to keep track of what action you should do in a certain state to find the goal state. The SetMove function updates prevmove when a thread finds a new state.
Now the problem is:
The results are always non-deterministic. If I would chance the atom_or operation with a normal, non atomic |= operator, the kernel behaves exactly the same. I have tested this by checking how many entries in prevmove are nonzero when the algorithm terminates. This varies everytime I run my program.
Is cause of my problem due to not implementing the atom_or correctly or is it something else?
(I have #pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable in my kernel).
I need to write a sequence of values (buffer, ~10bytes) via UART.
This sequence needs to start with a BREAK delimiter, and in my case I need to decrease the baud rate to a lower value.
Details about my environment:
Development board: BeagleBone Black.
Linux Kernel version: 3.8.13-bone70.
Serial driver used by the tty discipline: omap-serial.
What I finally get is something like this:
UARTIOHandler->setBaudRate(B9600);
unsigned char breakChar[] = { 0 };
UARTIOHandler->write(breakChar, 1);
UARTIOHandler->setBaudRate(B19200);
UARTIOHandler->write({1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
The write method is implemented this way:
int UARTIOHandler::write(const std::initializer_list<uchar8> &data) {
uchar8 buffer[data.size()];
int counter = 0;
for(auto i : data) {
buffer[counter++] = i;
}
auto output = ::write(this->fd_write, buffer, data.size());
this->flush();
return output;
}
And finally the flush() method:
void UARTIOHandler::flush() {
tcflush(this->fd_write, TCIOFLUSH);
}
The problem with this code is that the flushing doesn't always work, sometimes the distance between the BREAK and the first byte of data (observed on a scope) is ~500us (which is fine for my application), and sometimes is up to ~3ms.
EDIT: This is the actual behavior:
For the first five seconds everything works fine (the distance between the BREAK and the rest of the message doesn't exceed ~1ms), then, after five seconds there are some frames that exceed this inter-byte timing (for up to ~3ms).
There's always the code that I posted which is executed, so there's no possible way that I somehow forget flushing the buffers.
Why do this variations happen?
I have searched for relevant problems and found this, one workaround described there is to use a delay in front of the tcflush(...) function call. I can't use this method in my application because it will affect the functionality.
Another comment in that topic suggests that this was a bug in the Linux kernel, could this also be in my case?
I was writing programs to count the time of page faults in a linux system. More precisely, the time kernel execute the function __do_page_fault.
And somehow I wrote two global variables, named pfcount_at_beg and pfcount_at_end, which increase once when the function __do_page_fault is executed at different locations of the function.
To illustrate, the modified function goes as:
unsigned long pfcount_at_beg = 0;
unsigned long pfcount_at_end = 0;
static void __kprobes
__do_page_fault(...)
{
struct vm_area_sruct *vma;
... // VARIABLES DEFINITION
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
pfcount_at_beg++; // I add THIS
...
...
// ORIGINAL CODE OF THE FUNCTION
...
pfcount_at_end++; // I add THIS
}
I expected that the value of pfcount_at_end is smaller than the value of pfcount_at_beg.
Because, I think, every time kernel executes the instructions of code pfcount_at_end++, it must have executed pfcount_at_beg++(Every function starts at the very beginning of the code).
On the other hand, as there are many conditional return between these two lines of code.
However, the result turns out oppositely. The value of pfcount_at_end is larger than the value of pfcount_at_beg.
I use printk to print these kernel variables through a self-defined syscall. And I wrote the user level program to call the system call.
Here is my simple syscall and user-level program:
// syscall
asmlinkage int sys_mysyscall(void)
{
printk( KERN_INFO "total pf_at_beg%lu\ntotal pf_at_end%lu\n", pfcount_at_beg, pfcount_at_end)
return 0;
}
// user-level program
#include<linux/unistd.h>
#include<sys/syscall.h>
#define __NR_mysyscall 223
int main()
{
syscall(__NR_mysyscall);
return 0;
}
Is there anybody who knows what exactly happened during this?
Just now I modified the code, to make pfcount_at_beg and pfcount_at_end static. However the result did not change, i.e. the value of pfcount_at_end is larger than the value of pfcount_at_beg.
So possibly it might be caused by in-atomic operation of increment. Would it be better if I use read-write lock?
The ++ operator is not garanteed to be atomic, so your counters may suffer concurrent access and have incorrect values. You should protect your increment as a critical section, or use the atomic_t type defined in <asm/atomic.h>, and its related atomic_set() and atomic_add() functions (and a lot more).
Not directly connected to your issue, but using a specific syscall is overkill (but maybe it is an exercise). A lighter solution could be to use a /proc entry (also an interesting exercise).
I am implementing a virtual filesystem using the fuse, and need some understanding regarding the offset parameter in readdir.
Earlier we were ignoring the offset and passing 0 in the filler function, in which case the kernel should take care.
Our filesystem database, is storing: directory name, filelength, inode number and parent inode number.
How do i calculate get the offset?
Then is the offset of each components, equal to their size sorted in incremental form of their inode number? What happens is there is a directory inside a directory, is the offset in that case equal to the sum of the files inside?
Example: in case the dir listing is - a.txt b.txt c.txt
And inode number of a.txt=3, b.txt=5, c.txt=7
Offset of a.txt= directory offset
Offset of b.txt=dir offset + size of a.txt
Offset of c.txt=dir offset + size of b.txt
Is the above assumption correct?
P.S: Here are the callbacks of fuse
The selected answer is not correct
Despite the lack of upvotes on this answer, this is the correct answer. Cracking into the format of the void buffer should be discouraged, and that's the intent behind declaring such things void in C code - you shouldn't write code that assumes knowledge of the format of the data behind void pointers, use whatever API is provided properly instead.
The code below is very simple and straightforward, as it should be. No knowledge of the format of the Fuse buffer is required.
Fictitious API
This is a contrived example of what some device's API could look
like. This is not part of Fuse.
// get_some_file_names() -
// returns a struct with buffers holding the names of files.
// PARAMETERS
// * path - A path of some sort that the fictitious device groks.
// * offset - Where in the list of file names to start.
// RETURNS
// * A name_list, it has some char buffers holding the file names
// and a couple other auxiliary vars.
//
name_list *get_some_file_names(char *path, size_t offset);
Listing the files in parts
Here's a Fuse callback that can be registered with the Fuse system to
list the filenames provided by get_some_file_names(). It's arbitrarily named readdir_callback() so its purpose is obvious.
int readdir_callback( char *path,
void *buf, // This is meant to be "opaque".
fuse_fill_dir_t *filler, // filler takes care of buf.
off_t off, // Last value given to filler.
struct fuse_file_info *fi )
{
// Call the fictitious API to get a list of file names.
name_list *list = get_some_file_names(path, off);
for (int i = 0; i < list->length; i++)
{
// Feed the file names to filler() one at a time.
if (filler(buf, list->names[i], NULL, off + i + 1))
{
break; // filler() returned 1, requesting a break.
}
incr_num_files_listed(list);
}
if (all_files_listed(list))
{
return 1; // Tell Fuse we're done.
}
return 0;
}
The off (offset) value is not used by the filler function to fill its opaque buffer, buf. The off value is, however, meaningful to the callback as an offset base as it provides file names to filler(). Whatever value was last passed to filler() is what gets passed back to readdir_callback() on its next invocation. filler()
itself only cares whether the off value is 0 or not-0.
Indicating "I'm done listing!" to Fuse
To signal to the Fuse system that your readdir_callback() is done listing file names in parts (when the last of the list of names has been given to filler()), simply return 1 from it.
How off Is Used
The off, offset, parameter should be non-0 to perform the partial listings. That's its only requirement as far as filler() is concerned. If off is 0, that indicates to Fuse that you're going to do a full listing in one shot (see below).
Although filler() doesn't care what the off value is beyond it being non-0, the value can still be meaningfully used. The code above is using the index of the next item in its own file list as its value. Fuse will keep passing the last off value it received back to the read dir callback on each invocation until the listing is complete (when readdir_callback() returns 1).
Listing the files all at once
int readdir_callback( char *path,
void *buf,
fuse_fill_dir_t *filler,
off_t off,
struct fuse_file_info *fi )
{
name_list *list = get_all_file_names(path);
for (int i = 0; i < list->length; i++)
{
filler(buf, list->names[i], NULL, 0);
}
return 0;
}
Listing all the files in one shot, as above, is simpler - but not by much. Note that off is 0 for the full listing. One may wonder, 'why even bother with the first approach of reading the folder contents in parts?'
The in-parts strategy is useful where a set number of buffers for file names is allocated, and the number of files within folders may exceed this number. For instance, the implementation of name_list above may only have 8 allocated buffers (char names[8][256]). Also, buf may fill up and filler() start returning 1 if too many names are given at once. The first approach avoids this.
The offset passed to the filler function is the offset of the next item in the directory. You can have the entries in the directory in any order you want. If you don't want to return an entire directory at once, you need to use the offset to determine what gets asked for and stored. The order of items in the directory is up to you, and doesn't matter what order the names or inodes or anything else is.
Specifically, in the readdir call, you are passed an offset. You want to start calling the filler function with entries that will be at this callback or later. In the simplest case, the length of each entry is 24 bytes + strlen(name of entry), rounded up to the nearest multiple of 8 bytes. However, see the fuse source code at http://sourceforge.net/projects/fuse/ for when this might not be the case.
I have a simple example, where I have a loop (pseudo c-code) in my readdir function:
int my_readdir(const char *path, void *buf, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi)
{
(a bunch of prep work has been omitted)
struct stat st;
int off, nextoff=0, lenentry, i;
char namebuf[(long enough for any one name)];
for (i=0; i<NumDirectoryEntries; i++)
{
(fill st with the stat information, including inode, etc.)
(fill namebuf with the name of the directory entry)
lenentry = ((24+strlen(namebuf)+7)&~7);
off = nextoff; /* offset of this entry */
nextoff += lenentry;
/* Skip this entry if we weren't asked for it */
if (off<offset)
continue;
/* Add this to our response until we are asked to stop */
if (filler(buf, namebuf, &st, nextoff))
break;
}
/* All done because we were asked to stop or because we finished */
return 0;
}
I tested this within my own code (I had never used the offset before), and it works fine.
My PC has an AMD processor with an ATI 3200 GPU which doesn't support OpenCL. The rest of the codes all running by "Falling back to CPU itself".
I am converting one of the code from CUDA to OpenCL but stuck in some particular part for which there is no exact conversion code in OpenCL. since i have less experience in OpenCL I can't make out this, please suggest me some solution if any of you think will work,
The CUDA code is,
size_t pitch = 0;
cudaError error = cudaMallocPitch((void**)&gpu_data, (size_t*)&pitch,
instances->cols * sizeof(float), instances->rows);
for( int i = 0; i < instances->rows; i++ ){
error = cudaMemcpy((void*)(gpu_data + (pitch/sizeof(float))*i),
(void*)(instances->data + (instances->cols*i)),
instances->cols * sizeof(float) ,cudaMemcpyHostToDevice);
If I remove the pitch value from the above I end up with an problem which doesn't write to the device memory "gpu_data".
Somebody please convert this code to OpenCL and reply. I have converted it to OpenCL, but its not working and the data is not written to "gpu_data". My converted OpenCL code is
gpu_data = clCreateBuffer(context, CL_MEM_READ_WRITE, ((instances->cols)*(instances->rows))*sizeof(float), NULL, &ret);
for( int i = 0; i < instances->rows; i++ ){
ret = clEnqueueWriteBuffer(command_queue, gpu_data, CL_TRUE, 0, ((instances->cols)*(instances->rows))*sizeof(float),(void*)(instances->data + (instances->cols*i)) , 0, NULL, NULL);
Sometimes it runs well for this code and gets stuck in the reading part i.e.
ret = clEnqueueReadBuffer(command_queue, gpu_data, CL_TRUE, 0,sizeof( float ) * instances->cols* 1 , instances->data, 0, NULL, NULL);
overhere. And it gives error like
Unhandled exception at 0x10001098 in CL_kmeans.exe: 0xC000001D: Illegal Instruction.
when break is pressed , it gives:
No symbols are loaded for any call stack frame. The source code cannot be displayed.
while debugging. In the call stack it is displaying:
OCL8CA9.tmp.dll!10001098()
[Frames below may be incorrect and/or missing, no symbols loaded for OCL8CA9.tmp.dll]
amdocl.dll!5c39de16()
I really dont know what it means. someone please help me to rid of this problem.
First of all, in the CUDA code you're doing a horribly inefficient thing to copy the data. The CUDA runtime has the function cudaMemcpy2D that does exactly what you are trying to do by looping over different rows.
What cudaMallocPitch does is to compute an optimal pitch (= distance in byte between rows in a 2D array) such that each new row begins at an address that is optimal for coalescing, and then allocates a memory area as large as pitch times the number of rows you specify. You can emulate the same thing in OpenCL by first computing the optimal pitch and then doing the allocation of the correct size.
The optimal pitch is computed by (1) getting the base address alignment preference for your card (CL_DEVICE_MEM_BASE_ADDR_ALIGN property with clGetDeviceInfo: note that the returned value is in bits, so you have to divide by 8 to get it in bytes); let's call this base (2) find the largest multiple of base that is no less than your natural data pitch (sizeof(type) times number of columns); this will be your pitch.
You then allocate pitch times number of rows bytes, and pass the pitch information to kernels.
Also, when copying data from the host to the device and converesely, you want to use clEnqueue{Read,Write}BufferRect, that are specifically designed to copy 2D data (they are the counterparts to cudaMemcpy2D).