Linux DMA driver: dma_cap_set, dma_cap_zero

I'm writing a Linux device driver for a DMA controller, and while going through the source of DMA drivers in LXR I came across the functions dma_cap_zero and dma_cap_set and the whole family of dma_cap_* macros. What are these functions?
There is also an enum called dma_transaction_type:
enum dma_transaction_type {
    DMA_MEMCPY,
    DMA_XOR,
    DMA_PQ,
    DMA_XOR_VAL,
    DMA_PQ_VAL,
    DMA_MEMSET,
    DMA_INTERRUPT,
    DMA_SG,
    DMA_PRIVATE,
    DMA_ASYNC_TX,
    DMA_SLAVE,
    DMA_CYCLIC,
    DMA_INTERLEAVE,
    /* last transaction type for creation of the capabilities mask */
    DMA_TX_TYPE_END,
};
What do the enum values represent?

These functions are actually preprocessor macros, and are used by DMA client (slave) drivers to configure and request DMA channels.
Here is an example of them being used:
dma_cap_mask_t mask;
dma_cap_zero(mask);
dma_cap_set(DMA_MEMCPY,mask);
dma_chan1 = dma_request_channel(mask,0,NULL);
This code is from http://ecourse.wikidot.com/dmatest.
First, there's the datatype dma_cap_mask_t defined in dmaengine.h, ~line 233. It is a type of bitfield where the bits indicate what kind of transfers a DMA channel is capable of.
In the code snippet above, which occurs in the linked code's __init routine, the mask is declared to be of the special dma_cap_mask_t datatype. Then the dma_cap_zero() function is called and mask is passed to it.
I believe dma_cap_zero simply zeroes out the capability mask. It is defined in dmaengine.h, ~line 733. The function returns void and, I think, just clears the bitfield. I'm not entirely sure, though, because kernel code is a massive pile of macro magic that I have a hard time deciphering sometimes.
After the mask is zeroed, or initialized in some fashion by dma_cap_zero, the capabilities of the channel must be set. The dma_cap_set function accomplishes this: it takes the requested transaction type and sets the mask according to the capabilities required to perform that type of transaction. If you're confused about how the enumeration is being used, a quick review of C enums may help. In this case, the values in the enum describe different types of DMA transactions, each of which needs a different set of "capabilities" from the channel that will service it.
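For reference, these macros boil down to ordinary bitmap operations. The definitions below are paraphrased from include/linux/dmaengine.h (exact wording and line numbers vary between kernel versions), so treat them as a sketch rather than a verbatim quote:

typedef struct { DECLARE_BITMAP(bits, DMA_TX_TYPE_END); } dma_cap_mask_t;

#define dma_cap_zero(mask) __dma_cap_zero(&(mask))
static inline void __dma_cap_zero(dma_cap_mask_t *dstp)
{
    bitmap_zero(dstp->bits, DMA_TX_TYPE_END);   /* clear every capability bit */
}

#define dma_cap_set(tx, mask) __dma_cap_set((tx), &(mask))
static inline void __dma_cap_set(enum dma_transaction_type tx_type,
                                 dma_cap_mask_t *dstp)
{
    set_bit(tx_type, dstp->bits);               /* mark one transaction type as wanted */
}

In other words, each dma_transaction_type value is simply a bit position in the mask, and DMA_TX_TYPE_END (the last enumerator) gives the total number of bits.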
Once the mask is properly set for the type of DMA transactions you want to perform, you request the DMA channel.
The other dma_cap_* macros are used to perform other manipulations on the DMA mask without the caller really knowing what's going on behind the scenes. These kinds of macros are all over the place in the kernel code, for many more things than just DMA. They allow device drivers to get things done in the kernel without having to worry about how the kernel does it.

Related

Are sockaddr_in and sockaddr_in6 still using sin_len and sin6_len?

So, basically the title says it all. I've been porting my Unix socket C code to Windows, and apparently those structures do not have sin_len or sin6_len on Windows.
I'm using a union between sockaddr_storage, sockaddr_in and sockaddr_in6 everywhere, and just using the correct member according to ss_family. It would make sense that the socket library could just deduce the size according to the family, so the length field would indeed be redundant.
If I comment out the code that sets the length field, everything still works on OS X and Linux, but that may be just an illusion, so I decided to ask here.
Is that variable deprecated, somehow? Can I safely stop using it, and rely on the socket implementation to use the family variable?
The sin_len field is not required by the POSIX specification.
Here's the relevant information from Unix Network Programming - Vol 1. The Sockets Networking API, 3rd Edition:
3.2 Socket Address Structures
The length member, sin_len, was added with 4.3BSD-Reno, when support for the OSI protocols was added (Figure 1.15). Before this release, the first member was sin_family, which was historically an unsigned short. Not all vendors support a length field for socket address structures and the POSIX specification does not require this member.
Further, Stevens provides the motivation behind the field:
Having a length field simplifies the handling of variable-length socket address structures.
Even if the length field is present, we need never set it and need never examine it, unless we are dealing with routing sockets (Chapter 18). It is used within the kernel by the routines that deal with socket address structures from various protocol families (e.g., the routing table code).
The four socket functions that pass a socket address structure from the process to the kernel, bind, connect, sendto, and sendmsg, all go through the sockargs function in a Berkeley-derived implementation (p. 452 of TCPv2). This function copies the socket address structure from the process and explicitly sets its sin_len member to the size of the structure that was passed as an argument to these four functions. The five socket functions that pass a socket address structure from the kernel to the process, accept, recvfrom, recvmsg, getpeername, and getsockname, all set the sin_len member before returning to the process.
Unfortunately, there is normally no simple compile-time test to determine whether an implementation defines a length field for its socket address structures...We will see in Figure 3.4 that IPv6 implementations are required to define SIN6_LEN if the socket address structures have a length field. Some IPv4 implementations provide the length field of the socket address structure to the application based on a compile-time option (e.g., _SOCKADDR_LEN).
You'll want to evaluate the usage of sin_len in your code. If it's just initializing it to 0, you can remove the code. If you're reading the value from the result of accept, recvfrom, recvmsg, getpeername, or getsockname, you'll unfortunately need some platform-specific compile-time switches, or you can switch to using a separate address-length variable.
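For example, here is a hedged sketch of initializing an IPv6 socket address portably; it relies on the SIN6_LEN feature-test macro, which RFC 3493 requires on platforms whose sockaddr_in6 carries a length field:

#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void init_any_v6(struct sockaddr_in6 *sa, in_port_t port)
{
    memset(sa, 0, sizeof(*sa));
#ifdef SIN6_LEN
    sa->sin6_len = sizeof(*sa);   /* only exists on BSD-derived stacks (e.g. macOS) */
#endif
    sa->sin6_family = AF_INET6;
    sa->sin6_port   = htons(port);
    sa->sin6_addr   = in6addr_any;
}

Zeroing the whole structure first keeps any optional fields and padding in a defined state on every platform.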

What type of FEI should I use?

I have a device with one 40 MHz wideband input that can output one wideband stream and 8 narrowband DDC channels. I am not sure how to set up the FEI interface for this device. Should it be a CHANNELIZER or an RX_DIGITIZER_CHANNELIZER? I am leaning towards the latter due to the wideband output.
Regarding the allocation of such a device: for the RX_DIGITIZER_CHANNELIZER type, the manual is vague about the use of the CHANNELIZER portion in this case. Should I still allow CHANNELIZERs to be allocated, or just allocate the RX_DIGITIZER_CHANNELIZERs and DDCs? Should changes to the RX_DIGITIZER_CHANNELIZER determine when to drop the DDCs in this case? If CHANNELIZERs can still be allocated alongside an RX_DIGITIZER_CHANNELIZER, how does this work?
A CHANNELIZER device takes already digitized data and provides DDCs from that digital stream. An RX_DIGITIZER_CHANNELIZER has an analog input and performs the receiver, digitizer, and channelizer functions in one non-separable device. It sounds like you chose correctly in using the RX_DIGITIZER_CHANNELIZER for a device with an analog input. Typically, allocating the RX_DIGITIZER_CHANNELIZER takes care of allocating the channelizer, because all three parts are linked together; therefore the only additional allocations are usually DDCs.

Linux PCIe Driver: What to use for private data structure?

I'm creating my first PCIe driver for Linux and have a question regarding which structure to use for the pci_set_drvdata() function.
The PCIe hardware is built in house and we will be using DMA to send data to and from the device. It is not a sound card or any other subsystem which needs to be plugged into the kernel.
When I look at examples there seems to be a specific struct to fill in and then send to pci_set_drvdata().
What do I fill in for this case? Do I simply ignore this and pass in a blank structure? The lines I am referring to in any PCIe driver are:
struct structure_in_question *my_struct;
my_struct = kzalloc(sizeof(*my_struct), GFP_KERNEL);
This is usually found in the probe() function.
That function is used to associate private data with the device, data that cannot be supplied any other way. If there is no such data, then the function simply need not be used.
It is a convenient way, for example, to save a pointer to a dynamically allocated device context in the device probe callback, and then retrieve it with pci_get_drvdata in the device remove callback to do a proper cleanup of that context.
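A hedged sketch of that pattern, assuming a hypothetical struct my_dev as the driver's private context (all names here are made up for illustration):

#include <linux/pci.h>
#include <linux/slab.h>

struct my_dev {
    void __iomem *bar0;
    /* ... DMA buffers, locks, queues, etc. ... */
};

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct my_dev *priv;
    int ret;

    priv = kzalloc(sizeof(*priv), GFP_KERNEL);
    if (!priv)
        return -ENOMEM;

    ret = pci_enable_device(pdev);
    if (ret) {
        kfree(priv);
        return ret;
    }

    priv->bar0 = pci_iomap(pdev, 0, 0);
    pci_set_drvdata(pdev, priv);        /* stash the context for later callbacks */
    return 0;
}

static void my_remove(struct pci_dev *pdev)
{
    struct my_dev *priv = pci_get_drvdata(pdev);   /* retrieve it back */

    pci_iounmap(pdev, priv->bar0);
    pci_disable_device(pdev);
    kfree(priv);
}

Error handling for pci_iomap, region requests, and interrupt setup is omitted for brevity.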

Linux DMA API: specifying address increment behavior?

I am writing a driver for the Altera SoC Development Kit and need to support two modes of data transfer to/from an FPGA:
FIFO transfers: When writing to (or reading from) an FPGA FIFO, the destination (or source) address must not be incremented by the DMA controller.
non-FIFO transfers: These are normal (RAM-like) transfers where both the source and destination addresses require an increment for each word transferred.
The particular DMA controller I am using is the CoreLink DMA-330 DMA Controller, and its Linux driver is pl330.c (drivers/dma/pl330.c). This DMA controller provides a mechanism to switch between "fixed-address burst" and "incrementing-address burst" (these are synonymous with my "FIFO transfers" and "non-FIFO transfers"). The pl330 driver specifies which behavior it wants by setting the appropriate bits in the CCRn register:
#define CC_SRCINC (1 << 0)
#define CC_DSTINC (1 << 14)
My question: it is not at all clear to me how clients of the pl330 (my driver, for example) should specify the address-incrementing behavior.
The DMA engine client API says nothing about how to specify this, while the DMA engine provider API simply states:
Addresses pointing to RAM are typically incremented (or decremented) after each transfer. In case of a ring buffer, they may loop (DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO) are typically fixed.
without giving any detail as to how the address types are communicated to providers (in my case the pl330 driver).
Then, in the pl330_prep_slave_sg method, it does:
if (direction == DMA_MEM_TO_DEV) {
    desc->rqcfg.src_inc = 1;
    desc->rqcfg.dst_inc = 0;
    desc->req.rqtype = MEMTODEV;
    fill_px(&desc->px,
            addr, sg_dma_address(sg), sg_dma_len(sg));
} else {
    desc->rqcfg.src_inc = 0;
    desc->rqcfg.dst_inc = 1;
    desc->req.rqtype = DEVTOMEM;
    fill_px(&desc->px,
            sg_dma_address(sg), addr, sg_dma_len(sg));
}
where, later, desc->rqcfg.src_inc and desc->rqcfg.dst_inc are used by the driver to specify the address-increment behavior.
This implies the following:
Specifying direction = DMA_MEM_TO_DEV means the client wishes to push data from RAM into a FIFO, and presumably DMA_DEV_TO_MEM means the client wishes to pull data from a FIFO into RAM (the src_inc/dst_inc settings above match this: the RAM side increments while the device side stays fixed).
Scatter-gather DMA operations (for the pl330 at least) are restricted to cases where either the source or the destination end point is a FIFO. What if I wanted to do a scatter-gather operation from system RAM into FPGA (non-FIFO) memory?
Am I misunderstanding and/or overlooking something? Does the DMA engine already provide a (undocumented) mechanism to specify address-increment behavior?
Look at this:
pd->device_prep_dma_memcpy = pl330_prep_dma_memcpy;
pd->device_prep_dma_cyclic = pl330_prep_dma_cyclic;
pd->device_prep_slave_sg = pl330_prep_slave_sg;
This means you have different approaches available, as you have read in the documentation. RAM-like transfers could be done, I suspect, via device_prep_dma_memcpy().
It appears to me (after looking at various drivers in the kernel) that the only DMA transfer styles which let you (indirectly) control the auto-increment behavior are the ones that take an enum dma_transfer_direction in their corresponding device_prep_... function.
And this parameter is declared only for device_prep_slave_sg and device_prep_dma_cyclic, according to include/linux/dmaengine.h.
Another option would be to use struct dma_interleaved_template, which allows you to specify the increment behaviour directly. But support for this method is limited (only the i.MX DMA driver supports it in the 3.8 kernel, for example, and even that support appears limited).
So I think we are stuck with the device_prep_slave_sg case, with all its sg-related complexity, for some while.
That is what I am doing at the moment (although for accessing an EBI-connected device on an Atmel SAM9 SoC).
Another thing to consider is the device's bus width. The memcpy variant can perform transfers of different bus widths, depending on the source and target addresses and sizes, and this may not match the size of a FIFO element.
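To tie this together, here is a hedged sketch of the client side of a device_prep_slave_sg transfer using the generic dmaengine API. The FIFO address, burst size, and bus widths are hypothetical placeholders; the provider (pl330 here) is what turns the DMA_MEM_TO_DEV direction into fixed-destination / incrementing-source bursts:

#include <linux/dmaengine.h>
#include <linux/scatterlist.h>

#define FPGA_FIFO_PHYS 0xff200000    /* hypothetical FIFO register address */

static int start_fifo_tx(struct dma_chan *chan, struct scatterlist *sgl,
                         unsigned int sg_len)
{
    struct dma_slave_config cfg = {
        .direction      = DMA_MEM_TO_DEV,
        .dst_addr       = FPGA_FIFO_PHYS,              /* device side: not incremented */
        .dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
        .src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
        .dst_maxburst   = 16,
    };
    struct dma_async_tx_descriptor *desc;

    if (dmaengine_slave_config(chan, &cfg))
        return -EINVAL;

    /* The RAM side comes from the scatterlist and is incremented per element. */
    desc = dmaengine_prep_slave_sg(chan, sgl, sg_len, DMA_MEM_TO_DEV,
                                   DMA_PREP_INTERRUPT);
    if (!desc)
        return -ENOMEM;

    dmaengine_submit(desc);
    dma_async_issue_pending(chan);
    return 0;
}

For a RAM-to-RAM transfer (both sides incrementing) you would instead go through device_prep_dma_memcpy, as noted above.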

Designing a Linux char device driver so multiple processes can read

I notice that for serial devices, e.g. /dev/ttyUSB0, multiple processes can open the device but only one process gets the bytes (whichever reads them first).
However, for the Linux input API, e.g. /dev/input/event0, multiple processes can open the device, and all of the processes are able to read the input events.
My current goal:
I'd like to write a driver for several multi-position switches (e.g. like a slider switch with 3 or 4 possible positions), where apps can get a notification of any switch position changes. Ideally I'd like to use the Linux input API; however, it seems that the Linux input API has no support for the concept of multi-position switches. So I'm looking at making a custom driver with similar capabilities to the Linux input API.
Two questions:
From a driver design point-of-view, why is there that difference in behaviour between Linux input API and Linux serial devices? I reckon it could be useful for multiple processes to all be able to open one serial port and all listen to incoming bytes.
What is a good way to write a Linux character device driver so that it's like the Linux input API, so multiple processes can open the device and read all the data?
The distinction is partly historical and partly due to the different expectation models.
The event subsystem is designed for unidirectional notification of simple events from multiple writers into the system with very little (or no) configuration options.
The tty subsystem is intended for bidirectional end-to-end communication of potentially large amounts of data and provides a reasonably flexible (albeit fairly baroque) configuration mechanism.
Historically, the tty subsystem was the main mechanism of communicating with the system: you plugged your "teletype" into a serial port and bits went in and out. Different teletypes from different vendors used different protocols, and thus the termios interface was born. To make the system perform well in a multi-user context, buffering was added in the kernel (and made configurable). The expectation model of the tty subsystem is that of a point-to-point link between moderately intelligent endpoints that will agree on what the data passing between them will look like.
While there are circumstances where "single writer, multiple readers" would make sense in the tty subsystem (GPS receiver connected to a serial port, continually reporting its position, for instance), that's not the main purpose of the system. But you can easily accomplish this "multiple readers" in userspace.
The event system on the other hand, is basically an interrupt mechanism intended for things like mice and keyboards. Unlike teletypes, input devices are unidirectional and provide little or no control over the data they produce. There is also little point in buffering the data. Nobody is going to be interested in where the mouse moved ten minutes ago.
I hope that answers your first question.
For your second question: "it depends". What do you want to accomplish? And what is the "longevity" of the data? You also have to ask yourself whether it makes sense to put the complexity in the kernel or if it wouldn't be better to put it in userspace.
Getting data out to multiple readers isn't particularly difficult. You could create a receive buffer per reader and fill each of them as the data comes in. Things get a little more interesting if the data comes in faster than the readers can consume it, but even that is mostly a solved problem. Look at the network stack for inspiration!
If your device is simple and just produces events, maybe you just want to be an input driver?
Your second question is a lot more difficult to answer without knowing more about what you want to accomplish.
Update after you added your specific goal:
When I do position switches, I usually just create a character device and implement poll and read. If you want to be fancy and have a lot of switches, you could do mmap but I wouldn't bother.
Userspace just opens your /dev/foo, reads the current state, and starts polling. When your switches change state, you just wake up the readers and they'll read again. All your readers will wake up, they'll all read the new state, and everyone will be happy.
Be careful to only wake up readers when your switches are 'settled'. Many position switches are very noisy and they'll bounce around a fair bit.
In other words: I would ignore the input system altogether for this. As you surmise, position switches are not really "inputs".
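A hedged sketch of that poll/read scheme, assuming a hypothetical driver that keeps the current positions in sw_state and bumps a counter whenever the debounced state changes:

#include <linux/fs.h>
#include <linux/poll.h>
#include <linux/wait.h>
#include <linux/uaccess.h>

static DECLARE_WAIT_QUEUE_HEAD(sw_waitq);
static u32 sw_state;        /* current positions of all switches */
static u64 sw_event = 1;    /* bumped on every settled (debounced) change;
                             * starts at 1 so a fresh reader's first read
                             * returns the current state immediately */

static ssize_t sw_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *ppos)
{
    /* Each reader remembers the last event it saw in filp->private_data. */
    u64 seen = (u64)(uintptr_t)filp->private_data;

    if (wait_event_interruptible(sw_waitq, sw_event != seen))
        return -ERESTARTSYS;

    filp->private_data = (void *)(uintptr_t)sw_event;
    if (count < sizeof(sw_state))
        return -EINVAL;
    if (copy_to_user(buf, &sw_state, sizeof(sw_state)))
        return -EFAULT;
    return sizeof(sw_state);
}

static unsigned int sw_poll(struct file *filp, poll_table *wait)
{
    u64 seen = (u64)(uintptr_t)filp->private_data;

    poll_wait(filp, &sw_waitq, wait);
    return (sw_event != seen) ? (POLLIN | POLLRDNORM) : 0;
}

/* Called from the debounce timer or threaded IRQ once the switches settle. */
static void sw_state_changed(u32 new_state)
{
    sw_state = new_state;
    sw_event++;
    wake_up_interruptible(&sw_waitq);   /* wake every waiting reader */
}

Every open file carries its own "last seen" marker, so all readers observe every state change; locking is left out for brevity.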
How a character device handles these kinds of semantics is completely up to the driver to define and implement.
It would certainly be possible, for example, to implement a driver for a serial device that will deliver all read data to every process that has the character driver open. And it would also be possible to implement an input device driver that delivers events to only one process, whichever one is queued up to receive the latest event. It's all a matter of coding the appropriate implementation.
The difference comes down to a simple question: "what makes sense?" For a serial device, it's been decided that it makes more sense for any read data to be handled by a single process. For an input device, it's been decided that it makes more sense to deliver all input events to every process that has the input device open. It would be reasonable to expect, for example, that one process might only care about a particular input event, say pointer button #3 pressed, while another process wants to process all pointer motion events. So, in this situation, it makes more sense to distribute all input events to all concerned parties.
I am ignoring some side issues for simplicity, like what should happen, when serial data is being delivered to all reading processes, if one of them stops reading from the device. That's also something that would be factored in when deciding how to implement the semantics of a particular device.
What is a good way to write a Linux character device driver so that it's like the Linux input API, so multiple processes can open the device and read all the data?
See the .open member of struct file_operations for the char device. Whenever userspace opens the device, then the .open function is called. It can add the open file to a list of open files for the device (and then .release removes it).
The char device data struct should most likely use a kernel struct list_head to keep a list of open files:
struct my_dev_data {
    ...
    struct cdev cdev;
    struct list_head file_open_list;
    ...
};
Data for each file:
struct file_data {
    struct my_dev_data *dev_data;
    struct list_head file_open_list;
    ...
};
In the .open function, add the open file to dev_data->file_open_list (use a mutex to protect these operations as needed):
static int my_dev_input_open(struct inode *inode, struct file *filp)
{
    struct my_dev_data *dev_data;
    struct file_data *file_data;

    dev_data = container_of(inode->i_cdev, struct my_dev_data, cdev);
    ...
    /* Allocate memory for file data and channel data */
    file_data = devm_kzalloc(&dev_data->pdev->dev,
                             sizeof(struct file_data), GFP_KERNEL);
    ...
    /* Add open file data to list */
    INIT_LIST_HEAD(&file_data->file_open_list);
    list_add(&file_data->file_open_list, &dev_data->file_open_list);
    ...
    file_data->dev_data = dev_data;
    filp->private_data = file_data;
    return 0;
}
The .release function should remove the open file from dev_data->file_open_list, and release the memory of file_data.
Now that the .open and .release functions maintain the list of open files, it is possible for all open files to read data. Two possible strategies:
A separate read buffer for each open file. When data is received, it is copied into the buffers of all open files (a sketch of this approach follows after this list).
A single read buffer for the device, but a separate read pointer for each open file. Data can be freed from the buffer once it has been read through all open files.
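A hedged sketch of the first strategy, building on the hypothetical my_dev_data / file_data structures above. It assumes the .open function also does a kfifo_alloc for file_data->fifo and that the device struct gains a spinlock and wait queue; locking and overflow policy are kept minimal for brevity:

#include <linux/kfifo.h>
#include <linux/wait.h>
#include <linux/uaccess.h>

/* Assumed (hypothetical) additions to the structs above:
 *   struct my_dev_data { ...; spinlock_t lock; wait_queue_head_t waitq; };
 *   struct file_data   { ...; struct kfifo fifo; };
 */

/* Called from the device's receive path (process or softirq context):
 * copy the new data into every open file's private fifo, then wake readers. */
static void my_dev_push_data(struct my_dev_data *dev_data,
                             const void *data, unsigned int len)
{
    struct file_data *fd;

    spin_lock(&dev_data->lock);
    list_for_each_entry(fd, &dev_data->file_open_list, file_open_list)
        kfifo_in(&fd->fifo, data, len);     /* silently drops data if a fifo is full */
    spin_unlock(&dev_data->lock);
    wake_up_interruptible(&dev_data->waitq);
}

static ssize_t my_dev_read(struct file *filp, char __user *buf,
                           size_t count, loff_t *ppos)
{
    struct file_data *fd = filp->private_data;
    unsigned int copied;
    int ret;

    /* Block until this reader's own fifo has something in it. */
    ret = wait_event_interruptible(fd->dev_data->waitq,
                                   !kfifo_is_empty(&fd->fifo));
    if (ret)
        return ret;

    ret = kfifo_to_user(&fd->fifo, buf, count, &copied);
    return ret ? ret : copied;
}

Each reader drains only its own buffer, so a slow reader never steals data from a fast one; what to do when a fifo overflows (drop, overwrite, or throttle the producer) is a policy decision for the driver.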
Serial to input/event
You could try looking into the serial mouse driver source code.
This seems to be what you're searching for: building an input/event device from a ttySx.
Simpler: create a server instead of a driver.
Historically, the first character device I remember is /dev/lp0.
To be able to write to it from many sources, without overlap or other conflicts,
an LPR server was written.
To share a device, you have to:
open the device in exclusive (rw) mode,
create a socket (Unix or TCP) to listen on,
redirect requests from the socket's clients to the device and maybe
store the device status (by reading the device's answers), and
send the device status to the socket's clients when required.
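A hedged sketch of that idea in userspace: one process owns the device and broadcasts whatever it reads to every connected client. The device path and socket path are placeholders, and error/disconnect handling is omitted:

#include <sys/socket.h>
#include <sys/un.h>
#include <poll.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define MAX_CLIENTS 16

int main(void)
{
    int dev = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);   /* the one process that opens the device */
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int clients[MAX_CLIENTS];
    int nclients = 0;
    char buf[256];

    strncpy(addr.sun_path, "/tmp/devmux.sock", sizeof(addr.sun_path) - 1);
    unlink(addr.sun_path);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, MAX_CLIENTS);

    for (;;) {
        struct pollfd pfd[2] = {
            { .fd = dev, .events = POLLIN },
            { .fd = srv, .events = POLLIN },
        };
        poll(pfd, 2, -1);

        if ((pfd[1].revents & POLLIN) && nclients < MAX_CLIENTS)
            clients[nclients++] = accept(srv, NULL, NULL);   /* a new reader joins */

        if (pfd[0].revents & POLLIN) {
            ssize_t n = read(dev, buf, sizeof(buf));
            for (int i = 0; n > 0 && i < nclients; i++)
                write(clients[i], buf, (size_t)n);           /* broadcast to all readers */
        }
    }
}

The same loop can also forward client writes to the device and track device status, as described above; a TCP socket works just as well as the Unix-domain socket used here.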
