CUDA GPUDirect to NIC/hard drive? - linux

I am currently writing a CUDA application and am running into a few IO issues "feeding the beast."
I am wondering if there is any way that I can directly read data from a RAID controller or NIC and have that data sent directly to the GPU. What I'm trying to accomplish is shown directly on slide #3 of the following presentation: http://developer.download.nvidia.com/devzone/devcenter/cuda/docs/GPUDirect_Technology_Overview.pdf.
That being said, apparently this has been answered already here: Is it possible to access hard disk directly from gpu?. However, the presentation that I've attached leads me to believe that all I need is to set an environment variable in Linux (but it doesn't offer any useful code snippets/examples).
Therefore, I'm wondering if it is possible to read data directly from a NIC/RAID controller into the GPU and what would be required to do so? Would I need to write my own driver for the hardware? Are there any examples where certain copies are avoided?
Thanks in advance for the help.

GPUDirect is an umbrella term: in general it is a brand referring to technologies that enable direct data transfer to and/or from a GPU, bypassing unnecessary trips through host memory.
GPUDirect v1 is a technology that works with specific infiniband adapters, and enables the sharing of a data buffer between the GPU driver and the IB driver. This technology has mostly been superseded by GPUDirect RDMA (v3). This v1 technology does not enable general usage with any NIC. The environment variable reference:
however the presentation that I've attached leads me to believe all I need is to set an environment variable in Linux
refers to enabling GPUDirect v1. It is not a general purpose NIC enabler.
GPUDirect v2 is also called GPUDirect Peer-to-Peer, and it is for transfer of data between two CUDA GPUs on the same PCIE fabric only. It does not enable interoperability with any other kind of device.
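For illustration, here is a minimal sketch of what v2 (peer-to-peer) usage looks like through the CUDA runtime API, assuming two peer-capable GPUs (device 0 and device 1) on the same PCIE fabric; error checking of the CUDA calls is omitted for brevity:

    /* GPUDirect Peer-to-Peer (v2) sketch: copy a buffer directly from
     * GPU 0 to GPU 1 over PCIE, with no staging in host memory. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int can_access = 0;
        size_t bytes = 1 << 20;
        void *buf0, *buf1;

        cudaDeviceCanAccessPeer(&can_access, 0, 1);
        if (!can_access) {
            fprintf(stderr, "GPUs 0 and 1 cannot do peer-to-peer\n");
            return 1;
        }

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);   /* let device 0 reach device 1 */
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);

        /* Direct device-to-device copy; no trip through host memory. */
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }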
GPUDirect v3 is also called GPUDirect RDMA.
Therefore, I'm wondering if it is possible to read data directly from a NIC/RAID controller into the GPU and what would be required to do so?
Today, the canonical use case for GPUDirect RDMA is with a Mellanox Infiniband (IB) adapter. (It can also be made to work, perhaps with assistance from Mellanox, using a Mellanox Ethernet Adapter and RoCE). If this fits your definition of "NIC", then it's possible by loading a proper software stack, assuming you have appropriate hardware. The GPU and the IB device need to be on the same PCIE fabric, which means they need to be attached to the same PCIE root complex (effectively, connected to the same CPU socket). When used with a Mellanox IB adapter, typical usage would involve a GPUDirect RDMA-aware MPI.
If you have your own unspecified NIC or RAID controller, and you don't already have a GPUDirect RDMA linux device driver for it, then it's not possible to use GPUDirect. (If there is a GPUDirect RDMA driver for it, contact the manufacturer or driver provider for assistance.) If you have access to the driver source code, and are familiar with writing your own linux device drivers, you could try crafting your own GPUDirect driver. The steps involved are beyond the scope of my answer, but the starting point is documented here.
Would I need to write my own driver for the hardware?
Yes, if you don't already have a GPUDirect RDMA driver for it, one would need to be written.
Are there any examples where certain copies are avoided?
The GPUDirect RDMA MPI link gives examples and explains how GPUDirect RDMA can avoid unnecessary device<->host data copies during the transfer of data from GPU to IB adapter. In general, data can be transferred directly (over PCIE) from memory on the GPU device to memory on the IB device (or vice-versa) with no trip through host memory (GPUDirect v1 did not achieve this.)
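For illustration, the usage pattern with a CUDA-aware, GPUDirect RDMA-enabled MPI looks roughly like the sketch below: a device pointer is handed straight to MPI_Send/MPI_Recv, and the MPI stack (together with the IB driver) moves the data between GPU memory and the IB adapter without staging it in host memory. This assumes an MPI build with CUDA support (for example Open MPI configured with CUDA); error checking is omitted.

    /* Sketch: sending a GPU-resident buffer directly with a CUDA-aware MPI. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank;
        int count = 1 << 20;                /* 1 MiB payload */
        void *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc(&d_buf, count);          /* buffer lives in GPU memory */

        if (rank == 0)
            MPI_Send(d_buf, count, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, count, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }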
UPDATE: NVIDIA has recently announced a new GPUDirect technology called GPUDirect Storage.

Related

Linux Kernel device driver needs access to shared object in userspace

I am trying to write a network device driver for Linux. The device that I have has an API available that allows me to access all of the features I need through a shared object that exists in userspace.
I want to write a network driver such that I can make the device show up as a CAN interface. However, in order to interact with the device I need to use a specific shared object that exists in userspace.
The reason that I need a network device driver is to expose a CAN Interface that can be interacted with via the SocketCAN utilities.
Is there a way that I can write a network device driver in userspace? Or what would be the best way for me to architect a solution?
TL;DR
Need to write a device driver for a device which can only be interacted with from userspace via a supplied shared object which exposes the API. I need the device to show up as a network interface in order to utilize the SocketCAN utilities and other applications that communicate with CAN interfaces in Linux.
What are my options here? What can I do?
Thanks!
So you are saying that there is no driver for your network device in the kernel at all, and it can only be accessed via some user-space library? In that case, the shared library you mentioned is probably communicating with your network device by memory-mapping the /dev/mem file in order to read/write hardware registers, or perhaps by using UIO.
So your driver should also be developed in user-space then... The actual question you should ask, then, is: how do you use the kernel CAN API from user-space? And is it possible at all in the first place? For answers I guess you should look at Documentation/networking/can.txt. And if the answer is "no" (meaning you can't expose a CAN interface from user-space), then you should also develop a kernel driver which would interact with your user-space part, exposing the CAN interface.
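For reference, the sketch below shows what a SocketCAN consumer does once your device shows up as a CAN interface (the interface name can0 is assumed here): open a raw CAN socket, bind it to the interface, and send one frame. This is the user-space API your driver ultimately has to serve.

    /* Minimal SocketCAN sender, assuming an interface named "can0". */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <net/if.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <linux/can.h>
    #include <linux/can/raw.h>

    int main(void)
    {
        struct sockaddr_can addr = { 0 };
        struct ifreq ifr;
        struct can_frame frame = { 0 };

        int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
        if (s < 0) { perror("socket"); return 1; }

        strcpy(ifr.ifr_name, "can0");
        ioctl(s, SIOCGIFINDEX, &ifr);        /* resolve interface index */

        addr.can_family = AF_CAN;
        addr.can_ifindex = ifr.ifr_ifindex;
        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind"); return 1;
        }

        frame.can_id = 0x123;                /* arbitrary CAN identifier */
        frame.can_dlc = 2;
        frame.data[0] = 0xDE;
        frame.data[1] = 0xAD;
        if (write(s, &frame, sizeof(frame)) != sizeof(frame))
            perror("write");

        close(s);
        return 0;
    }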
In an ideal world, the whole driver architecture would look like this:
But you need to use some (proprietary, if I understand correctly) shared library API to interact with your device. So I propose you use the following driver architecture instead, depicted in the image below:
blue color stands for parts that need to be developed
magenta is for already existing code
In a nutshell, your app and driver together form a shim between the SocketCAN API and the shared library API.
So you need to develop 2 components:
Driver (on kernel side). It's in charge of:
talking to SocketCAN utilities
talking to your user-space application
Application (in user-space); it should probably be a daemon, as it's going to be running constantly. It's in charge of:
talking to shared library
talking to your driver
The last remaining question is which kernel API to use for the interaction between your kernel-space driver and your user-space application (marked as IPC in the picture). It strictly depends on what kind of data you are going to send between the two, how much data you want to send, and which way of sending is most appropriate for your task. It may also depend on your shared library API: you probably don't want to spend much CPU time converting message formats (as you already have triple context switching with this driver architecture, which is not great for performance). So it should probably be something packet-oriented, like Netlink.
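To give an idea of the user-space side of such a Netlink link, here is a minimal sketch using a raw Netlink socket. NETLINK_USERSOCK is a stock protocol number used here only as a placeholder; a real driver would more likely register its own protocol (or a Generic Netlink family), and the kernel counterpart (netlink_kernel_create() plus an input callback) is not shown.

    /* User-space daemon side of the IPC, sketched with a raw Netlink socket. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>

    #define MAX_PAYLOAD 64

    int main(void)
    {
        struct sockaddr_nl src = { .nl_family = AF_NETLINK, .nl_pid = getpid() };
        struct sockaddr_nl dst = { .nl_family = AF_NETLINK, .nl_pid = 0 }; /* 0 = kernel */
        struct nlmsghdr *nlh;
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);
        if (fd < 0) { perror("socket"); return 1; }
        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
            perror("bind"); return 1;
        }

        nlh = calloc(1, NLMSG_SPACE(MAX_PAYLOAD));
        nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
        nlh->nlmsg_pid = getpid();
        strcpy(NLMSG_DATA(nlh), "CAN frame payload goes here");

        /* One packet-oriented message to the kernel-side driver. */
        if (sendto(fd, nlh, nlh->nlmsg_len, 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0)
            perror("sendto");

        free(nlh);
        close(fd);
        return 0;
    }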
The following reading can be useful to figure out which IPC to use:
Kernel Space - User Space Interfaces
Linux kernel interfaces

Does Linux X-Server directly access GPU memory?

My main question: is there any piece of code running in the X-Server process (excluding drivers, which we all know can be written in different ways) that directly accesses memory on the GPU card?
Or does it employ drivers, DRM, or some other interface to communicate with the GPU and queue draw/render/clear/... commands?
I know the question seems lame, but I am interested in the specifics.
EDIT:
More specifically: to my understanding, the kernel communicates with hardware with the assistance of drivers and exposes an API to the rest (if I am wrong, please correct me).
In this context, can the X-Server circumvent the DMA API (I am only guessing that DMA I/O is responsible for communication with peripherals) located in the kernel, and communicate and exchange data with the GPU card directly, without anyone's assistance == without the kernel, drivers, ...?
And what would be the bare minimum requirement for the X-Server to communicate with the GPU? I am aiming to understand how this communication is done at a low level.
It is entirely possible that on Linux a given X server accesses part of the video card memory directly as a framebuffer. It's not the most efficient way of displaying things, but it works.
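For a concrete picture of what "accessing video memory directly as a framebuffer" means, here is a sketch using the Linux fbdev interface (/dev/fb0), which is roughly what a simple unaccelerated X framebuffer driver does: the video memory is mmap()ed and pixels are written to it directly. A 32 bits-per-pixel mode is assumed and error handling is minimal.

    /* Map /dev/fb0 and paint a single white pixel at (100, 100). */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/fb.h>

    int main(void)
    {
        struct fb_var_screeninfo vinfo;
        struct fb_fix_screeninfo finfo;
        int fd = open("/dev/fb0", O_RDWR);
        if (fd < 0) { perror("open /dev/fb0"); return 1; }

        ioctl(fd, FBIOGET_VSCREENINFO, &vinfo);
        ioctl(fd, FBIOGET_FSCREENINFO, &finfo);

        size_t len = (size_t)finfo.line_length * vinfo.yres;
        uint8_t *fb = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (fb == MAP_FAILED) { perror("mmap"); return 1; }

        /* 32 bpp assumed: 4 bytes per pixel, rows are line_length bytes apart. */
        uint32_t *pixel = (uint32_t *)(fb + 100 * finfo.line_length + 100 * 4);
        *pixel = 0x00FFFFFF;

        munmap(fb, len);
        close(fd);
        return 0;
    }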

Linux - NIC flags configuration

Context
Debian 64 bit. kernel 3.18.x
Literally struggling to understand how a network driver is initialized.
I mean, how do you choose which flags to set? I have been digging in the kernel for days now to train myself. The card setup is the only point I am missing.
I take the Intel 82574 as an example. I downloaded the card's datasheet and found plenty of information, but not a clue on how to set up the hardware.
Question
Where do I start to know what flags to set? The datasheet didn't help me (I am not very experienced but willing to learn).
Please give me a starting point, a tip or anything to help me understand what is going on in the already written open sourced driver.
How does a developer know how to initialize his NIC? (Yes, reinventing the wheel just long enough to understand.)
You'll need to read the source code of the kernel module that handles your specific NIC.
EDIT: Of course, to develop such a module you'd usually just use a register map as specified in a datasheet or application note; often, manufacturers develop their Linux drivers themselves, so the driver developers might even be the same people that developed the chipset (because it's really handy to have a platform to test against -- it's impossible to test hardware without having something like a driver, so you might as well write a proper driver).
Furthermore, devices often come with code examples -- no one is going to build a device based on an IC that he has not seen in action.
If you've got access to neither proper documentation nor source, you can only reverse engineer - and that's an incredibly large field.
Using your example with the Intel 82574 Network Adapter, Intel provides a zip file of the source code used to build the Linux driver. The driver is like all drivers in that it hooks into the OS API for Networking.
The Linux networking API is documented on the linux.org site and discussed on popular Linux sites like lwn.net. Below is the link to LWN's chapter on network drivers using the networking API called NAPI.
https://static.lwn.net/images/pdf/LDD3/ch17.pdf
You'll notice in the Intel igb driver source code that the NAPI net_device data structure is one of the first things that is set up. It registers the driver with the OS. This way the OS knows which igb functions to call when loading/unloading the driver, or when needing to send/receive data.
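As a rough illustration of that registration step, the skeleton below shows the shape of a NIC driver hooking into the networking stack: allocate a net_device, attach the driver's callbacks and a NAPI poll function, then register with the OS. Exact signatures (for example the weight argument of netif_napi_add()) vary between kernel versions; this follows the older 2.6/3.x style and leaves the hardware-specific parts as comments.

    #include <linux/module.h>
    #include <linux/netdevice.h>
    #include <linux/etherdevice.h>

    struct my_nic_priv {
        struct napi_struct napi;
        /* pointers to memory-mapped device registers would live here */
    };

    static int my_nic_poll(struct napi_struct *napi, int budget)
    {
        int work_done = 0;
        /* ...read RX descriptors, hand packets up with napi_gro_receive()... */
        if (work_done < budget)
            napi_complete(napi);
        return work_done;
    }

    static int my_nic_open(struct net_device *dev) { return 0; }
    static int my_nic_stop(struct net_device *dev) { return 0; }

    static netdev_tx_t my_nic_xmit(struct sk_buff *skb, struct net_device *dev)
    {
        /* ...write the frame into a TX descriptor ring... */
        dev_kfree_skb(skb);
        return NETDEV_TX_OK;
    }

    static const struct net_device_ops my_nic_ops = {
        .ndo_open       = my_nic_open,
        .ndo_stop       = my_nic_stop,
        .ndo_start_xmit = my_nic_xmit,
    };

    static struct net_device *my_dev;

    static int __init my_nic_init(void)
    {
        struct my_nic_priv *priv;
        int err;

        my_dev = alloc_etherdev(sizeof(*priv));
        if (!my_dev)
            return -ENOMEM;

        priv = netdev_priv(my_dev);
        my_dev->netdev_ops = &my_nic_ops;
        netif_napi_add(my_dev, &priv->napi, my_nic_poll, 64);

        err = register_netdev(my_dev);   /* the OS now knows which functions to call */
        if (err)
            free_netdev(my_dev);
        return err;
    }

    static void __exit my_nic_exit(void)
    {
        unregister_netdev(my_dev);
        free_netdev(my_dev);
    }

    module_init(my_nic_init);
    module_exit(my_nic_exit);
    MODULE_LICENSE("GPL");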
The igb functions read/modify/write the necessary bits in the 82574's memory-mapped registers that control and monitor the device. The device registers are all documented in the 82574 datasheet available on Intel's site. And this is usually the case for almost any networking company like Broadcom/Chelsio/Mellanox/Marvell.
Hope that helps a little more.

How can the linux kernel be forced to enumerate the PCI-e bus?

Linux kernel 2.6
I've got an FPGA that is loaded over GPIO, connected to a development board running Linux.
The FPGA will transmit and receive data over the PCI-Express bus. However, this bus is enumerated at boot, and as such no link is discovered (because the FPGA is not loaded at boot).
How can I force re-enumeration of the pci-e bus in linux?
Is there a simple command or will I have to make kernel changes?
I need the capability to hotplug pcie devices.
As root, try the following command:
echo "1" > /sys/bus/pci/rescan
See this link for more information: http://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci
I wonder what platform you are on. A workaround (aka hack) that works on x86 systems is to have the BIOS statically configure a PCI device at whatever bus/device/function the FPGA normally lands on; the OS will then enumerate the device and reserve the PCI space for it (even though the device isn't really there). Then, in your device driver, you will have to do some extra things, like setting up the BARs and interrupt lines manually after the FPGA has been programmed. Of course this requires modifying the BIOS. If you are working with a BIOS vendor you can contract them to make this change for you; if you are not, it will be much harder... Also keep in mind that I was working on VxWorks on x86, and we had AMI make a custom BIOS for our boards...
If you don't have a BIOS, then consider programming the FPGA in the bootloader: there you already have the ability to read from disk, and adding GPIO capabilities probably isn't too difficult (assuming you are using JTAG and GPIOs?). In fact, depending on which bootloader you use, it might already be able to do GPIO.
The issue with modifying the kernel to do this is that you have to find the sweet spot where you can read the bitfile before PCI enumeration... If, for example, the disk device drivers are initialized after PCI, then you obviously must make some radical changes to the kernel just to read the bitfile prior to PCI enumeration, which might cause other annoying problems...
One other option, which you may have already discovered and which is really only OK during development: power up the system, program the FPGA board, then do a reset (without a power cycle, for example: sudo reboot now). The FPGA should keep its configuration, and Linux should enumerate it...
After turning on your computer, the BIOS enumerates the PCI bus and attempts to fulfill all I/O space and memory-mapped I/O (MMIO) requests. It sets up these BARs initially, and when the operating system loads, these BARs can be changed by the OS as it sees fit while the PCI bus driver enumerates the bus yet again. It is even possible for the superuser of the system to run the setpci command to change these BARs after the BIOS has already attempted to configure them and the OS has loaded (which may cause drivers to fail and several other bad things if done improperly).
I have had to do this in cases where the card in question was not assigned any resources by the BIOS, because the region requested required a 64-bit address and the BIOS only operated with 32-bit address assignments. I was able to go in after the fact and change these addresses (originally assigned by the BIOS) to whatever addresses I saw fit, insert the kernel module, and my driver would map and use these newly assigned addresses for the card without knowing the difference.
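For context, the sketch below shows how a kernel driver typically claims and maps BAR 0 once the device has resources assigned, from a pci_driver probe() callback. The name "my_fpga" is made up and error unwinding is abbreviated.

    #include <linux/pci.h>
    #include <linux/io.h>

    static void __iomem *regs;

    static int my_fpga_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        int err;

        err = pci_enable_device(pdev);
        if (err)
            return err;

        err = pci_request_regions(pdev, "my_fpga");
        if (err)
            return err;

        /* Map BAR 0 (device registers) into kernel virtual address space. */
        regs = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
        if (!regs)
            return -ENOMEM;

        /* Registers can now be accessed with ioread32()/iowrite32(). */
        pr_info("my_fpga: BAR0 at %pR\n", &pdev->resource[0]);
        return 0;
    }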
The problem that exists with hotplugging PCI-Express cards is that the power to the slot itself cannot be turned on/off without specific hotplug controllers, which need to exist on the motherboard/backplane. Not having these hotplug controllers to turn the slot's power off may lead to shorts between the tiny pins when the card is physically inserted and/or removed while power is still present. Hotplug events, however, can be initiated by either end (the host or the endpoint device). That does not seem to be the case here; however, if your FPGA already has a link established with the root complex, a possible solution to your problem would be to generate hotplug interrupts to cause a bus rescan in the OS.
There is a major problem, though -- if your card does not actually obtain a link to the root complex, it won't be able to generate any hotplug events, which seems to be your situation. After booting, the FPGA should toggle the PRESENT line on the PCIe bus to tell the OS there is a card ready to be enumerated. Once detected, the OS should attempt to establish a link to the card and assign memory regions to the device. After the OS enumerates the card you'll be able to load drivers against it and see it in lspci. You stated you're using kernel 2.6, which does have support for hotplugging and dynamic resource allocation, so this method should work as long as your FPGA supports the ability to toggle the PRESENT PCIe line, too.

Hardware clock signals implementation in Linux Kernel

I am looking for some pointers to understand how the Linux kernel implements the setting up of various hardware clocks. This basically relates to setting up the various clocks that hardware blocks like the LCD, UART, etc. will use. For example, when Linux boots, how does it handle setting up the clocks for the UART or USB? Maybe something like a clock manager.
I am basically trying to implement something similar for a different OS on new hardware that I am working on. Any help would be really appreciated.
[Edit]
Thanks for the replies and the links. So here is what i have implemented up until now. This should give you an idea of where I'm headed.
I looked up the Hardware Reference Manual for the particular system I'm targeting and wrote some code to monitor/modify the signals/pins of the peripherals I am interested in, i.e. turning them ON/OFF from the command line. Now, a collection of these clocks/signals together controls a peripheral. The HRM would say that if you want to turn on the UART or something, then turn on such-and-such signals/pins. And @BjoernD, yes, I am using something like an mmap() function to talk to the peripherals.
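For concreteness, here is a condensed sketch of the kind of utility described above: it maps a clock-gating register through /dev/mem and sets or clears one enable bit. CLOCK_CTRL_BASE and UART_CLK_ENABLE_BIT are hypothetical values standing in for whatever the HRM actually specifies.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define CLOCK_CTRL_BASE     0x4A000000UL   /* hypothetical register block */
    #define UART_CLK_ENABLE_BIT (1u << 5)      /* hypothetical bit position   */

    int main(int argc, char **argv)
    {
        int enable = (argc > 1) && (argv[1][0] == '1');
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        volatile uint32_t *clk = mmap(NULL, getpagesize(),
                                      PROT_READ | PROT_WRITE, MAP_SHARED,
                                      fd, CLOCK_CTRL_BASE);
        if (clk == MAP_FAILED) { perror("mmap"); return 1; }

        if (enable)
            *clk |= UART_CLK_ENABLE_BIT;    /* gate the UART clock on  */
        else
            *clk &= ~UART_CLK_ENABLE_BIT;   /* gate the UART clock off */

        munmap((void *)clk, getpagesize());
        close(fd);
        return 0;
    }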
The meat of my question is that I want to understand the design and implementation of a Clock/Peripheral Manager which uses the utility that I have already written. This Clock/Peripheral Manager would give me control over enabling/disabling the peripherals I want. Basically, this Manager would enable me to make changes in the init code that is currently running. Also, during run time, processes could call this Manager to turn devices ON/OFF so that power consumption is optimized. It might not make perfect sense yet; I'm still trying to wrap my head around it myself.
Now I'm sure something like this has been implemented in Linux, or for that matter in any OS, for power reasons (nobody would want to waste power by turning on all peripherals at boot time). I want to understand its software architecture. A reference from any OS would do for now, to at least get a head start. Also, I am not writing my own OS; there is an OS in place, but I'm looking more at board-level software, a.k.a. a BSP, to work on. But thanks for the OS links anyway, they are really good. Appreciate it.
Thanks!
What you want to achieve is highly specific to a) the platform you are using and b) the device you want to use. For instance, on x86 there are 3 ways to communicate with a device:
Interrupts allow the device to signal the CPU. The OS usually provides mechanisms to register interrupt handlers - functions that are called upon occurrence of an interrupt. In Linux, see request_irq() and friends in <linux/interrupt.h>; a minimal registration sketch follows after this list.
Memory-mapped I/O is physical memory of the device that the platform's BIOS makes available in the same way you also access plain physical memory - simply by writing to a memory address. What exactly is behind such memory (e.g., network interface config registers or an LCD frame buffer) depends on the device and is usually specified in the device's data sheet.
I/O ports are accessed through a special address space and special instructions (INB/OUTB & co.). Other than that, they work similarly to I/O memory.
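Here is the minimal registration sketch promised above for mechanism 1 (interrupts): a kernel module registering a handler with request_irq(). MY_DEVICE_IRQ is a hypothetical interrupt line; a real driver would obtain it from the bus (PCI, platform bus, ...).

    #include <linux/module.h>
    #include <linux/interrupt.h>

    #define MY_DEVICE_IRQ 42   /* hypothetical IRQ number */

    static int my_cookie;      /* per-device cookie handed back to the handler */

    static irqreturn_t my_irq_handler(int irq, void *dev_id)
    {
        /* acknowledge the device, schedule deferred work, etc. */
        return IRQ_HANDLED;
    }

    static int __init my_init(void)
    {
        /* IRQF_SHARED lets the line be shared with other devices. */
        return request_irq(MY_DEVICE_IRQ, my_irq_handler, IRQF_SHARED,
                           "my_device", &my_cookie);
    }

    static void __exit my_exit(void)
    {
        free_irq(MY_DEVICE_IRQ, &my_cookie);
    }

    module_init(my_init);
    module_exit(my_exit);
    MODULE_LICENSE("GPL");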
There's a multitude of ways to find out what resources a device provides and where the BIOS mapped them. Some platforms use ACPI tables (google yourself for the 1,000+ page spec), PCI provides info on devices in a standardized way through the PCI config space, USB has similar ways of discovering devices attached to the bus, and some devices, e.g., UARTs, are simply specified to be available at a pre-configured I/O range that is fixed for your platform.
As a start for understanding Linux, I'd recommend "Understanding the Linux kernel". For specifics on how Linux handles devices and what is there to write drivers, have a look at Linux Device Drivers. Furthermore, you will need to have a look at the peculiarities of your platform and the device you want to drive.
If you want to start an own OS, a UART is certainly something that will be veeery helpful to print debug output, so you might want to go for this first.
Now that I have written all this down, it seems that your actual question is: how do I get started with operating system design? This question should be highly valuable for you: What are some resources for getting started in operating system development?
The two big power users in most computers are the CPU and the disks. Both of these have capabilities for power saving in Linux. The CPU clock can be slowed down when the system is not busy, and the disk motors can be stopped when no I/O is happening. For a UART, even if you save all of the power that it uses by turning off its clock, it is still tiny compared to the others because a UART doesn't have much logic in it.
Best ways to save power are
1) more efficient power supply
2) replace rotating disk with SSD
3) Slow down the CPU and memory bus
