InfiniBand RDMA

InfiniBand RDMA - azure

I am trying to use InfiniBand on an A8 machine on Azure. The ping-pong test works fine; however, I cannot run a simple RDMA-based program. I can find the device with ibv_get_device_list(NULL), but I cannot open it with ibv_open_device(). Is it true that all RDMA-based applications must use Intel MPI?
Thank you,

As of today, Azure RDMA instances only allow Intel MPI to use the RDMA capabilities. This is why a general-purpose RDMA app will not work.
The good news is that this is going to change soon, as Azure intends to introduce support for general-purpose RDMA in the very near future (2018), so hang tight!
Thanks

For the record, Azure has since opened the ibverbs API on certain instance types.

Related

Can packer.io template specify processor type in azure builder?

Constraints:
My application requires SSE4.2 instruction set.
I am using packer.io to provision my Windows Azure VM (OpenLogic 6.5 OS.)
Windows Azure returns an AMD-processor-backed-VM about 15% of the time. The rest of the time - they are Intel-processor-based. AMD processors do not support SSE4.2, but they do support SSE4a. So, my application is terminated with SIGILL on AMD processors.
Questions:
Can I request a specific architecture (Intel CPU) when Packer provisions a VM? I know that instance types >= A8 come only with Intel processors, but they are more expensive, and I would not want to use them for development.
If Packer cannot do it, what are the other options (PowerShell, etc.) that would give me this functionality?
Thank you.

Answering my own question: Azure does not provide a way to request a processor type. The only way to ensure an Intel processor is to not use A-series machines (as confirmed by a MSFT representative). Thus, no tool can do it.
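Since the processor type cannot be requested, a startup check can at least fail early with a clear message instead of a SIGILL. Below is a minimal sketch in Python, assuming a Linux guest where /proc/cpuinfo has a "flags" line and SSE4.2 is reported as sse4_2. The function names are hypothetical and only for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough: all cores of one VM report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_sse42(cpuinfo_text):
    """True if the flags include sse4_2 (the name Linux uses for SSE4.2)."""
    return "sse4_2" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; the path is the Linux default."""
    with open(path) as f:
        return f.read()
```

A provisioning script could call supports_sse42(read_cpuinfo()) right after the VM boots and discard the VM if the result is False.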

Cuda GPUDirect to NIC/Harddrive?

I am currently writing a CUDA application and am running into a few IO issues "feeding the beast."
I am wondering if there is any way that I can directly read data from a RAID controller or NIC and have that data sent directly to the GPU. What I'm trying to accomplish is shown directly on slide #3 of the following presentation: http://developer.download.nvidia.com/devzone/devcenter/cuda/docs/GPUDirect_Technology_Overview.pdf.
That being said, apparently this has already been answered here: Is it possible to access hard disk directly from gpu? However, the presentation that I've attached leads me to believe all I need is to set an environment variable in Linux (but it doesn't offer any useful code snippets/examples).
Therefore, I'm wondering if it is possible to read data directly from a NIC/RAID controller into the GPU and what would be required to do so? Would I need to write my own driver for the hardware? Are there any examples where certain copies are avoided?
Thanks in advance for the help.

GPUDirect is an umbrella term: a brand covering technologies that enable direct data transfer to and/or from a GPU, bypassing unnecessary trips through host memory.
GPUDirect v1 is a technology that works with specific InfiniBand adapters, and enables the sharing of a data buffer between the GPU driver and the IB driver. This technology has mostly been superseded by GPUDirect (v3) RDMA. This v1 technology does not enable general usage with any NIC. The environment variable reference:
however the presentation that I've attached leads me to believe all I need is to set an environment variable in Linux
refers to enabling GPUDirect v1. It is not a general purpose NIC enabler.
GPUDirect v2 is also called GPUDirect Peer-to-Peer, and it is for transfer of data between two CUDA GPUs on the same PCIE fabric only. It does not enable interoperability with any other kind of device.
GPUDirect v3 is also called GPUDirect RDMA.
Therefore, I'm wondering if it is possible to read data directly from a NIC/RAID controller into the GPU and what would be required to do so?
Today, the canonical use case for GPUDirect RDMA is with a Mellanox Infiniband (IB) adapter. (It can also be made to work, perhaps with assistance from Mellanox, using a Mellanox Ethernet Adapter and RoCE). If this fits your definition of "NIC", then it's possible by loading a proper software stack, assuming you have appropriate hardware. The GPU and the IB device need to be on the same PCIE fabric, which means they need to be attached to the same PCIE root complex (effectively, connected to the same CPU socket). When used with a Mellanox IB adapter, typical usage would involve a GPUDirect RDMA-aware MPI.
If you have your own unspecified NIC or RAID controller, and you don't already have a GPUDirect RDMA linux device driver for it, then it's not possible to use GPUDirect. (If there is a GPUDirect RDMA driver for it, contact the manufacturer or driver provider for assistance.) If you have access to the driver source code, and are familiar with writing your own linux device drivers, you could try crafting your own GPUDirect driver. The steps involved are beyond the scope of my answer, but the starting point is documented here.
Would I need to write my own driver for the hardware?
Yes, if you don't already have a GPUDirect RDMA driver for it, one would need to be written.
Are there any examples where certain copies are avoided?
The GPUDirect RDMA MPI link gives examples and explains how GPUDirect RDMA can avoid unnecessary device<->host data copies during the transfer of data from GPU to IB adapter. In general, data can be transferred directly (over PCIE) from memory on the GPU device to memory on the IB device (or vice-versa) with no trip through host memory (GPUDirect v1 did not achieve this.)
UPDATE: NVIDIA has recently announced a new GPU Direct technology called GPU Direct Storage.

Listening on a particular port on Linux to receive data from a mobile device

I am a newbie to the Linux platform and am working with Java.
What I have to do: a program running on mobile devices sends some data to my Linux machine. I now have to create a Java program that will:
listen on a particular port,
read the data that arrives on that port (sent by the mobile device),
save that data to the database,
send a response back to the mobile device.
I.e., I want to make my Linux system a server that can listen to many clients (mobile devices), but I don't understand how to configure this environment... :(
I am using CentOS 5.4 and
have installed jdk1.6.0_24.
Any help would be appreciated.
Thanks in advance!
khushi

One of Java's greatest strengths is that you can pretty much ignore the host operating system as long as you stick to core Java features. In the case you're describing, you should be able to accomplish everything by simply using the standard Java networking APIs and either JDBC to access an existing, external database or one of the many embedded Java databases such as Derby. For your stated use case, the fact that you'll be running the application on Linux is pretty much irrelevant (which should be good news... you don't need to learn a whole operating system in addition to writing your app ;-).

Here's a nice client/server tutorial, in that it is broken into steps, and adds each new concept in another step.
Here's another client/server tutorial with much more detail.
I would first write it to accept one connection at a time. Once that works, I would study the new(ish) java.util.concurrent classes, in particular the ExecutorService, as a way of managing the worker bee handling each connection. Then change your program to handle multiple connections using those classes. Breaking it up into two steps like that will be a lot easier.
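The two-step shape described above (an accept loop, then a pool of workers) can be sketched compactly. This sketch is in Python rather than Java, purely for brevity; the Java counterparts would be ServerSocket for the listening socket and an ExecutorService for the pool. The handler, the reply format and the function names are hypothetical, and the database write is left as a placeholder comment.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Read one message from a client, then acknowledge it."""
    with conn:
        data = conn.recv(4096)
        # Placeholder: this is where the data would be saved to the database.
        conn.sendall(b"OK " + str(len(data)).encode())


def serve(server_sock, max_clients, workers=4):
    """Accept max_clients connections and hand each one to a pool worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_clients):
            conn, _addr = server_sock.accept()
            pool.submit(handle, conn)
    # Leaving the 'with' block waits for all submitted handlers to finish.
```

With workers=1 this behaves like the one-connection-at-a-time version; raising the worker count is the only change needed to serve several devices concurrently. A real server would loop forever instead of stopping after max_clients, which is bounded here only so that the sketch terminates.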

Minimum configuration to run embedded Linux on an ARM processor?

I need to produce an embedded ARM design that has requirements to do many things that embedded Linux would do. However, the design is cost sensitive and does not need huge amounts of horsepower. It will mostly be talking to serial interfaces. Ideally I would like to use one of the low-end ARMs. What is the lowest configuration of an ARM that you have successfully used embedded Linux on?
Edit:
The application needs a file system on some kind of flash device and the ability to run applications for processing the data. Some of the applications might be written by people other than myself. I also need the ability to load new applications or update old apps, using the serial ports to accept the apps.
When I have looked at other embedded OSes, they seem to be more of a real-time threading solution than something with the ability to run applications. I am open to whatever will get the job done.

I think you need to weigh your cost options here.
ARM + linux is an option but you will be paying a very high operating overhead for such a simple (from your description) set of features. You can't just look at the cost of the ARM chip but must also consider external RAM which will very likely be required as well as flash to get enough space available to run the kernel + apps.
NOTE: you may be able to avoid the external requirements with a very minimal kernel and simple apps combined with a uC with large internal resources.
A second option is a much simpler microcontroller with a lightweight OS. This will cut your hardware costs on the CPU, and you can likely run something like this without external RAM or flash (dependent on application RAM and program space requirements).
Third option: I don't actually see anything in your requirements that demands any OS at all be used. Basic file systems are very simple; for instance, there are even FAT drivers out there for 8-bit PICs. Interfacing to an SD card only requires a SPI port and minimal external circuitry.
The application bit could be simple or complex. I've built systems around PIC18 microcontrollers that run a web server and allow program updates via a simple upload screen: it just stores the new program into an EEPROM or flash, reboots into a bootloader and copies the new program into internal program memory. You could likely design a way to do this without the reboot via a cooperative multitasking type of architecture. Any way you go, the programmers writing the apps are going to need knowledge of the architecture and access to the libraries / drivers you write. Your best bet to simplify this is to provide as simple an API as possible and to try to automate the build process for them.
The third option will be the "cheapest" in terms of hardware as there will be very little overhead in the processing of your applications allowing you to get away with minimal processing power and memory. It likely will require some more programming/software architecting on your part but won't require nearly the research you will need to undertake to get linux up and running in addition to learning to write the needed device drivers under a linux paradigm.
As always you have to include the software development costs in the build cost of the device. If you plan to build 10,000+ of these, you're likely better off keeping hardware costs down and putting more manpower into designing a software solution that allows that hardware to meet the design goals. If you're building 10 of them, you're better off spending an extra $15-20 on hardware if it can cut down on your software development costs, for example on an ARM with an MMU, full linux kernel support and available device drivers.
I kind of feel that you're selecting the worst of both worlds at the moment: you're paying extra to get a uC you can run linux on, but by doing so you're also selecting a part that will likely be the most complex to get linux up and running on, especially having not worked with linux on embedded platforms before.

I've had success even on ARM7TDMI, so I don't think you're going to have any trouble. If you have a low-requirements system, you could use any kind of lightweight real-time executive and have a lot better experience than you would getting Linux to work.

I've used a TS-7200 for about five years to run a web server and mail server, using Debian GNU/Linux. It runs at 200 MHz and has 32 MB of RAM, and is quite adequate for these tasks. It has a serial port built in. It's based on an ARM920T.
This would be overkill for your job; I mention it so you have another data point.

For several years I've been using a gumstix to do prototyping and testing and I've had good results with it. I don't know if the processor they are using (Intel PXA255 on my board) is considered low-cost, but the entire Verdex line seems pretty cheap to me for an adaptable device.

uClinux is designed specifically for resource-constrained targets, but perhaps more importantly for targets without an MMU.
However, you have to have a good reason to use Linux on such a system rather than a small real-time executive. Out-of-the-box networking, readily available drivers and protocol stacks for complex hardware, and support for existing POSIX legacy or open source code are a few, perhaps. However, if you don't need that, Linux is still large, and you may be squandering resources for no real benefit. In most cases you will still need off-chip SDRAM and Flash if you choose Linux of any flavour.
I would not regard serial I/O as 'complex hardware', so unless you are running a complex but standard protocol, your brief description does not appear to warrant the use of Linux, IMO.

My D-Link DIR-320 router runs Linux inside.
I also know some tinkerers who flash it with Optware and connect USB hubs, HDDs, USB flash drives and much more.
It's a low-cost, ready-to-use "platform" (if you don't need mass production), but maybe more powerful than you need.
Additionally, it can be configured wirelessly via its web interface, even from your PDA :)

How to implement web services on an embedded device?

We have an embedded device that needs to interact with an enterprise software system.
The enterprise system currently uses many different mechanisms for communication between its components: ODBC, RPC, a proprietary protocol over TCP/IP, and is moving to .NET-implemented web services.
The embedded device runs a flavor of *nix, so we're looking at what the best interaction mechanism is.
The requirements for the communication are:
Must run over TCP/IP.
Must also run over RS-232 or USB.
Must be secure (e.g. HTTPS or SSL).
Must be capable of transferring ~32MB of data.
Our current best option is gSOAP.
Does anyone out there in SO-land have any other suggestions?
Edit: Steven's answer gave me the most new pointers. Thanks to all!

You can define RESTful services that use HTTPS (which runs over TCP/IP by definition) and are capable of transferring any amount of data.
The advantage of REST over SOAP is that REST is simpler. It can use JSON instead of XML, which is also simpler.
It has less overhead than the SOAP protocol.

Can't you just use SSL over TCP?
If you have some kind of *nix (may I guess? It's either QNX or embedded Linux, right?) it should work pretty much out of the box via Ethernet, USB and RS232. Keep things simple.
32 MB is plenty of memory for this task. I would allocate between 2 and 4 MB of memory for networking & encryption (code + data).

It's not real clear why you want to tie this to a remote-procedure-call protocol like SOAP. Are there other requirements you aren't mentioning?
In general, though, this sort of thing is handled very easily using normal web-based services. You can get very lightweight http processors written in C; see this Wikipedia article for comparisons of a number of them. Then a REST interface will work fine. There are network interfaces that treat USB as a TCP connection, as well.
If you must be able to run over RS232, you might want to look elsewhere; in that case, something like sftp might do better. Or write a simple application-layer protocol that you can run over an encrypted connection.

If you are going to connect your application using RS232, I assume that you will be using PPP to connect the device to the internet. The amount of data that you are proposing to transfer is somewhat worrisome, however. Most RS232 connections are limited to 115200 baud, which, even ignoring the overhead required for TCP/IP/PPP framing, yields a transfer rate of at most about 11,500 bytes per second (each byte costs 10 bits on the wire once the usual start and stop bits are counted). This implies that it will take a minimum of approximately 2800 seconds, or 46 minutes, to transfer the 32 MB that you intend to send.
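The arithmetic behind that estimate can be sketched as follows. This is a rough lower bound, assuming 8N1 serial framing (10 bits on the wire per payload byte) and no protocol overhead. The function name and the default parameters are assumptions for illustration.

```python
def transfer_seconds(payload_bytes, baud=115200, bits_per_byte=10):
    """Lower bound on serial transfer time, ignoring TCP/IP/PPP overhead.

    bits_per_byte=10 assumes 8N1 framing: 1 start bit, 8 data bits, 1 stop bit.
    """
    bytes_per_second = baud / bits_per_byte  # 11520.0 at 115200 baud
    return payload_bytes / bytes_per_second


# 32 MB taken as 32 * 10**6 bytes: about 2778 s, i.e. roughly 46 minutes.
decimal_estimate = transfer_seconds(32 * 10**6)

# 32 MB taken as 32 * 2**20 bytes: about 2913 s, i.e. roughly 49 minutes.
binary_estimate = transfer_seconds(32 * 2**20)
```

Either way the transfer takes the better part of an hour before any framing, retransmission or encryption overhead is added.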
