Why does Intel MPI use DAPL and OpenMPI native ibverbs?

I don't understand why Intel MPI uses DAPL if native ibverbs are faster than DAPL; OpenMPI uses native ibverbs.
However, in this benchmark Intel MPI achieves better performance:
http://www.hpcadvisorycouncil.com/pdf/AMBER_Analysis_and_Profiling_Intel_E5_2680.pdf

Intel MPI uses several interfaces to interact with the hardware, and DAPL is not the default in all cases. OpenMPI likewise selects an interface for the current hardware, and it is not always ibverbs: there is a shared-memory API for intra-node communication and TCP for Ethernet-only hosts.
The list of supported fabrics for Intel MPI (Linux):
https://software.intel.com/en-us/get-started-with-mpi-for-linux
Getting Started with Intel® MPI Library for Linux* OS. Last updated on August 24, 2015
Support for any combination of the following interconnection fabrics:
Shared memory
Network fabrics with tag matching capabilities through Tag Matching Interface (TMI), such as Intel® True Scale Fabric, Infiniband*, Myrinet* and other interconnects
Native InfiniBand* interface through OFED* verbs provided by Open Fabrics Alliance* (OFA*)
OpenFabrics Interface* (OFI*)
RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*
Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects
The fabric can be selected with the I_MPI_FABRICS environment variable:
https://software.intel.com/en-us/node/535584
Selecting Fabrics. Last updated on February 22, 2017
Intel® MPI Library enables you to select a communication fabric at runtime without having to recompile your application. By default, it automatically selects the most appropriate fabric based on your software and hardware configuration. This means that in most cases you do not have to bother about manually selecting a fabric.
However, in certain situations specifying a particular communication fabric can boost performance of your application. You can specify fabrics for communications within the node and between the nodes (intra-node and inter-node communications, respectively). The following fabrics are available:
Fabric - Network hardware and software used
shm - Shared memory (for intra-node communication only).
dapl - Direct Access Programming Library* (DAPL)-capable network fabrics, such as InfiniBand* and iWarp* (through DAPL).
tcp - TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*).
tmi - Tag Matching Interface (TMI)-capable network fabrics, such as Intel® True Scale Fabric, Intel® Omni-Path Architecture and Myrinet* (through TMI).
ofa - OpenFabrics Alliance* (OFA)-capable network fabrics, such as InfiniBand* (through OFED* verbs).
ofi - OpenFabrics Interfaces* (OFI)-capable network fabrics, such as Intel® True Scale Fabric, Intel® Omni-Path Architecture, InfiniBand* and Ethernet (through OFI API).
For inter-node communication, it uses the first available fabric from the default fabric list. This list is defined automatically for each hardware and software configuration (see I_MPI_FABRICS_LIST for details).
For most configurations, this list is as follows:
dapl,ofa,tcp,tmi,ofi
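For example, to force shared memory within a node and DAPL between nodes, the variable can be set at launch time (a minimal illustration; ./your_app is a placeholder and the usable fabrics depend on your Intel MPI version and installed drivers):
I_MPI_FABRICS=shm:dapl mpirun -n 64 ./your_app
Choosing I_MPI_FABRICS=shm:ofa instead selects OFED verbs between nodes, which is the closest match to the native ibverbs path OpenMPI uses.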

Related

Are Intel's PTT and TPM equivalent

Are Intel PTT (Intel Platform Trust Technology) and TPM chips functionally equivalent?
If I had a board with an Intel processor that supported PTT, would I have the same functions as if I had a hardwired TPM chip, e.g. support for TrouSerS, etc.?
How do you discover if a particular Intel processor supports PTT?
The Intel Platform Trust Technology (PTT) architecture, first introduced in 2013 on 4th-generation chips, implements TPM functionality within the CPU. PTT fully meets Microsoft's requirements for the firmware Trusted Platform Module (fTPM) 2.0 specification.
To your operating system and applications, there should be no discernible difference between using PTT or using a dedicated TPM chip.
You will typically have an option in your firmware configuration utility to enable or disable PTT if your processor supports an fTPM. On Windows, you can check whether you are using a TPM or an fTPM (PTT) by running tpm.msc. On Linux, check under /sys/class/tpm, /sys/kernel/security/tpm0, or your boot log.
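For example, a quick check on a typical Linux system (device names can vary by kernel version; these are just illustrative commands):
ls /sys/class/tpm/            # a tpm0 entry means a TPM or fTPM (PTT) is exposed
dmesg | grep -i tpm           # TPM driver messages in the boot log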
The easiest way is to check in the BIOS. Usually you have to enable it there if you want to use it, because it has been disabled by default on all the systems I've seen.

InfiniBand RDMA

I am trying to use InfiniBand on an A8 machine on Azure. The ping-pong test works fine; however, I cannot run a simple RDMA-based program. I can find the device with ibv_get_device_list(NULL), but I cannot open it with ibv_open_device(). Is it true that all RDMA-based applications must use Intel MPI?
As of today, Azure RDMA instances only allow Intel MPI to use their RDMA capabilities, which is why a general-purpose RDMA application will not work.
The good news is that this is going to change soon, as Azure intends to introduce support for general-purpose RDMA in the very near future (2018), so hang tight!
For the record, Azure has since opened the ibverbs API for certain instance types.
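On instance types where the verbs API is exposed, a minimal check along the lines of the calls mentioned in the question (a sketch using standard libibverbs calls only, with error handling kept to a minimum) shows whether ibv_open_device() succeeds outside of Intel MPI:
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);   /* enumerate RDMA devices */
    if (!list || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }
    printf("found %d device(s), first is %s\n", num, ibv_get_device_name(list[0]));

    struct ibv_context *ctx = ibv_open_device(list[0]);     /* the call that fails on locked-down instances */
    if (!ctx) {
        fprintf(stderr, "ibv_open_device() failed\n");
        ibv_free_device_list(list);
        return 1;
    }
    printf("device opened successfully\n");

    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
Compile with something like cc check_verbs.c -libverbs (the file name is a placeholder).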

What is the Goal of Adaptive AutoSAR

What is the main motive for introducing Adaptive AUTOSAR?
The information provided by the AUTOSAR consortium is: "AP provides mainly high-performance computing and communication mechanisms and offers flexible software configuration."
High-performance computing will be achieved through many-core/multi-core processors,
Ethernet will be used for communication,
applications will be programmed in C++, and POSIX will be used.
My questions are:
Multi-core is already used in the Classic Platform.
Since AUTOSAR is purely software, how will usage of many-core devices, FPGAs, etc. be considered within the AUTOSAR scope?
Ethernet is also available for the Classic Platform.
How does C++ fulfill the goals of flexibility, security and high computation?
What is the contribution of POSIX in Adaptive AUTOSAR?
Classic AUTOSAR (especially AUTOSAR OS) is based on static configuration of OS objects such as tasks, mainly because of its largely OSEK-like OS; simply put, AUTOSAR OS is OSEK++.
The main point of Adaptive AUTOSAR is to change that concept by introducing dynamically creatable OS objects. Imagine an Adaptive AUTOSAR system that allows loading executables which were unknown at build time.
(Whether that is a safe/secure design is not discussed here.)
My answers, point by point:
Multi-core is already used in the Classic Platform.
Yes, but those are microcontroller (uC) cores, whose performance and capabilities are completely different from microprocessor (uP) cores, i.e. state-of-the-art uP cores based on the Cortex-A53 or A57.
A uP is designed for high-performance applications: a uC can hardly render HD video, but a uP can.
Since AUTOSAR is purely software, how will usage of many-core devices, FPGAs, etc. be considered within the AUTOSAR scope?
AUTOSAR does not only concern software; it also implies hardware requirements. For example, you could not port a POSIX-compliant OS to a uC.
An FPGA can be configured as an SoC, so you can even have a uC and a uP running on the same board; the rest is up to you, e.g. AUTOSAR Classic on the uC and AUTOSAR Adaptive on the uP.
Ethernet is also available for the Classic Platform.
Adaptive AUTOSAR does not even define the communication protocol; it only specifies ara::com, together with many specifications and requirements, so that a vendor or AUTOSAR provider can implement the communication stack in various ways, in line with the service-oriented motivation.
How does C++ fulfill the goals of flexibility, security and high computation?
It is hard to explain everything here, but to fulfill them a completely new set of platform support (called the Foundation in Adaptive AUTOSAR) is needed.
For example, to handle safety an application is not started via systemd (Linux) or init (Android); a completely new function does it instead: the Execution Manager of Adaptive AUTOSAR.
What is the contribution of POSIX in Adaptive AUTOSAR?
It relates only to the OS requirements: at least a certain set of "system APIs" must be supported by the OS. The list of these system APIs can be found in POSIX PSE51.
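As a rough illustration only (not AUTOSAR code), the kind of services PSE51 covers are threads, clocks and semaphores from the minimal realtime profile; a small C sketch using nothing outside that profile:
#include <pthread.h>
#include <semaphore.h>
#include <time.h>
#include <stdio.h>

static sem_t done;

static void *worker(void *arg)
{
    (void)arg;
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);      /* clocks are part of PSE51 */
    printf("worker ran at %ld s\n", (long)now.tv_sec);
    sem_post(&done);                           /* semaphores are part of PSE51 */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    sem_init(&done, 0, 0);
    pthread_create(&tid, NULL, worker, NULL);  /* threads are part of PSE51 */
    sem_wait(&done);
    pthread_join(tid, NULL);
    return 0;
}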

Are Xen vTPM's integrated to Openstack cloud?

Xen has the ability to attach virtual trusted platform modules (vTPMs) to guest VMs: http://wiki.xenproject.org/wiki/Virtual_Trusted_Platform_Module_(vTPM). I would like to know if there is any OpenStack integration for this feature - can managed VMs, for instance, be provisioned with vTPMs?
I saw something similar for Hyper-V here:
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/hyper-v-vtpm-devices.html
OpenStack provides the following as part of Cloud tenant threat mitigation:
Use separated clouds for tenants, if necessary.
Use storage encryption per VM or per tenant.
OpenStack Nova has a Trusted Filter for Filter Scheduler to schedule workloads to trusted resources only (trusted computing pools), so workloads not requiring trusted execution can be scheduled on any node, depending on utilization, while workloads with a trusted execution requirement will be scheduled only to trusted nodes.
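For reference, a rough sketch of what enabling that filter looked like in a Kilo-era nova.conf (option names and values here are illustrative and may differ in your release; the attestation entries point at an Open Attestation (OAT) server):
[DEFAULT]
scheduler_default_filters = AvailabilityZoneFilter,RamFilter,ComputeFilter,TrustedFilter

[trusted_computing]
# address, port and CA of the OAT attestation server (placeholder values)
attestation_server = oat.example.com
attestation_port = 8443
attestation_server_ca_file = /etc/nova/oat-ca.crt
attestation_auth_blob = i-am-openstack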
With the following process:
Before you can run OpenStack with XenServer, you must install the hypervisor on an appropriate server.
Xen is a type 1 hypervisor: When your server starts, Xen is the first software that runs. Consequently, you must install XenServer before you install the operating system where you want to run OpenStack code. You then install nova-compute into a dedicated virtual machine on the host.
While XAPI is the preferred mechanism for supporting XenServer (and its deprecated sibling XCP), most existing Xen Project integration with OpenStack is done through libvirt, configured as shown below.
# nova.conf
[DEFAULT]
compute_driver = libvirt.LibvirtDriver

[libvirt]
virt_type = xen
Hardware TPM is also supported:
Our solution essentially mimics how one may download software and compute its SHA-256 hash and compare against its advertised SHA-256 hash to determine its legitimacy. It involves using Intel TXT, which is composed of hardware, software, and firmware. The hardware, attached to the platform, called the Trusted Platform Module (TPM)[3], provides the hardware root of trust. Firmware on the TPM is used to compute secure hashes and save the secure hashes to a set of registers called Platform Configuration Registers (PCRs), with different registers containing different measurements. Other components are Intel virtualization technology, signed code modules, and a trusted boot loader called TBOOT[1]. Essentially the BIOS, option ROM, and kernel/ramdisk are all measured in the various PCRs. From a bare-metal trust standpoint, we are interested in PCRs 0-7 (BIOS, option ROM). The kernel/ramdisk measurements would depend on the image the tenant seeks to launch on their bare-metal instance. PCR value testing is provided by an Open Attestation service, OAT[2]. Additional details in references.
with these security considerations:
At the time of this writing, very few clouds are using secure boot technologies in a production environment. As a result, these technologies are still somewhat immature. We recommend planning carefully in terms of hardware selection. For example, ensure that you have a TPM and Intel TXT support. Then verify how the node hardware vendor populates the PCR values. For example, which values will be available for validation. Typically the PCR values listed under the software context in the table above are the ones that a cloud architect has direct control over. But even these may change as the software in the cloud is upgraded. Configuration management should be linked into the PCR policy engine to ensure that the validation is always up to date.
References
Tighten the security of your OpenStack Clouds - OpenStack Superuser
Xen, XAPI, XenServer - OpenStack Configuration Reference  - kilo
XenServer - OpenStack
XenServer/XenAndXenServer - OpenStack
XenAPI Specific Bugs : OpenStack Compute (nova)
OpenStack - Xen
Xen via Libvirt - OpenStack Configuration Reference  - liberty
Hypervisors - OpenStack Configuration Reference  - kilo
OpenStack Docs: Overview of nova.conf
OpenStack Docs: nova.conf - configuration options
OpenStack Docs: Telemetry configuration options
Configure APIs - OpenStack Configuration Reference  - kilo
OpenStack Docs: Glossary
Bare-metal-trust - OpenStack
Baremetal driver - OpenStack Configuration Reference  - juno
OpenStack Docs: Integrity life-cycle
Current Series Release Notes — Nova Release Notes 16.0.0.0b3.dev171 documentation
Enhanced-Platform-Awareness-OVF-Meta-Data-Import - OpenStack
Example nova.conf configuration files - OpenStack Configuration Reference  - kilo
Chapter 7. Configuring a Basic Overcloud using Pre-Provisioned Nodes - Red Hat Customer Portal
Feature Support Matrix — nova 16.0.0.0b3.dev171 documentation
Trusted Computing for Infrastructure (pdf)
What is Hyper.sh | Hyper.sh User Guide
Xen TPM Manager
Supporting Open Source Software Development in SSOs/SDOs
Xen Cloud Platform Virtual Machine Installation Guide (pdf)
OpenStack Docs: Security hardening
policy.json - OpenStack Configuration Reference  - kilo
Appendix B. Firewalls and default ports - OpenStack Configuration Reference  - kilo
New, updated and deprecated options in Kilo for Orchestration - OpenStack Configuration Reference  - kilo

Cuda GPUDirect to NIC/Harddrive?

I am currently writing a CUDA application and am running into a few IO issues "feeding the beast."
I am wondering if there is any way that I can directly read data from a RAID controller or NIC and have that data sent directly to the GPU. What I'm trying to accomplish is shown directly on slide #3 of the following presentation: http://developer.download.nvidia.com/devzone/devcenter/cuda/docs/GPUDirect_Technology_Overview.pdf.
That being said, apparently this has been answered already here: Is it possible to access hard disk directly from gpu?. However, the presentation that I've attached leads me to believe that all I need is to set an environment variable in Linux (but it doesn't offer any useful code snippets/examples).
Therefore, I'm wondering if it is possible to read data directly from a NIC/RAID controller into the GPU and what would be required to do so? Would I need to write my own driver for the hardware? Are there any examples where certain copies are avoided?
Thanks in advance for the help.
GPUDirect is a technology "umbrella term", which in general is a brand referring to technologies that enable direct data transfer to and/or from a GPU, somehow bypassing unnecessary trips through host memory.
GPUDirect v1 is a technology that works with specific InfiniBand adapters, and enables the sharing of a data buffer between the GPU driver and the IB driver. This technology has mostly been superseded by GPUDirect (v3) RDMA. This v1 technology does not enable general usage with any NIC. The environment variable reference:
however the presentation that I've attached leads to believe all I need is to set an environment variable in Linux
refers to enabling GPUDirect v1. It is not a general purpose NIC enabler.
GPUDirect v2 is also called GPUDirect Peer-to-Peer, and it is for transfer of data between two CUDA GPUs on the same PCIE fabric only. It does not enable interoperability with any other kind of device.
GPUDirect v3 is also called GPUDirect RDMA.
Therefore, I'm wondering if it is possible to read data directly from a NIC/RAID controller into the GPU and what would be required to do so?
Today, the canonical use case for GPUDirect RDMA is with a Mellanox Infiniband (IB) adapter. (It can also be made to work, perhaps with assistance from Mellanox, using a Mellanox Ethernet Adapter and RoCE). If this fits your definition of "NIC", then it's possible by loading a proper software stack, assuming you have appropriate hardware. The GPU and the IB device need to be on the same PCIE fabric, which means they need to be attached to the same PCIE root complex (effectively, connected to the same CPU socket). When used with a Mellanox IB adapter, typical usage would involve a GPUDirect RDMA-aware MPI.
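For instance, with a CUDA-aware, GPUDirect RDMA-enabled MPI stack (a sketch under that assumption; the build and runtime options needed to enable GPUDirect RDMA vary by MPI implementation), a device pointer can be handed straight to MPI, and the data moves from GPU memory to the IB adapter without a host staging copy:
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));   /* buffer lives in GPU memory */

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* A CUDA-aware MPI accepts the device pointer directly; with GPUDirect RDMA
           the NIC reads GPU memory over PCIe with no intermediate copy in host memory. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}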
If you have your own unspecified NIC or RAID controller, and you don't already have a GPUDirect RDMA linux device driver for it, then it's not possible to use GPUDirect. (If there is a GPUDirect RDMA driver for it, contact the manufacturer or driver provider for assistance.) If you have access to the driver source code, and are familiar with writing your own linux device drivers, you could try crafting your own GPUDirect driver. The steps involved are beyond the scope of my answer, but the starting point is documented here.
Would I need to write my own driver for the hardware?
Yes, if you don't already have a GPUDirect RDMA driver for it, one would need to be written.
Are there any examples where certain copies are avoided?
The GPUDirect RDMA MPI link gives examples and explains how GPUDirect RDMA can avoid unnecessary device<->host data copies during the transfer of data from GPU to IB adapter. In general, data can be transferred directly (over PCIE) from memory on the GPU device to memory on the IB device (or vice-versa) with no trip through host memory (GPUDirect v1 did not achieve this.)
UPDATE: NVIDIA has since announced a new GPUDirect technology called GPUDirect Storage.
