What are the possible schemes for sharing the bandwidth of the link between the Platform Controller Hub and the CPU?
How could concurrent transfers work, say, for two I/O devices that both attempt to use DMA?
Are there arbitration schemes to ensure fair use of the PCH to CPU link?
I am studying operating systems and came across device controllers.
I gathered that a device controller is hardware whereas a device driver is software.
I also know that an HDD and an SSD both have a small PCB built into them, and I assume those PCBs are the device controllers.
Now what I want to know is if there is another device controller on the PC/motherboard side of the bus/cable connecting the HDD/SSD to the OS?
Is the configuration: OS >> Device Driver >> Bus >> Device Controller >> HDD/SSD
Or is it: OS >> Device Driver >> Device Controller >> Bus >> Device Controller >> HDD/SSD
Or is it some other configuration?
Sites I visited for answers:
Tutorialspoint
JavaPoint
Idc online
Quora
Most hard disks on desktops are SATA or NVMe. eMMC is popular for smartphones, but some might use something else. These are hardware interface standards that describe how to interact electrically with those disks. They tell you what voltage, at what frequency and for how long, you need to apply (a signal) to a certain pin (a bus line) to make the device behave or react in a certain way.
Most computers are split into a few external chips. On a desktop, these are mostly SATA, NVMe, DRAM, USB, audio output, network card and graphics card. Even though there are only a few chips, the CPU would be very expensive if it had to support all those hardware interface standards on the same silicon chip. Instead, the CPU implements PCI/PCI-e as a general interface to interact with all those chips using memory-mapped registers. Each of these devices has a PCI-e controller between the device and the CPU. In the same order as above, you have AHCI, the NVMe controller, DRAM (not PCI, and its controller is in the CPU), xHCI (almost everywhere) and Intel HDA (as an example). Network cards are PCI-e and there isn't really a controller outside the card. Graphics cards are also self-standing PCI-e devices.
So, the OS detects the registers of those devices that are mapped in the address space. When the OS writes to those locations, it writes the registers of the devices. PCI-e devices can read/write DRAM directly, but this is managed by the CPU in its general implementation of the PCI-e standard, most likely by doing some bus arbitration. The CPU really doesn't care which device it is writing to. It knows that there is a PCI register there and the OS instructs it to write something there, so it does. It just happens that this device is an implementation of a standard and that the OS developers read the standard, so they write the proper values in those registers and the proper data structures in DRAM to make sure that the device knows what to do.
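As a rough illustration of the register-write model described above, here is a minimal Python simulation. The class names, register offsets, addresses and the command string are all made up for illustration; a real driver would take the register layout from the controller's specification (AHCI, NVMe, xHCI and so on).

```python
# Toy model of memory-mapped device registers. Everything here is hypothetical.

REG_CMD_ADDR = 0x00  # hypothetical register: where in "DRAM" the command lives
REG_DOORBELL = 0x04  # hypothetical register: writing 1 starts the command


class FakeDevice:
    def __init__(self, dram):
        self.dram = dram  # the device can read "DRAM" directly
        self.regs = {REG_CMD_ADDR: 0, REG_DOORBELL: 0}
        self.executed = []

    def mmio_write(self, offset, value):
        self.regs[offset] = value
        if offset == REG_DOORBELL and value == 1:
            # The device fetches the command structure from DRAM by itself.
            self.executed.append(self.dram[self.regs[REG_CMD_ADDR]])


class AddressSpace:
    """Routes a CPU store to whichever device claims that address range."""

    def __init__(self):
        self.windows = []  # list of (base, size, device)

    def map_device(self, base, size, device):
        self.windows.append((base, size, device))

    def store(self, addr, value):
        for base, size, device in self.windows:
            if base <= addr < base + size:
                device.mmio_write(addr - base, value)
                return
        raise ValueError(f"no device mapped at {addr:#x}")


def driver_submit(space, dram, device_base, command):
    """What a driver does: build a structure in DRAM, then poke registers."""
    dram[0x1000] = command                           # data structure in DRAM
    space.store(device_base + REG_CMD_ADDR, 0x1000)  # tell device where it is
    space.store(device_base + REG_DOORBELL, 1)       # ring the doorbell
```

The CPU side (`AddressSpace.store`) knows nothing about what the device does with the value; only `driver_submit` encodes knowledge of the (invented) device interface.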
Drivers implement the standard software interface of those controllers. The drivers are the ones instructing the CPU on which values to write, and they write the proper data structures in DRAM to give commands to the controllers. The user thread simply places the syscall number in a conventional register determined by the OS developers and executes an instruction that jumps into the kernel at a specific address, which the kernel chooses by writing a register at boot. Once there, the kernel looks at the register for the number and determines which driver to call based on the operation.
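The dispatch step described above can be sketched as a table lookup. This is only a toy model: the syscall numbers, the register names and the handler functions are invented, and a real kernel goes through several more layers before it reaches a driver.

```python
# Toy syscall dispatch. Numbers, register names and handlers are hypothetical.

SYS_READ = 0
SYS_WRITE = 1


def disk_read(arg):
    return f"disk driver: read {arg}"


def disk_write(arg):
    return f"disk driver: write {arg}"


# The kernel fills this table at boot; user code only knows the numbers.
SYSCALL_TABLE = {SYS_READ: disk_read, SYS_WRITE: disk_write}


def kernel_entry(registers):
    """Stands in for the address the CPU jumps to on a syscall instruction."""
    handler = SYSCALL_TABLE.get(registers["syscall_no"])
    if handler is None:
        return "ENOSYS"  # unknown syscall number
    return handler(registers["arg0"])
```

A user thread would correspond to something like `kernel_entry({"syscall_no": SYS_WRITE, "arg0": "block 3"})`.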
On Linux and some other systems, this is done with files. You call syscalls on files, and the OS has a driver attached to the file. These are called virtual files. A lot of transfer mechanisms resemble the pattern of reading/writing files, so Linux uses that to build a general driver model where the kernel doesn't even need to understand the driver. The driver just says: create a file for me there that's not really on the hard disk, and if someone opens it and calls an operation on it, then call this function in my driver. From there, the driver can do whatever it wants because it runs in kernel mode. It just creates the proper data structures in DRAM and writes the registers of the device it drives to make it do something.
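The "create me a file and call my function" idea can be sketched as follows. This is a simplified Python model, not the Linux API: `ToyKernel`, `LedDriver` and the path `/dev/toy_led` are hypothetical names chosen for the example.

```python
# Toy model of virtual files backed by driver callbacks. All names are made up.


class ToyKernel:
    def __init__(self):
        self.virtual_files = {}  # path -> driver object providing read/write

    def register_file(self, path, driver):
        # The driver asks the kernel to create a file that is not on disk.
        self.virtual_files[path] = driver

    def sys_write(self, path, data):
        # The kernel does not interpret the data; it just calls the driver.
        return self.virtual_files[path].write(data)

    def sys_read(self, path):
        return self.virtual_files[path].read()


class LedDriver:
    """Hypothetical driver for a device with a single on/off register."""

    def __init__(self):
        self.led_register = 0  # stands in for a real device register

    def write(self, data):
        self.led_register = 1 if data.strip() == "on" else 0
        return len(data)  # number of characters consumed, like write(2)

    def read(self):
        return "on" if self.led_register else "off"
```

The kernel part stays generic: adding another device only means registering another object with `read` and `write` methods under a different path.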
In my opinion, SPI and DMA are both controllers.
SPI is a communication tool and DMA can transfer data without the CPU.
System APIs such as spi_sync() or spi_async() are controlled by the CPU.
So what is the meaning of SPI with DMA? Does it mean DMA can control the SPI API without the CPU? Or does the SPI control use the CPU while the data is transferred by DMA directly?
SPI is not a tool, it is a communication protocol. Typical microcontrollers have that protocol implemented in hardware, which can be accessed by reading/writing dedicated registers in the address space of the given controller.
DMA on microcontrollers is typically designed to move the content of registers to memory and vice versa. DMA can sometimes be configured to perform a specific number of reads/writes, to increment or decrement the source and target memory addresses, and so on.
If you have a microcontroller that has SPI with DMA support, it typically means that you can have some data in memory which will be transferred to the SPI unit, sending multiple data bytes without intervention of the CPU core itself. Or it can read a number of data bytes from SPI into memory automatically, without tying up the CPU core.
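To make the division of labour concrete, here is a small Python simulation of a memory-to-SPI DMA transfer. The classes and method names are invented for illustration and do not correspond to any particular microcontroller; on real hardware the "configure" step is a handful of register writes described in the data sheet.

```python
# Toy simulation: the CPU only configures the DMA channel, then the DMA
# feeds the SPI data register byte by byte. All names are hypothetical.


class SpiUnit:
    def __init__(self):
        self.wire = []  # bytes that were shifted out on the bus

    def write_data_register(self, byte):
        self.wire.append(byte)


class DmaChannel:
    def __init__(self, memory, spi):
        self.memory = memory
        self.spi = spi
        self.src = 0
        self.count = 0
        self.done = False

    def configure(self, src, count):
        # This is the only part the CPU does: set source address and length.
        self.src = src
        self.count = count
        self.done = False

    def run(self):
        # On real hardware this happens without the CPU core being involved.
        for i in range(self.count):
            self.spi.write_data_register(self.memory[self.src + i])
        self.done = True  # a real DMA would typically raise an interrupt here
```

In this model the CPU's work is independent of the transfer length: it calls `configure` once and is notified once, while the per-byte copying is done by the DMA channel.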
How such DMA SPI transfers are configured is described in the data sheets of the controllers. There is a very wide range of types, so no specific information can be given here without knowing the microcontroller type.
The Linux APIs for dealing with SPI abstract the access to DMA and SPI by using the microcontroller-specific implementations in the drivers.
It is quite unclear whether you want to use the API to access your SPI or whether you want to implement a device driver to make the Linux API work on your specific controller.
It is not possible to give you a general introduction to writing a kernel driver here or to clarify your data sheets register by register. If you need further information, you have to make your question much more specific!
I'm currently doing some research about ARM's TrustZone, e.g. here: ARM information center. As far as I understand, with TrustZone a secure environment based on the AMBA AXI bus can be created.
On ARM website it says: "This concept of secure and non-secure worlds extends beyond the processor to encompass memory, software, bus transactions, interrupts and peripherals within an SoC." I read that peripherals can be connected to TrustZone via the NonSecure-bit of the AMBA AXI bus (The extra signal is used to differentiate between trusted and non-trusted requests).
1) What, except the extra pin of AMBA AXI bus, is the TrustZone specific hardware in a SoC with TrustZone?
2) Is it possible to connect an external non-volatile memory (e.g. Flash) or a partition of it to TrustZone with access to secure world (via external memory interface and -then internal- the AXI bus)? If no, how are secrets (as keys) stored to be used in the secure world (with help of fuses??)? If yes, how is it prevented that a Flash including malicious code is connected?
3) Is it possible to implement code to the secure world as a customer of a chip vendor (e.g. TI or NXP), either before or after the chip left the factory?
Thank you for your answers.
TrustZone is a set of standards released by ARM. It gives OEMs (embedded software programmers) and SOC vendors some tools to make a secure solution. These have different needs depending on what needs to be secured, so each SOC will be different. Some SOC manufacturers will try to compete on the same security application, but they will still differentiate.
1) What, except the extra pin of AMBA AXI bus, is the TrustZone specific hardware in a SoC with TrustZone?
Anything that the vendor wants. The GIC (ARMv7-A) interrupt controller, the L1 and L2 controllers, and the MMU are all TrustZone-aware peripherals in most Cortex-A CPUs. These are designed by ARM and implemented in the SOC. In addition, there are various memory partitioning/exclusion devices which can be placed in between a peripheral and the SOC. Examples are the NIC301 and various proprietary bus interconnect technologies.
Other hardware may include physical tampers, voltage and temperature monitoring, clock monitoring and cryptography accelerators.
2) Is it possible to connect an external non-volatile memory (e.g. Flash) or a partition of it to TrustZone with access to secure world (via external memory interface and -then internal- the AXI bus)? If no, how are secrets (as keys) stored to be used in the secure world (with help of fuses??)? If yes, how is it prevented that a Flash including malicious code is connected?
As the above alludes, chips like the NIC301 can physically partition AXI peripherals. See the image below. Part of any TrustZone solution is some secure boot mechanism. All CPUs will boot in the secure world. The secure boot mechanism may vary. For instance, a one-time programmable ROM might be appropriate for some applications. Many have programmable fuses with a public/private key mechanism implemented in SOC ROM. The SOC ROM boot software will verify that the image in flash is properly signed by whoever burned the one-time fuses.
This OEM image can set up many TrustZone peripherals, most of which will have a lock bit. Once it is set, registers in the peripherals cannot be changed until the next hard boot.
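The lock-bit behaviour described above can be modelled in a few lines. This is a sketch of the general idea only; the class name and the example value are hypothetical, and real peripherals differ in which registers a lock bit covers.

```python
# Toy model of a configuration register protected by a lock bit.


class LockableRegister:
    def __init__(self):
        self.value = 0
        self.locked = False

    def write(self, value):
        if self.locked:
            return False  # write is ignored once the lock bit is set
        self.value = value
        return True

    def lock(self):
        self.locked = True  # there is deliberately no unlock operation

    def hard_reset(self):
        # Only a hard boot clears the lock (and the configuration with it).
        self.value = 0
        self.locked = False
```

The secure boot image would write its partitioning configuration and then call `lock()`, so that later software, even if compromised, cannot rewrite it before the next hard boot.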
3) Is it possible to implement code to the secure world as a customer of a chip vendor (e.g. TI or NXP), either before or after the chip left the factory?
Yes, this is the secure boot mechanism. The ARM TrustZone documents do not specify how the code is to be secured. If you manufacture the chip yourself, an on-chip ROM with a mesh layer protecting it may be sufficient for secure boot. However, TI and NXP will implement a public/private key mechanism and verify that only software signed by an OEM can be loaded. This OEM software can have bugs (and possibly so can the ROM loader by the SOC vendor), but at least it is possible to create a secure boot chain.
With public key, even complete access to the chip will only allow an attacker to load previously released software from the OEM. Some solutions may have revocation mechanisms as well to prevent previously released software from being used.
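A minimal sketch of the signed-image check, assuming the fuses hold a hash of the OEM public key (one common arrangement, but not something the TrustZone documents mandate). It uses textbook RSA with a small modulus and no padding purely so the example is self-contained; this is not secure and is not how any vendor's ROM actually does it. All names are hypothetical.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Illustration only, NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the OEM


def key_fingerprint(public_key):
    n, e = public_key
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()


# Burned once at manufacturing time: identifies the only accepted public key.
FUSES = key_fingerprint((N, E))


def image_digest(image, n):
    return int.from_bytes(hashlib.sha256(image).digest(), "big") % n


def oem_sign(image):
    """Done offline by the OEM with the private key."""
    return pow(image_digest(image, N), D, N)


def rom_verify(image, signature, public_key, fuses=FUSES):
    """What the boot ROM would do before jumping to the image."""
    if key_fingerprint(public_key) != fuses:
        return False  # image ships with a key the fuses do not recognise
    n, e = public_key
    return pow(signature, e, n) == image_digest(image, n)
```

An attacker who swaps the flash can supply any image and any key, but without the private exponent they cannot produce a signature that passes `rom_verify` for a new image; they can only replay an image the OEM already signed, which is why revocation mechanisms matter.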
See: trust-zone
Typical ARM bus
ARM partition checker
Handling ARM TrustZone
I am writing a Linux device driver which supports multiple devices. I have a x8 PCIe card with 4 of these devices on it. Each runs through a PCIe switch and gets 2 PCIe lanes. Is there a way to have the driver write to multiple lanes at the same time? If so, how would I do this? I would think it should be possible since it is all on one PCIe slot, but I have no idea how this would be done from the driver.
PCIe doesn't work quite the way you think it does. The switch is not partitioning up the upstream x8 link into multiple x2 links -- it simply forwards traffic from one link to another. So what you will see is a x8 link to the switch, and then 4 x2 links from the switch to the downstream devices. However, with a different switch and different downstream devices, it would be equally possible to have, for example, x8 links everywhere, i.e. a x8 link from the root port to the switch and x8 links from the switch to the downstream devices.
However, in your case you have a matched amount of bandwidth on both sides of the switch, so there should be no issue with devices competing for a limited amount of bandwidth. Your driver can talk to all the devices simultaneously as efficiently as if there were independent links.
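The "matched bandwidth" argument is just a lane count, which can be checked with a one-line helper. This sketch assumes every link runs at the same PCIe generation, so that bandwidth is proportional to lane count; the function name is made up.

```python
def is_oversubscribed(upstream_lanes, downstream_lanes):
    """True if the downstream links together could demand more bandwidth
    than the upstream link provides (all links assumed same speed per lane)."""
    return sum(downstream_lanes) > upstream_lanes


# The card from the question: x8 upstream, four x2 devices behind the switch.
card_from_question = is_oversubscribed(8, [2, 2, 2, 2])

# The alternative topology mentioned above: x8 links everywhere.
x8_everywhere = is_oversubscribed(8, [8, 8, 8, 8])
```

Only in the second topology could the devices compete for upstream bandwidth when they are all busy at once.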
It sounds like you're looking for PCIe multicast. This has no connection to the number of lanes, but would simply be a function of delivering a single write to multiple destinations as efficiently as possible. There is a standard for this, mostly intended for backplane uses, see: http://www.pcisig.com/developers/main/training_materials/get_document?doc_id=12f5c260ccf5e054366d4c96ee655fa6827db5b3
It looks like this is supported with a new PCI BAR type, where multiple devices would have the same mapped physical address range, and the switch would also be configured to know about this multicast range. But this all needs OS support, and I haven't found anything on the web to suggest that Linux has the pieces necessary to configure the devices to do all this.
Since your parent link has enough bandwidth to saturate all four child links, you don't have a throughput problem. The only thing you'd save with multicast is bandwidth from the memory subsystem. If you have a modern architecture, the amount you'd save would be in the noise.
In other words, don't worry about it. Treat your devices as independent (this will make for a cleaner driver, anyway) and get on with your project.
I need to transfer video data to and from an FPGA device over PCI in a Linux environment. I'm using a third-party PCI master core on the FPGA. So far, I've implemented a simple DMA controller on the FPGA to transfer data from the FPGA to the CPU, using consecutive PCI write bursts.
Next, I need to transfer video data from the CPU to the FPGA. What is the best way to go about this?
Should I implement a module on the FPGA which performs a whole bunch of burst reads over PCI? Or is there a way to get the CPU to efficiently write data into the FPGA's memory using PCI write bursts?
My bandwidth requirements are around 30 MB/s in both directions.
Thanks.
You could do posted writes from the CPU, as video card drivers do, but you'll need some driver magic such as setting MTRRs (which means you might have some architectural dependency). If you want to be safe, a DMA read initiated by the FPGA is the better way to go. 30 MB/s isn't much.
Sounds to me the FPGA should master both reads and writes. Otherwise you would hog the host CPU. That's a classic task for a DMA (and you cannot guarantee a DMA exists on every host).