cma-reserved region appears to be 0 KB even though I added a 'linux,cma' node in the device tree

I wanted to test the CMA allocator in Linux (5.15.68),
so I added a linux,cma node under the /reserved-memory node like this:
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    axpu_reserved_mem: axpursvd@90000000 {
        no-map;
        reg = <0x0 0xc0000000 0x0 0x8000000>;
    };
    linux,cma {
        compatible = "shared-dma-pool";
        reusable;
        size = <0 0x30000000>;
        alloc-ranges = <0 0x90000000 0 0x30000000>;
        linux,cma-default;
    };
};
BTW, this test was done on a QEMU arm64 machine with only 1 GB of RAM (0x80000000 to 0xbfffffff) in the virtual machine. Note that I'm assigning 3/4 of the RAM to the CMA region and 1/8 to a device driver (just for testing).
When I boot the machine, I see this message:
Memory: 1020140K/1048576K available (3200K kernel code, 386K rwdata, 808K rodata, 7808K init, 106K bss, 28436K reserved, 0K cma-reserved)
Why is the CMA-reserved area 0 KB?
These are the config options I added for the CMA test:
CONFIG_MEMORY_ISOLATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
CONFIG_CMA_DEBUGFS=y
CONFIG_CMA_SYSFS=y
CONFIG_CMA_AREAS=7
# CONFIG_DMA_CMA is not set
I tried adding 'cma=768MB' to the boot args and changing CONFIG_CMA_AREAS to 1, but the result is the same.
What am I missing?

I found that with CONFIG_DMA_CMA=y, the CMA area is reserved.
When I add 'cma=768M' to the boot args, it takes precedence (I don't know where the kernel places the CMA region in that case).
But when there is no 'cma=768M' in the boot args, the device tree information is used and the CMA area is placed from 0x90000000 as I wanted.
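As a sanity check on the numbers in the question's node, the two-cell values (with #address-cells and #size-cells both 2) can be decoded as follows. This is only a minimal sketch in plain user-space C; dt_read_cells and bytes_to_mib are hypothetical helper names made up for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine `ncells` 32-bit device tree cells (most significant cell first)
 * into one 64-bit value. This sketch only handles 1 or 2 cells.
 * Hypothetical helper, not a kernel function.
 */
uint64_t dt_read_cells(const uint32_t *cells, int ncells)
{
    uint64_t value = 0;

    assert(cells != 0 && (ncells == 1 || ncells == 2));
    for (int i = 0; i < ncells; i++)
        value = (value << 32) | cells[i];
    return value;
}

/* Convert a byte count to MiB, rounding down. */
uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

With these helpers, size = <0 0x30000000> decodes to 0x30000000 bytes, i.e. 768 MiB, which matches the 'cma=768M' boot argument, and the last two cells of reg = <0x0 0xc0000000 0x0 0x8000000> decode to 128 MiB.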

Related

u-boot hard fault error after ram initialization

I have ported U-Boot to my Waveshare CoreH7 STM32H743 board, using the stm32h743-disco files and device trees as a template. My onboard SDRAM is an IS42S16400J, which is 8 MB. I have calculated the parameters of my SDRAM and put them in my board device tree file as shown below:
/*
 * Memory configuration from sdram datasheet IS42S32800G-6BLI
 * first bank is bank@0
 * second bank is bank@1
 */
bank2: bank@1 {
    st,sdram-control = /bits/ 8 <NO_COL_8
                                 NO_ROW_12
                                 MWIDTH_16
                                 BANKS_4
                                 CAS_3
                                 SDCLK_2
                                 RD_BURST_EN
                                 RD_PIPE_DL_0>;
    st,sdram-timing = /bits/ 8 <TMRD_1
                                TXSR_1
                                TRAS_1
                                TRC_6
                                TRP_2
                                TWR_1
                                TRCD_1>;
    st,sdram-refcount = <300>;
};
I have also configured the RCC values to feed the DRAM with 100 MHz.
But when U-Boot starts initialization, it ends up in the hard fault handler.
This is the log:
lib/fdtdec.c:fdtdec_setup_mem_size_base_fdt() fdtdec_setup_mem_size_base_fdt: Initial DRAM size 2000000
include/initcall.h:initcall_run_list() initcall: 08008a89
common/board_f.c:setup_dest_addr() Monitor len: 00039F80
common/board_f.c:setup_dest_addr() Ram size: 02000000
common/board_f.c:setup_dest_addr() Ram top: D2000000
include/initcall.h:initcall_run_list() initcall: 08008665
include/initcall.h:initcall_run_list() initcall: 0800117d
arch/arm/lib/cache.c:arm_reserve_mmu() TLB table from d1ff0000 to d1ff4000
include/initcall.h:initcall_run_list() initcall: 080088c3
include/initcall.h:initcall_run_list() initcall: 080088c7
include/initcall.h:initcall_run_list() initcall: 080086b1
common/board_f.c:reserve_uboot() Reserving 231k for U-Boot at: d1fb6000
include/initcall.h:initcall_run_list() initcall: 080088ed
common/board_f.c:reserve_malloc() Reserving 1032k for malloc() at: d1eb4000
include/initcall.h:initcall_run_list() initcall: 08008821
Hard fault
pc : 0800087e lr : 00000000 xPSR : 21000000
r12 : d1eb3ff0 r3 : 00000000 r2 : 00000010
r1 : 00000000 r0 : d1eb3fb0
Resetting CPU ...
What is the problem? Did the RAM initialization fail, and if so, why? Wrong parameters, maybe? How can I tell whether the RAM was initialized successfully?
This is the normal info log of U-Boot:
U-Boot 2020.07-00610-g610e1487c8-dirty (Aug 04 2020 - 00:34:13 +0430)
Model: Waveshare STM32H743i-Coreh7 board
DRAM: Hard fault
pc : 0800087e lr : 00000000 xPSR : 21000000
r12 : d1eb3ff0 r3 : 00000000 r2 : 00000010
r1 : 00000000 r0 : d1eb3fb0
Resetting CPU ...
Your RAM size is wrong. As you mentioned, the actual RAM size is 8 MB, which is 0x800000 in hexadecimal (8 * 1024 * 1024), but the U-Boot log shows "Initial DRAM size 2000000", i.e. 0x2000000 or 32 MB. You should change it in the device tree of your board:
memory {
    device_type = "memory";
    reg = <0xd0000000 0x800000>;
};
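The size arithmetic can be double-checked with a few lines of C. This is just a sketch; mib_to_bytes and ram_top are hypothetical helper names, and ram_top only mirrors the "Ram top" value that the U-Boot log prints (base plus size), not U-Boot's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size in MiB to bytes (32-bit, so it must stay below 4096 MiB). */
uint32_t mib_to_bytes(uint32_t mib)
{
    assert(mib < 4096);
    return mib << 20;
}

/* First address past the end of a RAM bank that starts at `base`. */
uint32_t ram_top(uint32_t base, uint32_t size)
{
    assert(size <= UINT32_MAX - base);
    return base + size;
}
```

Here mib_to_bytes(8) gives 0x800000 and mib_to_bytes(32) gives 0x2000000. With the 32 MB size from the log, ram_top(0xd0000000, 0x2000000) is 0xd2000000, which matches "Ram top: D2000000", while an 8 MB bank ends at 0xd0800000. The stack and malloc areas in the log (around 0xd1eb4000) therefore lie beyond the end of the real SDRAM.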

#interrupt-cells is 2 but interrupts is a 3-tuple

I was looking at the device tree for the Beagle Bone Black and started with am57xx-beagle-x15.dts. Drilling down into dra7.dtsi I found gpio1:
gpio1: gpio@4ae10000 {
    compatible = "ti,omap4-gpio";
    reg = <0x4ae10000 0x200>;
    interrupts = <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>;
    ti,hwmods = "gpio1";
    gpio-controller;
    #gpio-cells = <2>;
    interrupt-controller;
    #interrupt-cells = <2>;
};
I had read that #interrupt-cells gives the number of u32s, or cells, that one would expect in an item of the interrupts list. But when I look at interrupts I see a 3-tuple: <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>. Why does this contain 3 cells and not 2?
It's a very late answer, but I'm adding one in case it helps someone.
I can't find the exact dts file in my current Linux 5.15 source, but the interrupts property of a node is interpreted by its interrupt parent, so the interrupt parent of this node must require 3 cells. The #interrupt-cells = <2> inside gpio1 only applies to devices that use gpio1 as their interrupt controller. For the GIC, interrupts normally takes 3 values: {interrupt type, interrupt number, flags}. This is described in the GIC device binding document (Documentation/devicetree/bindings/interrupt-controller/arm,gic.yaml in Linux 5.15):
"#interrupt-cells":
  const: 3
  description: |
    The 1st cell is the interrupt type; 0 for SPI interrupts, 1 for PPI
    interrupts.
    The 2nd cell contains the interrupt number for the interrupt type.
    SPI interrupts are in the range [0-987]. PPI interrupts are in the
    range [0-15].
    The 3rd cell is the flags, encoded as follows:
      bits[3:0] trigger type and level flags.
        1 = low-to-high edge triggered
        2 = high-to-low edge triggered (invalid for SPIs)
        4 = active high level-sensitive
        8 = active low level-sensitive (invalid for SPIs).
      bits[15:8] PPI interrupt cpu mask. Each bit corresponds to each of
      the 8 possible cpus attached to the GIC. A bit set to '1' indicated
      the interrupt is wired to that CPU. Only valid for PPI interrupts.
      Also note that the configurability of PPI interrupts is IMPLEMENTATION
      DEFINED and as such not guaranteed to be present (most SoC available
      in 2014 seem to ignore the setting of this flag and use the hardware
      default value).
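To make the three cells concrete, here is a small user-space C sketch that decodes <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH> according to the binding text above. The struct and function names are hypothetical; the constant values (GIC_SPI = 0, GIC_PPI = 1, IRQ_TYPE_LEVEL_HIGH = 4) are the ones from the kernel's dt-bindings headers, and the offsets of 32 for SPIs and 16 for PPIs reflect the usual GIC interrupt ID layout.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the kernel's dt-bindings headers. */
#define GIC_SPI             0
#define GIC_PPI             1
#define IRQ_TYPE_LEVEL_HIGH 4

/* One 3-cell GIC interrupt specifier (hypothetical struct for illustration). */
struct gic_irq_spec {
    uint32_t type;   /* 1st cell: 0 = SPI, 1 = PPI */
    uint32_t number; /* 2nd cell: interrupt number within that type */
    uint32_t flags;  /* 3rd cell: trigger flags, plus PPI cpu mask */
};

/* GIC interrupt ID: SGIs use 0-15, PPIs start at 16, SPIs start at 32. */
uint32_t gic_hwirq(const struct gic_irq_spec *spec)
{
    assert(spec->type == GIC_SPI || spec->type == GIC_PPI);
    return spec->number + (spec->type == GIC_SPI ? 32 : 16);
}

/* Trigger type and level flags live in bits[3:0] of the 3rd cell. */
uint32_t gic_trigger(const struct gic_irq_spec *spec)
{
    return spec->flags & 0xf;
}
```

For gpio1, the specifier {GIC_SPI, 24, IRQ_TYPE_LEVEL_HIGH} decodes to GIC interrupt ID 56 with trigger value 4 (active high level-sensitive).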

How to reserve Linux's memory for a single driver

I am running a secure OS in the ARM TrustZone, alongside Linux, which is running in the "normal" world. Since some part of the RAM is protected by hardware from access by the normal world (using a TZASC), I want to prevent Linux from trying to access it, otherwise it will crash. I also need a shared buffer between the two worlds, allocated by Linux. For this purpose, I want to reserve two memory regions, using this document.
I have two kinds of areas:
- the first one is a static one and I don't want to ever touch it
- the second one is a dynamic one and I only want a single driver to be able to modify its content.
Here is an extract of my device tree:
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    /* global autoconfigured region for contiguous allocations */
    linux,cma {
        compatible = "shared-dma-pool";
        reusable;
        reg = <0 0xa0000000 0 0x14000000>;
        linux,cma-default;
    };
    /* First range, static, that I don't want anyone to touch */
    reserved_static: mymem@0xc0000000 {
        compatible = "mymem,reserved-memory";
        reg = <0 0xc0000000 0 0x08000000>;
        no-map;
    };
    /* Second range, limited to a single driver */
    reserved_dynamic: shared {
        compatible = "mymem,memory-shared";
        size = <0 0x08000000>;
        alignment = <0 0x200000>;
        alloc-ranges = <0 0xc8000000 0 0x38000000>;
    };
};
mydev {
    compatible = "mydev,mydev";
    memory-region = <&reserved_dynamic>;
};
I can access the device tree information from my driver code, and I would like to prevent any access to the static memory range (I think I got that one covered with the no-map attribute), and also that only my driver is able to access the dynamic one.
Is this possible, and how can I achieve it?

Understanding the Device Tree mechanism

I was reading the Device Tree Usage page and reached the section describing the ranges property of a node.
external-bus {
    #address-cells = <2>;
    #size-cells = <1>;
    ranges = <0 0 0x10100000 0x10000     // Chipselect 1, Ethernet
              1 0 0x10160000 0x10000     // Chipselect 2, i2c controller
              2 0 0x30000000 0x1000000>; // Chipselect 3, NOR Flash
    ethernet@0,0 {
        compatible = "smc,smc91c111";
        reg = <0 0 0x1000>;
        interrupts = < 5 2 >;
    };
    i2c@1,0 {
        compatible = "acme,a1234-i2c-bus";
        #address-cells = <1>;
        #size-cells = <0>;
        reg = <1 0 0x1000>;
        interrupts = < 6 2 >;
        rtc@58 {
            compatible = "maxim,ds1338";
            reg = <58>;
            interrupts = < 7 3 >;
        };
    };
    flash@2,0 {
        compatible = "samsung,k8f1315ebm", "cfi-flash";
        reg = <2 0 0x4000000>;
    };
};
What is the difference between ranges and reg?
What are the dimensions of ranges, and how does the parser figure out what is written in it?
One part I haven't understood yet: can't we include .h files instead of hard-coding values in the .dts file?
The "ranges" property maps one or more address ranges in the current node, the "external-bus" node (the first two numbers of each entry: chip select and offset), to addresses in the parent node's (probably the CPU's) address space (the third number). The fourth number is the length of the range. Buses can have their own idea of addresses on their external side to which peripherals are attached, so the drivers that manage the peripherals on the bus need to know these ranges in order to read from or write to the devices.
The "reg" property indicates the address at which a device resides in the address range of the node (the "external-bus" in this case) in which the device is defined. So in this case, flash@2,0 resides at address 0 of chip select 2 in the external bus range, and extends to address 0x04000000. This corresponds to address range 0x30000000 to 0x34000000 in the parent (CPU) address space.
I am assuming that the length specifier of the third range, "2 0 0x30000000 0x1000000>; // Chipselect 3, NOR Flash", should in fact be 0x04000000 rather than 0x1000000.
For this particular example, Jonathan Ben-Avraham's explanation is correct. But it's good to understand the detailed structure of the ranges property in the device tree.
ranges is a list of address translations.
Each entry in the ranges table is a tuple containing the child
address, the parent address, and the size of the region in the child
address space.
For example:
ranges = < Child1Address ParentAddressForChild1 sizeofchild1
           Child2Address ParentAddressForChild2 sizeofchild2
           Child3Address ParentAddressForChild3 sizeofchild3
         >;
The size of each field is determined as follows:
For the child address size, check the #address-cells value of the child node.
For the parent address size, check the #address-cells value of its parent node.
For the length of the size field, check the #size-cells value of the child node.
Example 1: the one from the question
#address-cells = <1>;
#size-cells = <1>;
external-bus {
    #address-cells = <2>;
    #size-cells = <1>;
    ranges = <0 0 0x10100000 0x10000     // Chipselect 1, Ethernet
              1 0 0x10160000 0x10000     // Chipselect 2, i2c controller
              2 0 0x30000000 0x1000000>; // Chipselect 3, NOR Flash
Let's decode the first entry.
The child #address-cells is 2, so the first two cells give the child address. (This address is specific to local child addressing only.
How to decode these 2 cells further is device specific; the device binding should document it.)
The parent #address-cells is 1, so the next cell is the parent address for that child.
The child #size-cells is 1, so the next cell is the size of the child's range (with respect to the parent address).
Example 2: PCI device entry
#address-cells = <1>;
#size-cells = <1>;
pci@0x10180000 {
    compatible = "arm,versatile-pci-hostbridge", "pci";
    reg = <0x10180000 0x1000>;
    interrupts = <8 0>;
    bus-range = <0 0>;
    #address-cells = <3>;
    #size-cells = <2>;
    ranges = <0x42000000 0 0x80000000 0x80000000 0 0x20000000
              0x02000000 0 0xa0000000 0xa0000000 0 0x10000000
              0x01000000 0 0x00000000 0xb0000000 0 0x01000000>;
Here:
0x42000000 0 0x80000000 is the address of child 1. How to decode these 3 cells is described in the PCI binding documentation.
0x80000000 is the parent address. The parent node is the CPU, so the CPU uses this address to talk to this device.
0 0x20000000 is the size of this device in the parent address space (512 MB of address space).
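The translation described in Example 1 can be sketched in a few lines of user-space C. This is only an illustration under the assumption that the child address is (chip select, offset), as in the external-bus example; struct bus_range and translate_bus_addr are hypothetical names, and the table holds the three entries from the question as written there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One "ranges" entry of external-bus: 2 child address cells
 * (chip select, offset), 1 parent address cell, 1 size cell. */
struct bus_range {
    uint32_t chip_select;
    uint32_t child_offset;
    uint32_t parent_base;
    uint32_t size;
};

/* The three entries from the question's ranges property. */
const struct bus_range external_bus_ranges[] = {
    { 0, 0, 0x10100000, 0x10000 },   /* Chipselect 1, Ethernet */
    { 1, 0, 0x10160000, 0x10000 },   /* Chipselect 2, i2c controller */
    { 2, 0, 0x30000000, 0x1000000 }, /* Chipselect 3, NOR Flash */
};

/* Translate a child (chip select, offset) address to a parent address.
 * Returns 0 and stores the result on success, -1 if no entry covers it. */
int translate_bus_addr(const struct bus_range *ranges, size_t count,
                       uint32_t chip_select, uint32_t offset,
                       uint32_t *parent_addr)
{
    assert(ranges != NULL && parent_addr != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct bus_range *r = &ranges[i];

        if (r->chip_select != chip_select)
            continue;
        if (offset < r->child_offset || offset - r->child_offset >= r->size)
            continue;
        *parent_addr = r->parent_base + (offset - r->child_offset);
        return 0;
    }
    return -1;
}
```

With this table, the flash at child address <2 0> translates to 0x30000000 in the parent address space, and an offset of 0x10 on chip select 1 translates to 0x10160010. An offset at or beyond the range size (for example 0x10000 on chip select 0) is not covered by any entry and is rejected.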

Linux device driver to allow an FPGA to DMA directly to CPU RAM

I'm writing a linux device driver to allow an FPGA (currently connected to the PC via PCI express) to DMA data directly into CPU RAM. This needs to happen without any interaction and user space needs to have access to the data. Some details:
- Running 64 bit Fedora 14
- System has 8GB of RAM
- The FPGA (Cyclone IV) is on a PCIe card
In an attempt to accomplish this I performed the following:
- Reserved the upper 2GB of RAM in grub with memmap 6GB$2GB (will not boot if I add mem=2GB). I can see that the upper 2GB of RAM is reserved in /proc/meminfo
- Mapped BAR0 to allow reading and writing to FPGA registers (this works perfectly)
- Implemented an mmap function in my driver with remap_pfn_range()
- Used ioremap to get the virtual address of the buffer
- Added ioctl calls (for testing) to write data to the buffer
- Tested the mmap by making an ioctl call to write data into the buffer and verified the data was in the buffer from user space
The problem I'm facing is when the FPGA starts to DMA data to the buffer address I provide. I constantly get PTE errors (from DMAR:) or with the code below I get the following error:
DMAR: [DMA Write] Request device [01:00.0] fault addr 186dc5000
DMAR: [fault reason 01] Present bit in root entry is clear
DRHD: handling fault status reg 3
The address in the first line increments by 0x1000 each time, based on the DMA from the FPGA.
Here's my init() code:
#define IMG_BUF_OFFSET 0x180000000UL // Location in RAM (6GB)
#define IMG_BUF_SIZE 0x80000000UL // Size of the Buffer (2GB)
#define pci_dma_h(addr) ((addr >> 16) >> 16)
#define pci_dma_l(addr) (addr & 0xffffffffUL)
if((pdev = pci_get_device(FPGA_VEN_ID, FPGA_DEV_ID, NULL)))
{
    printk("FPGA Found on the PCIe Bus\n");
    // Enable the device
    if(pci_enable_device(pdev))
    {
        printk("Failed to enable PCI device\n");
        return(-1);
    }
    // Enable bus master
    pci_set_master(pdev);
    pci_read_config_word(pdev, PCI_VENDOR_ID, &id);
    printk("Vendor id: %x\n", id);
    pci_read_config_word(pdev, PCI_DEVICE_ID, &id);
    printk("Device id: %x\n", id);
    pci_read_config_word(pdev, PCI_STATUS, &id);
    printk("Device Status: %x\n", id);
    pci_read_config_dword(pdev, PCI_COMMAND, &temp);
    printk("Command Register : : %x\n", temp);
    printk("Resources Allocated :\n");
    pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &temp);
    printk("BAR0 : %x\n", temp);
    // Get the starting address of BAR0
    bar0_ptr = (unsigned int*)pcim_iomap(pdev, 0, FPGA_CONFIG_SIZE);
    if(!bar0_ptr)
    {
        printk("Error mapping Bar0\n");
        return -1;
    }
    printk("Remapped BAR0\n");
    // Set DMA Masking
    if(!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)))
    {
        pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
        printk("Device setup for 64bit DMA\n");
    }
    else if(!pci_set_dma_mask(pdev, DMA_BIT_MASK(32)))
    {
        pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
        printk("Device setup for 32bit DMA\n");
    }
    else
    {
        printk(KERN_WARNING"No suitable DMA available.\n");
        return -1;
    }
    // Get a pointer to reserved lower RAM in kernel address space (virtual address)
    virt_addr = ioremap(IMG_BUF_OFFSET, IMG_BUF_SIZE);
    kernel_image_buffer_ptr = (unsigned char*)virt_addr;
    memset(kernel_image_buffer_ptr, 0, IMG_BUF_SIZE);
    printk("Remapped image buffer: 0x%p\n", (void*)virt_addr);
}
Here's my mmap code:
unsigned long image_buffer;
unsigned int low;
unsigned int high;
if(remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                   vma->vm_end - vma->vm_start,
                   vma->vm_page_prot))
{
    return(-EAGAIN);
}
image_buffer = (vma->vm_pgoff << PAGE_SHIFT);
if(0 > check_mem_region(IMG_BUF_OFFSET, IMG_BUF_SIZE))
{
    printk("Failed to check region...memory in use\n");
    return -1;
}
request_mem_region(IMG_BUF_OFFSET, IMG_BUF_SIZE, DRV_NAME);
// Get the bus address from the virtual address above
//dma_page = virt_to_page(addr);
//dma_offset = ((unsigned long)addr & ~PAGE_MASK);
//dma_addr = pci_map_page(pdev, dma_page, dma_offset, IMG_BUF_SIZE, PCI_DMA_FROMDEVICE);
//dma_addr = pci_map_single(pdev, image_buffer, IMG_BUF_SIZE, PCI_DMA_FROMDEVICE);
//dma_addr = IMG_BUF_OFFSET;
//printk("DMA Address: 0x%p\n", (void*)dma_addr);
// Write start or image buffer address to the FPGA
low = pci_dma_l(image_buffer);
low &= 0xfffffffc;
high = pci_dma_h(image_buffer);
if(high != 0)
    low |= 0x00000001;
*(bar0_ptr + (17024/4)) = 0;
//printk("DMA Address LOW : 0x%x\n", cpu_to_le32(low));
//printk("DMA Address HIGH: 0x%x\n", cpu_to_le32(high));
*(bar0_ptr + (4096/4)) = cpu_to_le32(low); //2147483649;
*(bar0_ptr + (4100/4)) = cpu_to_le32(high);
*(bar0_ptr + (17052/4)) = cpu_to_le32(low & 0xfffffffe);//2147483648;
printk("Process Read Command: Addr:0x%x Ret:0x%x\n", 4096, *(bar0_ptr + (4096/4)));
printk("Process Read Command: Addr:0x%x Ret:0x%x\n", 4100, *(bar0_ptr + (4100/4)));
printk("Process Read Command: Addr:0x%x Ret:0x%x\n", 17052, *(bar0_ptr + (17052/4)));
return(0);
Thank you for any help you can provide.
Do you control the RTL code that writes the TLP packets yourself, or can you name the DMA engine and PCIe BFM (bus functional model) you are using? What do your packets look like in the simulator? Most decent BFM should trap this rather than let you find it post-deploy with a PCIe hardware capture system.
To target the upper 2GB of RAM you will need to be sending 2DW (64-bit) addresses from the device. Are the bits in your Fmt/Type set to do this? The faulting address looks like a masked 32-bit bus address, so something at this level is likely incorrect. Also bear in mind that, because PCIe is big-endian, you need to take care when writing the target addresses to the PCIe device endpoint. You might have the lower bytes of the target address dropping into the payload if Fmt is incorrect; again, a decent BFM should spot the resulting packet length mismatch.
If you have a recent motherboard and a modern CPU, the PCIe endpoint should support PCIe AER (advanced error reporting), so if you are running a recent CentOS/RHEL 6.3 you should get a dmesg report of endpoint faults. This is very useful, as the report captures the first handful of DWs of the packet in special capture registers, so you can review the TLP as received.
In your kernel driver, I see you set up the DMA mask, but that is not sufficient, as you have not programmed the mmu to allow writes to the pages from the device. Look at the implementation of pci_alloc_consistent() to see what else you should be calling to achieve this.
If you are still looking for a reason, it goes like this:
Your kernel has the DMA_REMAPPING option enabled by default, so the IOMMU throws the above error because the IOMMU context/domain entries are not programmed for your device.
You can try passing intel_iommu=off on the kernel command line, or putting the IOMMU in bypass mode for your device.
Regards,
Samir
