I want to make a kernel module that read the DRAM counters to get the number of data read from DRAM (https://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel).
In that page, they say
"The BAR is available (in PCI configuration space) at Bus 0; Device 0; Function 0; Offset 048H", and UNC_IMC_DRAM_DATA_READS, which I want to read, is on "BAR + 0x5050".
Does it mean that I can get the physical address of DRAM Counter by typing
sudo setpci 00:00:0 48.L
and then + 0x5050 to get the address where the UNC_IMC_DRAM_DATA_READS?
Actually,
sudo setpci 00:00:0 48.L
outputs
fed10001
, and I accessed 0xfed15051 with busybox.
sudo busybox devmem 0xfed15051
However, the two leftmost bit, I mean "00" in 0x00123456, are always zero.
What was wrong, and how can I get the physical address correctly with Bus, Device, Function, and Offset.
Thank you :)
The low bit is an enable bit and should be excluded from the address you use. See for example https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v6-vol-2-datasheet.pdf (section 3.12 page # 57) -- where it's documented as the MCHBAREN flag.
This document also provides detailed register descriptions of the same registers mentioned in that tech note -- starting at section 7.43 page # 202.
In general, accesses to PCI registers are pretty much always done on 32-bit (DWORD) boundaries. You'll almost never find a counter that overlaps 32-bit words.
Related
in the book 2017 " UNIX and Linux System Administration " i've read the article below :
Modern systems manage their device files automatically. However, a few rare corner
cases may still require you to create devices manually with the mknod command.
So here’s how to do it:
mknod filename type major minor
Here, filename is the device file to be created, type is c for a character device or b
for a block device, and major and minor are the major and minor device numbers.
If you are creating a device file that refers to a driver that’s already present in your
kernel, check the documentation for the driver to find the appropriate major and
minor device numbers.
where can i find this doc and how to find Major & Minor for a device driver ???
The command cat /proc/devices shows the character and block major device numbers in use by drivers in the currently running Linux kernel, but provides no information about minor device numbers.
There is a list of pre-assigned (reserved) device numbers in the Linux kernel user's and administrator's guide: Linux allocated devices (4.x+ version). (The same list also appears in "Documentation/admin-guide/devices.txt" in the Linux kernel sources.) The list shows how minor device numbers are interpreted for each pre-assigned character and block major device number.
Some major device numbers are reserved for local or experimental use, or for dynamic assignment:
60-63 char LOCAL/EXPERIMENTAL USE
60-63 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
120-127 char LOCAL/EXPERIMENTAL USE
120-127 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
234-254 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major number will
take numbers starting from 254 and downward.
240-254 block LOCAL/EXPERIMENTAL USE
Allocated for local/experimental use. For devices not
assigned official numbers, these ranges should be
used in order to avoid conflicting with future assignments.
384-511 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major
number will take numbers starting from 511 and downward,
once the 234-254 range is full.
Character device drivers that call alloc_chrdev_region() to register a range of character device numbers will be assigned an unused major device number from the dynamic range. The same is true for character device drivers that call __register_chrdev() with the first argument (major) set to 0.
Some external ("out-of-tree") Linux kernel modules have a module parameter to allow their default major device number to be specified at module load time. That is useful for drivers that do not create their "/dev" entries dynamically, but want some flexibility for the system administrator to choose a major device number when creating device files manually with mknod.
docs:
https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s02.html
https://tldp.org/LDP/tlk/dd/drivers.html
how to find the appropriate minor & major number for a device number:
ls -l /dev/
cat /proc/devices shows the same as lsblk
I am doing a project using DE1-SoC (FPGA + ARM cortex A9). You can see a part of the design (Qsys, platform designer) here
An on chip memory (RAM, image_memory) is being mastered by two different masters. One of the masters is well known h2f_lw_axi_master (provided by the Quartus Prime software to make the ARM and FPGA data exchange possible) and the other one zpc_1 is a custom master block that I designed.
The basic idea in this project is that after the FPGA is configured, one should be able to write data to the on chip memory and zpc_1 reads the content of the memory and works on it.
The length of each word is 512 bits (64bytes) and there are 1200 words (so address assigned starts from 0x0002_0000 and ends at 0x0003_2bff, enough space for 76800 = (512 * 1200) /8 bytes. The hps uses uint512_t (from boost library of c++) type data to write and zpc_1 has readdata width of 512 bits. The addresses are assigned with respect to h2f_lw_axi_master.
I have two questions related to this system.
1. Should the address for reading data in zpc_1 HDL code start from 0x20000 offset and increment by 0x40 (64) at each cycle to read the data word by word? (or any other method)
2. The zpc_1 is being able to read the first word and continuously working according to the instructions in first word, what might be the reason?
If you need additional information to answer the question and/or question is not clear enough to understand, do not hesitate to ask about more information (comment).
The problem was when one of the masters was interacting with the slave, the slave did not properly allow the other one (in the protocol there is a signal called 'waitrequest', I was not using that signal properly, when I used it that signal properly, the slave was always sending waitrequest which helped me to debug the problem as well).
Tried dual port RAM as shown here and modified the component by properly using the 'waitrequest' signal and everything started working properly.
Now the answers:
Q1: Should the address for reading data in zpc_1 HDL code start from 0x20000 offset and increment by 0x40 (64) at each cycle to read the data word by word? (or any other method)
A1: You can define another address offset with respect to the custom master component as you want, and start reading from that address offset (I used 0x00000000 as in the picture ). The address should increment by 0x40 (64) at each cycle to read the data word by word as #Unn commented.
Q2: The zpc_1 is being able to read the first word and continuously working according to the instructions in first word, what might be the reason?
A2: The reason is the slave (Single port RAM) was not able to respond correctly to both masters at the same time through single port, replacing it with dual port RAM solves the problem.
I need to reserve a large buffer of physically contiguous RAM from the kernel and be able to gaurantee that the buffer will always use a specific, hard-coded physical address. This buffer should remain reserved for the kernel's entire lifetime. I have written a chardev driver as an interface for accessing this buffer in userspace. My platform is an embedded system with ARMv7 architecture running a 2.6 Linux kernel.
Chapter 15 of Linux Device Drivers, Third Edition has the following to say on the topic (page 443):
Reserving the top of RAM is accomplished by passing a mem= argument to the kernel at boot time. For example, if you have 256 MB, the argument mem=255M keeps the kernel from using the top megabyte. Your module could later use the following code to gain access to such memory:
dmabuf = ioremap (0xFF00000 /* 255M */, 0x100000 /* 1M */);
I've done that plus a couple of other things:
I'm using the memmap bootarg in addition to the mem one. The kernel boot parameters documentation suggests always using memmap whenever you use mem to avoid address collisions.
I used request_mem_region before calling ioremap and, of course, I check that it succeeds before moving ahead.
This is what the system looks like after I've done all that:
# cat /proc/cmdline
root=/dev/mtdblock2 console=ttyS0,115200 init=/sbin/preinit earlyprintk debug mem=255M memmap=1M$255M
# cat /proc/iomem
08000000-0fffffff : PCIe Outbound Window, Port 0
08000000-082fffff : PCI Bus 0001:01
08000000-081fffff : 0001:01:00.0
08200000-08207fff : 0001:01:00.0
18000300-18000307 : serial
18000400-18000407 : serial
1800c000-1800cfff : dmu_regs
18012000-18012fff : pcie0
18013000-18013fff : pcie1
18014000-18014fff : pcie2
19000000-19000fff : cru_regs
1e000000-1fffffff : norflash
40000000-47ffffff : PCIe Outbound Window, Port 1
40000000-403fffff : PCI Bus 0002:01
40000000-403fffff : 0002:01:00.0
40400000-409fffff : PCI Bus 0002:01
40400000-407fffff : 0002:01:00.0
40800000-40807fff : 0002:01:00.0
80000000-8fefffff : System RAM
80052000-8045dfff : Kernel text
80478000-80500143 : Kernel data
8ff00000-8fffffff : foo
Everything so far looks good, and my driver is working perfectly. I'm able to read and write directly to the specific physical address I've chosen.
However, during bootup, a big scary warning (™) was triggered:
BUG: Your driver calls ioremap() on system memory. This leads
to architecturally unpredictable behaviour on ARMv6+, and ioremap()
will fail in the next kernel release. Please fix your driver.
------------[ cut here ]------------
WARNING: at arch/arm/mm/ioremap.c:211 __arm_ioremap_pfn_caller+0x8c/0x144()
Modules linked in:
[] (unwind_backtrace+0x0/0xf8) from [] (warn_slowpath_common+0x4c/0x64)
[] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_null+0x1c/0x24)
[] (warn_slowpath_null+0x1c/0x24) from [] (__arm_ioremap_pfn_caller+0x8c/0x144)
[] (__arm_ioremap_pfn_caller+0x8c/0x144) from [] (__arm_ioremap_caller+0x50/0x58)
[] (__arm_ioremap_caller+0x50/0x58) from [] (foo_init+0x204/0x2b0)
[] (foo_init+0x204/0x2b0) from [] (do_one_initcall+0x30/0x19c)
[] (do_one_initcall+0x30/0x19c) from [] (kernel_init+0x154/0x218)
[] (kernel_init+0x154/0x218) from [] (kernel_thread_exit+0x0/0x8)
---[ end trace 1a4cab5dbc05c3e7 ]---
Triggered from: arc/arm/mm/ioremap.c
/*
* Don't allow RAM to be mapped - this causes problems with ARMv6+
*/
if (pfn_valid(pfn)) {
printk(KERN_WARNING "BUG: Your driver calls ioremap() on system memory. This leads\n"
KERN_WARNING "to architecturally unpredictable behaviour on ARMv6+, and ioremap()\n"
KERN_WARNING "will fail in the next kernel release. Please fix your driver.\n");
WARN_ON(1);
}
What problems, exactly, could this cause? Can they be mitigated? What are my alternatives?
So I've done exactly that, and it's working.
Provide the kernel command line (e.g. /proc/cmdline) and the resulting memory map (i.e. /proc/iomem) to verify this.
What problems, exactly, could this cause?
The problem with using ioremap() on system memory is that you end up assigning conflicting attributes to the memory which causes "unpredictable" behavior.
See the article "ARM's multiply-mapped memory mess", which provides a history to the warning that you are triggering.
The ARM kernel maps RAM as normal memory with writeback caching; it's also marked non-shared on uniprocessor systems. The ioremap() system call, used to map I/O memory for CPU use, is different: that memory is mapped as device memory, uncached, and, maybe, shared. These different mappings give the expected behavior for both types of memory. Where things get tricky is when somebody calls ioremap() to create a new mapping for system RAM.
The problem with these multiple mappings is that they will have differing attributes. As of version 6 of the ARM architecture, the specified behavior in that situation is "unpredictable."
Note that "system memory" is the RAM that is managed by the kernel.
The fact that you trigger the warning indicates that your code is generating multiple mappings for a region of memory.
Can they be mitigated?
You have to ensure that the RAM you want to ioremap() is not "system memory", i.e. managed by the kernel.
See also this answer.
ADDENDUM
This warning that concerns you is the result of pfn_valid(pfn) returning TRUE rather than FALSE.
Based on the Linux cross-reference link that you provided for version 2.6.37,
pfn_valid() is simply returning the result of
memblock_is_memory(pfn << PAGE_SHIFT);
which in turn is simply returning the result of
memblock_search(&memblock.memory, addr) != -1;
I suggest that the kernel code be hacked so that the conflict is revealed.
Before the call to ioremap(), assign TRUE to the global variable memblock_debug.
The following patch should display the salient information about the memory conflict.
(The memblock list is ordered by base-address, so memblock_search() performs a binary search on this list, hence the use of mid as the index.)
static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
{
unsigned int left = 0, right = type->cnt;
do {
unsigned int mid = (right + left) / 2;
if (addr < type->regions[mid].base)
right = mid;
else if (addr >= (type->regions[mid].base +
type->regions[mid].size))
left = mid + 1;
- else
+ else {
+ if (memblock_debug)
+ pr_info("MATCH for 0x%x: m=0x%x b=0x%x s=0x%x\n",
+ addr, mid,
+ type->regions[mid].base,
+ type->regions[mid].size);
return mid;
+ }
} while (left < right);
return -1;
}
If you want to see all the memory blocks, then call memblock_dump_all() with the variable memblock_debug is TRUE.
[Interesting that this is essentially a programming question, yet we haven't seen any of your code.]
ADDENDUM 2
Since you're probably using ATAGs (instead of Device Tree), and you want to dedicate a memory region, fix up the ATAG_MEM to reflect this smaller size of physical memory.
Assuming you have made zero changes to your boot code, the ATAG_MEM is still specifying the full RAM, so perhaps this could be the source of the system memory conflict that causes the warning.
See this answer about ATAGs and this related answer.
I'm working on enhancing the stock ahci driver provided in Linux in order to perform some needed tasks. I'm at the point of attempting to issue commands to the AHCI HBA for the hard drive to process. However, whenever I do so, my system locks up and reboots. Trying to explain the process of issuing a command to an AHCI drive is far to much for this question. If needed, reference this link for the full discussion (the process is rather defined all over because there are several pieces, however, ch 4 has the data structures necessary).
Essentially, one writes the appropriate structures into memory regions defined by either the BIOS or the OS. The first memory region I should write to is the Command List Base Address contained in the register PxCLB (and PxCLBU if 64-bit addressing applies). My system is 64 bits and so I'm trying to getting both 32-bit registers. My code is essentially this:
void __iomem * pbase = ahci_port_base(ap);
u32 __iomem *temp = (u32*)(pbase + PORT_LST_ADDR);
struct ahci_cmd_hdr *cmd_hdr = NULL;
cmd_hdr = (struct ahci_cmd_hdr*)(u64)
((u64)(*(temp + PORT_LST_ADDR_HI)) << 32 | *temp);
pr_info("%s:%d cmd_list is %p\n", __func__, __LINE__, cmd_hdr);
// problems with this next line, makes the system reboot
//pr_info("%s:%d cl[0]:0x%08x\n", __func__, __LINE__, cmd_hdr->opts);
The function ahci_port_base() is found in the ahci driver (at least it is for CentOS 6.x). Basically, it returns the proper address for that port in the AHCI memory region. PORT_LST_ADDR and PORT_LST_ADDR_HI are both macros defined in that driver. The address that I get after getting both the high and low addresses is usually something like 0x0000000037900000. Is this memory address in a space that I cannot simply dereference it?
I'm hitting my head against the wall at this point because this link shows that accessing it in this manner is essentially how it's done.
The address that I get after getting both the high and low addresses
is usually something like 0x0000000037900000. Is this memory address
in a space that I cannot simply dereference it?
Yes, you are correct - that's a bus address, and you can't just dereference it because paging is enabled. (You shouldn't be just dereferencing the iomapped addresses either - you should be using readl() / writel() for those, but the breakage here is more subtle).
It looks like the right way to access the ahci_cmd_hdr in that driver is:
struct ahci_port_priv *pp = ap->private_data;
cmd_hdr = pp->cmd_slot;
I'm porting u-boot to P2040 based board these days.
As u-boot/arch/powerpc/mpc85xx/start.s commented:
The processor starts at 0xffff_fffc and the code is first executed in the last 4K page in flash/rom.
In u-boot/arch/powerpc/mpc85xx/resetvec.S:
.section .resetvec,"ax"
b _start_e500
And in u-boot.lds linker script:
.resetvec RESET_VECTOR_ADDRESS :
{
KEEP(*(.resetvec))
} :text = 0xffff
The reset vector is at 0xffff_fffc, which contains a jump instruction to _start_e500.
The E500MCRM Chapter 6.6 mentioned:
This default TLB entry translates the first instruction fetch out of reset(at effective address 0xffff_fffc).
This instruction should be a branch to the beginning of this page.
So, if I configure the HCW to let powerpc boot from Nor Flash, why should I suppose that the Nor Flash's
last 4K is mapped to 0xffff_f000~0xffff_ffff? Since there're no LAW setup yet and the default BR0/OR0 of Local Bus
does not match that range. I’m confused about how Nor Flash be access at the very beginning of core startup.
Another question is:
Does P2040 always have MMU enabled so as to translate effective address to real address even at u-boot stage?
If so, beside accessing to CCSRBAR, all other memory access should have TLB entry setup first.
Best Regards,
Hook Guo