I know that the ARM PMU is partially implemented, thanks to the gem5 source code and some publications.
I have a binary which uses perf_event to access the PMU on a Linux-based OS, under an ARM processor. Could it use perf_event inside a gem5 full-system simulation with a Linux kernel, under the ARM ISA?
So far, I haven't found the right way to do it. If someone knows, I will be very grateful!
As of September 2020, gem5 needs to be patched in order to use the ARM PMU.
Edit: As of November 2020, gem5 is now patched and it will be included in the next release. Thanks to the developers!
How to patch gem5
This is not a clean patch (very straightforward), and it is more intended to understand how it works. Nonetheless, this is the patch to apply with git apply from the gem5 source repository:
diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
index 2641ec3fb..3d85c1b75 100644
--- i/src/arch/arm/ArmISA.py
+++ w/src/arch/arm/ArmISA.py
## -36,6 +36,7 ##
from m5.params import *
from m5.proxy import *
+from m5.SimObject import SimObject
from m5.objects.ArmPMU import ArmPMU
from m5.objects.ArmSystem import SveVectorLength
from m5.objects.BaseISA import BaseISA
## -49,6 +50,8 ## class ArmISA(BaseISA):
cxx_class = 'ArmISA::ISA'
cxx_header = "arch/arm/isa.hh"
+ generateDeviceTree = SimObject.recurseDeviceTree
+
system = Param.System(Parent.any, "System this ISA object belongs to")
pmu = Param.ArmPMU(NULL, "Performance Monitoring Unit")
diff --git i/src/arch/arm/ArmPMU.py w/src/arch/arm/ArmPMU.py
index 047e908b3..58553fbf9 100644
--- i/src/arch/arm/ArmPMU.py
+++ w/src/arch/arm/ArmPMU.py
## -40,6 +40,7 ## from m5.params import *
from m5.params import isNullPointer
from m5.proxy import *
from m5.objects.Gic import ArmInterruptPin
+from m5.util.fdthelper import *
class ProbeEvent(object):
def __init__(self, pmu, _eventId, obj, *listOfNames):
## -76,6 +77,17 ## class ArmPMU(SimObject):
_events = None
+ def generateDeviceTree(self, state):
+ node = FdtNode("pmu")
+ node.appendCompatible("arm,armv8-pmuv3")
+ # gem5 uses GIC controller interrupt notation, where PPI interrupts
+ # start to 16. However, the Linux kernel start from 0, and used a tag
+ # (set to 1) to indicate the PPI interrupt type.
+ node.append(FdtPropertyWords("interrupts", [
+ 1, int(self.interrupt.num) - 16, 0xf04
+ ]))
+ yield node
+
def addEvent(self, newObject):
if not (isinstance(newObject, ProbeEvent)
or isinstance(newObject, SoftwareIncrement)):
diff --git i/src/cpu/BaseCPU.py w/src/cpu/BaseCPU.py
index ab70d1d7f..66a49a038 100644
--- i/src/cpu/BaseCPU.py
+++ w/src/cpu/BaseCPU.py
## -302,6 +302,11 ## class BaseCPU(ClockedObject):
node.appendPhandle(phandle_key)
cpus_node.append(node)
+ # Generate nodes from the BaseCPU children (and don't add them as
+ # subnode). Please note: this is mainly needed for the ISA class.
+ for child_node in self.recurseDeviceTree(state):
+ yield child_node
+
yield cpus_node
def __init__(self, **kwargs):
What the patch resolves
The Linux kernel uses a Device Tree Blob (DTB), which is a regular file, to declare the hardware on which the kernel is running. This is used to make the kernel portable between different architecture without a recompilation for each hardware change. The DTB follows the Device Tree Reference, and is compiled from a Device Tree Source (DTS) file, a regular text file. You can learn more here and here.
The problem was that the PMU is supposed to be declared to the Linux kernel via the DTB. You can learn more here and here. In a simulated system, because the system is specified by the user, gem5 has to generate a DTB itself to pass to the kernel, so the latter can recognize the simulated hardware. However, the problem is that gem5 does not generate the DTB entry for our PMU.
What the patch does
The patch adds an entry to the ISA and the CPU files to enable DTB generation recursion up to find the PMU. The hierarchy is the following: CPU => ISA => PMU. Then, it adds the generation function in the PMU to generate a unique DTB entry to declare the PMU, with the proper notation for the interrupt declaration in the kernel.
After running a simulation with our patch, we could see the DTS from the DTB like this:
cd m5out
# Decompile the DTB to get the DTS.
dtc -I dtb -O dts system.dtb > system.dts
# Find the PMU entry.
head system.dts
dtc is the Device Tree Compiler, installed with sudo apt-get install device-tree-compiler. We end up with this pmu DTB entry, under the root node (/):
/dts-v1/;
/ {
#address-cells = <0x02>;
#size-cells = <0x02>;
interrupt-parent = <0x05>;
compatible = "arm,vexpress";
model = "V2P-CA15";
arm,hbi = <0x00>;
arm,vexpress,site = <0x0f>;
memory#80000000 {
device_type = "memory";
reg = <0x00 0x80000000 0x01 0x00>;
};
pmu {
compatible = "arm,armv8-pmuv3";
interrupts = <0x01 0x04 0xf04>;
};
cpus {
#address-cells = <0x01>;
#size-cells = <0x00>;
cpu#0 {
device_type = "cpu";
compatible = "gem5,arm-cpu";
[...]
In the line interrupts = <0x01 0x04 0xf04>;, 0x01 is used to indicate that the number 0x04 is the number of a PPI interrupt (the one declared with number 20 in gem5, the difference of 16 is explained inside the patch code). The 0xf04 corresponds to a flag (0x4) indicating that it is a "active high level-sensitive" interrupt and a bit mask (0xf) indicating that the interrupts should be wired to all PE attached to the GIC. You can learn more here.
If the patch works and your ArmPMU is declared properly, you should see this message at boot time:
[ 0.239967] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available
Context
I was not able to use the Performance Monitoring Unit (PMU) because of a gem5's unimplemented feature. The reference on the mailing list can be found here. After a personal patch, the PMU is accessible through perf_event. Fortunately, a similar patch will be released in the official gem5 release soon, could be seen here. The patch will be described in another answer, due to the number of link limitation inside one message.
How to use the PMU
C source code
This is a minimal working example of a C source code using perf_event, used to count the number of mispredicted branches by the branch predictor unit during a specific task:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
int main(int argc, char **argv) {
/* File descriptor used to read mispredicted branches counter. */
static int perf_fd_branch_miss;
/* Initialize our perf_event_attr, representing one counter to be read. */
static struct perf_event_attr attr_branch_miss;
attr_branch_miss.size = sizeof(attr_branch_miss);
attr_branch_miss.exclude_kernel = 1;
attr_branch_miss.exclude_hv = 1;
attr_branch_miss.exclude_callchain_kernel = 1;
/* On a real system, you can do like this: */
attr_branch_miss.type = PERF_TYPE_HARDWARE;
attr_branch_miss.config = PERF_COUNT_HW_BRANCH_MISSES;
/* On a gem5 system, you have to do like this: */
attr_branch_miss.type = PERF_TYPE_RAW;
attr_branch_miss.config = 0x10;
/* Open the file descriptor corresponding to this counter. The counter
should start at this moment. */
if ((perf_fd_branch_miss = syscall(__NR_perf_event_open, &attr_branch_miss, 0, -1, -1, 0)) == -1)
fprintf(stderr, "perf_event_open fail %d %d: %s\n", perf_fd_branch_miss, errno, strerror(errno));
/* Workload here, that means our specific task to profile. */
/* Get and close the performance counters. */
uint64_t counter_branch_miss = 0;
read(perf_fd_branch_miss, &counter_branch_miss, sizeof(counter_branch_miss));
close(perf_fd_branch_miss);
/* Display the result. */
printf("Number of mispredicted branches: %d\n", counter_branch_miss);
}
I will not enter into the details of how using perf_event, good resources are available here, here, here, here. However, just a few notes about the code above:
On real hardware, when using perf_event and common events (events that are available under a lot of architectures), it is recommended to use perf_event macros PERF_TYPE_HARDWARE as type and to use macros like PERF_COUNT_HW_BRANCH_MISSES for the number of mispredicted branches, PERF_COUNT_HW_CACHE_MISSES for the number of cache misses, and so on (see the manual page for a list). This is a best practice to have a portable code.
On a gem5 simulated system, currently (v20.0), a C source code have to use PERF_TYPE_RAW type and architectural event ID to identify an event. Here, 0x10 is the ID of the 0x0010, BR_MIS_PRED, Mispredicted or not predicted branch event, described in the ARMv8-A Reference Manual (here). In the manual, all events available in real hardware are described. However, they are not all implemented into gem5. To see the list of implemented event inside gem5, refer to the src/arch/arm/ArmPMU.py file. In the latter, the line self.addEvent(ProbeEvent(self,0x10, bpred, "Misses")) corresponds to the declaration of the counter described in the manual. This is not a normal behavior, hence gem5 should be patched to allow using PERF_TYPE_HARDWARE one day.
gem5 simulation script
This is not a entire MWE script (it would be too long!), only the needed portion to add inside a full-system script to use the PMU. We use an ArmSystem as a system, with the RealView platform.
For each ISA (we use an ARM ISA here) of each CPU (e.g., a DerivO3CPU) in our cluster (which is a SubSystem class), we add to it a PMU with a unique interrupt number and the already implemented architectural event. An example of this function could be found in configs/example/arm/devices.py.
To choose an interrupt number, pick a free PPI interrupt in the platform interrupt mapping. Here, we choose PPI n°20, according to the RealView interrupt map (src/dev/arm/RealView.py). Since PPIs interrupts are local per Processing Element (PE, corresponds to cores in our context), the interrupt number can be the same for all PE without any conflict. To know more about PPI interrupts, see the GIC guide from ARM here.
Here, we can see that the interrupt n°20 is not used by the system (from RealView.py):
Interrupts:
0- 15: Software generated interrupts (SGIs)
16- 31: On-chip private peripherals (PPIs)
25 : vgic
26 : generic_timer (hyp)
27 : generic_timer (virt)
28 : Reserved (Legacy FIQ)
We pass to addArchEvents our system components (dtb, itb, etc.) to link the PMU with them, thus the PMU will use the internal counters (called probes) of these components as exposed counters to the system.
for cpu in system.cpu_cluster.cpus:
for isa in cpu.isa:
isa.pmu = ArmPMU(interrupt=ArmPPI(num=20))
# Add the implemented architectural events of gem5. We can
# discover which events is implemented by looking at the file
# "ArmPMU.py".
isa.pmu.addArchEvents(
cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
icache=getattr(cpu, "icache", None),
dcache=getattr(cpu, "dcache", None),
l2cache=getattr(system.cpu_cluster, "l2", None))
Two quick additions to Pierre's awesome answers:
for fs.py as of gem5 937241101fae2cd0755c43c33bab2537b47596a2, all that is missing is to apply to fs.py as shown at: https://gem5-review.googlesource.com/c/public/gem5/+/37978/1/configs/example/fs.py
for cpu in test_sys.cpu:
if buildEnv['TARGET_ISA'] in "arm":
for isa in cpu.isa:
isa.pmu = ArmPMU(interrupt=ArmPPI(num=20))
isa.pmu.addArchEvents(
cpu=cpu, dtb=cpu.mmu.dtb, itb=cpu.mmu.itb,
icache=getattr(cpu, "icache", None),
dcache=getattr(cpu, "dcache", None),
l2cache=getattr(test_sys, "l2", None))
a C example can also be found in man perf_event_open
Related
If I compile the following program:
$ cat main.cpp && g++ main.cpp
#include <time.h>
int main() {
struct timespec ts;
return clock_gettime(CLOCK_MONOTONIC, &ts);
}
and then run it under strace in "standard" Kubuntu, I get this:
strace -tt --trace=clock_gettime ./a.out
17:58:40.395200 +++ exited with 0 +++
As you can see, there is no clock_gettime (full strace output is here).
On the other hand, if I run the same app in my custom built linux kernel under qemu, I get the following output:
strace -tt --trace=clock_gettime ./a.out
18:00:53.082115 clock_gettime(CLOCK_MONOTONIC, {tv_sec=101481, tv_nsec=107976517}) = 0
18:00:53.082331 +++ exited with 0 +++
Which is more expected - there is clock_gettime.
So, my questions are:
Why does strace ignore/omit clock_gettime if I run it in Kubuntu?
Why strace's behaviour differs depending on environment/kernel?
Answer to the first question
From vdso man
strace(1), seccomp(2), and the vDSO
When tracing systems calls with strace(1), symbols (system calls) that are exported by the vDSO will not appear in the trace output. Those system calls will likewise not be visible to seccomp(2) filters.
Answer to the second question:
In the vDSO, clock_gettimeofday and related functions are reliant on specific clock modes; see __arch_get_hw_counter.
If the clock mode is VCLOCK_TSC, the time is read without a syscall, using RDTSC; if it’s VCLOCK_PVCLOCK or VCLOCK_HVCLOCK, it’s read from a specific page to retrieve the information from the hypervisor. HPET doesn’t declare a clock mode, so it ends up with the default VCLOCK_NONE, and the vDSO issues a system call to retrieve the time.
And indeed:
In the default kernel (from Kubuntu):
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
Custom built kernel:
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
More info about various clock sources. In particular:
The documentation of Red Hat MRG version 2 states that TSC is the preferred clock source due to its much lower overhead, but it uses HPET as a fallback. A benchmark in that environment for 10 million event counts found that TSC took about 0.6 seconds, HPET took slightly over 12 seconds, and ACPI Power Management Timer took around 24 seconds.
It may be due to the fact that clock_gettime() is part of the optimized syscalls. Look at the vdso mechanism described in this answer.
Considering clock_gettime(), on some architecture (e.g. Linux on ARM v7 32 bits), only a subset of the available clock identifiers are supported in the VDSO implementation. For the others, there is a fallback into the actual system call. Here is the source code of the VDSO implementation of clock_gettime() in the Linux kernel for the ARM v7 (file arch/arm/vdso/vgettimeofday.c in the source tree):
notrace int __vdso_clock_gettime(clockid_t clkid, struct timespec *ts)
{
struct vdso_data *vdata;
int ret = -1;
vdata = __get_datapage();
switch (clkid) {
case CLOCK_REALTIME_COARSE:
ret = do_realtime_coarse(ts, vdata);
break;
case CLOCK_MONOTONIC_COARSE:
ret = do_monotonic_coarse(ts, vdata);
break;
case CLOCK_REALTIME:
ret = do_realtime(ts, vdata);
break;
case CLOCK_MONOTONIC:
ret = do_monotonic(ts, vdata);
break;
default:
break;
}
if (ret)
ret = clock_gettime_fallback(clkid, ts);
return ret;
}
The above source code shows a switch/case with the supported clock identifiers and in the default case, there is a fallback to the actual system call.
On such architecture, spying a software like systemd which uses clock_gettime() with CLOCK_MONOTONIC and CLOCK_BOOTTIME, strace only shows the calls with the latter identifier as it is not part of the supported cases in VDSO mode.
Cf. this link for reference
Context
I'm using an i.MX6 (IMXULL) application processor, and want to know in software when the power-off button has been pressed:
Luckily, the IMX6ULL reference manual explains that this should be possible:
Section 10.5: ONOFF Button
The chip supports the use of a button input signal to request main SoC power state changes (i.e. On or Off) from the PMU. The ONOFF logic inside of SNVS_LP allows for connecting directly to a PMIC or other voltage regulator device. The logic takes a button input signal and then outputs a pmic_en_b and set_pwr_off_irq signal. [...] The logic has two different modes of operation (Dumb and Smart mode).
The Dumb PMIC Mode uses pmic_en_b to issue a level signal for on and off. Dumb pmic mode has many different configuration options which include (debounce, off to on time, and max time out).
(Also available in condensed form here on page 18)
Attempt
Therefore, I have built a trivially simple kernel module to try and capture this interrupt:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/syscalls.h>
#include <linux/interrupt.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("John Doe <j.doe#acme.inc>");
// Forward declaration
irqreturn_t irq_handler (int, void *);
// Number of interrupt to capture
#define INTERRUPT_NO 36
static int __init pwr_ctl_init (void)
{
pr_err("init()\n");
return request_irq(INTERRUPT_NO, irq_handler, IRQF_SHARED, "onoff-button",
(void *)irq_handler);
}
static void __exit pwr_ctl_exit (void)
{
pr_err("exit()\n");
free_irq(INTERRUPT_NO, NULL);
}
irqreturn_t irq_handler (int irq, void *dev_irq)
{
pr_err("interrupt!\n");
return IRQ_HANDLED;
}
module_init(pwr_ctl_init);
module_exit(pwr_ctl_exit);
Problem
However, I cannot find any information about what the number of the interrupt is. When searching on the internet, all I get is this one NXP forum post:
ONOFF button doesn't interrupt
Which hints it should be 36. However, I have found that this isn't the case on my platform. When I check /proc/interrupts 36 is already occupied by 20b4000.ethernet. Because the application manual also mentions that it is generated by the SNVS low power system, I checked the device-tree and found the following information:
snvs_poweroff: snvs-poweroff {
compatible = "syscon-poweroff";
regmap = <&snvs>;
offset = <0x38>;
value = <0x60>;
mask = <0x60>;
status = "disabled";
};
snvs_pwrkey: snvs-powerkey {
compatible = "fsl,sec-v4.0-pwrkey";
regmap = <&snvs>;
interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>;
linux,keycode = <KEY_POWER>;
wakeup-source;
status = "disabled";
};
This information seems useful for knowing that SNVS is the interrupt controller, but not how to capture this set_pwr_off_irq signal.
Conclusion
How do I capture the ON/OFF interrupt supposedly generated by SNVS?
How do I determine the number of an interrupt from the device-tree (if applicable at all)
Am I misunderstanding something about how the ONOFF feature works? Is it possible to capture this from a kernel module at all?
Edit
This edit answers some user questions, and then goes into new information about the problem I have since discovered:
User Questions
Processor: The processor is an NXP i.MX 6UltraLite / 6ULL / 6ULZ ARM Cortex A7.
New Information
SNVS Driver: Using my build system kernel configuration, I have modified and verified that the snvs_pwrkey driver (see here) is enabled. My modification consists of adding a single kprint statement to the interrupt routine to see if the button trips it. This did not work
I have tried updating the driver to a newer version, which claims to support newer i.MX6 processors. This also did not work
I have tried to load the driver as a kernel module for easier debugging. This is not possible, as the kernel configuration requires this be enabled and I cannot remove it from being statically built into the kernel.
The answer is rather anticlimactic. In short, there was a device-tree overlay that was disabling my changes to snvs_pwrkey, even when I had enabled it. Once I located and removed the overlay, the driver (snvs_pwrkey.c) was working as expected.
As for the IRQ number, it turns out that the IRQ for the power button is 45 as interpreted through Linux. The interrupt is not configured for sharing, so my kernel module could not be loaded.
If you want to capture power button toggle events, I suggest modifying the driver to add some output, and then perhaps adding a udev rule to capture button presses. I will update my answer with an example ASAP.
Where can I check to know if the Linux kernel is running in non-secure (EL2) or secure (EL3) mode ?
How can I change this mode ?
I run on the ARMv8 64 bits.
Thanks in advance
Where can I check to know if the Linux kernel is running in non-secure (EL2) or secure (EL3) mode ?
I'm going to give a cheeky answer.
Adapt what is the current execution mode/exception level, etc? and hack your region of interest:
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..840f886d17b3 100644
--- a/init/main.c
+++ b/init/main.c
## -533,6 +533,10 ## asmlinkage __visible void __init start_kernel(void)
char *command_line;
char *after_dashes;
+ register u64 x0 __asm__ ("x0");
+ __asm__ ("mrs x0, CurrentEL;" : : : "%x0");
+ pr_info("EL = %llu\n", (unsigned long long)(x0 >> 2));
+
set_task_stack_end_magic(&init_task);
smp_setup_processor_id();
debug_objects_early_init();
Output at the start of boot.
EL = 1
The default Linux kernel v4.19 boot logs also tell us that by default:
CPU: All CPU(s) started at EL1
How can I change this mode?
Traditionally the kernel ran on EL1 only, and EL2 was left for a hypervisor like Xen, and EL3 for a bootloader like ARM Trusted Firmware.
However, with the introduction of the ARMv8.1 VHE extension, EL2 kernel became efficient and was added into Linux mainline: What are Ring 0 and Ring 3 in the context of operating systems? Not sure which config turns it on though, have a look.
EL3 I don't think there's anything mainline. There are some people trying to adapt kernel code to be the booloader as well, have a look e.g. at: https://github.com/kexecboot/kexecboot
I'm working with the support SMP kernel: Snapgear 2.6.21.
I have created 4 threads in my c application, and I am trying to set thread 1 to run on CPU1, thread2 on CPU 2, etc.
However, the compiler sparc-linux-gcc does not recognize these functions:
CPU_SET (int cpu, cpu_set_t * set);
CPU_ZERO (cpu_set_t * set);
and this type: cpu_set_t
It always gives me these errors:
implicit declaration of function 'CPU_ZERO'
implicit declaration of function 'CPU_SET'
'cpu_set_t' undeclared (first use in this function)
Here is my code to bind active thread to processor 0:
cpu_set_t mask;
CPU_ZERO (& mask);
CPU_SET (0, & mask) // bind processor 0
sched_setaffinity (0, sizeof(mask), & mask);
I have included and defined at the top :
**define _GNU_SOURCE
include <sched.h>**
But I always get the same errors. can you help me please?
You should read sched_setaffinity(2) carefully and test its result (and display errno on failure, e.g. with perror).
Actually, I believe you should use pthread_setaffinity_np(3) instead (and of course test its failure, etc...)
Even more, I believe that you should not bother to explicitly set the affinity. Recent Linux kernels are often quite good at dispatching running threads on different CPUs.
So simply use pthreads and don't bother about affinity, unless you see actual issues when benchmarking.
BTW, passing the -H flag to your GCC (cross-)compiler could be helpful. It shows you the included files. Perhaps also look into the preprocessed form obtained with gcc -C -E ; it looks like some header files are missing or not found (maybe some missing -I include-directory at compilation time, or some missing headers on your development system)
BTW, your kernel version looks ancient. Can't you upgrade your kernel to something newer (3.15.x or some 3.y)?
I'm going to launch a Linux on my development board, and i need a dts file (device tree file) to describe the whole hardware. But I only know very little about the syntax of this file which is not enough to run Linux properly on the board.
What i know now are only how to describe a unit's interrupt number, frequency, address, parent-unit and its compatible driver type (as described below):
ps7_scuwdt_0: ps7-scuwdt#f8f00620 {
compatible = "xlnx,ps7-scuwdt-1.00.a";
device_type = "watchdog";
interrupt-parent = <&ps7_scugic_0>;
interrupts = < 1 14 769 >;
reg = < 0xf8f00620 0xe0 >;
} ;
Other advanced usage or grammar is unfamiliar to me.
Take a look at the dts of the board which most closely resembles your dev-board. Use that as a reference and make changes to the dts according to the differences between the reference board and your dev-board.
Also checkout the following :
- Device-tree Documentation project at eLinux (has a vast collection of links to start reading).
- Series of articles on the basics of device tree.
- Walkthrough of migrating to device-tree.
Minimal reg + interrupt example with QEMU virtual device
Our example will add the following device tree node to the versatilepb device tree which QEMU will use due to -M versatilepb:
lkmc_platform_device#101e9000 {
compatible = "lkmc_platform_device";
reg = <0x101e9000 0x1000>;
interrupts = <18>;
interrupt-controller;
#interrupt-cells = <2>;
clocks = <&pclk>;
clock-names = "apb_pclk";
lkmc-asdf = <0x12345678>;
};
Then, by using a Linux kernel module to interact with the device, we will test the following DTS features:
registers addresses
IRQs
read custom properties from the driver
These are the main components of the example:
Linux versatile .dts patch on Linux fork
reg and interrupt match numbers hard-coded in the QEMU versatile machine (which represents the SoC)
compatible matches the platform_driver.name in the kernel module, and informs the kernel which module will handle this device
we also pass a custom property to the driver: lkmc-asdf = <0x12345678>;, which is read with of_property_read_u32
the device tree is passed to QEMU's firmware with the -dtb argument
QEMU fork:
device that reads a register and generates interrupts
insert device into -M versatilepb
kernel module Writes to memory on probe to test things out, which also generates an IRQ.
Device trees have many more features that we haven't covered, but this example should get you started, and easily allow you to play around with any new features that come up.
Further resources:
indispensable elinux tutorial: http://elinux.org/Device_Tree_Usage
play around with dtc for purely syntaxical questions. E.g., it shows how nodes are simply merged by path: https://unix.stackexchange.com/a/375923/32558
https://unix.stackexchange.com/questions/118683/what-is-a-device-tree-and-a-device-tree-blob
Lets take a example and I will explain each one of them as below
auart0: serial#8006a000 {
compatible = "fsl,imx28-auart", "fsl,imx23-auart";
reg = <0x8006a000 0x2000>;
interrupts = <112>;
dmas = <&dma_apbx 8>, <&dma_apbx 9>;
dma-names = "rx", "tx";
};
Required properties:
- compatible : Should be "fsl,-auart". The supported SoCs include
imx23 and imx28.
- reg : Address and length of the register set for the device
- interrupts : Should contain the auart interrupt numbers
- dmas: DMA specifier, consisting of a phandle to DMA controller node
and AUART DMA channel ID.
- dma-names: "rx" for RX channel, "tx" for TX channel.
Note: Each auart port should have an alias correctly numbered in "aliases"
node.
For more advance properties, please go to this link, it is very useful
Device Tree Explanation
Hope it helps!
Complementary to the other answers:
Keep in mind, that there is also a section for devicetrees in the official kernel source under root/Documentation/devicetree(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree?h=v5.2-rc5).