I know that the ARM PMU is partially implemented, thanks to the gem5 source code and some publications.
I have a binary which uses perf_event to access the PMU on a Linux-based OS, under an ARM processor. Could it use perf_event inside a gem5 full-system simulation with a Linux kernel, under the ARM ISA?
So far, I haven't found the right way to do it. If someone knows, I will be very grateful!
As of September 2020, gem5 needs to be patched in order to use the ARM PMU.
Edit: As of November 2020, gem5 is now patched and it will be included in the next release. Thanks to the developers!
How to patch gem5
This is not a clean patch (very straightforward), and it is more intended to understand how it works. Nonetheless, this is the patch to apply with git apply from the gem5 source repository:
diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
index 2641ec3fb..3d85c1b75 100644
--- i/src/arch/arm/ArmISA.py
+++ w/src/arch/arm/ArmISA.py
## -36,6 +36,7 ##
from m5.params import *
from m5.proxy import *
+from m5.SimObject import SimObject
from m5.objects.ArmPMU import ArmPMU
from m5.objects.ArmSystem import SveVectorLength
from m5.objects.BaseISA import BaseISA
## -49,6 +50,8 ## class ArmISA(BaseISA):
cxx_class = 'ArmISA::ISA'
cxx_header = "arch/arm/isa.hh"
+ generateDeviceTree = SimObject.recurseDeviceTree
+
system = Param.System(Parent.any, "System this ISA object belongs to")
pmu = Param.ArmPMU(NULL, "Performance Monitoring Unit")
diff --git i/src/arch/arm/ArmPMU.py w/src/arch/arm/ArmPMU.py
index 047e908b3..58553fbf9 100644
--- i/src/arch/arm/ArmPMU.py
+++ w/src/arch/arm/ArmPMU.py
## -40,6 +40,7 ## from m5.params import *
from m5.params import isNullPointer
from m5.proxy import *
from m5.objects.Gic import ArmInterruptPin
+from m5.util.fdthelper import *
class ProbeEvent(object):
def __init__(self, pmu, _eventId, obj, *listOfNames):
## -76,6 +77,17 ## class ArmPMU(SimObject):
_events = None
+ def generateDeviceTree(self, state):
+ node = FdtNode("pmu")
+ node.appendCompatible("arm,armv8-pmuv3")
+ # gem5 uses GIC controller interrupt notation, where PPI interrupts
+ # start to 16. However, the Linux kernel start from 0, and used a tag
+ # (set to 1) to indicate the PPI interrupt type.
+ node.append(FdtPropertyWords("interrupts", [
+ 1, int(self.interrupt.num) - 16, 0xf04
+ ]))
+ yield node
+
def addEvent(self, newObject):
if not (isinstance(newObject, ProbeEvent)
or isinstance(newObject, SoftwareIncrement)):
diff --git i/src/cpu/BaseCPU.py w/src/cpu/BaseCPU.py
index ab70d1d7f..66a49a038 100644
--- i/src/cpu/BaseCPU.py
+++ w/src/cpu/BaseCPU.py
## -302,6 +302,11 ## class BaseCPU(ClockedObject):
node.appendPhandle(phandle_key)
cpus_node.append(node)
+ # Generate nodes from the BaseCPU children (and don't add them as
+ # subnode). Please note: this is mainly needed for the ISA class.
+ for child_node in self.recurseDeviceTree(state):
+ yield child_node
+
yield cpus_node
def __init__(self, **kwargs):
What the patch resolves
The Linux kernel uses a Device Tree Blob (DTB), which is a regular file, to declare the hardware on which the kernel is running. This is used to make the kernel portable between different architecture without a recompilation for each hardware change. The DTB follows the Device Tree Reference, and is compiled from a Device Tree Source (DTS) file, a regular text file. You can learn more here and here.
The problem was that the PMU is supposed to be declared to the Linux kernel via the DTB. You can learn more here and here. In a simulated system, because the system is specified by the user, gem5 has to generate a DTB itself to pass to the kernel, so the latter can recognize the simulated hardware. However, the problem is that gem5 does not generate the DTB entry for our PMU.
What the patch does
The patch adds an entry to the ISA and the CPU files to enable DTB generation recursion up to find the PMU. The hierarchy is the following: CPU => ISA => PMU. Then, it adds the generation function in the PMU to generate a unique DTB entry to declare the PMU, with the proper notation for the interrupt declaration in the kernel.
After running a simulation with our patch, we could see the DTS from the DTB like this:
cd m5out
# Decompile the DTB to get the DTS.
dtc -I dtb -O dts system.dtb > system.dts
# Find the PMU entry.
head system.dts
dtc is the Device Tree Compiler, installed with sudo apt-get install device-tree-compiler. We end up with this pmu DTB entry, under the root node (/):
/dts-v1/;
/ {
#address-cells = <0x02>;
#size-cells = <0x02>;
interrupt-parent = <0x05>;
compatible = "arm,vexpress";
model = "V2P-CA15";
arm,hbi = <0x00>;
arm,vexpress,site = <0x0f>;
memory#80000000 {
device_type = "memory";
reg = <0x00 0x80000000 0x01 0x00>;
};
pmu {
compatible = "arm,armv8-pmuv3";
interrupts = <0x01 0x04 0xf04>;
};
cpus {
#address-cells = <0x01>;
#size-cells = <0x00>;
cpu#0 {
device_type = "cpu";
compatible = "gem5,arm-cpu";
[...]
In the line interrupts = <0x01 0x04 0xf04>;, 0x01 is used to indicate that the number 0x04 is the number of a PPI interrupt (the one declared with number 20 in gem5, the difference of 16 is explained inside the patch code). The 0xf04 corresponds to a flag (0x4) indicating that it is a "active high level-sensitive" interrupt and a bit mask (0xf) indicating that the interrupts should be wired to all PE attached to the GIC. You can learn more here.
If the patch works and your ArmPMU is declared properly, you should see this message at boot time:
[ 0.239967] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available
Context
I was not able to use the Performance Monitoring Unit (PMU) because of a gem5's unimplemented feature. The reference on the mailing list can be found here. After a personal patch, the PMU is accessible through perf_event. Fortunately, a similar patch will be released in the official gem5 release soon, could be seen here. The patch will be described in another answer, due to the number of link limitation inside one message.
How to use the PMU
C source code
This is a minimal working example of a C source code using perf_event, used to count the number of mispredicted branches by the branch predictor unit during a specific task:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
int main(int argc, char **argv) {
/* File descriptor used to read mispredicted branches counter. */
static int perf_fd_branch_miss;
/* Initialize our perf_event_attr, representing one counter to be read. */
static struct perf_event_attr attr_branch_miss;
attr_branch_miss.size = sizeof(attr_branch_miss);
attr_branch_miss.exclude_kernel = 1;
attr_branch_miss.exclude_hv = 1;
attr_branch_miss.exclude_callchain_kernel = 1;
/* On a real system, you can do like this: */
attr_branch_miss.type = PERF_TYPE_HARDWARE;
attr_branch_miss.config = PERF_COUNT_HW_BRANCH_MISSES;
/* On a gem5 system, you have to do like this: */
attr_branch_miss.type = PERF_TYPE_RAW;
attr_branch_miss.config = 0x10;
/* Open the file descriptor corresponding to this counter. The counter
should start at this moment. */
if ((perf_fd_branch_miss = syscall(__NR_perf_event_open, &attr_branch_miss, 0, -1, -1, 0)) == -1)
fprintf(stderr, "perf_event_open fail %d %d: %s\n", perf_fd_branch_miss, errno, strerror(errno));
/* Workload here, that means our specific task to profile. */
/* Get and close the performance counters. */
uint64_t counter_branch_miss = 0;
read(perf_fd_branch_miss, &counter_branch_miss, sizeof(counter_branch_miss));
close(perf_fd_branch_miss);
/* Display the result. */
printf("Number of mispredicted branches: %d\n", counter_branch_miss);
}
I will not enter into the details of how using perf_event, good resources are available here, here, here, here. However, just a few notes about the code above:
On real hardware, when using perf_event and common events (events that are available under a lot of architectures), it is recommended to use perf_event macros PERF_TYPE_HARDWARE as type and to use macros like PERF_COUNT_HW_BRANCH_MISSES for the number of mispredicted branches, PERF_COUNT_HW_CACHE_MISSES for the number of cache misses, and so on (see the manual page for a list). This is a best practice to have a portable code.
On a gem5 simulated system, currently (v20.0), a C source code have to use PERF_TYPE_RAW type and architectural event ID to identify an event. Here, 0x10 is the ID of the 0x0010, BR_MIS_PRED, Mispredicted or not predicted branch event, described in the ARMv8-A Reference Manual (here). In the manual, all events available in real hardware are described. However, they are not all implemented into gem5. To see the list of implemented event inside gem5, refer to the src/arch/arm/ArmPMU.py file. In the latter, the line self.addEvent(ProbeEvent(self,0x10, bpred, "Misses")) corresponds to the declaration of the counter described in the manual. This is not a normal behavior, hence gem5 should be patched to allow using PERF_TYPE_HARDWARE one day.
gem5 simulation script
This is not a entire MWE script (it would be too long!), only the needed portion to add inside a full-system script to use the PMU. We use an ArmSystem as a system, with the RealView platform.
For each ISA (we use an ARM ISA here) of each CPU (e.g., a DerivO3CPU) in our cluster (which is a SubSystem class), we add to it a PMU with a unique interrupt number and the already implemented architectural event. An example of this function could be found in configs/example/arm/devices.py.
To choose an interrupt number, pick a free PPI interrupt in the platform interrupt mapping. Here, we choose PPI n°20, according to the RealView interrupt map (src/dev/arm/RealView.py). Since PPIs interrupts are local per Processing Element (PE, corresponds to cores in our context), the interrupt number can be the same for all PE without any conflict. To know more about PPI interrupts, see the GIC guide from ARM here.
Here, we can see that the interrupt n°20 is not used by the system (from RealView.py):
Interrupts:
0- 15: Software generated interrupts (SGIs)
16- 31: On-chip private peripherals (PPIs)
25 : vgic
26 : generic_timer (hyp)
27 : generic_timer (virt)
28 : Reserved (Legacy FIQ)
We pass to addArchEvents our system components (dtb, itb, etc.) to link the PMU with them, thus the PMU will use the internal counters (called probes) of these components as exposed counters to the system.
for cpu in system.cpu_cluster.cpus:
for isa in cpu.isa:
isa.pmu = ArmPMU(interrupt=ArmPPI(num=20))
# Add the implemented architectural events of gem5. We can
# discover which events is implemented by looking at the file
# "ArmPMU.py".
isa.pmu.addArchEvents(
cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
icache=getattr(cpu, "icache", None),
dcache=getattr(cpu, "dcache", None),
l2cache=getattr(system.cpu_cluster, "l2", None))
Two quick additions to Pierre's awesome answers:
for fs.py as of gem5 937241101fae2cd0755c43c33bab2537b47596a2, all that is missing is to apply to fs.py as shown at: https://gem5-review.googlesource.com/c/public/gem5/+/37978/1/configs/example/fs.py
for cpu in test_sys.cpu:
if buildEnv['TARGET_ISA'] in "arm":
for isa in cpu.isa:
isa.pmu = ArmPMU(interrupt=ArmPPI(num=20))
isa.pmu.addArchEvents(
cpu=cpu, dtb=cpu.mmu.dtb, itb=cpu.mmu.itb,
icache=getattr(cpu, "icache", None),
dcache=getattr(cpu, "dcache", None),
l2cache=getattr(test_sys, "l2", None))
a C example can also be found in man perf_event_open
I am trying to setup a Half Duplex RS-485 communication using libmodbus on a Raspberry Pi running Raspian Buster, with a FTDI USB to Serial adapter. My FTDI adapter shows as ttyUSB0 when I run ls /dev/.
I tried the following sample code:
#include <modbus.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main(void) {
modbus_t *ctx = modbus_new_rtu("/dev/ttyUSB0", 19200, 'N', 8, 1);
if (ctx == NULL) {
fprintf(stderr, "Unable to create the libmodbus context\n");
return 0;
}
if (modbus_rtu_set_serial_mode(ctx, MODBUS_RTU_RS485) == -1) {
fprintf(stderr, "Error setting the serial port as RS-485\n");
fprintf(stderr, "errno: %d\n (EBADF == 9)", errno);
modbus_free(ctx);
return 0;
}
}
Compiled with gcc test1.c -I/usr/include/modbus -lmodbus.
And I get errno as 9, or EBADF, even if I run this code with sudo.
There is a very easy solution to your problem: just don't set MODBUS_RTU_RS485, quite likely you don't need it.
This mode is actually a workaround for devices without automatic (hardware) direction control. As you know, Modbus RTU works over a half-duplex RS485 link (only one device is allowed to talk while all others must be listening only), and hence requires an additional (to RX and TX) signal to control what device is writing to the bus at all times (direction control).
So you would only need to set MODBUS_RTU_RS485 if your device lacks this feature, which is nowadays quite unlikely or if you are building your own transceiver. Most devices based on the FTDI chip, in particular, should have this feature since the chip itself has a TXDEN (transmit enable) pin. See here for more details and a trick to expose the TXDEN signal to a non-default pin.
It is when you don't have this feature (one frequent scenario is when you want to use the embedded UART on your Rpi for Modbus over RS485, implementing your own transceiver) that you need a software (or hardware) workaround. And that's where MODBUS_RTU_RS485 should come handy, repurposing the RTS flow control signal. Unfortunately, most serial drivers (including ftdi_sio, the one you are probably using) don't support this mode (refer again to the link above).
Luckily, there are workarounds to the workaround: see here for a complete discussion. You can also take a look at this answer where I explained how to set up libmodbus with support for toggling the direction on the bus using a GPIO pin on a Rpi (also applicable to most SBCs, I've used this method successfully with a Pocket Chip computer, for instance).
You can find more background on this issue elsewhere: here and here.
I have the following chardev defined:
.h
#define MAJOR_NUM 245
#define MINOR_NUM 0
#define IOCTL_MY_DEV1 _IOW(MAJOR_NUM, 0, unsigned long)
#define IOCTL_MY_DEV2 _IOW(MAJOR_NUM, 1, unsigned long)
#define IOCTL_MY_DEV3 _IOW(MAJOR_NUM, 2, unsigned long)
module .c
static long device_ioctl(
struct file* file,
unsigned int ioctl_num,
unsigned long ioctl_param)
{
...
}
static int device_open(struct inode* inode, struct file* file)
{
...
}
static int device_release(struct inode* inode, struct file* file)
{
...
}
struct file_operations Fops = {
.open=device_open,
.unlocked_ioctl= device_ioctl,
.release=device_release
};
static int __init my_dev_init(void)
{
register_chrdev(MAJOR_NUM, "MY_DEV", &Fops);
...
}
module_init(my_dev_init);
My user code
ioctl(fd, IOCTL_MY_DEV1, 1);
Always fails with same error: ENOTTY
Inappropriate ioctl for device
I've seen similar questions:
i.e
Linux kernel module - IOCTL usage returns ENOTTY
Linux Kernel Module/IOCTL: inappropriate ioctl for device
But their solutions didn't work for me
ENOTTY is issued by the kernel when your device driver has not registered a ioctl function to be called. I'm afraid your function is not well registered, probably because you have registered it in the .unlocked_ioctl field of the struct file_operations structure.
Probably you'll get a different result if you register it in the locked version of the function. The most probable cause is that the inode is locked for the ioctl call (as it should be, to avoid race conditions with simultaneous read or write operations to the same device)
Sorry, I have no access to the linux source tree for the proper name of the field to use, but for sure you'll be able to find it yourself.
NOTE
I observe that you have used macro _IOW, using the major number as the unique identifier. This is probably not what you want. First parameter for _IOW tries to ensure that ioctl calls get unique identifiers. There's no general way to acquire such identifiers, as this is an interface contract you create between application code and kernel code. So using the major number is bad practice, for two reasons:
Several devices (in linux, at least) can share the same major number (minor allocation in linux kernel allows this) making it possible for a clash between devices' ioctls.
In case you change the major number (you configure a kernel where that number is already allocated) you have to recompile all your user level software to cope with the new device ioctl ids (all of them change if you do this)
_IOW is a macro built a long time ago (long ago from the birth of linux kernel) that tried to solve this problem, by allowing you to select a different character for each driver (but not dependant of other kernel parameters, for the reasons pointed above) for a device having ioctl calls not clashing with another device driver's. The probability of such a clash is low, but when it happens you can lead to an incorrect machine state (you have issued a valid, working ioctl call to the wrong device)
Ancient unix (and early linux) kernels used different chars to build these calls, so, for example, tty driver used 'T' as parameter for the _IO* macros, scsi disks used 'S', etc.
I suggest you to select a random number (not appearing elsewhere in the linux kernel listings) and then use it in all your devices (probably there will be less drivers you write than drivers in the kernel) and select a different ioctl id for each ioctl call. Maintaining a local ioctl file with the registered ioctls this way is far better than trying to guess a value that works always.
Also, a look at the definition of the _IO* macros should be very illustrative :)
i'm doing a project in which i need to handle an interrupt in Linux.
the board i'm using is an ARM9Board based on the s3c6410 MCU by Samsung (arm 11 processor) and it has the following I/O interface :
as the image shows i have EINTx pins for external interrupts and GPxx pins as GPIO pins and i don't mind using any of them but i don't have their numbers !
For EINTx pins :
when i call
int request_irq(unsigned int irq, void (*handler)(int, struct pt_regs *),
unsigned long flags, const char *device);
i need the interrupt number to pass it as the first paramter of the function , so how can i get the irq number for example the EINT16 pin ?
For GPxx pins :
the same story as i need the GPIO pin nuumber to pass it to those functions
int gpio_request(unsigned gpio, const char *label);
int gpio_direction_input(unsigned gpio);
int gpio_to_irq(unsigned gpio);
i.e how do i know the GPIO number for the GPP8 pin ?
i searched the board documents and datasheet but it doesn't contain anything about how to get those numbers , any idea or help on where to look ?
The Embedded Linux you are using should have a GPIO driver that has #define statements for the GPIO pins. You can then get the IRQ number of the specific GPIO using something like:
irq_num = gpio_to_irq(S3C64XX_GPP(8));
The Linux GPIO lib support for that particular chip is available in the following file:
linux/arch/arm/mach-s3c6400/include/mach/gpio.h
There you will find all the #define statements for the various GPIO.
See the section on GPIO Conventions in their documentation:
http://www.kernel.org/doc/Documentation/gpio/gpio.txt
I was doing some work on the GPIO pin as well but it's on a different board, AM335x. Just to let you know, there's quite few of way to do it. One of the method we are using is using memory board to access (write or read) the GPIO pin.
This is a really good article to help me to get things working. Register access to the GPIOs of the Beaglebone via memory mapping
Currently I am developing GPIO kernel module for friendlyarm Linux 2.6.32.2 (mini2440). I am from electronics background and new to Linux.
The kernel module loaded at start-up and the related device file is located in /dev as gpiofreq.
At first time writing to device file, GPIO pin toggles continuously at 50kHz. At second time writing it stop toggling. At third time, it starts again, and so on.
I have wrote separate kernel module to generate freq. but CPU freezes after writing device file at first time. The terminal prompt is shown but I can not run any command afterwards.
Here is the code-snippet:
//calling function which generates continuous freq at gpio
static int send_freq(void *arg)
{
set_current_state(TASK_INTERRUPTIBLE);
for(;;)
{
gpio_set_value(192,1);
udelay(10);
gpio_set_value(192,0);
udelay(10);
}
return 0;
}
Here is the device write code,
which start or stop with any data written to device file.
if(toggle==0)
{
printk("Starting Freq.\n");
task=kthread_run(&send_freq,(void *)freq,"START");
toggle=1;
}
else
{
printk("Operation Terminated.\n");
i = kthread_stop(task);
toggle=0;
}
You are doing an infinite loop in a kernel thread, there is no room for anything else
to happen, except IRQ and maybe other kernel thread.
What you could do is either
program a timer on your hardware and do your pin toggling in an interrupt
replace udelay with usleep_range
I suggest doing thing progressively, and starting in the kHz range with usleep_range, and eventually moving to cust om timer + ISR
in either case, you will probably have a lot of jitter, and doing such gpio toggling may be a good idea on a DSP or a PIC, but is a waste of resources on ARM + Linux, unless you are hardware assisted with pwm capable gpio engine.