What linux kernel code creates /sys/devices/system/cpu/cpuX? - linux

I am developing a cpufreq driver (as a loadable kernel module) for the microblaze architecture. I have some FPGA logic that is able to scale the on-system clock and it works quite well. I have followed the information in Documentation/cpu-freq/cpu-drivers.txt and looked at the model in the blackfin cpufreq driver.
I have also made the necessary changes to arch/microblaze/Kconfig in order to have the cpufreq options built into the kernel (not modules).
When I first loaded the driver, cpufreq_register_driver() was returning -ENODEV, which implied that it couldn't find a CPU. I set the driver flag to CPUFREQ_STICKY and was able to insert the module.
However, at this point I realized that /sys/devices/system/cpu/cpu0 isn't present (although /sys/devices/system/cpu/cpufreq is there). So, why is that? What part of the kernel code is responsible for creating that directory?

I discovered where the /sys/devices/system/cpu/cpuX sysfs entry was created by looking at the cpufreq_cpu_callback() in drivers/cpufreq/cpufreq.c. This has a call to get_cpu_sysdev(), which I assumed was the element that I was looking for.
This call is defined in drivers/base/cpu.c, where I also noticed the code that puts together the cpu specific sysdev entry; register_cpu(). For most architectures, this is in arch/${ARCH}/kernel/setup.c, and I used the blackfin code as an example.
DEFINE_PER_CPU(struct cpu, cpu_data);
static int __init topology_init(void)
{
unsigned int cpu;
for_each_possible_cpu(cpu) {
register_cpu(&per_cpu(cpu_data, cpu), cpu);
}
return 0;
}
After adding this code to arch/microblaze/kernel/setup.c, I now have the directory I need and I'm able to make use of the different governors available to talk to my cpufreq driver. Now I just have to make sleep 1 take 1 second at 1/3 the clock rate instead of 3 seconds!

Related

How to Configure and Sample Intel Performance Counters In-Process

In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):
results[] = ...
for (iteration = 0; iteration < num_iterations; iteration++) {
pctr_start = sample_pctr();
the_benchmark();
pctr_stop = sample_pctr();
results[iteration] = pctr_stop - pctr_start;
}
FWIW, the performance counter I am thinking of using is CPU_CLK_UNHALTED.THREAD_ALL, to read the number of core cycles independent of clock frequency changes (In an earlier question I had been planning to use the TSC register for this, but alas, that is not what this register measures at all).
My initial intention was to use inline assembler to first configure a counter using WRMSR, then to read the counter using RDPMC inside sample_pctr().
I stumbled at the first hurdle, as writing MSRs requires kernel privileges. It seems like you can in fact read the counters from user space (if configured correctly), but the act of configuring the counter (with an MSR) needs to be undertaken by the kernel.
Does anyone know a lightweight way to ask the kernel to configure the a performance counters from user-space so that I can then use RDPMC from within my benchmark harness?
Stuff I've looked into/thought about:
Perf tools for Linux. Seems to be geared up for sampling over the whole lifetime of a process, not within a process as specific points (before and after each iteration).
Use perf syscalls directly (i.e. perf_event_open). Looks like the counter value will only update periodically (using a sample rate) or after the counter exceeds a threshold. I need the counter value precisely at the moment I ask. This is why RDPMC seemed so attractive. I imagine that sampling frequently will itself skew the performance counter readings.
PAPI builds on perf, so probably inherits the above problem.
Write a kernel module -- too much effort, too error prone.
Ideally I would like a solution which works on OpenBSD and Linux, but somehow I think that is a tall order. Perhaps just for Linux for now.
Any help is most appreciated. Thanks.
EDIT: I just found the Linux msr device node, which would probably suffice. I'll leave the question up in case a better answer shows up.
It seems the best way -- for Linux at least -- is to use the msr device node.
You simply open a device node, seek to the address of the MSR required, and read or write 8 bytes.
OpenBSD is harder, since (at the time of writing) there is no user-space proxy to the MSRs. So you would need to write a kernel module or implement a sysctl by hand.

How to generate a steady 37kHz GPIO trigger from inside linux kernel?

I have a micro controller taking care of infrared TX-carrier wave generation currently, but I started wondering if I could dispose of it, and do this work in linux side - thus bringing the cost of my embedded system down.
I'm running on a Freescale i.mx233 (454MHz ARM9), and if I access registry directly through /dev/mem, I can achieve quite steady 5MHz triggering to a GPIO pin.
Since I need 37kHz, I started looking ways of slowing it down, but it seems that at least nanowait() is way too rough for this purpose.
I found one solution of calling rand() in a for loop, and I seem to be able to generate 38,4kHz signal quite well, However there is some unacceptable jitter from time to time according to oscilloscope. (I understand that this is quite a bit waste of resources, but when the TX needs to be done, the system has no other tasks really)
My questions:
Freescales kernel code (3.8 branch) doesn't have CONFIG_PREEMPT_RT patches, so that is one thing maybe I should look into, but before that:
Could I achieve more accurate performance, by writing a kernel module to drive the GPIO from inside the kernel ? I do need to read up on some data from user space (data to be sent), but other than that, I only need to trigger the led on specified frequency at the end of the GPIO, so the driver should be pretty simple.
Can I force the priority of my driver, so that other tasks don't interrupt this gpio triggering ? (data sending takes currently roughly 400ms, and it's done very seldom)
Is there some better way to create an interrupt say every 37kHz, so that I don't stall the system by SW ?
Micro controller is perfect for this kind of tasks, but it would be nice to avoid this cost overhead if possible...
The i.MX23 PWM in "Multi-Chip Attachment Mode" is designed exactly for this requirement.
Use one of the PWM's in "Multi-Chip Attachment Mode", for example, assuming you are using a 24Mhz clock, with
MATT=1 (Enable multi-chip attachment mode)
MATT_SEL=1 (User 24Mhz clock)
CDIV=0x2 (or DIV_4, i.e. divide by 4)
INACTIVE_STATE=0x2 or 0x3
ACTIVE_STATE=0x3 or 0x2
PERIOD=175 (i.e 176-1)
If you use a 32Mhz clock you will need other CDIV and PERIOD parameters to get to 34Khz.
See the "i.MX23 Applications Processor Reference Manual" for example code. If I am not mistaken the driver code is in arch/arm/plat-mxc/pwm.c but it doesn't seem to support the MATT mode. You will probably have to extend the code yourself.
Regarding the implementation -
The above answer relates to the CPU only. In practice, the ability to implement the idea depends on the board design. The board would need a header (pins for external connection) that connects to a GPIO pin that can be connected via the pinmux to one of the PWMs. I would assume that most reference designs would have at least one PWM configurable GPIO exposed through a header. The the question is if there is only one and if you are already using it for some other control purpose.
After determining that there is a header with a free PWM configurable GPIO, you need to configure the pin mux and activate the PWM. There are instructions for this in the processor reference manual noted above. Most systems do this configuration in the boot loader board_init() (assuming U-boot), although it can probably be done in userspace also with some mmap trickery after Linux boots.
Finally you would need to write a driver based on the interface to the PWM module in platform-mxc_pwm.c.
If you are using the i.MX23 EVK 10.05 you might be able to modify the LED PWM driver since it is already configured at the level of the bootloader and kernel and connect your device to the LED output instead of the LED. (You will need a hardware technician to help you with this.) Make sure you config the kernel with the CONFIG_LEDS_MXS.
The above comments regarding implementation are somewhat speculative since I don't know the EVK. Perhaps someone who knows it can improve on this.
Update September 21, 2013
Another way to generate a 37kHz signal with the i.MX23 or with any SoC with a similar ARM CPU core is to use an unused on-chip timer to generate a FIQ interrupt at the required frequency and write a FIQ interrupt handler to toggle a GPIO pin. Maxime Ripard posted a complete example of this method using the i.MX28 SoC on his Free Electrons blog on April 30 this year. To use this method you will need both an unused timer and not be using the FIQ interrupt for another purpose such as one of the SPI, camera, or brownout-detection drivers that use the ARM FIQ. You will also need to write the ISR in ARM assembler.
The best way to get a 37 kHz signal would be to find some serial/audio/PWM output that can generate it in hardware.
It might be possible to raise the priority of your userspace process, but this won't help against interrupts or high-priority kernel tasks.
An RT kernel would allow you to get priority over more kernel tasks, but wouldn't help against all interrupts.
I don't know if you will be able to get the maximum latency below 37 kHz (27 µs); I think it's unlikely.
Doing this in the kernel would help because you could disable interrupt handling.
However, disabling interrupts for as long as 400 ms is frowned upon.

Why do we need a bootloader in an embedded device?

I'm working with ELinux kernel on ARM cortex-A8.
I know how the bootloader works and what job it's doing. But i've got a question - why do we need bootloader, why was the bootloader born?
Why we can't directly load the kernel into RAM from flash memory without bootloader? If we load it what will happen? In fact, processor will not support it, but why are we following the procedure?
In the context of Linux, the boot loader is responsible for some predefined tasks. As this question is arm tagged, I think that ARM booting might be a useful resource. Specifically, the boot loader was/is responsible for setting up an ATAG list that describing the amount of RAM, a kernel command line, and other parameters. One of the most important parameters is the machine type. With device trees, an entire description of the board is passed. This makes a stock ARM Linux impossible to boot with out some code to setup the parameters as described.
The parameters allows one generic Linux to support multiple devices. For instance, an ARM Debian kernel can support hundreds of different board types. Uboot or other boot loader can dynamically determine this information or it can be hard coded for the board.
You might also like to look at bootloader info page here at stack overflow.
A basic system might be able to setup ATAGS and copy NOR flash to SRAM. However, it is usually a little more complex than this. Linux needs RAM setup, so you may have to initialize an SDRAM controller. If you use NAND flash, you have to handle bad blocks and the copy may be a little more complex than memcpy().
Linux often has some latent driver bugs where a driver will assume that a clock is initialized. For instance if Uboot always initializes an Ethernet clock for a particular machine, the Linux Ethernet driver may have neglected to setup this clock. This can be especially true with clock trees.
Some systems require boot image formats that are not supported by Linux; for example a special header which can initialize hardware immediately; like configuring the devices to read initial code from. Additionally, often there is hardware that should be configured immediately; a boot loader can do this quickly whereas the normal structure of Linux may delay this significantly resulting in I/O conflicts, etc.
From a pragmatic perspective, it is simpler to use a boot loader. However, there is nothing to prevent you from altering Linux's source to boot directly from it; although it maybe like pasting the boot loader code directly to the start of Linux.
See Also: Coreboot, Uboot, and Wikipedia's comparison. Barebox is a lesser known, but well structured and modern boot loader for the ARM. RedBoot is also used in some ARM systems; RedBoot partitions are supported in the kernel tree.
A boot loader is a computer program that loads the main operating system or runtime environment for the computer after completion of the self-tests.
^ From Wikipedia Article
So basically bootloader is doing just what you wanted - copying data from flash into operating memory. It's really that simple.
If you want to know more about boostrapping the OS, I highly recommend you read the linked article. Boot phase consists, apart from tests, also of checking peripherals and some other things. Skipping them makes sense only on very simple embedded devices, and that's why their bootloaders are even simpler:
Some embedded systems do not require a noticeable boot sequence to begin functioning and when turned on may simply run operational programs that are stored in ROM.
The same source
The primary bootloader is usually built in into the silicon and performs the load of the first USER code that will be run in the system.
The bootloader exists because there is no standardized protocol for loading the first code, since it is chip dependent. Sometimes the code can be loaded through a serial port, a flash memory, or even a hard drive. It is bootloader function to locate it.
Once the user code is loaded and running, the bootloader is no longer used and the correctness of the system execution is user responsibility.
In the embedded linux chain, the primary bootloader will setup and run the Uboot. Then Uboot will find the linux kernel and load it.
Why we can't directly load the kernel into RAM from flash memory without bootloader? If we load it what will happen? In fact, processor will not support it, but why are we following the procedure?
Bartek, Artless, and Felipe all give parts of the picture.
Every embedded processor type (E.G. 386EX, Coretex-A53, EM5200) will do something automatically when it is reset or powered on. Sometimes that something is different depending on whether the power is cycled or the device is reset. Some embedded processors allow you to change that something based on voltages applied to different pins when the device is powered or reset.
Regardless, there is a limited amount of something that a processor can do, because of the physical space on-processor required to define that something, whether it is on-chip FLASH, instruction micro-code, or some other mechanism.
This limit means that the something is
fixed purpose, does one thing as quickly as possible.
limited in scope and capability, typically loading a small block of code (often a few kilobytes or less) into a fixed memory location and executing from the start of the loaded code.
unmodifiable.
So what a processor does in response to reset or power-cycle cannot be changed, and cannot do very much, and we don't want it to automatically copy hundreds of megabytes or gigabytes into memory which may not exist or may not be initialized, and which could take a looooong time.
So....
We set up a small program which is smaller than the smallest size permitted across all of the devices we are going to use. That program is stored wherever the something needs it to be.
Sometimes the small program is U-Boot. Sometimes even U-Boot is too big for initial load, so the small program then in turn loads U-Boot.
The point is that whatever gets loaded by the something, is modifiable as needed for a particular system. If it is U-Boot, great, if not, it knows where to load the main operating system or where to load U-Boot (or some other bootloader).
U-Boot (speaking of bootloaders in general) then configures a minimal set of devices, memory, chip settings, etc., to enable the main OS to be loaded and started. The main OS init takes care of any additional configuration or initialization.
So the sequence is:
Processor power-on or reset
Something loads initial boot code (or U-Boot style embedded bootloader)
Initial boot code (may not be needed)
U-Boot (or other general embedded bootloader)
Linux init
The kernel requires the hardware on which you are working to be in a particular state. All the hardware you used needs to be checked for its state and initialized for its further operation. This is one of the main reasons to use a boot loader in an embedded (or any other environment), apart from its use to load a kernel image into the RAM.
When you turn on a system, the RAM is also not in a useful state (fully initialized to use) for us to load kernel into it. Therefore, we cannot load a kernel directly (to answer your question)and thus arises the need for a construct to initialize it.
Apart from what is stated in all the other answers - which is correct - in some cases the system has to go through different execution modes, take as example TrustZone for secure ARM chips. It is possible to still consider it as sort of HW initialization, but what makes it peculiar is the fact that there are additional limitations (ex: memory available) that make it impractical, if not impossible, to do everything in a single binary, thus multiple stages of bootloader are available.
Furthermore, for security reason, each of them is signed and can perform its job only if it meets the security requirements.

Debugging kernel hang because of IOCTL calls

I am trying to make a kernel module which is working on 2.6.32 kernel to work on 3.6 kernel. We use IOCTL calls to update structures in Linux Kernel Module. These calls are working fine in 2.6.32 kernel.
When I try the same in 3.6 kernel I am facing kernel hang whenever ioctl calls are made from user-space application. Its a socket based interface not a file based interface hence we use the ioctl under struct proto_ops.
How can I debug this scenario as there is no core dump generated. To copy data from userspace I am using copy_from_user command.
Any pointers for debugging this scenario would be very helpful
ioctl() is one of the remaining parts of the kernel which runs under the Big Kernel Lock (BKL). In the past, the usage of the BKL has made it possible for long-running ioctl() methods to create long latencies for unrelated processes.
Follows an explanation of the patch that introduced unlocked_ioctl and compat_ioctl into 2.6.11. The removal of the ioctl field happened a lot later, in 2.6.36.
Explanation: When ioctl was executed, it took the Big Kernel Lock (BKL), so nothing else could execute at the same time. This is very bad on a multiprocessor machine, so there was a big effort to get rid of the BKL. First, unlocked_ioctl was introduced. It lets each driver writer choose what lock to use instead. This can be difficult, so there was a period of transition during which old drivers still worked (using ioctl) but new drivers could use the improved interface (unlocked_ioctl). Eventually all drivers were converted and ioctl could be removed.
compat_ioctl is actually unrelated, even though it was added at the same time. Its purpose is to allow 32-bit userland programs to make ioctl calls on a 64-bit kernel. The meaning of the last argument to ioctl depends on the driver, so there is no way to do a driver-independent conversion.
Reference: The new way of ioctl() by Jonathan Corbet

request_irq succeeds but interrupt is never detected

I am running embedded linux 3.2.6 on an ARM processor. I am using a modified version of atmel's serial driver to control the 4 USART ports on my device. When I use the driver compiled with the kernel, all works fine. But I want to run the driver as a kernel module instead. I make all of the necessary changes and disable the internal driver and everything seems fine. The 4 tty devices are registered successfully and I can see that the all of my probe and initialization functions work correctly.
So here's the problem:
When I try to write to any of the devices, my "start transmit" function gets called but then waits for an interrupt from the usart which never occurs. So the write just hangs, and using a logic analyzer I can see that RTS gets asserted but no bytes show up on the tx line. I know that my call to request_irq succeeds and yet i never see any of the irq entries in /proc/interrupts. In the driver, I have also tried using request_irq to register a separate interrupt handler for a gpio line, and this works fine.
I know that this is a problem that is probably hard to diagnose, but I am looking for ANY possible suggestions that could lead me in the right direction to finding a solution. Let me know if you need any clarifications. Thank you
The symptoms reads like a peripheral clock that has not been enabled (or turned off): the device can be initialized w/o errors and an I/O operation can be setup, but the device doesn't do anything; it plays dead. Since no I/O ever starts, you're never going to get an interrupt indicating completion!
The other thing to check are the conditional compilation directives for HW configuration structures in your arch/arm/mach-xxx/zzz_devices.c file.
Make sure that the serial port structures have something like:
#if defined(CONFIG_SERIAL_ATMEL) || defined(CONFIG_SERIAL_ATMEL_MODULE)
and not just
#if defined(CONFIG_SERIAL_ATMEL)
Addendum
I could be wrong but the clock shouldn't have any effect on the CTS pin causing an interrupt, right?
Not right.
These digital circuits are synchronous state machines: without a clock, a change-of-state by an input cannot be processed.
Also, SoCs and modern uControllers use the peripheral clocks as on/off switches for those integrated peripherals. There is often way more functionality, i.e. peripherals, on the silicon chip than can actually be used, mostly due to insufficient quantity of pins to the board. So disabling the clocks to unused devices is employed to reduce power consumption.
You are far too focused on interrupts.
You do not have a solvable interrupt problem; those are secondary failures.
The lack of output when attempting to transmit is far more significant and revealing.
The root cause is probably a flawed configuration of the USART devices, since transmitting bits is an automatic operation for a configured & operational USART.
If the difference between not-working versus working is loadable module versus static linking, then the root cause is going to be something fundamental (and trivial) like my two suggestions.
Also your lack of acknowledgement regarding the #if defined(), e.g. you didn't respond with "Oh yeah, we already knew that", raises a gigantic red flag that says "Fix me first!"
Addendum 2
I'm tempted to delete this answer after discovering that the Atmel serial driver cannot be configured/built as a loadable module using make menuconfig (which is the premise for half of the answer). (Of course the Kconfig file could be hacked to make the config variable tristate instead of boolean to overcome the module restriction.) I've left a comment for the OP. But I also wanted to preserve the comment to Mr. Stratton pointing out how symbols in the .config file are (not) used.
So I did finally fix my problem. Thank you for the responses, none of them directly solved my problem but they did prompt further examination of my code. After some trial and error I finally got it working. I had originally moved the platform_device structures for each usart from /mach-at91/xxx_devices.c to my loadable module. Well for some reason the structures weren't getting the correct data to map to the hardware, I suppose because it wasn't correctly linking the symbols from the kernel (never got an error message though) and so some of the registration functions weren't even getting called. I ended up moving the structures and platform_device_register calls back into the devices file. I also decided to keep the driver for the console built-in using the original atmel_serial.c driver. I had to change the platform_device name for the console in both the devices file and in the built-in atmel_serial.c file in order for it to not conflict with my usart ports driver. I found that changing the platform_device and platform_driver name for the usarts from anything but "atmel_usart" resulted in usart transmission failing. I really don't understand why, but i'm just leaving it as atmel_usart so it works.
Thanks again to everybody who responded to my problem.

Resources