Multithreded applications on different CPUS - linux

If, for example, there is a let's say embedded application which run on unicore CPU. And then that application would be ported on multi core CPU. Would that app run on single or multiple cores?
To be more specific I am interested in ARM CPU (but not only) and toolchain specifics e. g. standard C/C++ libraries.
The intention of this question is this: is it CPU's responsibility to "decide" to execute on multiple cores or compiler toolchain, developer and standard platfor specific libraries? And again, I am interested also in other systems' tendencies out there.
There are plenty of applications and RTOS (for example Linux) that run on different CPUs but the same architecture, so does that mean that they are compiled differently?

Generally speaking single-threaded code will always run on one core. To take advantage of multiple cores you need to have either multiple processes, multiple threads, or both.
There's nothing your compiler can do to help you here. This is an architectural consideration.
If you have multiple threads, for example, most multi-core systems will run them on whatever cores are available if the operating system you're running is properly compiled to support that. Running an OS that's been compiled single-core only will obviously limit your options here.

A single threaded program will run in one thread. It is theoretically possible for the thread to be scheduled to move to a different core, but the scheduler cannot turn a single thread into multiple threads and give you any parallel processing.
EDIT
I misunderstood your question. If there are multiple threads in the application, and that application is binary compatible with the new multicore CPU, the threads will indeed be scheduled to run on different CPUs, if the OS scheduler deems it appropriate.

Well it all depends on the software that if it wants to utilize other cores or not (if present). Lets take an example of Linux on ARM's cortexA53.
Initially a vendor provided boot loader runs on, FSBL (First state bootloader). It then passes control to Arm trusted firmware. ATF then runs uboot. All these run on a single core. Then uboot loads linux kernel and passes control to it. Linux then initializes some stuff and looks into some option, first in the bootargs for smp or nosmp flags. if smp it will get the number of CPUs assigned to it from dtb and then using SMC calls to ATF it will start other cores and then assign work to those cores to provide true feel of multiprocessing environment. This is normally called load balancing and in linux it is mostly done in fair.c file.

Related

Is there code to emulate and increase the number of CPU in a system

This code somehow would make the OS think there are more cores on the system and dispatch threads to those. That code then should be able to take that dispatched thread and run it on a different CPU on another machine.
Of course it would have to know the specs of that machine before hand so it can expose the right CPU specs to the OS or whatever information is needed by the OS.

How to restrict a CPU core to only 2 applications using Linux scheduler?

I would like to study the interaction between two applications on a CPU core, one is persistent application (NVM resident) and another is regular (DRAM resident). For this I want to only schedule these 2 applications on a core and nothing else. I am looking at Linux scheduler to accomplish this. Can someone please help me with a direction to achieve this? Can I achieve this using sched or do I need to modify the scheduler code of kernel so that scheduler does not schedule applications to the core of my interest.
You can use the isolcpus command-line parameter of the kernel:
This option can be used to specify one or more CPUs to isolate from the general SMP balancing and scheduling algorithms. You can move a process onto or off an "isolated" CPU via the CPU affinity syscalls or cpuset.
isolcpus is a solution but the linux kernel documentation declares it as deprecated:
isolcpus= [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
[Deprecated - use cpusets instead]
The cpuset sub-system of the cgroups is preferable. A tool like partrt is based on this principle to divide the CPU cores in two subsets: nrt where all the processes run and rt into which you can move your applications. Hence, the applications are isolated on some CPU cores.

Linux and RTOS using SoC (ARM, Xilinx)

I am facing a design "issue". I have a board with Xilinx Zynq Soc including dual-core ARM9 and I need to develop an application to support real-time property control application (time deadlines to response time) and also application to do heavy processing (image etc.) and some basic communications between them, but most importantly I will need to be able to control the Linux part (at least e.g. to somehow suspend it, "pause it" in best case to have possibility to shut it down and then run it again). So I was wondering how to combine it.
One of the option, could be RTLinux, which at least to description, what I found offers possibility to run realtime kernel and linux kernel next to it as a thread but it seems that it is now proprieatary by WindRiver..
Then I stepped up over MicroBlaze, where it could be possible to "create" soft processor on Programmable logic, but I am not sure if I can run RTOS on ARM and Linux there?
There are two things that seem to be known as rtlinux. The one you mention, a Wind River revival of the MERT system is a product of that company. Another one, seemingly “RT Linux”, is a real time patch to the mainline kernel which provides deterministic scheduling and fine grained kernel pre-emption.
I think it is the latter one that you want. 10s of google indicates that there is a kconfig target for this SoC, so all the pieces you need should be there.
Do remember there is more to a real time system than just the ability to be real time; the subsystems also have to be well behaved.
Given your description, you have (at least) the following design options:
Dual kernel approach: this means patching the Linux kernel with a (quite invasive) patch that runs a tiny real-time kernel alongside the standard kernel. This approach allows reaching good real-time performance (even in the order of us) at the cost of complexity. It was implemented by the RTLinux project (acquired and then discontinued by Windriver), then by RTAI (mostly focusing on x86) and Xenomai.
If you go along this path, you can see if Xenomai supports your specific SoC; then patch, configure and rebuild the kernel; and finally write the real-time code following Xenomai's API.
Improving the responsiveness of the Linux standard kernel: this is what the PREEMPT_RT project aims at. The real-time performance is lower with respect to the previous approach, but you don't have to write real-time specific code. With this approach, you can patch and build the kernel, then see if the real-time performance is sufficient for your needs.
Synthesizing a Microblaze soft-core on the FPGA, then run Linux on the ARM cores and the real-time code ((either bare-metal or with an RTOS) on the Microblaze.
Unfortunately, your specific SoC does not support ARM's virtualization extensions. Otherwise there would be the additional option of Multi-OS approach: running the Linux OS on one ARM core and the real-time code (either bare-metal or with an RTOS like ERIKA Enterprise) on the other ARM core, through a hypervisor like Jailhouse or Xen.

How to benchmark Linux threaded programs?

I'm trying to compare the performance of threaded programs (on Linux). Since the programs use different thread synchronization methods and different lock granularity, running the programs on a shared server or desktop would not be good, since the other tasks may interfere with the scheduling of my programs. I don't have dedicated hosts, so I thought that using qemu would be a good option.
What I want to know is:
Are there any alternatives for this task?
I suppose that there is no way to reproduce scheduling done by guest Linux system on qemu, if
I - need to? (Suppose my program goes unusually skow or fast -- I'd like to know if I can run it again, but keeping exactly the same scheduling for its threads). Or is there a way?

Threads and CPU Affinity

Lets say there are two processors on a machine. Thread A is running on P1 and Thread B is running on P2.
Thread A calls Sleep(10000);
Is it possible that when Thread A starts executing again, it runs on P2?
If yes, who decides this transition? If no, why not?
Does Processor store some data that which all threads it's running or OS binds each thread to Processor for its full lifetime ?
It is possible. This would be determined by the operating system process scheduler and may also be dependent on the application that is running. No information about previously running threads is kept by the processor, aside from whatever is in the cache.
This is dependent on many things, it behaves differently depending on the particular operating system. See also: Processor Affinity and Scheduling Algorithms. Under Windows you can pin a particular process to a processor core via the task manager.
Yes, it is possible. Though ultimately a thread inherits its CPU (or CPU core) from the process (executable.) In operating systems, which CPU or CPU core a process runs on for its current quanta (time slice) is decided by the Scheduler:
http://en.wikipedia.org/wiki/Scheduling_(computing)
-Oisin
The OS decides which processor to run the thread on, and it may easily change during the lifetime of that thread, especially if there is a context switch (caused by the sleep). It's completely possible if the system is loaded that both threads will be running on the same processor (or core), just at different times. Or if there isn't any load on the system, both threads may continue to run on separate processors.

Resources