Guide for working with Linux thread priorities and scheduling policies? [closed]

I'm having trouble getting the hang of thread (and process) prioritization on Linux: scheduling policy selection, what to choose when and how, and what the exact effects are.
Is there any documentation (like a guide) somewhere, preferably with concrete examples and timelines, which I could consult?

I'm having trouble getting the hang of thread(/process) prioritization on Linux, scheduling policy selection
Prioritization works by using the underlying OS's thread and process priorities, and it is hard to generalize about the specifics from the standpoint of documentation, which may be why you've not found guides online.
My recommendation is (frankly) to not bother with thread priorities. I've done a large amount of threaded programming and I've never found the need to do anything but the default prioritization. About the only time thread prioritization will make a difference is if all of the threads are completely CPU bound and you want one task or another to get more cycles.
In addition, I'm pretty sure that under Linux at least, this isn't about preemption but more about run frequency. Many thread implementations use a priority scheduling queue, so higher priority threads run more frequently, with logic in there to avoid starving the lower priority threads. This means that any I/O or other blocking operation will still let a lower priority thread run and get its time slice.
This page is a good example of the complexities of the issue. To quote:
As can be seen, thread priorities 1-8 end up with a practically equal share of the CPU, whilst priorities 9 and 10 get a vastly greater share (though with essentially no difference between 9 and 10). The version tested was Java 6 Update 10. For what it's worth, I repeated the experiment on a dual core machine running Vista, and the shape of the resulting graph is the same. My best guess for the special behaviour of priorities 9 and 10 is that THREAD_PRIORITY_HIGHEST in a foreground window has just enough priority for certain other special treatment by the scheduler to kick in (for example, threads of internal priority 14 and above have their full quantum replenished after a wait, whereas lower priorities have them reduced by 1).
If you must use thread priorities then you may have to write some test programs to understand how your platform utilizes them.
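If you do go down that road, the POSIX scheduling API is the place to start experimenting. Here is a minimal sketch of mine (not from the original answer) that creates a thread with an explicit SCHED_RR policy and falls back to the defaults when permission is denied; real-time policies normally require root or CAP_SYS_NICE:

    #include <errno.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    static void *worker(void *arg)
    {
        int policy;
        struct sched_param sp;

        /* Report the policy/priority this thread actually got. */
        pthread_getschedparam(pthread_self(), &policy, &sp);
        printf("running with policy=%d priority=%d\n", policy, sp.sched_priority);
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        struct sched_param sp;
        pthread_t tid;
        int err;

        pthread_attr_init(&attr);
        /* Create the thread with an explicit real-time round-robin
           policy instead of inheriting the default SCHED_OTHER. */
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_RR);
        sp.sched_priority = sched_get_priority_min(SCHED_RR);
        pthread_attr_setschedparam(&attr, &sp);

        err = pthread_create(&tid, &attr, worker, NULL);
        if (err == EPERM) {
            /* No root/CAP_SYS_NICE: fall back to default scheduling. */
            fprintf(stderr, "no permission for SCHED_RR, using defaults\n");
            err = pthread_create(&tid, NULL, worker, NULL);
        }
        if (err != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            return 1;
        }
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }

Compile with gcc -pthread; running it with and without sudo shows how the effective policy changes.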

Related

Difference between SIMD and Multi-threading [closed]

What is the difference between the SIMD and multi-threading concepts that one comes across in the parallel programming paradigm?
SIMD means "Single Instruction, Multiple Data" and is an umbrella term for techniques whereby many elements are loaded into extra-wide CPU registers at the same time and a single low-level instruction (such as ADD, MULTIPLY, AND, XOR) is applied to all the elements in parallel. Specific examples are MMX, SSE2/3 and AVX on Intel processors, NEON on ARM processors, and AltiVec on PowerPC. It is very low-level and a single operation typically takes only a few clock cycles. An example might be that, rather than going into a for loop increasing the brightness of the pixels in an image one by one, you load 64 8-bit pixels into a single 512-bit-wide register and multiply them all up at the same time in one or two clock cycles. SIMD is often implemented for you in high-performance libraries (like OpenCV) or is generated for you by your compiler when you compile with vectorisation enabled, typically at optimisation level 3 or higher (the -O3 switch). Very experienced programmers may choose to write their own, using "intrinsics".
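To make that concrete, here is a small sketch (mine, not the answerer's, and x86-specific) of the pixel-brightening example using SSE2 intrinsics in C, processing 16 8-bit pixels per instruction with a saturating add so values clamp at 255:

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stdint.h>
    #include <stdio.h>

    /* Brighten n pixels (n a multiple of 16 here) by 'amount',
       16 pixels per instruction, clamping at 255. */
    static void brighten(uint8_t *pixels, size_t n, uint8_t amount)
    {
        __m128i add = _mm_set1_epi8((char)amount);
        for (size_t i = 0; i < n; i += 16) {
            __m128i p = _mm_loadu_si128((__m128i *)(pixels + i));
            p = _mm_adds_epu8(p, add);          /* saturating unsigned add */
            _mm_storeu_si128((__m128i *)(pixels + i), p);
        }
    }

    int main(void)
    {
        uint8_t img[32];
        for (int i = 0; i < 32; i++) img[i] = (uint8_t)(i * 8);
        brighten(img, 32, 40);
        printf("%u %u %u\n", img[0], img[1], img[31]);  /* 40 48 255 */
        return 0;
    }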
Multi-threading refers to having multiple threads of execution, normally running on different CPU cores, at the same time. It is higher-level than SIMD, and threads typically live much longer. One thread might be acquiring images, another might be detecting objects, another might be tracking the objects, and a final one might be displaying the results. A feature of multi-threading is that the threads all share the same address space, so data in one thread can be seen and manipulated by the others. This makes threads light-weight compared to multiple processes, but can make for harder debugging. Threads are called "light-weight" because they typically take much less time to create and start than full-blown processes.
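As a minimal illustration of that shared address space (my sketch, not part of the answer), the two POSIX threads below increment the same counter, with a mutex preventing lost updates:

    #include <pthread.h>
    #include <stdio.h>

    /* Both threads see the same 'counter' because threads share one
       address space; the mutex prevents lost updates. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);  /* always 2000000 */
        return 0;
    }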
Multi-processing is similar to multi-threading except each process has its own address space, so if you want to share data between the processes, you need to work harder to do it. It has the benefit over multi-threading that one process is unlikely to crash another or interfere with its data, making it somewhat easier to debug.
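By contrast, after a fork() each process has its own copy of memory, so getting data back to the parent takes explicit inter-process communication. A minimal POSIX sketch of mine using a pipe:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) < 0) { perror("pipe"); return 1; }

        pid_t pid = fork();
        if (pid == 0) {                   /* child: its own address space */
            const char *msg = "result from child";
            write(fds[1], msg, strlen(msg) + 1);
            _exit(0);
        }

        char buf[64];
        read(fds[0], buf, sizeof(buf));   /* parent must fetch it via IPC */
        printf("parent got: %s\n", buf);
        waitpid(pid, NULL, 0);
        return 0;
    }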
If I make an analogy with cooking a meal, then SIMD is like lining up all your green beans and slicing them in one go. The single instruction is "slice", the multiple, repeated data are the beans. In fact, lining things up ("memory alignment") is an important aspect of SIMD.
Then multi-threading is like having multiple chefs all taking ingredients from a shared vegetable larder, preparing them and putting them in a big shared cook-pot. You get the job done faster because there are multiple chefs - analogous to CPU cores - working at once.
In this little analogy, multi-processing is more like each chef having his own vegetable larder and cook-pot, so if one chef runs out of vegetables, or cooking gas, the others are not affected - things are more independent. You still get the job done faster because there are more chefs, but you have to do a bit more organisation (or "synchronisation") to get all the chefs to serve their meals at the same time at the end.
There is nothing to prevent an application using SIMD as well as multi-threading and multi-processing at the same time. Going back to the cooking analogy, you can have multiple chefs (multi-threading or multi-processing) who are all slicing their green beans efficiently (SIMD). It is my impression that most applications either use SIMD and multi-threading, or SIMD and multi-processing, but relatively few use both multi-threading and multi-processing. YMMV on this bit!

How to achieve disk write speed as high as fio does? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 4 years ago.
Improve this question
I bought a HighPoint HBA card with 4 Samsung 960 PROs in it. According to the official site, this card can perform 7500 MB/s in writing and 13000 MB/s in reading.
When I test this card with fio on my Ubuntu 16.04 system, I get a write speed of about 7000 MB/s. Here are my test arguments:
sudo fio -filename=/home/xid/raid0_dir0/fio.test -direct=1 -rw=write -ioengine=sync -bs=2k -iodepth=32 -size=100G -numjobs=1 -runtime=100 -time_base=1 -group_reporting -name=test-seq-write
I have made a RAID 0 on the card and created an XFS filesystem. I want to know how to achieve a disk write speed as high as fio's if I use functions such as open(), read(), and write(), or functions such as fopen(), fread(), and fwrite(), in my console applications.
I'll just note that the fio job you specified seems a little flawed:
-direct=1 -rw=write -ioengine=sync -bs=2k -iodepth=32
(for the sake of simplicity let's assume the dashes are actually double)
The above is trying to ask a synchronous ioengine to use an iodepth greater than one. This usually doesn't make sense, and the iodepth section of the fio documentation warns about this:
iodepth=int
Number of I/O units to keep in flight against the file. Note that
increasing iodepth beyond 1 will not affect synchronous ioengines
(except for small degrees when verify_async is in use). Even async
engines may impose OS restrictions causing the desired depth not to be
achieved. [emphasis added]
You didn't post the fio output, so we can't tell whether you ever achieved an iodepth greater than one. 7.5 GByte/s seems high for such a job, and I can't help thinking your filesystem quietly went and did buffering behind your back, but who knows? I can't say more because the output of your fio run is unavailable, I'm afraid.
Also note the data fio was writing might not have been random enough to defeat compression thus helping to achieve an artificially high I/O speed...
Anyway, to your main question:
how to achieve disk write speed as high as fio does?
Your example shows you are telling fio to use an ioengine that does regular write calls. With this in mind, theoretically you should be able to achieve a similar speed by:
Preallocating your file and only writing into the allocated portions of it (so you are not doing extending writes)
Fulfilling all the requirements of using O_DIRECT (there are strict memory alignment and size constraints that MUST be fulfilled)
Making sure your write operations work on buffers in chunks of exactly 2048 bytes (or greater, so long as they are powers of two)
Submitting your writes as soon as possible :-)
You may find not using O_DIRECT (and thus allowing buffered I/O to do coalescing) is better if for some reason you are unable to submit "large" well aligned buffers every time.
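For what it's worth, here is a rough sketch (mine, not the answerer's) of a write loop along the lines above: preallocated file, O_DIRECT, and aligned 1 MiB buffers. The 4096-byte alignment is an assumption that satisfies most Linux block devices, and the buffer is filled with random-ish bytes so device compression can't flatter the numbers:

    #define _GNU_SOURCE           /* needed for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK   (1 << 20)     /* 1 MiB per write() */
    #define ALIGN   4096          /* typical O_DIRECT alignment requirement */
    #define NBLOCKS 1024          /* 1 GiB total */

    int main(void)
    {
        int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Preallocate so the writes below are not extending writes. */
        int err = posix_fallocate(fd, 0, (off_t)BLOCK * NBLOCKS);
        if (err) { fprintf(stderr, "posix_fallocate: %s\n", strerror(err)); return 1; }

        /* O_DIRECT needs an aligned buffer address, offset and length. */
        void *buf;
        if (posix_memalign(&buf, ALIGN, BLOCK) != 0) return 1;
        /* Incompressible-ish fill so drive/firmware compression can't help. */
        for (size_t j = 0; j < BLOCK; j++)
            ((unsigned char *)buf)[j] = (unsigned char)rand();

        for (int i = 0; i < NBLOCKS; i++)
            if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); return 1; }

        close(fd);
        free(buf);
        return 0;
    }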

In embedded design, what is the actual overhead of using a Linux OS vs programming directly against the CPU? [closed]

I understand that the answer to this question, like most, is "it depends", but what I am looking for is not so much an answer as much as a rationale for the different things affecting the decision.
My use case is that I have an ARM Cortex-A8 (TI AM335x) running an embedded device. My options are to use some embedded Linux to take advantage of prebuilt drivers and other things that make development faster, but my biggest concern for this project is the speed of the device. Memory and disk space are not much of a concern. I think it is a safe assumption that programming directly against the MPU, without a full OS, would make the application faster, but gaining a 1 or 2 percent speedup is not worth the extra development time.
I imagine that the largest slowdowns are going to come from the kernel context switching and memory mapping but I do not have the knowledge to correctly assess or gauge the extent of those slowdowns. Any guidance would be greatly appreciated!
Your concerns are reasonable. Going bare metal can/will improve performance but it may only be a few percent improvement..."it depends".
Going bare metal for something that has fully functional drivers in Linux, but no fully functional drivers bare metal, will cost you development and possibly maintenance time. Is it worth that to get the performance gain?
You have to ask yourself as well am I using the right platform, and/or am I using the right approach for whatever it is you want to do on that processor that you think or know is too slow. Are you sure you know where the bottleneck is? Are you sure your optimization is in the right place?
You have not provided any info that would give us a gut feel, so you have to go on your own gut feel as to which path to take: a different embedded platform (with its pros and cons), bare metal or an operating system, Linux or an RTOS or something else, one programming language vs another, one peripheral vs another, and so on. You won't actually know until you try each of these paths, but that can be, and likely is, cost- and time-prohibitive...
As far as the generic title question of os vs bare metal, the answer is "it depends". The differences can swing widely, from almost the same to hundreds to thousands of times faster on bare metal. But for any particular application/task/algorithm...it depends.

Where to begin serious concurrent (multi-threaded, parallel?) programming [closed]

I want to seriously begin multi-threaded/parallel/concurrent programming in the real world. By that I mean trying to solve real problems in parallel and concurrently, not just learning about the low-level details of pthreads or MPI, locks, races and the like, or academic, text-book examples. Regarding the low-level mechanisms of parallel programming, in fact I would rather not know anything about them and just stick with something more like the Actor model :).
I have heard that some programming languages are inherently like what I am looking for and their paradigm is to look at the problem at hand in a parallel (concurrent, multi-threaded, multi-processed) fashion and provide language level tools and constructs to implement the task in parallel (e.g. Erlang has a concept of process as a language construct?).
I fancy a language with a type system like that of Scala ... I know PHP very well and I used to do a lot of coding in C/C++. I have a working knowledge of Scala and Java, and I can read Haskell but I'm not particularly proficient at it. I'm quite familiar with the functional paradigm and I'm willing to learn much more. I am also interested in high-level theoretical discussions about parallelism/concurrency.
I'd like to mention first off that parallel != concurrent. They're closely related concepts, but parallel computation is distinct from concurrent computation in that parallel flows of control happen simultaneously, while concurrent flows may be interleaved but could possibly be parallel. A fine hair to split, but one that's important to understand.
... provide language level tools and constructs to implement the task in parallel (e.g. Erlang has a concept of process as a language construct?).
An Erlang 'process' is a light-weight, memory-isolated green thread. The language provides no shared-memory constructs; data is passed between concurrent flows of control via 'messages'. Notice I said 'concurrent'. Erlang is explicitly designed to be a concurrent language and, as it happens, will schedule some flows of control--which map 1:1 onto processes--in parallel. Erlang does not give you explicit control over scheduling, which is unlike the threading model.
It's hard to know what you're looking for--your question is rather broad--but any of the languages you've mentioned (except maybe PHP?) will allow you to exploit the multiple CPUs that are surely sitting in your computer. Pick several to focus on, expect to spend several years studying and go for it.

Threads and processes? [closed]

In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system.
This is very abstract!
What would be real world/tangible/physical interpretation of threads?
How could I tell if an application (by looking at its code) is a single threaded or multi threaded?
Are there any benefits in making an application multi threaded? In which cases could one do that?
Is there anything like multi process application too?
Is technology a limiting factor in deciding whether an application could be made multi-threaded, or is it just a design choice? e.g. is it possible to make multi-threaded applications in Flex?
If anyone could give me an example or an analogy to explain these things, it would be very helpful.
What would be real world/tangible/physical interpretation of threads?
Think about a thread as an independent unit of execution that can be executed concurrently (at the same time) on the available CPU(s). A good analogy would be multiple cars driving around independently on the same road. Here a "car" is a thread, and the road is the CPU. The function of all these cars is somewhat the same, "drive people around", but the kicker is that people should not stand in line waiting for a single car => they can drive at the same time in different cars (threads).
Technically, however, depending on the number of CPU cores and the overall hardware/OS architecture, there will be some context switching, where the CPU makes it seem that everything happens simultaneously, but in reality it switches from one thread to another.
How could I tell if an application (by looking at its code) is a single threaded or multi threaded?
This depends on several things: the language the code is written in, your understanding of the language, what the code is trying to accomplish, etc. Usually you can tell, but I do not believe this will solve anything. If you already have access to the code, it's a lot simpler to just ask the developer or, in case it is an open source product, read the documentation or post on user forums to figure it out.
Are there any benefits in making an application multi threaded? In which cases could one do that?
Yes, think about the car example above. The benefit = simultaneous and decoupled execution of things. For example, say you need to calculate how many stars are in the known universe. You can either have a single process go over all the stars and count them, or you can "spawn" multiple threads and give each thread a galaxy to solve: "thread 1 counts all the stars in the Milky Way, thread 2 counts all the stars in Andromeda, etc.."
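As a toy sketch of that split-and-combine idea (mine, not the answerer's), the program below divides an array of "stars" among four threads, each counting its own slice, and the main thread sums the partial results:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N        (1 << 24)   /* "stars" in our toy universe */
    #define NTHREADS 4           /* one "galaxy" per thread */

    static unsigned char bright[N];   /* 1 = bright star, 0 = dim */

    struct slice { long from, to, count; };

    static void *count_slice(void *arg)
    {
        struct slice *s = arg;
        for (long i = s->from; i < s->to; i++)
            s->count += bright[i];    /* count this thread's "galaxy" */
        return NULL;
    }

    int main(void)
    {
        for (long i = 0; i < N; i++)
            bright[i] = rand() % 2;

        pthread_t tids[NTHREADS];
        struct slice slices[NTHREADS];
        long per = N / NTHREADS, total = 0;

        for (int t = 0; t < NTHREADS; t++) {
            slices[t] = (struct slice){ t * per, (t + 1) * per, 0 };
            pthread_create(&tids[t], NULL, count_slice, &slices[t]);
        }
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tids[t], NULL);
            total += slices[t].count;    /* combine per-galaxy results */
        }
        printf("bright stars: %ld of %d\n", total, N);
        return 0;
    }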
Is there anything like multi process application too?
That is a matter of terminology, but the cleanest answer would be yes. For example, in Erlang, the VM is capable of starting many very lightweight processes very fast, where each process does its own thing. On Unix servers, if you run "ps -aux" or "ps -ef", for example, you'd see multiple processes executing, where each process may in fact have many threads doing its job.
Is technology a limiting factor in deciding whether an application could be made multi-threaded, or is it just a design choice? e.g. is it possible to make multi-threaded applications in Flex?
A two-threaded application is already multithreaded. You most likely already have 2 or more cores on your laptop/PC, so technology will pretty much always encourage you to utilize those cores rather than limit you. Having said that, the problem and the requirements should drive the decision, not the technology or tools. But if you do decide to write a multithreaded application, make sure you understand all the gotchas and the solutions to them. The best language I have used so far to solve concurrency is Erlang, since concurrency is built into it. However, other languages like Scala, Java and C#, and mostly functional languages, where shared state is not a problem, would also be a good choice.
