How well do common languages perform multi-threading?

How well do common languages perform multi-threading? - multithreading

In my CS class we're discussing threads and processes. I'm curious to know what common programming languages (Java, C/C++, C#, Python) can actually implement multi-threading and, if they do, how efficiently they do it.
We were shown a simple multi-threading structure in C but they didn't demonstrate the difference by running it or by a chart of collected results from a previous test. I assume that the gains for some languages using multi-threading may be negligible
EDIT
PDizzle pointed out that the gains in efficiency isn't necessarily dependent upon the language but rather what the applications/software in question require, as well as how well it is implemented for said application/software

When a program creates a separate thread for processing, it all boils down to the program making a call to the operating system to request resources for a thread.
Each operating system has an API programming languages can request multi-threading to use in a program. The implementation is platform dependent. C++ (now) has the std::thread that has operating system dependent calls. Java has classes that implement calls from the virtual machine to the operating system for requesting a thread.
I assume that the gains for some languages using multi-threading may
be negligible
No, the gains from using multi-threading in general may be negligible depending on the application requirements. I would say it's more important how an application uses threading to accomplish a task than worry about the overhead each language has to access multi-threading.

I think most modern languages do multitasking well. Modern being c++11 ,java, c#, d etc.
However most programs don't benefit from multitasking not because of the language in use, but because the algorithm being multi threaded Doesn't benefit from parallel processing. Think sorting algorithm and the like.

Related

C++11 threading vs. OpenMP for simple parallel loops. Which, when?

This is something of a follow-up to this other question of mine.
I would like to know if parallelized loops with a reduction operation, like a parallelized integration, belongs to the domain of applicability of C++11 threading or if OpenMP is best suited for tasks like this.
Now, consider the same setting but with threads executing computations that may throw exceptions. Does it change the scenario? Would now C++11 threading be best suited?
Thank you.

IMO, I would prefer OpenMP for any HPC / scientific and engineering computing codes. It more directly targets data parallelism. C++11 threading represents more task parallelism, which is preferable for other kinds of software (e.g., network server applications).
The situations might change in the future, there are some efforts to integrate more parallelism into C++, such as parallel STL algorithms. However, we now even do not know how this parallelism will look like.
You also rarely build codes from scratch. There are many performance-aware multi-threaded libraries that support OpenMP (sorting, linear algebra, ...), however few that support C++11 threads.

As best as I can determine, OpenMP represents greater performance potential, simply because there are a lot more tricks a compiler can use (particularly if your cpu supports vectorized computations) if it can be directly instructed to parallelize a construct. Host/dispatch threading models (like the threading models in Java and C++11) can't really do that without remarkably intelligent code analysis tools.
However, OpenMP does represent a tax on both code readability and design flexibility. Parallel execution of heterogeneous tasks is possible in OpenMP, but much more verbose to implement, and much more difficult to parse. And because it depends on preprocessor macros (which C++ purists don't like anyways) it's virtually impossible to set dynamic state about the threading model itself.
Personally, having worked on enterprise level code, I think I prefer Host/dispatch threading (aka, C++11 threads). It may represent a performance sacrifice, but as the saying goes: "Processor Cycles are much cheaper than Developer Cycles". And if you really, really are in a performance constrained environment, it either means an algorithm problem, and switching to OpenMP probably wouldn't fix it; or, it means you should probably be looking into compute cards or OpenCL/Cuda programming.

What does built in support for multithreading mean?

Java provides built-in support for multithreaded programming.
That is what my book says. I can do multithreaded programming in C, C++ also. So do they also provide built-in support for multithreading?
What does built in support for multithreading mean? Isn't it the OS that ACTUALLY provides support for multithreading?
Are there any programming languages that cannot support multithreading? If so why? (I am asking this question because, if the OS provides support for multithreading then why cant we do multithreaded programming on all languages that are supported on that OS?)

The issue is one of language-support vs. library support for multithreading.
Java's use of the keyword synchronized for placing locks on objects is a language-level construct. Also the built-in methods on Object (wait, notify, notifyAll) are implemented directly in runtime.
There is a bit of a debate regarding whether languages should implement threading though keywords and language structures and core data types vs. having all thread capabilities in the library.
A research paper espousing the view that language-level threading is beneficial is the relatively famous http://www.hpl.hp.com/personal/Hans_Boehm/misc_slides/pldi05_threads.pdf.
In theory, any language built on a C runtime can access a library such as pthreads, and any language running on a JVM can use those threads. In short all languages that can use a library (and have the notion of function pointers) can indeed do multithreading.

I believe they mean that Java has keywords like volatile and synchronized keyword built-in, to make multithreading easier, and that the library already provides threading classes so you don't need a 3rd party library.

The language needs constructs to create and destroy threads, and in turn the OS needs to provide this behaviour to the language.
Exception being Java Green Threads that aren't real threads at all, similarly with Erlang I think.
A language without threading support, say Basic implemented by QBasic in DOS. Basic is supposed to be basic so threads and processes are advanced features that are non-productive in the languages intent.

C and C++ as a language have no mechanism to:
Start a thread
Declare a mutex, semaphore, etc
etc
This is not part of the language specification. However, such facilities exist on every major operating system. Unlike in Java, these facilities are different on different operating systems: pthread on Linux, OS X and other UNIX-derivatives, CreateThread on Windows, another API on real-time operating systems.
Java has a language definition for Thread, synchronized blocks and methods, 'notify' 'wait' as part of the core Object and the like, which makes the language proper understand multithreading.

It means that there is functionality in the language's runtime that models the concepts of threads and all that goes with that such as providing synchronisation. What happens behind the scenes is up to the languages implementors... they could choose to use native OS threading or they might fake it.
A language that doesn't support it could be VB6 (at least not natively, IIRC)

Does pthreads provide any advantages over GCD?

Having recently learned Grand Central Dispatch, I've found multithreaded code to be pretty intuitive(with GCD). I like the fact that no locks are required(and the fact that it uses lockless data structures internally), and that the API is very simple.
Now, I'm beginning to learn pthreads, and I can't help but be a little overwhelmed with the complexity. Thread joins, mutexes, condition variables- all of these things aren't necessary in GCD, but have a lot of API calls in pthreads.
Does pthreads provide any advantages over GCD? Is it more efficient? Are there normal-use cases where pthreads can do things that GCD can not do(excluding kernel-level software)?
In terms of cross-platform compatibility, I'm not too concerned. After all, libdispatch is open source, Apple has submtited their closure changes as patches to GCC, clang supports closures, and already(e.x. FreeBSD), we're starting to see some non-Apple implementations of GCD. I'm mostly interested in use of the API(specific examples would be great!).

I am coming from the other direction: started using pthreads in my application, which I recently replaced with C++11's std::thread. Now, I am playing with higher-level constructs like the pseudo-boost threadpool, and even more abstract, Intel's Threading Building Blocks. I would consider GCD to be at or even higher than TBB.
A few comments:
imho, pthread is not more complex than GCD: at its basic core, pthread actually contains very few commands (just a handful: using just the ones mentioned in the OP will give you 95%+ of the functionality that you ever need). Like any lower-level library, it's how you put them together and how you use it which gives you its power. Don't forget that the ultimately, libraries like GCD and TBB will call a threading library like pthreads or std::thread.
sometimes, it's not what you use, but how you use it, which determines success vs failure. As proponents of the library, TBB or GCD will tell you about all the benefits of using their libraries, but until you try them out in a real application context, all of it is of theoretical benefit. For example, when I read about how easy it was to use a finely-grained parallel_for, I immediately used it in a task for which I thought could benefit from parallelism. Naturally, I, too, was drawn by the fact that TBB would handle all the details about optimal loading balancing and thread allocation. The result? TBB took five times longer than the single-threaded version! But I do not blame TBB: in retrospect, this is obviously a case of a misuse of the parallel_for: when I read the fine-print, I discovered the overhead involved in using parallel_for and posited that in my case, the costs of context-switching and added function calls outweighed the benefits of using multiple threads. So you must profile your case to see which one will run faster. You may have to reorganize your algorithm to use less threading overhead.
why does this happen? How can pthread or no threads be faster than a GCD or a TBB? When a designer designs GCD or TBB, he must make assumptions about the environment in which tasks will run. In fact, the library must be general enough that it can handle strange, unforseen use-cases by the developer. These general implementations will not come for free. On the plus-side, a library will query the hardware and the current running environment to do a better job of load-balancing. Will it work to your benefit? The only way to know is to try it out.
is there any benefit to learning lower-level libraries like std::thread when higher-level libraries are available? The answer is a resounding YES. The advantage of using higher-level libraries is, abstraction from the implementation details. The disadvantage of using higher-level libraries is also abstraction from the implementation details. When using pthreads, I am supremely aware of shared state and lifetimes of objects, because if I let my guard down, especially in a medium to large size project, I can very easily get race conditions or memory faults. Do these problems go away when I use a higher-level library? Not really. It seems like I don't need to think about them, but in fact, if I get sloppy with those details, the library implementation will also crash. So you will find that if you understand the lower-level constructs, all those libraries actually make sense, because at some point, you will be thinking about implementing them yourself, if you use the lower-level calls. Of course, at that point, it's usually better to use a time-tested and debugged library call.
So, let's break down the possible implementations:
TBB/GCD library calls: greatest benefit is for beginners of threading. They have lower barriers to entry compared to learning lower level libraries. However, they also ignore/hide some of the traps of using multi-threading. Dynamic load balancing will make your application more portable without additional coding on your part.
pthread and std::thread calls: there are actually very few calls to learn, but to use them correctly takes attention to detail and deep awareness of how your application works. If you can understand threads at this level, the APIs of higher-level libraries will certainly make more sense.
single-threaded algorithm: let us not forget the benefits of a simple single-threaded segment. For most applications, a single-thread is easier to understand and much less error-prone than multi-threading. In fact, in many cases, it may be the appropriate design choice. The fact of the matter is, a real application goes through various multi-threading phases and single-threading phases: there may be no need to be multi-threaded all the time.
Which one is fastest? The surprising truth is, it could be any of the three of the above. To get speed benefits of multi-threading, you may need to drastically reorganize your algorithms. Whether or not the benefits outweigh the costs is highly case-dependent.
Oh, and the OP asked about cases where a thread_pool is not appropriate. Easy case: if you have a tight loop that does not require many cycles per loop to compute, using thread_pool may cost more than the benefits without serious reworking. Also be aware of the overhead of function calls like lambda through thread pools vs the use of a single tight loop.
For most applications, multi-threading is a kind of optimization, so do it at the right time and in the right places.

That overwhelming feeling that you are experiencing.. that's exactly why GCD was invented.
At the most basic level there are threads, pthreads is a POSIX API for threads so you can write code in any compliant OS and expect it to work. GCD is built on top of threads (although I'm not sure if they actually used pthreads as the API). I believe GCD only works on OS X and iOS — that in a nutshell is its main disadvantage.
Note that projects that make heavy use of threads and require high performance implement their own version of thread pools. GCD allows you to avoid (re)inventing the wheel for the umpteenth time.

GCD is an Apple technology, and not the most cross platform compatible; pthread are available on just about everything from OSX, Linux, Unix, Windows.. including this toaster
GCD is optimized for thread pool parallelism. Pthreads are (as you said) very complex building blocks for parallelism, you are left to develop your own models. I highly recommend picking up a book on the topic if you're interested in learning more about pthreads and different models of parallelism.

As any declarative/assisted approach like openmp or Intel TBB GCD should be very good at embarrassingly parallel problems and will probably easily beat naïve manually pthread-ed parallel sort. I would suggest you still learn pthreads though. You'll understand concurrency better, you'd be able to apply right tool in each particular situation, and if for nothing else - there's ton of pthread-based code out there - you'd be able to read "legacy" code.

Usual: 1 task per Pthread implementations use mutexes (an OS feature).
GCD:
1 task per block, grouped into queues. 1 thread per virtual CPU can get a queue and run without mutexes through all the tasks. This reduces thread management overhead and mutex overhead, which should increase performance.

GCD abstracts threads and gives you dispatch queues. It creates threads as it deems necessary taking into account the number of processor cores available.
GCD is Open Source and is available through the libdispatch library. FreeBSD includes libdispatch as of 8.1. GCD and C Blocks are mayor contributions from Apple to the C programming community. I would never use any OS that doesn't support GCD.

threads and high level languages

can someone tell me why if i use threads it's better to use an low level languages like c++
and not c# and JAVA? someone asked me that in an interview and i did'nt know the answer

It's news to me. Higher level languages provide easy to use abstractions over thread management, for example.
I expect the interviewer's point would make sense in context. It's dependent on the problem in hand - the level of timing control you need if you're writing a computer game or software for an engine management system may be greater than if you are writing a conference room booking system.
You trade off the low-level control and the associated learning curve and risk you get with lower-level languages for ease of use, safety and productivity of higher-level languages.

I don't think this is necessarily true. In Java (I can't comment on C#) a thread maps directly to a native thread. From here:
The Java HotSpot™ virtual machine
currently associates each Java thread
with a unique native thread. The
relationship between the Java thread
and the native thread is stable and
persists for the lifetime of the Java
thread.
plus you have the additional high level constructs such as the Executor framework.
Going forwards, functional languages (such as F# and Scala) encourage immutability, which contribute to a safer threaded environment.
There may well be scenarios where a low-level language offers more control (as for most requirements), but I suspect those will be fairly specialised situations. You have to balance that against the safety/productivity that the higher-level languages offer.
EDIT : From your comments supplementing the question, this may relate to running a garbage collector and consequent garbage-collection pauses and the impact on providing real-time performance and predictability. Threading in C/C++ may well offer some benefits in this area since a garbage collection cycle is not going to kick off during some critical time-dependent code. For this reason (amongst others) Java can't be considered as a real-time platform.

like most answers : it depends. languages with built in threading facilities like C# and Java
will do some or most of the work needed for thread usage and synchronization for you.
with C++ you have do it yourself but you can employ better optimization techniques for your specific OS and platform

Will you use threads or not - depends solely on application, not on language. And language is a function of design.
C++ provides more control, c# provides more abstraction, Java provides simplification, but in the end they all work the same way.

Which scripting languages support multi-core programming?

I have written a little python application and here you can see how Task Manager looks during a typical run.
(source: weinzierl.name)
While the application is perfectly multithreaded, unsurprisingly it uses only one CPU core.
Regardless of the fact that most modern scripting languages support multithreading, scripts can run on one CPU core only.
Ruby, Python, Lua, PHP all can only run on a single core.
Even Erlang, which is said to be especially good for concurrent programming, is affected.
Is there a scripting language that has built in
support for threads that are not confined to a single core?
WRAP UP
Answers were not quite what I expected, but the TCL answer comes close.
I'd like to add perl, which (much like TCL) has interpreter-based threads.
Jython, IronPython and Groovy fall under the umbrella of combining a proven language with the proven virtual machine of another language. Thanks for your hints in this
direction.
I chose Aiden Bell's answer as Accepted Answer.
He does not suggest a particular language but his remark was most insightful to me.

You seem use a definition of "scripting language" that may raise a few eyebrows, and I don't know what that implies about your other requirements.
Anyway, have you considered TCL? It will do what you want, I believe.
Since you are including fairly general purpose languages in your list, I don't know how heavy an implementation is acceptable to you. I'd be surprised if one of the zillion Scheme implementations doesn't to native threads, but off the top of my head, I can only remember the MzScheme used to but I seem to remember support was dropped. Certainly some of the Common LISP implementations do this well. If Embeddable Common Lisp (ECL) does, it might work for you. I don't use it though so I'm not sure what the state of it's threading support is, and this may of course depend on platform.
Update Also, if I recall correctly, GHC Haskell doesn't do quite what you are asking, but may do effectively what you want since, again, as I recall, it will spin of a native thread per core or so and then run its threads across those....

You can freely multi-thread with the Python language in implementations such as Jython (on the JVM, as #Reginaldo mention Groovy is) and IronPython (on .NET). For the classical CPython implementation of the Python language, as #Dan's comment mentions, multiprocessing (rather than threading) is the way to freely use as many cores as you have available

Thread syntax may be static, but implementation across operating systems and virtual machines may change
Your scripting language may use true threading on one OS and fake-threads on another.
If you have performance requirements, it might be worth looking to ensure that the scripted threads fall through to the most beneficial layer in the OS. Userspace threads will be faster, but for largely blocking thread activity kernel threads will be better.

As Groovy is based on the Java virtual machine, you get support for true threads.

F# on .NET 4 has excellent support for parallel programming and extremely good performance as well as support for .fsx files that are specifically designed for scripting. I do all my scripting using F#.

An answer for this question has already been accepted, but just to add that besides tcl, the only other interpreted scripting language that I know of that supports multithreading and thread-safe programming is Qore.
Qore was designed from the bottom up to support multithreading; every aspect of the language is thread-safe; the language was designed to support SMP scalability and multithreading natively. For example, you can use the background operator to start a new thread or the ThreadPool class to manage a pool of threads. Qore will also throw exceptions with common thread errors so that threading errors (like potential deadlocks or errors with threading APIs like trying to grab a lock that's already held by the current thread) are immediately visible to the programmer.
Qore additionally supports and thread resources; for example, a DatasourcePool allocation is treated as a thread-local resource; if you forget to commit or roll back a transaction before you end your thread, the thread resource handling for the DatasourcePool class will roll back the transaction automatically and throw an exception with user-friendly information about the problem and how it was solved.
Maybe it could be useful for you - an overview of Qore's features is here: Why use Qore?.

CSScript in combination with Parallel Extensions shouldn't be a bad option. You write your code in pure C# and then run it as a script.

It is not related to the threading mechanism. The problem is that (for example in python) you have to get interpreter instance to run the script. To acquire the interpreter you have to lock it as it is going to keep the reference count and etc and need to avoid concurrent access to this objects. Python uses pthread and they are real threads but when you are working with python objects just one thread is running an others waiting. They call this GIL (Global Interpreter Lock) and it is the main problem that makes real parallelism impossible inside a process.
https://wiki.python.org/moin/GlobalInterpreterLock
The other scripting languages may have kind of the same problem.

Guile supports POSIX threads which I believe are hardware threads.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string