Fortran77 routine with multi CPU? - multithreading

I would like to know if it's possible to run a program written in fortran, which is quite "heavy", exploiting all the 24 threads of the CPUs.
I am using intel parallel studio XE 2011 and the routine is written in fortran 77.

Sure, that's possible. There's quite some ways to go about this, but:
What I'm used to is having your "system logic" in a more versatile logic that directly supports things like pthreads or boost threads etc; most likely something like C or C++, and then spawning different threads for different parts of your calculation or parallelizable data parts, etc. Notice that modern compilers can generate binaries from fortran that adhere to normal C calling conventions, so that calling fortran functions doesn't have more overhead than calling other C, C++, ... functions. You should make damn sure not to modify global state in your fortran functions, though!
There's no parallelization framework for fortran77 that I would be aware of; there are a few in existence, but as far as I can tell they depend on language features that weren't there in F77. As #HighPerformanceMark pointed out, OpenMP could very well work with the intel compilers.

Related

C++11 threading vs. OpenMP for simple parallel loops. Which, when?

This is something of a follow-up to this other question of mine.
I would like to know if parallelized loops with a reduction operation, like a parallelized integration, belongs to the domain of applicability of C++11 threading or if OpenMP is best suited for tasks like this.
Now, consider the same setting but with threads executing computations that may throw exceptions. Does it change the scenario? Would now C++11 threading be best suited?
Thank you.
IMO, I would prefer OpenMP for any HPC / scientific and engineering computing codes. It more directly targets data parallelism. C++11 threading represents more task parallelism, which is preferable for other kinds of software (e.g., network server applications).
The situations might change in the future, there are some efforts to integrate more parallelism into C++, such as parallel STL algorithms. However, we now even do not know how this parallelism will look like.
You also rarely build codes from scratch. There are many performance-aware multi-threaded libraries that support OpenMP (sorting, linear algebra, ...), however few that support C++11 threads.
As best as I can determine, OpenMP represents greater performance potential, simply because there are a lot more tricks a compiler can use (particularly if your cpu supports vectorized computations) if it can be directly instructed to parallelize a construct. Host/dispatch threading models (like the threading models in Java and C++11) can't really do that without remarkably intelligent code analysis tools.
However, OpenMP does represent a tax on both code readability and design flexibility. Parallel execution of heterogeneous tasks is possible in OpenMP, but much more verbose to implement, and much more difficult to parse. And because it depends on preprocessor macros (which C++ purists don't like anyways) it's virtually impossible to set dynamic state about the threading model itself.
Personally, having worked on enterprise level code, I think I prefer Host/dispatch threading (aka, C++11 threads). It may represent a performance sacrifice, but as the saying goes: "Processor Cycles are much cheaper than Developer Cycles". And if you really, really are in a performance constrained environment, it either means an algorithm problem, and switching to OpenMP probably wouldn't fix it; or, it means you should probably be looking into compute cards or OpenCL/Cuda programming.

How well do common languages perform multi-threading?

In my CS class we're discussing threads and processes. I'm curious to know what common programming languages (Java, C/C++, C#, Python) can actually implement multi-threading and, if they do, how efficiently they do it.
We were shown a simple multi-threading structure in C but they didn't demonstrate the difference by running it or by a chart of collected results from a previous test. I assume that the gains for some languages using multi-threading may be negligible
EDIT
PDizzle pointed out that the gains in efficiency isn't necessarily dependent upon the language but rather what the applications/software in question require, as well as how well it is implemented for said application/software
When a program creates a separate thread for processing, it all boils down to the program making a call to the operating system to request resources for a thread.
Each operating system has an API programming languages can request multi-threading to use in a program. The implementation is platform dependent. C++ (now) has the std::thread that has operating system dependent calls. Java has classes that implement calls from the virtual machine to the operating system for requesting a thread.
I assume that the gains for some languages using multi-threading may
be negligible
No, the gains from using multi-threading in general may be negligible depending on the application requirements. I would say it's more important how an application uses threading to accomplish a task than worry about the overhead each language has to access multi-threading.
I think most modern languages do multitasking well. Modern being c++11 ,java, c#, d etc.
However most programs don't benefit from multitasking not because of the language in use, but because the algorithm being multi threaded Doesn't benefit from parallel processing. Think sorting algorithm and the like.

Scope of POSIX Threads

I have been learning thread programming in Java, where there are are sophisticated APIs for thread management. I recently came across this. I am curious to know if these are used now. Is the POSIX thread obsolete or is it the standard used now for threading in C++. I am not familiar with Threading in any other language apart from Java.
phtreads are the current standard POSIX threading library. They are missing some important new things, and I hope they will be updated to accomodate them. And the C++1x standard will also have some threading primitives built in.
pthreads is mostly missing atomic value operations. For example there are no thread safe primitive counter operations that are expected to be compiled to 1-5 machine instructions.
These are needed because while the semantics of the volatile keyword seem to suggest that you might be able to use it for some of these things, this is not the case. Modern CPUs manage their L1, L2 and L3 caches in a way that frequently results in writes reads and writes being seen in different order by different CPUs. And current optimizing compilers can significantly re-ordering operations so the order in which they happen no longer bears a lot of resemblance to the order they appear in the source code.
Mutexes, even the modern Linux version that avoids any system calls unless there is contention, are too heavyweight for something like a reference count.
C and C++ could be changed so the language made these guarantees happen all the time. But that would be contrary to their spirit of being 'high-level assembly'.

Does pthreads provide any advantages over GCD?

Having recently learned Grand Central Dispatch, I've found multithreaded code to be pretty intuitive(with GCD). I like the fact that no locks are required(and the fact that it uses lockless data structures internally), and that the API is very simple.
Now, I'm beginning to learn pthreads, and I can't help but be a little overwhelmed with the complexity. Thread joins, mutexes, condition variables- all of these things aren't necessary in GCD, but have a lot of API calls in pthreads.
Does pthreads provide any advantages over GCD? Is it more efficient? Are there normal-use cases where pthreads can do things that GCD can not do(excluding kernel-level software)?
In terms of cross-platform compatibility, I'm not too concerned. After all, libdispatch is open source, Apple has submtited their closure changes as patches to GCC, clang supports closures, and already(e.x. FreeBSD), we're starting to see some non-Apple implementations of GCD. I'm mostly interested in use of the API(specific examples would be great!).
I am coming from the other direction: started using pthreads in my application, which I recently replaced with C++11's std::thread. Now, I am playing with higher-level constructs like the pseudo-boost threadpool, and even more abstract, Intel's Threading Building Blocks. I would consider GCD to be at or even higher than TBB.
A few comments:
imho, pthread is not more complex than GCD: at its basic core, pthread actually contains very few commands (just a handful: using just the ones mentioned in the OP will give you 95%+ of the functionality that you ever need). Like any lower-level library, it's how you put them together and how you use it which gives you its power. Don't forget that the ultimately, libraries like GCD and TBB will call a threading library like pthreads or std::thread.
sometimes, it's not what you use, but how you use it, which determines success vs failure. As proponents of the library, TBB or GCD will tell you about all the benefits of using their libraries, but until you try them out in a real application context, all of it is of theoretical benefit. For example, when I read about how easy it was to use a finely-grained parallel_for, I immediately used it in a task for which I thought could benefit from parallelism. Naturally, I, too, was drawn by the fact that TBB would handle all the details about optimal loading balancing and thread allocation. The result? TBB took five times longer than the single-threaded version! But I do not blame TBB: in retrospect, this is obviously a case of a misuse of the parallel_for: when I read the fine-print, I discovered the overhead involved in using parallel_for and posited that in my case, the costs of context-switching and added function calls outweighed the benefits of using multiple threads. So you must profile your case to see which one will run faster. You may have to reorganize your algorithm to use less threading overhead.
why does this happen? How can pthread or no threads be faster than a GCD or a TBB? When a designer designs GCD or TBB, he must make assumptions about the environment in which tasks will run. In fact, the library must be general enough that it can handle strange, unforseen use-cases by the developer. These general implementations will not come for free. On the plus-side, a library will query the hardware and the current running environment to do a better job of load-balancing. Will it work to your benefit? The only way to know is to try it out.
is there any benefit to learning lower-level libraries like std::thread when higher-level libraries are available? The answer is a resounding YES. The advantage of using higher-level libraries is, abstraction from the implementation details. The disadvantage of using higher-level libraries is also abstraction from the implementation details. When using pthreads, I am supremely aware of shared state and lifetimes of objects, because if I let my guard down, especially in a medium to large size project, I can very easily get race conditions or memory faults. Do these problems go away when I use a higher-level library? Not really. It seems like I don't need to think about them, but in fact, if I get sloppy with those details, the library implementation will also crash. So you will find that if you understand the lower-level constructs, all those libraries actually make sense, because at some point, you will be thinking about implementing them yourself, if you use the lower-level calls. Of course, at that point, it's usually better to use a time-tested and debugged library call.
So, let's break down the possible implementations:
TBB/GCD library calls: greatest benefit is for beginners of threading. They have lower barriers to entry compared to learning lower level libraries. However, they also ignore/hide some of the traps of using multi-threading. Dynamic load balancing will make your application more portable without additional coding on your part.
pthread and std::thread calls: there are actually very few calls to learn, but to use them correctly takes attention to detail and deep awareness of how your application works. If you can understand threads at this level, the APIs of higher-level libraries will certainly make more sense.
single-threaded algorithm: let us not forget the benefits of a simple single-threaded segment. For most applications, a single-thread is easier to understand and much less error-prone than multi-threading. In fact, in many cases, it may be the appropriate design choice. The fact of the matter is, a real application goes through various multi-threading phases and single-threading phases: there may be no need to be multi-threaded all the time.
Which one is fastest? The surprising truth is, it could be any of the three of the above. To get speed benefits of multi-threading, you may need to drastically reorganize your algorithms. Whether or not the benefits outweigh the costs is highly case-dependent.
Oh, and the OP asked about cases where a thread_pool is not appropriate. Easy case: if you have a tight loop that does not require many cycles per loop to compute, using thread_pool may cost more than the benefits without serious reworking. Also be aware of the overhead of function calls like lambda through thread pools vs the use of a single tight loop.
For most applications, multi-threading is a kind of optimization, so do it at the right time and in the right places.
That overwhelming feeling that you are experiencing.. that's exactly why GCD was invented.
At the most basic level there are threads, pthreads is a POSIX API for threads so you can write code in any compliant OS and expect it to work. GCD is built on top of threads (although I'm not sure if they actually used pthreads as the API). I believe GCD only works on OS X and iOS — that in a nutshell is its main disadvantage.
Note that projects that make heavy use of threads and require high performance implement their own version of thread pools. GCD allows you to avoid (re)inventing the wheel for the umpteenth time.
GCD is an Apple technology, and not the most cross platform compatible; pthread are available on just about everything from OSX, Linux, Unix, Windows.. including this toaster
GCD is optimized for thread pool parallelism. Pthreads are (as you said) very complex building blocks for parallelism, you are left to develop your own models. I highly recommend picking up a book on the topic if you're interested in learning more about pthreads and different models of parallelism.
As any declarative/assisted approach like openmp or Intel TBB GCD should be very good at embarrassingly parallel problems and will probably easily beat naïve manually pthread-ed parallel sort. I would suggest you still learn pthreads though. You'll understand concurrency better, you'd be able to apply right tool in each particular situation, and if for nothing else - there's ton of pthread-based code out there - you'd be able to read "legacy" code.
Usual: 1 task per Pthread implementations use mutexes (an OS feature).
GCD:
1 task per block, grouped into queues. 1 thread per virtual CPU can get a queue and run without mutexes through all the tasks. This reduces thread management overhead and mutex overhead, which should increase performance.
GCD abstracts threads and gives you dispatch queues. It creates threads as it deems necessary taking into account the number of processor cores available.
GCD is Open Source and is available through the libdispatch library. FreeBSD includes libdispatch as of 8.1. GCD and C Blocks are mayor contributions from Apple to the C programming community. I would never use any OS that doesn't support GCD.

Which language would you use in your OS?

This is probably more of a subjective question, but which language (not API like .NET or JDK) would you use should you write your own operating system? Which language provides flexibility, simplicity, and possibly a low-level interface to the hardware? I was thinking Java or C...
C, of course.
Haskell.
Once you have flipped the right hardware bits, C is a terrible language to use for the rest of the OS. Things like the scheduler, filesystems, drivers, etc. are complex high-level algorithms, and you don't want to be writing those in assembly language (or C; same thing). It's too hard to get right. (The VM subsystem and memory manager may need to be written in something low-level, as you will need to bootstrap your high-level langauge's runtime somehow.)
Anyway, this isn't just a crazy idea that I am coming up with for SO. Here is an OS written in Haskell: http://programatica.cs.pdx.edu/House/
Lisp is another good choice; the original Lisp machines were infinitely more tweakable (at runtime) than "modern" OSes like UNIX and Windows.
Sometimes history forgets good ideas (often in the name of "maximum performance"), and that makes me sad.
D would be an interesting choice. From its own description:
D is a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python. Special attention is given to the needs of quality assurance, documentation, management, portability and reliability.
The D runtime assumes the existence of a garbage collector, which would not be appropriate for the very lowest levels of the kernel. However, it would be appropriate for many of the higher layers.
Build the basic components like task schedulers and drivers etc with Assembly, then build the higher level components like applications and tools with C
I believe this is how Windows XP was built too (unsure about Windows Vista and Windows 7).
Definitely... C
C, ASM, C#
Singularity
Low-level in something like Haskell or D. Productivity over performance, in my opinion. You can rewrite slow parts in C++ or even assembly later if the need arises.
High-level in Python or Ruby. Ideally I'd also have a really fast JIT-capable VM for that language, but that's not going to happen for either language for a while. Lua would be a good alternative if speed gets in the way.
The kernel has to be written in a low-level language, C is by far the best choice for this, because it is so memory efficient. The higher levels could be built with a combination of Java or more ideally Objective-C, and scripting languages like python and ruby, or lua.
Honestly, I would either use C or some hierarchy of languages that I had either designed or fit together completely seamlessly. What I would be looking for is a seamless experience that starts at the bare metal level and then I could move to higher and higher level languages as I moved up the problem space. I would probably chose something like:
C - for bare metal stuff like drivers, kernel, etc
Java/C# - for application-level things like administration consoles, OS apps
Python/PowerShell - for scripting activities like common administrative tasks (creating a new user, etc)
Personally, I think C/C#/PowerShell is more tightly integrated and the type of experience I'd be looking for. Of course, if I ever got so ambitious as to write an OS, I would have a lot of spare time on my hands and would probably really enjoy tackling the language stack first. So maybe it would be L/L#/LScript ...
BitC seems to have this in mind. Despite it's name it seems to be the midpoint of assembly language and lisp. The goal was to make a language with a strong correspondence with machine language but have an intermediate representation that supports stronger correctness inferences than is possible with most other common languages. The languages was created as part of the Coyotos project, an operating system with lofty goals of security and reliability. Formal verification is made significantly possible with the ir used in BitC.
Ada:
Ada is a structured, statically typed, imperative, and object-oriented high-level computer programming language, extended from Pascal and other languages. It was originally designed by a team led by Jean Ichbiah of CII Honeywell Bull under contract to the United States Department of Defense (DoD) from 1977 to 1983 to supersede the hundreds of programming languages then used by the DoD. Ada is strongly typed and compilers are validated for reliability in mission-critical applications, such as avionics software.
Ada, because it was not only specifically developed for such projects, but it also provides support for several very useful high level features (such as support for strong typing, concurrency and abstraction) that are simply not available in standard C.
So that, even as a project grows, you don't have to work around language limitations (think encapsulation, abstraction, namespaces in C).
Don't get me wrong, C works obviously for a great many of projects, but once a project has gained a certain size (think Linux kernel, gcc, GNOME), you will inevitably appreciate certain features of more high level languages to make the development process less tedious and also less obfuscated.
In C however, these features usually end up being -pretty poorly- emulated by excessive and almost pervert use of the pre-processor (this can for example be seen in the gcc code base), so that you get to see lots of nested macros, that from an implementation point of view, actually emulate features found in other programming languages.
In addition, Ada is the only programming language, that I am aware of, that actually provides standardized support for source code analysis using the ASIS, having such a facility in place is however the prerequisite to actually be able to maintain and transform/re-engineer a code base in the long run (think refactoring).
Having an interface like ASIS available, means that you can actually implement "semantic patching", where you can automatically rename a file, function or variable/data structure and it will actually work.
Java ?? no jave runs on a virtual machine which needs an os to run on top of ,
maybe C and some ASM ;)
I would go with D to see whether it can do it.
I would only pick the following 3 out of practicality.
C (good old fashioned)
C++ (C with stuff tacked on. Windows is partially written in this)
Java (the medium level language that just might have a capable garbage collector with controllable pauses with G1).
If I were going to start a new OS I'd do it with the subset of C++ recommended by the embedded industry. You can use things like classes and use it "as a better C" and be just as fast. Just avoid things that have massive overhead. You can even use some template features, if you stick to a certain subet that basically don't have any overhead. Look on embedded.com for features in C++ that have little to no overhead, but will allow you organize your code better than you ever could in C.
Oberon? I guess I miss Pascal too much some times. C paid the bills for quite a while, but I don't really love it.
Lisp of course!
Title text: Some say the world will end in fire; some say in segfaults.
For an OS, you want speed at the lowest levels. So assembly, C, C++, Objective-C, or Java seem to be the current choices. Although it's just recently that Java got fast, and it's hard for me to imagine an OS with garbage collection.
If I were writing my own, it would be a mix of assembly and C.
A C or C++ microkernel with a JIT for a highly dynamic language like Ruby or maybe a language with native support for the Prototype pattern. Even device drivers in that language.
Not because it's practical but because it's really cool. Cool in the way that NeXTStep was cool for using Obj-C for pretty much everything.
http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html - share of languages in Linux's source code.
C, by a number of reasons. Other candidates, like D, are great. However, C has this advantage: there's a lot of available open-sourced C code that you could reuse in your project (much more than is for other languages appropriate for system programming).
I would be torn between using some existing low level language and write my own based on C# but with much better generics support.
In second case I would make each method generic, but all the constraints will be resolved by compiler - to allow "duck typing" like in Scala but still language should be static. Also static virtual methods would lower the codebase.
I've had that idea for a long time, but it never seems to be doable in real timeframe, so who knows maybe in the future. :-)
Some would say Java.
Note that openfirmware is written partly in Forth, and it's very low level.
Have an open mind.
"The kernel has to be written in a low-level language, C is by far the best choice for this, because it is so memory efficient. "
Um... What about FORTH?
FORTH can be low level and high level, so you could have a whole operating system written in FORTH from the ground up, and still have a nice easy REPL scripting environment on top, also in FORTH.
However, any decent operating system should support lots of langauges on top, from C all the way to Python Ruby and Javascript. Making FORTH the basis for it all has a lot of benefits though.
edit: I'd only ever attempt this for an embedded environment with a single known hardware set. Trying to write an OS that could compete with Linux or Windows is a fools job.
If this isn't a hypothetical question, and you're looking to create your own OS, I'd probably go with C because most of the examples out there are written in C.
Also, (And I haven't build an OS yet so take this with a grain of salt), I'm thinking that the c runtime libraries would be a lot easier to port to your new OS than say .NET.
Pascal + Oberon: they have the power of C and C++ but they're not as daunting to use. Both these languages are grossly under appreciated.

Resources