Scala and AKKA for a simulation - multithreading

Should I learn scala and AKKA for a simulation project. Are these technologies a good fit / worth the investment? The task is to perform https://www.dropbox.com/s/3lby24y26wp60to/assignment.pdf?dl=0 an event-based simulation to simulate an IOT edge data center and implement some scheduling algorithms.
If yes, which libraries would you suggest? https://github.com/scalation/scalation does not seem to be a parallel library.

This is an opinion-based question. Should not be asked here.
Anyways, I'm going to try to give you some pointers. Akka is a generic framework: you can build anything from it, but nothing in particular is an immediate fit (ok, some things fit better than others, but still).
In your case, while Akka is a valid fit (actors = agents), I'd look more into specialized softwares for ABM (Agent based modeling), you can find a massive list here .
In particular, I recommend Netlogo: it's a bit counterintuitive in terms of syntax if you have never used something akin to Lisp or other immutable variables languages ("let" etc.), but once you get the hang of it it's very powerful for the effort required.
And, if you come from a CS background, it should be super easy for you (it's normally used by non-CS people in various fields, and it's designed to be easy).

Related

Distributed Haskell state of the art in 2011?

I've read a lot of articles about distributed Haskell. Much work has been done but seems to be in the area of distributing computations. I saw the remote package which seems to implement Erlang-style messaging passing but it is 0.1 and early stage.
I'd like to implement a system where there are many separate processes that provide distinct services, and are tied together by several main processes. This seems to be a natural fit for Erlang, but not so for Haskell. But I like Haskell's type safety.
Has there been any recent adoption of Erlang-style process management in Haskell?
If you want to learn more about the remote package, a.k.a CloudHaskell, see the paper as well as Jeff Epstein's thesis. It aims to provide precisely the actor abstraction you want, but as you say it is in the early stages. There is active discussion regarding improvements on the parallel-haskell mailing list, so if you have specific needs that remote doesn't provide, we'd be happy for you to jump in and help us decide its future directions.
More mature but lower-level than remote is the haskell-mpi package. If you stick to the Simple interface, messages can be sent containing arbitrary Serialize instances, but the abstraction is still way lower than remote.
There are some experimental systems, such as described in Implementing a High-level Distributed-Memory Parallel Haskell in Haskell (Patrick Maier and Phil Trinder, IFL 2011, can't find a pdf online). It blends a monad-par approach of deterministic dataflow parallelism with a limited ability to make the I-structures serializable over the network. These sorts of abstraction have promise for doing distributed computation, but since the focus is on computing purely-functional values rather than providing Erlang-style processes, they probably wouldn't be a good fit for your application.
Also, for completeness, I should point out the Haskell wiki page on cloud and HPC Haskell, which covers what I describe here, as well as the subsection on distributed Haskell, which seems in need of a refresh.
I frequently get the feeling that IPC and actors are an oversold feature. There are plenty of attractive messaging systems out there that have Haskell bindings e.g. MessagePack, 0MQ or Thrift. IMHO the only thing you have to add is proper addressing of processes and decide who/what is managing this addressing capability.
By the way: a number of coders adopt e.g. 0MQ into their Erlang environments, simply because it offers the possibility to structure messaging via message brokers rather then relying on pure process to process messaging in super scale.
In a "massively multicore world" I personally assume that shared memory approaches will eventually be outperforming messaging. Someone can then always come and argue with asynchrony of course. But already when you write that you want to "tie together" your processes by "several main processes" you in fact speak about synchronization. Also, you can of course challenge whether a single function, process or thread is the right level of parallelization.
In short: I would probably see whether MessagePack or 0MQ could fit my needs in Haskell and care for the rest in my code.

Example problems for concurrent computation

There's a plethora of paradigms and methods for concurrent programming in use today. Software transactional memory, actors, shared state concurrency, tuple spaces and many, many more.
What I find lacking, however, is a library of interesting test problems for concurrency. One well known example is the "Dining Philosophers Problem", which is neither a complex enough nor motivating nor realistic one. Then there are many parallel algorithms (matrix multiplication, rendering, general nested data parallelism) that just require distribution of work, but no real concurrency with communication between threads of execution.
So, can anyone point me to some interesting sets of problems that require real concurrency in an interactive, perhaps even distributed environment, that are simple enough to use as examples for concurrency paradigms? Ideally, I want to find a set of problems to serve as a "lackmus-test" for concurrency paradigms (or to highlight their differences, as every paradigm has its strengths and weaknesses).
Any help is much appreciated :)
I've previously considered this exact issue, having previously proposed some concurrent programing paradigms myself :p
The conclusion I reached then is that such a test set doesn't seem to really exist in a language-independent manner. While it might be helpful for it to exist, there seem to be some fairly good reasons it doesn't (to the best of my knowledge).
Most of the focus within concurrent programming tends to be on data parallelism, such that the same operation is applied in parallel to different pieces of the same data set. The kinds of task-level parallelism (i.e. different tasks being performed in parallel, possibly sharing data) that I think you're talking about is actually not done very much. I think this is because it's kinda hard. But I think it's also kinda hard because most problems don't lend themselves particularly well to this kind of concurrency. Describing a distributed system in terms of concurrency primitives may be helpful, but these systems tend to be decoupled such that there is a protocol (written or implied) moderating their communication. People tend not to think of these kinds of systems as obviously "concurrent" programming situations, even though they are when viewed within the right frame (i.e. considering the "client" and "server" as agents operating in parallel with synchronisation at some points).
The only places I think you could find some sources of inspiration would be within individual implementations. Erlang, Occam (and Occam-pi), Alice, CML, Concurrent Haskell etc all are likely to have small test corpuses, but both the problems and their implementations are going to be biased towards being implementable within a specific language (because they obviously are implementable within that language!). Perhaps you could also look to the communities working on multi-party session types, and various process calculi such as pi-calculus, CCS and CSP to see what kinds of systems they are using as example models. The idea of a standard language-agnostic set of problems for describing concurrent communicating systems is appealing, but somewhat elusive at this point, I think.

Why did you decide "against" using Erlang?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Have you actually "tried" (means programmed in, not just read an article on it) Erlang and decided against it for a project? If so, why? Also, if you have opted to go back to your old language, or to use another functional language like F#, Haskell, Clojure, Scala, or something else then this counts too, and state why.
I returned to Haskell for my personal projects from Erlang for the simple virtue of Haskell's amazing type system. Erlang gives you a ton of tools to handle when things go wrong. Haskell gives you tools to keep you from going wrong in the first place.
When working in a language with a strong type system you are effectively proving free theorems about your code every time you compile.
You also get a bunch of overloading sugar from Haskell's typeclass machinery, but that is largely secondary to me -- even if it does allow me to express a number of abstractions that would be terribly verbose or non-idiomatic and unusable in Erlang (e.g. Haskell's category-extras).
I love Erlang, I love its channels and its effortless scalability. I turn to it when these are the things I need. Haskell isn't a panacea. I give up a better operational understanding of space consumption. I give up the magical one pass garbage collector. I give up OTP patterns and all that effortless scalability.
But its hard for me to give up the security blanket that, as is commonly said, in Haskell, if it typechecks, it is probably correct.
We use Haskell, OCaml and (now) F# so for us it has nothing to do with lack of C-like syntax. Rather we skip Erlang because:
It's dynamically typed (we're fans of Haskell's type system)
Doesn't provide a 'real' string type (I understand why, but it's annoying that this hasn't been corrected at the language level yet)
Tends to have poor (incomplete or unmaintained) database drivers
It isn't batteries included and doesn't appear to have a community working on correcting this. If it does, it isn't highly visible. Haskell at least has Hackage, and I'd guess that's what has us choosing that language over any other. In Windows environments F# is about to have the ultimate advantage here.
There are probably other reasons I can't think of right now, but these are the major points.
The best reason to avoid Erlang is when you cannot commit to the functional way of programming.
I read an anti-Erlang blog rant a few weeks ago, and one of the author's criticisms of Erlang is that he couldn't figure out how to make a function return a different value each time he called it with the same arguments. What he really hadn't figured out is that Erlang is that way on purpose. That's how Erlang manages to run so well on multiple processors without explicit locking. Purely functional programming is side-effect-free programming. You can arm-twist Erlang into working like our ranting blogger wanted, adding side effects, but in doing so you throw away the value Erlang offers.
Pure functional programming is not the only right way to program. Not everything needs to be mathematically rigorous. If you determine your application would be best written in a language that misuses the term "function", better cross Erlang off your list.
I have used Erlang in a few project already. I often use it for restful services. Where I don't use it however is for complex front end web applications where tools like Ruby on Rails are far better. But for the powerbroker behind the scenes I know of no better tool than Erlang.
I also use a few applications written in Erlang. I use CouchDB and RabbitMQ a bit and I have set up a few EJabberd servers. These applications are the most powerful, easiest and flexible tools in their field.
Not wanting to use Erlang because it does not use JVM is in my mind pretty silly. JVM is not some magical tool that is the best in doing everything in the world. In my mind the ability to choose from an arsenal of different tools and not being stuck in a single language or framework is what separates experts from code monkeys.
PS: After reading my comment back in context I noticed it looked like I was calling oxbow_lakes a code monkey. I really wasn't and apologize if he took it like that. I was generalizing about types of programmers and I would never call an individual such a negative name based on one comment by him. He is probably a good programmer even though I encourage him to not make the JVM some sort of a deal breaker.
Whilst I haven't, others on the internet have, e.g.
We investigated the relative merits of
C++ and Erlang in the implementation
of a parallel acoustic ray tracing
algorithm for the U.S. Navy. We found
a much smaller learning curve and
better debugging environment for
parallel Erlang than for
pthreads-based C++ programming. Our
C++ implementation outperformed the
Erlang program by at least 12x.
Attempts to use Erlang on the IBM Cell
BE microprocessor were frustrated by
Erlang's memory footprint. (Source)
And something closer to my heart, which I remember reading back in the aftermath of the ICFP contest:
The coding was very straightforward,
translating pseudocode into C++. I
could have used Java or C#, but I'm at
the point where programming at a high
level in C++ is just as easy, and I
wanted to retain the option of quickly
dropping down into some low-level
bit-twiddling if it came down to it.
Erlang is my other favorite language
for hacking around in, but was worried
about running into some performance
problem that I couldn't extricate
myself from. (Source)
For me, the fact that Erlang is dynamically typed is something that makes me wary. Although I do use dynamically typed languages because some of them are just so very problem-oriented (take Python, I solve a lot of problems with it), I wish they were statically typed instead.
That said, I actually intended to give Erlang a try for some time, and I’ve just started downloading the source. So your “question” achieved something after all. ;-)
I know Erlang since university, but have never used it in my own projects so far. Mainly because I'm mostly developing desktop applications, and Erlang is not a good language for making nice GUIs. But I will soon implement a server application, and I will give Erlang a try, because that's what it's good for. But I'm worring that I need more librarys, so maybe I'll try with Java instead.
A number of reasons:
Because it looks alien from anyone used to the C family of languages
Because I wanted to be able to run on the Java Virtual Machine to take advantage of tools I knew and understood (like JConsole) and the years of effort which have gone into JIT and GC.
Because I didn't want to have to rewrite all the (Java) libraries I've built up over the years.
Because I have no idea about the Erlang "ecosystem" (database access, configuration, build etc).
Basically I am familiar with Java, its platform and ecosystem and I have invested much effort into building stuff which runs on the JVM. It was easier by far to move to scala
I Decided against using Erlang for my project that was going to be run with a lot of shared data on a single multi-processor system and went with Clojure becuase Clojure really gets shared-memory-concurrency. When I have worked on distributed data storage systems Erlang was a great fit because Erlang really shines at distributed message passing systems. I compare the project to the best feature in the language and choose accordingly
Used it for a message gateway for a proprietary, multi-layered, binary protocol. OTP patterns for servers and relationships between services as well as binary pattern matching made the development process very easy. For such a use case I'd probably favor Erlang over other languages again.
The JVM is not a tool, it is a platform. Although I am all in favour of choosing the best tool for the job the platform is mostly already determined. Unless I am developing something standalone, from scratch and without the desire to reuse any existing code/library (three aspects that are unlikely in isolation already) I may be free to choose the platform.
I do use multiple tools and languages but I mainly targetg the JVM platform. That precludes Erlang for most if not all of my projects, as interesting as some of it concepts are.
Silvio
While I liked many design aspects of the Erlang runtime and the OTP platform, I found it to be a pretty annoying program language to develop in. The commas and periods are totally lame, and often require re-writing the last character of many lines of code just to change one line. Also, some operations that are simple in Ruby or Clojure are tedious in Erlang, for example string handling.
For distributed systems relying on a shared database the Mnesia system is really powerful and probably a good option, but I program in a language to learn and to have fun, and Erlang's annoying factor started to outweigh the fun factor once I had gotten past the basic bank account tutorials and started writing plugins for an XMPP server.
I love Erlang from the concurrency standpoint. Erlang really did concurrency right. I didn't end up using erlang primarily because of syntax.
I'm not a functional programmer by trade. I generally use C++, so I'm covet my ability to switch between styles (OOP, imperative, meta, etc). It felt like Erlang was forcing me to worship the sacred cow of immutable-data.
I love it's approach to concurrency, simple, beautiful, scalable, powerful. But the whole time I was programming in Erlang I kept thinking, man I'd much prefer a subset of Java that disallowed data sharing between thread and used Erlangs concurrency model. I though Java would have the best bet of restricting the language the feature set compatible with Erlang's processes and channels.
Just recently I found that the D Programing language offers Erlang style concurrency with familiar c style syntax and multi-paradigm language. I haven't tried anything massively concurrent with D yet, so I can't say if it's a perfect translation.
So professionally I use C++ but do my best to model massively concurrent applications as I would in Erlang. At some point I'd like to give D's concurrency tools a real test drive.
I am not going to even look at Erlang.
Two blog posts nailed it for me:
Erlang machinery walks the whole list to figure out whether they have a message to process, and the only way to get message means walking the whole list (I suspect that filtering messages by pid also involves walking the whole message list)
http://www.lshift.net/blog/2010/02/28/memory-matters-even-in-erlang
There are no miracles, indeed, Erlang does not provide too many services to deal with unavoidable overloads - e.g. it is still left to the application programmer to deal checking for available space in the message queue (supposedly by walking the queue to figure out the current length and I suppose there are no built-in mechanisms to ensure some fairness between senders).
erlang - how to limit message queue or emulate it?
Both (1) and (2) are way below naive on my book, and I am sure there are more software "gems" of similar nature sitting inside Erlang machinery.
So, no Erlang for me.
It seems that once you have to deal with a large system that requires high performance under overload C++ + Boost is still the only game in town.
I am going to look at D next.
I wanted to use Erlang for a project, because of it's amazing scalability with number of CPU'S. (We use other languages and occasionally hit the wall, leaving us with having to tweak the app)
The problem was that we must deliver our application on several platforms: Linux, Solaris and AIX, and unfortunately there is no Erlang install for AIX at the moment.
Being a small operation precludes the effort in porting and maintaining an AIX version of Erlang, and asking our customers to use Linux for part of our application is a no go.
I am still hoping that an AIX Erlang will arrive so we can use it.

Sample Problems for Multithreading Practice

I'm about to tackle what I see as a hard problem, I think. I need to multi-thread a pipeline of producers and consumers.
So I want to start small. What are some practice problems, in varying levels of difficulty, that would be good for multi-threading practice? (And not contrived, impractical examples you see in books not dedicated to concurrency).
What books or references would you recommend that focus on concurrency and give in-depth problems and cases?
(I'd rather not focus on the problem I want to solve. I just want to ask for good references and sample problems. This would be more useful to other users. I'm not stuck on the problem.)
The little book of semaphores is a good free book. The author takes a unique approach of first asking a problem and then presenting hints before answering. The problems increase in difficulty level gradually, and the book isn't written for any language in particular but covers general multithreading concepts.
If you have enough time to invest I would recommend the book "Concurrency: State Models & Java Programs, 2nd Edition" by Jeff Magee and Jeff Kramer, John Wiley&Sons 2006
You can ignore the Java part if you are using some other language
There's a language used to model processes and concurrent processes called FSP. It needs some time and energy to be invested in order to be proficient in the language. There's a tool (LTSA, both are free and supported by an Eclipse plugin or stand alone app) which verifies your models and make you pretty shure that your model is correct from the standpoint of concurrent execution.
Translating this models to your language constructs is then just a question of programming technique and few design patterns.
Most text book problems, like readers-writers, producers-consumers or dinning philosophers are all illustrations of the mutex. I would prefer to model a prototype which is a simplistic approximation the bigger problem and go ahead.
I have some times seen situations where dead-lock avoidance is what is needed and dead-lock prevention measures are being used. It is always a good idea to analyse if Banker's algorithm would suit the case or not.
Completely ignoring your request, I'll suggest that you should look at SEDA (staged event driven architecture) as a way to think about setting up a multi-threaded pipeline of producers and consumers.
I'm not sure what you are looking for. But in real world enterprise situation, we usually use some kind of messaging framework when doing producers consumers stuff. Tipically in Java, that's JMS. And you can use the excellent Spring Framework to help you along.
If you're working with Java at all (and possibly even if you're not), you should definitely read Java Concurrency In Practice.
To be honest, many real-world multithreading programs are not doing much more than reading/writing some value (whether string or int) -- circular buffers (as a network connection might need), readers/writers of log files, etc.
In fact, I'd say that if you implement (or find) a solid (and generic) circular buffer, and then run all thread-to-thread communication through those buffers as the only contact point, that'll cover a very large portion of any multithread syncing you might need to do. (Unless you're working in a buzzword-compliant environment, and need to tack "enterprise", "messaging", or whatever onto the buzzword list... or you're writing a database or operating system.)
(Note that "circular buffer" is a fairly C-centric term, being rooted in the relatively direct manipulation of a block of memory. Python's Queue class implements the same basic principle in a list-centric way, and I'm sure that numerous other languages have conceptually similar constructs under slightly different names...)

switch to parallel coding

we all writing code for single processor.
i wonder when we all are able to write code on multi processors?
what do we need (software tools, logic, algorithms) for this switching?
edit: in my view, as we do many task parallely, same way we need to convert those real life solutions(algorithms) to computer lang. just as OOPs coding did for procedural coding. OOPs is more real life coding style than procedural one. so i hope for that kind of solutions.
I think the most important requirement is a good language that has native constructs that support parallelism or one that can automatically generate parallel code. There are quite a few languages that fit that description, but none of them is popular enough to really be considered for mainstream use. That, in turn is caused by several things:
By their very nature, these languages are very different from today's imperative languages, and are therefor harder to learn (or at least seem that way).
They often lack good tools and libraries, making them unusable for any "real" project.
Of course, if it were more popular more people would be willing to learn it and there would be more support, so it's a kind of cycle that's pretty hard to break out of. I guess all we can do is hope. :)
An example of a language designed with heavy parallelization in mind is Erlang - and it's actually used in commercial projects.
What we need are natural abstractions for highly-concurrent algorithms. Actors (think: Erlang) go a long way in this direction, but they aren't a one-size-fits-all solution. Some more specific abstractions like fork/join or map/reduce can be even easier to apply to common problems.
The trick with all of these concurrency abstractions is they require functional-style programming. Concurrency doesn't mesh well with shared mutable state. As they say, "Locks considered harmful". Since most developers come from a strictly imperative background, switching to a shared-nothing continuation passing approach is often extremely challenging.
Incidentally, with respect to concurrency abstractions, Clojure has some very interesting features in this direction. Not only does it have sort-of actors, but it also defines a transactional memory model (think: databases) along with a global, atomic references mechanism. These two features allow concurrent operations to share "mutable" state without ever having to worry about locking or race conditions.
In the end, it comes down to education. Much of the needed theoretical work into concurrency abstractions has already been done, we just need to accept it. Unfortunately, as Erlang and Haskell prove, sometimes the best ideas remain relegated to an extremely fringe demographic. Hopefully efforts like Scala and Clojure will succeed in bringing the more advanced abstractions into the mainstream by sneaking them onto an existing, well-supported platform (the JVM).
Unfortunately for massive concurrent programming - unless there is a breakthrough in compilers to help, we will be throwing out a lot of what we know about algorithms (I think Don Knuth even said that). Read about Erlang for a glimpse of this possible future.
There are several tools/languages that are popular or are gaining popularity. If you use FORTRAN, C, or C++, you can use OpenMP (not too hard to implement) or the Message Passing Interface (MPI) libraries (powerful and greatest speedup potential, but also complex and difficult). OpenMP uses preprocessor directives to mark areas that can be parallelized, especially loops. MPI uses messages that pass data back and forth between processes, and the greatest difficulty is keeping everything synchronized without hitting bottlenecks and keeping processes waiting. I would say MPI is definitely on the way out, however. It's become clear in the scientific/high-performance computing communities that the speedup is rarely worth the additional development time.
As for up and coming languages, check out Fortress. It's still being designed, but the goal is to create a language even easier for scientific computing than FORTRAN. Programs will be specified in a very high level mathematical syntax. Additionally, parallelism will be implicit; the programmer will have to work to do things in serial. Plus, it's being championed by Sun and is based on java, so it will be portable.
There is no simple answer, and in many ways even the complex answers are currently inadequate or incomplete. You'll get a better answer if you are more specific about the replies you want: pointers to dev libraries and tools, instructional materials, pointers to current research projects and issues in this area, or something else?
The most important requirement is to be able to split your problem into smaller problems that can be solved independently of each other. Once you've worked out how you're going to do that, everything else is easier to think about and further questions of implementation (e.g. "parts of my calculation depend on other parts - how do I wait for them to have finished?") become concrete, specific things you can research or ask here about.
for java you can now look to Parallel Java Library or DPJ(deterministic Parallel Java!)
It will offer you great help in extracting parallelism from codes!!

Resources