I am a beginner in Vert.x, and the documentation says that Vert.x shared sets and maps support only immutable objects across verticles. If I want to share a Java object across verticles or modules (assuming I am using Java-based verticles), what is the recommended approach? Can I use a Hazelcast distributed hash table for that?
I really think you should try a different approach; otherwise you will run straight into one of the problems Vert.x is designed to alleviate: concurrency trouble. If I had that requirement, I would use something like Redis as a really fast, centralized, volatile store I can access to share data. Maybe this doesn't answer your question and just points to a different approach, but in any case, try to stay away from "shared state". Good luck!
Is there a way to connect/subscribe to Postgres logical replication/streaming replication using Node or Go? I know it's a TCP/IP connection, but I don't know exactly where to start. I also know there is a package for this; I was looking for more of a vanilla solution that helps me understand it.
I'm not certain what you want, but maybe you are looking for “logical decoding”.
If you want to directly speak the replication protocol with the server, you'll have to implement it in your code, but that information is pretty useless, as it only contains the physical changes to the data files.
If you want logical decoding, there is the test_decoding module provided by PostgreSQL, and here are some examples of how it can be used.
Mind that test_decoding is for testing. For real-world use cases, you will want to use a logical decoding plugin that fits your needs, for example wal2json.
If that's what you want to consume, you'll have to look up the documentation for the logical decoding plugin you want to use to learn the format in which it provides the information.
I wrote a program which needs to process a very large dataset, and I'm planning to run it with multiple threads on a high-end machine.
I'm a beginner in Clojure and I'm lost in the myriad of tools at my disposal:
agents, futures, core.async (and Quartzite?). I would like to know which one is best suited for this job.
The following describes my situation:
I have a function which transforms some data and stores it in a database.
The argument to that function is popped from a Redis set.
The function should run in several separate threads as long as there is a value in the Redis set.
For simplicity, futures can't be beat. They run their body on another thread and give you a handle to its return value. However, you often need more fine-grained control than they provide.
The core.async library has nice support for parallelism (via pipeline, see below), and it also provides automatic back-pressure. You have to have a way to control the flow of data such that no one's starving for work, or burdened by too much of it. core.async channels must be bounded, and this helps with this problem. Also, it's a pretty logical model of your problem: taking a value from a source, transforming it (maybe using a transducer?) with some given parallelism, and then putting the result to your database.
You can also go the manual route of using Java's excellent j.u.concurrent library. It has low-level primitives as well as thread management tools such as thread pools. All of this is accessible from Clojure.
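To make the j.u.concurrent route a bit more concrete, here is a minimal sketch in plain Java (the same classes can be called from Clojure through interop). Everything named here is an assumption for illustration: `PoolDrain` and `drain` are hypothetical names, an in-memory thread-safe queue stands in for the Redis set, and the returned list stands in for the database writes. It does no error handling for failures inside the transform.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper: drains a shared work source with a fixed pool of workers.
final class PoolDrain {
    // source    : stand-in for the Redis set; must be a thread-safe queue
    //             (poll() plays the role of popping one member)
    // transform : stand-in for the function that transforms one item
    // The returned list is a stand-in for the database writes.
    static <A, B> List<B> drain(Queue<A> source, Function<A, B> transform, int workers) {
        List<B> sink = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                A item;
                // Each worker keeps popping until the source is empty.
                while ((item = source.poll()) != null) {
                    sink.add(transform.apply(item));
                }
            });
        }
        pool.shutdown(); // accept no new tasks; running workers finish their loop
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sink;
    }
}
```

Called as `PoolDrain.drain(queue, x -> x * x, 3)`, three workers pull from the queue until it is empty. The order of the results is not deterministic, which is usually fine when each item is stored independently.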
From a design standpoint, it comes down to whether you are more CPU-bound or I/O-bound. This affects decisions such as whether or not you will perform parallel reads from Redis and writes to your database. If you are CPU-bound and thus your bottleneck is the computation, then it wouldn't make much sense to parallelize your reads from Redis, or your writes to your database, would it? These are the types of things to consider.
You really have two problems to solve: (1) your familiarity with Clojure's/Java's concurrency mechanisms, and (2) your approach to this problem (i.e., how would you approach this problem, irrespective of the language you're using?). Once you solve #2, you will have a much better idea of which of the tools I mentioned above to use, and how to use them.
Sounds like you may have a good embarrassingly parallel problem
to solve. In that case, you could start simply by coding up your
processing into a top-level function that processes the first datum.
Once that's working, wrap it in a map to handle all of the
data sequentially (serially, one-at-a-time).
You might want to start tackling the bigger problem with just a few
items from your data set. That will make your testing smoother and
faster.
After you have the map working, it's time to just add a p
(parallel) to your code to make it a pmap. This is a very
rewarding way to heat up your machine.
Here is a discussion about the number of threads pmap uses.
The above is the simplest approach. If you need finer control over
the concurrency, this concurrency screencast explores
the use cases.
It is hard to be precise w/o knowing the details of your problem. There are several choices as you mention:
Plain Java threads & threadpools. If your problem is similar to a pre-existing Java solution, this may be the most straightforward.
Simple Clojure threading with future et al. Kicking off a thread with future and getting the result in a promise is very easy.
Replace map with pmap (parallel map). This can help in simple cases that are primarily map/reduce oriented.
The Claypoole library: Lots of tools to make multithreading simpler and easier. Please see their GitHub project and the Clojure/West talk.
I'm a developer of the Robocode engine. We would like to make Robocode
multilingual, and Scala seems to be a good match. We have a Scala plugin prototype here.
The problem:
Because users are creative programmers, they may try to win battles
in different ways. Also, robots are downloaded from an online database
where anyone can upload one, so a gap in security could open a security
hole on the user's computer. Robots written in Java run in a
restricted sandbox. Almost everything is prohibited [network, GUI,
disk (limited), threads (limited), classloaders and reflection]. The
sandbox is similar to that of a browser applet. We use a SecurityManager,
a custom ClassLoader per robot, etc.
There are three ways to host the Scala runtime in Robocode:
1) Load it together with the robot inside the sandbox. Pretty safe for us, and the
preferred solution. But it will damage the Scala runtime's abilities, because the runtime uses reflection. Maybe it generates classes at runtime? Uses threads to do some internal cleanup? Accesses JVM internals? (I would not like to limit the abilities of the language.)
2) Use the Scala runtime as trusted code, outside the box, with security on the
same level as the JDK, and visible to the (malicious)
robot. Are the Scala runtime APIs safe? Do their methods have security
guards? Is there any safe mode? Is there any singleton in the Scala runtime
which could be abused to communicate between robots? Any concurrency/threadpool/messaging which could simulate threads? (Is there any security audit for the Scala runtime?)
3) Something in between: some classes of the runtime in and some out. Which classes/packages must be visible to the robot, and which are just private implementation? (This seems to be the future solution.)
The question:
Is it possible to enumerate and isolate the parts of the runtime which must run in
trusted scope from the rest? Specific packages and classes? Or is there a better idea?
I'm looking for a specific answer which will lead to a secure solution. Random thoughts are welcome, but not awarded. There is an ongoing discussion on the Scala mailing list. No specific answer yet.
I think #1 is your best bet and even that is a moving target. As brought up on the mailing list, structural types use reflection. I don't think structural types are common in the standard library, but I don't think anyone keeps track of where they are.
There's also always the possibility that there are other features using reflection behind the scenes. For example, for a while in the 2.8 branch some array functionality was using reflection. I think that's been changed after benchmarking, but there's always the possibility that there's some problem where someone said "Aha! I will use reflection to solve this."
The Scala standard library is filled with singletons. Most of them are immutable, but I know that the Scheduler object in the actors library could be abused for communication because it is essentially a proxy for an actual scheduler so you can plug your own custom scheduler into it.
At this time I don't think Scala requires using a custom class loader and all of its classes are produced at compile time instead of runtime, but then again that's probably a moving target. Scala generates a lot of class files, and there is always talk of making it generate some of them at runtime when they are needed instead of at compile time.
So, in short, I do not think it's possible (within reasonable constraints on effort) to enumerate and isolate the pieces of Scala that can (and should) be trusted.
As you mentioned, other J* language implementations may all make use of reflection, so all of those languages would be ruled out as long as reflection is prohibited.
I guess the underlying problem is that the JVM has no way to partition the scope of the reflection API, such that you could "sandbox" the part of the code that can be reflected on.
I use and love Berkeley, but it seems to bog down once you get near a million or so entries, especially on inserts. I've tried memcachedb, which works, but it's not being maintained, so I'm worried about using it in production. Does anyone have any other similar solutions? Basically, I want to be able to do key lookups on a large (possibly distributed) dataset (40+ million).
Note: Anything NOT in Java is a bonus. :-) It seems most things today are going the Java route.
Have you tried Project Voldemort?
I would suggest you have a look at:
Metabrew key-value store blog post
There is a big list of key-value stores with a little bit of discussion of each of them. If you still have doubts, you could join the so-called NoSQL Google group and ask for help there.
Redis is insanely fast and actively developed. It is written in C (no Java). It compiles out of the box on POSIX OSes (no dependencies).
Did you try the hash backend? That should be faster for insert and key search.
Is there any paradigm that gives you a different mindset or a different take on writing multi-threaded applications? Perhaps something that feels vastly different, like going from procedural programming to functional programming.
Concurrency has many different models for different problems. The Wikipedia page for concurrency lists a few models, and there's also a page for concurrency patterns which has some good starting points for different ways to approach concurrency.
The approach you take is very dependent on the problem at hand. Different models solve various different issues that can arise in concurrent applications, and some build on others.
In class I was taught that concurrency uses mutual exclusion and synchronization together to solve concurrency issues. Some solutions only require one, but with both you should be able to solve any concurrency issue.
For a vastly different concept you could look at immutability and concurrency. If all data is immutable then the conventional approaches to concurrency aren't even required. This article explores that topic.
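As a small illustration of the immutability idea, here is a sketch in Java. The `Point` and `ImmutableDemo` classes are hypothetical names made up for this example, not taken from the linked article. A single `Point` is read by many threads at once, and every "change" produces a new object, so no locks or synchronization are involved.

```java
import java.util.stream.IntStream;

// Hypothetical immutable value type: all fields are final and there are no setters.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Modification" returns a new object; the original is never changed.
    Point movedBy(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}

final class ImmutableDemo {
    // Several threads read the same Point and derive new ones from it.
    // No locks are needed because nothing they share can change.
    static int sumOfShiftedX(Point shared, int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> shared.movedBy(i, 0).x())
                .sum();
    }
}
```

The parallel stream is only there to get several threads touching the same object; the point is that `shared` can be handed to any number of threads without coordination.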
I don't really understand the question, but if you start doing some coding with CUDA, it will give you a different way of thinking about multi-threaded applications.
It differs from general multi-threading techniques, like semaphores, monitors, etc., because you have thousands of threads running concurrently. So the problem of parallelism in CUDA lies more in partitioning your data and combining the chunks of data later.
Just a small example of a complete rethinking of a common serial problem is the SCAN algorithm. It is as simple as:
Given a SET {a,b,c,d,e}
I want the following set:
{a, a+b, a+b+c, a+b+c+d, a+b+c+d+e}
Where the symbol '+' in this case stands for any associative operator (not only addition; multiplication works too).
How to do this in parallel? It's a complete rethink of the problem, it is described in this paper.
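To pin down what the scan computes, here is a small sketch in Java rather than CUDA. It is not the algorithm from the paper: the `Scan` class is a hypothetical name, the first method is the obvious serial loop, and the second simply delegates to the JDK's `Arrays.parallelPrefix`, which computes the same result in parallel. The operator passed in needs to be associative for the parallel version to agree with the serial one.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Hypothetical helper class illustrating an inclusive scan (prefix "sum").
final class Scan {
    // Serial version: out[i] = in[0] op in[1] op ... op in[i].
    static int[] sequential(int[] in, IntBinaryOperator op) {
        int[] out = in.clone();
        for (int i = 1; i < out.length; i++) {
            out[i] = op.applyAsInt(out[i - 1], out[i]);
        }
        return out;
    }

    // Same result, computed by the JDK's parallel prefix implementation.
    // The input array is copied first so the caller's data is left untouched.
    static int[] parallel(int[] in, IntBinaryOperator op) {
        int[] out = in.clone();
        Arrays.parallelPrefix(out, op);
        return out;
    }
}
```

For the input {1, 2, 3, 4, 5}, both methods give {1, 3, 6, 10, 15} with addition and {1, 2, 6, 24, 120} with multiplication.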
Many more implementations of different algorithms in CUDA can be found on the NVIDIA website.
Well, a very conservative paradigm shift is from thread-centric concurrency (share everything) towards process-centric concurrency (address-space separation). This way one can avoid unintended data sharing and it's easier to enforce a communication policy between different sub-systems.
This idea is old and was propagated (among others) by the Micro-Kernel OS community to build more reliable operating systems. Interestingly, the Singularity OS prototype by Microsoft Research shows that traditional address spaces are not even required when working with this model.
The relatively new idea I like best is transactional memory: avoid concurrency issues by making sure updates are always atomic.
Have a looksee at OpenMP for an interesting variation.