Is anyone aware of some simple examples of applications that take into account the 'eventual consistency' caveat of a distributed database like Cassandra? I am hoping there are design patterns that help us deal with this.
If the example is in Python or Java, it'd be easiest for me to understand.
Here is an example from DataStax:
http://docs.datastax.com/en/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html
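One common pattern for living with eventual consistency is to tune per-statement consistency levels so that read and write quorums overlap (R + W > N), which gives read-your-writes behavior even though individual replicas converge only eventually. Here is a minimal sketch with the DataStax Java driver 2.1 (matching the linked docs); the contact point, keyspace, and table are illustrative assumptions.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class QuorumReadExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

        // Write at QUORUM: a majority of replicas must acknowledge.
        Statement write = new SimpleStatement(
                "INSERT INTO users (id, name) VALUES (1, 'alice')")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(write);

        // Read at QUORUM: the read set must overlap the write set,
        // so this read is guaranteed to see the write above.
        Statement read = new SimpleStatement(
                "SELECT name FROM users WHERE id = 1")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);
        System.out.println(session.execute(read).one().getString("name"));

        cluster.close();
    }
}

The trade-off is latency and availability: QUORUM statements fail when a majority of replicas is unreachable, whereas ONE stays available but may return stale data.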
It's a somewhat architectural question. I need to design an application using Spark and Scala as the primary tools, and I want to minimise manual intervention as much as possible.
I will receive, at a regular interval (say, daily), a zip with multiple files having different structures as input. I need to process it using Spark, and after the transformations move the data to a back-end database.
I want to understand the best way to design the application.
What would be the best way to process the zip?
Can Spark Streaming be considered an option, given the frequency of the files?
What other options should I take into consideration?
Any guidance would be really appreciated.
It's a broad question; there are batch options and stream options, and I'm not sure of your exact requirements. You may start your research here: https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-FileStreamSource.html
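As a hedged starting point, here is a minimal sketch of that file-source approach (in Java rather than Scala, for consistency with the other examples in this thread). The directory, schema, trigger interval, and JDBC sink are all illustrative assumptions, and the zip is assumed to be unpacked by an upstream step, since Spark's CSV source does not read zip archives directly.

import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;
import org.apache.spark.sql.types.StructType;

public class DailyIngestJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("daily-ingest")
                .getOrCreate();

        // Streaming file sources require an explicit schema; this one is made up.
        StructType schema = new StructType()
                .add("id", "string")
                .add("value", "double");

        // The file source treats every new file in the directory as new data,
        // which suits a "zip lands daily, gets unpacked here" workflow.
        Dataset<Row> incoming = spark.readStream()
                .schema(schema)
                .option("header", "true")
                .csv("/data/incoming");

        // Push each micro-batch to the back-end database over JDBC.
        StreamingQuery query = incoming.writeStream()
                .trigger(Trigger.ProcessingTime("1 hour"))
                .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, id) ->
                        batch.write()
                                .format("jdbc")
                                .option("url", "jdbc:postgresql://db-host/warehouse")
                                .option("dbtable", "daily_ingest")
                                .mode("append")
                                .save())
                .start();

        query.awaitTermination();
    }
}

If the files genuinely arrive only once a day, a plain batch job kicked off by a scheduler (cron, Airflow, etc.) is just as valid and often simpler to operate than keeping a streaming query running.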
Can someone please provide points that one may consider while making an Elasticsearch search engine efficient?
The experience of developers who have made search engines faster and more efficient would help new developers like me make Elasticsearch more reliable.
If the question looks irrelevant, please let me know and I will modify it.
Thanks in advance.
I'm trying to develop something that extracts keywords from a text. I know AlchemyAPI works best for this. Now I want to know what algorithms AlchemyAPI uses so that I can implement them on my own. Does anyone have any idea about this? Please share it. Thanks in advance.
I have no idea what specific algorithms AlchemyAPI uses (I'm guessing they are on the extreme end of proprietary), but the Stanford NLP group has a lot of information and code that may be useful:
http://www-nlp.stanford.edu/software/lex-parser.shtml
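To be clear, this is not AlchemyAPI's algorithm, which is unknown. As a common baseline, here is a minimal Java sketch that uses Stanford CoreNLP to rank noun lemmas by frequency as keyword candidates; the annotator list, sample text, and top-10 cutoff are illustrative choices.

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class KeywordBaseline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        String text = "Distributed databases trade consistency for availability. "
                    + "Cassandra is a distributed database.";
        CoreDocument doc = new CoreDocument(text);
        pipeline.annotate(doc);

        // Count noun lemmas (POS tags starting with "NN") as keyword candidates.
        Map<String, Long> counts = doc.tokens().stream()
                .filter(tok -> tok.tag().startsWith("NN"))
                .map(CoreLabel::lemma)
                .collect(Collectors.groupingBy(String::toLowerCase,
                                               Collectors.counting()));

        // Print the 10 most frequent candidates.
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}

Real systems usually weight counts by corpus statistics (TF-IDF) or use graph-based rankers like TextRank, but frequency over POS-filtered lemmas is a reasonable first cut.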
I am a beginner in Vert.x, and the documentation mentions that Vert.x shared sets and maps support only immutable objects across verticles. If I want to share a Java object (assuming I am using Java-based verticles) across verticles or modules, what is the recommended approach? Can I use a Hazelcast distributed hash table for that?
I really think you should try a different approach; otherwise you will run into one of the problems Vert.x tries hardest to alleviate: concurrency troubles. If I had that requirement, I would use something like Redis to have a really fast, centralized, volatile store I can access and share through. Maybe this doesn't answer your question, but it points to a different approach... anyway, try to stay away from "shared state". Good luck!
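On the Hazelcast part of the question: when Vert.x runs clustered with the Hazelcast cluster manager on the classpath, its cluster-wide AsyncMap is already backed by a Hazelcast distributed map, so you get that behavior without calling Hazelcast directly. A minimal sketch, assuming the Vert.x 3 callback API; the map name and values are illustrative, and values still need to be serializable types (e.g., Strings), not arbitrary mutable objects.

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.shareddata.AsyncMap;

public class ClusterMapExample {
    public static void main(String[] args) {
        // Clustered Vert.x picks up the cluster manager found on the classpath;
        // with vertx-hazelcast, the map below is a Hazelcast-backed distributed map.
        Vertx.clusteredVertx(new VertxOptions(), res -> {
            if (res.failed()) {
                res.cause().printStackTrace();
                return;
            }
            Vertx vertx = res.result();
            vertx.sharedData().<String, String>getClusterWideMap("app.state", mapRes -> {
                AsyncMap<String, String> map = mapRes.result();
                // Asynchronous put, then read the value back.
                map.put("greeting", "hello", putRes ->
                        map.get("greeting", getRes ->
                                System.out.println(getRes.result())));
            });
        });
    }
}

Note that entries are serialized when they cross the cluster, so each verticle sees its own copy rather than a shared mutable object, which sidesteps the concurrency troubles mentioned above.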
I use and love Berkeley DB, but it seems to bog down once you get near a million or so entries, especially on inserts. I've tried memcachedb, which works, but it's not being maintained, so I'm worried about using it in production. Does anyone have any other similar solutions? Basically, I want to be able to do key lookups on a large (possibly distributed) dataset (40+ million entries).
Note: Anything NOT in Java is a bonus. :-) It seems most things today are going the Java route.
Have you tried Project Voldemort?
I would suggest you have a look at:
Metabrew key-value store blog post
There is a big list of key-value stores with a little bit of discussion on each of them. If you still have doubts, you could join the so-called NoSQL Google group and ask for help there.
Redis is insanely fast and actively developed. It is written in C (no Java) and compiles out of the box on POSIX OSes (no dependencies).
Did you try the hash backend? That should be faster for inserts and key searches.
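To make the Redis suggestion above concrete: the server itself is C, though this client sketch happens to be Java (using the Jedis client, for consistency with the other examples in this thread; the host, port, and keys are illustrative assumptions).

import redis.clients.jedis.Jedis;

public class RedisLookupExample {
    public static void main(String[] args) {
        // Connect to a local Redis server; host and port are assumptions.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Plain key-value writes and lookups stay O(1) per key even at
            // tens of millions of entries, which matches the access pattern
            // described in the question.
            jedis.set("user:42", "alice");
            System.out.println(jedis.get("user:42")); // -> alice
        }
    }
}

For the 40+ million distributed case, the usual approach is to partition the keyspace across several Redis instances, either client-side or with Redis Cluster.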