Distributed cache framework for PHP - Cassandra

I'm currently looking for a distributed cache framework for my PHP implementation,
mainly for local or remote cache storage.
I have some idea about "Memcache" and "Apache Cassandra".
Are there any better frameworks?
Thanks
javaamtho

You should consider Couchbase, as it provides a distributed persistent cache that is quite performant and super easy to use. The problem with Memcached is that it's harder to scale to additional machines, and if a machine goes down you lose all those keys and have to rebuild the cache. Cassandra also has excellent caching support but is quite a bit more complex; if you don't need the complexity Couchbase is probably a better choice.
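To make the "machine goes down" point concrete, here is a minimal sketch of client-side key distribution, which is roughly how memcached clients spread keys over several servers. It is not taken from any particular client library: the `HashRing` class, the server names and the replica count are made up for illustration. With consistent hashing, the keys held by the failed node are gone and have to be rebuilt, while keys on the surviving nodes stay where they were.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer (md5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each server gets several virtual points."""

    def __init__(self, servers, replicas=100):
        self._points = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(replicas):
                self._points.append((_hash(f"{server}#{i}"), server))
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"user:{n}" for n in range(1000)]
before = HashRing(["cache-a", "cache-b", "cache-c"])  # hypothetical servers
after = HashRing(["cache-a", "cache-b"])              # cache-c went down

# Keys whose cached values disappeared together with cache-c.
lost = [k for k in keys if before.server_for(k) == "cache-c"]
# Keys that lived on a surviving node but now map somewhere else.
moved = [k for k in keys
         if before.server_for(k) != "cache-c"
         and before.server_for(k) != after.server_for(k)]

print(f"keys that were on the failed node: {len(lost)}")
print(f"keys on surviving nodes that moved: {len(moved)}")
```

A naive `hash(key) % number_of_servers` scheme would remap most keys whenever the server count changes, which is part of why adding machines to a plain Memcached setup is painful.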

Related

Couchbase fastest NoSQL (no Redis)? Can MongoDB performance be increased by using with some cache product? Is Couchbase so much faster than MongoDB?

I need to set up a server backend web service and am contemplating either a MongoDB solution or some other NoSQL and cache concoction. I've read several articles indicating that Couchbase is much faster than MongoDB, which isn't a slouch itself. Here are two for reference:
http://www.couchbase.com/press-releases/couchbase-dominates-cassandra-datastax-and-mongodb-newly-released-nosql-performance-benchmark
http://prnewswire.com/news-releases/mongodb-30-with-wired-tiger-new-benchmark-measures-performance-vs-couchbase-server-302-300053144.html
So my question is: how true is this? Has anyone else tested this and can confirm such orders-of-magnitude performance differences?
If so, is there a way to improve MongoDB performance by integrating some cache with it? I think Couchbase is actually a 'cache' with a CouchDB store added; how can MongoDB be used or integrated in some manner to provide similar performance?
Why not just use Couchbase if it's better?
Well, I was concerned by reading many places about its "lack of documentation". Then I was alarmed by reading this:
"...Couchbase forum threads which are habitually abandoned by Couchbase reps when a developer points out a pretty huge flaw in their code, intentionally or unintentionally..."
http://scalabilitysolved.com/dont-use-couchbase-unless-you-really-really-want-to/
Just go to the bottom of the article linked above and read the entire comment by Erutan. If one goes to the Couchbase website, it does seem that the company is mainly pushing its "Enterprise" version, which is fine. But it is worrisome when people think the company might be purposely withholding documentation. Perhaps I misunderstood, but from what I gather from that Couchbase user's comments, some think that bugs might be left in the code "intentionally" to steer people to the Enterprise version?
On the PLUS side, it does seem that all the code is Apache licensed so anyone is free to fix any bugs.
Anyway, I was leaning towards MongoDB for various reasons, performance being one of them, until I happened on some Couchbase benchmarks. Looking forward to some affirmations or challenges to these Couchbase performance superiority claims and possible solutions to bolster a MongoDB setup.
So is Couchbase way faster than any other non-memory proven/stable NoSQL?
Couchbase is fast but not the fastest. I tested it, and in my scenarios Tarantool was 20% faster in terms of requests per second. Both of them are an order of magnitude faster than MongoDB. Maybe you should consider using one of the in-memory databases with persistence instead of MongoDB as your primary data store. A single database is easier to keep consistent than a database with a cache layer on top of it.
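The consistency point at the end of that answer can be illustrated with a small sketch. This is only a toy cache-aside layer, with plain dicts standing in for the database and the cache; the `CachedStore` class and its method names are hypothetical, not any product's API. It shows how a write that forgets to touch the cache leaves readers with a stale value.

```python
class CachedStore:
    """Cache-aside reads over a primary store; both are plain dicts here."""

    def __init__(self, primary):
        self.primary = primary   # stands in for the database
        self.cache = {}          # stands in for the cache layer

    def read(self, key):
        if key in self.cache:            # cache hit: primary is not consulted
            return self.cache[key]
        value = self.primary.get(key)    # cache miss: load and remember
        if value is not None:
            self.cache[key] = value
        return value

    def write_without_invalidation(self, key, value):
        # Bug-prone path: the primary changes but the cached copy does not.
        self.primary[key] = value

    def write(self, key, value):
        # Safer path: update the primary, then drop the cached copy.
        self.primary[key] = value
        self.cache.pop(key, None)


store = CachedStore({"user:1": "alice"})
first = store.read("user:1")                       # fills the cache
store.write_without_invalidation("user:1", "bob")
stale = store.read("user:1")                       # still served from the cache
store.write("user:1", "carol")
fresh = store.read("user:1")                       # reloaded from the primary
print(first, stale, fresh)
```

Every write path in an application has to remember the invalidation step, which is the extra burden a separate cache layer adds on top of a single store.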

Has anyone on stackoverflow successfully used CouchDB for a webapp and deployed to a production environment? [duplicate]

I have been using CouchDB on some prototype applications and it has been brilliant, very easy to use and extremely quick. I was wondering if anyone has been using it in production and has any views on its reliability, performance, suitability for operational management, etc.? I am considering using it to support a service layer and would make use of its replication functionality.
Any comments/experiences would be most welcome.
I've used CouchDB for a few small in-house applications - it's been very stable and I've had no serious complaints. Setting that aside, a few small gripes -
1) Databases can be synchronized, but not nodes. That is, if you have four servers and twenty databases, you have to specify each server, and each database to synchronize. A minor gripe, but I prefer less management to more.
2) Since databases are append-only, a database with a lot of activity gets really big really quickly. Compacting fixes this, but isn't exactly fast, especially on a big (e.g. 20 gigabyte) database. Scheduling compaction for the weekends solved this, but doing that is probably less of an option for high-availability applications.
3) Javascript is the de facto view language. What is not well advertised is that since CouchDB is written in Erlang, it also supports Erlang views, which are faster as they are "native". For applications doing a lot of operations in views, Erlang probably makes more sense.
Setting those minor issues aside, I'd wholeheartedly recommend it.
CouchDB ships in Ubuntu and is a fundamental component of the Ubuntu One service.
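To put a number on gripe 1, here is a rough sketch that enumerates the replication jobs for the four-server, twenty-database example. It assumes a full mesh where every server pushes every database to every other server (a ring or hub topology would need fewer jobs), and the host and database names are invented. The `source`/`target`/`continuous` fields follow the shape of the request body CouchDB's `_replicate` endpoint accepts; nothing is actually sent here.

```python
import itertools
import json

# Hypothetical hosts and database names, matching the "4 servers, 20 databases" example.
servers = [f"http://couch{n}.example.com:5984" for n in range(1, 5)]
databases = [f"db{n:02d}" for n in range(1, 21)]


def replication_jobs(servers, databases):
    """Yield one request body per (source server, target server, database)."""
    for source, target in itertools.permutations(servers, 2):
        for db in databases:
            yield {
                "source": f"{source}/{db}",
                "target": f"{target}/{db}",
                "continuous": True,
            }


jobs = list(replication_jobs(servers, databases))
print(len(jobs))                        # 4 * 3 ordered server pairs * 20 databases
print(json.dumps(jobs[0], indent=2))    # what a single job would look like
```

Generating the job list from two short lists like this is one way to keep the per-server, per-database bookkeeping manageable.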

CouchDB in-memory implementation

Is there a mock backend for CouchDB, i.e. same REST interface and semantics but purely in-memory? We have a test suite that runs each test on a pristine database every time (to be reproducible), but running against a real database could be faster.
Do you mean running against a mock database?
I do not think there is something right out of the box. Two ideas:
CouchDB on a memory filesystem. Set up a ramdisk, or tmpfs mount, and configure the CouchDB database_dir and view_index_dir to point to there.
PouchDB is porting CouchDB to the browser IndexedDB standard. You did not say which language and environment you are using, but if you can run Node.js, this might be worth looking into. PouchDB has good momentum and I think it will be running in Node.js soon (perhaps through jsdom or some other library). Note, this does not get you the full solution, but it reduces your question to "are there in-memory IndexedDB implementations for Node.js", for which the answer is either "yes" or "soon," given its adoption trajectory.
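For the first idea (CouchDB on a memory filesystem), a small sketch of generating the config override might look like the following. `database_dir` and `view_index_dir` live in CouchDB's `[couchdb]` config section; the `/dev/shm/couchdb-test` path is just an assumption (a tmpfs mount on most Linux systems), and where the resulting ini text has to be saved depends on your CouchDB version and install layout.

```python
import configparser
import io


def ram_backed_config(ram_dir: str) -> str:
    """Return ini text that points CouchDB's data and view index dirs at ram_dir."""
    config = configparser.ConfigParser()
    config["couchdb"] = {
        "database_dir": ram_dir,
        "view_index_dir": ram_dir,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Assumed path: /dev/shm is usually tmpfs on Linux; the subdirectory is made up.
ini_text = ram_backed_config("/dev/shm/couchdb-test")
print(ini_text)
```

Since everything under that directory disappears on unmount or reboot, this only makes sense for throwaway test databases.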
Found this: https://github.com/RipcordSoftware/AvanceDB - it supports different platforms and seems to be a serious effort.
Rather late to the party, but I've had great success using pouchdb-server, based on the aforementioned PouchDB project (a JavaScript implementation of CouchDB). It can run against a variety of back-ends, including an in-memory back-end. That means you can run
pouchdb-server --in-memory
to get an in-memory CouchDB-compatible server. There are several other command-line options to explore, too.
I think it is able to run the entire CouchDB test suite, so I'd guess it is fairly unlikely you'd run into too many implementation differences.
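For the "pristine database per test" requirement from the question, a test helper against such a server could be sketched roughly as below. It assumes the server listens on `http://127.0.0.1:5984` (adjust to whatever your instance actually uses) and relies only on the standard CouchDB HTTP calls `PUT /{db}` and `DELETE /{db}`; the helper names are made up.

```python
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:5984"  # assumption: where the in-memory server listens


def fresh_db_name(prefix: str = "test") -> str:
    """Return a unique database name; the prefix keeps the first character a letter."""
    return f"{prefix}_{uuid.uuid4().hex}"


def create_db_request(name: str) -> urllib.request.Request:
    """PUT /{db} creates a database in the CouchDB HTTP API."""
    return urllib.request.Request(f"{BASE_URL}/{name}", method="PUT")


def delete_db_request(name: str) -> urllib.request.Request:
    """DELETE /{db} removes the database again."""
    return urllib.request.Request(f"{BASE_URL}/{name}", method="DELETE")


def run_with_fresh_db(test_body) -> None:
    """Create a database, run test_body(db_url), then always clean up.

    Needs a running server; urlopen raises on connection or HTTP errors.
    """
    name = fresh_db_name()
    urllib.request.urlopen(create_db_request(name)).close()
    try:
        test_body(f"{BASE_URL}/{name}")
    finally:
        urllib.request.urlopen(delete_db_request(name)).close()
```

Each test would then call `run_with_fresh_db(...)` with a function that takes the database URL, so no state leaks from one test to the next.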
I had the same problem: for tests I didn't want to set up a CouchDB instance; I just wanted something in memory, as simple as possible.
What I did:
* I created an in-memory CouchDB connector, which is just a very simple implementation of "org.ektorp.CouchDbConnector".
* With Spring I wire in whichever connector implementation I need: for my dev tests I wire in my in-memory connector, and to connect to a real CouchDB I use the usual connector, org.ektorp.impl.StdCouchDbConnector.
The only problem is that "org.ektorp.CouchDbConnector" has more than 50 methods that must be implemented. For my purposes it was enough to implement just a few of them; it depends on your test cases.
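The same pattern, sketched in Python rather than Java/Ektorp: define the few operations the code under test really needs, and hand it an in-memory implementation in tests. All names here (`DocumentStore`, `InMemoryDocumentStore`, `UserService`) are hypothetical and do not mirror the Ektorp interface.

```python
from typing import Dict, Optional, Protocol


class DocumentStore(Protocol):
    """The handful of operations the code under test actually uses."""

    def create(self, doc_id: str, doc: dict) -> None: ...
    def get(self, doc_id: str) -> Optional[dict]: ...
    def delete(self, doc_id: str) -> None: ...


class InMemoryDocumentStore:
    """Test double: keeps documents in a dict instead of talking to CouchDB."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def create(self, doc_id: str, doc: dict) -> None:
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = dict(doc)   # copy so callers cannot mutate stored state

    def get(self, doc_id: str) -> Optional[dict]:
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def delete(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)


class UserService:
    """Code under test; it receives whichever store the wiring provides."""

    def __init__(self, store: DocumentStore) -> None:
        self._store = store

    def register(self, user_id: str, name: str) -> None:
        self._store.create(user_id, {"name": name})

    def name_of(self, user_id: str) -> Optional[str]:
        doc = self._store.get(user_id)
        return doc["name"] if doc else None


# In tests the in-memory store is wired in; production would pass a real client.
service = UserService(InMemoryDocumentStore())
service.register("u1", "Ada")
print(service.name_of("u1"))
```

Keeping the interface down to what the application needs sidesteps the "50 methods" problem, at the cost of an adapter around the real client.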
memorydb is a partial (in-progress) in-memory implementation of CouchDB to be used with Kivik, which can be run as a stand-alone server.
Not all functionality is implemented yet.

Log search using Hadoop

We have huge log files (~100s of GB) on multiple web servers that need to be searched in real time. These log files are written to multiple times per second by different apps. We have recently installed a Hadoop cluster on some servers for this purpose.
To implement search on these logs, I have thought of this design: a process running on each web server creates an inverted index of the logs and caches it in memory (on the web server itself). When the cache is full, it pushes the index to HDFS via Flume to be stored in Hive (this is much like an LRU cache).
This helps in two ways when something is searched for: the most recent logs are returned from the in-memory cache, which is fast, and older logs are returned from disk. Since users want to see the latest logs first, this technique works.
Can somebody verify whether this design will work and scale properly? Are there any better alternatives?
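A toy version of the design described in the question, to make the moving parts explicit. This is only a sketch under simplifying assumptions: the `LogIndex` class is hypothetical, a Python list stands in for HDFS/Hive, the whole in-memory part is flushed at once when it is full, and the archive is scanned linearly instead of being indexed.

```python
from collections import defaultdict


class LogIndex:
    """Inverted index with a bounded in-memory part and an 'archive' for older lines."""

    def __init__(self, max_recent_lines: int) -> None:
        self.max_recent_lines = max_recent_lines
        self.recent_lines = []                 # newest log lines, in arrival order
        self.recent_index = defaultdict(list)  # term -> positions in recent_lines
        self.archive = []                      # stands in for HDFS/Hive storage

    def add(self, line: str) -> None:
        if len(self.recent_lines) >= self.max_recent_lines:
            self.flush()
        position = len(self.recent_lines)
        self.recent_lines.append(line)
        for term in set(line.lower().split()):
            self.recent_index[term].append(position)

    def flush(self) -> None:
        """Move everything held in memory to the archive and start over."""
        self.archive.extend(self.recent_lines)
        self.recent_lines = []
        self.recent_index = defaultdict(list)

    def search(self, term: str) -> list:
        """Return matches newest first: in-memory hits, then archived ones."""
        term = term.lower()
        positions = self.recent_index.get(term, [])
        recent = [self.recent_lines[p] for p in reversed(positions)]
        # Linear scan of the archive; a real system would index it as well.
        older = [line for line in reversed(self.archive)
                 if term in line.lower().split()]
        return recent + older


index = LogIndex(max_recent_lines=2)
for line in ["GET /a 200", "GET /b 500", "POST /a 200", "GET /c 200"]:
    index.add(line)
results = index.search("get")
print(results)
```

Even at this scale the sketch shows the main trade-off: anything not yet flushed exists only in the web server's memory, so a crash loses the most recent index entries unless they can be rebuilt from the raw log files.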
Thanks
You could store the inverted index in HBase to provide more real-time access to your older logs.
HBase would also likely be a viable alternative to your in-memory cache. You could do this if you wanted to unify the storage platform instead of having it split up. It will obviously be slower than memcached or redis.
A completely different approach could be using Lucene/Solr to index your logs. This has a lot of nice features out of the box for searching.

