I would like to run "some calculation" on each In-Memory Database (IMDB) GridGain (GG) node, which keeps receiving new data.
While looking at the GG examples, it seems a node must be started either as a data node or as a compute node.
Alternative GG architectural ideas would be appreciated.
Thanks
The GridGain Data Grid edition (which I think you are referring to) includes Compute functionality. If you start a GridGain node with any configuration, Compute functionality is included by default.
Alternatively, if you would like, for example, data grid and streaming functionality together, you can download the Platform edition, which includes everything.
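In other words, you don't have to choose between a "data node" and a "compute node": the same node can hold cache data and execute closures against it. Below is a minimal sketch using the Apache Ignite API that current GridGain releases are built on; the cache name, key, and calculation are placeholders rather than anything from the original question.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class ComputeOnDataNode {
    public static void main(String[] args) {
        // Start a node with the default configuration: it can both hold
        // cache data and run compute closures.
        Ignite ignite = Ignition.start();

        IgniteCache<Integer, Double> cache = ignite.getOrCreateCache("readings"); // placeholder cache
        cache.put(1, 42.0);

        // Run "some calculation" on the node that owns key 1, i.e. next to the data.
        ignite.compute().affinityRun("readings", 1, () -> {
            Double value = Ignition.localIgnite().<Integer, Double>cache("readings").localPeek(1);
            System.out.println("Computed locally: " + (value == null ? 0 : value * 2));
        });
    }
}
```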
I am using Hazelcast 3.6.1, set up as a server/client. A Map lives on the server (a single node) and holds about 4 GB of data. My program creates a client and then needs to look up a small amount of data (around 30 MB). I was getting the data from the map and looping through all of it to search for the entries of interest, and before I knew it the process size was 4 GB: every get on the map loaded another entry into the client (lazily) until all the data had been pulled over. So I discovered aggregation, which I was under the impression was done entirely server side, with only the part I was interested in returned to the client, but the client process still grows to 350 MB!
Is aggregation solely done on the server?
Thanks
First of all, you should upgrade to a Hazelcast 3.8.x version, since the new aggregation system is much faster. Apart from that, it depends on what you are trying to aggregate, but if you are doing real aggregations like sum, min, or similar, aggregations are the way to go. The documentation for the 3.8.x fast-aggregations is available here: http://docs.hazelcast.org/docs/3.8.3/manual/html-single/index.html#fast-aggregations
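For illustration, here is a minimal sketch of a 3.8-style fast aggregation issued from a client; the map name and value type are made up. The point is that the per-entry work happens on the members that own the data, and only the final result travels back to the client.

```java
import com.hazelcast.aggregation.Aggregators;
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class AggregateOnMembers {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();

        // Hypothetical map of payment amounts keyed by id.
        IMap<String, Long> amounts = client.getMap("amounts");

        // The sum is computed member-side; only the aggregated result
        // (a single number) is sent back to this client.
        long total = amounts.aggregate(Aggregators.longSum());
        System.out.println("Total = " + total);

        client.shutdown();
    }
}
```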
After some testing it appears that the collator portion of the aggregator is being done on the client.
Because our previous data model is not well designed, the Slow Queries panel shows some queries that are performing slowly.
As I am planning to redesign the data model, I want to clear out the old information displayed in this panel, so I can see only information about my new data model. However, I do not know where OpsCenter reads this data from.
My idea is that if this information is stored in a table or a file, I can truncate or delete it. Or am I completely wrong with that assumption, and would this instead be done through a configuration file change or something similar?
OpsCenter Version: 6.0.3
Cassandra Version: 2.1.15.1423
DataStax Enterprise Version: 4.8.10
It follows dse_perf.node_slow_log. Each node tracks new events in that log as they occur and stores its top X. When you view the panel in the UI, OpsCenter gets the top X from each node and merges them. To "reset", you can truncate the log table and restart the DataStax agents to clear their current top X. A feature to do this reset for you is planned, but in 6.0.3 it is a little difficult.
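If it helps, here is a minimal sketch of the truncate step using the DataStax Java driver (you could just as well run the same TRUNCATE from cqlsh); the contact point is a placeholder, and restarting the agents remains a separate, per-node operation.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ResetSlowLog {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Clears the accumulated slow-query events that OpsCenter reads.
            session.execute("TRUNCATE dse_perf.node_slow_log");
        }
        // Then restart the DataStax agent on every node (for example
        // "sudo service datastax-agent restart") so each agent's in-memory
        // "top X" list is cleared as well.
    }
}
```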
Is there any option in ignitevisorcmd that lets me see which entries (key and value details) are present on a particular node? I tried the cache -scan -c=mycache -id8=12345678 command, but it prints entries for mycache from all the other nodes as well, instead of printing data only for node 12345678.
The current version of Visor Cmd does not support this, but I think it would be easy to implement. I created an issue in the Ignite JIRA, which you can track or even contribute to.
Can we use Cassandra as a distributed in-memory cache database by utilizing its file-level caching, key cache, and row cache?
I don't want to overload each node, and I want to add more nodes to the cluster as the data grows so this stays effective (so that most of my data remains cached). This seems feasible especially since 40% of my column families are static, and updates/inserts to the other tables are infrequent.
Our primary aim is an elastic, real-time data store that is roughly as fast as an in-memory DB.
Cassandra was not born for this goal, but after many optimizations it has also become a tool for in-memory caching. There are a few experiments -- the most significant one I know of was reported by Netflix. Netflix replaced their EVCache system (which was persisted by a Cassandra backend) with a new SSD-based Cassandra cache architecture -- the results are very impressive in terms of performance improvements and cost reduction.
Before choosing Cassandra as a replacement for any cache system, I'd recommend developing a deep understanding of how row caching and key caching are used. Also, I've never used DataStax Enterprise, but it has an interesting in-memory table feature.
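As an illustration of the kind of tuning involved (not something from the original answer), here is a sketch that enables the row cache for one table via the Java driver; the keyspace and table are hypothetical, and the exact caching syntax varies between Cassandra versions, so check the docs for yours.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EnableRowCache {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Keep all keys cached and up to 200 rows per partition in the row cache.
            // NOTE: this is the Cassandra 2.1-style syntax; older and newer versions
            // express the caching option differently.
            session.execute("ALTER TABLE my_keyspace.products "
                + "WITH caching = '{\"keys\":\"ALL\", \"rows_per_partition\":\"200\"}'");
        }
        // The row cache also needs row_cache_size_in_mb > 0 in cassandra.yaml,
        // otherwise the table-level setting has no effect.
    }
}
```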
HTH,
Carlo
I guess you could, but I don't think that's the right use case for Cassandra. Without knowing more about your requirements, I'd recommend you have a look at products such as Hazelcast, which is an in-memory distributed cache and sounds more like a fit for your use case.
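For a sense of what that looks like, here is a minimal Hazelcast sketch; the map name and entries are placeholders. Each member you start joins the cluster, and the map's entries are partitioned across the members, so you can scale the cache by adding nodes.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class DistributedCacheExample {
    public static void main(String[] args) {
        // Starting an instance creates (or joins) a cluster member;
        // the "products" map is partitioned across all members.
        HazelcastInstance member = Hazelcast.newHazelcastInstance();

        IMap<String, String> cache = member.getMap("products"); // placeholder name
        cache.put("sku-1", "Widget");
        System.out.println(cache.get("sku-1"));

        member.shutdown();
    }
}
```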
I know it's a little late, but I've just come across this post while doing some research on Cassandra.
I've seen success with TIBCO's AST (recently rebranded as DTM) for in-memory caching.
I've also played around with Pivotal's GemFire (which uses Geode under the covers), and it has shown some promise.
I'm looking for a tool to load CSV into Cassandra. I was hoping to use RazorSQL for this, but I've been told that it will be several months out.
What is a good tool?
Thanks
1) If you already have all the data to be loaded in place, you can try the sstableloader utility (available from Cassandra 0.8.x onwards) to bulk load the data. For more details, see: cassandra bulk loader
2) Cassandra has introduced BulkOutputFormat for bulk loading data into Cassandra with a Hadoop job in the latest versions, i.e. Cassandra 1.1.x onwards.
For more details, see: Bulkloading to Cassandra with Hadoop
I'm dubious that tool support would help a great deal with this, since a Cassandra schema needs to reflect the queries that you want to run, rather than just being a generic model of your domain.
The built-in bulk loading mechanism for Cassandra is via BinaryMemtables: http://wiki.apache.org/cassandra/BinaryMemtable
However, whether you use this or the more usual Thrift interface, you will still probably need to manually design a mapping from your CSV into Cassandra ColumnFamilies, taking into account the queries you need to run. A generic CSV-to-Cassandra mapping may not be appropriate, since secondary indexes and denormalisation are commonly needed.
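To make the "manual mapping" point concrete, here is a small sketch that reads a CSV and writes each row into a hand-designed table using the DataStax Java driver (CQL native protocol, so Cassandra 1.2+ rather than the Thrift interface discussed above); the keyspace, table, and column layout are invented for the example, and the CSV parsing is deliberately naive.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

import java.io.BufferedReader;
import java.io.FileReader;

public class CsvToCassandra {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {   // hypothetical keyspace

            // Table designed up front for the query we need, e.g. lookup by user_id.
            PreparedStatement insert = session.prepare(
                "INSERT INTO users_by_id (user_id, name, email) VALUES (?, ?, ?)");

            try (BufferedReader in = new BufferedReader(new FileReader("users.csv"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] cols = line.split(",");  // naive split: no quoting/escaping
                    session.execute(insert.bind(cols[0], cols[1], cols[2]));
                }
            }
        }
    }
}
```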
For Cassandra 1.1.3 and higher, the CQL COPY command is available for importing (or exporting) data to (or from) a table. According to the documentation, if you are importing roughly less than 2 million rows, this is a good option. It is much easier to use than sstableloader and less error prone: sstableloader requires you to create strictly formatted .db files, whereas the CQL COPY command accepts a delimited text file. Documentation here:
http://www.datastax.com/docs/1.1/references/cql/COPY
For larger data sets, you should use sstableloader: http://www.datastax.com/docs/1.1/references/bulkloader. A working example is described here: http://www.datastax.com/dev/blog/bulk-loading.