What's the relationship between GridGain and Apache Ignite? Are they the same?

I am puzzled by the relationship between GridGain and Ignite. I couldn't find the difference. Do they both belong to Apache? Are they the same?

GridGain adds enterprise features to Ignite. All Ignite APIs remain unchanged and are available in GridGain.
Here is the comparison:
http://www.gridgain.com/products/software

Related

How to choose between Apache Ranger and Sentry

From the wikis provided by these two projects, it seems they do a similar job. But there must be some difference, or there would be no need for two.
So what are the differences, and what is the practical advice for choosing between them?
Thanks a lot!

Great answers above.
Just a quick update following the Cloudera+Hortonworks merger last year.
The two companies have decided to standardize on Ranger.
CDH5 and CDH6 will still use Sentry until the CDH product line retires in ~2-3 years.
Ranger will be used for Cloudera+Hortonworks' combined "Unity" platform / CDP product.
Cloudera told us that Ranger is the more "mature" product.
Since Unity hasn't been released yet (as of May 2019), something may change in the future, but that's the current direction. (Oct 2019 update: Unity is now known as CDP and is available for beta testing; it will be available for cloud deployments soon, and in 2020 for on-prem customers.)
If you're a former Cloudera customer or a CDH user, you will still have to use Apache Sentry. There is significant overlap between Sentry and Ranger, but if you are starting fresh, definitely look at Ranger.

You can use Sentry or Ranger depending on which Hadoop distribution you are using, such as Cloudera or Hortonworks.
Apache Sentry - Backed by Cloudera. Supports HDFS, Hive, Solr and Impala. (Ranger will not support Impala.)
Apache Ranger - Backed by Hortonworks. Apache Ranger offers a centralized security framework to manage fine-grained access control across HDFS, Hive, HBase, Storm, Knox, Solr, Kafka, and YARN.
https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Tutorial
http://hortonworks.com/apache/ranger/
Thx Kumar

Apache Ranger overlaps with Apache Sentry since it also deals with authorization and permissions. It adds an authorization layer to Hive, HBase, and Knox. Both Sentry and Ranger support column-level permissions in Hive (starting from the 1.5 release).
Ref: https://www.xplenty.com/blog/2014/11/5-hadoop-security-projects/
You can also check RecordService.
RecordService provides an abstraction layer between compute frameworks and data storage. It provides row- and column-level security, and other advantages.
Ref: http://blog.cloudera.com/blog/2015/09/recordservice-for-fine-grained-security-enforcement-across-the-hadoop-ecosystem/
http://recordservice.io/

Both manage permissions based on role-table grants. Ranger provides dynamic data masking (in transit). Both integrate with Informatica's Secure at Source (which identifies risky data stores in the enterprise) to deliver a data governance solution.

Upgrading from GridGain to Apache Ignite

We're currently running GridGain 6.2.1. Is there an existing upgrade guide for transitioning to Apache Ignite?

There is no such guide, and it depends heavily on which parts of GridGain you're using. All functionality that existed in 6.x was migrated to Ignite with a slightly different API. So I suggest updating the version and fixing the compilation errors step by step.

For Cassandra kundera.client.lookup.class options

To configure Kundera for Cassandra, I noticed there are three possible options for kundera.client.lookup.class:
com.impetus.client.cassandra.pelops.PelopsClientFactory
com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory
com.impetus.client.cassandra.thrift.ThriftClientFactory
I am not sure of the pros and cons of these three, and hence not sure which one to use. Please help me decide.

I suggest you use com.impetus.client.cassandra.thrift.ThriftClientFactory. It is the implementation that uses just Cassandra's Thrift API.
PelopsClient is not in active development.
DSClient is built on top of the DataStax driver for Cassandra.
There is no real advantage to choosing DSClient over ThriftClient, or vice versa.

After further research, I found the following:
Don't use PelopsClient, as it's not in active development, as mentioned by #karthik, but more importantly because of the issue reported here
The DataStax driver is better than the Thrift client, as it overcomes a few limitations of Thrift and uses a different binary protocol specific to Cassandra, which gives better performance. Refer to "Datastax java driver support for Cassandra using Kundera".
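
The choice discussed above comes down to a single property value. Here is a minimal sketch of how it might be supplied programmatically, assuming Kundera honors the property when it is passed as a JPA property override instead of being set in persistence.xml. The class KunderaClientChoice and its propertiesFor helper are hypothetical names used only for illustration; the property key and the three factory class names are taken from the question.

```java
import java.util.HashMap;
import java.util.Map;

public class KunderaClientChoice {
    // Property key as given in the question.
    static final String LOOKUP_KEY = "kundera.client.lookup.class";

    // The three factory class names listed in the question.
    static final String PELOPS = "com.impetus.client.cassandra.pelops.PelopsClientFactory";
    static final String DS_DRIVER = "com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory";
    static final String THRIFT = "com.impetus.client.cassandra.thrift.ThriftClientFactory";

    // Hypothetical helper: builds a property map that selects one client factory.
    static Map<String, String> propertiesFor(String factoryClass) {
        Map<String, String> props = new HashMap<>();
        props.put(LOOKUP_KEY, factoryClass);
        return props;
    }

    public static void main(String[] args) {
        // Pick the DataStax-driver-based client, as the second answer prefers.
        Map<String, String> props = propertiesFor(DS_DRIVER);
        // With Kundera and the JPA API on the classpath, this map would be the
        // second argument to javax.persistence.Persistence
        //     .createEntityManagerFactory("<your-persistence-unit>", props);
        // that call is omitted here so the sketch runs without dependencies.
        System.out.println(props);
    }
}
```

Switching clients then means changing only the constant passed to propertiesFor, which makes it easy to try ThriftClientFactory and DSClientFactory against the same entities.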

Apache Cassandra vs Datastax Cassandra [closed]

Is DataStax Cassandra the only available Cassandra that can be used in a production environment? Are there any free alternatives available? What about the Cassandra available on the Apache site?

DataStax Community Edition is also free; it contains a basic version of OpsCenter -- http://planetcassandra.org/cassandra/
Here is the difference between the Community Edition and DSE:
http://www.datastax.com/download/dse-vs-dsc

They can both be used in production. DataStax Enterprise comes with a bunch of extra features on top of Apache Cassandra, and also comes with support.

DataStax is a commercial company that supports C*. The base source code of Cassandra is taken from the Apache repositories, then some of their own code is merged in. Besides this, as already mentioned by others, the DataStax version comes with some additional tools for maintaining a Cassandra cluster.
One of the benefits of DataStax Enterprise is its seamless Solr integration, Solr being another great Apache Foundation project.
Cassandra comes with a query language called CQL (Cassandra Query Language), which is "similar" to SQL; you should, however, think of CQL as a cousin of SQL, not a brother.
One of the great features of the Enterprise edition is that you can query a Solr index through its CQL integration. The Cassandra cluster also shares its resources with Solr, so you don't need a second cluster for Solr.
You could set up either Apache or DataStax Cassandra and get almost the same thing, but if you need something similar to the SQL LIKE statement (not natively available in Cassandra), or you have a heavily denormalized database and need search capabilities, then DataStax Enterprise (DSE) is your only viable choice.
As someone has already mentioned, DSE is free for startups until they reach an annual revenue of 3m USD or are funded with 30m. This should give everybody the opportunity to leverage the power of NoSQL and use one of the most reliable databases for big data out there.

For the Cassandra product, you can use the Apache open source offering in production, if your organisation is comfortable with open source.
You can also use the DataStax Community version of Cassandra, which is also open source and free to deploy; that gives you a bit more assurance from DataStax, who offer commercial support.
Then there is DataStax Enterprise, which is the version that you pay to use, with a support model included. This still uses open source Cassandra, with additional code from DataStax. They have also put this release through their internal test processes, so that they are happy to support it. That generally means the releases will lag the Apache and Community versions, if that matters to you.
The DataStax 'Dev Center' product is a GUI tool that allows you to enter CQL commands against a Cassandra installation; it is free to use against any release. You may find it useful, though the CQLSH command line should offer much of what you may need (as does the Cassandra CLI).
The DataStax 'Ops Center' product is available in a free version, which can run against any Cassandra with the associated 'DataStax Agent' used to collect data from each node. The Enterprise version of Ops Center includes additional functionality; that is available if you purchase the fully supported DSE (DataStax Enterprise) stack.
Hope that helps. Much more information available at Planet Cassandra and the DataStax web sites.

Besides Apache Cassandra, there's Scylla, which is a drop-in replacement for Cassandra written in C++. It claims to be 10 times faster than Apache Cassandra. However, Scylla is still in alpha, and you should stay away from it in a production environment.
Scylla aims to support all Cassandra features together with the tooling. It also supports JMX monitoring.

Apache Cassandra has all the features of the DataStax Community Edition, so you can run Apache Cassandra in a production environment.

Another good feature of DSE is the ability to do backup and recovery of your Cassandra database, which I would think is very important if you are planning to use it in a production setup.

GridGain open source datacenter topology specification

GRIDGAIN DATA-CENTER REPLICATION
A few specific questions regarding the recently open-sourced GridGain code. The gridgain.org support link says datacenter replication is not enabled for the open-source version. Is this true or false?
More importantly, assuming the open-source version has the datacenter feature enabled, how do we go about specifying the topology and activating the replication?
For example, the official documentation suggests creating and setting a GridDrSenderCacheConfiguration and a GridDrSenderHubConfiguration with details of the topology. I did this, but it didn't seem to enable any cross-datacenter replication.
More specifically, I did the following:
Assign a dataCenterId byte parameter in the config.xml for GridGain.
...
define those nodes that are part of that datacenter under the
... add ip addresses of nodes
Define the above for each node in each datacenter appropriately. In the GridGain Java client code, initiate a GridGain instance and set the GridDrSenderCacheConfiguration and GridDrSenderHubConnection (along with the GridDrSenderHubConnectionConfiguration) as specified in the docs for each node in each datacenter, also using a dummy GridDrReceiverHubConfiguration object (all defaults).
However, this does not seem to do any replication across the datacenters.
Would someone from the GridGain team please give some examples of setting up datacenter replication: how to set up the config.xml, and how to enable it in the Java code when instantiating a GridGain instance?
Also, I am trying to avoid intra-datacenter replication by setting the gridDrSenderHubConnectionConfiguration.setIgnoredDataCenterIds(localDC); parameter to avoid replicating if the datacenter is

Just confirmed. Since datacenter replication is not present in the open-source version, no replication would happen in this case. Please download the eval version of GridGain Enterprise Edition and try it out.
