How to set client-side timestamp in Spring Data Cassandra?

In the DataStax driver we have an API like
withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
to enable setting the timestamp per query on the client side. How can we achieve the same with Spring Data Cassandra?
I am aware that I can use "USING TIMESTAMP value" in CQL, but is there something that Spring Data Cassandra provides? I don't find such an API in CassandraClusterFactoryBean.

You are correct!
Unfortunately, it appears SD Cassandra is missing a configuration option on the CassandraCqlClusterFactoryBean class for the DataStax Java driver's Cluster.Builder.withTimestampGenerator(:TimestampGenerator) setting.
Equally unfortunately, there is no workaround (other than USING TIMESTAMP in CQL) at the moment either.
It also appears the CassandraCqlClusterFactoryBean is missing configuration options for:
Cluster.Builder.withAddressTranslator(:AddressTranslator)
Cluster.Builder.withClusterName(:String)
Cluster.Builder.withCodecRegistry(:CodecRegistry)
Cluster.Builder.withMaxSchemaAgreementWaitSeconds(:int)
Cluster.Builder.withSpeculativeExecutionPolicy(:SpeculativeExecutionPolicy)
Beware, though: withTimestampGenerator(..) is only supported in version 3 of the DataStax Java driver, which the next version (i.e. 1.5.0) of SD Cassandra will support:
This feature is only available with version V3 or above of the native protocol. With earlier versions, timestamps are always generated server-side, and setting a generator through this method will have no effect.

The timestamp capability is available in SD Cassandra 1.5.x:
public void setTimestampGenerator(TimestampGenerator timestampGenerator) {
    this.timestampGenerator = timestampGenerator;
}
https://github.com/spring-projects/spring-data-cassandra/blob/cc4625f492c256e5fa3cb6640d19b4e048b9542b/spring-data-cassandra/src/main/java/org/springframework/data/cql/config/CassandraCqlClusterFactoryBean.java
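For illustration, here is a minimal sketch of wiring that setter into a Java configuration class once you are on SD Cassandra 1.5.x. The contact point is a placeholder, and the exact package of CassandraCqlClusterFactoryBean differs between SD Cassandra versions, so treat the imports as assumptions:
import com.datastax.driver.core.AtomicMonotonicTimestampGenerator;
import org.springframework.cassandra.config.CassandraCqlClusterFactoryBean; // package name may vary by SD version
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CassandraConfig {

    @Bean
    public CassandraCqlClusterFactoryBean cluster() {
        CassandraCqlClusterFactoryBean cluster = new CassandraCqlClusterFactoryBean();
        cluster.setContactPoints("127.0.0.1"); // placeholder contact point
        cluster.setPort(9042);
        // Generate write timestamps on the client, one per statement
        cluster.setTimestampGenerator(new AtomicMonotonicTimestampGenerator());
        return cluster;
    }
}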

Related

What load balancing policies are available in Cassandra Java driver 4.x?

We are upgrading the DataStax Cassandra Java driver from 3.2 to 4.x to support DSE 6.8.
The load balancing policies our application currently supports are RoundRobinPolicy and DCAwareRoundRobinPolicy.
These policies aren't available in java-driver-core 4.12.
How can we support the above policies? Please help.
Current code in our application using cassandra-driver-core-3.1.0.jar:
// DataStax Java driver 3.x policy classes
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public static LoadBalancingPolicy getLoadBalancingPolicy(String loadBalanceStr, boolean isTokenAware) {
    LoadBalancingPolicy loadBalance = null;
    if (isTokenAware) {
        loadBalance = new TokenAwarePolicy(loadBalanceDataConvert(loadBalanceStr));
    } else {
        loadBalance = loadBalanceDataConvert(loadBalanceStr);
    }
    return loadBalance;
}

private static LoadBalancingPolicy loadBalanceDataConvert(String loadBalanceStr) {
    if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_DC.equals(loadBalanceStr)) {
        return new DCAwareRoundRobinPolicy.Builder().build();
    } else if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_ROUND.equals(loadBalanceStr)) {
        return new RoundRobinPolicy();
    }
    return null;
}
The load balancing has been heavily simplified in version 4.x of the Cassandra Java driver. You no longer need to nest multiple policies within each other to achieve high availability.
In our opinion, the best policy is the DefaultLoadBalancingPolicy, which is enabled by default and combines the best attributes of the policies in older versions.
The DefaultLoadBalancingPolicy generates a query plan that is token-aware by default, so replicas which own the data appear first and are prioritised over other nodes in the local DC. For token-awareness to work, you must provide routing information either by keyspace (with getRoutingKeyspace()), or by routing key (with getRoutingKey()).
If routing information is not provided, the DefaultLoadBalancingPolicy generates a query plan that is a simple round-robin shuffle of available nodes in the local DC.
We understand that developers who are used to configuring DCAwareRoundRobinPolicy in older versions would like to continue using it but we do not recommend it. It is our opinion that failover should take place at the infrastructure layer, not the application layer.
Our opinion is that the DefaultLoadBalancingPolicy is the right choice in all cases. If you prefer to configure DC-failover, make sure you fully understand the implications and know that we think it is the wrong choice.
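For example, here is a minimal driver-4.x sketch; the contact point, datacenter name, keyspace, and table are assumptions for illustration:
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class DefaultLbpExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                .withLocalDatacenter("dc1") // required when contact points are given explicitly
                .build()) {

            // DefaultLoadBalancingPolicy is in effect without any extra configuration.
            // With a prepared statement the driver knows the partition key, so the
            // query plan is token-aware: replicas owning the partition come first.
            PreparedStatement ps = session.prepare("SELECT * FROM my_keyspace.users WHERE user_id = ?");
            session.execute(ps.bind(42));
        }
    }
}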
For details, see the following documents:
Java driver v4 Upgrade Guide
Load Balancing in Java driver v4

How to set DcInferringLoadBalancingPolicy in CqlSessionBuilder programmatically

I am using the 4.4.0 datastax-java-driver. In my scenario, I have to provide contact points as I am connecting to a remote cluster. If I do so, I get the following error: Since you provided explicit contact points, the local DC must be explicitly set.
I also don't have the option to provide this explicitly, as I am connecting to different clusters on demand which can be in different data centres. I have found the option to set DcInferringLoadBalancingPolicy to infer the data centre, but I am not sure how to set this in CqlSessionBuilder. Please help me with this.
You need to be very careful with it - it's mostly for people who are building tools, such as IDEs, etc. For applications themselves it's better to pass the datacenter name explicitly - either via a config file, or via a Java system property.
In short, it can be done as follows:
ProgrammaticDriverConfigLoaderBuilder configBuilder =
    DriverConfigLoader.programmaticBuilder();
configBuilder.withClass(DefaultDriverOption.LOAD_BALANCING_POLICY_CLASS,
    DcInferringLoadBalancingPolicy.class);
DriverConfigLoader loader = configBuilder.endProfile().build();

CqlSessionBuilder clusterBuilder = CqlSession.builder()
    .addContactPoints(hosts);
CqlSession session = clusterBuilder.withConfigLoader(loader).build();
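If you can pass the datacenter name into the application yourself (for example through a system property), the explicit approach recommended above is just as short. A minimal sketch, assuming a made-up property name cassandra.localDc and the same hosts collection as above:
// Hypothetical property, supplied e.g. with -Dcassandra.localDc=dc1
String localDc = System.getProperty("cassandra.localDc", "dc1");
CqlSession session = CqlSession.builder()
    .addContactPoints(hosts)
    .withLocalDatacenter(localDc)
    .build();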

How to access counters from a reporting task in NiFi 1.2.0

In NiFi 1.4.0, in order to access counters from a scripted reporting task, you can do something like:
context.eventAccess.controllerStatus.processGroupStatus.each { pg ->
    pg.processorStatus.each { ps ->
        ps.counters.each { counter ->
            System.out.println("${counter.key} -> ${counter.value}")
        }
    }
}
That's because the ProcessorStatus API exposes:
Map<String,Long> getCounters()
However, I'm on NiFi 1.2.0, which does not have this method for class ProcessorReportingTask.
I'm desperately searching for a way to access the counters from a ReportingTask (so through a ReportingContext, because that's what's bound within the script), because my ReportingTask reports metrics to our Graphite server.
Any idea?
I know I could access the metrics through the REST APIs.
But that would completely defeat the purpose of my ScriptedReportingTask, and I would need to set up an additional piece of software to collect these metrics from outside, while the ReportingTask just runs within NiFi.
I believe you will have to upgrade to 1.4.0 in order to get access to the getCounters method; there is no other way I know of through a ReportingContext, which is why it was added in 1.4.0 as part of this JIRA:
https://issues.apache.org/jira/browse/NIFI-106

Connecting to Cassandra 3.0.3 from WSO2 and JDBC

I am trying to connect to a Cassandra DB from WSO2 ESB and WSO2 DSS, and in both approaches I am getting the same error.
1) Approach 1: Connecting to Cassandra from WSO2 ESB
We are trying to connect to Cassandra DB 3.0.3 from WSO2 ESB 4.9.
Below are the jars copied to the components/lib folder.
(jar listing)
In master-datasource.xml, below is the configuration added:
<datasource>
    <name>CassandraDB</name>
    <description>The datasource used for cassandra</description>
    <jndiConfig>
        <name>cassandraWSO2DB</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <!-- also tried with port 9160 -->
            <url>jdbc:cassandra://127.0.0.1:9042/sample</url>
            <username>cassandra</username>
            <password>cassandra</password>
            <driverClassName>org.apache.cassandra.cql.jdbc.CassandraDriver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT COUNT(*) from sample.users</validationQuery>
            <validationInterval>30000</validationInterval>
            <defaultAutoCommit>true</defaultAutoCommit>
        </configuration>
    </definition>
</datasource>
2) Approach 2: Connecting to Cassandra from WSO2 DSS
We are trying to connect to Cassandra DB from WSO2 DSS 3.5.0.
Below are the jars copied to the components/lib folder.
Created the data service and added the data source; below is the configuration for the same:
<config enableOData="false" id="CassandraSampleId">
    <property name="url">jdbc:cassandra://127.0.0.1:9042/sample</property>
    <property name="driverClassName">org.apache.cassandra.cql.jdbc.CassandraDriver</property>
</config>
<query id="SampleQuery" useConfig="CassandraSampleId">
    <expression>select * from users</expression>
</query>
In the above configuration, "sample" is the keyspace created in Cassandra.
In both cases, i.e. 1 and 2, I am facing the same error below. Can you please suggest how to resolve the issue?
java.sql.SQLNonTransientConnectionException: org.apache.thrift.transport.TTransportException: Read a negative frame size (-2080374784)!
at org.wso2.carbon.dataservices.core.description.config.RDBMSConfig.<init>(RDBMSConfig.java:45)
at org.wso2.carbon.dataservices.core.description.config.ConfigFactory.getRDBMSConfig(ConfigFactory.java:92)
at org.wso2.carbon.dataservices.core.description.config.ConfigFactory.createConfig(ConfigFactory.java:60)
at org.wso2.carbon.dataservices.core.DataServiceFactory.createDataService(DataServiceFactory.java:150)
at org.wso2.carbon.dataservices.core.DBDeployer.createDBService(DBDeployer.java:785)
at org.wso2.carbon.dataservices.core.DBDeployer.processService(DBDeployer.java:1139)
at org.wso2.carbon.dataservices.core.DBDeployer.deploy(DBDeployer.java:195)
... 8 more
Caused by: java.sql.SQLNonTransientConnectionException: org.apache.thrift.transport.TTransportException: Read a negative frame size (-2080374784)!
at org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:159)
at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:92)
at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:278)
at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:182)
at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:701)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:635)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:127)
at org.wso2.carbon.dataservices.core.description.config.SQLConfig.createConnection(SQLConfig.java:187)
at org.wso2.carbon.dataservices.core.description.config.SQLConfig.createConnection(SQLConfig.java:173)
at org.wso2.carbon.dataservices.core.description.config.SQLConfig.initSQLDataSource(SQLConfig.java:151)
at org.wso2.carbon.dataservices.core.description.config.RDBMSConfig.<init>(RDBMSConfig.java:43)
... 14 more
Caused by: org.apache.thrift.transport.TTransportException: Read a negative frame size (-2080374784)!
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:133)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_cluster_name(Cassandra.java:1101)
at org.apache.cassandra.thrift.Cassandra$Client.describe_cluster_name(Cassandra.java:1089)
at org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:130)
<query id="SampleQuery" useConfig="CassandraSampleId">
<expression>select * from users</expression>
<validationQuery>SELECT COUNT(*) from sample.users</validationQuery>
First thing I noticed: unbound queries are not a good idea in Cassandra. You should always query with a partition key. This won't fix your problem, but you definitely do not want to do that in production with millions of rows.
We are trying to connect Cassandra DB 3.0.3 from Wso2 ESB 4.9
<definition type="RDBMS">
<configuration>
<url>jdbc:cassandra://127.0.0.1:9042/sample</url>(tried with port 9160)
org.apache.thrift.transport.TTransportException:
Ok, a couple of things I noticed here. I've never used WSO2 (actually, I have no idea what it is) but it looks like it's making a couple of (bad) assumptions here.
I don't know what your options for "type" are, but Cassandra is definitely not an RDBMS.
I see you're using JDBC. There are MANY drivers out there that interact with Cassandra from Java. JDBC was originally designed for relational databases, and augmented to be used with Cassandra. You will have the highest chance of success by using the DataStax Java Driver (see the sketch at the end of this answer), especially if you are running Cassandra 3.x.
I see it throwing a thrift.transport.TTransportException. Cassandra 3.0.3 installs with the Thrift (9160) protocol disabled by default. Changing the port to 9042 isn't enough; you need to be using the native binary protocol for that to work.
The first thing to try, is to re-enable Thrift inside your Cassandra node. Inside your cassandra.yaml, find the start_rpc property, and set it to true:
start_rpc: true
At the very least, that will get Thrift running on 9160, and you'll be ready to try connecting again. However, I don't know if JDBC will even work with Cassandra 3.x. And if it does, you certainly won't get all of the available features.
The biggest problem I see here, is that you are trying to use a new version of Cassandra with technologies that it really wasn't designed to interact with. But try starting Thrift in Cassandra, and see if that helps. If anything, it should get you to the next problem.
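For reference, here is a minimal sketch of connecting with the DataStax Java driver 3.x instead of JDBC. The contact point and keyspace match the ones in the question; the table's partition key column (user_id) is an assumption:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class NativeDriverExample {
    public static void main(String[] args) {
        // Connects over the native binary protocol on 9042; Thrift is not needed
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withPort(9042)
                .build();
             Session session = cluster.connect("sample")) {

            ResultSet rs = session.execute("SELECT * FROM users WHERE user_id = ?", 42);
            for (Row row : rs) {
                System.out.println(row);
            }
        }
    }
}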

Accessing Spark RDDs from a web browser via the Thrift server

We have processed our data using Spark 1.2.1 with Java and stored it in Hive tables. We want to access this data as RDDs from a web browser.
I read the documentation and I understood the steps to do the task.
I am unable to find a way to interact with Spark SQL RDDs via the Thrift server. The examples I found have the line below in the code, and I cannot find the class for this in the Spark 1.2.1 Java API docs.
HiveThriftServer2.startWithContext
On GitHub I saw Scala examples using import org.apache.spark.sql.hive.thriftserver, but I don't see this in the Java API docs. Not sure if I am missing something.
Has anybody had luck accessing Spark SQL RDDs from a browser via Thrift? Can you post a code snippet? We are using Java.
I've got most of this working. Let's dissect each part of it (references at the bottom of the post):
HiveThriftServer2.startWithContext is defined in Scala. I was never able to access it from Java or from Python using Py4J, and I am no JVM expert, but I ended up switching to Scala. This may have something to do with the @DeveloperApi annotation. This is how I imported it in Scala in Spark 1.6.1:
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
For anyone reading this and not using Hive, a Spark SQL context won't do; you need a Hive context. Note that the HiveContext constructor takes a Scala SparkContext, so if you are holding a JavaSparkContext you have to convert it first:
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.hive.HiveContext
var hiveContext = new HiveContext(JavaSparkContext.toSparkContext(sc))
Now start the thrift server
HiveThriftServer2.startWithContext(hiveContext)
// Yay
Next, we need to make our RDDs available as SQL tables. First, we have to convert them into Spark SQL DataFrames:
val someDF = hiveContext.createDataFrame(someRDD)
Then, we need to turn them into Spark SQL tables. You do this by persisting them to Hive, or making the RDD available as a temporary table.
Persist to Hive:
// Deprecated since Spark 1.4, to be removed in Spark 2.0:
someDF.saveAsTable("someTable")
// Up-to-date at time of writing
someDF.write().saveAsTable("someTable")
Or, use a temporary table:
// Use the Data Frame as a Temporary Table
// Introduced in Spark 1.3.0
someDF.registerTempTable("someTable")
Note: temporary tables are isolated to a SQL session. Spark's Hive thrift server is multi-session by default in version 1.6 (one session per connection). Therefore, for clients to access temporary tables you've registered, you'll need to set the option spark.sql.hive.thriftServer.singleSession to true.
You can test this by querying the tables in beeline, a command line utility for interacting with the hive thrift server. It ships with Spark.
Finally, you need a way of accessing the hive thrift server from the browser. Thanks to its awesome developers, it has an HTTP mode, so if you want to build a web app, you can use the thrift protocol over AJAX requests from the browser. A simpler strategy might be to create an IPython notebook, and use pyhive to connect to the thrift server.
Data Frame Reference:
https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrame.html
singleSession option pull request:
https://mail-archives.apache.org/mod_mbox/spark-commits/201511.mbox/%3Cc2bd1313f7ca4e618ec89badbd8f9f31#git.apache.org%3E
HTTP mode and beeline howto:
https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
Pyhive:
https://github.com/dropbox/PyHive
HiveThriftServer2 startWithContext definition:
https://github.com/apache/spark/blob/6b1a6180e7bd45b0a0ec47de9f7c7956543f4dfa/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L56-73
Thrift is a JDBC/ODBC server.
You can connect to it via JDBC/ODBC connections and access content through the HiveDriver.
You can not get RDDs back from it, because HiveContext is not available.
What you referred to is an experimental feature not available for Java.
As a workaround, you could re-parse the results and create your structures for your client.
For example:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

private static String driverName = "org.apache.hive.jdbc.HiveDriver";
private static String hiveConnectionString = "jdbc:hive2://YourHiveServer:Port";
private static String tableName = "SOME_TABLE";

// Register the Hive JDBC driver and query the thrift server like any other JDBC source
Class.forName(driverName);
Connection con = DriverManager.getConnection(hiveConnectionString, "user", "pwd");
Statement stmt = con.createStatement();
String sql = "select * from " + tableName;
ResultSet res = stmt.executeQuery(sql);
parseResultsToObjects(res);
