I have an application that connects to Cassandra using the Java Driver, fetches some configuration, and based on the results generates and executes some Pig scripts.
Now, I am able to successfully connect to Cassandra when the jars required for Pig are not on the classpath. Similarly, I am able to launch the PigServer class and execute scripts/statements using the entire DSE stack when I am not connecting to Cassandra via the Java driver to retrieve the configuration.
When I use both of them, I get the following exception:
org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline.
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:181)
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:570)
... 35 more
Caused by: org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline.
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:208)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
at com.datastax.driver.core.Connection.<init>(Connection.java:100)
at com.datastax.driver.core.Connection.<init>(Connection.java:51)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:376)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:207)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:170)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:87)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:576)
at com.datastax.driver.core.Cluster$Manager.access$100(Cluster.java:520)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:67)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:94)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:501)
I see that others have hit a similar exception, but when trying to execute Cassandra statements from MapReduce tasks, which is not my case:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/FhW_8e4FyAI
http://www.datastax.com/dev/blog/the-native-cql-java-driver-goes-ga#comment-297187
Thanks!
The DSE stack connects to Cassandra through the Thrift API, which is different from the Cassandra Java driver's native protocol.
You can't use the Cassandra Java driver for Pig/Hadoop until CASSANDRA-6311 is resolved.
There may also be a bad or expired security certificate issue if you are using certificates.
I am connecting to DSE (Spark) using this:
new SparkConf()
.setAppName(name)
.setMaster("spark://localhost:7077")
With DSE 5.0.8 (Spark 1.6.3) this works fine, but it now fails with DSE 5.1.0 with this error:
java.lang.AssertionError: Unknown application type
at org.apache.spark.deploy.master.DseSparkMaster.registerApplication(DseSparkMaster.scala:88) ~[dse-spark-5.1.0.jar:2.0.2.6]
After checking the dse-spark jar, I've come up with this:
if(rpcendpointref instanceof DseAppProxy)
And within Spark, it seems to be an RpcEndpointRef (NettyRpcEndpointRef).
How can I fix this problem?
I had a similar issue, and fixed it by following this:
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/spark/sparkRemoteCommands.html
Then you need to run your job using dse spark-submit, without specifying any master.
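For reference, a submission from a DSE node looks something like this (the class and jar names here are placeholders for your own):

dse spark-submit --class com.example.MyApp my-app.jar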
Resource Manager Changes
The DSE Spark Resource Manager is different from the OSS Spark Standalone Resource Manager. The DSE method uses a different URI, "dse://", because under the hood it is actually performing a CQL-based request. This has a number of benefits over the Spark RPC but, as noted, does not match some of the submission mechanisms possible in OSS Spark.
There are several articles on this on the DataStax Blog, as well as documentation notes:
Network Security with DSE 5.1 Spark Resource Manager
Process Security with DSE 5.1 Spark Resource Manager
Instructions on the URL Change
Programmatic Spark Jobs
While it is still possible to launch an application using setJars, you must also add the DSE-specific jars and config options to talk to the resource manager. In DSE 5.1.3+ there is a provided class, DseConfiguration, which can be applied to your SparkConf via DseConfiguration.enableDseSupport(conf) (or invoked via implicit) and will set these options for you.
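As a rough sketch of the programmatic route (untested; the app name, master URI, and jar path are placeholders, and the exact DseConfiguration API and package come from the DSE jars, so check the linked Docs for your version):

SparkConf conf = new SparkConf()
        .setAppName("my-app")
        .setMaster("dse://127.0.0.1")                   // DSE resource manager URI, not spark://
        .setJars(new String[] { "target/my-app.jar" }); // your app jar plus the DSE-specific jars
conf = DseConfiguration.enableDseSupport(conf);         // sets the DSE-specific options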
Example
Docs
This is of course for advanced users only and we strongly recommend using dse spark-submit if at all possible.
I found a solution.
First of all, I think it is impossible to run a Spark job from within an application in DSE 5.1; it has to be sent with dse spark-submit.
Once sent, it works perfectly. To communicate with the job I used Apache Kafka (see the sketch below).
If you don't want to use a job, you can always go back to plain Apache Spark.
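For illustration, a minimal producer-side sketch of that Kafka channel (the broker address, topic name, and message are my assumptions, not from the original setup):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JobCommandSender {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The application publishes commands; the dse spark-submitted job consumes them.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("job-commands", "run-report"));
        }
    }
}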
I am trying to get Zeppelin to work. But when I run a notebook twice, the second time it fails with Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient (full log at the end of the post).
It seems to be caused by the lock in the metastore not getting removed. It is also advised to use, for example, Postgres instead of the default Hive metastore database, as that allows multiple users to run jobs in Zeppelin.
I made a Postgres DB and a hive-site.xml pointing to this DB. I added this file to the config folder of Zeppelin, but also to the config folder of Spark. Also, in the jdbc interpreter of Zeppelin, I added parameters similar to the ones in the hive-site.xml.
The problem persists, though.
Error log: http://pastebin.com/Jqf9cdtU
hive-site.xml: http://pastebin.com/RZdXHPX4
Try using the Thrift server architecture in the Spark setup instead of working on a single-JVM instance of Hive, where you cannot create multiple sessions.
There are mainly three types of connection to Hive:
Single JVM - the metastore is stored locally in the warehouse, which doesn't allow multiple sessions
Multiple JVMs - where each worker behaves as a metastore
Thrift server architecture - multiple users can access the SQL engine and parallelism can be achieved
Another instance of Derby may have already booted the database
By default, Spark uses Derby as the metadata store, which can only serve one user. It seems you start multiple Spark interpreters; that's why you see the above error message. So here are two solutions for you:
Disable Hive in the Spark interpreter by setting zeppelin.spark.useHiveContext to false, if you don't need Hive.
Set up a Hive metastore that supports multiple users (see the sketch after this list). Refer to https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_hive_metastore_configure.html
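As a minimal, untested sketch of such a hive-site.xml backed by Postgres (the host, database name, and credentials are placeholders):

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>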
Stop Zeppelin. Go to your bin folder in Apache Zeppelin and try deleting metastore_db
sudo rm -r metastore_db/
Start Zeppelin again and try now.
I am trying to collect Cassandra cfstats information from all the machines using JMX. This can be done using OpsCenter, but I do not want to use it, so I started building my own utility. For now, my Java program connects to JMX and fetches cfstats information such as estimated keys, number of SSTables, etc.
My requirement is this: the utility is a Java jar file that will run from one Cassandra node, and it should be able to connect to all the machines and fetch cfstats through each node's respective JMX.
I am planning to use the Java driver for this, as the Java driver can discover all the machines in the cluster using the system.peers column family. Once the Java driver connects to the machines, I will form the service:jmx:rmi URL using the respective hostname and JMX port (7199). Then I will be able to connect to NodeProbe using this information.
My question is: after connecting to another node using the Java driver, will I be able to retain state there, and after forming the service:jmx:rmi URL, will this URL really connect to that node's JMX and pull cfstats information from it? I ask because the JMX hostname is taken from the cassandra-env.sh file. Can someone please help me with this?
Does this idea work, or is there another, better way to achieve this?
It's possible to use JMX remotely, but that's not the easiest thing to do.
Since you are writing your own tool, it may be worth checking out a different kind of connection. For example, you can easily expose JMX calls over HTTP using Jolokia.
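If you do stay with plain remote JMX, a minimal sketch looks roughly like this (it assumes JMX on port 7199 without authentication, and a Cassandra 3.x-style Table metric MBean; the keyspace and table names are placeholders, and MBean names differ in older versions):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CfStatsProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1"; // e.g. taken from system.peers
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "org.apache.cassandra.metrics:type=Table,keyspace=my_ks,"
                            + "scope=my_table,name=EstimatedPartitionCount");
            Object value = mbsc.getAttribute(name, "Value");
            System.out.println(host + " estimated partitions: " + value);
        }
    }
}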
I am inserting data into Cassandra from a CSV file using the Java driver. But after some inserts it throws NoHostAvailableException: All host(s) tried for query failed (no host was tried).
I have Cassandra on the client machine (single node), and the Cassandra services are still running on it.
Thanks in advance. I am a newbie.
Here are some best practices for bulk ingestion using the Java driver:
http://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/
I got this problem solved by using the executeAsync method in the Java driver.
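For illustration, a minimal sketch of that approach with a semaphore to cap in-flight requests, since unthrottled async writes can themselves overload a single node (the keyspace, table, and rows are placeholders; Java driver 2.x/3.x style):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import java.util.concurrent.Semaphore;

public class CsvLoader {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            PreparedStatement ps = session.prepare(
                    "INSERT INTO my_table (id, value) VALUES (?, ?)");
            Semaphore inFlight = new Semaphore(256); // cap concurrent async requests
            String[][] rows = { {"1", "a"}, {"2", "b"} }; // stand-in for parsed CSV rows
            for (String[] row : rows) {
                inFlight.acquire(); // block while too many writes are outstanding
                ResultSetFuture f = session.executeAsync(ps.bind(row[0], row[1]));
                f.addListener(inFlight::release, Runnable::run);
            }
        }
    }
}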
If I create a new project like this:
cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
this code works.
But if I take all the jars from this project and move them to my own project, the code above doesn't work and it says:
13/07/01 16:27:16 ERROR core.Connection: [/127.0.0.1-1] No handler set for stream 1 (this is a bug, either of this driver or of Cassandra, you should report it)
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/127.0.0.1])
What version of Cassandra are you running? Have you enabled the native protocol in your cassandra.yaml?
In Cassandra 1.2.0-1.2.4 the native protocol was disabled by default, but in 1.2.5+ it's on by default.
See https://github.com/apache/cassandra/blob/cassandra-1.2.5/conf/cassandra.yaml#L335
That's the most common reason I've seen for not being able to connect with the driver.
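For reference, the relevant settings in cassandra.yaml look like this (the port shown is the default):

# native CQL protocol used by the Java driver
start_native_transport: true
native_transport_port: 9042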