Elasticsearch spark connection for structured-streaming - apache-spark

I'm trying to make a connection to elasticsearch from my spark program.
My elasticsearch host is https and found no connection property for that.
We are using spark structred streaming Java API and the connection details are as follows,
SparkSession spark = SparkSession.builder()
.config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, "username")
.config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, "password")
.config(ConfigurationOptions.ES_NODES, "my_host_url")
.config(ConfigurationOptions.ES_PORT, "9200")
.config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_LOCATION,"C:\\certs\\elastic\\truststore.jks")
.config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_PASS,"my_password") .config(ConfigurationOptions.ES_NET_SSL_KEYSTORE_TYPE,"jks")
.master("local[2]")
.appName("spark_elastic").getOrCreate();
spark.conf().set("spark.sql.shuffle.partitions",2);
spark.conf().set("spark.default.parallelism",2);
And I'm getting the following error
19/07/01 12:26:00 INFO HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server 10.xx.xxx.xxx failed to respond
19/07/01 12:26:00 INFO HttpMethodDirector: Retrying request
19/07/01 12:26:00 ERROR NetworkClient: Node [10.xx.xxx.xxx:9200] failed (The server 10.xx.xxx.xxx failed to respond); no other nodes left - aborting...
19/07/01 12:26:00 ERROR StpMain: Error
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:344)
Probably it's because it tries to initiate connection by http protocol but in my case I need https connection and not sure how to configure that

The error happened as spark was not able to locate the truststore file. It seems we need to add "file:\\" for the path to be accepted.

Related

Why does Cassandra Kerberos Connection to second keyspace fail?

We are trying to connect to two keyspaces of Cassandra (3.x) in the same application with the same Kerberos credentials. The application is able to connect to one keyspace but no the other. Access to the keyspaces has been verified.
Error on connection:
2022-08-22 13:15:10,972 [cluster-reconnection-0] DEBUG c.d.d.c.ControlConnection [--]- [Control connection] error on 169.24.167.109:9042 connection, trying next host
javax.security.auth.login.LoginException: No LoginModules configured for CassandraJavaClient
at javax.security.auth.login.LoginContext.init(LoginContext.java:264)
at javax.security.auth.login.LoginContext.<init>(LoginContext.java:417)
The ticket cache is :
CassandraJavaClient {
com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true ticketCache="/var//krb5cc_userlogin";
};
The same ticket cache file is used by the first connection - which succeeds. While the second connection fails. I am not even sure as to how to debug it (tried remote debugging and since the initial control connection is an Async call, unable to get to the actual error).
We are using com.datastax.cassandra:cassandra-driver-core:jar:3.6.0
Any ideas/help to debug / resolve this will be highly appreciated

advanced.session-leak after sometime of starting spark thrift server with datastax cassandra connector

Hi I am getting the following error after some time of inactivity.
Error: Error running query: com.typesafe.config.ConfigException$Missing: withValue(advanced.reconnection-policy.base-delay): No configuration setting found for key 'advanced.session-leak' (state=,code=0)
restarting thrift server seems to solve the issue for sometime.

How to connect KairosDB with Azure Cosmos (Cassandra)

I use KairosDB on top of Cassandra for saving all our time series data. I am now trying to replicate the same KairosDB with Azure CosmosDB (Cassandra API). But its throwing error:
16:59:08.364 [main] INFO [LZ4Compressor.java:52] - Using LZ4Factory:JNI
16:59:08.441 [main] INFO [NettyUtil.java:73] - Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16:59:08.842 [main] ERROR [CassandraModule.java:136] - Unable to setup cassandra schema
com.datastax.driver.core.exceptions.AuthenticationException: Authentication error on host ilenstsdb2.cassandra.cosmos.azure.com/40.65.106.154:10350: Cql request had unsupported headers Compression
at com.datastax.driver.core.Connection$8.apply(Connection.java:392)
at com.datastax.driver.core.Connection$8.apply(Connection.java:361)
enter image description here
I don't have a huge amount of exposure to the Cassandra API but the error is indicating that your API request contained the header Compression while authenticating and the server didn't like it.
Make sure you're not using withCompression() when starting your datastax driver:
cluster = Cluster.builder()
.addContactPoint("X.X.X.X")
.withCompression(ProtocolOptions.Compression.LZ4)
.build();
...should be...
cluster = Cluster.builder()
.addContactPoint("X.X.X.X")
.build();

We are running a map reduce/spark job to bulk load hbase data in One of the environment

We are running a map reduce/spark job to bulk load hbase data in one of the environments.
While running it, connection to the hbase zookeeper cannot initialize throwing the following error.
16/05/10 06:36:10 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181 sessionTimeout=90000 watcher=hconnection-0x74b47a30, quorum=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181, baseZNode=/hbase
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Opening socket connection to server c321shu.int.westgroup.com/10.204.152.28:2181. Will not attempt to authenticate using SASL (unknown error)
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.204.24.16:35740, server: c321shu.int.westgroup.com/10.204.152.28:2181
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Session establishment complete on server c321shu.int.westgroup.com/10.204.152.28:2181, sessionid = 0x5534bebb441bd3f, negotiated timeout = 60000
16/05/10 06:36:11 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table ecpdevv1patents:NormNovusDemo
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue May 10 06:36:11 CDT 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller#3927df20, java.io.IOException: Call to c873gpv.int.westgroup.com/10.204.67.9:60020 failed on local exception: java.io.EOFException
We have executed the same job in Titan DEV too but facing the same problem. Please let us know if anyone has faced the same problem before.
Details are,
• Earlier job was failing to connect to localhost/127.0.0.1:2181. Hence only the property hbase.zookeeper.quorum has been set in map reduce code with c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com which we got from hbase-site.xml.
• We are using jars of cdh version 5.3.3.

Hector Cassandra Connectivity Error

I am trying to connect to a local cassandra instance through a java client powered by Hector. I attempt to read rows after trying to connect. The code snippet is as follows
Cluster myCluster = HFactory.getOrCreateCluster("test" , "localhost:9160");
KeyspaceDefinition keySpaceDef = myCluster.describeKeyspace("testkeyspace");
.....
However the connectivity fails with this error
Exception in thread "main" java.lang.NoSuchFieldError: DEFAULT_MEMTABLE_OPERATIONS_IN_MILLIONS
at me.prettyprint.cassandra.service.ThriftCfDef.(ThriftCfDef.java:65)
at me.prettyprint.cassandra.service.ThriftCfDef.fromThriftList(ThriftCfDef.java:144)
at me.prettyprint.cassandra.service.ThriftKsDef.(ThriftKsDef.java:34)
at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
I have cassandra, thrift as dependencies in my pom.xml. Any clues as to what could be wrong?

Resources