Connecting to a Cassandra Database

Connecting to a Cassandra Database - cassandra

I am new to Cassandra. I am now working on Ubuntu and the latest Cassandra. I am using Java 1.7 on a PC. I created the installation on a PC without any cloud or server network (replication was one). I can use CQL ok but when I am trying in Java the code fails.
Here is the code:
Cluster cluster;
Session session;
cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
session = cluster.connect("casslinks");
if (!tableExists(linktable, session))
{
createCassTable(linktable, session);
}
for (String url: urls) {
insertCassUrl(url, crawledUrl, session, linktable);
}
Here is the error:
Exception in thread "main"
java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:65)
at com.example.GetSoogrData.insertUrls(GetSoogrData.java:596)
at com.example.GetSoogrData.main(GetSoogrData.java:1112)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
I used the debugger and saw the session line failed. I assume the IP address is wrong. I am unsure how to test for the correct IP address or how to define the session variable to it will correctly connect.
I have been using mongo db and 127.0.0.1 works for that.
Has anyone any ideas?

You are probably running your code with missing dependencies.
Datastax java driver requires few addidional jars, SLF4J libraries being one of them for which exception is thrown.
Look here to see what jars you need:
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/reference/settingUpJavaProgEnv_r.html
http://www.datastax.com/documentation/developer/java-driver/2.0/common/drivers/introduction/driverDependencies_r.html

Related

Apache Spark on k8s: securing RPC communication between driver and executors is not working

I have been trying Spark 2.4 deployment on k8s and want to establish a secured RPC communication channel between driver and executors. Was using the following configuration parameters as part of spark-submit
spark.authenticate true
spark.authenticate.secret good
spark.network.crypto.enabled true
spark.network.crypto.keyFactoryAlgorithm PBKDF2WithHmacSHA1
spark.network.crypto.saslFallback false
The driver and executors were not able to communicate on a secured channel and were throwing the following errors.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown challenge message.
at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:109)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:181)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
Can someone guide me on this?

Disclaimer: I do not have a very deep understanding of spark implementation, so, be careful when using the workaround described below.
AFAIK, spark does not have support for auth/encryption for k8s in 2.4.0 version.
There is a ticket, which is already fixed and likely will be released in a next spark version: https://issues.apache.org/jira/browse/SPARK-26239
The problem is that spark executors try to open connection to a driver, and a configuration will be sent only using this connection. Although, an executor creates the connection with default config AND system properties started with "spark.".
For reference, here is the place where executor opens the connection: https://github.com/apache/spark/blob/5fa4384/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L201
Theoretically, if you would set spark.executor.extraJavaOptions=-Dspark.authenticate=true -Dspark.network.crypto.enabled=true ..., it should help, although driver checks that there are no spark parameters set in extraJavaOptions.
Although, there is a workaround (a little bit hacky): you can set spark.executorEnv.JAVA_TOOL_OPTIONS=-Dspark.authenticate=true -Dspark.network.crypto.enabled=true .... Spark does not check this parameter, but JVM uses this env variable to add this parameter to properties.
Also, instead of using JAVA_TOOL_OPTIONS to pass secret, I would recommend to use spark.executorEnv._SPARK_AUTH_SECRET=<secret>.

Spark on Amazon EMR: "Timeout waiting for connection from pool"

I'm running a Spark job on a small three server Amazon EMR 5 (Spark 2.0) cluster. My job runs for an hour or so, fails with the error below. I can manually restart and it works, processes more data, and eventually fails again.
My Spark code is fairly simple and is not using any Amazon or S3 APIs directly. My Spark code passes S3 text string paths to Spark and Spark uses S3 internally.
My Spark program just does the following in a loop: Load data from S3 -> Process -> Write data to different location on S3.
My first suspicion is that some internal Amazon or Spark code is not properly disposing of connections and the connection pool becomes exhausted.
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:618)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1015)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:991)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:212)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy44.retrieveMetadata(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:780)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1428)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:313)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:85)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.$Proxy45.getConnection(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:423)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
... 41 more

I encountered this issue with a very trivial program on EMR (read data from S3, filter, write to S3).
I could solve it by using the S3A file system implementation and setting fs.s3a.connection.maximum to 100 to have a bigger connection pool.
(default is 15; see Hadoop-AWS module: Integration with Amazon Web Services for more config properties)
This is how I set the configuration:
// in Scala
val hc = sc.hadoopConfiguration
// in Python (not tested)
hc = sc._jsc.hadoopConfiguration()
// setting the config is the same for both languages
hc.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hc.setInt("fs.s3a.connection.maximum", 100)
To make it work, the S3 URIs passed to Spark have to start with s3a://...

This issue may also be resolved while remaining on EMRFS by setting fs.s3.maxConnections to something larger than the default 500 in emrfs-site config
https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/
If using Java SDK to spin up an EMR cluster you can set this using the withConfigurations method (which is much easier than doing it manually by modifying files). See also https://stackoverflow.com/a/52595058/1586965
You can check this has been set correctly by using the Configurations tab in EMR, e.g.

Create a new JDBC driver with netbeans to connect to Cassandra?

When I run the main class I get these errors:
**Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at org.apache.cassandra.cql.jdbc.CassandraDriver.<clinit>(CassandraDriver.java:52)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:188)
at oracledbtest.CqlJdbcTestBasic.main(CqlJdbcTestBasic.java:19)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
How can I create a new driver connection to Cassandra?

What exactly are you trying to do? The error simply means that you do not have slf4j's jar in your classpath..

JBoss - configuring server.xml Connector

I've configured HTTP Connector in server.xml adding some ssl features. I tryied to set my keyAlias to which is the name of the alias for certain certificate (not the private key of the keystore). Then, when I start JBoss I get something like:
[2012-04-12 17:01:37,236 ERROR [org.apache.coyote.http11.Http11Protocol] Error
initializing endpoint
java.io.IOException: Alias name <somealias> do not indetify a key entry
I'm new to ssl configuration and web security core concepts as well. Thanks for your patience.
Edit: complete stacktrace follows:
at org.apache.tomcat.util.net.jsse.JSSESocketFactory.getKeyManagers(JSSESocketFactory.java:412)
at org.apache.tomcat.util.net.jsse.JSSESocketFactory.init(JSSESocketFactory.java:378)
at org.apache.tomcat.util.net.jsse.JSSESocketFactory.createSocket(JSSESocketFactory.java:135)
at org.apache.tomcat.util.net.JIoEndpoint.init(JIoEndpoint.java:497)
at org.apache.tomcat.util.net.JIoEndpoint.start(JIoEndpoint.java:514)
at org.apache.coyote.http11.Http11Protocol.start(Http11Protocol.java:203)
at org.apache.catalina.connector.Connector.start(Connector.java:1146)
at org.jboss.web.tomcat.service.JBossWeb.startConnectors(JBossWeb.java:601)
at org.jboss.web.tomcat.service.JBossWeb.handleNotification(JBossWeb.java:638)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.mx.notification.NotificationListenerProxy.invoke(NotificationListenerProxy.java:153)
at $Proxy46.handleNotification(Unknown Source)
at org.jboss.mx.util.JBossNotificationBroadcasterSupport.handleNotification(JBossNotificationBroadcasterSupport.java:127)
at org.jboss.mx.util.JBossNotificationBroadcasterSupport.sendNotification(JBossNotificationBroadcasterSupport.java:108)
at org.jboss.system.server.ServerImpl.sendNotification(ServerImpl.java:916)
at org.jboss.system.server.ServerImpl.doStart(ServerImpl.java:497)
at org.jboss.system.server.ServerImpl.start(ServerImpl.java:362)
at org.jboss.Main.boot(Main.java:200)
at org.jboss.Main$1.run(Main.java:508)
at java.lang.Thread.run(Thread.java:662)

It looks like you are not importing your keys properties. I'd recommend you review your steps against these two documents
http://docs.jboss.org/jbossweb/3.0.x/ssl-howto.html
A shorter version is here
http://www.agentbob.info/agentbob/79-AB.html

Grails 2.0.1 Hibernate exception in foreign worker thread

I have a Grails 2.0.1 app and I spawn worker threads. My worker threads use GORM to read/write domain objects from MySql. Given that the code that is accessing the DB is not inside of an HTTP request, I create a Hibernate session for the save() like this:
MyClass.withTransaction { status ->
myClass.save()
}
I do that to resolve the "No Hibernate Session bound to thread" problem.
This worked fine on Grails 1.3.7. I am currently attempting to upgrade to Grails 2.0.1, and am seeing an exception I do not understand.
The following stack trace shows the exception.
Although I recognize the exception, and normally resolve it via the "withTransaction" technique shown above, the thing that baffles me about this one is that it is happening on a worker thread that I did not create.
Exception in thread "pool-12-thread-2" org.hibernate.HibernateException: No Hibernate Session bound to thread, and configuration does not allow creation of non-transactional one here
at org.springframework.orm.hibernate3.SpringSessionContext.currentSession(SpringSessionContext.java:63)
at org.hibernate.impl.SessionFactoryImpl.getCurrentSession(SessionFactoryImpl.java:687)
at org.codehaus.groovy.grails.orm.hibernate.SessionFactoryProxy.getCurrentSession(SessionFactoryProxy.java:145)
at org.codehaus.groovy.grails.orm.hibernate.validation.HibernateDomainClassValidator.validate(HibernateDomainClassValidator.java:51)
at org.codehaus.groovy.grails.validation.GrailsDomainClassValidator.validate(GrailsDomainClassValidator.java:121)
at org.codehaus.groovy.grails.orm.hibernate.metaclass.ValidatePersistentMethod.doInvokeInternal(ValidatePersistentMethod.java:119)
at org.codehaus.groovy.grails.orm.hibernate.metaclass.AbstractDynamicPersistentMethod.invoke(AbstractDynamicPersistentMethod.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.springsource.loaded.ri.ReflectiveInterceptor.jlrMethodInvoke(ReflectiveInterceptor.java:1231)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
at org.codehaus.groovy.grails.orm.hibernate.HibernateGormValidationApi.validate(HibernateGormEnhancer.groovy:702)
at com......MyClass.validate(MyClass.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.springsource.loaded.ri.ReflectiveInterceptor.jlrMethodInvoke(ReflectiveInterceptor.java:1231)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at org.codehaus.groovy.grails.orm.hibernate.support.ClosureEventListener$7.call(ClosureEventListener.java:282)
at org.codehaus.groovy.grails.orm.hibernate.support.ClosureEventListener$7.call(ClosureEventListener.java:267)
at org.codehaus.groovy.grails.orm.hibernate.support.ClosureEventListener.doWithManualSession(ClosureEventListener.java:302)
at org.codehaus.groovy.grails.orm.hibernate.support.ClosureEventListener.onPreUpdate(ClosureEventListener.java:267)
at org.codehaus.groovy.grails.orm.hibernate.EventTriggeringInterceptor.onPreUpdate(EventTriggeringInterceptor.java:164)
at org.codehaus.groovy.grails.orm.hibernate.EventTriggeringInterceptor.onPersistenceEvent(EventTriggeringInterceptor.java:89)
at org.grails.datastore.mapping.engine.event.AbstractPersistenceEventListener.onApplicationEvent(AbstractPersistenceEventListener.java:46)
at org.springframework.context.event.SimpleApplicationEventMulticaster$1.run(SimpleApplicationEventMulticaster.java:92)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
I am hoping someone can enlighten me regarding:
where this worker thread pool originates from?
why is is doing a preUpdate event (when I believe the save in my thread already did a successful validate and save
Can I somehow config my app such that these foreign worker threads are not there
If #3 can't be pulled off, how can I do the equivalent of a "withTransaction" over in the worker thread code that I do not "own"

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Connecting to a Cassandra Database - cassandra

Related

Apache Spark on k8s: securing RPC communication between driver and executors is not working

Spark on Amazon EMR: "Timeout waiting for connection from pool"

Create a new JDBC driver with netbeans to connect to Cassandra?

JBoss - configuring server.xml Connector

Grails 2.0.1 Hibernate exception in foreign worker thread

Categories

Resources