InvalidQueryException when trying to create column family in Cassandra via unit test - cassandra

I have a 3-node Cassandra cluster, and in my Java unit test I first create a keyspace and then create a column family within that keyspace. Sometimes the unit test passes, but randomly I keep getting the following error. I am using the latest DataStax Java driver, 2.1.4, and the Cassandra version is 2.1.0.
com.symc.edp.database.nosql.NoSQLPersistenceException: com.datastax.driver.core.exceptions.InvalidQueryException: Cannot add column family 'testmaxcolumnstable' to non existing keyspace 'testmaxcolumnskeyspace'.
at com.symc.edp.database.nosql.cassandra.CassandraCQLTableEditor.createTable(CassandraCQLTableEditor.java:67)
at com.symc.edp.database.nosql.cassandra.TestCassandraWideRowPerformance.testWideRowInserts(TestCassandraWideRowPerformance.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:483)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Cannot add column family 'testmaxcolumnstable' to non existing keyspace 'testmaxcolumnskeyspace'.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
at com.symc.edp.database.nosql.cassandra.CassandraCQLTableEditor.createTable(CassandraCQLTableEditor.java:65)
... 6 more
Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Cannot add column family 'testmaxcolumnstable' to non existing keyspace 'testmaxcolumnskeyspace'.
at com.datastax.driver.core.Responses$Error.asException(Responses.java:104)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249)
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:421)
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:697)
at com.datastax.shaded.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at com.datastax.shaded.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at com.datastax.shaded.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at com.datastax.shaded.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at com.datastax.shaded.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at com.datastax.shaded.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at com.datastax.shaded.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at com.datastax.shaded.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at com.datastax.shaded.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at com.datastax.shaded.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
And in Cassandra's system.log file I see the following exception:
ERROR [SharedPool-Worker-1] 2015-01-28 15:08:24,286 ErrorMessage.java:218 - Unexpected exception during request
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_05]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_05]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_05]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_05]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375) ~[na:1.8.0_05]
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:878) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:114) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_05]
INFO [SharedPool-Worker-1] 2015-01-28 15:13:01,051 MigrationManager.java:229 - Create new Keyspace: KSMetaData{name=testmaxcolumnskeyspace, strategyClass=SimpleStrategy, strategyOptions={replication_factor=1}, cfMetaData={}, durableWrites=true, userTypes=org.apache.cassandra.config.UTMetaData@790ee1bb}
INFO [MigrationStage:1] 2015-01-28 15:13:01,058 ColumnFamilyStore.java:856 - Enqueuing flush of schema_keyspaces: 512 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:7] 2015-01-28 15:13:01,059 Memtable.java:326 - Writing Memtable-schema_keyspaces#1727029917(138 serialized bytes, 3 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:7] 2015-01-28 15:13:01,077 Memtable.java:360 - Completed flushing /usr/share/apache-cassandra-2.1.0/bin/../data/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-103-Data.db (175 bytes) for commitlog position ReplayPosition(segmentId=1422485457803, position=1181)
Also, I verified via DevCenter that the keyspace didn't get created.

Without seeing your code, my guess is you need a sleep between creating the keyspace and trying to create tables in it. You probably need to give the keyspace definition a couple of seconds to propagate to all the nodes in your cluster before you try to use it.
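For example, a minimal sketch of what I mean (the contact point, keyspace DDL and table columns here are illustrative, not taken from your test):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect();
session.execute("CREATE KEYSPACE IF NOT EXISTS testmaxcolumnskeyspace "
        + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
// Crude but usually good enough in a test: give the schema change a couple of seconds
// to propagate to the other nodes before using the new keyspace.
// (Thread.sleep throws InterruptedException, so declare it on the test method.)
Thread.sleep(2000);
session.execute("CREATE TABLE IF NOT EXISTS testmaxcolumnskeyspace.testmaxcolumnstable "
        + "(id text PRIMARY KEY, value text)");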

It would help, as noted, to see your configuration class. We are using ClassPathCQLDataSet to issue our statements and create the keyspace in the same go (link to the ClassPathCQLDataSet documentation; note that the booleans in positions 2 and 3 tell it to create and drop the keyspace). db.cql is the file where we keep our CREATE TABLE statements. Here is our configuration, which might help you:
package some.package;
import org.cassandraunit.CQLDataLoader;
import org.cassandraunit.dataset.cql.ClassPathCQLDataSet;
import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.context.support.PropertySourcesPlaceholderConfigurer;
import org.springframework.core.io.ClassPathResource;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
@Configuration
@Profile({"test"})
public class TestCassandraConfig implements DisposableBean {
    private static final Logger LOGGER = LoggerFactory.getLogger(TestCassandraConfig.class);
    private final String CQL = "db.cql";

    @Value("${cassandra.contact_points:localhost}")
    private String contact_points;

    @Value("${cassandra.port:9142}")
    private int port;

    @Value("${cassandra.keyspace:test}")
    private String keyspace;

    private static Cluster cluster;
    private static Session session;
    private static SessionProxy sessionProxy;

    @Bean
    public Session session() throws Exception {
        if (session == null) {
            initialize();
        }
        return sessionProxy;
    }

    @Bean
    public TestApplicationContext testApplicationContext() {
        return new TestApplicationContext();
    }

    private void initialize() throws Exception {
        LOGGER.info("Starting embedded cassandra server");
        EmbeddedCassandraServerHelper.startEmbeddedCassandra("another-cassandra.yaml");
        LOGGER.info("Connect to embedded db");
        cluster = Cluster.builder().addContactPoints(contact_points).withPort(port).build();
        session = cluster.connect();
        LOGGER.info("Initialize keyspace");
        final CQLDataLoader cqlDataLoader = new CQLDataLoader(session);
        cqlDataLoader.load(new ClassPathCQLDataSet(CQL, false, true, keyspace));
    }

    @Override
    public void destroy() throws Exception {
        if (cluster != null) {
            cluster.close();
            cluster = null;
        }
    }
}
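A test can then simply inject the Session bean; a hypothetical usage sketch (the test class, the query and the assumption that the placeholder properties resolve are mine, not part of the configuration above):
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import com.datastax.driver.core.Session;
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = TestCassandraConfig.class)
@ActiveProfiles("test")
public class SomeRepositoryTest {
    @Autowired
    private Session session;

    @Test
    public void keyspaceIsReady() {
        // db.cql has already been applied by ClassPathCQLDataSet when this runs,
        // so the session can be used against the test keyspace straight away.
        session.execute("SELECT release_version FROM system.local");
    }
}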

Related

Quarkus unable to load the cassandra custom retry policy class

I am working on a task to migrate Quarkus from 1.x to 2.x, and the Quarkus integration with embedded Cassandra is failing in unit testing with this error:
Caused by: java.lang.IllegalArgumentException: Can't find class com.mind.common.connectors.cassandra.CassandraCustomRetryPolicy
(specified by advanced.retry-policy.class)
**Custom retry policy**
public class CassandraCustomRetryPolicy implements RetryPolicy {
    public CassandraCustomRetryPolicy(DriverContext context, String profileName) {
    }
    // override methods
}
**The Quarkus test looks like this:**
@QuarkusTest
@QuarkusTestResource(CassandraTestResource.class)
class Test {}
**The CassandraTestResource class starts the embedded Cassandra:**
public class CassandraTestResource implements QuarkusTestResourceLifecycleManager {
    private Cassandra cassandra;

    @Override
    public Map<String, String> start() {
        cassandra = new CassandraBuilder().version("3.11.9")
                .addEnvironmentVariable("JAVA_HOME", getJavaHome())
                .addJvmOptions("-Xms512M -Xmx512m").build();
        cassandra.start();
    }
I have overridden the default Cassandra driver retry policy in application.conf inside the resources folder:
datastax-java-driver {
    basic.request {
        timeout = ****
        consistency = ***
        serial-consistency = ***
    }
    advanced.retry-policy {
        class = com.mind.common.connectors.cassandra.CassandraCustomRetryPolicy
    }
}
I have observed that my custom retry policy class is treated as a banned resource in QuarkusClassLoader.java:
String resourceName = sanitizeName(name).replace('.', '/') + ".class";
boolean parentFirst = parentFirst(resourceName, state);
if (state.bannedResources.contains(resourceName)) {
throw new ClassNotFoundException(name);
}
I have captured the following logs -
java.lang.ClassNotFoundException: com.mind.common.connectors.cassandra.CassandraCustomRetryPolicy
at io.quarkus.bootstrap.classloading.QuarkusClassLoader.loadClass(QuarkusClassLoader.java:438)
at io.quarkus.bootstrap.classloading.QuarkusClassLoader.loadClass(QuarkusClassLoader.java:414)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:315)
at com.datastax.oss.driver.internal.core.util.Reflection.loadClass(Reflection.java:57)
at com.datastax.oss.driver.internal.core.util.Reflection.resolveClass(Reflection.java:288)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:235)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfigProfiles(Reflection.java:194)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildRetryPolicies(DefaultDriverContext.java:359)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getRetryPolicies(DefaultDriverContext.java:761)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:339)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1100(DefaultSession.java:300)
at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:146)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
at io.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
I am using Quarkus version 2.7.2.Final with Cassandra driver version 4.14.0.
It's not a complete answer but I wanted to leave some notes here in case anybody else can get this over the finish line before I get back to it.
The underlying problem here is that in the Quarkus test case described above, the Java driver code is loaded by the QuarkusClassLoader, which (a) is more restrictive about where it loads code from and (b) doesn't appear to immediately support calling its parent if necessary. So in this case executing the following in the test will fail with a ClassNotFoundException:
Class.forName(customRetryPolicyClassName, false, CqlSession.class.getClassLoader())
while the following works without issue:
Class.forName(customRetryPolicyClassName, false, CqlSession.class.getClassLoader().getParent())
The class loader used to load CqlSession is the QuarkusClassLoader instance, while its parent is a stock JVM class loader.
The Java driver uses Class.forName() to load the classes specified for this policy. But since the Quarkus class loader is used to load the driver code itself, that's the loader that's used for these reflection ops... and as mentioned above, that loader has some specific characteristics that make loading external code harder.
It worked after I initialized the CQL session like this:
CqlSession.builder()
        .addContactPoint(new InetSocketAddress(settings.getAddress(), settings.getPort()))
        .withLocalDatacenter("***")
        .withClassLoader(Thread.currentThread().getContextClassLoader())
        .build();

Hazelcast: LinkageError "attempted duplicate class definition for name"

I saw this error in our Hazelcast cluster.
The code is trying to call executeOnKey() on an IMap, e.g.:
IMap<MyKey, MyCachedClass> myMap = hz.getMap("my-map");
myMap.executeOnKey(myKey, new EntryProcessor() {
    @Override
    public Object process(Map.Entry entry) {
        return entry.getValue().setItem(item);
    }

    @Override
    public EntryBackupProcessor getBackupProcessor() {
        return null;
    }
});
...and am getting this exception:
Exception in thread "MYTHREAD" java.lang.LinkageError: loader (instance of com/hazelcast/internal/usercodedeployment/impl/ClassSource): attempted duplicate class definition for name: "com/mycompany/model/Item$ItemEnum"
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.hazelcast.internal.usercodedeployment.impl.ClassSource.define(ClassSource.java:50)
at com.hazelcast.internal.usercodedeployment.impl.ClassLocator.tryToGetClassFromRemote(ClassLocator.java:163)
at com.hazelcast.internal.usercodedeployment.impl.ClassLocator.handleClassNotFoundException(ClassLocator.java:95)
at com.hazelcast.internal.usercodedeployment.impl.ClassSource.loadClass(ClassSource.java:65)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.hazelcast.internal.usercodedeployment.impl.ClassSource.define(ClassSource.java:50)
at com.hazelcast.internal.usercodedeployment.impl.ClassLocator.tryToGetClassFromRemote(ClassLocator.java:161)
at com.hazelcast.internal.usercodedeployment.impl.ClassLocator.handleClassNotFoundException(ClassLocator.java:95)
at com.hazelcast.internal.usercodedeployment.UserCodeDeploymentService.handleClassNotFoundException(UserCodeDeploymentService.java:89)
at com.hazelcast.internal.usercodedeployment.UserCodeDeploymentClassLoader.loadClass(UserCodeDeploymentClassLoader.java:57)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.hazelcast.nio.ClassLoaderUtil.tryLoadClass(ClassLoaderUtil.java:288)
at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:237)
at com.hazelcast.nio.IOUtil$ClassLoaderAwareObjectInputStream.resolveClass(IOUtil.java:646)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at java.util.ArrayList.readObject(ArrayList.java:797)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$JavaSerializer.read(JavaDefaultSerializers.java:82)
at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$JavaSerializer.read(JavaDefaultSerializers.java:75)
at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
at com.hazelcast.internal.serialization.impl.AbstractSerializationService.readObject(AbstractSerializationService.java:269)
at com.hazelcast.internal.serialization.impl.ByteArrayObjectDataInput.readObject(ByteArrayObjectDataInput.java:574)
at com.hazelcast.map.impl.operation.EntryOperation.readInternal(EntryOperation.java:263)
at com.hazelcast.spi.Operation.readData(Operation.java:728)
at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.readInternal(DataSerializableSerializer.java:160)
at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:106)
at com.hazelcast.internal.serialization.impl.DataSerializableSerializer.read(DataSerializableSerializer.java:51)
at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
at com.hazelcast.internal.serialization.impl.AbstractSerializationService.toObject(AbstractSerializationService.java:187)
at com.hazelcast.spi.impl.NodeEngineImpl.toObject(NodeEngineImpl.java:323)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:398)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:153)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:123)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:110)
at ------ submitted from ------.(Unknown Source)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:127)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:162)
at com.hazelcast.map.impl.proxy.MapProxySupport.executeOnKeyInternal(MapProxySupport.java:1099)
at com.hazelcast.map.impl.proxy.MapProxyImpl.executeOnKeyInternal(MapProxyImpl.java:109)
at com.hazelcast.map.impl.proxy.MapProxyImpl.executeOnKey(MapProxyImpl.java:757)
at com.mycompany.Processor.java
So to me this seemed like:
1. The 'executeOnKey(item)' command is being sent to the 'other' node in the cluster where this data is located.
2. The other node is attempting to deserialize the 'item'.
3. The other node cannot find the class for the inner enum 'Item$ItemEnum'.
4. The other node makes a request to UserCodeDeploymentService to fetch the class definition.
5. Once the class definition is fetched, and it attempts to create the new class, a LinkageError occurs.
I just cannot understand why it's getting a duplicate class exception, when it first claimed it couldn't find the class. And this error is proving difficult to replicate.
We do have 'userCodeDeployment' set to allow loading of remote class definitions. Our config looks like this:
UserCodeDeploymentConfig ucdConfig = new UserCodeDeploymentConfig();
ucdConfig.setEnabled(true);
ucdConfig.setClassCacheMode(UserCodeDeploymentConfig.ClassCacheMode.ETERNAL);
ucdConfig.setProviderMode(UserCodeDeploymentConfig.ProviderMode.LOCAL_AND_CACHED_CLASSES);
config.setUserCodeDeploymentConfig(ucdConfig);
So it sort of looks like it's doing what I'd expect, i.e. getting a remote class definition because it can't find it locally. It's just a puzzle as to why the exception is occurring.
Eventually worked this one out.
A simple example.
Assume two Hazelcast nodes. Both have userCodeDeployment switched on. (The ability to share class definitions over the wire.)
Node 1 has MyClass on its classpath, Node 2 does not.
The MyClass definition; note the inner enum (I think the inner enum can be swapped for an inner class and the same thing will happen):
public class MyClass implements Serializable {
    private final InnerEnum enumVal;

    public MyClass(InnerEnum enumVal) {
        this.enumVal = enumVal;
    }

    public enum InnerEnum {
        ONE, TWO, THREE;
    }
}
From Node 1, the following code is run:
hz.getMap("myClassMap").put("myClassKey", new MyClass(MyClass.InnerEnum.ONE));
hz.getMap("myInnerEnumMap").put("myInnerEnumKey", MyClass.InnerEnum.TWO);
Then, on Node 2, the following code is run:
System.out.println(hz.getMap("myInnerEnumMap").get("myInnerEnumKey"));
System.out.println(hz.getMap("myClassMap").get("myClassKey"));
On Node 2, I see the following logged:
[TRACE] ClassLocator Loaded class com.mycompany.MyClass$InnerEnum from Member [myhost]:5701 - cc0b2872-7f5b-484b-a40f-7c7ba1fdc165
TWO
[TRACE] ClassLocator Loaded class com.mycompany.MyClass from Member [myhost]:5701 - cc0b2872-7f5b-484b-a40f-7c7ba1fdc165
Exception in thread "main" java.lang.LinkageError: loader (instance of com/hazelcast/internal/usercodedeployment/impl/ClassSource): attempted duplicate class definition for name: "com/mycompany/MyClass$InnerEnum"
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.hazelcast.internal.usercodedeployment.impl.ClassSource.define(ClassSource.java:50)
at com.hazelcast.internal.usercodedeployment.impl.ClassLocator.tryToGetClassFromRemote(ClassLocator.java:163)
at com.hazelcast.internal.usercodedeployment.impl.ClassLocator.handleClassNotFoundException(ClassLocator.java:95)
at com.hazelcast.internal.usercodedeployment.UserCodeDeploymentService.handleClassNotFoundException(UserCodeDeploymentService.java:89)
at com.hazelcast.internal.usercodedeployment.UserCodeDeploymentClassLoader.loadClass(UserCodeDeploymentClassLoader.java:57)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.hazelcast.nio.ClassLoaderUtil.tryLoadClass(ClassLoaderUtil.java:288)
at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:237)
at com.hazelcast.nio.IOUtil$ClassLoaderAwareObjectInputStream.resolveClass(IOUtil.java:646)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$JavaSerializer.read(JavaDefaultSerializers.java:82)
at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$JavaSerializer.read(JavaDefaultSerializers.java:75)
at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:48)
at com.hazelcast.internal.serialization.impl.AbstractSerializationService.toObject(AbstractSerializationService.java:187)
at com.hazelcast.map.impl.proxy.MapProxySupport.toObject(MapProxySupport.java:1245)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:120)
at com.mycompany.main(Node2.java:74)
So:
Node 2 is first trying to fetch and display an instance of MyClass$InnerEnum. It fetches the InnerEnum class definition from Node 1 over Hazelcast.
Node 2 then tries to fetch and display an instance of MyClass. It fetches the MyClass class definition from Node 1.
Hazelcast then loops through the inner classes of MyClass (which just contains InnerEnum) and asks the class loader to define them without checking whether they already exist.
See (from Hazelcast 3.11.1)
com.hazelcast.internal.usercodedeployment.impl.ClassLocator:160
Map<String, byte[]> innerClassDefinitions = classData.getInnerClassDefinitions();
classSource.define(name, classData.getMainClassDefinition());
for (Map.Entry<String, byte[]> entry : innerClassDefinitions.entrySet()) {
classSource.define(entry.getKey(), entry.getValue());
}
The easiest solution in our case was to pull InnerEnum out as a top level Enum.
Try creating the EntryProcessor as a static or a top level class and pass the item explicitly in its constructor. That hopefully should resolve the issue.
The JVM can get a little tricky with dynamic class loading, which is what is done by UserCodeDeployment.
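For example, a rough sketch of that suggestion, reusing the type names from the snippet above (the processor class name and the Item type are illustrative):
import java.io.Serializable;
import java.util.Map;
import com.hazelcast.map.EntryBackupProcessor;
import com.hazelcast.map.EntryProcessor;
public class SetItemEntryProcessor implements EntryProcessor<MyKey, MyCachedClass>, Serializable {
    private final Item item;

    public SetItemEntryProcessor(Item item) {
        this.item = item;
    }

    @Override
    public Object process(Map.Entry<MyKey, MyCachedClass> entry) {
        MyCachedClass value = entry.getValue();
        value.setItem(item);
        entry.setValue(value); // write the modified value back into the map
        return value;
    }

    @Override
    public EntryBackupProcessor<MyKey, MyCachedClass> getBackupProcessor() {
        return null; // no backup processing, same as the original anonymous class
    }
}
The call site then becomes myMap.executeOnKey(myKey, new SetItemEntryProcessor(item)); so only this small class (plus Item) has to travel over the wire via user code deployment, instead of an anonymous inner class that drags its enclosing class along with it.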

Runtime Configuration of Cassandra connection in Phantom DSL

I'm using phantom to connect to Apache Cassandra and want to configure the connector at runtime, i.e. I want to parse some configuration file, extract a list of Cassandra databases and pass that somehow to my Database object.
I followed this guide to have an additional layer DatabaseProvider between Database and my service. Hence, I can provide a static DatabaseProvider like this:
object ProdConnector {
val connector = ContactPoints(Seq("dev-cassndr.containers"), 9042)
.keySpace("test")
}
object ProdDatabase extends MyDatabase(ProdConnector.connector)
trait ProdDatabaseProvider extends MyDatabaseProvider {
override def database: MyDatabase = ProdDatabase
}
and in my main function I do
val service = new MessageService with ProdDatabaseProvider {}
How can I achieve the same result at runtime without the singleton objects?
I made several attempts but always got NullPointerExceptions. My current approach is to have a Cassandra configuration object which is read by Jackson from a file:
case class CassandraConfigurator(
contactPoints: Seq[String],
keySpace: String,
port: Int = 9042,
) {
@JsonCreator
def this() = this(null, null, 9042)
val connection: CassandraConnection = {
val p = ContactPoints(contactPoints, port)
p.keySpace(keySpace)
}
}
My entry point then extends StreamApp from fs2:
object Main extends StreamApp[IO] {
override def stream(args: List[String], reqShutdown: IO[Unit])
: Stream[IO, ExitCode] = {
val conf: CassandraConfigurator = ???
val service = new MyService with MyDatabaseProvider {
override def database: MyDatabase = new MyDatabase(conf.connection)
}
service.database.create()
val api = ApiWithService(service).getApi
BlazeBuilder[IO].bindHttp(80, "0.0.0.0").mountService(api, "/").serve
}
}
This results in the following error:
12:44:03.436 [pool-10-thread-1] INFO com.datastax.driver.core.GuavaCompatibility - Detected Guava >= 19 in the classpath, using modern compatibility layer
12:44:03.436 [pool-10-thread-1] INFO com.datastax.driver.core.GuavaCompatibility - Detected Guava >= 19 in the classpath, using modern compatibility layer
12:44:03.463 [pool-10-thread-1] DEBUG com.datastax.driver.core.SystemProperties - com.datastax.driver.NEW_NODE_DELAY_SECONDS is undefined, using default value 1
12:44:03.463 [pool-10-thread-1] DEBUG com.datastax.driver.core.SystemProperties - com.datastax.driver.NEW_NODE_DELAY_SECONDS is undefined, using default value 1
12:44:03.477 [pool-10-thread-1] DEBUG com.datastax.driver.core.SystemProperties - com.datastax.driver.NOTIF_LOCK_TIMEOUT_SECONDS is undefined, using default value 60
12:44:03.477 [pool-10-thread-1] DEBUG com.datastax.driver.core.SystemProperties - com.datastax.driver.NOTIF_LOCK_TIMEOUT_SECONDS is undefined, using default value 60
java.lang.NullPointerException
at com.outworkers.phantom.connectors.ContactPoints$.$anonfun$apply$3(ContactPoint.scala:101)
at com.outworkers.phantom.connectors.DefaultSessionProvider.<init>(DefaultSessionProvider.scala:37)
at com.outworkers.phantom.connectors.CassandraConnection.provider$lzycompute(CassandraConnection.scala:46)
at com.outworkers.phantom.connectors.CassandraConnection.provider(CassandraConnection.scala:41)
at com.outworkers.phantom.connectors.CassandraConnection.session$lzycompute(CassandraConnection.scala:52)
at com.outworkers.phantom.connectors.CassandraConnection.session(CassandraConnection.scala:52)
at com.outworkers.phantom.database.Database.session$lzycompute(Database.scala:36)
at com.outworkers.phantom.database.Database.session(Database.scala:36)
at com.outworkers.phantom.ops.DbOps.$anonfun$createAsync$2(DbOps.scala:66)
at com.outworkers.phantom.builder.query.execution.ExecutionHelper$.$anonfun$sequencedTraverse$2(ExecutableStatements.scala:71)
at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:304)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception: sbt.TrapExitSecurityException thrown from the UncaughtExceptionHandler in thread "run-main-0"
[error] java.lang.RuntimeException: Nonzero exit code: 1
[error] at sbt.Run$.executeTrapExit(Run.scala:124)
[error] at sbt.Run.run(Run.scala:77)
[error] at sbt.Defaults$.$anonfun$bgRunTask$5(Defaults.scala:1168)
[error] at sbt.Defaults$.$anonfun$bgRunTask$5$adapted(Defaults.scala:1163)
[error] at sbt.internal.BackgroundThreadPool.$anonfun$run$1(DefaultBackgroundJobService.scala:366)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
[error] at scala.util.Try$.apply(Try.scala:209)
[error] at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:289)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
In your case there's nothing implicit here, and the runtime stuff looks like it should be simple. Your problem is more than likely Jackson failing to read from the file. To test that, try the code below; if it successfully tries to connect to a local Cassandra, then that's your problem.
object Main extends StreamApp[IO] {
override def stream(args: List[String], reqShutdown: IO[Unit])
: Stream[IO, ExitCode] = {
val service = new MyService with MyDatabaseProvider {
override def database: MyDatabase = new MyDatabase(Connector.default)
}
service.database.create()
val api = ApiWithService(service).getApi
BlazeBuilder[IO].bindHttp(80, "0.0.0.0").mountService(api, "/").serve
}
}
Are you sure val conf: CassandraConfigurator = ??? is being properly initialised? If that is what you have in your code, I am not surprised you are getting the NPE.

EMR 5.0 + spark 2.0 + Cassandra connector: null pointer exception when trying to connect to Cassandra

I'm trying to deploy a Spark 2.0 (Streaming) application on EMR 5 which connects to Cassandra.
The Spark-Cassandra-connector I use is:
"com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.0.0-M3".
The application works as a standalone on my computer and connects successfully to Cassandra (saves data). All relevant Cassandra ports seem to be open in the cluster.
But I still get the exception at the bottom.
Below is the function "getCassandraMappedTable":
class VisitDaoImpl {
override def getCassandraMappedTable():CassandraTableScanRDD[Visit] = {
SparkContextHolder.sparkContext.cassandraTable[Visit](keyspace, tableName)
}
}
And the relevant Visit:
case class Visit(val visitorKey:String, val normalizedDomain:String, val timestamp:Date, val visitId:String, val batchId:Long) extends Serializable
object Visit extends CassandraTable {
import Visit.Columns._
implicit object Mapper extends DefaultColumnMapper[Visit](
Map("visitorKey" -> VISITOR_KEY,
"normalizedDomain" -> NORMALIZED_DOMAIN,
"timestamp" -> TIMESTAMP,
"visitId" -> VISIT_ID))
val TABLE_NAME = "visit"
case object Columns {
val VISITOR_KEY = "visitor_key"
val NORMALIZED_DOMAIN = "normalized_domain"
val TIMESTAMP = "timestamp"
val VISIT_ID = "visit_id"
}
val columnsNames:Seq[ColumnName] = toColumnNames(Columns)
}
And I get the following exception for no apparent reason:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1069.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1069.0 (TID 721, ip-10-0-0-111.eu-west-1.compute.internal): java.lang.NullPointerException
at com.datastax.spark.connector.SparkContextFunctions.cassandraTable$default$3(SparkContextFunctions.scala:52)
at com.naturalint.myproject.daoimpl.VisitDaoImpl.getCassandraMappedTable(VisitDaoImpl.scala:24)
at com.naturalint.myproject.daoimpl.VisitDaoImplVisitDaoImpl.findLatestBetween(VisitDaoImpl.scala:92)
at com.naturalint.myproject.servicesimpl.MyAlgo$$anonfun$processStream$1$$anonfun$apply$2.apply(MyAlgo.scala:122)
at com.naturalint.myproject.servicesimpl.MyAlgo$$anonfun$processStream$1$$anonfun$apply$2.apply(MyAlgo.scala:110)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:26)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any ideas?
Thanks,
Eran.

Spark and Cassandra parallel processing

I have the following task ahead of me.
The user provides a set of IP addresses in a config file when executing the spark-submit command.
Let's say that array looks like this:
val ips = Array(1,2,3,4,5)
There can be up to 100,000 values in the array.
For each element in the array, I should read data from Cassandra, perform some computation and insert the data back into Cassandra.
If I do:
ips.foreach(ip =>{
- read data from Cassandra for specific "ip" // for each IP there is a different amount of data to read (within the functions I determine start and end date for each IP)
- process it
- save it back to Cassandra})
this works fine.
I believe that process runs sequentially; I don't exploit parallelism.
On the other hand if I do:
val IPRdd = sc.parallelize(Array(1,2,3,4,5))
IPRdd.foreach(ip => {
- read data from Cassandra // I need to use spark context to make the query
-process it
save it back to Cassandra})
I get a serialization exception, because Spark is trying to serialize the SparkContext, which is not serializable.
How can I make this work while still exploiting parallelism?
Thanks
Edited
This is the exception I get:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
at com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1.apply(WibeeeBatchJob.scala:59)
at com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1.apply(WibeeeBatchJob.scala:54)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$.main(WibeeeBatchJob.scala:54)
at com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob.main(WibeeeBatchJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@311ff287)
- field (class: com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1, name: sc$1, type: class org.apache.spark.SparkContext)
- object (class com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1, )
- field (class: com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1$$anonfun$apply$1, name: $outer, type: class com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1)
- object (class com.enerbyte.spark.jobs.wibeeebatch.WibeeeBatchJob$$anonfun$main$1$$anonfun$apply$1, )
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
The easiest thing to do is to use the Spark Cassandra Connector, which can handle connection pooling and serialization.
With that you could do something like:
sc.parallelize(inputData, numTasks)
  .mapPartitions { it =>
    val con = CassandraConnector(yourConf)
    con.withSessionDo { session =>
      // Use the session
    }
    // Do any other processing
  }.saveToCassandra("ks", "table")
This would be a completely manual operation of a Cassandra connection. The sessions would all be automatically pooled and cached, and if you prepare a statement, those will be cached on the executor as well.
If you would like to use more built-in methods, there also exists joinWithCassandraTable, which may work in your situation.
sc.parallelize(inputData, numTasks)
.joinWithCassandraTable("ks","table") //Retrieves all records for which input data is the primary key
.map( //manipulate returned results if needed )
.saveToCassandra("ks","table")
