NoSuchMethodError: org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables - cassandra

I have upgraded one of my cluster nodes from 2.2.19 to 3.11.13, but I keep getting the error below in the system logs. I'm using TimeWindowCompactionStrategy-3.7.jar.
Please let me know how I can fix this error.
ERROR [CompactionExecutor:2338] 2022-09-12 14:40:41,310 CassandraDaemon.java:244 - Exception in thread Thread[CompactionExecutor:2338,1,main]
java.lang.NoSuchMethodError: org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables(Lorg/apache/cassandra/db/lifecycle/SSTableSet;Ljava/lang/Iterable;)Ljava/util/Collection;
at com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy.getNextBackgroundSSTables(TimeWindowCompactionStrategy.java:110)
at com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy.getNextBackgroundTask(TimeWindowCompactionStrategy.java:79)
at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:154)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
at java.lang.Thread.run(Thread.java:750)

TimeWindowCompactionStrategy has been merged into Apache Cassandra 3.11.13, so you shouldn't need to include the JAR for it. Remove the JAR file and restart the node(s).
Edit:
Ok, after a quick conversation with Jeff, he has two suggestions:
The 3.7 jar won't be compatible with 3.11, so issue the ALTER TABLE statement on the 3.11 node, which will switch the table to the TWCS version bundled with 3.11 (see the sketch below). It won't propagate to the 2.2 hosts, because schema changes won't cross major versions.
You'll be in a schema disagreement state, but that should be ok until the upgrade is complete. Give that a try in a lower environment, just to make sure it works.
The other option is to take the version of TWCS from 3.11, rename it to the com.jeffjirsa package so it matches the classpath your tables already reference, and use that instead.
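For reference, the ALTER TABLE from the first suggestion would look something like this when run on the 3.11 node (a sketch; keyspace_name.table_name is a placeholder, and you should keep whatever window unit/size your table already uses):
ALTER TABLE keyspace_name.table_name
  WITH compaction = {
    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
  };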
Edit:
Protocol exception with client networking: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version (4); supported versions are (3/v3, 4/v4, 5/v5-beta)
Is the error due to the mixed versions in the cluster?
Yes! I've seen that happen before. You can actually force a protocol version in the driver's connection settings.
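For example, with the DataStax Java driver 3.x it would look something like this (a sketch; the contact point is a placeholder, and V3 is just one version that both 2.2 and 3.11 speak):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;

// Pin the native protocol version while the cluster is running mixed versions
Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")                 // placeholder contact point
        .withProtocolVersion(ProtocolVersion.V3)     // force a version all nodes support
        .build();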
Best of luck!

Related

Spark 3.2 on Kubernetes keeps throwing okhttp3/okio EOFException

I'm using a Spark 3.2.1 image that was built from the official distribution via `docker-image-tool.sh`, on a Kubernetes 1.18 cluster. Everything works fine, except for this error message every 90 seconds:
WARN WatcherWebSocketListener: Exec Failure
java.io.EOFException
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This error message does not affect the application, but it's really annoying, especially for Jupyter users, and the lack of detail makes it very hard to debug.
It appears with any submit variation - spark-submit, pyspark, spark-shell - and regardless of whether dynamic execution is enabled or disabled.
I've found traces of it on the internet, but all occurrences were from older versions of Spark and were resolved by using a "newer" version of fabric8 (4.x). Spark 3.2.1 already uses fabric8 5.4.1.
I wonder if anyone else still sees this error on Spark 3.x and has a resolution.
Thanks.
Update:
This seems to be related to the Kubernetes cluster itself. After migrating to a new cluster, the error was gone.

stop hive's RetryingHMSHandler logging to databricks cluster

I'm using Azure Databricks 5.5 LTS with Spark 2.4.3 and Scala 2.11. Almost every request going to the Databricks cluster produces the following error log:
ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp)
at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:487)
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:498)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
While this isn't affecting the end result of what we're trying to do, our logs are constantly getting filled with this, which isn't very pleasant to go through. I've tried turning it off by setting the following property on the driver and executor:
log4j.level.org.apache.hadoop.hive.metastore.RetryingHMSHandler=OFF
only to realize later on that the RetryingHMSHandler class actually uses an slf4j logger. Is there an elegant way to overcome this?
Maybe late, but I faced the same issue with a Databricks cluster on 9.1 LTS (Apache Spark 3.1.2, Scala 2.12). I solved it with an init script that added the following two properties
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL, publicFile
log4j.additivity.org.apache.hadoop.hive.metastore.RetryingHMSHandler=false
to the driver's log4j.properties.
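A minimal cluster-scoped init script doing this could look something like the sketch below (the driver log4j.properties path is the one from the article linked further down and may differ between runtime versions):
#!/bin/bash
# Append the two RetryingHMSHandler properties to the driver's log4j.properties
LOG4J_CONF=/home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL, publicFile" >> "$LOG4J_CONF"
echo "log4j.additivity.org.apache.hadoop.hive.metastore.RetryingHMSHandler=false" >> "$LOG4J_CONF"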
My goal was to remove all verbose logs from the "log4j-active.log" file that can be downloaded from the job UI. Following https://learn.microsoft.com/en-us/azure/databricks/kb/clusters/overwrite-log4j-logs, I decided to add/overwrite some property values within the driver's log4j.properties (first I had a look at its contents, of course).
After adding those two properties, I was also able to silence RetryingHMSHandler (the only third-party log call that was still surviving).
Hope it helps ;)

Flink-Cassandra connector throws exception (flink-connector-cassandra_2.11-1.10.0)

I am trying to upgrade Flink 1.7.2 to Flink 1.10 and I am having a problem with the Cassandra connector. Every time I start a job that uses it, the following exception is thrown:
com.datastax.driver.core.exceptions.TransportException: [/xx.xx.xx.xx] Error writing
at com.datastax.driver.core.Connection$10.operationComplete(Connection.java:550)
at com.datastax.driver.core.Connection$10.operationComplete(Connection.java:534)
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:621)
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138)
at com.datastax.shaded.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
at com.datastax.shaded.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
at com.datastax.driver.core.Connection$Flusher.run(Connection.java:870)
at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.datastax.shaded.netty.handler.codec.EncoderException: java.lang.OutOfMemoryError: Direct buffer memory
at com.datastax.shaded.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:643)
Also the following message was printed when the job was run locally (not in YARN):
13:57:54,490 ERROR com.datastax.shaded.netty.util.ResourceLeakDetector - LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created.
All jobs that do not use the Cassandra connector are working properly.
Can someone help?
UPDATE: The bug is still reproducible and I think this is the reason: https://issues.apache.org/jira/browse/FLINK-17493.
I had an old configuration (from Flink 1.7) where classloader.parent-first-patterns.additional: com.datastax. was set and my Cassandra-Flink connector jar was in the flink/lib folder (this was done because of other problems with the shaded Netty in the Cassandra-Flink connector). After migrating to Flink 1.10, this setup triggered the problem above. Once I removed the classloader.parent-first-patterns.additional: com.datastax. setting, bundled flink-connector-cassandra_2.12-1.10.0.jar into my job jar, and removed it from /usr/lib/flink/lib/, the problem was no longer reproducible.
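For reference, bundling the connector into the job jar instead of dropping it into flink/lib just means declaring it as a normal compile-scope dependency, e.g. in Maven (a sketch; adjust the Scala suffix and Flink version to your build):
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-cassandra_2.12</artifactId>
  <version>1.10.0</version>
</dependency>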

DSE Cassandra has guava-16.0.1.jar conflict issue with CDH spark

We use DSE 4.8.3 Cassandra and run CDH 5.5.0 Spark jobs via Oozie, and we've found that DSE Cassandra has a guava-16.0.1.jar conflict issue, as follows:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, com.google.common.reflect.TypeToken.isPrimitive()Z
java.lang.NoSuchMethodError: com.google.common.reflect.TypeToken.isPrimitive()Z
The Cassandra version in DSE 4.8.3 is 2.1.11.969. The Spark version in CDH 5.5.0 is 1.5.0. Regarding the Cassandra driver and connector:
1. If we use cassandra-driver-core-2.2.0-rc3.jar and spark-cassandra-connector_2.10-1.5.0-M2.jar, which both use guava-16.0.1.jar as a dependency, it throws the above exception "Method not found: com.google.common.reflect.TypeToken.isPrimitive()Z" in CDH (CDH 5.5.0 Spark uses guava-14.0.1.jar, not guava-16.0.1.jar).
2. If we use the lower versions cassandra-driver-core-2.2.0-rc1.jar and spark-cassandra-connector_2.10-1.5.0-M1.jar, which both use guava-14.0.1.jar as a dependency, it throws the following exception:
Exception in thread "main" java.lang.AbstractMethodError: com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy.close()V
at com.datastax.driver.core.Cluster$Manager.close(Cluster.java:1417)
at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:1167)
at com.datastax.driver.core.Cluster.closeAsync(Cluster.java:461)
at com.datastax.driver.core.Cluster.close(Cluster.java:472)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163)
I found an answer for this exception (saying that using the newer spark-cassandra-connector_2.10-1.5.0-M2.jar will resolve the issue):
Spark + Cassandra connector fails with LocalNodeFirstLoadBalancingPolicy.close()
So now we are mystified by these Cassandra dependency issues. How do we fix the guava-16.0.1 dependency issue? Is it possible to build a new spark-cassandra-connector.jar that fixes both issues? Can you help? Thanks!
There should be no C* driver dependency declared directly, as it is brought in automatically as a transitive dependency of the Spark Cassandra Connector. I would use the 1.5.0 release. Then you need to make sure that when building, you exclude all other Guava versions.
This means that if you are making a fat jar, make sure you aren't including any Spark distributions in your code and that Guava is excluded from any Hadoop libs (see the sketch below).
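For example, a Maven setup along these lines (a sketch; the hadoop-client coordinates are placeholders for whichever Hadoop/CDH artifacts your job actually pulls in):
<!-- Let the connector bring in the matching C* driver and Guava transitively -->
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.10</artifactId>
  <version>1.5.0</version>
</dependency>
<!-- Keep cluster-provided Hadoop libs out of the fat jar (provided scope)
     and keep their old Guava off the compile classpath (exclusion) -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0-cdh5.5.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>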
There are a few other mail threads on this with more details:
Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use.
https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/HnTsWJkI5jo
Issue with guava
https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/uB_DN_CcK2k

Cassandra Datastax OpsCenter "NO DATA" on some graphs

I'm running DataStax OpsCenter 5.2.0 on a one-node test cluster, installed from the Amazon DataStax AMI version 2.6.3 with Cassandra Community 2.2.0-1.
OpsCenter doesn't report any errors (all agents connected), yet some graphs show NO DATA (while I know that there have been a lot of requests), and some show nothing at all. Some are working just fine, like OS: Load, Storage Capacity and OS: Disk Utilization.
What could be the reason for this? How do I fix it?
EDIT:
The OpsCenter logs seem to be fine. In the agent.log file, I've found the following errors (dozens of times):
ERROR [jmx-metrics-2] 2015-09-21 06:50:30,910 Error getting CF metrics
java.lang.UnsupportedOperationException: nth not supported on this type: PersistentArrayMap
at clojure.lang.RT.nthFrom(RT.java:857)
at clojure.lang.RT.nth(RT.java:807)
at opsagent.rollup$process_metric_map.invoke(rollup.clj:252)
at opsagent.metrics.jmx$cf_metric_helper.invoke(jmx.clj:96)
at opsagent.metrics.jmx$start_pool$fn__15320.invoke(jmx.clj:159)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR [jmx-metrics-4] 2015-09-21 06:50:38,524 Error getting general metrics
java.lang.UnsupportedOperationException: nth not supported on this type: PersistentHashMap
at clojure.lang.RT.nthFrom(RT.java:857)
at clojure.lang.RT.nth(RT.java:807)
at opsagent.rollup$process_metric_map.invoke(rollup.clj:252)
at opsagent.metrics.jmx$generic_metric_helper.invoke(jmx.clj:73)
at opsagent.metrics.jmx$start_pool$fn__15334$fn__15335.invoke(jmx.clj:171)
at opsagent.metrics.jmx$start_pool$fn__15334.invoke(jmx.clj:170)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
BTW, in the DataStax agent configuration file (address.yaml), I only have the stomp_interface parameter set to my node's IP.
The NO DATA in this case was caused by a bug in the DataStax agent. I've included the related error in my question. The same error is also mentioned here: Opsagent UnsupportedOperationException with PersistentHashMap
After upgrading from DataStax OpsCenter 5.2.0 to 5.2.1 and upgrading the agent to the same version, the problem went away.
Thank you #phact for your help!
I had a similar error before, and it was due to a wrongly configured address.yaml.
In my case the issue was that I had only written a plain IP address in the hosts setting, and it was fixed by using array syntax instead. It seemed that without it, the agent used the wrong data type, which then raised an UnsupportedOperationException. Try adding your local interface IP (or whatever IP your Cassandra node is listening on) to the hosts setting in address.yaml like so:
hosts: ["<local-node-ip>"]
Maybe also try adding the following settings to address.yaml, just to be sure the correct parameters are used by the agent:
rpc_address: <local-node-ip>
agent_rpc_interface: <local-node-ip>
