Failing Cassandra INSERT and UPDATE statements - Unexpected Exception

I've recently been getting the following error in the system.log files on both my production and demo clusters. Each cluster has 2 nodes and the replication factor is 2. To my knowledge, no changes have been made, and I cannot figure out the reason behind the error. It is causing INSERT and UPDATE statements to fail.
[SharedPool-Worker-27] ERROR org.apache.cassandra.transport.Message - Unexpected exception during request; channel = [id: 0xeb429d31, /14.0.0.1:34495 => /14.0.0.2:9042]
java.lang.AssertionError: -2146739295
at org.apache.cassandra.db.BufferExpiringCell.<init>(BufferExpiringCell.java:46) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.db.BufferExpiringCell.<init>(BufferExpiringCell.java:39) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.db.AbstractCell.create(AbstractCell.java:176) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.UpdateParameters.makeColumn(UpdateParameters.java:65) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.Constants$Setter.execute(Constants.java:314) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:110) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:57) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.getMutations(ModificationStatement.java:708) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:495) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:481) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:493) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:138) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.10.jar:2.1.10]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.10.jar:2.1.10]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
These are async requests. On the client side I'm seeing a failure on the future as well. I'm using Cassandra 2.1.10. I haven't done a rolling restart of the nodes yet, but I don't think that will fix the problem.
I've also noticed that the failed inserts/updates seem to happen after a few successful ones. The request statements themselves (formatting) are fine. Any help would be appreciated.
UPDATE: I've looked into the Cassandra source code. It contains the following:
assert timeToLive > 0 : timeToLive;
assert localExpirationTime > 0 : localExpirationTime;
It looks like it's failing on the second assert statement. The table has a default TTL of 1728000 seconds set in its properties, and no TTL is set in the insert/update statements themselves, so I don't understand why some of the statements are failing on this assert.
EDIT: on the client applications I see the following error messages:
Client 1 connects to cluster 1:
16:36:01.102 [New I/O worker #64] WARN - /14.0.0.2:9042 replied with server error (java.lang.AssertionError: -2146571535), trying next host
Client 2 connects to cluster 2:
16:30:01.302 [cluster1-nio-worker-7] WARN - /14.0.0.4:9042 replied with server error (java.lang.AssertionError: -2146571895), defuncting connection.
I believe that when the above errors occur, the client drops the connection and reconnects; during this window other async requests fail.

One of the tables had its 'default_time_to_live' set to about 19 years. The root cause was the 2038 timestamp problem. Even though the TTL value on each cell is the number of seconds remaining, Cassandra internally converts the expiry into an absolute timestamp, so current timestamp + a TTL of 19+ years = a timestamp beyond 19 January 2038. That overflow produced the negative number seen in the exception above. Reducing the default TTL value on the table stopped the assertion error.
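A minimal sketch of the arithmetic (not Cassandra's own code; the TTL below is just the ~19-year figure in round numbers): the cell's local expiration time is stored as a 32-bit int of seconds since the epoch, so any expiry past 2^31 - 1 = 2147483647 (19 January 2038) wraps to a negative value and trips the assert.
val writeTimeSeconds    = System.currentTimeMillis() / 1000            // write time, in seconds since the epoch
val defaultTtlSeconds   = 19L * 365 * 24 * 3600                         // roughly the table's 19-year default TTL (assumed round value)
val localExpirationTime = (writeTimeSeconds + defaultTtlSeconds).toInt  // goes negative once the sum passes Int.MaxValue
// The value in the log fits this pattern: 2148228001 seconds falls just past the 2038 limit,
// and truncated to a 32-bit int it is exactly the number from the exception.
println(2148228001L.toInt)                                              // prints -2146739295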
It seemed like a few assertion errors would cause the connections to reset, and in the meantime other writes would fail.

Related

Google Cloud Spanner not implementing check constraint

I'm trying to create a table with the gcloud CLI (using the Spanner emulator):
gcloud spanner databases ddl update db_name --instance=instance_name --ddl=$data
I'm attempting to add a check constraint where $data evaluates to:
CREATE TABLE my_table (
my_key STRING(36),
some_column STRING(100),
CONSTRAINT ck_my_table_some_column_enum CHECK (some_column = 'this' OR some_column = 'that'),
) PRIMARY KEY (my_key)
and keep getting the following error:
ERROR: (gcloud.spanner.databases.ddl.update) HttpError accessing <http://localhost:9020/v1/projects/spanner-emulator-test/instances/instance_name/databases/db_name/ddl?alt=json>: response: <{'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Tue, 31 Aug 2021 22:53:52 GMT', 'transfer-encoding': 'chunked', 'status': 501}>, content <{"error":"Check Constraint is not implemented.","code":12,"message":"Check Constraint is not implemented."}>
This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach.
Given the last part of the message, the first inclination might be to verify that the emulator is running. It is. I've also tried immediately executing the same DDL without the check constraint, and it works perfectly - the table is created.
DBeaver
I've also tried creating this table using the DBeaver db tool and get the following error:
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [12]: UNIMPLEMENTED: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Check Constraint is not implemented.
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:133)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:513)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$0(SQLQueryJob.java:444)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:431)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:816)
at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:3440)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:118)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:116)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:4718)
at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: com.google.cloud.spanner.jdbc.JdbcSqlExceptionFactory$JdbcSqlExceptionImpl: UNIMPLEMENTED: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Check Constraint is not implemented.
at com.google.cloud.spanner.jdbc.JdbcSqlExceptionFactory.of(JdbcSqlExceptionFactory.java:222)
at com.google.cloud.spanner.jdbc.AbstractJdbcStatement.execute(AbstractJdbcStatement.java:259)
at com.google.cloud.spanner.jdbc.JdbcStatement.executeStatement(JdbcStatement.java:110)
at com.google.cloud.spanner.jdbc.JdbcStatement.execute(JdbcStatement.java:106)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.execute(JDBCStatementImpl.java:330)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:130)
... 12 more
Caused by: com.google.cloud.spanner.SpannerException: UNIMPLEMENTED: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Check Constraint is not implemented.
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:284)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:61)
at com.google.cloud.spanner.SpannerExceptionFactory.fromApiException(SpannerExceptionFactory.java:299)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:174)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:110)
at com.google.cloud.spanner.DatabaseAdminClientImpl.lambda$updateDatabaseDdl$7(DatabaseAdminClientImpl.java:337)
at com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction.apply(ApiFutures.java:240)
at com.google.common.util.concurrent.AbstractCatchingFuture$CatchingFuture.doFallback(AbstractCatchingFuture.java:223)
at com.google.common.util.concurrent.AbstractCatchingFuture$CatchingFuture.doFallback(AbstractCatchingFuture.java:211)
at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:124)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:724)
at com.google.common.util.concurrent.FluentFuture$TrustedFuture.addListener(FluentFuture.java:110)
at com.google.common.util.concurrent.ForwardingListenableFuture.addListener(ForwardingListenableFuture.java:45)
at com.google.api.core.ApiFutureToListenableFuture.addListener(ApiFutureToListenableFuture.java:52)
at com.google.common.util.concurrent.AbstractCatchingFuture.create(AbstractCatchingFuture.java:41)
at com.google.common.util.concurrent.Futures.catching(Futures.java:297)
at com.google.api.core.ApiFutures.catching(ApiFutures.java:99)
at com.google.api.gax.longrunning.OperationFutureImpl.<init>(OperationFutureImpl.java:97)
at com.google.cloud.spanner.DatabaseAdminClientImpl.updateDatabaseDdl(DatabaseAdminClientImpl.java:335)
at com.google.cloud.spanner.connection.DdlClient.executeDdl(DdlClient.java:89)
at com.google.cloud.spanner.connection.DdlClient.executeDdl(DdlClient.java:84)
at com.google.cloud.spanner.connection.SingleUseTransaction.lambda$executeDdlAsync$1(SingleUseTransaction.java:272)
at io.grpc.Context$2.call(Context.java:596)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Suppressed: com.google.cloud.spanner.connection.AbstractBaseUnitOfWork$SpannerAsyncExecutionException: Execution failed for statement: CREATE TABLE my_table (
my_key STRING(36),
some_column STRING(100),
CONSTRAINT ck_my_table_some_column_enum CHECK (some_column = 'this' OR some_column = 'that'),
) PRIMARY KEY (my_key)
at com.google.cloud.spanner.connection.AbstractBaseUnitOfWork.executeStatementAsync(AbstractBaseUnitOfWork.java:233)
at com.google.cloud.spanner.connection.AbstractBaseUnitOfWork.executeStatementAsync(AbstractBaseUnitOfWork.java:145)
at com.google.cloud.spanner.connection.SingleUseTransaction.executeDdlAsync(SingleUseTransaction.java:281)
at com.google.cloud.spanner.connection.ConnectionImpl.executeDdlAsync(ConnectionImpl.java:1157)
at com.google.cloud.spanner.connection.ConnectionImpl.execute(ConnectionImpl.java:803)
at com.google.cloud.spanner.jdbc.AbstractJdbcStatement.execute(AbstractJdbcStatement.java:250)
at com.google.cloud.spanner.jdbc.JdbcStatement.executeStatement(JdbcStatement.java:110)
at com.google.cloud.spanner.jdbc.JdbcStatement.execute(JdbcStatement.java:106)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.execute(JDBCStatementImpl.java:330)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:130)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:513)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$0(SQLQueryJob.java:444)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:431)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:816)
at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:3440)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:118)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:171)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:116)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:4718)
at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Check Constraint is not implemented.
at io.grpc.Status.asRuntimeException(Status.java:535)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose(SpannerErrorInterceptor.java:100)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:557)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:738)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:717)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
... 3 more
Check constraints are available in the latest version of the emulator. Can you please check and get back to us if there are issues?
https://console.cloud.google.com/gcr/images/cloud-spanner-emulator/global/emulator#sha256:ef0fd2ec74bb17b6c31e5010fb699fd0009b9829721d5159e6a11b6a40f881f1/details
Well... it looks like check constraints just don't work with the emulator. I tried this with a test db in the console and it worked. If anyone comes up with a solution, or if there's something I need to configure, please let me know! Thank you.

Error Code PINOT_UNABLE_TO_FIND_BROKER :No valid brokers found

I am trying to query Pinot table data using Presto; below are my configuration details.
Pinot is started on one of the SIT servers, i.e. 10.184.160.52:
Controller: 10.184.160.52:9000
server: 10.184.160.52:7000
broker: 10.184.160.52:8000
Presto is on a different server, 10.184.160.53; ports are open between these 2 servers.
I created a pinot.properties file at presto/etc/catalog/pinot.properties:
connector.name=pinot
pinot.controller-urls=Controller_Host:9000
bin/launcher run ---> Loaded Pinot catalog.
Started the Presto CLI with the Pinot catalog:
./presto --server 10.184.160.53:8080 --catalog pinot
show catalogs; (able to see my catalog)
pinot
show schemas; (able to see the schema as well)
presto> show schemas;
Schema
--------------------
default
presto> use default;
USE
presto:default> show tables; (able to see the Pinot tables)
Table
------------------------------
test
test2
test3
(3 rows)
Query 20210519_124218_00061_vcz4u, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 98B] [10 rows/s, 340B/s]
But when I run select * from test; it reports that no valid brokers were found:
presto:default> select * from test;
Query 20210519_124230_00062_vcz4u failed: No valid brokers found for test
Complete Presto Logs:
Error Code PINOT_UNABLE_TO_FIND_BROKER (84213767)
Stack Trace
io.prestosql.pinot.PinotException: No valid brokers found for test
at io.prestosql.pinot.client.PinotClient.getBrokerHost(PinotClient.java:285)
at io.prestosql.pinot.client.PinotClient.sendHttpGetToBrokerJson(PinotClient.java:185)
at io.prestosql.pinot.client.PinotClient.getRoutingTableForTable(PinotClient.java:302)
at io.prestosql.pinot.PinotSplitManager.generateSplitsForSegmentBasedScan(PinotSplitManager.java:72)
at io.prestosql.pinot.PinotSplitManager.getSplits(PinotSplitManager.java:167)
at io.prestosql.split.SplitManager.getSplits(SplitManager.java:87)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitScanAndFilter(DistributedExecutionPlanner.java:203)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:185)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:156)
at io.prestosql.sql.planner.plan.TableScanNode.accept(TableScanNode.java:143)
at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:124)
at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:131)
at io.prestosql.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:101)
at io.prestosql.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:470)
at io.prestosql.execution.SqlQueryExecution.start(SqlQueryExecution.java:386)
at io.prestosql.execution.SqlQueryManager.createQuery(SqlQueryManager.java:237)
at io.prestosql.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
at io.prestosql.$gen.Presto_350____20210519_105836_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I am not able to understand what is happening here or why this issue appears for the select statement. It looks like the Pinot broker is not accepting queries. Can someone kindly suggest what the issue is?
Update: This is because the connector does not support mixed case table names. Mixed case column names are supported. There is a pull request to add support for mixed case table names: https://github.com/trinodb/trino/pull/7630

FileNotFoundException: Spark save fails. Cannot clear cache from Dataset[T] avro

I get the following error when saving a dataframe in Avro for a second time. If I delete sub_folder/part-00000-XXX-c000.avro after saving and then try to save the same dataset, I get:
FileNotFoundException: File /.../main_folder/sub_folder/part-00000-3e7064c0-4a82-424c-80ca-98ce75766972-c000.avro does not exist. It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
If I delete not only from sub_folder, but also from main_folder, then the problem doesn't happen, but I can't afford that.
The problem doesn't happen when saving the dataset in any other format.
Saving an empty dataset does not cause an error.
The error suggests that the tables need to be refreshed, but as the output of sparkSession.catalog.listTables().show() shows, there are no tables to refresh:
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+
The previously saved dataframe looks like this. The application is supposed to update it:
+--------------------+--------------------+
| Col1 | Col2 |
+--------------------+--------------------+
|[123456, , ABC, [...|[[v1CK, RAWNAME1_,..|
|[123456, , ABC, [...|[[BG8M, RAWNAME2_...|
+--------------------+--------------------+
For me this is clearly a cache problem. However, all attempts at clearing the cache have failed:
dataset.write
.format("avro")
.option("path", path)
.mode(SaveMode.Overwrite) // Any save mode gives the same error
.save()
// Moving this either before or after saving doesn't help.
sparkSession.catalog.clearCache()
// This will not un-persist any cached data that is built upon this Dataset.
dataset.cache().unpersist()
dataset.unpersist()
And this is how I read the dataset:
private def doReadFromPath[T <: SpecificRecord with Product with Serializable: TypeTag: ClassTag](path: String): Dataset[T] = {
val df = sparkSession.read
.format("avro")
.load(path)
.select("*")
df.as[T]
}
Finally, the stack trace is below. Thanks a lot for your help!
ERROR [task-result-getter-3] (Logging.scala:70) - Task 0 in stage 9.0 failed 1 times; aborting job
ERROR [main] (Logging.scala:91) - Aborting job 150de02a-ac6a-4d42-824d-5db44a98c19a.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 11, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:254)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:168)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File file:/DATA/XXX/main_folder/sub_folder/part-00000-3e7064c0-4a82-424c-80ca-98ce75766972-c000.avro does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:241)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245)
... 10 more
Reading from a location and writing to that same location causes this issue. It was also discussed in this forum, along with my answer there.
The message below in the error is misleading; the actual issue is reading from and writing to the same location:
You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL
I am giving another example, different from yours (it uses Parquet; in your case it's Avro).
I have 2 options for you.
Option 1 (cache and show will work like below):
import org.apache.spark.sql.functions._
val df = Seq((1, 10), (2, 20), (3, 30)).toDS.toDF("sex", "date")
df.show(false)
df.repartition(1).write.format("parquet").mode("overwrite").save(".../temp") // save it
val df1 = spark.read.format("parquet").load(".../temp") // read back again
val df2 = df1.withColumn("cleanup" , lit("Rod want to cleanup")) // like you said you want to clean it.
// The 2 steps below are important: `cache` and a light action like `show(1)`, without which a FileNotFoundException will come.
df2.cache // cache to avoid FileNotFoundException
df2.show(2, false) // light action to avoid FileNotFoundException
// or println(df2.count) // action
df2.repartition(1).write.format("parquet").mode("overwrite").save(".../temp")
println("Rod saved in same directory where he read it from final records he saved after clean up are ")
df2.show(false)
Option 2:
1) Save the DataFrame to a different Avro folder.
2) Delete the old Avro folder.
3) Finally, rename the newly created Avro folder to the old name. This will work (a rough sketch follows this list).
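A rough sketch of those steps (reusing the sparkSession and dataset names from the question's code; the paths and the temporary folder name are assumptions, and error handling is omitted):
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SaveMode

val fs = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
val finalDir = new Path("/DATA/XXX/main_folder/sub_folder")     // the folder that was originally read from
val tempDir  = new Path("/DATA/XXX/main_folder/sub_folder_new") // hypothetical sibling folder

dataset.write.format("avro").mode(SaveMode.Overwrite).save(tempDir.toString) // 1) save to a different folder
fs.delete(finalDir, true)                                                    // 2) delete the old folder
fs.rename(tempDir, finalDir)                                                 // 3) rename the new folder to the old name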
Thanks a lot Ram Ghadiyaram!
Solution 2 solved my problem, but only on my local Ubuntu machine; when I tested on HDFS, the problem remained.
Solution 1 was the definitive fix. This is how my code looks now:
private def doWriteToPath[T <: Product with Serializable: TypeTag: ClassTag](dataset: Dataset[T], path: String): Unit = {
// clear any previously cached avro
sparkSession.catalog.clearCache()
// update the cache for this particular dataset, and trigger an action
dataset.cache().show(1)
dataset.write
.format("avro")
.option("path", path)
.mode(SaveMode.Overwrite)
.save()
}
Some remarks:
I had indeed checked that post and attempted the solution, unsuccessfully. I had ruled it out as my problem for the following reasons:
I had created a /temp under 'main_folder', called 'sub_folder_temp', and saving still failed.
Saving the same non-empty dataset in the same path but in json format actually works without the workaround discussed here.
Saving an empty dataset with the same type [T] in the same path actually works without the workaround discussed here.

Spring XD losing messages when processing huge volume

I am using Spring XD. My stream looks like below, and I am running tests on a 3-node container setup with 1 admin node, with Rabbit as the transport:
aws-s3|processor1|http-client|processor2>queue:readyQueue
I have created the below taps:
tap1 aws-s3>s3Queue
tap2 processor1>processorQueue1
tap3 http-client>httpQueue
I run the below scenarios in my tests:
Scenario 1: 5 files of 200k = 1 million records
concurrency of http-client=70 and processor2=30
I see 900k messages in s3Queue
I see 889k messages in processorQueue1
I see 886k messages in httpQueue
I see 883k messages in processorQueue2
Messages are lost everywhere, and it's random.
Scenario 2: 5 files of 200k = 1 million records, and all module concurrency=1
I see 998800 messages in s3Queue
I see 998760 messages in processorQueue1
I see 997540 messages in httpQueue
I see 997530 messages in processorQueue2
Even these numbers are random and not consistent.
Scenario 3
I changed the stream as below, with concurrency=1 and 5 files of 200k = 1 million records:
aws-s3 >testQueue
I get all my messages. I ran it 3 times with no issues; I get all my 1 million messages.
Scenario 4
I changed the stream as below, with concurrency=1 and 5 files of 200k = 1 million records:
aws-s3 |processor1 >testQueue2
Again I get all my messages. I ran it 3 times with no issues; I get all my 1 million messages.
In scenarios 3 and 4, data ingestion is faster: it took about 5 minutes to process the 1 million records, and ingestion into the Rabbit transport queue was fast, around 5k msg/sec.
In scenario 1, data ingestion was slower; even the aws-s3 module was pulling data very slowly, around 300 to 1000 msg/sec.
In scenario 2, aws-s3 pulled data fast, around 3-4k msg/sec, but http-client was slow, around 100 msg/sec.
I suspect XD threading is causing issues and I am losing messages. Please can you help me solve this issue?
Update
Scenario 5
I changed reply-timeout to -1 in http-client, and then I lost only 37 messages.
On the 2nd iteration, however, I lost 25000 messages. I see the following container log when that happened:
2016-03-04T03:42:04-0500 1.2.1.RELEASE ERROR task-scheduler-7 handler.LoggingHandler - org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.integration.amqp.outbound.AmqpOutboundEndpoint#b6700b1]; nested exception is org.springframework.amqp.AmqpIOException: java.io.IOException
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:84)
at org.springframework.xd.dirt.integration.rabbit.RabbitMessageBus$SendingHandler.handleMessageInternal(RabbitMessageBus.java:891)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:116)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:101)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:97)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:77)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:287)
at org.springframework.integration.channel.interceptor.WireTap.preSend(WireTap.java:129)
at org.springframework.integration.channel.AbstractMessageChannel$ChannelInterceptorList.preSend(AbstractMessageChannel.java:392)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:282)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:245)
at sun.reflect.GeneratedMethodAccessor204.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.integration.monitor.DirectChannelMetrics.monitorSend(DirectChannelMetrics.java:114)
at org.springframework.integration.monitor.DirectChannelMetrics.doInvoke(DirectChannelMetrics.java:98)
at org.springframework.integration.monitor.DirectChannelMetrics.invoke(DirectChannelMetrics.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at com.sun.proxy.$Proxy1537.send(Unknown Source)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:115)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:45)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:95)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutput(AbstractMessageProducingHandler.java:231)
at org.springframework.integration.handler.AbstractMessageProducingHandler.produceOutput(AbstractMessageProducingHandler.java:154)
at org.springframework.integration.splitter.AbstractMessageSplitter.produceOutput(AbstractMessageSplitter.java:157)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutputs(AbstractMessageProducingHandler.java:102)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:105)
Caused by: org.springframework.amqp.AmqpIOException: java.io.IOException
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:63)
at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:51)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createBareChannel(CachingConnectionFactory.java:758)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.access$300(CachingConnectionFactory.java:747)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.doCreateBareChannel(CachingConnectionFactory.java:419)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:395)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:364)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:357)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.access$1100(CachingConnectionFactory.java:75)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createChannel(CachingConnectionFactory.java:763)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createChannel(ConnectionFactoryUtils.java:85)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:134)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:67)
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1035)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1028)
at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:540)
at org.springframework.amqp.rabbit.core.RabbitTemplate.convertAndSend(RabbitTemplate.java:635)
at org.springframework.integration.amqp.outbound.AmqpOutboundEndpoint.send(AmqpOutboundEndpoint.java:331)
at org.springframework.integration.amqp.outbound.AmqpOutboundEndpoint.handleRequestMessage(AmqpOutboundEndpoint.java:323)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:99)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
... 93 more
Caused by: java.io.IOException
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:106)
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:102)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:124)
at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:125)
at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:134)
at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:499)
at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:44)
... 112 more
Caused by: com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67)
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:33)
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:348)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:221)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:118)
... 116 more
Caused by: com.rabbitmq.client.impl.UnknownChannelException: Unknown channel number 23364
at com.rabbitmq.client.impl.ChannelManager.getChannel(ChannelManager.java:80)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:552)
... 1 more
2016-03-04T03:42:05-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:5672 connection.CachingConnectionFactory - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'xdbus.tap-s3.tap:stream:stream.batch-aws-s3-source.0' in vhost '/', class-id=50, method-id=20)
2016-03-04T03:53:13-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-03-04T03:53:13-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:5672 connection.CachingConnectionFactory - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'xdbus.tap-s3.tap:stream:stream.batch-aws-s3-source.0' in vhost '/', class-id=50, method-id=20)
2016-03-04T02:57:54-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:8080 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-03-04T02:57:55-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:8080 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-03-04T03:42:04-0500 1.2.1.RELEASE ERROR AMQP Connection yyy:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
Updated
I found the trigger for the message loss: when this exception happens I see a lot of messages lost. I tested this pattern multiple times, and every time this exception happens I see messages lost. Also, bumping up concurrency makes the issue occur more often.
2016-03-05T13:59:41-0500 1.2.1.RELEASE ERROR AMQP Connection host1:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
Rabbit configuration:
spring:
rabbitmq:
addresses: host1:5672,host2:5672,host3:5672
adminAddresses: http://host1:15672,http://host2:15672,http://host3:15672
nodes: rabbit#host1.test.com,rabbit#host2.test.com,rabbit#host2.test.com
username: test
password: test
virtual_host: /
useSSL: false
sslProperties:
Update after increasing the cache size to 200
I added the XML you provided and increased the cache size to 200. This is what happens when processing 1 million and 80k messages, with only my http-client concurrency at 100 and everything else at 1: processing slowly stops, the messages are still sitting in the queue before http-client at the same count, while the message count in my named channel increases slowly, about 10 msg per minute, which is very slow.
s3-poller|processor|http-client>queue:batchCacheQueue
The message count in the queue before http-client is not decreasing (stuck at 186174), but messages are slowly coming into batchCacheQueue.
Test case to simulate:
1) I was using the Spring Integration aws-s3 source with a splitter in a composite module | a processor doing XML parsing | http-client with concurrency 100 > named channel.
2) I think a file source might also work: create a single file of a million records and try to pull it from the file.
3) After some 4 to 5 runs we see this exception happening:
Caused by: com.rabbitmq.client.impl.UnknownChannelException: Unknown channel number 23364
We found an issue when channels are churned a lot; you need to increase the channel cache size in the rabbit caching connection factory.
See this answer for a work-around.
I opened a JIRA issue so that the next version of Spring XD will expose this setting in servers.yml, so you don't have to override the bus configuration file.
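For reference, the setting involved is the channel cache size on Spring AMQP's CachingConnectionFactory. The sketch below is illustration only: in Spring XD the value is applied by overriding the message bus XML configuration (per the linked answer), not by constructing the factory yourself.
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory

val connectionFactory = new CachingConnectionFactory("host1", 5672) // host/port taken from the question's config
connectionFactory.setChannelCacheSize(200)                          // the channel cache size the question raised to 200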

Cassandra ArrayIndexOutOfBoundsException

I have created the following schema to represent the association between a user and a set of threads ordered by their last message (which threads the user has read and which ones he has not):
CREATE TABLE table(user_id bigint, message_id bigint, thread_id bigint, read boolean, PRIMARY KEY(user_id, message_id)) WITH CLUSTERING ORDER BY (message_id DESC);
CREATE INDEX ON table(read);
After inserting some values, I try to run this query to get the most recent read or unread threads for a user:
SELECT thread_id, message_id FROM table WHERE user_id = ? AND message_id < ? AND read = ? LIMIT ?
The query works when run via cqlsh. However, when run through the DataStax client, we get a timeout exception on the client side, and on the server side the Cassandra log shows this exception:
ERROR [ReadStage:4190] 2013-12-10 13:18:03,579 CassandraDaemon.java (line 187) Exception in thread Thread[ReadStage:4190,5,main]
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1940)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.cassandra.db.filter.SliceQueryFilter.start(SliceQueryFilter.java:261)
at org.apache.cassandra.db.index.composites.CompositesSearcher.makePrefix(CompositesSearcher.java:66)
at org.apache.cassandra.db.index.composites.CompositesSearcher.getIndexedIterator(CompositesSearcher.java:101)
at org.apache.cassandra.db.index.composites.CompositesSearcher.search(CompositesSearcher.java:53)
at org.apache.cassandra.db.index.SecondaryIndexManager.search(SecondaryIndexManager.java:537)
at org.apache.cassandra.db.ColumnFamilyStore.search(ColumnFamilyStore.java:1669)
at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:109)
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1423)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1936)
... 3 more
Does anyone know what the problem is? Thanks!
This bug can now be tracked at https://issues.apache.org/jira/browse/CASSANDRA-6470

Resources