spring xd losing messages when processing huge volume - spring-integration

I am using Spring XD. My stream looks like below, and I am running tests on 3 container nodes plus 1 admin node, with Rabbit as the transport.
aws-s3|processor1|http-client|processor2>queue:readyQueue
I have created the below taps:
tap1 aws-s3>s3Queue
tap2 processor1>processorQueue1
tap3 http-client>httpQueue
I run the below scenarios in my tests:
Scenario 1: 5 files of 200k = 1 million records
concurrency of http-client=70 and processor2=30
I see 900k messages in s3Queue
I see 889k messages in processorQueue1
I see 886k messages in httpQueue
I see 883k messages in processorQueue2
Messages are lost everywhere, and it's random.
Scenario 2: 5 files of 200k = 1 million records and all module concurrency=1
I see 998,800 messages in s3Queue
I see 998,760 messages in processorQueue1
I see 997,540 messages in httpQueue
I see 997,530 messages in processorQueue2
Even these numbers are random and not consistent.
Scenario 3
I changed the stream as below, with concurrency=1 and 5 files of 200k = 1 million records:
aws-s3 > testQueue
I get all my messages. I ran it 3 times with no issues and got all 1 million messages every time.
Scenario 4
I changed the stream as below, with concurrency=1 and 5 files of 200k = 1 million records:
aws-s3 | processor1 > testQueue2
I get all my messages. I ran it 3 times with no issues and got all 1 million messages every time.
In scenario 3 and scenario 4, data ingestion was faster: it took about 5 minutes to process, and ingestion into the Rabbit transport queue was fast, around 5k msg per sec.
In scenario 1, data ingestion was slower; even the aws-s3 module was pulling the data very slowly, around 300 to 1000 msg per sec.
In scenario 2, aws-s3 pulled data fast, around 3-4k msg per sec, but http-client was slow, around 100 msg per sec.
I suspect XD threading is causing the issues and I am losing messages. Can you please help me solve this?
Update
Scenario 5
I changed reply-timeout to -1 in http-client, and then I lost only 37 messages.
On the second iteration, however, I lost 25,000 messages. I see the below container log when that happened:
2016-03-04T03:42:04-0500 1.2.1.RELEASE ERROR task-scheduler-7 handler.LoggingHandler - org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.integration.amqp.outbound.AmqpOutboundEndpoint#b6700b1]; nested exception is org.springframework.amqp.AmqpIOException: java.io.IOException
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:84)
at org.springframework.xd.dirt.integration.rabbit.RabbitMessageBus$SendingHandler.handleMessageInternal(RabbitMessageBus.java:891)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:116)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:101)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:97)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:77)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:287)
at org.springframework.integration.channel.interceptor.WireTap.preSend(WireTap.java:129)
at org.springframework.integration.channel.AbstractMessageChannel$ChannelInterceptorList.preSend(AbstractMessageChannel.java:392)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:282)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:245)
at sun.reflect.GeneratedMethodAccessor204.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.integration.monitor.DirectChannelMetrics.monitorSend(DirectChannelMetrics.java:114)
at org.springframework.integration.monitor.DirectChannelMetrics.doInvoke(DirectChannelMetrics.java:98)
at org.springframework.integration.monitor.DirectChannelMetrics.invoke(DirectChannelMetrics.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at com.sun.proxy.$Proxy1537.send(Unknown Source)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:115)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:45)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:95)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutput(AbstractMessageProducingHandler.java:231)
at org.springframework.integration.handler.AbstractMessageProducingHandler.produceOutput(AbstractMessageProducingHandler.java:154)
at org.springframework.integration.splitter.AbstractMessageSplitter.produceOutput(AbstractMessageSplitter.java:157)
at org.springframework.integration.handler.AbstractMessageProducingHandler.sendOutputs(AbstractMessageProducingHandler.java:102)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:105)
Caused by: org.springframework.amqp.AmqpIOException: java.io.IOException
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:63)
at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:51)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createBareChannel(CachingConnectionFactory.java:758)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.access$300(CachingConnectionFactory.java:747)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.doCreateBareChannel(CachingConnectionFactory.java:419)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:395)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:364)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:357)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.access$1100(CachingConnectionFactory.java:75)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createChannel(CachingConnectionFactory.java:763)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createChannel(ConnectionFactoryUtils.java:85)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:134)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:67)
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1035)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1028)
at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:540)
at org.springframework.amqp.rabbit.core.RabbitTemplate.convertAndSend(RabbitTemplate.java:635)
at org.springframework.integration.amqp.outbound.AmqpOutboundEndpoint.send(AmqpOutboundEndpoint.java:331)
at org.springframework.integration.amqp.outbound.AmqpOutboundEndpoint.handleRequestMessage(AmqpOutboundEndpoint.java:323)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:99)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
... 93 more
Caused by: java.io.IOException
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:106)
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:102)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:124)
at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:125)
at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:134)
at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:499)
at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:44)
... 112 more
Caused by: com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67)
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:33)
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:348)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:221)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:118)
... 116 more
Caused by: com.rabbitmq.client.impl.UnknownChannelException: Unknown channel number 23364
at com.rabbitmq.client.impl.ChannelManager.getChannel(ChannelManager.java:80)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:552)
... 1 more
2016-03-04T03:42:05-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:5672 connection.CachingConnectionFactory - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'xdbus.tap-s3.tap:stream:stream.batch-aws-s3-source.0' in vhost '/', class-id=50, method-id=20)
2016-03-04T03:53:13-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-03-04T03:53:13-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:5672 connection.CachingConnectionFactory - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'xdbus.tap-s3.tap:stream:stream.batch-aws-s3-source.0' in vhost '/', class-id=50, method-id=20)
2016-03-04T02:57:54-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:8080 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-03-04T02:57:55-0500 1.2.1.RELEASE ERROR AMQP Connection xxx:8080 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-03-04T03:42:04-0500 1.2.1.RELEASE ERROR AMQP Connection yyy:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
Updated
I found the trigger for the message loss: when this exception happens, I see a lot of messages lost. I tested this pattern multiple times, and every time this exception happens I see messages lost. Also, bumping up the concurrency makes this issue occur more often.
2016-03-05T13:59:41-0500 1.2.1.RELEASE ERROR AMQP Connection host1:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
Rabbit configuration:
spring:
  rabbitmq:
    addresses: host1:5672,host2:5672,host3:5672
    adminAddresses: http://host1:15672,http://host2:15672,http://host3:15672
    nodes: rabbit@host1.test.com,rabbit@host2.test.com,rabbit@host2.test.com
    username: test
    password: test
    virtual_host: /
    useSSL: false
    sslProperties:
Update: after increasing the cache size to 200
I added the XML you provided and increased the cache size to 200. This is what happens when processing 1 million and 80k messages: only my http-client concurrency is 100, all other modules are at 1. Processing slowly ground to a halt; the messages are still sitting, at the same count, in the queue before http-client, while the message count in my named channel increases very slowly, around 10 msg per minute.
s3-poller|processor|http-client>queue:batchCacheQueue
The message count in the queue before http-client is not decreasing (stuck at 186174), but messages are slowly trickling into batchCacheQueue.
Test case to simulate:
1) I was using a Spring Integration aws-s3 source with a splitter in a composite module | a processor doing XML parsing | http-client with concurrency 100 > named channel.
2) I think a file source might also work: create a single file of a million records and try to pull it from the file source.
3) After some 4 to 5 runs we see this exception happening:

Caused by: com.rabbitmq.client.impl.UnknownChannelException: Unknown channel number 23364
We found an issue when channels are churned a lot; you need to increase the channel cache size in the Rabbit caching connection factory.
See this answer for a work-around.
I opened a JIRA issue so that the next version of Spring XD will expose this setting in servers.yml, so you don't have to override the bus configuration file.
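As a rough illustration of the work-around, here is a minimal sketch, assuming you override the Rabbit bus connection factory with your own bean definition (the host/port are placeholders; the setter is the standard Spring AMQP CachingConnectionFactory API, and 200 is just an example value):

import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;

// Hypothetical replacement connection factory for the message bus.
CachingConnectionFactory connectionFactory = new CachingConnectionFactory("host1", 5672);
// A larger channel cache reduces channel create/close churn under high concurrency,
// which is what was triggering the UnknownChannelException shown above.
connectionFactory.setChannelCacheSize(200);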

Related

CICS Error Unable to create connection mainframe system

One of my applications is integrated with a mainframe system through CICS/CTG. I am facing an error while executing a request. Also, I have used ASN.1 encoding for the request.
The error I am getting while executing the request is:
com.ibm.connector2.cics.CICSUserInputException: CTG9627E IOException occurred when writing to the Output Record
org.springframework.dao.NonTransientDataAccessResourceException: Unable to create a connection to the remote application; nested exception is com.ibm.connector2.cics.CICSUserInputException:
CTG9627E IOException occurred when writing to the Output Record
com.ibm.connector2.cics.CICSUserInputException: CTG9627E IOException occurred when writing to the Output Record
at com.ibm.connector2.cics.ECIManagedConnection.call(Unknown Source)
at com.ibm.connector2.cics.ECIConnection.call(Unknown Source)
at com.ibm.connector2.cics.ECIInteraction.execute(Unknown Source)
java.io.IOException: messagelength in header greater than existing data length - common area too short?
at com.ibm.connector2.cics.ECIManagedConnection.call(Unknown Source)
at com.ibm.connector2.cics.ECIConnection.call(Unknown Source)
at com.ibm.connector2.cics.ECIInteraction.execute(Unknown Source)
I am using CICS version: c900-20160704-0205
Does anyone have any insights about this?
The error description is available at https://www.ibm.com/docs/en/cics-tg-multi/9.0?topic=SSZHFX_9.0.0/cclaj/CTG9627E.htm
It seems like the data you are passing is not an instance of javax.resource.cci.Streamable. Could you verify that?
Solved the issue with the below resolution.
"messagelength in header greater than existing data length - common area too short?" - as this error says, the message length is too short, so I tried to increase the length of the common area (COMMAREA) as per this documentation: https://www.ibm.com/docs/en/cics-ts/5.6?topic=applications-transferring-data-between-programs-using-channels
I added code in the CTG service executor >> CTG record:
setCommonAreaLength(32500)
After applying this resolution the issue is resolved.
Hope this answer helps someone.
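For illustration only, a minimal sketch of where such a length is typically set when calling CICS through the standard CTG ECI/CCI classes. The "CTG service executor" and "CTG record" in the answer above are the poster's own wrappers, so their names are not reproduced here; the program name and connection factory below are placeholders, and the standard interaction-spec setter is setCommareaLength:

import javax.resource.cci.Connection;
import javax.resource.cci.ConnectionFactory;
import javax.resource.cci.Interaction;
import javax.resource.cci.Record;
import com.ibm.connector2.cics.ECIInteractionSpec;

// Hypothetical helper inside your service class: cf, request and response come from your existing CTG/JCA setup.
public Record callCics(ConnectionFactory cf, Record request, Record response) throws Exception {
    Connection connection = cf.getConnection();
    try {
        Interaction interaction = connection.createInteraction();
        ECIInteractionSpec spec = new ECIInteractionSpec();
        spec.setFunctionName("MYPROG");                                // placeholder CICS program name
        spec.setInteractionVerb(ECIInteractionSpec.SYNC_SEND_RECEIVE);
        spec.setCommareaLength(32500);                                 // size the COMMAREA large enough for the reply
        interaction.execute(spec, request, response);
        return response;
    } finally {
        connection.close();
    }
}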

How to fix unexpected hazelcast client shutdown

I'm using Hazelcast Jet to perform aggregations on stream data. The problem is that the Hazelcast client shuts down unexpectedly.
I've implemented a simple pipeline with a remote map source, and the result is simply written to a sink.
// init pipeline
Pipeline p = Pipeline.create();
// configure source
BatchSource remoteBatchMap = Sources.remoteMap(<my remote map>, <my config>);
// add source and sink to pipeline
p.drawFrom(remoteBatchMap).drainTo(Sinks.map(SINK_MAP_NAME));
On the client side, output is as expected for the first ca. 30 seconds. Then the shutdown happens, and from then on the printed values freeze. OK, that is logical, as the client has been shut down. But how do I prevent the shutdown?
2019-07-25 14:22:18,214 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 254/41254
2019-07-25 14:22:19,359 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 262/41254
2019-07-25 14:22:20,496 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 269/41259
2019-07-25 14:22:20,786 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] HazelcastClient 3.12.1 (20190611 - 0a0ee66) is SHUTTING_DOWN
2019-07-25 14:22:20,791 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] Removed connection to endpoint: [192.168.41.3]:5701, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/192.168.26.78:64217->/192.168.41.3:5701}, remoteEndpoint=[192.168.41.3]:5701, lastReadTime=2019-07-25 14:22:19.980, lastWriteTime=2019-07-25 14:22:19.855, closedTime=2019-07-25 14:22:20.789, connected server version=3.12.1}
2019-07-25 14:22:20,794 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] Removed connection to endpoint: [192.168.41.4]:5701, connection: ClientConnection{alive=false, connectionId=2, channel=NioChannel{/192.168.26.78:64218->/192.168.41.4:5701}, remoteEndpoint=[192.168.41.4]:5701, lastReadTime=2019-07-25 14:22:20.525, lastWriteTime=2019-07-25 14:22:20.376, closedTime=2019-07-25 14:22:20.793, connected server version=3.12.1}
2019-07-25 14:22:20,797 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] HazelcastClient 3.12.1 (20190611 - 0a0ee66) is SHUTDOWN
2019-07-25 14:22:20,802 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] [192.168.1.66]:5701 [jet] [3.1] Execution of job '8dc4-d1e2-df66-a444', execution 9622-ba74-b907-150c completed in 42,335 ms
2019-07-25 14:22:21,635 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
2019-07-25 14:22:22,771 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
2019-07-25 14:22:23,909 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
On the server side it says that the connection was closed by the other side, i.e. by my client:
2019-07-25 14:22:21.909 INFO 21375 --- [hz.betex.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [192.168.41.3]:5701 [app] [3.1] Connection[id=159, /192.168.41.3:5701->192.168.26.78/192.168.26.78:64217, qualifier=null, endpoint=[192.168.26.78]:64217, alive=false, type=JAVA_CLIENT] closed. Reason: Connection closed by the other side
2019-07-25 14:22:21.910 INFO 21375 --- [hz.betex.event-14] c.h.client.impl.ClientEndpointManager : [192.168.41.3]:5701 [app] [3.1] Destroying ClientEndpoint{connection=Connection[id=159, /192.168.41.3:5701->192.168.26.78/192.168.26.78:64217, qualifier=null, endpoint=[192.168.26.78]:64217, alive=false, type=JAVA_CLIENT], principal='ClientPrincipal{uuid='c5286586-cbe2-4c84-8e74-4c2f1f59310a', ownerUuid='ebce22c4-ed31-4ccf-9808-b19005dc55f8'}, ownerConnection=true, authenticated=true, clientVersion=3.12.1, creationTime=1564057300564, latest statistics=null}
I'd be very happy to get some orientation and ideas on where to look for the problem.
If you haven't already wrapped the code in a try/catch, I'd try that. I remember running into something similar but can't recall the root cause; it may have been a ClassCastException or something serialization-related. There wasn't any clue in the output but once I added the try/catch and dumped a stack trace the issue was obvious.
The cluster is independent of the clients. A Jet client can be used to submit a job and to monitor it, but if the client shuts down, the cluster isn't affected and the job continues to run.
You don't share your code, but you probably shut down the client yourself. You need to fix your code.
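To illustrate the point above, a minimal sketch of the usual submit pattern, assuming the Jet 3.x client API already used in the question; the try/catch echoes the suggestion in the earlier answer, and the shutdown in the finally block is the kind of call that produces the SHUTTING_DOWN log:

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;

JetInstance jet = Jet.newJetClient();     // client used only to submit and monitor the job
try {
    Pipeline p = Pipeline.create();
    // ... drawFrom(remote map source) / drainTo(sink) as in the question ...
    jet.newJob(p).join();                 // blocks until the batch job completes; rethrows job failures
} catch (Exception e) {
    e.printStackTrace();                  // surfaces e.g. serialization problems that are otherwise silent
} finally {
    jet.shutdown();                       // logs "HazelcastClient ... is SHUTTING_DOWN"; per the answer
                                          // above, the cluster-side job itself is not affected
}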

Failing Cassandra INSERT and UPDATE statements - Unexpected Exception

I've recently been getting the following error in the system.log files on both my production and demo clusters. Each cluster has 2 nodes, and the replication factor is 2. No changes have been made to my knowledge. I cannot figure out the reason behind the error. It is causing INSERT and UPDATE statements to fail.
[SharedPool-Worker-27] ERROR org.apache.cassandra.transport.Message - Unexpected exception during request; channel = [id: 0xeb429d31, /14.0.0.1:34495 => /14.0.0.2:9042]
java.lang.AssertionError: -2146739295
at org.apache.cassandra.db.BufferExpiringCell.<init>(BufferExpiringCell.java:46) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.db.BufferExpiringCell.<init>(BufferExpiringCell.java:39) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.db.AbstractCell.create(AbstractCell.java:176) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.UpdateParameters.makeColumn(UpdateParameters.java:65) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.Constants$Setter.execute(Constants.java:314) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:110) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:57) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.getMutations(ModificationStatement.java:708) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:495) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:481) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:493) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:138) ~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.10.jar:2.1.10]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.10.jar:2.1.10]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
These are async requests. On the client side I'm seeing a failure on the future as well. I'm using cassandra-2.1.10. I haven't done a rolling restart of the nodes yet, but I don't think that will fix the problem.
I also noticed that the failed inserts/updates seem to happen after a few successful ones. The request statements themselves (formatting) are fine. Any help would be appreciated.
UPDATE: I've looked into the cassandra source code. It contains the following:
assert timeToLive > 0 : timeToLive;
assert localExpirationTime > 0 : localExpirationTime;
Looks like it's failing on the second assert statement. The table has a TTL value set in its properties for 1728000 seconds. No ttl is being set in the insert/update statements. So I don't understand why some of the statements are failing on this assert.
EDIT: on the client applications I see the following error messages:
Client 1 connects to cluster 1:
16:36:01.102 [New I/O worker #64] WARN - /14.0.0.2:9042 replied with server error (java.lang.AssertionError: -2146571535), trying next host
Client 2 connects to cluster 2:
16:30:01.302 [cluster1-nio-worker-7] WARN - /14.0.0.4:9042 replied with server error (java.lang.AssertionError: -2146571895), defuncting connection.
I believe what is happening is that when the above errors occur, the client drops the connection and reconnects. During this time other async requests fail.
One of the tables had its 'default_time_to_live' set to about 19 years. The reason behind the problem was the 2038 timestamp problem. Even though the TTL value on each cell is the number of seconds left, it seems Cassandra internally converts the expiry time into a timestamp. So current timestamp + a TTL of 19+ years = a timestamp beyond 19 January 2038. This was causing the overflow and the negative number seen in the exception above. Reducing the default TTL value on the table fixed the problem and stopped the assertion error from happening.
It seemed that a few assertion errors would cause the connections to reset, and in the meanwhile other writes would fail.
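A minimal sketch of the overflow described above, with illustrative (not exact) numbers: Cassandra 2.1 stores a cell's local expiration time as a signed 32-bit count of seconds since the epoch, so any expiry past 19 January 2038 wraps negative and trips the assertion quoted in the question.

// Illustrative values only; the real numbers depend on the table's default_time_to_live.
int nowSeconds = 1_457_000_000;                     // roughly 2016-03-04, in seconds since the epoch
int ttlSeconds = 22 * 365 * 24 * 60 * 60;           // a multi-decade TTL (~693,792,000 seconds)
int localExpirationTime = nowSeconds + ttlSeconds;  // 2,150,792,000 does not fit in a signed int...
System.out.println(localExpirationTime);            // ...so it wraps to -2,144,175,296
// Cassandra 2.1 then fails "assert localExpirationTime > 0", producing the AssertionError above.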

Logstash 6.2 - full persistent queue (wrong mapping?)

My queue is almost full, and I see these errors in my log file:
[2018-05-16T00:01:33,334][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"2018.05.15-el-mg_papi-prod", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x608d85c1>], :response=>{"index"=>{"_index"=>"2018.05.15-el-mg_papi-prod", "_type"=>"doc", "_id"=>"mHvSZWMB8oeeM9BTo0V2", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [papi_request_json.query.disableFacets]", "caused_by"=>{"type"=>"i_o_exception", "reason"=>"Current token (VALUE_TRUE) not numeric, can not use numeric value accessors\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#56b8442f; line: 1, column: 555]"}}}}}
[2018-05-16T00:01:37,145][INFO ][org.logstash.beats.BeatsHandler] [local: 0:0:0:0:0:0:0:1:5000, remote: 0:0:0:0:0:0:0:1:50222] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 69
[2018-05-16T00:01:37,147][INFO ][org.logstash.beats.BeatsHandler] [local: 0:0:0:0:0:0:0:1:5000, remote: 0:0:0:0:0:0:0:1:50222] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 84
...
[2018-05-16T15:28:09,981][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2018-05-16T15:28:09,982][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2018-05-16T15:28:09,982][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
If I understand the first warning, the problem is with the mapping. I have a lot of files in my Logstash queue folder. My questions are:
How do I empty my queue? Can I just delete all the files from the Logstash queue folder (all logs will be lost) and then resend all the data to Logstash to the proper index?
How can I determine where exactly the problem in the mapping is, or which servers are sending data of the wrong type?
I have a pipeline on port 5000 named testing-pipeline, used only to check from Nagios whether Logstash is active. What are those [INFO ][org.logstash.beats.BeatsHandler] logs?
If I understand correctly, the [INFO ][logstash.outputs.elasticsearch] lines are just logs about retrying to process the Logstash queue?
All servers run Filebeat 6.2.2. Thank you for your help.
All the pages in the queue could be deleted, but that is not the proper solution. In my case, the queue was full because there were events whose mapping differed from the index's mapping. In Elasticsearch 6 you cannot send documents with a different mapping to the same index, so these events piled up in the queue (even if there is only one wrong event, all the others will not be processed). So how do you process all the data you can and skip the wrong events? The solution is to configure a DLQ (dead letter queue). Every event with response code 400 or 404 is moved to the DLQ so the others can be processed. The data from the DLQ can be processed later with a separate pipeline.
The wrong mapping can be identified from the error log "error"=>{"type"=>"mapper_parsing_exception", ..... }. To pinpoint exactly where the mapping is wrong, you have to compare the mapping of the events with that of the indices.
The [INFO ][org.logstash.beats.BeatsHandler] messages were caused by the Nagios server. The check did not consist of a valid request, which is why the exception was handled and logged. The check should test whether the Logstash service is active; now I check the Logstash service on localhost:9600 instead (for more info, see here).
[INFO ][logstash.outputs.elasticsearch] means that Logstash is trying to process the queue but the index is locked ([FORBIDDEN/12/index read-only / allow delete (api)]) because the indices were set to a read-only state. When there is not enough space on the server, Elasticsearch automatically configures indices as read-only. This can be changed via cluster.routing.allocation.disk.watermark.low (for more info, see here).

kafka throwing error suddenly for one particular topic which has worked previously

When I am pushing data, the following topic throws the error below, while the other topics are working fine. Please let me know the issue. Thanks.
WARN Failed to send producer request with correlation id 2 to broker 1003 with data for partitions [TelematicsEvent,0] (kafka.producer.async.DefaultEventHandler)
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:122)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:77)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:76)
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:107)
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:107)
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:107)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:106)
at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:106)
at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:106)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.producer.SyncProducer.send(SyncProducer.scala:105)
at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:257)
at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:108)
at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:100)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:100)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:73)
at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88)
at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68)
at scala.collection.immutable.Stream.foreach(Stream.scala:547)
at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67)
at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45)
