We have tasks that have slowed down tremendously and we are unable to figure out where's the issue. The thread dumps always show the following
java.lang.Thread.State: RUNNABLE
at org.apache.spark.unsafe.Platform.copyMemory(Platform.java:255)
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.writeUnalignedBytes(UnsafeWriter.java:129)
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_2$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.window.WindowExec$$anon$1.next(WindowExec.scala:206)
at org.apache.spark.sql.execution.window.WindowExec$$anon$1.next(WindowExec.scala:122)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage18.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at org.apache.spark.sql.execution.window.WindowExec$$anon$1.fetchNextRow(WindowExec.scala:133)
at org.apache.spark.sql.execution.window.WindowExec$$anon$1.fetchNextPartition(WindowExec.scala:163)
at org.apache.spark.sql.execution.window.WindowExec$$anon$1.next(WindowExec.scala:188)
at org.apache.spark.sql.execution.window.WindowExec$$anon$1.next(WindowExec.scala:122)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage19.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage19.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
AQE is enabled. The tasks are not processing a lot of data though. How could I debug this?
Related
I have an issue wherein driver containers are being killed in such a way that the child processes (ie the spark driver) are not being cleaned up appropriately. This leaves leaked executors. However, my understanding is that if executors should lose contact with the driver they should self destruct. Indeed they claim to do so, but then... do not.
21/02/25 17:43:59 INFO TransportClientFactory: Successfully created connection to /10.178.144.144:37811 after 2 ms (0 ms spent in bootstraps)
21/02/25 17:43:59 INFO ShuffleBlockFetcherIterator: Started 8 remote fetches in 48 ms
21/02/25 17:43:59 INFO CodeGenerator: Code generated in 21.223892 ms
21/02/25 17:43:59 INFO Executor: Finished task 0.0 in stage 2.0 (TID 97). 2661 bytes result sent to driver
21/02/26 19:56:58 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver 10.205.36.205:32000 disassociated! Shutting down.
21/02/26 19:56:58 INFO MemoryStore: MemoryStore cleared
21/02/26 19:56:58 INFO BlockManager: BlockManager stopped
However, the executor docker container is still running and the executor process is still running inside. No further output.
Edit: I've tracked this down to a deadlock in a shutdown hook
"Thread-2" #26 prio=5 os_prio=0 tid=0x00007f6410231800 nid=0x2a2 waiting on condition [0x00007f63c3bf1000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c05a47b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
at org.apache.spark.rpc.netty.MessageLoop.stop(MessageLoop.scala:60)org.apache.spark.rpc.netty.Dispatcher.$anonfun$stop$1(Dispatcher.scala:190)
at org.apache.spark.rpc.netty.Dispatcher.$anonfun$stop$1$adapted(Dispatcher.scala:187)
at org.apache.spark.rpc.netty.Dispatcher$$Lambda$214/337533935.apply(Unknown Source)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.rpc.netty.Dispatcher.stop(Dispatcher.scala:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.cleanup(NettyRpcEnv.scala:324)
at org.apache.spark.rpc.netty.NettyRpcEnv.shutdown(NettyRpcEnv.scala:302)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
at org.apache.spark.executor.Executor.stop(Executor.scala:292)
at org.apache.spark.executor.Executor.$anonfun$new$2(Executor.scala:74)
at org.apache.spark.executor.Executor$$Lambda$317/1046854795.apply$mcV$sp(Unknown Source)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$Lambda$2192/1832515374.apply$mcV$sp(Unknown Source)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$Lambda$2191/952019066.apply$mcV$sp(Unknown Source)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
We are facing issue in our production environment with multiple blocking threads and while checking the thread dumps have found the below stack trace where in 14 blocking threads were present with same description. Any help with the below is much appreciated.
We are using weblogic 10.3.6.0 version and JDK 1.7.0_80.
ExecuteThread: '387' is blocked because ExecuteThread: '159' is already holding the lock and is long-running.
======================
"[ACTIVE] ExecuteThread: '387' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002ac72d4f4000 nid=0x530d waiting for monitor entry [0x00002ac767c38000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.beans.factory.support.AbstractBeanFactory.getMergedBeanDefinition(AbstractBeanFactory.java:1030)
- waiting to lock <0x000000051ea30728> (a java.util.concurrent.ConcurrentHashMap)
at org.springframework.beans.factory.support.AbstractBeanFactory.getMergedBeanDefinition(AbstractBeanFactory.java:1013)
at org.springframework.beans.factory.support.AbstractBeanFactory.getMergedLocalBeanDefinition(AbstractBeanFactory.java:999)
at org.springframework.beans.factory.support.AbstractBeanFactory.isTypeMatch(AbstractBeanFactory.java:433)
..
"[ACTIVE] ExecuteThread: '159' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002ac7491f5800 nid=0x4781 runnable [0x00002ac7540fd000]
java.lang.Thread.State: RUNNABLE
at org.springframework.beans.factory.support.AbstractBeanFactory.getMergedBeanDefinition(AbstractBeanFactory.java:1030)
- locked <0x000000051ea30728> (a java.util.concurrent.ConcurrentHashMap)
at org.springframework.beans.factory.support.AbstractBeanFactory.getMergedBeanDefinition(AbstractBeanFactory.java:1013)
at org.springframework.beans.factory.support.AbstractBeanFactory.getMergedLocalBeanDefinition(AbstractBeanFactory.java:999)
..
at com.opensymphony.xwork2.spring.SpringObjectFactory.buildBean(SpringObjectFactory.java:151)
[ACTIVE] ExecuteThread: '359' for queue: 'weblogic.kernel.Default (self-tuning)'
[ACTIVE] ExecuteThread: '359' for queue: 'weblogic.kernel.Default (self-tuning)'
Stack Trace is:
java.lang.Thread.State: RUNNABLE
at oracle.jdbc.driver.OracleResultSetImpl.getString(OracleResultSetImpl.java:2818)
- locked <0x0000000566efc058> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleResultSet.getString(OracleResultSet.java:1215)
at weblogic.jdbc.wrapper.ResultSet_oracle_jdbc_driver_OracleResultSetImpl.getString(Unknown Source)
at com.ibatis.sqlmap.engine.type.StringTypeHandler.getResult(StringTypeHandler.java:35)
at com.ibatis.sqlmap.engine.mapping.result.ResultMap.getPrimitiveResultMappingValue(ResultMap.java:619)
at com.ibatis.sqlmap.engine.mapping.result.ResultMap.getResults(ResultMap.java:345)
at com.ibatis.sqlmap.engine.mapping.result.AutoResultMap.getResults(AutoResultMap.java:47)
- locked <0x00000005205e0968> (a com.ibatis.sqlmap.engine.mapping.result.AutoResultMap)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleResults(SqlExecutor.java:384)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleMultipleResults(SqlExecutor.java:300)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:189)
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.sqlExecuteQuery(MappedStatement.java:221)
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryWithCallback(MappedStatement.java:189)
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryForList(MappedStatement.java:139)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForList(SqlMapExecutorDelegate.java:567)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForList(SqlMapExecutorDelegate.java:541)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForList(SqlMapSessionImpl.java:118)
at org.springframework.orm.ibatis.SqlMapClientTemplate$3.doInSqlMapClient(SqlMapClientTemplate.java:298)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:209)
at org.springframework.orm.ibatis.SqlMapClientTemplate.executeWithListResult(SqlMapClientTemplate.java:249)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryForList(SqlMapClientTemplate.java:296)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryForList(SqlMapClientTemplate.java:290)
at com.dms.common.masters.daos.HomeDaoImpl.populateDropDowns(Unknown Source)
at com.dms.common.masters.daos.HomeDaoImpl$$FastClassByCGLIB$$dda6d555.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:696)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:631)
at com.dms.common.masters.daos.HomeDaoImpl$$EnhancerByCGLIB$$7cce49ec.populateDropDowns(<generated>)
at com.dms.common.masters.business.HomeBusinessImpl.populateDropDowns(Unknown Source)
at com.dms.common.masters.service.HomeServiceImpl.populateDropDowns(Unknown Source)
at com.dms.common.masters.action.HomeAction.menu(Unknown Source)
at sun.reflect.GeneratedMethodAccessor54415.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.opensymphony.xwork2.DefaultActionInvocation.invokeAction(DefaultActionInvocation.java:450)
at com.opensymphony.xwork2.DefaultActionInvocation.invokeActionOnly(DefaultActionInvocation.java:289)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:252)
at org.apache.struts2.interceptor.MultiselectInterceptor.intercept(MultiselectInterceptor.java:73)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.ConversionErrorInterceptor.intercept(ConversionErrorInterceptor.java:138)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.ParametersInterceptor.doIntercept(ParametersInterceptor.java:254)
at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:98)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.StaticParametersInterceptor.intercept(StaticParametersInterceptor.java:191)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at org.apache.struts2.interceptor.CheckboxInterceptor.intercept(CheckboxInterceptor.java:91)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at org.apache.struts2.interceptor.FileUploadInterceptor.intercept(FileUploadInterceptor.java:252)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.ModelDrivenInterceptor.intercept(ModelDrivenInterceptor.java:100)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.ScopedModelDrivenInterceptor.intercept(ScopedModelDrivenInterceptor.java:141)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.ChainingInterceptor.intercept(ChainingInterceptor.java:145)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.I18nInterceptor.intercept(I18nInterceptor.java:139)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.PrepareInterceptor.doIntercept(PrepareInterceptor.java:171)
at com.opensymphony.xwork2.interceptor.MethodFilterInterceptor.intercept(MethodFilterInterceptor.java:98)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at org.apache.struts2.interceptor.ServletConfigInterceptor.intercept(ServletConfigInterceptor.java:164)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.AliasInterceptor.intercept(AliasInterceptor.java:193)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.opensymphony.xwork2.interceptor.ExceptionMappingInterceptor.intercept(ExceptionMappingInterceptor.java:189)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at com.utils.LoginInterceptor.intercept(Unknown Source)
at com.opensymphony.xwork2.DefaultActionInvocation.invoke(DefaultActionInvocation.java:246)
at org.apache.struts2.impl.StrutsActionProxy.execute(StrutsActionProxy.java:54)
at org.apache.struts2.dispatcher.Dispatcher.serviceAction(Dispatcher.java:562)
at org.apache.struts2.dispatcher.ng.ExecuteOperations.executeAction(ExecuteOperations.java:77)
at org.apache.struts2.dispatcher.ng.filter.StrutsPrepareAndExecuteFilter.doFilter(StrutsPrepareAndExecuteFilter.java:99)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at com.utils.SetResponseBufferFilter.doFilter(Unknown Source)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.security.jps.ee.http.JpsAbsFilter$1.run(JpsAbsFilter.java:119)
at java.security.AccessController.doPrivileged(Native Method)
at oracle.security.jps.util.JpsSubject.doAsPrivileged(JpsSubject.java:324)
at oracle.security.jps.ee.util.JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:460)
at oracle.security.jps.ee.http.JpsAbsFilter.runJaasMode(JpsAbsFilter.java:103)
at oracle.security.jps.ee.http.JpsAbsFilter.doFilter(JpsAbsFilter.java:171)
at oracle.security.jps.ee.http.JpsFilter.doFilter(JpsFilter.java:71)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.dms.servlet.DMSServletFilter.doFilter(DMSServletFilter.java:163)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at weblogic.servlet.internal.RequestEventsFilter.doFilter(RequestEventsFilter.java:27)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3748)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3714)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2283)
at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2182)
at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1499)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:263)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
Locked ownable synchronizers:
- None
Blocked threads are not necessarily a concern, in java when handling concurrent requests and using singleton classes or hashmaps etc its natural to have some blocking as a means to achieve sanity in data...
When to take thread dumps -
When weblogic server is reporting stuck threads, or you are getting general slowness in response (though many other factors contribute to this)
How to take thread dumps -
Always take 3-4 thread dumps spaced 1-2 secs apart. Looking at 1 momentary thread dump usually does not provide the complete picture.. If you notice the same threads are in blocked state in almost all 3-4 dumps, check which thread is holding the lock and what it is doing, is the thread holding the lock doing the same task in all 4 dumps, if so you got the clue on what to troubleshoot..
In the thread dump you gave..
ExecuteThread: '387' is waiting on
waiting to lock <0x000000051ea30728>
and ExecuteThread: '159' is holding locked <0x000000051ea30728> but 159 is in Runnable state, meaning its not stuck and doing its normal operations..
had you taken several thread dumps and saw 159 holding the lock for far too long you need to consider tuning your application and think about a workaround if possible around this lock.
The problem seems to be related to the lock on com.ibatis.sqlmap.engine.mapping.result.AutoResultMap
suggesting that part of the code is synchronized, meaning that ONLY one thread at a time will be executed.
If you collect a thread dump, you'll see several threads are waiting for the lock and only one is executing something like:
"[ACTIVE] ExecuteThread: '339' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=REMOVED nid=REMOVED runnable [REMOVED]
java.lang.Thread.State: RUNNABLE
at weblogic.jdbc.wrapper.ResultSet_oracle_jdbc_driver_OracleResultSetImpl.wasNull(Unknown Source)
at com.ibatis.sqlmap.engine.type.StringTypeHandler.getResult(StringTypeHandler.java:36)
at com.ibatis.sqlmap.engine.mapping.result.ResultMap.getPrimitiveResultMappingValue(ResultMap.java:619)
at com.ibatis.sqlmap.engine.mapping.result.ResultMap.getResults(ResultMap.java:345)
at com.ibatis.sqlmap.engine.mapping.result.AutoResultMap.getResults(AutoResultMap.java:47)
- locked <0x00000004544b7328> (a com.ibatis.sqlmap.engine.mapping.result.AutoResultMap)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleResults(SqlExecutor.java:384)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleMultipleResults(SqlExecutor.java:300)
...
That leads us to another question, does IBatis support multithread? Based on the following link the answer is no:
Mybatis does not work in multithreaded programming
In standard usage of MyBatis core module, the SqlSession and mapper object does not support the thread-safe. This is design. The MyBatis provide the SqlSessionManager for supporting the thread-safe as optional component.
https://github.com/mybatis/mybatis-3/issues/1529
I've encountered strange behaviour in Hazelcast Jet. I'm starting many jobs at once (~30, some are triggered slightly before other). However, when my Hazelcast Jet job count hits 26 (magic number?), all processing got stuck.
In the threadumps I see following info:
"hz._hzInstance_1_jet.cached.thread-1" #37 prio=5 os_prio=0 cpu=1093.29ms elapsed=393.62s tid=0x00007f95dc007000 nid=0x6bfc in Object.wait() [0x00007f95e6af4000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base#11.0.2/Native Method)
- waiting on <no object reference available>
at com.hazelcast.spi.impl.AbstractCompletableFuture.get(AbstractCompletableFuture.java:229)
- waiting to re-lock in wait() <0x00000007864b7040> (a com.hazelcast.internal.util.SimpleCompletableFuture)
at com.hazelcast.spi.impl.AbstractCompletableFuture.get(AbstractCompletableFuture.java:191)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:88)
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:385)
at com.hazelcast.map.impl.proxy.MapProxySupport.clearInternal(MapProxySupport.java:1016)
at com.hazelcast.map.impl.proxy.MapProxyImpl.clearInternal(MapProxyImpl.java:109)
at com.hazelcast.map.impl.proxy.MapProxyImpl.clear(MapProxyImpl.java:698)
at com.hazelcast.jet.impl.JobRepository.clearSnapshotData(JobRepository.java:464)
at com.hazelcast.jet.impl.MasterJobContext.tryStartJob(MasterJobContext.java:233)
at com.hazelcast.jet.impl.JobCoordinationService.tryStartJob(JobCoordinationService.java:776)
at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitJob$0(JobCoordinationService.java:200)
at com.hazelcast.jet.impl.JobCoordinationService$$Lambda$634/0x00000008009ce840.run(Unknown Source)
and also:
"hz._hzInstance_1_jet.async.thread-2" #81 prio=5 os_prio=0 cpu=0.00ms elapsed=661.98s tid=0x0000025bb23ef000 nid=0x43bc in Object.wait() [0x0000005d492fe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base#11/Native Method)
- waiting on <no object reference available>
at com.hazelcast.spi.impl.AbstractCompletableFuture.get(AbstractCompletableFuture.java:229)
- waiting to re-lock in wait() <0x0000000725600100> (a com.hazelcast.internal.util.SimpleCompletableFuture)
at com.hazelcast.spi.impl.AbstractCompletableFuture.get(AbstractCompletableFuture.java:191)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:88)
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:385)
at com.hazelcast.map.impl.proxy.MapProxySupport.removeAllInternal(MapProxySupport.java:619)
at com.hazelcast.map.impl.proxy.MapProxyImpl.removeAll(MapProxyImpl.java:285)
at com.hazelcast.jet.impl.JobRepository.deleteJob(JobRepository.java:332)
at com.hazelcast.jet.impl.JobRepository.completeJob(JobRepository.java:316)
at com.hazelcast.jet.impl.JobCoordinationService.completeJob(JobCoordinationService.java:576)
at com.hazelcast.jet.impl.MasterJobContext.lambda$finalizeJob$13(MasterJobContext.java:620)
at com.hazelcast.jet.impl.MasterJobContext$$Lambda$783/0x0000000800b26840.run(Unknown Source)
at com.hazelcast.jet.impl.MasterJobContext.finalizeJob(MasterJobContext.java:632)
at com.hazelcast.jet.impl.MasterJobContext.onCompleteExecutionCompleted(MasterJobContext.java:564)
at com.hazelcast.jet.impl.MasterJobContext.lambda$invokeCompleteExecution$6(MasterJobContext.java:544)
at com.hazelcast.jet.impl.MasterJobContext$$Lambda$779/0x0000000800b27840.accept(Unknown Source)
at com.hazelcast.jet.impl.MasterContext.lambda$invokeOnParticipants$0(MasterContext.java:242)
at com.hazelcast.jet.impl.MasterContext$$Lambda$726/0x0000000800a1c040.accept(Unknown Source)
at com.hazelcast.jet.impl.util.Util$2.onResponse(Util.java:172)
at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#11/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#11/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#11/Thread.java:834)
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
I don't have any other info how to reproduce this issue, however I hope someone will know how to fix this or my question will help someone else :)
My setup:
- Java 11
- Hazelcast 3.12 Snapshot
- Hazelcast Jet 3.0 Snapshot (I can't revert to previous version, it will break my logic; I need n:m joins which will be added in 3.1)
- CPU cores: 4
- RAM: 7 GB
- Jet mode: server, connects to other cluster as a client to insert final data.
Did anyone encounter similar issue? The problem is, it cannot be simply reproduces, thus it's hard to create an issue for Hazelcast Team. Only threaddumps and general behaviour can give a hint what is going on.
This was an issue in 3.0-SNAPSHOT during development and was fixed in the 3.0 release.
After a load test in my application, my application server CPU is hogging at 50% always even though nothing is running in the system after the test. When I look at the thread dump, I can see at least 24 BLOCKED threads in the dump. Most of them are related to spring framework, mule framework and so on....nothing specific to my application.
Please see below for a few stack traces of the thread dump.
"ajp-10.40.44.47-8009-116" daemon prio=6 tid=0x000000000cdce800 nid=0x7ac waiting for monitor entry [0x000000003cf1e000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:291)
- waiting to lock <0x00000000c274dab8> (a org.jboss.web.tomcat.service.WebCtxLoader$ENCLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.mule.util.ClassUtils.loadClass(ClassUtils.java:340)
at org.mule.util.ClassUtils.instanciateClass(ClassUtils.java:437)
at org.mule.transport.service.DefaultTransportServiceDescriptor.createEndpointURIBuilder(DefaultTransportServiceDescriptor.java:448)
at org.mule.endpoint.MuleEndpointURI.initialise(MuleEndpointURI.java:222)
at org.mule.transport.servlet.MuleReceiverServlet.getReceiverForURI(MuleReceiverServlet.java:308)
at org.mule.transport.servlet.MuleReceiverServlet.doAllMethods(MuleReceiverServlet.java:241)
at org.mule.transport.servlet.MuleReceiverServlet.doPost(MuleReceiverServlet.java:198)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
at org.mule.transport.servlet.MuleReceiverServlet.service(MuleReceiverServlet.java:180)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
"financials.pctogl.staging.partitioned.jms.taskExecutor-9" prio=6 tid=0x0000000009b8f000 nid=0x1730 waiting for monitor entry [0x000000002cf1f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.integration.util.ErrorHandlingTaskExecutor$1.run(ErrorHandlingTaskExecutor.java:52)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
"[powercomp].iip.flow.drools.service.stage1.01" prio=6 tid=0x00000000063af800 nid=0x1608 waiting for monitor entry [0x000000002979f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.mule.work.WorkerContext.run(WorkerContext.java:315)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
"task-scheduler-10" prio=6 tid=0x000000000d075000 nid=0xcf0 waiting for monitor entry [0x000000002649f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.integration.channel.MessagePublishingErrorHandler.handleError(MessagePublishingErrorHandler.java:83)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:59)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
"UDP ucast,10.40.44.47:61710" prio=6 tid=0x0000000009b82800 nid=0xeb0 waiting for monitor entry [0x000000001771e000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:201)
- waiting to lock <0x00000000a197e360> (a org.apache.log4j.spi.RootLogger)
at org.apache.log4j.Category.forcedLog(Category.java:388)
at org.apache.log4j.Category.log(Category.java:853)
at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.commons.logging.impl.Log4jProxy.log(Log4jProxy.java:301)
at org.apache.commons.logging.impl.Log4jProxy.error(Log4jProxy.java:283)
at org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:208)
at org.jgroups.protocols.TP.down(TP.java:1205)
at org.jgroups.protocols.TP$ProtocolAdapter.down(TP.java:2316)
at org.jgroups.protocols.Discovery.down(Discovery.java:374)
at org.jgroups.protocols.MERGE2.down(MERGE2.java:175)
at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:362)
at org.jgroups.protocols.FD.sendHeartbeatResponse(FD.java:325)
at org.jgroups.protocols.FD.up(FD.java:235)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:264)
at org.jgroups.protocols.PING.up(PING.java:273)
at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2327)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1261)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1838)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1817)
at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1746)
at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:39)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.jgroups.protocols.TP.dispatchToThreadPool(TP.java:1366)
at org.jgroups.protocols.TP.receive(TP.java:1333)
at org.jgroups.protocols.UDP$UcastReceiver.run(UDP.java:960)
at java.lang.Thread.run(Thread.java:662)
I've just refactored a project in Eclipse to use packages instead of the default package to better organise my code. I have a test program which creates a number of Runnable objects, each of which are run sequentially to each other, but in parallel with the main program.
Before refactoring this worked fine, with each thread dutifully performing its task. However, since moving things into packages I'm getting a ClassNotFound exception as soon as one of the Runnable classes attempts to use a class from another package. Stacktrace follows:
java.lang.ClassNotFoundException: Tweet
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at java.io.ObjectInputStream.resolveClass(Unknown Source)
at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
at java.io.ObjectInputStream.readClassDesc(Unknown Source)
at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at indexing.TweetCleanup.run(TweetCleanup.java:84)
at java.lang.Thread.run(Unknown Source)
In this trace 'TweetCleanup' is the Runnable class, and 'Tweet' is the class not found. I've included it in TweetCleanup as 'common.Tweet' (where common is the package). I've used a separate test program to see if it can see the class in the main thread, which runs successfully.
All I can think of is that the Thread needs to be supplied some classpath to 'see' the Tweet class, however this wasn't the case before refactoring as packages. I believe the default behaviour of a child Thread is to use the classpath of its parent, which includes 'common.Tweet'.
I'm using Eclipse Helios as my IDE.
Any tips would be appreciated!
Cheers,
P
Not a Threading problem at all, but you're using Java Serialization somehow and you try to read an Object from a stream which has "older" objects stored within it.
I suspect you're using some kind of persistence by Java Serialization, and your TweetCleanup-Thread is reading in the old "database". Delete this file and your program will work again.
Note that renaming packages or moving classes is a binary incompatible change - you cannot deserialize such objects anymore.