I have a daemon thread that calls kieSession.fireUntilHalt(), and facts are inserted from other threads (HTTP).
At some point the firing thread goes into the WAITING state and never returns, so the fact-inserting threads block behind it.
Rule Firing Thread Stack:
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
at org.drools.core.common.UpgradableReentrantReadWriteLock.lowPriorityWriteLock(UpgradableReentrantReadWriteLock.java:109)
at org.drools.core.common.UpgradableReentrantReadWriteLock.writeLock(UpgradableReentrantReadWriteLock.java:95)
at org.drools.core.impl.KnowledgeBaseImpl.lock(KnowledgeBaseImpl.java:684)
at org.drools.core.reteoo.builder.PatternBuilder.attachObjectTypeNode(PatternBuilder.java:275)
at org.drools.core.reteoo.ClassObjectTypeConf.<init>(ClassObjectTypeConf.java:107)
at org.drools.core.common.ObjectTypeConfigurationRegistry.getObjectTypeConf(ObjectTypeConfigurationRegistry.java:72)
at org.drools.core.reteoo.AccumulateNode.createResultFactHandle(AccumulateNode.java:149)
at org.drools.core.phreak.PhreakAccumulateNode.evaluateResultConstraints(PhreakAccumulateNode.java:683)
at org.drools.core.phreak.PhreakAccumulateNode.doNode(PhreakAccumulateNode.java:89)
at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:563)
at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:534)
at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:334)
at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:161)
at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:116)
at org.drools.core.phreak.RuleExecutor.evaluateNetwork(RuleExecutor.java:77)
- locked [0x00000000fe586af0] (a org.drools.core.phreak.RuleExecutor)
at org.drools.core.common.DefaultAgenda.evaluateEagerList(DefaultAgenda.java:990)
- locked [0x000000008ad60cd8] (a java.util.LinkedList)
at org.drools.core.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:945)
at org.drools.core.common.DefaultAgenda.fireUntilHalt(DefaultAgenda.java:1190)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireUntilHalt(StatefulKnowledgeSessionImpl.java:1283)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireUntilHalt(StatefulKnowledgeSessionImpl.java:1260)
at com.comp.narch.lfc.decision.rules.util.FireUntilHaltSpawn.call(FireUntilHaltSpawn.java:21)
at com.comp.narch.lfc.decision.rules.util.FireUntilHaltSpawn.call(FireUntilHaltSpawn.java:11)
Fact Inserting Thread (blocked) Stack:
at org.drools.core.common.DefaultAgenda.addEagerRuleAgendaItem(DefaultAgenda.java:282)
- waiting to lock [0x000000008ad60cd8] (a java.util.LinkedList)
at org.drools.core.reteoo.PathMemory.queueRuleAgendaItem(PathMemory.java:153)
at org.drools.core.reteoo.PathMemory.doLinkRule(PathMemory.java:120)
- locked [0x00000000fe586928] (a org.drools.core.reteoo.PathMemory)
at org.drools.core.reteoo.PathMemory.linkSegment(PathMemory.java:90)
at org.drools.core.reteoo.SegmentMemory.notifyRuleLinkSegment(SegmentMemory.java:170)
at org.drools.core.reteoo.LeftInputAdapterNode$LiaNodeMemory.setNodeDirty(LeftInputAdapterNode.java:647)
at org.drools.core.reteoo.LeftInputAdapterNode.doInsertSegmentMemory(LeftInputAdapterNode.java:277)
at org.drools.core.reteoo.LeftInputAdapterNode.doInsertObject(LeftInputAdapterNode.java:229)
at org.drools.core.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:197)
at org.drools.core.reteoo.CompositeObjectSinkAdapter.doPropagateAssertObject(CompositeObjectSinkAdapter.java:502)
at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:387)
at org.drools.core.reteoo.ObjectTypeNode.assertObject(ObjectTypeNode.java:288)
at org.drools.core.reteoo.EntryPointNode.assertObject(EntryPointNode.java:251)
at org.drools.core.common.NamedEntryPoint.insert(NamedEntryPoint.java:367)
at org.drools.core.common.NamedEntryPoint.insert(NamedEntryPoint.java:286)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.insert(StatefulKnowledgeSessionImpl.java:1430)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.insert(StatefulKnowledgeSessionImpl.java:1372)
at com.comp.narch.lfc.service.impl.DefaultServiceInstanceMgmtServiceImpl.newAppMetric(DefaultServiceInstanceMgmtServiceImpl.java:84)
Drools Version - 6.1.0.Final
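The two stacks above have the shape of a lock-ordering problem: the firing thread holds the monitor on the LinkedList and waits for the knowledge base write lock, while the inserting thread waits for that same monitor. One way to sidestep this, sketched below under the assumption that the application can fire rules on demand instead of using fireUntilHalt, is to funnel every session call through a single thread. RuleSession is a hypothetical stand-in interface, not the Drools API, and the fake session in demo only records thread names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SerializedSession {
    /** Hypothetical stand-in for the two session calls used here; not the Drools API. */
    interface RuleSession {
        void insert(Object fact);
        int fireAllRules();
    }

    private final RuleSession session;

    // Every session call runs on this one thread, so no two threads can
    // acquire the engine's internal locks in conflicting order.
    private final ExecutorService ruleThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "rule-engine");
        t.setDaemon(true);
        return t;
    });

    SerializedSession(RuleSession session) {
        this.session = session;
    }

    /** Called from any thread; the insert and the firing happen on the rule thread. */
    Future<Integer> insertAndFire(Object fact) {
        return ruleThread.submit(() -> {
            session.insert(fact);
            return session.fireAllRules();
        });
    }

    /** Runs a few inserts against a fake session and returns the thread names it saw. */
    static List<String> demo(int facts) {
        List<String> seen = new CopyOnWriteArrayList<>();
        RuleSession fake = new RuleSession() {
            public void insert(Object fact) { seen.add(Thread.currentThread().getName()); }
            public int fireAllRules() { seen.add(Thread.currentThread().getName()); return 1; }
        };
        SerializedSession serialized = new SerializedSession(fake);
        try {
            for (int i = 0; i < facts; i++) {
                serialized.insertAndFire("fact-" + i).get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```

The trade-off is throughput: callers queue behind one thread, which may or may not be acceptable for the HTTP workload described above.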
I am using the latest Datalevin version, 0.7.8, and wrote the following small program:
(ns datalevintest.core
  (:require [datalevin.core :as dc]))
(def store (System/getenv "DBSTORE"))
(def conn (datalevin.core/get-conn store {} {:auto-entity-time? true :validate-data? true}))
(defn -main [& _]
  (dotimes [i 5]
    (future
      (locking ::println (println "Starting thread"))
      (try
        (dotimes [j 100]
          (dc/transact! conn [{:i+j (+ i j)}])
          (dc/with-transaction [tx-conn conn]
            (dc/transact! tx-conn [{:i*j (* i j)}]))
          (dc/q '[:find (pull ?e [*]) :in $ ?id :where [?e :db/id ?id]]
                (dc/db conn) 2345))
        (catch Throwable t (.printStackTrace t))
        (finally (println "Thread" i "done")))))
  (println "END"))
Nondeterministically, sometimes I get the following:
clojure.lang.ExceptionInfo: Fail to transact to LMDB: "Transaction is not in ready state" {}
at datalevin.binding.java.LMDB.transact_kv(java.clj:484)
at datalevin.storage.Store.load_datoms(storage.cljc:376)
at datalevin.db$local_transact_tx_data.invokeStatic(db.cljc:1236)
at datalevin.db$local_transact_tx_data.invoke(db.cljc:963)
at datalevin.db$transact_tx_data.invokeStatic(db.cljc:1274)
at datalevin.db$transact_tx_data.invoke(db.cljc:1250)
at datalevin.core$with.invokeStatic(core.cljc:291)
at datalevin.core$with.invoke(core.cljc:285)
at datalevin.core$with.invokeStatic(core.cljc:288)
at datalevin.core$with.invoke(core.cljc:285)
at datalevin.core$_transact_BANG_$fn__13128$fn__13129.invoke(core.cljc:550)
at clojure.lang.Atom.swap(Atom.java:37)
at clojure.core$swap_BANG_.invokeStatic(core.clj:2356)
at clojure.core$swap_BANG_.invoke(core.clj:2349)
at datalevin.core$_transact_BANG_$fn__13128.invoke(core.cljc:549)
at datalevin.core$_transact_BANG_.invokeStatic(core.cljc:548)
at datalevin.core$_transact_BANG_.invoke(core.cljc:545)
at datalevin.core$transact_BANG_.invokeStatic(core.cljc:643)
at datalevin.core$transact_BANG_.invoke(core.cljc:555)
at datalevin.core$transact_BANG_.invokeStatic(core.cljc:640)
at datalevin.core$transact_BANG_.invoke(core.cljc:555)
at datalevintest.core$save_BANG_.invokeStatic(core.clj:10)
at datalevintest.core$save_BANG_.doInvoke(core.clj:9)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at datalevintest.core$_main$fn__13261$fn__13265.invoke(core.clj:32)
at datalevintest.core$_main$fn__13261.invoke(core.clj:27)
at clojure.core$binding_conveyor_fn$fn__5772.invoke(core.clj:2034)
at clojure.lang.AFn.call(AFn.java:18)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
(You may need to run the program multiple times to get the error.)
Less often I get the following:
clojure.lang.ExceptionInfo: Fail to get-first: nil {:dbi "datalevin/eav", :k-range [:all-back], :k-type :eav, :v-type :id}
at datalevin.scan$get_first.invokeStatic(scan.cljc:233)
at datalevin.scan$get_first.invoke(scan.cljc:229)
at datalevin.binding.java.LMDB.get_first(java.clj:502)
at datalevin.binding.java.LMDB.get_first(java.clj:500)
at datalevin.storage.Store.init_max_eid(storage.cljc:300)
at datalevin.db$new_db.invokeStatic(db.cljc:387)
at datalevin.db$new_db.invoke(db.cljc:379)
at datalevintest.core$_main$fn__13261$fn__13265$fn__13276.invoke(core.clj:30)
at datalevintest.core$_main$fn__13261$fn__13265.invoke(core.clj:30)
at datalevintest.core$_main$fn__13261.invoke(core.clj:27)
at clojure.core$binding_conveyor_fn$fn__5772.invoke(core.clj:2034)
at clojure.lang.AFn.call(AFn.java:18)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
If I move the creation of the connection into the future with create-conn, I get another exception:
java.lang.NullPointerException: Cannot read field "e"
at datalevin.storage.Store.init_max_eid(storage.cljc:302)
at datalevin.db$new_db.invokeStatic(db.cljc:387)
at datalevin.db$new_db.invoke(db.cljc:379)
at datalevin.db$empty_db.invokeStatic(db.cljc:399)
at datalevin.db$empty_db.invoke(db.cljc:392)
at datalevin.core$create_conn.invokeStatic(core.cljc:529)
at datalevin.core$create_conn.invoke(core.cljc:488)
at datalevintest.core$_main$fn__13252.invoke(core.clj:14)
at clojure.core$binding_conveyor_fn$fn__5772.invoke(core.clj:2034)
at clojure.lang.AFn.call(AFn.java:18)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
(This one also breaks the database file so the application will not start up the next time.)
The issue comes up in a multithreaded environment and looks like a concurrency problem. My first idea was that the same connection shouldn't be used across different threads; however, the code of get-conn shows that the same connection is reused when it already exists for a directory. The documentation does not mention multithreading.
What is the error in my code causing the problem and how can I make it safer?
This bug was fixed in version 0.7.9, released half an hour after the question was posted.
I have a high-QPS Netty app (2500-3000 QPS per VM).
The EventLoop groups are set up as shown below.
bossGroup = new NioEventLoopGroup(2, new DefaultThreadFactory("inbound-netty-boss"));
workerGroup = new NioEventLoopGroup(32 * 5, new DefaultThreadFactory("inbound-netty-worker"));
The request lifecycle looks like this:
Incoming request --> Do X --> Do Y
What I want to do is:
Incoming request --> Do X(params1) --> Do X(params2) after 100ms delay --> Do Y
So far I have tried:
CompletableFuture.runAsync(RunnableX, CompletableFuture.delayedExecutor(
    100,
    TimeUnit.MILLISECONDS, new ForkJoinPool(32 * 5)));
Executors.newScheduledThreadPool(32 * 5,
    new ThreadFactoryBuilder().build()).schedule(RunnableX, 100, TimeUnit.MILLISECONDS);
And finally
channelHandlerContext.executor().schedule(RunnableX, 100, TimeUnit.MILLISECONDS);
All of these have caused a tremendous bottleneck, with the QPS dropping to almost 1/10th of the original.
I keep seeing the event loop group threads in the RUNNABLE state and all threads in the dedicated executor in the TIMED_WAITING state.
What am I missing? I need to find a way to not block the inbound thread, so that it is free to serve other requests.
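For comparison, here is a minimal sketch of the X(params1) --> delay --> X(params2) --> Y chain built from non-blocking stages on one shared scheduler. It assumes Java 9+ for CompletableFuture.delayedExecutor and assumes the scheduler is created once per process; if the pools in the snippets above are created per request, that alone could account for much of the slowdown, but that is a guess. The class name, pool size and the string "steps" standing in for X and Y are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DelayedPipeline {
    // One shared pool for the whole process (assumption: never created per request).
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newScheduledThreadPool(2, r -> {
                Thread t = new Thread(r, "delay-scheduler");
                t.setDaemon(true);
                return t;
            });

    /** Runs X(params1), waits delayMs without holding a thread, runs X(params2), then Y. */
    static CompletableFuture<List<String>> handle(String params1, String params2, long delayMs) {
        List<String> steps = new CopyOnWriteArrayList<>();
        // Tasks handed to this executor are forwarded to SCHEDULER after the delay.
        Executor delayed =
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS, SCHEDULER);
        return CompletableFuture
                .runAsync(() -> steps.add("X(" + params1 + ")"), SCHEDULER)
                .thenRunAsync(() -> steps.add("X(" + params2 + ")"), delayed)
                .thenRunAsync(() -> steps.add("Y"), SCHEDULER)
                .thenApply(ignored -> steps);
    }

    public static void main(String[] args) {
        // join() is only for the demo; a Netty handler would attach a callback instead.
        System.out.println(handle("params1", "params2", 100).join());
    }
}
```

The calling thread returns as soon as the chain is assembled, so an inbound event loop thread would not wait out the 100 ms itself.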
I am debugging a JavaFX application that ends up with a frozen black screen (it no longer reacts) and high CPU usage.
It may be a hidden window holding a lock, or something related to modal dialogs. There is also an installer written in Swing which shows its own dialogs when new updates arrive; maybe this conflicts with JavaFX?
How can I find the cause from the jstack output?
The application is using OpenJDK 11.
C:\Users\dprutean>jstack -m 9244
Error: -m option used
Cannot connect to core dump or remote debug server. Use jhsdb jstack instead
C:\Users\dprutean>jstack 9244
2020-01-14 08:03:11
Full thread dump OpenJDK 64-Bit Client VM (13+33 mixed mode):
Threads class SMR info:
_java_thread_list=0x0000000066bcb920, length=25, elements={
0x000000000394d000, 0x0000000003557000, 0x000000000357a800, 0x00000000035b3000,
0x00000000035b4800, 0x00000000035b6800, 0x00000000035b7800, 0x000000004d844000,
0x000000004d9aa800, 0x0000000055e65800, 0x0000000055e89800, 0x00000000560b8800,
0x0000000056447800, 0x00000000564cb800, 0x00000000564ae800, 0x00000000580a3000,
0x00000000582ab000, 0x00000000582a7000, 0x00000000582a6800, 0x00000000582ac800,
0x00000000582ad800, 0x00000000582aa000, 0x00000000582a8800, 0x00000000582a9800,
0x0000000067459000
}
"main" #1 prio=5 os_prio=0 cpu=468.75ms elapsed=43898.09s tid=0x000000000394d000 nid=0x6c4 waiting on condition [0x000000000312d000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x00000000226c3cb8> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base#13/AbstractQueuedSynchronizer.java:885)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base#13/AbstractQueuedSynchronizer.java:1039)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base#13/AbstractQueuedSynchronizer.java:1345)
at java.util.concurrent.CountDownLatch.await(java.base#13/CountDownLatch.java:232)
at com.sun.javafx.application.LauncherImpl.launchApplication(LauncherImpl.java:213)
at com.sun.javafx.application.LauncherImpl.launchApplication(LauncherImpl.java:156)
at javafx.application.Application.launch(Application.java:227)
at com.wisecoders.dbs.DbSchema.main(DbSchema.java:62)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base#13/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base#13/NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base#13/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base#13/Method.java:567)
at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:85)
at com.exe4j.runtime.WinLauncher.main(WinLauncher.java:94)
at com.install4j.runtime.launcher.WinLauncher.main(WinLauncher.java:25)
"Reference Handler" #2 daemon prio=10 os_prio=2 cpu=0.00ms elapsed=43898.03s tid=0x0000000003557000 nid=0x4498 waiting on condition [0x000000000389f000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base#13/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base#13/Reference.java:241)
at java.lang.ref.Reference$ReferenceHandler.run(java.base#13/Reference.java:213)
"Finalizer" #3 daemon prio=8 os_prio=1 cpu=0.00ms elapsed=43898.03s tid=0x000000000357a800 nid=0x586c in Object.wait() [0x000000004d2ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x0000000021f5dbd8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:176)
at java.lang.ref.Finalizer$FinalizerThread.run(java.base#13/Finalizer.java:170)
"Signal Dispatcher" #4 daemon prio=9 os_prio=2 cpu=0.00ms elapsed=43898.02s tid=0x00000000035b3000 nid=0x62f8 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Attach Listener" #5 daemon prio=5 os_prio=2 cpu=0.00ms elapsed=43898.02s tid=0x00000000035b4800 nid=0x6968 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread0" #6 daemon prio=9 os_prio=2 cpu=8578.13ms elapsed=43898.02s tid=0x00000000035b6800 nid=0x1ff8 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"Sweeper thread" #7 daemon prio=9 os_prio=2 cpu=7546.88ms elapsed=43898.02s tid=0x00000000035b7800 nid=0x11c0 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" #8 daemon prio=9 os_prio=0 cpu=15.63ms elapsed=43898.00s tid=0x000000004d844000 nid=0x6134 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Common-Cleaner" #9 daemon prio=8 os_prio=1 cpu=0.00ms elapsed=43897.96s tid=0x000000004d9aa800 nid=0x2444 in Object.wait() [0x000000004e02f000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x0000000021fea588> (a java.lang.ref.ReferenceQueue$Lock)
at jdk.internal.ref.CleanerImpl.run(java.base#13/CleanerImpl.java:148)
at java.lang.Thread.run(java.base#13/Thread.java:830)
at jdk.internal.misc.InnocuousThread.run(java.base#13/InnocuousThread.java:134)
"Java2D Disposer" #10 daemon prio=10 os_prio=2 cpu=0.00ms elapsed=43896.89s tid=0x0000000055e65800 nid=0x670c in Object.wait() [0x000000005675f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x000000002244e758> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:176)
at sun.java2d.Disposer.run(java.desktop#13/Disposer.java:144)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"AWT-Windows" #12 daemon prio=6 os_prio=0 cpu=0.00ms elapsed=43896.85s tid=0x0000000055e89800 nid=0x50ac runnable [0x0000000056b5e000]
java.lang.Thread.State: RUNNABLE
at sun.awt.windows.WToolkit.eventLoop(java.desktop#13/Native Method)
at sun.awt.windows.WToolkit.run(java.desktop#13/WToolkit.java:312)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"JavaFX-Launcher" #16 prio=5 os_prio=0 cpu=296.88ms elapsed=43896.53s tid=0x00000000560b8800 nid=0x4494 waiting on condition [0x0000000057abf000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x00000000227025c0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base#13/AbstractQueuedSynchronizer.java:885)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base#13/AbstractQueuedSynchronizer.java:1039)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base#13/AbstractQueuedSynchronizer.java:1345)
at java.util.concurrent.CountDownLatch.await(java.base#13/CountDownLatch.java:232)
at com.sun.javafx.application.LauncherImpl.launchApplication1(LauncherImpl.java:856)
at com.sun.javafx.application.LauncherImpl.lambda$launchApplication$2(LauncherImpl.java:195)
at com.sun.javafx.application.LauncherImpl$$Lambda$160/0x00000000575c7500.run(Unknown Source)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"QuantumRenderer-0" #17 daemon prio=5 os_prio=0 cpu=664781.25ms elapsed=43896.27s tid=0x0000000056447800 nid=0xaa8 runnable [0x0000000057f1e000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x00000000226c44a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base#13/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base#13/LinkedBlockingQueue.java:433)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base#13/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:125)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"InvokeLaterDispatcher" #19 daemon prio=5 os_prio=0 cpu=903656.25ms elapsed=43896.21s tid=0x00000000564cb800 nid=0x5514 in Object.wait() [0x0000000058fff000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.Object.wait(java.base#13/Object.java:326)
at com.sun.glass.ui.InvokeLaterDispatcher.run(InvokeLaterDispatcher.java:127)
- locked <0x00000000226c47f8> (a java.lang.StringBuilder)
"JavaFX Application Thread" #20 prio=5 os_prio=0 cpu=1590890.63ms elapsed=43896.21s tid=0x00000000564ae800 nid=0xb94 runnable [0x00000000590fe000]
java.lang.Thread.State: RUNNABLE
at com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
at com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:174)
at com.sun.glass.ui.win.WinApplication$$Lambda$202/0x0000000057b674a8.run(Unknown Source)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"Thread-2" #21 daemon prio=5 os_prio=0 cpu=12703.13ms elapsed=43896.15s tid=0x00000000580a3000 nid=0x28f0 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Prism Font Disposer" #22 daemon prio=10 os_prio=2 cpu=0.00ms elapsed=43895.38s tid=0x00000000582ab000 nid=0x4bf0 in Object.wait() [0x00000000613bf000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x000000002298ecb0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:176)
at com.sun.javafx.font.Disposer.run(Disposer.java:93)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"pool-4-thread-1" #25 prio=5 os_prio=0 cpu=937.50ms elapsed=43894.26s tid=0x00000000582a7000 nid=0x41b0 waiting on condition [0x000000005714f000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x00000000233fab90> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base#13/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base#13/LinkedBlockingQueue.java:433)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base#13/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"pool-2-thread-1" #26 prio=5 os_prio=0 cpu=0.00ms elapsed=43894.26s tid=0x00000000582a6800 nid=0x4bc4 waiting on condition [0x000000005724f000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x0000000022b2e718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base#13/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base#13/LinkedBlockingQueue.java:433)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base#13/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"Cleaner-0" #30 daemon prio=8 os_prio=1 cpu=0.00ms elapsed=43893.73s tid=0x00000000582ac800 nid=0x59cc in Object.wait() [0x000000006395e000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x00000000232a2c60> (a java.lang.ref.ReferenceQueue$Lock)
at jdk.internal.ref.CleanerImpl.run(java.base#13/CleanerImpl.java:148)
at java.lang.Thread.run(java.base#13/Thread.java:830)
at jdk.internal.misc.InnocuousThread.run(java.base#13/InnocuousThread.java:134)
"mysql-cj-abandoned-connection-cleanup" #31 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=43893.28s tid=0x00000000582ad800 nid=0xf10 in Object.wait() [0x0000000063c5e000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x0000000023527618> (a java.lang.ref.ReferenceQueue$Lock)
at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:85)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"mysql-cj-abandoned-connection-cleanup" #32 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=43893.15s tid=0x00000000582aa000 nid=0x1cf8 in Object.wait() [0x0000000065d5e000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base#13/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base#13/ReferenceQueue.java:155)
- locked <0x00000000237ccbd8> (a java.lang.ref.ReferenceQueue$Lock)
at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:85)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"TimerQueue" #33 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=43892.89s tid=0x00000000582a8800 nid=0x597c waiting on condition [0x000000006605e000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x00000000237cce78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base#13/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.DelayQueue.take(java.base#13/DelayQueue.java:217)
at javax.swing.TimerQueue.run(java.desktop#13/TimerQueue.java:171)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"DbSchema: Reverse engineering pre-fetch" #42 prio=5 os_prio=0 cpu=890.63ms elapsed=43885.08s tid=0x00000000582a9800 nid=0x2cf8 waiting on condition [0x00000000579bf000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x0000000022b2e718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base#13/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base#13/LinkedBlockingQueue.java:433)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base#13/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"DbSchema: Browse Task" #50 prio=5 os_prio=0 cpu=312.50ms elapsed=43855.36s tid=0x0000000067459000 nid=0x5f98 waiting on condition [0x000000005734f000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base#13/Native Method)
- parking to wait for <0x0000000022b2e718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base#13/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base#13/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base#13/LinkedBlockingQueue.java:433)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base#13/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base#13/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base#13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base#13/Thread.java:830)
"VM Thread" os_prio=2 cpu=11859.38ms elapsed=43898.04s tid=0x0000000003553000 nid=0x4d64 runnable
"VM Periodic Task Thread" os_prio=2 cpu=31.25ms elapsed=43898.00s tid=0x000000004d844800 nid=0x6950 waiting on condition
JNI global refs: 116, weak refs: 14
It seems that the thread named "JavaFX Application Thread" is stuck in a loop, which causes the high CPU usage.
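A single dump only shows CPU time accumulated since startup, so one way to check that suspicion is to take two dumps a few seconds apart and compare the cpu= field in the thread header lines. The sketch below does that with plain string parsing; the class name is hypothetical, the regular expression assumes the header layout shown in the dump above, and the second dump in main uses made-up values for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreadCpuDelta {
    // Matches header lines such as:
    // "main" #1 prio=5 os_prio=0 cpu=468.75ms elapsed=43898.09s tid=...
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*?\\bcpu=([0-9.]+)ms");

    /** Maps thread name to accumulated CPU milliseconds; same-named threads are summed. */
    static Map<String, Double> cpuByThread(List<String> dumpLines) {
        Map<String, Double> cpu = new LinkedHashMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                cpu.merge(m.group(1), Double.parseDouble(m.group(2)), Double::sum);
            }
        }
        return cpu;
    }

    /** Returns the name of the thread whose CPU time grew most between the two dumps. */
    static String busiest(List<String> earlier, List<String> later) {
        Map<String, Double> before = cpuByThread(earlier);
        String best = null;
        double bestDelta = -1;
        for (Map.Entry<String, Double> e : cpuByThread(later).entrySet()) {
            double delta = e.getValue() - before.getOrDefault(e.getKey(), 0.0);
            if (delta > bestDelta) {
                bestDelta = delta;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> first = List.of(
                "\"main\" #1 prio=5 os_prio=0 cpu=468.75ms elapsed=43898.09s",
                "\"JavaFX Application Thread\" #20 prio=5 os_prio=0 cpu=1590890.63ms elapsed=43896.21s");
        // Hypothetical second dump, ten seconds later; the cpu values are invented.
        List<String> second = List.of(
                "\"main\" #1 prio=5 os_prio=0 cpu=468.75ms elapsed=43908.09s",
                "\"JavaFX Application Thread\" #20 prio=5 os_prio=0 cpu=1600390.63ms elapsed=43906.21s");
        System.out.println(busiest(first, second));
    }
}
```

Once the busiest thread is known, its stack frames in several consecutive dumps show whether it keeps returning to the same code.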
Clojure 1.5.1 and the core.async library version "0.1.267.0-0d7780-alpha" are used to run CPU-intensive calculations. A bunch of functions wrapped by the timeout function thunk-timeout are sent to a channel, as shown in the mock code below.
(defn toSendToGo [args timeoutUnits]
  (let [result (atom [false])
        timeout? (atom false)]
    (try
      (thunk-timeout
        (fn [] (reset! result (myFunction args))) timeoutUnits)
      (catch java.util.concurrent.TimeoutException e (do (prn "!Time out after " timeoutUnits " seconds!!") (reset! timeout? true))))
    (if @timeout? (do sth))
    @result))
(let [c (chan)]
  (go (>! c (toSendToGo args timeoutUnits))))
On a Linux server with a large amount of memory, the code runs without issue. On a Windows server with less memory, if a few cases in a row time out, the following strange exception appears, which I don't quite understand. Why is this related to the timeout?
Exception in thread "my-async-dispatch-4" java.lang.IllegalStateException: Pop without matching push
at clojure.lang.Var.popThreadBindings(Var.java:364)
at clojure.core$pop_thread_bindings.invoke(core.clj:1737)
at regtest$fn__40$processRegtestFiles__41$fn__96$state_machine__3962__auto____97$fn__99.invoke(regtest.clj:158)
at regtest$fn__40$processRegtestFiles__41$fn__96$state_machine__3962__auto____97.invoke(regtest.clj:158)
at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:945)
at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:949)
at regtest$fn__40$processRegtestFiles__41$fn__96.invoke(regtest.clj:158)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Exception in thread "my-async-dispatch-5" java.lang.IllegalStateException: Pop without matching push
at clojure.lang.Var.popThreadBindings(Var.java:364)
at clojure.core$pop_thread_bindings.invoke(core.clj:1737)
at regtest$fn__40$processRegtestFiles__41$fn__96$state_machine__3962__auto____97$fn__99.invoke(regtest.clj:158)
at regtest$fn__40$processRegtestFiles__41$fn__96$state_machine__3962__auto____97.invoke(regtest.clj:158)
at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:945)
at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:949)
at regtest$fn__40$processRegtestFiles__41$fn__96.invoke(regtest.clj:158)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[UPDATE]
The issue went away after I added a second catch clause after the one for TimeoutException:
(try...
(catch java.util.concurrent.TimeoutException e (do (prn "!Time out after " timeoutUnits " seconds!!") (reset! timeout? true)) )
(catch Exception e (prn "Unexpected exception " e) ))
With the caught exception being
Unexpected exception #<ExecutionException java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: slingshot/Stone
I'm not sure this is your problem, but (toSendToGo) returns nil via the result atom if an exception is caught.
That nil is then put onto chan c. core.async channels shouldn't (and normally can't) carry nil as a value. Do you still get the error if the catch resets result to anything other than nil (false, perhaps)?
As an aside, Clojure, being a Lisp, tends to prefer lower-case-with-hyphens for the names of most things. toSendToGo should probably be to-send-to-go instead. I don't know of any official style guide, but here's a link to the most thorough one I know of: https://github.com/bbatsov/clojure-style-guide
(P.S. This should probably have been a comment but I don't have the reputation to make them yet.)
Our system writes the ManagedThreadId with every log message, and for years we've used it to distinguish one unit of work from the many others in the logs. So far, so good.
Now we're beginning to use the Task Parallel Library and are noticing an interesting effect:
public static void Main(string[] args) {
    WriteLine("BEGIN");
    Parallel.For(0, 32, (index) => {
        WriteLine(" Loop " + index.ToString());
    });
    WriteLine("END");
}
The output looks something like:
ThreadID=1, Message=BEGIN
ThreadID=1, Message= Loop 0
ThreadID=3, Message= Loop 16
ThreadID=3, Message= Loop 17
...
ThreadID=4, Message= Loop 4
ThreadID=4, Message= Loop 5
ThreadID=1, Message= Loop 8
ThreadID=1, Message= Loop 9
ThreadID=1, Message= Loop 10
ThreadID=3, Message= Loop 21
ThreadID=4, Message= Loop 6
...
ThreadID=3, Message= Loop 24
ThreadID=3, Message= Loop 25
ThreadID=1, Message= Loop 11
ThreadID=1, Message= Loop 12
ThreadID=1, Message= Loop 13
ThreadID=1, Message= Loop 31
ThreadID=3, Message= Loop 26
...
ThreadID=3, Message= Loop 30
ThreadID=1, Message=END
You'll notice that the ThreadID of the main thread (the one that logged "BEGIN") is occasionally reused for Loop iterations.
My question is: can this happen anywhere else, such as in the thread pool or when using other features of the Task Parallel Library? I have spent a ridiculous amount of time trying to find other ways to provoke the behavior and cannot.
The concern is that if we can no longer rely on the ThreadID (we have many tools that rely on this behavior), then we will simply avoid using Parallel.For. But if the problem can manifest in other ways, we'll need to figure out how to avoid them UNTIL we revamp our logging strategy and tooling support.
If there are other ways to provoke the behavior, I'd like to know about them so I can determine whether any of our usage meets those conditions and correct it accordingly. More importantly, I'd like a sample program that provokes the behavior so I can study any side effects in our tooling.
Parallel.For indeed runs one of the worker tasks on the calling thread. The rationale is that since the calling thread must wait until the parallel loop completes, it might as well participate in the parallel operation.
As far as other features of the Task Parallel Library go, methods that block will often use the calling thread. So Parallel.For, Parallel.ForEach, Parallel.Invoke and blocking PLINQ queries will all reuse the calling thread as one of the workers. On the other hand, operations that just kick off some work and return immediately (like Task.Factory.StartNew, ThreadPool.QueueUserWorkItem, and non-blocking PLINQ queries) cannot use the calling thread.
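If you want to check this on your own machine, a small probe along the following lines may help. It is only a sketch: CallerThreadProbe and its method names are made up for illustration, and the thread IDs it reports will differ from run to run and from machine to machine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the TPL: records which managed threads
// end up executing work started in different ways.
public static class CallerThreadProbe
{
    // Runs `iterations` loop bodies through Parallel.For and returns the
    // distinct ManagedThreadId values that executed them.
    public static int[] ThreadIdsUsedByParallelFor(int iterations)
    {
        var ids = new ConcurrentBag<int>();
        Parallel.For(0, iterations, i => ids.Add(Thread.CurrentThread.ManagedThreadId));
        return ids.Distinct().ToArray();
    }

    // Starts a single task and returns the ManagedThreadId it ran on.
    // Reading Result blocks like Wait() does, so when this is called from a
    // ThreadPool thread the task may be executed inline on the caller.
    public static int ThreadIdUsedByStartNew()
    {
        return Task.Factory.StartNew(() => Thread.CurrentThread.ManagedThreadId).Result;
    }

    // Prints a short comparison for the current (calling) thread.
    public static void Report()
    {
        int caller = Thread.CurrentThread.ManagedThreadId;
        int[] loopIds = ThreadIdsUsedByParallelFor(32);

        Console.WriteLine("Caller thread: " + caller);
        Console.WriteLine("Parallel.For threads: " + string.Join(", ", loopIds));
        Console.WriteLine("Caller took part in Parallel.For: " + loopIds.Contains(caller));
        Console.WriteLine("StartNew thread: " + ThreadIdUsedByStartNew());
    }
}
```

Calling CallerThreadProbe.Report() from Main, and then again from inside a ThreadPool work item, should let you compare the two situations with your own tooling.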
As a workaround, you can run the Parallel.For inside a task and wait on the task:
public static void Main(string[] args) {
    WriteLine("BEGIN");
    Task.Factory.StartNew(() =>
        Parallel.For(0, 32, (index) => {
            WriteLine(" Loop " + index.ToString());
        })
    ).Wait();
    WriteLine("END");
}
Warning: the workaround above will not work if Task.Factory.StartNew() is called from a ThreadPool thread. In that case, the Wait call may end up executing the task inline on the calling ThreadPool thread.
You would have to avoid the TPL altogether to avoid that behavior.
It sounds like you need a different way of identifying a unit of work than the ThreadID, possibly using the logical call context to carry information about the current unit of work if you can't rewrite the code to pass some kind of identifier along explicitly.
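A minimal sketch of that idea follows, assuming .NET 4.6 or later, where AsyncLocal<T> is available (on older frameworks, CallContext.LogicalSetData plays a similar role). The WorkUnitLog class, its WriteLine helper and the "WorkUnit=" field are hypothetical names for illustration, not part of your existing logger.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical logger that tags every line with an ambient work-unit ID
// instead of relying on the thread ID alone.
public static class WorkUnitLog
{
    // The value travels with the ExecutionContext, so tasks started while it
    // is set (including the workers Parallel.For creates) can read it.
    private static readonly AsyncLocal<string> CurrentWorkUnit = new AsyncLocal<string>();

    // Kept in memory as well, so the output can be inspected afterwards.
    public static readonly ConcurrentQueue<string> Lines = new ConcurrentQueue<string>();

    public static void WriteLine(string message)
    {
        string line = "WorkUnit=" + (CurrentWorkUnit.Value ?? "none")
                    + ", ThreadID=" + Thread.CurrentThread.ManagedThreadId
                    + ", Message=" + message;
        Lines.Enqueue(line);
        Console.WriteLine(line);
    }

    // Runs one unit of work; every line it logs carries workUnitId,
    // whichever thread happens to execute the loop body.
    public static void RunWorkUnit(string workUnitId, int iterations)
    {
        string previous = CurrentWorkUnit.Value;
        CurrentWorkUnit.Value = workUnitId;
        try
        {
            WriteLine("BEGIN");
            Parallel.For(0, iterations, index => WriteLine(" Loop " + index));
            WriteLine("END");
        }
        finally
        {
            // Restore whatever ID the caller had before this unit started.
            CurrentWorkUnit.Value = previous;
        }
    }
}
```

With something like WorkUnitLog.RunWorkUnit("job-42", 32), the ThreadID column can still repeat across units of work, but the WorkUnit column stays the same within one, so log tooling could group on that field instead.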