I'm writing a Rust app that uses a lot of threads. I noticed the CPU usage was high, so I ran top and pressed H to see the threads:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
247759 root 20 0 3491496 104400 64676 R 32.2 1.0 0:02.98 my_app
247785 root 20 0 3491496 104400 64676 S 22.9 1.0 0:01.89 llvmpipe-0
247786 root 20 0 3491496 104400 64676 S 21.9 1.0 0:01.71 llvmpipe-1
247792 root 20 0 3491496 104400 64676 S 20.9 1.0 0:01.83 llvmpipe-7
247789 root 20 0 3491496 104400 64676 S 20.3 1.0 0:01.60 llvmpipe-4
247790 root 20 0 3491496 104400 64676 S 20.3 1.0 0:01.64 llvmpipe-5
247787 root 20 0 3491496 104400 64676 S 19.9 1.0 0:01.70 llvmpipe-2
247788 root 20 0 3491496 104400 64676 S 19.9 1.0 0:01.61 llvmpipe-3
What are these llvmpipe-n threads? Why does my_app launch them? Are they even from my_app for sure?
As HHK links to, the llvmpipe threads are from your OpenGL driver, which is Mesa.
You said you are running this in a VM. VMs usually don't virtualize GPU hardware, so the Mesa OpenGL driver is doing software rendering. To achieve better performance, Mesa spawns threads to do parallel computations on the CPU.
I'm creating a Node program that returns the output of the Linux top command. It works fine; the only issue is that the command name is cut off: instead of the full command name, like /usr/local/libexec/netdata/plugins.d/apps.plugin 1, it returns /usr/local+
My code
const topparser = require("topparser")
const spawn = require('child_process').spawn

let proc = null
let startTime = 0

exports.start = function (pid_limit, callback) {
  startTime = new Date().getTime()
  // -c: show full command lines, -b: batch mode, -d 3: refresh every 3 seconds
  proc = spawn('top', ['-c', '-b', "-d", "3"])
  console.log("started process, pid: " + proc.pid)
  let top_data = ""
  proc.stdout.on('data', function (data) {
    console.log('stdout: ' + data);
  })
  proc.on('close', function (code) {
    console.log('child process exited with code ' + code);
  });
}//start

exports.stop = function () {
  console.log("stopped process...")
  if (proc) { proc.kill('SIGINT') }// SIGHUP -linux ,SIGINT -windows
}//stop
The results
14861 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/1+
14864 root 20 0 0 0 0 S 0.0 0.0 0:00.02 [kworker/0+
15120 root 39 19 102488 3344 2656 S 0.0 0.1 0:00.09 /usr/bin/m+
16904 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/0+
19031 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/u+
21500 root 20 0 0 0 0 Z 0.0 0.0 0:00.00 [dsc] <def+
22571 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/0+
Any way to fix it?
Best regards
From a top manpage:
In Batch mode, when used without an argument top will format output using the COLUMNS= and LINES=
environment variables, if set. Otherwise, width will be fixed at the maximum 512 columns. With an
argument, output width can be decreased or increased (up to 512) but the number of rows is
considered unlimited.
Add '-w', '512' to the arguments.
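As a minimal sketch of that change applied to the spawn call from the question: the helper names topArgs and startTop are made up for this example, and the 512 cap is taken from the man page text quoted above rather than checked against any particular top version.

```javascript
const { spawn } = require('child_process');

// Build the argument list for batch-mode top with an explicit output width.
// The width is clamped to 512, the maximum given in the man page quoted above.
function topArgs(delaySeconds, width) {
  const w = Math.min(Math.max(1, width), 512);
  return ['-c', '-b', '-d', String(delaySeconds), '-w', String(w)];
}

// Hypothetical helper: start top and pass each stdout chunk to `onData` as a string.
function startTop(onData) {
  const proc = spawn('top', topArgs(3, 512));
  proc.stdout.on('data', (chunk) => onData(chunk.toString()));
  return proc;
}
```

Calling startTop(console.log) should then print rows whose COMMAND column is no longer truncated with a trailing +, provided the installed top supports -w.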
Since you work with node, you can query netdata running on localhost for this.
Example:
http://london.my-netdata.io/api/v1/data?chart=apps.cpu&after=-1&options=ms
For localhost netdata:
http://localhost:19999/api/v1/data?chart=apps.cpu&after=-1&options=ms
You can also get systemd services:
http://london.my-netdata.io/api/v1/data?chart=services.cpu&after=-1&options=ms
If you are not planning to update the screen per second, you can instruct netdata to return the average of a longer duration:
http://london.my-netdata.io/api/v1/data?chart=apps.cpu&after=-5&points=1&group=average&options=ms
The above returns the average of the last 5 seconds.
Finally, you can get the latest values of all the metrics netdata monitors with this:
http://london.my-netdata.io/api/v1/allmetrics?format=json
For completeness, netdata can export all the metrics in BASH format for shell scripts. Check this: https://github.com/firehol/netdata/wiki/receiving-netdata-metrics-from-shell-scripts
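From Node, the data URLs above could be assembled and queried roughly like this. It is only a sketch: netdataDataUrl and fetchChart are names invented here, the global fetch assumes Node 18 or newer, and it assumes the endpoint answers with JSON.

```javascript
// Build a netdata /api/v1/data URL using the query parameters shown above.
// `base` is e.g. 'http://localhost:19999'; `seconds` is how far back to look.
function netdataDataUrl(base, chart, seconds) {
  const url = new URL('/api/v1/data', base);
  url.searchParams.set('chart', chart);
  url.searchParams.set('after', String(-seconds));
  if (seconds > 1) {
    // Collapse the whole window into a single averaged point.
    url.searchParams.set('points', '1');
    url.searchParams.set('group', 'average');
  }
  url.searchParams.set('options', 'ms');
  return url.toString();
}

// Hypothetical helper: fetch the chart data and decode it.
// Assumes a global fetch (Node 18+) and a JSON response body.
async function fetchChart(base, chart, seconds) {
  const res = await fetch(netdataDataUrl(base, chart, seconds));
  if (!res.ok) throw new Error('netdata request failed: ' + res.status);
  return res.json();
}
```

For example, fetchChart('http://localhost:19999', 'apps.cpu', 5) would request the 5-second average from a local netdata instance.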
I'm getting a "Heap exhausted" message when running the following short Haskell program on a big enough dataset. For example, the program fails (with heap overflow) on a 20 MB input file with around 900k lines. The heap size was set (through -with-rtsopts) to 1 GB. It runs OK if longestCommonSubstrB is defined as something simpler, e.g. commonPrefix. I need to process files on the order of 100 MB.
I compiled the program with the following command line (GHC 7.8.3):
ghc -Wall -O2 -prof -fprof-auto "-with-rtsopts=-M512M -p -s -h -i0.1" SampleB.hs
I would appreciate any help in making this thing run in a reasonable amount of space (in the order of the input file size), but I would especially appreciate the thought process of finding where the bottleneck is and where and how to force the strictness.
My guess is that somehow forcing longestCommonSubstrB function to evaluate strictly would solve the problem, but I don't know how to do that.
{-# LANGUAGE BangPatterns #-}
module Main where

import System.Environment (getArgs)
import qualified Data.ByteString.Lazy.Char8 as B
import Data.List (maximumBy, sort)
import Data.Function (on)
import Data.Char (isSpace)

-- | Returns a list of lexicon items, i.e. [[w1,w2,w3]]
readLexicon :: FilePath -> IO [[B.ByteString]]
readLexicon filename = do
    text <- B.readFile filename
    return $ map (B.split '\t' . stripR) . B.lines $ text
  where
    stripR = B.reverse . B.dropWhile isSpace . B.reverse

transformOne :: [B.ByteString] -> B.ByteString
transformOne (w1:w2:w3:[]) =
    B.intercalate (B.pack "|") [w1, longestCommonSubstrB w2 w1, w3]
transformOne a = error $ "transformOne: unexpected tuple " ++ show a

longestCommonSubstrB :: B.ByteString -> B.ByteString -> B.ByteString
longestCommonSubstrB xs ys = maximumBy (compare `on` B.length) . concat $
    [f xs' ys | xs' <- B.tails xs] ++
    [f xs ys' | ys' <- tail $ B.tails ys]
  where f xs' ys' = scanl g B.empty $ B.zip xs' ys'
        g z (x, y) = if x == y
                       then z `B.snoc` x
                       else B.empty

main :: IO ()
main = do
    (input:output:_) <- getArgs
    lexicon <- readLexicon input
    let flattened = B.unlines . sort . map transformOne $ lexicon
    B.writeFile output flattened
This is the profile output for the test dataset (100k lines, heap size set to 1 GB, i.e. generateSample.exe 100000, the resulting file size is 2.38 MB):
Heap profile over time:
Execution statistics:
3,505,737,588 bytes allocated in the heap
785,283,180 bytes copied during GC
62,390,372 bytes maximum residency (44 sample(s))
216,592 bytes maximum slop
96 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6697 colls, 0 par 1.05s 1.03s 0.0002s 0.0013s
Gen 1 44 colls, 0 par 4.14s 3.99s 0.0906s 0.1935s
INIT time 0.00s ( 0.00s elapsed)
MUT time 7.80s ( 9.17s elapsed)
GC time 3.75s ( 3.67s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 1.44s ( 1.35s elapsed)
EXIT time 0.02s ( 0.00s elapsed)
Total time 13.02s ( 12.85s elapsed)
%GC time 28.8% (28.6% elapsed)
Alloc rate 449,633,678 bytes per MUT second
Productivity 60.1% of total user, 60.9% of total elapsed
Time and Allocation Profiling Report:
SampleB.exe +RTS -M1G -p -s -h -i0.1 -RTS sample.txt sample_out.txt
total time = 3.97 secs (3967 ticks @ 1000 us, 1 processor)
total alloc = 2,321,595,564 bytes (excludes profiling overheads)
COST CENTRE             MODULE  %time %alloc
longestCommonSubstrB    Main     43.3   33.1
longestCommonSubstrB.f  Main     21.5   43.6
main.flattened          Main     17.5    5.1
main                    Main      6.6    5.8
longestCommonSubstrB.g  Main      5.0    5.8
readLexicon             Main      2.5    2.8
transformOne            Main      1.8    1.7
readLexicon.stripR      Main      1.8    1.9
                                                                     individual     inherited
COST CENTRE             MODULE                       no.  entries  %time %alloc  %time %alloc
MAIN                    MAIN                          45        0    0.1    0.0  100.0  100.0
main                    Main                          91        0    6.6    5.8   99.9  100.0
main.flattened          Main                          93        1   17.5    5.1   89.1   89.4
transformOne            Main                          95   100000    1.8    1.7   71.6   84.3
longestCommonSubstrB    Main                         100   100000   43.3   33.1   69.8   82.5
longestCommonSubstrB.f  Main                         101  1400000   21.5   43.6   26.5   49.5
longestCommonSubstrB.g  Main                         104  4200000    5.0    5.8    5.0    5.8
readLexicon             Main                          92        1    2.5    2.8    4.2    4.8
readLexicon.stripR      Main                          98        0    1.8    1.9    1.8    1.9
CAF                     GHC.IO.Encoding.CodePage      80        0    0.0    0.0    0.0    0.0
CAF                     GHC.IO.Encoding               74        0    0.0    0.0    0.0    0.0
CAF                     GHC.IO.FD                     70        0    0.0    0.0    0.0    0.0
CAF                     GHC.IO.Handle.FD              66        0    0.0    0.0    0.0    0.0
CAF                     System.Environment            65        0    0.0    0.0    0.0    0.0
CAF                     Data.ByteString.Lazy.Char8    54        0    0.0    0.0    0.0    0.0
CAF                     Main                          52        0    0.0    0.0    0.0    0.0
transformOne            Main                          99        0    0.0    0.0    0.0    0.0
readLexicon             Main                          96        0    0.0    0.0    0.0    0.0
readLexicon.stripR      Main                          97        1    0.0    0.0    0.0    0.0
main                    Main                          90        1    0.0    0.0    0.0    0.0
UPDATE: The following program can be used to generate sample data. It expects one argument, the number of lines in the generated dataset. The generated data will be saved to the sample.txt file. When I generate a 900k-line dataset with it (by running generateSample.exe 900000), the produced dataset makes the above program fail with heap overflow (the heap size was set to 1 GB). The resulting dataset is around 20 MB.
module Main where

import System.Environment (getArgs)
import Data.List (intercalate, permutations)

generate :: Int -> [(String,String,String)]
generate n = take n $ zip3 (f "banana") (f "ruanaba") (f "kikiriki")
  where
    f = cycle . permutations

main :: IO ()
main = do
    (n:_) <- getArgs
    let flattened = unlines . map f $ generate (read n :: Int)
    writeFile "sample.txt" flattened
  where
    f (w1,w2,w3) = intercalate "\t" [w1, w2, w3]
It seems to me you've implemented a naive longest common substring, with terrible space complexity (at least O(n^2)). Strictness has nothing to do with it.
You'll want to implement a dynamic programming algorithm. You may find inspiration in the string-similarity package, or in the lcs function in the guts of the Diff package.
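To make the dynamic-programming suggestion concrete, here is a minimal sketch of the usual longest-common-substring recurrence that keeps only two rows of the table, so the extra space is proportional to the length of one string. It is written in JavaScript purely as an illustration and is not the code from string-similarity or Diff; the name longestCommonSubstring is made up for this example, and a Haskell version would follow the same recurrence.

```javascript
// Longest common substring via dynamic programming with two rolling rows.
// After processing row i, prev[j] is the length of the longest common suffix
// of a[0..i) and b[0..j).
function longestCommonSubstring(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  let bestLen = 0; // length of the best match found so far
  let bestEnd = 0; // end index (exclusive) of that match in `a`
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        // Extend the common suffix that ended one character earlier in both strings.
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > bestLen) {
          bestLen = curr[j];
          bestEnd = i;
        }
      }
    }
    prev = curr;
  }
  return a.slice(bestEnd - bestLen, bestEnd);
}
```

With the words used by the sample generator, longestCommonSubstring('banana', 'ruanaba') returns 'ana'. Time is still proportional to the product of the two lengths, but no list of candidate substrings is ever built.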
I have a program running several threads, but some threads sometimes overload the CPU, so I need to limit these threads' CPU usage to something like 50%. Is this possible in Delphi?
Edit: sorry guys, my question was not clear.
What I actually want to know is how I can track threads (at least make a thread list with their thread IDs) and see how much CPU each thread uses. I want to do this so I can see which thread is responsible for the CPU overload.
Sorry for the inconvenience again.
I think the answer to your question can be found in the following Stack Overflow question: How to get the cpu usage per thread on windows (win32).
However, I would advise you to endeavour to understand why your program is behaving as it does and attack the root of the problem rather than killing any threads that you take a dislike to. Of course, if the program in question is purely for your own private use then your approach may be perfectly expedient and pragmatic. But if you are writing professional software then I can't see a situation where killing busy threads sounds like a reasonable approach.
You cannot "limit CPU usage", neither in Delphi nor in Windows itself, as far as I know.
You likely want something else: not to interfere with user actions or with other threads. But if there's nothing going on and the user isn't doing anything, why run slower than you could? Just use 100% of the CPU, nobody needs it!
So, if you need those threads not to interfere with user actions, just set them to a lower priority with the Windows function SetThreadPriority. They'll only run when the user doesn't need processor power.
Another trick to give other threads more of a chance to run is to call Sleep(0) from time to time in your thread body. Every time you call Sleep(), you ask the OS to switch to another thread, simply speaking.
I track a rolling CPU usage per thread for every thread in all my applications using some code in my framework (http://www.csinnovations.com/framework/framework.htm). A log output looks like:
15/01/2011 11:17:59.631,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,Memory Check,Verbose,Globals,"System allocated memory = 8282615808 bytes (change since last check = 4872478720 bytes)"
15/01/2011 11:17:59.632,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,Memory Check,Verbose,Globals,"Process allocated memory = 152580096 bytes (change since last check = -4579328 bytes)"
15/01/2011 11:17:59.633,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"System CPU usage = 15.6 % (average over lifetime = 3.0 %)"
15/01/2011 11:17:59.634,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Process CPU usage = 0.5 % (average over lifetime = 0.7 %)"
15/01/2011 11:17:59.634,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.634,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.634,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.635,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.1 % (average over lifetime = 0.1 %)"
15/01/2011 11:17:59.635,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.635,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.3 % (average over lifetime = 0.5 %)"
15/01/2011 11:17:59.635,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.635,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.636,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.0 % (average over lifetime = 0.0 %)"
15/01/2011 11:17:59.636,Misha,MISHA-DCDEL,Scores Client,V0.2.0.1,Main Thread,CPU Check,Verbose,Globals,"Thread CPU usage = 0.1 % (average over lifetime = 0.1 %)"
The time period is configurable, and I tend to use either 10 seconds, a minute, or 10 minutes. Have a look in the CsiSystemUnt.pas and AppGlobalsUnt.pas files to see how it is done.
Cheers, Misha
PS I also check memory usage.