How to trace the list of PIDs running on a specific core? - linux

I'm trying to run a program on a dedicated core in Linux. (I know Jailhouse is a good way to do so, but I have to use off-the-shelf Linux. :-( )
Other processes, such as interrupt handlers, kernel threads, and service processes, may also run on the dedicated core occasionally. I want to disable as many such processes as possible. To do that, I first need to pin down the list of processes that may run on the dedicated core.
My question is:
Are there any existing tools that I can use to trace the list of PIDs or processes that run on a specific core over a time interval?
Thank you very much for your time and help in this question!

TL;DR Dirty hacky solution.
DISCLAIMER: At some point it stops working with "column: line too long" :-/
Copy this to: core-pids.sh
#!/bin/bash
TARGET_CPU=0
touch lastPIDs
touch CPU_PIDs
while true; do
ps ax -o cpuid,pid | tail -n +2 | sort | xargs -n 2 | grep -E "^$TARGET_CPU" | awk '{print $2}' > lastPIDs
for i in {1..100}; do printf "#\n" >> lastPIDs; done
cp CPU_PIDs aux
paste lastPIDs aux > CPU_PIDs
column -t CPU_PIDs > CPU_PIDs.humanfriendly.tsv
sleep 1
done
Then
chmod +x core-pids.sh
./core-pids.sh
Then open CPU_PIDs.humanfriendly.tsv with your favorite editor and inspect it!
The key is the "ps -o cpuid,pid" bit; for more detailed info, please comment. :D
Explanation
Infinite loop with
ps ax -o cpuid,pid | tail -n +2 | sort | xargs -n 2 | grep -E "^$TARGET_CPU" | awk '{print $2}' > lastPIDs
ps ax -o cpuid,pid
show each PID together with the CPU it is associated with
tail -n +2
remove headers
sort
sort by cpuid
xargs -n 2
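remove white space at the beginning of each line
grep -E "^$TARGET_CPU"
filter by CPU id (the pattern is only anchored at the start, so TARGET_CPU=1 also matches CPUs 10 to 19; use "^$TARGET_CPU " with a trailing space for an exact match)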
awk '{print $2}'
get pid column
> lastPIDs
write the latest PIDs for the target CPU id to a file
for i in {1..100}; do printf "#\n" >> lastPIDs; done
hack for pretty .tsv print with the "column -t" command
cp CPU_PIDs aux
CPU_PIDs holds the whole timeline; we copy it to the aux file so that the next command can read the old timeline while it writes the new CPU_PIDs
paste lastPIDs aux > CPU_PIDs
Prepend the lastPIDs column to the whole timeline file CPU_PIDs
column -t CPU_PIDs > CPU_PIDs.humanfriendly.tsv
pretty print whole timeline CPU_PIDs file
Attribution
stackoverflow answer to: ps utility in linux (procps), how to check which CPU is used
by Mikel
stackoverflow answer to: Echo newline in Bash prints literal \n
by sth
stackoverflow answer to: shell variable in a grep regex
by David W.
superuser answer to: Aligning columns in output from a UNIX command
by Janne Pikkarainen
nixCraft article: HowTo: Unix For Loop 1 to 100 Numbers
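A shorter variant of the same polling idea, as a minimal sketch: it uses the psr column of ps (the same CPU number mentioned for ps -eLF) and lets awk compare the CPU number exactly, so CPU 1 is not confused with CPU 10. The function names filter_cpu and pids_seen_on_cpu are made up for this example, and since it only samples once per second it can still miss short-lived processes.

```bash
#!/bin/bash
# Read "PSR PID" lines on stdin and print the PIDs whose PSR equals $1 exactly.
filter_cpu() {
    awk -v cpu="$1" '$1 == cpu { print $2 }'
}

# Sample ps once per second for $2 seconds and print every distinct PID
# that was last seen running on CPU $1 in at least one sample.
pids_seen_on_cpu() {
    local cpu=$1 seconds=$2 i
    for ((i = 0; i < seconds; i++)); do
        ps -eo psr=,pid= | filter_cpu "$cpu"
        sleep 1
    done | sort -un
}
```

For example, pids_seen_on_cpu 3 60 would list the PIDs observed on CPU 3 during one minute.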

The best way to obtain what you want is to operate as follows:
Use the isolcpus= Linux kernel boot parameter to "free" one core from the Linux scheduler
Disable the irqbalance daemon (in case it is executing)
Set the IRQ affinities to the other cores by manually writing the CPU mask to /proc/irq/<irq_number>/smp_affinity
Finally, run your program setting the affinity to the dedicated core through the taskset command.
In this case, such core will only execute your program. For checking, you can type ps -eLF and look at the PSR column (which specifies the CPU number).
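The IRQ step could be scripted roughly as follows. This is only a sketch: mask_without_cpu and move_irqs_off_cpu are hypothetical helper names, the mask arithmetic assumes a machine with at most 32 CPUs (larger systems use comma-separated 32-bit groups in smp_affinity), and some IRQs refuse a new affinity, so failures are reported rather than treated as fatal.

```bash
#!/bin/bash
# Print the hex affinity mask that covers CPUs 0..($1 - 1) except CPU $2.
mask_without_cpu() {
    local ncpus=$1 isolated=$2
    printf '%x\n' $(( ((1 << ncpus) - 1) & ~(1 << isolated) ))
}

# Write that mask to every IRQ (run as root). $1 = number of CPUs,
# $2 = the dedicated CPU that should not receive interrupts.
move_irqs_off_cpu() {
    local mask irq
    mask=$(mask_without_cpu "$1" "$2")
    for irq in /proc/irq/[0-9]*; do
        echo "$mask" 2>/dev/null > "$irq/smp_affinity" ||
            echo "IRQ ${irq##*/}: affinity not changed" >&2
    done
}
```

On a 4-core machine with core 3 isolated, move_irqs_off_cpu 4 3 would write the mask 7 (CPUs 0 to 2), after which the program can be started with taskset -c 3.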

Not a direct answer to the question, but I usually use perf's context-switches software event to see how much the system or other processes perturb my benchmarks.
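As a rough illustration of that approach (a sketch, not the answerer's exact commands): the two wrapper functions below are hypothetical names, the output path /tmp/cpu-switch.data is an arbitrary choice, and system-wide perf sessions normally need root. The second function uses the sched:sched_switch tracepoint, which reports the task that is being switched out on the chosen CPU.

```bash
#!/bin/bash
# Count context switches on CPU $1 during $2 seconds.
count_switches_on_cpu() {
    perf stat -a -C "$1" -e context-switches -- sleep "$2"
}

# Record scheduler switches on CPU $1 for $2 seconds, then list the
# "command pid" pairs that were seen, most frequent first.
switchers_on_cpu() {
    perf record -a -C "$1" -e sched:sched_switch -o /tmp/cpu-switch.data -- sleep "$2" &&
        perf script -i /tmp/cpu-switch.data -F comm,pid | sort | uniq -c | sort -rn
}
```

Calling switchers_on_cpu 3 10 would then show which tasks were scheduled on CPU 3 during a ten-second window.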

Related

Optimizing search in linux

I have a huge log file close to 3GB in size.
My task is to generate some reports based on the number of times something is logged.
I need to find the number of times StringA, StringB and StringC are each logged.
What I am doing right now is:
grep "StringA" server.log | wc -l
grep "StringB" server.log | wc -l
grep "StringC" server.log | wc -l
This is a long process and my script takes close to 10 minutes to complete. What I want to know is whether this can be optimized. Is it possible to run one grep command and find out the number of times StringA, StringB and StringC have each been logged?
You can use grep -c instead of wc -l:
grep -c "StringA" server.log
grep can't report count of individual strings. You can use awk:
out=$(awk '/StringA/{a++;} /StringB/{b++;} /StringC/{c++;} END{print a, b, c}' server.log)
Then you can extract each count with a simple bash array:
arr=($out)
echo "StringA=${arr[0]}"
echo "StringB=${arr[1]}"
echo "StringC=${arr[2]}"
Using grep -c instead of piping to wc is certainly going to be faster, and the awk solution is possibly faster too, but I haven't measured either.
Certainly this approach could be optimized since grep doesn't perform any text indexing. I would use a text indexing engine like one of those from this review or this stackexchange QA . Also you may consider using journald from systemd which stores logs in a structured and indexed format so lookups are more effective.
So many greps so little time... :-)
According to David Lyness, a straight grep search is about 7 times as fast as an awk in large file searches.
If that is the case, the current approach could be optimized by changing grep to fgrep, but only if the patterns being searched for are not regular expressions. fgrep is optimized for fixed patterns.
If the number of instances is relatively small compared to the original log file entries, it may be an improvement to use the egrep version of grep to create a temporary file filled with all three instances:
egrep "StringA|StringB|StringC" server.log > tmp.log
grep "StringA" tmp.log | wc -l
grep "StringB" tmp.log | wc -l
grep "StringC" tmp.log | wc -l
The egrep variant of grep allows a | (vertical bar/pipe) character to be used between two or more separate search strings so that you can find multiple strings in one statement. You can use grep -E to do the same thing.
Full documentation is in the man grep page, and the Extended Regular Expressions that egrep uses are described in man 7 re_format.
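For fixed strings, a single pass over the log can also be sketched with grep -oF plus uniq -c. The function name count_fixed_strings is made up for this example. Note the difference from the commands above: this counts occurrences rather than matching lines, so the numbers differ when one line contains the same string twice or more than one of the strings.

```bash
#!/bin/bash
# Usage: count_fixed_strings FILE STRING...
# Reads FILE once and prints "count string" for each fixed string found.
count_fixed_strings() {
    local file=$1 s
    local args=()
    shift
    for s in "$@"; do
        args+=(-e "$s")          # one -e per fixed string
    done
    # -o prints each match on its own line, -F disables regex interpretation
    grep -oF "${args[@]}" "$file" | sort | uniq -c
}
```

It would be called as count_fixed_strings server.log StringA StringB StringC.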

top: counting the number of processes belonging to a user

Is there a way of counting the number of processes being run by a user in the unix/linux/os x terminal?
For instance, top -u taha lists my processes. I want to be able to count these.
This will show all of the users with their counts (I believe this would be close enough for you. :)
ps -u "$(echo $(w -h | cut -d ' ' -f1 | sort -u))" o user= | sort | uniq -c | sort -rn
You can use ps to output it and count the number using wc, as:
ps -u user | sed 1d | wc -l
You can also dump top output in batch mode (-b, needed when the output goes to a pipe) and grep it, something like:
top -b -u user -n1 | grep user | wc -l
I'm somewhat new to *nix, so perhaps I did not fully understand the context of your question, but here is a possible solution:
jobs | wc -l
The output of the above command is a count of all the processes reported by the jobs command. You can manipulate the parameters of the jobs command to change which processes get reported.
EDIT: Just FYI, this would only work if interested in commands originating from a particular shell. If you want more control in looking at system-wide processes you probably want to use ps as others have suggested. However, if you use wc to do your counting, make sure you take into account any extraneous white space jobs, ps or top may have generated as that will affect the output of wc.
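To sidestep the header and white-space concerns, one option is to ask ps for a header-less PID column and count the lines. A minimal sketch, with count_user_procs as a made-up function name; the count includes the ps process itself when it runs as the same user.

```bash
#!/bin/bash
# Print the number of processes owned by user $1.
count_user_procs() {
    # "pid=" suppresses the header, so every output line is one process;
    # tr strips the padding that some wc implementations add.
    ps -u "$1" -o pid= | wc -l | tr -d ' '
}
```

For instance, count_user_procs taha prints a single number.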

Get pid of last started instance of a certain process

I have several instances of a certain process running and I want to determine the process id of the one that has been started last.
So far I came to this code:
ps -aef | grep myProcess | grep -v grep | awk -F" " '{print $2}' |
while read line; do
echo $line
done
This gets me all process ids of myProcess. Now I somehow need to compare the running times of these PIDs and find the one with the smallest running time. But I don't know how to do that...
An easier way would be to use pgrep with its -n, --newest switch.
Select only the newest (most recently started) of the matching
processes.
Alternatively, if you don't want to use pgrep, you can use ps and sort by start time:
ps -ef kbsdstart
Use pgrep. It has a -n (newest) option for that. So just try
pgrep -n myProcess

linux cpu usage

I am working on unix.
I want to know the current cpu usage of a process.
I understand that ps gives the average cpu usage over the whole time the process has been up - it is not the current usage.
Is there a way to print only the cpu from the top command without 10 more parameters and headers? I know how to do it with awk - this is not the way I want to do it.
top -p 20705 -bc -n 1 | tail -n 2 | awk '{ print $9}' | head -n 1
If there is another simple way to do it, not reading /proc/stat...
If there is a simple way doing it from c++, it is also ok.
Most likely, you will need to read /proc/stat. However, here is an interesting article with C code that may help you out. To understand and use the output from the program you should do man 5 proc. And here is the source code.
The bottom line is that you will need to read from /proc/stat to do what you want.
to see cpu usage of a process whose pid is 24556
ps -p 24556 -o \%cpu=
to see mem usage of a process named syslogd
ps -C syslogd -o \%mem=
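For a "current" figure on Linux, the per-process file /proc/<pid>/stat (described in man 5 proc) can be sampled twice and the difference converted to a percentage. The following is a sketch under those assumptions; proc_ticks and cpu_percent are hypothetical names, and the result is the share of one CPU used during the sampling interval.

```bash
#!/bin/bash
# Print utime + stime (in clock ticks) for PID $1.
proc_ticks() {
    local stat fields
    stat=$(<"/proc/$1/stat") || return 1
    stat=${stat##*) }                    # drop "pid (comm) "; comm may contain spaces
    read -r -a fields <<< "$stat"        # fields[0] is field 3 (state)
    echo $(( fields[11] + fields[12] ))  # field 14 (utime) + field 15 (stime)
}

# Print the CPU usage in percent of PID $1 over $2 seconds (default 1).
cpu_percent() {
    local pid=$1 interval=${2:-1} hz t0 t1
    hz=$(getconf CLK_TCK)
    t0=$(proc_ticks "$pid") || return 1
    sleep "$interval"
    t1=$(proc_ticks "$pid") || return 1
    awk -v d=$((t1 - t0)) -v hz="$hz" -v s="$interval" \
        'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
}
```

cpu_percent 20705 2 would report the usage of PID 20705 averaged over two seconds.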

How do I grep multiple lines (output from another command) at the same time?

I have a Linux driver running in the background that is able to return the current system data/stats. I view the data by running a console utility (let's call it dump-data) in a console. All data is dumped every time I run dump-data. The output of the utility is like below
Output:
- A=reading1
- B=reading2
- C=reading3
- D=reading4
- E=reading5
...
- variableX=readingX
...
The list of readings returned by the utility can be really long. Depending on the scenario, certain readings would be useful while everything else would be useless.
I need a way to grep only the useful readings, whose names might have nothing in common (via a bash script). I.e. sometimes I'll need to collect A,D,E; and other times I'll need C,D,E.
I'm attempting to graph the readings over time to look for trends, so I can't run something like this:
# forgive my pseudocode
Loop
dump-data | grep A
dump-data | grep D
dump-data | grep E
End Loop
to collect A,D,E, because that would give me readings from 3 separate calls of dump-data, which would not be accurate.
If you want to save all result of grep in the same file, you can just join all expressions in one:
grep -E 'expr1|expr2|expr3'
But if you want to have results (for expr1, expr2 and expr3) in separate files, things are getting more interesting.
You can do this using tee >(command).
For example, here I process the same pipe with three different commands:
$ echo abc | tee >(sed s/a/_a_/ > file1) | tee >(sed s/b/_b_/ > file2) | sed s/c/_c_/ > file3
$ grep "" file[123]
file1:_a_bc
file2:a_b_c
file3:ab_c_
But this command seems too complex.
I would rather save the dump-data results to a file and then grep it.
TEMP=$(mktemp /tmp/dump-data-XXXXXXXX)
dump-data > ${TEMP}
grep A ${TEMP}
grep B ${TEMP}
grep C ${TEMP}
You can use dump-data | grep -E "A|D|E". Note the -E option of grep. Alternatively you could use egrep without the -E option.
you can simply use:
dump-data | grep -E 'A|D|E'
awk '/MY PATTERN/{print > "matches-"FILENAME;}' myfile{1,3}
thx Guru at Stack Exchange
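Since a bare pattern such as 'A|D|E' matches those letters anywhere in a line, a slightly stricter filter can anchor on the "- name=value" layout shown in the question. This is a sketch that assumes exactly that layout and reading names without regex metacharacters; select_readings is a made-up name.

```bash
#!/bin/bash
# Usage: dump-data | select_readings NAME...
# Keeps only the lines "- NAME=..." for the given reading names.
select_readings() {
    local IFS='|'            # makes "$*" expand to NAME1|NAME2|...
    grep -E "^- ($*)="
}
```

Capturing one snapshot and filtering it keeps all readings from the same call: snapshot=$(dump-data); select_readings A D E <<< "$snapshot"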
