I have a Linux command that displays a certain result. From that result, I would like to use grep or awk to pull out a specific value. This is the output of the command:
OK 0 seconds over max, 0 active processes, next process in the future|'overmax'=0s;300;600 'active'=0 'nextoldest'=-1153s
All I want is to display the value after 'active'=, which in this instance would be 0.
Try this grep line (-P and \K need GNU grep):
grep -Po "'active'=\K\S*"
You could use egrep like this:
egrep -o "'active'=[^\ ]+"
Example :
[ ~]$ str="OK 0 seconds over max, 0 active processes, next process in the future|'overmax'=0s;300;600 'active'=0 'nextoldest'=-1153s"
[ ~]$ echo $str|egrep -o "'active'=[^\ ]+"
'active'=0
If you're sure that the value is numeric, you can restrict the pattern further like this:
egrep -o "'active'=[0-9]+"
You could also use sed:
[ ~]$ echo $str|sed "s/.*\('active'=[^\ ]*\).*/\1/g"
And if you want only the value itself:
[ ~]$ echo $str|sed "s/.*'active'=\([^\ ]*\).*/\1/g"
You could use awk too, but I don't think it is the best solution in this case. Just for fun, though:
[ ~]$ echo $str|awk -F " " '{for (i=1; i <= NF; i++){if($i ~ ".?active.?="){print $i}}}'
N.B.: egrep is equivalent to grep -E (recent GNU grep versions warn that egrep is obsolescent).
Another option, using GNU awk (gensub and the three-argument form of match are gawk extensions):
awk '{print gensub(/.*active.=(.*) .*/,"\\1","g")}' file
awk '{ match($0, /\047active\047=([^ ]*)/, a); print a[1] }' file
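If GNU awk is not available, a portable sketch is to use the literal string 'active'= as the field separator and keep the first blank-separated word after it. The helper name get_active_value is hypothetical, and the sketch assumes the value never contains a space:

```shell
# Hypothetical helper: print the value that follows 'active'= on each line.
# Uses only POSIX awk features (no gensub, no match() with an array).
get_active_value() {
  # -F makes everything after 'active'= become $2; split() then keeps
  # only the first blank-separated word, i.e. the value itself.
  # NF > 1 skips lines that do not contain 'active'= at all.
  awk -F"'active'=" 'NF > 1 { split($2, a, " "); print a[1] }'
}

str="OK 0 seconds over max, 0 active processes, next process in the future|'overmax'=0s;300;600 'active'=0 'nextoldest'=-1153s"
printf '%s\n' "$str" | get_active_value
```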
I am writing a function in a Bash shell script that should return the lines of CSV files (with headers) that have more commas than the header. This can happen because some values inside these files may contain commas. For quality control, I must identify these lines so I can clean them up later. What I have currently:
get_bad_lines () {
  local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
  local no_of_files=$(ls $1 | wc -l)
  for i in $(seq 0 $(( ${no_of_files}-1 )))
  do
    # Check that the file exists
    if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
      echo "File: $1_0_${i}_0.csv not found!"
    else
      # Search for error-lines inside the file and print them out
      echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
      grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
    fi
  done
}

get_bad_lines products
get_bad_lines users
The output of this program now contains all the comma counts for all the line numbers in all the files,
and I suspect this is because the function argument $1 (the folder name, i.e. products or users) conflicts with the $1 in the awk call (where I want the first column, i.e. the number of commas on that line of the current file in the loop).
Is this the issue? If so, would it be solvable by referring to the first column or to the folder name with different variable names, instead of using $1 for both?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I also tried declaring the variable in the call to awk, with the same effect
(as in the accepted answer to "Awk field variable clash with function argument"):
get_bad_lines () {
  local table_name=$1
  local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
  local no_of_files=$(ls $table_name | wc -l)
  for i in $(seq 0 $(( ${no_of_files}-1 )))
  do
    # Check that the file exists
    if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
      echo "File: ${table_name}_0_${i}_0.csv not found!"
    else
      # Search for error-lines inside the file and print them out
      echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
      grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
    fi
  done
}
You can do the whole thing in awk:
get_bad_lines () {
  find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
    awk -v table_name="$1" '
      NR==1 { num_comma=gsub(/,/, ""); }
      /,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0; }
      END { if (wrong > 0) {
              print(FILENAME" has over "num_comma" commas in the following lines:");
              for (i=0;i<wrong;i++) { print(wrong_array[i]); }
            }
          }' "${my_file}"
  done
}
As for why your original awk command did not print only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk program ('$1 > $correct_no_of_commas {print}'). The shell performs no substitution inside single quotes, so awk reads $correct_no_of_commas literally. More precisely, awk looks for its own variable correct_no_of_commas, which is undefined in the awk script and therefore an empty string. awk then evaluates $1 > $"" as the condition, and since $"" is equivalent to $0, awk compares the count in $1 with the full input line. That line, as produced by uniq -c, has the form <spaces><count> <line_number>, which does not look like a number, so awk compares the two values as strings; the first digit of the count sorts after the leading space of the line, so $1 > $correct_no_of_commas is always true.
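The usual fix is to hand the shell value to awk with -v, so that awk compares the count against a real number. A minimal sketch on made-up sample data; the helper name bad_line_numbers and the awk variable name max are hypothetical, and it prints only the offending line numbers:

```shell
# Hypothetical helper: print the numbers of the lines in CSV file $1
# that contain more commas than its header line.
bad_line_numbers() {
  # Number of commas in the header (the first line).
  max=$(head -n 1 "$1" | tr -cd , | wc -c)
  # One output line per comma, reduced to "count line_number" by uniq -c.
  # -v copies the shell value into the awk variable max before the program
  # runs, so "$1 > max" is a numeric comparison of the per-line count.
  grep -o -n '[,]' "$1" | cut -d : -f 1 | uniq -c |
    awk -v max="$max" '$1 > max { print $2 }'
}

# Made-up sample: the header has 2 commas, lines 2 and 4 have more.
sample=$(mktemp)
printf '%s\n' 'a,b,c' 'a,b,c,d' 'a,b' 'x,y,z,w,v' > "$sample"
bad_line_numbers "$sample"
rm -f "$sample"
```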
You can identify all the bad lines with a single awk command
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you also want the line numbers to be printed, use this:
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv
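ENDFILE is a GNU awk extension, so the two commands above need gawk. A portable sketch of the same idea, assuming only POSIX awk, prints the separator before each new file instead of after each file; the function name bad_csv_lines is hypothetical:

```shell
# Hypothetical helper: for each CSV file given as an argument, list the
# lines that have more comma-separated fields than the header line.
bad_csv_lines() {
  awk -F, '
    FNR == 1 {                    # first line of a new file: the header
      if (NR > 1) print "#######" # separator between files
      print FILENAME
      header_count = NF
      next
    }
    NF > header_count { print FNR "\t" $0 }
  ' "$@"
}
```

Usage would be, for example, bad_csv_lines /path/here/*.csv.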
I need to search for files that contain two or more occurrences of a specific word (in my case NORMAL). Given files like the following:
file1.txt:
the NORMAL things are [
- case
- case 2
a NORMAL is like [
- case 3
- case 4
file2.txt:
the NORMAL things are [
- case
- case 2
a DIFFERENT is like [
- case 3
- case 4
file3.txt:
the NORMAL things are [
- case
- case 2
the search should find only file1.txt.
I have tried a simple grep:
grep -Ri "*NORMAL*NORMAL*" .
but it does not work.
If you do not wish to search recursively:
grep -lzE '(NORMAL).*\1' files*
If you do wish to search recursively:
grep -rlzE '(NORMAL).*\1' .
This command searches the current directory recursively for files that contain NORMAL followed, anywhere later in the file, by NORMAL again (the \1 back-reference), i.e. files with two or more occurrences. It prints only the file names; remove -l to print the matching content as well.
-l: print only the names of matching files
-z: treat input lines as terminated by a zero byte instead of a newline, so a whole text file is searched as a single line
-E: use extended regular expressions
-r: search recursively
Use grep -c to print the number of matching lines in each file (several occurrences on the same line count only once):
grep -c 'PATTERN' file(s)
In your case, you also need a second grep -Pv, or something similar, to filter by the number of matches (here, excluding files with 0 or 1 matches):
grep -c 'NORMAL' files | grep -P -v ':[01]$'
:[01]: a colon followed by either 0 or 1 (i.e. 0 or 1 matches).
$: end of the line.
-c, --count: Suppress normal output; instead print a count of matching lines for each input file. With the -v (--invert-match) option, count non-matching lines. (-c is specified by POSIX.)
-P, --perl-regexp: Interpret patterns as Perl-compatible regular expressions (PCREs).
-v, --invert-match: Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
(from GNU Grep 3.4)
fgrep -c 'NORMAL' file* | awk -F ":" ' $NF >= 2 { print }'
If you want to print the matching lines as well:
grep NORMAL $(fgrep -c 'NORMAL' file* | awk -F ":" '$NF >= 2 { print $1 }')
the NORMAL things are [
a NORMAL is like [
Use the grep -o option, which prints each match on a separate line, and count the lines with wc -l:
grep -o NORMAL file1.txt | wc -l --> 2
grep -o NORMAL file2.txt | wc -l --> 1
grep -o NORMAL file3.txt | wc -l --> 1
This can then be used in an if clause:
if [ $(grep -o NORMAL file1.txt | wc -l) -ge 2 ]; then echo "Count is greater than or equal to 2"; else echo "Count is less than 2"; fi
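The same grep -o | wc -l check can be looped over several files to answer the original question directly. A minimal sketch; files_with_min_count is a hypothetical helper, and the word is treated as a basic regular expression, which is fine for a plain word like NORMAL:

```shell
# Hypothetical helper: print each file that contains WORD at least MIN times.
# Usage: files_with_min_count WORD MIN FILE...
files_with_min_count() {
  word=$1 min=$2
  shift 2
  for f in "$@"; do
    # grep -o prints every occurrence on its own line, so two matches on
    # the same line are counted twice (unlike grep -c).
    count=$(grep -o -- "$word" "$f" | wc -l)
    # $((count)) strips any padding that some wc implementations add.
    if [ "$((count))" -ge "$min" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For the sample files in the question, files_with_min_count NORMAL 2 file*.txt would print only file1.txt.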
I want to find processes that have been running for more than 3 hours. I have written a command for this, but it does not return the expected output:
ps -u <user> -o pid,stime,pcpu,pmem,etime,cmd --sort=start_time | \
grep <searchString> | grep -v grep| awk '{print $5}' | \
sed 's/:|-/ /g;'| awk '{print $4" "$3" "$2" "$1}' | \
awk '$1+$2*60+$3*3600+$4*86400 > 10800'
It prints only the etime values, but the expected output is the values of pid,stime,pcpu,pmem,etime,cmd.
I am not able to find the exact issue.
You are executing awk '{print $5}', which takes the input and prints only column 5 (etime in your case); all the other columns are lost from that point on.
If your system supports etimes (note the s at the end), you can easily do this with:
ps -eo pid,etimes,etime,comm,user,tty | awk '{if ( $2>10800) print $0}'
On a system that does not support etimes, etime has the format [[dd-]hh:]mm:ss (just mm:ss if no hours have passed), so it has to be converted to seconds first:
ps -eo pid,etime,comm,user,tty | awk -v seconds_old=10800 'NR > 1 { n = split($2, a, /[-:]/); b = a[n] + a[n-1]*60; if (n >= 3) b += a[n-2]*3600; if (n == 4) b += a[1]*86400; if (b > seconds_old) print $0 }'
Adjust seconds_old (10800 seconds = 3 hours) to change the age you want to test for.
There are various other ways of doing this, using find for example.
However, the solution above should match your expected output.
Try this:
ps -u <user> -o pid,stime,pcpu,pmem,etime=,cmd --sort=start_time | grep <searchString> |
while read -r z; do
  tago=$(echo "$z" | awk '{print $5}' | sed -E 's/(:|-)/ /g' | awk '{print $NF + $(NF-1)*60 + (NF>2 ? $(NF-2)*3600 : 0) + (NF>3 ? $(NF-3)*86400 : 0)}')
  if [ "$tago" -ge 10800 ]; then echo "$z"; fi
done
It prints only processes that are at least 10800 seconds (3 hours) old.
You can readjust the output further to fit your needs.
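To avoid starting extra awk and sed processes for every line of the loop, the etime conversion can also be done in the shell itself. A sketch assuming the [[dd-]hh:]mm:ss etime format documented in ps(1); etime_to_seconds is a hypothetical helper that uses only POSIX parameter expansion and arithmetic:

```shell
# Hypothetical helper: convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
  _t=$1 _d=0 _h=0
  case $_t in
    *-*) _d=${_t%%-*}; _t=${_t#*-} ;;   # split off "dd-"
  esac
  _s=${_t##*:}; _t=${_t%:*}             # seconds are always the last field
  _m=${_t##*:}                          # then minutes
  case $_t in
    *:*) _h=${_t%%:*} ;;                # hours only if "hh:mm" remains
  esac
  # Drop one leading zero so that "08" and "09" are not read as octal.
  _d=${_d#0} _h=${_h#0} _m=${_m#0} _s=${_s#0}
  echo $(( ${_d:-0} * 86400 + ${_h:-0} * 3600 + ${_m:-0} * 60 + ${_s:-0} ))
}
```

In the loop above, it could replace the awk/sed pipeline: tago=$(etime_to_seconds "$(echo "$z" | awk '{print $5}')").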
I was able to find the processes running for more than 3 hours with the command below:
ps -u <user> -o pid,stime,pcpu,pmem,etime,cmd --sort=start_time |grep -v grep|awk 'substr($0,23,2) > 3'
We are running the script below to find the occurrences of particular words (PO numbers) in a log file.
We need suggestions to optimize the script.
Test.log size: approx. 500 to 600 MB
$wc -l Test.log
16609852 Test.log
po_numbers: 11k to 12k POs to search
$more po_numbers
....and so on
Current execution time: 2.45 hrs
while IFS= read -r po
do
  check=$(grep -c "PO_NUMBER=$po" Test.log)
  echo "$po --> $check" >> list3
  if [ "$check" = "0" ]
  then
    echo "$po" >> po_to_server
  #else break
  fi
done < po_numbers
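You are reading your big file once for every PO number (about 12k times) when you execute
grep -c "PO_NUMBER=$po" Test.log
You can try to split your big file into smaller ones, or write your patterns to a file inside the loop and make grep read them all in a single pass (do not append an extra \n: an empty line in the pattern file matches every line):
echo "PO_NUMBER=$po" >> patterns.txt
grep -f patterns.txt Test.log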
$ grep -Fwf <(sed 's/.*/PO_NUMBER=&/' po_numbers) Test.log
This creates the pattern list from po_numbers (via process substitution) and checks the log file for literal, whole-word matches. It assumes that each searched PO_NUMBER=xxx is a separate word (if not, remove -w) and that the patterns are literal strings rather than regexes (if not, remove -F); removing either option will slow the search down.
Using grep:
sed -e 's|^|PO_NUMBER=|' po_numbers | grep -o -F -f - Test.log | sed -e 's|^PO_NUMBER=||' | sort | uniq -c > list3
awk '{ print $2 }' list3 | grep -v -x -F -f - po_numbers > po_to_server
Using awk:
This awk program might work faster:
awk '(NR==FNR){ po[$0]=0; next }
{ for(key in po) {
    po[key]+=gsub("PO_NUMBER="key, "&")
  }
}
END {
  for(key in po) {
    if (po[key]==0) {print key >> "po_to_server" }
    else {print key"-->"po[key] >> "list3" }
  }
}' po_numbers Test.log
This does the following:
The first line loads the PO keys from the file po_numbers.
The second block parses Test.log for occurrences of PO_NUMBER=key on each line (gsub performs a substitution and returns the number of substitutions made).
In the END block, we print the requested output to the requested files.
The assumption here is that multiple patterns could occur, possibly several times, on a single line of Test.log.
Note: the original order of po_numbers will not be preserved.
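Looping over every key on every line still tries all ~12k POs against each of the ~16.6 million log lines. A different single-pass sketch (not part of the answer above) extracts each PO_NUMBER=... token from a line and looks it up in an awk array, so the work per line no longer depends on the number of POs, and the order of po_numbers is kept. It assumes PO numbers consist only of letters, digits and underscores, and it counts occurrences rather than matching lines; count_pos is a hypothetical name:

```shell
# Hypothetical helper: one pass over the log ($2), counting PO_NUMBER=<po>
# occurrences for every PO listed in the first file ($1).
count_pos() {
  awk '
    NR == FNR { count[$0] = 0; order[++n] = $0; next }   # load the PO list
    {
      line = $0
      # Extract every PO_NUMBER=... token on the line and look it up in
      # the array, instead of trying all patterns against the line.
      while (match(line, /PO_NUMBER=[A-Za-z0-9_]+/)) {
        po = substr(line, RSTART + 10, RLENGTH - 10)     # 10 = length of "PO_NUMBER="
        if (po in count) count[po]++
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { for (i = 1; i <= n; i++) print order[i] " --> " count[order[i]] }
  ' "$1" "$2"
}
```

Usage: count_pos po_numbers Test.log > list3, then awk '$3 == 0 { print $1 }' list3 > po_to_server to list the POs that were never found.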
"finding out the occurrence"
Not sure if you mean to count the number of occurrences for each searched word or to output the lines in the log that contain at least one of the searched words. This is how you could solve it in the latter case:
(cat po_numbers; echo GO; cat Test.log) | \
perl -nle'$r?/$r/&&print:/GO/?($r=qr/@{[join"|",@s]}/):push@s,$_'
I have the command:
ps -ef | grep kde | tr -s ' ' '#'
I'm getting output like this:
How can I get the # symbol only as the column separator, using standard Linux tools or something else like awk?
Use pgrep to get your PIDs instead of using ps. pgrep will eliminate the grep issue where one of the processes you discover is the grep doing your filtering.
You can also specify the output of the ps command itself using the -o or -O option. You can do this to get the fields you want, and eliminate the header.
You can also use the read command to parse your output. The only field you have with possible blank space is the last one -- the command and arguments.
ps -o pid= -o uid= -o tty= -o args= -p "$(pgrep -d, kde)" | while read -r pid uid tty cmd
do
  echo "UID = $uid PID = $pid TTY = $tty"
  echo "Command = $cmd"
done
read splits each line on whitespace, except that the last variable, $cmd, receives all the leftover fields (i.e. the entire command with its arguments).
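The same read splitting can produce the '#'-separated output asked for. A sketch assuming Linux procps ps -ef output with eight columns (UID PID PPID C STIME TTY TIME CMD), where only the last column may contain blanks; to_hash_separated is a hypothetical name:

```shell
# Hypothetical helper: turn `ps -ef`-style lines into '#'-separated ones.
# read gives one word to each of the first seven variables and the rest
# of the line (command plus arguments, inner spacing kept) to cmd.
to_hash_separated() {
  while read -r uid pid ppid c stime tty cputime cmd; do
    printf '%s#%s#%s#%s#%s#%s#%s#%s\n' \
      "$uid" "$pid" "$ppid" "$c" "$stime" "$tty" "$cputime" "$cmd"
  done
}
```

For example: ps -ef | grep kde | to_hash_separated.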
The ps command differs from platform to platform, so read the manpage on ps.
Nasty but it works. Tweak the number 8 to suit the number of columns your variant of ps outputs.
ps -ef | awk -v OFS="" '{ for(i=1; i < 8; i++) printf("%s#",$i); for(i=8; i <= NF; i++) printf("%s ", $i); printf("\n")}'
If you mean processing output that uses '#' as the column/field separator, in awk you can use -F:
echo "user2131#1626#1584#0#15:50#?#00:00:00#/bin/sh#/usr/bin/startkdeere" | awk -F'#' -v OFS='\t' '{$1=$1;print $0}'
user2131 1626 1584 0 15:50 ? 00:00:00 /bin/sh /usr/bin/startkdeere