grep never exiting when it should be searching through a small tempfile - linux

I have a problem with mktemp and grep.
OUT=$(mktemp /tmp/output.XXXXXXXXXX) || { echo "Failed to create temp file"; exit 1; }
awk -F: '{print $7}' /etc/passwd >> $OUT
grep -c $1 $OUT
On the grep line, the script does not exit and does not print grep's result.
Please help me solve this problem.

BlackPearl above is probably correct - $1 is likely empty when your script runs. As a result, the grep command becomes grep -c $OUT, which tells grep to use the value of $OUT as the search pattern and to read its input from stdin. stdin is the keyboard (I suspect), so grep will wait forever (well, until you press Ctrl-D or Ctrl-C).
To solve your problem, specify a parameter when you execute your script.
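For example, a minimal sketch of an argument check (illustrative, not part of the original script) that avoids the silent hang:
#!/bin/bash
# refuse to run without a search term, so grep never falls back to reading stdin
if [ $# -ne 1 ]; then
    echo "Usage: $0 <pattern>" >&2
    exit 1
fi
OUT=$(mktemp /tmp/output.XXXXXXXXXX) || { echo "Failed to create temp file"; exit 1; }
awk -F: '{print $7}' /etc/passwd >> "$OUT"
grep -c "$1" "$OUT"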
You can also avoid the problem entirely by counting all of the unique values in your passwd file like this:
OUT=$(mktemp /tmp/output.XXXXXXXXXX) || { echo "Failed to create temp file"; exit 1; }
awk -F: '{print $7}' /etc/passwd >> "$OUT"
sort "$OUT" | uniq -c # count each unique value in passwd file column 7

Bash function with input fails awk command

I am writing a function in a Bash shell script that should return the lines from CSV files with headers that have more commas than the header. This can happen because some values inside these files may themselves contain commas. For quality control, I must identify these lines so I can later clean them up. What I have currently:
#!/bin/bash
get_bad_lines () {
  local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
  local no_of_files=$(ls $1 | wc -l)
  for i in $(seq 0 $(( ${no_of_files}-1 )))
  do
    # Check that the file exists
    if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
      echo "File: $1_0_${i}_0.csv not found!"
      continue
    fi
    # Search for error-lines inside the file and print them out
    echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
    grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
  done
}
get_bad_lines products
get_bad_lines users
The output of this program is currently all the comma counts with all of the line numbers in all the files,
and I suspect this is due to the input $1 (the folder name, i.e. products and users) conflicting with the $1 reference in the call to awk (where I wish to grab the first column, which is the comma count for that line of the current file in the loop).
Is this the issue? And if so, could it be solved by referencing the first column or the folder name with different variable names instead of both of them using $1?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I tried a variable declaration in the call to awk as well, with the same effect
(as in the accepted answer to Awk field variable clash with function argument):
get_bad_lines () {
  local table_name=$1
  local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
  local no_of_files=$(ls $table_name | wc -l)
  for i in $(seq 0 $(( ${no_of_files}-1 )))
  do
    # Check that the file exists
    if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
      echo "File: ${table_name}_0_${i}_0.csv not found!"
      continue
    fi
    # Search for error-lines inside the file and print them out
    echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
    grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
  done
}
You can use awk all the way to achieve that:
get_bad_lines () {
  find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
    awk -v table_name="$1" '
      NR==1 { num_comma=gsub(/,/, ""); }
      /,/   { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0; }
      END   {
        if (wrong > 0) {
          print(FILENAME" has over "num_comma" commas in the following lines:");
          for (i=0; i<wrong; i++) { print(wrong_array[i]); }
        }
      }' "${my_file}"
  done
}
As for why your original awk command fails to print only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk program ('$1 > $correct_no_of_commas {print}'). The shell performs no substitution inside single quotes, so awk reads $correct_no_of_commas literally. awk then looks for a variable named correct_no_of_commas, which is undefined in the awk script and therefore evaluates to the empty string, i.e. zero. $correct_no_of_commas thus becomes $0, the whole input line, and the condition effectively becomes $1 > $0. For input of the form <spaces><count><space><line_number>, as produced by uniq -c, that comparison ends up true for every line (awk falls back to a string comparison, and the digit in $1 sorts after the leading spaces of the full line), so every line is printed.
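The usual fix (a minimal sketch, keeping the rest of the loop unchanged) is to pass the shell value into awk with -v so it becomes a real awk variable:
# sketch: export the shell variable into awk via -v instead of expecting shell expansion
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" \
  | cut -d : -f 1 | uniq -c \
  | awk -v max="$correct_no_of_commas" '$1 > max { print }'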
You can identify all the bad lines with a single awk command
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number also to be printed, use this
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv

Need to reduce the execution time

We are trying to execute the script below to find the number of occurrences of particular words in a log file.
Need suggestions to optimize the script.
Test.log size - approximately 500 to 600 MB
$ wc -l Test.log
16609852 Test.log
po_numbers - 11 to 12k PO numbers to search
$ more po_numbers
xxx1335
AB1085
SSS6205
UY3347
OP9111
....and so on
Current Execution Time - 2.45 hrs
while IFS= read -r po
do
  check=$(grep -c "PO_NUMBER=$po" Test.log)
  echo $po "-->" $check >> list3
  if [ "$check" = "0" ]
  then
    echo $po >> po_to_server
  #else break
  fi
done < po_numbers
You are reading your big file too many times when you execute
grep -c "PO_NUMBER=$po" Test.log
You can try to split your big file into smaller ones, or write your patterns to a file and make grep use it:
echo "PO_NUMBER=$po" >> patterns.txt
then
grep -f patterns.txt Test.log
$ grep -Fwf <(sed 's/.*/PO_NUMBER=&/' po_numbers) Test.log
This creates the lookup patterns from po_numbers (via process substitution) and checks for literal word matches in the log file. It assumes the searched PO_NUMBER=xxx appears as a separate word; if not, remove -w. It also assumes the patterns are literal strings rather than regexes; if not, remove -F. Removing either will slow down the search.
Using grep:
sed -e 's|^|PO_NUMBER=|' po_numbers | grep -o -F -f - Test.log | sed -e 's|^PO_NUMBER=||' | sort | uniq -c > list3
grep -o -F -f po_numbers list3 | grep -v -o -F -f - po_numbers > po_to_server
Using awk:
This awk program might work faster
awk '(NR==FNR){ po[$0]=0; next }
     {
       for (key in po) {
         str=$0
         po[key]+=gsub("PO_NUMBER="key, "", str)
       }
     }
     END {
       for (key in po) {
         if (po[key]==0) { print key >> "po_to_server" }
         else            { print key"-->"po[key] >> "list3" }
       }
     }' po_numbers Test.log
This does the following:
The first block loads the PO keys from the file po_numbers.
The second block parses each line of Test.log for occurrences of PO_NUMBER=key. (gsub is a function which performs a substitution and returns the substitution count.)
In the END block we print the requested output to the requested files.
The assumption here is that multiple patterns might occur, possibly several times, on a single line of Test.log.
Comment: the original order of po_numbers will not be preserved.
"finding out the occurrence"
Not sure if you mean to count the number of occurrences for each searched word or to output the lines in the log that contain at least one of the searched words. This is how you could solve it in the latter case:
(cat po_numbers; echo GO; cat Test.log) | \
perl -nle'$r?/$r/&&print:/GO/?($r=qr/@{[join"|",@s]}/):push@s,$_'

Linux usernames /etc/passwd listing

I want to print the longest and shortest username found in /etc/passwd. If I run the code below it works fine for the shortest (head -1), but it doesn't run for the longest (sort -n | tail -1 | awk '{print $2}'). Can anyone help me figure out what's wrong?
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
sort -n |tail -1 | awk '{print $2}'
Here is the issue:
The pipeline finishes with the first sort -n | head -1 | awk '{print $2}' command. So the first command receives its input through the pipe and its output is obtained.
The second command is given no input, so it waits for input from STDIN, which is the keyboard; you can feed it input from the keyboard and press Ctrl-D to obtain output.
Run the code as below to get the desired output:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | tail -1 | awk '{print $2}'
All you need is:
$ awk -F: '
NR==1 { min=max=$1 }
length($1) > length(max) { max=$1 }
length($1) < length(min) { min=$1 }
END { print min ORS max }
' /etc/passwd
No explicit loops or pipelines or multiple commands required.
The problem is that you have two pipelines when you really need only one. You have grep | while read do ... done | sort | head | awk and then a separate sort | tail | awk: the first sort has an input (i.e., the while loop) - the second sort doesn't. So the script hangs because your second sort doesn't have an input: or rather it does, but it's STDIN.
There are various ways to resolve this:
save the output of the while loop to a temporary file and use that as an input to both sort commands (see the sketch below)
repeat your while loop
use awk to do both the head and tail
The first two involve processing the data twice, which may be okay - it depends what you're ultimately trying to do. But a small awk script can give you both the first and last line of the sorted output in one pass (capture the first record, then print it together with the last record in the END block).
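For example, a minimal sketch of the temporary-file option (assuming the rest of the original script is unchanged; the temp file name is arbitrary):
#!/bin/bash
# build the "<length> <name>" list once, then reuse it for both ends
TMP=$(mktemp /tmp/namelen.XXXXXXXXXX) || exit 1
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
  echo ${#NAME} ${NAME}
done > "$TMP"
sort -n "$TMP" | head -1 | awk '{print $2}'   # shortest
sort -n "$TMP" | tail -1 | awk '{print $2}'   # longest
rm -f "$TMP"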
While you already have good answers, you can also use POSIX shell to accomplish your goal without any pipe at all, using the parameter expansion and string-length operators provided by the shell itself (see: POSIX shell specification). For example you could do the following:
#!/bin/sh
sl=32;ll=0;sn=;ln=; ## short len, long len, short name, long name
while read -r line; do ## read each line
u=${line%%:*} ## get user
len=${#u} ## get length
[ "$len" -lt "$sl" ] && { sl="$len"; sn="$u"; } ## if shorter, save len, name
[ "$len" -gt "$ll" ] && { ll="$len"; ln="$u"; } ## if longer, save len, name
done </etc/passwd
printf "shortest (%2d): %s\nlongest (%2d): %s\n" $sl "$sn" $ll "$ln"
Example Use/Output
$ sh cketcpw.sh
shortest ( 2): at
longest (17): systemd-bus-proxy
Using either pipe/head/tail/awk or the shell itself is fine. It's good to have alternatives.
(Note: if you have multiple users of the same length, this just picks the first; you can use a temp file if you want to save all the names, and use -le and -ge for the comparisons.)
If you want both the head and the tail from the same input, you may want something like sed -e 1b -e '$!d' after you sort the data to get the top and bottom lines using sed.
So your script would be:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | sed -e 1b -e '$!d'
Alternatively, a shorter way:
cut -d":" -f1 /etc/passwd | awk '{ print length, $0 }' | sort -n | cut -d" " -f2- | sed -e 1b -e '$!d'

Reformat with awk and sed from STDIN and execute

This is just an example of what I run into a lot:
I would like to copy all .bash_histories to one directory.
grep "/bin/bash" /etc/passwd | awk -F: '{ print "cp " $6"/.bash_history /backup" $6 ".bash_history" }
Output:
cp /home/peter/.bash_history /backup/home/peter/.bash_history
cp /home/john/.bash_history /backup/home/john/.bash_history
What I would like is an output like this:
cp /home/peter/.bash_history /backup/_home_peter_.bash_history
cp /home/john/.bash_history /backup/_home_john_.bash_history
And that this output will be executed.
(It's not specifically about this issue, but just in general about how to reformat with awk and sed and execute the newly created command line, without really creating a script for it.)
The awk script to obtain a similar output would be:
grep "/bin/bash" /etc/passwd |head -2 | awk -F: '{ print "cp " $6 "/.bash_history backup/_home_"$1".bash_history" }'
giving an output like
cp /root/.bash_history backup/_home_root.bash_history
cp /home/xxx/.bash_history backup/_home_xxx.bash_history
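If you want the underscored form of the full home directory shown in the question (/backup/_home_peter_.bash_history), a variant of the same idea (a sketch, not verified against your passwd file) is to copy $6 into a variable and replace the slashes with gsub:
grep "/bin/bash" /etc/passwd | awk -F: '{ d=$6; gsub("/","_",d); print "cp " $6 "/.bash_history /backup/" d "_.bash_history" }'
This prints lines like cp /home/peter/.bash_history /backup/_home_peter_.bash_history.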
Now, in order to execute the commands, the system() function within awk is helpful.
system(command) executes any command, the return value being the exit status of the command.
The above script can be modified as
grep "/bin/bash" /etc/passwd |head -2 | awk -F: '{ system("cp " $6 "/.bash_history backup/_home_"$1".bash_history;") }'
Test run:
$ grep "/bin/bash" /etc/passwd |head -2 | awk -F: '{ system("cp " $6 "/.bash_history backup/_home_"$1".bash_history;") }'
$ ls backup/
_home_xxx.bash_history _home_root.bash_history
PS: It is not recommended to create directories in your root folder, so I intentionally replaced /backup in your script with backup.
Also, in order for the script to be successful, the backup folder must be created beforehand.
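For example (using the same relative directory as above):
mkdir -p backup   # create the destination folder if it does not already exist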
getent passwd | grep \/bin\/bash | cut -d ":" -f 6 | while read a; do eval "cp $a/.bash_history /backup/$(echo $a | sed 's#/#_#g')_.bash_history"; done
This uses getent to fetch the passwd file, and cut gets the 6th field like your awk statement did; it then reads each entry line by line, builds the command string, and executes it with eval.
Worked perfectly! Issue solved!

storing awk results to bash ${} data format

I have a question about bash and awk. I am filtering the process list from ps aux: first I filter with grep, then I filter again with awk to display only the PID and the path of the process, and I want to save the results into the corresponding ${pid} and ${path} variables. The question is: once I have filtered the results with awk, I want to save them in ${pid} for the PID number and ${path} for the process's path, but I have no idea at all how to do that. If anyone here has a solution it would be very appreciated.
Thanks
EDIT: here is the code
ps aux | grep 'firefox' | awk '{print $2 " " $11}'
Then I don't know what to do to save the $2 content into ${pid} and $11 into ${path}, and save those fields to the txt file again...
This should do it:
read pid path <<< $(ps aux | grep 'firefox' | awk '{print $2 " " $11}')
If all else fails, remember that you have the disk that you can use. Redirect the result to a file and then process the file with awk or cut twice to assign the fields to the corresponding variables.
For example:
ps aux | grep 'firefox' | awk '{print $2 " " $11}' > tmp   # write the result to a temporary file
pid=$(cut -d " " -f 1 tmp)
path=$(cut -d " " -f 2 tmp)
rm tmp
Use set:
out=$(ps aux | grep 'nginx' | awk '{print $2 " " $11}' )
set $out
pid=$1
path=$2
echo $pid
echo $path
awk_output=($(ps aux | awk '/firefox/{print $2, $11}'))
pid="${awk_output[0]}"
path="${awk_output[1]}"
If your paths can contain spaces you'll need a different solution involving setting IFS.
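One possible workaround (a sketch, not necessarily the IFS-based variant alluded to above) is to keep the whole ps line and extract the pieces separately; note that this treats everything from field 11 onward as the path and collapses runs of whitespace inside it to single spaces:
# the bracket in '[f]irefox' keeps grep from matching its own process
line=$(ps aux | grep '[f]irefox' | head -n 1)
pid=$(awk '{ print $2 }' <<< "$line")
path=$(awk '{ out=$11; for (i=12; i<=NF; i++) out=out OFS $i; print out }' <<< "$line")
echo "$pid"
echo "$path"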
