Piping grep to cut - linux

This line:
echo $(grep Uid /proc/1/status) | cut -d ' ' -f 2
Produces output:
0
This line:
grep Uid /proc/1/status | cut -d ' ' -f 2
Produces output:
Uid: 0 0 0 0
My goal was the first output. My question is: why does the second command not produce the output I expected? Why am I required to echo it?

One way to do this is to change the IFS (Internal Field Separator) variable in the bash shell:
IFSOLD="$IFS" # IFS for Internal field separator
IFS=$'\t'
grep 'Uid' /proc/1/status | cut -f 2
0 # Your result
IFS="$IFSOLD"
or the easy way
grep 'Uid' /proc/1/status | cut -d $'\t' -f 2
Note: by the way, tab is the default delimiter for cut.

Use awk
awk '/Uid/ { print $2; }' /proc/1/status

You should almost never need to write something like echo $(...) - it's almost equivalent to calling ... directly. Try echo "$(...)" (which you should always use) instead, and you'll see it behaves like ....
The reason is that when the $() command substitution is used without quotes, the resulting string is split by Bash into separate arguments before being passed to echo, and echo prints its arguments separated by single spaces, regardless of the whitespace produced by the command substitution (in your case, tabs).
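To make the word-splitting visible, compare the two forms directly (a small illustration, assuming a readable /proc/1/status, whose fields are tab-separated):
echo $(grep Uid /proc/1/status)      # word-split and rejoined with single spaces: Uid: 0 0 0 0
echo "$(grep Uid /proc/1/status)"    # quotes preserve the original tabs, so it behaves like grep itself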
As sjsam suggested, if you want to cut tab-delimited output, just specify tabs as the delimiter instead of spaces:
cut -d $'\t' -f 2

grep Uid /proc/1/status | sed -r "s/\s+/ /g" | awk '{print $3}'
Output
0

Related

Bash function with input fails awk command

I am writing a function in a BASH shell script that should return, from csv files with headers, the lines that have more commas than the header. This can happen, as there are values inside these files that could contain commas. For quality control, I must identify these lines to later clean them up. What I have currently:
#!/bin/bash
get_bad_lines () {
    local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
    local no_of_files=$(ls $1 | wc -l)
    for i in $(seq 0 $(( ${no_of_files}-1 )))
    do
        # Check that the file exist
        if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
            echo "File: $1_0_${i}_0.csv not found!"
            continue
        fi
        # Search for error-lines inside the file and print them out
        echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
        grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
    done
}
get_bad_lines products
get_bad_lines users
The output of this program is currently all the comma counts with all of the line numbers in all the files,
and I suspect this is because the input $1 (the folder name, i.e. products & users) conflicts with the $1 referenced in the awk call (where I want to grab the first column, the count of commas for that line in the current file in the loop).
Is this the issue? And if so, could it be solved by referencing the first column or the folder name through different variable names instead of both of them using $1?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
Tried variable declaration in the call to awk as well, with the same effect (as in the accepted answer to Awk field variable clash with function argument):
get_bad_lines () {
    local table_name=$1
    local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
    local no_of_files=$(ls $table_name | wc -l)
    for i in $(seq 0 $(( ${no_of_files}-1 )))
    do
        # Check that the file exist
        if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
            echo "File: ${table_name}_0_${i}_0.csv not found!"
            continue
        fi
        # Search for error-lines inside the file and print them out
        echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
        grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
    done
}
You can use awk all the way to achieve that:
get_bad_lines () {
    find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
        awk -v table_name="$1" '
            NR==1 { num_comma=gsub(/,/, ""); }
            /,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0;}
            END {
                if (wrong > 0) {
                    print(FILENAME" has over "num_comma" commas in the following lines:");
                    for (i=0;i<wrong;i++) { print(wrong_array[i]); }
                }
            }' "${my_file}"
    done
}
As for why your original awk command failed to give only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk statement ('$1 > $correct_no_of_commas {print}'). The shell therefore performs no substitution, and awk reads $correct_no_of_commas as is, treating correct_no_of_commas as an undefined awk variable, i.e. an empty string. The condition awk actually evaluates is $1 > $"", and since $"" is equivalent to $0, awk ends up comparing the count in $1 with the whole line produced by uniq -c, a comparison that turns out to be true for every line here. That is why every line is printed.
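If you want to keep the original grep | cut | uniq -c pipeline, the usual fix is to hand the shell variable to awk explicitly with -v instead of relying on shell expansion. A minimal sketch, reusing the names from the question (limit is just an illustrative awk-side name):
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c \
    | awk -v limit="$correct_no_of_commas" '$1 > limit {print}'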
You can identify all the bad lines with a single awk command (note that ENDFILE requires GNU awk):
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number also to be printed, use this
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv

return all lines that match String1 in a file after the last matching String2 in the same file

I figured out how to get the line number of the last matching word in the file:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
It gave me the value 1787. So I passed it manually to the sed command to search for the lines that contain the sentence "blades are down" after that line number, and it returned all the lines successfully:
sed -n '1787,$s/blades are down/&/p' myfile.txt
Is there a way that I can pass the line number from the first command to the second one through a variable or a file, so I can put them in a script to be executed automatically?
Thank you.
You can do this by just connecting your two commands with xargs. 'xargs -I %' allows you to take the stdin from a previous command and place it wherever you want in the next command. The '%' is where your '1787' will be written:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/blades are down/&/p' myfile.txt
You can use command substitution to capture the result of the first command in a variable, and simple string concatenation to use that variable in your sed command:
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt
You don't strictly need the intermediate variable - you could simply use:
sed -n $(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)',$s/blades are down/&/p' myfile.txt
but it may make sense to do error checking on the result of the command substitution first.
Note that I've streamlined the first command by using grep's -n option, which puts the line number separated with : before each match.
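For instance, a minimal error check on the command substitution before running sed could look like this (a sketch, reusing startLine from above):
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
if [ -z "$startLine" ]; then
    echo "No line matching ' b ' found in textfile.txt" >&2
    exit 1
fi
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt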
First we can get the "half" of the file that comes after the last match of string2, then use grep to match all the lines containing string1:
tac your_file | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' | \
grep "string1"
Note that this reverses the order of the lines; if you don't care about the order, that's fine, but if you do, just add another tac at the end with a pipe |, as shown below.
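A sketch of the full pipeline with the extra tac (string1/string2 are placeholders for your real patterns):
tac your_file \
  | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' \
  | grep "string1" \
  | tac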
This might work for you (GNU sed):
sed -n '/\n/ba;/ b /h;//!H;$!d;x;//!d;s/$/\n/;:a;/\`.*blades are down.*$/MP;D' file
This reads through the file storing all lines following the last match of the first string (" b ") in the hold space.
At the end of file, it swaps to the hold space, checks that it does indeed have at least one match, then prints out those lines that match the second string ("blades are down").
N.B. it makes the end case (/\n/) possible by adding a new line to the end of the hold space, which will eventually be thrown away. This also caters for the last line edge condition.

How to pass several shell variables to awk in shell script?

I have written a shell script to get any column.
The script is:
#!/bin/sh
awk '{print $c}' c=${1:-1}
So that I can call it as:
ls -l | column 2
But how do I implement it for multiple columns?
Say, if I want something like:
ls -l | column 2 3
In this case, I wouldn't use awk at all:
columns() { tr -s '[:blank:]' ' ' | cut -d ' ' -f "$@"; }
This uses tr to squeeze all sequences of whitespace to a single space, then cut to extract the fields you're interested in.
ls -l | columns 1,5,9-
Note, you shouldn't parse the output of ls.
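If you also want the exact calling style from the question (separate arguments, as in column 2 3), one possible variation, assuming bash, joins the arguments with commas before handing them to cut:
# Joins the arguments with "," so that "columns 2 3" runs cut -d ' ' -f 2,3
columns() { local IFS=,; tr -s '[:blank:]' ' ' | cut -d ' ' -f "$*"; }
ls -l | columns 2 3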
awk -v c="3,5" '
BEGIN{ split(c,a,/[^[:digit:]]+/) }
{ for(i=1;i in a;i++) printf "%s%s", (i==1?"":OFS), $(a[i]); print "" }
'
Use any non-digit(s) you like as the column number separator (e.g. comma or space).
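To wire this into the column script from the question so that ls -l | column 2 3 works, one possible sketch simply passes all of the script's arguments through as the column list:
#!/bin/sh
# column: print the requested fields of stdin, e.g.  ls -l | column 2 3
awk -v c="$*" '
BEGIN{ split(c,a,/[^[:digit:]]+/) }
{ for(i=1;i in a;i++) printf "%s%s", (i==1?"":OFS), $(a[i]); print "" }
'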
ls -l | nawk -v r="3,5" 'BEGIN{split(r,a,",")}{for(i in a)printf $a[i]" ";print "\n"}'
Now you can simply change the r variable in your shell script and pass it on, or configure it in your shell script with r=$your_list_var.
Whatever field numbers are present in $your_list_var will be printed by the awk command.
The example above prints the 3rd and 5th fields of the ls -l output.

Shell script tokenizer

I'm writing a script that queries my JBoss server for some database related data. The thing that is returned after the query looks like this:
ConnectionCount=7
ConnectionCreatedCount=98
MaxConnectionsInUseCount=10
ConnectionDestroyedCount=91
AvailableConnectionCount=10
InUseConnectionCount=0
MaxSize=10
I would like to tokenize this data so the numbers on the right hand side are stored in a variable in the format 7,98,10,91,10,0,10. I tried to use IFS with the equals sign, but that still keeps the parameter names (only the equals signs are eliminated).
I put your input data into file d.txt. The one-liner below extracts the numbers, comma-delimits them and assigns all that to variable TAB (tested with Korn shell):
$ TAB=$(awk -F= '{print $2}' d.txt | xargs echo | sed 's/ /,/g')
$ echo $TAB
7,98,10,91,10,0,10
Or just use cut/tr:
F=($(cut -d'=' -f2 input | tr '\n' ' '))
You can do it with one sed command too:
sed -n 's/^.*=\(.*\)/\1,/;H;${g;s/\n//g;s/,$//;p;}' file
7,98,10,91,10,0,10
A simple cut without any pipes:
arr=( $(cut -d'=' -f2 file) )
Output
printf '%s\n' "${arr[@]}"
7
98
10
91
10
0
10
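If you then want the single comma-separated string asked for in the question (7,98,10,91,10,0,10), one way, assuming bash, is to join the array elements via IFS in a subshell so your shell's IFS is left untouched:
( IFS=,; echo "${arr[*]}" )
7,98,10,91,10,0,10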

Bash capturing output of awk into array

I am stuck on a little problem. I have a command which pipes its output to awk, and I want to capture that output in an array, one element at a time.
My example:
myarr=$(ps -u kdride | awk '{ print $1 }')
But that captures all my output into one giant string separated by spaces:
output: PID 3856 5339 6483 10448 15313 15314 15315 15316 22348 29589 29593 32657 1
I also tried the following:
IFS=","
myarr=$(ps -u kdride | awk '{ print $1"," }')
But the output is: PID, 3856, 5339, 6483, 10448, 15293, 15294, 15295, 15296, 22348, 29589, 29593, 32657, 1
I want to be able to capture each individual PID into its own array element. Setting IFS='\n' does not do anything and retains my original output. What change do I need to make to get this to work?
Add additional parentheses, like this:
myarr=($(ps -u kdride | awk '{ print $1 }'))
# Now access elements of an array (change "1" to whatever you want)
echo ${myarr[1]}
# Or loop through every element in the array
for i in "${myarr[@]}"
do
:
echo $i
done
See also bash — Arrays.
Use Bash's builtin mapfile (or its synonym readarray)
mapfile -t -s 1 myarr < <(ps -u myusername | awk '{print $1}')
At least on GNU/Linux you can format the output of ps directly, so there is no need for awk or for -s 1:
mapfile -t myarr < <(ps -u myusername -o pid=)
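Either way, a quick check that each PID really landed in its own element might look like this:
printf '%s\n' "${myarr[@]}"      # one PID per line
echo "captured ${#myarr[@]} PIDs"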
