How does the shell generate input for awk?

Say I have a file1 containing:
1,2,3,4
I can use awk to process that file like this:
awk -v FS="," '{print $1}' file1
Also, I can invoke awk with a here string, meaning it reads from stdin:
awk -v FS="," '{print $1}' <<<"9,10,11,12"
Command 1 yields the result 1 and command 2 yields 9 as expected.
Now say I have a second file2:
4,5
If I parse both files with awk sequentially:
awk -v FS="," '{print $1}' file1 file2
I get:
1
4
as expected.
But if I mix reading from stdin and reading from files, the content I'm reading from stdin gets ignored and only the content of the files gets processed sequentially:
awk -v FS="," '{print $1}' file1 file2 <<<"9,10,11,12"
awk -v FS="," '{print $1}' file1 <<<"9,10,11,12" file2
awk -v FS="," '{print $1}' <<<"9,10,11,12" file1 file2
All three commands yield:
1
4
which means the content from stdin simply gets thrown away. Now what is the shell doing?
Interestingly if I change command 3 to:
awk -v FS="," '{print $1}' <<<"9,10,11,12",file1,file2
I simply get 9, which makes sense, as file1 and file2 are just two more fields from stdin. But why then is
awk -v FS="," '{print $1}' <<<"9,10,11,12" file1 file2
not expanded to
awk -v FS="," '{print $1}' <<<"9,10,11,12 file1 file2"
which would also yield the result 9?
And why does the content from stdin get ignored? The same question arises for commands 1 and 2. What is the shell doing here?
I tried out the commands on: GNU bash, version 4.2.53(1)-release

Standard input and input from files don't mix together well. This behavior is not exclusive to awk; you will find it in a lot of command-line applications. It is logical if you think of it like this:
Files need to be processed one by one, and the consuming application has no control over when the input behind stdin starts and stops. Look at echo a,b,c | awk -F, '{print $1}' file1 file2. In what order would the incoming "files" need to be read? If you think about when FNR would need to be reset, or what FILENAME should be, it becomes clear that it is hard to get this right. So awk, like most utilities, reads stdin only when it has no file operands, and your here-string is silently ignored.
As for the expansion question: <<< is a redirection operator, and the shell consumes exactly one word after it; it never merges the redirection with neighboring arguments. That is why file1 and file2 remain file operands wherever the here-string appears, while "9,10,11,12",file1,file2 is a single shell word, so all of it becomes here-string data.
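Note that you can mix the two explicitly: awk (like many POSIX utilities) treats a lone - operand as "read standard input here", which removes the ambiguity because you choose the position yourself:
$ awk -F, '{print $1}' file1 - file2 <<<"9,10,11,12"
1
9
4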
One trick you can play is to let awk (or any other program) read from a file descriptor generated by the shell. awk -F, '{print $1}' file1 <(echo 4,5,6) file2 will do what you expected in the first place.
What happens here is that a proper file descriptor is created by the <(...) syntax (say: /proc/self/fd/11), and the reading program can treat it just like a file. It is the second argument, so it is the second file, and FNR and FILENAME behave exactly as you would expect.
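For illustration, with the files from the question (the exact descriptor path will vary between systems):
$ awk -F, '{print FILENAME, $1}' file1 <(echo 4,5,6) file2
file1 1
/dev/fd/63 4
file2 4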

Related

bash: awk print within print

I need to grep some pattern, and then I need to print some output within that. Currently I am using the below command, which works fine, but I'd like to eliminate the multiple pipes and use a single awk command to achieve the same output. Is there a way to do it using awk?
root@Server1 # cat file
Jenny:Mon,Tue,Wed:Morning
David:Thu,Fri,Sat:Evening
root@Server1 # awk '/Jenny/ {print $0}' file | awk -F ":" '{ print $2 }' | awk -F "," '{ print $1 }'
Mon
I want to get this output using single awk command. Any help?
You can try something like:
awk -F: '/Jenny/ {split($2,a,","); print a[1]}' file
Try this
awk -F'[:,]+' '/Jenny/{print $2}' file.txt
It uses multiple separator characters inside the [ ].
The + means "one or more", since the value is treated as a regex.
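To see how the line is split under that separator, print every field (just a quick check):
$ awk -F'[:,]+' '/Jenny/{for (i = 1; i <= NF; i++) print i, $i}' file.txt
1 Jenny
2 Mon
3 Tue
4 Wed
5 Morning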
For this particular job, I find grep to be slightly more robust.
Unless your company has a policy not to hire people named Eve.
(Try it out if you don't understand.)
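For instance, with the split-based answer above, searching for a (hypothetical) employee named Eve matches "Evening" in David's line and reports his day instead:
$ awk -F: '/Eve/ {split($2,a,","); print a[1]}' file
Thu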
grep -oP '^[^:]*Jenny[^:]*:\K[^,:]+' file
Or to do a whole-word match:
grep -oP '^[^:]*\bJenny\b[^:]*:\K[^,:]+' file
Or when you are confident that "Jenny" is the full name:
grep -oP '^Jenny:\K[^,:]+' file
Output:
Mon
Explanation:
The stuff up until \K speaks for itself: it selects the line(s) with the desired name.
[^,:]+ captures the day of week (in this case Mon).
\K cuts off everything preceding Mon.
-o cuts off anything following Mon.

Extraction version from a file

I have a file file1 which looks like this:
version=7.2.3.cdead_rcd345
I am using the following command, but it is not working:
cat file1 | awk -F'=' '{print $2}
It is not giving the version number.
Awk solution to extract version number in format <number>.<number>[.number]:
awk -F'[=_]' '{ sub(/\.[^.]*$/, "", $2); print $2 }' file1
7.2.3
If you need only 7.2.3 as the answer, try sed:
sed -r 's/.*=(.*)\..*/\1/g' file1
Output:
7.2.3
You can use one of the following commands:
awk without the cat, since it is pointless here
command:
awk -F'=' '{print $2}' file1
output:
7.2.3.cdead_rcd345
or even better use grep directly:
command:
grep -Po '(?<=version=).*' file1
output:
7.2.3.cdead_rcd345
Last but not least, if you need only the version number (7.2.3), then you can use the following command:
$ grep -Po '(?<=version=)\d\.\d\.\d' file1
7.2.3
Maybe you are missing a ' at the end of your command:
awk -F"=" '{print $2}' file1
Output:
7.2.3.cdead_rcd345
You can grep "version" in the file, split the line by "=", and get the second word. This method is very easy to understand.
grep "version" file1 | cut -d "=" -f 2

Add/Sub/Mul/Div a constant to a column in a csv file in linux shell scripting

I am trying to modify the contents of a particular column in a csv file by dividing it by a constant.
For example, if the contents are
1000,abc,0,1
2000,cde,2,3 and so on..
I would like to change it to
1,abc,0,1
2,cde,2,3
I went through all the previous solutions in this blog, and I tried this:
awk -F\; '{$1=($1/1000)}1' file.csv > tmp.csv && mv tmp.csv file.csv
The above command opens file.csv, performs $1/1000, saves the result to a temporary file, and then overwrites the original file.
The problem I see is that in the final file.csv, the contents displayed are as follows:
1
2
3
4
and so on. It doesn't keep any of the other columns; only column 1 remains.
How can I fix this?
Because your file is comma-separated, you need to specify a comma as the field separator on both input and output:
$ awk -F, '{$1=($1/1000)}1' OFS=, file.csv
1,abc,0,1
2,cde,2,3
-F, tells awk to use a comma as the field separator on input.
OFS=, tells awk to use a comma as the field separator on output.
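To see why OFS matters: assigning to $1 makes awk rebuild the record using its output field separator, which defaults to a space, so without OFS=, the commas are lost:
$ awk -F, '{$1=($1/1000)}1' file.csv
1 abc 0 1
2 cde 2 3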
Changing the file in-place
With a modern GNU awk:
awk -i inplace -F, '{$1=($1/1000)}1' OFS=, file.csv
With BSD/OSX or other non-GNU awk:
awk -F, '{$1=($1/1000)}1' OFS=, file.csv >tmp && mv tmp file.csv
Alternate style
Some stylists prefer OFS to be set before the code:
awk -F, -v OFS=, '{$1=($1/1000)}1' file.csv

cut or awk command to print first field of first row

I am trying to print the first field of the first row of some output. Here is the case: I just need to print only SUSE from this output.
# cat /etc/*release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
Tried with cat /etc/*release | awk {'print $1}' but that prints the first string of every row:
SUSE
VERSION
PATCHLEVEL
Specify NR if you want to capture output from selected rows:
awk 'NR==1{print $1}' /etc/*release
An alternative (ugly) way of achieving the same would be:
awk '{print $1; exit}'
An efficient way of getting the first string from a specific line, say line 42, in the output would be:
awk 'NR==42{print $1; exit}'
Specify the line number using the NR built-in variable.
awk 'NR==1{print $1}' /etc/*release
Try this:
head -1 /etc/*release | awk '{print $1}'
df -h | head -4 | tail -1 | awk '{ print $2 }'
Change the numbers to tweak it to your liking.
Or use a while loop, but that's probably a bad way to do it.
You could use head instead of cat:
head -n1 /etc/*release | awk '{print $1}'
sed -n 1p /etc/*release | cut -d " " -f1
If the output is tab-delimited:
sed -n 1p /etc/*release | cut -f1
Try
sed 'NUMq;d' /etc/*release | awk {'print $1}'
where NUM is the line number, e.g.
sed '1q;d' /etc/*release | awk {'print $1}'
awk, sed, pipe, that's heavy
set `cat /etc/*release`; echo $1
The most code-golfy way I could think of to print only the first line in awk:
awk '_{exit}--_'
# or skip the quotations and make it just
#   awk _{exit}--_
# if you're feeling adventurous
On the first pass through the exit block, "_" is undefined, so the pattern fails and the block is skipped for row 1.
Then the decrementing of that same counter makes it "true" in awk's eyes (anything that is not an empty string or numeric zero is considered "true" in awk's agile boolean sense), and that same counter also triggers the default action, print, for row 1. Incrementing or decrementing is the same thing here; merely the direction and sign are inverted.
Finally, at the start of row 2, the counter meets the criteria to enter the action block, which instructs awk to exit instantly, thus performing essentially the same functionality as
awk '{ print; exit }'
in a slightly less verbose manner. For a single-line print, it's not even worth setting FS to skip the field-splitting part.
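A quick sanity check of the trick (input invented for illustration):
$ printf 'a b\nc d\n' | awk '_{exit}--_'
a b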
Using that concept to print just the 1st field of the 1st row:
awk '_{exit} NF=++_'
awk '_++{exit} NF=_'
awk 'NR==1&&NF=1' file
grep -om1 '^[^ ]\+' file
# multiple files
awk 'FNR==1&&NF=1' file1 file2

How to run grep inside awk?

Suppose I have a file input.txt with a few columns and a few rows, where the first column is the key, and a directory dir with files which contain some of these keys. I want to find all lines in the files in dir which contain these keywords. At first I tried to run the command
cat input.txt | awk '{print $1}' | xargs grep dir
This doesn't work because it thinks the keys are paths on my file system. Next I tried something like
cat input.txt | awk '{system("grep -rn dir $1")}'
But this didn't work either; eventually I had to admit that even this doesn't work:
cat input.txt | awk '{system("echo $1")}'
After trying to use \ to escape the whitespace and the $ sign, I came here to ask for your advice. Any ideas?
Of course I can do something like
for x in `cat input.txt` ; do grep -rn $x dir ; done
This is not good enough because it takes two commands, while I want only one. It also shows why plain xargs doesn't work: the parameter is not the last argument.
You don't need grep with awk, and you don't need cat to open files:
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
Nor do you need xargs, or shell loops or anything else - just one simple awk command does it all.
If input.txt is not a file, then tweak the above to:
real_input_generating_command |
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' - dir/*
All it's doing is creating an array of keys from the first file (or input stream) and then looking for each key from that array in every file in the dir directory.
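A small illustration with made-up data (the file names here are hypothetical):
$ cat input.txt
apple 1
cherry 2
$ cat dir/fruits.txt
I like apple pie
banana split
$ awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
dir/fruits.txt I like apple pie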
Try the following:
awk '{print $1}' input.txt | xargs -n 1 -I pattern grep -rn pattern dir
The first thing you should do is research this.
Next... you don't need to grep inside awk. That's completely redundant. It's like... stuffing your turkey with... a turkey.
Awk can process input and do grep-like things itself, without needing to launch the grep command. But you don't even need to do that. Adapting your first example:
awk '{print $1}' input.txt | xargs -n 1 -I % grep % dir
This uses xargs' -I option to put xargs' input into a different place on the command line it runs. In FreeBSD or OSX, you would use a -J option instead.
But I prefer your for loop idea, converted into a while loop:
while read key junk; do grep -rn "$key" dir ; done < input.txt
Use process substitution to create a keyword "file" that you can pass to grep via the -f option:
grep -f <(awk '{print $1}' input.txt) dir/*
This will search each file in dir for lines containing keywords printed by the awk command. It's equivalent to
awk '{print $1}' input.txt > tmp.txt
grep -f tmp.txt dir/*
grep requires parameters in order: [what to search] [where to search]. You need to merge keys received from awk and pass them to grep using the \| regexp operator.
For example:
arturcz@szczaw:/tmp/s$ cat words.txt
foo
bar
fubar
foobaz
arturcz@szczaw:/tmp/s$ grep 'foo\|baz' words.txt
foo
foobaz
Finally, you will finish with:
grep `commands|to|prepare|a|keywords|list` directory
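Concretely, that could look like the following sketch, where paste joins the keys with | and -E makes grep treat the result as an alternation (keys containing regex metacharacters would need escaping):
grep -rE "$(awk '{print $1}' input.txt | paste -sd'|' -)" dir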
In case you still want to use grep inside awk, make sure $1, $2, etc. are outside the quotes.
E.g. this works perfectly:
cat file_having_query | awk '{system("grep " $1 " file_to_be_greped")}'
# note the space after grep and before the file name
