Using awk command in bash to extract a single field from a record - linux

I want to use awk to extract a single field from a list of records. For example,
Assignment1:/home/dir/:Admin:08-07-12
Assignment2:/home/dir/:Paul:09-22-13
I want to extract the 1st field from the second line, or the third field from the first line. Any ideas?

awk -F: 'NR == 2 { print $1 }'
awk -F: 'NR == 1 { print $3 }'
For the given line number, print the specified field.
It doesn't work
My apologies, I am using the Bourne shell and this does not work.
As noted in the comment, I don't believe that the Bourne shell has anything directly to do with the problem. (If $IFS is set to something odd, or some other peculiar setup applies, maybe that has an effect. But in a normal Bourne shell, there should be no problem.)
Here's what I get on my machine (an Ubuntu 14.04 derivative, but I'm confident I'd get the same result on Mac OS X 10.10.3, and pretty much any other Unix-like system, too). This is using Bash, but I don't think that's a factor. I'll go out on a limb and say that Korn shell, Zsh, Dash, and Heirloom Shell would all work the same on this (heck, it should work in the C shell family of shells too, since there's nothing special about the notations used). It is using GNU awk too, but I don't think that's a factor either.
$ cat data
Assignment1:/home/dir/:Admin:08-07-12
Assignment2:/home/dir/:Paul:09-22-13
$ awk -F: 'NR == 2 { print $1 }' data
Assignment2
$ awk -F: 'NR == 1 { print $3 }' data
Admin
$
The output looks like record 2, field 1 and record 1, field 3 to me. Please find a way to demonstrate what you're doing and the result you get — probably, add something to the question. We can clean it up later when we've worked out what's going wrong with your setup. Please identify your platform reasonably clearly, too.
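One way to produce a reproducible demonstration is a short script that builds the sample file itself and then runs both commands. This is only a sketch: the temporary file and the variable names second_first and first_third are illustrative and not part of the original question.

```shell
# Build the two sample records in a temporary file (illustrative setup).
data=$(mktemp)
printf '%s\n' \
    'Assignment1:/home/dir/:Admin:08-07-12' \
    'Assignment2:/home/dir/:Paul:09-22-13' > "$data"

# -F: splits each record on colons; NR selects the record (line) number.
second_first=$(awk -F: 'NR == 2 { print $1 }' "$data")
first_third=$(awk -F: 'NR == 1 { print $3 }' "$data")

echo "$second_first"   # field 1 of record 2
echo "$first_third"    # field 3 of record 1

rm -f "$data"
```

Posting the exact script together with its output makes it much easier for others to see where a particular setup diverges.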

You can always use 'cut' from bash like so
cut -d ":" -f "1,3" < asdf
Assignment1:Admin
Assignment2:Paul
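Note that cut on its own processes every line. If only one field from one specific line is wanted, the line can be selected first. A minimal sketch, assuming the same two records (the temporary file and the variable name picked exist only for illustration):

```shell
# Recreate the sample records in a temporary file (illustrative setup).
records=$(mktemp)
printf '%s\n' \
    'Assignment1:/home/dir/:Admin:08-07-12' \
    'Assignment2:/home/dir/:Paul:09-22-13' > "$records"

# sed -n '2p' prints only line 2; cut then keeps colon-delimited field 1.
picked=$(sed -n '2p' "$records" | cut -d ':' -f 1)
echo "$picked"

rm -f "$records"
```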

Related

Awk pattern always matches last record?

I'm in the process of switching from zsh to bash, and I need to produce a bash script that can remove duplicate entries in $PATH without reordering the entries (thus no sort -d magic). zsh has some nice array handling shortcuts that made it easy to do this efficiently, but I'm not aware of such shortcuts in bash. I came across this answer which has gotten me 90% of the way there, but there is a small problem that I would like to understand better. It appears that when I run that awk command, the last record processed incorrectly matches the pattern.
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc"
aa:bb:cc:cc
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb"
aa:bb:cc:bb
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc:" # note trailing colon
aa:bb:cc:
I don't understand awk well enough to know why it behaves this way, but I have managed to work around the issue by using an intermediate array like so.
array=($(awk 'BEGIN{RS=":";ORS=" "}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc:"))
# Use a subshell to avoid modifying $IFS in current context
echo $(export IFS=":"; echo "${array[*]}")
aa:bb:cc
This seems like a sub-optimal solution however, so my question is: did I do something wrong in the awk command that is causing false positive matches on the final record processed?
The last record in your original string is cc\n which is different from cc. When unsure what's happening in any program in any language, adding some print statements is step 1 to debugging/investigating:
$ awk 'BEGIN{RS=ORS=":"} {print "<"$0">"}' <<<"aa:bb:cc:aa:bb:cc"
<aa>:<bb>:<cc>:<aa>:<bb>:<cc
>:$
If you want the RS to be : or \n then just state that (with GNU awk at least):
$ awk 'BEGIN{RS="[:\n]"; ORS=":"} !a[$0]++' <<<"aa:bb:cc:aa:bb:cc"
aa:bb:cc:$
The $ in all of the above is my prompt.
Another possible workaround instead of your bash array solution
$ echo "aa:bb:cc:aa:bb:cc" | tr ':' '\n' | awk '!a[$0]++' | paste -sd:
aa:bb:cc
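The same pipeline can be wrapped in a small function for use on a PATH-like string. This is a sketch rather than a definitive solution; the function name dedupe_path is hypothetical, and printf '%s' is used so that no trailing newline is attached to the last entry.

```shell
# dedupe_path STRING: drop repeated colon-separated entries, keeping order.
# (dedupe_path is a hypothetical helper name, not from the answers above.)
dedupe_path() {
    # One entry per line, keep first occurrences only, then rejoin with ":".
    printf '%s' "$1" | tr ':' '\n' | awk '!seen[$0]++' | paste -sd: -
}

deduped=$(dedupe_path "aa:bb:cc:aa:bb:cc")
echo "$deduped"
```

It could then be applied as PATH=$(dedupe_path "$PATH"), assuming no entry contains a newline.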

awk output to variable [duplicate]

[Dd])
echo"What is the record ID?"
read rID
numA= awk -f "%" '{print $1'}< practice.txt
I cannot figure out how to set numA = to the output of the awk in order to compare rID and numA. numA is equal to the first field of a txt file which is separated by %. Any suggestions?
You can capture the output of any command in a variable via command substitution:
numA=$(awk -F '%' '{print $1}' < practice.txt)
Unless your file contains only one line, however, the awk command you presented (as corrected above) is unlikely to be what you want to use. If the practice.txt file contains, say, answers to multiple questions, one per line, then you probably want to structure the script altogether differently.
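As a rough illustration of the single-value case, the sketch below limits awk to the first line and then compares the result with rID. The file contents and the value assigned to rID are made up for the example; in the real script rID would come from read.

```shell
# Made-up sample data: fields separated by "%".
practice=$(mktemp)
printf '%s\n' '1001%Alice%90' '1002%Bob%85' > "$practice"

rID=1001   # stands in for the value read from the user

# NR == 1 plus exit limits the output to the first line's first field.
numA=$(awk -F '%' 'NR == 1 { print $1; exit }' "$practice")

# Compare the two values as strings.
if [ "$rID" = "$numA" ]; then
    verdict="match"
else
    verdict="no match"
fi
echo "$verdict"

rm -f "$practice"
```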
You don't need to use awk, just use parameter expansion:
numA=${rID%%\%*}
this is the correct syntax.
numA=$(awk -F'%' '{print $1}' practice.txt)
however, it will be easier to do comparisons in awk by passing the bash variable in.
awk -F'%' -v r="$rID" '$1==r{... do something ...}' practice.txt
since you didn't specify any details it's difficult to suggest more...
To remove the line matching rID from the file, do this:
awk -F'%' -v r="$rID" '$1!=r' practice.txt > output
This prints the lines where the condition is met ($1 not equal to rID), which is equivalent to deleting the ones that are equal. You can mimic in-place replacement with:
awk ... practice.txt > temp && mv temp practice.txt
where you fill in ... from the line above.
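Put together, the filter and the replacement step might look like the sketch below. The sample records and the rID value are invented for the example, and mktemp is used for the intermediate file instead of a fixed name.

```shell
# Invented sample data: three records keyed by the first "%"-separated field.
practice=$(mktemp)
printf '%s\n' '1001%Alice' '1002%Bob' '1003%Carol' > "$practice"
rID=1002

tmp=$(mktemp)
# Keep every line whose first field differs from rID; replace the original
# file only if awk succeeded.
awk -F'%' -v r="$rID" '$1 != r' "$practice" > "$tmp" && mv "$tmp" "$practice"

remaining=$(cat "$practice")
echo "$remaining"

rm -f "$practice"
```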
Try using
$ numA=`awk -F'%' '{ if($1 != $0) { print $1; exit; }}' practice.txt`
From the question, "numA is equal to the first field of a txt file which is separated by %"
-F'%', meaning % is the only separator we care about
if($1 != $0), meaning ignore lines that don't have the separator
print $1; exit;, meaning exit after printing the first field that we encounter separated by %. Remove the exit if you don't want to stop after the first field.

renaming files using loop in unix

I have a situation here.
I have a lot of files like the ones below on Linux:
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaac
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaag
I want to remove the $line and put a counter from 0001 to 6000 in its place for my 6000 such files.
I also want to remove the trailing 3 characters after this is done for each file.
After the fix, the files should look like:
SIPTV_FIPTV_ID0000001_T20141003195717_C0000001000_FWD148_IPV_001.DAT
SIPTV_FIPTV_ID0000002_T20141003195717_C0000001000_FWD148_IPV_001.DAT
Please help.
With some assumptions, I think this should do it:
1. list of the files is in a file named input.txt, one file per line
2. the code is running in the directory the files are in
3. bash is available
awk '{i++;printf "mv \x27"$0"\x27 ";printf "\x27"substr($0,1,16);printf "%05d", i;print substr($0,22,47)"\x27"}' input.txt | bash
From the command prompt, give the following command
% echo *.DAT??? | tr ' ' '\n' | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv", "\047" old "\047", $1}'
%
and check the output, if it looks OK
% echo *.DAT??? | tr ' ' '\n' | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv", "\047" old "\047", $1}' | sh
%
A commentary: echo *.DAT??? is meant to give awk a list of all the filenames that you want to modify; you may want something more elaborate if the example names you gave aren't representative of the whole spectrum. Regarding the awk script itself, I used sprintf to generate a string with the correct number of zeroes for the replacement of $line. The idiom "\\$..." with two backslashes to quote the dollar sign is required by gawk and does no harm in mawk. As a last remark, in similar cases I prefer to make at least a dry run before passing the commands to the shell.
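For comparison, the same rename can be sketched as a plain shell loop that only prints the mv commands, which keeps the dry-run habit. Several things here are assumptions: the function name plan_renames is hypothetical, the counter is padded to five digits to match the sample output in the question (ID0000001), and file names are assumed to contain no single quotes or newlines.

```shell
# plan_renames DIR: print one mv command per matching file (dry run only).
# (plan_renames is a hypothetical helper name.)
plan_renames() {
    marker='$line'                       # literal text to replace, not a variable
    n=0
    for old in "$1"/*.DAT???; do
        [ -e "$old" ] || continue        # the glob matched nothing
        case $old in
            *"$marker"*) ;;              # only handle names containing $line
            *) continue ;;
        esac
        n=$((n + 1))
        prefix=${old%%"$marker"*}        # everything before $line
        rest=${old#*"$marker"}           # everything after $line
        rest=${rest%???}                 # drop the trailing 3 characters
        # Single quotes keep the shell from expanding $line in the old name.
        printf "mv '%s' '%s%05d%s'\n" "$old" "$prefix" "$n" "$rest"
    done
}

# Dry run against the current directory; inspect the output first.
plan_renames .
```

Once the printed commands look right, the output can be piped to sh, as in plan_renames . | sh.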

How to do something like grep -B to select only one line?

Everything is in the title. Basically, let's say I have this pattern
some text lalala
another line
much funny wow grep
I grep funny and I want my output to be "lalala"
Thank you
One possible answer is to use either ed or ex to do this (it is trivial in them):
ed - yourfile <<< 'g/funny/.-2p'
(Or replace ed with ex. You might have red, the restricted editor, too; it can't modify files.) This looks for the pattern /funny/ globally, and whenever it is found, prints the line 2 before the matching line (that's the .-2p part). Or, if you want the most recent line containing 'lalala' before the line matching 'funny':
ed - yourfile <<< 'g/funny/?lalala?p'
The only problem is if you're trying to process standard input rather than a file; then you have to save the standard input to a file and process that file, which spoils the concurrency.
You can't do negative offsets in sed (though GNU sed allows you to do positive offsets, so you could use sed -n '/lalala/,+2p' file to get the 'lalala' to 'funny' lines (which isn't quite what you want) based on finding 'lalala', but you cannot find the 'lalala' lines based on finding 'funny'). Standard sed does not allow offsets at all.
If you need to print just the IP address found on a line 8 lines before the pattern-matching line, you need a slightly more involved ed script, but it is still doable:
ed - yourfile <<< 'g/funny/.-8s/.* //p'
This uses the same basic mechanism to find the right line, then runs a substitute command to remove everything up to the last space on the line and print the modified version. Since there isn't a w command, it doesn't actually modify the file.
Since grep -B always prints whole lines before the match, you'll have to pipe the output into something like grep or Awk to pick out the part you want.
grep -B 2 "funny" file|awk 'NR==1{print $NF; exit}'
You could also just use Awk.
awk -v s="funny" '/[[:space:]]lalala$/{n=NR+2; o=$NF}NR==n && $0~s{print o}' file
For the specific example of an IP address 8 lines before the match as mentioned in your comment:
awk -v s="funny" '
/[[:space:]][0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/ {
n=NR+8
ip=$NF
}
NR==n && $0~s {
print ip
}' file
These Awk solutions first find the output field you might want, then print the output only if the word you want exists in the nth following line.
Here's an attempt at a slightly generalized Awk solution. It maintains a circular queue of the last q lines and prints the line at the head of the queue when it sees a match.
#!/bin/sh
: ${q=8}
e=$1
shift
awk -v q="$q" -v e="$e" '{ m[(NR%q)+1] = $0 }
$0 ~ e { print m[((NR+1)%q)+1] }' "${@--}"
Adapting to a different default (I set it to 8) or proper option handling (currently, you'd run it like q=3 ./qgrep regex file) as well as remembering (and hence printing) the entire line should be easy enough.
(I also didn't bother to make it work correctly if you see a match in the first q-1 lines. It will just print an empty line then.)
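A variant of the ring-buffer idea that skips matches occurring too early in the input could look like the sketch below. It is not the script above: the function name lines_before and its argument order are hypothetical, and it prints the line exactly N lines before each match.

```shell
# lines_before N REGEX [FILE...]: print the line N lines before each match.
# (lines_before is a hypothetical helper name; reads stdin if no FILE given.)
lines_before() {
    back=$1
    regex=$2
    shift 2
    awk -v b="$back" -v re="$regex" '
        # Ring buffer holding the most recent b+1 lines.
        { buf[NR % (b + 1)] = $0 }
        # Only print when a line b positions back actually exists.
        $0 ~ re && NR > b { print buf[(NR - b) % (b + 1)] }
    ' "$@"
}

found=$(printf '%s\n' 'some text lalala' 'another line' 'much funny wow grep' |
    lines_before 2 funny)
echo "$found"
```

To keep only the last word of that line, the result can be passed through awk '{ print $NF }'.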

Linux scripting: Search a specific column for a keyword

I have a large text file that contains multiple columns of data. I'm trying to write a script that accepts a column number and keyword from the command line and searches for any hits before displaying the entire row of any matches.
I've been trying something along the lines of:
grep $fileName | awk '{if ($'$columnNumber' == '$searchTerm') print $0;}'
But this doesn't work at all. Am I on the right lines? Thanks for any help!
The -v option can be used to pass shell variables to the awk command.
The following may be what you're looking for:
awk -v s=$SEARCH -v c=$COLUMN '$c == s { print $0 }' file.txt
EDIT:
I am always trying to write more elegant and tighter code. So here's what Dennis means:
awk -v s="$search" -v c="$column" '$c == s { print $0 }' file.txt
Looks reasonable enough. Try using set -x to look at exactly what's being passed to awk. You can also use different and/or more awk things, including getting rid of the separate grep:
awk -v colnum="$columnNumber" -v require="$searchTerm" \
"/$fileName/ { if (\$colnum == require) print }"
which works by setting awk variables (colnum and require, in this case) and then using the literal string $colnum to get the desired field, and the variable require to get the required-string.
Note that in all cases (with or without the grep command), any regular expression meta-characters in $fileName will be meta-y, e.g., this.that will match the file named this.that but also the file named thisXthat.
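Wrapped up as a function, the -v approach might look like the following sketch. The function name search_column and the sample table are made up for illustration, and the match is an exact comparison on whitespace-separated columns. One caveat to keep in mind: awk processes backslash escapes in values passed with -v, so keywords containing backslashes may need extra care.

```shell
# search_column COLUMN KEYWORD FILE: print rows whose COLUMN equals KEYWORD.
# (search_column is a hypothetical helper name.)
search_column() {
    # $c refers to the field whose number is held in the awk variable c.
    awk -v c="$1" -v s="$2" '$c == s' "$3"
}

# Made-up sample table with three whitespace-separated columns.
table=$(mktemp)
printf '%s\n' 'alice 30 london' 'bob 25 paris' 'carol 30 paris' > "$table"

hits=$(search_column 3 paris "$table")
echo "$hits"

rm -f "$table"
```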

Resources