Extract & store Strings with uneven spaces using AWK - linux

I have a file containing data like below. I want to cut the first and last columns and store them in variables. I am able to print them using the command awk -F" {2,}" '{print $1,$NF}' filename.txt, but I am unable to store them in variables using awk's -v option.
The main problem is that the first column contains spaces between words, and awk treats it as three columns when I use the -v option.
Please suggest how I can achieve this.
XML 2144 11270 2846 3385074
Java 7356 272651 242949 1350596
C++ 671 46497 42702 179366
C/C++ Header 671 16932 57837 44248
XSD 216 3131 807 27634
Korn Shell 129 3686 4279 12431
IDL 90 1098 0 8697
Perl 17 717 795 5698
Python 37 1102 786 4640
Ant 62 596 154 4015
XSLT 18 117 13 2153
make 14 414 1659 1833
Bourne Again Shell 32 532 469 1830
JavaScript 10 204 35 1160
CSS 5 95 45 735
SKILL 2 77 0 523
HTML 11 70 49 494
SQL 9 39 89 71
C Shell 3 13 25 31
D 1 5 15 10
SUM: 11498 359246 355554 5031239

The -v VAR=value parameter is evaluated before the awk code executes. It's not actually part of the code, so you can't reference fields because they don't exist yet. Instead, set the variable in code:
awk -F' {2,}' '{ Lang=$1; Last=$NF; print Lang "\t" Last }'
Also, setting those variables within awk won't affect bash's variables. Environments are hierarchical: each child environment inherits some state from the parent environment, but it never flows back upwards. The only way to get state from a child is for the child to print it in a format that the parent can handle. For example, you can pipe the above command into while IFS=$'\t' read -r LANG LAST; do ...; done to read the awk output into shell variables.
It seems from your comment that you're trying to mix awk and shell in a way that doesn't quite make sense. So the correct full code (for getting the variables in a bash loop) would be:
awk -F' {2,}' '{ Lang=$1; Last=$NF; print Lang "\t" Last }' loc.txt | while IFS=$'\t' read -r LANG LAST; do ...; done
Or, if every line had a fixed number of fields, you could skip awk entirely:
while read -r LANG _ _ _ _ LAST; do ...; done < loc.txt
where the "_" just represents a variable which is created and ignored. It's a bit of a convention that underscores represent placeholders in some programming languages, and in this case it's actually a variable which could be printed with echo $_. You'd give it a real name, and name each field differently, if you cared about the middle values. (Note that this form assumes exactly six fields per line, so a multi-word name like Bourne Again Shell would be mis-split.)
Neither of these solutions cares about how much whitespace there is: awk doesn't care unless you tell it to, and neither does the shell. What does matter for this data is the -F' {2,}' separator, which keeps a multi-word name like Bourne Again Shell together as a single field.
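One caveat worth adding: in bash, every stage of a pipeline runs in a subshell, so variables set inside the while loop above disappear when the loop ends. A minimal sketch that avoids this with process substitution (assuming bash and the loc.txt name used above):
while IFS=$'\t' read -r LANG LAST; do
    echo "language=$LANG last=$LAST"
done < <(awk -F' {2,}' '{ print $1 "\t" $NF }' loc.txt)
Because the loop now runs in the current shell, $LANG and $LAST still hold the values from the last line after the loop finishes.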


Separating data into files based on random ID numbers using shell script

I have a tab-delimited file (1993_NYA.tab) with radiosonde data containing an ID column. I want to extract the data for each ID into a separate tab file. The file looks like this.
1993-01-01T10:45:03 083022143 250 78.93018 11.95426 960.0 -16.8 76 1.7 276
1993-01-01T10:45:16 083022143 300 78.93011 11.95529 953.7 -17.2 77 1.8 288
1993-01-01T10:45:30 083022143 350 78.93000 11.95638 947.3 -17.6 79 2.0 297
Here 083022143 is the ID, but it changes randomly (not in ascending order). The code I tried is as follows.
ID=$(cat 1993_NYA.tab | cut -f 2 | sort | uniq)
for i in {$ID}
do
awk -F '\t' '$2 = "$i"' 1993_NYA.tab > 1993_$i.tab
done
This is not storing the data for a particular ID in the filename containing that same ID. Can anyone please help?
There are three small mistakes.
{$ID} should be just $ID.
'$2 = "$i"' has the assignment operator = rather than the comparison ==.
'$2 = "$i"' does not interpolate the value of i into the argument because of the single quotes; we can write e. g. "\$2 == $i" instead.
With the above changes, your code works. But awk has built-in magic which makes the task much easier. The single command
awk -F '\t' '{print > ("1993_" $2 ".tab")}' 1993_NYA.tab
does what you want. (The parentheses around the file name expression are worth having: an unparenthesized concatenation after > is parsed differently by some awk implementations.)
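One more caveat: if the input contains many distinct IDs, some awk implementations limit how many output files may be open at once. A variant that closes each file after writing to it (a sketch):
awk -F '\t' '{ f = "1993_" $2 ".tab"; print >> f; close(f) }' 1993_NYA.tab
The >> matters here: because each file is closed after every line, > would truncate it on reopen and keep only the last line.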

How do I swap two strings using the sed command with the delimiter being a comma?

This is my first time posting here, so I apologize if the formatting is weird.
Using the sed command, I want to swap x and y (turn x,y into y,x) and store the swapped version in a separate file.
The file I am supposed to modify contains:
#######bank info########
#
####name#### ####age####
#
Bob,Stevenson 27
David,Tan 43
Robert,Jackson 39
I want to change the name from firstname,lastname to lastname,firstname.
I have already tried using the command:
sed -e "s/^\([^#]*\) *\([,]*\)/\2\1/g" file > xxx
Yet when I examine the "swapped" file it looks like nothing has changed. Why?
The expected output is:
#######bank info########
#
####name#### ####age####
#
Stevenson,Bob 27
Tan,David 43
Jackson,Robert 39
But my output is:
#######bank info########
#
####name#### ####age####
#
Bob,Stevenson 27
David,Tan 43
Robert,Jackson 39
Which is exactly the same as the initial version. Why does this happen?
The substitution succeeds, but it changes nothing: \([^#]*\) is greedy and a comma is not a #, so the first group swallows the whole line, the trailing \([,]*\) then matches the empty string, and \2\1 writes the line back unchanged. Anchoring the groups on the comma fixes it:
sed '
# only lines that do not start with a #
/^#/!{
# match from the start
# match up until a comma
# then match the comma
# then match up until a space
# swap the parts before and after the comma
s/^\([^,]*\),\([^ ]*\)/\2,\1/
}
'
And a bit shorter version:
sed '/^#/!s/^\([^,]*\),\([^ ]*\)/\2,\1/'
Same approach as the very good answer by @KamilCuk, but using extended regex and word characters, you could do
sed -E '/,/s/(\w+),(\w+)/\2,\1/' file > newfile
Where /,/ only operates on lines containing a ',' and each capture group (\w+) matches one or more word characters. The (\w+),(\w+) expression ensures there are two groups of word characters separated by a comma, and then \2,\1 simply reinserts the backreferences in reverse order, e.g.
Example Use/Output
$ sed -E '/,/s/(\w+),(\w+)/\2,\1/' file
#######bank info########
#
####name#### ####age####
#
Stevenson,Bob 27
Tan,David 43
Jackson,Robert 39
If you can use awk, this will do.
awk -F'[ ,]+' '/,/{$0=$2","$1"\t"$3}1' file
#######bank info########
#
####name#### ####age####
#
Stevenson,Bob 27
Tan,David 43
Jackson,Robert 39
The awk version may be easier to understand than the sed one :)
-F'[ ,]+' splits the line on one or more commas or spaces
/,/ for lines containing a comma, do:
$0=$2","$1"\t"$3 rebuild the line in the new order, with a comma and a tab
1 is always true, so the default action runs: print the line.
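A quick way to see how that separator splits a line (just a check, not part of the solution):
echo 'Bob,Stevenson 27' | awk -F'[ ,]+' '{ print $1 "|" $2 "|" $3 }'
Bob|Stevenson|27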

How to make a loop with multiple columns in shell?

I have a file with three columns (ID number, x, y)
ifile.txt
1 32.2 21.4
4 33.2 43.5
5 21.3 45.6
12 22.3 32.5
32 21.5 56.3
43 33.4 23.4
44 23.3 22.3
55 22.5 32.4
I would like to make a loop over columns 2 and 3 so that it will read like
for x=32.2 and y=21.4; do execute a fortran program
for x=33.2 and y=43.5; do execute the same program
and so on
My following script works, but I need a more efficient way.
s1=1 #serial number
s2=$(wc -l < ifile.txt) #total number to be loop
while [ $s1 -le $s2 ]
do
x=$(awk 'NR=='$s1' {print $2}' ifile.txt)
y=$(awk 'NR=='$s1' {print $3}' ifile.txt)
cat << EOF > myprog.f
...
take value of x and y
...
EOF
ifort myprog.f
./a.out
(( s1++ ))
done
Kindly note: myprog.f is written via a cat heredoc, for example:
cat << EOF > myprog.f
....
....
take value of x and y
....
....
EOF
A simple way to read a file in bash is
while read -r _ x y; do
echo "x is $x, y is $y"
# your Fortran code execution
done < ifile.txt
x is 32.2, y is 21.4
x is 33.2, y is 43.5
x is 21.3, y is 45.6
x is 22.3, y is 32.5
x is 21.5, y is 56.3
x is 33.4, y is 23.4
x is 23.3, y is 22.3
x is 22.5, y is 32.4
It looks like you're trying to create Fortran source code in each loop iteration, with the loop variables baked into the source code, then compiling and invoking it, which is quite inefficient.
Instead, you should create a Fortran program once, and have it accept arguments.
(I don't know Fortran, and you haven't stated a specific compiler, but perhaps this GNU Fortran documentation will get you started.)
Assuming you have such a program and its path is ./a.out, you can invoke awk combined with xargs as follows, passing the 2nd ($2) and 3rd ($3) fields as arguments:
awk '{ print $2, $3 }' file | xargs -n 2 ./a.out
awk '{ print $2, $3 }' prints the 2nd and 3rd whitespace-separated fields from each input line, separated by a space.
xargs -n 2 takes pairs of values from awk's output and invokes ./a.out with each pair as arguments. (This approach relies on the values having no embedded whitespace, which is the case here.)
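To watch the pairing without the real program, you can substitute a stand-in command for ./a.out (hypothetical, for illustration only):
awk '{ print $2, $3 }' ifile.txt | xargs -n 2 sh -c 'echo "x=$1 y=$2"' _
Each pair from awk becomes $1 and $2 of the little sh script, exactly as it would become the two arguments of ./a.out.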

How to insert shell variable inside awk command

I'm trying to write a script in which I pass a shell variable into an awk command. But when I run it, nothing happens; I tried running that line alone in the shell and found that no variable expansion happened the way I expected. Here's the code:
#!/bin/bash

# Created By Rafael Adel

# This script is to start dwm with customizations needed


while true;do
    datestr=`date +"%r %d/%m/%Y"`
    batterystr=`acpi | grep -oP "([a-zA-Z]*), ([0-9]*)%"`
    batterystate=`echo $batterystr | grep -oP "[a-zA-Z]*"`
    batterypercent=`echo $batterystr | grep -oP "[0-9]*"`

    for nic in `ls /sys/class/net`
    do
        if [ -e "/sys/class/net/${nic}/operstate" ]
        then
            NicUp=`cat /sys/class/net/${nic}/operstate`
            if [ "$NicUp" == "up" ]
            then
                netstr=`ifstat | awk -v interface=${nic} '$1 ~ /interface/ {printf("D: %2.1fKiB, U: %2.1fKiB",$6/1000, $8/1000)}'`
                break
            fi
        fi
    done


    finalstr="$netstr | $batterystr | $datestr"

    xsetroot -name "$finalstr"
    sleep 1
done &

xbindkeys -f /etc/xbindkeysrc

numlockx on

exec dwm
This line:
netstr=`ifstat | awk -v interface=${nic} '$1 ~ /interface/ {printf("D: %2.1fKiB, U: %2.1fKiB",$6/1000, $8/1000)}'`
is what causes the netstr variable not to get assigned at all. That's because interface is not replaced with ${nic}, I guess.
So could you tell me what's wrong here? Thanks.
If you want to grep with your variable inside awk, you have two choices:
interface=eth0
awk "/$interface/{print}"
or
awk -v interface=eth0 '$0 ~ interface{print}'
See http://www.gnu.org/software/gawk/manual/gawk.html#Using-Shell-Variables
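Applied to the line in the script above, the second form would give (a sketch; whether == or ~ is the right comparison depends on what ifstat prints in its first column):
netstr=$(ifstat | awk -v interface="$nic" '$1 == interface {printf("D: %2.1fKiB, U: %2.1fKiB", $6/1000, $8/1000)}')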
It's like I thought: awk substitutes variables properly, but inside a // regex literal an awk variable is not expanded.
I had no issue grepping with a variable inside an awk program (for simple regexp cases):
sawk1='repo\s+module2'
sawk2='#project2\s+=\s+module2$'
awk "/${sawk1}/,/${sawk2}/"'{print}' aFile
(Here the /xxx/,/yyy/ displays everything between xxx and yyy)
(Note the double-quoted "/${sawk1}/,/${sawk2}/", followed by the single-quoted '{print}')
This works just fine, and comes from "awk: Using Shell Variables in Programs":
A common method is to use shell quoting to substitute the variable’s value into the program inside the script.
For example, consider the following program:
printf "Enter search pattern: "
read pattern
awk "/$pattern/ "'{ nmatches++ }
END { print nmatches, "found" }' /path/to/data
The awk program consists of two pieces of quoted text that are concatenated together to form the program.
The first part is double-quoted, which allows substitution of the pattern shell variable inside the quotes.
The second part is single-quoted.
It does add the caveat though:
Variable substitution via quoting works, but can potentially be messy.
It requires a good understanding of the shell’s quoting rules (see Quoting), and it’s often difficult to correctly match up the quotes when reading the program.
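A third option that sidesteps the quoting question entirely is awk's built-in ENVIRON array (a sketch; ENVIRON is standard awk, and the variable name here is arbitrary):
pattern='[0-9]+ found'
pattern="$pattern" awk '$0 ~ ENVIRON["pattern"] { nmatches++ }
END { print nmatches, "found" }' /path/to/data
The var=value prefix exports pattern into the environment of that one awk invocation only.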

Awk script to select files and print file sizes

I'm working on a homework assignment. The question is:
Write an awk script to select all regular files (not directories or
links) in /etc ending with .conf, sort the result by size from
smallest to largest, count the number of files, and print out the
number of files followed by the filenames and sizes in two columns.
Include a header row for the filenames and sizes. Paste both your
script and its output in the answer area.
I'm really struggling trying to get this to work through using awk. Here's what I came up with.
ls -lrS /etc/*.conf | wc -l
will return 33, which is the number of .conf files in the directory.
ls -lrS /etc/*.conf |awk '{print "File_Size"": " $5 " ""File_Name and Size"": " $9}'
This makes two columns with the name and size of each .conf file in the directory.
It works, but I don't think it is what he's looking for. I'm having an AWKful time.
Let's see here...
select all regular files (not directories or links)
So far you haven't addressed this, but if you are piping in the output of ls -l..., it's easy: select on
/^-/
because directories start with d, symbolic links with l and so on. Only plain old files start with -. Now
print out the number of files followed
Well, counting matches is easy enough...
BEGIN{count=0} # This is not *necessary*, but I tend to put it in for clarity
/^-/ {count++;}
To get the filename and size, look at the output of ls -l and count up columns
BEGIN{count=0}
/^-/ {
count++;
SIZE=$5;
FNAME=$9;
}
The big difficulty here is that awk doesn't provide much by way of sorting primitives, so that's the hard part. It can be beaten if you want to be clever, but not particularly efficiently (see the awful thing I did in a [code-golf] solution). The easy (and unixy) thing to do is to pipe part of the output to sort, so we collect a line for each file into a big string:
BEGIN{count=0}
/^-/ {
count++
SIZE=$5;
FNAME=$9;
OUTPUT=sprintf("%10d\t%s\n%s",SIZE,FNAME,OUTPUT);
}
END{
printf("%d files\n",count);
printf(" SIZE \tFILENAME"); # No newline here because OUTPUT has it
print OUTPUT|"sort -n --key=1";
}
Gives output like
11 files
SIZE FILENAME
673 makefile
2192 houghdata.cc
2749 houghdata.hh
6236 testhough.cc
8751 fasthough.hh
11886 fasthough.cc
19270 HoughData.png
60036 houghdata.o
104680 testhough
150292 testhough.o
168588 fasthough.o
(BTW--There is a test subdirectory here, and you'll note that it does not appear in the output.)
Maybe something like this will get you on your way:
ls -lrS /etc/*.conf |
awk '
BEGIN{print "Size:\tFilename:"} # Prints Headers
/^-/{cnt++; print $5"\t"$9} # Prints two desired columns; /^-/ matches only regular files
END{print "Total Files = " cnt}' # Prints the count of matched files
Test: the text after each # above is a comment for your reference.
[jaypal:~/Temp] ls -lrS /etc/*.conf |
awk '
BEGIN{print "Size:\tFilename:"}
/^-/{cnt++; print $5"\t"$9}
END{print "Total Files = " cnt}'
Size: Filename:
0 /etc/kern_loader.conf
22 /etc/ntp.conf
54 /etc/ftpd.conf
105 /etc/launchd.conf
168 /etc/memberd.conf
242 /etc/notify.conf
366 /etc/ntp-restrict.conf
526 /etc/gdb.conf
723 /etc/pf.conf
753 /etc/6to4.conf
772 /etc/syslog.conf
983 /etc/rtadvd.conf
1185 /etc/asl.conf
1238 /etc/named.conf
1590 /etc/newsyslog.conf
1759 /etc/autofs.conf
2378 /etc/dnsextd.conf
4589 /etc/man.conf
Total Files = 18
I would first find the files with something like find /etc -type f -name '*.conf', so you get the right list of files. Then do ls -l on them (perhaps using xargs). And from there the awk should be simple.
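The first two stages of that pipeline would look something like this (a sketch; -maxdepth 1 is an assumption that only /etc itself, not its subdirectories, is wanted), with the awk stage left as the exercise:
find /etc -maxdepth 1 -type f -name '*.conf' -print0 | xargs -0 ls -l
The -print0/-0 pair keeps any file names containing spaces intact.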
But I don't think that doing more of your homework for you would help. You need to think it through yourself and find out.
Disclaimer: I'm not a shell expert.
Thought I'd give this a go, been beaten on speed of reply though :-) :
clear
FILE_COUNT=`find /etc/ -name '*.conf' -type f -maxdepth 1 | wc -l`
echo "Number of files: $FILE_COUNT"
ls -lrS /etc/[^-]*.conf | awk '
BEGIN {print "NAME | SIZE"}\
{print $9," | ",$5}\
END {print "- DONE -"}\
'
My output is ugly :-( :
Number of files: 21
NAME | SIZE
/etc/kern_loader.conf | 0
/etc/resolv.conf | 20
/etc/AFP.conf | 24
/etc/ntp.conf | 42
/etc/ftpd.conf | 54
/etc/notify.conf | 132
/etc/memberd.conf | 168
/etc/Symantec.conf | 246
/etc/ntp-restrict.conf | 366
/etc/gdb.conf | 526
/etc/6to4.conf | 753
/etc/syslog.conf | 772
/etc/asl.conf | 860
/etc/liveupdate.conf | 861
/etc/rtadvd.conf | 983
/etc/named.conf | 1238
/etc/newsyslog.conf | 1590
/etc/autofs.conf | 1759
/etc/dnsextd.conf | 2378
/etc/smb.conf | 2975
/etc/man.conf | 4589
/etc/amavisd.conf | 31925
- DONE -
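If the ragged columns bother you, awk's printf can pad each field to a fixed width (a sketch; the widths are arbitrary):
ls -lrS /etc/*.conf | awk '
BEGIN { printf "%-25s %10s\n", "NAME", "SIZE" }
/^-/  { printf "%-25s %10d\n", $9, $5 }
'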
