CSV file manipulation in Unix and append value to each line - Linux

I have the below CSV file:
,,,Test File,
,todays Date:,01/10/2018,Generation date,10/01/2019 11:20:58
Header 1,Header 2,Header 3,Header 4,Header 5
,My account no,100102GFC,,
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E
TEST
I need to extract today's date, which is in the 3rd column of the 2nd line, and also the account number, which is in the 3rd column of the 4th line.
Below is the new file that I have to create; those extracted values from the 2nd and 4th lines need to be appended at the end of each line.
The new file will contain the data from the 5th line through the (n-1)th line:
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
Could you please help me do this in a shell script?
Here is what I tried; I am new to shell scripting and unable to combine all of these.
To extract the date from the second row:
sed -n 2p test.csv | cut -d ',' -f 3
To extract the account number (which is on the 4th line):
sed -n 4p test.csv | cut -d ',' -f 3
To extract the actual data:
tail -n +5 test.csv | head -n -1 > temp.csv
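For reference, those three pieces can be glued together with command substitution. A minimal sketch, assuming the layout shown above; the sample file is recreated inline so the script is self-contained, and note that head -n -1 is a GNU coreutils extension:

```shell
# Recreate the sample input from the question
cat > test.csv <<'EOF'
,,,Test File,
,todays Date:,01/10/2018,Generation date,10/01/2019 11:20:58
Header 1,Header 2,Header 3,Header 4,Header 5
,My account no,100102GFC,,
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E
TEST
EOF

# Extract the date (line 2, field 3) and the account number (line 4, field 3)
date_val=$(sed -n 2p test.csv | cut -d ',' -f 3)
acct_val=$(sed -n 4p test.csv | cut -d ',' -f 3)

# Keep lines 5..n-1 and append both values to each line
# (head -n -1 is a GNU coreutils extension)
tail -n +5 test.csv | head -n -1 |
  awk -v d="$date_val" -v a="$acct_val" '{print $0 "," d "," a}' > newfile.csv

cat newfile.csv
# A,B,C,D,E,01/10/2018,100102GFC
# A,B,C,D,E,01/10/2018,100102GFC
# A,B,C,D,E,01/10/2018,100102GFC
```

Appending with awk rather than sed avoids having to escape the slashes in the extracted date.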

Try awk:
awk -F, 'NR==2{d=$3}NR==4{a=$3}NR>4{if (line) print line; line = $0 "," d "," a;}' Inputfile.csv
Eg:
$ cat file1
,,,Test File,
,todays Date:,01/10/2018,Generation date,10/01/2019 11:20:58
Header 1,Header 2,Header 3,Header 4,Header 5
,My account no,100102GFC,,
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E
TEST
$ awk -F, 'NR==2{d=$3}NR==4{a=$3}NR>4{if (line) print line; line = $0 "," d "," a;}' file1
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
I misunderstood your meaning before I edited your question; I updated my answer afterwards.
In the awk command:
NR is the current line number, -F sets the field separator, d stores the date and a the account number;
each line $0 is simply concatenated with d and a.
You don't want the last line, so I used line to delay printing by one record; the last line is never printed (though it is saved to line, and could be used if an END block were given).
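The delay-print idiom in isolation, on a hypothetical three-line input: each record is buffered for one cycle, so the final record is never printed.

```shell
# Buffer one line behind: print the previous line, remember the current one
printf 'one\ntwo\nthree\n' | awk '{if (line) print line; line = $0}'
# one
# two
```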

You can try Perl also
$ cat dawn.txt
,,,Test File,
,todays Date:,01/10/2018,Generation date,10/01/2019 11:20:58
Header 1,Header 2,Header 3,Header 4,Header 5
,My account no,100102GFC,,
A,B,C,D,E
A,B,C,D,E
A,B,C,D,E
TEST
$ perl -F, -lane ' $dt=$F[2] if $.==2 ; $ac=$F[2] if $.==4; if($.>4 and ! eof) { print "$_,$dt,$ac" } ' dawn.txt
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
$

$ cat tst.awk
BEGIN { FS=OFS="," }
NR == 2 { date = $3 }
NR == 4 { acct = $3 }
NR>4 && NF>1 { print $0, date, acct }
$ awk -f tst.awk file
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
or, depending on your requirements and actual input data:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR == 2 { date = $3 }
NR == 4 { acct = $3 }
NR>4 {
    if (out != "") {
        print out
    }
    out = $0 OFS date OFS acct
}
$ awk -f tst.awk file
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC
A,B,C,D,E,01/10/2018,100102GFC

Related

print lines with specific condition on the last field of each line into another file

I have multiple files, and one of the files has 4 lines like below:
2345,abdgdhf,......,12879
6354, kfsjgdh,.....,"fac
74573,khskdd,......,5663
gffhf,gfgfhfh,......,7675
I want to write the lines where the first field is not all digits, or where the first character of the last field is a quotation mark, into another file. The expected output is a file with the two lines below:
6354, kfsjgdh,.....,"fac
gffhf,gfgfhfh,......,7675
The command below will print the lines where the first field is not a number
for f in *.csv; do
awk -F "," '(/^[^0-9]/) {print }' "$f" > ./bad/"$f"
done
Output will be
gffhf,gfgfhfh,......,7675
And the command below will give me the first character of the last field:
awk -F "," '{print ($(NF))}' <file> |sed 's/\(.\{1\}\).*/\1/'
Output will be
1
"
5
7
I don't know how to merge this into my for loop and add a condition to also grab lines whose last field starts with a quotation mark, so that the line 6354, kfsjgdh,.....,"fac appears in the expected output.
You don't need a for loop:
awk -F',' '
FNR==1 { close(out); out="./bad/" FILENAME }
($1 !~ /^[0-9]+$/) || ($NF ~ /^"/) { print > out }
' *.csv
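A sketch of how that might be run end to end; the sample file name and contents here are taken from the question, and the ./bad directory must exist before awk can write into it:

```shell
mkdir -p ./bad

# Sample data from the question
cat > sample.csv <<'EOF'
2345,abdgdhf,......,12879
6354, kfsjgdh,.....,"fac
74573,khskdd,......,5663
gffhf,gfgfhfh,......,7675
EOF

# Keep lines whose first field is not all digits,
# or whose last field starts with a quotation mark
awk -F',' '
FNR==1 { close(out); out="./bad/" FILENAME }
($1 !~ /^[0-9]+$/) || ($NF ~ /^"/) { print > out }
' sample.csv

cat ./bad/sample.csv
# 6354, kfsjgdh,.....,"fac
# gffhf,gfgfhfh,......,7675
```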

Run query in Linux for selecting CSVs

In Linux, there are many .csv files in the folder. I have to select those CSV files that have a column PREDICT containing the value 646.
Check this link:
https://prnt.sc/gone85
What kind of query works?
Providing test data, which was not provided:
$ cat > file1
ACTUAL PREDICT
1 2
3 646
$ cat > file2
ACTUAL PREDICT
1 2
3 666
Then some GNU awk (nextfile) to select those CSV files where the column PREDICT has the value 646:
$ awk 'FNR==1{for(i=1;i<=NF;i++)if($i=="PREDICT")p=i}$p==646{print FILENAME;nextfile}' file1 file2
file1
Explained:
awk '
FNR==1 { # get the column number of PREDICT column for each file
for(i=1;i<=NF;i++)
if($i=="PREDICT")
p=i # set it to p
}
$p==646 { # if p==646, we have a match
print FILENAME # print the filename
nextfile # and move on to the next file
}' file1 file2 # all the candidate files
GNU awk solution without a loop:
$ cat tst.awk
BEGIN{FS=","}
FNR==1 && (s=substr($0,1,index($0,"PREDICT"))) { # look for index of PREDICT
    i=gsub(/,/, "", s) + 1   # and count the number of commas in the
                             # preceding substring to get the field number
}
s && $i==646 { print FILENAME; nextfile }
some input:
$ cat file1.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
$ cat file2.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,111,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
and:
$ cp file1.csv file3.csv
gives:
$ awk -f tst.awk *.csv
file1.csv
file3.csv
Or use a one-liner:
$ awk -F, 'FNR==1 && (s=substr($0,1,index($0,"PREDICT"))){i=gsub(/,/, "", s) + 1} s && $i==646 { print FILENAME; nextfile }' *.csv
file1.csv
file3.csv

awk add string to each line except last blank line

I have a file with a blank line at the end. I need to add a suffix to each line except the last blank line.
I use:
awk '$0=$0"suffix"' file | sed 's/^suffix$//'
But maybe it can be done without sed?
UPDATE:
I want to skip all lines that contain only the '\n' character.
EXAMPLE:
I have file test.tsv:
a\tb\t1\n
\t\t\n
c\td\t2\n
\n
I run cat test.tsv | awk '$0=$0"\t2"' | sed 's/^\t2$//':
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n
It sounds like this is what you need:
awk 'NR>1{print prev "suffix"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "suffix") }' file
The test for NR in the END is to avoid printing a blank line given an empty input file. It's untested, of course, since you didn't provide any sample input/output in your question.
To treat all empty lines the same:
awk '{print $0 (/./ ? "suffix" : "")}' file
Try:
awk 'NF{print $0 "suffix"}' Input_file
This will skip all blank lines:
awk 'NF{$0=$0 "suffix"}1' file
To only skip the last line if blank:
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"") }' file
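To see the difference between the two, consider a hypothetical input with a blank line both in the middle and at the end:

```shell
# Hypothetical sample: blank line in the middle and at the end
printf 'a\n\nb\n\n' > demo.txt

# Skip every blank line entirely:
awk 'NF{print $0 "suffix"}' demo.txt
# asuffix
# bsuffix

# Suffix every line, but leave the trailing blank line bare:
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"") }' demo.txt
# asuffix
# suffix
# bsuffix
# (followed by the bare trailing blank line)
```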
If perl is okay:
$ cat ip.txt
a b 1

c d 2
$ perl -lpe '$_ .= "\t 2" if !(eof && /^$/)' ip.txt
a b 1 2
2
c d 2 2
$ # no blank line for empty file as well
$ printf '' | perl -lpe '$_ .= "\t 2" if !(eof && /^$/)'
$
-l strips newline from input, adds back when line is printed at end of code due to -p option
eof to check end of file
/^$/ blank line
$_ .= "\t 2" append to input line
Try this -
$ cat f ###Blank line only at the end of the file
-11.2
hello

$ awk '{print (/./?$0"suffix":"")}' f
-11.2suffix
hellosuffix
$
OR
$ cat f ####Blank line in the middle and at the end of the file
-11.2

hello

$ awk -v val=$(wc -l < f) '{print (/./ || NR!=val?$0"suffix":"")}' f
-11.2suffix
suffix
hellosuffix
$

How to replace fields using substr comparison

I have two files. I need to fetch the last 6 characters of field 11 of file1 and look them up in file2; on a match, I need to replace field 9 of file1 with fields 1 and 2 of file2 concatenated.
file1:
12345||||||756432101000||756432||756432101000||
aaaaa||||||986754812345||986754||986754812345||
ccccc||||||134567222222||134567||134567222222||
file2:
101000|AAAA
812345|20030
The expected output is:
12345||||||756432101000||101000AAAA ||756432101000||
aaaaa||||||986754812345||81234520030||986754812345||
ccccc||||||134567222222||134567||134567222222||
I have tried:
awk -F '|' -v OFS='|' 'NR==FNR{a[$1,$2];next} {b=substr($11,length($11)-7)} b in a {$9=a[$1,$2]}1'
I'd write it this way as a full script in a file, rather than a one-liner:
#!/usr/bin/awk -f
BEGIN {
    FS = "|";
    OFS = FS;
}
NR == FNR {    # first file on the command line (file2): the replacements to use
    map[$1] = $2
    next;
}
{              # second file (file1): the main file to manipulate
    b = substr($11, length($11) - 5);
    if (map[b]) {
        $9 = b map[b]
    }
    print
}
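One caveat with if (map[b]): it tests the stored value rather than key membership, so a key mapped to an empty value (or to 0) would be missed, and the lookup itself quietly creates the key as a side effect. That is fine for the data shown, where every value in file2 is non-empty, but b in a is the safer membership test. A minimal illustration, with a hypothetical key x whose value is empty:

```shell
printf 'x\n' | awk '
BEGIN { map["x"] = "" }     # the key exists, but its value is empty
{
    if (map[$1])   print "value test: found"        # false: empty value
    if ($1 in map) print "membership test: found"   # true: key exists
}'
# membership test: found
```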
$ awk -F '|' -v OFS='|' 'NR==FNR{a[$1]=$2;next} {b=substr($11,length($11)-5)} b in a {$9=b a[b]}1' file2 file1
12345||||||756432101000||101000AAAA||756432101000||
aaaaa||||||986754812345||81234520030||986754812345||
ccccc||||||134567222222||134567||134567222222||
How it works
awk implicitly loops through every line in both files, starting with file2 because it is specified first on the command line.
-F '|'
This tells awk to use | as the field separator on input
-v OFS='|'
This tells awk to use | as the field separator on output
NR==FNR{a[$1]=$2;next}
While reading the first file, file2, this saves the second field, $2, as the value of associative array a with the first field, $1, as the key.
next tells awk to skip the rest of the commands and start over on the next line.
b=substr($11,length($11)-5)
This extracts the last six characters of field 11 and saves them in variable b.
b in a {$9=b a[b]}
This tests to see if b is one of the keys of associative array a. If it is, this assigns the ninth field, $9, to the combination of b and a[b].
1
This is awk's cryptic shorthand for print-the-line.
You are almost there:
$ awk -F '|' -v OFS='|' 'NR==FNR{a[$1]=$2;next} {b=substr($11,length($11)-5)} b in a {$9=b a[b]}1' file2 file1
12345||||||756432101000||101000AAAA ||756432101000||
aaaaa||||||986754812345||81234520030||986754812345||
ccccc||||||134567222222||134567||134567222222||
$

formatting text using awk

Hi, I have the following text and I need to use awk or sed to split it into 3 separate columns:
11/13/14 101 HUDSON AUBONPAINJERSEY CITY NJ $4.15
11/22/14 MTAMVM*110TH ST/CATNEW YORK NY $19.05
11/22/14 DUANE READE #14226 0NEW YORK NY $1.26
I would like to produce a file containing all the dates, another file containing all the descriptions, and a third file containing all the amounts.
I can use awk to print the first column (print $1) and then use the -F '[$]' option to print the last column, but I'm not able to print just the middle column, as it contains spaces. Can I ignore the spaces, or is there a better way of doing this?
Thanking you in advance.
Try doing this:
$ awk '
{
print $1 > "dates"; $1=""
print $NF > "prices"; $NF=""
print $0 > "desc"
}
' file
or:
awk -F' +' '
{
print $1 > "dates"
print $2 > "desc"
print $3 > "prices"
}
' file
Then (with the first variant):
$ cat dates
11/13/14
11/22/14
11/22/14
$ cat desc
 101 HUDSON AUBONPAINJERSEY CITY NJ
 MTAMVM*110TH ST/CATNEW YORK NY
 DUANE READE #14226 0NEW YORK NY
$ cat prices
$4.15
$19.05
$1.26
(the desc lines keep a leading and trailing space where the emptied fields were)
Wasn't fast enough to be the first to give an awk solution, so here's one with grep and sed...
grep -o '^.*/.*/1.' file #first col
sed 's/^.*\/.*\/1.//;s/\$.*//' file #middle col
grep -o '\$.*$' file #last col
