Merge prefix and suffix from 2 files - string

I would like to merge 2 files:
> cat file1.txt
string1:suffix1
string2:suffix2
> cat file2.txt
prefix1:string1
prefix2:string2
into:
> cat result.txt
prefix1:string1:suffix1
prefix2:string2:suffix2
How can I use awk (or a similar tool) to do that?
Thanks a lot!

Store each suffix from file1 in an array keyed by its string, then print every line of file2 with the matching suffix appended:
$ awk -F: 'NR==FNR {a[$1]=$2; next}
{print $0 FS a[$2]}' file1 file2
prefix1:string1:suffix1
prefix2:string2:suffix2
Or, if the files are already line-aligned:
$ paste -d: file2 <(cut -d: -f2 file1)
prefix1:string1:suffix1
prefix2:string2:suffix2

awk 'BEGIN {OFS=":"}{ getline line < "file1.txt" ;split(line, a, ":");print $1,a[2];} ' file2.txt
where:
This [ {OFS=":"} ] sets the separator used when joining the pieces from the two files; if you used a space instead, you would get output like this:
prefix1:string1 suffix1
prefix2:string2 suffix2
This [ getline line < "file1.txt" ] reads the next line from the first file.
This [ split(line, a, ":") ] splits that line on the colon and stores the pieces in array a.
This [ print $1 ] prints the whole line of file2.txt (the line contains no whitespace, so with the default FS the first field is the entire line).
This [ a[2] ] prints the 2nd element of array a, i.e. the suffix read from the first file.
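With the two sample files from the question this should produce the requested result (a quick check; note that, unlike the array-based answer above, this getline approach relies on both files being in the same line order):
$ awk 'BEGIN {OFS=":"}{ getline line < "file1.txt" ;split(line, a, ":");print $1,a[2];} ' file2.txt
prefix1:string1:suffix1
prefix2:string2:suffix2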

Compare two files and store differences using conditional

I managed to find half of the solution to my challenge, but I cannot find a way to add a conditional to deal with the other half. I am using awk. The field separator is ; and the values are enclosed in double quotes ". The files have only 3 fields each.
I have two files (file1.txt file2.txt) and want to store the differences in a third file(results.txt).
file1.txt
"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"
file2.txt
"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
If I use:
awk -F';' 'FNR==NR {a[$0];next} !($0 in a)' file1.txt file2.txt
I get:
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
But I do not want to treat " in $2 of file2.txt versus rack1 in $2 of file1.txt as a difference between the files. So whenever an entry in file2.txt has " in field $2 while the entry with the same $1 in file1.txt has rack1 in field $2, I want to discard it instead of reporting it as a difference.
The files are generated dynamically every night, and when this happens field $2==rack1 in file1.txt while field $2==" in file2.txt. This is the match to exclude, in addition to the ones I already managed to exclude with the awk command above. Below is the expected output:
Desired results.txt
"SWITCH51";"rack7";"Datacenter2"
I am struggling to find a conditional to handle this scenario.
You could store the original lines in array a, like you do, plus modified lines where "rack1" is replaced by ":
$ awk -F';' -vOFS=';' 'FNR==NR {a[$0]; if($2=="\"rack1\"") {$2="\"";a[$0]}; next}
!($0 in a)' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
Note the specification of the OFS output field separator. It is needed because, when we modify the $2 field, awk reconstructs $0 using OFS, which by default is a space, while we need it to remain a semicolon for a correct comparison when parsing file2.txt.
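A minimal sketch of that rebuild behaviour, using one row from the sample data:
$ echo '"ROUTER22";"rack1";"Datacenter4"' | awk -F';' '{$2="\""; print}'
"ROUTER22" " "Datacenter4"
$ echo '"ROUTER22";"rack1";"Datacenter4"' | awk -F';' -v OFS=';' '{$2="\""; print}'
"ROUTER22";";"Datacenter4"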
You could check whether the value of field 2 is just " and replace it with "rack1".
If, after the replacement, $0 is not in array a, print the unmodified row, which is kept in the tmp variable in the example.
awk '
BEGIN{FS=OFS=";"}
FNR==NR {a[$0];next}
{
tmp = $0
sub(/^"$/, "\"rack1\"", $2)
if (!($0 in a)) print tmp
}
' file1.txt file2.txt
Output
"SWITCH51";"rack7";"Datacenter2"
Based on your shown samples, please try the following awk code. A simple explanation: while reading the first Input_file, create two arrays, a and b, indexed by $0 and by $1,$3 respectively. While reading the next Input_file, check two conditions: if $1,$3 is NOT present in b AND $0 is NOT present in a, then print that line of Input_file2.
awk -F';' '
FNR==NR{
a[$0]
b[$1,$3]
next
}
!(($1,$3) in b) && !($0 in a)
' file1.txt file2.txt
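With the posted samples, this should again leave only the genuinely new line:
"SWITCH51";"rack7";"Datacenter2"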
Another option is to normalise $2 while reading file2.txt: build a key in which a bare " is mapped back to "rack1", and print the line only when that key is not in file1's set:
awk -F';' '
NR==FNR { a[$0]; next }
{ key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
!(key in a)
' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"

Run query in Linux for selecting CSVs

In Linux:
there are many .csv files in the folder; I have to select those csv files having column name {'PREDICT' = 646}.
check this link:
https://prnt.sc/gone85
what kind of query works?
Providing test data, since none was provided:
$ cat > file1
ACTUAL PREDICT
1 2
3 646
$ cat > file2
ACTUAL PREDICT
1 2
3 666
Then some GNU awk (using nextfile) to select those csv files having column name {'PREDICT' = 646}, i.e. files where the PREDICT column has the value 646:
$ awk 'FNR==1{for(i=1;i<=NF;i++)if($i=="PREDICT")p=i}$p==646{print FILENAME;nextfile}' file1 file2
file1
Explained:
awk '
FNR==1 { # get the column number of PREDICT column for each file
for(i=1;i<=NF;i++)
if($i=="PREDICT")
p=i # set it to p
}
$p==646 { # if p==646, we have a match
print FILENAME # print the filename
nextfile # and move on to the next file
}' file1 file2 # all the candidate files
GNU awk solution without a loop:
$ cat tst.awk
BEGIN{FS=","}
FNR==1 && s=substr($0,1,index($0,"PREDICT")) { # take the header substring up to "PREDICT"
    i=gsub(/,/, "", s) + 1                     # count how many "," precede it: that is the field number
}
s && $i==646 { print FILENAME; nextfile }
some input:
$ cat file1.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
$ cat file2.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,111,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
and:
$ cp file1.csv file3.csv
gives:
$ awk -f tst.awk *.csv
file1.csv
file3.csv
Or use a one-liner:
$ awk -F, 'FNR==1 && s=substr($0,1,index($0,"PREDICT")) {i=gsub(/,/, "", s) + 1} s && $i==646 { print FILENAME; nextfile }' *.csv
file1.csv
file3.csv

Passing two variables instead of two files to awk

Assume two multi-line text files that are dynamically generated during execution of a bash shell script: file1 and file2
$ echo -e "foo-bar\nbar-baz\nbaz-qux" > file1
$ cat file1
foo-bar
bar-baz
baz-qux
$ echo -e "foo\nbar\nbaz" > file2
$ cat file2
foo
bar
baz
Further assume that I wish to use awk to perform an operation on the text strings of both files. For example:
$ awk 'NR==FNR{var1=$1;next} {print $var1"-"$1}' FS='-' file1 FS=' ' file2
Is there any way that I can skip having to save the text strings as files in my script and, instead, pass along the text strings to awk as variables (or as here-strings or the like)?
Something along the lines of:
$ var1=$(echo -e "foo-bar\nbar-baz\nbaz-qux")
$ var2=$(echo -e "foo\nbar\nbaz")
$ awk 'NR==FNR{var1=$1;next} {print $var1"-"$1}' FS='-' "$var1" FS=' ' "$var2"
# awk: fatal: cannot open file `foo-bar
# bar-baz
# baz-qux' for reading (No such file or directory)
$ awk '{print FILENAME, FNR, $0}' <(echo 'foo') <(echo 'bar')
/dev/fd/63 1 foo
/dev/fd/62 1 bar
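The same idea applies to the variables from the question: feed each one through printf in a process substitution so awk sees it as a regular file. A sketch, reusing the question's command unchanged (a here-string could replace at most one of the two inputs, since it only feeds standard input):
$ var1=$(echo -e "foo-bar\nbar-baz\nbaz-qux")
$ var2=$(echo -e "foo\nbar\nbaz")
$ awk 'NR==FNR{var1=$1;next} {print $var1"-"$1}' FS='-' <(printf '%s\n' "$var1") FS=' ' <(printf '%s\n' "$var2")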

awk add string to each line except last blank line

I have a file with a blank line at the end. I need to add a suffix to each line except the last blank line.
I use:
awk '$0=$0"suffix"' | sed 's/^suffix$//'
But maybe it can be done without sed?
UPDATE:
I want to skip all lines which contain only the '\n' symbol.
EXAMPLE:
I have file test.tsv:
a\tb\t1\n
\t\t\n
c\td\t2\n
\n
I run cat test.tsv | awk '$0=$0"\t2"' | sed 's/^\t2$//':
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n
It sounds like this is what you need:
awk 'NR>1{print prev "suffix"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "suffix") }' file
The test for NR in the END block is there to avoid printing a blank line for an empty input file. It's untested, of course, since the question originally provided no sample input/output.
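For example, with the tab-separated sample from the update and "\t2" as the suffix, this should give the expected output (\t and \n shown literally, as in the question):
$ awk 'NR>1{print prev "\t2"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "\t2") }' test.tsv
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n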
To treat all empty lines the same:
awk '{print $0 (/./ ? "suffix" : "")}' file
#try:
awk 'NF{print $0 "suffix"}' Input_file
this will skip all blank lines
awk 'NF{$0=$0 "suffix"}1' file
to only skip the last line if blank
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"") }' file
If perl is okay:
$ cat ip.txt
a b 1
c d 2
$ perl -lpe '$_ .= "\t 2" if !(eof && /^$/)' ip.txt
a b 1 2
2
c d 2 2
$ # no blank line for empty file as well
$ printf '' | perl -lpe '$_ .= "\t 2" if !(eof && /^$/)'
$
-l strips newline from input, adds back when line is printed at end of code due to -p option
eof to check end of file
/^$/ blank line
$_ .= "\t 2" append to input line
Try this -
$ cat f ### Blank line only at the end of the file
-11.2
hello
$ awk '{print (/./?$0"suffix":"")}' f
-11.2suffix
hellosuffix
$
OR
$ cat f #### blank line in the middle and at the end of the file
-11.2

hello
$ awk -v val=$(wc -l < f) '{print (/./ || NR!=val?$0"suffix":"")}' f
-11.2suffix
suffix
hellosuffix
$

Finding common column positions between files

I need to find the difference between two files in Unix:
File 1:
1,column1
2,column2
3,column3
File 2:
1,column1
2,column3
3,column5
I need to find the position of each common column of file 2 in file 1.
If there is no matching column in file1, some default index value and the column name should be returned.
Output:
1,column1
3,column3
-1,column5
Can anyone help me do this in a Unix script?
Thanks,
William R
awk:
awk -F, 'NR==FNR{a[$2]=1; next;} ($2 in a)' file2 file1
grep+process substitution:
grep -f <(cut -d, -f2 file2) file1
EDIT for updated question:
awk:
awk -F, 'NR==FNR{a[$2]=$1;next} {if ($2 in a) print a[$2]","$2; else print "-1," $2}' file1 file2
# if match found in file1, print the index, else print -1
# (Also note that the input file order is reversed in this command, compared to earlier awk.)
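With the sample files from the question, this should print the desired output:
$ awk -F, 'NR==FNR{a[$2]=$1;next} {if ($2 in a) print a[$2]","$2; else print "-1," $2}' file1 file2
1,column1
3,column3
-1,column5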
grep:
cp file1 tmpfile #get original file
grep -f <(cut -d, -f2 file1) -v file2 | sed 's/.*,/-1,/' >> tmpfile #append missing entries
grep -f <(cut -d, -f2 file2) tmpfile # grep in this tmpfile
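Assuming the sample files above, the final grep should then print:
1,column1
3,column3
-1,column5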
