Merge prefix and suffix from 2 files - string

I would like to merge 2 files:
> cat file1.txt
string1:suffix1
string2:suffix2
> cat file2.txt
prefix1:string1
prefix2:string2
into:
> cat result.txt
prefix1:string1:suffix1
prefix2:string2:suffix2
How can I use awk (or a similar tool) to do that?
Thanks a lot!

Store each suffix from file1 in an array keyed by its string, then print every line of file2 with the matching suffix appended:
$ awk -F: 'NR==FNR {a[$1]=$2; next}
{print $0 FS a[$2]}' file1 file2
prefix1:string1:suffix1
prefix2:string2:suffix2
Or, if the files are already line-aligned:
$ paste -d: file2 <(cut -d: -f2 file1)
prefix1:string1:suffix1
prefix2:string2:suffix2

awk 'BEGIN {OFS=":"}{ getline line < "file1.txt" ;split(line, a, ":");print $1,a[2];} ' file2.txt
where:
This [ {OFS=":"} ] sets the separator used when joining the pieces from the two files; if you used a space instead, you would get output like this:
prefix1:string1 suffix1
prefix2:string2 suffix2
This [ getline line < "file1.txt" ] reads the next line from the first file.
This [ split(line, a, ":") ] splits that line on the colon and stores the pieces in array a.
This [ print $1 ] prints the whole line of file2.txt (the line contains no whitespace, so with the default FS the first field is the entire line).
This [ a[2] ] prints the 2nd element of array a, i.e. the suffix read from the first file.
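With the two sample files from the question this should produce the requested result (a quick check; note that, unlike the array-based answer above, this getline approach relies on both files being in the same line order):
$ awk 'BEGIN {OFS=":"}{ getline line < "file1.txt" ;split(line, a, ":");print $1,a[2];} ' file2.txt
prefix1:string1:suffix1
prefix2:string2:suffix2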

Compare two files and store differences using conditional

I managed to find half of the solution to my challenge, but I cannot find a way to add a conditional to deal with the other half. I am using awk. The field separator is ; and the values are enclosed in double quotes ". The files have only 3 fields each.
I have two files (file1.txt file2.txt) and want to store the differences in a third file(results.txt).
file1.txt
"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"
file2.txt
"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
If I use:
awk -F';' 'FNR==NR {a[$0];next} !($0 in a)' file1.txt file2.txt
I get:
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
But I do not want to treat " in $2 of file2.txt versus rack1 in $2 of file1.txt as a difference between the files. So whenever an entry in file2.txt has " in field $2 while the entry with the same $1 in file1.txt has rack1 in field $2, I want to discard it instead of reporting it as a difference.
The files are generated dynamically every night, and when this happens field $2==rack1 in file1.txt while field $2==" in file2.txt. This is the match to exclude, in addition to the ones I already managed to exclude with the awk command above. Below is the expected output:
Desired results.txt
"SWITCH51";"rack7";"Datacenter2"
I am struggling to find a conditional to handle this scenario.
You could store the original lines in array a, like you do, plus modified lines where "rack1" is replaced by ":
$ awk -F';' -vOFS=';' 'FNR==NR {a[$0]; if($2=="\"rack1\"") {$2="\"";a[$0]}; next}
!($0 in a)' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
Note the specification of the OFS output field separator. It is needed because, when we modify the $2 field, awk reconstructs $0 using OFS, which by default is a space, while we need it to remain a semicolon for a correct comparison when parsing file2.txt.
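A minimal sketch of that rebuild behaviour, using one row from the sample data:
$ echo '"ROUTER22";"rack1";"Datacenter4"' | awk -F';' '{$2="\""; print}'
"ROUTER22" " "Datacenter4"
$ echo '"ROUTER22";"rack1";"Datacenter4"' | awk -F';' -v OFS=';' '{$2="\""; print}'
"ROUTER22";";"Datacenter4"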
You could check whether the value of field 2 is just " and replace it with "rack1".
If, after the replacement, $0 is not in array a, print the unmodified row, which is kept in the tmp variable in the example.
awk '
BEGIN{FS=OFS=";"}
FNR==NR {a[$0];next}
{
tmp = $0
sub(/^"$/, "\"rack1\"", $2)
if (!($0 in a)) print tmp
}
' file1.txt file2.txt
Output
"SWITCH51";"rack7";"Datacenter2"
Based on your shown samples, please try the following awk code. A simple explanation: while reading the first Input_file, create two arrays, a and b, indexed by $0 and by $1,$3 respectively. While reading the next Input_file, check two conditions: if $1,$3 is NOT present in b AND $0 is NOT present in a, then print that line of Input_file2.
awk -F';' '
FNR==NR{
a[$0]
b[$1,$3]
next
}
!(($1,$3) in b) && !($0 in a)
' file1.txt file2.txt
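With the posted samples, this should again leave only the genuinely new line:
"SWITCH51";"rack7";"Datacenter2"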
Another option is to normalise $2 while reading file2.txt: build a key in which a bare " is mapped back to "rack1", and print the line only when that key is not in file1's set:
awk -F';' '
NR==FNR { a[$0]; next }
{ key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
!(key in a)
' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"

Run query in Linux for selecting CSVs

In Linux:
there are many .csv files in the folder; I have to select those csv files having column name {'PREDICT' = 646}.
check this link:
https://prnt.sc/gone85
what kind of query works?
Providing test data, since none was provided:
$ cat > file1
ACTUAL PREDICT
1 2
3 646
$ cat > file2
ACTUAL PREDICT
1 2
3 666
Then some GNU awk (using nextfile) to select those csv files having column name {'PREDICT' = 646}, i.e. files where the PREDICT column has the value 646:
$ awk 'FNR==1{for(i=1;i<=NF;i++)if($i=="PREDICT")p=i}$p==646{print FILENAME;nextfile}' file1 file2
file1
Explained:
awk '
FNR==1 { # get the column number of PREDICT column for each file
for(i=1;i<=NF;i++)
if($i=="PREDICT")
p=i # set it to p
}
$p==646 { # if p==646, we have a match
print FILENAME # print the filename
nextfile # and move on to the next file
}' file1 file2 # all the candidate files
GNU awk solution without a loop:
$ cat tst.awk
BEGIN{FS=","}
FNR==1 && s=substr($0,1,index($0,"PREDICT")) { # take the header substring up to "PREDICT"
    i=gsub(/,/, "", s) + 1                     # count how many "," precede it: that is the field number
}
s && $i==646 { print FILENAME; nextfile }
some input:
$ cat file1.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
$ cat file2.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,111,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
and:
$ cp file1.csv file3.csv
gives:
$ awk -f tst.awk *.csv
file1.csv
file3.csv
Or use a one-liner:
$ awk -F, 'FNR==1 && s=substr($0,1,index($0,"PREDICT")) {i=gsub(/,/, "", s) + 1} s && $i==646 { print FILENAME; nextfile }' *.csv
file1.csv
file3.csv

Passing two variables instead of two files to awk

Assume two multi-line text files that are dynamically generated during execution of a bash shell script: file1 and file2
$ echo -e "foo-bar\nbar-baz\nbaz-qux" > file1
$ cat file1
foo-bar
bar-baz
baz-qux
$ echo -e "foo\nbar\nbaz" > file2
$ cat file2
foo
bar
baz
Further assume that I wish to use awk to perform an operation on the text strings of both files. For example:
$ awk 'NR==FNR{var1=$1;next} {print $var1"-"$1}' FS='-' file1 FS=' ' file2
Is there any way that I can skip having to save the text strings as files in my script and, instead, pass along the text strings to awk as variables (or as here-strings or the like)?
Something along the lines of:
$ var1=$(echo -e "foo-bar\nbar-baz\nbaz-qux")
$ var2=$(echo -e "foo\nbar\nbaz")
$ awk 'NR==FNR{var1=$1;next} {print $var1"-"$1}' FS='-' "$var1" FS=' ' "$var2"
# awk: fatal: cannot open file `foo-bar
# bar-baz
# baz-qux' for reading (No such file or directory)
$ awk '{print FILENAME, FNR, $0}' <(echo 'foo') <(echo 'bar')
/dev/fd/63 1 foo
/dev/fd/62 1 bar
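The same idea applies to the variables from the question: feed each one through printf in a process substitution so awk sees it as a regular file. A sketch, reusing the question's command unchanged (a here-string could replace at most one of the two inputs, since it only feeds standard input):
$ var1=$(echo -e "foo-bar\nbar-baz\nbaz-qux")
$ var2=$(echo -e "foo\nbar\nbaz")
$ awk 'NR==FNR{var1=$1;next} {print $var1"-"$1}' FS='-' <(printf '%s\n' "$var1") FS=' ' <(printf '%s\n' "$var2")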

awk add string to each line except last blank line

I have a file with a blank line at the end. I need to add a suffix to each line except the last blank line.
I use:
awk '$0=$0"suffix"' | sed 's/^suffix$//'
But maybe it can be done without sed?
UPDATE:
I want to skip all lines which contain only the '\n' symbol.
EXAMPLE:
I have file test.tsv:
a\tb\t1\n
\t\t\n
c\td\t2\n
\n
I run cat test.tsv | awk '$0=$0"\t2"' | sed 's/^\t2$//':
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n
It sounds like this is what you need:
awk 'NR>1{print prev "suffix"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "suffix") }' file
The test for NR in the END block is there to avoid printing a blank line for an empty input file. It's untested, of course, since the question originally provided no sample input/output.
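For example, with the tab-separated sample from the update and "\t2" as the suffix, this should give the expected output (\t and \n shown literally, as in the question):
$ awk 'NR>1{print prev "\t2"} {prev=$0} END{ if (NR) print prev (prev == "" ? "" : "\t2") }' test.tsv
a\tb\t1\t2\n
\t\t\t2\n
c\td\t2\t2\n
\n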
To treat all empty lines the same:
awk '{print $0 (/./ ? "suffix" : "")}' file
#try:
awk 'NF{print $0 "suffix"}' Input_file
this will skip all blank lines
awk 'NF{$0=$0 "suffix"}1' file
to only skip the last line if blank
awk 'NR>1{print p "suffix"} {p=$0} END{print p (NF?"suffix":"") }' file
If perl is okay:
$ cat ip.txt
a b 1
c d 2
$ perl -lpe '$_ .= "\t 2" if !(eof && /^$/)' ip.txt
a b 1 2
2
c d 2 2
$ # no blank line for empty file as well
$ printf '' | perl -lpe '$_ .= "\t 2" if !(eof && /^$/)'
$
-l strips newline from input, adds back when line is printed at end of code due to -p option
eof to check end of file
/^$/ blank line
$_ .= "\t 2" append to input line
Try this -
$ cat f ### Blank line only at the end of the file
-11.2
hello
$ awk '{print (/./?$0"suffix":"")}' f
-11.2suffix
hellosuffix
$
OR
$ cat f #### blank line in the middle and at the end of the file
-11.2

hello
$ awk -v val=$(wc -l < f) '{print (/./ || NR!=val?$0"suffix":"")}' f
-11.2suffix
suffix
hellosuffix
$

Finding common column positions between files

I need to find the difference between two files in Unix:
File 1:
1,column1
2,column2
3,column3
File 2:
1,column1
2,column3
3,column5
I need to find the position of each common column of file 2 in file 1.
If there is no matching column in file1, some default index value and the column name should be returned.
Output:
1,column1
3,column3
-1,column5
Can anyone help me do this in a Unix script?
Thanks,
William R
awk:
awk -F, 'NR==FNR{a[$2]=1; next;} ($2 in a)' file2 file1
grep+process substitution:
grep -f <(cut -d, -f2 file2) file1
EDIT for updated question:
awk:
awk -F, 'NR==FNR{a[$2]=$1;next} {if ($2 in a) print a[$2]","$2; else print "-1," $2}' file1 file2
# if match found in file1, print the index, else print -1
# (Also note that the input file order is reversed in this command, compared to earlier awk.)
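With the sample files from the question, this should print the desired output:
$ awk -F, 'NR==FNR{a[$2]=$1;next} {if ($2 in a) print a[$2]","$2; else print "-1," $2}' file1 file2
1,column1
3,column3
-1,column5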
grep:
cp file1 tmpfile #get original file
grep -f <(cut -d, -f2 file1) -v file2 | sed 's/.*,/-1,/' >> tmpfile #append missing entries
grep -f <(cut -d, -f2 file2) tmpfile # grep in this tmpfile
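Assuming the sample files above, the final grep should then print:
1,column1
3,column3
-1,column5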
