Multiplication of lines in bash - linux

I want to make something like multiplication:
File1:
aa
bb
File2:
cc
dd
File3:
eee
fff
ggg
I want a result like:
aa cc eee
aa cc fff
aa cc ggg
bb dd eee
bb dd fff
bb dd ggg
The first elements of File1 and File2 should be combined with every element of File3, and likewise the second elements of File1 and File2 should be combined with every element of File3.

This would work:
$ join -j 9999 <(paste file1 file2) file3
aa cc eee
aa cc fff
aa cc ggg
bb dd eee
bb dd fff
bb dd ggg
It joins on a non-existing field (field 9999), which creates the Cartesian product of the input files. For the input files, paste file1 file2 combines the first two files into one, and join uses process substitution.
A slight snag is that a leading space is introduced at the start of each line (the empty join field is printed first); to get rid of that, you can pipe to sed:
join -j 9999 <(paste file1 file2) file3 | sed 's/^ //'
or specify an output format:
join -j 9999 -o 1.1,1.2,2.1 <(paste file1 file2) file3

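You could use a nested loop. Reading line by line with while read keeps each "aa cc" pair together; a plain for over $(paste ...) would split it into separate words.
while IFS= read -r ab; do
  while IFS= read -r c; do
    echo "$ab $c"
  done < File3
done < <(paste -d ' ' File1 File2)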
It doesn’t scale, obviously, but it may be enough for your use case.
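As a further option, the same pairing can be sketched with paste and awk, which avoids both the non-existent join field and the shell loop. The function name cross_lines is made up for this illustration, and the sketch assumes the third file is not empty (the NR == FNR test would otherwise misfire).

```shell
# cross_lines A B C (hypothetical helper name):
# pair line i of A with line i of B, then combine each pair with every line of C.
cross_lines() {
    paste -d ' ' "$1" "$2" |
        awk 'NR == FNR { third[++n] = $0; next }   # first input (C): remember its lines
             { for (i = 1; i <= n; i++) print $0, third[i] }' "$3" -
}
```

Calling cross_lines File1 File2 File3 prints one line per (pair, File3 line) combination, in File1 order.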

Related

shell duplicate spaces in file

Is it possible to remove multiple spaces from a text file and save the changes in the same file using awk or grep?
Input example:
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
Simply assign $1 to itself. This makes awk rebuild the line with OFS (a single space by default) between fields, which collapses every run of spaces and drops leading and trailing ones.
awk '{$1=$1} 1' Input_file
EDIT: Since the OP asked how to keep the leading spaces and squeeze only the rest, try the following.
awk '
match($0,/^ +/){
spaces=substr($0,RSTART,RLENGTH)
}
{
$1=$1
$1=spaces $1
spaces=""
}
1
' Input_file
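Neither awk command above changes the file itself; both write to standard output. A minimal sketch for saving the result back into the same file, assuming mktemp is available, goes through a temporary file. The function name squeeze_spaces_inplace is hypothetical.

```shell
# squeeze_spaces_inplace FILE (hypothetical helper name):
# collapse runs of whitespace in FILE and write the result back to FILE.
squeeze_spaces_inplace() {
    tmp=$(mktemp) || return 1
    # Write to a temporary file first; redirecting straight to "$1" would
    # truncate the input before awk reads it.
    awk '{ $1 = $1 } 1' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

Copying the contents back with cat, rather than moving the temporary file over the original, keeps the original file's permissions.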
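Using sed (with -i the file must be given as an argument, not redirected to standard input):
sed -i -E 's#[[:space:]]+# #g' input_file
To also remove the single space left at the start of a line:
sed -i -E 's#[[:space:]]+# #g; s#^ ##' input_file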
Demo:
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$ sed -i -E 's#[[:space:]]+# #g' test.txt
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$

How to compare two columns in same file and store the difference in new file with the unchanged column according to it?

Row Actual Expected
1 AAA BBB
2 CCC CCC
3 DDD EEE
4 FFF GGG
5 HHH HHH
I want to compare actual and expected and store the difference in a file. Like
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
I have used awk -F, '{if ($2!=$3) {print $1,$2,$3}}' Sample.csv, but it seems to compare only integer values, not string values.
You can use AWK to do this
awk '{if($2!=$3) print $0}' oldfile > newfile
where
$2 and $3 are second and third columns
!= means the second and third columns do not match
$0 means whole line
> newfile redirects to new file
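One detail worth knowing: awk compares two fields numerically when both look like numbers, so 1.0 and 1 would count as equal. A small sketch that forces a string comparison and always keeps the header row is shown here; the function name mismatched_rows is made up, and the input is assumed to be whitespace-separated.

```shell
# mismatched_rows FILE (hypothetical helper name):
# print the header line plus every row whose 2nd and 3rd columns differ as strings.
mismatched_rows() {
    # Appending "" to a field forces awk to compare it as a string.
    awk 'NR == 1 || ($2 "") != ($3 "")' "$1"
}
```

Usage would be along the lines of mismatched_rows Sample.csv > newfile.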
I prefer an awk solution (can handle more fields and easier to understand), but you could use
sed -r '/\t([^ ]*)\t\1$/d' Sample.csv
Assuming the file uses tab or some other delimiter to separate the columns, then tsv-filter from eBay's TSV Utilities supports this type of field comparison directly. For the file above:
$ tsv-filter --header --ff-str-ne 2:3 file.tsv
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
The --ff-str-ne option compares two fields in a row for non-equal strings.
Disclaimer: I'm the author.

How can I join consecutive non-empty lines using sed/awk?

How can I join consecutive non-empty lines into a single line using sed or awk?
An example is given of what I am trying to do.
Input:
aaa ff gg
bbb eee eee
ss gg dd

aaa ff gg
bbb eee eee
ss gg dd

aaa ff gg
bbb eee eee
ss gg dd
Converts to
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
Not sure if you REALLY want a blank line between each data line or not so here's both:
$ awk -v RS= '{$1=$1}1' file
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
$ awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
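A middle ground between the two commands above is to print a blank line between the joined lines but not after the last one. The following is a minimal sketch of that idea; the function name join_paragraphs is hypothetical, and it relies on awk's paragraph mode (RS set to the empty string), in which blank lines separate records and newlines act as field separators.

```shell
# join_paragraphs FILE (hypothetical helper name):
# join each block of consecutive non-empty lines into one line,
# with a single blank line between blocks and none at the end.
join_paragraphs() {
    awk -v RS= '{
        $1 = $1                     # rebuild the record with single spaces
        if (NR > 1) print ""        # blank separator before every block but the first
        print
    }' "$1"
}
```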
This might work for you (GNU sed):
sed ':a;N;/\n$/!s/\n/ /;ta' file
Unless the last line appended is empty, replace a newline by a space and repeat. Otherwise print and repeat.
If you want empty lines deleted, then:
sed ':a;N;/\n$/!s/\n/ /;ta;P;d' file
If perl is okay:
$ perl -00 -pe 's/\n(?!$)/ /g' ip.txt
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
-00 read input in paragraph mode
See http://perldoc.perl.org/perlrun.html#Command-Switches for more info and for -pe options
use perl -i -00 -pe for inplace editing
s/\n(?!$)/ /g replace all newlines except the one from blank line with space
@Schon, try:
awk '{ORS=/^$/?RS RS:FS} {$1=$1} 1;END{print RS}' Input_file
EDIT: Adding an explanation.
awk '{
ORS= ##### Set the output record separator (ORS) for this line.
/^$/ ##### Condition: the current line is empty.
? ##### If the condition is TRUE, use the value that follows.
RS RS ##### Two copies of RS, i.e. two newlines (RS defaults to a newline).
: ##### Otherwise (condition FALSE), use the value that follows.
FS} ##### FS, the field separator, which defaults to a single space.
{$1=$1} ##### Reassign the first field so the line is rebuilt with single spaces between fields.
1; ##### A TRUE condition with no action, so the default action prints the current line followed by ORS.
END
{print RS} ##### Finally print RS, which is a newline, at the end.
' Input_file ##### The input file.
A more readable example, less Perl-like:
awk '{ if ($0 == "") { print line "\n"; line = "" } else line = (line == "" ? "" : line " ") $0 } END { if (line != "") print line }' file

compare columns from different files and print those that DO NOT match

I have two files, file1 and file2. I want to compare several columns - $1,$2 ,$3 and $4 of file1 with several columns $1,$2, $3 and $4 of file2 and print those rows of file2 that do not match any row in file1.
E.g.
file1
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
file2
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
I want to have as output:
mmm nnn ooo 1 d e
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
I have seen questions asked here for finding those that do match and printing them, but not vice versa: those that DO NOT match.
Thank you!
Use the following script:
awk '{k=$1 FS $2 FS $3 FS $4} NR==FNR{a[k]; next} !(k in a)' file1 file2
k is the concatenated value of the columns 1, 2, 3 and 4, delimited by FS (see comments), and will be used as a key in a search array a later. NR==FNR is true while reading file1. I'm creating the array a indexed by k while reading file1.
For the remaining lines of input (those of file2), I check with !(k in a) whether the index does not exist in a. If that evaluates to true, awk prints the line.
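The same idea can be generalized so that the number of leading key columns is a parameter. This is only a sketch: the function name anti_join and its argument order are invented for the example, and like the original it assumes file1 is not empty (otherwise NR == FNR would also be true for file2).

```shell
# anti_join N FILE1 FILE2 (hypothetical helper name):
# print the rows of FILE2 whose first N columns match no row of FILE1.
anti_join() {
    awk -v n="$1" '
        # Build the lookup key from the first n fields, joined with SUBSEP.
        { key = ""; for (i = 1; i <= n; i++) key = key SUBSEP $i }
        NR == FNR { seen[key]; next }   # first file: remember its keys
        !(key in seen)                  # second file: print rows with unseen keys
    ' "$2" "$3"
}
```

With the sample data, anti_join 4 file1 file2 would play the role of the command above.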
Here is another approach, usable if you know a character that does not occur in the data. join needs sorted input, so the helper sorts each file after building the key (the 4g flag is a GNU sed extension):
$ f(){ sed 's/ /~/g;s/~/ /4g' "$1" | sort; }; join -v2 <(f file1) <(f file2) |
sed 's/~/ /g'
mmm nnn ooo 1 d e
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
This creates a key field by concatenating the first four fields (with a ~ character, but any unused character will do), uses join to find the unmatched entries from file2, and then splits the synthetic key back into fields.
However, the best way is to use the awk solution with a slight fix:
$ awk 'NR==FNR{a[$1,$2,$3,$4]; next} !(($1,$2,$3,$4) in a)' file1 file2
No doubt the awk solution from @hek2mgl is better than this one, but for information, this is also possible using uniq, sort, and rev:
rev file1 file2 | sort -k3 | uniq -u -f2 | rev
rev reverses each line of both files character by character, so the last columns come first.
sort -k3 sorts the lines, skipping the first two columns.
uniq -u -f2 prints only lines that are unique (skipping the first two columns while comparing).
Finally, rev reverses the lines back.
This solution sorts the lines of both files. That might be desired or not.

Delete whole line NOT containing given string

Is there a way to delete the whole line if it contains specific word using sed? i.e.
I have the following:
aaa bbb ccc
qqq fff yyy
ooo rrr ttt
kkk ccc www
I want to delete lines that contain 'ccc' and leave other lines intact. In this example the output would be:
qqq fff yyy
ooo rrr ttt
All this using sed. Any hints?
sed -n '/ccc/!p'
or
sed '/ccc/d'
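If the string to remove may contain regular-expression metacharacters, a fixed-string match is a safer variant of the same filter. The sketch below steps outside sed and uses grep instead; the function name drop_lines_containing is hypothetical.

```shell
# drop_lines_containing STRING FILE (hypothetical helper name):
# print every line of FILE that does not contain STRING, taken literally.
drop_lines_containing() {
    # -v inverts the match, -F treats STRING as a fixed string rather than a regex,
    # and -- stops option parsing in case STRING starts with a dash.
    grep -vF -- "$1" "$2"
}
```

Note that grep exits with status 1 when every line is removed, which matters in scripts that run under set -e.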
