How can "squeeze-repeated" words? - linux

How can "squeeze-repeated" words?
similar to "squeeze repeated characters" with tr -s ''
I would like to change for example:
hello.hello.hello.hello
to
hello
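For reference, tr -s only squeezes runs of repeated characters, not whole words, which is why an awk approach is needed; a quick illustration of the character-level behaviour (example commands, not from the original post):
$ echo 'aaabbbccc' | tr -s 'abc'
abc
$ echo 'hello.hello.hello.hello' | tr -s 'helo.'
helo.helo.helo.helo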

This is one way:
$ cat a
hello hello bye but bye yeah
hello yeah
$ awk 'BEGIN{OFS=FS=" "}
  {
    for (i=1; i<=NF; i++) {
      if (!($i in a)) {printf "%s%s",$i,OFS; a[$i]=$i}
    }
    delete a
    print ""
  }' a
hello bye but yeah
hello yeah
You can change the field separator:
$ cat a
hello|hello|bye|but|bye|yeah
hello|yeah
$ awk 'BEGIN{OFS=FS="|"} {for (i=1; i<=NF; i++) {if (!($i in a)) {printf "%s%s",$i,OFS; a[$i]=$i}}; delete a; print ""}' a
hello|bye|but|yeah|
hello|yeah|
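Note that building the line with printf "%s%s",$i,OFS leaves a trailing separator, as seen above. A variant that collects the unique fields first and prints them in one go avoids that (a sketch along the same lines, not from the original answer):
$ awk 'BEGIN{OFS=FS="|"}
  {
    delete a
    out = ""
    for (i=1; i<=NF; i++)
      if (!($i in a)) { a[$i]=1; out = (out == "" ? $i : out OFS $i) }
    print out
  }' a
hello|bye|but|yeah
hello|yeah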

Related

Adding double quotes around non-numeric columns by awk

I have a file like this:
2018-01-02;1.5;abcd;111
2018-01-04;2.75;efgh;222
2018-01-07;5.25;lmno;333
2018-01-09;1.25;prs;444
I'd like to add double quotes around the non-numeric columns, so the new file should look like:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
I tried this so far; I know that this is not the correct way:
head myfile.csv -n 4 | awk 'BEGIN{FS=OFS=";"} {gsub($1,echo $1 ,$1)} 1' | awk 'BEGIN{FS=OFS=";"} {gsub($3,echo "\"" $3 "\"",$3)} 1'
Thanks in advance.
You may use this awk, which sets ; as the input/output delimiter and then wraps each field in double quotes if that field is non-numeric:
awk '
BEGIN {
FS = OFS = ";"
}
{
for (i=1; i<=NF; ++i)
$i = ($i+0 == $i ? $i : "\"" $i "\"")
} 1' file
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
Alternative gnu-awk solution:
awk -v RS='[;\n]' '$0+0 != $0 {$0 = "\"" $0 "\""} {ORS=RT} 1' file
Using GNU awk and typeof(): fields that are numeric strings have the strnum attribute; otherwise, they have the string attribute.
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
$i=sprintf("\"%s\"",$i)
}1' file
Some output:
"2018-01-02";1.5;"abcd";111
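A quick way to see which attribute gawk assigns to each field of the sample data (a sketch, GNU awk 4.2 or later, not part of the original answer):
$ gawk -F';' '{ for (i=1; i<=NF; i++) printf "%s -> %s\n", $i, typeof($i) }' <<< '2018-01-02;1.5;abcd;111'
2018-01-02 -> string
1.5 -> strnum
abcd -> string
111 -> strnum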
Edit:
If some of the fields are already quoted:
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
gsub(/^"?|"?$/,"\"",$i)
}1' <<< 'string;123;"quoted string"'
Output:
"string";123;"quoted string"
Further enhancing anubhava's solution (including handling fields that are already double-quoted):
gawk -e 'sub(".+",$-_==+$-_?"&":(_)"&"_\
)^gsub((_)_, _)^(ORS = RT)' RS='[;\n]' \_='\42'
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
"2018-01-09";1.25;"prs";111111111111111111112222222222
222222223333333333333333333333
333344444444444444444499999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999991111111111111111111
122222222222222222233333333333
333333333333333444444444444444
444999999999999991111111111111
111111122222222222222222233333
333333333333333333333444444444
444444444999999999999991111111
111111111111122222222222222222
233333333333333333333333333444
444444444444444999999999999991
111111111111111111122222222222
222222233333333333333333333333
333444444444444444444999999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999999

Awk script to sum multiple column if value in column1 is duplicate

I need your help to resolve the query below.
I want to sum the values of columns 3, 5, 6, 7, 9 and 10 whenever the value in column 1 is duplicated.
I also need to collapse the duplicate rows into a single row in the output file, and to repeat the value of column 1 as an extra column at the end of each output row (column k in the expected output below).
input file
a|b|c|d|e|f|g|h|i|j
IN27201800024099|a|2.01|ad|5|56|6|rr|1|5
IN27201800023963|b|3|4|rt|67|6|61|ty|6
IN27201800024099|a|4|87|ad|5|6|1|rr|7.45
IN27201800024099|a|5|98|ad|5|6|1|rr|8
IN27201800023963|b|7|7|rt|5|5|1|ty|56
IN27201800024098|f|80|67|ty|6|6|1|4rght|765
output file
a|b|c|d|e|f|g|h|i|j|k
IN27201800024099|a|11.01|190|ad|66|18|3|rr|20.45|IN27201800024099
IN27201800023963|b|10|11|rt|72|11|62|ty|62|IN27201800023963
IN27201800024098|f|80|67|ty|6|6|1|4rght|765|IN27201800024098
I tried the code below, but it is not working, and I have no clue how to complete it to get the correct output:
awk 'BEGIN {FS=OFS="|"} FNR==1 {a[$1]+= (f3[key]+=$3;f5[key]+=$5;f6[key]+=$6;f7[key]+=$7;f9[key]+=$9;f10[key]+=$10;)} input.txt > output.txt
$ cat tst.awk
BEGIN {
FS=OFS="|"
}
NR==1 {
print $0, "h"
next
}
{
keys[$1]
for (i=2; i<=NF; i++) {
sum[$1,i] += $i
}
}
END {
for (key in keys) {
printf "%s", key
for (i=2; i<=NF; i++) {
printf "%s%s", OFS, sum[key,i]
}
print OFS key
}
}
$ awk -f tst.awk file
a|b|c|d|e|f|g|h
IN27201800023963|10|11|72|11|62|62|IN27201800023963
IN27201800024098|80|67|6|0|1|765|IN27201800024098
IN27201800024099|11.01|190|66|18|3|20.45|IN27201800024099
The above outputs the lines in random order. If you want them output in the same order in which the key values were first read, it takes just a couple more lines of code:
$ cat tst.awk
BEGIN {
FS=OFS="|"
}
NR==1 {
print $0, "h"
next
}
!seen[$1]++ {
keys[++numKeys] = $1
}
{
for (i=2; i<=NF; i++) {
sum[$1,i] += $i
}
}
END {
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "%s", key
for (i=2; i<=NF; i++) {
printf "%s%s", OFS, sum[key,i]
}
print OFS key
}
}
$ awk -f tst.awk file
a|b|c|d|e|f|g|h
IN27201800024099|11.01|190|66|18|3|20.45|IN27201800024099
IN27201800023963|10|11|72|11|62|62|IN27201800023963
IN27201800024098|80|67|6|0|1|765|IN27201800024098
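To write the result straight to a file, as the question's own attempt does, the script can simply be redirected (file names taken from the question):
$ awk -f tst.awk input.txt > output.txt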

Merge two files using awk in linux

I have a 1.txt file:
betomak#msn.com||o||0174686211||o||7880291304ca0404f4dac3dc205f1adf||o||Mario||o||Mario||o||Kawati
zizipi#libero.it||o||174732943.0174732943||o||e10adc3949ba59abbe56e057f20f883e||o||Tiziano||o||Tiziano||o||D'Intino
frankmel#hotmail.de||o||0174844404||o||8d496ce08a7ecef4721973cb9f777307||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||536c1287d2dc086030497d1b8ea7a175||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||9893ac33a018e8d37e68c66cae23040e||o||Nabile||o||Nabile||o||Nassime
donaldduck#yahoo.com||o||174912161.0174912161||o||0c770713436695c18a7939ad82bc8351||o||Donald||o||Donald||o||Duck
cernakova#centrum.cz||o||0174991962||o||d161dc716be5daf1649472ddf9e343e6||o||Dagmar||o||Dagmar||o||Cernakova
trgsrl#tiscali.it||o||0175099675||o||d26005df3e5b416d6a39cc5bcfdef42b||o||Esmeralda||o||Esmeralda||o||Trogu
catherinesou#yahoo.fr||o||0175128896||o||2e9ce84389c3e2c003fd42bae3c49d12||o||Cat||o||Cat||o||Sou
ermimurati24#hotmail.com||o||0175228687||o||a7766a502e4f598c9ddb3a821bc02159||o||Anna||o||Anna||o||Beratsja
cece_89#live.fr||o||0175306898||o||297642a68e4e0b79fca312ac072a9d41||o||Celine||o||Celine||o||Jacinto
kendinegel39#hotmail.com||o||0175410459||o||a6565ca2bc8887cde5e0a9819d9a8ee9||o||Adem||o||Adem||o||Bulut
A 2.txt file:
9893ac33a018e8d37e68c66cae23040e:134:#a1
536c1287d2dc086030497d1b8ea7a175:~~#!:/92\
8d496ce08a7ecef4721973cb9f777307:demodemo
The FS for 1.txt is "||o||" and for 2.txt it is ":".
I want to merge the two files into a single file, result.txt, on the condition that the 3rd column of 1.txt matches the 1st column of 2.txt; the matched value should then be replaced by the 2nd column of 2.txt.
The expected output will contain all the matching lines; here is one of them:
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
I tried the script:
awk -F"||o||" 'NR==FNR{s=$0; sub(/:[^:]*$/, "", s); a[s]=$NF;next} {s = $5; for (i=6; i<=NF; ++i) s = s "," $i; if (s in a) { NF = 5; $5=a[s]; print } }' FS=: <(tr -d '\r' < 2.txt) FS="||o||" OFS="||o||" <(tr -d '\r' < 1.txt) > result.txt
But I am getting an empty file as the result. Any help would be highly appreciated.
If your actual input files are the same as the samples shown, then the following awk may help:
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
next
}
($1 in a){
print a[$1] s1 $2 FS $3 s1 b[$1]
}
' FS="|" 1.txt FS=":" 2.txt
EDIT: Since the OP has changed the requirement a bit, here is code for the new ask. It also creates two extra files: one with the ids present in 1.txt but NOT in 2.txt, and another the other way around.
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
c[$9]=$0;
next
}
($1 in a){
val=$1;
$1="";
sub(/:/,"");
print a[val] s1 $0 s1 b[val];
d[val]=$0;
next
}
{
print > "NOT_present_in_2.txt"
}
END{
for(i in d){
delete c[i]
};
for(j in c){
print j,c[j] > "NOT_present_in_1.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt
You can use this awk to get your output:
awk -F ':' 'NR==FNR{a[$1]=$2 FS $3; next} FNR==1{FS=OFS="||o||"; gsub(/[|]/, "\\\\&", FS)}
$3 in a{$3=a[$3]; print}' file2 file1 > result.txt
cat result.txt
frankmel#hotmail.de||o||0174844404||o||demodemo:||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||~~#!:/92\||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
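A note on the gsub(/[|]/, "\\\\&", FS) step above: once FS is more than one character it is treated as a regular expression, so each literal | has to be escaped, while OFS stays as the literal separator. This is what the substitution turns the separator into (a sketch, not from the original answer):
$ awk 'BEGIN{ fs = "||o||"; gsub(/[|]/, "\\\\&", fs); print fs }'
\|\|o\|\|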

Storing command output lines into array based on new line character

I have a variable as below, and I perform certain operations on it to print the output line by line:
a="My name is A. Her Name is B. His Name is C"
echo "$a" | awk -F '[nN]ame |\\.' '{for (i=2; i<=NF; i+=2) print $i}'
The output is:
is A
is B
is C
When I store the results in an array, it treats whitespace as the separator between elements. But I want each line of the output stored under its own array index, as below:
x=($(awk -F '[nN]ame |\\.' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$a"))
This outputs:
${x[0]} = is
${x[1]} = A
..and so on...
What I expect is:
${x[0]} = is A
${x[1]} = is B
${x[2]} = is C
Also, echo ${#x[@]} gives 6; it should be 3.
OK try below:
i=0
while read v; do
x[i]="$v"
(( i++ ))
done < <(awk -F '[nN]ame |\\.' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$a")
You can also use the mapfile command (bash version 4 or higher):
tempX=$(awk -F '[nN]ame |\\.' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$a")
mapfile -t x <<< "$tempX"
~$ echo "${x[0]}"
is A
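The temporary variable can also be dropped by feeding mapfile from a process substitution (bash 4+; a sketch equivalent to the answer above):
$ mapfile -t x < <(awk -F '[nN]ame |\\.' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$a")
$ echo "${#x[@]}"
3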

Returning a column with grep

I'm attempting to search through a file and return a particular column based on whether a particular value is present in the column. For example, if I search for "Red" in the file:
One Two Three
Cat Dog Chicken
Blue Black Red
Blah Blah Blah
I want returned:
Three
Chicken
Red
Blah
I would even accept just knowing which column grep or any other search command found a match in, so I could use cut, but I can't even find that much.
This is one way:
Store all the data in the matrix a[line][column]. Save the column number in p. Finally print all the items a[line][p].
$ awk -v text=Blue '{for (i=1; i<=NF; i++) {a[NR,i]=$i; if ($i~text) {p=i}}} END{ for (i=1; i<=NR; i++) print a[i,p]}' a
One
Cat
Blue
Blah
$ awk -v text=Red '{for (i=1; i<=NF; i++) {a[NR,i]=$i; if ($i~text) {p=i}}} END{ for (i=1; i<=NR; i++) print a[i,p]}' a
Three
Chicken
Red
Blah
Update
To have exact matches, replace ~ with == (thanks konsolebox):
awk -v text=Blue '{for (i=1; i<=NF; i++) {a[NR,i]=$i; if ($i==text) {p=i}}} END{ for (i=1; i<=NR; i++) print a[i,p]}' a
One possibility, depending on how you respond to the questions I posted in my comment:
awk -v tgt="Red" '
NR==FNR {for (i=1;i<=NF;i++) if ($i==tgt) cols[i]; next}
{sep=""; for (i=1;i<=NF;i++) if (i in cols) {printf "%s%s", sep, $i; sep=OFS}; print ""}
' file file
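Run against the sample file from the question (saved as file) with tgt="Red", this two-pass version should print the whole matched column:
Three
Chicken
Red
Blah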
