merge specific line using awk and sed - linux

I want to merge a specific line with the line that follows it.
Input:
AAA
BBB
CCC
DDD
EEE
AAA
BBB
DDD
CCC
EEE
Output should be:
AAA
BBB
CCC DDD
EEE
AAA
BBB
DDD
CCC EEE
I want to search for CCC and merge the next line into it.
I have tried with awk but without success.

Use awk patterns: if the line matches /CCC/, print it followed by a space instead of a newline and skip to the next line. Otherwise the bare pattern 1, which is always true, prints the line unchanged.
awk '/CCC/ { printf("%s ", $0); next } 1' file

Using sed:
sed '/CCC/ { N; s/\n/ / }' file
Using awk:
awk '{ ORS=(/CCC/ ? FS : RS) }1' file
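One edge case the answers above do not cover is a match on the very last line, where the awk versions would end the output with a trailing space and no newline. A minimal sketch of one way to handle it, recreating the question's input in a temporary file (the variable names and the `pending` flag are illustrative, not from the original answers):

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
printf '%s\n' AAA BBB CCC DDD EEE AAA BBB DDD CCC EEE > "$input"

# Join each line matching CCC with the line that follows it.
# "pending" records that a match was printed without its newline,
# so the END block can terminate the output if the match was the last line.
merged=$(awk '/CCC/ { printf "%s ", $0; pending = 1; next }
              { print; pending = 0 }
              END { if (pending) print "" }' "$input")

printf '%s\n' "$merged"
rm -f "$input"
```

On the question's input this should print the requested output, with CCC DDD and CCC EEE each on one line.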

Related

shell duplicate spaces in file

Is it possible to remove multiple spaces from a text file and save the changes in the same file using awk or grep?
Input example:
aaa    bbb   ccc
ddd     yyyy
Output I want:
aaa bbb ccc
ddd yyyy
Simply assign $1 to itself. Modifying a field makes awk rebuild the record with OFS (a single space by default) between fields, which collapses each run of spaces. Leading and trailing spaces are removed as well.
awk '{$1=$1} 1' Input_file
EDIT: Since the OP asked how to keep only the leading spaces, try the following.
awk '
match($0, /^ +/) {
  spaces = substr($0, RSTART, RLENGTH)
}
{
  $1 = $1
  $1 = spaces $1
  spaces = ""
}
1
' Input_file
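The awk commands above print to standard output, while the question asks to save the change in the same file. A minimal sketch of one common way to do that with a temporary file; the sample content and variable names are made up for illustration:

```shell
# Hypothetical sample file with runs of spaces.
file=$(mktemp)
printf 'aaa    bbb   ccc\nddd     yyyy\n' > "$file"

# Redirecting awk's output straight onto its own input would truncate the
# file before awk reads it, so write to a temporary file first and copy it
# back only if awk succeeded. Copying with cat keeps the original file's
# permissions.
tmp=$(mktemp)
awk '{ $1 = $1 } 1' "$file" > "$tmp" && cat "$tmp" > "$file"
rm -f "$tmp"

squeezed=$(cat "$file")
printf '%s\n' "$squeezed"
rm -f "$file"
```

With GNU awk 4.1 or later, gawk -i inplace '{ $1 = $1 } 1' file is another option, but it is not available in other awk implementations.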
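Using sed (the -i option edits the file in place, so the file must be passed as an argument rather than redirected to standard input):
sed -i -E 's#[[:space:]]+# #g' input_file
To also remove the space left at the start of a line:
sed -i -E 's#[[:space:]]+# #g; s#^ ##' input_file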
Demo:
$ cat test.txt
aaa    bbb   ccc
ddd     yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$ sed -i -E 's#[[:space:]]+# #g' test.txt
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$

How to compare two columns in same file and store the difference in new file with the unchanged column according to it?

Row Actual Expected
1 AAA BBB
2 CCC CCC
3 DDD EEE
4 FFF GGG
5 HHH HHH
I want to compare the Actual and Expected columns and store the rows that differ in a file, like this:
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
I have used awk -F, '{if ($2!=$3) {print $1,$2,$3}}' Sample.csv, but it seems to compare only integer values, not string values.
You can use awk to do this:
awk '{if($2!=$3) print $0}' oldfile > newfile
where
$2 and $3 are the second and third columns
!= is true when the second and third columns do not match
$0 means the whole line
> newfile redirects the output to a new file
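On the worry about integer versus string comparison: awk compares two input fields numerically when both look like numbers, and as strings otherwise, so AAA versus BBB is already a string comparison. Appending an empty string forces a string comparison in every case. A small sketch under those assumptions; row 6 is a hypothetical addition to the question's data to show the difference:

```shell
# The question's table plus one hypothetical numeric-looking row (row 6).
data=$(mktemp)
printf '%s\n' 'Row Actual Expected' '1 AAA BBB' '2 CCC CCC' \
  '3 DDD EEE' '4 FFF GGG' '5 HHH HHH' '6 1.0 1' > "$data"

# Plain comparison: fields that both look numeric are compared as numbers,
# so 1.0 and 1 count as equal and row 6 is dropped.
plain=$(awk '$2 != $3' "$data")

# Concatenating "" turns both sides into strings, so row 6 is reported.
forced=$(awk '($2 "") != ($3 "")' "$data")

printf 'plain:\n%s\n\nforced:\n%s\n' "$plain" "$forced"
rm -f "$data"
```

If the real file is comma-separated, as the name Sample.csv suggests, add -F, to both commands; without it the whole line is a single field and $2 and $3 are empty.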
I prefer an awk solution (it can handle more fields and is easier to understand), but if the columns are separated by tabs you could also use GNU sed:
sed -r '/\t([^ ]*)\t\1$/d' Sample.csv
Assuming the file uses tab or some other delimiter to separate the columns, then tsv-filter from eBay's TSV Utilities supports this type of field comparison directly. For the file above:
$ tsv-filter --header --ff-str-ne 2:3 file.tsv
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
The --ff-str-ne option compares two fields in a row for non-equal strings.
Disclaimer: I'm the author.

get paragraph with awk, and start-of-line regexp

I use awk to get paragraphs from a textfile, like so:
awk -v RS='' -v ORS='\n\n' '/pattern/' ./textfile
Say I have the following textfile:
aaa bbb ccc
aaa bbb ccc
aaa bbb ccc

aaa ccc
bbb aaa ccc
bbb aaa ccc

ccc bbb aaa
ccc bbb aaa
ccc bbb aaa
Now I only want the paragraph in which one of the (original) lines starts with "bbb" (hence the second paragraph). However, the regexp anchor ^ no longer works, (I presume) because of RS=''; awk now only matches at the beginning of the paragraph.
Is there another way?
^ means start-of-string, and with RS='' the whole paragraph is one string. You want start-of-line, which is (^|\n), e.g.:
$ awk -v RS='' -v ORS='\n\n' '/(^|\n)bbb/' file
aaa ccc
bbb aaa ccc
bbb aaa ccc
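To see the difference between the two anchors side by side, here is a small self-contained sketch that rebuilds the question's three paragraphs in a temporary file (the variable names are illustrative):

```shell
# Three paragraphs separated by blank lines, as in the question.
text=$(mktemp)
printf '%s\n' 'aaa bbb ccc' 'aaa bbb ccc' 'aaa bbb ccc' '' \
  'aaa ccc' 'bbb aaa ccc' 'bbb aaa ccc' '' \
  'ccc bbb aaa' 'ccc bbb aaa' 'ccc bbb aaa' > "$text"

# In paragraph mode ^ anchors at the start of the whole paragraph,
# and no paragraph begins with bbb, so this selects nothing.
start_of_string=$(awk -v RS='' -v ORS='\n\n' '/^bbb/' "$text")

# (^|\n) matches at the start of the paragraph or right after any newline,
# that is, at the start of every original line.
start_of_line=$(awk -v RS='' -v ORS='\n\n' '/(^|\n)bbb/' "$text")

printf 'start-of-string match: [%s]\n' "$start_of_string"
printf 'start-of-line match:\n%s\n' "$start_of_line"
rm -f "$text"
```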

In Linux command line console, how to get the sub-string from a file?

The content of the file is fixed.
Example:
2016-03-28T00:02 AAA 2016-03-28T00:03 ADASDASD
2016-03-28T00:03 BBB 2016-03-28T00:04 FAFAFDAS
2016-03-28T00:05 CCC 2016-03-28T00:06 SDAFAFAS
....
Which command can I use to get all the sub-strings AAA, BBB, CCC, etc.?
You can use cut, awk, or perl for this.
cat >> file.data << EOF
2016-03-28T00:02 AAA 2016-03-28T00:03 ADASDASD
2016-03-28T00:03 BBB 2016-03-28T00:04 FAFAFDAS
2016-03-28T00:05 CCC 2016-03-28T00:06 SDAFAFAS
EOF
AWK
awk '{ print $2 }' file.data
AAA
BBB
CCC
CUT
cut -d " " -f2 file.data
AAA
BBB
CCC
PERL
perl -alne 'print $F[1] ' file.data
AAA
BBB
CCC
You can use cut:
cut -d' ' -f 2 file
You can use AWK for this:
jayforsythe$ cat > file
2016-03-28T00:02 AAA 2016-03-28T00:03 ADASDASD
2016-03-28T00:03 BBB 2016-03-28T00:04 FAFAFDAS
2016-03-28T00:05 CCC 2016-03-28T00:06 SDAFAFAS
jayforsythe$ awk '{ print $2 }' file
AAA
BBB
CCC
To save the result to another file, simply add the redirection operator:
jayforsythe$ awk '{ print $2 }' file > file2
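The cut and awk answers agree on the sample data because the columns are separated by exactly one space. They differ when a column is padded with several spaces: cut treats every single space as a delimiter, while awk splits on runs of whitespace. A short sketch of that difference, using a hypothetical line with three spaces after the first timestamp:

```shell
# Hypothetical variant of the first sample line, padded with three spaces.
line='2016-03-28T00:02   AAA 2016-03-28T00:03 ADASDASD'

# awk's default field splitting ignores the extra spaces.
with_awk=$(printf '%s\n' "$line" | awk '{ print $2 }')

# cut sees an empty second field between the first two spaces.
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Squeezing repeated spaces with tr first makes cut behave like awk here.
with_tr_cut=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

printf 'awk: [%s]\ncut: [%s]\ntr+cut: [%s]\n' "$with_awk" "$with_cut" "$with_tr_cut"
```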

awk how to print the rest

My file contains lines like this:
any1 aaa bbb ccc
The delimiter is a space. The number of words in the line is unknown.
I want to put the first word into var1. That is simple with
awk '{print $1}'
Now I want to put the rest of the line into var2 with awk.
How can I print the rest of the line with awk?
Better to use read here:
s="any1 aaa bbb ccc"
read var1 var2 <<< "$s"
echo "$var1"
any1
echo "$var2"
aaa bbb ccc
For an awk-only solution, use:
echo "$s" | awk '{print $1; print substr($0, index($0, " ")+1)}'
any1
aaa bbb ccc
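Since the question mentions a file with many such lines, the same read idea can be applied line by line. A minimal sketch, assuming a temporary file whose second line is made up for illustration; the last variable given to read receives the rest of the line:

```shell
# Sample file: the first line is from the question, the second is hypothetical.
file=$(mktemp)
printf '%s\n' 'any1 aaa bbb ccc' 'any2 ddd' > "$file"

# read puts the first word in var1 and everything after it in var2.
# -r keeps backslashes in the input literal.
out=$(while read -r var1 var2; do
  printf '[%s] [%s]\n' "$var1" "$var2"
done < "$file")

printf '%s\n' "$out"
rm -f "$file"
```

This form avoids the bash-specific <<< here-string, so it should also work in a plain POSIX sh.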
$ var=$(awk '{sub(/^[^[:space:]]+[[:space:]]+/,"")}1' file)
$ echo "$var"
aaa bbb ccc
Or, in general, to skip some number of fields, use a regexp interval:
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1' file
aaa bbb ccc
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' file
bbb ccc
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){3}/,"")}1' file
ccc
Note that doing this gets much more complicated if you have a FS that's more than a single char, and the above is just for the default FS since it additionally skips any leading blanks if present (remove the first [[:space:]]* if you have a non-default but still single-char FS).
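awk solution (note that the output keeps a leading space, because the emptied first field is still followed by OFS):
awk '{$1 = ""; print $0;}'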
