Read a file for specific string and read lines after the match - string

I have a file which looks like:
AA
2
3
4
CCC
111
222
333
XXX
12
23
34
I am looking for an awk command to search for the string 'CCC' above and print all the lines that occur after 'CCC', but stop reading as soon as I reach 'XXX'.
A very simple command does the reading for me but does not stop at XXX.
awk '$0 == "CCC" {i=1;next};i && i++' c.out

Could you please try the following.
Solution 1: with sed.
sed -n '/CCC/,/XXX/p' Input_file
Solution 2: with awk.
awk '/CCC/{flag=1} flag; /XXX/{flag=""}' Input_file
Solution 3: in case you want to print from CCC to XXX but not the marker lines themselves, use the following.
awk '/CCC/{flag=1;next} /XXX/{flag=""} flag' Input_file
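As a quick check, the sample data from the question can be fed to both awk variants (the filename Input_file is illustrative):

```shell
# Recreate the sample data from the question
printf '%s\n' AA 2 3 4 CCC 111 222 333 XXX 12 23 34 > Input_file

# Inclusive variant: prints CCC through XXX
awk '/CCC/{flag=1} flag; /XXX/{flag=""}' Input_file
# CCC
# 111
# 222
# 333
# XXX

# Exclusive variant: prints only the lines between the markers
awk '/CCC/{flag=1;next} /XXX/{flag=""} flag' Input_file
# 111
# 222
# 333
```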

"Do something between this and that" can easily be solved with a range pattern:
awk '/CCC/,/XXX/' # prints everything between CCC and XXX (inclusive)
But it's not exactly what you've asked. You wanted to print everything after CCC and quit (stop reading) on XXX. This translates to
awk '/XXX/{exit};f;/CCC/{f=1}'
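A sketch of that variant on the question's data (c.out as in the question) shows it prints only the in-between lines and stops scanning at XXX:

```shell
printf '%s\n' AA 2 3 4 CCC 111 222 333 XXX 12 23 34 > c.out

# f (the flag) prints lines after CCC; exit stops reading the file at XXX,
# so nothing after XXX is ever scanned
awk '/XXX/{exit};f;/CCC/{f=1}' c.out
# 111
# 222
# 333
```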

Related

linux command to delete the last column of csv

How can I write a linux command to delete the last column of tab-delimited csv?
Example input
aaa bbb ccc ddd
111 222 333 444
Expected output
aaa bbb ccc
111 222 333
It is easier to remove the first field than the last, so we reverse each line, remove the first field, and then reverse it again.
Here is an example for a comma-separated "CSV":
rev file1 | cut -d "," -f 2- | rev
Replace the "file1" and the "," with your file name and the delimiter accordingly.
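A minimal sketch with the question's space-delimited example (so the delimiter passed to cut is a space rather than a comma):

```shell
printf '%s\n' 'aaa bbb ccc ddd' '111 222 333 444' > file1

# Reverse each line, drop the (now first) field, reverse back
rev file1 | cut -d ' ' -f 2- | rev
# aaa bbb ccc
# 111 222 333
```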
You can use cut for this. You specify a delimiter with option -d and then give the field numbers (option -f) you want to have in the output. Each line of the input gets treated individually:
cut -d$'\t' -f 1-6 < my.csv > new.csv
That follows your description literally. Your example, however, looks more like you want to strip a column in the middle:
cut -d$'\t' -f 1-3,5-7 < my.csv > new.csv
The $'\t' is a bash notation for the string containing the single tab character.
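Applied to the four-column example from the question (where keeping fields 1-3 drops the last column), a sketch might look like:

```shell
printf 'aaa\tbbb\tccc\tddd\n111\t222\t333\t444\n' > my.csv

# Keep the first three tab-separated fields; cut's default delimiter
# is already a tab, so -d$'\t' is optional here
cut -d$'\t' -f 1-3 < my.csv > new.csv
cat new.csv
```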
You can use the command below, which deletes the last column of a tab-delimited CSV regardless of the number of fields (-r, \s, and \S are GNU sed extensions):
sed -r 's/(.*)\s+\S+$/\1/'
for example:
echo "aaa bbb ccc ddd 111 222 333 444" | sed -r 's/(.*)\s+\S+$/\1/'
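Run against a tab-delimited version of the example (a sketch; note that the original answer used [^\s], which inside a bracket expression matches "anything but a backslash or the letter s" rather than "non-whitespace", so \S is used instead):

```shell
printf 'aaa\tbbb\tccc\tddd\n111\t222\t333\t444\n' > my.csv

# The greedy .* backtracks to the last run of whitespace, so only the
# final field is stripped, whatever the field count
sed -r 's/(.*)\s+\S+$/\1/' my.csv
```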

linux/unix convert delimited file to fixed width

I have a requirement to convert a delimited file to fixed-width file, details as follows.
Input file sample:
AAA|BBB|C|1234|56
AA1|BB2|DD|12345|890
Output file sample:
AAA BBB C 1234 56
AA1 BB2 DD 12345 890
Details of field positions
Field 1 Start at position 1 and length should be 5
Field 2 start at position 6 and length should be 6
Field 3 Start at position 12 and length should be 4
Field 4 Start at position 16 and length should be 6
Field 5 Start at position 22 and length should be 3
Another awk solution:
echo -e "AAA|BBB|C|1234|56\nAA1|BB2|DD|12345|890" |
awk -F '|' '{printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5}'
Note the - in each format specifier (e.g. %-5s) in the printf statement, which left-aligns the fields, as required in the question. Output:
AAA BBB C 1234 56
AA1 BB2 DD 12345 890
With the following awk command you can achieve your goal:
awk 'BEGIN { RS=" "; FS="|" } { printf "%5s%6s%4s%6s%3s\n",$1,$2,$3,$4,$5 }' your_input_file
Your record separator (RS) is a space and your field separator (FS) is a pipe (|) character. In order to parse your data correctly we set them in the BEGIN statement (before any data is read). Then using printf and the desired format characters we output the data in the desired format.
Output:
AAA BBB C 1234 56
AA1 BB2 DD 12345890
Update:
I just saw your edits on the input file format (previously they seemed different). If your input data records are separated with a new line then simply remove the RS=" "; part from the above one-liner and apply the - modifiers for the format characters to left align your fields:
awk 'BEGIN { FS="|" } { printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5 }' your_input_file
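Running the updated one-liner against the question's sample input (a sketch; your_input_file is written inline here) gives lines that are exactly 24 characters wide, with start positions 1, 6, 12, 16, 22 matching the field specification:

```shell
printf 'AAA|BBB|C|1234|56\nAA1|BB2|DD|12345|890\n' > your_input_file

# Widths 5,6,4,6,3 give start positions 1,6,12,16,22 as specified;
# short values are padded on the right with spaces
awk 'BEGIN { FS="|" } { printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5 }' your_input_file
```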

sed command to print lines between two patterns

I am trying to print lines between two patterns with a sed command, but I want to include the line containing Pattern1 in the result and exclude Pattern2.
For ex:
/PAT1/
line 1
line 2
line 3
/PAT2/
The desired output is :
/PAT1/
line 1
line 2
line 3
I have tried this :
sed -n '/PAT1/,/PAT2/{/PAT2/{d};p}' Input_File
But it excludes both patterns.
You can do it with awk: awk '/patt1/{flag=1}/patt2/{flag=0}flag' input_file
If input_file is:
111
222
333
444
555
awk '/222/{flag=1}/444/{flag=0}flag' input_file
gives:
222
333
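A sketch of the same flag technique on the original PAT1/PAT2 sample (Input_File written inline; the /PAT1/ lines are literal text here, matched because they contain PAT1):

```shell
printf '%s\n' '/PAT1/' 'line 1' 'line 2' 'line 3' '/PAT2/' > Input_File

# flag turns on at PAT1 before the print test, and off at PAT2 before
# the print test, so PAT1 is included and PAT2 excluded -- exactly
# what the question asks for
awk '/PAT1/{flag=1}/PAT2/{flag=0}flag' Input_File
# /PAT1/
# line 1
# line 2
# line 3
```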

awk how to print the rest

My file contains lines like this:
any1 aaa bbb ccc
The delimiter is a space; the number of words in the line is unknown.
I want to put the first word into var1. That's simple with
awk '{print $1}'
Now I want to put the rest of the line into a var2 with awk.
How can I print the rest of the line with awk?
Better to use read here:
s="any1 aaa bbb ccc"
read var1 var2 <<< "$s"
echo "$var1"
any1
echo "$var2"
aaa bbb ccc
For an awk-only solution, use:
echo "$s" | awk '{print $1; print substr($0, index($0, " ")+1)}'
any1
aaa bbb ccc
$ var=$(awk '{sub(/^[^[:space:]]+[[:space:]]+/,"")}1' file)
$ echo "$var"
aaa bbb ccc
or in general to skip some number of fields use a RE interval:
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1' file
aaa bbb ccc
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' file
bbb ccc
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){3}/,"")}1' file
ccc
Note that doing this gets much more complicated if you have an FS that's more than a single character. The above is just for the default FS, since it additionally skips any leading blanks if present (remove the first [[:space:]]* if you have a non-default but still single-character FS).
awk solution:
awk '{$1 = ""; print $0}'
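One caveat worth noting (a sketch): assigning to $1 rebuilds the record with OFS between fields, so the printed line keeps a leading space; a sub() call can trim it:

```shell
s="any1 aaa bbb ccc"

# Clearing $1 leaves a leading output field separator behind
echo "$s" | awk '{$1 = ""; print $0}'
#  aaa bbb ccc   (note the leading space)

# Trim the leading space after clearing the field
echo "$s" | awk '{$1 = ""; sub(/^ /, ""); print}'
# aaa bbb ccc
```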

deleting lines from a text file with bash

I have two sets of text files. First set is in AA folder. Second set is in BB folder. The content of ff.txt file from first set(AA folder) is shown below.
Name number marks
john 1 60
maria 2 54
samuel 3 62
ben 4 63
I would like to print the second column(number) from this file if marks>60. The output will be 3,4. Next, read the ff.txt file in BB folder and delete the lines containing numbers 3,4. How can I do this with bash?
Files in the BB folder look like this; the second column is the number.
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
awk 'FNR == NR { if ($3 > 60) array[$2] = 1; next } { if ($2 in array) next; print }' AA/ff.txt BB/filename
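A quick sketch that sets up the two folders from the question and runs the two-file awk (here with the $3 > 60 test kept inside the first-file block, so that header and non-matching rows from AA/ff.txt fall through to next rather than being printed):

```shell
mkdir -p AA BB
printf '%s\n' 'Name number marks' 'john 1 60' 'maria 2 54' \
    'samuel 3 62' 'ben 4 63' > AA/ff.txt
printf '%s\n' 'marks 1 11.824 24.015 41.220 1.00 13.65' \
    'marks 1 13.058 24.521 40.718 1.00 11.82' \
    'marks 3 12.120 13.472 46.317 1.00 10.62' \
    'marks 4 10.343 24.731 47.771 1.00 8.18' > BB/ff.txt

# First pass (FNR == NR) collects numbers with marks > 60 from AA/ff.txt;
# second pass prints only the BB/ff.txt lines whose number was not collected
awk 'FNR == NR { if ($3 > 60) array[$2] = 1; next } !($2 in array)' AA/ff.txt BB/ff.txt
# marks 1 11.824 24.015 41.220 1.00 13.65
# marks 1 13.058 24.521 40.718 1.00 11.82
```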
This works, but is not efficient (does that matter?)
gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done
Of course, the second awk can be replaced with sed -i
For multi files:
ls -1 AA/*.txt | while read file
do
bn=`basename $file`
gawk 'BEGIN {getline} $3>60{print $2}' AA/$bn | while read number
do
gawk -v number=$number '$2 != number' BB/$bn > /tmp/$bn
mv /tmp/$bn BB/$bn
done
done
I didn't test it, so if there is a problem, please comment.
