How to format lines in a file by using shell language? - linux

The purpose of the program is to make comments in the file begin in the same column.
if a line begins with ; then it doesn't change
if a line begins with code then ; the program should insert space before ; so it will start in the same column with the farthest ;
for example:
Before:
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
After:
; Also change "-f elf " for "-f elf64" in build command. # These two line don't change
; # because they start with ;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
I am a beginner in Linux and shell, so far I have got
echo "Enter the filename"
read name
cat $name | while read line;
do ....
Our teacher told us that we should use two while loop;
Record the longest length before; in the first loop and do the changes in the second while loop.
for now I don't know how to use awk or sed to find the longest length before;
Any ideas?

Here is the solution, assuming that comments in your file begin with the first semi-colon (;) that is not inside a string:
$ cat tst.awk
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
{
nostrings = ""
tail = $0
while ( match(tail,/'[^']*'/) ) {
nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
tail = substr(tail,RSTART+RLENGTH)
}
nostrings = nostrings tail
cur = index(nostrings,";")
}
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
.
$ awk -f tst.awk file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
and below is how you get to it from a naive starting point (I added a semi-colon inside your Hello World! string for testing - make sure to verify all suggested solutions using that).
Note that the above DOES contain 2 loops on the input as your teacher suggests, but you do not need to manually write them as awk provides the loops for you each time it reads the file. If your input file contains tabs or similar then you need to remove them in advance, e.g. by using pr -e -t.
Here is how you get to the above:
If you cannot have semi-colons in other contexts than as the start of comments then all you need is:
$ cat tst.awk
{ cur = index($0,";") }
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
which you'd execute as awk -f tst.awk file file (yes, specify your input file twice).
If your code can contain semi-colons in contexts that are not the start of a comment, e.g. in the middle of a string, then you need to tell us how we can identify semi-colons in comment-start vs other contexts but if it can ONLY appear between singe quotes in strings, e.g. the ; inside 'Hello; World!' below:
$ cat file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
then this is all you need to replace every string with a series of blank chars before finding the first semi-colon (which is then presumably the start of a comment):
$ cat tst.awk
{
nostrings = ""
tail = $0
while ( match(tail,/'[^']*'/) ) {
nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
tail = substr(tail,RSTART+RLENGTH)
}
nostrings = nostrings tail
cur = index(nostrings,";")
}
...the rest as before...
and finally if you don't want to specify the file name twice on the command line, just duplicate it's name in the ARGV[] array by adding this line at the top:
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }

There are a few printf tricks that make this a manageable project. Take a look at the following. The script formats the assembly file with the assembly code beginning at column 0 to code_width - 1 with the comments following at column code_width lined up after the code. The script is fairly well commented so you should be able to follow along.
The usage is:
bash nameofscript.sh input_file [code_width (default 46char)]
or if you make nameofscript.sh executable, then simply:
./nameofscript.sh input_file [code_width (default 46char)]
NOTE: this script requires Bash, if not run on bash, you may experience inconsistent results. If you have multiple embedded ; in each line, the first will be considered the beginning of a comment. Let me know if you have questions.
#!/bin/bash
## basic function to trim (or stip) the leading & trailing whitespace from a variable
# passed to the fuction. Usage: VAR=$(trimws $VAR)
function trimws {
[ -z "$1" ] && return 1
local strln="${#1}"
[ "$strln" -lt 2 ] && return 1
local trimstr=$1
trimstr="${trimstr#"${trimstr%%[![:space:]]*}"}" # remove leading whitespace characters
trimstr="${trimstr%"${trimstr##*[![:space:]]}"}" # remove trailing whitespace characters
printf "%s" "$trimstr"
return 0
}
afn="$1" # input assembly filename
cwidth=${2:--46} # code field width (- is left justified)
[ "${cwidth:0:1}" = '-' ] || cwidth=-${cwidth} # make sure first char is '-'
[ -r "$afn" ] || { # validate input file is readable
printf "error: file not found: '%s'. Usage: %s <filename> [code_width (46 ch)]\n" "$afn" "${0//\//}"
exit 1
}
## loop through file splitting on ';'
while IFS=$';\n' read -r code comment || [ -n "$comment" ]; do
[ -n "$code" ] || { # if no '$code' comment only line
if [ -n "$comment" ]; then
printf ";%s\n" "$comment" # output the line unchanged
else
printf "\n" # it was a blank line to begin with
fi
continue # read next line
}
code=$(trimws "$code") # trim leading and trailing whitespace
comment=$(trimws "$comment") # same
printf "%*s ; %s\n" "$cwidth" "$code" "$comment" # output new format
done <"$afn"
exit 0
input:
$ cat dat/asmfile.txt
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
output:
$ bash fmtasmcmt.sh
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)

So yeah, use a while loop to find the longest length, given your input in the local file input:
length=0
length2=0
while IFS= read -r -- i; do
(( ${#i} > length2 )) && length2=${#i}
i=${i/\;*/}
(( ${#i} > length )) && length=${#i}
done < ./input
(( length++ )); (( length2++ ))
In your next while loop, detect whether the line starts with ; using [[ ${i:0:1} = ';' ]] and output it, or format the output with awk using the length you determined: awk -F\; -v len=$length '{ printf "%-"len"s %-40s\n", $1, $2}'. Check here (http://www.unix.com/shell-programming-scripting/117543-formatting-output-columns.html) for more info on column formatting.
Edit: In case you didn't figure it out, the second loop looks like:
while IFS= read -r -- i; do
# echo the original if the line starts with ';'
[[ ${i:0:1} = ';' ]] && echo "$i" && continue
# column formatting with awk
(echo "$i" | grep -q ';') && echo "$i" | awk -v len=$length -v len2=$length2 -F\; '{printf "%-"len"s %-"len2"s\n",$1,";"$2}' || echo "$i"
done < ./input
That will give you what you want for the output.

I think I'm going to use this example for my personal formatting!
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $max=0;
$_= <>; # slurp file
while( /\n(.+?)$com/g ){
$max=length($1) if length($1) > $max }
s/\n(.+?)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file
usage: align_coms input (after chmod+install)
Options: -com=... to redefine comments (default = ; )
and you can try align_coms -com=# align_coms to align this scripts perl comments :)
Edit 1:
Please see the (wise) comment of #EdMorton about problems when the input has strings (or similar) containing comment starters.
Edit 2: The following version can deal with 'alo; word' "alo; word". It is still
not safe -- real languages have always some extra detail (ex '...\'...', multiline comments) but it is a little bit more robust...
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $nc=qr{ # no comment regex
( '[^'\n]*' # '....'
| "[^"\n]*" # "...."
| . # common chars
)+?
}x;
my $max=0;
$_= <>; # slurp file
while( /\n($nc)$com/g ){
$max=length($1) if length($1) > $max }
s/\n($nc)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file

Related

Replace specific string using sed

I have a code.txt file that contains morse code for example
.- .-.
I have a function called decode inside a bash file called morse as this:
decode (){
sed -i 's/ \.-/A/g' $1
sed -i 's/ \.-./R/g' $1
cat $1
}
When I type in terminal $bash morse decode code.txt
I receive:
AA.
The output I want is :
AR
How can it see separate that the string .- is A and the .-. is R?
If your intention is to encode and decode Morse messages with any tool then something like this will do :
#!/usr/local/bin/python3
import re
alphabet = { 'A':'.-', 'B':'-...', 'C':'-.-.', 'D':'-..', 'E':'.', 'F':'..-.', 'G':'--.', 'H':'....', 'I':'..', 'J':'.---', 'K':'-.-', 'L':'.-..', 'M':'--', 'N':'-.', 'O':'---', 'P':'.--.', 'Q':'--.-', 'R':'.-.', 'S':'...', 'T':'-', 'U':'..-', 'V':'...-', 'W':'.--', 'X':'-..-', 'Y':'-.--', 'Z':'--..', '1':'.----', '2':'..---', '3':'...--', '4':'....-', '5':'.....', '6':'-....', '7':'--...', '8':'---..', '9':'----.', '0':'-----', ', ':'--..--', '.':'.-.-.-', '?':'..--..', '/':'-..-.', '-':'-....-', '(':'-.--.', ')':'-.--.-',' ':' '}
def encode(message):
return "".join([ ( alphabet[letter.upper()] + ' ' ) if letter != ' ' else ' ' for letter in message])
def decode(message):
return "".join([ list(alphabet.keys())[list(alphabet.values()).index(item if item != '|' else ' ')] for item in re.sub(r' {2,}', ' | ',message).split(' ')])
print(encode('THIS IS FINE'))
print(decode('- .... .. ... .. ... ..-. .. -. .'))
Hope it helps too.
Wow interesting idea! Based on #MatiasBarrios alphabet i made this.
#!/bin/bash
string=$1
declare -A morse=(
[A]='.-' [B]='-...' [C]='-.-.' [D]='-..' [E]='.'
[F]='..-.' [G]='--.' [H]='....' [I]='..' [J]='.---'
[K]='-.-' [L]='.-..' [M]='--' [N]='-.' [O]='---'
[P]='.--.' [Q]='--.-' [R]='.-.' [S]='...' [T]='-'
[U]='..-' [V]='...-' [W]='.--' [X]='-..-' [Y]='-.--'
[Z]='--..'
[1]='.----' [2]='..---' [3]='...--' [4]='....-' [5]='.....'
[6]='-....' [7]='--...' [8]='---..' [9]='----.' [0]='-----'
[(]='-.--.' [)]='-.--.-' [/]='-..-.' [-]='-....-' [+]='.-.-.'
[.]='.-.-.-' [,]='--..--' [?]='..--..' [!]='-.-.--' [ ]=' '
)
morse () {
while [[ "$string" ]]; do
symbol="${string::1}"
printf -- "${morse["${symbol^}"]} "
string="${string:1}"
done
}
demorse () {
declare -A demorse
for item in "${!morse[#]}"; { demorse["${morse["$item"]}"]="$item"; }
while [[ $# ]]; do
printf -- "${demorse["$1"],}"
shift
done
}
case $string in
demorse) shift; demorse "$#";;
* ) morse ;;
esac
Usage
$ ./morse 'hello world!'
.... . .-.. .-.. --- .-- --- .-. .-.. -.. -.-.--
Demorse also worsk but, spaces have to be printed like this ' '
$ ./morse demorse .... . .-.. .-.. --- ' ' .-- --- .-. .-.. -.. -.-.--
hello world!
You need to run s/ \.-\./R/g replacement first. Note the second . must be escaped to only match a dot.
Hence, use
sed 's/ \.-\./R/g;s/ \.-/A/g' file
See the online demo
Or, another way:
sed -e 's/ \.-\./R/g' -e 's/ \.-/A/g' file
Replace the file with "$1" in your code.
UPDATE
Here is the translation of encoding / decoding Python function posted by Matias below:
#!/bin/bash
### Encoding:
declare -A MORSE=( [A]='.-' [B]='-...' [C]='-.-.' [D]='-..' [E]='.' [F]='..-.' [G]='--.' [H]='....' [I]='..' [J]='.---' [K]='-.-' [L]='.-..' [M]='--' [N]='-.' [O]='---' [P]='.--.' [Q]='--.-' [R]='.-.' [S]='...' [T]='-' [U]='..-' [V]='...-' [W]='.--' [X]='-..-' [Y]='-.--' [Z]='--..' [1]='.----' [2]='..---' [3]='...--' [4]='....-' [5]='.....' [6]='-....' [7]='--...' [8]='---..' [9]='----.' [0]='-----' [',']='--..--' ['.']='.-.-.-' [';']='-.-.-.' [':']='---...' ['?']='..--..' ['!']='-.-.--' ['/']='-..-.' ['-']='-....-' ['+']='.-.-.' ['(']='-.--.' [')']='-.--.-' ['_']='..--.-' ['"']='.-..-.' ["'"]='.----.' ['$']='...-..-' ['#']='.--.-.' ['&']='.-...' [' ']=' ' )
function encode {
res=''
s="$1"
for (( i=0; i<${#s}; i++ )); do
letter="${s:$i:1}"
if [[ "$letter" == ' ' ]]; then
res="${res} "
else
res="${res}${MORSE[${letter^^}]} ";
fi
done
printf "%s" "$res"
}
echo "$(encode "THIS IS FINE")"
### Now, decoding
declare -A MORSEDEC=( ['-.--.-']=')' ['..--..']='?' ['--..--']=', ' ['-....-']='-' ['.-.-.-']='.' ['...--']='3' ['-.--.']='(' ['---..']='8' ['-..-.']='/' ['....-']='4' ['-....']='6' ['----.']='9' ['.----']='1' ['..---']='2' ['.....']='5' ['--...']='7' ['-----']='0' ['-...']='B' ['-..-']='X' ['-.-.']='C' ['--..']='Z' ['--.-']='Q' ['.-..']='L' ['-.--']='Y' ['..-.']='F' ['.--.']='P' ['.---']='J' ['...-']='V' ['....']='H' ['-..']='D' ['---']='O' ['..-']='U' ['...']='S' ['.--']='W' ['-.-']='K' ['.-.']='R' ['--.']='G' ['-.']='N' ['..']='I' ['--']='M' ['.-']='A' [' ']=' ' ['.']='E' ['-']='T' )
function decode {
res=''
tmp="$(sed 's/ \{2,\}/ | /g' <<< "$1")";
for word in $tmp; do
if [[ "$word" == '|' ]]; then
res="${res}${MORSEDEC[' ']}";
else
res="${res}${MORSEDEC[$word]}";
fi
done
printf "%s" "$res"
}
echo "$(decode "- .... .. ... .. ... ..-. .. -. .")"
See Bash demo online.
The easy answer in RE engines that support look-ahead and look-behind would be to treat the spaces as look-ahead and look-behind triggers, but sed does not support this.
Another option that avoids needing to order the letters is to inject extra symbols to help you mark each letter. Say we inject = round each space, then we can replace delimited sequences in any order, and finally get rid of the delimiters:
echo .- .-.|sed -e 's/^\(.*\)$/=\1=/;s/ /= =/g' -e 's/=\.-\.=/=R=/g;s/=\.-=/=A=/g' -e 's/= =//g;s/^=//;s/=$//'
If you have rules that need to preserve multiple spaces, then that can be accommodated.

Bash - Filtering lines according to a condition

I have a file that contains some lines as:
#SRR4293695.199563512 199563512
CAAAANCATTCGTAGACGACCTGCTCTGTNGNTACCNTCAANAGATCNGAAGAGCACACGTCTGAACTCCAGTCAC
+SRR4293695.199563512 199563512
A.AA<#FF)FFFFFFF<<<<FF7FFFFFF#.#<FF<#FFFF#FF<A<#FFFFFFFAFFFFFFAAAFFFFF<FFFF.
#SRR4293695.199563513 199563513
CTAAANCATTCGTAGACGACCTGCTT
+SRR4293695.199563513 199563513
<AAAA#FFFFFF<FFFFFFFFFFFFF
#SRR4293695.199563514 199563514
CCAACNTCATAGAGGGACAAGTGGCGATCNGNC
+SRR4293695.199563514 199563514
AAAAA#<F.F<<FA.F7AA.)<FAFA..7#.#A
#SRR4293695.199563515 199563515
TCGCGNCCTCAGATCAGACGTGGCGA
+SRR4293695.199563515 199563515
AAAAA#FFFFFF<FFFFFFFFFFFFF
#SRR4293695.199563516 199563516
TGACCNGGGTCCGGTGCGGAGAGCCCTTC
+SRR4293695.199563516 199563516
AAAAA#FAFFFF<F.FFAA.F)FFFFFAF
#SRR4293695.199563517 199563517
AAATGNTCATCGACACTTCGAACGCACT
+SRR4293695.199563517 199563517
AA)AA#F<FFFFFFAFFFFF<)FFFAFF
#SRR4293695.199563518 199563518
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563518 199563518
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
#SRR4293695.199563519 199563519
AAAACNATTCGTAGACGNCCTGCTTNTGTNGNCACCNTNANNANNTCNGNAGAGCNCACNTCTGAACTCNAGTCAC
+SRR4293695.199563519 199563519
AAAAA#FFFFFFFFFFF#FFFFFFF#FF<#F#F.FF#7#F##F##A)#A#FF<F)#AAF#<FFFFAFF<#<FFFFF
#SRR4293695.199563520 199563520
GAAGCNGCACAGCTGGCNTTGGAGCNGANNCNGTAGNCNCNNTNNATNGNTCGGNNGAGNACACGTCTGNACTCCA
+SRR4293695.199563520 199563520
AAAAA#FFFFFFFFFFF#FFFFFFF#FF##A#FFFF#F#F##<##FF#F#FFFF##FFF#FFFFFFFFF#FFFFFF
#SRR4293695.199563521 199563521
TGGTCNGTGGGGAGTCGNCGCCTGCNTANNANTGTANGNANNANNAANANATCGNNAGANCACACGTCTNAACTCC
+SRR4293695.199563521 199563521
AAAAA#FFFFFFFFFFF#FFFFFFF#FF##F#FFFF#F#F##A##FF#A#FFFF##<FF#FFFFFFFFF#F<FFFF
#SRR4293695.199563522 199563522
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563522 199563522
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
Then, I would like to filter these lines according to a condition :
taking in consideration the length of even lines: if that length is > 34 then that line and the preceding line must be removed.
I already did an algorithm: using a while to read all lines in the file, checking the condition and retaining only lines with length < 34. The problem is that it is taking some time.
inputFile=$1
outputFile=$2
while read first_line; read second_line
do
lread=${#second_line}
if [[ "$lread" -le 34 ]] ; then
echo $first_line >> $outputFile
echo $second_line >> $outputFile
fi
done < $inputFile
# This is for the last two lines
lread=${#second_line}
if [[ "$lread" -le 34 ]] ; then
echo $first_line >> $outputFile
echo $second_line >> $outputFile
fi
I was wondering if there is not another way, quicker.
The expected output:
#SRR4293695.199563513 199563513
CTAAANCATTCGTAGACGACCTGCTT
+SRR4293695.199563513 199563513
<AAAA#FFFFFF<FFFFFFFFFFFFF
#SRR4293695.199563514 199563514
CCAACNTCATAGAGGGACAAGTGGCGATCNGNC
+SRR4293695.199563514 199563514
AAAAA#<F.F<<FA.F7AA.)<FAFA..7#.#A
#SRR4293695.199563515 199563515
TCGCGNCCTCAGATCAGACGTGGCGA
+SRR4293695.199563515 199563515
AAAAA#FFFFFF<FFFFFFFFFFFFF
#SRR4293695.199563516 199563516
TGACCNGGGTCCGGTGCGGAGAGCCCTTC
+SRR4293695.199563516 199563516
AAAAA#FAFFFF<F.FFAA.F)FFFFFAF
#SRR4293695.199563517 199563517
AAATGNTCATCGACACTTCGAACGCACT
+SRR4293695.199563517 199563517
AA)AA#F<FFFFFFAFFFFF<)FFFAFF
#SRR4293695.199563518 199563518
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563518 199563518
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
#SRR4293695.199563522 199563522
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563522 199563522
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
Thanks in advance!
Here's an awk solution:
awk '!last { last = $0; next } length($0)<=34 { print last; print } { last = "" }' YOURFILE
The output is your expected output.
sed method:
sed -n 'h;n;/.\{34,\}/!{x;G;p}' inputfile > outputfile
h;n The odd numbered lines go into the hold buffer, then get the next line.
The resulting even numbered lines are checked for length. If they're not over 34 chars, the hold buffer is exchanged with the pattern space, then appended to it, (x;G;), so that both lines are in the pattern space, and printed.

How to print something to the right-most of the console in Linux shell script

Say I want to search for "ERROR" within a bunch of log files.
I want to print one line for every file that contains "ERROR".
In each line, I want to print the log file path on the left-most edge while the number of "ERROR" on the right-most edge.
I tried using:
printf "%-50s %d" $filePath $errorNumber
...but it's not perfect, since the black console can vary greatly, and the file path sometimes can be quite long.
Just for the pleasure of the eyes, but I am simply incapable of doing so.
Can anyone help me to solve this problem?
Using bash and printf:
printf "%-$(( COLUMNS - ${#errorNumber} ))s%s" \
"$filePath" "$errorNumber"
How it works:
$COLUMNS is the shell's terminal width.
printf does left alignment by putting a - after the %. So printf "%-25s%s\n" foo bar prints "foo", then 22 spaces, then "bar".
bash uses the # as a parameter length variable prefix, so if x=foo, then ${#x} is 3.
Fancy version, suppose the two variables are longer than will fit in one column; if so print them on as many lines as are needed:
printf "%-$(( COLUMNS * ( 1 + ( ${#filePath} + ${#errorNumber} ) / COLUMNS ) \
- ${#errorNumber} ))s%s" "$filePath" "$errorNumber"
Generalized to a function. Syntax is printfLR foo bar, or printfLR < file:
printfLR() { if [ "$1" ] ; then echo "$#" ; else cat ; fi |
while read l r ; do
printf "%-$(( ( 1 + ( ${#l} + ${#r} ) / COLUMNS ) \
* COLUMNS - ${#r} ))s%s" "$l" "$r"
done ; }
Test with:
# command line args
printfLR foo bar
# stdin
fortune | tr -s ' \t' '\n\n' | paste - - | printfLR

remove a line with special character with given pattern

I'm trying to get the lines with special characters which is not prefixed with \. Below are the special characters:
^$%.*+?!(){}[]|\
I need to check all the above special characters which is not prefixed with \ in 2nd column. I'm trying with awk to complete this, but no luck. I want the output as below.
input.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
output.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
7th row and 5 row are in output.txt because there is 2 special charcters(one is with backslash another without backslash)
"final" final edit: I wanted to allow "\x" whatever x is, but the OP seems to not want that, so I fixed it too.
After trying to find a "clever" regexp (which choked on "\\" or any impair number of "\", but apparently worked for the rest...)
I re-wrote it in awk to do it in a "state automata" way:
The idea:
If in "normal mode", we encounter a special char other than "\" ? : we print the line!
If in "normal mode", we encounter a "\" ? : we enter "escaped mode", and in that mode, ignore the next char
(but if we don't have a next char, we need to print that line too!)
the script:
awk -F"," '
{
IN_ESCAPED_MODE=0 ;
for (i=1 ; i<=length($2) ; i++)
{ char=substr($2,i,1)
if ( IN_ESCAPED_MODE == 0)
{ if ( index(".^$%*+?!(){}[]|",char) > 0 )
{ print $0 ; break ;
}
if ( index("\\" , char ) > 0 )
{ IN_ESCAPED_MODE=1 ; continue ;
}
}
if ( IN_ESCAPED_MODE == 1)
{ if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
{ IN_ESCAPED_MODE=0 ; continue ;
}
else
{ IN_ESCAPED_MODE=0 ; print $0; break;
}
}
}
if (IN_ESCAPED_MODE == 1)
{
print $0 ; break ;
}
}
' input.txt > output.txt
With this change, you will have the same output as the OP, which prints a line when it contains "\e" for example... Which I find weird: to me "\e" is fine, we can "escape" anything?
With that input:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
10,\
11,\\
12,\\\
13,.
14,\.
15,..
16,^
17,\^
18,$
19,\$
20,%
21,\%
22,*
23,\*
24,+
25,\+
26,?
27,\?
28,!
29,\!
30,(
31,\(
32,)
33,\)
34,{
35,\{
36,}
37,\}
38,[
39,\[
40,]
41,\]
42,|
43,\|
it outputs:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
10,\
12,\\\
13,.
15,..
16,^
18,$
20,%
22,*
24,+
26,?
28,!
30,(
32,)
34,{
36,}
38,[
40,]
42,|
(so it appears to really work this time !)
If you prefer to allow any "\x" and NOT only if "x" is a SPECIAL char:
change the "middle lines":
if ( IN_ESCAPED_MODE == 1)
{ if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
{ IN_ESCAPED_MODE=0 ; continue ;
}
else
{ IN_ESCAPED_MODE=0 ; print $0; break;
}
}
into:
if ( IN_ESCAPED_MODE == 1)
{ IN_ESCAPED_MODE=0 ; continue ;
}
for historical reason : the regexp (which worked in "most" cases but choked in some, for example if there was "\\") :
egrep '[^\][].^$%*+?!(){}[|]|[^\][\][^].^$%*+?!(){}[|\]' input.txt > output.txt
But that one will not display the line 12, for example...
A good read: http://www.regular-expressions.info/charclass.html .... and http://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html (scary ...)
You can try the following:
awk '
{
line=$0
sub(/\\[\^$%.*+?!(){}\[\]|\\]/,"")
if(/[\^$%.*+?!(){}\[\]|\\]/)
print line
}' input.txt
sed '/[]\\^$%.*+?!(){}[|]/ {
h
s/\\[]\\^$%.*+?!(){}[|]/_/g
/[]\\^$%.*+?!(){}[|]/ {
x
p
}
}' YourFile
Depending of shell and sed could be interpreted (especialy the \) differently. Works on my AIX/KSH

grep lines before and after in aix/ksh shell

I want to extract lines before and after a matched pattern.
eg: if the file contents are as follows
absbasdakjkglksagjgj
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
klgdsgklsdgkldskgdsg
I need find hello and display line before and after 'hello'
the output should be
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
This is possible with GNU but i need a method that works in AIX / KSH SHELL WHERE NO GNU IS INSTALLED.
sed -n '/hello/{x;G;N;p;};h' filename
I've found it is generally less frustrating to build the GNU coreutils once, and benefit from many more features http://www.gnu.org/software/coreutils/
Since you'll have Perl on the machine, you could use the following code, but you'd probably do better to install the GNU utilities. This has options -b n1 for lines before and -f n1 for lines following the match. It works with PCRE matches (so if you want case-insensitive matching, add an i after the regex instead using a -i option. I haven't implemented -v or -l; I didn't need those.
#!/usr/bin/env perl
#
# #(#)$Id: sgrep.pl,v 1.7 2013/01/28 02:07:18 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use warnings;
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless #ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar #ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my #lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from #lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = #_;
while (scalar(#lines) > $leave)
{
my $line = shift #lines;
my $curr = $tail - scalar(#lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of #lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push #lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(#lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
#lines = ();
}
}
I had a situation where I was stuck with a slow telnet session on a tablet, believe it or not, and I couldn't write a Perl script very easily with that keyboard. I came up with this hacky maneuver that worked in a pinch for me with AIX's limited grep. This won't work well if your grep returns hundreds of lines, but if you just need one line and one or two above/below it, this could do it. First I ran this:
cat -n filename |grep criteria
By including the -n flag, I see the line number of the data I'm seeking, like this:
2543 my crucial data
Since cat gives the line number 2 spaces before and 1 space after, I could grep for the line number right before it like this:
cat -n filename |grep " 2542 "
I ran this a couple of times to give me lines 2542 and 2544 that bookended line 2543. Like I said, it's definitely fallable, like if you have reams of data that might have " 2542 " all over the place, but just to grab a couple of quick lines, it worked well.

Resources