I have d1="11" and d2="07". I want to convert d1 and d2 to integers and perform d1-d2. How do I do this in UNIX?
d1 - d2 currently returns "11-07" as result for me.
The standard solution:
expr $d1 - $d2
You can also do:
echo $(( d1 - d2 ))
but beware that the shell treats a number with a leading zero as octal: 07 is the same as 7, but 010 is 8 rather than 10, and 08 or 09 is an error because 8 and 9 are not octal digits.
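One way around the octal trap in Bash is the 10# base prefix, which forces base-10 interpretation. A minimal sketch, assuming Bash and non-negative operands made only of digits; sub_dec is a made-up helper name, not something from the answer above:

```bash
#!/usr/bin/env bash
# sub_dec is a hypothetical helper: subtract two zero-padded decimal strings.
# Assumes Bash and non-negative operands consisting only of digits.
sub_dec() {
  # 10# forces base 10, so "07", "08" and "09" are read as 7, 8 and 9
  echo $(( 10#$1 - 10#$2 ))
}

d1="11"
d2="07"
sub_dec "$d1" "$d2"   # prints 4
sub_dec 10 08         # prints 2 (a plain $(( 10 - 08 )) would be an error)
```

expr does not have this problem because it always reads its operands as decimal.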
Any of these will work from the shell command line. bc is probably your most straightforward solution, though.
Using bc:
$ echo "$d1 - $d2" | bc
Using awk:
$ echo $d1 $d2 | awk '{print $1 - $2}'
Using perl:
$ perl -E "say $d1 - $d2"
Using Python:
$ python3 -c "print(int('$d1') - int('$d2'))"
all return
4
An answer that is not limited to the OP's case
The title of the question leads people here, so I decided to answer that question for everyone else since the OP's described case was so limited.
TL;DR
I finally settled on writing a function.
If you want 0 in case of non-int:
int(){ printf '%d' ${1:-} 2>/dev/null || :; }
If you want [empty_string] in case of non-int:
int(){ expr 0 + ${1:-} 2>/dev/null||:; }
If you want to find the first int or [empty_string]:
int(){ expr ${1:-} : '[^0-9]*\([0-9]*\)' 2>/dev/null||:; }
If you want to find the first int or 0:
# This is a combination of numbers 1 and 3
int(){ printf '%d' $(expr ${1:-} : '[^0-9]*\([0-9]*\)' 2>/dev/null)||:; }
If you want to get a non-zero status code on non-int, remove the ||: (aka or true) but leave the ;
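If all you need is the "non-zero status on non-int" behaviour, a Bash regex check is another option. This is only a sketch under my own assumptions (Bash, and "int" meaning an optional sign followed by digits); int_strict is a hypothetical name, not one of the functions above:

```bash
#!/usr/bin/env bash
# int_strict is a hypothetical name; it accepts only an optional sign
# followed by one or more digits and fails on everything else.
int_strict() {
  local s=${1:-}                        # safe under `set -u` with no argument
  [[ $s =~ ^[+-]?[0-9]+$ ]] || return 1
  printf '%s\n' "$s"
}

if n=$(int_strict "6foo"); then
  echo "got $n"
else
  echo "not an integer"                 # this branch runs for "6foo"
fi
```

Unlike the expr variants, this never extracts a number from the middle of a string; it either accepts the whole argument or rejects it.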
Tests
# Wrapped in parens to call a subprocess and not `set` options in the main bash process
# In other words, you can literally copy-paste this code block into your shell to test
( set -eu;
tests=( 4 "5" "6foo" "bar7" "foo8.9bar" "baz" " " "" )
test(){ echo; type int; for test in "${tests[@]}"; do echo "got '$(int $test)' from '$test'"; done; echo "got '$(int)' with no argument"; }
int(){ printf '%d' ${1:-} 2>/dev/null||:; };
test
int(){ expr 0 + ${1:-} 2>/dev/null||:; }
test
int(){ expr ${1:-} : '[^0-9]*\([0-9]*\)' 2>/dev/null||:; }
test
int(){ printf '%d' $(expr ${1:-} : '[^0-9]*\([0-9]*\)' 2>/dev/null)||:; }
test
# unexpected inconsistent results from `bc`
int(){ bc<<<"${1:-}" 2>/dev/null||:; }
test
)
Test output
int is a function
int ()
{
printf '%d' ${1:-} 2> /dev/null || :
}
got '4' from '4'
got '5' from '5'
got '0' from '6foo'
got '0' from 'bar7'
got '0' from 'foo8.9bar'
got '0' from 'baz'
got '0' from ' '
got '0' from ''
got '0' with no argument
int is a function
int ()
{
expr 0 + ${1:-} 2> /dev/null || :
}
got '4' from '4'
got '5' from '5'
got '' from '6foo'
got '' from 'bar7'
got '' from 'foo8.9bar'
got '' from 'baz'
got '' from ' '
got '' from ''
got '' with no argument
int is a function
int ()
{
expr ${1:-} : '[^0-9]*\([0-9]*\)' 2> /dev/null || :
}
got '4' from '4'
got '5' from '5'
got '6' from '6foo'
got '7' from 'bar7'
got '8' from 'foo8.9bar'
got '' from 'baz'
got '' from ' '
got '' from ''
got '' with no argument
int is a function
int ()
{
printf '%d' $(expr ${1:-} : '[^0-9]*\([0-9]*\)' 2>/dev/null) || :
}
got '4' from '4'
got '5' from '5'
got '6' from '6foo'
got '7' from 'bar7'
got '8' from 'foo8.9bar'
got '0' from 'baz'
got '0' from ' '
got '0' from ''
got '0' with no argument
int is a function
int ()
{
bc <<< "${1:-}" 2> /dev/null || :
}
got '4' from '4'
got '5' from '5'
got '' from '6foo'
got '0' from 'bar7'
got '' from 'foo8.9bar'
got '0' from 'baz'
got '' from ' '
got '' from ''
got '' with no argument
Note
I got sent down this rabbit hole because the accepted answer is not compatible with set -o nounset (aka set -u)
# This works
$ ( number="3"; string="foo"; echo $((number)) $((string)); )
3 0
# This doesn't
$ ( set -u; number="3"; string="foo"; echo $((number)) $((string)); )
-bash: foo: unbound variable
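The reason, as far as I understand Bash arithmetic: inside $(( )) a bare word is taken as a variable name, and the value of that variable is itself evaluated as an expression. So $((string)) with string="foo" really asks for the variable foo. A small sketch (deref_demo is a made-up name for illustration):

```bash
#!/usr/bin/env bash
# deref_demo is a hypothetical name used only for this illustration.
deref_demo() {
  local foo=5 string="foo"
  # the value "foo" is evaluated as an expression, i.e. as the variable foo
  echo "$(( string ))"
}

deref_demo   # prints 5, not 0
```

When foo is unset, the lookup yields 0 without set -u and an "unbound variable" error with it.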
let d=d1-d2;echo $d;
This should help.
Use this:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    const char *d1 = "11";
    int d1int = atoi(d1);        /* convert the string to an int */
    printf("d1 = %d\n", d1int);  /* print the converted value, not the pointer */
    return 0;
}
etc.
Related
I have a code.txt file that contains morse code for example
.- .-.
I have a function called decode inside a bash file called morse as this:
decode (){
sed -i 's/ \.-/A/g' $1
sed -i 's/ \.-./R/g' $1
cat $1
}
When I type in terminal $bash morse decode code.txt
I receive:
AA.
The output I want is :
AR
How can I make it recognize that the string .- is A and that .-. is R?
If your intention is to encode and decode Morse messages with any tool then something like this will do :
#!/usr/local/bin/python3
import re
alphabet = { 'A':'.-', 'B':'-...', 'C':'-.-.', 'D':'-..', 'E':'.', 'F':'..-.', 'G':'--.', 'H':'....', 'I':'..', 'J':'.---', 'K':'-.-', 'L':'.-..', 'M':'--', 'N':'-.', 'O':'---', 'P':'.--.', 'Q':'--.-', 'R':'.-.', 'S':'...', 'T':'-', 'U':'..-', 'V':'...-', 'W':'.--', 'X':'-..-', 'Y':'-.--', 'Z':'--..', '1':'.----', '2':'..---', '3':'...--', '4':'....-', '5':'.....', '6':'-....', '7':'--...', '8':'---..', '9':'----.', '0':'-----', ',':'--..--', '.':'.-.-.-', '?':'..--..', '/':'-..-.', '-':'-....-', '(':'-.--.', ')':'-.--.-',' ':' '}
def encode(message):
return "".join([ ( alphabet[letter.upper()] + ' ' ) if letter != ' ' else ' ' for letter in message])
def decode(message):
return "".join([ list(alphabet.keys())[list(alphabet.values()).index(item if item != '|' else ' ')] for item in re.sub(r' {2,}', ' | ',message).split(' ')])
print(encode('THIS IS FINE'))
print(decode('- .... .. ... .. ... ..-. .. -. .'))
Hope it helps too.
Wow, interesting idea! Based on @MatiasBarrios's alphabet I made this.
#!/bin/bash
string=$1
declare -A morse=(
[A]='.-' [B]='-...' [C]='-.-.' [D]='-..' [E]='.'
[F]='..-.' [G]='--.' [H]='....' [I]='..' [J]='.---'
[K]='-.-' [L]='.-..' [M]='--' [N]='-.' [O]='---'
[P]='.--.' [Q]='--.-' [R]='.-.' [S]='...' [T]='-'
[U]='..-' [V]='...-' [W]='.--' [X]='-..-' [Y]='-.--'
[Z]='--..'
[1]='.----' [2]='..---' [3]='...--' [4]='....-' [5]='.....'
[6]='-....' [7]='--...' [8]='---..' [9]='----.' [0]='-----'
[(]='-.--.' [)]='-.--.-' [/]='-..-.' [-]='-....-' [+]='.-.-.'
[.]='.-.-.-' [,]='--..--' [?]='..--..' [!]='-.-.--' [ ]=' '
)
morse () {
while [[ "$string" ]]; do
symbol="${string::1}"
printf -- "${morse["${symbol^}"]} "
string="${string:1}"
done
}
demorse () {
declare -A demorse
for item in "${!morse[@]}"; { demorse["${morse["$item"]}"]="$item"; }
while (( $# )); do
printf -- "${demorse["$1"],}"
shift
done
}
case $string in
demorse) shift; demorse "$@";;
* ) morse ;;
esac
Usage
$ ./morse 'hello world!'
.... . .-.. .-.. --- .-- --- .-. .-.. -.. -.-.--
Demorse also works, but spaces have to be passed like this: ' '
$ ./morse demorse .... . .-.. .-.. --- ' ' .-- --- .-. .-.. -.. -.-.--
hello world!
You need to run s/ \.-\./R/g replacement first. Note the second . must be escaped to only match a dot.
Hence, use
sed 's/ \.-\./R/g;s/ \.-/A/g' file
See the online demo
Or, another way:
sed -e 's/ \.-\./R/g' -e 's/ \.-/A/g' file
Replace the file with "$1" in your code.
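To see why the order matters, here is a small sketch limited to the two letters from the question. It assumes every Morse token, including the first one, is preceded by a single space (that is what the patterns above expect); decode_ar is a made-up helper name:

```bash
#!/usr/bin/env bash
# decode_ar is a hypothetical helper covering only the two letters from the question.
# Assumes every Morse token, including the first, is preceded by one space.
decode_ar() {
  # longer code first, so " .-." is consumed before " .-" can match inside it
  sed -e 's/ \.-\./R/g' -e 's/ \.-/A/g'
}

printf ' .- .-.\n' | decode_ar                              # prints AR
printf ' .- .-.\n' | sed -e 's/ \.-/A/g' -e 's/ \.-\./R/g'  # prints AA. (wrong order)
```

With a full alphabet the same rule applies: replace the longest codes first, or mark token boundaries explicitly.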
UPDATE
Here is the translation of encoding / decoding Python function posted by Matias below:
#!/bin/bash
### Encoding:
declare -A MORSE=( [A]='.-' [B]='-...' [C]='-.-.' [D]='-..' [E]='.' [F]='..-.' [G]='--.' [H]='....' [I]='..' [J]='.---' [K]='-.-' [L]='.-..' [M]='--' [N]='-.' [O]='---' [P]='.--.' [Q]='--.-' [R]='.-.' [S]='...' [T]='-' [U]='..-' [V]='...-' [W]='.--' [X]='-..-' [Y]='-.--' [Z]='--..' [1]='.----' [2]='..---' [3]='...--' [4]='....-' [5]='.....' [6]='-....' [7]='--...' [8]='---..' [9]='----.' [0]='-----' [',']='--..--' ['.']='.-.-.-' [';']='-.-.-.' [':']='---...' ['?']='..--..' ['!']='-.-.--' ['/']='-..-.' ['-']='-....-' ['+']='.-.-.' ['(']='-.--.' [')']='-.--.-' ['_']='..--.-' ['"']='.-..-.' ["'"]='.----.' ['$']='...-..-' ['#']='.--.-.' ['&']='.-...' [' ']=' ' )
function encode {
res=''
s="$1"
for (( i=0; i<${#s}; i++ )); do
letter="${s:$i:1}"
if [[ "$letter" == ' ' ]]; then
res="${res} "
else
res="${res}${MORSE[${letter^^}]} ";
fi
done
printf "%s" "$res"
}
echo "$(encode "THIS IS FINE")"
### Now, decoding
declare -A MORSEDEC=( ['-.--.-']=')' ['..--..']='?' ['--..--']=', ' ['-....-']='-' ['.-.-.-']='.' ['...--']='3' ['-.--.']='(' ['---..']='8' ['-..-.']='/' ['....-']='4' ['-....']='6' ['----.']='9' ['.----']='1' ['..---']='2' ['.....']='5' ['--...']='7' ['-----']='0' ['-...']='B' ['-..-']='X' ['-.-.']='C' ['--..']='Z' ['--.-']='Q' ['.-..']='L' ['-.--']='Y' ['..-.']='F' ['.--.']='P' ['.---']='J' ['...-']='V' ['....']='H' ['-..']='D' ['---']='O' ['..-']='U' ['...']='S' ['.--']='W' ['-.-']='K' ['.-.']='R' ['--.']='G' ['-.']='N' ['..']='I' ['--']='M' ['.-']='A' [' ']=' ' ['.']='E' ['-']='T' )
function decode {
res=''
tmp="$(sed 's/ \{2,\}/ | /g' <<< "$1")";
for word in $tmp; do
if [[ "$word" == '|' ]]; then
res="${res}${MORSEDEC[' ']}";
else
res="${res}${MORSEDEC[$word]}";
fi
done
printf "%s" "$res"
}
echo "$(decode "- .... .. ... .. ... ..-. .. -. .")"
See Bash demo online.
The easy answer in RE engines that support look-ahead and look-behind would be to treat the spaces as look-ahead and look-behind triggers, but sed does not support this.
Another option that avoids needing to order the letters is to inject extra symbols to help you mark each letter. Say we inject = round each space, then we can replace delimited sequences in any order, and finally get rid of the delimiters:
echo .- .-.|sed -e 's/^\(.*\)$/=\1=/;s/ /= =/g' -e 's/=\.-\.=/=R=/g;s/=\.-=/=A=/g' -e 's/= =//g;s/^=//;s/=$//'
If you have rules that need to preserve multiple spaces, then that can be accommodated.
I'm trying to trim only the left half of a string that is given to ltrim() as an argument. This is my current code.
ltrim()
{
string=${1}
divider=$((${#string} / 2))
trimrule=${2}
string_left=${string:0:$divider}
string_right=${string:$divider}
echo ${string:$divider} ## My own quick debug lines
echo ${string:0:$divider} ## My own quick debug lines
if [ $# -ne 2 ]
then
printf "%d argument(s) entered. 2 required.\n" "$#"
else
while :
do
case $string_left in
${2}*) string_left=${string_left#?} ;;
*${2}) string_left=${string_left%?} ;;
*) break ;;
esac
done
printf "Left side string is %s\n" "${string_left}"
fi
}
However, when I enter ltrim abcdefghijklmnopq abc the shell returns the following:
ijklmnopq
abcdefgh
Left side string is bcdefgh
So I only lost 'a' out of the word while I'm looking to get 'defgh' as a result. What am I doing wrong?
function substr_remove() {
echo "${1//$2/}"
}
substr_remove carfoobar123foo456 foo
Output:
carbar123456
Are you searching for something like this?
function ltrim() {
echo "${1##$2}"
}
ltrim abcdefghijklmnopq abc # Prints: defghijklmnopq
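If the goal really is to trim only the left half, as in the question, the prefix removal can be applied to that half alone. A sketch, assuming the second argument is a literal prefix rather than a set of characters; ltrim_half is a made-up name:

```bash
#!/usr/bin/env bash
# ltrim_half is a hypothetical name: strip a literal prefix from the left half only.
ltrim_half() {
  local string=$1 prefix=$2
  local half=$(( ${#string} / 2 ))
  local left=${string:0:half}
  # "$prefix" is quoted so it is matched literally, not as a glob pattern
  printf '%s\n' "${left#"$prefix"}"
}

ltrim_half abcdefghijklmnopq abc   # prints defgh
```

The loop in the question removes one character with #? and then stops, because after dropping 'a' the remaining text no longer starts with abc.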
The purpose of the program is to make comments in the file begin in the same column.
If a line begins with ;, it doesn't change.
If a line begins with code followed by ;, the program should insert spaces before the ; so that it starts in the same column as the farthest ;.
for example:
Before:
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
After:
; Also change "-f elf " for "-f elf64" in build command. # These two line don't change
; # because they start with ;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
I am a beginner in Linux and shell, so far I have got
echo "Enter the filename"
read name
cat $name | while read line;
do ....
Our teacher told us that we should use two while loops:
record the longest length before the ; in the first loop, and make the changes in the second while loop.
For now I don't know how to use awk or sed to find the longest length before the ;.
Any ideas?
Here is the solution, assuming that comments in your file begin with the first semi-colon (;) that is not inside a string:
$ cat tst.awk
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
{
nostrings = ""
tail = $0
while ( match(tail,/'[^']*'/) ) {
nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
tail = substr(tail,RSTART+RLENGTH)
}
nostrings = nostrings tail
cur = index(nostrings,";")
}
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
.
$ awk -f tst.awk file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
and below is how you get to it from a naive starting point (I added a semi-colon inside your Hello World! string for testing - make sure to verify all suggested solutions using that).
Note that the above DOES contain 2 loops on the input as your teacher suggests, but you do not need to manually write them as awk provides the loops for you each time it reads the file. If your input file contains tabs or similar then you need to remove them in advance, e.g. by using pr -e -t.
Here is how you get to the above:
If you cannot have semi-colons in other contexts than as the start of comments then all you need is:
$ cat tst.awk
{ cur = index($0,";") }
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
which you'd execute as awk -f tst.awk file file (yes, specify your input file twice).
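In case the NR==FNR idiom is new to you: NR counts records across all inputs while FNR restarts for each file, so the two are equal only while the first copy of the file is being read. A stripped-down sketch of the same two-pass pattern (two_pass_demo is a made-up name, and the length report is just for illustration):

```bash
#!/usr/bin/env bash
# two_pass_demo is a hypothetical helper: print each line's length next to the
# longest line length in the file, which is only known after a full first pass.
two_pass_demo() {
  awk '
    NR == FNR { if (length($0) > max) max = length($0); next }  # pass 1: collect
    { printf "%d/%d %s\n", length($0), max, $0 }                # pass 2: use it
  ' "$1" "$1"
}

tmp=$(mktemp)
printf 'ab\nabcd\n' > "$tmp"
two_pass_demo "$tmp"
rm -f "$tmp"
```

On the two-line sample this prints 2/4 ab and then 4/4 abcd.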
If your code can contain semi-colons in contexts that are not the start of a comment, e.g. in the middle of a string, then you need to tell us how we can identify semi-colons in comment-start vs other contexts. But if they can ONLY appear between single quotes in strings, e.g. the ; inside 'Hello; world!' below:
$ cat file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
then this is all you need to replace every string with a series of blank chars before finding the first semi-colon (which is then presumably the start of a comment):
$ cat tst.awk
{
nostrings = ""
tail = $0
while ( match(tail,/'[^']*'/) ) {
nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
tail = substr(tail,RSTART+RLENGTH)
}
nostrings = nostrings tail
cur = index(nostrings,";")
}
...the rest as before...
and finally, if you don't want to specify the file name twice on the command line, just duplicate its name in the ARGV[] array by adding this line at the top:
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
There are a few printf tricks that make this a manageable project. Take a look at the following. The script formats the assembly file with the assembly code beginning at column 0 to code_width - 1 with the comments following at column code_width lined up after the code. The script is fairly well commented so you should be able to follow along.
The usage is:
bash nameofscript.sh input_file [code_width (default 46char)]
or if you make nameofscript.sh executable, then simply:
./nameofscript.sh input_file [code_width (default 46char)]
NOTE: this script requires Bash; if it is not run with Bash, you may experience inconsistent results. If a line contains several ; characters, the first one is treated as the beginning of the comment. Let me know if you have questions.
#!/bin/bash
## basic function to trim (or strip) the leading & trailing whitespace from a variable
# passed to the function. Usage: VAR=$(trimws $VAR)
function trimws {
[ -z "$1" ] && return 1
local strln="${#1}"
[ "$strln" -lt 2 ] && return 1
local trimstr=$1
trimstr="${trimstr#"${trimstr%%[![:space:]]*}"}" # remove leading whitespace characters
trimstr="${trimstr%"${trimstr##*[![:space:]]}"}" # remove trailing whitespace characters
printf "%s" "$trimstr"
return 0
}
afn="$1" # input assembly filename
cwidth=${2:--46} # code field width (- is left justified)
[ "${cwidth:0:1}" = '-' ] || cwidth=-${cwidth} # make sure first char is '-'
[ -r "$afn" ] || { # validate input file is readable
printf "error: file not found: '%s'. Usage: %s <filename> [code_width (46 ch)]\n" "$afn" "${0##*/}"
exit 1
}
## loop through file splitting on ';'
while IFS=$';\n' read -r code comment || [ -n "$code$comment" ]; do
[ -n "$code" ] || { # if no '$code' comment only line
if [ -n "$comment" ]; then
printf ";%s\n" "$comment" # output the line unchanged
else
printf "\n" # it was a blank line to begin with
fi
continue # read next line
}
code=$(trimws "$code") # trim leading and trailing whitespace
comment=$(trimws "$comment") # same
printf "%*s ; %s\n" "$cwidth" "$code" "$comment" # output new format
done <"$afn"
exit 0
input:
$ cat dat/asmfile.txt
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
output:
$ bash fmtasmcmt.sh dat/asmfile.txt
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
So yeah, use a while loop to find the longest length, given your input in the local file input:
length=0
length2=0
while IFS= read -r -- i; do
(( ${#i} > length2 )) && length2=${#i}
i=${i/\;*/}
(( ${#i} > length )) && length=${#i}
done < ./input
(( length++ )); (( length2++ ))
In your next while loop, detect whether the line starts with ; using [[ ${i:0:1} = ';' ]] and output it, or format the output with awk using the length you determined: awk -F\; -v len=$length '{ printf "%-"len"s %-40s\n", $1, $2}'. Check here (http://www.unix.com/shell-programming-scripting/117543-formatting-output-columns.html) for more info on column formatting.
Edit: In case you didn't figure it out, the second loop looks like:
while IFS= read -r -- i; do
# echo the original if the line starts with ';'
[[ ${i:0:1} = ';' ]] && echo "$i" && continue
# column formatting with awk
(echo "$i" | grep -q ';') && echo "$i" | awk -v len=$length -v len2=$length2 -F\; '{printf "%-"len"s %-"len2"s\n",$1,";"$2}' || echo "$i"
done < ./input
That will give you what you want for the output.
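For comparison, the same two-loop idea can be written in pure Bash with printf doing the padding. This is only a sketch under a few assumptions: Bash, no tabs in the file, and every ; starts a comment (a ; inside a quoted string is not handled). align_comments is a made-up name:

```bash
#!/usr/bin/env bash
# align_comments is a hypothetical helper: two passes over the file named in $1.
# Assumes Bash, no tabs, and that the first ';' on a line starts the comment.
align_comments() {
  local file=$1 line code max=0

  # Loop 1: find the longest code part in front of a ';'
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line == *';'* && $line != ';'* ]] || continue
    code=${line%%;*}                            # text before the first ';'
    code=${code%"${code##*[![:space:]]}"}       # drop trailing whitespace
    (( ${#code} > max )) && max=${#code}
  done < "$file"

  # Loop 2: pad the code part to that width and re-attach the comment
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *';'* && $line != ';'* ]]; then
      code=${line%%;*}
      code=${code%"${code##*[![:space:]]}"}
      printf '%-*s ;%s\n' "$max" "$code" "${line#*;}"
    else
      printf '%s\n' "$line"                     # comment-only or plain line
    fi
  done < "$file"
}

# Usage (the file name is just an example): align_comments ./input
```

Lines that start with ; are printed unchanged, and an indented comment-only line is moved to the common comment column.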
I think I'm going to use this example for my personal formatting!
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $max=0;
$_= <>; # slurp file
while( /\n(.+?)$com/g ){
$max=length($1) if length($1) > $max }
s/\n(.+?)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file
usage: align_coms input (after chmod+install)
Options: -com=... to redefine comments (default = ; )
and you can try align_coms -com=# align_coms to align this script's own Perl comments :)
Edit 1:
Please see the (wise) comment by @EdMorton about problems when the input has strings (or similar) containing comment starters.
Edit 2: The following version can deal with 'alo; word' "alo; word". It is still
not safe -- real languages have always some extra detail (ex '...\'...', multiline comments) but it is a little bit more robust...
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $nc=qr{ # no comment regex
( '[^'\n]*' # '....'
| "[^"\n]*" # "...."
| . # common chars
)+?
}x;
my $max=0;
$_= <>; # slurp file
while( /\n($nc)$com/g ){
$max=length($1) if length($1) > $max }
s/\n($nc)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file
If I have 2 variables $x and $y somewhere in the code flow and I don't really know if they contain numbers or string, how do I compare them?
I mean for strings we use eq etc while for numbers == or <= etc
Also what about greater/less etc?
If you don't know what they are, how can you ask if they're the same?
Specifically, do you consider these two to be the same?
"1"
"1.0"
Numerically, they both represent one, but stringily they contain different characters, so are different.
greater/less for strings can be done with cmp.
if ( ( $a cmp $b ) == 0 ) { print "a == b\n" }
elsif ( ( $a cmp $b ) < 0 ) { print "a < b\n" }
elsif ( ( $a cmp $b ) > 0 ) { print "a > b\n" }
To reiterate a comment above: "123" cmp "56" will give less than, because strings are compared character by character and "1" sorts before "5".
So you may want to do something like this:
if ( compareEm($a, $b) == 0 ) { print "a == b\n" }
elsif ( compareEm($a, $b) < 0 ) { print "a < b\n" }
elsif ( compareEm($a, $b) > 0 ) { print "a > b\n" }
sub compareEm {
my ( $a, $b ) = @_;
my $isnum = qr/(?=.)(?!^\.$)^[\-\+]?\d*\.?\d*$/o;
return ( $a =~ $isnum && $b =~ $isnum ) ? $a <=> $b : $a cmp $b;
}
Use eq, it will always work...
If you don't know whether your data is strings or numbers then it's usually perfectly safe to treat them as strings. If you want to treat your data as numbers, then you should probably validate the input to ensure that it is in the correct format.
I'm trying to get the lines containing special characters that are not prefixed with \. Below are the special characters:
^$%.*+?!(){}[]|\
I need to check the 2nd column for any of the above special characters that are not prefixed with \. I tried to do this with awk, but no luck. I want the output as below.
input.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
output.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
Rows 5 and 7 are in output.txt because they contain two special characters (one with a backslash, another without).
"final" final edit: I wanted to allow "\x" whatever x is, but the OP seems to not want that, so I fixed it too.
After trying to find a "clever" regexp (which choked on "\\" or any odd number of "\", but apparently worked for the rest...)
I rewrote it in awk as a state machine:
The idea:
If in "normal mode", we encounter a special char other than "\" ? : we print the line!
If in "normal mode", we encounter a "\" ? : we enter "escaped mode", and in that mode, ignore the next char
(but if we don't have a next char, we need to print that line too!)
the script:
awk -F"," '
{
IN_ESCAPED_MODE=0 ;
for (i=1 ; i<=length($2) ; i++)
{ char=substr($2,i,1)
if ( IN_ESCAPED_MODE == 0)
{ if ( index(".^$%*+?!(){}[]|",char) > 0 )
{ print $0 ; break ;
}
if ( index("\\" , char ) > 0 )
{ IN_ESCAPED_MODE=1 ; continue ;
}
}
if ( IN_ESCAPED_MODE == 1)
{ if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
{ IN_ESCAPED_MODE=0 ; continue ;
}
else
{ IN_ESCAPED_MODE=0 ; print $0; break;
}
}
}
if (IN_ESCAPED_MODE == 1)
{
print $0 ;
}
}
' input.txt > output.txt
With this change, you will have the same output as the OP, which prints a line when it contains "\e" for example... Which I find weird: to me "\e" is fine, we can "escape" anything?
With that input:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
10,\
11,\\
12,\\\
13,.
14,\.
15,..
16,^
17,\^
18,$
19,\$
20,%
21,\%
22,*
23,\*
24,+
25,\+
26,?
27,\?
28,!
29,\!
30,(
31,\(
32,)
33,\)
34,{
35,\{
36,}
37,\}
38,[
39,\[
40,]
41,\]
42,|
43,\|
it outputs:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
10,\
12,\\\
13,.
15,..
16,^
18,$
20,%
22,*
24,+
26,?
28,!
30,(
32,)
34,{
36,}
38,[
40,]
42,|
(so it appears to really work this time !)
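The same state machine can also be sketched in pure Bash, which may make the two modes easier to follow. Assumptions on my side: Bash, comma-separated "id,field" lines, and the same rule as above (a backslash may only be followed by a special character or another backslash). has_unescaped_special and filter_unescaped are made-up names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) when the string contains a special
# character that is not escaped, a backslash followed by a non-special
# character, or a trailing lone backslash.
has_unescaped_special() {
  local s=$1 ch i escaped=0
  local specials='.^$%*+?!(){}[]|'
  for (( i = 0; i < ${#s}; i++ )); do
    ch=${s:i:1}
    if (( escaped )); then
      escaped=0
      # escaped mode: only a special character or a backslash may follow
      [[ $specials == *"$ch"* || $ch == '\' ]] || return 0
    elif [[ $ch == '\' ]]; then
      escaped=1                       # enter escaped mode
    elif [[ $specials == *"$ch"* ]]; then
      return 0                        # normal mode: bare special character
    fi
  done
  (( escaped ))                       # still escaped: trailing lone backslash
}

# Hypothetical filter: read "id,field" lines on stdin, print the offending ones.
filter_unescaped() {
  local id field
  while IFS=, read -r id field; do
    if has_unescaped_special "$field"; then
      printf '%s,%s\n' "$id" "$field"
    fi
  done
}

printf '%s\n' '1,ap^ple' '6,ra\in' '8,wor\+k' '11,\\' | filter_unescaped
```

On those four sample lines it prints 1,ap^ple and 6,ra\in, matching the awk output for the same rows.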
If you prefer to allow any "\x" and NOT only if "x" is a SPECIAL char:
change the "middle lines":
if ( IN_ESCAPED_MODE == 1)
{ if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
{ IN_ESCAPED_MODE=0 ; continue ;
}
else
{ IN_ESCAPED_MODE=0 ; print $0; break;
}
}
into:
if ( IN_ESCAPED_MODE == 1)
{ IN_ESCAPED_MODE=0 ; continue ;
}
for historical reason : the regexp (which worked in "most" cases but choked in some, for example if there was "\\") :
egrep '[^\][].^$%*+?!(){}[|]|[^\][\][^].^$%*+?!(){}[|\]' input.txt > output.txt
But that one will not display the line 12, for example...
A good read: http://www.regular-expressions.info/charclass.html .... and http://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html (scary ...)
You can try the following:
awk '
{
line=$0
gsub(/\\[\^$%.*+?!(){}\[\]|\\]/,"")
if(/[\^$%.*+?!(){}\[\]|\\]/)
print line
}' input.txt
sed '/[]\\^$%.*+?!(){}[|]/ {
h
s/\\[]\\^$%.*+?!(){}[|]/_/g
/[]\\^$%.*+?!(){}[|]/ {
x
p
}
}' YourFile
Depending on the shell and the sed version, this may be interpreted differently (especially the \). It works on my AIX/KSH.