I have a file, code.txt, that contains Morse code, for example:
.- .-.
I have a function called decode inside a bash script called morse, like this:
decode (){
sed -i 's/ \.-/A/g' $1
sed -i 's/ \.-./R/g' $1
cat $1
}
When I run bash morse decode code.txt in a terminal,
I receive:
AA.
The output I want is:
AR
How can I make it recognize .- as A and .-. as R separately?
If your intention is to encode and decode Morse messages with any tool, then something like this will do:
#!/usr/local/bin/python3
import re
alphabet = { 'A':'.-', 'B':'-...', 'C':'-.-.', 'D':'-..', 'E':'.', 'F':'..-.', 'G':'--.', 'H':'....', 'I':'..', 'J':'.---', 'K':'-.-', 'L':'.-..', 'M':'--', 'N':'-.', 'O':'---', 'P':'.--.', 'Q':'--.-', 'R':'.-.', 'S':'...', 'T':'-', 'U':'..-', 'V':'...-', 'W':'.--', 'X':'-..-', 'Y':'-.--', 'Z':'--..', '1':'.----', '2':'..---', '3':'...--', '4':'....-', '5':'.....', '6':'-....', '7':'--...', '8':'---..', '9':'----.', '0':'-----', ',':'--..--', '.':'.-.-.-', '?':'..--..', '/':'-..-.', '-':'-....-', '(':'-.--.', ')':'-.--.-',' ':' '}
def encode(message):
    return "".join([ ( alphabet[letter.upper()] + ' ' ) if letter != ' ' else ' ' for letter in message])

def decode(message):
    return "".join([ list(alphabet.keys())[list(alphabet.values()).index(item if item != '|' else ' ')] for item in re.sub(r' {2,}', ' | ',message).split(' ')])
print(encode('THIS IS FINE'))
print(decode('- .... .. ... .. ... ..-. .. -. .'))
Hope it helps too.
Wow, interesting idea! Based on @MatiasBarrios's alphabet, I made this.
#!/bin/bash
string=$1
declare -A morse=(
[A]='.-' [B]='-...' [C]='-.-.' [D]='-..' [E]='.'
[F]='..-.' [G]='--.' [H]='....' [I]='..' [J]='.---'
[K]='-.-' [L]='.-..' [M]='--' [N]='-.' [O]='---'
[P]='.--.' [Q]='--.-' [R]='.-.' [S]='...' [T]='-'
[U]='..-' [V]='...-' [W]='.--' [X]='-..-' [Y]='-.--'
[Z]='--..'
[1]='.----' [2]='..---' [3]='...--' [4]='....-' [5]='.....'
[6]='-....' [7]='--...' [8]='---..' [9]='----.' [0]='-----'
[(]='-.--.' [)]='-.--.-' [/]='-..-.' [-]='-....-' [+]='.-.-.'
[.]='.-.-.-' [,]='--..--' [?]='..--..' [!]='-.-.--' [ ]=' '
)
morse () {
    while [[ "$string" ]]; do
        symbol="${string::1}"
        printf -- '%s ' "${morse["${symbol^}"]}"
        string="${string:1}"
    done
}
demorse () {
    declare -A demorse
    # Build the reverse table: Morse code -> character.
    for item in "${!morse[@]}"; do demorse["${morse["$item"]}"]="$item"; done
    while (( $# )); do
        printf -- '%s' "${demorse["$1"],}"
        shift
    done
}
case $string in
    demorse) shift; demorse "$@";;
    *      ) morse ;;
esac
Usage
$ ./morse 'hello world!'
.... . .-.. .-.. --- .-- --- .-. .-.. -.. -.-.--
Demorse also works, but spaces have to be passed like this: ' '
$ ./morse demorse .... . .-.. .-.. --- ' ' .-- --- .-. .-.. -.. -.-.--
hello world!
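You need to run the s/ \.-\./R/g replacement first: .- is a prefix of .-. so if the A rule runs first it consumes the start of every R. Note the second . must be escaped so that it only matches a literal dot.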
Hence, use
sed 's/ \.-\./R/g;s/ \.-/A/g' file
See the online demo
Or, another way:
sed -e 's/ \.-\./R/g' -e 's/ \.-/A/g' file
Replace the file with "$1" in your code.
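To see why the order matters, here is a minimal sketch. The function names decode_short_first and decode_long_first are made up for this illustration, and the input is assumed to have a space in front of every code, as the patterns in the question require.

```shell
#!/bin/bash
# Hypothetical helpers; each takes the Morse text as its first argument.

# Shorter code first: ' .-' also matches the start of ' .-.', so R is never seen.
decode_short_first() {
    printf '%s\n' "$1" | sed -e 's/ \.-/A/g' -e 's/ \.-\./R/g'
}

# Longer code first: ' .-.' is consumed before ' .-' gets a chance to match.
decode_long_first() {
    printf '%s\n' "$1" | sed -e 's/ \.-\./R/g' -e 's/ \.-/A/g'
}

decode_short_first ' .- .-.'   # prints AA.
decode_long_first  ' .- .-.'   # prints AR
```

With a full alphabet the same idea applies: any code that is a prefix of another one has to be replaced after the longer one.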
UPDATE
Here is a Bash translation of the encoding / decoding Python functions posted by Matias:
#!/bin/bash
### Encoding:
declare -A MORSE=( [A]='.-' [B]='-...' [C]='-.-.' [D]='-..' [E]='.' [F]='..-.' [G]='--.' [H]='....' [I]='..' [J]='.---' [K]='-.-' [L]='.-..' [M]='--' [N]='-.' [O]='---' [P]='.--.' [Q]='--.-' [R]='.-.' [S]='...' [T]='-' [U]='..-' [V]='...-' [W]='.--' [X]='-..-' [Y]='-.--' [Z]='--..' [1]='.----' [2]='..---' [3]='...--' [4]='....-' [5]='.....' [6]='-....' [7]='--...' [8]='---..' [9]='----.' [0]='-----' [',']='--..--' ['.']='.-.-.-' [';']='-.-.-.' [':']='---...' ['?']='..--..' ['!']='-.-.--' ['/']='-..-.' ['-']='-....-' ['+']='.-.-.' ['(']='-.--.' [')']='-.--.-' ['_']='..--.-' ['"']='.-..-.' ["'"]='.----.' ['$']='...-..-' ['#']='.--.-.' ['&']='.-...' [' ']=' ' )
function encode {
    res=''
    s="$1"
    for (( i=0; i<${#s}; i++ )); do
        letter="${s:$i:1}"
        if [[ "$letter" == ' ' ]]; then
            res="${res} "
        else
            res="${res}${MORSE[${letter^^}]} "
        fi
    done
    printf "%s" "$res"
}
echo "$(encode "THIS IS FINE")"
### Now, decoding
declare -A MORSEDEC=( ['-.--.-']=')' ['..--..']='?' ['--..--']=', ' ['-....-']='-' ['.-.-.-']='.' ['...--']='3' ['-.--.']='(' ['---..']='8' ['-..-.']='/' ['....-']='4' ['-....']='6' ['----.']='9' ['.----']='1' ['..---']='2' ['.....']='5' ['--...']='7' ['-----']='0' ['-...']='B' ['-..-']='X' ['-.-.']='C' ['--..']='Z' ['--.-']='Q' ['.-..']='L' ['-.--']='Y' ['..-.']='F' ['.--.']='P' ['.---']='J' ['...-']='V' ['....']='H' ['-..']='D' ['---']='O' ['..-']='U' ['...']='S' ['.--']='W' ['-.-']='K' ['.-.']='R' ['--.']='G' ['-.']='N' ['..']='I' ['--']='M' ['.-']='A' [' ']=' ' ['.']='E' ['-']='T' )
function decode {
    res=''
    tmp="$(sed 's/ \{2,\}/ | /g' <<< "$1")"
    for word in $tmp; do
        if [[ "$word" == '|' ]]; then
            res="${res}${MORSEDEC[' ']}"
        else
            res="${res}${MORSEDEC[$word]}"
        fi
    done
    printf "%s" "$res"
}
echo "$(decode "- .... .. ... .. ... ..-. .. -. .")"
See Bash demo online.
The easy answer in RE engines that support look-ahead and look-behind would be to treat the spaces as look-ahead and look-behind triggers, but sed does not support this.
Another option that avoids needing to order the letters is to inject extra symbols that mark the boundaries of each letter. Say we inject = around each space (and at both ends of the line), then we can replace delimited sequences in any order, and finally get rid of the delimiters:
echo .- .-.|sed -e 's/^\(.*\)$/=\1=/;s/ /= =/g' -e 's/=\.-\.=/=R=/g;s/=\.-=/=A=/g' -e 's/= =//g;s/^=//;s/=$//'
If you have rules that need to preserve multiple spaces, then that can be accommodated.
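As a rough check of the "any order" claim, the sketch below wraps the same sed command in a function (decode_delimited is a made-up name) and deliberately puts the A rule before the R rule. It assumes single spaces between letters and that = never occurs in the input.

```shell
#!/bin/bash
# decode_delimited TEXT: hypothetical wrapper around the delimiter trick.
decode_delimited() {
    printf '%s\n' "$1" | sed \
        -e 's/^\(.*\)$/=\1=/;s/ /= =/g' \
        -e 's/=\.-=/=A=/g;s/=\.-\.=/=R=/g' \
        -e 's/= =//g;s/^=//;s/=$//'
}

decode_delimited '.- .-.'   # prints AR even though the A rule runs first
```

Because every code is fenced by = on both sides, =.-= cannot match inside =.-.= and the rules no longer interfere with each other.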
I have a file that contains some lines as:
@SRR4293695.199563512 199563512
CAAAANCATTCGTAGACGACCTGCTCTGTNGNTACCNTCAANAGATCNGAAGAGCACACGTCTGAACTCCAGTCAC
+SRR4293695.199563512 199563512
A.AA<#FF)FFFFFFF<<<<FF7FFFFFF#.#<FF<#FFFF#FF<A<#FFFFFFFAFFFFFFAAAFFFFF<FFFF.
@SRR4293695.199563513 199563513
CTAAANCATTCGTAGACGACCTGCTT
+SRR4293695.199563513 199563513
<AAAA#FFFFFF<FFFFFFFFFFFFF
@SRR4293695.199563514 199563514
CCAACNTCATAGAGGGACAAGTGGCGATCNGNC
+SRR4293695.199563514 199563514
AAAAA#<F.F<<FA.F7AA.)<FAFA..7#.#A
@SRR4293695.199563515 199563515
TCGCGNCCTCAGATCAGACGTGGCGA
+SRR4293695.199563515 199563515
AAAAA#FFFFFF<FFFFFFFFFFFFF
@SRR4293695.199563516 199563516
TGACCNGGGTCCGGTGCGGAGAGCCCTTC
+SRR4293695.199563516 199563516
AAAAA#FAFFFF<F.FFAA.F)FFFFFAF
@SRR4293695.199563517 199563517
AAATGNTCATCGACACTTCGAACGCACT
+SRR4293695.199563517 199563517
AA)AA#F<FFFFFFAFFFFF<)FFFAFF
@SRR4293695.199563518 199563518
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563518 199563518
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
@SRR4293695.199563519 199563519
AAAACNATTCGTAGACGNCCTGCTTNTGTNGNCACCNTNANNANNTCNGNAGAGCNCACNTCTGAACTCNAGTCAC
+SRR4293695.199563519 199563519
AAAAA#FFFFFFFFFFF#FFFFFFF#FF<#F#F.FF#7#F##F##A)#A#FF<F)#AAF#<FFFFAFF<#<FFFFF
@SRR4293695.199563520 199563520
GAAGCNGCACAGCTGGCNTTGGAGCNGANNCNGTAGNCNCNNTNNATNGNTCGGNNGAGNACACGTCTGNACTCCA
+SRR4293695.199563520 199563520
AAAAA#FFFFFFFFFFF#FFFFFFF#FF##A#FFFF#F#F##<##FF#F#FFFF##FFF#FFFFFFFFF#FFFFFF
@SRR4293695.199563521 199563521
TGGTCNGTGGGGAGTCGNCGCCTGCNTANNANTGTANGNANNANNAANANATCGNNAGANCACACGTCTNAACTCC
+SRR4293695.199563521 199563521
AAAAA#FFFFFFFFFFF#FFFFFFF#FF##F#FFFF#F#F##A##FF#A#FFFF##<FF#FFFFFFFFF#F<FFFF
@SRR4293695.199563522 199563522
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563522 199563522
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
Then, I would like to filter these lines according to a condition:
considering the length of the even-numbered lines, if that length is > 34 then that line and the preceding line must be removed.
I already wrote a script for this: a while loop reads the file two lines at a time, checks the condition, and keeps only the pairs whose second line has a length <= 34. The problem is that it takes a long time.
inputFile=$1
outputFile=$2
while read first_line; read second_line
do
lread=${#second_line}
if [[ "$lread" -le 34 ]] ; then
echo $first_line >> $outputFile
echo $second_line >> $outputFile
fi
done < $inputFile
# This is for the last two lines
lread=${#second_line}
if [[ "$lread" -le 34 ]] ; then
echo $first_line >> $outputFile
echo $second_line >> $outputFile
fi
I was wondering whether there is another, quicker way.
The expected output:
@SRR4293695.199563513 199563513
CTAAANCATTCGTAGACGACCTGCTT
+SRR4293695.199563513 199563513
<AAAA#FFFFFF<FFFFFFFFFFFFF
@SRR4293695.199563514 199563514
CCAACNTCATAGAGGGACAAGTGGCGATCNGNC
+SRR4293695.199563514 199563514
AAAAA#<F.F<<FA.F7AA.)<FAFA..7#.#A
@SRR4293695.199563515 199563515
TCGCGNCCTCAGATCAGACGTGGCGA
+SRR4293695.199563515 199563515
AAAAA#FFFFFF<FFFFFFFFFFFFF
@SRR4293695.199563516 199563516
TGACCNGGGTCCGGTGCGGAGAGCCCTTC
+SRR4293695.199563516 199563516
AAAAA#FAFFFF<F.FFAA.F)FFFFFAF
@SRR4293695.199563517 199563517
AAATGNTCATCGACACTTCGAACGCACT
+SRR4293695.199563517 199563517
AA)AA#F<FFFFFFAFFFFF<)FFFAFF
@SRR4293695.199563518 199563518
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563518 199563518
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
@SRR4293695.199563522 199563522
TCGTANCCAATGAGGTCTATCCGAGGCGCN
+SRR4293695.199563522 199563522
AAAAA#<FAAFFFF.FFFFFFFA.FFFFF#
Thanks in advance!
Here's an awk solution:
awk '!last { last = $0; next } length($0)<=34 { print last; print } { last = "" }' YOURFILE
The output is your expected output.
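The one-liner above uses an empty last variable as its "no line stored" marker, so an empty odd-numbered line would throw the pairing off. That cannot happen with the sample data, but a variant keyed on line parity avoids the question entirely. This is only a sketch: filter_pairs and its optional second argument are names invented here.

```shell
#!/bin/bash
# filter_pairs FILE [MAX]: keep each (odd, even) line pair whose even line
# is at most MAX characters long (MAX defaults to 34, as in the question).
filter_pairs() {
    awk -v max="${2:-34}" '
        NR % 2 { held = $0; next }                # odd line: remember it
        length($0) <= max { print held; print }   # even line short enough: emit the pair
    ' "$1"
}
```

It would be called as filter_pairs inputfile > outputfile, and reads the file once, like the original awk command.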
sed method:
sed -n 'h;n;/.\{35,\}/!{x;G;p}' inputfile > outputfile
h;n copies each odd-numbered line into the hold space, then reads the next line.
The resulting even-numbered lines are checked for length. If they're not over 34 chars (i.e. they don't match .\{35,\}), the hold space is exchanged with the pattern space and then appended to it (x;G), so that both lines are in the pattern space, and they are printed.
I have the following string: aaaccbbaabbbb
I need to drop either the run of a's at the front (aaa) or the run of b's at the back (bbbb). I've tried resString=(${resString%%b*b}), which turns aaaccbbaabbbb into aaaccbbaa, but I need to save the deleted bbbb to a file. Is there a way to invert the result of resString=(${resString%%b*b}) so that I get the bbbb? I've tried working with ## manipulation, but it's a hassle, since I only need the repeated characters at the front or at the back of the string.
You could use bash regex matching:
resString='abababbbb'
if [[ $resString =~ [^b](b+)$ ]] ; then
resString=${BASH_REMATCH[1]}
fi
echo $resString
This prints bbbb.
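Since the question also asks to save the removed run to a file, here is a sketch that builds on the same regex idea. split_trailing_run is a made-up name, and the code assumes the last character is a plain letter or digit (nothing special in a regex). Dropping the leading [^b] also lets it handle a string that consists of b's only.

```shell
#!/bin/bash
# split_trailing_run STRING OUTFILE
# Writes the trailing run of the last character to OUTFILE and prints the rest.
split_trailing_run() {
    local s=$1 outfile=$2 last re run
    [[ -n $s ]] || return 1
    last=${s: -1}                    # last character, e.g. b
    re="(${last}+)\$"                # e.g. (b+)$ ; assumes a regex-safe character
    [[ $s =~ $re ]] || return 1
    run=${BASH_REMATCH[1]}           # e.g. bbbb
    printf '%s\n' "$run" > "$outfile"
    printf '%s\n' "${s%"$run"}"      # e.g. aaaccbbaa
}
```

For example, split_trailing_run aaaccbbaabbbb removed.txt prints aaaccbbaa and leaves bbbb in removed.txt (the file name is arbitrary).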
You can use parameter expansion with extended globbing:
#!/bin/bash
shopt -s extglob # Turn extended globbing on.
s=aaaccbbaabbbb
prefix=${s%%[^a]*} # Remove everything from the first non-"a".
prefix_rest=${s##+(a)} # Remove all a's at the beginning.
suffix=${s##*[^b]} # See above.
suffix_rest=${s%%+(b)}
[[ $prefix$prefix_rest == $s ]] || echo Wrong prefix
[[ $suffix_rest$suffix == $s ]] || echo Wrong suffix
echo "$prefix : $prefix_rest"
echo "$suffix_rest : $suffix"
Ok, a whole new approach using ${//}.
It is fully automatic, it finds which is the first char, and which is the last.
With that set, it works its magic to select runs in the front and runs in the back.
Of course, you need to edit the program to choose which parts you do need to send to file, or print, or anything else. I hope you could do that part of the job.
This seems to work with any string (even repeated chars):
#!/bin/bash
a=(aaaccbbaabbbb aaabbbbaaaa abababbbb bbbaaabbb aaaaaa aaabbbbaaaa)
for resString in "${a[@]}"; do
echo
echo "String :$resString:"
l="$((2+${#resString}))"
frontchar=${resString:0:1} ; printf "%s%-${l}s\n" "Frontchar" ":$frontchar:"
backchar=${resString:0-1:1} ; printf "%s%${l}s\n" "Backchar " ":$backchar:"
head="${resString/%[^$frontchar]*}"; printf "%s%-${l}s\n" "head " ":$head:"
tail="${resString/#*[^$backchar]}" ; printf "%s%${l}s\n" "tail " ":$tail:"
prefix="${resString%$tail}" ; printf "%s%-${l}s\n" "prefix " ":$prefix:"
suffix="${resString#$head}" ; printf "%s%${l}s\n" "suffix " ":$suffix:"
echo "Using the head/suffix value: $head -- $suffix"
echo "Using the prefix/tail value: $prefix -- $tail"
done
Running it, you get:
String :aaabbbbaaaa:
Frontchar:a:
Backchar :a:
head :aaa:
tail :aaaa:
prefix :aaabbbb:
suffix :bbbbaaaa:
Using the head/suffix value: aaa -- bbbbaaaa
Using the prefix/tail value: aaabbbb -- aaaa
String :aaaccbbaabbbb:
Frontchar:a:
Backchar :b:
head :aaa:
tail :bbbb:
prefix :aaaccbbaa:
suffix :ccbbaabbbb:
Using the head/suffix value: aaa -- ccbbaabbbb
Using the prefix/tail value: aaaccbbaa -- bbbb
String :abababbbb:
Frontchar:a:
Backchar :b:
head :a:
tail :bbbb:
prefix :ababa:
suffix :bababbbb:
Using the head/suffix value: a -- bababbbb
Using the prefix/tail value: ababa -- bbbb
String :aaaaaa:
Frontchar:a:
Backchar :a:
head :aaaaaa:
tail :aaaaaa:
prefix ::
suffix ::
Using the head/suffix value: aaaaaa --
Using the prefix/tail value: -- aaaaaa
I'm trying to trim only the left half of a string that is given to ltrim() as an argument. This is my current code.
ltrim()
{
string=${1}
divider=$((${#string} / 2))
trimrule=${2}
string_left=${string:0:$divider}
string_right=${string:$divider}
echo ${string:$divider} ## My own quick debug lines
echo ${string:0:$divider} ## My own quick debug lines
if [ $# -ne 2 ]
then
printf "%d argument(s) entered. 2 required.\n" "$#"
else
while :
do
case $string_left in
${2}*) string_left=${string_left#?} ;;
*${2}) string_left=${string_left%?} ;;
*) break ;;
esac
done
printf "Left side string is %s\n" "${string_left}"
fi
}
However, when I enter ltrim abcdefghijklmnopq abc the shell returns the following:
ijklmnopq
abcdefgh
Left side string is bcdefgh
So I only lost 'a' out of the word while I'm looking to get 'defgh' as a result. What am I doing wrong?
function substr_remove() {
echo "${1//$2/}"
}
substr_remove carfoobar123foo456 foo
Output:
carbar123456
Are you searching for something like this?
function ltrim() {
echo ${1##$2}
}
ltrim abcdefghijklmnopq abc # Prints: defghijklmnopq
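As for what goes wrong in the question's loop: the ${2}* arm does match, but ${string_left#?} removes only a single character, and after that the string starts with bcd rather than abc, so the loop stops. If only the left half should be trimmed, a sketch along these lines may be closer to what was intended (ltrim_half is a hypothetical name):

```shell
#!/bin/bash
# ltrim_half STRING PREFIX
# Removes PREFIX from the start of the left half of STRING and prints that half.
ltrim_half() {
    if (( $# != 2 )); then
        printf '%d argument(s) entered. 2 required.\n' "$#" >&2
        return 1
    fi
    local string=$1 prefix=$2
    local divider=$(( ${#string} / 2 ))
    local left=${string:0:divider}
    printf '%s\n' "${left#"$prefix"}"   # quoted, so PREFIX is taken literally
}

ltrim_half abcdefghijklmnopq abc   # prints defgh
```

The prefix is removed in one step with ${left#...}, so no loop is needed.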
The purpose of the program is to make the comments in a file begin in the same column.
If a line begins with ; then it doesn't change.
If a line begins with code followed by ; then the program should insert spaces before the ; so that it starts in the same column as the farthest ;.
For example:
Before:
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
After:
; Also change "-f elf " for "-f elf64" in build command. # These two line don't change
; # because they start with ;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
I am a beginner in Linux and shell, so far I have got
echo "Enter the filename"
read name
cat $name | while read line;
do ....
Our teacher told us that we should use two while loops:
record the longest length before the ; in the first loop and make the changes in the second while loop.
For now I don't know how to use awk or sed to find the longest length before the ;.
Any ideas?
Here is the solution, assuming that comments in your file begin with the first semi-colon (;) that is not inside a string:
$ cat tst.awk
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
{
    nostrings = ""
    tail = $0
    while ( match(tail,/'[^']*'/) ) {
        nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
        tail = substr(tail,RSTART+RLENGTH)
    }
    nostrings = nostrings tail
    cur = index(nostrings,";")
}
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
$ awk -f tst.awk file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
and below is how you get to it from a naive starting point (I added a semi-colon inside your Hello World! string for testing - make sure to verify all suggested solutions using that).
Note that the above DOES contain 2 loops on the input as your teacher suggests, but you do not need to manually write them as awk provides the loops for you each time it reads the file. If your input file contains tabs or similar then you need to remove them in advance, e.g. by using pr -e -t.
Here is how you get to the above:
If you cannot have semi-colons in other contexts than as the start of comments then all you need is:
$ cat tst.awk
{ cur = index($0,";") }
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
which you'd execute as awk -f tst.awk file file (yes, specify your input file twice).
If your code can contain semi-colons in contexts that are not the start of a comment, e.g. in the middle of a string, then you need to tell us how we can identify semi-colons in comment-start vs other contexts. But if they can ONLY appear between single quotes in strings, e.g. the ; inside 'Hello; World!' below:
$ cat file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
then this is all you need to replace every string with a series of blank chars before finding the first semi-colon (which is then presumably the start of a comment):
$ cat tst.awk
{
    nostrings = ""
    tail = $0
    while ( match(tail,/'[^']*'/) ) {
        nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
        tail = substr(tail,RSTART+RLENGTH)
    }
    nostrings = nostrings tail
    cur = index(nostrings,";")
}
...the rest as before...
and finally if you don't want to specify the file name twice on the command line, just duplicate its name in the ARGV[] array by adding this line at the top:
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
There are a few printf tricks that make this a manageable project. Take a look at the following. The script formats the assembly file with the assembly code beginning at column 0 to code_width - 1 with the comments following at column code_width lined up after the code. The script is fairly well commented so you should be able to follow along.
The usage is:
bash nameofscript.sh input_file [code_width (default 46char)]
or if you make nameofscript.sh executable, then simply:
./nameofscript.sh input_file [code_width (default 46char)]
NOTE: this script requires Bash; if it is not run with Bash, you may experience inconsistent results. If a line contains more than one ;, the first is considered the beginning of the comment. Let me know if you have questions.
#!/bin/bash
## basic function to trim (or strip) the leading & trailing whitespace from a variable
# passed to the function. Usage: VAR=$(trimws $VAR)
function trimws {
[ -z "$1" ] && return 1
local strln="${#1}"
[ "$strln" -lt 2 ] && return 1
local trimstr=$1
trimstr="${trimstr#"${trimstr%%[![:space:]]*}"}" # remove leading whitespace characters
trimstr="${trimstr%"${trimstr##*[![:space:]]}"}" # remove trailing whitespace characters
printf "%s" "$trimstr"
return 0
}
afn="$1" # input assembly filename
cwidth=${2:--46} # code field width (- is left justified)
[ "${cwidth:0:1}" = '-' ] || cwidth=-${cwidth} # make sure first char is '-'
[ -r "$afn" ] || { # validate input file is readable
printf "error: file not found: '%s'. Usage: %s <filename> [code_width (46 ch)]\n" "$afn" "${0##*/}"
exit 1
}
## loop through file splitting on ';'
while IFS=$';\n' read -r code comment || [ -n "$comment" ]; do
[ -n "$code" ] || { # if no '$code' comment only line
if [ -n "$comment" ]; then
printf ";%s\n" "$comment" # output the line unchanged
else
printf "\n" # it was a blank line to begin with
fi
continue # read next line
}
code=$(trimws "$code") # trim leading and trailing whitespace
comment=$(trimws "$comment") # same
printf "%*s ; %s\n" "$cwidth" "$code" "$comment" # output new format
done <"$afn"
exit 0
input:
$ cat dat/asmfile.txt
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
output:
$ bash fmtasmcmt.sh dat/asmfile.txt
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
So yeah, use a while loop to find the longest length, given your input in the local file input:
length=0
length2=0
while IFS= read -r -- i; do
(( ${#i} > length2 )) && length2=${#i}
i=${i/\;*/}
(( ${#i} > length )) && length=${#i}
done < ./input
(( length++ )); (( length2++ ))
In your next while loop, detect whether the line starts with ; using [[ ${i:0:1} = ';' ]] and output it, or format the output with awk using the length you determined: awk -F\; -v len=$length '{ printf "%-"len"s %-40s\n", $1, $2}'. Check here (http://www.unix.com/shell-programming-scripting/117543-formatting-output-columns.html) for more info on column formatting.
Edit: In case you didn't figure it out, the second loop looks like:
while IFS= read -r -- i; do
# echo the original if the line starts with ';'
[[ ${i:0:1} = ';' ]] && echo "$i" && continue
# column formatting with awk
(echo "$i" | grep -q ';') && echo "$i" | awk -v len=$length -v len2=$length2 -F\; '{printf "%-"len"s %-"len2"s\n",$1,";"$2}' || echo "$i"
done < ./input
That will give you what you want for the output.
I think I'm going to use this example for my personal formatting!
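For comparison, the two loops can also be written without awk or grep, using printf's %-*s for the padding. This is only a sketch under a few assumptions: align_comments is a made-up name, the first ; on a line is taken as the comment start (so a ; inside a quoted string would be misread), and lines that start with ; or contain no ; are passed through unchanged.

```shell
#!/bin/bash
# align_comments FILE: print FILE with all trailing comments in one column.
align_comments() {
    local file=$1 line code max=0

    # Pass 1: find the longest code part in front of a comment.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == ';'* || $line != *';'* ]] && continue
        code=${line%%;*}
        code=${code%"${code##*[![:space:]]}"}     # drop trailing whitespace
        (( ${#code} > max )) && max=${#code}
    done < "$file"

    # Pass 2: pad the code part so every comment starts in the same column.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == ';'* || $line != *';'* ]]; then
            printf '%s\n' "$line"
        else
            code=${line%%;*}
            code=${code%"${code##*[![:space:]]}"}
            printf '%-*s ;%s\n' "$max" "$code" "${line#*;}"
        fi
    done < "$file"
}
```

Indented comment-only lines (whitespace followed by ;) get an empty code part, so they are padded to the same column as the other comments.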
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $max=0;
$_= <>; # slurp file
while( /\n(.+?)$com/g ){
$max=length($1) if length($1) > $max }
s/\n(.+?)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file
usage: align_coms input (after chmod+install)
Options: -com=... to redefine comments (default = ; )
and you can try align_coms -com=# align_coms to align this script's Perl comments :)
Edit 1:
Please see the (wise) comment of @EdMorton about problems when the input has strings (or similar) containing comment starters.
Edit 2: The following version can deal with 'alo; word' "alo; word". It is still
not safe -- real languages always have some extra detail (e.g. '...\'...', multiline comments) -- but it is a little bit more robust...
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $nc=qr{ # no comment regex
( '[^'\n]*' # '....'
| "[^"\n]*" # "...."
| . # common chars
)+?
}x;
my $max=0;
$_= <>; # slurp file
while( /\n($nc)$com/g ){
$max=length($1) if length($1) > $max }
s/\n($nc)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file
Hello: I have a set of files named test-MR3000-1.nt to test-MR4000-1.nt, where the number in the name changes in steps of 100 (i.e. I have 11 files),
$ ls test-MR*
test-MR3000-1.nt test-MR3300-1.nt test-MR3600-1.nt test-MR3900-1.nt
test-MR3100-1.nt test-MR3400-1.nt test-MR3700-1.nt test-MR4000-1.nt
test-MR3200-1.nt test-MR3500-1.nt test-MR3800-1.nt
and also a file called resonancia.kumac, which contains the string XXXX in a couple of lines.
$ head resonancia.kumac
close 0
hist/delete 0
vect/delete *
h/file 1 test-MRXXXX-1.nt
sigma MR=XXXX
I want to run a bash script that substitutes the string XXXX in that file with each of the numbers obtained from the command ls *MR* | cut -b 8-11.
I found a post with some suggestions and tried my own code:
for i in `ls *MR* | cut -b 8-11`; do
sed -e "s/XXXX/$i/" resonancia.kumac >> proof.kumac
done
However, in the substitution the numbers are surrounded by single quotes (e.g. '3000').
Q: What should I do to avoid the single quotes around the numbers? Thank you.
This is a reproducer for the environment described:
for ((i=3000; i<=4000; i+=100)); do
touch test-MR${i}-1.nt
done
cat >resonancia.kumac <<'EOF'
close 0
hist/delete 0
vect/delete *
h/file 1 test-MRXXXX-1.nt
sigma MR=XXXX
EOF
This is a script which will run inside that environment:
content="$(<resonancia.kumac)"
for f in *MR*; do
    substring=${f:7:4}
    echo "${content//XXXX/$substring}"
done >proof.kumac
...and the output begins like so:
close 0
hist/delete 0
vect/delete *
h/file 1 test-MR3000-1.nt
sigma MR=3000
There are no quotes anywhere in this output; the problem described is not reproduced.
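If the file names might not always put the number at a fixed offset, the number can be picked out with a regex instead. The following is a sketch only: fill_template is a hypothetical helper, and it assumes the number is the run of digits right after MR in each file name.

```shell
#!/bin/bash
# fill_template TEMPLATE FILE...
# Prints one copy of TEMPLATE per FILE, with XXXX replaced by the digits after "MR".
fill_template() {
    local template=$1 content f
    shift
    content=$(<"$template")
    for f in "$@"; do
        [[ $f =~ MR([0-9]+) ]] || continue     # skip names without a number
        printf '%s\n' "${content//XXXX/${BASH_REMATCH[1]}}"
    done
}
```

In the environment above it would be run as fill_template resonancia.kumac test-MR*.nt > proof.kumac.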
Or, if Perl is an option:
#!/usr/bin/perl
@ls = glob('*MR*');
open(FILE, 'resonancia.kumac') || die("not good\n");
@cont = <FILE>;
$f = shift(@ls);
$f =~ /test-MR([0-9]*)-1\.nt/;
$nr = $1;
@out = ();
foreach $l (@cont){
    if($l =~ s/XXXX/$nr/){
        $f = shift(@ls);
        $f =~ /test-MR([0-9]*)-1\.nt/;
        $nr = $1;
    }
    push @out, $l;
}
close FILE;
open(FILE, '>resonancia.kumac') || die("not good\n");
print FILE @out;
That would replace the first XXXX with the number from the first filename, which seemed to be what the question asked before it was edited.