Bash string manipulation -- removing characters? - string

I'm having a heck of a time removing characters in Bash. I have a string that's formatted like temp=53.0'C. I want to remove everything that's not 53.0.
I'm normally a Python programmer, and the way I'd do this in Python would be to split the string into an array of characters and remove the unnecessary elements, before putting the array back into string form.
But I can't figure out how to do that in Bash.
How do I remove the desired characters?

You can use Bash parameter substitution like this:
a="temp=53.0'C"
a=${a/*=/} # Remove everything up to and including = sign
a=${a/\'*/} # Remove single quote and everything after it
echo "$a"
53.0
Further examples are available here.
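As a rough sketch of the same idea, the prefix/suffix removal operators (# and %%) state explicitly which end of the string is trimmed. The function name extract_temp is made up for illustration, and the input is assumed to always contain an = before the number and a single quote after it:

```bash
# extract_temp is a hypothetical helper, not part of any library.
extract_temp() {
  local s=$1
  s=${s#*=}     # remove the shortest prefix ending in '='
  s=${s%%\'*}   # remove the longest suffix starting at a single quote
  printf '%s\n' "$s"
}

extract_temp "temp=53.0'C"
```

No external process is started here, which matters if the conversion runs in a tight loop.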

You could use sed with a regex which corresponds to the format of the string you want to be returned:
$ var="temp=53.0'C"
$ echo "$var" | sed -r 's/.*=([0-9][0-9]\.[0-9]).*/\1/g'
53.0
What exactly are the "rules" around what your original string looks like, and what the section to output looks like?

The same thing with BASH_REMATCH:
> tmp="temp=53.0'C"
> [[ $tmp =~ [0-9]+\.[0-9]+ ]] && echo "${BASH_REMATCH[0]}"
53.0
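A variation on the BASH_REMATCH approach, offered as a sketch: keeping the regex in a variable avoids quoting surprises inside [[ ]], and a capture group picks out just the number. The variable names reading, re and value are illustrative, and the number is assumed to follow an = sign:

```bash
reading="temp=53.0'C"       # sample input from the question
re='=([0-9]+(\.[0-9]+)?)'   # group 1: digits with an optional fraction
value=''
if [[ $reading =~ $re ]]; then
  value=${BASH_REMATCH[1]}  # index 0 is the whole match, 1 the first group
fi
printf '%s\n' "$value"
```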

I also agree with Josh, but would improve the pattern match to consider the full range of floating point numbers. The optional single quote before the unit is needed for the sample string temp=53.0'C.
.*=[ ]*([0-9]*\.[0-9]+)'?[cC].*
If you do not understand the pattern above, take the time to find out. Learning pattern matching will be one of the most useful things you ever do.
Test your pattern with something like http://www.freeformatter.com/regex-tester.html and then tailor it for the platform you are using (e.g. Unix tools that use basic regular expressions, such as sed without -r, will need the parentheses escaped with a backslash).

Related

When processing string with Bash, how to treat comma differently depending on whether it's surrounded by some specific characters?

I would like to transform a MySQL script into a JSON file and was asked to use Bash for it.
By writing a simple shell script:
#!/bin/bash
# I know this script just outputs each entry with its value, because I haven't gone any further
for filename in $dir/home/*.sql
do
cat $filename | while read line
do
names=${line%values*}
names=${names#*(}
names=${names%)*}
values=${line#*values(}
values=${values%)*}
while [[ $names != $currentname ]]
do
currentname=${names%%,*}
currentvalue=${values%%,*}
echo $currentname
echo $currentvalue
names=${names#*,}
values=${values#*,}
done
done
done
I have been basically able to fulfill the requirement. However, there is one more problem.
Some of the string entries have commas among their characters.
This causes a mistake: my script treats these commas as the ones that separate values, and thus a string containing a comma is treated as two different strings.
It would be an easy task to solve this with programming languages like C++, but I have been asked to do this only with a Bash shell script, although I am not familiar with it. So now I am stuck with no clue. Maybe regular expressions would be the cure? Or if there are other approaches, please also help.
FYI, here is an example of the problem:
Input:
values(100, 'A100', 'A,100');
Expected output:
100
'A100'
'A,100'
Actual current output:
100
'A100'
'A
100'
Something like this may help:
data="values(100, 'A100', 'A,100');"
json=${data//values(}
json=${json//);}
json=${json//, /$'\n'}
echo "$json"
Output:
100
'A100'
'A,100'
Typically in shell you would match it with a regex:
echo "values(100, 'A100', 'A,100');" | sed 's/values(//; s/\(, \|);\)/\n/g'
but this does not solve the problem at all.
The best and only solution is to write a real parser for the real MySQL language to 'handle' '' ' ' 'all\tcorner\'cases' properly. Read the input char by char, store state (e.g. whether you are inside a quotation or not), and handle '\'' and other sequences such as \n as needed to extract the field. You might be interested in the MySQL internal lexer (it's big!) and the lex and yacc programs.
Check your scripts with http://shellcheck.net . Read https://mywiki.wooledge.org/BashFAQ/001 . Quote variable expansions. Don't get nominated for the Useless Use of Cat award.
Regarding "and was asked to use Bash for it":
Bash is a shell: its primary role is to run and connect other programs with each other. It is not a full-blown programming language, and writing programs in it is going to be very hard, or the script just ends up calling external programs, as that's what a shell is for. Write the parser in another language and use Bash to run it. If you're comfortable in C++, write it in C++, then compile and execute it from a Bash script.
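For the narrow case in the question, the "read char by char and store state" idea can still be sketched in plain Bash. This is only an illustration under strong assumptions: values are separated by commas, strings use single quotes, and there are no escaped or doubled quotes inside strings. The function name split_values is hypothetical:

```bash
# split_values: print each comma-separated value on its own line,
# ignoring commas that appear inside single-quoted strings.
split_values() {
  local input=$1 field='' in_quote=0 ch i
  for (( i = 0; i < ${#input}; i++ )); do
    ch=${input:i:1}
    if [[ $ch == "'" ]]; then
      in_quote=$(( 1 - in_quote ))   # toggle the "inside quotes" state
      field+=$ch
    elif [[ $ch == ',' && $in_quote -eq 0 ]]; then
      printf '%s\n' "$field"         # separator outside quotes ends a field
      field=''
    elif [[ $ch == ' ' && $in_quote -eq 0 && -z $field ]]; then
      :                              # skip padding after a separator
    else
      field+=$ch
    fi
  done
  printf '%s\n' "$field"             # emit the last field
}

split_values "100, 'A100', 'A,100'"
```

Anything beyond this (backslash escapes, '' inside strings, multi-line statements) quickly needs more state, which is the point the answer above is making.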
A common arrangement is to use regex for this, yes; for example, this is a requirement for parsing CSV files. But you can parse the line piece by piece like in your attempt.
However, you have a number of quoting errors which would prevent your code from working even if you figured out a way to parse the input the way you want to. (And of course, get rid of the Useless use of cat?)
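while read -r line; do
case $line in
*values\(*\)\; );;   # only handle lines that contain values(...);
*) continue;;
esac
line=${line#*values\(}   # drop everything up to and including "values("
line=${line%\)\;}        # drop the trailing ");"
while [ "$line" ]; do
case $line in
\'*)   # quoted value: take everything up to the closing quote
line=${line#\'}
tail=${line#*\'}
value=\'${line%"$tail"}
line=${tail#,}
line=${line# };;
*,*)   # unquoted value followed by more values
value=${line%%,*}
line=${line#*,}
line=${line# };;
*)     # last unquoted value: nothing is left afterwards
value=$line
line=;;
esac
echo "$value"
done
done <"$filename"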
This is probably not really the way to go, just a hint if you really want to try to tackle this in Bash. I would write a simple parser in Python if I wanted to cover all bases.

How to capitalize and replace characters in shell script in one echo

I am trying to find a way to capitalize and replace dashes of a string in one echo. I do not have the ability to use multiple lines for reassigning the string value.
For example:
string='test-e2e-uber' needs to echo $string as TEST_E2E_UBER
I currently can do one or the other by utilizing
${string^^} for capitalization
${string//-/_} for replacement
However, when I try to combine them it does not appear to work (bad substitution error).
Is there a correct syntax to achieve this?
echo ${string^^//-/_}
This does not directly answer your question, but the following script still achieves what you wanted:
declare -u string='test-e2e-uber'
echo ${string//-/_}
You can do that directly with the 'tr' command, in just one 'echo'
echo "$string" | tr "-" "_" | tr "[:lower:]" "[:upper:]"
TEST_E2E_UBER
Here the two conversions are chained with a pipe; 'tr' can also do both in a single call if the sets are combined, e.g. tr 'a-z-' 'A-Z_'.
Or you could do something similar with 'awk':
echo "$string" | awk '{gsub("-","_",$0)} {print toupper($0)}'
TEST_E2E_UBER
In this case, I'm replacing the hyphens with 'gsub', then printing the whole record in uppercase.
Why do you dislike having two successive assignment statements so much? If you really hate it, you will have to resort to some external program to do the task for you, such as
string=$(tr a-z- A-Z_ <<<"$string")
but I would consider it a waste of resources to create a child process for such a simple operation.
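One way to keep the call site on a single line without an external program is to hide the two expansions in a small function. This is a sketch that assumes Bash 4 or later (for ^^); the name to_env_name is made up for illustration:

```bash
# to_env_name: upper-case the argument and turn dashes into underscores.
to_env_name() {
  local s=${1^^}              # upper-case first (Bash 4+)
  printf '%s\n' "${s//-/_}"   # then replace every '-' with '_'
}

string='test-e2e-uber'
to_env_name "$string"
```

The two expansions cannot be nested in one ${...}, which is why the intermediate local variable is needed.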

Bash split an array, add a variable and concatenate it back together

I've been trying to figure this out, but unfortunately I can't. I am trying to create a function that finds the ';' character, puts four spaces before it, and then puts the code back together in a neat sentence. I've been working at this for a bit and can't figure out a couple of things. I can't get the output to display what I want it to. I've tried finding the index of the ';' character, and it seems I'm going about it the wrong way. The other mistake that I seem to be making is that I'm trying to split an array in a for loop, and then split the individual words in the array by letter, but I can't figure out how to do that either. If someone can give me a pointer, it would be greatly appreciated. This is in bash version 4.3.48.
commentPlacer()
{
arg=($1) #argument
len=${#arg[@]} #length of the argument
comment=; #character to look for in second loop
commaIndex=(${arg[@]#;}) #the attempted index look up
commentSpace="    ;" #the variable being concatenated into the array
for(( count1=0; count1 <= ${#arg[@]}; count1++ )) #search the argument looking for comment space
do if [[ ${arg[count1]} != commentSpace ]] #if no commentSpace variable then
then for (( count2=0; count2 < ${#arg[count1]} ; count2++ )) #loop through again
do if [[ ${arg[count2]} != comment ]] #if no comment
then A=(${arg[@]:0:commaIndex})
A+=(commentSpace)
A+=(${arg[@]commaIndex:-1}) #concatenate array
echo "$A"
fi
done
fi
done
}
If I understand what you want correctly, it's basically to put 4 spaces in front of each ";" in the argument, and print the result. This is actually simple to do in bash with a string substitution:
commentPlacer() {
echo "${1//;/    ;}"
}
The expansion here has the format ${variable//pattern/replacement}, and it gives the contents of the variable, with each occurrence of pattern replaced by replacement. Note that with only a single / before the pattern, it would replace only the first occurrence.
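To make the difference between / and // concrete, here is a small sketch; the variable names code, first_only and all are illustrative only:

```bash
code='a=1;b=2;c=3'

first_only=${code/;/    ;}   # single slash: only the first ';' is replaced
all=${code//;/    ;}         # double slash: every ';' is replaced

printf '%s\n' "$first_only"
printf '%s\n' "$all"
```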
Now, I'm not sure I understand how your script is supposed to work, but I see several things that clearly aren't doing what you expect them to do. Here's a quick summary of the problems I see:
arg=($1) #argument
This doesn't create an array of characters from the first argument. var=(...) treats the thing in ( ) as a list of words, not characters. Since $1 isn't in double-quotes, it'll be split into words based on whitespace (generally spaces, tabs, and linefeeds), and then any of those words that contain wildcards will be expanded to a list of matching filenames. I'm pretty sure this isn't at all what you want (in fact, it's almost never what you want, so variable references should almost always be double-quoted to prevent it). Creating a character array in bash isn't easy, and in general isn't something you want to do. You can access individual characters in a string variable with ${var:index:1}, where index is the character you want (counting from 0).
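If an index really is needed, the ${var:index:1} form mentioned above is enough to find it without building an array. A minimal sketch, with index_of as a hypothetical helper name:

```bash
# index_of STRING CHAR: print the 0-based position of the first CHAR
# in STRING, or return 1 if it does not occur.
index_of() {
  local s=$1 ch=$2 i
  for (( i = 0; i < ${#s}; i++ )); do
    if [[ ${s:i:1} == "$ch" ]]; then
      printf '%s\n' "$i"
      return 0
    fi
  done
  return 1
}

index_of 'x=1;y' ';'
```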
commaIndex=(${arg[@]#;}) #the attempted index look up
This doesn't do a lookup. The substitution ${var#pattern} gives the value of var with pattern removed from the front (if it matches). If there are multiple possible matches, it uses the shortest one. The variant ${var##pattern} uses the longest possible match. With ${array[@]#pattern}, it'll try to remove the pattern from each element -- and since it's not in double-quotes, the result of that gets word-split and wildcard-expanded as usual. I'm pretty sure this isn't at all what you want.
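A short sketch of what these prefix-removal forms actually produce; the sample values and variable names are made up for illustration:

```bash
path='a;b;c'
short=${path#*;}    # shortest match of '*;' removed from the front
long=${path##*;}    # longest match of '*;' removed from the front

parts=('x;1' ';2' 'y')
stripped=("${parts[@]#;}")   # a leading ';' is removed from each element

printf '%s\n' "$short" "$long" "${stripped[@]}"
```

Quoting the array expansion keeps each element intact, so no word splitting or wildcard expansion happens on the results.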
if [[ ${arg[count1]} != commentSpace ]] #if no commentSpace variable then
Here (and in a number of other places), you're using a variable without $ in front; this doesn't use the variable at all, it just treats "commentSpace" as a static string. Also, in several places it's important to have double-quotes around it, e.g. to keep the spaces in $commentSpace from vanishing due to word splitting. There are some places where it's safe to leave the double-quotes off, but in general it's too hard to keep track of them, so just use double-quotes everywhere.
General suggestions: don't try to write C (or Java or whatever) programs in bash; it works too differently, and you have to think differently. Use shellcheck.net to spot common problems (like non-double-quoted variable references). Finally, you can see what bash is doing by putting set -x before a section that doesn't do what you expect; that'll make bash print each command as it executes it, showing the equivalent of what it's executing.
Make a little function using pattern substitution on stdin:
semicolon4s() { while IFS= read -r x; do echo "${x//;/    ;}"; done; }
semicolon4s <<< 'foo;bar;baz'
Output:
foo    ;bar    ;baz

Split bash string by newline characters

I found this.
And I am trying this:
x='some
thing'
y=(${x//\n/})
And I had no luck, I thought it could work with double backslash:
y=(${x//\\n/})
But it did not.
To test whether I am getting what I want, I am doing:
echo ${y[1]}
Getting:
some
thing
Which I want to be:
some
I want y to be an array [some, thing]. How can I do this?
Another way:
x=$'Some\nstring'
readarray -t y <<<"$x"
Or, if you don't have bash 4, the bash 3.2 equivalent:
IFS=$'\n' read -rd '' -a y <<<"$x"
You can also do it the way you were initially trying to use:
y=(${x//$'\n'/ })
This, however, will not function correctly if your string already contains spaces, such as 'line 1\nline 2'. To make it work, you need to restrict the word separator before parsing it:
IFS=$'\n' y=(${x//$'\n'/ })
...and then, since you are changing the separator, you don't need to convert the \n to space anymore, so you can simplify it to:
IFS=$'\n' y=($x)
This approach will function unless $x contains a matching globbing pattern (such as "*") - in which case it will be replaced by the matched file name(s). The read/readarray methods require newer bash versions, but work in all cases.
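If neither readarray nor unquoted expansion is an option, a plain read loop is another possibility. This is a sketch, not the only way to do it: it should work on Bash 3.2 as well and never performs glob expansion on the data. The variable names are illustrative:

```bash
x=$'line 1\n*\nline 3'   # sample input with a space and a glob character

lines=()
while IFS= read -r line; do
  lines+=("$line")       # append each line as one array element
done <<<"$x"

printf '%s\n' "${#lines[@]}"
printf '[%s]\n' "${lines[@]}"
```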
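There is another way if all you want is the text up to the first line feed:
x='some
thing'
y=${x%%$'\n'*}
After that y will contain some and nothing else (no line feed).
What is happening here?
We perform a parameter expansion substring removal (${PARAMETER%%PATTERN}) that deletes the longest suffix matching the pattern: an ANSI C line feed ($'\n') followed by anything (*). The longest such suffix starts at the first line feed, so only the first line remains. With a single % the shortest suffix would be removed instead, which starts at the last line feed; that gives the same result for this two-line string, but not for strings with three or more lines.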
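With more than two lines, the single and double % forms of suffix removal give different results. A small sketch with made-up variable names:

```bash
x=$'one\ntwo\nthree'

first=${x%%$'\n'*}         # longest suffix removed: cut at the first line feed
all_but_last=${x%$'\n'*}   # shortest suffix removed: cut at the last line feed

printf '%s\n' "$first"
printf '%s\n' "$all_but_last"
```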

sed: Find pattern over two lines, not replace after that pattern

Wow, this one has really got me. Gonna need some tricky sed skill here I think. Here is the output value of command text I'm trying to replace:
...
fast
n : abstaining from food
The value I'd like to replace it with, is:
...
Noun
: abstaining from food
This turns out to be trickier than I thought, because 'fast' is listed a number of times and also appears in other places at the beginning of a line. So I came up with this to define the range:
sed '/fast/,/^ n : / s/fast/Noun/'
Which I thought would do, but unfortunately this doesn't end the replacement, and every later match in the rest of the output is also replaced with Noun. How can I get sed to stop replacing after the first match? Even better, can I match a two-line pattern and replace it?
Try this:
sed "h; :b; \$b ; N; /^${1}\n n/ {h;x;s//Noun\n/; bb}; \$b ; P; D"
Unfortunately, Paul's answer reads the whole file in, which makes any additional processing you might want to do difficult. This version reads the lines in pairs.
By enclosing the sed script in double quotes instead of single quotes, you can include shell variables such as positional parameters. I would recommend surrounding them with curly braces so they are set apart from the adjacent characters. When using double quotes, you'll have to be careful of the shell wanting to do its various expansions. In this example, I've escaped the dollar signs that signify the last line of the input file for the branch commands. Otherwise the shell will try to substitute the value of a variable $b which is likely to be null thus making sed unhappy.
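The quoting rule can be seen in isolation with a much smaller script. In this sketch, \$ reaches sed as its last-line address while ${1} and ${2} are expanded by the shell. The helper name replace_on_last_line is hypothetical, and its arguments are assumed not to contain / or other characters that are special to sed:

```bash
# replace_on_last_line PATTERN REPLACEMENT: substitute only on the last
# line of standard input.
replace_on_last_line() {
  sed "\$s/${1}/${2}/"   # \$ = sed's "last line"; ${1}, ${2} = shell variables
}

printf 'fast\nfast\n' | replace_on_last_line fast Noun
```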
Another technique would be to use single quotes and close and open them each time you have a shell variable:
sed 'h; :b; $b ; N; /^'${1}'\n n/ {h;x;s//Noun\n/; bb}; $b ; P; D'
# ↑open close↑ ↑open close↑
I'm assuming that the "[/code]" in your expected result is a typo. Let me know if it's not.
This seems to do what you want:
sed -e ':a;N;$!ba;s/fast\n n/Noun\n/'
I essentially stole the answer from here.
This might work for you:
sed '$!N;s/^fast\n\s*n :/Noun\n :/;P;D' file
...
Noun
: abstaining from food

Resources