Replace spaces with underscores via BASH - linux

Suppose I have a string, $str.
I want $str to be edited such that all the spaces in it are replaced by underscores.
Example
a="hello world"
I want the final output of
echo "$a"
to be hello_world

You could try the following:
str="${str// /_}"

$ a="hello world"
$ echo "${a// /_}"
hello_world
According to bash(1):
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and the
longest match of pattern against its value is replaced with string.
If pattern begins with /, all matches of pattern are replaced
with string. Normally only the first match is replaced. If
pattern begins with #, it must match at the beginning of the
expanded value of parameter. If pattern begins with %, it must match
at the end of the expanded value of parameter. If string is null,
matches of pattern are deleted and the / following pattern may be
omitted. If parameter is @ or *, the substitution operation is
applied to each positional parameter in turn, and the expansion is the
resultant list. If parameter is an array variable subscripted
with @ or *, the substitution operation is applied to each member of
the array in turn, and the expansion is the resultant list.
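A small sketch of the forms described in that excerpt (the string s is just example data): a single / replaces only the first match, // replaces all matches, /# and /% anchor the match to the start or end, and an empty replacement deletes the matches.

```bash
#!/usr/bin/env bash
s="one two three two"

first="${s/two/2}"       # only the first match:            one 2 three two
all="${s//two/2}"        # pattern starts with /, all matches: one 2 three 2
at_start="${s/#one/1}"   # must match at the beginning:     1 two three two
at_end="${s/%two/2}"     # must match at the end:           one two three 2
deleted="${s// /}"       # empty replacement deletes:        onetwothreetwo

printf '%s\n' "$first" "$all" "$at_start" "$at_end" "$deleted"
```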

Pure bash:
a="hello world"
echo "${a// /_}"
Or with tr (note that -s squeezes each run of spaces into a single underscore):
tr -s ' ' '_' <<< "$a"

With sed reading directly from a variable:
$ sed 's/ /_/g' <<< "$a"
hello_world
And to store the result you have to use the var=$(command) syntax:
a=$(sed 's/ /_/g' <<< "$a")
For completeness, with awk it can be done like this:
$ a="hello my name is"
$ awk 'BEGIN{OFS="_"} {for (i=1; i<NF; i++) printf "%s%s",$i,OFS; printf "%s\n", $NF}' <<< "$a"
hello_my_name_is
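A shorter awk alternative (a sketch, not from the original answer): assigning $1 to itself forces awk to rebuild the record with OFS between the fields. Unlike the loop above, this collapses runs of blanks into one underscore and drops leading/trailing blanks, because awk's default field splitting treats any run of whitespace as a single separator.

```bash
#!/usr/bin/env bash
a="hello my name is"
# $1 = $1 makes awk rebuild $0 with OFS between the fields;
# the lone 1 is an always-true pattern whose default action prints the record.
b=$(awk -v OFS=_ '{ $1 = $1 } 1' <<< "$a")
echo "$b"    # hello_my_name_is

# Runs of blanks count as one separator, so they become a single "_".
c=$(awk -v OFS=_ '{ $1 = $1 } 1' <<< "  hello    world ")
echo "$c"    # hello_world
```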

Multiple spaces to one underscore
This can easily be achieved with a GNU shell parameter expansion. In particular:
${parameter/pattern/string}
If pattern begins with /, all matches of pattern are replaced with string.
combined with the extglob pattern +(pattern-list):
Matches one or more occurrences of the given patterns.
Hence:
$ a='hello world    example'
$ echo ${a// /_}
hello_world____example
$ echo ${a//+( )/_}
hello_world_example
However, for this to work in a bash script, two amendments need to be made:
1. The parameter expansion should be wrapped in double quotes " " to prevent word splitting on the characters in $IFS.
2. The extglob shell option needs to be enabled with the shopt builtin so that extended pattern matching operators are recognised (it is normally off in scripts).
The bash script finally looks like this:
#!/usr/bin/env bash
shopt -s extglob
a='hello world    example'
echo "${a//+( )/_}"
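The same idea extends to other whitespace. A sketch, assuming you also want tabs collapsed, that uses the [[:space:]] character class inside +(...):

```bash
#!/usr/bin/env bash
shopt -s extglob   # must be enabled before the line using +(...) is parsed

a=$'hello world\t\texample'    # example input with a space and two tabs
b="${a//+([[:space:]])/_}"     # each run of spaces/tabs becomes one underscore
echo "$b"                      # hello_world_example
```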

Related

Extract email string from string in bash

I have a variable: $change.
I have tried to extract email from it (find the string between "by" and "#"):
change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
email=$(echo $change|sed -e 's/\by\(.*\)#/\1/')
It did not work.
You have a backslash before b, which turns it into \b: a word boundary in GNU sed, which is not what you want here.
See the difference:
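$ echo "$change" | sed -e 's/\by\(.*\)#/\1/'
# \b is a word boundary, so this looks for a word starting with "y": there is none, nothing changes
Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'
$ echo "$change" | sed -e 's/by\(.*\)#/\1/'
# plain "by" now matches
Change 1234 on 2016/08/31  namecompany.com 'cdex abcd'
# "by" and "#" are gone; everything between them is kept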
To also drop everything before the name, use .* to match everything up to by:
$ echo "$change" | sed -e 's/.*by\(.*\)#/\1/'
namecompany.com 'cdex abcd'
Finally, if what you want is just the data between by (note the trailing space) and #, use either of these (with -r, which enables extended regular expressions, you don't have to escape the grouping parentheses):
sed -e 's/.*by \(.*\)#.*/\1/'
sed -r 's/.*by (.*)#.*/\1/'
With your input:
$ sed -e 's/.*by \(.*\)#.*/\1/' <<< "Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
name
Using grep -oP you can use match reset \K:
grep -oP ' by \K[^#]*' <<< "$change"
name
or using lookbehind:
grep -oP '(?<= by )[^#]*' <<< "$change"
name
There is no need to resort to sed, awk, grep, etc.; use Bash's built-in regular expression matching:
[[ $change =~ by\ ([^#]*)# ]] && email=${BASH_REMATCH[1]}
From the man page
An additional binary operator, =~, is available, with the same
precedence as == and !=. When it is used, the string to the
right of the operator is considered an extended regular expression
and matched accordingly (as in regex(3)). The return value
is 0 if the string matches the pattern, and 1 otherwise. If the
regular expression is syntactically incorrect, the conditional
expression's return value is 2. If the shell option nocasematch
is enabled, the match is performed without regard to the case of
alphabetic characters. Any part of the pattern may be quoted to
force the quoted portion to be matched as a string. Bracket
expressions in regular expressions must be treated carefully,
since normal quoting characters lose their meanings between
brackets. If the pattern is stored in a shell variable, quoting
the variable expansion forces the entire pattern to be matched
as a string. Substrings matched by parenthesized subexpressions
within the regular expression are saved in the array variable
BASH_REMATCH. The element of BASH_REMATCH with index 0 is the
portion of the string matching the entire regular expression.
The element of BASH_REMATCH with index n is the portion of the
string matching the nth parenthesized subexpression.
It may be surprising that the pattern is written without surrounding quotes (quoted parts would be matched literally), which is why it is probably a good idea to store the pattern in a variable instead:
regex='by ([^#]*)#'
[[ $change =~ $regex ]] && email=${BASH_REMATCH[1]}
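To see why quoting matters on the right-hand side of =~, here is a sketch with toy strings: a quoted pattern (or a quoted variable) is matched as a literal string, while an unquoted variable is treated as a regular expression.

```bash
#!/usr/bin/env bash
# Quoted pattern: literal string, so "." is just a dot
if [[ "abc" =~ "a.c" ]]; then r1=match; else r1=nomatch; fi    # nomatch

# Unquoted variable: regex, so "." matches any character
re='a.c'
if [[ "abc" =~ $re ]]; then r2=match; else r2=nomatch; fi      # match

# Quoted variable: the whole pattern is literal again
if [[ "abc" =~ "$re" ]]; then r3=match; else r3=nomatch; fi    # nomatch

echo "$r1 $r2 $r3"   # nomatch match nomatch
```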
With sed:
sed -E 's/.* by ([^#]+).*/\1/' <<<"$change"
With awk:
awk -F# '{sub(".* ", "", $1); print $1}' <<<"$change"
Example:
$ sed -E 's/.* by ([^#]+).*/\1/' <<<"Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
name
$ awk -F# '{sub(".* ", "", $1); print $1}' <<<"Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
name
An awk version: this uses awk's built-in split function to split the 6th field on "#", storing the pieces in an array named a, and then prints the first element, a[1].
echo "$change" | awk '{ split($6, a, "#"); print a[1] }'
name
If you need the complete email address instead:
echo "$change" | awk '{print $6}'
name#company.com
Solution with Parameter Expansion
First, create a temporary variable with everything up to and including "by " (by plus a space) removed:
$ change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
$ tmp="${change#*by }"
$ echo "$tmp"
name#company.com 'cdex abcd'
Then, extract either the string before #
$ email="${tmp%#*}"
$ echo "$email"
name
Or extract the complete email address:
$ email="${tmp%% *}"
$ echo "$email"
name#company.com
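One caveat with this approach: if "by " does not occur, ${change#*by } leaves the string unchanged, so the next step quietly returns a wrong value instead of failing. A minimal sketch of a guard, wrapped in a hypothetical function extract_name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the text between "by " and "#", or return 1 if "by " is absent.
extract_name() {
  local s=$1 tmp
  [[ $s == *"by "* ]] || return 1   # otherwise ${s#*by } would return $s unchanged
  tmp=${s#*by }                      # drop everything up to and including the first "by "
  printf '%s\n' "${tmp%%#*}"         # drop everything from the first "#" on
}

extract_name "Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"   # name
extract_name "no marker here" || echo "not found"
```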
Edit:
To extract multiple names and join them with commas:
$ change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
$ email=$(echo "$change" | perl -ne 'print join(",",/(\S+)#/g)')
$ echo "$email"
name
$ change="by name#company.com asd abcd#xyz.net 123 tom#xyz asdf"
$ email=$(echo "$change" | perl -ne 'print join(",",/(\S+)#/g)')
$ echo "$email"
name,abcd,tom
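If you prefer to stay in pure Bash, the same extraction can be sketched with a =~ loop that consumes the string one match at a time. The regex ([^[:space:]#]+)# is an approximation of Perl's (\S+)# (it additionally excludes # from the captured run), and the variable names are illustrative:

```bash
#!/usr/bin/env bash
change="by name#company.com asd abcd#xyz.net 123 tom#xyz asdf"
names=()
rest=$change
re='([^[:space:]#]+)#'
while [[ $rest =~ $re ]]; do
  names+=("${BASH_REMATCH[1]}")          # the captured part before "#"
  rest=${rest#*"${BASH_REMATCH[0]}"}     # drop text up to and including this match
done
email=$(IFS=,; echo "${names[*]}")       # join the captures with commas
echo "$email"                            # name,abcd,tom
```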

Bash regexp to find part of string

I have a string like setSuperValue('sdfsdfd') and I need to get the 'sdfsdfd' value from this line. What is the way to do this?
First I find the line containing setSuperValue, which leaves me with only the string holding my target content: setSuperValue('sdfsdfd'). How do I build a regexp to get sdfsdfd from this line?
This should help you
grep setSuperValue myfile.txt | grep -o "'.*'" | tr -d "'"
grep -o prints only the matching text, here everything from the first ' to the last ' on the line, including both quotes. tr then deletes the quotes.
You could also use cut:
grep setSuperValue myfile.txt | cut -d"'" -f2
Or awk:
grep setSuperValue myfile.txt | awk -F "'" '{print $2}'
Both split the line at the single quotes and return the second field, which is the value you are looking for.
Generally, to locate a string in multiple lines of data, external utilities will be much faster than looping over lines in Bash.
In your specific case, a single sed command will do what you want:
sed -n -r "s/^.*setSuperValue\('([^']+)'\).*$/\1/p" file
Extended (-r) regular expression ^.*setSuperValue\('([^']+)'\).*$ matches any line containing setSuperValue('...') as a whole, captures whatever ... is in capture group \1, replaces the input line with that, and prints (p) the result.
Due to option -n, nothing else is printed.
Move the opening and closing ' inside (...) to include them in the captured value.
Note: If the input file contains multiple setSuperValue('...') lines, the command will print every match; either way, the command will process all lines.
To only print the 1st match and stop processing immediately after, modify the command as follows:
sed -n -r "/^.*setSuperValue\('([^']+)'\).*$/ {s//\1/;p;q}" file
/.../ only matches lines containing setSuperValue('...'), causing the following {...} to be executed only for matching lines.
s// (i.e., not specifying a regex) implicitly reuses the regex that matched the line at hand; p prints the result, and q quits processing altogether, so processing stops once the first match is found.
If you have already located a line of interest through other methods and are looking for a pure Bash method of extracting a substring based on a regex, use =~, Bash's regex-matching operator, which supports extended regular expressions and capture groups through the special BASH_REMATCH array variable:
$ sampleLine="... setSuperValue('sdfsdfd') ..."
$ [[ $sampleLine =~ "setSuperValue('"([^\']+)"')" ]] && echo "${BASH_REMATCH[1]}"
sdfsdfd
Note the careful quoting of the parts of the regex that should be taken literally, and how ${BASH_REMATCH[1]} refers to the first (and only) captured group.
You can also parse the value from each line in a read loop, using expr's : (regex match) operator. Note that expr is an external utility, not a Bash parameter expansion:
#!/bin/bash
while read -r line; do
value=$(expr "$line" : ".*setSuperValue('\(.*\)')")
if [ "x$value" != "x" ]; then
printf "value : %s\n" "$value"
fi
done <"$1"
Test Input
$ cat dat/supervalue.txt
setSuperValue('sdfsdfd')
something else
setSuperValue('sdfsdfd')
something else
setSuperValue('sdfsdfd')
something else
Example Output
$ bash parsevalue.sh dat/supervalue.txt
value : sdfsdfd
value : sdfsdfd
value : sdfsdfd
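For a version that uses only parameter expansion/substring removal (no expr), here is a sketch under the assumption that each line holds at most one setSuperValue('...') call. parse_values is a hypothetical function name; it reads the file on stdin (e.g. parse_values < "$1"):

```bash
#!/usr/bin/env bash
parse_values() {
  local line value
  # IFS= keeps leading blanks; the || test also handles a last line without a newline
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line == *"setSuperValue('"*"')"* ]] || continue   # skip non-matching lines
    value=${line#*"setSuperValue('"}   # remove everything through setSuperValue('
    value=${value%%"')"*}              # remove everything from the first ') on
    printf 'value : %s\n' "$value"
  done
}

printf '%s\n' "setSuperValue('sdfsdfd')" "something else" "x setSuperValue('abc') y" | parse_values
```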

lowercase + capitalize + concatenate words of a string in shell (e.g. bash)

How to capitalize+concatenate words of a string?
(first letter uppercase and all other letters lowercase)
example:
input = "jAMeS bOnD"
output = "JamesBond"
String manipulation available in bash version 4:
${variable,,} to lowercase all letters
${variable^} to uppercase the first letter (applied to an array, e.g. ${words[@]^}, it uppercases the first letter of each element)
use ${words[*]^} instead of ${words[@]^} to save some script lines
And other improvements from mklement0 (see his comments):
Variable names in lower-case because upper-case ones may conflict with environment variables
Give meaningful names to variables (e.g. ARRAY -> words)
Use local to avoid impacting IFS outside the function (once is enough)
Use local for all other local variables (a variable can be declared first and assigned later)
ARRAY=( $LOWERCASE ) may expand globs (filename wildcards)
temporarily disable Pathname Expansion using set -f or shopt -so noglob
or use read -ra words <<< "$input" instead of words=( $input )
Ultimate function:
capitalize_remove_spaces()
{
local words IFS
read -ra words <<< "${@,,}"
IFS=''
echo "${words[*]^}"
}
To also strip hyphens and common punctuation (keeping essentially alphanumeric characters), extend the IFS built-in variable just before the read -ra words operation:
capitalize_remove_punctuation()
{
local words IFS=$' \t\n-\'.,;!:*?' #Handle hyphenated names and punctuation
read -ra words <<< "${@,,}"
IFS=''
echo "${words[*]^}"
}
Examples:
> capitalize_remove_spaces 'jAMeS bOnD'
JamesBond
> capitalize_remove_spaces 'jAMeS bOnD *'
JamesBond*
> capitalize_remove_spaces 'Jean-luc GRAND-PIERRE'
Jean-lucGrand-pierre
> capitalize_remove_punctuation 'Jean-luc GRAND-PIERRE'
JeanLucGrandPierre
> capitalize_remove_punctuation 'Jean-luc GRAND-PIERRE *'
JeanLucGrandPierre
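As a quick illustration of the Bash 4 operators used above (a sketch; the variable names are illustrative): ,, and ^^ change the whole value, while a single ^ only touches the first character, unless it is applied to every element of an array.

```bash
#!/usr/bin/env bash
# Requires Bash 4+ for the ,, ^ and ^^ case-modification operators.
v="jAMeS bOnD"
lower=${v,,}                 # james bond
upper=${v^^}                 # JAMES BOND
first=${lower^}              # James bond  (first character of the whole value only)
read -ra words <<< "$lower"  # words=(james bond)
caps=("${words[@]^}")        # first character of each element: (James Bond)
joined=$(IFS=; echo "${caps[*]}")   # elements joined with no separator
printf '%s\n' "$lower" "$upper" "$first" "$joined"
```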
Here's a bash 3+ solution that utilizes tr for case conversion (the case conversion operators (,, ^, ...) were introduced in bash 4):
input="jAMeS bOnD"
read -ra words <<<"$input" # split input into an array of words
output="" # initialize output variable
for word in "${words[@]}"; do # loop over all words
# add capitalized 1st letter
output+="$(tr '[:lower:]' '[:upper:]' <<<"${word:0:1}")"
# add lowercase version of rest of word
output+="$(tr '[:upper:]' '[:lower:]' <<<"${word:1}")"
done
echo "$output" # JamesBond
Note:
Concatenation (removal of whitespace between words) happens implicitly by always directly appending to the output variable.
It's tempting to want to use words=( $input ) to split the input string into an array of words, but there's a gotcha: the string is subject to pathname expansion, so if a word happens to be a valid glob (e.g., *), it will be expanded (replaced with matching filenames), which is undesired; using read -ra to create the array avoids this problem (-a reads into an array, -r turns off interpretation of \ chars. in the input).
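A small demonstration of that gotcha (a sketch: it creates two throwaway files with arbitrary names in a fresh temporary directory so the glob has something to match):

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1   # fresh, empty scratch directory
touch a.txt b.txt             # two arbitrary files for * to match
input="jAMeS *"
unsafe=( $input )             # unquoted: * is expanded to a.txt b.txt
read -ra safe <<< "$input"    # read -ra: * stays a literal word
echo "${#unsafe[@]} ${#safe[@]}"   # 3 2
```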
From other posts, I came up with this working script:
str="jAMeS bOnD"
res=""
split=`echo $str | sed -e 's/ /\n/g'` # Split with space as delimiter
for word in $split; do
word=${word,,} # Lowercase
word=${word^} # Uppercase first letter
res=$res$word # Concatenate result
done
echo $res
References:
Converting string to lower case in Bash shell scripting
How do I split a string on a delimiter in Bash?
Troubleshooting bash script to capitalize first letter in every word
Using awk is a little verbose but does the job:
s="jAMeS bOnD"
awk '{for (i=1; i<=NF; i++)
printf "%s%s", toupper(substr($i, 1, 1)), tolower(substr($i, 2)); print ""}' <<< "$s"
JamesBond
echo -e '\n' "!!!!! PERMISSION to WRITE in /var/log/ DENIED !!!!!"
echo -e '\n'
echo "Do you want to continue?"
echo -e '\n' "Yes or No"
read -p "Please Respond_: " Response #get input from keyboard "yes/no"
#Capitalizing 'yes/no' with: echo "$Response" | awk '{print toupper($0)}' or echo "$Response" | tr '[:lower:]' '[:upper:]'
answer=$(echo "$Response" | awk '{print toupper($0)}')
case $answer in
NO)
echo -e '\n' "Quitting..."
exit 1
;;
YES)
echo -e '\n' "Proceeding..."
;;
esac
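The same prompt can be sketched without the awk subprocess by using Bash 4's ${var^^} to uppercase the answer, plus a default branch for anything that is neither yes nor no. The function name confirm and its return codes are hypothetical choices:

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 for yes/y, 1 for no/n, 2 for anything else (any letter case).
confirm() {
  local response
  read -r -p "Do you want to continue? (yes/no): " response
  case ${response^^} in      # ^^ uppercases the whole answer (Bash 4+)
    YES|Y) return 0 ;;
    NO|N)  return 1 ;;
    *)     return 2 ;;
  esac
}
# Usage: if confirm; then echo "Proceeding..."; else echo "Quitting..."; exit 1; fi
```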

Any way to accept special characters without delimiters in a shell script?

I'm tasked with making a shell script that swaps 2 strings and then outputs a file. The commands are similar to:
sed s/search_for/replace/g output.txt > temp.dat
mv temp.dat output.txt
The script works like this:
./myScript var_A var_B output.file
Which I got to work fine. The second part does the same thing, but I must treat the following special characters as regular strings:
[ ] ^ * + . $ \ -
I have a general idea of how I want to tackle this (this may be the wrong way). I want to accept those characters and store them in variables with a \ prepended.
var_A=\\$1
var_B=\\$2
My issue is with the * (asterisk) and \ (backslash) characters. I'm using a simple test script to see what parameters I can easily convert to a variable:
for i in "$@"
do
echo "$i"
done
But the * char shows all the files in the directory and \ shows the next argument. I know about set -o noglob and set -f, but those will not work for me (and doesn't work on the script). I also know that you can escape using a backslash but I can't use that either. I must be able to take a special character (even * and /) and convert to a string. I hope this all makes sense and someone can help me.
If I understand correctly, you put patterns in variables, then you use these variables in sed, and you need to treat the patterns as literal strings, without their special meaning in regular expressions?
If so, then before passing the patterns to sed, you need to escape the special symbols. Here's a possible implementation with my tests:
#!/bin/sh
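escaped() {
# Put a backslash before each special character. The "-" is last in the bracket
# expression so it is literal; in the middle ("+-[") it would form a range that
# also matches digits and uppercase letters.
printf '%s\n' "$1" | sed -e 's/[].+[$\\^*-]/\\&/g'
}
set -- [ ] ^ \* + . \$ \\ -
for pat1; do
pat2=$(escaped "$pat1")
echo "$pat1 was $pat1" | sed -e "s/$pat2/_/"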
done
The escaped function takes the argument and puts a backslash in front of special characters. The loop demonstrates that the pat2 variable generated this way correctly matches the special characters in the input string.
If you want to perform literal replacements, sed is the wrong tool for the job.
See the awk script given in http://mywiki.wooledge.org/BashFAQ/021. Quoted here:
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
[[ $1 ]] || return
awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
BEGIN { len = length(str); }
{
out = "";
while (i = index($0, str)) {
out = out substr($0, 1, i-1) rep;
$0 = substr($0, i + len);
}
out = out $0;
print out;
}
'
}
...which can be used as...
tempfile=$(mktemp "$file.XXXXXX")
gsub_literal "$search" "$rep" \
<"$file" \
>"$tempfile" && \
mv -- "$tempfile" "$file"
with absolutely any values for $search and $rep.
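When the text is already in a Bash variable, a literal replacement is also possible without awk: quoting the pattern inside ${var//pattern/string} switches off its glob meaning, and quoting the replacement keeps & literal under Bash 5.2's patsub_replacement option. A sketch with made-up values:

```bash
#!/usr/bin/env bash
search='*.[ch]'     # example search string full of glob metacharacters
rep='[&]'           # example replacement containing &
line='files: *.[ch] and *.[ch]'
# Quoted pattern: matched literally. Quoted replacement: & is not special (Bash 5.2+).
result=${line//"$search"/"$rep"}
printf '%s\n' "$result"   # files: [&] and [&]
```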
Perl is also well-suited for operations of this type, having in-line replace functionality and (unlike sed) the ability to refer directly to its argv array or environment variables for literal search or replacement values.
You have to quote your patterns on the shell's command line. You can't work around that.
Perl regular expressions provide quotemeta (usable inline as \Q...\E), which treats every character as literal:
perl -e '
$str = q{this is a string with **emphasis**};
$pattern = q{**emphasis**};
$repl = "characters";
$str =~ s/$pattern/$repl/;
print $str
'
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE *emphasis**/ at -e line 5.
but
perl -e '
$str = q{this is a string with **emphasis**};
$pattern = q{**emphasis**};
$repl = "characters";
$str =~ s/\Q$pattern\E/$repl/;
#.........^^
print $str
'
this is a string with characters

How to extract last part of string in bash?

I have this variable:
A="Some variable has value abc.123"
I need to extract this value, i.e. abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
Shortest trailing match on " " space
echo "${A% *}"
Some variable has value
Shortest trailing match on . dot
echo "${A%.*}"
Some variable has value abc
Longest trailing match on " " space
echo "${A%% *}"
Some
Read more: "Shell Parameter Expansion" in the Bash Reference Manual.
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
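The four forms side by side on the example string (a sketch; the variable names are just descriptive labels):

```bash
#!/usr/bin/env bash
A="Some variable has value abc.123"
drop_last=${A% *}     # Some variable has value    (shortest trailing " *")
first_only=${A%% *}   # Some                       (longest trailing " *")
drop_first=${A#* }    # variable has value abc.123 (shortest leading "* ")
last_only=${A##* }    # abc.123                    (longest leading "* ")
printf '%s\n' "$drop_last" "$first_only" "$drop_first" "$last_only"
```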
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
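A sketch of these path idioms together, including the my.path/noext pitfall and one possible guard against it (the guard is an illustrative choice, not part of the original answer):

```bash
#!/usr/bin/env bash
p=/usr/bin/git
name=${p##*/}          # git        (like basename)
dir=${p%/*}            # /usr/bin   (like dirname, for this input)
f=archive.tar.gz
one_ext=${f%.*}        # archive.tar (shortest match: last extension only)
all_ext=${f%%.*}       # archive     (longest match: from the first dot on)
q=my.path/noext
naive=${q%.*}          # my  (the dot in the directory name was matched)
# Guard: only strip an extension if the last path component contains a dot.
base=${q##*/}
if [[ $base == *.* ]]; then safe=${q%.*}; else safe=$q; fi   # my.path/noext
printf '%s\n' "$name" "$dir" "$one_ext" "$all_ext" "$naive" "$safe"
```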
How do you know where the value begins? If it's always the 5th and 6th words, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here, this is a very clean method that works on most Unix-like systems (rev is widely available, though it is not part of POSIX). Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding match pattern" (which is [^ ]), and the $ means "match the end of the line." So, this matches the last word after the last space through to the end of the line; i.e. abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements split on the characters in IFS (the Internal Field Separator), which by default are space, tab, and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What makes this pattern so useful is that it also lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
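As a compact variant (a sketch; n is just a helper variable), note that array subscripts and slice lengths are arithmetic contexts, so the intermediate variables can be folded in. This also avoids the negative subscript [-1], which older Bash versions such as 3.2 reject:

```bash
#!/usr/bin/env bash
A="Some variable has value abc.123"
read -r -a A_array <<< "$A"
n=${#A_array[@]}
last=${A_array[n-1]}                  # abc.123
all_but_last="${A_array[*]:0:n-1}"    # Some variable has value (joined with spaces)
printf '%s\n' "$last" "$all_but_last"
```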
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually space and tab). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'
