Bash: remove substrings using ${string//substr/rep} - string

I am trying to write a small shell script that reads a text file (given as an argument), deletes all invalid Base64 characters, and then decodes the resulting Base64 string into readable text.
For this example I can assume that I have a valid Base64 string polluted with additional invalid characters, so simply deleting them makes the string valid again.
I am having problems with the "remove all invalid chars" part.
Here is my Script:
#!/bin/bash
args=("$@")
#echo ${args[0]}
# read file
STRING="$(cat ${args[0]})"
echo "Input:"
echo $STRING
echo "\n"
#BASE64_REGEX='!/[^A-Za-z0-9+\/=]/'
STRING=${STRING//[!?_-]/}
echo "Fixed:"
echo $STRING
echo "\n"
# decode String
DECODED=$(base64 -d <<< "$STRING")
echo "Decoded:"
echo $DECODED
echo "\n"
I think my problem is this part: STRING=${STRING//[!?_-]/}. After this operation the string contains ??___--- plus a line break, so I must be close.
EDIT:
This is the example string. I am trying to remove all characters that are NOT part of the Base64 alphabet.
!RGllIGVpbnppZ2VuIFNvbmRlc??nplaWNoZW4gaW0gQmFzZTY0IEFscGhhYmV0IHNpbmQgIisg_L_y_A9Ii4gQWxsZSB3ZWl0ZXJlbi-B-T-b25kZXJ6!ZWljaGVuICIhIsKnJCUiIGtvbW1!lbiBkb3J0IG5pY2h0IHZvci"4=
Thanks for your help!

That's because a ! in the first position of a bracket expression inverts the set, just like ^ (note: this is only true for glob pattern matching, not regex matching, but parameter expansion uses pattern matching).
Maybe you want:
STRING=${STRING//[?\!_-]/}
Or why not use the set from your commented-out line:
STRING=${STRING//[^A-Za-z0-9+\/=]/}
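As a minimal sketch of the second suggestion (the helper name clean_base64 and the sample string are made up for illustration), the negated set can be wrapped in a function and checked against a short polluted string:

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete every character outside the Base64 alphabet.
clean_base64() {
    local s=$1
    # In a glob bracket expression, a leading ^ (or !) negates the set.
    printf '%s' "${s//[^A-Za-z0-9+\/=]/}"
}

# Made-up sample: Base64 for "Hello, world" with !, _, - and ? mixed in.
polluted='!SGVs_bG8-sIH?dvcmxk'
cleaned=$(clean_base64 "$polluted")
printf '%s\n' "$cleaned"    # SGVsbG8sIHdvcmxk
```

Piping the cleaned value to base64 -d, as in the question's script, should then decode it.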

Related

Way to replace one variable with another in a string

I need to replace one variable with another variable in a multiple strings.
For example:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in string1 string2 string3; do
x="$(echo "$str" | sed 's/[a-zA-Z]//g')" # extracting a character between letters
sed 's/$x/$y/'$str # I tried this, but it does not work at all.
echo "$str"
done
Expecting output:
One;two
three;four
five;six
In my output, nothing changes:
One,two
three.four
five:six
You can use bash's substitution operator instead of sed. And simply replace anything that isn't a letter with $y.
#!/bin/bash
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "$string1" "$string2" "$string3"; do
x=${str//[^a-zA-Z]/$y}
echo "$x"
done
Output is:
One;two
three;four
five;six
Note that your general approach wouldn't work if the input string has multiple delimiters, e.g. One,two,three. When you remove all the letters you get ,, (two commas), but that sequence doesn't appear anywhere in the string.
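Two details of pattern substitution are worth a quick sketch (the sample strings are made up): every matching character is replaced individually, and glob patterns have no regex-style + quantifier, so collapsing a run of delimiters needs bash's extglob option.

```bash
#!/usr/bin/env bash
y=";"

# Each non-letter is replaced on its own, so several delimiters are fine.
str="One,two,three"
replaced=${str//[^a-zA-Z]/$y}
echo "$replaced"     # One;two;three

# A run of non-letters yields one $y per character ...
str2="One, two"
per_char=${str2//[^a-zA-Z]/$y}
echo "$per_char"     # One;;two

# ... unless extglob's +(...) is used to match the whole run at once.
shopt -s extglob
run='+([^a-zA-Z])'   # pattern kept in a variable; expanded unquoted below
collapsed=${str2//$run/$y}
echo "$collapsed"    # One;two
```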
Addressing issues with OP's current code:
referencing variables requires a leading $, preferably a pair of {}, and (usually) double quotes (e.g., to ensure embedded spaces are treated as part of the variable's value)
sed can take as input a) a stream of text on stdin, b) a file, c) process substitution or d) a here-document/here-string
when building a sed script that includes variable references, the sed script must be wrapped in double quotes (not single quotes)
Pulling all of this into OP's current code we get:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do # proper references of the 3x "stringX" variables
x="$(echo "$str" | sed 's/[a-zA-Z]//g')"
sed "s/$x/$y/" <<< "${str}" # feeding "str" as here-string to sed; allowing variables "x/y" to be expanded in the sed script
echo "$str"
done
This generates:
One;two # generated by the 2nd sed call
One,two # generated by the echo
;hree.four # generated by the 2nd sed call
three.four # generated by the echo
five;six # generated by the 2nd sed call
five:six # generated by the echo
OK, so we're now getting some output but there are obviously some issues:
the results of the 2nd sed call are being sent to stdout/terminal as opposed to being captured in a variable (presumably the str variable - per the follow-on echo ???)
for string2 we find that x=. which when plugged into the 2nd sed call becomes sed "s/./;/"; from here the . matches the first character it finds which in this case is the 1st t in string2, so the output becomes ;hree.four (and the . is not replaced)
dynamically building sed scripts without knowing what's in x (and y) becomes tricky without some additional coding; instead it's typically easier to use parameter substitution to perform the replacements for us
in this particular case we can replace both sed calls with a single parameter substitution (which also eliminates the expensive overhead of two subprocesses for the $(echo ... | sed ...) call)
Making a few changes to OP's current code we can try:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do
x="${str//[^a-zA-Z]/${y}}" # parameter substitution; replace everything *but* a letter with the contents of variable "y"
echo "${str} => ${x}" # display old and new strings
done
This generates:
One,two => One;two
three.four => three;four
five:six => five;six
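The caution about not knowing what is in x carries over to parameter substitution in a milder form: an unquoted pattern is treated as a glob, while a quoted one is matched literally. A small sketch with made-up values:

```bash
#!/usr/bin/env bash
str="a*b"
x="*"      # a glob metacharacter, stored in a variable
y=";"

as_pattern=${str/$x/$y}      # unquoted: * is a wildcard and matches "a*b"
as_literal=${str/"$x"/$y}    # quoted: * matches only a literal asterisk

echo "$as_pattern"   # ;
echo "$as_literal"   # a;b
```

Quoting the pattern variable is therefore the safer default when its contents are arbitrary text.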

How to extract a substring from a string stored in a variable, based on a start / stop character

In the first line I'm after the values 64 and F2DD65.
I want to get the first value by reading from the beginning of the string up to the : character, and the second by reading the 6 characters after the # character.
Is this possible?
This is the string:
var="64: (242,221,101) #F2DD65 srgb(242,221,101)"
my end result would be stored in variables:
var1="64"
var2="F2DD65"
var1=${var%%:*}
var2=${var##*#}
var2=${var2%% *}
Reference: Shell Parameter Expansion.
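To spell out what each expansion does, here is a commented sketch on the string from the question. The intermediate variable rest is only for illustration, and the substring form is a variant that takes exactly the six characters the question asks for:

```bash
#!/usr/bin/env bash
var="64: (242,221,101) #F2DD65 srgb(242,221,101)"

var1=${var%%:*}      # %% drops the longest suffix matching ":*"   -> 64
rest=${var##*#}      # ## drops the longest prefix matching "*#"   -> F2DD65 srgb(242,221,101)
var2=${rest:0:6}     # substring expansion: 6 characters from offset 0

echo "$var1 $var2"   # 64 F2DD65
```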
sed -rn 's/(^.*)(\:.*#)(.*)([[:space:]].*$)/\1 - \3/p' <<< "64: (242,221,101) #F2DD65 srgb(242,221,101)"
With sed, split the line into sections using extended regular expressions (-r). Replace the line with the relevant sections (the first and the third, separated by a -).
awk -F '[:# ]' '{ print $1" - "$5 }' <<< "64: (242,221,101) #F2DD65 srgb(242,221,101)"
With awk, split the line based on a :, a # and a space as delimiters. Print the 1st and 5th delimited fields with a - in between.
With bash regular expressions:
var="64: (242,221,101) #F2DD65 srgb(242,221,101)"
re="^([^:]+): .* #([[:xdigit:]]+)"
if [[ $var =~ $re ]]; then
var1="${BASH_REMATCH[1]}"
var2="${BASH_REMATCH[2]}"
else
# String isn't the right format
echo Fail
fi

How can I truncate a line of text longer than a given length?

How would you go about removing everything after x number of characters? For example, cut everything after 15 characters and add ... to it.
This is an example sentence should turn into This is an exam...
GNU head can work on characters (strictly, bytes) rather than lines:
head -c 15 <<<'This is an example sentence'
Bear in mind, though, that head -c only deals with bytes, so it can split multi-byte characters such as the UTF-8 encoded umlaut ü.
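The byte-versus-character difference can be seen in a short sketch. It assumes UTF-8 text; the bytes of ü are written out explicitly so the example does not depend on the encoding of the script file:

```bash
#!/usr/bin/env bash
u=$'\xc3\xbc'          # the two bytes that encode "ü" in UTF-8
str="${u}ber"          # "über": 4 characters, 5 bytes

# Number of bytes in the string (arithmetic expansion trims wc's padding).
byte_len=$(( $(printf '%s' "$str" | wc -c) ))

# head -c 1 keeps only the first byte: half of "ü", not a whole character.
first_byte=$(printf '%s' "$str" | head -c 1 | od -An -tx1 | tr -d ' \n')

echo "$byte_len"       # 5
echo "$first_byte"     # c3
```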
Bash built-in string indexing works:
str='This is an example sentence'
echo "${str:0:15}"
Output:
This is an exam
And finally something that works with ksh, dash, zsh…:
printf '%.15s\n' 'This is an example sentence'
Even programmatically:
n=15
printf '%.*s\n' $n 'This is an example sentence'
If you are using Bash, you can directly assign the output of printf to a variable and save a sub-shell call with:
trim_length=15
full_string='This is an example sentence'
printf -v trimmed_string '%.*s' $trim_length "$full_string"
Use sed:
echo 'some long string value' | sed 's/\(.\{15\}\).*/\1.../'
Output:
some long strin...
This solution has the advantage that short strings do not get the ... tail added:
echo 'short string' | sed 's/\(.\{15\}\).*/\1.../'
Output:
short string
So it's one solution for all sized outputs.
Use cut:
echo "This is an example sentence" | cut -c1-15
This is an exam
This selects characters 1-15 (-c is specified in characters, to handle multi-byte characters); cf. cut(1):
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters
Awk can also accomplish this:
$ echo 'some long string value' | awk '{print substr($0, 1, 15) "..."}'
some long strin...
In awk, $0 is the current line. substr($0, 1, 15) extracts characters 1 through 15 from $0. The trailing "..." appends three dots.
Todd actually has a good answer; however, I chose to change it up a little to make the function better and remove unnecessary parts :p
trim() {
if (( "${#1}" > "$2" )); then
echo "${1:0:$2}$3"
else
echo "$1"
fi
}
In this version, the text appended to longer strings is given by the third argument, the maximum length by the second argument, and the text itself by the first argument.
No need for variables :)
Using Bash Shell Expansions (No External Commands)
If you don't care about shell portability, you can do this entirely within Bash using a number of different shell expansions in the printf builtin. This avoids shelling out to external commands. For example:
trim () {
local str ellipsis_utf8
local -i maxlen
# use explaining variables; avoid magic numbers
str="$*"
maxlen="15"
ellipsis_utf8=$'\u2026'
# only truncate $str when longer than $maxlen
if (( "${#str}" > "$maxlen" )); then
printf "%s%s\n" "${str:0:$maxlen}" "${ellipsis_utf8}"
else
printf "%s\n" "$str"
fi
}
trim "This is an example sentence." # This is an exam…
trim "Short sentence." # Short sentence.
trim "-n Flag-like strings." # -n Flag-like st…
trim "With interstitial -E flag." # With interstiti…
You can also loop through an entire file this way. Given a file containing the same sentences above (one per line), you can use the read builtin's default REPLY variable as follows:
while read; do
trim "$REPLY"
done < example.txt
Whether or not this approach is faster or easier to read is debatable, but it's 100% Bash and executes without forks or subshells.
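One caveat about the loop, shown as a sketch with a made-up input line: without -r, read treats backslashes as escape characters and removes them, so read -r is the safer choice when lines may contain backslashes.

```bash
#!/usr/bin/env bash
line='C:\temp\new'     # made-up input containing backslashes

# Plain read consumes the backslashes; read -r keeps the line verbatim.
without_r=$(printf '%s\n' "$line" | { read; printf '%s' "$REPLY"; })
with_r=$(printf '%s\n' "$line" | { read -r; printf '%s' "$REPLY"; })

printf '%s\n' "$without_r"   # C:tempnew
printf '%s\n' "$with_r"      # C:\temp\new
```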

Replace strings contain whitespace using sed command shell script

I am trying to replace the strings in an xml file using the sed command. My script contains the following code.
SEARCH='key="identifierA" value ="000000 00:00:00"'
REPLACE='key="identifierA" value ="101617 00:00:00"'
TEST_DIR=home/test/
TEST_FILE="test.xml"
ChangeXml(){
ModifyValue $TEST_DIR $TEST_FILE $SEARCH $REPLACE
}
ModifyValue (){
cd $1
echo "Search : $3 Replace : $4 "
sed -i "s/$3/$4/g" $2
}
#Actions performed
ChangeXml
But $3 in the echo returns identifierA and $4 returns 000000 00:00:00. They are supposed to give the values assigned to those variables instead, so the replace is not working as expected. I tried to escape the space in key="identifierA" value ="000000 00:00:00", but did not get the expected result. I am very new to shell scripting. Can anyone tell me the reason and correct me to achieve the expected result?
Quote the variables if they can contain whitespace:
ModifyValue "$TEST_DIR" "$TEST_FILE" "$SEARCH" "$REPLACE"
Otherwise, $SEARCH is sent in pieces (split on whitespace) and populates more than one argument.
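The splitting is easy to observe with a small sketch. The helper count_args is hypothetical and only reports how many arguments it received; the default IFS is assumed:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

SEARCH='key="identifierA" value ="000000 00:00:00"'

unquoted=$(count_args $SEARCH)     # split on whitespace into 4 words
quoted=$(count_args "$SEARCH")     # passed through as a single argument

echo "$unquoted $quoted"           # 4 1
```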

How do I reverse escape backslash encodings like "\ " and "\303\266" in bash?

I have a script that records files with UTF-8 encoded names. However, the script's encoding / environment wasn't set up right, and it just recorded the raw bytes. I now have lots of lines in the file like this:
.../My\ Folders/My\ r\303\266m/...
So there are spaces in the filenames escaped with \ and UTF-8 encoded stuff like \303\266 (which is ö). I want to reverse this encoding. Is there some easy set of bash commands I can chain together to do that?
I could write millions of sed commands, but it would take ages to list all the non-ASCII characters we have. Or I could start parsing it in Python. But I'm hoping there's some trick I can use.
Here's a rough stab at the Unicode characters:
text="/My\ Folders/My\ r\303\266m/"
text="echo \$\'"$(echo "$text"|sed -e 's|\\|\\\\|g')"\'"
# the argument to the echo must not be quoted or escaped-quoted in the next step
text=$(eval "echo $(eval "$text")")
read text < <(echo "$text")
echo "$text"
This makes use of the $'string' quoting feature of Bash.
This outputs "/My Folders/My röm/".
As of Bash 4.4, it's as easy as:
text="/My Folders/My r\303\266m/"
echo "${text@E}"
This uses a new feature of Bash called parameter transformation. The E operator causes the parameter to be treated as if its contents were inside $'string' in which backslash escaped sequences, in this case octal values, are evaluated.
It is not clear exactly what kind of escaping is being used. The octal character codes are C, but C does not escape space. The space escape is used in the shell, but it does not use octal character escapes.
Something close to C-style escaping can be undone using the command printf %b $escaped. (The documentation says that octal escapes start with \0, but that does not seem to be required by GNU printf.) Another answer mentions read for unescaping shell escapes, although if space is the only one that is not handled by printf %b then handling that case with sed would probably be better.
In the end I used something like this:
cat file | sed 's/%/%%/g' | while read -r line ; do printf "${line}\n" ; done | sed 's/\\ / /g'
Some of the files had % in them, which is a printf special character, so I had to 'double it up' so that it would be escaped and passed straight through. The -r stops read from interpreting the backslashes, so the octal escapes reach printf intact; but it also means "\ " is not turned into " ", so I needed the final sed.
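A variation on this approach, sketched with the sample path from the question: passing the text to printf as a %b argument instead of as the format string means % characters need no doubling. This assumes bash's builtin printf, which accepts \303-style octal escapes in %b arguments; other escapes such as \n inside a name would be expanded too.

```bash
#!/usr/bin/env bash
escaped='/My\ Folders/My\ r\303\266m/'

# Undo the shell-style "\ " escapes; the quoted variable is matched literally.
bs_space='\ '
unspaced=${escaped//"$bs_space"/ }

# %b expands backslash escapes in the argument; because the text is an
# argument rather than the format string, a literal % needs no doubling.
decoded=$(printf '%b' "$unspaced")

printf '%s\n' "$decoded"    # /My Folders/My röm/ (on a UTF-8 terminal)
```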
Use printf to solve the issue with utf-8 text. Use read to take care of spaces (\ ).
Like this:
$ text='/My\ Folders/My\ r\303\266m/'
$ IFS='' read t < <(printf "$text")
$ echo "$t"
/My Folders/My röm/
The built-in read command will handle part of the problem:
$ echo "with\ spaces" | while read r; do echo $r; done
with spaces
Pass the file (line by line) to the following perl script.
#!/usr/bin/perl
sub encode {
$String = $_[0];
$_ = $String;
while(/(\\[0-9]+|.)/g) {
$Match = $1;
if ($Match =~ /\\([0-9]+)/) {
$Code = oct(0 + $1);
$Char = (($Code >= 32) && ($Code < 160))
? chr($Code)
: sprintf("\\x{%X}", $Code);
printf("%s", $Char);
} else {
print "$Match";
}
}
print "\n";
}
while ($#ARGV >= 0) {
$File = shift();
open(my $F, "<", $File) or die "Cannot open $File: $!";
while ($String = <$F>) {
$String =~ s/\\ / /g;
&encode($String);
}
}
Like this:
$ ./PerlEncode.pl Test.txt
Where Test.txt contains:
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
The line "$String =~ s/\\ / /g;" replaces "\ " with " ", and sub encode parses the octal escape sequences.
Hope this helps.
