Bash remove substring in file from string

I've one string like this:
myString='value1|value57|value31|value21'
and I've a file, called values_to_remove.txt containing a list of values, one per line, in this way
values_to_remove.txt
value1
value31
In bash, how can I remove the values contained in "values_to_remove.txt" from the string, taking into account that the values are separated by pipes and that, when I remove a value, I also have to remove the preceding or following pipe, if any?
I've achieved this in python and called the python script from bash, but I need to do this directly in bash with one line command, rather than small script, otherwise I can already use my little python script.
That's the python code
myString = 'value1|value2|value3|value4'
arrString = myString.split("|")
with open("myfile.txt", encoding="utf-8") as file:
    for l in file:
        l = l.rstrip("\n")  # drop the trailing newline before comparing
        if l in arrString:
            arrString.remove(l)
myNewString = "|".join(arrString)
Note that the values separated by pipes can be any string.
Thank you

You may use this awk:
awk -v str="$myString" 'BEGIN {
   n = split(str, a, /\|/)
}
{
   val[$1]
}
END {
   for (i=1; i<=n; i++)
      if (!(a[i] in val))
         s = (s == "" ? "" : s "|") a[i]
   print s
}' values_to_remove.txt
value57|value21
This awk first uses a split function to split input string on |
It stores all values to be removed in another array val
In the end block it loops through split array and builds a string if value is not found in to-be-removed array.
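The same split-and-filter idea can be sketched in bash itself. This is only an illustration, not part of the awk answer: the function name remove_values is hypothetical, it needs bash 4+ for associative arrays, and it assumes the values contain neither newlines nor empty entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_values STRING FILE
# Prints STRING with every |-separated field that is listed in FILE removed.
remove_values() {
    local str=$1 file=$2 line field out=
    local -A drop=()      # set of values to remove (requires bash 4+)
    local -a fields=()

    # Load the values to remove; the "|| [[ -n $line ]]" part keeps a
    # final line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -n $line ]] && drop[$line]=1
    done < "$file"

    # Split the string on | and keep only the fields that are not in the set.
    IFS='|' read -r -a fields <<< "$str"
    for field in "${fields[@]}"; do
        [[ -n $field && -n ${drop[$field]+x} ]] && continue
        out+=${out:+|}$field
    done
    printf '%s\n' "$out"
}
```

Called as remove_values "$myString" values_to_remove.txt, it prints value57|value21 for the data in the question. Because whole fields are compared, a value such as value3 does not affect value31.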

Here is a bash solution (the if statement is a runtime optimization to skip the replacement in case of no match, thanks @Inian):
for val in value1 value31; do
    if [[ "$mystring" =~ \|$val|$val\| ]]; then
        mystring=${mystring/$BASH_REMATCH/}
    fi
done
This looks in pure bash for the first regular expression match of either |value or value| and removes it. Note that you can't match both pipes at the same time, because then you would delete too many separators. If there is a chance that there are no separators at all, you need to use ? after each pipe (maybe just the second one is enough).
You can also avoid regular expressions and just attempt to delete both a prior and a posterior pipe:
for val in value1 value31; do
    mystring=${mystring/|$val/};
    mystring=${mystring/$val|/};
done
All of these can be written on one line if you really need to:
for val in value1 value31; do [[ "$mystring" =~ \|$val|$val\| ]]; mystring=${mystring/$BASH_REMATCH/}; done
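One caveat: the substitutions above match substrings, so removing value3 would also cut into value31. A possible workaround, sketched here with a hypothetical function name remove_exact, is to wrap the string in pipes temporarily so that only whole fields can match. It assumes the value contains no glob characters that need special handling, which the quoted pattern already takes care of.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_exact STRING VALUE
# Removes every whole field equal to VALUE from the |-separated STRING.
remove_exact() {
    local s="|$1|" val=$2
    # Replace "|VALUE|" with a single "|" until no whole-field match is left.
    # The loop handles adjacent repeats such as "a|a|b".
    while [[ $s == *"|$val|"* ]]; do
        s=${s/"|$val|"/|}
    done
    # Strip the temporary leading and trailing pipes again.
    s=${s#|}
    s=${s%|}
    printf '%s\n' "$s"
}

remove_exact 'value3|value31|value1' value3    # prints value31|value1
```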

A pure bash solution:
#!/usr/bin/env bash
# Define the location of the values-to-be-removed file
: ${PATH_TO_FILE:=${1:-"./values_to_remove.txt"}}
# Define the string we will be working with
: ${MY_STRING:=${2:-"value1|value57|value31|value21"}}
# Process all entries in PATH_TO_FILE, one by one
while read -r substring || [[ -n "$substring" ]]; do
    # Remove "substring|" from the beginning of MY_STRING
    MY_STRING=${MY_STRING#${substring}|}
    # Remove "|substring" from the rest of MY_STRING
    MY_STRING=${MY_STRING//|${substring}}
done < "${PATH_TO_FILE}"
# Return the results
echo "${MY_STRING}"
Why do we...
Use ${VAR_NAME:=${1:-"DEFAULT_VALUE"}} notation - To allow the user to customise script's inputs either via environment variables or script arguments. Basically, this notation says:
If VAR_NAME environment variable exists, then use it;
If VAR_NAME doesn't exist, then set VAR_NAME to the value of the first argument to the script;
If the first argument doesn't exist either, then set VAR_NAME to the DEFAULT_VALUE.
Use read -r substring || [[ -n "$substring" ]] to read the file? – read allows us to read the content of the ./values_to_remove.txt file line by line. The [[ -n "$substring" ]] bit is there to catch the last line in the file if it doesn't end with a newline: in that case read returns a non-zero status but still stores the text in the variable.
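To see why the extra test matters, the following sketch counts the lines of an input whose last line has no trailing newline, once with a plain read loop and once with the guarded loop. The function names count_plain and count_guarded are made up for this illustration.

```shell
#!/usr/bin/env bash
# Count input lines with a plain "while read" loop.
count_plain() {
    local line n=0
    while read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# Count input lines, also accepting a final line without a newline.
count_guarded() {
    local line n=0
    while read -r line || [[ -n $line ]]; do
        n=$((n + 1))
    done
    echo "$n"
}

# printf writes two lines, but the second one is not newline-terminated.
plain=$(printf 'value1\nvalue31' | count_plain)
guarded=$(printf 'value1\nvalue31' | count_guarded)
echo "plain=$plain guarded=$guarded"    # prints plain=1 guarded=2
```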
References:
Assign a default value in bash
Return default value in bash
Bash substring removal
Bash search and replace

Related

Way to replace one variable with another in a string

I need to replace one variable with another variable in a multiple strings.
For example:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in string1 string2 string3; do
x="$(echo "$str" | sed 's/[a-zA-Z]//g')" # extracting a character between letters
sed 's/$x/$y/'$str # I tried this, but it does not work at all.
echo "$str"
done
Expecting output:
One;two
three;four
five;six
In my output, nothing changes:
One,two
three.four
five:six
You can use bash's substitution operator instead of sed. And simply replace anything that isn't a letter with $y.
#!/bin/bash
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "$string1" "$string2" "$string3"; do
    x=${str//[^a-zA-Z]/$y}
    echo "$x"
done
Output is:
One;two
three;four
five;six
Note that your general approach wouldn't work if the input string has multiple delimiters, e.g. One,two,three. When you remove all the letters you get ,,, but that doesn't appear anywhere in the string.
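A short sketch of that multiple-delimiter case, using parameter expansion only. The variable names delims, attempt and fixed are chosen just for this example.

```shell
#!/usr/bin/env bash
str="One,two,three"
y=";"

# Removing all letters leaves both commas glued together.
delims=${str//[a-zA-Z]/}

# Searching for that glued string finds nothing, so str comes back unchanged.
attempt=${str/"$delims"/$y}

# Replacing every non-letter character individually handles each delimiter.
fixed=${str//[^a-zA-Z]/$y}

echo "$delims"     # prints ,,
echo "$attempt"    # prints One,two,three
echo "$fixed"      # prints One;two;three
```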
Addressing issues with OP's current code:
referencing variables requires a leading $, preferably a pair of {}, and (usually) double quotes (eg, to ensure embedded spaces are considered as part of the variable's value)
sed can take as input a) a stream of text on stdin, b) a file, c) process substitution or d) a here-document/here-string
when building a sed script that includes variable references the sed script must be wrapped in double quotes (not single quotes)
Pulling all of this into OP's current code we get:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do # proper references of the 3x "stringX" variables
    x="$(echo "$str" | sed 's/[a-zA-Z]//g')"
    sed "s/$x/$y/" <<< "${str}" # feeding "str" as here-string to sed; allowing variables "x/y" to be expanded in the sed script
    echo "$str"
done
This generates:
One;two # generated by the 2nd sed call
One,two # generated by the echo
;hree.four # generated by the 2nd sed call
three.four # generated by the echo
five;six # generated by the 2nd sed call
five:six # generated by the echo
OK, so we're now getting some output but there are obviously some issues:
the results of the 2nd sed call are being sent to stdout/terminal as opposed to being captured in a variable (presumably the str variable - per the follow-on echo ???)
for string2 we find that x=. which when plugged into the 2nd sed call becomes sed "s/./;/"; from here the . matches the first character it finds which in this case is the 1st t in string2, so the output becomes ;hree.four (and the . is not replaced)
dynamically building sed scripts without knowing what's in x (and y) becomes tricky without some additional coding; instead it's typically easier to use parameter substitution to perform the replacements for us
in this particular case we can replace both sed calls with a single parameter substitution (which also eliminates the expensive overhead of two subprocesses for the $(echo ... | sed ...) call)
Making a few changes to OP's current code we can try:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do
    x="${str//[^a-zA-Z]/${y}}" # parameter substitution; replace everything *but* a letter with the contents of variable "y"
    echo "${str} => ${x}" # display old and new strings
done
This generates:
One,two => One;two
three.four => three;four
five:six => five;six

Use bash to find line in java files which include a pattern, and then replace another part of the line

I have a directory that includes a lot of java files, and in each file I have a class variable:
String system = "x";
I want to be able to create a bash script which I execute in the same directory, which will go through only the java files in the directory and replace this instance of x with y. Here x and y are each a word. Now this may not be the only instance of the word x in the java file, however it will definitely be the first.
I want to be able to execute my script in the command line similar to:
changesystem.sh -x -y
This way I can specify what the x should be, and the y I wish to replace it with. I found a way to find and print the line number at which the first instance of a pattern is found:
awk '$0 ~ /String system/ {print NR}' file
I then found how to replace a substring on a given line using:
awk 'NR==line_number { sub("x", "y") }'
However, I have not found a way to combine them. Maybe there is also an easier way? Or even, a better and more efficient way?
Any help/advice will be greatly appreciated
You may create a changesystem.sh file with the following GNU awk script:
#!/bin/bash
for f in *.java; do
    awk -i inplace -v repl="$1" '
        !x && /^\s*String\s+system\s*=\s*".*";\s*$/{
            lwsp=gensub(/\S.*/, "", 1);
            print lwsp"String system = \""repl"\";";
            x=1;next;
        }1' "$f";
done;
Or, with any awk:
#!/bin/bash
for f in *.java; do
    awk -v repl="$1" '
        !x && /^[[:space:]]*String[[:space:]]+system[[:space:]]*=[[:space:]]*".*";[[:space:]]*$/{
            lwsp=$0; sub(/[^[:space:]].*/, "", lwsp);
            print lwsp"String system = \""repl"\";";
            x=1;next
        }1' "$f" > tmp && mv tmp "$f";
done;
Then, make the file executable:
chmod +x changesystem.sh
Then, run it like
./changesystem.sh 'new_value'
Notes:
for f in *.java; do ... done iterates over all *.java files in the current directory
-i inplace - GNU awk feature to perform replacement inline (not available in a non-GNU awk)
-v repl="$1" passes the first argument of the script to the awk command
!x && /^\s*String\s+system\s*=\s*".*";\s*$/ - if x is false and the record starts with any amount of whitespace (\s* or [[:space:]]*), then String, one or more whitespace characters, system, an = surrounded by zero or more whitespace characters, then a " char, then any text, and ends with "; followed by zero or more whitespace characters, then
lwsp=gensub(/\S.*/, "", 1); puts the leading whitespace in the lwsp variable (it removes all text starting with the first non-whitespace char from the line matched)
lwsp=$0; sub(/[^[:space:]].*/, "", lwsp); - same as above, just in a different way since gensub is not supported in non-GNU awk and sub modifies the given input string (here, lwsp)
{print "String system = \""repl"\";";x=1;next}1 - prints the String system = " + the replacement string + ";, assigns 1 to x, and moves to the next line, else, just prints the line as is.
You don't need to pre-compute the line number. The whole job can be done by one not-too-complicated sed command. You probably do want to script it, though. For example:
#!/bin/bash
[[ $# -eq 3 ]] || {
    echo "usage: $0 <context regex> <target regex> <replacement text>" 1>&2
    exit 1
}
sed -si -e "/$1/ { s/\\<$2\\>/$3/; t1; p; d; :1; n; b1; }" ./*.java
That assumes that the files to modify are java source files in the current working directory, and I'm sure you understand the (loose) argument check and usage message.
As for the sed command itself,
the -s option instructs sed to treat each argument as a separate stream, instead of operating as if by concatenating all the inputs into one long stream.
the -i option instructs sed to modify the designated files in-place.
the sed expression takes the default action for each line (printing it verbatim) unless the line matches the "context" pattern given by the first script argument.
for lines that do match the context pattern,
s/\\<$2\\>/$3/ - attempt to perform the wanted substitution
the \< and \> match word start and end boundaries, respectively, so that the specified pattern will not match a partial word (though it can match multiple complete words if the target pattern allows)
t1 - if a substitution was made, then branch to label 1, otherwise
p; d - print the current line and immediately start the next cycle
:1; n; b1 - label 1 (reachable only by branching): print the current line and read the next one, then loop back to label 1. This prints the remainder of the file without any more tests or substitutions.
Example usage:
/path/to/replace_first.sh 'String system' x y
It is worth noting that this does expose the user to some details of sed's interpretation of regular expressions and replacement text, though that does not manifest for the example usage.
Note that that could be simplified by removing the context pattern bit if you are sure you want to modify the overall first appearance of the target in each file. You could also hard-code the context, the target pattern, and/or the replacement text. If you hard-code all three then the script would no longer need any argument handling or checking.

IFS and moving through single positions in directory

I have two questions.
I have found following code line in script : IFS=${IFS#??}
I would like to understand what it is exactly doing ?
When I am trying to perform something in every place from directory like eg.:
$1 = home/user/bin/etc/something...
so I need to change IFS to "/" and then proceed this in for loop like
while [ -e "$1" ]; do
for F in `$1`
#do something
done
shift
done
Is that the correct way ?
${var#??} is a shell parameter expansion. It tries to match the beginning of $var with the pattern written after #. If it does, it returns the variable $var with that part removed. Since ? matches any character, this means that ${var#??} removes the first two chars from the var $var.
$ var="hello"
$ echo ${var#??}
llo
So with IFS=${IFS#??} you are resetting IFS to its value after removing its two first chars.
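As a sketch of what that means in practice, assume IFS still holds its default value of space, tab, newline. Dropping the first two characters leaves only the newline, so unquoted expansions are then split on line breaks alone. The variable names default_ifs, trimmed, text and parts are only for this illustration.

```shell
#!/usr/bin/env bash
default_ifs=$' \t\n'            # the default IFS: space, tab, newline
trimmed=${default_ifs#??}       # drop the first two characters
printf 'trimmed IFS: %q\n' "$trimmed"

text=$'two words\nsecond line'

old_ifs=$IFS                    # save IFS so it can be restored afterwards
IFS=$trimmed
set -f                          # no filename expansion while splitting
parts=($text)                   # unquoted on purpose: split using IFS
set +f
IFS=$old_ifs

echo "${#parts[@]}"             # prints 2: the space no longer splits words
printf '%s\n' "${parts[0]}"     # prints two words
```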
To loop through the words in a /-delimited string, you can store the split string in an array and then loop through it:
$ IFS="/" read -r -a myarray <<< "home/user/bin/etc/something"
$ for w in "${myarray[@]}"; do echo "-- $w"; done
-- home
-- user
-- bin
-- etc
-- something

How to pass quoted arguments but with blank spaces in linux

I have a file with these arguments and their values this way
# parameters.txt
VAR1 001
VAR2 aaa
VAR3 'Hello World'
and another file to configure like this
# example.conf
VAR1 = 020
VAR2 = kab
VAR3 = ''
when I want to get the values in a function I use this command
while read p; do
    VALUE=$(echo $p | awk '{print $2}')
done < parameters.txt
The first arguments yield the right values, but the last one yields only 'Hello because of the blank space. My question is: how do I get the entire 'Hello World' value?
If you can use bash, there is no need to use awk: read and shell parameter expansion can be combined to solve your problem:
while read -r name rest; do
    # Drop the '= ' part, if present.
    [[ $rest == '= '* ]] && value=${rest:2} || value=$rest
    # $value now contains the line's value,
    # but *including* any enclosing ' chars, if any.
    # Assuming that there are no *embedded* ' chars., you can remove them
    # as follows:
    value=${value//\'/}
done < parameters.txt
read by default also breaks a line into fields by whitespace, like awk, but unlike awk it has the ability to assign the remainder of the line to a variable, namely the last one, if fewer variables than fields found are specified;
read's -r option is generally worth specifying to avoid unexpected interpretation of \ chars. in the input.
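The remainder-of-the-line behaviour can be tried on single lines with a here-string. This is a minimal sketch; the function name parse_value is hypothetical, and it makes the same assumption as above that values contain no embedded ' characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value part of one "NAME value" or
# "NAME = value" line, without enclosing single quotes.
parse_value() {
    local name rest value
    # name gets the first field; rest gets everything after it, spaces included.
    read -r name rest <<< "$1"
    if [[ $rest == '= '* ]]; then
        value=${rest:2}         # drop the leading "= "
    else
        value=$rest
    fi
    value=${value//\'/}         # remove all single quotes
    printf '%s\n' "$value"
}

parse_value "VAR3 'Hello World'"    # prints Hello World
parse_value "VAR1 = 020"            # prints 020
```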
As for your solution attempt:
awk doesn't know about quoting in input - by default it breaks input into fields by whitespace, irrespective of quotation marks.
Thus, a string such as 'Hello World' is simply broken into fields 'Hello and World'.
However, in your case you can split each input line into its key and value using a carefully crafted FS value (FS is the input field separator, which can be also be set via option -F; the command again assumes bash, this time for use of <(...), a so-called process substitution, and $'...', an ANSI C-quoted string):
while IFS= read -r value; do
    # Work with $value...
done < <(awk -F$'^[[:alnum:]]+ (= )?\'?|\'' '{ print $2 }' parameters.txt)
Again the assumption is that values contain no embedded ' instances.
Field separator regex $'^[[:alnum:]]+ (= )?\'?|\'' splits each line so that $2, the 2nd field, contains the value, stripped of enclosing ' chars., if any.
xargs is the rare exception among the standard utilities in that it does understand single- and double-quoted strings (yet also without support for embedded quotes).
Thus, you could take advantage of xargs' ability to implicitly strip enclosing quotes when it passes arguments to the specified command, which defaults to echo (again assumes bash):
while read -r name rest; do
    # Drop the '= ' part, if present.
    [[ $rest == '= '* ]] && value=${rest:2} || value=$rest
    # $value now contains the line's value, stripped of any enclosing
    # single quotes by `xargs`.
done < <(xargs -L1 < parameters.txt)
xargs -L1 processes one (1) line (-L) at a time and implicitly invokes echo with all tokens found on each line, with any enclosing quotes removed from the individual tokens.
The default field separator in awk is the space. So you are only printing the first word in the string passed to awk.
You can specify the field separator on the command line with -F[field separator]
Example, setting the field separator to a comma:
$ echo "Hello World" | awk -F, '{print $1}'
Hello World

How to get value from command line using for loop

Following is the code for extracting input from command line into bash script:
input=(*);
for i in {1..5..1}
do
input[i]=$($i);
done;
My question is: how to get $1, $2, $3, $4 values from input command line, where command line code input is:
bash script.sh "abc.txt" "|" "20" "yyyy-MM-dd"
Note: Not using for i in "${@}"
#!/bin/bash
for ((i=$#-1;i>=0;i--)); do
    echo "${BASH_ARGV[$i]}"
done
Example: ./script.sh a "foo bar" c
Output:
a
foo bar
c
I don't know what you have against for i in "$@"; do..., but you can certainly do it with shift, for example:
while [ -n "$1" ]; do
    printf " '%s'\n" "$1"
    shift
done
Output
$ bash script.sh "abc.txt" "|" "20" "yyyy-MM-dd"
'abc.txt'
'|'
'20'
'yyyy-MM-dd'
Personally, I don't see why you exclude for i in "$@"; do ... it is a valid way to iterate through the args that will preserve quoted whitespace. You can also use the array and C-style for loop as indicated in the other answers.
note: if you are going to use your input array, you should use input=("$@") instead of input=($*). Using the latter will not preserve quoted whitespace in your positional parameters. e.g.
input=("$@")
for ((i = 0; i < ${#input[@]}; i++)); do
    printf " '%s'\n" "${input[i]}"
done
works fine, but if you use input=($*) with arguments like "a b", it will treat those as two separate arguments.
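The difference is easy to check by counting the resulting array elements. In this sketch the function names count_quoted and count_unquoted are made up for the comparison, and the default IFS is assumed.

```shell
#!/usr/bin/env bash
# Copy the positional parameters with "$@": one element per argument.
count_quoted() {
    local input=("$@")
    echo "${#input[@]}"
}

# Copy them with unquoted $*: every argument is word-split again.
count_unquoted() {
    local input=($*)
    echo "${#input[@]}"
}

count_quoted "a b" c      # prints 2
count_unquoted "a b" c    # prints 3
```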
If I'm correctly understanding what you're trying to do, you can write:
input=("$@")
to copy the positional parameters into an array named input.
If you specifically want only the first five positional parameters, you can write:
input=("${@:1:5}")
Edited to add: Or are you asking, given a variable i that contains the integer 2, how you can get $2? If that's your question, then — you can use indirect expansion, where Bash retrieves the value of a variable, then uses that value as the name of the variable to substitute. Indirect expansion uses the ! character:
i=2
input[i]="${!i}" # same as input[2]="$2"
This is almost always a bad idea, though. You should rethink what you're doing.
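For completeness, here is a small sketch of indirect expansion on positional parameters, next to an alternative that slices "$@" and avoids indirection altogether. The function names nth_arg and nth_arg_slice are hypothetical.

```shell
#!/usr/bin/env bash
# Print the i-th of the remaining arguments using indirect expansion.
nth_arg() {
    local i=$1
    shift                       # the remaining arguments are now $1, $2, ...
    printf '%s\n' "${!i}"       # with i=2 this expands to "$2"
}

# The same result by slicing the positional parameters.
nth_arg_slice() {
    local i=$1
    shift
    printf '%s\n' "${@:i:1}"    # one parameter, starting at position i
}

nth_arg 2 "abc.txt" "|" "20" "yyyy-MM-dd"          # prints |
nth_arg_slice 4 "abc.txt" "|" "20" "yyyy-MM-dd"    # prints yyyy-MM-dd
```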
