Loading text as a string, then extracting items - linux

My task is to write a .sh script that will load the user's first name. Then it will use a loop to count the occurrences of the letter 'a' and then print their number.
I understand that this reads the text:
read -p "Please enter some text" text
But then referring to the element ${text[0]} gives all of the text, not a single character of it.
#!/bin/bash
echo "Please write"
read b
if [ "${b:${#b}-1:1}" = 'a' ] ; then
echo "Women"
else
echo "man"
fi
l=0
for (( i=0 ; i< ${#b} ; i++ )) do
if [ "${b:$i:1}" = 'a' ] ; then
((l++))
fi
done
echo L = $l

To count the number of a characters in a variable, you could first erase all characters which are not an a. Example:
text=abcaagg
atext=${text//[!a]/}
The variable atext now holds only aaa. Calculate the length of that string, and you know how many a characters the original variable contained:
echo ${#atext}
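Since the task asks for a loop, here is a minimal sketch that counts both ways so the results can be compared. The function names count_with_loop and count_with_expansion are made up for illustration, and the sketch assumes the character to count is an ordinary letter.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative, not from the question.

# Count occurrences of one character by walking the string index by index.
count_with_loop() {
    local char=$1 text=$2 count=0 i
    for (( i = 0; i < ${#text}; i++ )); do
        # Characters are compared as strings with =, not with the numeric -eq.
        if [ "${text:i:1}" = "$char" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Count by deleting every other character and measuring what is left.
# Assumes char is a plain letter with no special meaning inside [ ].
count_with_expansion() {
    local char=$1 text=$2 only
    only=${text//[!$char]/}
    echo "${#only}"
}

name="Barbara"
echo "loop: $(count_with_loop a "$name")"
echo "expansion: $(count_with_expansion a "$name")"
```

In a real script the name would come from read instead of being hard-coded.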
UPDATE: By request, I quote here the part of the bash man page which explains the substitution. It is stated in the section titled Parameter Expansion:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and the longest
match of pattern against its value is replaced with string. If
pattern begins with /, all matches of pattern are replaced with
string. Normally only the first match is replaced. If pattern begins
with #, it must match at the beginning of the expanded value
of parameter. If pattern begins with %, it must match at the end
of the expanded value of parameter. If string is null, matches of
pattern are deleted and the / following pattern may be omitted. If
the nocasematch shell option is enabled, the match is performed
without regard to the case of alphabetic characters.
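The forms described in that excerpt can be tried side by side. This is only an illustrative sketch that reuses the text variable from the example; the replacement string X is arbitrary.

```shell
#!/bin/bash
text=abcaagg

echo "${text/a/X}"      # only the first match is replaced: Xbcaagg
echo "${text//a/X}"     # pattern starts with /, so all matches: XbcXXgg
echo "${text/#a/X}"     # pattern starts with #, must match at the start: Xbcaagg
echo "${text/%g/X}"     # pattern starts with %, must match at the end: abcaagX
echo "${text//[!a]/}"   # empty string deletes the matches: aaa
```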

Related

Bash String Format Comparison with Wildcards

I am fairly new to bash scripting and was trying to echo only lines that match a specific formatting. I have this code so far:
LINE=1
while read -r CURRENT_LINE
do
if [[ $CURRENT_LINE == ??-?-??? ]]
then
echo "$LINE: $CURRENT_LINE"
fi
((LINE++))
done < "./new-1.txt"
The text file contains number sequences on each line that match the following format: "12-3-456", but it also contains sequences in different formats, such as "123-89203-9420" or "123-456-7890". I can't quite understand why the if statement inside the while loop does not evaluate to true on lines that match the format. I've tried using * as well, but it gives me incorrect results.
Here are the contents of the text file new-1.txt. I want the script to output "Line 1: 11-1-111", but it doesn't output anything.
11-1-111
222-22-2222
333-33-3333
444-444-4444
555-555-5555
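In regex parlance, ? makes the preceding character or group optional, i.e. it may occur at most once, and zero occurrences are also accepted. Inside [[ ... == ... ]], however, the right-hand side is a glob pattern rather than a regex, and there ? matches exactly one character of any kind.
The == operator is not the regex matching operator; that is =~.
To accept only digits in that format, change your if clause to the one below. The regex must not be quoted, because bash treats a quoted right-hand side of =~ as a literal string.
[[ $CURRENT_LINE =~ ^[0-9]{2}-[0-9]{1}-[0-9]{3}$ ]]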
Here
The ^ anchors the match to the beginning of the string and $ to the end, so the whole line has to match the pattern.
[0-9] denotes a range, here any digit from zero to nine.
The {n} requires the preceding character or group to match exactly n times.
Note : You can also use a more verbose [[:digit:]] instead of [0-9]
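To see how glob matching with == differs from regex matching with =~, here is a small sketch. The function names and the extra sample lines are assumptions made for the demonstration. The last sample carries a trailing carriage return, as a file with Windows line endings would; whether the asker's file had such endings is not known.

```shell
#!/bin/bash
# Hypothetical helpers for comparing the two kinds of match.

# Glob match: each ? stands for exactly one character of any kind.
matches_glob() {
    [[ $1 == ??-?-??? ]]
}

# Regex match: digits only. The pattern lives in a variable and is expanded
# unquoted, because quoting it would force a literal string comparison.
matches_regex() {
    local re='^[0-9]{2}-[0-9]-[0-9]{3}$'
    [[ $1 =~ $re ]]
}

for line in 11-1-111 222-22-2222 ab-c-def $'11-1-111\r'; do
    if matches_glob "$line"; then g=yes; else g=no; fi
    if matches_regex "$line"; then r=yes; else r=no; fi
    printf '%q glob=%s regex=%s\n' "$line" "$g" "$r"
done
```

Only 11-1-111 passes both tests. ab-c-def passes the glob but not the regex, and the line ending in a carriage return fails both because of the extra character.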

Extracting a string from a substring in bash (yes, that way around)

I have a string of several words in bash called comp_line, which can have any number of spaces inside. For example:
"foo bar apple banana q xy"
And I have a zero-based index comp_point pointing to one character in that string, e.g. if comp_point is 4, it points to the first 'b' in 'bar'.
Based on the comp_point and comp_line alone, I want to extract the word being pointed to by the index, where the "word" is a sequence of letters, numbers, punctuation or any other non-whitespace character, surrounded by whitespace on either side (if the word is at the start or end of the string, or is the only word in the string, it should work the same way.)
The word I'm trying to extract will become cur (the current word)
Based on this, I've come up with a set of rules:
Read the current character curchar, the previous character prevchar, and the next character nextchar. Then:
1. If curchar is a graph character (non-whitespace), set cur to the word containing curchar: the characters before and after it, stopping at whitespace or the start/end of the string on either side.
2. Else, if prevchar is a graph character, set cur to the characters from the previous character backwards to the preceding whitespace character or the start of the string.
3. Else, if nextchar is a graph character, set cur to the characters from the next character forwards to the next whitespace character or the end of the string.
4. If none of the above conditions are hit (meaning curchar, nextchar and prevchar are all whitespace characters), set cur to "" (empty string).
I've written some code which I think achieves this. Rules 2, 3 and 4 are relatively straightforward, but rule 1 is the most difficult to implement - I've had to do some complicated string slicing. I'm not convinced that my solution is in any way ideal, and want to know if anyone knows of a better way to do this within bash only (not outsourcing to Python or another easier language.)
Tested on https://rextester.com/l/bash_online_compiler
#!/bin/bash
# GNU bash, version 4.4.20
comp_line="foo bar apple banana q xy"
comp_point=19
cur=""
curchar=${comp_line:$comp_point:1}
prevchar=${comp_line:$((comp_point - 1)):1}
nextchar=${comp_line:$((comp_point + 1)):1}
echo "<$prevchar> <$curchar> <$nextchar>"
if [[ $curchar =~ [[:graph:]] ]]; then
# Rule 1 - Extract current word
slice="${comp_line:$comp_point}"
endslice="${slice%% *}"
slice="${slice#"$endslice"}"
slice="${comp_line%"$slice"}"
cur="${slice##* }"
else
if [[ $prevchar =~ [[:graph:]] ]]; then
# Rule 2 - Extract previous word
slice="${comp_line::$comp_point}"
cur="${slice##* }"
else
if [[ $nextchar =~ [[:graph:]] ]]; then
# Rule 3 - Extract next word
slice="${comp_line:$comp_point+1}"
cur="${slice%% *}"
else
# Rule 4 - Set cur to empty string ""
cur=""
fi
fi
fi
echo "Cur: <$cur>"
The current example will return 'banana' as comp_point is set to 19.
I'm sure that there must be a neater way to do it that I hadn't thought of, or some trick that I've missed. Also it works so far, but I think there may be some edge cases I hadn't thought of. Can anyone advise if there's a better way to do it?
(The XY problem, if anyone asks)
I'm writing a tab completion script, and trying to emulate the functionality of COMP_WORDS and COMP_CWORD, using COMP_LINE and COMP_POINT. When a user is typing a command to tab complete, I want to work out which word they are trying to tab complete just based on the latter two variables. I don't want to outsource this code to Python because performance takes a big hit when Python is involved in tab complete.
Another way in bash without array.
#!/bin/bash
string="foo bar apple banana q xy"
wordAtIndex() {
local index=$1 string=$2 ret='' last first
if [ "${string:index:1}" != " " ] ; then
last="${string:index}"
first="${string:0:index}"
ret="${first##* }${last%% *}"
fi
echo "$ret"
}
for ((i=0; i < "${#string}"; ++i)); do
printf '%s <-- "%s"\n' "${string:i:1}" "$(wordAtIndex "$i" "$string")"
done
if anyone knows of a better way to do this within bash only
Use regexes. With ^.{4} you can skip the first four letters to navigate to index 4. With [[:graph:]]* you can match the rest of the word at that index. * is greedy and will match as many graphical characters as possible.
wordAtIndex() {
local index=$1 string=$2 left right indexFromRight
[[ "$string" =~ ^.{$index}([[:graph:]]*) ]]
right=${BASH_REMATCH[1]}
((indexFromRight=${#string}-index-1))
[[ "$string" =~ ([[:graph:]]*).{$indexFromRight}$ ]]
left=${BASH_REMATCH[1]}
echo "$left${right:1}"
}
And here is full test for your example:
string="foo bar   apple  banana q xy"
for ((i=0; i < "${#string}"; ++i)); do
printf '%s <-- "%s"\n' "${string:i:1}" "$(wordAtIndex "$i" "$string")"
done
This outputs the input string vertically on the left, and on each index extracts the word that index points to on the right.
f <-- "foo"
o <-- "foo"
o <-- "foo"
<-- ""
b <-- "bar"
a <-- "bar"
r <-- "bar"
<-- ""
<-- ""
<-- ""
a <-- "apple"
p <-- "apple"
p <-- "apple"
l <-- "apple"
e <-- "apple"
<-- ""
<-- ""
b <-- "banana"
a <-- "banana"
n <-- "banana"
a <-- "banana"
n <-- "banana"
a <-- "banana"
<-- ""
q <-- "q"
<-- ""
x <-- "xy"
y <-- "xy"
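Both functions above return an empty string when the index lands on whitespace, whereas rules 2 and 3 in the question fall back to the neighbouring word. A minimal sketch of that fallback, using the same prefix/suffix trimming as the first answer, could look like this. The name word_near_index is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: the word under the index, else the word ending just
# before it, else the word starting just after it, else an empty string.
word_near_index() {
    local index=$1 string=$2 i before after
    for i in "$index" "$((index - 1))" "$((index + 1))"; do
        # Skip positions outside the string and positions holding whitespace.
        (( i >= 0 && i < ${#string} )) || continue
        [[ ${string:i:1} == [[:graph:]] ]] || continue
        before=${string:0:i}
        after=${string:i}
        # Trim everything up to the last whitespace on the left and
        # everything from the first whitespace on the right.
        printf '%s\n' "${before##*[[:space:]]}${after%%[[:space:]]*}"
        return
    done
    printf '\n'
}

line="foo bar   apple"
for i in 5 3 9 8; do
    printf '%d -> "%s"\n' "$i" "$(word_near_index "$i" "$line")"
done
```

With this sample line, index 5 gives bar, index 3 falls back to foo, index 9 falls forward to apple, and index 8 (whitespace on all three positions) gives an empty string.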

Bash: extract a part of a string, after a number

I have a few strings like this:
var1="string one=3423423 and something which i don't care"
var2="another bigger string=413145 and something which i don't care"
var3="the longest string ever=23442 and something which i don't care"
These strings are the output of a python script (which i am not allowed to touch), and I need a way to extract the 1st part of the string, right after the number. Basically, my outputs should be:
"string one=3423423"
"another bigger string=413145"
"the longest string ever=23442"
As you can see, i can't use positions, or stuff like that, because the number and the string length are not always the same. I assume i would need to use a regex or something, but i don't really understand regexes. Can you please help with a command or something which can do this?
grep -oP '^.*?=\d+' inputfile
string one=3423423
another bigger string=413145
the longest string ever=23442
Here the -o flag makes grep print only the matching part, and -P enables Perl-compatible regexes. \d+ means one or more digits, and .*? is a lazy match, so ^.*?=\d+ matches from the start of the line through the first = that is followed by digits, plus that run of digits.
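If the strings are already in shell variables, a similar match can be done without grep. The sketch below uses bash's =~ operator, which takes POSIX extended regexes; those have no lazy quantifier, so [^=]* stands in for .*?. The function name extract_pair is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print everything up to the digits after the first =.
# Returns non-zero and prints nothing when the string does not match.
extract_pair() {
    local re='^[^=]*=[0-9]+'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

var1="string one=3423423 and something which i don't care"
extract_pair "$var1"
```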
You could use parameter expansion, for example:
var1="string one=3423423 and something which i don't care"
name=${var1%%=*}
value=${var1#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
# prints: string one=3423423
Explanation of ${var1%%=*}:
%% - remove the longest matching suffix
= - match =
* - match everything
Explanation of ${var1#*=}:
# - remove the shortest matching prefix
* - match everything
= - match =
Explanation of ${value%%[^0-9]*}:
%% - remove the longest matching suffix
[^0-9] - match any non-digit
* - match everything
To perform the same thing on more than one values easily,
you could wrap this logic into a function:
extract_and_print() {
local input=$1
local name=${input%%=*}
local value=${input#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
}
extract_and_print "$var1"
extract_and_print "$var2"
extract_and_print "$var3"
$ shopt -s extglob
$ echo "${var1%%+([^0-9])}"
string one=3423423
$ echo "${var2%%+([^0-9])}"
another bigger string=413145
$ echo "${var3%%+([^0-9])}"
the longest string ever=23442
+([^0-9]) is an extended pattern that matches one or more non-digits.
${var%%+([^0-9])} with %%pattern will remove the longest match of that pattern from the end of the variable value.
Refs: patterns, parameter substitution
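One difference between the two parameter-expansion approaches shows up when the trailing text itself contains a digit. The input below is an invented example, not one of the strings from the question.

```shell
#!/bin/bash
shopt -s extglob

# Invented input with a digit in the part that should be dropped.
s="string one=3423423 and 5 other things"

# Trimming the trailing non-digits stops at the last digit in the string.
trimmed=${s%%+([^0-9])}
echo "$trimmed"            # string one=3423423 and 5

# Splitting at the first = and cutting the value at its first non-digit.
value=${s#*=}
pair="${s%%=*}=${value%%[^0-9]*}"
echo "$pair"               # string one=3423423
```

For the strings shown in the question both approaches give the same result, since nothing after the number contains a digit.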

How do you interpret ${VAR#*:*:*} in Bourne Shell

I am using Bourne Shell. Need to confirm if my understanding of following is correct?
$ echo $SHELL
/bin/bash
$ VAR="NJ:NY:PA" <-- declare an array with semicolon as separator?
$ echo ${VAR#*} <-- show entire array without separator?
NJ:NY:PA
$ echo ${VAR#*:*} <-- show array after first separator?
NY:PA
$ echo ${VAR#*:*:*} <-- show string after two separator
PA
${var#pattern} is a parameter expansion that expands to the value of $var with the shortest possible match for pattern removed from the front of the string.
Thus, ${VAR#*:} removes everything up to and including the first :; ${VAR#*:*:} removes everything up to and including the second :.
The trailing *s on the end of the expansions given in the question don't have any use, and should be avoided: There's no reason whatsoever to use ${var#*:*:*} instead of ${var#*:*:} -- since these match the smallest amount of text possible, and * is allowed to expand to 0 characters, the final * matches and removes nothing.
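As a quick check of the above, the following sketch prints each variant for the question's value. The ##, % and %% operators are included for comparison; they are the related longest-prefix and suffix forms.

```shell
#!/bin/bash
VAR="NJ:NY:PA"

echo "${VAR#*}"      # * can match nothing, so nothing is removed: NJ:NY:PA
echo "${VAR#*:}"     # shortest prefix ending in the first colon: NY:PA
echo "${VAR#*:*:}"   # shortest prefix ending in the second colon: PA
echo "${VAR#*:*:*}"  # the trailing * matches nothing more: PA
echo "${VAR##*:}"    # longest prefix ending in a colon: PA
echo "${VAR%:*}"     # shortest suffix starting at a colon: NJ:NY
echo "${VAR%%:*}"    # longest suffix starting at a colon: NJ
```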
If what you really want is an array, you might consider using a real array instead.
# read contents of string VAR into an array of states
IFS=: read -r -a states <<<"$VAR"
echo "${states[0]}" # will echo NJ
echo "${states[1]}" # will echo NY
echo "${#states[@]}" # count states; will emit 3
...which also gives you the ability to write:
printf ' - %s\n' "${states[@]}" # put *all* state names into an argument list

How to split words in bash

Good evening, People
Currently I have an array called inputArray which stores a 7-line input file line by line. One of the words is 70000($s0). How do I split it so that 70000 and ($s0) are separate?
I looked at an answer that is already on this website, but I couldn't understand it. The answer was:
s='1000($s3)'
IFS='()' read a b <<< "$s"
echo -e "a=<$a>\nb=<$b>"
giving the output a=<1000> b=<$s3>
Let me give this a shot.
In certain circumstances, the shell will perform "word splitting", where a string of text is broken up into words. The word boundaries are defined by the IFS variable. The default value of IFS is: space, tab, newline. When a string is to be split into words, any sequence of this set of characters is removed to extract the words.
In your example, the set of characters that delimit words are ( and ). So the words in that string that are bounded by the IFS set of characters are 1000 and $s3
What is <<< "$s"? This is a here-string. It's used to send a string to some command's standard input. It's like doing
echo "$s" | read a b
except that form doesn't work as expected in bash. read a b <<< "$s" works well.
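The difference can be seen in a short sketch. It assumes bash's default options (in particular, lastpipe not enabled), in which case each part of a pipeline runs in a subshell and the variables set by read are lost when that subshell exits.

```shell
#!/bin/bash
s='1000($s3)'
a='' b=''

# read runs in a subshell here, so a and b stay empty in this shell.
echo "$s" | IFS='()' read -r a b
echo "after pipe: a=<$a> b=<$b>"

# With a here-string, read runs in the current shell and the values persist.
IFS='()' read -r a b <<< "$s"
echo "after here-string: a=<$a> b=<$b>"
```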
Now, what are the circumstances where word splitting occurs? One is when a variable is unquoted. A demo:
IFS='()'
echo "$s" | wc # 1 line, 1 word and 10 characters
echo $s | wc # 1 line, 2 words and 9 characters
The read command also splits a string into words, in order to assign words to the named variables. The variable a gets the first word, and b gets all the rest.
The command, broken down is:
IFS='()' read a b <<< "$s"
#^^^^^^^ 1
#                 ^^^^^^^^ 2
#        ^^^^^^^^ 3
1. only for the duration of the read command, assign the variable IFS the value ()
2. send the string "$s" to read's stdin
3. from stdin, use $IFS to split the input into words: assign the first word to variable a and the rest of the string to variable b. Trailing characters from $IFS at the end of the string are discarded.
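Applied to the array from the question, the same read call can be run once per element. This is a sketch only: the array contents below are sample values standing in for the real input file, and split_operand is a hypothetical helper name.

```shell
#!/bin/bash
# Sample data standing in for the question's inputArray.
inputArray=('70000($s0)' '1000($s3)')

# Hypothetical helper: split "offset(register)" into its two parts.
split_operand() {
    local offset register
    IFS='()' read -r offset register <<< "$1"
    printf '%s %s\n' "$offset" "$register"
}

for word in "${inputArray[@]}"; do
    split_operand "$word"
done
```

The -r option keeps read from treating backslashes in the input as escape characters; it does not change how the splitting works.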
Documentation:
Word splitting
Here strings
Simple command execution, describing why this assignment of IFS is only in effect for the duration of the read command.
read command
Hope that helps.

Resources