Bash script matching a glob at the beginning of a string

I have folders in a directory with names giving specific information. For example:
[allied]_remarkable_points_[treatment]
[nexus]_advisory_plans_[inspection]
....
So I have a structure similar to this: [company]_title_[topic]. The script has to match the file naming structure to variables in a script in order to extract the information:
COMPANY='[allied]';
TITLE='remarkable points'
TOPIC='[treatment]'
The folder names do not contain a constant number of characters, so I can't use indexed matching in the script. I managed to extract $TITLE and $TOPIC, but I can't manage to match the first string, since the variable gives me back the complete folder name.
FOLDERNAME=${PWD##*/}
This is the line giving me grief:
COMPANY=`expr $FOLDERNAME : '\(\[.*\]\)'`
I tried to avoid the greedy behaviour by placing ? in the regular expression:
COMPANY=`expr $FOLDERNAME : '\(\[.*?\]\)'`
but as soon as I do that, it returns nothing
Any ideas?

expr isn't needed for regular-expression matching in bash.
[[ $FOLDERNAME =~ (\[[^]]*\]) ]] && COMPANY=${BASH_REMATCH[1]}
Use [^]]* instead of .* to do a non-greedy match of the bracketed portion. A bigger regular expression can capture all three parts; the title is matched with .* because it can itself contain underscores:
[[ $FOLDERNAME =~ (\[[^]]*\])_(.*)_(\[[^]]*\]) ]] && {
COMPANY=${BASH_REMATCH[1]}
TITLE=${BASH_REMATCH[2]}
TOPIC=${BASH_REMATCH[3]}
}
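As a worked example of the three-part match, here is a minimal sketch that wraps it in a function. The name parse_folder and the anchored pattern stored in re are assumptions made for illustration, not part of the original answer; the title group uses .* so that underscores inside the title are allowed, and they are then turned into spaces.

```shell
#!/usr/bin/env bash
# parse_folder is a hypothetical helper: it sets COMPANY, TITLE and TOPIC
# from a name shaped like [company]_title_[topic], or returns 1 on no match.
parse_folder() {
    local re='^(\[[^]]*\])_(.*)_(\[[^]]*\])$'
    [[ $1 =~ $re ]] || return 1
    COMPANY=${BASH_REMATCH[1]}
    TITLE=${BASH_REMATCH[2]//_/ }   # underscores in the title become spaces
    TOPIC=${BASH_REMATCH[3]}
}

parse_folder '[allied]_remarkable_points_[treatment]' &&
    printf '%s | %s | %s\n' "$COMPANY" "$TITLE" "$TOPIC"
# prints: [allied] | remarkable points | [treatment]
```

Keeping the pattern in a variable avoids quoting surprises inside [[ ... =~ ... ]].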

Bash has built-in string manipulation functionality.
for f in *; do
company=${f%%\]*} # strip off everything from the first ] onwards
company=${company#\[} # strip off leading [
topic=${f##*\[} # strip off everything up to the last [
topic=${topic%\]} # strip off trailing ]
:
done
The construct ${variable#wildcard} removes any prefix matching wildcard from the value of variable and returns the resulting string. Doubling the # obtains the longest possible wildcard match instead of the shortest. Using % selects suffix instead of prefix substitution.
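To make the four variants concrete, here is a small sketch applied to one of the folder names from the question. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
f='[allied]_remarkable_points_[treatment]'

short_prefix=${f#*_}    # shortest prefix matching *_ removed
long_prefix=${f##*_}    # longest prefix matching *_ removed
short_suffix=${f%_*}    # shortest suffix matching _* removed
long_suffix=${f%%_*}    # longest suffix matching _* removed

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
# prints:
# remarkable_points_[treatment]
# [treatment]
# [allied]_remarkable_points
# [allied]
```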
If for some reason you do want to use expr, the reason your non-greedy regex attempt doesn't work is that this syntax is significantly newer than anything related to expr. In fact, if you are using Bash, you should probably not be using expr at all, as Bash provides superior built-in features for every use case where expr made sense, once in the distant past when the sh shell did not have built-in regex matching and arithmetic.
Fortunately, though, it's not hard to get non-greedy matching in this isolated case. Just change the regex to not match on square brackets.
COMPANY=`expr "$FOLDERNAME" : '\(\[[^][]*\]\)'`
(The closing square bracket needs to come first within the negated character class; in any other position, a closing square bracket closes the character class. Many newbies expect to be able to use backslash escapes for this, but that's not how it works. Notice also the addition of double quotes around the variable.)
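A quick way to see the difference between the greedy pattern and the negated character class is to run both against the same name. This is only a sketch; the variable names greedy and bounded are illustrative.

```shell
#!/usr/bin/env bash
name='[allied]_remarkable_points_[treatment]'

# .* is greedy, so the match runs through to the last ] in the string.
greedy=$(expr "$name" : '\(\[.*\]\)')

# [^][]* cannot cross a bracket, so the match stops at the first ].
bounded=$(expr "$name" : '\(\[[^][]*\]\)')

printf '%s\n' "$greedy" "$bounded"
# prints:
# [allied]_remarkable_points_[treatment]
# [allied]
```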

If you're not averse to using grep (GNU grep, for the -P option), then:
COMPANY=$(grep -Po '^\[.*?\]' <<< "$FOLDERNAME")

what does the condition "\>" mean in an if statement in bash?

Hi, I can't seem to find this anywhere. What does this backslash followed by > mean in an if statement in bash?
eg:
if [ "$count" \> 0 ]; then
echo hello
else
echo goodbye
fi
The characters < and > (among others) are special in shell scripts: they perform input/output redirection. So when you're trying to use them in a test expression like this, with their "normal meaning" of "less than" and "greater than", you need to escape them, preventing them from being treated specially by the shell.
It's similar to the way you might write
cat file\ name
to cat a file with a space in its name, or
cat it\'s
to cat a file with a single quote character in its name. (That is, normally, the space character is "special" in that it separates arguments, and the single quote character is special in that it quotes things, so to allow these characters to actually be used as part of a file name, you have to quote them, in this case using \ to turn off their special meaning.)
Quoting ends up being complicated in the Unix/Linux shells, because there are typically three different quote characters: ", ', and \. They all work differently, and the rules (while they mostly make sense) are complicated enough that I'm not going to repeat them here.
Another way to write this, avoiding the ugly quoting, would be
if [ "$count" -gt 0 ]
And it turns out this is preferable for another reason. If you use -gt, it will do the comparison based on the numeric values of $count and 0, which is presumably what you want here. If you use > (that is, \>), on the other hand, it will perform a string comparison. In this case, you'd probably get the same result, but in general, doing a string comparison when you meant to compare numerically can give crazy results. See Charles Duffy's comment for an example.
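Here is a small sketch of how the two comparisons can disagree. The value 10 and the variable names numeric and string are chosen purely for illustration.

```shell
#!/usr/bin/env bash
count=10

# Numeric comparison: 10 is greater than 9.
if [ "$count" -gt 9 ]; then numeric=yes; else numeric=no; fi

# String comparison: "10" sorts before "9" because "1" comes before "9".
if [ "$count" \> 9 ]; then string=yes; else string=no; fi

echo "numeric: $numeric, string: $string"
# prints: numeric: yes, string: no

# Without the backslash, [ "$count" > 9 ] would redirect output to a file
# named 9 and merely test whether "$count" is non-empty.
```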
\> means "greater than" for strings, compared in ASCII order.
The backslash has several meanings in Bash.
In your example, the backslash escapes the >, so that [ receives > as a comparison operator instead of the shell treating it as an output redirection.
To compare numbers, use -gt inside [ ], or use > inside the arithmetic construct (( )).
Another construct in Bash is [[ ]]; the following symbols can be used unescaped inside double brackets:
&&, ||, <, and >
If you use single brackets [ ] with the above symbols unescaped, you get errors or unexpected behavior in a Bash script; you need the double brackets.
Other meanings of backslash
In Bash, you use the forward slash / in a path such as /home/user/Desktop/file1.txt, but if your file name has spaces, you need to escape each space with a backslash \. For example:
file name with spaces.txt
/home/alex/Desktop/file\ name\ with\ spaces.txt
file name with symbols -> [Stackoverflow]fileName.txt
/home/alex/Desktop/\[Stackoverflow\]fileName.txt
You need to escape each bracket, [ and ], like this: \[ and \]
Resources
Bash Constructors
Bash operators
Ascii Table

concatenating variables inside linux commands in perl

I need to use a system command (grep) which has a variable concatenated with a string as the regex to search a file.
Is it possible to concatenate a regex for grep between a variable and string in Perl??
I have tried using the . operator but it doesn't work.
if(`grep -e "$fubname._early_exit_indicator = 1" $golden_path/early_exit_information.tsv`){
print "-D- The golden data indicates early exit should occur\n";
$golden_early_exit_indicator=1;
}
Expected to match the regex, "$fubname._early_exit_indicator = 1" but it doesn't match as required.
The expected result should be:
-D- The golden data indicates early exit should occur
But in the present code, it doesn't print this.
Output link: (https://drive.google.com/open?id=1N0SaZ-r3bYPlljKUgTOH5AbxCAaHw7zD)
The problem is that the . operator is not recognized as an operator inside quotes. The dot operator is used between strings, not inside them. A dot inside a string is inserted literally, and this literal dot in the pattern is why the grep command in your code fails to match.
Also note that inside quotes, Perl tries to interpolate variables using certain identifier parsing rules.
See perldoc perlop for the different types of quoting that are used in Perl, and see perldoc perldata for information about the identifier parsing rules.
In summary, in order to interpolate the variable $fubname in the backticks argument, use
"${fubname}_early_exit_indicator = 1"
Note that we need braces around the identifier, since the following underscore is a valid identifier character. (By contrast, a literal dot is not a valid identifier character, so if the following character were a literal dot, you would not need the braces around the identifier.)
The . operator will not work inside the quotes. Use something like this:
if(`grep -e "${fubname}_early_exit_indicator = 1" ...
I hope this works

Bash split an array, add a variable and concatenate it back together

I've been trying to figure this out, but unfortunately I can't. I am trying to create a function that finds the ';' character, puts four spaces before it, and then puts the code back together in a neat sentence. I've been cracking at this for a bit, and can't figure out a couple of things. I can't get the output to display what I want it to. I've tried finding the index of the ';' character, and it seems I'm going about it the wrong way. The other mistake that I seem to be making is that I'm trying to split into an array in a for loop, and then split the individual words in the array by letter, but I can't figure out how to do that either. If someone can give me a pointer, it would be greatly appreciated. This is in bash version 4.3.48
#!commentPlacer()
{
arg=($1) #argument
len=${#arg[@]} #length of the argument
comment=; #character to look for in second loop
commaIndex=(${arg[@]#;}) #the attempted index look up
commentSpace="    ;" #the variable being concatenated into the array
for(( count1=0; count1 <= ${#arg[@]}; count1++ )) #search the argument looking for comment space
do if [[ ${arg[count1]} != commentSpace ]] #if no commentSpace variable then
then for (( count2=0; count2 < ${#arg[count1]} ; count2++ )) #loop through again
do if [[ ${arg[count2]} != comment ]] #if no comment
then A=(${arg[@]:0:commaIndex})
A+=(commentSpace)
A+=(${arg[@]commaIndex:-1}) #concatenate array
echo "$A"
fi
done
fi
done
}
If I understand what you want correctly, it's basically to put 4 spaces in front of each ";" in the argument, and print the result. This is actually simple to do in bash with a string substitution:
commentPlacer() {
echo "${1//;/    ;}"
}
The expansion here has the format ${variable//pattern/replacement}, and it gives the contents of the variable, with each occurrence of pattern replaced by replacement. Note that with only a single / before the pattern, it would replace only the first occurrence.
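The difference between the single-slash and double-slash forms can be seen side by side in this short sketch. The sample string and variable names are invented for illustration.

```shell
#!/usr/bin/env bash
line='a;b;c'

first_only=${line/;/    ;}    # single /: only the first ; is replaced
all=${line//;/    ;}          # double //: every ; is replaced

printf '%s\n' "$first_only" "$all"
# prints:
# a    ;b;c
# a    ;b    ;c
```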
Now, I'm not sure I understand how your script is supposed to work, but I see several things that clearly aren't doing what you expect them to do. Here's a quick summary of the problems I see:
arg=($1) #argument
This doesn't create an array of characters from the first argument. var=(...) treats the thing in ( ) as a list of words, not characters. Since $1 isn't in double-quotes, it'll be split into words based on whitespace (generally spaces, tabs, and linefeeds), and then any of those words that contain wildcards will be expanded to a list of matching filenames. I'm pretty sure this isn't at all what you want (in fact, it's almost never what you want, so variable references should almost always be double-quoted to prevent it). Creating a character array in bash isn't easy, and in general isn't something you want to do. You can access individual characters in a string variable with ${var:index:1}, where index is the character you want (counting from 0).
commaIndex=(${arg[@]#;}) #the attempted index look up
This doesn't do a lookup. The substitution ${var#pattern} gives the value of var with pattern removed from the front (if it matches). If there are multiple possible matches, it uses the shortest one. The variant ${var##pattern} uses the longest possible match. With ${array[@]#pattern}, it'll try to remove the pattern from each element -- and since it's not in double-quotes, the result of that gets word-split and wildcard-expanded as usual. I'm pretty sure this isn't at all what you want.
if [[ ${arg[count1]} != commentSpace ]] #if no commentSpace variable then
Here (and in a number of other places), you're using a variable without $ in front; this doesn't use the variable at all, it just treats "commentSpace" as a static string. Also, in several places it's important to have double-quotes around it, e.g. to keep the spaces in $commentSpace from vanishing due to word splitting. There are some places where it's safe to leave the double-quotes off, but in general it's too hard to keep track of them, so just use double-quotes everywhere.
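If you really do need the position of a character, the ${var:index:1} form mentioned above is enough to build a lookup. The following is only a sketch under that assumption; index_of is a hypothetical helper name, not something Bash provides.

```shell
#!/usr/bin/env bash
# index_of STRING CHAR: print the zero-based index of the first CHAR in
# STRING and return 0, or return 1 if CHAR does not occur.
index_of() {
    local s=$1 ch=$2 i
    for (( i = 0; i < ${#s}; i++ )); do
        if [[ ${s:i:1} == "$ch" ]]; then
            printf '%s\n' "$i"
            return 0
        fi
    done
    return 1
}

index_of 'ab;c' ';'
# prints: 2
```

Note that every variable reference that could contain spaces or wildcards is double-quoted, and the string is walked by index instead of being split into an array.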
General suggestions: don't try to write C (or Java or whatever) programs in bash; it works too differently, and you have to think differently. Use shellcheck.net to spot common problems (like non-double-quoted variable references). Finally, you can see what bash is doing by putting set -x before a section that doesn't do what you expect; that'll make bash print each line as it executes it, showing the equivalent of what it's executing.
Make a little function using pattern substitution on stdin:
semicolon4s() { while IFS= read -r x; do echo "${x//;/    ;}"; done; }
semicolon4s <<< 'foo;bar;baz'
Output:
foo    ;bar    ;baz

I don't understand this parameter expansion: ${p//[0-9]/}

In Linux /etc/init.d/functions script I found the following parameter expansions that I don't quite understand:
${p//[0-9]/} replace all instances of any number to/by what?
${1##[-+]} This seems to remove all the longest left instances of minuses and pluses?
${LSB:-} This seems to say that if LSB is not set then set nothing? in other words do nothing?
These are instances of bash Shell Parameter Expansion;
see http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
Note: ksh and zsh support the expansions in your question, too (I'm unclear on the full extent of the overlap in functionality), whereas sh (POSIX-features-only shells), does NOT support the string-replacement expansion, ${p//[0-9]/}.
${p//[0-9]/}
Removes all digits: replaces all (//) instances of digits ([0-9]) with an empty string - i.e., it removes all digits (what comes after the last / is the replacement string, which is empty in this case).
${1##[-+]}
Strips a single leading - or +, if present: Technically, this removes the longest prefix (##) composed of a single - or + character from parameter $1. Given that the search pattern matches just a single character, there is no need to use ## for the longest prefix here, and # - for the shortest prefix - would do.
${LSB:-}
A no-op designed to prevent the script from breaking when run with the -u (nounset) shell attribute: Technically, this expansion means: In case variable $LSB is either not set or empty, it is to be replaced with the string following :-, which, in this case, is also empty.
While this may seem pointless at first glance, it has its purpose, as Sigi points out:
"
The ${LSB:-} construct makes perfect sense if the shell is invoked with the -u option (or set -u is used), and the variable $LSB might actually be unset. You then avoid the shell bailing out if you reference $LSB as ${LSB:-} instead. Since it's good practice to use set -u in complex scripts, this move comes in handy quite often.
"
${p//[0-9]/} # removes digits from anywhere in `$p`
${1##[-+]} # removes + or - from start in $1
${LSB:-} # not really doing anything
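The three expansions can be tried out directly. This sketch uses ordinary variables in place of the script's $p and $1, and the sample values are invented for illustration; the set -u parts run in subshells so that a failure does not end the surrounding script.

```shell
#!/usr/bin/env bash
p='eth0.100'
no_digits=${p//[0-9]/}     # every digit removed

n='+42'
unsigned=${n##[-+]}        # one leading + or - removed

m='--5'
one_sign=${m##[-+]}        # the pattern is one character, so only one - goes

# Under set -u, a plain reference to an unset variable aborts the shell...
strict_fails=no
( set -u; unset LSB; : "$LSB" ) 2>/dev/null || strict_fails=yes

# ...while ${LSB:-} quietly expands to an empty string.
safe=$( set -u; unset LSB; printf '[%s]' "${LSB:-}" )

printf '%s\n' "$no_digits" "$unsigned" "$one_sign" "$strict_fails" "$safe"
# prints:
# eth.
# 42
# -5
# yes
# []
```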

What does "if ($line =~ m/($rr)(.+>)(\d.\d+)" mean?

I am looking through a script that a former employee wrote, and came across this. I am very confused as to what it means. It is a condition of an if loop that runs through a file, and I know what the $rr variable is but everything after that I have no idea what it means... obviously googling "\d" returns nothing pertinent... what is the ".+>" mean too?
if ($line =~ m/($rr)(.+>)(\d.\d+)</) {
I have used the x modifier to make the pattern descriptive:
$line =~ m/
( $rr ) # Match and capture the pattern stored in $rr
( .+ > ) # Match and capture everything up to the last > that still lets the rest match
( # Capture the following matches
\d # Match a single digit
. # Match any character a single time
\d+ # Match one or more digits
)
< # Match a literal <
/x;
There are three captures in the above pattern. These captures can be accessed using the special variables $1, $2 and $3.
References
Perl regular expressions tutorial
Perl regular expressions
This is about regular expressions.
if ($line =~ m/($rr)(.+>)(\d.\d+)
$line is a variable.
The =~ means does it match this pattern?
The pattern follows. It's something like m/ then the variable $rr, then . (any single character), + (repeat the previous item one or more times). The > is a literal > character. The \d means a digit (i.e. 0 through 9).
Read up on pattern matching and regular expressions here: http://en.wikipedia.org/wiki/Regular_expression
Regular expressions are similar in many languages such as Perl, Ruby, etc.
Check out most of your string here (ruby): http://rubular.com/r/OTe4jFN545
The condition is true if the line matches the regular expression: the $rr variable, followed by at least one of any character and a >, followed by a digit, any single character, and one or more digits.
However, I'm not sure, but it seems a parenthesis is missing.
I would try going here to test the regex:
http://www.perlfect.com/articles/regextutor.shtml
The regex as posted is m/($rr)(.+>)(\d.\d+, which seems wrong; /($rr)(.+>)(\d.\d+)/ seems better.
The regex also has capture groups, which can be accessed within the if statement as
$1 .. $3
