Bash Script - String Split Paragraph Into Sentences - string

I'm trying to write a prepare-commit-msg git hook script that checks the last 10 commit messages to see whether the message you are attempting to enter is unique, and prevents the commit (without the --no-verify override) if it finds a duplicate. When I run this command line in Git I get the following output.
dacke@MachineName /c/Development/Project (tests)
$ git log --pretty=format:'%h|%an|%s' --max-count=10
2919dc2|Eric|Test Message
4ef580c|Eric|Test Message
1a0051b|Eric|Test Message
3e2df42|Eric|Test Commit
a08d4c1|Bob|DE6717 - What I did to fix this defect
aff8afc|Bob|DE6717 - Here is some more defect info
bbbfb67|Ralph|Merge branch 'clean_up' into develop
72d0968|Ralph|Forgot to remove deleted class from the project.
bfd1505|Ralph|Clean up.
d21c6dc|Bruce|Merge branch 'Icons' into develop
My prepare-commit-msg is written like so.
1 #!/bin/bash
2
3 printf "Prepare-Commit-Msg Hook Running...\n"
4
5 #$1 = "Commit Message File 'COMMIT_EDITMSG'"
6 #$2 = "message"
7 commitMessage=$(cat "$1")
8
9 # Prevent people putting in the same commit message multiple times by looking for an identical message in the last 10 commits
10 declare -a last10CommitMessages
11 rawMessages=$(git log --pretty=format:'%h|%an|%s«' --max-count=10)
12 printf "Raw Messages Length: %d\n" "${#rawMessages[@]}"
13 for line in ${rawMessages//«/ };
14 do
15 #printf "%s\n" $line
16 last10CommitMessages+=($line);
17 done
18 printf "Last 10 Commit Length: %d\n" "${#last10CommitMessages[@]}"
19
20 # Temp exit 1 to prevent commit during testing
21 exit 1
When I try to run the "commit" I get the following output.
Raw Messages Length: 1
Last 10 Commit Length: 63
If I uncomment line 15 I can see that for every space and line break I'm getting an item added to the array. On top of that the character that I actually wanted to split the lines on is added to the end which means that I would need yet another method to take this off the end.
I am new to bash scripting and I'm coming from a C# / Windows background, so I'm still learning. Can someone please provide a simple solution to the problem? More important to me than a quick answer is an answer that explains HOW this actually works. I've found a lot of conflicting information on the web that does not work for me. I plan on writing a blog piece about this after I get it all figured out, so it's important that I don't get any "It just works" as an answer. Thanks.

Simply change
rawMessages=$(git log --pretty=format:'%h|%an|%s«' --max-count=10)
to this
rawMessages=($(git log --pretty=format:'%h|%an|%s«' --max-count=10))
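$( ) runs the command inside and captures its output as one string (line breaks inside the output are kept; only trailing ones are stripped). When you wrap that in ( ), bash performs an array assignment: the unquoted output is split into words, and each word becomes an element of the array.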
EDIT:
If you do this you will see you have way more elements in the array than you wanted. This is because the unquoted output is split on every character in IFS, which includes spaces and tabs as well as newlines. To split on newlines only, you can do as hlovdal suggested and set IFS temporarily:
OLD_IFS="$IFS"
IFS=$'\n'
rawMessages=($(git log --pretty=format:'%h|%an|%s«' --max-count=10))
IFS="$OLD_IFS"
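The effect of IFS on the array assignment can be seen without git at all. The following is only a sketch: the variable sample is a made-up stand-in for two lines of the git log output, and the names bySpace and byLine are hypothetical.

```bash
#!/bin/bash
# Stand-in for the git log output: two lines, each containing a space.
sample=$'2919dc2|Eric|Test Message\n3e2df42|Eric|Test Commit'

# Default IFS (space, tab, newline): the unquoted expansion is split on spaces too.
bySpace=($sample)
echo "default IFS: ${#bySpace[@]} elements"    # prints 4

# IFS set to newline only: one element per line.
OLD_IFS="$IFS"
IFS=$'\n'
byLine=($sample)
IFS="$OLD_IFS"
echo "newline IFS: ${#byLine[@]} elements"     # prints 2
```

Restoring IFS straight after the assignment keeps the change from affecting the rest of the hook.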

The words are split due to the IFS variable (Internal Field Separator - an ancient unix relic...) which has default value "<space><tab><newline>". Change your loop to
oldIFS=$IFS
IFS=«
for line in ${rawMessages}
do
printf "%s\n" "$line"
last10CommitMessages+=("$line");
done
IFS=$oldIFS
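An alternative that avoids changing IFS for the whole script is to let read do the splitting, one line at a time. This is a sketch rather than a drop-in hook: is_duplicate_subject is a hypothetical helper name, and it assumes the 'hash|author|subject' format from the question without the extra « marker.

```bash
#!/bin/bash
# $1: the new commit subject; stdin: lines in the form hash|author|subject.
is_duplicate_subject() {
    local new=$1 hash author subject
    # IFS='|' applies to this read only. The last variable receives the rest
    # of the line, so a subject that itself contains '|' stays in one piece.
    # The "|| [[ -n $hash ]]" part handles a final line without a newline,
    # which is what git log --pretty=format: produces.
    while IFS='|' read -r hash author subject || [[ -n $hash ]]; do
        [[ $subject == "$new" ]] && return 0
    done
    return 1
}

# Possible use inside the hook (assumes $1 is the commit message file):
# if git log --pretty=format:'%h|%an|%s' --max-count=10 |
#         is_duplicate_subject "$(head -n 1 "$1")"; then
#     echo "Duplicate commit message" >&2
#     exit 1
# fi
```

Because read consumes exactly one line per call, spaces inside the subject never produce extra elements.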

Related

grep empty output file

I made a shell script the purpose of which is to find files that don't contain a particular string, then display the first line that isn't empty or otherwise useless. My script works well in the console, but for some reason when I try to direct the output to a .txt file, it comes out empty.
Here's my script:
#!/bin/bash
# takes user input.
echo "Input substance:"
read substance
echo "Listing media without $substance:"
cd media
# finds names of files that don't feature the substance given, then puts them inside an array.
searchresult=($(grep -L "$substance" *))
# iterates the array and prints the first line of each - contains both the number and the medium name.
# however, some files start with "Microorganisms" and the actual number and name feature after several empty lines
# the script checks for that occurrence - and prints the first line that doesn't match these criteria.
for i in "${searchresult[@]}"
do
grep -m 1 -v "Microorganisms\|^$" $i
done >> output.txt
I've tried moving the >>output.txt to right after the grep line inside the loop, tried switching >> to > and 2>&1, tried using tee. No go.
I'm honestly feeling utterly stuck as to what the issue could be. I'm sure there's something I'm missing, but I'm nowhere near good enough with this to notice. I would very much appreciate any help.
EDIT: Added files to better illustrate what I'm working with. Sample inputs I tried: Glucose, Yeast extract, Agar. Link to files [140kB] - the folder was unzipped beforehand.
The script was given full permissions to execute. I don't think the output is being rewritten because even if I don't iterate and just run a single line of the loop, the file is empty.

Change file's name using command line arguments Bash [duplicate]

I need to implement a script (duplq.sh) that would rename all the text files existing in the current directory using the command line arguments. So if the command duplq.sh pic 0 3 was executed, it would do the following transformation:
pic0.txt will have to be renamed pic3.txt
pic1.txt to pic4.txt
pic2.txt to pic5.txt
pic3.txt to pic6.txt
etc…
So the first argument is always the base name of the files, and the second and third arguments are always non-negative numbers.
I also need to make sure that when I execute my script, the first renaming (pic0.txt to pic3.txt), does not erase the existing pic3.txt file in the current directory.
Here's what I did so far:
#!/bin/bash
name="$1"
i="$2"
j="$3"
for file in $name*
do
echo $file
find /var/log -name 'name[$i]' | sed -e 's/$i/$j/g'
i=$(($i+1))
j=$(($j+1))
done
But the find command does not seem to work. Do you have other solutions?
The problem you're trying to solve is actually somewhat tricky, and I don't think you've fully thought it through. For instance, what's the difference between duplq.sh pic 0 3 and duplq.sh pic 2 5 -- it looks like both should just add 3 to the number, or would the second skip "pic0.txt" and "pic1.txt"? What effect would either one have on files named "pic", "pic.txt", "picture.txt", "picture2.txt", "pic2-2.txt", or "pic999.txt".
There are also a bunch of basic mistakes in the script you have so far:
You should (almost) always put variable references in double-quotes, to avoid unexpected word-splitting and wildcard expansion. So, for example, use echo "$file" instead of echo $file. In for file in $name*, you should put double-quotes around the variable but not the *, because you want that to be treated as a wildcard. Hence, the correct version is for file in "$name"*
Don't put variable references in single-quotes, they aren't expanded there. So in the find and sed commands, you aren't passing the variables' values, you're passing literal dollar signs followed by letters. Again, use double-quotes. Also, you don't have a "$" before "name", so it won't be treated as a variable even in double-quotes.
But the find and sed commands don't do what you want anyway. Consider find /var/log -name "name[1]" -- that looks for files named "name1", not "name1" + some extension. And it looks in /var/log and all of its subdirectories rather than in the current directory, which I'm pretty sure you don't want. And the "1" ("$i") may not be the number in the current filename. Suppose there are files named "pic0.jpg", "pic0.png", and "pic0.txt" -- on the first iteration, the loop might find all three with a pattern like "pic0*", then on the second and third iterations try to find "pic1*" and "pic2*", which don't exist. On the other hand, suppose there are files named "pic0.txt", "pic5.txt", and "pic8.txt" -- again, it might look for "pic0*" (ok), then "pic1*" (not found), and then "pic2*" (ditto).
Also, if you get to multi-digit numbers, the pattern "name[10]" will match "name0" and "name1", but not "name10", because the brackets define a set of single characters to match. I don't know why you added the brackets there, but they don't do anything you'd want.
You already have the files being listed one at a time in the $file variable, searching again with different criteria just adds confusion.
Also, at no point in the script do you actually rename anything. The find | sed line will (if it works) print the new name for the file, but not actually rename it.
BTW, when you do use the mv command, use either mv -n or mv -i to keep it from silently and irretrievably overwriting files if/when a name conflict occurs.
To prevent overwriting when incrementing file numbers, you need to do the renames in reverse numeric order (i.e. rename "pic3.txt" to "pic6.txt" before renaming "pic0.txt" to "pic3.txt"). This is especially tricky because if you just sort filenames in reverse alphabetic order, you'll get "pic7.txt" before "pic10.txt". But you can't do a numeric sort without removing the "pic" and ".txt" parts first.
IMO this is actually the trickiest problem to be solved in order to get this script to work right. It might be simplest to specify the largest index number as one of the arguments, and have it start there and count down to 0 (looping over numbers rather than files), and then for each number iterate over matching files (e.g. "pic0.jpg", "pic0.png", and "pic0.txt").
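The reverse numeric ordering described above can be sketched as follows. This is one possible approach under stated assumptions, not the only way to do it: shift_numbered is a hypothetical function name, only names of the form <base><digits>.txt are handled, and the shift is assumed to be positive (for a negative shift the order would have to be ascending instead).

```bash
#!/bin/bash
# Usage: shift_numbered pic 0 3   (renames pic0.txt to pic3.txt, and so on)
shift_numbered() {
    local base=$1 delta=$(( $3 - $2 )) file num
    for file in "$base"*.txt; do
        num=${file#"$base"}      # strip the prefix, e.g. "pic"
        num=${num%.txt}          # strip the extension
        # Emit the number only for names of the form <base><digits>.txt
        [[ $num =~ ^[0-9]+$ ]] && printf '%s\n' "$num"
    done | sort -rn | while IFS= read -r num; do
        # Highest number first, so the target name is already free.
        # 10# forces base 10, so a name like pic08.txt is not read as octal.
        # mv -n refuses to overwrite if a conflict occurs anyway.
        mv -n -- "$base$num.txt" "$base$(( 10#$num + delta )).txt"
    done
}
```

Sorting the bare numbers with sort -rn is what puts 10 ahead of 7; sorting the full file names would not. Leading zeros are dropped in the new name, so "pic08.txt" would become "pic11.txt".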
So I assume that 0 3 is just a measurement for the difference of old num and new num and equivalent to 1 4 or 100 103.
To avoid overwriting existing files, create a new temp dir, move all affected files there, and move all of them back in the end.
#!/bin/bash
#
# duplq.sh pic 0 3
base="$1"
delta=$(( $3 - $2 ))
# echo delta $delta
target=$(mktemp -d)
echo $target
# /tmp/tmp.7uXD2GzqAb
add () {
f="$1"
b="$2"
d=$3
num=${f#./${b}}
# echo -e "file: $f \tnum: $num \tnum + d: $((num + d))" ;
echo -e "$((num + d))" ;
}
for f in $(find -maxdepth 1 -type f -regex ".*/${base}[0-9]+")
do
newnum=$(add "$f" "${base}" $delta)
echo mv "$f" "$target/${base}$newnum"
done
# exit
echo mv $target/${base}* .
First I tried to use plain bash syntax to check whether removing the prefix (pic) leaves just digits. I also didn't handle the .txt extension; this is left as an exercise for the reader. The question never explicitly states that all files share the same extension, although all files in the example do.
With -regex ".*/${base}[0-9]+" in find, the part after the base is guaranteed to be just digits.
num=${f#./${b}}
removes from file f the base ("pic"). Delta d is added.
Instead of really moving, I just echoed the mv-command.
#TODO: Implement the file name extension conservation.
Two other pitfalls came to mind. If you have three files pic0, pic00 and pic000, they will all be renamed to pic3. And pic08 will be split into pic and 08; bash arithmetic treats a number with a leading zero as octal, so 08 (or 09, or 012129, and so on) leads to an error.
One way to solve this is to prepend a "1" to the extracted number (001 or 018), then add 3, and remove the leading 1:
001 1001 1004 004
018 1018 1021 021
but this clever solution leads to new problems:
999 1999 2002 002?
So a leading 1 has to be cut off, a leading 2 has to be reduced by 1. But now, if the delta is bigger, let's say 300:
018 1018 1318 318
918 1918 2218 1218
Well - that seems to be working.
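A different way around the octal problem, not the prepend-a-1 trick above, is bash's base#number notation together with printf padding. This is a sketch; add_padded is a hypothetical helper name.

```bash
#!/bin/bash
# Add a delta to a zero-padded number and keep the original width.
add_padded() {
    local num=$1 delta=$2
    # 10#$num makes bash parse the digits as decimal, so 08 and 09 are accepted.
    # %0*d pads with zeros to the width passed as the first extra argument.
    printf '%0*d\n' "${#num}" "$(( 10#$num + delta ))"
}

add_padded 001 3    # prints 004
add_padded 018 3    # prints 021
add_padded 999 3    # prints 1002 (the result no longer fits in three digits)
```

The width is taken from the length of the original string, so pic0, pic00 and pic000 would map to pic3, pic03 and pic003 instead of all colliding on pic3.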

Powershell script to parse a log file and then append to a file

I am new to shell scripting. I am working on a proof of concept in which a script should read a log file and then append to an existing file for alerting purposes. It should work as described below.
There is a predefined format that determines whether to append to the file. For example:
WWXXX9999XS message
**XXX** - is a 3 letter acronym (application code) like for **tom** for tomcat application
9999 - is a 4 numeric digit in the range 1001-1999
**E or X** - With X, if an open/active alert already exists for the same error code and the same message, no new alert is raised. Once the existing alerts have been closed, a new error raises an alert again. If the message for the same error code differs from the existing one, an alert is raised even though open/active alerts are present.
The X option only drops duplicates on code and message; otherwise all alert mechanisms are the same.
**S** - is the severity level, e.g. 2 or 3
**message** - is any text that will be displayed
1. The script will examine the log file and look for an error such as "cloud server is down"; if it is a new alert, it should append 'wwclo1002X2 cloud server is down'.
2. If the same alert comes again, it should append 'wwclo1002E2 cloud server is down'.
There are some very handy commands you can use to do this type of File manipulation. I've updated this in response to your comment to allow functionality that will check if the error has already been appended to the new file.
My suggestion would be that there is enough functionality here to warrant saving it in a bash script.
My approach would be to use a combination of less, grep and > to read and parse the file and then append to the new file. First save the following into a bash script (e.g. a file named script.sh)
#!/bin/bash
result=$(less "$1" | grep "$2")
exists=$(less "$3" | grep "$2")
if [[ "$exists" == "$result" ]]; then
echo "error, already present in file"
exit 1
else
echo "$result" >> "$3"
exit 0
fi
Then use this file in the command passing in the log file as the first argument, the string to search for as the second argument and the target results file as the third argument like this:
./script.sh <logFileName> "errorToSearchFor" <resultsTargetFileName>
Don't forget that to run the file you will first need to make it executable. You can do this using:
chmod u+x script.sh
Just to clarify, as you mentioned you are new to scripting: the less command outputs the entire file when its output goes to a pipe, the | operator (an unnamed pipe) passes this output to the grep command, and grep searches it for the expression in quotes and returns all lines containing that expression. The output of the grep command is then appended to the new file with >>.
You may need to tailor the expression in quotes after grep to get exactly the output you want from the log file.
The filenames are just placeholders, be sure to update these with the correct file names. Hope this helps!
Note: updated > to >> (a single angle bracket overwrites, a double angle bracket appends).
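If the goal is only to avoid appending a line that is already in the target file, grep can test for an exact, whole-line match directly. This is a sketch under that assumption; append_once is a hypothetical helper and does not implement the X/E code logic from the question.

```bash
#!/bin/bash
# Usage: append_once <alertFile> <alertLine>
append_once() {
    local file=$1 line=$2
    # -F: treat the pattern as a fixed string, -x: match the whole line,
    # -q: print nothing and report the result through the exit status.
    if [[ -f $file ]] && grep -Fxq -- "$line" "$file"; then
        echo "already present: $line" >&2
        return 1
    fi
    printf '%s\n' "$line" >> "$file"
}
```

For example, calling append_once alerts.txt "wwclo1002X2 cloud server is down" twice leaves a single line in alerts.txt (the file name is a placeholder).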

Concatenating string read from file with string literals creates jumbled output

My problem is that the result is jumbled. Consider this script:
#!/bin/bash
INPUT="filelist.txt"
i=0;
while read label
do
i=$[$i+1]
echo "HELLO${label}WORLD"
done <<< $'1\n2\n3\n4'
i=0;
while read label
do
i=$[$i+1]
echo "HELLO${label}WORLD"
done < "$INPUT"
filelist.txt
5
8
15
67
...
The first loop, with the immediate input (through something I believe is called a here-string, the <<< operator), gives the expected output
HELLO1WORLD
HELLO2WORLD
HELLO3WORLD
HELLO4WORLD
The second loop, which reads from the file, gives the following jumbled output:
WORLD5
WORLD8
WORLD15
WORLD67
I've tried echo $label: This works as expected in both cases, but the concatenation fails in the second case as described. Further, the exact same code works on my Win 7, git-bash environment. This issue is on OSX 10.7 Lion.
How to concatenate strings in bash |
Bash variables concatenation |
concat string in a shell script
Well, just as I was about to hit post, the solution hit me. Sharing here so someone else can find it - it took me 3 hours to debug this (despite being on SO for almost all that time) so I see value in addressing this specific (common) use case.
The problem is that filelist.txt was created in Windows. This means it has CRLF line endings, while OSX (like other Unix-like environments) expects LF only line endings. (See more here: Difference between CR LF, LF and CR line break types?)
I used the answer here to convert the file before consumption. Using sed I managed to replace only the final line's carriage return, so I stuck to known guns and went for the perl approach. Final script is below:
#!/bin/bash
INPUTFILE="filelist.txt"
INPUT=$(perl -pe 's/\r\n|\n|\r/\n/g' "$INPUTFILE")
i=0;
while read label
do
i=$[$i+1]
echo "HELLO${label}WORLD"
done <<< "$INPUT"
Question has been asked in a different form at Bash: Concatenating strings fails when read from certain files
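The jumbled output comes from the carriage return left at the end of each label: the terminal moves the cursor back to the start of the line, so WORLD is printed over HELLO. As an alternative to converting the file with perl, the trailing CR can be stripped inside the loop. This is a sketch; greet_labels is a hypothetical function name.

```bash
#!/bin/bash
# Usage: greet_labels <file>
greet_labels() {
    local label
    # "|| [[ -n $label ]]" also processes a last line that has no newline.
    while IFS= read -r label || [[ -n $label ]]; do
        label=${label%$'\r'}    # drop one trailing carriage return, if present
        echo "HELLO${label}WORLD"
    done < "$1"
}

# Simulate a file written on Windows: every line ends in CR LF.
demo=$(mktemp)
printf '5\r\n8\r\n15\r\n' > "$demo"
greet_labels "$demo"
rm -f "$demo"
```

Files with plain LF endings pass through unchanged, because the parameter expansion only removes a CR that is actually there.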

bash syntax error when using case statement

I have bash script that I use regularly in my job to automate a large job. I was making some changes today, but everything seemed fine. The script itself is about 1700 lines long. The first part of the script is all good and runs through all the user input and logic just fine. It then proceeds into the core of the script and stops working at exactly line 875 (tested the script with bash -x to find the break point). However, it breaks with the following error:
script.sh: line 1341: syntax error near unexpected token `;;'
script.sh: line 1341: ` ;;'
Line 1341 is in the middle of a case statement. The following code is the beginning of that block of code where it is breaking:
if [[ $VAR1 = "TRUE" && $VAR2 = "VAL2" ]]; then
VERSION=`XXXXXXXXXXXXXXXX`
## Set variables based on location $VAR3
case $VAR3 in
STR1 )
case $VERSION in
STR2 )
VAR4 = "STR5"
VAR5 = "STR6"
VAR6 = "STR7"
VAR7 = "STR8"
Line 1341 ---> ;;
STR3 )
VAR4="STR9"
VAR5="STR10"
VAR6="STR11"
VAR7="STR12"
;;
STR4 )
VAR4="STR13"
VAR5="STR14"
VAR6="STR15"
VAR7="STR16"
;;
esac
VAR8="STR17"
VAR9="STR18"
VAR10=1
VAR11="STR19"
;;
Because of the sensitive nature of what I do, I obviously had to remove quite a bit of information. I know this may make things more difficult to help me with. However, all VAR##="STR##" are standard variable declarations with string values, nothing special (no variable substitution, etc). All the variables are used later in the script. The code for VERSION returns a string value, which is used in the nested case.
The script was working fine up until my changes today, but I really didn't touch this section, with the exception of tweaking some of the STR values. I tried setting $VAR3 and $VERSION variables in quotes "", as well as the STR values used as the cases. I tried taking out this block entirely, only to have it fail on the next block (STR1 has a different value thus change the variable declarations). I have it output to the console what it is doing as well as checks for errors after most functions. There is nothing out of the ordinary on the console and nothing in the error log.
Any help would be appreciated, and I know I'm asking a lot.
By the way here is the code around line 875 where the script stops running (no errors generated based on the code here). Again, with bash -x I could see the VAR2 variable get set, but the script breaks before the next for loop starts.
## Create file ##
echo 'Creating files . . . '
j=0
p=1111
if [ $VAR1 = "TRUE" ]
then
VAR2=1
else
VAR2=2
fi
for i in `seq 1 $HOWMANY`; do <----Line 875
echo -n "Creating file . . . "
echo "XXXXXXXXXXX
Thanks again.
The problem is likely somewhere between line 875 (or a bit earlier) and line 1341. It may be a misplaced quote or something less subtle. It will be essentially impossible for us to debug without all the original material between those lines.
Suggestion 1: run with 'bash -n -v' and see whether that gives you any insight into the problem.
Suggestion 2: split the script into smaller pieces that are more easily managed - and that can be separately debugged. The biggest scripts I have (out of 400 in my bin directory) are from the autoconf suite - they weigh in at just under 1100 lines; the next biggest is mine, and the 750 line script is too d..n big. The next biggest scripts are between 600 and 700 lines of Perl (including Perl documentation).
Having said 'missing quote', I see that your fragment close to line 875 has:
echo -n "Creating file . . . "
echo "XXXXXXXXXXX
with a missing close double quote from the second echo.
You also mentioned making changes, albeit not close to the point where the script breaks. Since you have the code under version control (you wouldn't dream of playing with a 1700 line script without backups, would you?), you should look at the actual changes again.
Or even back up to the previous working version, and make the changes again, one at a time, carefully, until you see why you broke something.
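To illustrate Suggestion 1: bash -n parses a script without running it, so it can be tried safely on a copy. The sketch below builds a small broken script in a temporary file (made up for the demonstration, with the unterminated quote from the fragment above) and checks the exit status.

```bash
#!/bin/bash
# Write a small script containing an unterminated double quote.
broken=$(mktemp)
cat > "$broken" <<'EOF'
echo -n "Creating file . . . "
echo "XXXXXXXXXXX
case $x in
    a ) y=1 ;;
esac
EOF

# -n reads and parses the script but executes nothing;
# a non-zero exit status means the parser found a syntax error.
if bash -n "$broken" 2>/dev/null; then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "syntax ok: $syntax_ok"    # prints "syntax ok: no"
rm -f "$broken"
```

Drop the 2>/dev/null to see the parser's message. With an unterminated quote the reported line is often far from the real mistake, which fits an error at line 1341 caused by something much earlier.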
You have spaces around your equal signs in this section:
case $VERSION in
STR2 )
VAR4 = "STR5"
VAR5 = "STR6"
VAR6 = "STR7"
VAR7 = "STR8"
Take those out and you may be OK (unless that's a posting error).
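The reason the spaces matter: bash reads VAR4 = "STR5" as a command named VAR4 with two arguments, not as an assignment. That normally fails at run time with "command not found" rather than as a syntax error, so it may not be the only problem in the script. A minimal sketch of the nested case with the assignments written without spaces (pick_var4 is a hypothetical function name; the STR values are the placeholders from the question):

```bash
#!/bin/bash
pick_var4() {
    local version=$1
    case $version in
        STR2 )
            VAR4="STR5"     # no spaces around '=': this is an assignment
            ;;
        STR3 )
            VAR4="STR9"
            ;;
        * )
            VAR4=""         # fallback for any other version string
            ;;
    esac
}

pick_var4 STR2
echo "$VAR4"    # prints STR5
```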
