Bash reading txt file and storing in array


I'm writing my first Bash script. I have some experience with C and C#, so I think the logic of the program is correct; it's the syntax that is complicated, because apparently there are many different ways to write the same thing!
Here is the script. It simply checks whether the argument (a string) is contained in a certain file. If so, it stores each line of the file in an array and writes one item of the array to a file. I'm sure there are easier ways to achieve that, but I want to get some practice with Bash loops.
#!/bin/bash
NOME=$1
c=0
#IF NAME IS FOUND IN THE PHONEBOOK THEN STORE EACH LINE OF THE FILE INTO ARRAY
#ONCE THE ARRAY IS DONE GET THE INDEX OF MATCHING NAME AND RETURN ARRAY[INDEX+1]
if grep "$NOME" /root/phonebook.txt ; then
echo "CREATING ARRAY"
while read line
do
myArray[$c]=$line # store line
c=$(expr $c + 1) # increase counter by 1
done < /root/phonebook.txt
else
echo "Name not found"
fi
c=0
for i in myArray;
do
if myArray[$i]="$NOME" ; then
echo ${myArray[i+1]} >> /root/numbertocall.txt
fi
done
This code returns only the second item of myArray (myArray[2]), or the second line of the file. Why?

The first part (where you build the array) looks ok, but the second part has a couple of serious errors:
for i in myArray; -- this executes the loop once, with $i set to "myArray". In this case, you want $i to iterate over the indexes of myArray, so you need to use
for i in "${!myArray[@]}"
or
for ((i=0; i<${#myArray[@]}; i++))
(although I generally prefer the first, since it'll work with noncontiguous and associative arrays).
Also, you don't need the ; unless do is on the same line (in shell, ; is mostly equivalent to a line break so having a semicolon at the end of a line is redundant).
if myArray[$i]="$NOME" ; then -- the if statement takes a command, and will therefore treat myArray[$i]="$NOME" as an assignment command, which is not at all what you wanted. In order to compare strings, you could use the test command or its synonym [
if [ "${myArray[i]}" = "$NOME" ]; then
or a bash conditional expression
if [[ "${myArray[i]}" = "$NOME" ]]; then
The two are very similar, but the conditional expression has much cleaner syntax (e.g. in a test command, > redirects output, while \> is a string comparison; in [[ ]] a plain > is a comparison).
In either case, you need to use an appropriate $ expression for myArray, or it'll be interpreted as a literal. On the other hand, you don't need a $ before the i in "${myArray[i]}" because it's in a numeric expression context and therefore will be expanded automatically.
Finally, note that the spaces between elements are absolutely required -- in shell, spaces are very important delimiters, not just there for readability like they usually are in C.
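Putting the two fixes together (iterating over the indexes and comparing with [[ ]]), a minimal sketch might look like the following. The function name line_after and its two arguments are made up for illustration, and it assumes the name occupies a whole line of the phone book with the number on the line after it.

```shell
#!/bin/bash
# Hypothetical helper: print the line that follows the first line equal to
# $1 in the file $2. Returns non-zero if there is no such line.
line_after() {
    local name=$1 file=$2 line i
    local -a lines=()

    # Read the file into an array, one element per line.
    while IFS= read -r line; do
        lines+=("$line")
    done < "$file"

    # Iterate over the array indexes, not over the literal word "lines".
    for i in "${!lines[@]}"; do
        if [[ "${lines[i]}" = "$name" ]]; then
            # A match on the last line has no successor to print.
            (( i + 1 < ${#lines[@]} )) || return 1
            printf '%s\n' "${lines[i+1]}"
            return 0
        fi
    done
    return 1
}
```

It could then be called as line_after "$NOME" /root/phonebook.txt >> /root/numbertocall.txt.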

1.- This is what you wrote, with small adjustments:
#!/bin/bash
NOME=$1
#IF NAME IS FOUND IN THE PHONE-BOOK **THEN** READ THE PHONE BOOK LINES INTO AN ARRAY VARIABLE
#ONCE THE ARRAY IS COMPLETED, GET THE INDEX OF MATCHING LINE AND RETURN ARRAY[INDEX+1]
c=0
if grep "$NOME" /root/phonebook.txt ; then
echo "CREATING ARRAY...."
while IFS= read -r line # IFS= in case you want to preserve leading and trailing spaces
do
myArray[c]=$line # put line in the array
c=$((c+1)) # increase counter by 1
done < /root/phonebook.txt
for i in "${!myArray[@]}"; do
if [[ "${myArray[i]}" = "$NOME" ]]; then
echo "${myArray[i+1]}" >> /root/numbertocall.txt
fi
done
else
echo "Name not found"
fi
2.-But you can also read the array and stop looping like this:
#!/bin/bash
NOME=$1
c=0
if grep "$NOME" /root/phonebook.txt ; then
echo "CREATING ARRAY...."
readarray -t myArray < /root/phonebook.txt # -t strips the trailing newline from each line
for i in "${!myArray[@]}"; do
if [[ "${myArray[i]}" = "$NOME" ]]; then
echo "${myArray[i+1]}" >> /root/numbertocall.txt
break # stop looping
fi
done
else
echo "Name not found"
fi
exit 0
3.- The following improves things. Supposing a) $NOME matches the whole line that contains it and b) there is always one line after a $NOME that is found, this will work; if not (if $NOME can be the last line in the phone book), then you need to make small adjustments.
#!/bin/bash
PHONEBOOK="/root/phonebook.txt"
NUMBERTOCALL="/root/numbertocall.txt"
NOME="$1"
myline=""
myline=$(grep -A1 "$NOME" "$PHONEBOOK" | sed '1d')
if [ -z "$myline" ]; then
echo "Name not found :-("
else
echo -n "$NOME FOUND.... "
echo "$myline" >> "$NUMBERTOCALL"
echo " .... AND SAVED! :-)"
fi
exit 0
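One possible shape for those "small adjustments" is sketched below. It is only an illustration: the function name number_after and its return codes are invented here, -F and -x are added so that the name must match a whole line literally, and -A1 is the same grep extension used above.

```shell
#!/bin/bash
# Hypothetical helper: print the line after the first whole-line match of $1
# in file $2. Returns 1 if the name is absent, 2 if nothing follows it.
number_after() {
    local name=$1 book=$2 next

    # -F: literal string, -x: whole line, -q: no output, just the status.
    grep -qFx -- "$name" "$book" || return 1

    # The first match is output line 1; its context line (if any) is line 2.
    next=$(grep -A1 -Fx -- "$name" "$book" | sed -n '2p')

    # Empty means the name was the last line (or the next line is blank).
    [[ -n $next ]] || return 2
    printf '%s\n' "$next"
}
```

The caller can then distinguish "Name not found" (status 1) from "found, but no number follows" (status 2).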

Related

Calling a function that decodes in base64 in bash

#!/bin/bash
#if there are no args supplied exit with 1
if [ "$#" -eq 0 ]; then
echo "Unfortunately you have not passed any parameter"
exit 1
fi
#loop over each argument
for arg in "$@"
do
if [ -f arg ]; then
echo "$arg is a file."
#iterates over the files stated in arguments and reads them $
cat $arg | while read line;
do
#should access only first line of the file
if [ head -n 1 "$arg" ]; then
process line
echo "Script has ran successfully!"
exit 0
#should access only last line of the file
elif [ tail -n 1 "$arg" ]; then
process line
echo "Script has ran successfully!"
exit 0
#if it accesses any other line of the file
else
echo "We only process the first and the last line of the file."
fi
done
else
exit 2
fi
done
#function to process the passed string and decode it in base64
process() {
string_to_decode = "$1"
echo "$string_to_decode = " | base64 --decode
}
Basically, what I want this script to do is loop over the arguments passed to it and, if an argument is a file, call the function that decodes base64, but only on the first and the last line of that file. Unfortunately, when I run it, even with a valid file, it does nothing. I think it might be running into problems with the if [ head -n 1 "$arg" ]; then part of the code. Any ideas?
EDIT: So I understood that I am actually just extracting first line over and over again without really comparing it to anything. So I tried changing the if conditional of the code to this:
first_line = $(head -n 1 "$arg")
last_line = $(tail -n 1 "$arg")
if [ first_line == line ]; then
process line
echo "Script has ran successfully!"
exit 0
#should access only last line of the file
elif [ last_line == line ]; then
process line
echo "Script has ran successfully!"
exit 0
My goal is to iterate through files for example one is looking like this:
MTAxLmdvdi51awo=
MTBkb3duaW5nc3RyZWV0Lmdvdi51awo=
MXZhbGUuZ292LnVrCg==
And to decode the first and the last line of each file.

To decode the first and last line of each file given to your script, use this:
#! /bin/bash
for file in "$@"; do
[ -f "$file" ] || exit 2
head -n1 "$file" | base64 --decode
tail -n1 "$file" | base64 --decode
done
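A slightly more defensive variant is sketched below, wrapped in a function so it can be reused. The name decode_ends is hypothetical, and two behaviours are assumptions rather than requirements from the question: arguments that are not regular files are skipped instead of aborting the script, and a one-line file is decoded only once. It also assumes the files are newline-terminated.

```shell
#!/bin/bash
# Hypothetical helper: base64-decode the first and last line of each file
# given as an argument.
decode_ends() {
    local file
    for file in "$@"; do
        if [[ ! -f $file ]]; then
            # Assumption: skip bad arguments instead of exiting.
            printf 'skipping %s: not a regular file\n' "$file" >&2
            continue
        fi
        head -n 1 "$file" | base64 --decode
        # With a single line, first and last are the same; decode it once.
        if (( $(wc -l < "$file") > 1 )); then
            tail -n 1 "$file" | base64 --decode
        fi
    done
}
```

Inside a script it would be invoked as decode_ends "$@".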

Yeah, as the others already said, the true goal of the script isn't really clear. That said, I imagine every variation of what you may have wanted to do would be covered by something like:
#!/bin/bash
process() {
encoded="$1";
decoded="$( echo "${encoded}" | base64 --decode )";
echo " Value ${encoded} was decoded into ${decoded}";
}
(( $# )) || {
echo "Unfortunately you have not passed any parameter";
exit 1;
};
while (( $# )) ; do
arg="$1"; shift;
if [[ -f "${arg}" ]] ; then
echo "${arg} is a file.";
else
exit 2;
fi;
content_of_first_line="$( head -n 1 "${arg}" )";
echo "Content of first line: ${content_of_first_line}";
process "${content_of_first_line}";
content_of_last_line="$( tail -n 1 "${arg}" )";
echo "Content of last line: ${content_of_last_line}";
process "${content_of_last_line}";
line=""; linenumber=0;
while IFS="" read -r line; do
(( linenumber++ ));
echo "Iterating over all lines. Line ${linenumber}: ${line}";
process "${line}";
done < "${arg}";
done;
Some additions you may find useful:
If the script is invoked with multiple filenames, let's say four different filenames, and the second file does not exist (but the others do),
do you really want the script to process the first file, notice that the second file doesn't exist, and exit at that point, without processing the (potentially valid) third and fourth files?
Replacing the line:
exit 2;
with
continue;
would make it skip any invalid filenames, and still process valid ones that come after.
Also, within your process function, directly after the line:
decoded="$( echo "${encoded}" | base64 --decode )";
you could check whether the decoding was successful before echoing whatever garbage results when the line wasn't valid base64:
if [[ "$?" -eq 0 ]] ; then
echo " Value ${encoded} was decoded into ${decoded}";
else
echo " Garbage.";
fi;
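The same check can also be written by testing the assignment directly instead of inspecting $? afterwards. This is only a sketch of that alternative; the wording of the messages and the return value are assumptions. Note that the local declaration is kept separate from the assignment, because local var=$(...) would report the status of local rather than of base64.

```shell
#!/bin/bash
# Sketch of process() that reports invalid base64 instead of printing garbage.
process() {
    local encoded=$1 decoded
    # The exit status of the assignment is that of the pipeline's last
    # command, i.e. base64, so it can be used directly as the if condition.
    if decoded=$(printf '%s' "$encoded" | base64 --decode 2>/dev/null); then
        printf ' Value %s was decoded into %s\n' "$encoded" "$decoded"
    else
        printf ' Value %s is not valid base64\n' "$encoded" >&2
        return 1
    fi
}
```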
--
To answer your followup question about the IFS/read-construct, it is a mixture of a few components:
read -r line
reads a single line from the input (-r tells it not to do any funky backslash escaping magic).
while ... ; do ... done ;
This while loop surrounds the read statement, so that we keep repeating the process of reading one line, until we run out.
< "${arg}";
This feeds the content of filename $arg into the entire block of code as input (so this becomes the source that the read statement reads from)
IFS=""
This tells the read statement to use an empty value instead of the real built-in IFS value (the internal field separator). It's generally a good idea to do this for every read statement, unless you have a use case that requires splitting the line into multiple fields.
If instead of
IFS="" read -r line
you were to use
IFS=":" read -r username _ uid gid _ homedir shell
and read from /etc/passwd which has lines such as:
root:x:0:0:root:/root:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
then that IFS value would allow it to load those values into the right variables (in other words, it would split on ":")
The default value for IFS is inherited from your shell, and it usually contains the space, tab, and newline characters. When you read into a single variable ($line, in your case), no splitting into fields happens, although leading and trailing IFS whitespace is still stripped from the line. As soon as you change a read statement and add another variable, word splitting takes effect, and the lack of a local IFS= value will make the exact same script behave very differently in different situations. As such, it tends to be a good habit to control it at all times.
The same goes for quoting your variables, like "$arg" or "${arg}", instead of $arg. It doesn't matter when ARG="hello", but once the value starts containing spaces, all sorts of things can suddenly act differently; surprises are never a good thing.
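To make the field-splitting case concrete, here is a small sketch built around the passwd-style example above. The function name list_shells is hypothetical; it takes the file to read as its argument so it can be tried on a copy rather than on /etc/passwd itself.

```shell
#!/bin/bash
# Hypothetical helper: print "user uses shell" for each passwd-style line.
list_shells() {
    local file=$1 username uid gid homedir shell
    # IFS=: applies only to this read; "_" soaks up fields we don't need
    # (the password placeholder and the comment field).
    while IFS=: read -r username _ uid gid _ homedir shell; do
        printf '%s uses %s\n' "$username" "$shell"
    done < "$file"
}
```

Because IFS is set only on the read command itself, the rest of the script keeps its normal word splitting.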

How to test for certain characters in a file

I am currently running a script with an if statement. Before I run the script, I want to make sure the file provided as the first argument has certain characters.
If the file does not have those certain characters in certain spots then the output would be else "File is Invalid" on the command line.
For the if statement to be true, the file needs to have at least one hyphen in Field 1 line 1 and at least one comma in Field one Line one.
How would I create an if statement with perhaps a test command to validate those certain characters are present?
Thanks
I'm new to Linux/Unix. This is my homework, so I haven't really tried anything yet; I'm only brainstorming possible solutions.
function usage
{
echo "usage: $0 filename ..."
echo "ERROR: $1"
}
if [ $# -eq 0 ]
then
usage "Please enter a filename"
else
name="Yaroslav Yasinskiy"
echo $name
date
while [ $# -gt 0 ]
do
if [ -f $1 ]
then
if <--------- here is where the answer would be
starting_data=$1
echo
echo $1
cut -f3 -d, $1 > first
cut -f2 -d, $1 > last
cut -f1 -d, $1 > id
sed 's/$/:/' last > last1
sed '/last:/ d' last1 > last2
sed 's/^ *//' last2 > last3
sed '/first/ d' first > first1
sed 's/^ *//' first1 > first2
sed '/id/ d' id > id1
sed 's/-//g' id1 > id2
paste -d' ' first2 last3 id2 > final
cat final
echo ''
else
echo
usage "Could not find file $1"
fi
shift
done
fi

In answer to your direct question:
For the if statement to be true, the file needs to have at least one
hyphen in Field 1 line 1 and at least one comma in Field one Line one.
How would I create an if statement with perhaps a test command to
validate those certain characters are present?
Bash provides all the tools you need. While you could call awk, you really just need to read the first line of the file into two variables (say a and b) and then use [[ $a =~ regex ]], where regex is an extended regular expression that verifies that the first field (contained in $a) contains both a '-' and a ','.
For details on the [[ =~ ]] expression, see bash(1) - Linux manual page under the section labeled [[ expression ]].
Let's start with read. When you provide two variables, read will read the first field based on normal word splitting given by IFS (the Internal Field Separator, default $' \t\n': space, tab, newline). So by doing read -r a b you read the first field into a and the rest of the line into b (you don't care about b for your test).
Your regex can be ([-]+.*[,]+|[,]+.*[-]+), which is an (x|y), i.e. x OR y, expression, where x is [-]+.*[,]+ (one or more '-', then anything, then one or more ',') and y is [,]+.*[-]+ (one or more ',', then anything, then one or more '-'). So by using the '|', your regex will accept either a hyphen, zero or more characters, and then a comma, or a comma, zero or more characters, and then a hyphen, in the first field.
How do you read the line? With simple redirection, e.g.
read -r a b < "$1"
So your conditional test in your script would look something like:
if [ -f $1 ]
then
read -r a b < "$1"
if [[ $a =~ ([-]+.*[,]+|[,]+.*[-]+) ]] # <-- here is where the ...
then
starting_data=$1
...
else
echo "File is Invalid" >&2 # redirection to 2 (stderr)
fi
else
echo
usage "Could not find file $1"
fi
shift
...
Example Test Files
$ cat valid
dog-food, cat-food, rabbit-food
50lb 16lb 5lb
$ cat invalid
dogfood, catfood, rabbitfood
50lb 16lb 5lb
Example Use/Output
$ read -r a b < valid
if [[ $a =~ ([-]+.*[,]+|[,]+.*[-]+) ]]; then
echo "file valid"
else
echo "file invalid"
fi
file valid
and for the file without the certain characters:
$ read -r a b < invalid
if [[ $a =~ ([-]+.*[,]+|[,]+.*[-]+) ]]; then
echo "file valid"
else
echo "file invalid"
fi
file invalid
Now you really have to concentrate on eliminating the spawning of at least a dozen subprocesses, where you call cut 3 times, sed 7 times, paste once, and then cat. While it is good that you are thinking through what you need to do and getting it working, as mentioned in my comment, any time you are looping you want to reduce the number of subshells spawned to the greatest extent possible. I suspect, as @Mig answered, that awk will be the proper tool, one that can likely eliminate all 12 subshells and replace them with a single call to awk.
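For comparison, the same check can be written without a regular expression by using two glob patterns inside [[ ]]. This is a swapped-in technique, not the regex shown above; the function name first_field_ok is hypothetical, and "field 1" is taken to mean the first whitespace-delimited word of line 1, as in the read -r a b example.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first whitespace-delimited field on
# line 1 of the file contains at least one hyphen and at least one comma.
first_field_ok() {
    local file=$1 first rest
    # read fails on an empty or unreadable file; accept a final line that
    # lacks a trailing newline as long as something was read.
    read -r first rest < "$file" || [[ -n $first ]] || return 1
    # *-* matches any string containing "-", *,* any string containing ",".
    [[ $first == *-* && $first == *,* ]]
}
```

In the script it would sit at the marked spot as if first_field_ok "$1"; then ... else echo "File is Invalid" >&2; fi.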

I personally would use awk for this whole part, since you want to test fields and create a string with concatenated fields. Awk is perfect for that.
But here is a small script which shows how you could just test your file's first line:
if [[ $(head -n 1 file.csv | awk '$1~/-/ && $1~/,/ {print "MATCH"}') == 'MATCH' ]]; then
echo "yes"
else
echo "no"
fi
It looks overkill when not doing the whole thing in awk but it works. I am sure there is a way to test only one regex, but that would involve knowing which flavour of awk you have because I think they don't all use the same regex engine. Therefore I left this out for the sake of simplicity.

How do I check this condition in shell script?

Eg. If I have a command
<package> list --all
Output of the command:
Name ID
abc 1
xyz 2
How can I check if the user input is the same as the name in the list, using a shell script. Something like this:
if ($input== $name in command )
echo "blabla"

name=$1
<package> list --all | grep -Eq "^$name[[:space:]]"
result=$?
The somewhat dubious notation <package> is from the question and is a kind of placeholder.
The result will be 0 on success and 1 on failure.
If the name is literally "Name", it will match the header line, and if the name might contain blanks, it will be more complicated.
grep -Eq "^$name[[:space:]]"
-q means 'quiet': don't print the matching line on the screen.
$name holds the parameter, which we assigned in the beginning.
The "^" prevents "bc" from matching; it means "beginning of line".
The "[[:space:]]" matches a blank or a tab as an end-of-word marker. (Inside a grep bracket expression, "\t" does not mean tab, which is why the character class is used.)
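If the header line and regex metacharacters in the name are a concern, one alternative is an exact comparison of the first column in awk. The sketch below is only an illustration: list_all is a stand-in for the real list command and simply prints the sample output from the question, and name_listed is a made-up function name.

```shell
#!/bin/bash
# Stand-in for the real "list --all" command (hypothetical sample data).
list_all() {
    printf '%s\n' 'Name ID' 'abc 1' 'xyz 2'
}

# Succeed if $1 is exactly equal to a name in the first column.
name_listed() {
    # NR > 1 skips the header; appending "" forces a string comparison.
    list_all | awk -v want="$1" '
        NR > 1 && ($1 "") == (want "") { found = 1 }
        END { exit !found }'
}
```

It is used like any other command: if name_listed "$input"; then echo "blabla"; fi.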

To provide an alternate approach (which allows reading and testing more than one value without rerunning your list command or needing to do an O(n) lookup):
#!/usr/bin/env bash
case $BASH_VERSION in
'') echo "This script requires bash 4.x (run with non-bash shell)" >&2; exit 1;;
[0-3].*) echo "This script requires bash 4.x (run with $BASH_VERSION)" >&2; exit 1;;
esac
declare -A seen=( ) # create an empty associative array
{
read -r _ # skip the header
while read -r name value; do # loop over other lines
seen[$name]=$value # ...populating the array from them
done
} < <(your_program list --all) # ...with input for the loop from your program
# after you've done that work, further checks will be very efficient:
while :; do
printf %s "Enter the name you wish to check, or enter to stop: " >&2
read -r name_in # read a name to check from the user
[[ $name_in ]] || break # exit the loop if given an empty value
if [[ ${seen[$name_in]} ]]; then # lookup the name in our associative array
printf 'The name %q exists with value %q\n' "$name_in" "${seen[$name_in]}"
else
printf 'The name %q does not exist\n' "$name_in"
fi
done

bash scripting reading numbers from a file

Hello, I need to make a bash script that will read from a file and then add up the numbers in the file. For example, the file I'm reading would look like this:
cat samplefile.txt
1
2
3
4
The script will take the file name as an argument, add those numbers, and print out the sum. I'm stuck on how I would go about reading the integers from the file and storing them in a variable.
So far what I have is the following:
#! /bin/bash
file="$1" #first arg is used for file
sum=0 #declaring sum
readnums #declaring var to store read ints
if [! -e $file] ; do #checking if files exists
echo "$file does not exist"
exit 0
fi
while read line ; do
do < $file
exit

What's the problem? Your code looks mostly fine, except that readnums is not a valid command name, the if needs then rather than do, the loop must be closed with done, and you need spaces inside the square brackets in the if condition. (Oh, and "$file" should properly be inside double quotes.)
#!/bin/bash
file=$1
sum=0
if ! [ -e "$file" ] ; then # spaces inside square brackets
echo "$0: $file does not exist" >&2 # error message includes $0 and goes to stderr
exit 1 # exit code is non-zero for error
fi
while read -r line ; do
sum=$((sum + line))
done < "$file"
printf 'Sum is %d\n' "$sum"
# exit # not useful; script will exit anyway
However, the shell is not traditionally a very good tool for arithmetic. Maybe try something like
awk '{ sum += $1 } END { print "Sum is", sum }' "$file"
perhaps inside a snippet of shell script to check that the file exists, etc (though you'll get a reasonably useful error message from Awk in that case anyway).
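If you do want to stay in pure Bash, a slightly more defensive version of the loop might look like this. It is a sketch under assumptions: the function name sum_file is made up, and lines that are not plain decimal integers (blank lines, words, numbers with leading zeros) are silently skipped rather than treated as errors.

```shell
#!/bin/bash
# Hypothetical helper: print the sum of the integer lines in the given file.
sum_file() {
    local file=$1 line sum=0
    # Plain decimal integers only; leading zeros are rejected because Bash
    # arithmetic would interpret them as octal.
    local re='^-?(0|[1-9][0-9]*)$'

    if [[ ! -e $file ]]; then
        printf '%s: does not exist\n' "$file" >&2
        return 1
    fi

    # "|| [[ -n $line ]]" also processes a final line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ $re ]] || continue
        sum=$((sum + line))
    done < "$file"

    printf '%s\n' "$sum"
}
```

With the sample file from the question, sum_file samplefile.txt prints 10.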

Awk: loop & save different lines to different files?

I'm looping over a series of large files with a shell script:
i=0
while read line
do
# get first char of line
first=`echo "$line" | head -c 1`
# make output filename
name="$first"
if [ "$first" = "," ]; then
name='comma'
fi
if [ "$first" = "." ]; then
name='period'
fi
# save line to new file
echo "$line" >> "$2/$name.txt"
# show live counter and inc
echo -en "\rLines:\t$i"
((i++))
done <$file
The first character in each line will either be alphanumeric, or one of the above defined characters (which is why I'm renaming them for use in the output file name).
It's way too slow.
5,000 lines takes 128seconds.
At this rate I've got a solid month of processing.
Will awk be faster here?
If so, how do I fit the logic into awk?

This can certainly be done more efficiently in bash.
To give you an example: echo foo | head does a fork() call, creates a subshell, sets up a pipeline, starts the external head program... and there's no reason for it at all.
If you want the first character of a line, without any inefficient mucking with subprocesses, it's as simple as this:
c=${line:0:1}
I would also seriously consider sorting your input, so you can only re-open the output file when a new first character is seen, rather than every time through the loop.
That is -- preprocess with sort (as by replacing <$file with < <(sort "$file")) and do the following each time through the loop, reopening the output file only conditionally:
if [[ $name != "$current_name" ]] ; then
current_name="$name"
exec 4>>"$2/$name.txt" # open the output file on FD 4
fi
...and then append to the open file descriptor:
printf '%s\n' "$line" >&4
(not using echo because it can behave undesirably if your line is, say, -e or -n).
Alternately, if the number of possible output files is small, you can just open them all on different FDs up-front (substituting other, higher numbers where I chose 4), and conditionally output to one of those pre-opened files. Opening and closing a file for every single line is expensive, so this should be a substantial help.
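Assembled into one piece, the sort-then-reopen approach might look like the sketch below. The function name split_by_first_char and its two arguments (input file, output directory) are assumptions, empty lines are skipped because they have no first character to name a file after, and LC_ALL=C is used so that sort groups lines strictly by their first byte. Note that sorting changes the order of the lines within each output file.

```shell
#!/bin/bash
# Hypothetical helper: append each line of $1 to "$2/<name>.txt", where
# <name> is derived from the line's first character.
split_by_first_char() {
    local file=$1 outdir=$2 line first name current=

    while IFS= read -r line; do
        [[ -n $line ]] || continue          # nothing to dispatch on
        first=${line:0:1}                   # no subprocess needed
        case $first in
            ,) name=comma ;;
            .) name=period ;;
            *) name=$first ;;
        esac
        # Reopen FD 4 only when the target file changes; because the input
        # is sorted, this happens once per distinct first character.
        if [[ $name != "$current" ]]; then
            current=$name
            exec 4>>"$outdir/$name.txt"
        fi
        printf '%s\n' "$line" >&4
    done < <(LC_ALL=C sort "$file")

    exec 4>&-                               # close the descriptor when done
}
```

In the original script it would replace the whole loop: split_by_first_char "$file" "$2".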

A few things to speed it up:
Don't use echo/head to get the first character. You're spawning at least two additional processes per line. Instead, use bash's parameter expansion facilities to get the first character.
Use if-elif to avoid checking $first against all the possibilities each time. Even better, if you are using bash 4.0 or later, use an associative array to store the output file names, rather than checking against $first in a big if-statement for each line.
If you don't have a version of bash that supports associative arrays, replace your if statements with the following.
if [[ "$first" = "," ]]; then
name='comma'
elif [[ "$first" = "." ]]; then
name='period'
else
name="$first"
fi
But the following is suggested. Note the use of $REPLY as the default variable used by read if no name is given (just FYI).
declare -A output # must match the array name used below
output[","]=comma
output["."]=period
output["?"]=question_mark
output["!"]=exclamation_mark
output["-"]=hyphen
output["'"]=apostrophe
i=0
while read -r
do
# get first char of line
first=${REPLY:0:1}
# make output filename
name=${output[$first]:-$first}
# save line to new file
printf '%s\n' "$REPLY" >> "$name.txt"
# show live counter and inc
echo -en "\r$i"
((i++))
done <$file

#!/usr/bin/awk -f
BEGIN {
punctlist = ", . ? ! - '"
pnamelist = "comma period question_mark exclamation_mark hyphen apostrophe"
pcount = split(punctlist, puncts)
ncount = split(pnamelist, pnames)
if (pcount != ncount) {print "error: counts don't match, pcount:", pcount, "ncount:", ncount; exit}
for (i = 1; i <= pcount; i++) {
punct_lookup[puncts[i]] = pnames[i]
}
}
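{
first = substr($0, 1, 1)
# fall back to the character itself when it is not in the lookup table
name = (first in punct_lookup) ? punct_lookup[first] : first
print > (name ".txt")
printf "\r%6d", i++
}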
END {
printf "\n"
}
The BEGIN block builds an associative array so you can do punct_lookup[","] and get "comma".
The main block simply does the lookups for the filenames and outputs the line to the file. In AWK, > truncates the file the first time and appends subsequently. If you have existing files that you don't want truncated, then change it to >> (but don't use >> otherwise).

Yet another take:
declare -i i=0
declare -A names
while read line; do
first=${line:0:1}
if [[ -z ${names[$first]} ]]; then
case $first in
,) names[$first]="$2/comma.txt" ;;
.) names[$first]="$2/period.txt" ;;
*) names[$first]="$2/$first.txt" ;;
esac
fi
printf "%s\n" "$line" >> "${names[$first]}"
printf "\rLine $((++i))"
done < "$file"
and
awk -v dir="$2" '
{
first = substr($0,1,1)
if (! (first in names)) {
if (first == ",") names[first] = dir "/comma.txt"
else if (first == ".") names[first] = dir "/period.txt"
else names[first] = dir "/" first ".txt"
}
print > names[first]
printf("\rLine %d", NR)
}
' "$file"
