Bash Shell Programming Store Variables from Text File into Arrays - linux

My program should be able to work this way.
Below is the content of the text file named BookDB.txt
The individual fields are separated by colons (:). Every line in the text file is one set of information, with the fields in the order stated below.
Title:Author:Price:QtyAvailable:QtySold
Harry Potter - The Half Blood Prince:J.K Rowling:40.30:10:50
The little Red Riding Hood:Dan Lin:40.80:20:10
Harry Potter - The Phoniex:J.K Rowling:50.00:30:20
Harry Potter - The Deathly Hollow:Dan Lin:55.00:33:790
Little Prince:The Prince:15.00:188:9
Lord of The Ring:Johnny Dept:56.80:100:38
I actually intend to
1) Read the file line by line and store it in an array
2) Display it
However I have no idea on how to even start the first one.
After doing some research online, below is the code I have written so far.
#!/bin/bash
function fnReadFile()
{
while read inputline
do
bTitle="$(echo $inputline | cut -d: -f1)"
bAuthor="$(echo $inputline | cut -d: -f2)"
bPrice="$(echo $inputline | cut -d: -f3)"
bQtyAvail="$(echo $inputline | cut -d: -f4)"
bQtySold="$(echo $inputline | cut -d: -f5)"
bookArray[Count]=('$bTitle', '$bAuthor', '$bPrice', '$bQtyAvail', '$bQtySold')
Count = Count + 1
done
}
function fnInventorySummaryReport()
{
fnReadFile
echo "Title Author Price Qty Avail. Qty Sold Total Sales"
for t in "${bookArray[@]}"
do
echo $t
done
echo "Done!"
}
if ! [ -f BookDB.txt ] ; then # check existence of the BookDB file; create it if it does not exist, else continue
touch BookDB.txt
fi
"HERE IT WILL THEN BE THE MENU AND CALLING OF THE FUNCTION"
Thanks to those in advance who helped!

Why would you want to read the entire thing into an array? Query the file when you need information:
#!/bin/sh
# untested code:
# print the values of any line that match the pattern given in $1
grep "$1" BookDB.txt |
while IFS=: read Title Author Price QtyAvailable QtySold; do
echo title = $Title
echo author = $Author
done
Unless your text file is very large, it is unlikely that you will need the data in an array. If it is large enough that you need that for performance reasons, you really should not be coding this in sh.
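If the records really are wanted in memory, here is a minimal sketch of one way to do it in bash. Bash has no nested arrays, so it keeps one array per field (parallel arrays). The names read_books, print_books and the array names are made up for illustration, and it assumes that no field contains a colon and that the file has no header line.

```bash
#!/bin/bash
# Parallel arrays: element i of each array belongs to book i.
titles=() authors=() prices=() qty_avail=() qty_sold=()

# read_books FILE: load colon-separated records from FILE into the arrays.
read_books() {
    local title author price avail sold
    titles=() authors=() prices=() qty_avail=() qty_sold=()
    while IFS=: read -r title author price avail sold; do
        [[ -n $title ]] || continue          # skip blank lines
        titles+=("$title")
        authors+=("$author")
        prices+=("$price")
        qty_avail+=("$avail")
        qty_sold+=("$sold")
    done < "$1"
}

# print_books: display every stored record, one per line.
print_books() {
    local i
    for i in "${!titles[@]}"; do
        printf '%s | %s | %s | %s | %s\n' \
            "${titles[i]}" "${authors[i]}" "${prices[i]}" \
            "${qty_avail[i]}" "${qty_sold[i]}"
    done
}
```

Typical use would be read_books BookDB.txt followed by print_books.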

Since your goal here seems to be clear, how about using awk as an alternative to using bash arrays? Often using the right tool for the job makes things a lot easier!
The following awk script should get you something like what you want:
# This will print your headers, formatted the way you had above, but without
# the need for explicit spaces.
BEGIN {
printf "%-22s %-16s %-14s %-15s %-13s %s\n", "Title", "Author", "Price",
"Qty Avail.", "Qty Sold", "Total Sales"
}
# This is described below, and runs for every record (line) of input
{
printf "%-22s %-16s %-14.2f %-15d %-13d %0.2f\n",
substr($1, 1, 22), substr($2, 1, 16), $3, $4, $5, ($3 * $5)
}
The second section of code (between curly braces) runs for every line of input. printf is for formatted output, and uses the given format string to print out each field, denoted by $1, $2, etc. In awk, these variables are used to access the fields of your record (line, in this case). substr() is used to truncate the output, as shown below, but can easily be removed if you don't mind the fields not lining up. I assumed "Total Sales" was supposed to be Price multiplied by Qty Sold, but you can update that easily as well.
Then save this script as books.awk and invoke it like so:
$ awk -F: -f books.awk books
Title                  Author           Price          Qty Avail.      Qty Sold      Total Sales
Harry Potter - The Hal J.K Rowling      40.30          10              50            2015.00
The little Red Riding  Dan Lin          40.80          20              10            408.00
Harry Potter - The Pho J.K Rowling      50.00          30              20            1000.00
Harry Potter - The Dea Dan Lin          55.00          33              790           43450.00
Little Prince          The Prince       15.00          188             9             135.00
Lord of The Ring       Johnny Dept      56.80          100             38            2158.40
The -F: tells awk that the fields are separated by colon (:), and -f books.awk tells awk what script to run. Your data is held in books.
Not exactly what you were asking for, but just pointing you toward a (IMO) better tool for this kind of job! awk can be intimidating at first, but it's amazing for jobs that work on records like this!

Related

Retaining one member of a pair

Good afternoon to all,
I have a file containing two fields, each representing a member of a pair.
I want to retain one member of each pair and it does not matter which member as these are codes for duplicate samples in a study.
Each pair appears twice in my file, with each member of the pair appearing once in either column.
An example of an input file is:
XXX1 XXX7
XXX2 XXX4
abc2 dcb3
XXX7 XXX1
dcb3 abc2
XXX4 XXX2
And an example of the desired output would be
XXX1
XXX2
abc2
How might this be accomplished in bash? Thank you.
Here is a combination of GNU awk, sort and cut. Store the script as duplicatePairs.awk:
{ if ( $1 < $2) print $1, $2
else print $2, $1
}
and run it like this: awk -f duplicatePairs.awk your_file | sort -u | cut -d" " -f1
The if sorts the pairs such that a line with x,y and a line with y,x will be printed the same. Then sort -u can remove the duplicate lines. And the cut selects the first column.
With a slightly larger awk script, we can solve the requirements "awk-only":
{
smallest = $1;
if ( $1 > $2) {
smallest = $2
}
if( !(smallest in seen) ) {
seen [ smallest ] = 1
print smallest
}
}
Run it like this: awk -f duplicatePairs.awk your_file
While the answer posted by Lars above works very well I would like to suggest an alternative, just in case someone stumbles upon this problem.
I had previously used awk '!seen[$2,$1]++ {print $1}' to the same result. I didn't realize it had worked since the number of lines in my file wasn't halved. This turned out to be because of some wrong assumptions I made about my data.
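The one-pass idea can be sketched as follows. A key stored only as seen[$2,$1] is not the key that the reversed line looks up under the same rule, so this sketch records both orderings explicitly. That is an adjustment made here, not necessarily the exact command quoted above, and the function name one_per_pair is made up for illustration.

```shell
# one_per_pair FILE: print the first member of each pair, once per pair.
one_per_pair() {
    awk '
        !(($1, $2) in seen) { print $1 }   # pair not seen yet in either order
        { seen[$1, $2]; seen[$2, $1] }     # record both orderings of the pair
    ' "$1"
}
```

Unlike the sort -u pipeline, this keeps the lines in their original order and prints whichever member happens to come first in the file.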

In-line replacement bash (replace line with new one using variables)

I'm reading lines from a file. Each line carries a lot of unnecessary information, and I want to reformat the lines so that only the fields I need are kept for later use.
Example line in file (file1)
Name: *name* Date: *date* Age: *age* Gender: *gender* Score: *score*
Say I want to just pull gender and age from the file and use that later
New line
*gender*, *age*
In bash:
while read line; do
<store variable for gender>
<store variable for age>
<overwrite each line in CSV - gender,age>
<use gender/age as inputs for later comparisons>
done < file1
EDIT: There is no stability in the entries. One value can be found using echo $line | cut, and the other is found by testing [[ $line =~ "keyValue" ]] and then setting that value.
I was thinking of storing the combination of the two variables as such:
newLine="$val1,$val2"
Then using a sed in-line replace to replace the $line with $newLine.
Is there a better way, though? It may come down to a sed formatting issue with variables.
This will produce your desired output from your posted sample input:
$ cat file
Name: *name* Date: *date* Age: *age* Gender: *gender* Score: *score*
$ awk -F'[: ]+' -v OFS=', ' '{for (i=1;i<NF;i+=2) a[$i]=$(i+1); print a["Gender"], a["Age"]}' file
*gender*, *age*
$ awk -F'[: ]+' -v OFS=', ' '{for (i=1;i<NF;i+=2) a[$i]=$(i+1); print a["Score"], a["Name"], a["Date"] }' file
*score*, *name*, *date*
and you can see above how easy it is to print whatever fields you like in whatever order you like.
If it's not what you want, post some more representative input.
Your example leaves room for interpretation, so I'm assuming that there may be whitespace in the field values, but that there are no colons in the field values and that each field key is followed by a colon. I also assume that the order is stable.
while IFS=: read -r _ _ _ age gender _; do
age="${age% Gender}" # Use parameter expansion to strip off the key for the *next* field.
age="${age# }" # Drop the space left over after the colon.
gender="${gender% Score}"
gender="${gender# }"
printf '"%s","%s"\n' "$gender" "$age"
done < file1 > file1.csv
Update
Since your question now states that there is no stability, you have to iterate through the possible values to get your output:
while IFS=: read -a line; do
unset age key sex
for chunk in "${line[@]}"; do
val="${chunk% *}" # Everything but the key
val="${val# }" # Drop the space left over after the colon
case "$key" in
Age) age="$val";;
Gender) sex="$val";;
esac
# The key is for the *next* iteration.
key="${chunk##* }"
done
if [[ $age || $sex ]]; then
printf '"%s","%s"\n' "$sex" "$age"
fi
done < file1 > file1.csv
(Also I added quotes around the output values in the csv to be compliant with the actual csv format and in case sex or age happened to have commas in it. Maybe someone is 1,000,000 years old. ;)
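When the keys can appear in any order, another option is to look each key up by name instead of by position. The sketch below is only an illustration under some assumptions: every value is a single token without spaces, no key name is the tail of another key name, and the helper names field_value and to_csv are invented here.

```shell
# field_value KEY LINE: print the token that follows "KEY:" in LINE.
field_value() {
    printf '%s\n' "$2" | sed -n "s/.*$1: *\([^ ]*\).*/\1/p"
}

# to_csv FILE: emit "gender,age" for every line of FILE.
to_csv() {
    while IFS= read -r line; do
        printf '%s,%s\n' "$(field_value Gender "$line")" \
                         "$(field_value Age "$line")"
    done < "$1"
}
```

Running to_csv file1 > file1.csv writes the reduced lines to a new file, which avoids the in-place sed replacement with variables altogether.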

Formatting Text into separate files [closed]

I have a slight problem and do not know where to start.
I have a text file that contains the following information.
MINI COOPER 2007, 30,000 miles, British Racing Green, full service history, metallic paint, alloys. Great condition. £5,995 ono Telephone xxxxx xxxxx
I need to populate the above information in the following format
<advert>
<manufacturer></manufacturer>
<make></make>
<model></model>
<price></price>
<miles></miles>
<image></image>
<desc><![CDATA[]]></desc>
<expiry></expiry> // Any point in the future
<url></url> // Optional
</advert>
The output should be:
<advert>
<manufacturer>MINI</manufacturer>
<make></make>
<model></model>
<price>5,995</price>
<miles>30000</miles>
<image></image>
<desc><![CDATA[2007, British Racing Green, full service history, metallic paint, alloys. Great condition.Telephone xxxxxx xxxxxx]]></desc>
<expiry>Todays date 13/05/2013</expiry>
<url></url>
</advert>
Any help will be greatly appreciated.
Since commas are sometimes part of a field and sometimes not, you can't use commas (or anything else) as a field separator, so you need something like this in GNU awk (for gensub() and strftime()):
gawk '{
print "<advert>"
printf "\t<manufacturer>%s</manufacturer>\n", $1
printf "\t<make></make>\n"
printf "\t<model></model>\n"
printf "\t<price>%s</price>\n", gensub(/.*£([[:digit:],]+).*/,"\\1","")
printf "\t<miles>%s</miles>\n", gensub(/.*[[:space:]]([[:digit:],]+)[[:space:]]+miles.*/,"\\1","")
printf "\t<image></image>\n"
printf "\t<desc><![CDATA[%s]]></desc>\n", gensub(/.*[[:space:]]+miles[[:space:]]*,[[:space:]]*(.*)/,"\\1","")
printf "\t<expiry>Todays date %s</expiry>\n", strftime("%d/%m/%Y")
printf "\t<url></url>\n"
print "</advert>"
}' file
My editor seems to choke on British pound signs so here's the above script running using a # symbol instead:
$ cat file
MINI COOPER 2007, 30,000 miles, British Racing Green, full service history, metallic paint, alloys. Great condition. #5,995 ono Telephone xxxxx xxxxx
$ gawk '{
print "<advert>"
printf "\t<manufacturer>%s</manufacturer>\n", $1
printf "\t<make></make>\n"
printf "\t<model></model>\n"
printf "\t<price>%s</price>\n", gensub(/.*#([[:digit:],]+).*/,"\\1","")
printf "\t<miles>%s</miles>\n", gensub(/.*[[:space:]]([[:digit:],]+)[[:space:]]+miles.*/,"\\1","")
printf "\t<image></image>\n"
printf "\t<desc><![CDATA[%s]]></desc>\n", gensub(/.*[[:space:]]+miles[[:space:]]*,[[:space:]]*(.*)/,"\\1","")
printf "\t<expiry>Todays date %s</expiry>\n", strftime("%d/%m/%Y")
printf "\t<url></url>\n"
print "</advert>"
}' file
<advert>
<manufacturer>MINI</manufacturer>
<make></make>
<model></model>
<price>5,995</price>
<miles>30,000</miles>
<image></image>
<desc><![CDATA[British Racing Green, full service history, metallic paint, alloys. Great condition. #5,995 ono Telephone xxxxx xxxxx]]></desc>
<expiry>Todays date 13/05/2013</expiry>
<url></url>
</advert>
Here's some example code that should get you going at least. Run like:
awk -f script.awk file.txt
Contents of script.awk:
{
for (i=1;i<=NF;i++) {
if ($i == "miles,") {
miles = $(i - 1)
$i = $(i - 1) = ""
}
if ($i ~ /£/) {
price = substr($i, 2)
$i = $(i + 1) = ""
}
}
gsub(/ +/, " ");
print "<advert>"
print "\t<manufacturer>" $1 "</manufacturer>"
print "\t<make></make>"
print "\t<model></model>"
print "\t<price>" price "</price>"
print "\t<miles>" miles "</miles>"
print "\t<image></image>"
print "\t<desc><![CDATA[" $0 "]]></desc>"
print "\t<expiry>" strftime( "%d/%m/%Y" ) "</expiry>"
print "\t<url></url>"
print "</advert>"
}
Results:
<advert>
<manufacturer>MINI</manufacturer>
<make></make>
<model></model>
<price>5,995</price>
<miles>30,000</miles>
<image></image>
<desc><![CDATA[MINI COOPER 2007, British Racing Green, full service history, metallic paint, alloys. Great condition. Telephone xxxxx xxxxx]]></desc>
<expiry>13/05/2013</expiry>
<url></url>
</advert>

max comma's on one line, using bash script

I have some \n ended text:
She walks, in beauty, like the night
Of cloudless climes, and starry skies
And all that's best, of dark and bright
Meet in her aspect, and her eyes
And I want to find which line has the max number of , and print that line too.
For example, the text above should result as
She walks, in beauty, like the night
since it has 2 commas (the most of any line).
I have tried:
cat p.txt | grep ','
but do not know where to go now.
You could use awk:
awk -F, -vmax=0 ' NF > max { max_line = $0; max = NF; } END { print max_line; }' < poem.txt
Note that if the max is not unique this picks the first one with the max count.
try this
awk '-F,' '{if (NF > maxFlds) {maxFlds=NF; maxRec=$0}} ; END {print maxRec}' poem
Output
She walks, in beauty, like the night
Awk works with 'Fields'; the -F says use ',' to separate the fields. (The default field separator is a run of whitespace, i.e. spaces and tabs.)
NF means Number of Fields (in the current record). So we're using logic to find the record with the maximum number of Fields, capturing the value of the line '$0', and at the END, we print out the line with the most fields.
It is left undefined what will happen if 2 lines have the same maximum # of commas ;-)
I hope this helps.
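If ties matter, the same NF-based idea can be extended to print every line that reaches the maximum. This is only a sketch; the function name most_commas is invented for illustration.

```shell
# most_commas FILE: print every line of FILE that has the maximum comma count.
most_commas() {
    awk -F, '
        NF > max  { max = NF; n = 0 }      # new maximum: forget earlier ties
        NF == max { lines[++n] = $0 }      # remember each line at the maximum
        END       { for (i = 1; i <= n; i++) print lines[i] }
    ' "$1"
}
```

Called as most_commas poem, it prints the single winning line for the poem above, and all tied lines, in input order, when several share the maximum.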
FatalError's FS-based solution is nice. Another way I can think of is to remove non-comma characters from the line, then count its length:
[ghoti@pc ~]$ awk '{t=$0; gsub(/[^,]/,""); print length($0), t;}' poem
2 She walks, in beauty, like the night
1 Of cloudless climes, and starry skies
1 And all that's best, of dark and bright
1 Meet in her aspect, and her eyes
[ghoti@pc ~]$
Now we just need to keep track of it:
[ghoti@pc ~]$ awk '{t=$0;gsub(/[^,]/,"");} length($0)>max{max=length($0);line=t} END{print line;}' poem
She walks, in beauty, like the night
[ghoti@pc ~]$
Pure Bash:
declare ln=0 # actual line number
declare maxcomma=0 # max number of commas seen
declare maxline='' # corresponding line
while IFS= read -r line ; do # IFS= and -r keep whitespace and backslashes intact
commas="${line//[^,]/}" # remove all non-commas
if [ ${#commas} -gt $maxcomma ] ; then
maxcomma=${#commas}
maxline="$line"
fi
((ln++))
done < "poem.txt"
echo "${maxline}"

Executing zgrep recursively in Linux

This zgrep command is outputting a particular field of a line containing the word yellow when given a giant input log file for all 24 hours of 26th Feb 1989.
zgrep 'yellow' /color_logs/1989/02/26/*/1989-02-26-00_* | cut -f3 -d'+'
1) I prefer using a perl script. Are there advantages of using a bash script?
Also when writing this script I would like for it to create a file after processing the data for each DAY (so it will look at all the hours in a day)
zgrep 'yellow' /color_logs/1989/02/*/*/1989-02-26-00_* | cut -f3 -d'+'
2) How do I determine the value of the first star (in Perl), after processing a day's worth of data so that I can output the file with the YYMMDD in its name. I'm interested in getting the value of the first star from the line of code directly above this question.
Grep writes out the name of the file that each line came from, but your cut command is throwing that away. You could do something like:
open(PROCESS, "zgrep 'yellow' /color_logs/1989/02/*/*/1989-02-26_* |");
while(<PROCESS>) {
if (m!/color_logs/(\d\d\d\d)/(\d\d)/(\d\d)/[^:]+:(.+)$!) {
my ($year, $month, $day, $data) = ($1, $2, $3, $4);
# Do the cut -f3 -d'+' on the line from the log
# ('+' is a regex metacharacter, so it must be escaped in the split pattern)
my $field = (split(/\+/, $data))[2];
open(OUTFILE, ">>${year}${month}${day}.log");
print OUTFILE $field, "\n";
close(OUTFILE);
}
}
That's inefficient in that you're opening and closing the file for each line. You could use an IO::File object instead and only open a new file when the date changes, but you get the idea.
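For comparison with a plain shell approach, the "only reopen when the date changes" idea can also be sketched with awk reading the zgrep output. This is an illustration rather than a drop-in replacement. It assumes the /color_logs/YYYY/MM/DD/... layout from the question, that zgrep prefixes each line with "path:", and that no file name contains a colon. The function name split_by_day is invented here.

```shell
# split_by_day OUTDIR: read "path:line" records (as grep prints them for
# several files) on stdin and append field 3 of each line to OUTDIR/YYYYMMDD.log.
split_by_day() {
    awk -v outdir="$1" '
    {
        sep  = index($0, ":")              # the first colon ends the file name
        path = substr($0, 1, sep - 1)
        line = substr($0, sep + 1)
        split(path, part, "/")             # "", color_logs, YYYY, MM, DD, ...
        out = outdir "/" part[3] part[4] part[5] ".log"
        if (out != prev) {                 # day changed: close the previous file
            if (prev != "") close(prev)
            prev = out
        }
        split(line, field, "[+]")          # third "+"-separated field, like cut -f3
        print field[3] >> out
    }'
}
```

It would be used along the lines of zgrep 'yellow' /color_logs/1989/02/*/*/1989-02-26-00_* | split_by_day . so that each day ends up in its own file in the current directory.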
