Search book by title/author - linux

Hi, I am new to shell programming and hope you can guide me along. Thanks.
I need a function that searches a .txt file by either book title or author and echoes out the following:
Title:*Enter*
Author:Scissorhands
Found 3 records :
C++ for dummies, John Scissorhands, $15.01, 10, 5
Java for dummies, Mary Scissorhands, $16.02, 20, 15
VB.NET for dummies, Edward Scissorhands, $17.03, 30, 25
my file format is as below
Book name:author:price:Qty:Qty Sold
harry potter:james:12.99:197:101
function Search_book
{
FILE="/home/student/Downloads/BookDB.txt"
echo found 1 records : $FILE contents
cat FILE
}

Edit: Sorry, I misunderstood your question.
Updated mixed bash/perl solution:
#!/bin/bash
read -p "Enter search term: " search
perl -ne '
BEGIN{ $pattern = $ARGV[0]; shift; $n=0 }
chomp; @a = split /:/;
if ($a[0] =~ m/$pattern/i or $a[1] =~ m/$pattern/i) {
print "$a[0], $a[1],\$$a[2],$a[3],$a[4]\n";
$n += 1;
}
END{ print "Found $n title(s).\n" }
' "$search" /home/student/Downloads/BookDB.txt
Output:
$ ./search.sh
Enter search term: star wars vi
Title: Star wars VI - return of the jedi
Found 1 title(s).
Note that you need regular expressions rather than wildcards (i.e. . instead of ?, .* instead of *, etc.), and you don't need to put wildcards at the beginning/end of the search term to find matches anywhere in a given (sub)string.
Of course you could also do this entirely in Perl, without the shell script wrapper:
#!/usr/bin/env perl
use strict;
use warnings;
my $booklist = './books.txt';
my @book;
print "Enter search term: ";
chomp (my $pattern = <>);
open BOOKS, "<$booklist" or die $!;
my $n = 0;
foreach (<BOOKS>) {
chomp;
@book = split /:/;
if ($book[0] =~ m/$pattern/i or $book[1] =~ m/$pattern/i) {
print "$book[0], $book[1],\$$book[2],$book[3],$book[4]\n";
$n += 1;
}
}
close BOOKS;
print "Found $n title(s).\n";
For an awk solution see Adrian Frühwirth's answer.

search_book()
{
awk -F':' -v search="$1" '$1 ~ search || $2 ~ search { i++; printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5 } END { printf "%d records found\n", i }' books.txt
}
$ cat books.txt
X never marks the spot:Indiana Jones:9.99:1:1
A fistful of barnacles:Captain Twiddlymore:9.99:2:1
The time I blew up LeChuck:Guybrush Threepwood:8.99:100:60
When I blew up LeChuck:Guybrush Threepwood:8.99:100:50
Where I blew up LeChuck:Guybrush Threepwood:8.99:100:2
$ search_book Indiana
X never marks the spot, Indiana Jones,$9.99,1,1
1 records found
$ search_book Guybrush
The time I blew up LeChuck, Guybrush Threepwood,$8.99,100,60
When I blew up LeChuck, Guybrush Threepwood,$8.99,100,50
Where I blew up LeChuck, Guybrush Threepwood,$8.99,100,2
3 records found
$ search_book barnacle
A fistful of barnacles, Captain Twiddlymore,$9.99,2,1
1 records found
$ search_book foo
0 records found

# Wilson Turners last point on Adrian's post:
./book.sh
Enter search term: barnacle
A fistful of barnacles, Captain Twiddlymore,$9.99,2,1
1 records found
Here is how you call the awk function within your existing code:
#!/bin/bash
search_book()
{
awk -F':' -v search="$search" '$1 ~ search || $2 ~ search { i++; printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5 } END { printf "%d records found\n", i }' books.txt
}
read -p "Enter search term: " search
search_book
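Both versions of search_book above hand the search term to awk as a regular expression, so a term such as C++ is not matched literally. A minimal sketch of a literal, case-insensitive variant follows. The function name search_book_literal and the file path passed as a second argument are assumptions made for illustration and do not come from the answers above.

```shell
#!/bin/bash
# search_book_literal TERM FILE
# Hypothetical variant of search_book: matches TERM as a plain substring
# (not a regex) in the title or author field, ignoring case.
# Caveat: awk -v still interprets backslash escapes inside TERM.
search_book_literal() {
    awk -F':' -v term="$1" '
        BEGIN { term = tolower(term) }
        # index() returns 0 when the substring is absent
        index(tolower($1), term) || index(tolower($2), term) {
            n++
            printf "%s, %s,$%s,%s,%s\n", $1, $2, $3, $4, $5
        }
        END { printf "%d records found\n", n }
    ' "$2"
}
```

It would be called as search_book_literal "$search" books.txt after the read -p prompt.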

Related

insert delimiter in file using Linux script [closed]

I have a non delimited text file consisting of around 1 million rows.
Sample rows
1YBL LOYALTY EXT 1000101172019001
2000100101000011512753184907301010614199100919699034659 VIDYA.SAGAR1#bank.IN VIDYA SAGAR CROSS BANDRA WM DELHI 456471
3000000027
On each row, depending on the leading digit (the row type: "1", "2" or "3"), I have to insert a delimiter at fixed character positions, i.e. after characters 0-1, 1-20, 21-25 and so on.
How to do this using Linux script ?
Desired Output
1|YBL LOYALTY EXT |10001|01172019|001
2|00010010100001151|2753|184907301010614199100919699034659 |VIDYA.SAGAR1#bank.IN |VIDYA SAGAR |CROSS |BANDRA |WM |DELHI |456471
3|000000027
I tried this command
perl -ne ' if(/^2/) { @x=(1,19,6,4,3,8,20,60,40,40,40,40,30); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_"} if(/^1/) { @x=(1,16,5,8); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" } if(/^3/) { @x=(1); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }' filename
INPUT ROWS
1YBL LOYALTY EXT 1000112102018001
2000100101000002631653184911501010111199100919323739251 VIJAYPANDEY1191#GMAIL.COM VIJAY PANDEY PART OF GROUND FLOOR & BASEMENT SHOPPER STOP SV ROAD ANDHERI WEST LANDMARK-ERSTWHILE CRASSWORD BOOK STORE MUMBAI 400058
2000100101000019920453184964321010513199000919878857482 MAKSUDMASTER7775#GMAIL.COM MOHAMAD MAQSHUD MASTER H COLLECTION NEW SHIVPURI GALI NO 1 NEAR MAKHAN SINGH CHOWK LUDHIANA 141008
2000100101000023500853184923441010913197300919375580888 JAYNTITALA#GMAIL.COM JAYANTIBHAI TADA 44 KHODIYAR NAGAR B S ABHISHEK SUDAMA CHOWK KHODIYARNAGAR MOTA VARACHHA SURAT 395006
3000000066
EXPECTED OUTPUT
1|YBL LOYALTY EXT |10001|12102018|001
2|0001001010000026316|531849|1150|101|01111991|00919323739251 |VIJAYPANDEY1191#GMAIL.COM |VIJAY PANDEY |PART OF GROUND FLOOR & BASEMENT |SHOPPER STOP SV ROAD ANDHERI WEST |LANDMARK-ERSTWHILE CRASSWORD BOOK STORE |MUMBAI |400058
2|0001001010000199204|531849|6432|101|05131990|00919878857482 |MAKSUDMASTER7775#GMAIL.COM |MOHAMAD MAQSHUD MASTER |H COLLECTION NEW SHIVPURI |GALI NO 1 |NEAR MAKHAN SINGH CHOWK |LUDHIANA |141008
2|0001001010000235008|531849|2344|101|09131973|00919375580888 |JAYNTITALA#GMAIL.COM |JAYANTIBHAI TADA |44 KHODIYAR NAGAR B S ABHISHEK |SUDAMA CHOWK |KHODIYARNAGAR MOTA VARACHHA |SURAT |395006
3|000000066
GETTING THIS BUT
1|YBL LOYALTY EXT |10001|12102018|001
2|0001001010000026316|531849|1150|101|01111991|00919323739251 |VIJAYPANDEY1191#GMAIL.COM |VIJAY PANDEY |PART OF GROUND FLOOR & BASEMENT |SHOPPER STOP SV ROAD ANDHERI WEST |LANDMARK-ERSTWHILE CRASSWORD BOOK STORE |MUMBAI |400058
2|0001001010000199204|531849|6432|101|05131990|00919878857482 |MAKSUDMASTER7775#GMAIL.COM |MOHAMAD MAQSHUD MASTER |H COLLECTION NEW SHIVPURI |GALI NO 1 |NEAR MAKHAN SINGH CHOWK |LUDHIANA |141008
1|41008|
2|0001001010000235008|531849|2344|101|09131973|00919375580888 |JAYNTITALA#GMAIL.COM |JAYANTIBHAI TADA |44 KHODIYAR NAGAR B S ABHISHEK |SUDAMA CHOWK |KHODIYARNAGAR MOTA VARACHHA |SURAT |395006
3|95006
3|000000066
With GNU awk for FIELDWIDTHS:
$ awk -v FIELDWIDTHS='1 17 4 *' -v OFS='|' '/^2/{$1=$1; gsub(/\s+/,"&"OFS)} 1' file
1YBL LOYALTY EXT 1000101172019001
2|00010010100001151|2753|184907301010614199100919699034659 |VIDYA.SAGAR1#bank.IN |VIDYA |SAGAR |CROSS |BANDRA |WM |DELHI |456471
3000000027
The above use of FIELDWIDTHS says the input should be treated as separated into 4 fields of width 1 char, 17 chars, 4 chars and then the rest.
When you assign a value to a field awk recompiles the record replacing the input field separators with the value of OFS so $1=$1 is causing |s to be inserted between each of the fields described by FIELDWIDTHS.
Once that's done there's still all the remaining space-separated text to get a field separator added so the gsub() appends an OFS after every series of spaces.
Older versions of gawk don't support * as meaning the rest of the line - if you have that situation then just replace * with a large value like 99999.
You can try Perl as well
perl -lpe ' if(/^2/) { @x=(1,17,4);
for $i (@x) { s/(.{$i})//; printf("%s|",$1) } }' input_file
with the given inputs
$ cat rahman.txt
1YBL LOYALTY EXT 1000101172019001
2000100101000011512753184907301010614199100919699034659 VIDYA.SAGAR1#bank.IN VIDYA SAGAR CROSS BANDRA WM DELHI 456471
3000000027
$ perl -lpe ' if(/^2/) { @x=(1,17,4);
for $i (@x) { s/(.{$i})//; printf("%s|",$1) } }' rahman.txt
1YBL LOYALTY EXT 1000101172019001
2|00010010100001151|2753|184907301010614199100919699034659 VIDYA.SAGAR1#bank.IN VIDYA SAGAR CROSS BANDRA WM DELHI 456471
3000000027
$
To split off more fields, just add entries to @x, e.g. change @x=(1,17,4) to @x=(1,17,4,10,20).
EDIT1:
To add delimiters for those fields which can be split by space, use the below
$ perl -lpe ' if(/^2/) { @x=(1,17,4);
for $i (@x) { s/(.{$i})//; printf("%s|",$1) } s/\S+\s+\K/|/g }' rahman.txt
1YBL LOYALTY EXT 1000101172019001
2|00010010100001151|2753|184907301010614199100919699034659 |VIDYA.SAGAR1#bank.IN |VIDYA |SAGAR |CROSS |BANDRA |WM |DELHI |456471
3000000027
$
Explanation of the code:
perl -lpe # use -p for printing by default at the end of perl one-liner
# this makes sure when you dont have a line starting with 2 the line is printed after the if statement.
' if(/^2/) # if - select line that starts with 2. $_ will have the current line
{
@x=(1,17,4); # x is an array to hold the widths of fields. - 1, 17, 4
for $i (@x) # open for loop to loop through the array x
{
s/(.{$i})//; # no variable is specified, so the substitution acts on the $_ i.e current line
# first instance is s/(.{1})// => match one character and store it in $1 capturing variable
# replace the captured part with nothing and update $_
# e.g if the line is "200010010100001151" .. loop one will capture "2" and $_ becomes "00010010100001151"
# loop 2 => s/(.{17})// matches 17 character and $1 stores "00010010100001151"
printf("%s|",$1) # print $1 along with delimiter pipe
} # end of for loop
} # end of if
# here is default print statement in perl that will print the $_ after all modification
' input_file
EDIT2
I get below results based on your inputs. It works correctly.. what issues you see?
$ perl -ne ' if(/^2/) { @x=(1,19,6,4,3,8,20,60,40,40,40,40,30); $i=0;
> while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
> print "$_"} if(/^1/) { @x=(1,16,5,8); $i=0;
> while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
> print "$_" } if(/^3/) { @x=(1); $i=0;
> while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
> print "$_" }' rahman.txt
1|YBL LOYALTY EXT |10001|01172019|001
2|0001001010000115127|531849|0730|101|06141991|00919699034659 |VIDYA.SAGAR1#bank.IN VID|YA SAGAR CRO|SS BAN|DRA WM | DEL|HI 456|471
3|000000027
$
EDIT3:
Found the issue: $_ is modified inside each block, so at the end of the /^2/ block $_ holds the leftover value "141008", which then satisfies the next condition, if(/^1/), and that block executes as well. To avoid this, copy $_ to a $line variable at the beginning and test $line against /^2/, /^1/ and /^3/ in the separate if blocks.
$ perl -lne '$line=$_; if($line=~/^2/) { @x=(1,19,6,4,3,8,20,60,40,40,40,40,30); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }
if($line=~/^1/) { @x=(1,16,5,8); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }
if($line=~/^3/) { @x=(1); $i=0;
while($i<=$#x) { $s=$x[$i]; $_=~s/(.{$s})/printf("%s|",$1);""/e;$i++ }
print "$_" }' rahman2.txt
1|YBL LOYALTY EXT |10001|12102018|001
2|0001001010000026316|531849|1150|101|01111991|00919323739251 |VIJAYPANDEY1191#GMAIL.COM |VIJAY PANDEY |PART OF GROUND FLOOR & BASEMENT |SHOPPER STOP SV ROAD ANDHERI WEST |LANDMARK-ERSTWHILE CRASSWORD BOOK STORE |MUMBAI |400058
2|0001001010000199204|531849|6432|101|05131990|00919878857482 |MAKSUDMASTER7775#GMAIL.COM |MOHAMAD MAQSHUD MASTER |H COLLECTION NEW SHIVPURI |GALI NO 1 |NEAR MAKHAN SINGH CHOWK |LUDHIANA |141008
2|0001001010000235008|531849|2344|101|09131973|00919375580888 |JAYNTITALA#GMAIL.COM |JAYANTIBHAI TADA |44 KHODIYAR NAGAR B S ABHISHEK |SUDAMA CHOWK |KHODIYARNAGAR MOTA VARACHHA |SURAT |395006
3|000000066
$
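The same per-row-type split can also be sketched in portable awk, where the row type is read once from the first character and the record itself is never modified while it is being tested, so one row cannot trigger a second branch. This is only a sketch under assumptions: the function name split_fixed is made up for illustration, and the width lists are copied from the Perl command in the question.

```shell
#!/bin/bash
# split_fixed FILE
# Hypothetical helper: inserts "|" after each fixed-width field, using a
# width list chosen by the first character of the row (the row type).
split_fixed() {
    awk '
    BEGIN {
        # widths per row type, taken from the question
        widths["1"] = "1,16,5,8"
        widths["2"] = "1,19,6,4,3,8,20,60,40,40,40,40,30"
        widths["3"] = "1"
    }
    {
        type = substr($0, 1, 1)
        if (!(type in widths)) { print; next }   # unknown type: print as-is
        n = split(widths[type], w, ",")
        out = ""; pos = 1
        # stop early if the row is shorter than the width list expects
        for (i = 1; i <= n && pos <= length($0); i++) {
            out = out substr($0, pos, w[i]) "|"
            pos += w[i]
        }
        print out substr($0, pos)                # whatever is left over
    }' "$1"
}
```

Called as split_fixed rahman2.txt, it builds each output line from substr() pieces instead of consuming the input with substitutions.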
You do have delimiters in your file, you just don't see them: they are the space/tab characters. So you just need to replace those, using sed's s/xxx/|/g command (where xxx stands for the space or TAB character). If you are unsure whether your characters are spaces or tabs, open the file in a hex editor (space is ASCII code 32 (hex 20) and TAB is 9 (hex 09)).
You can try with gnu sed :
sed -E '/^2/{s//&|/;s/(.{19})(....)(\S+\s+)/\1|\2|\3|/}' infile
If your awk doesn't support FIELDWIDTHS, try the following.
awk -v widths="1,17,4" -v OFS="|" '
BEGIN{
    num=split(widths,w,",")          # widths of the leading fixed-width fields
}
/^2/{                                # other row types need their own width list
    pos=1; out=""
    for(i=1;i<=num;i++){             # cut off each fixed-width field in turn
        out=out substr($0,pos,w[i]) OFS
        pos+=w[i]
    }
    rest=substr($0,pos)              # the remainder is space-separated
    gsub(/[[:space:]]+/,"&"OFS,rest)
    $0=out rest
}
{ print }
' Input_file

Parse columns with awk

I am new at AWK programming and I was wondering how to filter the following text:
Goedel - Declarative language for AI, based on many-sorted logic. Strongly
typed, polymorphic, declarative, with a module system. Supports bignums
and sets. "The Goedel Programming Language", P. M. Hill et al, MIT Press
1994, ISBN 0-262-08229-2. Goedel 1.4 - partial implementation in SICStus
Prolog 2.1.
ftp://ftp.cs.bris.ac.uk/goedel
info: goedel#compsci.bristol.ac.uk
Just to print this:
Goedel
I have used the following sentence but it just does not work as I wished:
awk -F " - " "/ - /{ print $1 }"
It shows the following:
Goedel
1994, ISBN 0-262-08229-2. Goedel 1.4
Could somebody tell me what I have to modify so I can get what I want?
Thanks in advance
awk 'BEGIN { RS = "" } { print $1 }' your_file.txt
This splits the input into paragraphs at empty lines, splits each paragraph into words on the default separator (whitespace), and finally prints the first word ($1) of every paragraph.
this one-liner could work for your requirement:
awk -F ' - ' 'NF>1{print $1;exit}'
awk -F ' - ' ' { if (FNR % 4 != 1) next; print $1; }'
If the format is exactly the same as below, then the code above should work:
1 Author - ...
2 Year ...
3 URL
4 Extra info ...
5 Author - ...
6..N etc.
If there is a blank line between entries, you can set RS to a null string and $1 will be the author as long as the value for -F (the FS variable in an awk script) is the same. This has the advantage that if you don't have "info: ..." or a URL, you can still distinguish between entries, assuming it is not "Author - ...{newline}Year ...{newline}{newline}info: ...{newline}{newline}Author - ..." (you can't have an empty line between parts of an entry if an empty line is what separates entries.) For example:
# A blank line is what separates each entry.
BEGIN { RS = ""; }
{ print $1; }
If you have an awk that supports it, you can make RS a multiple character string if necessary (e.g. RS = "\n--\n" for entries separated by "--" on a line by itself). If you need a regex or simply don't have an awk that supports multiple character record separators, you're forced to use something like the following:
BEGIN { found_sep = 1; }
{ if (found_sep) { print $1; found_sep = 0; } }
# Entry separator is "--\n"
/^--$/ { found_sep = 1; }
More sample input will be required for something more complicated.
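The paragraph-mode approach described above can be combined with the " - " field separator so that names containing spaces survive intact. A minimal sketch, assuming entries are separated by blank lines and every entry begins with "Name - "; the function name entry_names is hypothetical.

```shell
#!/bin/bash
# entry_names FILE
# Hypothetical helper: prints the text before the first " - " of each
# blank-line-separated entry. Assumes every entry starts with "Name - ".
entry_names() {
    awk 'BEGIN { RS = ""; FS = " - " } { print $1 }' "$1"
}
```

Unlike the default-separator version, this keeps a multi-word name such as "Common Lisp" in one piece.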

How to divide a data file's column A by column B using Perl

I was given a text file with a whole bunch of data sorted in columns. Each of the columns is separated by commas.
How could I divide one column by another and print the result? I am using Perl right now, so it has to be done in Perl. How could I do this?
This is what I have so far:
#!/usr/bin/perl
open (FILE, 'census2008.txt');
while (<FILE>) {
chomp;
($sumlev, $stname,$ctyname,$popestimate2008,$births2008,$deaths2008) = split(",");
}
close (FILE);
exit;
There are several options:
Read the file in line by line, split the columns on ',' and divide the relevant columns (don't forget to handle the divide-by-zero error)
Do the same thing as a one-liner:
$ perl -F/,/ -lane 'print( $F[1] == 0 ? "" : $F[3]/$F[1] )' file.txt
Utilize a ready-to-use CPAN module like Text::CSV
Of course, there are more unorthodox/crazy/unspeakable alternatives à la TMTOWTDI ™, so one could:
Parse out the relevant columns with a regex and divide the matches:
if (/^\d*,(\d+),\d*,(\d+)/) { say $2/$1 if $1 != 0; }
Do it with s///e:
$ perl -ple 's!^\d*,(\d+),\d*,(\d+).*$! $1 == 0 ? "" : $2/$1 !e' file.txt
Get the shell to do the dirty work via backticks:
sub print_divide { say `cat file.txt | some_command_line_command` }
#!/usr/bin/env perl
# divides column 1 by column 2 of some ','-delimited file,
# read from standard input.
# usage:
# $ cat data.txt | 8458760.pl
while (<STDIN>) {
    chomp;
    my @values = split(/,/, $_);
    next unless $values[1];   # skip rows whose divisor is missing or zero
    print $values[0] / $values[1] . "\n";
}
If you have fixed width columns of data you could use 'unpack' along the lines of:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ($sumlev,$stname,$ctyname,$popest,$births,$deaths)
= unpack("A2xA10xA15xA7xA5xA5");
printf "%-15s %4.2f\n", $ctyname, $births/$deaths;
}
__DATA__
10,Main ,My City , 10000, 200, 150
12,Poplar ,Somewhere , 3000, 90, 100
13,Maple ,Your Place , 9123, 100, 90

Howto insert awk command in perl script?

I want to add this awk command on my script but keep getting error. I have put inside " " but still getting error.
system("awk -F"\t" '{ for ( i=1; i<=2; i++ ) { printf "%s\t", $i } printf "\n"; }' myfile file2"};
the errors are
String found where operator expected
at host_parse line 21, near "t" '{ for
( i=1; i<=2; i++ ) { printf ""
Unquoted string "a" may clash with
future reserved word at myfile line
58.
Unquoted string "a" may clash with
future reserved word at myfile line
58.
syntax error at myfile line 21, near
"" awk -F"\"
Thanks.
One of the trickiest parts about using the system command is using quotes in a way that will get the correct command passed to the operating system. Perl's q// construction can be very helpful for this:
# treat everything between the #...# as uninterpolated string
system( q#awk -F"\t" '{ for ( i=1; i<=2; i++ ) { printf "%s\t", $i }
printf "\n"; }' myfile file2# );
To answer your immediate question, you're tripping over the default behavior of Perl's system operator. Usually, it's a great convenience for the shell to parse the command, but sometimes, as you've seen, having multiple levels of quoting is a pain, or even a security vulnerability.
You can bypass the shell's quoting entirely with the system LIST and exec LIST forms. In your case, change your code to
#! /usr/bin/env perl
use strict;
use warnings;
my @cmd = (
"awk",
"-F", "\t",
'{ for ( i=1; i<=2; i++ ) {
printf "%s\t", $i
}
printf "\n";
}',
"myfile", "file2",
);
system(@cmd) == 0 or warn "$0: awk exited " . ($? >> 8);
You don't have to use the temporary array, but I don't like the resulting code with a multi-line command and a check for success.
Given myfile containing
1 2 3 4
foo bar baz
oui oui monsieur
and file2 with
a b c
d e f g
(where the separators in both cases are TAB characters), then the output is
1 2
foo bar
oui oui
a b
d e
They're invisible, but each line of output above has a trailing TAB.
Doing the same in Perl is straightforward. For example,
sub print_first_two_columns {
foreach my $path (@_) {
open my $fh, "<", $path or die "$0: open $path: $!";
while (<$fh>) {
chomp;
my(@cols) = (split /\t/)[0 .. 1];
print join("\t", @cols), "\n";
}
close $fh;
}
}
The part that may not be obvious is taking a slice of the values returned from split, but what's happening is simple in concept. A slice allows you to grab data at multiple indices (0 and 1 in this case, i.e., the first and second columns). The range-operator expression 0 .. 1 evaluates to the list 0 and 1. If you decide later you want the first four columns, you'd change it to 0 .. 3.
Call the sub above as in
print_first_two_columns "myfile", "file2";
Note that the code isn't exactly equivalent: it doesn't preserve the trailing TAB characters.
From the command line, it's even simpler:
$ perl -lane '$,="\t"; print @F[0,1]' myfile file2
1 2
foo bar
oui oui
a b
d e
You don't need the shell to interpret any redirection (or other shell facilities), so it would be better to pass a list of arguments to system()
system 'awk', '-F', "\t",
'{for (i=1; i<=2; i++) {printf "%s\t", $i}; print ""}',
'myfile', 'file2';

Format a file in Unix/Linux ?

I have a file containing country, catalog number, year, description and price
Kenya 563-45 1995 Heron Plover Thrush Gonolek Apalis $6.60
Surinam 632-96 1982 Butterfliers $7.50
Seychelles 831-34 2002 WWF Frogs set of 4 $1.40
Togo 1722-25 2010 Cheetah, Zebra, Antelope $5.70
The file isn't delimited by tabs, ":" or anything else; there are only spaces between the fields. Can you please tell me how I can format this file (using awk?) and how I can find the total price from it?
With command line perl:
$ cat /your/file | perl -e '$sum=0; for(<STDIN>) { $sum += $1 if(/\$([\d\.]+)/); }; print "$sum\n"'
21.2
and awk (assumes you have dollars at the end of each line):
$ cat /your/file | awk '{s+=substr($NF,2)} END{ print s}'
21.2
Also, in response to the comment. If you want to reformat on the command line:
$ cat /your/file | perl -e 'for(<STDIN>){@a=split /\s+/; $p=pop @a;
$line=join "|", ($a[0],$a[1],$a[2], (join" ",@a[3..$#a]) ,$p); print "$line\n"}'
Kenya|563-45|1995|Heron Plover Thrush Gonolek Apalis|$6.60
Surinam|632-96|1982|Butterfliers|$7.50
Seychelles|831-34|2002|WWF Frogs set of 4|$1.40
Togo|1722-25|2010|Cheetah, Zebra, Antelope|$5.70
If you want to do this properly, I'd do it not on the cmd line, but write a proper program to parse it.
I assumed the first three columns and the last column have a fixed meaning, while the number of middle columns varies. So the middle columns are moved to the end, separated by spaces, and the fixed columns are separated by tabs so that you can edit the result in a spreadsheet program:
awk '{ printf("%s\t%s\t%s\t%s\t", $1, $2, $3, $NF);
for(i=4; i<NF; i++){ printf("%s ", $i); }
printf("\n")
}' < yourlist.txt
For conformity, a regexp-fu solution:
$ perl -lne '/^ (.+?) \s+ (\d+-\d+) \s+ (\d{4}) \s+ (.+?) \s+ ( \$ ( \d+ (?:\.\d+)? ) ) \s* $/x and $t+=$6, print join "•",$1,$2,$3,$4,$5 }{ print $t' input_file
Kenya•563-45•1995•Heron Plover Thrush Gonolek Apalis•$6.60
Surinam•632-96•1982•Butterfliers•$7.50
Seychelles•831-34•2002•WWF Frogs set of 4•$1.40
Togo•1722-25•2010•Cheetah, Zebra, Antelope•$5.70
21.2
Expanding upon udslk's answer, awk is certainly your friend here:
#!/usr/bin/awk -f
BEGIN {
print "country, \"catalog number\", year, description, \"price ($)\""
}
{
description = $4
for (f = 5; f < NF; ++f) {
description = description " " $f
}
price = substr($NF, 2)
total += price
printf "\"%s\", \"%s\", \"%s\", \"%s\", %0.2f\n", $1, $2, $3, description, price
}
END {
printf "Total, , , , %0.2f\n", total
}
This spits out a CSV file with headers, which you can import into your favourite spreadsheet. It also adds the total. Switch commas with tabs according to taste.

Resources