Using Spreadsheet::ParseExcel in Perl, but need help - string

I have a Perl program using Spreadsheet::ParseExcel. However, there are two difficulties that have arisen that I have been unable to figure out how to solve. The script for the program is as follows:
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use WordNet::Similarity::lesk;
use WordNet::QueryData;
my $wn = WordNet::QueryData->new();
my $lesk = WordNet::Similarity::lesk->new($wn);
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse ( 'input.xls' );
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
WORKSHEET:
for my $worksheet ( $workbook->worksheets() ) {
my $sheetname = $worksheet->get_name();
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
my $target_col;
my $response_col;
# Skip worksheet if it doesn't contain data
if ( $row_min > $row_max ) {
warn "\tWorksheet $sheetname doesn't contain data. \n";
next WORKSHEET;
}
# Check for column headers
COLUMN:
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row_min, $col );
next COLUMN unless $cell;
$target_col = $col if $cell->value() eq 'Target';
$response_col = $col if $cell->value() eq 'Response';
}
if ( defined $target_col && defined $response_col ) {
ROW:
for my $row ( $row_min + 1 .. $row_max ) {
my $target_cell = $worksheet->get_cell( $row, $target_col);
my $response_cell = $worksheet->get_cell( $row, $response_col);
if ( defined $target_cell && defined $response_cell ) {
my $target = $target_cell->value();
my $response = $response_cell->value();
my $value = $lesk->getRelatedness( $target, $response );
print "Worksheet = $sheetname\n";
print "Row = $row\n";
print "Target = $target\n";
print "Response = $response\n";
print "Relatedness = $value\n";
}
else {
warn "\tWorksheet $sheetname, Row = $row doesn't contain target and response data.\n";
next ROW;
}
}
}
else {
warn "\tWorksheet $sheetname: Didn't find Target and Response headings.\n";
next WORKSHEET;
}
}
So, my two problems:
First of all, sometimes the program returns the error "No Excel data found in file," even though the data is there. Each Excel file is formatted the same way. There is only one sheet, with the A and B columns labelled 'Target' and 'Response,' respectively, with a list of words beneath them. However, it does not ALWAYS return this error. It works for one Excel file, but it does not work for a different one, even though both are formatted the exact same way (and yes, they are both the same file type, as well). I cannot find any reason for it to not read the second file, because it is identical to the first. The only difference is that the second file was created using an Excel macro; however, why would that matter? The file types and format are exactly the same.
Second, the variables '$target' and '$response' need to be formatted as strings in order for the 'my $value' expression to work. How do I convert them into string format? The value assigned to each variable is a word from the appropriate cell of the Excel spreadsheet. I don't know what format that is (and there is no apparent way in Perl for me to check).
Any suggestions?

In relation to your first question, the "no data found" error indicates some problem with the file format. I've seen this error with pseudo-Excel files such as HTML or CSV files that have an xls extension. I've also seen this error with malformed files generated by third-party apps.
You could do an initial verification of the files by doing a hexdump/xxd dump of a working and a non-working file and seeing if the overall structure is approximately the same (for example, whether it has similar magic numbers at the start and isn't HTML).
It could also be an issue with Spreadsheet::ParseExcel. I am the maintainer of that module. If you like you could send me on a "good" and "bad" file, at the email address in the docs, and I will have a look at them.

First of all, if you are getting "no data found" you can thank proprietary Excel data file formats and the inability of even a good Perl library to extract information from them.
I strongly suggest that you export the Excel data in something easily parsed like CSV especially given the simple nature of the data layout you described. There may be a way to get Excel to process a batch but I have no idea. A quick search yielded a tool to use OpenOffice to do batch conversion.
The rest of your question is rather moot once you accept that Excel data files will not play nicely.

I wrote this code after a client couldn't decide whether the XLS he was sending every week was really in XLS format or just CSV.... HTH!
sub testForXLS
{
my ( $FileName ) = @_;
my $XLSsignature = 'D0CF11E0A1B11AE10000';
my $buffer = '';
# Read the first 10 bytes of the file in binary mode
open( my $fh, '<', $FileName ) or die "Cannot open $FileName: $!";
binmode $fh;
read( $fh, $buffer, 10 );
close $fh;
# Convert the bytes to an uppercase hex string and compare
my $signature = uc unpack( 'H*', $buffer );
return $signature eq $XLSsignature ? 1 : 0;
}
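The same idea can be extended to tell the common impostors apart. The following is only a sketch: the function name guess_format and the category names it returns are made up for illustration, and the HTML check is a rough heuristic rather than a real parser. It assumes a binary .xls file starts with the OLE2 signature and that .xlsx/.xlsm files are ZIP containers starting with "PK".

```perl
use strict;
use warnings;

# Guess a spreadsheet file's real format from its first bytes.
# guess_format and its return values are hypothetical names for this sketch.
sub guess_format {
    my ($file_name) = @_;
    open( my $fh, '<:raw', $file_name ) or die "Cannot open $file_name: $!";
    my $head = '';
    read( $fh, $head, 512 );
    close $fh;

    # OLE2 compound document: the container used by binary .xls files
    return 'ole2' if $head =~ /\A\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1/;
    # ZIP local file header: the container used by .xlsx and .xlsm files
    return 'zip' if $head =~ /\APK\x03\x04/;
    # Rough heuristic for an HTML table saved with an .xls extension
    return 'html' if $head =~ /\A\s*<(?:!DOCTYPE|html|table)/i;
    # Anything else: CSV, tab-separated text, or an unknown format
    return 'text';
}

# Report the guessed format of every file named on the command line
print "$_: ", guess_format($_), "\n" for grep { -f } @ARGV;
```

Running it against a "good" and a "bad" file gives a quick first answer before sending anything to the module maintainer.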

Related

How to push SQL Query from Console Output into an Excel File?

I'm new to coding and to this site.
Right now I'm creating a Perl script which automatically creates an Excel file with the output of an SQL query.
SQL Query:
init_db_connections();
my @row;
my $curHnd = INV::DBI::execute('----'.':------') or die $INV::DBI::errstr;
while ($row[0] = $curHnd->fetchrow_hashref()) {
printf("Row1: >%s<\n", $row[0]{Row1}),
printf("Row2: >%s<\n", $row[0]{Row2}),
printf("Row3: >%s<\n", $row[0]{Row3})
}
exit 0;
sub init_db_connections {
INV::DBI::init({
------ => '--------',
------- => q{select Row1, Row2, Row3
from table1
}
Create the Excel:
my $workbook = Excel::Writer::XLSX->new( 'perl.xlsx' );
my $worksheet = $workbook->add_worksheet();
my $format = $workbook->add_format();
$format->set_bold();
$format->set_color( 'black' );
$format->set_underline;
my $col = my $row = 0;
$worksheet->write( $row, $col, 'SQL Report', $format );
$workbook->close();
My problem now is that I don't know how to combine these two, so that the query output gets pushed into the Excel file automatically.
Any Ideas would be great.
I think your problem is with the hashref and dereferencing it properly. It's a subtle mistake easily made when just starting out.
my $hashref;
while ( $hashref = $curHnd->fetchrow_hashref() ) {
printf("Row1: >%s<\n", $hashref->{Row1});
printf("Row2: >%s<\n", $hashref->{Row2});
printf("Row3: >%s<\n", $hashref->{Row3});
}
A hashref is a reference to a hash and they are sweet, especially when you dereference them with the arrow operator like I've done in the example.
The $hash{row1} form is for accessing the value for row1 in %hash. (And for completeness, the old way of dereferencing a hashref would be ${$hashref}{row1}.)
You don't really need the @row array there. You were only ever assigning to the first element, $row[0], so why not just use a scalar?
As for writing out to Excel, I think you'll be using the write method inside the while loop and incrementing the row counter with $row++ .
If you're going to be doing a lot of DBI coding, pick up a copy of Programming the Perl DBI by Descartes and Bunce for chapters 4 and 5. Old but still incredibly useful. (still got mine)
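The suggestion to call write inside the loop and increment a row counter could look roughly like the sketch below. Because the real database handle is not shown in the question, the array @fake_rows is a made-up stand-in for what $curHnd->fetchrow_hashref() would return, and the column names are taken from the query in the question. The Excel step is only attempted if Excel::Writer::XLSX is installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows returned by fetchrow_hashref()
my @fake_rows = (
    { Row1 => 'a', Row2 => 'b', Row3 => 'c' },
    { Row1 => 'd', Row2 => 'e', Row3 => 'f' },
);
my @columns = qw(Row1 Row2 Row3);

# Collect [row, col, value] triples; worksheet row 0 holds the headings
my @cells;
my $row = 0;
push @cells, [ $row, $_, $columns[$_] ] for 0 .. $#columns;
$row++;

# One worksheet row per fetched hashref, advancing the row counter each time
for my $hashref (@fake_rows) {
    push @cells, [ $row, $_, $hashref->{ $columns[$_] } ] for 0 .. $#columns;
    $row++;
}

# Write the collected cells, but only if the module is available
if ( eval { require Excel::Writer::XLSX; 1 } ) {
    my $workbook = Excel::Writer::XLSX->new('perl.xlsx')
        or die "Cannot create perl.xlsx: $!";
    my $worksheet = $workbook->add_worksheet();
    $worksheet->write( @$_ ) for @cells;
    $workbook->close();
}
```

In the real script the inner loop body would sit directly inside the while loop over fetchrow_hashref(), and the intermediate @cells array could be dropped in favour of calling write straight away.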

Converting XLSM file to CSV format in Perl using "$csv->print" function?

Does the $csv->print function only accept an array reference? I am pushing $cell2_value into an array and then printing the array (i.e. a row); it would be nice if I could write $cell2_value directly into an opened CSV file.
Things to take care of:
Excel cells with commas in their value should be printed in double quotes.
Excel cells with "keyword" in their value should have the whole cell value printed in double quotes, with the inner quotes changed to ""keyword"".

I can write a CSV file, but some Excel cells produce undesired output: it inserts double quotes whenever it sees special characters like / or *.
CSV file produced by the code below:
"CLASS_A,,x,Singapore,,0xABCF00C4,"/* x2-4Rw */",-,,,,,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CHECK- ""CLASS_B""- WORKED",1,2,3"
"CLASS_A,,,malyaisa,," 3:0","/* ABCVF */",E,,,,,,,Yes,,,,,,,,,,,,,Yes,,,,,,Yes,Yes,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
Desired output
CLASS_A,,x,malyaisa,,0xABCF00C4,/* x2-4Rw */,-,,,,,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CHECK- ""CLASS_B""- WORKED",1,2,3
CLASS_A,,,malyaisa,, 3:0,/* ABCVF */,E,,,,,,,Yes,,,,,,,,,,,,,Yes,,,,,,Yes,Yes,Yes,,,,,,,,,,,,,,,,,,,
Why is it inserting quotes around each line? Is there any way to remove them?
sub Excel_to_CSV
{
($student_excel_file) = @_;
if($student_excel_file ne "")
{
$student_excel_out_csv_file = $student_excel_file;
$student_excel_out_csv_file =~ s/\.xlsm$/_new.csv/;
my $parser_1 = Spreadsheet::ParseXLSX->new();
my $workbook = $parser_1->parse($student_excel_file);
my $csv_1 = Text::CSV->new ({ binary => 1, auto_diag => 1, sep_char => ',' });
open my $fh, ">:encoding(utf-8)", $student_excel_out_csv_file or die "failed to create $student_excel_out_csv_file: $!";
if ( !defined $workbook )
{
die $parser_1->error(), ".\n";
}
my $worksheet=$workbook->worksheet(0);
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
printf("Copying Sheet: %s from the provided student \n", $worksheet->{Name});
my $concurentEmptyLineCount = 0;
for my $row_1 ( $row_min .. $row_max )
{
my @row_elements_array;
for my $col_1 ( $col_min .. $col_max )
{
my $cell_1 = $worksheet->get_cell( $row_1, 0 );
next unless $cell_1;
$concurentEmptyLineCount=0;
my $cell_2 = $worksheet->get_cell( $row_1, $col_1);
my $cell2_value =$cell_2 -> {Val};
if(defined $cell2_value)
{
push(@row_elements_array, $cell2_value);
}
else
{
my $blank="";
push(@row_elements_array, $blank);
}
}
my $next_line="\n";
push(@row_elements_array, $next_line);
my @temp_row_elements_array= @row_elements_array;
$csv_1->print($fh, \@temp_row_elements_array);
}
close $fh;
}
return $student_excel_out_csv_file;
}
Based on my understanding of your requirements, one solution is to remove the extraneous double-quotes from $cell2_value before pushing it into @row_elements_array. For example:
$cell2_value =~ s/"(")*/$1/g;
push(#row_elements_array, $cell2_value);
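For comparison, the quoting rules listed in the question (quote only when a field contains a comma or a double quote, and double any inner quotes) can be sketched in plain Perl. This is not how Text::CSV is implemented, and csv_field and csv_line are hypothetical helper names. It may also be worth checking the Text::CSV quote_space option, which by default quotes fields that contain a space and would account for values like " 3:0" and "/* ABCVF */" being quoted, and the eol option, which adds the line ending so that "\n" does not have to be pushed into the row as an extra field.

```perl
use strict;
use warnings;

# Quote a field only when it contains a comma, a double quote, or a line
# break; embedded double quotes are doubled. (csv_field is a made-up helper.)
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ( $value =~ /[",\r\n]/ ) {
        ( my $escaped = $value ) =~ s/"/""/g;
        return qq{"$escaped"};
    }
    return $value;
}

# Join already-escaped fields into one CSV line (without the line ending)
sub csv_line {
    my @fields = @_;
    return join( ',', map { csv_field($_) } @fields );
}

print csv_line( 'CLASS_A', '', ' 3:0', '/* ABCVF */',
    'CHECK- "CLASS_B"- WORKED', 'a,b' ), "\n";
# prints: CLASS_A,, 3:0,/* ABCVF */,"CHECK- ""CLASS_B""- WORKED","a,b"
```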

add chart to an existing excel using perl

I am new to Perl.
I have an Excel sheet with a lot of data. I need to update it and create a graph based on the data, using Perl.
I have succeeded in updating the existing Excel file,
but adding a chart to it is not working.
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
use Spreadsheet::WriteExcel;
# Open an existing file with SaveParser
my $parser = Spreadsheet::ParseExcel::SaveParser->new();
my $template = $parser->Parse('MyExcel.xls');
my $worksheet = $template->worksheet('Firstsheet');
my $chart = $template->add_chart( type => 'line' );
$chart->add_series(
categories => '=URV!$A$17:$A$442',
values => '=URV!$D$17:$D$442',
name => 'pended graph',
);
This is not working.
Can't call method "add_chart" on an undefined value at charts4.ps line 20
Please help me with some sample working code.
I want to know what the problem is here.
add_chart() is one of the WORKBOOK METHODS. Try code like this:
use Spreadsheet::WriteExcel;
my $workbook = Spreadsheet::WriteExcel->new('perl.xls');
my $worksheet = $workbook->add_worksheet();
$worksheet->write('A1', 'Hi Chart!');
my $chart = $workbook->add_chart( type => 'line', embedded => 1, name => 'pended graph' );
# Insert the chart into a worksheet.
$worksheet->insert_chart( 'E2', $chart );
Update
The problem is that Excel files are very hard to update with Perl.
An Excel file is a binary file within a binary file. It contains
several interlinked checksums and changing even one byte can cause it
to become corrupted.
As such you cannot simply append or update an Excel file. The only way
to achieve this is to read the entire file into memory, make the
required changes or additions and then write the file out again.
Spreadsheet::ParseExcel will read in existing excel files:
my $parser = Spreadsheet::ParseExcel->new();
# $workbook is a Spreadsheet::ParseExcel::Workbook object
my $workbook = $parser->Parse('blablabla.xls');
What you really want is Spreadsheet::ParseExcel::SaveParser, which is a combination of Spreadsheet::ParseExcel and Spreadsheet::WriteExcel.
Here is an example.
Summing up, I would suggest that you read the Excel data in and then try one of the following:
Create another xls file and use the Spreadsheet::WriteExcel::Chart
library.
Create a xlsx file and use the Excel::Writer::XLSX::Chart library.
Another close option would be to read the excel in with
Spreadsheet::ParseExcel::SaveParser and then add the chart and save
it, but with this module all original charts are lost.
If you are on a Windows machine you may try to use Win32::OLE.
Here is the example from Win32::OLE's own documentation:
use Win32::OLE;
# use existing instance if Excel is already running
eval {$ex = Win32::OLE->GetActiveObject('Excel.Application')};
die "Excel not installed" if $@;
unless (defined $ex) {
$ex = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;})
or die "Oops, cannot start Excel";
}
# get a new workbook
$book = $ex->Workbooks->Add;
# write to a particular cell
$sheet = $book->Worksheets(1);
$sheet->Cells(1,1)->{Value} = "foo";
# write a 2 rows by 3 columns range
$sheet->Range("A8:C9")->{Value} = [[ undef, 'Xyzzy', 'Plugh' ],
[ 42, 'Perl', 3.1415 ]];
# print "XyzzyPerl"
$array = $sheet->Range("A8:C9")->{Value};
for (@$array) {
for (@$_) {
print defined($_) ? "$_|" : "<undef>|";
}
print "\n";
}
# save and exit
$book->SaveAs( 'test.xls' );
undef $book;
undef $ex;
UPDATE#2
Here is an example code:
use strict;
use Spreadsheet::WriteExcel;
my $workbook = Spreadsheet::WriteExcel->new( 'chart_column.xls' );
my $worksheet = $workbook->add_worksheet();
my $bold = $workbook->add_format( bold => 1 );
# Add the worksheet data that the charts will refer to.
my $headings = [ 'Category', 'Values 1', 'Values 2' ];
my $data = [
[ 2, 3, 4, 5, 6, 7 ],
[ 1, 4, 5, 2, 1, 5 ],
[ 3, 6, 7, 5, 4, 3 ],
];
$worksheet->write( 'A1', $headings, $bold );
$worksheet->write( 'A2', $data );
###############################################################################
#
# Example 1. A minimal chart.
#
my $chart1 = $workbook->add_chart( type => 'column', embedded => 1 );
# Add values only. Use the default categories.
$chart1->add_series( values => '=Sheet1!$B$2:$B$7' );
# Insert the chart into the main worksheet.
$worksheet->insert_chart( 'E2', $chart1 );
###############################################################################
#
# Example 2. One more chart
#
my $chart2 = $workbook->add_chart( type => 'column', embedded => 1 );
# Configure the chart. # change the categories if required change the values as required
$chart2->add_series(
categories => '=Sheet1!$A$4:$A$7',
values => '=Sheet1!$B$4:$B$7',
);
$worksheet->insert_chart( 'N1', $chart2, 3, 3 );
Also,
If you don't mind xlsx over xls, you may use Excel::Writer::XLSX. It is more actively maintained.
The trick to parsing a file and using the functions of the WriteExcel module at the same time is to use the Spreadsheet::ParseExcel::SaveParser module.
Below I have an example. It does not use the chart functions, but your problem is not how to use the chart functions of the WriteExcel module; it is how to parse an existing Excel file and then use that parsed information with the WriteExcel module (which was originally intended only for NEW Excel files).
if ( ( -f $excel_file_name ) && ( ( stat $excel_file_name )[7] > 0 ) ) {
#PARSE EXCEL
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
# Open the template with SaveParser
my $parser = new Spreadsheet::ParseExcel::SaveParser;
my $template = $parser->Parse("$excel_file_name");
my $sheet = 0;
my $row = 0;
my $col = 0;
if ( !defined $template ) {
die $parser->error(), " Perlline:", __LINE__, " \n "; #probably the file is already open by your GUI
}
# Get the format from specific cell
my $format = $template->{Worksheet}[$sheet]->{Cells}[$row][$col]->{FormatNo};
# Add a new worksheet
#for my $worksheet ( $template->worksheets() ) {
my $worksheet_parser = $template->worksheet("$metrict_data_worksheet_name");
my ( $row_min, $row_max ) = $worksheet_parser->row_range();
my ( $col_min, $col_max ) = $worksheet_parser->col_range();
my @row_array_value;
for my $row ( 1 .. $row_max ) { #avoid header start from 1
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet_parser->get_cell( $row, $col );
next unless $cell;
#print "Row, Col = ($row, $col)\n";
#print "Value = ", $cell->value(), "\n";
#print "Unformatted = ", $cell->unformatted(), "\n";
#print "\n";
push( @row_array_value, $cell->value() );
} #end header column loops for one regression
} #end row loop all lines
#}
# The SaveParser SaveAs() method returns a reference to a
# Spreadsheet::WriteExcel object. If you wish you can then
# use this to access any of the methods that aren't
# available from the SaveParser object. If you don't need
# to do this just use SaveAs().
#
my $workbook;
{
# SaveAs generates a lot of harmless warnings about unset
# Worksheet properties. You can ignore them if you wish.
local $^W = 0;
# Rewrite the file or save as a new file
my $check_if_possible2write = Spreadsheet::WriteExcel->new($excel_file_name);
if ( defined $check_if_possible2write ) { #if not possible it will be undef
$workbook = $template->SaveAs("$excel_file_name");#IMPORTANT this is of type WriteExcel and not ParseExcel
}
else {
print "Not possible to write the Excel file :$excel_file_name, another user may have the file open. Aborting... ", __LINE__, " \n ";
exit;
}
}
#####################FROM HERE YOU CAN USE AGAIN use Spreadsheet::WriteExcel; ####################
use Spreadsheet::WriteExcel;
my $worksheet = $workbook->sheets("$metrict_data_worksheet_name");
my $column_header_count = 0;
foreach my $name ( sort { lc $a cmp lc $b } keys %merged_all_metrics ) {
$worksheet->write( $row_max + 1, $column_header_count, "$merged_all_metrics{$name}" ); #row,col start
$column_header_count++;
}
$worksheet->set_column( 'A:L', 50, undef, 0, 1, 0 ); #grouping #comp_src group
$worksheet->set_column( 'N:R', 50, undef, 0, 1, 0 ); #grouping
$workbook->close() or die "Error closing file: $!"; #CLOSE
}
The important part of the code is what happens after the comment line:
#####################FROM HERE YOU CAN USE AGAIN use Spreadsheet::WriteExcel; ####################
After that point you have a $workbook handle. This variable holds all the parsed information and, more importantly, it is a WriteExcel object, so all the methods of that module are available.
Important notice: the parser is not able to parse charts and formulas (only values), so you will have to write them again on each parse->write cycle.

Delete entire row using Spreadsheet::ParseXLSX

I'm new to spreadsheet parsers in general, and can't find much info on CPAN other than a basic introduction of the main features.
I'm trying to read in a .xlsx file and delete an entire row if column 2 exists in a hash that I'm filtering against.
Then I want to print out the edited file, also in .xlsx format.
This is what I can find from CPAN for Spreadsheet::ParseExcel
use strict;
use warnings;
use Spreadsheet::ParseXLSX;
my $parser = Spreadsheet::ParseXLSX->new;
my $workbook = $parser->parse("file.xlsx");
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
for my $row ( $row_min .. $row_max ) {
# Here I want to delete an entire row if a column 2 of that row matches a value
# pseudocode:
# delete 'row' if 'row column 2' exists $hash{$key}
# And then print out the edited .xlsx file
}
}
Can anyone give me some pointers?
Is Spreadsheet::ParseExcel the right module to use for this?
Spreadsheet::ParseXLSX is just for reading spreadsheets. It doesn't have facilities for updating and saving data from Perl to an Excel spreadsheet.
Then there are modules like Spreadsheet::WriteExcel and Excel::Writer::XLSX that can write spreadsheets but can't read them.
But put them together in the same script? Stand back and watch the magic happen.
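The "read with one module, write with another" approach boils down to copying every row except the unwanted ones and renumbering the rows on the way out. Here is a sketch of that core step using made-up in-memory data, so it runs without any spreadsheet module. It assumes "column 2" means the second column (index 1); the row contents and the %filter hash are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory rows; in a real script each row would be built from
# $worksheet->get_cell( $row, $col ) values read by Spreadsheet::ParseXLSX.
my @rows = (
    [ 'id', 'name',  'score' ],
    [ 1,    'alice', 90 ],
    [ 2,    'bob',   75 ],
    [ 3,    'carol', 82 ],
);

# Keys of this hash mark the rows to drop (made-up filter data)
my %filter = ( bob => 1 );

# Keep only rows whose second column (index 1) is not a key in %filter
my @kept = grep { !( defined $_->[1] && exists $filter{ $_->[1] } ) } @rows;

# Output rows are renumbered from 0, so the deleted row leaves no gap
for my $out_row ( 0 .. $#kept ) {
    print join( "\t", $out_row, @{ $kept[$out_row] } ), "\n";
}
```

In a full script the print statement would be replaced by a call such as $out_sheet->write( $out_row, $col, $value ) on an Excel::Writer::XLSX worksheet, one call per cell. Note that formatting, formulas and charts are not carried over by this kind of copy.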

search for a cell from one excel and search in another excel and print if its not there using perl

I'm new to Perl. I have two Excel files, each containing a huge number of rows and just two columns. I want to take each cell from one of the Excel files and check whether it is present in the other Excel file. If it is not, print that cell.
I believe that if I take each cell from the first file, search for it in the second, and loop over all the rows, it will be done.
I have got as far as reading the cell from the first file, but how to search for it in the other file and print it is the issue.
Can anybody help?
I'm not entirely sure what you want, but this might give you some ideas. It's completely untested, though.
use strict;
use Spreadsheet::ParseExcel;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook1 = $parser->parse('Book1.xls');
if (!defined $workbook1) { die $parser->error(), ".\n"; }
my $workbook2 = $parser->parse('Book2.xls');
if (!defined $workbook2) { die $parser->error(), ".\n"; }
my $worksheet1 = $workbook1->worksheet('Sheet1');
my $worksheet2 = $workbook2->worksheet('Sheet1');
my ($row_min1, $row_max1) = $worksheet1->row_range();
my ($col_min1, $col_max1) = $worksheet1->col_range();
for my $row1 ($row_min1 .. $row_max1) {
for my $col1 ($col_min1 .. $col_max1) {
my $cell1 = $worksheet1->get_cell($row1, $col1);
next unless $cell1; # skip empty cells
my ($row_min2, $row_max2) = $worksheet2->row_range();
my ($col_min2, $col_max2) = $worksheet2->col_range();
my $found_match = 0;
for my $row2 ($row_min2 .. $row_max2) {
for my $col2 ($col_min2 .. $col_max2) {
my $cell2 = $worksheet2->get_cell($row2, $col2);
next unless $cell2;
if ($cell1->value() eq $cell2->value()) { # string comparison
$found_match = 1;
last; # Perl uses "last", not "break"
}
}
last if $found_match;
}
}
if (!$found_match) {
print $cell1->value, "\n";
}
}
}
This is mostly from here: http://search.cpan.org/dist/Spreadsheet-ParseExcel/lib/Spreadsheet/ParseExcel.pm
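With a huge number of rows, rescanning the second sheet for every cell of the first gets slow. A common alternative is to load the second sheet's values into a hash once and then do one lookup per cell. The sketch below shows only that lookup step on plain arrays; missing_values is a hypothetical helper name, and the arrays stand in for the cell values that would be collected with get_cell and value.

```perl
use strict;
use warnings;

# Return the values from @$first that never occur in @$second.
# (missing_values is a made-up helper name for this sketch.)
sub missing_values {
    my ( $first, $second ) = @_;
    # Build the lookup table once from the second list
    my %seen = map { $_ => 1 } grep { defined $_ } @$second;
    # One hash lookup per value of the first list
    return grep { defined($_) && !$seen{$_} } @$first;
}

# Stand-ins for the cell values read from Book1.xls and Book2.xls
my @book1 = ( 'apple', 'pear', 'plum', 'fig' );
my @book2 = ( 'pear', 'fig', 'kiwi' );

print "$_\n" for missing_values( \@book1, \@book2 );
# prints "apple" and "plum", one per line
```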

Resources