Convert CSV to Excel gives file corrupted error - excel

I converted multiple CSV files into an Excel spreadsheet using the Perl script below, which is basically a modified version of the code in the link, but I cannot open the output Excel file. It gives a pop-up message: "The file is corrupted."
#!/usr/intel/bin/perl -W
use strict;
use Spreadsheet::WriteExcel::Big;
use Text::CSV_XS;
# Check for valid number of arguments
if (($#ARGV < 1) || ($#ARGV > 2)) {
die("Usage: csv2xls csvfile_dir xlsfile\n");
};
my $csvdir = $ARGV[0];
my $outfile = $ARGV[1];
# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcel::Big->new($outfile);
my $csvfile = "";
my $tab_title = "";
foreach $csvfile (glob("$csvdir/*.csv")) {
print "$csvfile\n";
# Open the Comma Separated Variable file
open (CSVFILE, $csvfile) or die "$csvfile: $!";
$csvfile =~ s/.*\///;
$tab_title = (split(/\./,$csvfile))[0];
print "-D- $tab_title\n";
# Create a new Excel worksheet
my $worksheet = $workbook->add_worksheet($tab_title);
# Create a new CSV parsing object
my $csv = Text::CSV_XS->new;
# Row and column are zero indexed
my $row = 0;
while (<CSVFILE>) {
if ($csv->parse($_)) {
my @Fld = $csv->fields;
print "-D- @Fld\n";
my $col = 0;
foreach my $token (@Fld) {
$worksheet->write($row, $col, $token);
$col++;
}
$row++;
} else {
my $err = $csv->error_input;
print "Text::CSV_XS parse() failed on argument: ", $err, "\n";
}
}
}
How can I fix it?

You need to close your workbook.
$workbook->close();
Also upgrade your Excel module to the latest version. I faced a similar issue with corrupted Excel files, which was solved by updating Spreadsheet::WriteExcel.
Also, from the docs:
Note about the requirement for binmode(). An Excel file is comprised
of binary data. Therefore, if you are using a filehandle you should
ensure that you binmode() it prior to passing it to new(). You should
do this regardless of whether you are on a Windows platform or not.
This applies especially to users of perl 5.8 on systems where UTF-8 is
likely to be in operation such as RedHat Linux 9. If your program,
either intentionally or not, writes UTF-8 data to a filehandle that is
passed to new() it will corrupt the Excel file that is created.

Related

perl script to read an xlsx file(which has many sheets) using the sheet name

I am trying to write a Perl script that reads an Excel file (which has many sheets in it) using the sheet name.
I know how to access a particular sheet of the Excel file using the sheet number, but I am not sure how to read it using the sheet name.
Any help provided is highly appreciated.
Below is the code I wrote to access the sheet using sheet number:
my $Sheet_Number = 26;
my $workbook = ReadData("<path to the excel file>");
for (my $i =2; $i<$limit; $i++){
my $cell = "A" . $i;
my $key_1 = $workbook->[$Sheet_Number]{$cell};
}
Thanks
----Edit----
I want to open the particular sheet within the Excel file by using the sheet name. And then read the data from that particular sheet. The name of the sheet will be entered by the user while running the script from the command line arguments.
Below is the code that I am using after getting suggested answers for my earlier question:
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$path");
my $worksheet;
if ($worksheet->get_name() eq "$Sheet_Name"){
for (my $i =2; $i<$limit; $i++){
my $cell = $worksheet->get_cell($i,"A");
my $value = $cell->value();
push @array_keys, $value;
}
}
I want to read the values of Column A of the particular sheet and push it into an array.
$Sheet_Name : It is the name of the sheet which is entered by the user as cmd line arg.
$path : It is the complete path to the Excel file
Error Message: Can't call method "get_name" on an undefined value at perl_script.pl (The error points to the line where the if-condition is used.)
Thanks for the help.
-----EDIT----
Anyone, with any leads on this post, please post your answer or suggestions. Appreciate any responses.
Thanks
The get_name() method of the worksheet object, in conjunction with Perl's grep function, should get you this:
my ($worksheet) = grep { $_->get_name() eq 'Sheet2' } $workbook->worksheets();
This would be an un-golfed version of the same:
my $worksheet;
foreach my $sheet ($workbook->worksheets()) {
if ($sheet->get_name() eq 'Sheet2') {
$worksheet = $sheet; # copy out: a foreach loop variable does not keep its value after the loop
last;
}
}
This assumes there is a match. If there is none, $worksheet stays undefined, so check it before calling any methods on it.
-- Edit --
I made some assumptions. You certainly do need to load the workbook first:
use strict;
use Spreadsheet::ParseExcel;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('/var/tmp/foo.xls');
Then the code above should work.
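The lookup-by-name step can be tried without a spreadsheet file. What follows is only a minimal sketch: MockSheet and find_sheet are hypothetical names invented for illustration, and MockSheet mimics nothing but get_name(). It uses first from the core List::Util module, which stops at the first match and returns undef when there is none.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical stand-in for a worksheet object; only get_name() is mimicked.
{
    package MockSheet;
    sub new      { my ($class, $name) = @_; return bless { name => $name }, $class; }
    sub get_name { return $_[0]{name}; }
}

# Return the first sheet whose name matches, or undef when none does.
sub find_sheet {
    my ($name, @sheets) = @_;
    return first { $_->get_name() eq $name } @sheets;
}

my @sheets = map { MockSheet->new($_) } qw(Sheet1 Sheet2 Sheet3);
my $wanted = find_sheet('Sheet2', @sheets);

# Always test for undef before calling methods on the result.
print defined $wanted ? "found " . $wanted->get_name() . "\n" : "no such sheet\n";
```

With a real workbook, the list passed in would be $workbook->worksheets(), and the name would come from the command-line argument.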

Find text in Excel

I am writing a script to automate pulling reports from various SQL databases, a task that is currently run manually once a month. So far I have a working prototype that reads an SQL database, parses the information into an Excel file, saves it, and then emails it to someone.
What I want to do is have another Excel file called emails.xlsx. This file will have three columns: Emails (A), Server (B), Database (C). I want to search the file for rows whose value in column 2 is the same, grab the emails from those rows, and put them into a variable.
#Import Email Details
$emailPath = "C:\temp\emails.xlsx"
$sheetName = "emails"
$workBook1 = $excel.Workbooks.Open($emailPath)
$worksheet = $workBook1.sheets.Item($sheetName)
$cell = 1;
$CcCheck = $worksheet.Range.("A1").Text;
FOREACH($dbase in $worksheet.Range("C1").EntireColumn)
{
DO{
$CcCheck = $worksheet.Range("A$cell").Text;
if($CcCheck -ne " ") {
$data = $worksheet.Range("C$cell").Text;
$server = $worksheet.Range("B$cell").Text;
$Cc += ", $CcCheck";
$cell++
}
} while($foreach.MoveNext() -eq $foreach.Current)
}
Write-Host " Loaded Server, $cell Emails and DB" -ForegroundColor "Green";
Write-Host "";
Import-Csv .\emails.csv -Header emails,server,database | Foreach-Object{
$dataSource = $_.server
$dataBase = $_.database
This fixed the problem: after converting the Excel file to a CSV, I could simply reference the columns by their headers.

search for a cell from one excel and search in another excel and print if its not there using perl

I'm new to Perl. I have two Excel files, each containing a huge number of rows and just two columns. I want to take each cell from one of the Excel files and check whether it is present in the other Excel file. If it is not, I want to print that cell.
I believe that if I get each cell from one file, search for it in the other, and run a for loop over all the rows, it will be done.
I have got as far as reading a cell from the first file, but the issue is how to search for it in the other file and print it.
Can anybody help?
I'm not entirely sure what you want, but this might give you some ideas. It's completely untested, though.
use strict;
use Spreadsheet::ParseExcel;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook1 = $parser->parse('Book1.xls');
if (!defined $workbook1) { die $parser->error(), ".\n"; }
my $workbook2 = $parser->parse('Book2.xls');
if (!defined $workbook2) { die $parser->error(), ".\n"; }
my $worksheet1 = $workbook1->worksheet('Sheet1');
my $worksheet2 = $workbook2->worksheet('Sheet1');
my ($row_min1, $row_max1) = $worksheet1->row_range();
my ($col_min1, $col_max1) = $worksheet1->col_range();
for my $row1 ($row_min1 .. $row_max1) {
for my $col1 ($col_min1 .. $col_max1) {
my $cell1 = $worksheet1->get_cell($row1, $col1);
next unless defined $cell1; # skip empty cells
my ($row_min2, $row_max2) = $worksheet2->row_range();
my ($col_min2, $col_max2) = $worksheet2->col_range();
my $found_match = 0;
for my $row2 ($row_min2 .. $row_max2) {
for my $col2 ($col_min2 .. $col_max2) {
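my $cell2 = $worksheet2->get_cell($row2, $col2);
next unless defined $cell2; # skip empty cells
if ($cell1->value() eq $cell2->value()) { # string comparison
$found_match = 1;
last; # Perl uses last, not break, to leave a loop
}
}
last if $found_match;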
}
if (!$found_match) {
print $cell1->value, "\n";
}
}
}
This is mostly from here: http://search.cpan.org/dist/Spreadsheet-ParseExcel/lib/Spreadsheet/ParseExcel.pm
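For sheets with a huge number of rows, rescanning the second sheet for every cell of the first gets slow. A common alternative is to load the second sheet's values into a hash once and then test each value from the first sheet against it. The sketch below assumes the cell values have already been read into plain arrays; missing_from is a hypothetical helper name and the sample data is made up.

```perl
use strict;
use warnings;

# Return the values from @$first that never occur in @$second.
# A hash built from the second list makes each membership check a single
# lookup instead of a full rescan of the second sheet.
sub missing_from {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    return grep { !$seen{$_} } @$first;
}

# Hypothetical sample data standing in for the cell values of each sheet.
my @book1 = qw(apple banana cherry date);
my @book2 = qw(banana date elderberry);

# Prints the values found only in the first list: apple and cherry.
print "$_\n" for missing_from(\@book1, \@book2);
```

The comparison is an exact string match, so values that differ only in case or surrounding whitespace count as different.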

Deleting rows in already existing excel file by using perl

Guys,
I am trying to delete the used rows in an existing Excel file.
I tried the code below:
use strict;
use warnings;
use Win32::OLE;
my $xl = Win32::OLE->new('Excel.Application');
$xl->{Visible} = 0;
my $nShtsOld = $xl->{SheetsInOldWorkbook};
$xl->{SheetsInOldWorkbook} = 1;
my $wb = $xl->Workbooks->Open('C:\Users\u304079\Desktop\Test_S1_Legacy_US.xlsx');
$xl->{SheetsInOldWorkbook} = $nShtsOld;
my $sht = $wb->Sheets(o);
my $end = $sht->Usedrange->Row->Count;
print $end;
for (my $count = $end; 0 < $count; $count--)
{
my $cell = $sht->{Cells};
if (!defined $cell->{Value})
{
$cell->entireRow->delete;
}
}
# save and exit
$xl->SaveAs('C:\Users\u304079\Desktop\Test_S1_Legacy_US.xlsx');
$xl->close();
but it does not work. I get this error message:
Can't call method "Usedrange" on an undefined value
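It looks like my $sht = $wb->Sheets(o); contains a typo: the argument is the letter o rather than a number. Note that Excel's sheet collection is 1-based, so a zero would not work either; the first sheet is $wb->Sheets(1).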

Using Spreadsheet::ParseExcel in Perl, but need help

I have a Perl program using Spreadsheet::ParseExcel. However, there are two difficulties that have arisen that I have been unable to figure out how to solve. The script for the program is as follows:
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use WordNet::Similarity::lesk;
use WordNet::QueryData;
my $wn = WordNet::QueryData->new();
my $lesk = WordNet::Similarity::lesk->new($wn);
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse ( 'input.xls' );
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
WORKSHEET:
for my $worksheet ( $workbook->worksheets() ) {
my $sheetname = $worksheet->get_name();
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
my $target_col;
my $response_col;
# Skip worksheet if it doesn't contain data
if ( $row_min > $row_max ) {
warn "\tWorksheet $sheetname doesn't contain data. \n";
next WORKSHEET;
}
# Check for column headers
COLUMN:
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row_min, $col );
next COLUMN unless $cell;
$target_col = $col if $cell->value() eq 'Target';
$response_col = $col if $cell->value() eq 'Response';
}
if ( defined $target_col && defined $response_col ) {
ROW:
for my $row ( $row_min + 1 .. $row_max ) {
my $target_cell = $worksheet->get_cell( $row, $target_col);
my $response_cell = $worksheet->get_cell( $row, $response_col);
if ( defined $target_cell && defined $response_cell ) {
my $target = $target_cell->value();
my $response = $response_cell->value();
my $value = $lesk->getRelatedness( $target, $response );
print "Worksheet = $sheetname\n";
print "Row = $row\n";
print "Target = $target\n";
print "Response = $response\n";
print "Relatedness = $value\n";
}
else {
warn "\tWorksheet $sheetname, Row = $row doesn't contain target and response data.\n";
next ROW;
}
}
}
else {
warn "\tWorksheet $sheetname: Didn't find Target and Response headings.\n";
next WORKSHEET;
}
}
So, my two problems:
First of all, sometimes the program returns the error "No Excel data found in file," even though the data is there. Each Excel file is formatted the same way. There is only one sheet, with the A and B columns labelled 'Target' and 'Response,' respectively, with a list of words beneath them. However, it does not ALWAYS return this error. It works for one Excel file, but it does not work for a different one, even though both are formatted the exact same way (and yes, they are both the same file type, as well). I cannot find any reason for it to not read the second file, because it is identical to the first. The only difference is that the second file was created using an Excel macro; however, why would that matter? The file types and format are exactly the same.
Second, the variables '$target' and '$response' need to be formatted as strings in order for the 'my $value' expression to work. How do I convert them into string format? The value assigned to each variable is a word from the appropriate cell of the Excel spreadsheet. I don't know what format that is (and there is no apparent way in Perl for me to check).
Any suggestions?
In relation to your first question, the "no data found" error indicates some problem with the file format. I've seen this error with pseudo-Excel files such as HTML or CSV files that have an xls extension. I've also seen this error with malformed files generated by third-party apps.
You could do an initial verification of the files by doing a hexdump/xxd dump of a working and a non-working file and seeing if the overall structure is approximately the same (for example, whether both have similar magic numbers at the start and neither is HTML).
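If hexdump or xxd is not at hand, the same first look can be done from Perl. This is a rough sketch, not a full validator: sniff_format is a hypothetical helper name, and it only inspects the first 8 bytes. The byte patterns are the well-known signatures of the OLE2 container used by binary .xls files and of ZIP archives, which is what .xlsx files are.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's real format from its leading bytes.
sub sniff_format {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "$path: $!";
    binmode($fh);                       # read raw bytes, no layer translation
    my $head = '';
    read($fh, $head, 8);
    close($fh);

    my $hex = uc(unpack('H*', $head));  # hex string of the bytes actually read
    return 'ole2'   if $hex eq 'D0CF11E0A1B11AE1';  # binary .xls container
    return 'zip'    if $hex =~ /^504B0304/;         # ZIP archive, e.g. .xlsx
    return 'markup' if $head =~ /^\s*</;            # HTML or XML saved as .xls
    return 'unknown';
}

# Report the guessed format of every file named on the command line.
for my $file (@ARGV) {
    printf "%-8s %s\n", sniff_format($file), $file;
}
```

Running it on a "good" and a "bad" file side by side shows quickly whether the two really share a container format.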
It could also be an issue with Spreadsheet::ParseExcel. I am the maintainer of that module. If you like, you could send me a "good" and a "bad" file, at the email address in the docs, and I will have a look at them.
First of all, if you are getting "no data found" you can thank proprietary Excel data file formats and the inability of even a good Perl library to extract information from them.
I strongly suggest that you export the Excel data in something easily parsed like CSV especially given the simple nature of the data layout you described. There may be a way to get Excel to process a batch but I have no idea. A quick search yielded a tool to use OpenOffice to do batch conversion.
The rest of your question is rather moot once you accept that Excel data files will not play nicely.
I wrote this code after a client couldn't decide whether the XLS he was sending every week was really in XLS format or just CSV.... HTH!
sub testForXLS
{
my ( $FileName ) = @_;
my $signature = '';
my $XLSsignature = 'D0CF11E0A1B11AE10000';
my $buffer = '';
open(FILE, "<", $FileName) || die "$FileName: $!";
binmode(FILE); # the signature is binary data
read(FILE, $buffer, 10, 0);
close(FILE);
# Convert the first 10 bytes to an uppercase hex string
foreach (split(//, $buffer))
{ $signature .= sprintf("%02x", ord($_)); }
$signature =~ tr/a-z/A-Z/;
if ( $signature eq $XLSsignature )
{ return 1; } else { return 0; }
}

Resources