I have three functions in a Perl script to parse an Excel file to getthe desired output Excel file. I achieved the correct output, however I get an error
File error: data may have been lost
Likely the root cause is writing over the same Excel cell twice in the file.
How do I get rid of this error while maintaining the functions in the script?
Input file
A B
Apples Carrots
Oranges Broccoli
Grapes Spinach
Desire output file
A B
Apples Carrots
Oranges Broccoli
PEACHES ASPARAGUS
Perl code
use v5.10.0;
use warnings;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
use Spreadsheet::WriteExcel;
my $parser = Spreadsheet::ParseExcel::SaveParser->new();
my $workbook_R = $parser->parse('C:\Perl\databases\Fruits_and_Veggies.xls');
my $workbook_W = Spreadsheet::WriteExcel->new('C:\Perl\databases\New_Fruits_and_Veggies.xls');
my $worksheet_W = $workbook_W->add_worksheet();
for our $worksheet_R ( $workbook_R->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet_R->row_range();
my ( $col_min, $col_max ) = $worksheet_R->col_range();
for our $row ( $row_min .. $row_max ) {
for our $col ( $col_min .. $col_max ) {
FruitStand();
VeggieStand();
ComboStand();
#------------------------------------------------------------------------------
# sub FruitStand - parsing: replace Grapes with PEACHES
#------------------------------------------------------------------------------
sub FruitStand {
# if the cell contains Grapes write 'PEACHES' instead
my $cell_grapes = $worksheet_R->get_cell( $row, $col );
if ( $cell_grapes->value() =~ /Grapes/ ) {
$worksheet_W->write($row, $col,"PEACHES");
}
}
#------------------------------------------------------------------------------
# sub VeggieStand - parsing: repalce Spinach with ASPARAGUS
#------------------------------------------------------------------------------
sub VeggieStand {
# if the cell contains Spinach write 'ASPARAGUS' instead
my $cell_veggies = $worksheet_R->get_cell( $row, $col );
# my $cell = $worksheet_R->get_cell( $row, $col );
if (/ $cell_veggies->value() =~ /Spinach/ ) {
$worksheet_W->write($row, $col,"ASPARAGUS");
}
}
#------------------------------------------------------------------------------
# Writing all fruits and veggies with the 2 changes (PEACHES and ASPARAGUS)
#------------------------------------------------------------------------------
sub ComboStand {
my $cell = $worksheet_R->get_cell( $row, $col );
$worksheet_W->write($row, $col, $cell->value()) ;
}
}
}
}
As you have said, the error message is due to a cell being written several times. You can get rid of the error message by ensuring that each cell is only written once. Since your three subroutines have very similar functionality, they can be combined into a single set of lines that does everything, and use an if / else cascade to decide which action should be taken.
for our $row ( $row_min .. $row_max ) {
for our $col ( $col_min .. $col_max ) {
my $cell = $worksheet_R->get_cell( $row, $col );
if($cell->value() =~ /Spinach/) {
$worksheet_W->write($row, $col,"ASPARAGUS");
}
elsif($cell->value() =~ /Grapes/) {
$worksheet_W->write($row, $col,"PEACHES");
}
else {
$worksheet_W->write($row, $col, $cell->value()) ;
}
}
}
If you have to keep the functions, I suggest something like this, where you have a function that takes the current cell and applies any appropriate transformations to it, and returns the text ready for output. This keeps the repetition of reading the cell and writing the cell out of the subroutines:
for our $row ( $row_min .. $row_max ) {
for our $col ( $col_min .. $col_max ) {
my $cell = $worksheet_R->get_cell( $row, $col );
$worksheet_w->write( $row, $col, produce_check($cell->value) );
}
}
And the produce_check sub that performs the swaps would also be a good place to do any text normalisations or other checks you might want to do on the input, e.g. removing extra whitespace, setting all the output to title case, etc.
sub produce_check {
my $prod = shift;
# maybe we have to make sure there's no trailing whitespace on $prod
$prod =~ s/\s*$//;
my %swaps = (
grapes => 'peaches',
spinach => 'asparagus',
tins => 'cans',
zuchini => 'zucchini'
);
# is $prod one of pieces of produce we have to swap?
# perhaps our input is in a mixture of cases, uppercase, lowercase, titlecase
# to avoid having to add all those variations to the %swaps hash, we convert
# to lowercase using `lc`
if ( $swaps{ lc($prod) } ) {
$prod = $swaps{ lc($prod) };
}
# this line uses `ucfirst($prod)` to convert all output to titlecase.
# You could also convert everything to lowercase ( `lc($prod)` ), to
# uppercase ( `uc($prod)` ), or just leave it as-is by using `return $prod;`
return ucfirst( $prod );
}
Related
My script is breaking when excel cell has double quotes in their value and quotes. I had to explicitly write a function to handle commas in OUTFILE. Is there any way I can provide cell value and automatically it can be encoded to CSV format.
example-
cell->value - Student GOT 8 MARKS in math, 7 in Spanish
Desired Correct CSV format-> "Student GOT 8 MARKS in math, 7 in Spanish".
cell->value - Student GOT 8 MARKS in "math", 7 in "Spanish"
Desired Correct CSV format-> "Student GOT 8 MARKS in ""math"", 7 in ""Spanish""".
I wrote my function to find COMMAS in cell value and if it exits then put the string in double-quotes. I wanted to avoid it in case there are any built functions of CSV writer.
#!/home/utils/perl-5.08
use Text::CSV_XS;
use Text::CSV;
use Excel::Writer::XLSX;
use Spreadsheet::ParseXLSX;
use CGI qw(:standard);
use DBI;
use DBD::CSV;
my $student_excel_file = "";
my $csv = "";
$student_excel_file='ABC.xlsm';
$csv = $student_excel_file;
$csv =~ s/.xlsx$/_22june_intermediate_xlsxtocsv.csv/;
$csv =~ s/.xlsm$/_22june_intermediate_xlsmtocsv.csv/;
my $parser_1 = Spreadsheet::ParseXLSX->new();
my $workbook_1 = $parser_1->parse($student_excel_file);
printf "$csv\n";
print "writing out the new csv file $csv given prvs xlsm file\n";
my $csv_1 = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n", sep_char => ',' });
open my $fh, ">:encoding(utf-8)", $csv or die "failed to create $csv: $!";
#open OUTFILE, "> $student_excel_out_csv_file" or die "ERROR: can't the student;'s CSV file:- $student_excel_out_csv_file.\n";
if ( !defined $workbook_1 )
{
die $parser_1->error(), ".\n";
}
my $worksheet_1=$workbook_1->worksheet(0);
my ( $row_min, $row_max ) = $worksheet_1->row_range();
my ( $col_min, $col_max ) = $worksheet_1->col_range();
printf("Copyig Sheet: %s from the provided PRVS \n", $worksheet_1->{Name});
my $reached_end_of_sheet = 0;
my $concurentEmptyLineCount = 0;
$col_max=65;
#$row_max=2;
my(#heading) = ("CodeA", "CodeB", "Name", "Count", "Pos", "Orientation");
$csv_1->print($fh, \#heading);
my(#datarow) = ("A", "B", "Abelone", 3, "(6,9)", "NW");
$csv_1->print($fh, \#datarow);
my(#datarow_1) = ("A", "B", "Abelone", 3, "WORKS - ""what"" - lets", "_2NW");
$csv_1->print($fh, \#datarow_1);
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
printf("Sheet: %s\n", $worksheet->{Name});
my $sheet_write = $excel_2->add_worksheet($worksheet->{Name});
# my $format = $sheet_write->add_format();
for my $row_1 ( 1 .. $row_max )
{
if($reached_end_of_sheet)
{
last;
}
for my $col_1 ( $col_min .. $col_max )
{
my $cell_1 = $worksheet_1->get_cell( $row_1, 0 );
next unless $cell_1;
$concurentEmptyLineCount=0;
my $cell_2 = $worksheet_1->get_cell( $row_1, $col_1);
my $cell2_value =$cell_2 -> {Val};
print $cell_2 -> {Val};
$csv_1->print ($fh, \$cell2_value );
# if(defined $cell2_value)
# {
# if($cell2_value=~ m/,/)
# {
# $cell2_value=qq("$cell2_value");
# }
# printf OUTFILE "%s,", $cell2_value;
# }
# else
# {
# printf OUTFILE ",";}
# }
my $cell_3 = $worksheet_1->get_cell( $row_1, 0 );
$concurentEmptyLineCount++;
if($concurentEmptyLineCount > 20)
{
$reached_end_of_sheet = 1;
}
next unless $cell_3;
#printf OUTFILE "\n";
$csv_1->print ($fh, "\n" );
}
#close OUTFILE;
close $fh;
exit;
You can use combine() to quote the fields. For example:
use feature qw(say);
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new();
my #fields = (
q[Student GOT 8 MARKS in math, 7 in Spanish],
q[Student GOT 8 MARKS in "math", 7 in "Spanish"],
);
for my $field (#fields) {
my $success = $csv->combine($field);
if (!$success) {
die "Failed to quote field: " . $field;
}
say "Result: ", $csv->string();
}
Output:
Result: "Student GOT 8 MARKS in math, 7 in Spanish"
Result: "Student GOT 8 MARKS in ""math"", 7 in ""Spanish"""
I'm treating large blocks of all-caps text, converting them to mixed case and adding punctuation. However, there's a large set of words and names that I want to capitalize, such as days of the week, months, people, and so on.
Rather than using a giant glob of substitutions, is there a way to use an array or hash of properly capitalized terms somehow? So if I have a string "and then on monday, we should see bob and sue" to convert it to "and then on Monday, we should see Bob and Sue" if I have the terms stored in an array or hash?
Thanks!
use feature qw( fc ); # Alternatively, "lc" is "close enough".
my #to_capitalize = qw( Monday Bob Sue );
my #abbreviations = qw( USA );
my #exceptions = qw( iPhone );
my %words = (
( map { fc($_) => ucfirst(lc($_)) } #to_capitalize ),
( map { fc($_) => uc($_) } #abbreviations ),
( map { fc($_) => $_ } #exceptions ),
);
my $pat =
join '|',
#map quotemeta, # If this is needed, our use of \b is incorrect.
keys %words;
s/\b($pat)\b/$words{fc($1)}/ig;
Adjust as needed.
You could create a hash of words to transformations, then if you encounter those words apply the expected transformation:
use strict;
use warnings;
use feature qw(say);
my %transformations = (
monday => \&titlecase,
bob => \&titlecase,
sue => \&titlecase,
shout => \&uppercase,
);
while ( my $line = <$filehandle> ) {
chomp $line;
foreach my $word ( split /\s+/, $line ) {
if ( my $transform = $transformations{lc $word} ) {
$line =~ s/$word/$transform->($word)/ge;
}
}
say $line;
}
sub titlecase {
my ($s) = #_;
return ucfirst lc $s;
}
sub uppercase {
my ($s) = #_;
return uc $s;
}
Thanks for the great ideas. I ended up using the idea of a hash (for the first time!). This may not be very optimal but it gets the job done :)
%capitalize = (
"usa" => "USA",
"nfl" => "NFL",
);
$row = "I live in the usa and love me some nfl.";
foreach my $key (keys %capitalize) {
if ($row =~ $key) {
$row =~ s/$key/$capitalize{$key}/g;
}
}
print $row;
I am new to perl and having trouble with the spreadsheet excel parser module.
WHen I use $cell->value() on its own line, there appears to be no problem, however when I try to use this to insert the value into an array, it returns undef, code show below:
for my $row ($row_min .. $row_max) {
my #line;
for my $col ( $col_min .. $col_max) {
my $cell = $worksheet->get_cell( $row, $col );
$value = $cell->value;
push #line, $value if defined($cell);
print $cell->value;
print Dumper $line;
}
}
}
Here print $cell->value; returns the contents of the cell but print Dumper $line; returns undef
You have to push $cell->valuenot $value
push #line, $cell->value if defined($cell);
You should add use strict at the beginning of your program, so you get an error message in such a case.
I have the following code which parses an Excel file.. For every row, if a cell is substring of another cell on the same row, I want to delete this cell.
My data (in the .xls file) look like that:
Number1 Text1 Text2 Text3 ... TextN Number2 Number3 ... NumberN
Each number and each text is in a different cell. The number of numbers and text may vary per row.. I want to check if Text1 is a substring of Text2 or Text3 etc... similarly if Text3 is a substring of Text4 Text5 etc.. If they are substrings I want to delete these cells.
#!/usr/bin/perl -w
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use diagnostics;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('test.xls');
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
for my $row ( $row_min .. $row_max ) {
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row, $col );
my $test = $cell->value();
if (defined $test) {
my $cellValue = $cell->value();
print"The cell value is $cellValue \n";
} else {
print "Cell value is not defined \n";
}
#my $nextCell = $worksheet->get_cell( $row, $col+1 );
#if (index($nextCell->value(), $cell->value()) != -1) {
#print "$nextCell->value() contains $cell->value()\n";
#}
#next unless $cell;
}
}
}
I get an error Can't call method "value" on an undefined value at ... I believe it has to do with the fact that when the final cell in the row is found, the $cell->value function fails because the cell is empty.. I tried checking if the value if undefined so that I avoid processing this cell but I still get the same error.. How does Perl deal with empty cells ? How can I avoid getting this error? Thanks !
The error means $cell is undef when you call $cell->value.
If you simply want to skip empty cells, why not add
next unless $cell;
in your for my $col ( ... ) loop
Edit:
you could add
if( my $test = $cell->value() ){
$cell->delete if grep{ ( my $forward = $worksheet->get_cell( $row, $_ ) ) && ( $forward =~ /\Q$test\E/ } ( $col+1 .. $colMax );
}
Edit: This does not work ( I was not sure and could not test at the time ). Sorry.
Either declare a temporary variable $forward first, AND ( which was also wrong ) call ->value:
if( my $test = $cell->value() ){
my $forward;
$cell->delete if grep{ ( $forward = $worksheet->get_cell( $row, $_ )->value ) && ( $forward =~ /\Q$test\E/ } ( $col+1 .. $colMax );
}
Or, probably better, write it as a for loop (me was trying to be too smart for me own good)
for my $pos ( $col+1 .. $colMax ){
my $forward_cell = $worksheet->get_cell( $row, $pos );
if ( $forward_cell->value =~ /\Q$text/ ){
$cell->delete;
last;
}
}
This, elegantly, goes back to my earlier point: This seems inefficient
However, it might be more efficient to first get all the actually existing cells and then delete, again grepping for a following cell that matches text. Not sure you want to do /\Q$text\E/ or /^\Q$text\E/ ( string begins with $text ), and you might not need \Q ... \E since it only escapes special characters and is unnecessary if there are none.
in my code, I have a function which verifies if a cell from an worksheet contains valid values, meaning:
1. it contains something.
2. the contained value is not SPACE, TAB or ENTER.
when i am just checking the function within a print (to print the value returned by the function for a single cell), apparently, everything works fine.
when I am integrating the "print of the result" into a while loop, for checking a range of cells, it received the error message: Can't call method "value" on an undefined value at D:\test.pl
here is my code:
use strict;
use Spreadsheet::ParseXLSX;
my $parser = Spreadsheet::ParseXLSX->new();
sub check_cell($) {
my ( $worksheet, $a, $b ) = #_;
if ( $worksheet->get_cell( $a, $b )
or not ord( $worksheet->get_cell( $a, $b )->value() ) ~~ [ 32, 9, 10 ] )
{
return 1;
} else {
return 0;
}
}
my $workbook = $parser->parse('D:\test.pl data.xlsx');
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
my $worksheet = $workbook->worksheet(0);
my $i = 8;
while ( &check_cell( $worksheet, $i, 0 ) ) {
print &check_cell( $worksheet, $i, 0 ), "\n";
$i++;
}
if I remove the while and index increment, everything works fine.
Can anyone tell me why the error occurs in 1st case?
Thank you.
When you compare two strings for equallity, you have to use eq, not ==
while( check_cell($worksheet, $i, 0) eq "correct cell" ) {
#...
};
Also, it's more natural to return 0 or 1 in your check_cell sub, so you haven't to test the result in while loop:
sub check_cell {
# return 1 if it's ok, else 0
}
while( check_cell($worksheet, $i, 0) ) {
#...
};
And the smartmatch operator ~~ has been marked as experimental since Perl 5.18.0
When $worksheet->get_cell($a, $b) doesn't return a defined value, you can't call ->value on the undefined value it returned. Don't you want to use and instead of or in the condition?
in this case, it worked modifying the condition:
instead of this:
if ( $worksheet->get_cell( $a, $b )
or not ord( $worksheet->get_cell( $a, $b )->value() ) ~~ [ 32, 9, 10 ] )
{
return 1;
}
I used:
if ( $worksheet -> get_cell($a, $b) ) {
if ( not ord($worksheet -> get_cell($a, $b) -> value()) ~~ [32, 9, 10] ){
return 1;
}
}
and the scrit exits without any error.
Thank you guys for your help.