why do I receive the errror - excel

in my code, I have a function which verifies if a cell from an worksheet contains valid values, meaning:
1. it contains something.
2. the contained value is not SPACE, TAB or ENTER.
when i am just checking the function within a print (to print the value returned by the function for a single cell), apparently, everything works fine.
when I am integrating the "print of the result" into a while loop, for checking a range of cells, it received the error message: Can't call method "value" on an undefined value at D:\test.pl
here is my code:
use strict;
use Spreadsheet::ParseXLSX;
my $parser = Spreadsheet::ParseXLSX->new();
sub check_cell($) {
my ( $worksheet, $a, $b ) = #_;
if ( $worksheet->get_cell( $a, $b )
or not ord( $worksheet->get_cell( $a, $b )->value() ) ~~ [ 32, 9, 10 ] )
{
return 1;
} else {
return 0;
}
}
my $workbook = $parser->parse('D:\test.pl data.xlsx');
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
my $worksheet = $workbook->worksheet(0);
my $i = 8;
while ( &check_cell( $worksheet, $i, 0 ) ) {
print &check_cell( $worksheet, $i, 0 ), "\n";
$i++;
}
if I remove the while and index increment, everything works fine.
Can anyone tell me why the error occurs in 1st case?
Thank you.

When you compare two strings for equallity, you have to use eq, not ==
while( check_cell($worksheet, $i, 0) eq "correct cell" ) {
#...
};
Also, it's more natural to return 0 or 1 in your check_cell sub, so you haven't to test the result in while loop:
sub check_cell {
# return 1 if it's ok, else 0
}
while( check_cell($worksheet, $i, 0) ) {
#...
};
And the smartmatch operator ~~ has been marked as experimental since Perl 5.18.0

When $worksheet->get_cell($a, $b) doesn't return a defined value, you can't call ->value on the undefined value it returned. Don't you want to use and instead of or in the condition?

in this case, it worked modifying the condition:
instead of this:
if ( $worksheet->get_cell( $a, $b )
or not ord( $worksheet->get_cell( $a, $b )->value() ) ~~ [ 32, 9, 10 ] )
{
return 1;
}
I used:
if ( $worksheet -> get_cell($a, $b) ) {
if ( not ord($worksheet -> get_cell($a, $b) -> value()) ~~ [32, 9, 10] ){
return 1;
}
}
and the scrit exits without any error.
Thank you guys for your help.

Related

How to automatically change the format or encode the excel cell value while writing it to CSV file

My script is breaking when excel cell has double quotes in their value and quotes. I had to explicitly write a function to handle commas in OUTFILE. Is there any way I can provide cell value and automatically it can be encoded to CSV format.
example-
cell->value - Student GOT 8 MARKS in math, 7 in Spanish
Desired Correct CSV format-> "Student GOT 8 MARKS in math, 7 in Spanish".
cell->value - Student GOT 8 MARKS in "math", 7 in "Spanish"
Desired Correct CSV format-> "Student GOT 8 MARKS in ""math"", 7 in ""Spanish""".
I wrote my function to find COMMAS in cell value and if it exits then put the string in double-quotes. I wanted to avoid it in case there are any built functions of CSV writer.
#!/home/utils/perl-5.08
use Text::CSV_XS;
use Text::CSV;
use Excel::Writer::XLSX;
use Spreadsheet::ParseXLSX;
use CGI qw(:standard);
use DBI;
use DBD::CSV;
my $student_excel_file = "";
my $csv = "";
$student_excel_file='ABC.xlsm';
$csv = $student_excel_file;
$csv =~ s/.xlsx$/_22june_intermediate_xlsxtocsv.csv/;
$csv =~ s/.xlsm$/_22june_intermediate_xlsmtocsv.csv/;
my $parser_1 = Spreadsheet::ParseXLSX->new();
my $workbook_1 = $parser_1->parse($student_excel_file);
printf "$csv\n";
print "writing out the new csv file $csv given prvs xlsm file\n";
my $csv_1 = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n", sep_char => ',' });
open my $fh, ">:encoding(utf-8)", $csv or die "failed to create $csv: $!";
#open OUTFILE, "> $student_excel_out_csv_file" or die "ERROR: can't the student;'s CSV file:- $student_excel_out_csv_file.\n";
if ( !defined $workbook_1 )
{
die $parser_1->error(), ".\n";
}
my $worksheet_1=$workbook_1->worksheet(0);
my ( $row_min, $row_max ) = $worksheet_1->row_range();
my ( $col_min, $col_max ) = $worksheet_1->col_range();
printf("Copyig Sheet: %s from the provided PRVS \n", $worksheet_1->{Name});
my $reached_end_of_sheet = 0;
my $concurentEmptyLineCount = 0;
$col_max=65;
#$row_max=2;
my(#heading) = ("CodeA", "CodeB", "Name", "Count", "Pos", "Orientation");
$csv_1->print($fh, \#heading);
my(#datarow) = ("A", "B", "Abelone", 3, "(6,9)", "NW");
$csv_1->print($fh, \#datarow);
my(#datarow_1) = ("A", "B", "Abelone", 3, "WORKS - ""what"" - lets", "_2NW");
$csv_1->print($fh, \#datarow_1);
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
printf("Sheet: %s\n", $worksheet->{Name});
my $sheet_write = $excel_2->add_worksheet($worksheet->{Name});
# my $format = $sheet_write->add_format();
for my $row_1 ( 1 .. $row_max )
{
if($reached_end_of_sheet)
{
last;
}
for my $col_1 ( $col_min .. $col_max )
{
my $cell_1 = $worksheet_1->get_cell( $row_1, 0 );
next unless $cell_1;
$concurentEmptyLineCount=0;
my $cell_2 = $worksheet_1->get_cell( $row_1, $col_1);
my $cell2_value =$cell_2 -> {Val};
print $cell_2 -> {Val};
$csv_1->print ($fh, \$cell2_value );
# if(defined $cell2_value)
# {
# if($cell2_value=~ m/,/)
# {
# $cell2_value=qq("$cell2_value");
# }
# printf OUTFILE "%s,", $cell2_value;
# }
# else
# {
# printf OUTFILE ",";}
# }
my $cell_3 = $worksheet_1->get_cell( $row_1, 0 );
$concurentEmptyLineCount++;
if($concurentEmptyLineCount > 20)
{
$reached_end_of_sheet = 1;
}
next unless $cell_3;
#printf OUTFILE "\n";
$csv_1->print ($fh, "\n" );
}
#close OUTFILE;
close $fh;
exit;
You can use combine() to quote the fields. For example:
use feature qw(say);
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new();
my #fields = (
q[Student GOT 8 MARKS in math, 7 in Spanish],
q[Student GOT 8 MARKS in "math", 7 in "Spanish"],
);
for my $field (#fields) {
my $success = $csv->combine($field);
if (!$success) {
die "Failed to quote field: " . $field;
}
say "Result: ", $csv->string();
}
Output:
Result: "Student GOT 8 MARKS in math, 7 in Spanish"
Result: "Student GOT 8 MARKS in ""math"", 7 in ""Spanish"""

Parse Excel data raises "File error"

I have three functions in a Perl script to parse an Excel file to getthe desired output Excel file. I achieved the correct output, however I get an error
File error: data may have been lost
Likely the root cause is writing over the same Excel cell twice in the file.
How do I get rid of this error while maintaining the functions in the script?
Input file
A B
Apples Carrots
Oranges Broccoli
Grapes Spinach
Desire output file
A B
Apples Carrots
Oranges Broccoli
PEACHES ASPARAGUS
Perl code
use v5.10.0;
use warnings;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
use Spreadsheet::WriteExcel;
my $parser = Spreadsheet::ParseExcel::SaveParser->new();
my $workbook_R = $parser->parse('C:\Perl\databases\Fruits_and_Veggies.xls');
my $workbook_W = Spreadsheet::WriteExcel->new('C:\Perl\databases\New_Fruits_and_Veggies.xls');
my $worksheet_W = $workbook_W->add_worksheet();
for our $worksheet_R ( $workbook_R->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet_R->row_range();
my ( $col_min, $col_max ) = $worksheet_R->col_range();
for our $row ( $row_min .. $row_max ) {
for our $col ( $col_min .. $col_max ) {
FruitStand();
VeggieStand();
ComboStand();
#------------------------------------------------------------------------------
# sub FruitStand - parsing: replace Grapes with PEACHES
#------------------------------------------------------------------------------
sub FruitStand {
# if the cell contains Grapes write 'PEACHES' instead
my $cell_grapes = $worksheet_R->get_cell( $row, $col );
if ( $cell_grapes->value() =~ /Grapes/ ) {
$worksheet_W->write($row, $col,"PEACHES");
}
}
#------------------------------------------------------------------------------
# sub VeggieStand - parsing: repalce Spinach with ASPARAGUS
#------------------------------------------------------------------------------
sub VeggieStand {
# if the cell contains Spinach write 'ASPARAGUS' instead
my $cell_veggies = $worksheet_R->get_cell( $row, $col );
# my $cell = $worksheet_R->get_cell( $row, $col );
if (/ $cell_veggies->value() =~ /Spinach/ ) {
$worksheet_W->write($row, $col,"ASPARAGUS");
}
}
#------------------------------------------------------------------------------
# Writing all fruits and veggies with the 2 changes (PEACHES and ASPARAGUS)
#------------------------------------------------------------------------------
sub ComboStand {
my $cell = $worksheet_R->get_cell( $row, $col );
$worksheet_W->write($row, $col, $cell->value()) ;
}
}
}
}
As you have said, the error message is due to a cell being written several times. You can get rid of the error message by ensuring that each cell is only written once. Since your three subroutines have very similar functionality, they can be combined into a single set of lines that does everything, and use an if / else cascade to decide which action should be taken.
for our $row ( $row_min .. $row_max ) {
for our $col ( $col_min .. $col_max ) {
my $cell = $worksheet_R->get_cell( $row, $col );
if($cell->value() =~ /Spinach/) {
$worksheet_W->write($row, $col,"ASPARAGUS");
}
elsif($cell->value() =~ /Grapes/) {
$worksheet_W->write($row, $col,"PEACHES");
}
else {
$worksheet_W->write($row, $col, $cell->value()) ;
}
}
}
If you have to keep the functions, I suggest something like this, where you have a function that takes the current cell and applies any appropriate transformations to it, and returns the text ready for output. This keeps the repetition of reading the cell and writing the cell out of the subroutines:
for our $row ( $row_min .. $row_max ) {
for our $col ( $col_min .. $col_max ) {
my $cell = $worksheet_R->get_cell( $row, $col );
$worksheet_w->write( $row, $col, produce_check($cell->value) );
}
}
And the produce_check sub that performs the swaps would also be a good place to do any text normalisations or other checks you might want to do on the input, e.g. removing extra whitespace, setting all the output to title case, etc.
sub produce_check {
my $prod = shift;
# maybe we have to make sure there's no trailing whitespace on $prod
$prod =~ s/\s*$//;
my %swaps = (
grapes => 'peaches',
spinach => 'asparagus',
tins => 'cans',
zuchini => 'zucchini'
);
# is $prod one of pieces of produce we have to swap?
# perhaps our input is in a mixture of cases, uppercase, lowercase, titlecase
# to avoid having to add all those variations to the %swaps hash, we convert
# to lowercase using `lc`
if ( $swaps{ lc($prod) } ) {
$prod = $swaps{ lc($prod) };
}
# this line uses `ucfirst($prod)` to convert all output to titlecase.
# You could also convert everything to lowercase ( `lc($prod)` ), to
# uppercase ( `uc($prod)` ), or just leave it as-is by using `return $prod;`
return ucfirst( $prod );
}

What is the fastest way to increment a string in perl?

I would like to append a string in perl within a loop in a fast way, without having to copy the string for each iteration. I'm looking for something like StringBuilder from Java or C#.
I currently know the following alternatives in mind, in order to do 'a += b'.
a .= b # concat
a = join('', a, b); # join
push #a, b # array push
I am not interested in copying all string to the other. I need to copy one character per time, or append small strings foreach iteration. I am trying to solve the following problem: compress the input string 'aaabbccc' to '3a2b3c'. So the idea is to iterate over the input string, check how many repeated characters we have, and then append to the output in the compressed way. What is the most efficient to perform this in perl ?
Here is a link to the problem I was trying to solve. I's slightly different though.
For comparsion, I tried to test different versions for solving your actual problem of compressing the string. Here is my test script test.pl:
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Inline C => './compress_c.c';
my $str_len = 10000;
my #chars = qw(a b c d);
my $str;
$str .= [#chars]->[rand 4] for 1 .. $str_len;
cmpthese(
-1,
{
compress_array => sub { compress_array( $str ) },
compress_regex => sub { compress_regex( $str ) },
compress_str => sub { compress_str( $str ) },
compress_c => sub { compress_c( $str ) },
}
);
# Suggested by #melpomene in the comments
sub compress_regex {
return $_[0] =~ s/([a-z])\1+/($+[0] - $-[0]) . $1/egr;
}
sub compress_array {
my $result = '';
my #chrs = split //, $_[0];
my $prev = $chrs[0];
my $count = 1;
my #result;
for my $i ( 1..$#chrs ) {
my $char = $chrs[$i];
if ( $prev eq $char ) {
$count++;
next if $i < $#chrs;
}
if ( $count > 1) {
push #result, $count, $prev;
}
else {
push #result, $prev;
}
if ( ( $i == $#chrs ) and ( $prev ne $char ) ) {
push #result, $char;
last;
}
$count = 1;
$prev = $char;
}
return join '', #result;
}
sub compress_str {
my $result = '';
my $prev = substr $_[0], 0, 1;
my $count = 1;
my $lastind = (length $_[0]) - 1;
for my $i (1 .. $lastind) {
my $char = substr $_[0], $i, 1;
if ( $prev eq $char ) {
$count++;
next if $i < $lastind;
}
if ( $count > 1) {
$result .= $count;
}
$result .= $prev;
if ( ( $i == $lastind ) and ( $prev ne $char ) ) {
$result .= $char;
last;
}
$count = 1;
$prev = $char;
}
return $result;
}
where compress_c.c is:
SV *compress_c(SV* str_sv) {
STRLEN len;
char* str = SvPVbyte(str_sv, len);
SV* result = newSV(len);
char *buf = SvPVX(result);
char prev = str[0];
int count = 1;
int j = 0;
int i;
for (i = 1; i < len; i++ )
{
char cur = str[i];
if ( prev == cur ) {
count++;
if ( i < (len - 1) )
continue;
}
if ( count > 1) {
buf[j++] = count + '0'; // assume count is less than 10
}
buf[j++] = prev;
if ( (i == (len - 1)) && (prev != cur) ) buf[j++] = cur;
count = 1;
prev = cur;
}
buf[j] = '\0';
SvPOK_on(result);
SvCUR_set(result, j);
return result;
}
The result of running perl test.pl:
Rate compress_array compress_str compress_regex compress_c
compress_array 311/s -- -42% -45% -99%
compress_str 533/s 71% -- -6% -98%
compress_regex 570/s 83% 7% -- -98%
compress_c 30632/s 9746% 5644% 5273% --
Which shows that regex version is slightly faster than the string version. However, the C version is the fastest, and it is about 50 times as fast as the regex version.
Note: I tested this on my Ubuntu 16.10 laptop (Intel Core i7-7500U CPU # 2.70GHz)
I've performed the following benchmark in several ways to perform that:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
my $dna;
$dna .= [qw(G A T C)]->[rand 4] for 1 .. 10000;
sub frequency_concat {
my $result = '';
for my $idx (0 .. length($dna) - 1) {
$result .= substr($dna, $idx, 1);
}
return $result;
}
sub frequency_join {
my $result = '';
for my $idx (0 .. length($dna) - 1) {
$result = join '', $result, substr($dna,$idx,1);
}
return $result;
}
sub frequency_list_push {
my #result = ();
for my $idx (0 .. length($dna) - 1) {
push #result, substr($dna,$idx,1);
}
return join '', #result;
}
sub frequency_list_prealloc {
my #result = (' ' x length($dna));
for my $idx (0 .. length($dna) - 1) {
$result[$idx] = substr($dna,$idx,1);
}
return join '', #result;
}
cmpthese(-1, # Run each for at least 1 second(s) {
concat => \&frequency_concat,
join => \&frequency_join,
list_push => \&frequency_list_push,
list_list_prealloc => \&frequency_list_prealloc
}
);
The results below have shown that the concat (a . b) is the fastest operation. I don't understand why, since this will need to make several copies of the string.
Rate join list_push list_list_prealloc concat
join 213/s -- -38% -41% -74%
list_push 342/s 60% -- -5% -58%
list_list_prealloc 359/s 68% 5% -- -56%
concat 822/s 285% 140% 129% --

Spreadsheet::ParseExcel $cell->value() returning undef

I am new to perl and having trouble with the spreadsheet excel parser module.
WHen I use $cell->value() on its own line, there appears to be no problem, however when I try to use this to insert the value into an array, it returns undef, code show below:
for my $row ($row_min .. $row_max) {
my #line;
for my $col ( $col_min .. $col_max) {
my $cell = $worksheet->get_cell( $row, $col );
$value = $cell->value;
push #line, $value if defined($cell);
print $cell->value;
print Dumper $line;
}
}
}
Here print $cell->value; returns the contents of the cell but print Dumper $line; returns undef
You have to push $cell->valuenot $value
push #line, $cell->value if defined($cell);
You should add use strict at the beginning of your program, so you get an error message in such a case.

Perl Excel - Can't call method value on undefined value.. Deleting cells when text is a substring of another cell

I have the following code which parses an Excel file.. For every row, if a cell is substring of another cell on the same row, I want to delete this cell.
My data (in the .xls file) look like that:
Number1 Text1 Text2 Text3 ... TextN Number2 Number3 ... NumberN
Each number and each text is in a different cell. The number of numbers and text may vary per row.. I want to check if Text1 is a substring of Text2 or Text3 etc... similarly if Text3 is a substring of Text4 Text5 etc.. If they are substrings I want to delete these cells.
#!/usr/bin/perl -w
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use diagnostics;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('test.xls');
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
for my $row ( $row_min .. $row_max ) {
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row, $col );
my $test = $cell->value();
if (defined $test) {
my $cellValue = $cell->value();
print"The cell value is $cellValue \n";
} else {
print "Cell value is not defined \n";
}
#my $nextCell = $worksheet->get_cell( $row, $col+1 );
#if (index($nextCell->value(), $cell->value()) != -1) {
#print "$nextCell->value() contains $cell->value()\n";
#}
#next unless $cell;
}
}
}
I get an error Can't call method "value" on an undefined value at ... I believe it has to do with the fact that when the final cell in the row is found, the $cell->value function fails because the cell is empty.. I tried checking if the value if undefined so that I avoid processing this cell but I still get the same error.. How does Perl deal with empty cells ? How can I avoid getting this error? Thanks !
The error means $cell is undef when you call $cell->value.
If you simply want to skip empty cells, why not add
next unless $cell;
in your for my $col ( ... ) loop
Edit:
you could add
if( my $test = $cell->value() ){
$cell->delete if grep{ ( my $forward = $worksheet->get_cell( $row, $_ ) ) && ( $forward =~ /\Q$test\E/ } ( $col+1 .. $colMax );
}
Edit: This does not work ( I was not sure and could not test at the time ). Sorry.
Either declare a temporary variable $forward first, AND ( which was also wrong ) call ->value:
if( my $test = $cell->value() ){
my $forward;
$cell->delete if grep{ ( $forward = $worksheet->get_cell( $row, $_ )->value ) && ( $forward =~ /\Q$test\E/ } ( $col+1 .. $colMax );
}
Or, probably better, write it as a for loop (me was trying to be too smart for me own good)
for my $pos ( $col+1 .. $colMax ){
my $forward_cell = $worksheet->get_cell( $row, $pos );
if ( $forward_cell->value =~ /\Q$text/ ){
$cell->delete;
last;
}
}
This, elegantly, goes back to my earlier point: This seems inefficient
However, it might be more efficient to first get all the actually existing cells and then delete, again grepping for a following cell that matches text. Not sure you want to do /\Q$text\E/ or /^\Q$text\E/ ( string begins with $text ), and you might not need \Q ... \E since it only escapes special characters and is unnecessary if there are none.

Resources