I'm new to coding, which is why I'm asking here.
Right now I'm writing a Perl script that automatically creates an Excel file from the output of an SQL query.
SQL Query:
init_db_connections();
my @row;
my $curHnd = INV::DBI::execute('----'.':------') or die $INV::DBI::errstr;
while ($row[0] = $curHnd->fetchrow_hashref()) {
printf("Row1: >%s<\n", $row[0]{Row1}),
printf("Row2: >%s<\n", $row[0]{Row2}),
printf("Row3: >%s<\n", $row[0]{Row3})
}
exit 0;
sub init_db_connections {
INV::DBI::init({
------ => '--------',
------- => q{select Row1, Row2, Row3
from table1
}
});
}
Create the Excel:
my $workbook = Excel::Writer::XLSX->new( 'perl.xlsx' );
my $worksheet = $workbook->add_worksheet();
my $format = $workbook->add_format();
$format->set_bold();
$format->set_color( 'black' );
$format->set_underline;
my $col = my $row = 0;
$worksheet->write( $row, $col, 'SQL Report', $format );
$workbook->close();
My problem now is that I don't know how to combine these two, so that the query results are written into the Excel file automatically.
Any ideas would be great.
I think the part to tidy up is how you handle the hashref. It's a subtle point that is easy to get tangled in when just starting out.
my $hashref;
while ( $hashref = $curHnd->fetchrow_hashref() ) {
printf("Row1: >%s<\n", $hashref->{Row1});
printf("Row2: >%s<\n", $hashref->{Row2});
printf("Row3: >%s<\n", $hashref->{Row3});
}
A hashref is a reference to a hash, and they are sweet, especially when you use the arrow operator (->) like I've done in the example.
By contrast, $hash{Row1} accesses the value for Row1 in a plain hash %hash. Your $row[0]{Row1} does work, because Perl implies the arrow between adjacent subscripts, but it is harder to read. (For completeness, the long-hand way of dereferencing a hashref is ${$hashref}{Row1}.)
You don't really need the @row array there. You were only ever assigning to the first element, $row[0], so why not just use a scalar?
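To make the notations concrete, here is a minimal sketch that reads the same value three ways. The sample hash is made up and merely stands in for what fetchrow_hashref() would return.

```perl
use strict;
use warnings;

# A hashref shaped like one fetched row (sample data, made up).
my $hashref = { Row1 => 'a', Row2 => 'b', Row3 => 'c' };

# Arrow notation: the usual way to read one value through a reference.
my $with_arrow = $hashref->{Row1};

# Long-hand notation: dereference first, then subscript.
my $long_hand = ${$hashref}{Row1};

# Storing the hashref in an array element also works, because Perl
# implies the arrow between adjacent subscripts.
my @row = ($hashref);
my $via_array = $row[0]{Row1};

print "$with_arrow $long_hand $via_array\n";
```

All three variables end up holding the same value, so the choice between them is about readability.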
As for writing out to Excel, I think you'll be using the write method inside the while loop and incrementing the row counter with $row++ .
If you're going to be doing a lot of DBI coding, pick up a copy of Programming the Perl DBI by Descartes and Bunce for chapters 4 and 5. Old but still incredibly useful. (still got mine)
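As a rough sketch of how the two halves could be combined, the code below writes a header line and then one spreadsheet row per record with Excel::Writer::XLSX. The helper name write_report and the sample records are assumptions for illustration; in the real script the records would come from the fetchrow_hashref() loop instead.

```perl
use strict;
use warnings;
use Excel::Writer::XLSX;

# Write a header line followed by one spreadsheet row per record.
# $rows is an array ref of hashrefs; $columns fixes the column order.
sub write_report {
    my ( $file, $columns, $rows ) = @_;

    my $workbook = Excel::Writer::XLSX->new($file)
        or die "Cannot create $file: $!";
    my $worksheet = $workbook->add_worksheet();

    my $bold = $workbook->add_format();
    $bold->set_bold();

    my $row = 0;
    $worksheet->write_row( $row++, 0, $columns, $bold );

    for my $record (@$rows) {
        # A hash slice pulls the values out in header order.
        $worksheet->write_row( $row++, 0, [ @{$record}{@$columns} ] );
    }

    $workbook->close();
    return $row;    # number of spreadsheet rows written, header included
}

# Sample data standing in for the rows fetched from the database.
my @records = (
    { Row1 => 'a1', Row2 => 'b1', Row3 => 'c1' },
    { Row1 => 'a2', Row2 => 'b2', Row3 => 'c2' },
);
my $written = write_report( 'perl.xlsx', [qw(Row1 Row2 Row3)], \@records );
print "Wrote $written rows\n";
```

To stream straight from the database, the for loop could be replaced by the while loop over fetchrow_hashref(), calling write_row once per fetched hashref and incrementing $row each time.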
I have an excel file of new-hires for the company.
Firstly I need to hide all the columns that will be used for searching users.
That was pretty simple and I managed to do it. Now I'm left only with the columns I really need.
Now comes the real problem:
I need to filter the data and then import the usernames into a PowerShell array.
So in excel it looks like this:
Then I have the function:
Function GetUsernames ($WorkSheet) {
$userName = $WorkSheet.UsedRange.Rows.Columns["UserNameColumn"].Value2
return $userName
}
But it returns every record in the Username column: 651 records instead of 476.
The script waits for my input while I format the Excel file manually, and only then calls the function.
Any directions will be appreciated! :)
What you seek is all the values from rows in a certain column that are not hidden in the Excel file.
To get those, you need to go through the Rows of the selected column.
In my Office version 2016, I cannot reference a column directly by its name, so I have extended your function to first find the column index.
Also, I have renamed the function a bit to follow the Verb-Noun convention in PowerShell
function Get-Usernames ($WorkSheet, $Column) {
# for me (office 2016) I cannot reference a Column by its name
# using Columns["UserNameColumn"], so I have to find the index first
$index = 0
if ($Column -is [int]) {
$index = $Column
}
else {
for ($col = 1; $col -le $WorkSheet.UsedRange.Columns.Count; $col++) {
$name = $WorkSheet.Cells.Item(1, $col).Value() # assuming the first row has the headers
if ($name -eq $Column) {
$index = $col
break
}
}
}
if ($index -gt 0) {
# now return the values in the columns for the rows that are not hidden
# skip the first row, because that is the column name itself
($WorkSheet.UsedRange.Rows.Columns($index).Rows | Select-Object -Skip 1 | Where-Object { !$_.EntireRow.Hidden }).Value2
}
}
You can now use the function in your script like this:
$userNames = Get-Usernames $workbook.Worksheets(1) "UserNameColumn"
I am trying to write a Perl script that reads an Excel file (which has many sheets in it) using the sheet name.
I know how to access a particular sheet of the excel file using the sheet number, but not sure how to read it using sheet name.
Any help provided is highly appreciated.
Below is the code I wrote to access the sheet using sheet number:
my $Sheet_Number = 26;
my $workbook = ReadData("<path to the excel file>");
for (my $i =2; $i<$limit; $i++){
my $cell = "A" . $i;
my $key_1 = $workbook->[$Sheet_Number]{$cell};
}
Thanks
----Edit----
I want to open the particular sheet within the Excel file by using the sheet name. And then read the data from that particular sheet. The name of the sheet will be entered by the user while running the script from the command line arguments.
Below is the code that I am using after getting suggested answers for my earlier question:
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$path");
my $worksheet;
if ($worksheet->get_name() eq "$Sheet_Name"){
for (my $i =2; $i<$limit; $i++){
my $cell = $worksheet->get_cell($i,"A");
my $value = $cell->value();
push @array_keys, $value;
}
}
I want to read the values of Column A of the particular sheet and push it into an array.
$Sheet_Name : It is the name of the sheet which is entered by the user as cmd line arg.
$path : It is the complete path to the Excel file
Error Message: Can't call method "get_name" on an undefined value at perl_script.pl (The error points to the line where the if-condition is used.)
Thanks for the help.
-----EDIT----
Anyone, with any leads on this post, please post your answer or suggestions. Appreciate any responses.
Thanks
The get_name() method of the worksheet object, in conjunction with Perl's grep function, should get you this:
my ($worksheet) = grep { $_->get_name() eq 'Sheet2' } $workbook->worksheets();
This would be an un-golfed version of the same:
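my $worksheet;
foreach my $sheet ($workbook->worksheets()) {
if ($sheet->get_name() eq 'Sheet2') {
$worksheet = $sheet;
last;
}
}
Note that the loop needs its own variable: Perl restores a foreach loop variable to its previous value when the loop ends, so looping directly over $worksheet would leave it undefined afterwards. If there is no match, $worksheet stays undefined, so check it before calling any methods on it.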
-- Edit --
I made assumptions; you certainly do need to load the workbook first:
use strict;
use Spreadsheet::ParseExcel;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('/var/tmp/foo.xls');
Then the code above should work.
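Putting the pieces together, here is a hedged sketch that looks a sheet up by a name given on the command line and collects column A. It relies on the worksheet() method of Spreadsheet::ParseExcel, which accepts a sheet name. The helper name column_values, the script name in the comment and the choice to skip the first row as a header are assumptions for illustration. Note that get_cell() takes zero-based numeric row and column indexes, so column A is 0 rather than "A".

```perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;

# Collect the values of one column (0-based index) from a worksheet,
# skipping the first row (assumed to be a header) and any empty cells.
sub column_values {
    my ( $worksheet, $col ) = @_;
    my ( $row_min, $row_max ) = $worksheet->row_range();
    my @values;
    for my $row ( $row_min + 1 .. $row_max ) {
        my $cell = $worksheet->get_cell( $row, $col );
        push @values, $cell->value() if defined $cell;
    }
    return @values;
}

# Hypothetical command line: perl read_sheet.pl file.xls SheetName
if ( @ARGV == 2 ) {
    my ( $path, $sheet_name ) = @ARGV;
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($path)
        or die $parser->error(), ".\n";
    my $worksheet = $workbook->worksheet($sheet_name)
        or die "No sheet named '$sheet_name' in $path\n";
    print "$_\n" for column_values( $worksheet, 0 );
}
```

Checking the return value of worksheet() avoids the "Can't call method on an undefined value" error when the requested name does not exist.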
I'm new to Perl. I have two Excel files, each with a huge number of rows and just two columns. I want to take each cell from one file and check whether it also appears in the other file. If it does not, print that cell.
I believe I can do this by taking each cell from the first file, searching for it in the second, and looping over all the rows.
I have got as far as reading a cell from the first file, but I am stuck on how to search for it in the second file and print it.
Can anybody help?
I'm not entirely sure what you want, but this might give you some ideas. It's completely untested, though.
use strict;
use Spreadsheet::ParseExcel;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook1 = $parser->parse('Book1.xls');
if (!defined $workbook1) { die $parser->error(), ".\n"; }
my $workbook2 = $parser->parse('Book2.xls');
if (!defined $workbook2) { die $parser->error(), ".\n"; }
my $worksheet1 = $workbook1->worksheet('Sheet1');
my $worksheet2 = $workbook2->worksheet('Sheet1');
my ($row_min1, $row_max1) = $worksheet1->row_range();
my ($col_min1, $col_max1) = $worksheet1->col_range();
for my $row1 ($row_min1 .. $row_max1) {
for my $col1 ($col_min1 .. $col_max1) {
my $cell1 = $worksheet1->get_cell($row1, $col1);
next unless defined $cell1; # skip empty cells
my ($row_min2, $row_max2) = $worksheet2->row_range();
my ($col_min2, $col_max2) = $worksheet2->col_range();
my $found_match = 0;
for my $row2 ($row_min2 .. $row_max2) {
for my $col2 ($col_min2 .. $col_max2) {
my $cell2 = $worksheet2->get_cell($row2, $col2);
next unless defined $cell2; # skip empty cells
if ($cell1->value() eq $cell2->value()) { # use == for numeric comparison
$found_match = 1;
last; # Perl uses "last" to leave a loop, not "break"
}
}
last if $found_match;
}
if (!$found_match) {
print $cell1->value, "\n";
}
}
}
This is mostly from here: http://search.cpan.org/dist/Spreadsheet-ParseExcel/lib/Spreadsheet/ParseExcel.pm
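With a huge number of rows, rescanning the second sheet for every cell of the first gets slow. One common alternative, sketched here on plain arrays rather than worksheets, is to index the second file's values in a hash once and then filter the first list against it. The helper name values_missing_from and the sample values are assumptions for illustration; in practice the two arrays would be filled from get_cell()->value() calls.

```perl
use strict;
use warnings;

# Return the values of @$first that never occur in @$second.
sub values_missing_from {
    my ( $first, $second ) = @_;
    my %seen = map { $_ => 1 } @$second;    # one pass to index the second list
    return grep { !$seen{$_} } @$first;     # one pass to filter the first
}

# Sample values standing in for the cells of the two workbooks.
my @book1 = qw(apple banana cherry);
my @book2 = qw(banana date);
print "$_\n" for values_missing_from( \@book1, \@book2 );
```

Hash keys are compared as strings, so this matches the eq comparison in the nested-loop version.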
I wonder if there is any way to speed up reading an Excel file with powershell. Many would say I should stop using the do until, but the problem is I need it badly, because in my Excel sheet there can be 2 rows or 5000 rows. I understand that 5000 rows needs some time. But 2 rows shouldn't need 90sec+.
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Excel.DisplayAlerts = $false
$Path = EXCELFILEPATH
$Workbook = $Excel.Workbooks.open($Path)
$Sheet1 = $Workbook.Worksheets.Item(test)
$URows = @()
Do {$URows += $Sheet1.Cells.Item($Row,1).Text; $row = $row + [int] 1} until (!$Sheet1.Cells.Item($Row,1).Text)
$URows | foreach {
$MyParms = @{};
$SetParms = @{};
And I have this 30 times in the script too:
If ($Sheet1.Cells.Item($Row,2).Text){$var1 = $Sheet1.Cells.Item($Row,2).Text
$MyParms.Add("PAR1",$var1)
$SetParms.Add("PAR1",$var1)}
}
I have the idea of running the $MyParms stuff in parallel, but I have no idea how. Any suggestions?
Or
Increase the speed of reading, but I have no clue how to achieve that without destroying the "read until nothing is there".
Or
The speed is normal and I shouldn't complain.
Don't use Excel.Application in the first place if you need speed. You can use an Excel spreadsheet as an ODBC data source - the file is analogous to a database, and each worksheet a table. The speed difference is immense. Here's an intro on using Excel spreadsheets without Excel
Appending to an array with the += operator is terribly slow, because it will copy all elements from the existing array to a new array. Use something like this instead:
$URows = for ($row = 1; $Sheet1.Cells.Item($row, 1).Text; $row++) {
if ($Sheet1.Cells.Item($Row,2).Text) {
$MyParms['PAR1'] = $Sheet1.Cells.Item($Row, 2).Text
$SetParms['PAR1'] = $Sheet1.Cells.Item($Row, 2).Text
}
$Sheet1.Cells.Item($Row,1).Text
}
Your Do loop is basically a counting loop. The canonical form for such loops is
for (init counter; condition; increment counter) {
...
}
so I changed the loop accordingly. Of course you'd achieve the same result like this:
$row = 1
$URows = Do {
...
$row += 1
} until (!$Sheet1.Cells.Item($row, 1).Text)
but that would just mean more code without any benefits. This modification doesn't have any performance impact, though.
Relevant in terms of performance are the other two changes:
I moved the code filling the hashtables inside the first loop, so the code won't loop twice over the data. Using index and assignment operators instead of the Add method for assigning values to the hashtable prevents the code from raising an error when a key already exists in the hashtable.
Instead of appending to an array (which has the abovementioned performance impact) the code now simply echoes the cell text in the loop, which PowerShell automatically turns into a list. The list is then assigned to the variable $URows.
I have a Perl program using Spreadsheet::ParseExcel. However, there are two difficulties that have arisen that I have been unable to figure out how to solve. The script for the program is as follows:
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use WordNet::Similarity::lesk;
use WordNet::QueryData;
my $wn = WordNet::QueryData->new();
my $lesk = WordNet::Similarity::lesk->new($wn);
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse ( 'input.xls' );
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
WORKSHEET:
for my $worksheet ( $workbook->worksheets() ) {
my $sheetname = $worksheet->get_name();
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
my $target_col;
my $response_col;
# Skip worksheet if it doesn't contain data
if ( $row_min > $row_max ) {
warn "\tWorksheet $sheetname doesn't contain data. \n";
next WORKSHEET;
}
# Check for column headers
COLUMN:
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row_min, $col );
next COLUMN unless $cell;
$target_col = $col if $cell->value() eq 'Target';
$response_col = $col if $cell->value() eq 'Response';
}
if ( defined $target_col && defined $response_col ) {
ROW:
for my $row ( $row_min + 1 .. $row_max ) {
my $target_cell = $worksheet->get_cell( $row, $target_col);
my $response_cell = $worksheet->get_cell( $row, $response_col);
if ( defined $target_cell && defined $response_cell ) {
my $target = $target_cell->value();
my $response = $response_cell->value();
my $value = $lesk->getRelatedness( $target, $response );
print "Worksheet = $sheetname\n";
print "Row = $row\n";
print "Target = $target\n";
print "Response = $response\n";
print "Relatedness = $value\n";
}
else {
warn "\tWorksheet $sheetname, Row = $row doesn't contain target and response data.\n";
next ROW;
}
}
}
else {
warn "\tWorksheet $sheetname: Didn't find Target and Response headings.\n";
next WORKSHEET;
}
}
So, my two problems:
First of all, sometimes the program returns the error "No Excel data found in file," even though the data is there. Each Excel file is formatted the same way. There is only one sheet, with the A and B columns labelled 'Target' and 'Response,' respectively, with a list of words beneath them. However, it does not ALWAYS return this error. It works for one Excel file, but it does not work for a different one, even though both are formatted the exact same way (and yes, they are both the same file type, as well). I cannot find any reason for it to not read the second file, because it is identical to the first. The only difference is that the second file was created using an Excel macro; however, why would that matter? The file types and format are exactly the same.
Second, the variables '$target' and '$response' need to be formatted as strings in order for the 'my $value' expression to work. How do I convert them into string format? The value assigned to each variable is a word from the appropriate cell of the Excel spreadsheet. I don't know what format that is (and there is no apparent way in Perl for me to check).
Any suggestions?
In relation to your first question, the "no data found" error indicates some problem with the file format. I've seen this error with pseudo-Excel files such as HTML or CSV files that have an xls extension. I've also seen this error with malformed files generated by third-party apps.
You could do an initial verification of the files by doing a hexdump/xxd dump of a working and a non-working file and seeing if the overall structure is approximately the same (for example, if it has similar magic numbers at the start and isn't HTML).
It could also be an issue with Spreadsheet::ParseExcel. I am the maintainer of that module. If you like, you could send me a "good" and a "bad" file, at the email address in the docs, and I will have a look at them.
First of all, if you are getting "no data found" you can thank proprietary Excel data file formats and the inability of even a good Perl library to extract information from them.
I strongly suggest that you export the Excel data in something easily parsed like CSV especially given the simple nature of the data layout you described. There may be a way to get Excel to process a batch but I have no idea. A quick search yielded a tool to use OpenOffice to do batch conversion.
The rest of your question is rather moot once you accept that Excel data files will not play nicely.
I wrote this code after a client couldn't decide whether the XLS he was sending every week was really in XLS format or just CSV.... HTH!
sub testForXLS
{
my ( $FileName ) = @_;
my $signature = '';
my $buffer = '';
my $XLSsignature = 'D0CF11E0A1B11AE10000';
# read the first 10 bytes in binary mode
open(my $fh, '<', $FileName) or die "Cannot open $FileName: $!";
binmode($fh);
read($fh, $buffer, 10);
close($fh);
# turn the bytes into an upper-case hex string
foreach (split(//, $buffer))
{ $signature .= sprintf("%02x", ord($_)); }
$signature =~ tr/a-z/A-Z/;
if ( $signature eq $XLSsignature )
{ return 1; } else { return 0; }
}
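The same signature idea can be stretched a little to tell apart the usual suspects mentioned above. This is only a sketch: the helper name sniff_format and its return strings are made up, and the check only looks at the first eight bytes (the OLE2 header used by binary .xls files, the ZIP header used by .xlsx files, or a leading "<" that suggests HTML or XML).

```perl
use strict;
use warnings;

# Guess the real format of a file from its first bytes.
sub sniff_format {
    my ($file_name) = @_;
    open( my $fh, '<', $file_name ) or die "Cannot open $file_name: $!";
    binmode($fh);
    my $head = '';
    read( $fh, $head, 8 );
    close($fh);

    return 'xls (OLE2)'  if $head eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
    return 'xlsx (zip)'  if substr( $head, 0, 4 ) eq "PK\x03\x04";
    return 'html or xml' if $head =~ /^\s*</;
    return 'unknown, possibly csv or plain text';
}

# Report the guessed format of every file named on the command line.
print "$_: ", sniff_format($_), "\n" for @ARGV;
```

A file reported as anything other than "xls (OLE2)" is unlikely to be readable by Spreadsheet::ParseExcel, whatever its extension says.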