Perl - Dropping delimited text files into one Excel file with tabs

Returning to Perl after some time away, I have been looking for a way to read some tab-delimited text files into an array and then into an Excel file; basically, an Excel tab generated for each text file in a directory. The text files generally share a similar format.
The code below, which has been cobbled together from examples, generally produces what I am after. However, the output ignores the tabs and prints all of each row's text as one string. I am struggling with how to handle the tab delimiter in the code. I know I will need to split the text files as they are pushed into the array. I had been playing around with hashes, but I think I am reading too much into the problem, and it is likely an obvious answer that I am missing.
use warnings;
use strict;
use Cwd qw(abs_path);
use Spreadsheet::WriteExcel;

die "Log path ARG required " unless defined $ARGV[0];
my $path = abs_path( $ARGV[0] );
my $workbook = Spreadsheet::WriteExcel->new("resultsbook.xls");
chdir $path or die "no such directory: $!";

if ( -d $path ) { ## test if $path given is a directory
    opendir my $dir, $path or die "can't open the directory: $!";
    while ( defined( my $file = readdir($dir) ) ) {
        chomp $file;
        next if $file eq '.' or $file eq '..';
        (my $sheetname = $file) =~ s/\.\w+?//;
        my $wrksheet = $workbook->add_worksheet($sheetname);
        $wrksheet->write_col( 0, 0, [ @{ readfile($file) } ] );
    }
}

sub readfile {
    my $textfilecontent = [];
    open my $fh, '<', shift() or die "can't open file:$!";
    while (<$fh>) {
        chomp;
        push @{$textfilecontent}, $_, $/;
    }
    return $textfilecontent;
}

You need to split the lines on tab (or whatever the delimiter is) before pushing them into the @textfilecontent variable. There are a couple of other minor corrections in here:
use warnings;
use strict;
use Cwd qw(abs_path);
use Spreadsheet::WriteExcel;

die "Log path ARG required " unless defined $ARGV[0];
my $path = abs_path( $ARGV[0] );
my $workbook = Spreadsheet::WriteExcel->new("resultsbook.xls");
chdir $path or die "no such directory: $!";

if ( -d $path ) { ## test if $path given is a directory
    opendir my $dir, $path or die "can't open the directory: $!";
    while ( defined( my $file = readdir($dir) ) ) {
        chomp $file;
        next if $file eq '.' or $file eq '..';
        (my $sheetname = $file) =~ s/\.\w+//;
        my $wrksheet = $workbook->add_worksheet($sheetname);
        $wrksheet->write_col( 0, 0, readfile($file) );
    }
}

sub readfile {
    my @textfilecontent = ();
    open my $fh, '<', shift() or die "can't open file:$!";
    while (<$fh>) {
        chomp;
        push @textfilecontent, [ split(/\t/) ];
    }
    return \@textfilecontent;
}

Related

search multi line string from multiple files in a directory

The string to be searched is:
the file_is being created_automaically {
period=20ns }
The Perl script I am using is below (it works fine for a single-line string, but not for a multi-line one):
#!/usr/bin/perl
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
#print "#files";
system ("rm -rf $dir/log.txt");
my $list;
foreach $list(@files){
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
else {
while (<LOGFILE>){
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: File contain the required string\n";
close (File);
break;
}
}
close (LOGFILE);
}
}
This code does not compile; it contains errors that cause it to fail to execute. You should never post code that you have not first tried to run.
The root of your problem is that for a multiline match you cannot read the file line by line; you have to slurp the whole file into a variable. However, your program contains many other flaws as well. I will demonstrate. Below are excerpts of your code (with fixed indentation and missing curly braces).
First off, always use:
use strict;
use warnings;
This will save you many headaches and long searches for hidden problems.
system ("rm -rf $dir/log.txt");
This is better done in Perl, where you can control for errors:
unlink "$dir/log.txt" or die "Cannot delete '$dir/log.txt': $!";
foreach my $list (@files) {
#       ^^
Declare the loop variable in the loop itself, not before it.
if( !open(LOGFILE, "$list")){
    open (File, ">>", "$dir/log.txt");
    select (File);
    print " $list \: unable to open file";
    close (File);
You never have to explicitly select a file handle before you print to it. You just print to the file handle: print File "....". What you are doing is just changing the default output file handle, which is not a good thing to do.
Also, this is error logging, which should go to STDERR instead. That can be done simply by opening STDERR to a file at the beginning of your program. Why do this? It is useful when you are not running the program at a terminal, for example via the web or some other process where STDERR does not show up on your screen; otherwise it is just extra work while debugging.
open STDERR, ">", "$dir/log.txt" or die "Cannot open 'log.txt' for overwrite: $!";
This has the added benefit of you not having to delete the log first. And now you do this instead:
if (! open LOGFILE, $list ) {
    warn "Unable to open file '$list': $!";
} else ....
warn goes to STDERR, so it is basically the same as print STDERR.
Speaking of open, you should use the three-argument open with a lexical file handle. So it becomes:
if (! open my $fh, "<", $list )
} else {
while (<LOGFILE>) {
Since you are looking for a multiline match, you need to slurp the file(s) instead. This is done by setting the input record separator to undef. Typically like this:
my $file = do { local $/; <$fh> }; # $fh is our file handle, formerly LOGFILE
Next how to apply the regex:
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/) {
$_ =~ is optional. A regex automatically matches against $_ if no other variable is used.
You should probably not use " in the regex, unless you actually have " in the target string. I don't know why you put it there; maybe you think strings need to be quoted inside a regex. If so, that is wrong. To match the string you have above, you do:
if( /the.*automaically.*{.*period=20ns.*}/s ) {
You don't have to escape the curly braces {} or the equals sign =, and you don't have to use quotes. The /s modifier makes . (the wildcard character) also match newline, so we can remove the \n. We can also remove .* from the start and end of the string, because that is implied: regex matches are always partial unless anchors are used.
break;
The break keyword is only used with the switch feature, which is experimental, and you neither use it nor have it enabled, so here it is just a bareword, which is wrong. If you want to exit a loop prematurely, you use last; a minimal sketch follows. Note that we don't have to use last here, because we slurp the file, so we have no loop.
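For reference, a minimal sketch of last in an ordinary line-by-line loop (the $fh handle and the pattern here are only illustrative placeholders):
while (<$fh>) {
    if (/period=20ns/) {
        print "found it\n";
        last;   # stop reading the rest of the file
    }
}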
Also, you generally should pick suitable variable names. If you have a list of files, the variable that contains the file name should not be called $list, I think. It is logical that it is called $file. And the input file handle should not be called LOGFILE, it should be called $input, or $infh (input file handle).
This is what I get if I apply the above to your program:
use strict;
use warnings;

my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
my $logfile = "$dir/log.txt";

open STDERR, ">", $logfile or die "Cannot open '$logfile' for overwrite: $!";

foreach my $file (@files) {
    if (! open my $input, "<", $file) {
        warn "Unable to open '$file': $!";
    } else {
        my $txt = do { local $/; <$input> };
        if ($txt =~ /the.*automaically.*{.*period=20ns.*}/s) {
            print " $file : File contain the required string\n";
        }
    }
}
Note that the print goes to STDOUT, not to the error log. It is not common practice to send STDOUT and STDERR to the same file. If you want, you can simply redirect output in the shell, like this:
$ perl foo.pl > output.txt
The following sample code demonstrates using a regex for the multiline case, with a logger($fname,$msg) subroutine.
The snippet assumes that the input files are relatively small and can be read into the variable $data (i.e. that the computer has enough memory to hold them).
NOTE: the input data files should be distinguishable from the other files in the home directory $ENV{HOME}. In this code sample they are assumed to match the pattern test_*.dat; you probably do not intend to scan absolutely all files in your home directory (there could be many thousands of files when you are interested in only a few).
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my ($dir, $re, $logfile);

$dir     = '/home/vikas/';
$re      = qr/the file_is being created_automaically \{\s+period=20ns\s+\}/;
$logfile = $dir . 'logfile.txt';

unlink $logfile if -e $logfile;

for ( glob($dir . "test_*.dat") ) {
    if ( open my $fh, '<', $_ ) {
        my $data = do { local $/; <$fh> };
        close $fh;
        logger($logfile, "INFO: $_ contains the required string")
            if $data =~ /$re/gsm;
    } else {
        logger($logfile, "WARN: unable to open $_");
    }
}

exit 0;

sub logger {
    my $fname = shift;
    my $text  = shift;
    open my $fh, '>>', $fname
        or die "Couldn't open $fname";
    say $fh $text;
    close $fh;
}
Reference: regex modifiers, unlink, perlvar

Trying to read a pdf, parse the data, and write desired data to spreadsheet using Perl on Linux

I am trying to extract data from credit card statements and enter it into a spreadsheet for tax purposes. What I've done so far involves multiple steps, but I'm relatively new to Perl and am working from what I know. Here are two separate scripts I've written so far: one reads all the data from a PDF and writes it to a text file, the other parses that text (imperfectly) and writes it to another text file. Then I'd like to either create a CSV file to import into a spreadsheet or write directly to a spreadsheet. I'd like to do this in one script, but two or three will suffice.
First script:
#!/usr/bin/perl
use CAM::PDF;

my $file = "/home/cd/Documents/Jan14.pdf";
my $pdf = CAM::PDF->new($file);
my $doc = "";
my $filename = 'report.txt';

open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
for ($i = 1; $i <= $pdf->numPages(); $i++) {
    $doc = $doc . $pdf->getPageText($i);
}
print $fh " $doc\n";
close $fh;
print "done\n";
Second script:
#!/usr/bin/perl
use strict;
use warnings;
undef $/; # Enable 'slurp' mode
open (FILE, '<', 'report.txt') or die "Could not open report.txt: $!";
my $file = <FILE>; # Whole file here now...
my ($stuff_that_interests_me) =
($file =~ m/.*?(Date of Transaction.*?CONTINUED).*/s);
print "$stuff_that_interests_me\n";
my $filename = 'data.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
print $fh " $stuff_that_interests_me\n";
close $fh;
print "done\n";
close (FILE) or die "Could not close report.txt: $!";
open (FILE2, '<', 'report.txt') or die "Could not open report.txt: $!";
my $file2 = <FILE2>; # Whole file here now...
my ($other_stuff_that_interests_me) =
($file2 =~ m/.*?(Page 2 .*?TRANSACTIONS THIS CYCLE).*/s);
print "$other_stuff_that_interests_me\n";
$filename = 'data.txt';
open($fh, '>>', $filename) or die "Could not open file '$filename' $!";
print $fh " $other_stuff_that_interests_me\n";
close $fh;
print "done\n";
close (FILE2) or die "Could not close report.txt: $!";
Update:
I found a module (CAM::PDF) on CPAN that works great for what I'm trying to do...it even renders the data in a format that I can more easily use for my spreadsheet. However, I haven't yet figured out how to get it to print to a .txt file...any suggestions?
#!/usr/bin/perl -w
package main;
use warnings;
use strict;
use CAM::PDF;
use Getopt::Long;
use Pod::Usage;
use English qw(-no_match_vars);
our $VERSION = '1.60';
my %opts = (
density => undef,
xdensity => undef,
ydensity => undef,
check => 0,
renderer => 'CAM::PDF::Renderer::Dump',
verbose => 0,
help => 0,
version => 0,
);
Getopt::Long::Configure('bundling');
GetOptions('r|renderer=s' => \$opts{renderer},
'd|density=f' => \$opts{density},
'x|xdensity=f' => \$opts{xdensity},
'y|ydensity=f' => \$opts{ydensity},
'c|check' => \$opts{check},
'v|verbose' => \$opts{verbose},
'h|help' => \$opts{help},
'V|version' => \$opts{version},
) or pod2usage(1);
if ($opts{help})
{
pod2usage(-exitstatus => 0, -verbose => 2);
}
if ($opts{version})
{
print "CAM::PDF v$CAM::PDF::VERSION\n";
exit 0;
}
if (defined $opts{density})
{
$opts{xdensity} = $opts{ydensity} = $opts{density};
}
if (defined $opts{xdensity} || defined $opts{ydensity})
{
if (!eval "require $opts{renderer}") ## no critic (StringyEval)
{
die $EVAL_ERROR;
}
if (defined $opts{xdensity})
{
no strict 'refs'; ## no critic(ProhibitNoStrict)
my $varname = $opts{renderer}.'::xdensity';
${$varname} = $opts{xdensity};
}
if (defined $opts{ydensity})
{
no strict 'refs'; ## no critic(ProhibitNoStrict)
my $varname = $opts{renderer}.'::ydensity';
${$varname} = $opts{ydensity};
}
}
if (@ARGV < 1)
{
pod2usage(1);
}
my $file = shift;
my $pagelist = shift;
my $doc = CAM::PDF->new($file) || die "$CAM::PDF::errstr\n";
foreach my $p ($doc->rangeToArray(1, $doc->numPages(), $pagelist))
{
my $tree = $doc->getPageContentTree($p, $opts{verbose});
if ($opts{check})
{
print "Checking page $p\n";
if (!$tree->validate())
{
print " Failed\n";
}
}
$tree->render($opts{renderer});
}
I'd like to either create a csv file to import into a spreadsheet or
write directly to a spreadsheet.
You can write directly to a spreadsheet; check out Excel::Writer::XLSX.
If you want to create a CSV file, then you can try using Text::CSV and Text::CSV_XS.
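As a rough sketch of the Excel::Writer::XLSX route (the output file name, sheet name, and @rows data below are made-up placeholders, not taken from your scripts):
use strict;
use warnings;
use Excel::Writer::XLSX;

# hypothetical parsed records; in practice you would build these from the text you extracted
my @rows = (
    [ 'Date of Transaction', 'Description',      'Amount' ],
    [ '01/14',               'Example merchant', '42.17'  ],
);

my $workbook  = Excel::Writer::XLSX->new('statement.xlsx');
my $worksheet = $workbook->add_worksheet('Transactions');

my $r = 0;
$worksheet->write_row( $r++, 0, $_ ) for @rows;   # one spreadsheet row per array ref

$workbook->close();
Text::CSV works along the same lines: build an array of fields per record and call $csv->print($fh, \@fields) for each one.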

While passing arguments to a Perl script, it's not opening the file

This script is used to compare two .csv files and write the differences to results.xls.
My Perl script (name: cmp.pl) is:
#!C:\Perl\bin
use Spreadsheet::WriteExcel;
use Spreadsheet::WriteExcel::Utility;

my $Wb = Spreadsheet::WriteExcel->new('results.xls');
my $s1 = $Wb->add_worksheet('res');

open(FILE1, "< ARGV[0]") || die "Cannot open $ARGV[0]\n";
open(FILE2, "< ARGV[1]") || die "Cannot open $ARGV[1]\n";
@file1 = <FILE1>;
@file2 = <FILE2>;

my $format = $Wb->add_format();
my $format1 = $Wb->add_format();
$format->set_bg_color('red');
$format1->set_bg_color('yellow');

for $i (0 .. $#file1) {
    $line1 = $file1[$i];
    $line2 = $file2[$i];
    if (!($line1 eq $line2)) {
        @x = split(/\,/, $line1);
        @y = split(/\,/, $line2);
        for $j (0 .. $#x) {
            if ((($x[$j] != $y[$j]) || (!($x[$j] eq $y[$j])))) {
                $s1->write($i, $j, $y[$j], $format);
            }
            else {
                $s1->write($i, $j, $y[$j], $format1);
            }
        }
    }
    else {
        @x = split(/\,/, $line1);
        $s1->write_row($i, 0, \@x);
    }
}

$Wb->close();
close(FILE1);
close(FILE2);
I passed the arguments (files) at the cmd prompt like this:
\perl>cmp.pl t1.csv t2.csv
Output: it shows "cannot open".
The code where you open the files and read them into arrays @file1 and @file2 should look like this
open my $fh, '<', $ARGV[0] or die qq{Cannot open "$ARGV[0]": $!\n};
my @file1 = <$fh>;
open $fh, '<', $ARGV[1] or die qq{Cannot open "$ARGV[1]": $!\n};
my @file2 = <$fh>;
close $fh;
and the two close calls at the end should be removed.
You should change your open lines - you've forgotten to put a $ before the ARGV[0] (you're accessing an array element):
use strict;
use warnings;
You could use this:
open my $fh, '<', $ARGV[0] or die $!;
Or this:
my $file = $ARGV[0];
open my $fh, '<', $file or die $!;

I can print to file using > in terminal but how do I print to files where I create the name using a $ in this code

I can print to a file using > in the terminal, but how do I print to files where I create the name using a $ in this code?
use strict;
use warnings;

my $calls_dir = "Ask/";
opendir(my $search_dir, $calls_dir) or die "$!\n";
my @files = grep /\.txt$/i, readdir $search_dir;
closedir $search_dir;
print "Got ", scalar @files, " files\n";

#my %seen = ();
foreach my $file (@files) {
    my %seen = ();
    my $current_file = $calls_dir . $file;
    open my $FILE, '<', $current_file or die "$file: $!\n";
    while (<$FILE>) {
        #if (/phone/i) {
        chomp;
        #if (/phone\s*(.*)\r?$/i) {
        #if (/^phone\s*:\s*(.*)\r?$/i) {
        #if (/Contact\s*(.*)\r?$/i) {
        if (/^*(.*)Contact\s*(.*)\r?$/i) {
            $seen{$1} = 1;
            print $file."\t"."$_\n"; # I want to print this line to file named $file."result".txt
            #print "\t";
            #print "\n";
            #print "$_\n";
            #print "\t";
            #print "\n";
            foreach my $addr ( sort keys %seen ) {
                ...
            }
        }
    }
    close $FILE;
}
Open a filehandle for writing to that file and print to it. If it doesn't exist, Perl will create it.
open my $fh, '>', "${file}result.txt" or die $!;
$fh->print("$file\t$_\n");
From perldoc -f open:
If MODE is ">", the file is opened for
output, with existing files first being truncated ("clobbered")
and nonexisting files newly created. If MODE is ">>", the file is
opened for appending, again being created if necessary.
If you want to avoid truncation, check if it exists first using -e and/or add something to the filename to make it reasonably unique (like a Unix timestamp).
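A minimal sketch of that check, reusing the $file loop variable from the question (the timestamped fallback name is just one possible choice):
my $outname = "${file}result.txt";
if ( -e $outname ) {
    # file already exists: add a timestamp so we don't clobber it
    $outname = "${file}result." . time() . ".txt";
}
open my $out, '>', $outname or die "Cannot open '$outname': $!";
print $out "$file\t$_\n";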

copying data from multiple files and adding to different files

Okay, I am not sure whether this is even possible or not; I may sound random...
There are around 250 files with names like
E.g. 1:
1_0.pdb,1_60.pdb,1_240.pdb,....50_0.pdb,50_60.pdb,50_240.pdb..... having some data.
Now for each of the above files there is another file with the same name, just with a prefix added, like:
E.g. 2:
file1_0.pdb,file1_60.pdb,file1_240.pdb,....file50_0.pdb,file50_60.pdb,file50_240.pdb..... again having some data.
Is it possible to write code that copies the data from each file in the first example and appends it to its corresponding file in example 2, like from 1_0.pdb to file1_0.pdb? I hope I am being clear.
With Perl you could do something like
#!/usr/bin/perl -w
use strict;

my @filenames = qw(1_0.pdb 1_60.pdb 1_240.pdb);

for my $filename (@filenames) {
    open(my $fr, '<', $filename) or next;
    open(my $fw, '>>', "file$filename") or next;
    local($/) = undef;
    my $content = <$fr>;
    print $fw $content;
    close $fr;
    close $fw;
}
EDIT:
Instead of listing all filenames in
my @filenames = qw(1_0.pdb 1_60.pdb 1_240.pdb);
you could do something like
my @filenames = grep {/^\d+_\d+/} glob "*.pdb";
Give this code a try:
use strict;
use warnings;

foreach my $file (glob "*.pdb") {
    next if ($file =~ /^file/);
    local $/ = undef;
    my $newfile = "file$file";
    open(my $fh1, "<", $file) or die "Could not open $file: " . $!;
    open(my $fh2, ">>", $newfile) or die "Could not open $newfile: " . $!;
    my $contents = <$fh1>;
    print $fh2 $contents;
    close($fh1);
    close($fh2);
}
If you want to overwrite the contents of the files rather than appending, change ">>" to ">" in the second open statement.
This shell script will also work
foreach my_orig_file ( `ls *.pdb | grep -v ^file` )
    set my_new_file = "file$my_orig_file"
    cat $my_orig_file >> $my_new_file
end
