How to combine directory paths in Perl (Linux)

I have a Perl script that takes a directory path as input.
The directory contains XML files.
In my code I iterate over all the XML files and build an absolute path for each one. The code works fine.
#!/usr/bin/perl
use File::Spec;
$num_args = $#ARGV + 1;
if ($num_args != 1) {
print "\nUsage: $0 <input directory>\n";
exit;
}
my $dirPath = $ARGV[0];
opendir(DIR, $dirPath);
my @docs = grep(/\.xml$/,readdir(DIR));
foreach my $file (@docs)
{
my $abs_path = join("",$dirPath,$file);
print "absolute path is $abs_path";
}
My question is this:
I join $dirPath and $file with no separator, which means that $dirPath must end in a "/". Is there a built-in function in Perl that takes care of this and can replace the join call?
All I want is not to worry about the "/" separator. Whether the script is called with "/test/dir_to_process" or "/test/dir_to_process/", it should produce the correct absolute path for every XML file in the directory.
Let me know if anyone has any suggestions.

Please take heed of the advice you are given. It is ridiculous to keep asking questions when comments and answers to previous posts are being ignored.
You must always use strict and use warnings at the top of every Perl program you write, and declare every variable using my. It isn't hard to do, and you will be reprimanded if you post code that doesn't have these measures in place.
You use the File::Spec module in your program but never make use of it. It is often easier to use File::Spec::Functions instead, which exports the methods provided by File::Spec so that there is no need to use the object-oriented call style.
catfile will correctly join a file (or directory) name to a path, whether or not the path already ends with a separator. This rewrite of your program works fine.
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions 'catfile';
if (@ARGV != 1) {
print "\nUsage: $0 <input directory>\n";
exit;
}
my ($dir_path) = @ARGV;
my $xml_pattern = catfile($dir_path, '*.xml');
while ( my $xml_file = glob($xml_pattern) ) {
print "Absolute path is $xml_file\n";
}
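To see the separator handling on its own, here is a minimal sketch that joins the same directory name with and without a trailing slash. The directory and file names are made up for illustration, and the printed form assumes a Unix-like system where the separator is "/".

```perl
use strict;
use warnings;
use File::Spec::Functions 'catfile';

# Hypothetical inputs: the same directory with and without a trailing slash.
my @inputs = ('/test/dir_to_process', '/test/dir_to_process/');

# catfile cleans up the separator, so both inputs should give the same path.
my @joined = map { catfile($_, 'example.xml') } @inputs;
print "$_\n" for @joined;
```

Both lines of output should be identical, which is the behaviour the question asks for.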

The answer is in the documentation for File::Spec, e.g., catfile:
$path = File::Spec->catfile( @directories, $filename );
or catpath:
$full_path = File::Spec->catpath( $volume, $directory, $file );

This will add the trailing slash if not there:
$dirPath =~ s!/*$!/!;

Related

Having a small issue with an if statement in a Perl script

I created a small script in Perl, and I am really new to this. The script is supposed to look at the argument it is given and create a directory tree in that directory. This part of the script works. The second part (the nested if statement) does not: when no argument is given, the script asks you to input a directory of your choice. I believe the nested if statement is failing because of the $file input, but I'm not entirely sure what's wrong. This is probably something really simple, but I have not been able to find the solution. Thank you in advance for the help and tips.
#! /usr/bin/perl
if ($#ARGV == -1)
{
print "Please enter default directory:";
my $file=<STDIN>;
if (-d $file)
{
chdir $file;
system("mkdir Data");
system("mkdir Data/Image");
system("mkdir Data/Cache");
print "Structure Created";
}
else
{
print "Directory does not exsist";
}
}
else
{
chdir $ARGV[0];
system("mkdir Data");
system("mkdir Data/Image");
system("mkdir Data/Cache");
print ("Structure Created");
}
print ("\n");
The test -d $file is failing because what is entered via STDIN also has the newline, after the string that specifies the directory name. You need chomp($file);
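A small sketch of the effect, assuming the user typed "." (the current directory) followed by Enter. An in-memory filehandle stands in for STDIN so the example runs without interaction; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Simulate a line typed at the prompt: the directory name plus a newline.
my $typed = ".\n";
open my $fake_stdin, '<', \$typed or die "Can't open in-memory handle: $!";

my $file = <$fake_stdin>;
my $before = -d $file ? 1 : 0;   # tested with the newline still attached
chomp $file;
my $after  = -d $file ? 1 : 0;   # tested after chomp

print "before chomp: $before, after chomp: $after\n";
```

With warnings enabled, Perl should also complain about an unsuccessful stat on a filename containing a newline, which is a useful hint for this kind of bug.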
However, there are a few more points I would like to bring up.
Most importantly, there is repeated code in both branches. You really do not want to do that. It can, and does, cause trouble later. Instead, decide on the directory name, and then make it.
Second, there is no reason to go out to the system in order to make a directory. It is far better to do it in Perl, and there are good modules for this.
use strict;
use warnings;
use File::Path qw(make_path);
use Cwd qw(getcwd);
my $dir;
if (not @ARGV) {
print "Please enter default directory: ";
$dir = <STDIN>;
chomp $dir;
}
else {
$dir = $ARGV[0];
}
die "No directory $dir" if not -d $dir;
my $orig_cwd = getcwd(); # chdir returns only true/false, so save the cwd first
chdir $dir or die "Can't chdir to $dir: $!";
my @dirs = map { "Data/$_" } qw(Image Cache);
my @dirs_made = make_path( @dirs, { verbose => 1 } );
print "Created directories:\n";
print "$_\n" for @dirs_made;
I build the directory list using map to avoid repeating the Data/... strings, and for later flexibility. You can of course just type the names in, but that tends to invite silly mistakes.
I used File::Path to make the directories. It builds the whole path, like mkdir -p, and has a few other useful options that you can pass in { }, including error handling. There are other modules as well, for example Path::Tiny with its mkpath (and a lot of other goodies).
Note that chdir returns only true or false, so if you will need to get back to the original working directory, record it before the call (here with getcwd from the core Cwd module), and check chdir for errors. But you don't have to chdir if there are no other reasons for it. Just include the $dir name in the map
# No chdir needed here
my @dirs = map { "$dir/Data/$_" } qw(Image Cache);
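For a version that can be run safely, here is a sketch of the no-chdir variant. It assumes a temporary directory (from the core File::Temp module) as a stand-in for the user-supplied one, so nothing is created in a real location.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my $dir  = tempdir( CLEANUP => 1 );   # stand-in for the user-supplied directory
my @dirs = map { "$dir/Data/$_" } qw(Image Cache);

# make_path creates missing parent directories too, like mkdir -p.
my @made = make_path(@dirs);
print "Created: $_\n" for @made;
```

Because the full path is built into each name, the script's working directory never changes.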

Adding custom header to specific files in a directory

I would like to add a unique one-line header to each FOCUS*.tsv file in a specified directory. After that, I would like to combine all of these files into one file.
First I’ve tried sed command.
my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`; print $cmd9;
It looked like it worked but after I’ve combined all of these files into one file in the next section of the code, the inserted row was listed four times for each file.
I’ve tried the following Perl script to accomplish the same but it deleted the content of the file and only prints out the added header.
I’m looking for the simplest way to accomplish what I’m looking for.
Here is what I’ve tried.
#!perl
use strict;
use warnings;
use Tie::File;
my $home="/data/";
my $tsv_directory = $home."test_all_runs/".$ARGV[0];
my $tsvfiles = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";
my @run_directory = (); @run_directory = split /\//, $tsv_directory; print "The run directory is #############".$run_directory[3]."\n";
my $cmd = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`; #print "$cmd";
my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles"; #print "$cmda";
my @tsvfiles =();
#this code opens the vcf_files.txt file and passes each line into an array for individual manipulation
open(TXT2, "$tsvfiles");
while (<TXT2>){
push (@tsvfiles, $_);
}
close(TXT2);
foreach (@tsvfiles){
chop($_);
}
#this loop works fine
for my $tsv_file (@tsvfiles){
open my $in, '>', $tsv_file or die "Can't write new file: $!";
open my $out, '>', "$tsv_file.new" or die "Can't write new file: $!";
$tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
my $sample_id = $1;
#print "The sample ID is ############## $sample_id\n";
my $headerline = $run_directory[3]."/".$sample_id;
print $out $headerline;
while( <$in> ) {
print $out $_;
}
close $out;
close $in;
unlink($tsv_file);
rename("$tsv_file.new", $tsv_file);
}
Thank you
Apparently the problem was the '>' mode used to open the input file: it truncates the file, so reading needs '<' instead. That got solved.
However, I'd like to make a few comments on some of the rest of the code.
The list of files is built by running external ls redirected to a file, then reading this file into an array. However, that is exactly the job of glob and all of that is replaced by
my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";
Then you don't need the chomp either, and the chop that is used would actually hurt since it removes the last character, not only the newline (or really $/).
Use of chop is probably not what you want. If you are removing the linefeed ($/), use chomp.
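The difference is easy to see on two made-up file names, one with a trailing newline and one without:

```perl
use strict;
use warnings;

my @raw = ("a.tsv\n", "b.tsv");   # second name has no trailing newline

my @chomped = @raw;
chomp @chomped;                    # removes a trailing newline only

my @chopped = @raw;
chop @chopped;                     # removes the last character, whatever it is

print "chomp: @chomped\n";
print "chop:  @chopped\n";
```

chomp leaves "b.tsv" alone, while chop turns it into "b.ts", which is how a file name can silently stop matching.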
To extract a match and assign it, a common idiom is
my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
or die "Can't extract Sample ID from $tsv_file";
Note that I also added the file name to the message, so that we know which file failed. (A failed match does not set $!, so there is no system error to report here.)
The unlink and rename appear to be overwriting one file with another. You can do that with move from the core module File::Copy
use File::Copy qw(move);
my $tsv_file_new = "$tsv_file.new";
move ($tsv_file_new, $tsv_file)
or die "Can't move $tsv_file_new to $tsv_file: $!";
which renames the .new file to $tsv_file, overwriting the original.
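Putting these pieces together, a minimal sketch of prepending a header to one file might look like the following. The temporary directory, the file name, and the "run1" run directory are placeholders invented for the example, not values from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Copy qw(move);

# Set up a throwaway file to work on (names here are made up for the sketch).
my $dir      = tempdir( CLEANUP => 1 );
my $tsv_file = "$dir/FOCUS-sample1-oncomine.tsv";
open my $setup, '>', $tsv_file or die "Can't create $tsv_file: $!";
print $setup "col1\tcol2\n1\t2\n";
close $setup;

my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
    or die "Can't extract sample ID from $tsv_file";

# Read the original with '<', write header plus content to a new file.
my $new_file = "$tsv_file.new";
open my $in,  '<', $tsv_file or die "Can't read $tsv_file: $!";
open my $out, '>', $new_file or die "Can't write $new_file: $!";
print $out "run1/$sample_id\n";      # "run1" is a placeholder run directory
print $out $_ while <$in>;
close $in;
close $out or die "Can't close $new_file: $!";

# Replace the original with the new file.
move( $new_file, $tsv_file ) or die "Can't move $new_file to $tsv_file: $!";
```

Note the newline after the header; without it the header and the first data row end up on the same line.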
As for how the files need to be combined, more precise explanation would be needed.

Why can't I print a very long string? [closed]

I'm writing a Perl script that searches a kml file and I need to print a very long line of latitude/longitude coordinates. The following script successfully finds the string I'm looking for, but just prints a blank line instead of the value of the string:
#!/usr/bin/perl
# Strips unsupported tags out of a QGIS-generated kml and writes a new one
$file = $ARGV[0];
# read existing kml file
open( INFO, $file ); # Open the file
@lines = <INFO>; # Read it into an array
close(INFO); # Close the file
#print @lines; # Print the array
$x = 0;
$coord_string = "<coordinates>";
# go through each line looking for above string
foreach $line (@lines) {
$x++;
if ( $x > 12 ) {
if ( $line =~ $coord_string ) {
$thisCooordString = $line;
$var_startX = $x;
print "Found coord string: $thisCoordString\n";
print " on line: $var_startX\n";
}
}
}
The file that it's reading is here
and this is the output I get:
-bash-4.3$ perl writekml.pl HUC8short.kml
Found coord string:
on line: 25
Found coord string:
on line: 38
Is there some cap on the maximum length that a string can be in Perl? The longest line in this file is ~151,000 characters long. I've verified that all the lines in the file are read successfully.
You've misspelled the variable name (three o's in the assignment vs. two in the print):
$thisCooordString = $line;
...
print "Found coord string: $thisCoordString\n";
Add use strict and use warnings to your script to prevent these sorts of errors.
Always include use strict and use warnings in EVERY perl script.
If you had done this, you would've gotten the following error message to clue you into your bug:
Global symbol "$thisCoordString" requires explicit package name
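To watch strict catch the typo without editing the original script, the sketch below compiles a two-line snippet at run time with a string eval and captures the error. The snippet is a cut-down stand-in for the question's code.

```perl
use strict;
use warnings;

# Compile the snippet at run time so the compile error can be captured.
my $snippet = <<'END_CODE';
use strict;
my $thisCooordString = "some line";
print "Found coord string: $thisCoordString\n";
END_CODE

my $ok    = eval "$snippet; 1";
my $error = $@;
print $ok ? "compiled\n" : "strict refused it: $error";
```

The captured message should name $thisCoordString, the misspelled variable, which points straight at the bug.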
Adding these pragmas and simplifying your code results in the following:
#!/usr/bin/env perl
# Strips unsupported tags out of a QGIS-generated kml and writes a new one
use strict;
use warnings;
local @ARGV = 'HUC8short.kml';
while (<>) {
if ( $. > 12 && /<coordinates>/ ) {
print "Found coord string: $_\n";
print " on line: $.\n";
}
}
You can even try with perl one liners as shown below:
Perl One liner on windows command prompt:
perl -lne "if($_ =~ /<coordinates>/is && $. > 12) { print \"Found coord string : $_ \n\"; print \" on line : $. \n\";}" HUC8short.kml
Perl One liner on unix prompt:
perl -lne 'if($_ =~ /<coordinates>/is && $. > 12) { print "Found coord string : $_ \n"; print " on line : $. \n";}' HUC8short.kml
As others have pointed out, you need to... no, you MUST always use use strict; and use warnings;.
If you used strict, you would have gotten an error message telling you that your variable $thisCoordString or $thisCooordString was not declared with my. Using warnings would have warned you that you're printing an undefined string.
Your whole program is written in a very old (and obsolete) Perl programming style. This is the type of program writing I would have done back in Perl 3.0 days about two decades ago. Perl has changed quite a bit since then, and using the newer syntax will allow you to write easier to read and maintain programs.
Here's your basic program written in a more modern syntax:
#! /usr/bin/env perl
#
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (such as using undefined variables)
use feature qw(say); # Lets you use 'say' instead of 'print' (No \n needed)
use autodie; # Program automatically dies on bad file operations
use IO::File; # Lots of nice file activity.
# Make Constants constant
use constant {
COORD_STRING => qr/<coordinates>/, # qr is a regular expression quoted string
};
my $file = shift;
# read existing kml file
open my $fh, '<', $file; # Three part open with scalar filehandle
while ( my $line = <$fh> ) {
chomp $line; # Always "chomp" on read
next unless $line =~ COORD_STRING; #Skip non-coord lines
say "Found coord string: $line";
say " on line: " . $fh->input_line_number;
}
close $fh;
Many Perl developers are self taught. There is nothing wrong with that, but many people learn Perl from looking at other people's obsolete code, or from reading old Perl manuals, or from developers who learned Perl from someone else back in the 1990s.
So, get some books on Modern Perl and learn the new syntax. You might also want to learn about things like references which can lead you to learn Object Oriented Perl. References and OO Perl will allow you to write longer and more complex programs.

Perl to check the subdirectories and change the ownership

I am trying to write a Perl script that checks all the directories in the current directory and then descends into each subdirectory in turn until it reaches the deepest one. This is what I have written:
#!/usr/bin/perl -w
use strict;
my @files = <*>;
foreach my $file (@files){
if (-d $file){
my $cmd = qx |chown deep:deep $file|;
my $chdir = qx |cd $file|;
my @subfiles = <*>;
foreach my $subfile (@subfiles){
if (-d $file){
my $cmd = qx |chown deep:deep $subfile|;
my $chdir = qx |cd $subfile|;
. # So, on in subdirectories
.
.
}
}
}
}
Now, some of the directories I have contain around 50 subdirectories. How can I descend through them without writing 50 if conditions? Please suggest. Thank you.
Well, a CS101 way (if this is just an exercise) is to use a recursive function
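sub dir_perms {
my $path = shift;
opendir(my $dh, $path) or die "Can't open $path: $!";
my @files = grep { !/^\.{1,2}$/ } readdir($dh); # ignore . and ..
closedir($dh);
for my $name (@files) {
my $full = "$path/$name"; # readdir returns bare names, so prepend the directory
if ( -d $full ) {
dir_perms($full);
}
else {
system('chown', 'deep:deep', $full); # run chown once, without a shell
}
}
}
dir_perms(".");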
But I'd also look at File::Find for something more elegant and robust (this can get caught in a circular link trap, and errors out if you don't call it on a directory, etc.), and for that matter I'd look at plain old UNIX find(1), which can do exactly what you're trying to do with the -exec option, eg
/bin/bash$ find /path/to/wherever -type f -exec chown deep:deep {} \;
perldoc File::Find has examples for what you are doing. Eg,
use File::Find;
finddepth(\&wanted, @directories_to_search);
sub wanted { ... }
further down the doc, it says you can use find2perl to create the wanted{} subproc.
find2perl / -name .nfs\* -mtime +7 \
-exec rm -f {} \; -o -fstype nfs -prune
NOTE: The OS usually won't let you change ownership of a file or directory unless you are the superuser (i.e. root).
Now, we got that out of the way...
The File::Find module does what you want. Use use warnings; instead of -w:
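use strict;
use warnings;
use feature qw(say);
use File::Find;
# chown needs numeric IDs; look them up for user and group "deep"
my $uid = getpwnam('deep');
my $gid = getgrnam('deep');
die qq(No user or group named "deep"\n) unless defined $uid and defined $gid;
finddepth sub {
return unless -d; # You want only directories...
chown $uid, $gid, $_ # $_ is relative to the directory find has moved into
or warn qq(Couldn't change ownership of "$File::Find::name": $!\n);
}, ".";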
The File::Find package imports a find and a finddepth subroutine into your Perl program.
Both work pretty much the same. They both recurse deeply into your directory, and both take as their first argument a subroutine that's used to operate on the found files, followed by a list of directories to operate on.
The name of the file is placed in $_ and you are placed in the directory of that file. That makes it easy to run the standard tests on the file. Here, I'm rejecting anything that's not a directory. It's one of the few places where I'll use $_ as the default.
The full name of the file (from the directory you're searching) is placed in $File::Find::name and the name of that file's directory is $File::Find::dir.
I prefer to put my subroutine embedded in my find, but you can also put a reference to another subroutine in there too. Both of these are more or less equivalent:
my @directories;
find sub {
return unless -d;
push @directories, $File::Find::name;
}, ".";
my @directories;
find \&wanted, ".";
sub wanted {
return unless -d;
push @directories, $File::Find::name;
}
In both of these, I'm gathering the names of all of the directories in my path and putting them in @directories. I like the first one because it keeps my wanted subroutine and my find together. Plus, the mysteriously undeclared @directories in my subroutine doesn't look so mysterious and undeclared. I declared my @directories; right above the find.
By the way, this is how I usually use find. I find what I want, and place them into an array. Otherwise, you're stuck putting all of your code into your wanted subroutine.

Perl if condition parameters

I have a log file which looks like below:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip-turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
There are many such entries in the same log file; some end with IDLE none and some do not. I would like to keep the ones that have both "R integration" and "IDLE none" in a hash and ignore the rest. I have tried the following code but I am not getting the desired results.
#!/usr/bin/perl
open (FH,'/root/log.txt');
my %stat;
my ($killid, $killid_details);
while ($line = <FH>) {
if ($line =~ m/(\d+)/){
$killid = $1;
}
if ($line =~ /R integration/ and $line =~ /IDLE none/){
$killid_details = $line;
}
$stat{$killid} = {
killid => $killid_details
};
}
close (FH);
I am getting all the lines with R integration (for example I get 8869, 8870 lines) which should not be the case as 8870 should be ignored.
Please let me know if I have made a mistake. I am still learning Perl. Thank you.
I made a few changes in your program:
Always put in use strict; and use warnings;. These will catch 90% of your errors. (Although not this time).
When you open a file, you need to either use or die as in open my $fh, "<", $file or die qq(blah, blah, blah); or use use autodie; (which is now preferred). In your case, if the file didn't open, your program would have continued merrily along. You need to test whether or not the open statement worked.
Note my open statement. I use a variable for the file handle. This is preferred because it's not global, and it's easier to pass into subroutines. Also note I use the three parameter open. This way, you don't run into trouble if your file name begins with some strange character.
When you declare a variable, it's best to do it in the narrowest scope. This way, variables go out of scope when you no longer need them. I moved the declarations of $killid and $killid_details inside the loop. That way, they no longer exist outside the loop.
You need to be more careful with your regular expressions. What if the phrase IDLE none appears elsewhere in your line? You only want it if it's at the end of the line.
Now, for the issues you had:
You need to chomp lines when you read them. In Perl, the NL at the end of the line is read in. The chomp command removes it.
Your logic was a bit strange. You set $killid if your line had a digit in it (I modified it to look only for digits at the beginning of the line). However, you simply went on your merry way even if killid was not set. In your version, because you declared $killid outside of the loop, it had a value in each loop. Here I go to the next statement if $killid isn't defined.
You had a weird definition for your hash. You were defining a reference hash within a hash. No need for that. I made it a simple hash.
Here it is:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie;
use Data::Dumper;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while (my $line = <$log_fh>) {
chomp $line;
next if not $line =~ /^(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
my $killid_details = $line;
$stat{$killid} = $killid_details;
}
}
close $log_fh;
say Dumper \%stat;
I think this is probably what you want:
while (<FH>) {
next unless /^(\d+).*R integration.*IDLE none/;
$stat{$1} = $_;
}
The regexp should be anchored to the beginning of the line, so you don't match a number anywhere on the line. There's no need to do multiple regexp matches, assuming the order of R integration and IDLE none are always as in the example. You need to use next when there's no match, so you don't process non-matching lines.
And I suspect that you just want to set the value of the hash entry to the string, not a reference to another hash.
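As a quick check, here is a sketch that runs a single anchored match over the three sample lines from the question. The lines are held in an array instead of being read from /root/log.txt, and the pattern is a slightly stricter variant (whitespace-tolerant, with IDLE none anchored to the end of the line), so treat it as one possible combination of the two answers.

```perl
use strict;
use warnings;

# The three sample lines from the question.
my @log = (
    '4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none',
    '8869 unnamed p4-python R integration semiconductor-project-trunktip-turbolinuxclient 01:33:52 IDLE none',
    '8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52',
);

my %stat;
for my $line (@log) {
    # Leading ID, then "R integration", then "IDLE none" at the end of the line.
    next unless $line =~ /^(\d+)\s.*R\s+integration\b.*IDLE\s+none\s*$/;
    $stat{$1} = $line;
}

print "$_ => $stat{$_}\n" for sort keys %stat;
```

Only the 8869 entry should end up in the hash: 4680 lacks "R integration" and 8870 lacks "IDLE none".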

Resources