Perl if condition parameters - linux

I have a log file which looks like below:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip-turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
There are many such entries in the same log file; some end with IDLE none and others do not. I would like to keep the ones that contain both "R integration" and "IDLE none" in a hash and ignore the rest. I have tried the following code but I am not getting the desired results.
#!/usr/bin/perl
open (FH,'/root/log.txt');
my %stat;
my ($killid, $killid_details);
while ($line = <FH>) {
if ($line =~ m/(\d+)/){
$killid = $1;
}
if ($line =~ /R integration/ and $line =~ /IDLE none/){
$killid_details = $line;
}
$stat{$killid} = {
killid => $killid_details
};
}
close (FH);
I am getting all the lines with R integration (for example I get 8869, 8870 lines) which should not be the case as 8870 should be ignored.
Please inform me if any mistake. I am still learning perl. Thank you.

I made a few changes in your program:
Always put in use strict; and use warnings;. These will catch 90% of your errors. (Although not this time).
When you open a file, you need to either use or die as in open my $fh, "<", $file or die qq(blah, blah, blah); or use use autodie; (which is now preferred). In your case, if the file didn't open, your program would have continued merrily along. You need to test whether or not the open statement worked.
Note my open statement. I use a variable for the file handle. This is preferred because it's not global, and it's easier to pass into subroutines. Also note I use the three parameter open. This way, you don't run into trouble if your file name begins with some strange character.
When you declare a variable, it's best to do it in the smallest scope that needs it. This way, variables go out of scope when you no longer need them. I moved the declarations of $killid and $killid_details inside the loop. That way, they no longer exist outside the loop.
You need to be more careful with your regular expressions. What if the phrase IDLE none appears elsewhere in your line? You only want it if it's at the end of the line.
Now, for the issues you had:
You need to chomp lines when you read them. In Perl, the NL at the end of the line is read in. The chomp command removes it.
Your logic was a bit strange. You set $killid if your line had a digit in it (I modified it to look only for digits at the beginning of the line). However, you simply went on your merry way even if $killid was not set. In your version, because you declared $killid and $killid_details outside of the loop, they kept their values from one iteration to the next, so a line such as 8870 was stored with the details left over from line 8869. Here I go to the next line if the line doesn't start with an ID.
You had a weird definition for your hash. You were storing a hash reference inside a hash. No need for that. I made it a simple hash.
Here it is:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie;
use Data::Dumper;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while (my $line = <$log_fh>) {
chomp $line;
next if not $line =~ /^(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
my $killid_details = $line;
$stat{$killid} = $killid_details;
}
}
close $log_fh;
say Dumper \%stat;

I think this is probably what you want:
while (<FH>) {
next unless /^(\d+).*R integration.*IDLE none/;
$stat{$1} = $_;
}
The regexp should be anchored to the beginning of the line, so you don't match a number anywhere on the line. There's no need to do multiple regexp matches, assuming the order of R integration and IDLE none are always as in the example. You need to use next when there's no match, so you don't process non-matching lines.
And I suspect that you just want to set the value of the hash entry to the string, not a reference to another hash.
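As a rough, self-contained sketch of the filtering both answers describe, the snippet below runs the three sample lines from the question through a single anchored match. The in-memory @log array stands in for the real log file and is only an assumption made so the example runs on its own.

```perl
use strict;
use warnings;

# Sample lines copied from the question; a stand-in for /root/log.txt.
my @log = (
    '4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none',
    '8869 unnamed p4-python R integration semiconductor-project-trunktip-turbolinuxclient 01:33:52 IDLE none',
    '8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52',
);

my %stat;
for my $line (@log) {
    # Keep only lines that start with an ID, contain "R integration",
    # and end with "IDLE none".
    next unless $line =~ /^(\d+)\s.*R\s+integration.*IDLE\s+none\s*$/;
    $stat{$1} = $line;
}

print "$_ => $stat{$_}\n" for sort keys %stat;
```

With these three lines only the 8869 entry should end up in %stat, because 4680 lacks "R integration" and 8870 lacks the trailing "IDLE none".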

Related

Perl: String in Substring or Substring in String

I'm working with DNA sequences in a file, and this file is formatted something like this, though with more than one sequence:
>name of sequence
EXAMPLESEQUENCEATCGATCGATCG
I need to be able to tell if a variable (which is also a sequence) matches any of the sequences in the file, and what the name of the sequence it matches, if any, is. Because of the nature of these sequences, my entire variable could be contained in a line of the file, or a line of the file could be a part of my variable.
Right now my code looks something like this:
use warnings;
use strict;
my $filename = "/users/me/file/path/file.txt";
my $exampleentry = "ATCG";
my $returnval = "The sequence does not match any in the file";
open file, "<$filename" or die "Can't find file";
my @Name;
my @Sequence;
my $inx = 0;
while (<file>){
$Name[$inx] = <file>;
$Sequence[$inx] = <file>;
$indx++;
}unless(index($Sequence[$inx], $exampleentry) != -1 || index($exampleentry, $Sequence[$inx]) != -1){
$returnval = "The sequence matches: ". $Name[$inx];
}
print $returnval;
However, even when I purposely set $exampleentry to a match from the file, I still get The sequence does not match any in the file. Also, when running the code, I get Use of uninitialized value in index at thiscode.pl line 14, <file> line 3002. as well as Use of uninitialized value within @Name in concatenation (.) or string at thiscode.pl line 15, <file> line 3002.
How can I perform this search?
I will assume that the purpose of this script is to determine if $exampleentry matches any record in the file file.txt. Here a record describes a DNA sequence and corresponds to three consecutive lines in the file. The variable $exampleentry matches the sequence if it matches the third line of the record. A match here means that either
$exampleentry is a substring of $line, or
$line is a substring of $exampleentry,
where $line refers to the corresponding line in the file.
First, consider the input file file.txt:
>name of sequence
EXAMPLESEQUENCEATCGATCGATCG
In the program you try to read these two lines using three calls to readline. Accordingly, the last call to readline will return undef since there are no more lines to read.
It therefore seems reasonable that the two last lines in file.txt are malformed, and the correct format should be:
>name of sequence
EXAMPLESEQUENCE
ATCGATCGATCG
If I now understand you correctly, I hope this could solve your problem:
use feature qw(say);
use strict;
use warnings;
my $filename = "file.txt";
my $exampleentry = "ATCG";
my $returnval = "The sequence does not match any in the file";
open (my $fh, '<', $filename ) or die "Can't find file: $!";
my @name;
my @sequence;
my $inx = 0;
while (<$fh>) {
chomp ($name[$inx] = <$fh>);
chomp ($sequence[$inx] = <$fh>);
if (
index($sequence[$inx], $exampleentry) != -1
|| index($exampleentry, $sequence[$inx]) != -1
) {
$returnval = "The sequence matches: ". $name[$inx];
last;
}
}
say $returnval;
Notes:
I have changed variable names to follow snake_case convention. For example the variable @Name is better written using all lower case as @name.
I changed the open() call to follow the new recommended 3-parameter style, see Don't Open Files in the old way for more information.
Used feature say instead of print
Added a chomp after each readline to avoid storing newline characters in the arrays.
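If the file really has two lines per record, as the question first shows (a header starting with > followed by one sequence line), a variant along these lines might be closer to what is needed. This is only a sketch under that assumption; find_matching_sequence is a hypothetical helper name, and the in-memory file is there just so the example runs without external data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name of the first record whose sequence
# contains $query or is contained in it, or undef if nothing matches.
sub find_matching_sequence {
    my ($fh, $query) = @_;
    my $name;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>(.*)/) {    # header line: remember the name
            $name = $1;
            next;
        }
        # Sequence line: test containment in both directions.
        if (index($line, $query) != -1 || index($query, $line) != -1) {
            return $name;
        }
    }
    return;
}

# Demo on an in-memory file with the record from the question.
my $data = ">name of sequence\nEXAMPLESEQUENCEATCGATCGATCG\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my $match = find_matching_sequence($fh, 'ATCG');
close $fh;

print defined $match
    ? "The sequence matches: $match\n"
    : "The sequence does not match any in the file\n";
```

Reading one line per iteration and deciding by the leading > avoids calling readline several times inside the loop, which is what produced the undefined values in the original code.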

Adding custom header to specific files in a directory

I would like to add a unique one-line header to each FOCUS*.tsv file in a specified directory. After that, I would like to combine all of these files into one file.
First I tried the sed command:
my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`; print $cmd9;
It looked like it worked but after I’ve combined all of these files into one file in the next section of the code, the inserted row was listed four times for each file.
I’ve tried the following Perl script to accomplish the same but it deleted the content of the file and only prints out the added header.
I’m looking for the simplest way to accomplish what I’m looking for.
Here is what I’ve tried.
#!perl
use strict;
use warnings;
use Tie::File;
my $home="/data/";
my $tsv_directory = $home."test_all_runs/".$ARGV[0];
my $tsvfiles = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";
my @run_directory = ();
@run_directory = split /\//, $tsv_directory;
print "The run directory is #############".$run_directory[3]."\n";
my $cmd = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`; #print "$cmd";
my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles"; #print "$cmda";
my @tsvfiles =();
#this code opens the vcf_files.txt file and passes each line into an array for individual manipulation
open(TXT2, "$tsvfiles");
while (<TXT2>){
push (@tsvfiles, $_);
}
close(TXT2);
foreach (@tsvfiles){
chop($_);
}
#this loop works fine
for my $tsv_file (@tsvfiles){
open my $in, '>', $tsv_file or die "Can't write new file: $!";
open my $out, '>', "$tsv_file.new" or die "Can't write new file: $!";
$tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
my $sample_id = $1;
#print "The sample ID is ############## $sample_id\n";
my $headerline = $run_directory[3]."/".$sample_id;
print $out $headerline;
while( <$in> ) {
print $out $_;
}
close $out;
close $in;
unlink($tsv_file);
rename("$tsv_file.new", $tsv_file);
}
Thank you
Apparently, the wrong '>' when opening the file for reading was the problem and it got solved.
However, I'd like to make a few comments on some of the rest of the code.
The list of files is built by running external ls redirected to a file, then reading this file into an array. However, that is exactly the job of glob and all of that is replaced by
my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";
Then you don't need the chomp either, and the chop that is used would actually hurt, since it removes the last character whatever it is, not only the newline (or really $/). If you are removing the line ending, use chomp.
To extract a match and assign it, a common idiom is
my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine.tsv$|
or die "Can't extract Sample ID from $tsv_file";
Note that I put the file name in the message instead of $!. A failed pattern match does not set $!, so printing it there would not tell us what went wrong.
The unlink and rename appear to be overwriting one file with another. You can do that by using move from the core module File::Copy
use File::Copy qw(move);
move ($tsv_file_new, $tsv_file)
or die "Can't move $tsv_file_new to $tsv_file: $!";
which renames the .new file to $tsv_file, overwriting it (here $tsv_file_new stands for "$tsv_file.new").
As for how the files need to be combined, more precise explanation would be needed.
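Putting the pieces above together (glob for the file list, opening the original for reading, and move to replace it), a minimal sketch of the header insertion could look like this. The helper name prepend_header, the scratch directory, the demo file and the "run_dir" prefix are all made up for illustration; the real script would take them from its own arguments.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

# Hypothetical helper: writes $header as the first line of $file by
# copying through "$file.new" and moving that over the original.
sub prepend_header {
    my ($file, $header) = @_;
    open my $in,  '<', $file       or die "Can't read $file: $!";
    open my $out, '>', "$file.new" or die "Can't write $file.new: $!";
    print {$out} "$header\n";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Can't close $file.new: $!";
    move("$file.new", $file) or die "Can't move $file.new to $file: $!";
}

# Demo in a scratch directory (file name and contents are invented).
my $dir  = tempdir(CLEANUP => 1);
my $demo = "$dir/FOCUS_sample1-oncomine.tsv";
open my $fh, '>', $demo or die "Can't create $demo: $!";
print {$fh} "gene\tvalue\nTP53\t1\n";
close $fh;

for my $tsv_file (glob "$dir/FOCUS*.tsv") {
    my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine\.tsv$|
        or die "Can't extract Sample ID from $tsv_file";
    prepend_header($tsv_file, "run_dir/$sample_id");
}

open $fh, '<', $demo or die "Can't read $demo: $!";
my @lines = <$fh>;
close $fh;
print @lines;
```

Because the original is opened with '<' and only replaced after the new file has been written and closed, the existing content is preserved below the header.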

Errors in declaration when trying to parse a csv file

I'm trying to parse a CSV file that is formatted like this:
dog cats,yellow blue tomorrow,12445
birds,window bank door,-novalue-
birds,window door,5553
aspirin man,red,567
(there is no value where -novalue- is written)
use strict;
use warnings;
my $filename = 'in.txt';
my $filename2 = 'out.txt';
open(my $in, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
my $word = "";
while (my $row = <$in>) {
chomp $row;
my @fields = split(/,/,$row);
#Save the first word of the second column
($word) = split(/\s/,$fields[1]);
if ($word eq 'importartWord')
{
printf $out "$fields[0]".';'."$word".';'."$fields[2]";
}
else #keep as it was
{
printf $out "$fields[0]".';'."$fields[1]".';'."$fields[2]";
}
Use of uninitialized value $word in string ne at prueba7.pl line 22, <$in> line 10.
No matter where I define $word I cannot stop receiving that error and can't understand why. I think I have initialized $word correctly. I would really appreciate your help here.
Please, if you are going to suggest using Text::CSV, post a working code example, since I haven't been able to apply it for the purpose I have explained here. That's the reason I ended up writing the above code.
P.S.:
Because I know you are going to ask for my previous code using Text::CSV, here it is:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ';', binary => 1 }) or
die "Cannot use CSV: ".Text::CSV->error_diag ();
#directory where esc_prim2.csv is located
my $file = 'C:\Users\Sergio\Desktop\GIS\perl\esc_prim2.csv';
my $sal = 'C:\Users\Sergio\Desktop\GIS\perl\esc_prim3.csv';
open my $data, "<:encoding(utf8)", "$file" or die "$file: $!";
open my $out, ">:encoding(utf8)", "$sal" or die "$sal: $!";
$csv->eol ("\r\n");
#initializing variables
my $row = "";
my $word = "";
my $validar = 0;
my $line1 = "";
my @mwords = [""];#Just a try to initialize mwords... doesn't work, error keeps showing
#save the first line with field names on the other file
$line1 = <$data>;
$csv->parse($line1);
my @fields = $csv->fields();
$csv->print($out,[$fields[0], $fields[1], $fields[2]]);
while ($row = <$data>) {
if ($csv->parse($row)) {
@fields = $csv->fields();
#save first word of the field's second element
@mwords = split (/\s/, $fields[1]);
#keep the first one
$word = $mwords[0];
printf($mwords[0]);
#if that word is not one of SAN, EL y LA... writes a line in the new file with the updated second field.
$validar = ($word ne 'SAN') && ($word ne 'EL') && ($word ne 'LA');
if ($validar)
{
$csv->print($out,[$fields[0], $word, $fields[2]]);
}
else { #Saves the line in the new file as it was in the old one.
$csv->print($out,[$fields[0], $fields[1], $fields[2]]);
}
} else {#error processing row
warn "The row could not be processed\n";
}
}
close $data or die "$file: $!";
close $out or die "$sal: $!";
Here the line where $validar is declared brings the same error of "uninitialized value" although I did it.
I also tried the push @rows, $row; approach but I don't really know how to handle the $rows[$i] since they are references to arrays (pointers) and I know they can't be operated as variables... Couldn't find a working example on how to use them.
I think you're misunderstanding the error. It's not a problem with the declaration of the variable, but with the data that you're putting into the variable.
Use of uninitialized value
This means that you are trying to use a value that is undefined (not undeclared). That means you are using a variable that you haven't given a value.
You can get more details about the warning (and it's a warning, not an error) by adding use diagnostics to your code. You'll get something like this:
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
So, when you're populating $word, it's not getting a value. Presumably, that's because some lines in your input file have an empty record there.
I have no way of knowing whether or not that's a valid input for your program, so I can't really give any helpful suggestions on how to fix this.
The error message you provided ends with: line 22, <$in> line 10. Your question doesn't show line 10 of the data ($in), so this answer involves some speculation, but I'd say that the second field, $fields[1], of line 10 of in.txt is empty.
Consequently, this line: ($word) = split(/\s/,$fields[1]); leaves $word undefined. As a result, any later use of it, be it the ne operator (as displayed in the message) or anything else, is going to generate a warning.
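One way to guard against that, sketched here with plain split rather than Text::CSV, is to fall back to an empty string whenever the field is missing or blank. The helper name first_word and the sample rows are assumptions made for the example; whether an empty word is acceptable depends on what the program should do with such rows.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first whitespace-separated word of a
# field, or an empty string when the field is undefined or blank.
sub first_word {
    my ($field) = @_;
    return '' unless defined $field;
    my ($word) = split ' ', $field;    # split ' ' skips leading whitespace
    return defined $word ? $word : '';
}

my @rows = (
    'dog cats,yellow blue tomorrow,12445',
    'birds,,5553',      # empty second field
    'aspirin man',      # second field missing altogether
);

for my $row (@rows) {
    my @fields = split /,/, $row, -1;  # -1 keeps trailing empty fields
    my $word = first_word($fields[1]);
    print "[$word]\n";
}
```

With the defined checks in place, comparisons such as $word eq '...' no longer trigger the "uninitialized value" warning for rows like the second and third ones.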
As an aside - there's little point in interpolating a variable in a string on its own; instead of "$fields[0]", say $fields[0] unless you're going to put something else in there, like "$fields[0];". You may want to consider replacing
printf $out "$fields[0]".';'."$word".';'."$fields[2]";
with
printf $out $fields[0] . ';' . $word . ';' . $fields[2];
or
printf $out "$fields[0];$word;$fields[2]";
Of course, TMTOWTDI - so you may want to tell me to mind my own business instead. :-)

how to combine directory path in perl

I am having a perl script in which i am giving path to directory as input.
Directory has xml files inside it.
In my code i am iterating through all the xml files and creating absolute path for all xml files. Code is working fine.
#!/usr/bin/perl
use File::Spec;
$num_args = $#ARGV + 1;
if ($num_args != 1) {
print "\nUsage: $0 <input directory>\n";
exit;
}
my $dirPath = $ARGV[0];
opendir(DIR, $dirPath);
my @docs = grep(/\.xml$/,readdir(DIR));
foreach my $file (@docs)
{
my $abs_path = join("",$dir,$file);
print "absolute path is $abs_path";
}
Question i have here is,
joining $dirPath and $file with no separator which means that $dirPath must end in a "/". So is there any way or built in function in perl which take cares of this condition and replaces the join method.
All i want is not to worry about the separator "/". Even if script is called with path as "/test/dir_to_process" or "/test/dir_to_process/", i should be able to produce the correct absolute path to all xml files present without worrying about the separator.
Let me know if anyone has any suggestions.
Please take heed of the advice you are given. It is ridiculous to keep asking questions when comments and answers to previous posts are being ignored.
You must always use strict and use warnings at the top of every Perl program you write, and declare every variable using my. It isn't hard to do, and you will be reprimanded if you post code that doesn't have these measures in place.
You use the File::Spec module in your program but never make use of it. It is often easier to use File::Spec::Functions instead, which exports the methods provided by File::Spec so that there is no need to use the object-oriented call style.
catfile will correctly join a file (or directory) name to a path, doing the right thing if path separators are incorrect. This rewrite of your program works fine.
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions 'catfile';
if (@ARGV != 1) {
print "\nUsage: $0 <input directory>\n";
exit;
}
my ($dir_path) = @ARGV;
my $xml_pattern = catfile($dir_path, '*.xml');
while ( my $xml_file = glob($xml_pattern) ) {
print "Absolute path is $xml_file\n";
}
The answer is in the documentation for File::Spec, e.g., catfile:
$path = File::Spec->catfile( @directories, $filename );
or catpath:
$full_path = File::Spec->catpath( $volume, $directory, $file );
This will add the trailing slash if not there:
$dirPath =~ s!/*$!/!;
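To see how catfile copes with the separator, here is a small sketch that joins the same file name onto the directory written with and without a trailing slash. The directory and file names are just the examples from the question; nothing needs to exist on disk, since catfile only manipulates strings.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# The same directory, once without and once with a trailing slash.
my @variants = ('/test/dir_to_process', '/test/dir_to_process/');

my @paths = map { catfile($_, 'a.xml') } @variants;
print "$_\n" for @paths;

print "Both spellings give the same path\n" if $paths[0] eq $paths[1];
```

On a Unix-like system both calls should produce a path with exactly one separator between the directory and the file name, so the caller no longer has to care how the argument was typed.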

Why can't I print a very long string? [closed]

I'm writing a Perl script that searches a kml file and I need to print a very long line of latitude/longitude coordinates. The following script successfully finds the string I'm looking for, but just prints a blank line instead of the value of the string:
#!/usr/bin/perl
# Strips unsupported tags out of a QGIS-generated kml and writes a new one
$file = $ARGV[0];
# read existing kml file
open( INFO, $file ); # Open the file
@lines = <INFO>; # Read it into an array
close(INFO); # Close the file
#print @lines; # Print the array
$x = 0;
$coord_string = "<coordinates>";
# go through each line looking for above string
foreach $line (@lines) {
$x++;
if ( $x > 12 ) {
if ( $line =~ $coord_string ) {
$thisCooordString = $line;
$var_startX = $x;
print "Found coord string: $thisCoordString\n";
print " on line: $var_startX\n";
}
}
}
The file that it's reading is here
and this is the output I get:
-bash-4.3$ perl writekml.pl HUC8short.kml
Found coord string:
on line: 25
Found coord string:
on line: 38
Is there some cap on the maximum length that a string can be in Perl? The longest line in this file is ~151,000 characters long. I've verified that all the lines in the file are read successfully.
You've misspelled the variable name (two os vs three os):
$thisCooordString = $line;
...
print "Found coord string: $thisCoordString\n";
Add use strict and use warnings to your script to prevent these sorts of errors.
Always include use strict and use warnings in EVERY perl script.
If you had done this, you would've gotten the following error message to clue you into your bug:
Global symbol "$thisCoordString" requires explicit package name
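The effect can be reproduced in isolation. The sketch below compiles a tiny snippet containing the same misspelling inside a string eval, purely so that the compile-time failure can be captured and printed instead of aborting the demonstration; the snippet text itself is made up for the example.

```perl
use strict;
use warnings;

# The snippet is compiled at run time with a string eval so that the
# compile-time failure can be captured instead of aborting this script.
my $snippet = <<'END_SNIPPET';
use strict;
my $thisCooordString = "some coordinates";
print "Found coord string: $thisCoordString\n";
1;
END_SNIPPET

my $ok    = eval $snippet;
my $error = $@;

print $ok ? "compiled cleanly\n" : "strict refused to compile it:\n$error";
```

Because the misspelled variable was never declared with my, the snippet does not compile under strict, and the message names the offending variable.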
Adding these pragmas and simplifying your code results in the following:
#!/usr/bin/env perl
# Strips unsupported tags out of a QGIS-generated kml and writes a new one
use strict;
use warnings;
local @ARGV = 'HUC8short.kml';
while (<>) {
if ( $. > 12 && /<coordinates>/ ) {
print "Found coord string: $_\n";
print " on line: $.\n";
}
}
You can even try with perl one liners as shown below:
Perl One liner on windows command prompt:
perl -lne "if($_ =~ /<coordinates>/is && $. > 12) { print \"Found coord string : $_ \n\"; print \" on line : $. \n\";}" HUC8short.kml
Perl One liner on unix prompt:
perl -lne 'if($_ =~ /<coordinates>/is && $. > 12) { print "Found coord string : $_ \n"; print " on line : $. \n";}' HUC8short.kml
As others have pointed out, you need. No, you MUST always use use strict; and use warnings;.
If you used strict, you would have gotten an error message telling you that your variable $thisCoordString or $thisCooordString was not declared with my. Using warnings would have warned you that you're printing an undefined string.
Your whole program is written in a very old (and obsolete) Perl programming style. This is the type of program writing I would have done back in Perl 3.0 days about two decades ago. Perl has changed quite a bit since then, and using the newer syntax will allow you to write easier to read and maintain programs.
Here's your basic program written in a more modern syntax:
#! /usr/bin/env perl
#
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (using undefined variables)
use feature qw(say); # Lets you use 'say' instead of 'print' (No \n needed)
use autodie; # Program automatically dies on bad file operations
use IO::File; # Lots of nice file activity.
# Make Constants constant
use constant {
COORD_STRING => qr/<coordinates>/, # qr is a regular expression quoted string
};
my $file = shift;
# read existing kml file
open my $fh, '<', $file; # Three part open with scalar filehandle
while ( my $line = <$fh> ) {
chomp $line; # Always "chomp" on read
next unless $line =~ COORD_STRING; #Skip non-coord lines
say "Found coord string: $line";
say " on line: " . $fh->input_line_number;
}
close $fh;
Many Perl developers are self taught. There is nothing wrong with that, but many people learn Perl from looking at other people's obsolete code, or from reading old Perl manuals, or from developers who learned Perl from someone else back in the 1990s.
So, get some books on Modern Perl and learn the new syntax. You might also want to learn about things like references which can lead you to learn Object Oriented Perl. References and OO Perl will allow you to write longer and more complex programs.

Resources