Why can't I print a very long string? [closed]

I'm writing a Perl script that searches a kml file and I need to print a very long line of latitude/longitude coordinates. The following script successfully finds the string I'm looking for, but just prints a blank line instead of the value of the string:
#!/usr/bin/perl
# Strips unsupported tags out of a QGIS-generated kml and writes a new one
$file = $ARGV[0];
# read existing kml file
open( INFO, $file ); # Open the file
@lines = <INFO>; # Read it into an array
close(INFO); # Close the file
#print @lines; # Print the array
$x = 0;
$coord_string = "<coordinates>";
# go through each line looking for above string
foreach $line (@lines) {
    $x++;
    if ( $x > 12 ) {
        if ( $line =~ $coord_string ) {
            $thisCooordString = $line;
            $var_startX = $x;
            print "Found coord string: $thisCoordString\n";
            print " on line: $var_startX\n";
        }
    }
}
The file that it's reading is here
and this is the output I get:
-bash-4.3$ perl writekml.pl HUC8short.kml
Found coord string:
on line: 25
Found coord string:
on line: 38
Is there some cap on the maximum length that a string can be in Perl? The longest line in this file is ~151,000 characters long. I've verified that all the lines in the file are read successfully.

You've misspelled the variable name (two o's vs. three o's):
$thisCooordString = $line;
...
print "Found coord string: $thisCoordString\n";
Add use strict and use warnings to your script to prevent these sorts of errors.
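A minimal sketch of the corrected loop, assuming the only real fix needed is one consistent spelling plus the pragmas (the three-argument open and the my declarations here are incidental modernizations, not part of the original script):
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift;
open my $info, '<', $file or die "Can't open $file: $!";
my @lines = <$info>;
close $info;

my $x = 0;
foreach my $line (@lines) {
    $x++;
    next unless $x > 12 && $line =~ /<coordinates>/;
    my $thisCoordString = $line;   # one spelling, used consistently below
    print "Found coord string: $thisCoordString";
    print " on line: $x\n";
}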

Always include use strict and use warnings in EVERY perl script.
If you had done this, you would've gotten the following error message to clue you into your bug:
Global symbol "$thisCoordString" requires explicit package name
Adding these pragmas and simplifying your code results in the following:
#!/usr/bin/env perl
# Strips unsupported tags out of a QGIS-generated kml and writes a new one
use strict;
use warnings;
local @ARGV = 'HUC8short.kml';
while (<>) {
    if ( $. > 12 && /<coordinates>/ ) {
        print "Found coord string: $_\n";
        print " on line: $.\n";
    }
}

You can even try with Perl one-liners as shown below:
Perl one-liner at the Windows command prompt:
perl -lne "if($_ =~ /<coordinates>/is && $. > 12) { print \"Found coord string : $_ \n\"; print \" on line : $. \n\";}" HUC8short.kml
Perl one-liner at a Unix prompt:
perl -lne 'if($_ =~ /<coordinates>/is && $. > 12) { print "Found coord string : $_ \n"; print " on line : $. \n";}' HUC8short.kml

As others have pointed out, you need to (no, you MUST) always use use strict; and use warnings;.
If you used strict, you would have gotten an error message telling you that your variable $thisCoordString or $thisCooordString was not declared with my. Using warnings would have warned you that you're printing an undefined string.
Your whole program is written in a very old (and obsolete) Perl programming style. This is the type of program I would have written back in the Perl 3.0 days, about two decades ago. Perl has changed quite a bit since then, and using the newer syntax will let you write programs that are easier to read and maintain.
Here's your basic program written in a more modern syntax:
#! /usr/bin/env perl
#
use strict;          # Lets you know when you misspell variable names
use warnings;        # Warns of issues (using undefined variables, etc.)
use feature qw(say); # Lets you use 'say' instead of 'print' (no \n needed)
use autodie;         # Program automatically dies on bad file operations
use IO::File;        # Lots of nice file activity.
# Make Constants constant
use constant {
    COORD_STRING => qr/<coordinates>/, # qr is a regular expression quoted string
};
my $file = shift;
# read existing kml file
open my $fh, '<', $file; # Three part open with scalar filehandle
while ( my $line = <$fh> ) {
    chomp $line;                        # Always "chomp" on read
    next unless $line =~ COORD_STRING;  # Skip non-coord lines
    say "Found coord string: $line";
    say " on line: " . $fh->input_line_number;
}
close $fh;
Many Perl developers are self-taught. There is nothing wrong with that, but many people learn Perl by looking at other people's obsolete code, from reading old Perl manuals, or from developers who learned Perl from someone else back in the 1990s.
So, get some books on Modern Perl and learn the new syntax. You might also want to learn about things like references which can lead you to learn Object Oriented Perl. References and OO Perl will allow you to write longer and more complex programs.

Related

Adding custom header to specific files in a directory

I would like to add a unique one-line header that pertains to each FOCUS*.tsv file in a specified directory. After that, I would like to combine all of these files into one file.
First I tried the sed command.
my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`; print $cmd9;
It looked like it worked, but after I combined all of these files into one file in the next section of the code, the inserted row was listed four times for each file.
I've tried the following Perl script to accomplish the same, but it deleted the content of the file and only printed out the added header.
I’m looking for the simplest way to accomplish what I’m looking for.
Here is what I’ve tried.
#!perl
use strict;
use warnings;
use Tie::File;
my $home="/data/";
my $tsv_directory = $home."test_all_runs/".$ARGV[0];
my $tsvfiles = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";
my @run_directory = ();
@run_directory = split /\//, $tsv_directory;
print "The run directory is ##############".$run_directory[3]."\n";
my $cmd = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`; #print "$cmd";
my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles"; #print "$cmda";
my @tsvfiles = ();
#this code opens the vcf_files.txt file and passes each line into an array for individual manipulation
open(TXT2, "$tsvfiles");
while (<TXT2>){
    push (@tsvfiles, $_);
}
close(TXT2);
foreach (@tsvfiles){
    chop($_);
}
#this loop works fine
for my $tsv_file (@tsvfiles){
    open my $in, '>', $tsv_file or die "Can't write new file: $!";
    open my $out, '>', "$tsv_file.new" or die "Can't write new file: $!";
    $tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
    my $sample_id = $1;
    #print "The sample ID is ############## $sample_id\n";
    my $headerline = $run_directory[3]."/".$sample_id;
    print $out $headerline;
    while( <$in> ) {
        print $out $_;
    }
    close $out;
    close $in;
    unlink($tsv_file);
    rename("$tsv_file.new", $tsv_file);
}
Thank you
Apparently, the wrong '>' when opening the file for reading was the problem and it got solved.
However, I'd like to make a few comments on some of the rest of the code.
The list of files is built by running external ls redirected to a file, then reading this file into an array. However, that is exactly the job of glob and all of that is replaced by
my #tsvfiles = glob "$tsv_directory/FOCUS*.tsv";
Then you don't need the chomp either, and the chop that is used would actually hurt since it removes the last character, not only the newline (or really $/).
Use of chop is probably not what you want. If you are removing the linefeed ($/), use chomp.
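A quick illustration of the difference (a sketch; the filename string is just a stand-in for a line read from a file):
use strict;
use warnings;

my $with_newline    = "FOCUS_sample-oncomine.tsv\n";
my $without_newline = "FOCUS_sample-oncomine.tsv";   # e.g. a last line with no trailing newline

chomp $with_newline;     # removes the trailing newline ($/) only
chop  $without_newline;  # removes the last character regardless: "FOCUS_sample-oncomine.ts"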
To extract a match and assign it, a common idiom is
my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine.tsv$|
or die "Can't extract Sample ID: $!";
Note that I also added $!, to actually print the error. Otherwise we just don't know what it was.
The unlink and rename appear to be overwriting one file with another. You can do that by using move from the core module File::Copy
use File::Copy qw(move);
move ("$tsv_file.new", $tsv_file)
    or die "Can't move $tsv_file.new to $tsv_file: $!";
which renames the _new into $tsv_file, so overwriting it.
As for how the files need to be combined, more precise explanation would be needed.
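If plain concatenation is all that is wanted, a minimal sketch might look like this (the combined.tsv output name is made up for illustration; the directory is built the same way as in the question's script):
use strict;
use warnings;

my $tsv_directory = "/data/test_all_runs/" . shift;   # built as in the question

my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";

open my $combined, '>', "$tsv_directory/combined.tsv"   # hypothetical output name
    or die "Can't write combined file: $!";

for my $tsv_file (@tsvfiles) {
    open my $in, '<', $tsv_file or die "Can't read $tsv_file: $!";
    print {$combined} $_ while <$in>;
    close $in;
}
close $combined;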

Errors in declaration when trying to parse a csv file

I'm trying to parse a CSV file that is formatted like this:
dog cats,yellow blue tomorrow,12445
birds,window bank door,-novalue-
birds,window door,5553
aspirin man,red,567
(there is no value where -novalue- is written)
use strict;
use warnings;
my $filename = 'in.txt';
my $filename2 = 'out.txt';
open(my $in, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
my $word = "";
while (my $row = <$in>) {
    chomp $row;
    my @fields = split(/,/,$row);
    #Save the first word of the second column
    ($word) = split(/\s/,$fields[1]);
    if ($word eq 'importartWord')
    {
        printf $out "$fields[0]".';'."$word".';'."$fields[2]";
    }
    else #keep as it was
    {
        printf $out "$fields[0]".';'."$fields[1]".';'."$fields[2]";
    }
Use of uninitialized value $word in string ne at prueba7.pl line 22, <$in> line 10.
No matter where I define $word I cannot stop receiving that error and can't understand why. I think I have initialized $word correctly. I would really appreciate your help here.
Please, if you are going to suggest using Text::CSV, post a working code example, since I haven't been able to apply it for the purpose I have explained here. That's the reason I ended up writing the above code.
PS:
Because I know you are going to ask for my previous code using Text::CSV, here it is:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ';', binary => 1 }) or
die "Cannot use CSV: ".Text::CSV->error_diag ();
# directory where esc_prim2.csv is located
my $file = 'C:\Users\Sergio\Desktop\GIS\perl\esc_prim2.csv';
my $sal = 'C:\Users\Sergio\Desktop\GIS\perl\esc_prim3.csv';
open my $data, "<:encoding(utf8)", "$file" or die "$file: $!";
open my $out, ">:encoding(utf8)", "$sal" or die "$sal: $!";
$csv->eol ("\r\n");
#initializing variables
my $row = "";
my $word = "";
my $validar = 0;
my $line1 = "";
my @mwords = [""]; # Just a try to initialize mwords... doesn't work, error keeps showing
#save the first line with field names on the other file
$line1 = <$data>;
$csv->parse($line1);
my @fields = $csv->fields();
$csv->print($out,[$fields[0], $fields[1], $fields[2]]);
while ($row = <$data>) {
    if ($csv->parse($row)) {
        @fields = $csv->fields();
        # save first word of the field's second element
        @mwords = split (/\s/, $fields[1]);
        # keep the first one
        $word = $mwords[0];
        printf($mwords[0]);
        # if that word is not one of SAN, EL and LA... writes a line in the new file with the updated second field.
        $validar = ($word ne 'SAN') && ($word ne 'EL') && ($word ne 'LA');
        if ($validar)
        {
            $csv->print($out,[$fields[0], $word, $fields[2]]);
        }
        else { # Saves the line in the new file as it was in the old one.
            $csv->print($out,[$fields[0], $fields[1], $fields[2]]);
        }
    } else { # error processing row
        warn "The row could not be processed\n";
    }
}
close $data or die "$file: $!";
close $out or die "$sal: $!";
Here, the line where $validar is assigned brings the same "uninitialized value" error, even though I did initialize it.
I also tried the push @rows, $row; approach, but I don't really know how to handle the $rows[$i] entries, since they are references to arrays (pointers) and I know they can't be operated on as plain variables... I couldn't find a working example of how to use them.
I think you're misunderstanding the error. It's not a problem with the declaration of the variable, but with the data that you're putting into the variable.
Use of uninitialized value
This means that you are trying to use a value that is undefined (not undeclared). That means you are using a variable that you haven't given a value.
You can get more details about the warning (and it's a warning, not an error) by adding use diagnostics to your code. You'll get something like this:
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
So, when you're populating $word, it's not getting a value. Presumably, that's because some lines in your input file have an empty record there.
I have no way of knowing whether or not that's a valid input for your program, so I can't really give any helpful suggestions on how to fix this.
The error message you provided ends with line 22, <$in> line 10, but your question doesn't show line 10 of the data ($in), requiring some speculation in this answer. I'd say that the second field, $fields[1], of line 10 of in.txt is empty.
Consequently, this line: ($word) = split(/\s/,$fields[1]); is causing $word to be undefined. As a result, some use of it later, be it the ne operator (as displayed in the message) or anything else, is going to generate that warning.
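One simple guard (a sketch, assuming rows with an empty second field should just be passed through unchanged; $in is the input handle from the question's script) is to default $word to an empty string and only split when the field actually has content:
while (my $row = <$in>) {
    chomp $row;
    my @fields = split /,/, $row;

    # default to '' so later eq/ne comparisons never see undef;
    # only split the second field when it actually has content
    my $word = '';
    ($word) = split /\s/, $fields[1] if defined $fields[1] && length $fields[1];
}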
As an aside - there's little point in interpolating a variable in a string on its own; instead of "$fields[0]", say $fields[0] unless you're going to put something else in there, like "$fields[0];". You may want to consider replacing
printf $out "$fields[0]".';'."$word".';'."$fields[2]";
with
printf $out $fields[0] . ';' . $word . ';' . $fields[2];
or
printf $out "$fields[0];$word;$fields[2]";
Of course, TMTOWTDI - so you may want to tell me to mind my own business instead. :-)

perl shell command variable error

I am trying the following code in one of my Perl scripts and getting an error. How do I execute the following shell command and store the result in a variable?
#!/usr/bin/perl -w
my $p = $( PROCS=`echo /proc/[0-9]*|wc -w|tr -d ' '`; read L1 L2 L3 DUMMY < /proc/loadavg ; echo ${L1}:${L2}:${L3}:${PROCS} );
print $p;
Error:
./foo.pl
Bareword found where operator expected at /tmp/foo.pl line 3, near "$( PROCS"
(Missing operator before PROCS?)
syntax error at /tmp/foo.pl line 3, near "$( PROCS"
Unterminated <> operator at /tmp/foo.pl line 3.
What is wrong?
This:
my $p = $( PROCS=`echo /proc/[0-9]*|wc -w|tr -d ' '`; read L1 L2 L3 DUMMY < /proc/loadavg ; echo ${L1}:${L2}:${L3}:${PROCS} );
Isn't perl. It's how you'd execute a command in bash.
To run a command in perl you can do one of the following (a short sketch of each appears after this list):
use system.
put your command in backticks
qx (quote-execute): http://perldoc.perl.org/perlop.html#Quote-Like-Operators
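A quick sketch of the three forms (the uptime command here is just a stand-in for whatever you actually want to run):
use strict;
use warnings;

# 1. system: runs the command, returns its exit status, output goes straight to STDOUT
system('uptime') == 0 or warn "uptime failed: $?";

# 2. backticks: capture the command's output
my $out = `uptime`;

# 3. qx//: same as backticks, just a different quoting syntax
my $out2 = qx(uptime);

print $out;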
However, you're enumerating a directory there, wordcounting, tr-ing and reading. So you don't actually need to do all that using a shell command. And indeed, I'd discourage you from doing so, because that's just a way to make a mess with no productive benefit.
Looks like what you're after as an end result is the 3 load average samples and a count of number of processes. Is that right?
In which case:
my $proc_count = scalar ( () = glob ( "/proc/[0-9]*" ));
open ( my $la, "<", "/proc/loadavg" ) or warn $!;
print join ( ":", split ( /\s+/, <$la> ), $proc_count ),"\n";
Something like that, anyway.
Simply printing a shell command in your Perl script won't actually execute it. You have to tell Perl that it's an external command, which you can do with system:
use strict;
use warnings;
my $command = q{
PROCS=`echo /proc/[0-9]*|wc -w|tr -d ' '`;
read L1 L2 L3 DUMMY < /proc/loadavg;
echo ${L1}:${L2}:${L3}:${PROCS}
};
system($command);
(Note that you should put use strict; use warnings; at the top of every Perl script you write.)
However, it's generally better to use native Perl functionality instead of system. All you're doing is reading from files, which Perl is perfectly capable of doing:
use strict;
use warnings;
use 5.010;
my @procs = glob '/proc/[0-9]*';
my $file = '/proc/loadavg';
open my $fh, '<', $file or die "Failed to open '$file': $!";
my $load = <$fh>;
say(join ':', (split ' ', $load)[0..2], scalar @procs);
Even better might be to use the Proc::ProcessTable module, which provides a consistent interface to the /proc filesystem across different flavors of *nix. It got some bad reviews early on but is supposedly getting bugfixes now; I haven't used it myself but you might take a look.
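For what it's worth, counting processes with that module might look roughly like this (a sketch based on the module's documented new/table interface; not tested here):
use strict;
use warnings;
use Proc::ProcessTable;

my $pt    = Proc::ProcessTable->new;
my $count = scalar @{ $pt->table };   # table() returns an arrayref of process objects
print "$count processes\n";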

Perl if condition parameters

I have a log file which looks like below:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip-turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
There are many such entries in the same log file; some contain IDLE none at the end while some do not. I would like to retain the ones having "R integration" and "IDLE none" in a hash and ignore the rest. I have tried the following code but am not getting the desired results.
#!/usr/bin/perl
open (FH,'/root/log.txt');
my %stat;
my ($killid, $killid_details);
while ($line = <FH>) {
    if ($line =~ m/(\d+)/){
        $killid = $1;
    }
    if ($line =~ /R integration/ and $line =~ /IDLE none/){
        $killid_details = $line;
    }
    $stat{$killid} = {
        killid => $killid_details
    };
}
close (FH);
I am getting all the lines with R integration (for example, I get lines 8869 and 8870), which should not be the case, as 8870 should be ignored.
Please inform me of any mistake. I am still learning Perl. Thank you.
I made a few changes in your program:
Always put in use strict; and use warnings;. These will catch 90% of your errors. (Although not this time).
When you open a file, you need to either use or die as in open my $fh, "<", $file or die qq(blah, blah, blah); or use use autodie; (which is now preferred). In your case, if the file didn't open, your program would have continued merrily along. You need to test whether or not the open statement worked.
Note my open statement. I use a variable for the file handle. This is preferred because it's not global, and it's easier to pass into subroutines. Also note I use the three parameter open. This way, you don't run into trouble if your file name begins with some strange character.
When you declare a variable, it's best to do it in scope. This way, variables go out of scope when you no longer need them. I moved the declarations of $killid and $killid_details inside the loop. That way, they no longer exist outside the loop.
You need to be more careful with your regular expressions. What if the phrase IDLE none appears elsewhere in your line? You only want it if it's at the end of the line.
Now, for the issues you had:
You need to chomp lines when you read them. In Perl, the NL at the end of the line is read in. The chomp command removes it.
Your logic was a bit strange. You set $killid if your line had a digit in it (I modified it to look only for digits at the beginning of the line). However, you simply went on your merry way even if killid was not set. In your version, because you declared $killid outside of the loop, it had a value in each loop. Here I go to the next statement if $killid isn't defined.
You had a weird definition for your hash. You were defining a reference hash within a hash. No need for that. I made it a simple hash.
Here it is:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie;
use Data::Dumper;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while (my $line = <$log_fh>) {
    chomp $line;
    next if not $line =~ /^(\d+)\s+/;
    my $killid = $1;
    if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
        my $killid_details = $line;
        $stat{$killid} = $killid_details;
    }
}
close $log_fh;
say Dumper \%stat;
I think this is probably what you want:
while (<FH>) {
    next unless /^(\d+).*R integration.*IDLE none/;
    $stat{$1} = $_;
}
The regexp should be anchored to the beginning of the line, so you don't match a number anywhere on the line. There's no need to do multiple regexp matches, assuming the order of R integration and IDLE none are always as in the example. You need to use next when there's no match, so you don't process non-matching lines.
And I suspect that you just want to set the value of the hash entry to the string, not a reference to another hash.

Shell Script to parse/retrieve a string found after another string/match

The shell script will be passed a string of arguments. The position of the key/value I am looking to parse out may change over time, i.e. it may come before or after another key at any time so parsing between two keys wouldn't be an option.
I am looking to parse the domain key out of a string like this:
maxpark 0 maxsub n domain sample.foo maxlst n max_defer_fail_percentage user oli force no_cache_update 0 maxpop n maxaddon 0 locale en contactemail
The key would be "domain" the value would be "sample.foo". The domain key could have more than one '.' in it so I would need to grab the entire domain key.
I am not the best with regular expressions but I imagine using 'sed' is what I'm going to need to do.
I am accessing this full string using $*. If I could simply reference the key by accessing $DOMAIN, that would be great, but since my only option is to access based on position ($3), and the position could change, that isn't an option.
Solved the problem using Perl.
#!/usr/bin/perl -w
use strict;
my %OPTS = @ARGV;
open(FILE, "</var/named/$OPTS{'domain'}.db") || die "File not found";
my @lines = <FILE>;
close(FILE);
my @newlines;
foreach(@lines) {
    $_ =~ s/$LOCAL_IP/$PUBLIC_IP/g;
    push(@newlines,$_);
}
open(FILE, ">/var/named/$OPTS{'domain'}.db") || die "File not found";
print FILE @newlines;
close(FILE);
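For context, the my %OPTS = @ARGV; line works because assigning a list to a hash pairs the elements up as key/value; a tiny sketch (the argument list here is hypothetical, shortened from the example string above):
use strict;
use warnings;

# called as: perl script.pl maxpark 0 domain sample.foo maxpop n
my %OPTS = @ARGV;          # pairs up as: maxpark => 0, domain => 'sample.foo', maxpop => 'n'
print "$OPTS{domain}\n";   # prints "sample.foo"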
If you do have perl, just use this one-liner from your shell script.
domain=$( echo $* | perl -ne '/domain\s([^\s]+)\s/ and print "$1"' )
Or if you'd rather just do it with sed:
domain=$( echo $* | sed 's/.*\<domain \([^ ]\+\).*/\1/' )
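And if the value is ever needed inside a Perl script rather than a shell variable, the same kind of regex works directly (a sketch, using a shortened version of the sample string):
use strict;
use warnings;

my $args = 'maxpark 0 maxsub n domain sample.foo maxpop n';   # shortened sample input
my ($domain) = $args =~ /\bdomain\s+(\S+)/;
print "$domain\n";   # prints "sample.foo"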
