Perl string replacements of file paths in text file

I'm trying to match file paths in a text file and replace them with their share file path, e.g. I want to replace the string "X:\Group_14\Project_Security" with "\\Project_Security$".
I'm having a problem getting my head around the syntax, as I have used a backslash (\) to escape another backslash (\\), but this does not seem to work for matching a path in the text file.
open INPUT, '< C:\searchfile.txt';
open OUTPUT, '> C:\logsearchfiletest.txt';

@lines = <INPUT>;

%replacements = (
    "X:\\Group_14\\Project_Security" => "\\\\Project_Security\$",
    ...
    (More Paths as above)
    ...
);

$pattern = join '|', keys %replacements;

for (@lines) {
    s/($pattern)/@{[$replacements{$1}]}/g;
    print OUTPUT;
}
I'm not totally sure what's happening, as "\\\\Project_Security\$" correctly comes out as "\\Project_Security$".
So I think the issue lies with "X:\\Group_14\\Project_Security" not evaluating to
"X:\Group_14\Project_Security" correctly, and therefore not matching within the text file?
Any advice on this would be appreciated, Cheers.

If all the file paths and replacements are in a similar format to your example, you should just be able to do the following rather than using a hash for looking up replacements:
for my $line (@lines) {
    $line =~ s/.+\\(.+)$/\\\\$1\$/;
    print OUTPUT $line;
}
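A quick way to sanity-check that substitution against the example path from the question (a standalone snippet, separate from the script above):
my $path = 'X:\Group_14\Project_Security';
(my $share = $path) =~ s/.+\\(.+)$/\\\\$1\$/;
print "$share\n";    # prints \\Project_Security$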

Some notes:
Always use the 3-argument open
Always check for errors on open, print, or close
Sometimes it's easier to use a loop than clever coding
Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# place file names in variables so they are easily changed
my $search_file = 'C:\\searchfile.txt';
my $log_search_file = 'C:\\logsearchfiletest.txt';
my %replacements = (
    "X:\\Group_14\\Project_Security" => "\\\\Project_Security\$",
    # etc
);
# use the 3-argument open as a security precaution
open my $search_fh, '<', $search_file or die "could not open $search_file: $OS_ERROR\n";
open my $log_search_fh, '>', $log_search_file or die "could not open $log_search_file: $OS_ERROR\n";
while( my $line = <$search_fh> ){

    # scan for replacements
    while( my ( $pattern, $replacement ) = each %replacements ){
        $line =~ s/\Q$pattern\E/$replacement/g;
    }

    print {$log_search_fh} $line or die "could not print to $log_search_file: $OS_ERROR\n";
}
# always close the file handles and always check for errors
close $search_fh or die "could not close $search_file: $OS_ERROR\n";
close $log_search_fh or die "could not close $log_search_file: $OS_ERROR\n";
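One detail worth calling out in that loop: \Q...\E quotes regex metacharacters in $pattern, so the backslashes in each hash key are matched as literal characters rather than being read as regex escapes. A tiny illustration:
my $pattern = 'X:\Group_14';    # literal backslash in the path
print "matched\n" if 'X:\Group_14' =~ /\Q$pattern\E/;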

I see you've posted my rusty Perl code here, how embarrassing. ;) I made an update earlier today to my answer in the original PowerShell thread that gives a more general solution that also handles regex metacharacters and doesn't require you to manually escape each of 600 hash elements: PowerShell multiple string replacement efficiency. I added the perl and regex tags to your original question, but my edit hasn't been approved yet.
As I mentioned, since I've been using PowerShell for everything in recent times (heck, these days I prepare breakfast with PowerShell...), my Perl has gotten a tad dusty, which I see hasn't gone unnoticed here. :P I fixed several things that I noticed could be coded better when I looked at it a second time, which are noted at the bottom. I don't bother with error messages and declarations and other verbosity for limited use quick-and-dirty scripts like this, and I don't particularly recommend it. As the Perl motto goes, "making easy things easy and hard things possible". Well, this is a case of making easy things easy, and one of Perl's main advantages is that it doesn't force you to be "proper" when you're trying to do something quick and simple. But I did close the filehandles. ;)
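For the curious, the gist of that more general approach, sketched in Perl (reconstructed from memory rather than quoted from the linked answer, so treat the details as illustrative): keep the paths as single-quoted literals, let quotemeta escape them while building one alternation pattern, and sort longer keys first so the longest path wins:
my %replacements = (
    'X:\Group_14\Project_Security' => '\\\\Project_Security$',
    # ... the remaining paths, with no manual double-escaping needed
);
my $pattern = join '|',
    map { quotemeta }
    sort { length $b <=> length $a } keys %replacements;
# then, inside the read loop:
$line =~ s/($pattern)/$replacements{$1}/g;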

Related

Perl on Linux: change locale for subprocesses

What is the correct way to change the locale for a subprocess (in Linux)?
For example, when running
perl -e 'use POSIX qw(setlocale); setlocale(POSIX::LC_ALL, "C"); open F, "locale|"; while (<F>) { print if /LC_MESS/ }; close F'
I get the answer LC_MESSAGES="ca_ES.UTF-8" but I would like to obtain LC_MESSAGES="C". Whatever I've tried I can't seem to change it.
Note: I know about doing LC_ALL=C perl ..... but this is not what I want to do, I need to change the locale inside the Perl script.
I'm picking up on Ted Lyngmo's comment, so credit goes to him.
You can set the environment for your code as well as subsequent sub-processes with %ENV. As with all global variables, it makes sense to only change these locally, temporarily, for your scope and smaller scopes. That's what local does.
I've also changed your open to use the three-arg form as that's more secure (even though you're not using a variable for the filename/command), and used a lexical filehandle. The lexical handle will go out of scope at the end of the block and close implicitly.
use strict;
use warnings;
use POSIX qw(setlocale);
{
    setlocale(POSIX::LC_ALL, "C");
    local $ENV{LC_ALL} = 'C';

    open my $fh, '-|', 'locale' or die $!;
    while (<$fh>) {
        print if /LC_MESS/;
    }
}
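Because the assignment is local, the change is scoped to that block; once it exits, %ENV reverts and any later subprocesses see the original locale again. A quick check (illustrative):
# after the block, LC_ALL is back to whatever it was before
print defined $ENV{LC_ALL} ? "LC_ALL=$ENV{LC_ALL}\n" : "LC_ALL not set\n";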

Linux - Perl: Printing the content of the script I am executing

Is it possible to print the whole content of the script I am executing?
There are many things that will go on inside the script, like the Perl modules I will load at runtime (require "/dir/file";) and the print lines I am executing inside an array loop (foreach (@array) { print "$_\n"; }).
Why do I need this? To study the script generation I am making, especially when errors occur, e.g. an error reported on line 2000 even though I have only a thousand lines of script.
There are probably better ways to debug a script (the perl debugger, or using Carp::Always to get stack traces with any errors and warnings), but nonetheless there are at least three mechanisms for obtaining the source code of the running script.
Since $0 contains the name of the file that perl is executing, you can read from it.
open my $fh, '<', $0;
my @this_script = <$fh>;
close $fh;
If a script has the __DATA__ or __END__ token in its source, then Perl also sets up the DATA file handle. Initially, the DATA file handle points to the text after
the __DATA__ or __END__ token, but it is actually opened to the whole source file, so you can seek to the beginning of that file handle and access the entire script.
seek DATA, 0, 0;
my @this_script = <DATA>;
HT Grinnz: the token __FILE__ in any Perl source file refers to the name of the file that contains that token.
open my $fh, '<', __FILE__;
my @this_file = <$fh>;
close $fh;
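Since the goal is to chase down a mysterious "error on line 2000", it may help to print the source with line numbers attached; here is a minimal sketch combining the __FILE__ mechanism with Perl's $. line-counter variable:
open my $fh, '<', __FILE__ or die "cannot read own source: $!";
printf "%4d: %s", $., $_ while <$fh>;
close $fh;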

How terrible is my Perl? Script that takes IP addresses and returns Fully Qualified Domain Names [closed]

I invite you, tear me a new one.
This code gets the job done. It takes a .txt file containing a list of IPs and writes a file containing their respective fully qualified domain names.
I want to know in what ways is this code poorly written. What bad habits are here?
I am a Perl and programming newbie. I managed to put this together using Google and trial and error. Getting it to work was satisfying, but please tell me how I can improve.
use strict;
use warnings;
use Socket;
use autodie;
my $filename = 'IPsForFQDN.txt';
#File with list of IPs to lookup.
#One IP address per line like so:
#10.10.10.10
#10.10.10.11
#10.10.10.12
#etc...
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";

my $fqdn = '';

while (my $row = <$fh>) {
    chomp $row;
    print "$row\n";

    $fqdn = gethostbyaddr(inet_aton($row), AF_INET);
    print $fqdn;
    print "\n";

    open FILE, ">>fqdn.txt" or die $!;
    print FILE $fqdn;
    print FILE "\n";
    close FILE;
}
print "done\n";
For instance is the {chomp $row;} line needed? I have NO IDEA what it does.
I am equally mystified by the whole {or die $!;} thing.
$! reports why something failed. Here if you were unable to open the file the reason for failure would be pointed out. perlvar has a section on error variables.
You're using chomp to remove the newline character from the end of each line.
When writing the file you call open slightly differently; consider using the same 3-argument version as you do when opening for reading earlier in your code (also see the link I gave you for open), and the same coding style. It's good to be consistent, and this method is also safer.
You're repeatedly opening fqdn.txt for every line you write. I'd just open it before the loop and close it at the end.
Oh - and you're using autodie so the or die shouldn't be necessary.
Oh - and you've used old-style open for it too, compared to new-style open for the reading file.
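Putting those points together, the write side might look something like this (a sketch under the assumptions above, not a full rewrite): open fqdn.txt once before the loop with the three-argument form and a lexical filehandle, and let autodie do the error checking:
open my $out, '>>', 'fqdn.txt';    # autodie supplies the "or die" for us
while (my $row = <$fh>) {
    chomp $row;
    my $fqdn = gethostbyaddr(inet_aton($row), AF_INET);
    next unless defined $fqdn;     # skip addresses that do not resolve
    print {$out} "$fqdn\n";
}
close $out;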
Not much going on at work, so I had a go at a little rewrite with comments in to explain a few things. Not right/not wrong, just my spin, plus a few of the standards we use at my place.
Hope this helps...
use strict;
use warnings;
use Socket;
# initialize variables here
my $filename = "IPsForFQDN.txt";

# open both file handles - once only
# Note safer expression using 2 commas
open(FH, "<", $filename)
    or die "Could not open file '$filename' $!";

# open FILE for appending
open FILE, ">>", "fqdn.txt" or die $!;

# use foreach instead of while - easier syntax (may provoke discussion ;-) )
# replaced $fh with FH - use file handles throughout for consistency
foreach my $row ( <FH> )
{
    chomp $row;

    # put a regex check in for comments
    if ( $row !~ m/^#/ )
    {
        printf("Row in file %s \n", $row);

        # initialize $fqdn here to keep it fresh
        my $fqdn = gethostbyaddr(inet_aton($row), AF_INET);

        # formatted print to screen (STDOUT)
        printf("FQDN %s \n", $fqdn);

        # formatted print to output file
        printf FILE ("%s \n", $fqdn);
    }
}

# close both file handles - once only
close FILE;
close FH;

print "done\n";

Convert an excel file to txt and open in perl

I have an Excel file with my data. I saved it as a tab-delimited txt file.
But if I do a simple perl script:
open(IN, '<', 'myfile.txt') or die;
while (defined(my $line = <IN>)) {
    print "$line\n";
}
close IN;
it only prints out one line, but that line contains all the data.
If I use another data file there are no problems, so I think there is a problem converting the Excel file to a txt file.
Can anybody help me?
Try while (<IN>) instead. Your condition beats the while magic.
I'd change the loop to:
while(my $line = <IN>) { ... }
There's no need to use defined().
I am not sure if you have this answered yet. But first, make sure you have the following in your code:
use strict;
use warnings;
This will give you debugging help that you would not otherwise receive; using the above will give you more messages that can help.
When I put your open command in a current program I am working on I received this debugging message:
Name "main::IN" used only once: possible typo at ./test.pl line 37
You also may want to use a lexical file handle so Perl can remember where to go. This is the "new" way to open files in Perl and is explained in the online perldoc; just search for "perl file handle open." I learned to do my opens this way:
open my $in, '<', 'myfile.txt' or die;
Then, you can just run the following:
while ( my $line = <$in> ) { ... }
There is a better way to do this if you have ever been introduced to Perl's default variable, but I don't think that you have, so the above solution may be the best.
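For reference, the default-variable version alluded to above is a small change (a minimal sketch): with no explicit loop variable, each line lands in $_, and a bare print prints $_.
open my $in, '<', 'myfile.txt' or die;
while (<$in>) {
    print;    # reads into and prints $_ implicitly
}
close $in;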

Resolving Out of Memory error when executing Perl script

I'm attempting to build an n-gram language model based on the top 100K words found in the English-language Wikipedia dump. I've already extracted the plain text with a modified XML parser written in Java, but need to convert it to a vocab file.
In order to do this, I found a Perl script that is said to do the job, but it lacks instructions on how to execute it. Needless to say, I'm a complete newbie to Perl and this is the first time I've encountered a need to use it.
When I run this script, I get an Out of Memory error when using it on a 7.2GB text file, on two separate dual-core machines with 4GB RAM running Ubuntu 10.04 and 10.10.
When I contacted the author, he said this script ran fine on a MacBook Pro with 4GB RAM, and the total in-memory usage was about 78 MB when executed on a 6.6GB text file with perl 5.12. The author also said that the script reads the input file line by line and creates a hashmap in memory.
The script is:
#! /usr/bin/perl
use FindBin;
use lib "$FindBin::Bin";
use strict;
require 'english-utils.pl';
## Create a list of words and their frequencies from an input corpus document
## (format: plain text, words separated by spaces, no sentence separators)
## TODO should words with hyphens be expanded? (e.g. three-dimensional)
my %dict;
my $min_len = 3;
my $min_freq = 1;
while (<>) {
    chomp($_);
    my @words = split(" ", $_);
    foreach my $word (@words) {
        # Check validity against regexp and acceptable use of apostrophe
        if ((length($word) >= $min_len) && ($word =~ /^[A-Z][A-Z\'-]+$/)
            && (index($word,"'") < 0 || allow_apostrophe($word))) {
            $dict{$word}++;
        }
    }
}
# Output words which occur $min_freq times or more often
foreach my $dictword (keys %dict) {
    if ( $dict{$dictword} >= $min_freq ) {
        print $dictword . "\t" . $dict{$dictword} . "\n";
    }
}
I'm executing this script from the command line via mkvocab.pl corpus.txt
The included extra script is simply a regex script that tests the placement of apostrophes and whether they match English grammar rules.
I thought the memory leak was due to the different versions, as 5.10 was installed on my machine. So I upgraded to 5.14, but the error still persists. According to free -m, I have approximately 1.5GB free memory on my system.
As I am completely unfamiliar with the syntax and structure of the language, can you point out the problem areas, along with why the issue exists and how to fix it?
Loading a 7.2GB file into a hash could be possible if there is some repetition in the words, e.g. "the" occurs 17,000 times, etc. It seems to be rather a lot, though.
Your script assumes that the lines in the file are appropriately long. If your file does not contain line breaks, you will load the whole file into memory in $_, then double that memory load with split, and then add quite a lot more into your hash. Which would strain any system.
One idea may be to use a space " " as your input record separator. It will do approximately what you are already doing with split, except that it will leave other whitespace characters alone, and will not trim excess whitespace as prettily. For example:
$/ = " ";
while (<>) {
for my $word ( split ) { # avoid e.g. "foo\nbar" being considered one word
if (
(length($word) >= $min_len) &&
($word =~ /^[A-Z][A-Z\'-]+$/) &&
(index($word,"'") < 0 || allow_apostrophe($word))
) {
$dict{$word}++;
}
}
}
This will allow even very long lines to be read in bite-size chunks, assuming you do have spaces between the words (and not tabs or newlines).
Try running
dos2unix corpus.txt
It is possible that you are reading the entire file as one line...
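If you want to confirm that hypothesis before converting anything, a quick diagnostic along these lines (illustrative, using Perl's count-the-matches idiom) checks how many newlines appear in the first megabyte of the file:
# count newlines in the first megabyte of corpus.txt
open my $fh, '<', 'corpus.txt' or die $!;
read $fh, my $chunk, 1_000_000;
my $count = () = $chunk =~ /\n/g;
print "newlines in first MB: $count\n";    # 0 suggests one giant line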
