Perl URL replace - string

I am trying to achieve the following task:
Extract all URLs from the text.
If the domain belongs to a white list, replace the URL with a modified URL.
The following is my code.
$text = '<img src="http://www.testurl.de/Sasdfhopr.jpg" width="80%">';
$regex = '(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?';
@whiteList = ("www.amazon.de");
while ($text =~ /$regex/g) {
# regex result has following groups as matches
# $1 = scheme
# $2 = domain
# $3 = query parameters
# check if domain is in white list
if ( grep( /^$2$/, @whiteList ) ) {
# build new url
$new = "http://test.xyz.pqr/url=".$1."://".$2.$3;
# recreate old url
$old = $1."://".$2.$3;
# replace it here, but its not replacing
$text =~ s/$old/$new/g;
# but as an example replacing
# domain name with test, its working.
# it appears to be something to with back slash or forward
# slashes
$text =~ s/$2/test/g;
}
}
print $text;
Any help or hint would be great, as I am new to Perl programming.

I would use Regexp::Common in conjunction with Regexp::Common::URI to locate the URLs, and URI to parse and transform them.
Your very minimal data sample doesn't help, but here is a proof of my idea using that data:
use strict;
use warnings 'all';
use Regexp::Common 'URI';
use URI;
use List::Util 'any';
use constant NEW_HOST => 'test.xyz.pqr';
my $text = <<'END';
<a href="http://www.amazon.de/Lenovo-Moto-Smartphone-Android-schwarz/dp/B01FLZC8ZI">
<img src="http://www.testurl.de/Sasdfhopr.jpg" width="80%">
</a>
END
my @white_list = qw/ www.amazon.de /;
$text =~ s{ ( $RE{URI}{HTTP} ) } {
my $uri = URI->new($1);
my $host = $uri->host;
$uri->host(NEW_HOST) if any { $host eq $_ } @white_list;
$uri->as_string;
}exg;
print $text, "\n";
output
<a href="http://test.xyz.pqr/Lenovo-Moto-Smartphone-Android-schwarz/dp/B01FLZC8ZI">
<img src="http://www.testurl.de/Sasdfhopr.jpg" width="80%">
</a>

The URL in $old contains characters, such as . and ?, that Perl's regex engine treats as pattern metacharacters rather than literal characters when the variable is interpolated into the pattern.
$text =~ s/$old/$new/g;
You need to escape those. You can do that with the \Q and \E escape sequences, which quote every metacharacter between them.
$text =~ s/\Q$old\E/$new/g;
That should do the trick, assuming the rest of your code is working, which I have not tried.
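To make the difference concrete, here is a minimal sketch. The URL with a query string is a hypothetical sample, not the data from the question; it is chosen because the ? in it changes the meaning of the unescaped pattern.

```perl
use strict;
use warnings;

# Hypothetical sample data (assumption, not from the question).
my $old  = 'http://www.amazon.de/item?id=1';
my $new  = 'http://test.xyz.pqr/url=' . $old;
my $text = qq{<a href="$old">link</a>};

# Unescaped: "m?" means "an optional m", so the literal "?" in the
# text is never matched and nothing is replaced.
(my $unescaped = $text) =~ s/$old/$new/g;

# Escaped: \Q...\E makes every character in $old literal.
(my $escaped = $text) =~ s/\Q$old\E/$new/g;

print "$unescaped\n";
print "$escaped\n";
```

Only the second substitution should change the string. The replacement side needs no escaping, because it is treated as an ordinary interpolated string.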


remove a space from a perl variable

I am having a lot of trouble doing a simple search and replace. I tried the solution offered in
How do I remove white space in a Perl string?
but was unable to print this.
Here is my sample code:
#!/usr/bin/perl
use strict;
my $hello = "hello world";
print "$hello\n"; #this should print out >> hello world
#now i am trying to print out helloworld (space removed)
my $hello_nospaces = $hello =~ s/\s//g;
#my $hello_nospaces = $hello =~ s/hello world/helloworld/g;
#my $hello_nospaces = $hello =~ s/\s+//g;
print "$hello_nospaces\n"
#am getting a blank response when i run this.
I tried a couple of different ways, but I was unable to do this.
My end result is to automate some aspects of moving files around in a linux environment, but sometimes the files have spaces in the name, so I want to remove the space from the variable.
You're almost there; you're just confused about operator precedence. The code you want to use is:
(my $hello_nospaces = $hello) =~ s/\s//g;
First, this assigns the value of the variable $hello to the variable $hello_nospaces. Then it performs the substitution operation on $hello_nospaces, as if you said
my $hello_nospaces = $hello;
$hello_nospaces =~ s/\s//g;
Because the bind operator =~ has higher precedence than the assignment operator =, the way you wrote it
my $hello_nospaces = $hello =~ s/\s//g;
first performs the substitution on $hello and then assigns the result of the substitution operation (which is 1 in this case) to the variable $hello_nospaces.
As of 5.14, Perl provides a non-destructive s/// option:
Non-destructive substitution
The substitution (s///) and transliteration (y///) operators now support an /r option that copies the input variable, carries out the substitution on the copy, and returns the result. The original remains unmodified.
my $old = "cat";
my $new = $old =~ s/cat/dog/r;
# $old is "cat" and $new is "dog"
This is particularly useful with map. See perlop for more examples.
So:
my $hello_nospaces = $hello =~ s/\s//gr;
should do what you want.
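As a quick check of the behaviour described above, this sketch puts the two forms side by side. The variable names $copy, $count and $stripped are illustrative only, and the /r form assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $hello = "hello world";

# Plain s///: the copy is modified in place and the return value
# is the number of substitutions made, not the new string.
my $copy  = $hello;
my $count = $copy =~ s/\s//g;

# s///r: the original is left alone and the new string is returned.
my $stripped = $hello =~ s/\s//gr;

print "count:    $count\n";
print "copy:     $copy\n";
print "stripped: $stripped\n";
print "original: $hello\n";
```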
You just need to add parentheses so Perl's parser can understand what you want it to do.
my $hello = "hello world";
print "$hello\n";
to
(my $hello_nospaces = $hello) =~ s/\s//g;
print "$hello_nospaces\n";
## prints
## hello world
## helloworld
Split this line:
my $hello_nospaces = $hello =~ s/\s//g;
Into those two:
my $hello_nospaces = $hello;
$hello_nospaces =~ s/\s//g;
From the official Perl Regex Tutorial:
If there is a match, s/// returns the number of substitutions made; otherwise it returns false.

Perl Inserting a string from a file after every occurence of a slash in a url

I have the following URLs:
FILE1.txt
http://www.stackoveflow.com/dog/cat/rabbit/hamster/
192.168.192.168/lion/tiger/elephant/
FILE2.txt
HELLO
GOODBYE
The output I am trying to achieve:
http://www.stackoveflow.com/dogHELLO/cat/rabbit/hamster/
http://www.stackoveflow.com/dog/catHELLO/rabbit/hamster/
http://www.stackoveflow.com/dog/cat/rabbitHELLO/hamster/
http://www.stackoveflow.com/dog/cat/rabbit/hamsterHELLO/
http://www.stackoveflow.com/dog/cat/rabbit/hamster/HELLO
http://www.stackoveflow.com/dogGOODBYE/cat/rabbit/hamster/
http://www.stackoveflow.com/dog/catGOODBYE/rabbit/hamster/
http://www.stackoveflow.com/dog/cat/rabbitGOODBYE/hamster/
http://www.stackoveflow.com/dog/cat/rabbit/hamsterGOODBYE/
http://www.stackoveflow.com/dog/cat/rabbit/hamster/GOODBYE
192.168.192.168/lionHELLO/tiger/elephant/
192.168.192.168/lion/tigerHELLO/elephant/
192.168.192.168/lion/tiger/elephantHELLO/
192.168.192.168/lion/tiger/elephant/HELLO
192.168.192.168/lionGOODBYE/tiger/elephant/
192.168.192.168/lion/tigerGOODBYE/elephant/
192.168.192.168/lion/tiger/elephantGOODBYE/
192.168.192.168/lion/tiger/elephant/GOODBYE
As you can see the strings HELLO and GOODBYE are inserted after every slash, and if there is already a string after the slash it will append the HELLO and GOODBYE after that (e.g http://www.stackoveflow.com/dogHELLO/cat/rabbit/hamster/ and so on).
What I have tried
use strict;
use warnings;
my @f1 = do {
    open my $fh, '<', 'FILE1.txt';
    <$fh>;
};
chomp @f1;
my @f2 = do {
    open my $fh, '<', 'FILE2.txt';
    <$fh>;
};
chomp @f2;
for my $f1 (@f1) {
    my @fields = $f1 =~ m{[^/]+}g;
    for my $f2 (@f2) {
        for my $i (0 .. $#fields) {
            my @new = @fields;
            $new[$i] .= $f2;
            print qq{/$_/\n}, for join '/', @new;
        }
        print "\n\n";
    }
}
#courtesy of Borodin
However, this code does not cater for URLs that have slashes in the http:// part, as these are replaced with http:HELLO/ when they should not be.
Also, it does not put HELLO or GOODBYE after the slash if there is no string already there, e.g. http://www.stackoveflow.com/dog/cat/rabbit/hamster/ <-- SHOULD PUT HELLO AFTER THIS SLASH AS WELL BUT DOESN'T
It appears that this code removes and then re-inserts the slashes along with the strings from FILE2.txt, as opposed to inserting HELLO and GOODBYE in the correct place to start with.
My question
Is there a better method of going about achieving the output I require or is there something I can do to my existing code to cater for the problems described above?
Your help is much appreciated, many thanks
Here is the algorithm in prose:
Open File2.txt. Read in all lines, removing the newline. We call the array @words.
Open File1.txt. We call the file handle $fh.
As long as we can read a $line from $fh:
    Remove the newline, remove starting and ending slashes.
    Split the $line at every slash, call the array @animals.
    Loop through the @words, calling each element $word:
        Loop through the indices of the @animals, calling each index $i:
            Make a @copy of the @animals.
            Append the $word to the $i-th element of @copy.
            Join the @copy with slashes, surround it with slashes, and print with newline.
        Print an empty line.
This program will do what you ask.
use strict;
use warnings;
use autodie;
my @f1 = do {
    open my $fh, '<', 'FILE1.txt';
    <$fh>;
};
chomp @f1;
my @f2 = do {
    open my $fh, '<', 'FILE2.txt';
    <$fh>;
};
chomp @f2;
for my $f1 (@f1) {
    my @fields = $f1 =~ m{[^/]+}g;
    for my $f2 (@f2) {
        for my $i (0 .. $#fields) {
            my @new = @fields;
            $new[$i] .= $f2;
            print qq{/$_/\n}, for join '/', @new;
        }
        print "\n\n";
    }
}
output
/dogHELLO/cat/rabbit/hamster/
/dog/catHELLO/rabbit/hamster/
/dog/cat/rabbitHELLO/hamster/
/dog/cat/rabbit/hamsterHELLO/
/dogGOODBYE/cat/rabbit/hamster/
/dog/catGOODBYE/rabbit/hamster/
/dog/cat/rabbitGOODBYE/hamster/
/dog/cat/rabbit/hamsterGOODBYE/
/lionHELLO/tiger/elephant/
/lion/tigerHELLO/elephant/
/lion/tiger/elephantHELLO/
/lionGOODBYE/tiger/elephant/
/lion/tigerGOODBYE/elephant/
/lion/tiger/elephantGOODBYE/
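The output above drops the scheme and host and never appends after the trailing slash, both of which the question asks for. One possible way to keep them is to set the prefix aside before splitting and to keep the empty field after the final slash. This is only a sketch: insert_word is a hypothetical helper, and it assumes every line looks like an optional scheme, a host, and a path that starts with a slash.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answers): returns one variant of
# $url per path field, with $word appended to that field.
sub insert_word {
    my ($url, $word) = @_;

    # Separate "scheme://host" (scheme optional) from the path.
    my ($prefix, $path) = $url =~ m{^((?:\w+://)?[^/]+)(/.*)$}
        or return;

    # A limit of -1 keeps the empty field after a trailing slash.
    my @fields = split m{/}, $path, -1;

    my @variants;
    for my $i (1 .. $#fields) {    # field 0 is the empty string before the first slash
        my @new = @fields;
        $new[$i] .= $word;
        push @variants, $prefix . join('/', @new);
    }
    return @variants;
}

for my $url ('http://www.stackoveflow.com/dog/cat/', '192.168.192.168/lion/') {
    print "$_\n" for insert_word($url, 'HELLO');
    print "\n";
}
```

Because the prefix is never split, the slashes in http:// are left untouched, and the empty last field is what turns a trailing slash into .../HELLO.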
Rather than splitting the line on every slash, you can do it all with a regex.
Updated version:
#!/usr/bin/perl
use strict;
use warnings;
my @insert_words = qw/HELLO GOODBYE/;
my $word = 0;
while (<DATA>)
{
chomp;
foreach my $word (@insert_words)
{
my $repeat = 1;
while ((my $match=$_) =~ s|(?<!/)(?:/(?!/)[^/]*){$repeat}[^/]*\K|$word|)
{
print "$match\n";
$repeat++;
}
print "\n";
}
}
__DATA__
/dog/cat/rabbit/hamster/
http://www.stackoverflow.com/dog/cat/rabbit/hamster/
The key is the substitution operator: s|(?<!/)(?:/(?!/)[^/]*){$repeat}[^/]*\K|$word|.
(?<!/) and (?!/) are negative look-behind and look-ahead, respectively. They ensure that we are only matching a single /, thus ignoring http://.
(?:/(?!/)[^/]*){$repeat} is a non-capturing group that must match a specified number of times, and we increase that number until it no longer matches.
I had to use [^/]* instead of [^/]+ to meet your requirement of matching at the end of the string. That is why both the look-behind and the look-ahead are needed.
\K means "match everything up to this point, but don't include it in the match itself." Thus we don't have to worry about including the whole beginning of the string that matched in the replacement.
Note: The r option is another way to perform substitution without modifying the original string. However, it requires Perl 5.14 or later (thanks Amon). Thus I removed it from the example.
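To isolate what \K does, here is a small sketch on a hypothetical sample string. It compares \K with the more traditional approach of capturing the prefix and writing it back. \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $path = '/dog/cat';    # hypothetical sample string

# With \K: "/dog" has to match, but it is kept; only the empty
# span after it is replaced.
(my $with_k = $path) =~ s|/dog\K|HELLO|;

# Without \K: capture the prefix and put it back by hand.
(my $with_capture = $path) =~ s|(/dog)|${1}HELLO|;

print "$with_k\n";
print "$with_capture\n";
```

Both lines should print the same string; \K simply avoids the capture group and the back-reference.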

Perl: Removing characters up to certain point.

I've tried searching through questions already asked, but can't seem to find anything. I'm sure it's incredibly simple to do, but I am completely new to Perl.
What I am trying to do is remove characters in a string up to a certain point. For example, I have:
Parameter1 : 0xFFFF
and what I would like to do is remove the "Parameter1:" and be left with just the "0xFFFF". If anyone can help and give a simple explanation of the operators used, that'd be great.
Sounds like you need the substr function.
#!/usr/bin/perl
use strict;
use warnings;
my $string = 'Parameter1 : 0xFFFF';
my $fragment = substr $string, 13; # skip the 13 characters of "Parameter1 : "
print " string: <$string>\n";
print "fragment: <$fragment>\n";
s/.*:\s*//;
or
$s =~ s/.*:\s*//;
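This deletes everything up to and including the last occurrence of : on the line, plus any whitespace that follows it. The .* is greedy, so it matches as much as possible; use .*? to stop at the first colon instead. With $s =~ the substitution is applied to $s; without it, it is applied to $_.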
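The greedy and non-greedy forms only differ when the value itself contains a colon. A minimal sketch, using a hypothetical line that is not from the question:

```perl
use strict;
use warnings;

my $line = 'Time : 12:30';    # hypothetical value containing a colon

# Greedy .* backtracks from the end, so it stops at the last colon.
(my $greedy = $line) =~ s/.*:\s*//;

# Non-greedy .*? stops at the first colon.
(my $lazy = $line) =~ s/.*?:\s*//;

print "greedy: <$greedy>\n";
print "lazy:   <$lazy>\n";
```

For the single-colon input in the question, both forms leave 0xFFFF.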
Have you considered using something like Config::Std?
Here is how to parse a configuration file like that by hand:
#!/usr/bin/perl
use strict; use warnings;
my %params;
while ( my $line = <DATA> ) {
if ($line =~ m{
^
(?<param> Parameter[0-9]+)
\s*? : \s*?
(?<value> 0x[[:xdigit:]]+)
}x ) {
$params{ $+{param} } = $+{value};
}
}
use YAML;
print Dump \%params;
__DATA__
Parameter1 : 0xFFFF
Parameter3 : 0xFAFF
Parameter4 : 0xCAFE
With Config::Std:
#!/usr/bin/perl
use strict; use warnings;
use Config::Std;
my $config = do { local $/; <DATA> };
read_config \$config, my %params;
use YAML;
print Dump \%params;
__DATA__
Parameter1 : 0xFFFF
Parameter3 : 0xFAFF
Parameter4 : 0xCAFE
Of course, in real life, you'd pass a file name to read_config instead of slurping it.
I like split for these parameter/value pairs.
my $str = "Parameter1 : 0xFFFF";
my ($param, $value) = split /\s*:\s*/, $str, 2;
Note the use of LIMIT in the split, which limits the split to two fields (in case of additional colons in the value).
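The effect of the limit is easiest to see with a value that contains a colon. This is a sketch with a hypothetical input string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = 'Url : http://example.com';    # hypothetical input

# No limit: the colon inside the value is treated as a separator too.
my @unlimited = split /\s*:\s*/, $str;

# Limit of 2: only the first separator splits; the rest stays in $value.
my ($param, $value) = split /\s*:\s*/, $str, 2;

print scalar(@unlimited), " fields without a limit\n";
print "param: <$param>\n";
print "value: <$value>\n";
```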

Remove Part of String Perl

I have this in perl
return "$file->{srv_cgi_url}/dl.cgi/$hash/$fname";
where
$file->{srv_cgi_url}
returns
http://s1.site.com/cgi-bin/
how can I remove the trailing /cgi-bin/ from the string? :)
Thanks!
Like this:
my $new = $file->{srv_cgi_url};
$new =~ s{/cgi-bin/}{};
That is all. See perldoc perlre for details.
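Since the question asks about a trailing /cgi-bin/, one variation is to anchor the pattern to the end of the string with \z, so that an earlier /cgi-bin/ elsewhere in the URL would be left alone. A minimal sketch, with $url and $base as illustrative names:

```perl
use strict;
use warnings;

my $url = 'http://s1.site.com/cgi-bin/';

# \z anchors the match to the very end of the string.
(my $base = $url) =~ s{/cgi-bin/\z}{};

print "$base/dl.cgi\n";
```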
While substitution can work, it's fragile and difficult to extend and maintain. I strenuously recommend learning to use URI, URI::QueryParam, and Path::Class instead (the last is not used in this example but is important and related).
use warnings;
use strict;
use URI;
my $file;
$file->{srv_cgi_url} = "http://s1.site.com/cgi-bin/";
my $srv_cgi_uri = URI->new( $file->{srv_cgi_url} );
my $hash = "some";
my $fname = "path.ext";
$srv_cgi_uri->path("/dl.cgi/$hash/$fname");
print $srv_cgi_uri, "\n";
__END__
http://s1.site.com/dl.cgi/some/path.ext

How can I parse people's full names into user names in Perl?

I need to convert a name in the format Parisi, Kenneth into the format kparisi.
Does anyone know how to do this in Perl?
Here is some sample data that is abnormal:
Zelleb, Charles F.,,IV
Eilt, John,, IV
Woods, Charles R.,,III
Welkt, Craig P.,,Jr.
These specific names should end up as czelleb, jeilt, cwoods, cwelkt, etc.
I have one more condition that is ruining my name builder
O'Neil, Paul
So far, Vinko Vrsalovic's answer is working the best when weird/corrupt names are in the mix, but the example above would come out as "pneil". I'd be damned below Judas if I can't get that o between the p and the n.
vinko@parrot:~$ cat genlogname.pl
use strict;
use warnings;
my @list;
push @list, "Zelleb, Charles F.,,IV";
push @list, "Eilt, John,, IV";
push @list, "Woods, Charles R.,,III";
push @list, "Welkt, Craig P.,,Jr.";
for my $name (@list) {
    print gen_logname($name)."\n";
}
sub gen_logname {
my $n = shift;
#Filter out unneeded characters
$n =~ s/['-]//g;
#This regex will grab the lastname a comma, optionally a space (the
#optional space is my addition) and the first char of the name,
#which seems to satisfy your condition
$n =~ m/(\w+), ?(.)/;
return lc($2.$1);
}
vinko@parrot:~$ perl genlogname.pl
czelleb
jeilt
cwoods
cwelkt
I would start by filtering the abnormal data so you only have regular names. Then something like this should do the trick
$t = "Parisi, Kenneth";
$t =~ s/(.+),\s*(.).*/\l$2\l$1/;
Try:
$name =~ s/(\w+),\s(\w)/$2$1/;
$name = lc $name;
\w here matches a word character (a letter, a digit, or an underscore). If you want to be more specific, you could use [a-z] instead and pass the i flag (case insensitive):
$name =~ s/([a-z]+),\s([a-z])/$2$1/i;
Here's a one line solution, assuming you store all the names in a file called "names" (one per line) and you will do duplicated name detection somehow later.
cat names | perl -e 'while(<>) {/^\s*(\S*)?,\s*(\S)/; print lc "$2$1\n";}' | sed s/\'//g
It looks like your input data is comma-separated. To me, the clearest way to do this would be split into components, and then generate the login names from that:
while (<>) {
chomp;
my ($last, $first) = split /,/, lc $_;
$last =~ s/[^a-z]//g; # strip out nonletters
$first =~ s/[^a-z]//g; # strip out nonletters
my $logname = substr($first, 0, 1) . $last;
print $logname, "\n";
}
$rowfetch =~ s/['-]//g; #All chars inside the [ ] will be filtered out.
$rowfetch =~ m/(\w+), ?(.)/;
$rowfetch = lc($2.$1);
This is how I ended up using Vinko Vrsalovic's solution. It's inside a while loop that goes through a SQL query result. Thanks again, Vinko.
This should do what you need
use strict;
use warnings;
use 5.010;
while ( <DATA> ) {
say abbreviate($_);
}
sub abbreviate {
for ( @_ ) {
s/[-']+//g;
tr/A-Z/a-z/;
tr/a-z/ /c;
return "$2$1" if /([a-z]+)\s+([a-z])/;
}
}
__DATA__
Zelleb, Charles F.,,IV
Eilt, John,, IV
Woods, Charles R.,,III
Welkt, Craig P.,,Jr.
O'Neil, Paul
output
czelleb
jeilt
cwoods
cwelkt
poneil
