Changing the names of multiple files using 'rename' - linux

I've about 100 files, I'd like to change the letter P to the letter B in all file names. But when using the rename command:
rename 's/\P/\Bi/' *.txt
I get the following message
Empty \P{} in regex; marked by <-- HERE in m/\P <-- HERE / at (eval 1) line 1.
Help Please
Thanks

Two mistakes. First you don't need to escape P and B since they are not special characters. Second if that i is meant for case insensitivity then that is a flag and needs to be at the end.
rename 's/P/B/i' *.txt
This will change the first occurrence of P or p to B. If you want to change all occurrences then use g flag which means global like this:
rename 's/P/B/ig' *.txt
Update based on new requirement:
From anything complex, it is better off to write your own perl script. Here is a quick example using the File::Copy core module.
use strict;
use warnings;
use File::Copy;
my $dir = '/path/to/your/files/'; # path to your files
opendir my ($dh), $dir;
my #files = grep { /\.txt$/ } readdir $dh; # create a list of all files
for(#files) {
my $cnt;
my $from = $_;
chomp $from;
(my $to = $from) =~ s/(P)/++$cnt>=2 && $cnt<=3 ? "B" : $1/gie;
#print "from:$from to:$to\n";
move("$dir$from", "$dir$to") if ($to ne $from);
}
We create a temporary variable $cnt and with the following condition check if the the character in question is second or third. If it is we replace it with B else we retain it as is.
++$cnt>=2 && $cnt<=3 ? "B" : $1

Dont escape letter, also put regexp modifiers after close slash
rename 's/P/B/ig' *.txt
For more information about regexp see perlre

Related

Find and replace words using sed command not working

I have a a text file which is tab separated, the first column holds the word to be found and the second column holds the word to replace the found word. This text file contains English and Arabic pairs. Once the word is found and replaced it should not be changed again.
For example:
adam a +dam
a b
ال ال+
So for a given text file:
adam played with a ball ال
I expect:
a +dam played with b ball ال+
However, I get:
b +dbm plbyed with b bbll ال+
I am using the following sed command to find and replace:
sed -e 's/^/s%/' -e 's/\t/%/' -e 's/$/%g/' tab_sep_file.txt | sed -f - original_file.txt >replaced.txt
How can I fix this issue
The basic problem to your approach is that you don't want to replace matched text in a prior substitution with a later one - you don't want to change the a's in a +dam to b's. This makes sed a pretty poor choice - you can make a regular expression that matches all of the things you want to replace fairly easily, but picking which replacement to use is an issue.
A way using GNU awk:
gawk -F'\t' '
FNR == NR { subs[$1] = $2; next } # populate the array of substitutions
ENDFILE {
if (FILENAME == ARGV[1]) {
# Build a regular expression of things to substitute
subre = "\\<("
first=0
for (s in subs)
subre = sprintf("%s%s%s", subre, first++ ? "|" : "", s)
subre = sprintf("%s)\\>", subre)
}
}
{
# Do the substitution
nwords = patsplit($0, words, subre, between)
printf "%s", between[0]
for (n = 1; n <= nwords; n++)
printf "%s%s", subs[words[n]], between[n]
printf "\n"
}
' tab_sep_file.txt original_file.txt
which outputs
a +dam played with b ball
First it reads the TSV file and builds an array of words to be replaced and text to replace it with (subs). Then after reading that file, it builds a regular expression to match all possible words to be found - \<(a|adam)\> in this case. The \< and \> match only at the beginning and end, respectively, of words, so the a in ball won't match.
Then for the second file with the text you want to process, it uses patsplit() to split each line into an array of matched parts (words) and the bits between matches (between), and iterates over the length of the array, printing out the replacement text for each match. That way it avoids re-matching text that's already been replaced.
And a perl version that uses a similar approach (Taking advantage of perl's ability to evaluate the replacement text in a s/// substitution):
perl -e '
use strict;
use warnings;
# Set file/standard stream char encodings from locale
use open ":locale";
# Or for explicit UTF-8 text
# use open ":encoding(UTF-8)", ":std";
my %subs;
open my $words, "<", shift or die $!;
while (<$words>) {
chomp;
my ($word, $rep) = split "\t" ,$_, 2;
$subs{$word} = $rep;
}
my $subre = "\\b(?:" . join("|", map { quotemeta } keys %subs) . ")\\b";
while (<<>>) {
print s/$subre/$subs{$&}/egr;
}
' tab_sep_file.txt original_file.txt
(This one will escape regular expression metacharacters in the words to replace, making it more robust)

How can I remove all vowels in a sentence except for the vowel at the first letter?

What is wrong specifically with this code? How can I correct it?
$x = "without any vowels after the first letter\n";
foreach $i (#x[1..]) {
if ($i =~ /[AEIOUaeiou]/) {
$x =~ tr/A E I O U a e i o u//d;
}
}
print "$x\n";
I tried [1..] to exclude the first letter. If it does not work, how else can I remove the first letter?
EDIT I edited code to make it syntactically (mostly) correct to convey their obvious original idea, except for the attempt to index into a string which isn't correct in Perl. (Clarifying that is a part of what I consider useful in this question.)
First, most of that is not Perl, or any programming language for that matter. I'd suggest to work through a Perl tutorial of your choice first, before trying to get solutions for specific problems. However, here's an answer since the problem itself is of enough interest in general.
Next, in Perl you can't directly index into a string, so you can't skip the first character(s) like that.
But you can separate that first character in the string and process the rest (removing vowels), of course. One way with regex†
use warnings;
use strict;
use feature 'say';
my $str = shift // 'out with ALL vowels after first';
$str =~ s/.\K(.*)/ $1 =~ tr{[aeiouAEIOU]}{}dr /e;
say $str; #--> ot wth LL vwls ftr frst
This relies on the /e modifier, which makes it so that the replacement side is evaluated as code, and so it runs an independent transliteration (tr) there, processing the captured substring.
Then we need the /r modifier in that embedded tr/regex, to return the new string instead of changing the old one in place -- what wouldn't be possible anyway as one can't change $1.
One can also use a regex insteda of tr, less efficient but with its many conveniences
$str =~ s/.\K(.*)/ $1 =~ s{[aeiou]}{}igr /e;
Now we can use far more sophisticated tools in that regex than in tr; in this case it's only the i flag, for case-insensitive.
If it were more than the one first character to keep change . to .{N}.
† Regex is not compulsory, of course. A more elementary take: split the string into its first character and the rest, then use tr on the rest
use warnings;
use strict;
use feature 'say';
my $str = shift // q(out with ALL vowels after first);
my ($result, $rest) = split //, $str, 2; # first char, rest of string
$result .= $rest =~ tr/aeiouAEIOU//dr; # prune $rest of vowels, append to $result
say $result;
Then put this in a little mini subroutine. To change the original string in place, instead of getting a new ($result) string, use it ($str) everywhere instead of $result.
I am not sure about how it compares efficiency wise but it may well fare well.
For the curiosity's sake, here it is in a single statement
$str = join '', map { length > 1 ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
This specifically uses the fact that only the first (one) character need be skipped; that is easily made dynamical, as long as the criterion does involve the length of substrings.
More importantly, this assumes that the rest of the string is longer than 1 character. To drop that assumption change the criterion
use feature 'state';
$str = join '', map {
state $chr_cnt = 0;
++$chr_cnt > 1 ? tr/aeiouAEIOU//dr : $_
}
split //, $str, 2;
This also relies on leaving aside just one character. It uses a feature to keep a lexical value across executions, state.
A more generic solution, which uses the property of substr to be possible to write to
substr($str, 1) =~ tr/aeiouAEIOU//d;
Here it's much cleaner and simpler to relax the limitation to the first character: just change that 1 in order to skip more characters. The tricky -- unexpected -- part here may be that normally builtins can't be written to like that, they aren't lvalue subroutines
The algorithm for solution of the problem is in your question
add letter to a string if it isn't vowel
add letter to the string if it is first vowel in the input string
use strict;
use warnings;
my $x = "without any vowels after the first letter\n";
my($o,$count) = ('',0);
print 'IN: ' . $x;
for ( split('',$x) ) {
$o .= $_ unless $count != 0 and /[aeiou]/i;
$count++ if /[aeiou]/i;
}
print 'OUT: ' . $o;
Output
IN: without any vowels after the first letter
OUT: witht ny vwls ftr th frst lttr
Addendum: OP's clarification of the problem
look at each word in the sentence
if a word starts from vowel then delete all vowels but first one
if a word starts from none vowel then delete all vowels
use strict;
use warnings;
use feature 'say';
my $x = 'I like apples more than oranges';
my #o;
say 'IN: ' . $x;
for ( split(' ', $x) ) {
if ( /^[aeiou]/i ) {
s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e;
} else {
tr|aeiouAEIOU||d;
}
#o = (#o,$_);
}
say 'OUT: ' . join(' ', #o);
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
Or in perlish style
use strict;
use warnings;
use feature 'say';
my $x = "I like apples more than oranges";
say 'IN: ' . $x;
say 'OUT: ' . join(' ', map { s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e && $_ } split('[ ]+', $x));
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs

PERL eplacing newline characters inside fish-brackets, while leaving other newline characters untouched?

I finally know how to use regular expressions to replace one substring with another every place where it occurs within a string. But what I need to do now is a bit more complicated than that.
A string I must transform will have many instances of the newline character ('\n'). If those newline character are enclosed within fish-tags (between '<' and '>') I need to replace it with a simple whitespace character (' ').
However, if a newline character occurs anywhere else in the string, I need to leave that newline character alone.
There will be several places in the string that are enclosed in fish-tags, and several places that aren't.
Is there a way to do this in PERL?
I honestly don't recommend doing this with regular expressions. Besides the fact that you should never parse html with a regular expression, it's also a pain to do negative matches with regular expressions and anyone reading the code will honestly have no idea what you just did. Doing it manually on the other hand is really easy to understand.
This code assumes well formed html that doesn't have tags starting inside the definition of other tags (otherwise you would have to track all the instances and increment/decrement a count appropriately) and it does not handle < or > inside quoted strings which isn't the most common thing. And if you're doing all that I really recommend you use a real html parser, there are many of them.
Obviously if you're not reading this from a filehandle, the loop would be going over an array of lines (or the output of splitting the whole text, though you would instead be appending ' ' or "\n" depending on the inside variable if you split since it would remove the newline)
use strict;
use warnings;
# Default to being outside a tag
my $inside = 0;
while(my $line = <DATA>) {
# Find the last < and > in the string
my ($open, $close) = map { rindex($line, $_) } qw(< >);
# Update our state accordingly.
if ($open > $close) {
$inside = 1;
} elsif ($open < $close) {
$inside = 0;
}
# If we're inside a tag change the newline (last character in the line) with a space. If you instead want to remove it you can use the built-in chomp.
if ($inside) {
# chomp($line);
substr($line, -1) = ' ';
}
print $line;
}
__DATA__
This is some text
and some more
<enclosed><a
b
c
> <d
e
f
>
<g h i
>
Given:
$ echo "$txt"
Line 1
Line 2
< fish tag line 1
and line 2 >
< line 3 >
< fish tag line 4
and line 5 >
You can do:
$ echo "$txt" | perl -0777 -lpe "s/(<[^\n>]*)\n+([^>]*>)/\1\2/g"
Line 1
Line 2
< fish tag line 1 and line 2 >
< line 3 >
< fish tag line 4 and line 5 >
I will echo that this only works in limited cases. Please do not get in the general habit of using a regex for HTML.
This solution uses zdim's data (thanks, zdim)
I prefer to use an executable replacement together with the non-destructive option of the tr/// operator
This solution finds all occurrences of strings enclosed in angle brackets <...> and alters all newlines within each one to single spaces
Note that it would be simple to allow for quoted substrings containing any characters by writing this instead
$data =~ s{ ( < (?: "[^"]+" | [^>] )+ > ) }{ $1 =~ tr/\n/ /r }gex;
use strict;
use warnings 'all';
use v5.14; # For /r option
my $data = do {
local $/;
<DATA>;
};
$data =~ s{ ( < [^<>]+ > ) }{ $1 =~ tr/\n/ /r }gex;
print $data;
__DATA__
start < inside tags> no new line
again <inside, with one nl
> out
more <inside, with two NLs
and more text
>
output
start < inside tags> no new line
again <inside, with one nl > out
more <inside, with two NLs and more text >
The (X)HTML/XML shouldn't be parsed with regex. But since no description of the problem is given here is a way to go at it. Hopefully it demonstrates how tricky and involved this can get.
You can match a newline itself. Together with details of how linefeeds may come in text
use warnings;
use strict;
my $text = do { # read all text into one string
local $/;
<DATA>;
};
1 while $text =~ s/< ([^>]*) \n ([^>]*) >/<$1 $2>/gx;
print $text;
__DATA__
start < inside tags> no new line
again <inside, with one nl
> out
more <inside, with two NLs
and more text
>
This prints
start < inside tags> no new line
again <inside, with one nl > out
more <inside, with two NLs and more text >
The negated character class [^>] matches anything other than >, optionally and any number of times with *, up to an \n. Then another such pattern follows \n, up to the closing >. The /x modifier allows spaces inside, for readability. We also need to consider two particular cases.
There may be multiple \n inside <...>, for which the while loop is a clean solution.
There may be multiple <...> with \n, which is what /g is for.
The 1 while ... idiom is another way to write while (...) { }, where the body of the loop is empty so everything happens in the condition, which is repeatedly evaluated until false. In our case the substitution keeps being done in the condition until there is no match, when the loop exits.
Thanks to ysth for bringing up these points and for the 1 while ... solution.
All of this necessary care for various details and edge cases (of which there may be more) hopefully convinces you that it is better to reach for an HTML parsing module suitable for the particular task. For this we'd need to know more about the problem.

How do I include new lines in a string in Perl?

I have a string that looks like this
Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5
I am trying to added in new lines so get in list format. Like this
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
I have a perl script
use strict;
use warnings;
open my $new_tree_fh, '>', 'test_match.txt'
or die qq{Failed to open "update_color.txt" for output: $!\n};
open my $file, '<', $ARGV[0]
or die qq{Failed to open "$ARGV[0]" for input: $!\n};
while ( my $string = <$file> ) {
my $splitmessage = join ("\n", ($string =~ m/(.+)+\,+\#+\w{6}/gs));
print $new_tree_fh $splitmessage, "\n";
}
close $file;
close $new_tree_fh;
The pattern match works but it wont print the new line as I want to make the list. Can anyone please suggest anything.
I'd do:
my $str = 'Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5';
$str =~ s/(?<=,#\w{6})/\n/g;
say $str;
Output:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
OK, I think your problem here is that your regular expression doesn't match properly.
(.+)+
for example - probably doesn't do what you think it does. It's a greedy capture of 1 or more of "anything" which will grab your whole string.
Check it out on regex101.
Try:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $string = <DATA> ) {
my $splitmessage = join( "\n", ( $string =~ m/(\w+,\#+\w{6})/g ) );
print $splitmessage, "\n";
}
__DATA__
Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5
Which will print:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
Rather than a quickfix solution, let's find the problem in your existing code and hence learn from it. Your problem is in the regular expression, so we'll dissect and fix it.
($string =~ m/(.+)+\,+\#+\w{6}/gs)
First, the two significant mistakes that lead to the bug:
At the beginning, you're doing a .+, followed by matching with , and # and so on. The problem is, .+ is greedy, which means it'll match upto the last , in the input, and not the first one. So when you run this, almost the entire line (except for the last plant's color) gets matched up by this single .+.
There are a few different ways you can fix this, but the easiest is to restrict what you're matching. Instead of saying .+ "match anything", make it [\w\s]+ at the beginning - which means match either "word characters" (which includes alphabets and digits) or space characters (since there is a space in the middle of the plant name).
($string =~ m/([\w\s]+)+\,+\#+\w{6}/gs)
That changes the output, but still not to the fully correct version because:
m/some regex/g returns a list of its matches as a list here, and what we want is for it to return the whole match including both plant name and color. But, when there are paranthesis inside the match anywhere, m/ returns only the part matched by the paranthesis (which is the plant name here), not the whole match. So, remove the paranthesis, and it becomes:
($string =~ m/[\w\s]++\,+\#+\w{6}/gs)
This works, but is quite clumsy and bug-prone, so here's some improvement suggestions:
Since your input has no newline characters, the /s at the end is unnecessary.
($string =~ m/[\w\s]++\,+\#+\w{6}/g)
, and # are not a special character in perl regular expressions, so they don't need a \ before them.
($string =~ m/[\w\s]++,+#+\w{6}/g)
+ is for when you know only that the character will be present, but don't know how many times it'll be there. Here, since we're only trying to match one , and one # characters, the + after them is unnecessary.
($string =~ m/[\w\s]++,#\w{6}/g)
The ++ after [\w\s] means something quite different from + (basically an even greedier match than usual), so let's make it a single +
($string =~ m/[\w\s]+,#\w{6}/g)
Optionally, you can change the last \w to match only the hexadecimal characters which will appear in the colour code:
($string =~ m/[\w\s]+,#[0-9A-F]{6}/g)
That's a pretty solid, working regular expression that does what you want.

Using Perl to remove n characters from the end of multiple lines

I want to remove n characters from each line using PERL.
For example, I have the following input:
catbathatxx (length 11; 11%3=2 characters) (Remove 2 characters from this line)
mansunsonx (length 10; 10%3=1 character) (Remove 1 character from this line)
#!/usr/bin/perl -w
open FH, "input.txt";
#array=<FH>;
foreach $tmp(#array)
{
$b=length($tmp)%3;
my $c=substr($tmp, 0, length($tmp)-$b);
print "$c\n";
}
I want to output the final string (after the characters have been removed).
However, this program is not giving the correct result. Can you please guide me on what the mistake is?
Thanks a lot. Please let me know if there are any doubts/clarifications.
I am assuming trailing whitespace is not significant.
#!/usr/bin/env perl
use strict; use warnings;
use constant MULTIPLE_OF => 3;
while (my $line = <DATA>) {
$line =~ s/\s+\z//;
next unless my $length = length $line;
my $chars_to_remove = $length % MULTIPLE_OF;
$line =~ s/.{$chars_to_remove}\z//;
print $line, "\n";
}
__DATA__
catbathatxx
mansunsonx
0123456789
012345678
The \K regex sequence makes this a lot clearer; it was introduced in Perl v5.10.0.
The code looks like this
use 5.10.0;
use warnings;
for (qw/ catbathatxx mansunsonx /) {
(my $s = $_) =~ s/^ (?:...)* \K .* //x;
say $s;
}
output
catbathat
mansunson
In general you would want to post the result you are getting. That being said...
Each line in the file has a \n (or \r\n on windows) on the end of it that you're not accounting for. You need to chomp() the line.
Edit to add: My perl is getting rusty from non-use but if memory serves me correct you can actually chomp() the entire array after reading the file: chomp(#array)
You should use chomp() on your array, like this:
#array=<FH>;
chomp(#array);
perl -plwe 'chomp; $c = length($_) % 3; chop while $c--' < /tmp/zock.txt
Look up the options in perlrun. Note that line endings are characters, too. Get them out of the way using chomp; re-add them on output using the -l option. Use chop to efficiently remove characters from the end of a string.
Reading your code, you are trying to print just the first 'nx3' characters for the largest value of n for each line.
The following code does this using a simple regular expression.
For each line, it first removes the line ending, then greedy matches
as many .{3} as it can (. matches any character, {3} asks for exactly 3 of them).
The memory requirement of this approach (compared with using an array the size of your file) is fixed. Not too important if your file is small compared with your free memory, but sometimes files are gigabytes, and sometimes memory is very small.
It's always worth using variable names that reflect the purpose of the variable, rather than things like $a or #array. In this case I used only one variable, which I called $line.
It's also good practice to close files as soon as you have finished with them.
#!/usr/bin/perl
use strict;
use warnings; # This will apply warnings even if you use command perl to run it
open FH, '<', 'input.txt'; # Use three part file open - single quote where no interpolation required.
for my $line (<FH>){
chomp($line);
$line =~ s/((.{3})*).*/$1\n/;
print $line;
}
close FH;

Resources