How to get the charsets supported by Perl's String::Multibyte

I am working with the String::Multibyte module, and it seems that String::Multibyte->new() accepts a charset as the first argument and throws an exception if the charset is not supported. I think the supported charsets are defined by *.pm files in a specific directory.
What is the most robust way to get the supported charsets? Is getting the files the only way?

I am not sure I see the value of this module, and its source code does some questionable things, but apart from that, it looks like you will have to use a method similar to the one the module itself uses:
use File::Basename qw( dirname );
use File::Spec::Functions qw( catdir );
require String::Multibyte;
# The charset definitions live in String/Multibyte/*.pm, next to String/Multibyte.pm
my $dir = catdir(dirname($INC{'String/Multibyte.pm'}), 'Multibyte');
opendir my $dh, $dir or die "Cannot open charset dir '$dir': $!";
my @charsets = grep s/[.]pm\z//, readdir $dh;   # keep *.pm names, minus the extension
closedir $dh;
The code is untested.
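The same idea can be wrapped in a small reusable sub. This is only a sketch. It assumes, as the question suggests, that every charset is a *.pm file in a directory named after the module, sitting next to the module's own .pm file. The sub name submodule_names is made up for illustration. The test exercises it on the core File::Spec, which uses the same layout (File/Spec/Unix.pm and so on), so it can be tried without String::Multibyte installed.

```perl
use strict;
use warnings;
use File::Basename qw( dirname );
use File::Spec::Functions qw( catdir );

# Hypothetical helper: list the *.pm files found in the directory named after
# $module, next to the loaded $module.pm (e.g. String/Multibyte/*.pm).
# Assumption: all "plugins" (charsets) live in that single directory.
sub submodule_names {
    my ($module) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;    # 'String::Multibyte' -> 'String/Multibyte.pm'
    require $rel;                              # load it so %INC holds its real path
    my ($leaf) = $module =~ /(\w+)\z/;         # last name component, e.g. 'Multibyte'
    my $dir = catdir( dirname( $INC{$rel} ), $leaf );
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    my @names = grep { s/[.]pm\z// } readdir $dh;   # keep *.pm, strip the extension
    closedir $dh;
    return sort @names;
}

# Usage: perl list_charsets.pl String::Multibyte
print "$_\n" for map { submodule_names($_) } @ARGV;
```

If you only need to check one name, wrapping String::Multibyte->new($name) in eval is another option, since the question says new() throws for unsupported charsets.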

Related

sed command working on command line but not in perl script

I have a file in which I have to replace all words like $xyz, with substitutions like these:
$xyz with ${xyz}
$abc_xbs with ${abc_xbs}
$ab,$cd with ${ab},${cd}
The file also has some words like ${abcd}, which I must not change.
I am using this command:
sed -i 's?\$([A-Z_]+)?\${\1}?g' file
It works fine on the command line, but not inside a Perl script as
sed -i 's?\$\([A-Z_]\+\)?\$\{\1\}?g' file;
What am I missing?
I think adding some backslashes would help. I tried adding some, but without success.
In a Perl script you need valid Perl code, just as you need valid C code in a C program. In the terminal, sed ... is understood by the shell and run as a command. In a Perl program it is just a bunch of words, and that sed ... line isn't valid Perl.
You would need to put it inside qx() (backticks) or system() so that it is run as an external command. Then you'd indeed need "some backslashes," which is where things get a bit picky.
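For illustration, here is one way to run the sed command from Perl while sidestepping most of the backslash trouble. The list form of a piped open passes the arguments straight to sed without a shell, so the expression only has to survive Perl's single quotes. This is a sketch that assumes a Unix-like system with a sed supporting -E (GNU and BSD sed both do). The sub name is hypothetical, and the sub prints the result rather than editing in place. The character class is widened to [A-Za-z_], on the assumption that lowercase names such as $xyz should match too.

```perl
use strict;
use warnings;

# Run sed as an external command without a shell: the list form of open
# hands each element to sed as one argument, so no shell quoting applies.
sub brace_vars_with_sed {
    my ($file) = @_;
    my @cmd = ( 'sed', '-E', 's/\$([A-Za-z_]+)/${\1}/g', $file );
    open my $sed, '-|', @cmd or die "Cannot start sed: $!";
    my $output = do { local $/; <$sed> };    # slurp all of sed's output
    close $sed or die "sed failed (status $?)\n";
    return $output;
}

print brace_vars_with_sed($_) for @ARGV;
```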
But why run a sed command from a Perl script at all? Do the job in Perl:
use warnings;
use strict;
use File::Copy 'move';
my $file = 'filename';
my $out_file = 'new_' . $file;
open my $fh, '<', $file or die "Can't open $file: $!";
open my $fh_out, '>', $out_file or die "Can't open $out_file: $!";
while (<$fh>)
{
s/\$( [^{] [a-z_]* )/\${$1}/gix;
print $fh_out $_;
}
close $fh_out;
close $fh;
move $out_file, $file or die "Can't move $out_file to $file: $!";
The regex uses a negated character class, [^...], to match any character other than { right after the $, which excludes already-braced words. It then matches a sequence of letters or underscores, as in the question (possibly empty, since the first non-{ character already provides at least one).
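Note that [^{] accepts any character at all, so text such as $5, or a $ followed by a comma or space, would also get braces. If that matters, a stricter variant can require a letter or underscore right after the $. That requirement alone already rules out ${...}. This rule for what counts as a variable name is my assumption, not something the question specifies, and brace_vars is a hypothetical name:

```perl
use strict;
use warnings;

# Wrap $name as ${name}. The first character after $ must be a letter or
# underscore (so "${...}", "$5" and a lone "$" are left alone); the rest of
# the name may contain letters, digits and underscores (\w).
sub brace_vars {
    my ($text) = @_;
    $text =~ s/\$([A-Za-z_]\w*)/\${$1}/g;
    return $text;
}

print brace_vars('$xyz $abc_xbs $ab,$cd ${abcd} $5'), "\n";
```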
With Perl 5.14+ you can use the non-destructive /r modifier:
print $fh_out s/\$([^{][a-z_]*)/\${$1}/gir;
which returns the changed string (leaving the original unchanged), ready for the print.
The output file, which in the end is moved over the original, should really be created with File::Temp. Overwriting the original this way changes $file's inode number; if that's a concern, see this post for an example of how to keep the original inode.
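A minimal sketch of the File::Temp version follows, using the same substitution as above. The temporary file is created in the same directory as the target, so the final move is a rename within one filesystem. Because tempfile creates files with mode 0600, the original permissions are copied across before the move. The sub name brace_vars_in_file is only for illustration.

```perl
use strict;
use warnings;
use File::Basename qw( dirname );
use File::Copy qw( move );
use File::Temp qw( tempfile );

# Rewrite $file via a File::Temp file in the same directory, so the final
# move is a plain rename. Like the code above, $file still gets a new inode.
sub brace_vars_in_file {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open $file: $!";
    my ( $out, $tmp ) = tempfile( DIR => dirname($file), UNLINK => 0 );
    while (<$in>) {
        print {$out} s/\$([^{][a-z_]*)/\${$1}/gir;   # /r needs Perl 5.14+
    }
    close $in;
    close $out or die "Can't write $tmp: $!";
    # tempfile() creates the file with mode 0600; copy the original mode over.
    chmod( ( stat $file )[2] & 07777, $tmp ) or die "Can't chmod $tmp: $!";
    move( $tmp, $file ) or die "Can't move $tmp to $file: $!";
    return;
}

brace_vars_in_file($_) for @ARGV;
```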
A one-liner (command-line) version, for quick testing:
perl -wpe's/\$([^{][a-z_]*)/\${$1}/gi' file
This only prints to the console. To change the original, add -i (in-place), or -i.bak to keep a backup.
A reasonable question came up: "Isn't there a shorter way?"
Here is one, using the handy Path::Tiny, for a file that isn't too large to read into a string:
use warnings;
use strict;
use Path::Tiny;
my $file = 'filename';
my $new_content = path($file)->slurp =~ s/\$([^{][a-z_]*)/\${$1}/gir;
path($file)->spew( $new_content );
The first line reads the file into a string, on which the replacement runs; the changed text is returned and assigned to a variable. Then that variable with new text is written out over the original.
The two lines can be squeezed into one, by putting the expression from the first instead of the variable in the second. But opening the same file twice in one (complex) statement isn't exactly solid practice and I wouldn't recommend such code.
However, since version 0.077 of the module you can neatly do:
path($file)->edit_lines( sub { s/\$([^{][a-z_]*)/\${$1}/gi } );
or use edit to slurp the file into a string and apply the callback to it.
So this cuts it to one nice line after all.
I'd like to add that shaving off lines of code is mostly not worth the effort, and it can certainly lead to trouble if it distracts even a bit from the code's structure and correctness. However, Path::Tiny is a good module and this use of it is legitimate, while it does shorten things quite a bit.

Printing all files in a directory in Perl - will not work

So I am new to Perl and trying to simply open a directory and list all its files. When I run the very simple code below to print everything in /usr/bin, it does not work. No matter what I try, I keep getting 'Could not open /usr/bin: No such file or directory'.
Any help would be much appreciated!
#!/usr/bin/perl
$indir = "/usr/bin";
# read in all files from the directory
opendir (DIR, @indir) or die "Could not open $indir: $!\n";
while ($filename = readdir(DIR)) {
print "$filename\n";
}
closedir(DIR);
Here is another case where the very basic troubleshooting step of adding use strict; and use warnings; was omitted; it would have told you exactly what was wrong:
Global symbol "@indir" requires explicit package name (did you forget to declare "my @indir"?)
Of course, you'd also have to fix a few other errors (e.g. declare my $indir = '/usr/bin'; and pass $indir rather than @indir to opendir).
I would also suggest that readdir is not well suited to this job, and would tend to recommend glob:
#!/usr/bin/env perl
use strict;
use warnings;
my $indir = "/usr/bin";
# read in all files from the directory
foreach my $filename ( glob "$indir/*" ) {
print "$filename\n";
}
Note how this differs: it prints the full path to each file, and it omits hidden entries (names starting with a dot, including . and ..). In my opinion this is more generally useful, not least because another really common error is to call open my $fh, '<', $filename or die $! on a bare readdir name, forgetting that the file is not in the current working directory.
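If you do want to stay with readdir, here is a sketch of the fixes applied together. It uses a lexical directory handle, skips . and .., and joins each name onto the directory so that later opens work regardless of the current working directory. Unlike glob "$indir/*", this version keeps hidden dotfiles. The sub name list_dir is hypothetical.

```perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );

# readdir returns bare names (including . and ..), so skip those two and
# prefix each remaining name with the directory to get a usable path.
sub list_dir {
    my ($indir) = @_;
    opendir my $dh, $indir or die "Could not open $indir: $!\n";
    my @paths = map  { catfile( $indir, $_ ) }
                grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return sort @paths;
}

# Usage: perl list_dir.pl /usr/bin
print "$_\n" for map { list_dir($_) } @ARGV;
```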

GetAttributes uses wrong working directory in subthread

I used File::Find to traverse a directory tree and Win32::File's GetAttributes function to look at the attributes of files found in it. This worked in a single-threaded program.
Then I moved the directory traversal into a separate thread, and it stopped working. GetAttributes failed on every file with "The system cannot find the file specified" as the error message in $^E.
I traced the problem to the fact that File::Find uses chdir, and apparently GetAttributes doesn't use the current directory. I could work around this by passing it an absolute path, but then I could run into path length limits, and long paths are definitely going to be present where this script will run, so I really need to take advantage of chdir and relative paths.
To demonstrate the problem, here is a script which creates a file in the current directory and another file in a subdirectory, chdirs to the subdirectory, and looks for the file in 3 ways: system("dir"), open, and GetAttributes.
When the script is run without arguments, dir shows the subdirectory, open finds the file in the subdirectory, and GetAttributes returns its attributes successfully. When run with --thread, all the tests are done in a subthread, and the dir and open still work, but the GetAttributes fails. Then it calls GetAttributes on the file that is in the original directory (which we have chdir'ed out of) and it finds that one! Somehow GetAttributes is using the original working directory of the process - or maybe the working directory of the main thread - unlike all the other file operations.
How can I fix this? I can guarantee that the main thread won't do any chdir'ing, if that matters.
use strict;
use warnings;
use threads;
use Data::Dumper;
use Win32::File qw/GetAttributes/;
sub doit
{
chdir("testdir") or die "chdir: $!\n";
system "dir";
my $attribs;
open F, '<', "file.txt" or die "open: $!\n";
print "open succeeded. File contents:\n-------\n", <F>, "\n--------\n";
close F;
my $x = GetAttributes("file.txt", $attribs);
print Dumper [$x, $attribs, $!, $^E];
if(!$x) {
# If we didn't find the file we were supposed to find, how about the
# bad one?
$x = GetAttributes("badfile.txt", $attribs);
if($x) {
print "GetAttributes found the bad file!\n";
if(open F, '<', "badfile.txt") {
print "opened the bad file\n";
close F;
} else {
print "But open didn't open it. Error: $! ($^E)\n";
}
}
}
}
# Setup
-d "testdir" or mkdir "testdir" or die "mkdir testdir: $!\n";
if(!-f "badfile.txt") {
open F, '>', "badfile.txt" or die "create badfile.txt: $!\n";
print F "bad\n";
close F;
}
if(!-f "testdir/file.txt") {
open F, '>', "testdir/file.txt" or die "create testdir/file.txt: $!\n";
print F "hello\n";
close F;
}
# Option 1: do it in the main thread - works fine
if(!(@ARGV && $ARGV[0] eq '--thread')) {
doit();
}
# Option 2: do it in a secondary thread - GetAttributes fails
if(@ARGV && $ARGV[0] eq '--thread') {
my $thr = threads->create(\&doit);
$thr->join();
}
Eventually, I figured out that perl is maintaining some kind of secondary cwd that only applies to perl built-in operators, while GetAttributes is using the native cwd. I don't know why it does this or why it only happens in the secondary thread; my best guess is that perl is trying to emulate the unix rule of one cwd per process, and failing because the Win32::* modules don't play along.
Whatever the reason, it's possible to work around it by forcing the native cwd to be the same as perl's cwd whenever you're about to do a Win32::* operation, like this:
use Cwd;
use Win32::FindFile qw/SetCurrentDirectory/;
...
SetCurrentDirectory(getcwd());
Arguably File::Find should do this when running on Win32.
Of course, this only makes the "pathname too long" problem worse, because now every directory you visit becomes the target of an absolute-path SetCurrentDirectory. If you try to work around that with a series of smaller SetCurrentDirectory calls, you have to figure out a way to get back to where you came from, which is hard when you don't even have fchdir.

Calculating CRC32 checksum of a file

I'm trying to calculate the CRC32 checksum of a file for use with the Mod_zip module. I tried to do this with PHP but failed, and even if it had worked, it wouldn't be efficient for larger files.
I also tried the Linux cksum command, but it calculates a different CRC checksum, not CRC32.
I found that Perl on Linux can be used to calculate the CRC32 of a file. If that is possible, I could use shell_exec to import the output into my PHP application. How can I do this?
Have you looked at Digest::CRC? From the documentation: "It contains wrapper functions with the correct parameters for CRC-CCITT, CRC-16, CRC-32 and CRC-64, as well as the CRC used in OpenPGP's ASCII-armored checksum."
use strict;
use warnings;
use Digest::CRC;
my $ctx = Digest::CRC->new( type => 'crc32' );
open my $fh, '<:raw', $ARGV[0] or die $!;
$ctx->addfile(*$fh);
close $fh;
print $ctx->hexdigest, "\n";
Command-line usage: perl script.pl inFile
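If installing Digest::CRC is a problem, the core Compress::Zlib module also provides crc32, which computes the same CRC-32 used by zip and zlib. The sketch below reads the file in chunks so large files are never held in memory, and it prints the result as zero-padded lowercase hex. Whether mod_zip expects exactly this format is an assumption worth checking against its documentation. The sub name file_crc32 is hypothetical.

```perl
use strict;
use warnings;
use Compress::Zlib qw( crc32 );

# Compute a file's CRC-32 incrementally, 64 KiB at a time.
sub file_crc32 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $crc = crc32('');                  # CRC of empty input: the starting value
    while ( read( $fh, my $buf, 65536 ) ) {
        $crc = crc32( $buf, $crc );       # continue from the running value
    }
    close $fh;
    return sprintf '%08x', $crc;          # 8 hex digits, zero-padded
}

print file_crc32($_), "\n" for @ARGV;
```

From PHP, the output can then be picked up with shell_exec as planned, quoting the file name with escapeshellarg.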

How do I catch permission denied errors from the glob operator?

The following simple Perl script lists the contents of a directory given as an argument to the script. How, on a Linux system, can I capture permission denied errors? Currently, if this script is run on a directory that the user does not have read permission for, nothing is printed in the terminal.
#!/bin/env perl
use strict;
use warnings;
sub print_dir {
foreach ( glob "@_/*" )
{print "$_\n"};
}
print_dir @ARGV
The glob function does not have much error control, except that $! is set if the last glob fails:
glob "A/*"; # No read permission for A => "Permission denied"
print "Error globbing A: $!\n" if ($!);
If the glob goes on to find something afterwards, though, $! will not be set. For example, glob "*/*" would not report an error even if it couldn't list the contents of one of the directories.
The bsd_glob function from the standard File::Glob module allows setting a flag to enable reliable error reporting:
use File::Glob qw(bsd_glob);
bsd_glob("*/*", File::Glob::GLOB_ERR);
print "Error globbing: $!\n" if (File::Glob::GLOB_ERROR);
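For the single-directory case in the question, another option (a different technique from glob) is to open the directory yourself. When opendir fails, it sets $! to the reason, such as "Permission denied", so the error can be reported directly. This sketch mimics glob "$dir/*" by skipping dot entries. The warning text and the true/false return value are my own choices.

```perl
use strict;
use warnings;

# List the non-hidden entries of $dir, like glob "$dir/*", but warn with
# the reason ($!) when the directory cannot be opened.
# Returns true on success, false after warning on failure.
sub print_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or do {
        warn "Cannot list $dir: $!\n";
        return 0;
    };
    my @names = grep { !/^\./ } readdir $dh;   # skip ., .. and dotfiles
    closedir $dh;
    print "$dir/$_\n" for sort @names;
    return 1;
}

print_dir($_) for @ARGV;
```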
Use File::Find, which is a core module and gives you full control over how each file is handled:
#!perl
use 5.10.0;
use strict;
use warnings;
use File::Find;
find {
wanted => sub {
return if not -r $_; # skip if not readable
say $_;
},
no_chdir => 1,
}, @ARGV;
