parsing string using powershell - string

I am moving .aspx files from an old system to a new one using PowerShell. I need to parse each page and change the href of its hyperlink tags as follows.
Old system:
href=/ranet/templates/page____9372.aspx
New system:
/newfolder/folder1/9372.aspx

Try this:
# $input is an automatic variable in PowerShell, so a different name is used here
$oldLink = "href=/ranet/templates/page____9372.aspx"
$new = "/newfolder/folder1/"
$array = $oldLink.Split('_')
$array2 = ($array[$array.Count-1]).Split('.')
$newLine = $new+$array2[0]+".aspx"
$oldLink
$newLine
Actually, this is not such a great answer, as you have to ensure that there are no other underscore characters in the page!
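One way around the underscore caveat is to match the whole old link with a regular expression instead of splitting on "_". The following is a minimal Python sketch of that idea, not the original answer's method; the helper name rewrite_links is hypothetical, and keeping the "href=" prefix in the output is an assumption.

```python
import re

# Matches only the old link shape from the question: "page", one or more
# underscores, a numeric id, then ".aspx". Other underscores are left alone.
OLD_LINK = re.compile(r"href=/ranet/templates/page_+(\d+)\.aspx")

def rewrite_links(page_text, new_prefix="/newfolder/folder1/"):
    """Return page_text with every old-style href pointed at new_prefix."""
    return OLD_LINK.sub(
        lambda m: "href=" + new_prefix + m.group(1) + ".aspx",
        page_text,
    )

if __name__ == "__main__":
    print(rewrite_links("href=/ranet/templates/page____9372.aspx"))
```

Because the pattern anchors on the full path, underscores elsewhere in the page do not affect the result.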

Related

php strips [[:char_class:]] from the string

When concatenating mysql regex character classes in php they disappear from the resulting string i.e.:
$regexp_arr = array('(word1)', '(word2)');
$value = 'word3';
$regexp_str = implode('[[:space:]]', $regexp_arr);
$v1 = '[[:<:]](' . $value . ')';
echo $regexp_str;
// gives
'(word1)(word2)';
// instead of
'(word1)[[:space:]](word2)'
echo $v1;
// gives
'(word3)'
//instead of
'[[:<:]](word3)'
I've tried with double quotation marks ("), and the result is still the same.
Is there a special way to concatenate this in PHP? Why is '[[:char_class:]]' getting stripped?
The server's PHP version is 5.6.36.
In MODX, [[ and ]] are special characters used to indicate they are tags MODX needs to process. Even when you echo or retrieve it from the database, MODX will process them when rendering.
For debugging, you can follow up your echo with an exit().
echo $regexp_str;
exit();
That short-circuits MODX and gives you the actual value of the string including the square brackets.
If you want the value to be visible in a MODX-rendered resource or template, then you'll have to replace them with their html entities first:
$regexp_str = str_replace(['[',']'], ['&#91;', '&#93;'], $regexp_str);
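The same bracket-to-entity mapping can be illustrated outside PHP. This is only a Python sketch of the replacement step (the function name escape_modx_brackets is hypothetical); it shows that the entities decode back to the original brackets, which is what a browser does when it renders them.

```python
import html

def escape_modx_brackets(text):
    """Replace square brackets with their numeric HTML entities."""
    return text.replace("[", "&#91;").replace("]", "&#93;")

if __name__ == "__main__":
    escaped = escape_modx_brackets("(word1)[[:space:]](word2)")
    print(escaped)                 # entity form, contains no literal [[ or ]]
    print(html.unescape(escaped))  # what a browser would display
```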

How to duplicate objects in variable with different variable names for PowerShell

I would like to create duplicates of the same objects under different variable names.
The objects I need to copy are archive entries from DotNetZip.
The following code is the full implementation:
[System.Reflection.Assembly]::LoadFrom($zipFileDirectory + "Ionic.Zip.dll")
$zipfile = [Ionic.Zip.ZipFile]::Read($zipfilename)
foreach ($file in $zipfile)
{
$strSearchItem = [string]$file.FileName
$strSearchItem = $strSearchItem.TrimEnd("/")
$newfile = $file.PSObject.Copy()
for ($i = 0; $i -lt $newfile.Count; $i++)
{
if ($strSearchItem -like $searchFolderName + "/*")
{
$newFile[$i].FileName = $newFile[$i].FileName.Replace($searchFolderName + "/", "")
$newFile[$i].Extract($fileDestination, [Ionic.Zip.ExtractExistingFileAction]::OverWriteSilently)
}
}
}
$zipfile.Dispose()
For this purpose I need to be able to copy $file as a separate entity from $zipfile, or at least retain the original default value of $file (making it read-only doesn't seem viable). Is there any workaround for this?
Thanks in advance.
Maybe:
$newFile = $zipfile.PSObject.Copy()
In reply to @bdrc's comment, here is an example that adds a comment to the zip:
PS>$zf=[ionic.zip.zipfile]::read("c:\temp\zip\test.zip")
PS>$zf.comment
PS>$zf2=$zf.psobject.copy()
PS>$zf2.comment="TEST COMMENT"
PS>$zf2.save("c:\temp\test2.zip")
When I open the original file with 7-Zip I don't see the comment, but I can see it in the new zip file.
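The underlying goal (extract the entries under one folder with that folder prefix stripped, while leaving the archive's own entry names untouched) can also be met by computing the target path separately instead of renaming the entry. Below is a minimal sketch of that approach using Python's standard zipfile module rather than DotNetZip; the function name extract_folder and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath

def extract_folder(zip_path, folder, destination):
    """Extract entries under `folder`, dropping the folder prefix.

    Entry names inside the archive are never modified; only the
    on-disk target path is computed from them.
    """
    prefix = folder.rstrip("/") + "/"
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            relative = info.filename[len(prefix):]
            parts = PurePosixPath(relative).parts
            if ".." in parts:
                continue  # skip entries that would escape the destination
            target = Path(destination, *parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))  # overwrites silently
            written.append(relative)
    return written
```

Since nothing is assigned back to the entry, no copy of the entry object is needed.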

String Replacement in Program

I have a perl script (new.pl) that needs to run an instance of a different perl script (old.pl). The issue I'm having is that I want to replace all instances of a certain string in the old.pl script without modifying it or having to create a new file with the changes made to it.
So let's say I have this:
my $replacementVar = "replace";
my $originalString = "string to be replaced";
# do something to replace all instances of the original string with the replacement var
# run old.pl
Can this be done without modifying old.pl? That is, can I make a temporary change to the string when I run it, so that the string reverts to its default value after it's done running?
I should note that I can't go in and change any of the code in old.pl.
One possible way is to use eval:
use strict;
use warnings;
my $replacementVar = "replace";
my $originalString = "string to be replaced";
open my $olds, "<", "old.pl" or die("$!");
my $contents = join("", <$olds>);
close $olds;
$contents =~ s/\Q$originalString/$replacementVar/g;
local @ARGV=('param1', 'param2');
eval "$contents; 1" or die $@;
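The technique here is: read the script's source into memory, patch the string there, and evaluate the patched source, so the file on disk is never changed. As an illustration only, here is the same pattern sketched in Python for a Python script; run_patched and its parameters are hypothetical names, and this is not a replacement for the Perl above.

```python
import sys

def run_patched(script_path, original, replacement, argv=()):
    """Run a script with one string replaced in memory only."""
    with open(script_path, encoding="utf-8") as handle:
        source = handle.read()
    patched = source.replace(original, replacement)

    # Give the script its own globals and temporarily swap sys.argv,
    # similar in spirit to `local @ARGV` in the Perl version.
    namespace = {"__name__": "__main__", "__file__": script_path}
    saved_argv = sys.argv
    sys.argv = [script_path, *argv]
    try:
        exec(compile(patched, script_path, "exec"), namespace)
    finally:
        sys.argv = saved_argv
    return namespace
```

As with the Perl eval, the replacement is purely textual, so it also changes any other occurrence of the string in the source, including inside comments or identifiers.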

Batch Rename Files - Append Lines 1 & 3

I'm using Windows 7. I have a bunch of text files, each containing one email message. Each starts this way:
FROM: Person
TO: Another Person
DATE: 01-Jan-11 at 18:12:00
SUBJECT: Whatever
I want to rename these files so that their names look like this:
2011-01-01 18.12 Email from Person to Another Person re Whatever.txt
Batch programming is all I know, and I don't know it very well. For purposes of restraining this to a project that I can understand quickly, I think my best solution will be to extract the essential data into a text file that I can then massage into a batch renaming file.
In that case, what I'm looking for is a batch file that will extract the data into single lines in a text file that I can then massage into shape with global edits. In other words, I think I'm looking for text lines in this format:
[current filename] [extracted date and time string] [from] [to] [subject]
Example:
file01.txt 01-Jan-11 at 18:12:00 from Person to Another Person re Whatever
If I've got lines like that, I can parse them into renaming commands pretty quickly in Excel.
Thanks!
Given that you're using Windows 7, I thought I'd suggest an alternative. Windows PowerShell is a very useful command-line tool that can be used for a ton of stuff. I think I solved your complete problem:
$folder = "C:\..."
$regex = "FROM: (.*) TO: (.*) DATE: (.*) at (.*) SUBJECT: (.*)"
$files = Get-ChildItem $folder *.txt
ForEach ($file in $files) {
# The four header fields are on separate lines, so read four lines and join them with spaces
$line = (Get-Content $file.FullName -TotalCount 4) -join " "
$match = ([regex]$regex).matches($line)[0]
$date = [DateTime]($match.Groups[3]).Value + [TimeSpan]($match.Groups[4]).Value
$from = ($match.Groups[1]).Value
$to = ($match.Groups[2]).Value
$subject = ($match.Groups[5]).Value
# You can change the naming format in the brackets below
Rename-Item $file.FullName -NewName ( $date.ToString("yyyy-MM-dd_HH-mm-ss") + " Email From " + $from + " to " + $to + " RE " + $subject + ".txt")
}
It makes a few assumptions (like a match will always be found). You can easily adjust naming format and other things. Save this code as a script (.ps1) and run it in the Powershell prompt (powershell.exe)
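The header-to-filename step can be sketched on its own, separate from the file handling. The Python below is a minimal illustration that assumes the four header lines appear exactly as in the question and that month names are in English; the function name build_name is hypothetical, and the output follows the naming format the question asked for.

```python
import re
from datetime import datetime

def build_name(header_lines):
    """Build the new file name from the first four lines of a message."""
    fields = {}
    for line in header_lines[:4]:
        key, _, value = line.partition(":")  # split at the first colon only
        fields[key.strip().upper()] = value.strip()

    when = datetime.strptime(fields["DATE"], "%d-%b-%y at %H:%M:%S")
    name = "{} Email from {} to {} re {}.txt".format(
        when.strftime("%Y-%m-%d %H.%M"),
        fields["FROM"],
        fields["TO"],
        fields["SUBJECT"],
    )
    # Replace characters that Windows does not allow in file names.
    return re.sub(r'[\\/:*?"<>|]', "_", name)

if __name__ == "__main__":
    print(build_name([
        "FROM: Person",
        "TO: Another Person",
        "DATE: 01-Jan-11 at 18:12:00",
        "SUBJECT: Whatever",
    ]))
```

A subject such as "Re: Budget?" would otherwise make the rename fail, which is why the last step replaces reserved characters.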

Split a PDF by Bookmarks?

I need to process PDFs that were each created by merging multiple PDFs. In each merged PDF, a bookmark marks the place where each part starts.
Is there any way to automatically split this up by bookmarks with a script?
We only have the bookmarks to indicate the parts, not the page numbers, so we would need to infer the page numbers from the bookmarks. A Linux tool would be best.
pdftk can be used to split the PDF file and extract the page numbers of the bookmarks.
To get the page numbers of the bookmarks do
pdftk in.pdf dump_data
and make your script read the page numbers from the output.
Then use
pdftk in.pdf cat A-B output out_A-B.pdf
to get the pages from A to B into out_A-B.pdf.
The script could be something like this:
#!/bin/bash
infile=$1 # input pdf
outputprefix=$2
[ -e "$infile" -a -n "$outputprefix" ] || exit 1 # Invalid args
pagenumbers=( $(pdftk "$infile" dump_data | \
grep '^BookmarkPageNumber: ' | cut -f2 -d' ' | uniq)
end )
for ((i=0; i < ${#pagenumbers[@]} - 1; ++i)); do
a=${pagenumbers[i]} # start page number
b=${pagenumbers[i+1]} # end page number
[ "$b" = "end" ] || b=$((b-1))
pdftk "$infile" cat $a-$b output "${outputprefix}"_$a-$b.pdf
done
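The core of the script is turning the BookmarkPageNumber lines into page ranges. That step can be sketched in isolation, here in Python so it can be checked without pdftk installed; the function name bookmark_ranges is hypothetical. Unlike the uniq call above, this sketch sorts and de-duplicates the page numbers, which is an assumption about what is wanted when bookmarks are out of order.

```python
import re

def bookmark_ranges(dump_data_text):
    """Turn `pdftk in.pdf dump_data` output into "A-B" page ranges.

    Each range starts at a bookmarked page and ends one page before
    the next bookmarked page; the last range ends at "end".
    """
    pages = sorted({
        int(number)
        for number in re.findall(
            r"^BookmarkPageNumber: (\d+)", dump_data_text, flags=re.MULTILINE
        )
    })
    ranges = []
    for index, start in enumerate(pages):
        if index + 1 < len(pages):
            end = str(pages[index + 1] - 1)
        else:
            end = "end"
        ranges.append(f"{start}-{end}")
    return ranges
```

Each returned string can be passed as the range argument in pdftk in.pdf cat A-B output out_A-B.pdf.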
There's a command-line tool written in Java called Sejda, whose splitbybookmarks command does exactly what you asked. Because it's Java it runs on Linux, and because it's a command-line tool you can call it from a script.
Disclaimer
I'm one of the authors
There are programs such as A-PDF Split that can do that for you:
A-PDF Split is a very simple, lightning-quick desktop utility program that lets you split any Acrobat pdf file into smaller pdf files. It provides complete flexibility and user control in terms of how files are split and how the split output files are uniquely named. A-PDF Split provides numerous alternatives for how your large files are split - by pages, by bookmarks and by odd/even page. Even you can extract or remove part of a PDF file. A-PDF Split also offers advanced defined splits that can be saved and later imported for use with repetitive file-splitting tasks. A-PDF Split represents the ultimate in file splitting flexibility to suit every need.
A-PDF Split works with password-protected pdf files, and can apply various pdf security features to the split output files. If needed, you can recombine the generated split files with other pdf files using a utility such as A-PDF Merger to form new composite pdf files.
A-PDF Split does NOT require Adobe Acrobat, and produces documents compatible with Adobe Acrobat Reader Version 5 and above.
Edit:
I also found a free, open-source program here, if you do not want to pay.
Here's a little Perl program I use for the task. Perl isn't special; it's just a wrapper around pdftk to interpret its dump_data output to turn it into page numbers to extract:
#!perl
use v5.24;
use warnings;
use Data::Dumper;
use File::Path qw(make_path);
use File::Spec::Functions qw(catfile);
my $pdftk = '/usr/local/bin/pdftk';
my $file = $ARGV[0];
my $split_dir = $ENV{PDF_SPLIT_DIR} // 'pdf_splits';
die "Can't find $ARGV[0]\n" unless -e $file;
# Read the data that pdftk spits out.
open my $pdftk_fh, '-|', $pdftk, $file, 'dump_data';
my @chapters;
while( <$pdftk_fh> ) {
state $chapter = 0;
next unless /\ABookmark/;
if( /\ABookmarkBegin/ ) {
my( $title ) = <$pdftk_fh> =~ /\ABookmarkTitle:\s+(.+)/;
my( $level ) = <$pdftk_fh> =~ /\ABookmarkLevel:\s+(.+)/;
my( $page_number ) = <$pdftk_fh> =~ /\ABookmarkPageNumber:\s+(.+)/;
# I only want to split on chapters, so I skip higher
# level numbers (higher means more nesting, 1 is lowest).
next unless $level == 1;
# If you have front matter (preface, etc) then this numbering
# will be off. Chapter 1 might be called Chapter 3.
push @chapters, {
title => $title,
start_page => $page_number,
chapter => $chapter++,
};
}
}
# The end page for one chapter is one before the start page for
# the next chapter. There might be some blank pages at the end
# of the split for PDFs where the next chapter needs to start on
# an odd page.
foreach my $i ( 0 .. $#chapters - 1 ) {
my $last_page = $chapters[$i+1]->{start_page} - 1;
$chapters[$i]->{last_page} = $last_page;
}
$chapters[$#chapters]->{last_page} = 'end';
make_path $split_dir;
foreach my $chapter ( @chapters ) {
my( $start, $end ) = $chapter->@{qw(start_page last_page)};
# slugify the title to use it as a filename
my $title = lc( $chapter->{title} =~ s/[^a-z]+/-/gri );
my $path = catfile( $split_dir, "$title.pdf" );
say "Outputting $path";
# Use pdftk to extract that part of the PDF
system $pdftk, $file, 'cat', "$start-$end", 'output', $path;
}
