Using PowerShell to find the differences in strings

So I'm playing around with Compare-Object, and it works fine for comparing files. But what about just strings? Is there a way to find the difference between strings? CompareTo() is good about reporting that there is a difference, but not what the difference is. For example:
PS:> $a = "PowerShell rocks"
PS:> $b = "Powershell rocks"
PS:> $a.CompareTo($b)
1
PS:> Compare-Object -ReferenceObject $a -DifferenceObject $b
PS:>
Nothing returned.
Any way to let me know about the actual difference between the strings, not just that there is a difference?

Perhaps something like this:
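function Compare-String {
    param(
        [String] $string1,
        [String] $string2
    )
    # Identical strings (case-sensitive): there is no difference to report
    if ( $string1 -ceq $string2 ) {
        return -1
    }
    # Return the index of the first character that differs
    for ( $i = 0; $i -lt $string1.Length; $i++ ) {
        if ( $string1[$i] -cne $string2[$i] ) {
            return $i
        }
    }
    # $string1 is a prefix of the longer $string2
    return $string1.Length
}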
The function returns -1 if the two strings are equal; otherwise it returns the zero-based position of the first difference between them. For case-insensitive comparisons, use -eq instead of -ceq and -ne instead of -cne.
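For comparison, here is the same first-difference idea sketched in Python (used here only as a neutral illustration; first_difference is a hypothetical name). Unlike the PowerShell version, which appears to rely on an out-of-range string index yielding $null when the second string is shorter, it handles the "one string is a prefix of the other" case explicitly.

```python
def first_difference(s1: str, s2: str) -> int:
    """Return -1 if the strings are equal, otherwise the zero-based index
    of the first position at which they differ (case-sensitive)."""
    if s1 == s2:
        return -1
    for i, (c1, c2) in enumerate(zip(s1, s2)):
        if c1 != c2:
            return i
    # No mismatch within the common length: one string is a prefix of the
    # other, so they first differ just past the end of the shorter one.
    return min(len(s1), len(s2))


print(first_difference("PowerShell rocks", "Powershell rocks"))  # 5
```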

Related

PowerShell Regex get multiple substrings between 2 strings and write them to files with sequence numbers

My question concerns this function from an old thread:
function GetStringBetweenTwoStrings($firstString, $secondString, $importPath){
    #Get content from file
    $file = Get-Content $importPath
    #Regex pattern that captures the text between the two strings
    $pattern = "$firstString(.*?)$secondString"
    #Perform the operation
    $result = [regex]::Match($file,$pattern).Groups[1].Value
    #Return result
    return $result
}
GetStringBetweenTwoStrings -firstString "Lorem" -secondString "is" -importPath "C:\Temp\test.txt"
This works for a single match of -firstString and -secondString, but how can I use this function to write every matching section, in order, to its own sequentially numbered TXT file?
The TXT file (with several sections of text):
Lorem
....
is
--> write to 001.txt
Lorem
....
is
--> write to 002.txt
and so forth.
The structure of each section should be preserved, not collapsed onto a single line.
I hope someone can help. Thanks.
The function you quote has several limitations (I've left feedback on the original answer), most notably only ever reporting one match.
Assuming an improved function named Select-StringBetween (see source code below), you can solve your problem as follows:
$index = @{ Value = 0 }
Get-ChildItem C:\Temp\test.txt |
    Select-StringBetween -Pattern 'Lorem', 'is' -Inclusive |
    Set-Content -LiteralPath { '{0:000}.txt' -f ++$index.Value }
Select-StringBetween source code:
Note: The syntax is in part patterned after Select-String. After defining the function, run Select-StringBetween -? to see its syntax; the parameter names are hopefully self-explanatory.
function Select-StringBetween {
    [CmdletBinding(DefaultParameterSetName='String')]
    param(
        [Parameter(Mandatory, Position=0)]
        [ValidateCount(2, 2)]
        [string[]] $Patterns,
        [Parameter(Mandatory, ValueFromPipelineByPropertyName, ParameterSetName='File')]
        [Alias('PSPath')]
        [string] $LiteralPath,
        [Parameter(Mandatory, ValueFromPipeline, ParameterSetName='String')]
        [string] $InputObject,
        [switch] $Inclusive,
        [switch] $SimpleMatch,
        [switch] $Trim
    )
    process {
        if ($LiteralPath) {
            $InputObject = Get-Content -ErrorAction Stop -Raw -LiteralPath $LiteralPath
        }
        if ($Inclusive) {
            $regex = '(?s)(?:{0}).*?(?:{1})' -f
                ($Patterns[0], [regex]::Escape($Patterns[0]))[$SimpleMatch.IsPresent],
                ($Patterns[1], [regex]::Escape($Patterns[1]))[$SimpleMatch.IsPresent]
        }
        else {
            $regex = '(?s)(?<={0}).*?(?={1})' -f
                ($Patterns[0], [regex]::Escape($Patterns[0]))[$SimpleMatch.IsPresent],
                ($Patterns[1], [regex]::Escape($Patterns[1]))[$SimpleMatch.IsPresent]
        }
        if ($Trim) {
            [regex]::Matches(
                $InputObject,
                $regex
            ).Value.Trim()
        }
        else {
            [regex]::Matches(
                $InputObject,
                $regex
            ).Value
        }
    }
}
Note that there's also a pending feature request on GitHub to add this functionality directly to Select-String; see GitHub issue #15136.
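The same approach, sketched in Python for illustration (select_string_between and write_numbered are hypothetical names that loosely mirror the PowerShell function's switches). One deliberate swap: the exclusive case uses a capture group instead of lookbehind, because Python's re module only supports fixed-width lookbehind, so the delimiters are consumed and results may differ from the lookaround version when delimiters overlap.

```python
import re
from pathlib import Path


def select_string_between(text, start, end, inclusive=False,
                          simple_match=False, trim=False):
    """Return every non-overlapping section of text between start and end."""
    if simple_match:
        # Treat the delimiters as literal text, like -SimpleMatch
        start, end = re.escape(start), re.escape(end)
    if inclusive:
        pattern = f"((?:{start}).*?(?:{end}))"
    else:
        pattern = f"(?:{start})(.*?)(?:{end})"
    # re.DOTALL plays the role of (?s): '.' also matches newlines, so
    # multi-line sections keep their line structure.
    sections = [m.group(1) for m in re.finditer(pattern, text, re.DOTALL)]
    return [s.strip() for s in sections] if trim else sections


def write_numbered(sections, directory="."):
    """Write each section to 001.txt, 002.txt, ... in the given directory."""
    for number, section in enumerate(sections, start=1):
        Path(directory, f"{number:03d}.txt").write_text(section, encoding="utf-8")
```

For example, reading C:\Temp\test.txt, passing its text to select_string_between with start "Lorem", end "is" and inclusive=True, and handing the result to write_numbered would produce 001.txt, 002.txt, and so on. As in the original, a delimiter such as is also matches inside words like "this"; use a pattern such as \bis\b if that matters.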

Converting Unicode string to ASCII

I have strings containing characters that are not found in ASCII, such as á, é, í, ó, and ú, and I need a function to convert them into acceptable equivalents such as a, e, i, o, and u. This is because I will be creating IIS web sites from those strings (i.e., I will be using them as domain names).
function Convert-DiacriticCharacters {
    param(
        [string]$inputString
    )
    [string]$formD = $inputString.Normalize(
        [System.Text.NormalizationForm]::FormD
    )
    $stringBuilder = New-Object System.Text.StringBuilder
    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    for ($i = 0; $i -lt $formD.Length; $i++) {
        $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
        if ($unicodeCategory -ne $nonSpacingMark) {
            $stringBuilder.Append($formD[$i]) | Out-Null
        }
    }
    $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}
The resulting function converts diacritics in the following way:
PS C:\> Convert-DiacriticCharacters "Ångström"
Angstrom
PS C:\> Convert-DiacriticCharacters "Ó señor"
O senor
Copied from: http://cosmoskey.blogspot.nl/2009/09/powershell-function-convert.html
Porting this answer from a C#/.NET question, a rough PowerShell translation like the following seems to work:
function Remove-Diacritics
{
    Param([string]$Text)
    $chars = $Text.Normalize([System.Text.NormalizationForm]::FormD).ToCharArray().Where{
        [System.Char]::GetUnicodeCategory($_) -ne [System.Globalization.UnicodeCategory]::NonSpacingMark
    }
    (-join $chars).Normalize([System.Text.NormalizationForm]::FormC)
}
e.g.
PS C:\> Remove-Diacritics 'abcdeéfg'
abcdeefg
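Both functions follow the same steps: decompose (FormD), drop nonspacing marks, recompose (FormC). A Python sketch of those steps, for illustration only (remove_diacritics is a hypothetical name), is below. Characters that have no canonical decomposition, such as ø or ß, pass through unchanged, so the result is not guaranteed to be ASCII; checking it (for example with str.isascii()) before using it as a domain name seems prudent.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD), then
    recompose (NFC), mirroring the PowerShell functions above."""
    decomposed = unicodedata.normalize("NFD", text)
    # "Mn" is the Unicode category Nonspacing_Mark (e.g. U+0301 COMBINING ACUTE ACCENT)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


print(remove_diacritics("Ångström"))  # Angstrom
print(remove_diacritics("Ó señor"))   # O senor
```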

How to compare strings that have an ampersand in them in PowerShell

I am using PowerShell to compare two strings that have an ampersand (&) in them (e.g., the string "Policies & Procedures").
No matter what I try, I cannot get these strings to match. I have tried trimming the strings to get rid of any extra whitespace. I have tried wrapping the string in both single and double quotes (and a combination of both):
"Policies & Procedures"
'Policies & Procedures'
"'Policies & Procedures'"
The code I am using to compare the strings is:
if ($term1 -eq $term2) {
    # do something
}
Visually, the strings are identical; however, the if statement never evaluates to true. Is there a way to compare these two strings so that it does evaluate to true?
EDIT
The context for this string comparison is looking up a term name in the taxonomy of a SharePoint site. Here is the code I am using:
function getTerm($termName) {
    foreach($term in $global:termset.Terms) {
        $termTrimmed = $term.Name.trim()
        Write-Host "term name = $termTrimmed" -foregroundcolor cyan
        if ($termTrimmed -eq $termName) {
            return $term
        }
    }
    return $null
}
I have printed both $term.Name and $termName to the screen, and they are identical. If there is no ampersand in the string, this function works; if there is an ampersand, it fails. This is how I know the ampersand is the problem.
This is a known quirk:
There are two types of ampersands that you need to be aware of when playing with SharePoint Taxonomy:
Our favorite and most loved
& ASCII Number: 38
And the impostor
＆ ASCII Number: 65286
After reading this article by Nick Hobbs, it became apparent that when you create a term it replaces the 38 ampersand with a 65286 ampersand.
This then becomes a problem if you want to do a comparison with your original source (spreadsheet, database, etc.) as they are no longer the same.
As detailed in Nick’s article, you can use the TaxonomyItem.NormalizeName method to create a "Taxonomy" version of your string for comparison:
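Try this (not tested on real SharePoint). It normalizes the search string once, so that it uses the same kind of ampersand as the stored term names:
function getTerm($termName)
{
    # Convert the search string to its "Taxonomy" form (& becomes the 65286 ampersand)
    $normalizedName = [Microsoft.SharePoint.Taxonomy.TaxonomyItem]::NormalizeName($termName)
    foreach($term in $global:termset.Terms) {
        if ($term.Name -eq $normalizedName) {
            return $term
        }
    }
    return $null
}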
Converting both strings to char arrays and comparing the Unicode values of the ampersands revealed the problem. The ampersand in the search string has a value of 38 (U+0026), while the ampersand returned from the SharePoint term store has a value of 65286 (U+FF06, the full-width ampersand), although it looks identical to a regular ampersand on screen.
The solution was to write my own string comparison function that takes the different ampersand values into account. Here is the code:
function getTerm($termName) {
    $searchChars = $termName.toCharArray()
    $size = $searchChars.Count
    foreach($term in $global:termset.Terms) {
        $match = $True
        $chars = $term.Name.trim().toCharArray()
        if ($size -eq $chars.Count) {
            for ($i = 0; $i -lt $size; $i++) {
                if ($searchChars[$i] -ne $chars[$i]) {
                    # handle the difference between a normal ampersand and a full width ampersand
                    $charCode1 = [int] $searchChars[$i]
                    $charCode2 = [int] $chars[$i]
                    if ((($charCode1 -eq 38) -or ($charCode1 -eq 65286)) -and (($charCode2 -eq 38) -or ($charCode2 -eq 65286))) {
                        continue
                    } else {
                        $match = $False
                        break
                    }
                }
            }
        } else {
            $match = $False
        }
        if ($match -eq $True) {
            return $term
        }
    }
    return $null
}
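The diagnosis and two possible fixes, sketched in Python for illustration (all names are hypothetical, and no SharePoint API is involved). differing_code_points lists each position where two strings differ, with both code points, which is how the 38 versus 65286 mismatch shows up. names_match_ampersand mirrors the answer's narrow approach of treating U+FF06 as a plain &. names_match_nfkc swaps in Unicode NFKC compatibility normalization, which maps U+FF06 (and other full-width forms) to ASCII, so it is broader than the answer's fix.

```python
import unicodedata

FULLWIDTH_AMPERSAND = "\uff06"  # U+FF06, code point 65286


def differing_code_points(a: str, b: str):
    """List (index, char_a, code_a, char_b, code_b) for every position,
    within the common length, where the two strings differ."""
    return [(i, ca, ord(ca), cb, ord(cb))
            for i, (ca, cb) in enumerate(zip(a, b)) if ca != cb]


def names_match_ampersand(a: str, b: str) -> bool:
    """Narrow fix: treat the full-width ampersand as a plain '&'."""
    def fold(s):
        return s.strip().replace(FULLWIDTH_AMPERSAND, "&")
    return fold(a) == fold(b)


def names_match_nfkc(a: str, b: str) -> bool:
    """Broader alternative: compare after NFKC compatibility normalization."""
    def fold(s):
        return unicodedata.normalize("NFKC", s.strip())
    return fold(a) == fold(b)


print(differing_code_points("Policies & Procedures", "Policies \uff06 Procedures"))
```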

How can I compare 2 variables if I don't know what they are?

If I have two variables $x and $y somewhere in the code flow, and I don't really know whether they contain numbers or strings, how do I compare them?
I mean, for strings we use eq etc., while for numbers we use ==, <=, etc.
Also, what about greater-than/less-than comparisons?
If you don't know what they are, how can you ask if they're the same?
Specifically, do you consider these two to be the same?
"1"
"1.0"
Numerically, they both represent one, but as strings they contain different characters, so they are different.
Greater-than/less-than comparisons for strings can be done with cmp:
if ( ( $a cmp $b ) == 0 ) { print "a == b\n" }
elsif ( ( $a cmp $b ) < 0 ) { print "a < b\n" }
elsif ( ( $a cmp $b ) > 0 ) { print "a > b\n" }
To reiterate a comment above: "123" cmp "56" reports less than, because cmp compares character by character and "1" sorts before "5".
So you may want to do something like this:
if ( compareEm($a, $b) == 0 ) { print "a == b\n" }
elsif ( compareEm($a, $b) < 0 ) { print "a < b\n" }
elsif ( compareEm($a, $b) > 0 ) { print "a > b\n" }
sub compareEm {
    my ( $x, $y ) = @_;
    # Looks like an optionally signed, optionally decimal number (but not a lone ".")
    my $isnum = qr/(?=.)(?!^\.$)^[\-\+]?\d*\.?\d*$/;
    # Numeric comparison if both look numeric, string comparison otherwise
    return ( $x =~ $isnum && $y =~ $isnum ) ? $x <=> $y : $x cmp $y;
}
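The compareEm logic, sketched in Python for illustration (compare_em is a hypothetical port): compare numerically only when both values look like numbers, otherwise compare as strings, returning -1, 0 or 1 like <=> and cmp. The number pattern is an assumption: unlike the Perl $isnum regex, which also accepts a lone sign such as "+", it requires at least one digit.

```python
import re

# Optional sign, then digits with an optional decimal point, or ".digits".
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)")


def compare_em(x: str, y: str) -> int:
    """Return -1, 0 or 1: numeric comparison (like <=>) when both values
    look like numbers, string comparison (like cmp) otherwise."""
    if NUMBER.fullmatch(x) and NUMBER.fullmatch(y):
        x, y = float(x), float(y)
    return (x > y) - (x < y)


print(compare_em("123", "56"))  # 1
```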
Use eq; it will always work.
If you don't know whether your data is strings or numbers then it's usually perfectly safe to treat them as strings. If you want to treat your data as numbers, then you should probably validate the input to ensure that it is in the correct format.

How to compare two strings character by character in Perl

Basically, I want to compare
$a = "ABCDE";
$b = "--(-)-";
and get the output CE.
That is, wherever a parenthesis occurs in $b, the character of $a at the same position should be taken.
One of the rare uses of the bitwise or-operator.
# magic happens here ↓
perl -E'say (("ABCDE" | "--(-)-" =~ tr/-()/\377\000/r) =~ tr/\377//dr)'
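prints CE on Perl versions before 5.28. Starting with Perl 5.28, -E also enables the bitwise feature, under which | always treats its operands as numbers; there, use the string-or operator |. instead.
Use this for golfing purposes only; AHA’s solution is much more maintainable.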
Simple regex and pos solution:
my $str = "ABCDE";
my $pat = "--(-)-";
my @list;
while ($pat =~ /(?=[()])/g) {
    last if pos($pat) >= length($str); # Required to prevent reading past the end of $str
    my $char = substr($str, pos($pat), 1);
    push @list, $char;
}
print @list;
Note the use of lookahead to get the position before the matching character.
Combined with Axeman's use of the @- variable, we get an alternative loop:
while ($pat =~ /[()]/g) {
    last if $-[0] >= length($str);
    my $char = substr($str, $-[0], 1);
    push @list, $char;
}
This is pretty much mentioned in the documentation for @-:
After a match against some variable $var :
....
$& is the same as substr($var, $-[0], $+[0] - $-[0])
In other words, the matched string $& equals that substring expression. If you replace $var with another string, you would get the characters matching the same positions.
In my example, the expression $+[0] - $-[0] (offset of the end of the match minus offset of its start) is always 1, since the regex always matches exactly one character.
QED.
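The position-based idea, sketched in Python for illustration (chars_at_parens is a hypothetical name): re.finditer yields one match object per parenthesis, and its start() offset plays the role of $-[0]. Positions past the end of the source string stop the loop, like the last if guard in the Perl loops.

```python
import re


def chars_at_parens(source: str, pattern: str) -> str:
    """Collect source[i] for every index i where pattern has '(' or ')'."""
    picked = []
    for m in re.finditer(r"[()]", pattern):
        if m.start() >= len(source):
            break  # no corresponding character in source
        picked.append(source[m.start()])
    return "".join(picked)


print(chars_at_parens("ABCDE", "--(-)-"))  # CE
```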
This uses the idea that you can scan one string for positions and then take the characters at those positions from the other string. @s is a reusable list of positions.
use strict;
use warnings;
sub chars {
    my $source = shift;
    return unless @_;
    my @chars = map { substr( $source, $_, 1 ) } @_;
    return wantarray ? @chars : join( '', @chars );
}
my $a = "ABCDE";
my $b = "--(-)-";
my @s;
push @s, @- while $b =~ m/[()]/g;
my $res = chars( $a, @s );
print "$res\n";
Much faster than all the other solutions except daxim's, and almost as fast as daxim's, but without daxim's restriction on characters with code points of 255 and above:
my $pat = $b =~ s/[^()]/.?/gr =~ s/[()]/(.?)/gr;
my $c = join '', $a =~ /^$pat/s;
It changes
--(-)-
to
.?.?(.?).?(.?).?
and then uses the result as a regex pattern whose capture groups extract the desired characters.
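The pattern-building trick, sketched in Python for illustration (extract_by_template is a hypothetical name): every parenthesis in the template becomes a capture group (.?), every other character becomes .?, and the captured characters are joined. re.match anchors at the start like ^, and re.DOTALL corresponds to Perl's /s.

```python
import re


def extract_by_template(source: str, template: str) -> str:
    """Return the characters of source that line up with '(' or ')' in template."""
    regex = "".join("(.?)" if ch in "()" else ".?" for ch in template)
    m = re.match(regex, source, re.DOTALL)
    # Groups that matched nothing are '' and add nothing to the join.
    return "".join(m.groups()) if m else ""


print(extract_by_template("ABCDE", "--(-)-"))  # CE
```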
This is easy to accomplish using each_array, each_arrayref or pairwise from List::MoreUtils:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw( min );
use List::MoreUtils qw( each_array pairwise );
my $string = 'ABCDE';
my $pattern = '--(-)-';
my @string_chars = split //, $string;
my @pattern_chars = split //, $pattern;
# Equalise length
my $min_length = min $#string_chars, $#pattern_chars;
$#string_chars = $#pattern_chars = $min_length;
my $ea = each_array @string_chars, @pattern_chars;
while ( my ( $string_char, $pattern_char ) = $ea->() ) {
    print $string_char if $pattern_char =~ /[()]/;
}
Using pairwise:
{
    no warnings qw( once );
    print pairwise {
        $a if $b =~ /[()]/;
    } @string_chars, @pattern_chars;
}
Without using List::MoreUtils:
for ( 0 .. $#string_chars ) {
    print $string_chars[$_] if $pattern_chars[$_] =~ /[()]/;
}
Thanks to TLP for discovering the $#array assignment technique, without which this solution would have been longer and more complicated. :-)
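The lockstep walk, sketched in Python for illustration (pick_where and its markers parameter are hypothetical): zip() stops at the shorter string, which is what truncating both arrays with $#array achieves in the Perl version.

```python
def pick_where(source: str, pattern: str, markers: str = "()") -> str:
    """Keep each character of source whose counterpart in pattern is a marker."""
    return "".join(s for s, p in zip(source, pattern) if p in markers)


print(pick_where("ABCDE", "--(-)-"))  # CE
```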
#!/usr/bin/perl
use strict;
use warnings;
my $a = "ABCDE";
my $b = "--(-)-";
my ($i, $c, $x, $y) = 0;
$c .= $y =~ /\(|\)/ ? $x : "" while ($x = substr $a, $i, 1) && ($y = substr $b, $i++, 1);
print "$c\n";
