Converting Unicode string to ASCII - string

I have strings containing characters which are not found in ASCII; such as á, é, í, ó, ú; and I need a function to convert them into something acceptable such as a, e, i, o, u. This is because I will be creating IIS web sites from those strings (i.e. I will be using them as domain names).

function Convert-DiacriticCharacters {
param(
[string]$inputString
)
[string]$formD = $inputString.Normalize(
[System.text.NormalizationForm]::FormD
)
$stringBuilder = new-object System.Text.StringBuilder
for ($i = 0; $i -lt $formD.Length; $i++){
$unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
$nonSPacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
if($unicodeCategory -ne $nonSPacingMark){
$stringBuilder.Append($formD[$i]) | out-null
}
}
$stringBuilder.ToString().Normalize([System.text.NormalizationForm]::FormC)
}
The resulting function will convert diacritics in the following way:
PS C:\> Convert-DiacriticCharacters "Ångström"
Angstrom
PS C:\> Convert-DiacriticCharacters "Ó señor"
O senor
Copied from: http://cosmoskey.blogspot.nl/2009/09/powershell-function-convert.html

This answer to a C#/.NET question seems to work in PowerShell when ported roughly like this:
function Remove-Diacritics
{
Param([string]$Text)
$chars = $Text.Normalize([System.Text.NormalizationForm]::FormD).GetEnumerator().Where{
[System.Char]::GetUnicodeCategory($_) -ne [System.Globalization.UnicodeCategory]::NonSpacingMark
}
(-join $chars).Normalize([System.Text.NormalizationForm]::FormC)
}
e.g.
PS C:\> Remove-Diacritics 'abcdeéfg'
abcdeefg
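Note that both functions above only remove combining marks. Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, which matters if the result must be pure ASCII for a host name. A minimal sketch of one way to handle that, assuming it is acceptable to simply drop whatever non-ASCII characters remain (the function name ConvertTo-AsciiText is made up for this example):

```powershell
function ConvertTo-AsciiText {
    param([string]$Text)
    # Decompose accented letters into base letter + combining mark(s)
    $decomposed = $Text.Normalize([System.Text.NormalizationForm]::FormD)
    # \p{Mn} matches the NonSpacingMark category, i.e. the combining marks
    $withoutMarks = $decomposed -replace '\p{Mn}', ''
    # Assumption: anything still outside the ASCII range is simply dropped
    $withoutMarks -replace '[^\x00-\x7F]', ''
}

ConvertTo-AsciiText 'Ó señor'       # O senor
ConvertTo-AsciiText 'Smørrebrød'    # Smrrebrd (ø has no decomposition, so it is dropped)
```

Dropping characters silently may not be what you want for every name; mapping them to a replacement (for example ø to o) would need an explicit lookup table.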

Related

PowerShell Regex get multiple substrings between 2 strings and write them to files with sequence numbers

Old thread
My question regards:
function GetStringBetweenTwoStrings($firstString, $secondString, $importPath){
#Get content from file
$file = Get-Content $importPath
#Regex pattern to compare two strings
$pattern = "$firstString(.*?)$secondString"
#Perform the operation
$result = [regex]::Match($file,$pattern).Groups[1].Value
#Return result
return $result
}
GetStringBetweenTwoStrings -firstString "Lorem" -secondString "is" -importPath "C:\Temp\test.txt"
This works for a single occurrence of -firstString and -secondString, but how can I use this function to write each of several matching sections, in order, to numbered TXT files?
txt - file(with more sections of text):
Lorem
....
is
--> write to 001.txt
Lorem
....
is
--> write to 002.txt
and so forth....
The line structure of each section should be preserved, not collapsed into one line.
I hope someone can tell me that. Thanks.
The function you quote has several limitations (I've left feedback on the original answer), most notably only ever reporting one match.
Assuming an improved function named Select-StringBetween (see source code below), you can solve your problem as follows:
$index = @{ value = 0 }
Get-ChildItem C:\Temp\test.txt |
Select-StringBetween -Pattern 'Lorem', 'is' -Inclusive |
Set-Content -LiteralPath { '{0:000}.txt' -f ++$index.Value }
Select-StringBetween source code:
Note: The syntax is in part patterned after Select-String. After defining the function, run Select-StringBetween -? to see its syntax; the parameter names are hopefully self-explanatory.
function Select-StringBetween {
[CmdletBinding(DefaultParameterSetName='String')]
param(
[Parameter(Mandatory, Position=0)]
[ValidateCount(2, 2)]
[string[]] $Patterns,
[Parameter(Mandatory, ValueFromPipelineByPropertyName, ParameterSetName='File')]
[Alias('PSPath')]
[string] $LiteralPath,
[Parameter(Mandatory, ValueFromPipeline, ParameterSetName='String')]
[string] $InputObject,
[switch] $Inclusive,
[switch] $SimpleMatch,
[switch] $Trim
)
process {
if ($LiteralPath) {
$InputObject = Get-Content -ErrorAction Stop -Raw -LiteralPath $LiteralPath
}
if ($Inclusive) {
$regex = '(?s)(?:{0}).*?(?:{1})' -f
($Patterns[0], [regex]::Escape($Patterns[0]))[$SimpleMatch.IsPresent],
($Patterns[1], [regex]::Escape($Patterns[1]))[$SimpleMatch.IsPresent]
}
else {
$regex = '(?s)(?<={0}).*?(?={1})' -f
($Patterns[0], [regex]::Escape($Patterns[0]))[$SimpleMatch.IsPresent],
($Patterns[1], [regex]::Escape($Patterns[1]))[$SimpleMatch.IsPresent]
}
if ($Trim) {
[regex]::Matches(
$InputObject,
$regex
).Value.Trim()
}
else {
[regex]::Matches(
$InputObject,
$regex
).Value
}
}
}
Note that there's also a pending feature request on GitHub to add this functionality directly to Select-String - see GitHub issue #15136
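To see the core idea in isolation: the function relies on [regex]::Matches returning every match, on the (?s) option so that . also matches newlines, and on the lazy .*? so that each match stops at the nearest end marker. A stripped-down sketch with an inline sample string (the sample text and variable names are made up here):

```powershell
# Sample input with two sections, separated by unrelated text
$text = "Lorem`nalpha`nis`nunrelated`nLorem`nbeta`nis`n"

# (?s) lets . match newlines; .*? is lazy, so each match ends at the first 'is'
$sections = @([regex]::Matches($text, '(?s)Lorem.*?is').Value)

# Pair each section with a sequence-numbered file name (nothing is written to disk here)
$index = 0
$numbered = foreach ($section in $sections) {
    $index++
    [pscustomobject]@{ FileName = '{0:000}.txt' -f $index; Content = $section }
}
$numbered.FileName   # 001.txt, 002.txt
```

Because the patterns are regular expressions, a short end marker such as 'is' also matches inside longer words; '\bis\b' restricts it to the whole word.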

How can I delete a string in a variable in PowerShell?

I have a log file like this :
[2021/04/13 18:21:57.577+02:00][VERBOSE] Finished: 0 file(s), 5.23 GB; Average Speed:17.26 MB/s.
I just want to remove everything between the "," and the "/s.". I have tried many times but cannot get it to work correctly.
Can someone help me do this in PowerShell?
If you need not only the interesting part of the log line but also want to do math on the number ('5.23 GB' in your example), you need some more splitting:
foreach ($line in (Get-Content -Path 'TheFile.log')) {
$interestingPart = ($line -split ',')[-1].Trim()
$logSize, $logAverage = $interestingPart -split ';'
$size, $unit = $logSize -split '\s+'
# calculate the size from both the number ('5.23') and the unit ('GB')
$size = [double]::Parse($size) * "1$unit"
# now you have the number to do further math on
}
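If the goal is literally to delete everything from the comma up to the closing "/s.", or to pull the number out in one step, a regular expression may be shorter. A sketch using the sample line from the question; the named groups and the $multiplier table are choices made for this example, and parsing with the invariant culture assumes the log always uses a dot as the decimal separator:

```powershell
$line = '[2021/04/13 18:21:57.577+02:00][VERBOSE] Finished: 0 file(s), 5.23 GB; Average Speed:17.26 MB/s.'

# Remove everything from the first comma through the trailing '/s.'
$shortened = $line -replace ',.*/s\.', ''
$shortened   # [2021/04/13 18:21:57.577+02:00][VERBOSE] Finished: 0 file(s)

# Capture the size and its unit, then convert to bytes
$multiplier = @{ KB = 1KB; MB = 1MB; GB = 1GB; TB = 1TB }
if ($line -match ',\s*(?<size>\d+(\.\d+)?)\s*(?<unit>[KMGT]B);') {
    $number = [double]::Parse($Matches['size'], [cultureinfo]::InvariantCulture)
    $bytes  = $number * $multiplier[$Matches['unit']]
}
```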

How to compare strings that have an ampersand in them in PowerShell

I am using PowerShell to compare two strings that have an ampersand (&) in them (i.e. the string "Policies & Procedures").
No matter what I try, I cannot get these strings to match. I have tried trimming the strings to get rid of any extra white space. I have tried wrapping the string in both single and double quotes (and a combination of both):
"Policies & Procedures"
'Policies & Procedures'
"'Policies & Procedures'"
The code I am using to compare the strings is:
if ($term1 -eq $term2) {
do something
}
Inspected visually, the strings are identical; however, the if statement never evaluates to true. Is there a way to compare these two strings so that it does evaluate to true?
EDIT
The context in which I am doing this string compare is looking for a term name in a taxonomy for a SharePoint site. Here is the code I am using:
function getTerm($termName) {
foreach($term in $global:termset.Terms) {
$termTrimmed = $term.Name.trim()
Write-Host "term name = $termTrimmed" -foregroundcolor cyan
if ($termTrimmed -eq $termName) {
return $term
}
}
return $null
}
I have printed both term.Name and termName to the screen and they are identical. If there is no ampersand in the string, this function works. If there is an ampersand this function fails. This is how I know the ampersand is the problem.
This is a known quirk:
There are two types of ampersands that you need to be aware of when
playing with SharePoint Taxonomy
Our favorite and most loved
& ASCII Number: 38
And the impostor
＆ ASCII Number: 65286
After reading this article by Nick Hobbs, it became apparent
that when you create a term it replaces the 38 ampersand with a
65286 ampersand.
This then becomes a problem if you want to do a comparison with your
original source (spreadsheet, database, etc) as they are no longer the
same.
As detailed in Nick’s article, you can use the
TaxonomyItem.NormalizeName method to create a "Taxonomy" version of
your string for comparison:
Try this (not tested on real SharePoint):
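function getTerm($termName)
{
# Normalize the search string too, so both sides use the taxonomy form of the ampersand
$termNameNormalized = [Microsoft.SharePoint.Taxonomy.TaxonomyItem]::NormalizeName($termName)
foreach($term in $global:termset.Terms) {
$termNormalized = [Microsoft.SharePoint.Taxonomy.TaxonomyItem]::NormalizeName($term.Name)
if ($termNormalized -eq $termNameNormalized) {
return $term
}
}
return $null
}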
After converting both strings to char arrays and comparing the Unicode values of the ampersands, the problem is revealed. The ampersand used in the search string has a value of 38, while the ampersand returned from the SharePoint term store has a value of 65286 (a fullwidth ampersand, which looks almost identical to a regular ampersand on screen).
The solution was to write my own string comparison function and take into account the differences in the ampersand values. Here is the code:
function getTerm($termName) {
$searchChars = $termName.toCharArray()
$size = $searchChars.Count;
foreach($term in $global:termset.Terms) {
$match = $True
$chars = $term.Name.trim().toCharArray()
if ($size -eq $chars.Count) {
for ($i = 0; $i -lt $size; $i++) {
if ($searchChars[$i] -ne $chars[$i]) {
# handle the difference between a normal ampersand and a full width ampersand
$charCode1 = [int] $searchChars[$i]
$charCode2 = [int] $chars[$i]
if ((($charCode1 -eq 38) -or ($charCode1 -eq 65286 )) -and (($charCode2 -eq 38) -or ($charCode2 -eq 65286 ))) {
continue
} else {
$match = $False
break
}
}
}
} else {
$match = $False
}
if ($match -eq $True) {
return $term
}
}
return $null
}
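A shorter variant of the same idea, as a sketch that does not need SharePoint at all: map the fullwidth ampersand (U+FF06, decimal 65286) back to the ordinary one before comparing. The function name and sample strings below are invented for the example:

```powershell
function ConvertTo-PlainAmpersand {
    param([string]$Text)
    # Replace the fullwidth ampersand (U+FF06) with the ASCII ampersand (U+0026)
    $Text.Replace([string][char]0xFF06, '&')
}

# Simulate a term name as the term store would return it
$storedName = "Policies $([char]0xFF06) Procedures"
$searchName = 'Policies & Procedures'

(ConvertTo-PlainAmpersand $storedName) -eq $searchName   # True
[int]$storedName[9]                                      # 65286
```

Inside getTerm this would mean comparing the converted $term.Name with the converted $termName, so it no longer matters which of the two ampersands either side contains.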

How to add a string to the start of each chunk of a fixed-line text file in PowerShell?

I have a text file which is comprised of only one line. I have had much trouble with splitting the file into a specific number of characters, then adding a string in front of each chunk of characters.
With a multi-line file, I can add characters to each line very easily using
Get-Content -Path $path | foreach-object {$string + $_} | out-file $output but it is much more complicated with a file with only one line.
For example, if I had a file with these random characters,
(******************************************) and I wanted to add a string to the start of every 10 characters, the result would look like this: (examplestring**********examplestring**********examplestring**********) and so on. I have searched everywhere but have only managed to add the string to the end of each chunk of characters.
Does anyone have a way of doing this? Preferably using streamreader and writer as get-content may not work for very large files. Thanks.
Hmm, there are some dynamic parameters applicable to file-system get-content and set-content commands that are close to what you are asking for. For example, if test.txt contains a number of * characters, you might interleave every four * with two + characters with something like this:
get-content .\test.txt -Delimiter "****" | % { "++$_" } | Set-Content test2.txt -NoNewline
I don't know how close that is to a match for what you want, but it's probably useful to know that some of these provider-specific parameters, like '-Delimiter' aren't obvious. See https://technet.microsoft.com/en-us/library/hh847764.aspx under the heading 'splitting large files'.
Alternatively, here's a quick function that reads length-delimited strings from a file.
Set-StrictMode -Version latest
function read-characters( $path, [int]$charCount ) {
begin {
$buffer = [char[]]::new($charCount)
$path = Join-Path $pwd $path
[System.IO.StreamReader]$stream = [System.IO.File]::OpenText($path)
try {
while (!$stream.EndOfStream) {
$len = $stream.ReadBlock($buffer,0,$charCount);
if ($len) {Write-Output ([string]::new($buffer,0,$len))}
}
} catch {
Write-Error -ErrorRecord $_
} finally {
[void]$stream.Close()
}
}
}
read-characters .\test.txt -charCount 10 |
% {"+$_"} |
write-host -NoNewline
It could use some parameter checking, but should get you started...
With a manageable file size, you might want to try something like this:
$directory = "C:\\"
$inputFile = "test.txt"
$reader = new-object System.IO.StreamReader("{0}{1}" -f ($directory, $inputFile))
# prefix string of each line
$startString = "examplestring"
# how many chars to put on each line
$range = 10
$outputLine = ""
$line = $reader.ReadLine()
$i = 0
while ($i -lt $line.length) {
$outputLine += $($startString + $line.Substring($i, [math]::min($range, ($line.length - $i))))
$i += $range
}
$reader.Close()
write-output $outputLine
Basically it's using Substring to cut out each chunk, prefixing the chunk with the given string, and appending it to the result variable.
Sample input:
==========================
Sample output:
examplestring==========examplestring==========examplestring======
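For large files, where building the whole result in memory is not an option, the same chunking can be streamed. A sketch using StreamReader.ReadBlock and StreamWriter, as the question suggests; the function name and parameters are made up for this example, and it assumes absolute paths, since .NET does not follow PowerShell's current location:

```powershell
function Add-ChunkPrefix {
    param(
        [string]$InputPath,    # absolute path of the single-line source file
        [string]$OutputPath,   # absolute path of the file to create
        [string]$Prefix,
        [int]$ChunkSize = 10
    )
    $reader = [System.IO.StreamReader]::new($InputPath)
    try {
        $writer = [System.IO.StreamWriter]::new($OutputPath)
        try {
            $buffer = [char[]]::new($ChunkSize)
            # ReadBlock returns fewer than $ChunkSize characters only at end of file
            while (($read = $reader.ReadBlock($buffer, 0, $ChunkSize)) -gt 0) {
                $writer.Write($Prefix)
                $writer.Write([string]::new($buffer, 0, $read))
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }
}
```

Only one chunk is held in memory at a time. Any trailing newline in the input is treated like ordinary characters, so it ends up inside the last chunk.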

Using PowerShell to find the differences in strings

So I'm playing around with Compare-Object, and it works fine for comparing files. But what about just strings? Is there a way to find the difference between strings? CompareTo() is good about reporting that there is a difference, but not what the difference is. For example:
PS:> $a = "PowerShell rocks"
PS:> $b = "Powershell rocks"
PS:> $a.CompareTo($b)
1
PS:> Compare-Object -ReferenceObject $a -DifferenceObject $b
PS:>
Nothing returned.
Any way to let me know about the actual difference between the strings, not just that there is a difference?
Perhaps something like this:
function Compare-String {
param(
[String] $string1,
[String] $string2
)
if ( $string1 -ceq $string2 ) {
return -1
}
for ( $i = 0; $i -lt $string1.Length; $i++ ) {
if ( $string1[$i] -cne $string2[$i] ) {
return $i
}
}
return $string1.Length
}
The function returns -1 if the two strings are equal; otherwise it returns the zero-based position of the first difference between them. If you want case-insensitive comparisons, you would need to use -eq instead of -ceq and -ne instead of -cne.
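If the position alone is not enough, a variant can also report which characters differ. A sketch along the same lines (the function name and the shape of the returned object are choices made for this example); it returns nothing when the strings are identical:

```powershell
function Get-FirstStringDifference {
    param([string]$Reference, [string]$Difference)
    $longest = [Math]::Max($Reference.Length, $Difference.Length)
    for ($i = 0; $i -lt $longest; $i++) {
        # Past the end of either string counts as a difference
        $left  = if ($i -lt $Reference.Length)  { [string]$Reference[$i] }  else { $null }
        $right = if ($i -lt $Difference.Length) { [string]$Difference[$i] } else { $null }
        if ($left -cne $right) {
            return [pscustomobject]@{ Index = $i; Reference = $left; Difference = $right }
        }
    }
}

Get-FirstStringDifference 'PowerShell rocks' 'Powershell rocks'
# Index 5: Reference 'S', Difference 's'
```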

Resources