Split string in PowerShell by pattern

I have a fairly long string in PowerShell that I need to split. Each section begins with a date in format mm/dd/yyyy hh:mm:ss AM. Essentially what I am trying to do is get the most recent message in the string. I don't need to keep the date/time part as I already have that elsewhere.
This is what the string looks like:
10/20/2018 1:22:33 AM
Some message the first one in the string
It can be several lines long
With multiple line breaks
But this is still the first message in the string
10/21/2018 4:55:11 PM
This would be second message
Same type of stuff
But its a different message
I know how to split a string on specific characters, but I don't know how on a pattern like date/time.

Note:
The solution below assumes that the sections are not necessarily in chronological order, so you must inspect all time stamps to determine the most recent one.
If, by contrast, you can assume that the last message is the most recent one, use LotPings' much simpler answer.
If you don't know ahead of time what section has the most recent time stamp, a line-by-line approach is probably best:
$dtMostRecent = [datetime] 0
# Split the long input string ($longString) into lines and iterate over them.
# If input comes from a file, replace
#   $longString -split '\r?\n'
# with
#   Get-Content file.txt
# If the file is large, replace the whole command with
#   Get-Content file.txt | ForEach-Object { ... }
# and replace $line with $_ in the script block (loop body).
foreach ($line in $longString -split '\r?\n') {
  # See if the line at hand contains (only) a date.
  if ($dt = try { [datetime] $line } catch {}) {
    # See if the date at hand is the most recent so far.
    $isMostRecent = $dt -ge $dtMostRecent
    if ($isMostRecent) {
      # Save this time stamp as the most recent one and initialize the
      # array to collect the following lines in (the message).
      $dtMostRecent = $dt
      $msgMostRecentLines = @()
    }
  } elseif ($isMostRecent) {
    # Collect the lines of the message associated with the most recent date.
    $msgMostRecentLines += $line
  }
}
# Convert the message lines back into a single, multi-line string.
# $msgMostRecent now contains the multi-line message associated with
# the most recent time stamp.
$msgMostRecent = $msgMostRecentLines -join "`n"
Note how try { [datetime] $line } catch {} is used to try to convert a line to a [datetime] instance and fail silently, if it can't, in which case $dt is assigned $null, which in a Boolean context is interpreted as $False.
This technique works irrespective of the culture currently in effect, because PowerShell's casts always use the invariant culture when casting from strings, and the dates in the input are in one of the formats the invariant culture understands.
By contrast, the -as operator, which would be more convenient here ($dt = $line -as [datetime]), is unexpectedly culture-sensitive, as Esperento57 points out.
This surprising behavior is discussed in this GitHub issue.

Provided the [datetime] sections are ascending,
it should be sufficient to split on them with a RegEx and get the last one
((Get-Content .\test.txt -Raw) -split "\d+/\d+/\d{4} \d+:\d+:\d+ [AP]M`r?`n")[-1]
Output based on your sample string stored in file test.txt
This would be second message
Same type of stuff
But its a different message
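As a minimal sketch of the same idea without a file, the snippet below splits an in-memory string on the time stamp lines. The variable names ($longString, $sections, $lastMessage) and the shortened sample text are assumptions made for illustration, and the (?m)^ anchor is an addition not present in the answer above.

```powershell
# Sample input modelled on the question; the lines are joined with LF.
$longString = @(
    '10/20/2018 1:22:33 AM'
    'Some message the first one in the string'
    'It can be several lines long'
    '10/21/2018 4:55:11 PM'
    'This would be second message'
    'Same type of stuff'
) -join "`n"

# (?m) makes ^ match at the start of every line, so only lines that
# begin with a time stamp act as separators. \r?\n covers CRLF and LF.
$pattern = '(?m)^\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M\r?\n'

# The first element is the empty text before the first time stamp; drop it.
$sections = @($longString -split $pattern | Where-Object { $_ })

# The last section is the last message; trim the trailing line break, if any.
$lastMessage = $sections[-1].TrimEnd()
$lastMessage
```

This only returns the most recent message if the sections are in ascending order, as stated above.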

You can split it on the timestamp pattern like this:
$arr = $str -split "[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2} [AaPp]M\n"

To my knowledge you can't use any of the static String methods like Split() for this. I tried to find a regular expression that would handle the entire thing, but wasn't able to come up with anything that would quite break it up properly.
So, you'll need to go line by line, testing whether each line is a date, then concatenate the lines in between, like the following:
$fileContent = Get-Content "inputFile.txt"
$messages = @()
$currentMessage = [string]::Empty
foreach($line in $fileContent)
{
    if ([Regex]::IsMatch($line, "\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} (A|P)M"))
    {
        # The current line is a date, the current message is complete
        # Add the current message to the output, and clear out the old message
        # from your temporary storage variable $currentMessage
        if (-not [string]::IsNullOrEmpty($currentMessage))
        {
            $messages += $currentMessage
            $currentMessage = [string]::Empty
        }
    }
    else
    {
        # Add this line to the message you're building.
        # Include a new line character, as it was stripped out with Get-Content
        $currentMessage += "$line`n"
    }
}
# Add the last message to the output
$messages += $currentMessage
# Do something with the message
Write-Output $messages
As the key to all of this is recognizing that a given line is a date and therefore the start of a message, let's look a bit more at the regex. "\d" will match any decimal character 0-9, and the curly braces immediately following indicate the number of decimal characters that need to match. So, "\d{1,2}" means "look for one or two decimal characters" or in this case the month of the year. We then look for a "/", 1 or 2 more decimal characters - "\d{1,2}", another "/" and then exactly 4 decimal characters - "\d{4}". The time is more of the same, with ":" in between the decimal characters instead of "/". At the end, there will either be "AM" or "PM" so we look for either an "A" or a "P" followed by an "M", which as a regular expression is "(A|P)M".
Combine all of that, and you get "\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} (A|P)M" to determine if you have a date on that line. I believe it would also be possible to use [DateTime]::Parse() to determine if the line is a date, but then you wouldn't get to have fun with regexes and would need a try-catch. For more info on regexes in PowerShell (which are just .NET regexes) see .NET Regex Quick Reference
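For comparison, here is a hedged sketch of the parsing route mentioned above. It swaps [DateTime]::Parse() for [datetime]::TryParseExact(), which returns $false instead of throwing, so no try-catch is needed. The function name Test-TimestampLine and the format string are assumptions chosen to fit the sample data.

```powershell
# Assumed format for lines like '10/20/2018 1:22:33 AM'.
$format  = 'M/d/yyyy h:mm:ss tt'
$culture = [cultureinfo]::InvariantCulture
$styles  = [System.Globalization.DateTimeStyles]::None

# Hypothetical helper: returns $true if the whole line is a time stamp.
function Test-TimestampLine {
    param([string] $Line)
    $parsed = [datetime]::MinValue
    # TryParseExact writes the result to $parsed and returns a Boolean.
    [datetime]::TryParseExact($Line.Trim(), $format, $culture, $styles, [ref] $parsed)
}

Test-TimestampLine '10/20/2018 1:22:33 AM'   # a time stamp line
Test-TimestampLine 'Some message'            # an ordinary message line
```

Unlike the regex, this rejects impossible values such as month 13, at the cost of accepting only the one format given.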

Related

Formatting string in Powershell but only first or specific occurrence of replacement token

I have a regular expression that I use several times in a script, where a single word gets changed but the rest of the expression remains the same. Normally I handle this by just creating a regular expression string with a format like the following example:
# Simple regex looking for exact string match
$regexTemplate = '^{0}$'
# Later on...
$someString = 'hello'
$someString -match ( $regexTemplate -f 'hello' ) # ==> True
However, I've written a more complex expression where I need to insert a variable into the expression template and... well regex syntax and string formatting syntax begin to clash:
$regexTemplate = '(?<=^\w{2}-){0}(?=-\d$)'
$awsRegion = 'us-east-1'
$subRegion = 'east'
$awsRegion -match ( $regexTemplate -f $subRegion ) # ==> Error
Which results in the following error:
InvalidOperation: Error formatting a string: Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
I know what the issue is, it's seeing one of my expression quantifiers as a replacement token. Rather than opt for a string-interpolation approach or replace {0} myself, is there a way I can tell PowerShell/.NET to only replace the 0-indexed token? Or is there another way to achieve the desired output using format strings?

If a string template includes { and/or } characters, you need to double these so they do not interfere with the numbered placeholders.
Try
$regexTemplate = '(?<=^\w{{2}}-){0}(?=-\d$)'
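A short sketch of how the doubled braces behave once formatted, using the values from the question. The [regex]::Escape() call is an extra precaution added here, not part of the answer; it only matters if the inserted value could contain regex metacharacters.

```powershell
# {{ and }} are emitted as literal { and }; {0} is the only placeholder.
$regexTemplate = '(?<=^\w{{2}}-){0}(?=-\d$)'
$awsRegion = 'us-east-1'
$subRegion = 'east'

# Escape the inserted value so it is always matched literally.
$regex = $regexTemplate -f [regex]::Escape($subRegion)

$regex                           # the quantifier is back to \w{2}
$isMatch = $awsRegion -match $regex
$isMatch
```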

non-trivial explode string to collection

I need a PS function that would take input string and generate output collection as per below:
Input:
$someString = "abcd{efg|hijk|lmn|o}pqrs"
Desired output:
$someCollection = @("abcdefgpqrs","abcdhijkpqrs","abcdlmnpqrs","abcdopqrs")
Note: there is going to be at most 1 {...|...|...} expression within the input string; the number of pipes is dynamic and can be anything from 1 to 20 ish.
As I drive the input data, the format of the string to explode does not have to follow exactly the example above; it can be anything else; I am looking for simplicity rather than sophistication.
My question is: is there a RegExp-based solution that I could use straight away, or should I write my function from scratch, analysing the input string, detecting all the {s, |s and }s and so on?
Platform: Windows 7 / Windows Server 2012, PowerShell 5.x

You can do this pretty easily in PowerShell 5 with a regex:
# define a regex pattern with named groups for all three parts of your string
$pattern = '^(?<pre>[^\{]*)\{(?<exp>.*)\}(?<post>[^\}]*)$'
if($someString -match $pattern){
    # grab the first and last parts
    $prefix = $Matches['pre']
    $postfix = $Matches['post']
    # explode the middle part
    foreach($part in $Matches['exp'] -split '\|'){
        # create a new string for each of the exploded middle parts
        "$prefix$part$postfix"
    }
}
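Since the question asks for a function, the snippet above could be wrapped roughly as follows. The name Expand-BraceString and the choice to return the input unchanged when it contains no {...} group are assumptions for illustration.

```powershell
# Hypothetical wrapper around the regex approach shown above.
function Expand-BraceString {
    param([Parameter(Mandatory)] [string] $InputString)

    $pattern = '^(?<pre>[^\{]*)\{(?<exp>.*)\}(?<post>[^\}]*)$'
    if ($InputString -match $pattern) {
        $prefix  = $Matches['pre']
        $postfix = $Matches['post']
        # Emit one string per alternative inside the braces.
        foreach ($part in $Matches['exp'] -split '\|') {
            "$prefix$part$postfix"
        }
    }
    else {
        # No {...} group found: pass the input through unchanged.
        $InputString
    }
}

# @() guarantees an array even if only one string comes back.
$someCollection = @(Expand-BraceString 'abcd{efg|hijk|lmn|o}pqrs')
$someCollection
```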

PowerShell String replacement with a wildcard

I have a version number in a SQL file that I need to replace via PowerShell. In the file it looks like so:
values
('Current', '15.7.1.0'),
('New', '15.7.22.0')
I need to replace it with whatever the current version is, but due to the fact that I don't know what the current version number is going to be, nor do I know its exact length (digit count can change as leading zeroes are not used), I need to use wildcards, and what I'm trying isn't working.
Currently this is what I'm trying:
$content = Get-Content C:\Users\user\Desktop\sql\file.sql
$newContent = $content -replace "'Current', '*'", "'Current', '$newVersion'"
And it actually finds it and replaces it, but not all of it for some reason. After running that, what I get is:
values
('Current', '16.6.21.0'15.7.1.0'),
('New', '15.7.22.0')
So it definitely finds the correct place in the file and does a replace on some of it, but doesn't actually replace the version number. Can anyone tell me what I'm doing wrong?

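As @PetSorAl said, -replace uses regular expressions, not wildcards. In a regex, '* means "zero or more ' characters", so your pattern matched only up to the opening quote of the version number; the new text was inserted there and the old version number was left behind it.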
Below is an example based on yours.
Example
$content = @"
values
('Current', '15.7.1.0'),
('New', '15.7.22.0')
"@
$newVersion = '16.6.21.0'
$newContent = $content -replace "'Current', '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'", "'Current', '$newVersion'"
Write-Host $newContent
Results
values
('Current', '16.6.21.0'),
('New', '15.7.22.0')
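If the number of version parts or digits is not fixed, a looser pattern may be easier to maintain. The sketch below is one possible variant, not the answer's method: a lookbehind anchors the match right after 'Current', ' and [^']* consumes everything up to the closing quote. It assumes the version string never contains a single quote and that $newVersion contains no $ characters, which -replace would treat as substitution references.

```powershell
$content = "values`n('Current', '15.7.1.0'),`n('New', '15.7.22.0')"
$newVersion = '16.6.21.0'

# (?<=...) requires the prefix without including it in the match,
# so only the old version number itself is replaced.
$newContent = $content -replace "(?<='Current', ')[^']*", $newVersion
$newContent
```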

Split a string containing fixed length columns

I got data like this:
3LLO24MACT01 24MOB_6012010051700000020100510105010 123456
It contains different values for different columns when I import it.
Every column is fixed width:
Col#1 is the ID and just 1 long. Meaning it is "3" here.
Col#2 is 3 in length and here "LLO".
Col#3 is 9 in length and "24MACT01 " (notice that the missing characters get filled up with blanks).
This goes on for 15 columns or so...
Is there a method to quickly cut it into different elements based on sequence length? I couldn't find any.

This can be done with RegEx matching, and creating an array of custom objects. Something like this:
$AllRecords = Get-Content C:\Path\To\File.txt | Where{$_ -match "^(.)(.{3})(.{9})"} | ForEach{
    [PSCustomObject]@{
        'Col1' = $Matches[1]
        'Col2' = $Matches[2]
        'Col3' = $Matches[3]
    }
}
That will take each line, match by how many characters are specified, and then create an object based off those matches. It collects all objects in an array and could be exported to CSV or whatever. The 'Col1', 'Col2' etc are just generic column headers I suggested due to a lack of better information, and could be anything you wanted.
Edit: Thank you iCodez for showing me, perhaps inadvertently, that you can specify a language for your code samples!

[Regex]::Matches will do this rather easily. All you need to do is specify a Regex pattern that has . followed by the number of characters you want in curly braces. For example, to match a column of three characters, you would write .{3}. You then do this for all 15 columns.
To demonstrate, I will use a string that contains the first three columns of your example data (since I know their sizes):
PS > $data = '3LLO24MACT01 '
PS > $pattern = '(.{1})(.{3})(.{9})'
PS > ([Regex]::Matches($data, $pattern).Groups).Value
3LLO24MACT01
3
LLO
24MACT01
PS >
Note that the first value output is the text matched by the whole pattern, that is, by all of the capture groups together. If you do not need this, you can remove it with slicing:
$columns = ([Regex]::Matches($data, $pattern).Groups).Value
$columns = $columns[1..($columns.Length - 1)]
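The two answers above can be combined by giving the capture groups names, so that each value is read by name rather than by position. In this sketch the column names Id, Code and Name are invented placeholders, since the question does not say what the columns mean, and trimming the padding blanks is an optional extra.

```powershell
# First three columns of the sample record: widths 1, 3 and 9.
$data = '3LLO24MACT01 '
$pattern = '^(?<Id>.)(?<Code>.{3})(?<Name>.{9})'

if ($data -match $pattern) {
    $record = [PSCustomObject]@{
        Id   = $Matches['Id']
        Code = $Matches['Code']
        # Drop the blanks used to pad the column to its fixed width.
        Name = $Matches['Name'].TrimEnd()
    }
}
$record
```

With all 15 widths known, the pattern would simply grow by one named group per column.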

New-PSObjectFromMatches is a helper function for creating PS Objects from regex matches.
The -Debug option can help with the process of writing the regex.

PowerShell Split a String On First Occurrence of Substring/Character

I have a string that I want to split up in 2 pieces. The first piece is before the comma (,) and the second piece is all stuff after the comma (including the commas).
I already managed to retrieve the first piece before the comma in the variable $Header, but I don't know how to retrieve the pieces after the first comma in one big string.
$string = "Header text,Text 1,Text 2,Text 3,Text 4,"
$header = $string.Split(',')[0] # $Header = "Header text"
$content = "Text 1,Text 2,Text 3,Text 4,"
# There might be more text than is visible here, like say Text 5, Text 6, ..

PowerShell's -split operator lets you specify the maximum number of sub-strings to return. After the pattern to split on, give the number of strings you want back; the last one then contains the unsplit remainder of the input:
$header,$content = "Header text,Text 1,Text 2,Text 3,Text 4," -split ',',2
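A brief sketch showing what ends up in each variable, using the string from the question:

```powershell
$string = 'Header text,Text 1,Text 2,Text 3,Text 4,'

# With a limit of 2, only the first comma is used as a separator;
# everything after it, commas included, lands in $content.
$header, $content = $string -split ',', 2

$header
$content
```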

Try something like :
$Content=$String.Split([string[]]"$Header,", [StringSplitOptions]"None")[1]
As you split according to a String, you are using a different signature of the function split.
The basic use needs only 1 argument, a separator character (more info about it can be found here, for instance). However, to use strings, the signature is the following :
System.String[] Split(String[] separator, StringSplitOptions options)
This is why you have to cast your string as an array of string. We use the None option in this case, but you can find the other options available in the split documentation.
Finally, as the value of $Header is at the beginning of your $String, you need to take the 2nd member of the resulting array.
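Another route, not taken in the answers here, is to skip splitting altogether and cut the string at the first comma with IndexOf and Substring. This is only a sketch; the fallback for input without any comma is an assumption about the desired behavior.

```powershell
$string = 'Header text,Text 1,Text 2,Text 3,Text 4,'

# Position of the first comma, or -1 if there is none.
$index = $string.IndexOf(',')

if ($index -ge 0) {
    $header  = $string.Substring(0, $index)    # text before the first comma
    $content = $string.Substring($index + 1)   # everything after it
}
else {
    # No comma: treat the whole string as the header.
    $header  = $string
    $content = ''
}
```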

Aaron's method is the best, but here is my solution:
$array="Header text,Text 1,Text 2,Text 3,Text 4," -split ','
$array[0],($array[1..($array.Length -1)] -join ",")

This alternate solution makes use of PowerShell's ability to distribute arrays to multiple variables with a single assignment. Note, however, that the -split operator splits on every comma and PowerShell's built-in conversion from Array back to String results in the elements being concatenated back together. So it's not as efficient as String.Split, but in your example, it's negligible.
$OFS = ','
$Content = 'Header text,Text 1,Text 2,Text 3,Text 4,'
[String]$Header,[String]$Rest = $Content -split $OFS
$OFS = ' '
Write-Host "Header = $Header"
Write-Host "Rest = $Rest"
Finally, $OFS is a special variable in PowerShell that determines which separator string is used when array elements are joined back into a single string. By default, it's a space, but it can be changed to anything.
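To see the effect of $OFS in isolation, here is a minimal sketch. Setting it inside a script block invoked with & is a variation on the code above: the change is confined to the child scope, so there is nothing to reset afterwards. The variable names are made up for the example.

```powershell
$items = 'a', 'b', 'c'

# With $OFS left at its default, elements are joined with a space.
$default = "$items"

# Inside the child scope created by &, $OFS applies only temporarily.
$custom = & { $OFS = ','; "$items" }

$default
$custom
```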
