Trying to parse some substrings in Powershell


I've been banging my head against the wall trying to do what should be a fairly simple substring search in Powershell.
I have a text file with the following content:
2015-08-30 13:12:59 10944512 DATACLUS1\RandomDBName_FULL_20150823_044919.bak
2015-08-30 13:12:59 11010048 DATACLUS1\RandomDBName_FULL_20150830_050126.bak
I need to pull out the file name(s) (e.g. "DATACLUS1\RandomDBName_FULL_20150823_044919.bak"), compare them to see which one was created later based on the date stamp (20150823 in this case), and then write only the full name to a text file, to be actioned later in the process.
I've gone through regexes, -match and substring but can't find a combination that will reliably pull that data out. Once I'm over this hurdle I can move on to the compare.

You could do this without extensive regex if you want (assuming each entry in the file is on its own line):
# Fetch the lines of your backup file
$lines = Get-Content .\backup.data
# Build one object per line: file name, date stamp and time stamp
$formattedItems = $lines | Select-Object @{Name="Filename"; Expression={($_.Trim() -split " ")[3]}},
    @{Name="DataStamp"; Expression={(($_.Trim() -split " ")[-1] -split "_")[-2]}},
    @{Name="TimeString"; Expression={(($_.Trim() -split " ")[-1] -split "_")[-1] -replace '\.bak$', ''}}
# Sort the items by the new property DataStamp
$sortedItems = $formattedItems | Sort-Object -Property DataStamp
This will give you a list sorted by DataStamp in ascending order, with properties you can select on (e.g. $sortedItems[-1] is the item with the most recent date stamp).
If you do not want to sort on DataStamp, you could sort on the date-time stamp at the beginning of each line instead, which would probably be more reliable.
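Since the question ultimately wants the newest backup's full name written to a text file, here is a minimal sketch of that last step. It assumes every file name ends in `_yyyyMMdd_HHmmss.bak` as in the two sample lines, inlines those lines so it runs on its own, and writes to a hypothetical temp-file path rather than any path from the question.

```powershell
# The two sample lines from the question; in practice: $lines = Get-Content .\backup.data
$lines = @(
    '2015-08-30 13:12:59 10944512 DATACLUS1\RandomDBName_FULL_20150823_044919.bak'
    '2015-08-30 13:12:59 11010048 DATACLUS1\RandomDBName_FULL_20150830_050126.bak'
)
$culture = [System.Globalization.CultureInfo]::InvariantCulture

$items = foreach ($line in $lines) {
    # Split on runs of whitespace; the 4th field (index 3) is the full file name
    $fileName = ($line.Trim() -split '\s+')[3]
    # Assumption: names end in _yyyyMMdd_HHmmss.bak
    if ($fileName -match '_(\d{8})_(\d{6})\.bak$') {
        [pscustomobject]@{
            Filename = $fileName
            Stamp    = [datetime]::ParseExact($Matches[1] + $Matches[2], 'yyyyMMddHHmmss', $culture)
        }
    }
}

# Ascending sort puts the newest backup last
$latest = $items | Sort-Object -Property Stamp | Select-Object -Last 1

# Hypothetical output file containing only the full name, for later steps
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'latest-backup.txt'
$latest.Filename | Set-Content -Path $outFile
$latest.Filename
```

Parsing the stamp into a [datetime] is not strictly needed here, because yyyyMMdd strings already sort correctly as text, but it makes the comparison explicit and also takes the time part into account.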

Related

Powershell multiple string replacement using while cycle

I am trying to solve a somewhat weird problem: I need to replace strings within a raw content by strings from the same content that meet a certain matching criteria. The input data look like this:
apple-beta
apple-alpha_orange-beta
apple-alpha_orange-alpha_cherry-beta
apple-alpha_orange-alpha_kiwi-beta
apple-alpha_orange-alpha_mango-beta
abcd-alpha_efgh-beta
abcd-alpha_efgh-alpha_ijkl-beta
abcd-alpha_efgh-alpha_mnop-beta
The replacement should work as follows: look for all "-beta" strings in the content and delete all corresponding "-alpha" strings (e.g. because there is "orange-beta" already, all "orange-alpha" should be deleted; because there is "apple-beta" already, all "apple-alpha" should be deleted, etc.). The result would look like this:
apple-beta
_orange-beta
__cherry-beta
__kiwi-beta
__mango-beta
abcd-alpha_efgh-beta
abcd-alpha__ijkl-beta
abcd-alpha__mnop-beta
I have tried to achieve this with a number of awkward single replacements and temporary files, as well as with a while loop that doesn't work at all:
$whileinput = get-content -raw C:\content-input.txt
while ($whileinput -match "\w+-beta") {
$fullval = $whileinput -match "\w+-beta" -replace "-beta","-alpha"
$whileinput = $whileinput -replace '$fullval',''
}
Any help is very appreciated!
Daniel

I would find all your beta items. Then replace the corresponding alpha items.
$data = Get-Content C:\content-input.txt
$betas = ([regex]::Matches($data, '[^_]*?(?=-beta)').Value -ne '' |
    ForEach-Object { [regex]::Escape($_) }) -join '|'
$data -replace "($betas)-alpha"
Explanation:
[regex]::Matches().Value returns only the matched texts.
[^_]*? lazily matches consecutive characters that are not _. (?=-beta) is a positive lookahead for the text -beta but doesn't include the text in the match.
-ne '' filters out the empty matches: the lazy [^_]*? can also match zero characters directly in front of -beta.
[regex]::Escape() is not necessarily needed in this case. But it is good practice when your text may have special regex characters that you want to match literally.
$betas contains |-delimited items because | is the regex OR. Surrounding $betas with () makes the alternation apply to whole words, so one of those words must be fully matched before -alpha is matched in the replacement.
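A variation on the same idea, sketched under the assumption that entries are always separated by `_` and each one looks like `word-alpha` or `word-beta`: instead of a lookahead, split each line on `_` to collect the beta words, then only remove `-alpha` entries that form a whole `_`-separated token, so that something like `pineapple-alpha` would not be touched just because `apple-beta` exists. The sample data is inlined from the question; `$betaWords`, `$alternation` and `$pattern` are illustrative names.

```powershell
# Sample input from the question; in practice: $data = Get-Content C:\content-input.txt
$data = @(
    'apple-beta'
    'apple-alpha_orange-beta'
    'apple-alpha_orange-alpha_cherry-beta'
    'apple-alpha_orange-alpha_kiwi-beta'
    'apple-alpha_orange-alpha_mango-beta'
    'abcd-alpha_efgh-beta'
    'abcd-alpha_efgh-alpha_ijkl-beta'
    'abcd-alpha_efgh-alpha_mnop-beta'
)

# Every distinct word that appears with a -beta suffix anywhere in the input
$betaWords = $data |
    ForEach-Object { $_ -split '_' } |
    Where-Object { $_ -like '*-beta' } |
    ForEach-Object { $_ -replace '-beta$', '' } |
    Sort-Object -Unique

# (?<![^_]) and (?![^_]) require the token to start/end at a line edge or an underscore
$alternation = ($betaWords | ForEach-Object { [regex]::Escape($_) }) -join '|'
$pattern = "(?<![^_])($alternation)-alpha(?![^_])"

# -replace on an array works element by element, i.e. line by line
$result = $data -replace $pattern, ''
$result
```

Sort-Object -Unique is only there to deduplicate the word list; the order of the alternation does not matter here because every branch must be followed by -alpha and a token boundary.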

Get-Content -Raw reads the entire contents of the file into a single string, and nothing in your loop ever removes the -beta text from it, so if anything in your file matches that pattern, the loop runs forever (the contents always match your pattern). Also note that '$fullval' in single quotes is the literal text $fullval, not the variable's value.
PowerShell is heavily based around the concept of the "pipeline", which you can use in conjunction with the ForEach-Object cmdlet to iterate over each line in a file.
I'm not quite clear on what you want the regexes to do, but I don't think the ones you have will do what you want. Try this.
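$out = @()
Get-Content C:\content-input.txt | ForEach-Object {
    # Without -Raw, Get-Content sends one line at a time down the pipeline
    if ($_ -match 'beta$') {
        $out += $_ -replace '\w+-alpha', ''
    }
}
$out | Out-File .\path-to-output.txt
$_ is the default "pipeline variable", aka the current item in the iteration, in this case the current line. Now at least your loop is working! Note that '\w+-alpha' still removes every -alpha word, including abcd-alpha, which your expected output keeps.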

Log Parsing via Powershell - print all array elements after nth element

I'm parsing a log file that is space delimited for the first 7 elements and then a log message or sentence follows. I know just enough to get around in PS, and I'm learning more each day, so I'm not sure this is the best way to do this and apologies if I'm not leveraging a more efficient means that would be second nature to you. I'm using -split(' ')[n] to extract each field of the log file line by line. I'm able to extract the first parts fine as they are space-delimited, but I'm not sure how to get the rest of the elements up to the end of the line.
$logFile=Get-Content $logFilePath
$dateStamp=$logfile -split(' ')[0]
$timeStamp=$logfile -split(' ')[1]
$requestID=$logfile -split(' ')[3]
$binaryID=$logfile -split(' ')[4]
$logID=$logfile -split(' ')[5]
$action=$logfile -split(' ')[6]
$logMessage=$logfile -split(' ')[?]
This is not a CSV that I can import. I'm more familiar with string manipulation in bash so I am able to successfully replace spaces in the first 7 elements, and the end, with "," :
#!/bin/bash
inputFile="/cygdrive/c/Temp/logfile.log"
outputFile="/cygdrive/c/Temp/test_log.csv"
echo "\"DATE\",\"TIME\",\"HYPEN\",\"REQUESTID\",\"BINARY\",\"PROC_NUMBER\",\"MESSAGE\"" > $outputFile
while read -a line
do
arrLength=$(echo ${#line[@]})
echo \"${line[0]}\",\"${line[1]}\",\"${line[2]}\",\"${line[3]}\",\"${line[4]}\",\"${line[5]}\",\"${line[@]:6:$arrLength}\"
done < $inputFile >> $outputFile
Can you help either printing the array elements from position n to the end, or replacing the spaces appropriately in PS so I have a CSV that I can import? Just trying to avoid the two-step process of converting it in bash, then importing it in PS but I'm still researching. I did find this post Parsing Text file and placing contents into an Array Powershell
for importing the file assuming it's space-delimited and that works for the first 7 elements but not sure about everything after that.
Of course I welcome any other PS solutions such as one of those [something]::SOMETHING things I've seen by googling that might do all this much more seamlessly.

You can specify the maximum number of substrings in which the string is split like this:
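foreach ($line in $logFile) {
    # At most 8 substrings: the last one keeps the rest of the line, spaces included
    $splittedRow = $line.Split(' ', 8)
    $dateStamp = $splittedRow[0]
    $timeStamp = $splittedRow[1]
    $requestID = $splittedRow[3]
    $binaryID = $splittedRow[4]
    $logID = $splittedRow[5]
    $action = $splittedRow[6]
    $logMessage = $splittedRow[7]
}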

As an addition to Viktor Be's answer:
$data = "111 22222 333 4444444 5 6 77 888888 9999999 0" #this is the content of file below for testing purposes
#$data = get-content -path C:\temp\mytest.txt
foreach ($line in $data){
$splitted = $line.split(' ',8)
$line_output= ""
for ($i = 0;$i -lt 7;$i++){
$line_output += "$($splitted[$i]);"
}
$line_output += $splitted[7]
$line_output | out-file "C:\temp\MyCsvThatPowershellCanRead.csv" -append
}
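If the goal is a file that Import-Csv can read back, here is a sketch that avoids hand-built delimiters: split each line with the -split operator's max-substrings count (the same idea as .Split(' ', 8) above), turn the pieces into objects, and let Export-Csv handle the header and quoting. The two log lines, the column names and the temp-file path are made up for illustration; the real log layout may differ.

```powershell
# Two made-up log lines; in practice: $logLines = Get-Content $logFilePath
$logLines = @(
    '2016-01-05 10:15:01 - 1234 bin01 42 START Job started with 3 items'
    '2016-01-05 10:15:09 - 1234 bin01 42 STOP Job finished; no errors'
)

$records = foreach ($line in $logLines) {
    # At most 8 pieces: fields 0-6, then the whole message (spaces included)
    $f = $line -split ' ', 8
    [pscustomobject]@{
        Date      = $f[0]
        Time      = $f[1]
        RequestID = $f[3]
        BinaryID  = $f[4]
        LogID     = $f[5]
        Action    = $f[6]
        Message   = $f[7]
    }
}

# Export-Csv writes a header row and quotes the values, so ';' or ',' inside Message is safe
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'log.csv'
$records | Export-Csv -Path $csvPath -NoTypeInformation

# Round trip: Import-Csv gives back objects with the same property names
$imported = Import-Csv -Path $csvPath
$imported
```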

You should be able to iterate over each line in the log file and get the information you need the way you are doing. However, a regular expression makes it easy to grab the message field, which can contain any number of spaces.
The following regex should work for you. Assuming $line is the current line you are on:
if ($line -match '(?<=(\S+\s+){6}).*') {
    $logMessage = $Matches[0]
}
The way this expression works is that it looks for .* (any character, zero or more times) that comes after 6 occurrences of non-whitespace characters followed by whitespace characters; that condition is checked with a lookbehind, so those fields are not part of the match. The .* in this expression should match your log message.
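The same regex approach can be pushed a little further: a sketch that captures the seven leading fields and the message in a single match using named groups, then reads them back from the automatic $Matches hashtable. The sample line and the group names are hypothetical, and the pattern assumes the leading fields are whitespace-separated and contain no spaces themselves.

```powershell
# Hypothetical sample line; real lines would come from Get-Content $logFilePath
$line = '2016-01-05 10:15:01 - 1234 bin01 42 START Job started with 3 items'

# Seven whitespace-separated fields, then everything else as Message
$pattern = '^(?<Date>\S+)\s+(?<Time>\S+)\s+(?<Hyphen>\S+)\s+(?<RequestID>\S+)\s+(?<BinaryID>\S+)\s+(?<LogID>\S+)\s+(?<Action>\S+)\s+(?<Message>.*)$'

if ($line -match $pattern) {
    # Named groups are available by name in $Matches after a successful -match
    $requestID  = $Matches['RequestID']
    $action     = $Matches['Action']
    $logMessage = $Matches['Message']
}
$logMessage
```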

PowerShell split not working using words (read from file)

I am trying to extract, from a file, the sentences that appear between occurrences of a particular word. The intention is to extract the sentences that appear between the first pair of 'GO' words in the file. The logic implemented here is to split the file on the word 'GO' and then print the second element of the array (the sentences starting with SET in this example). However, PowerShell is not recognizing the separator (GO); instead it seems to be treating the newline as the separator, and is printing the second sentence.
Please note that I need to read the file and then get the extraction done.
Content of the file
Home address "TJ One way"
Office address "C company Two way"
GO
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO
Home address "TJ One way"
Office address "C company Two way"
GO
:on error exit
GO
My code
$path = 'D:\Scripts'
$deltaFile = 'GoSampleFile.txt'
$modifiedDelta = 'GoSampleFile1.txt'
New-Item -path $path -Name $modifiedDelta -ItemType file -Force
#Split for each appearing GO, after escaping the double quotes
(Get-Content $path'\'$deltaFile).replace('"', '`"') | Set-Content $path'\'$modifiedDelta
$separator = 'GO'
$modifiedDeltaString = Get-Content $path'\'$modifiedDelta
#Write-Host $modifiedDeltaString
#Write-Host $separator
$goArray = $modifiedDeltaString -split "GO", 0, "SimpleMatch"
Write-Output $goArray[1]
#Housekeeping of the temporary file
Remove-Item $path'\'$modifiedDelta

Use Get-Content -Raw ... to read the contents as one string instead of an array of strings, one per line.

Might as well be a new answer as there's another problem and I'll provide more detail.
As DAX has said you need to use -Raw as Get-Content returns an array of strings, one for each line. When you use -split on it each element is treated separately.
E.g., when used on the following array:
[0] "Testing"
[1] "This is a test"
[2] "'tis still a test"
$array -split "is", 0, "SimpleMatch"
[0] "Testing"
[1] "Th"
[2] " "
[3] " a test"
[4] "'t"
[5] " still a test"
When you use the -Raw switch, Get-Content returns the entire file as a single string with newline characters.
The other thing I'll point out is you're escaping the quotes, but this isn't necessary. The reason you need to escape quotes is so PowerShell doesn't assume you're terminating the string:
$t = "This is a "bad" test"
> At line:1 char:18
+ $t = "This is a "bad" test"
+ ~~~~~~~~~~
Unexpected token 'bad" test"' in expression or statement.
You need to escape the quotes so that "bad" is still part of the string.
However when you are reading from a file the quotes are already part of the string:
Get-Content C:\test.txt
> This is a "bad" test
Because you are not typing the quotes into the console, they do not need to be escaped. To show you with your own code, check the full content of your temp file:
Home address `"TJ One way`"
Office address `"C company Two way`"
I can't think of any reason you would need to be doing this. Perhaps if you wanted to copy and paste into a console for some reason but that's it.
This may appear to work for now, but only because the SQL query I assume you are trying to run doesn't contain quotes. I'm not sure whether quotes are used in your SQL, but if they were, the escaped version would throw an error. Regardless, it's an extra step you don't need, so you can basically scrap the whole temp file and read straight from the original.
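Putting both points together, here is a sketch that reads the file as one string and splits it only on lines consisting of GO by itself. The line-anchored regex is a swap-in for the original "GO", 0, "SimpleMatch" split, so that a GO inside ordinary text would not count as a separator; the file contents are inlined as a here-string so the example runs without D:\Scripts.

```powershell
# The sample file from the question, as the single string Get-Content -Raw would return
# In practice: $content = Get-Content -Raw (Join-Path 'D:\Scripts' 'GoSampleFile.txt')
$content = @'
Home address "TJ One way"
Office address "C company Two way"
GO
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO
Home address "TJ One way"
Office address "C company Two way"
GO
:on error exit
GO
'@

# (?m) lets ^ and $ match at line breaks; \r? also handles Windows line endings
$goArray = $content -split '(?m)^GO\r?$'

# Element 1 is the text between the first and second GO lines
$goArray[1].Trim()
```

No quote escaping is needed anywhere: the quotes in the file are already part of the string.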

Splitting a string into separate variables

I have a string, which I have split using the code $CreateDT.Split(" "). I now want to manipulate two separate strings in different ways. How can I separate these into two variables?

Like this?
$string = 'FirstPart SecondPart'
$a,$b = $string.split(' ')
$a
$b
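One detail worth knowing about this multiple assignment: when the split yields more pieces than there are variables, the last variable receives all the remaining pieces as an array. If the second part should stay a single string, limit the split to two pieces. The sample string is made up for illustration.

```powershell
$string = 'FirstPart SecondPart ThirdPart'

# Three pieces, two variables: $b gets an array of the remaining two pieces
$a, $b = $string.Split(' ')

# Limit -split to two pieces so $d keeps the rest as one string
$c, $d = $string -split ' ', 2

$a   # FirstPart
$b   # two elements: SecondPart and ThirdPart
$d   # SecondPart ThirdPart
```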

The -split operator returns an array, like so:
$myString="Four score and seven years ago"
$arr = $myString -split ' '
$arr # Print output
Four
score
and
seven
years
ago
When you need a certain item, use its array index to reach it. Mind that the index starts from zero, like so:
$arr[2] # 3rd element
and
$arr[4] # 5th element
years

It is important to note the following difference between the two techniques:
$Str="This is the<BR />source string<BR />ALL RIGHT"
$Str.Split("<BR />")
This
is
the
(multiple blank lines)
source
string
(multiple blank lines)
ALL
IGHT
$Str -Split("<BR />")
This is the
source string
ALL RIGHT
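From this you can see that the string.Split() method (in Windows PowerShell 5.1, which runs on .NET Framework):
performs a case-sensitive split (note that "ALL RIGHT" is split on the "R", but "source" and "string" are not split on their lowercase "r")
treats the string as a list of possible characters to split on
While the -split operator:
performs a case-insensitive comparison
only splits on the whole string (which it treats as a regular expression by default)
In PowerShell 7, .Split("<BR />") binds to a newer .NET overload that takes a single string separator, so there it also splits on the whole string.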
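To get whole-string splitting regardless of PowerShell version, a sketch that is explicit about the separator in both techniques: the SimpleMatch option stops -split from treating the separator as a regex, and passing a [string[]] separator selects the whole-string overload of String.Split. The variable names are illustrative.

```powershell
$Str = "This is the<BR />source string<BR />ALL RIGHT"

# -split with SimpleMatch: the whole separator, matched literally (no regex)
$viaOperator = $Str -split '<BR />', 0, 'SimpleMatch'

# An explicit string[] separator makes String.Split split on the whole string
# in both Windows PowerShell 5.1 (.NET Framework) and PowerShell 7 (.NET)
$viaMethod = $Str.Split([string[]]@('<BR />'), [System.StringSplitOptions]::None)

# Case sensitivity: -split ignores case, String.Split does not
$opCase = 'ALL RIGHT' -split 'r'            # "ALL " and "IGHT"
$methodCase = 'ALL RIGHT'.Split([char]'r')  # "ALL RIGHT" (no lowercase r present)

$viaOperator
$viaMethod
```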

Try this (ConvertFrom-String is only available in Windows PowerShell 5.0 and 5.1, not in PowerShell 7):
$Object = 'FirstPart SecondPart' | ConvertFrom-String -PropertyNames Val1, Val2
$Object.Val1
$Object.Val2

Using the ForEach-Object operation-statement syntax (calling the split method on each input string):
$a,$b = 'hi.there' | foreach split .
$a,$b
hi
there

Drop last section of string in powershell

I have found many ways in Powershell to capture the sections of strings using split(), but I am stumped on this one. Using the example string below:
"Monkey/Zebra/Bird/Bird"
I am able to capture the end "Bird" using the code below:
$path = "Monkey/Zebra/Bird/Bird"
$animal = $path.split("/")[-1]
My end goal is to be able to capture the front of the string, without the last "split", so to output:
"Monkey/Zebra/Bird"
The number of "Animals" will vary, so I cannot hard code the number of characters or "/" to look for.

Using a regular expression with -replace:
$text = "Monkey/Zebra/Bird/Bird"
$text -replace '/[^/]+$'
Monkey/Zebra/Bird

I would probably use a regex too, but if you wanted to use a split:
("Monkey/Zebra/Bird/Bird" -split '/')[0..((("Monkey/Zebra/Bird/Bird" -split '/').count)-2)] -join '/'
I love Perl...errr Powershell

You could use a foreach loop such as
$path = "Monkey/Zebra/Bird/Bird"
foreach($animal in $path.split("/"))
{
Write-Host $animal
}
This will then split the path and process each animal in turn

I preface this with the fact that RegEx is probably your fastest answer. That said...
Another approach, using split, to get only one of each animal since your example was parsing out the duplicate "bird" from "Monkey/Zebra/Bird/Bird"
$Animals = "Monkey/Zebra/Bird/Bird"
($Animals.Split('/')|Select -Unique) -join '/'
Or if you just want to drop the last part you can do what EBGreen suggested, split it into individual animals, count those, return all but the last one, and re-join them together.
($Animals.Split('/'))[0..($Animals.Split('/').count-2)] -join '/'
Either of those will return Monkey/Zebra/Bird but if you like the latter of the options please attribute the answer to EBGreen.

$path = "Monkey/Zebra/Bird/Bird"
$path -replace '/\w+$'
Monkey/Zebra/Bird
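Two more variants, sketched for the same sample path: LastIndexOf/Substring avoids regex entirely, and Select-Object -SkipLast (available from PowerShell 5.0) drops the final segment without counting elements. Both assume / is the only separator; $cut, $front and $front2 are illustrative names.

```powershell
$path = "Monkey/Zebra/Bird/Bird"

# Cut at the last '/'; if there is none, keep the whole string
$cut = $path.LastIndexOf('/')
$front = if ($cut -ge 0) { $path.Substring(0, $cut) } else { $path }

# Split, drop the final segment, and join with the same separator
$front2 = ($path -split '/' | Select-Object -SkipLast 1) -join '/'

$front    # Monkey/Zebra/Bird
$front2   # Monkey/Zebra/Bird
```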
