PowerShell: How to find a word within a line of text and modify the data behind it - string

I'm looking for a way to find a word with a value behind it in a piece of text and then update the value.
Example:
In the file there are multiple occurrences of 'schema="anon" maxFileSize="??????" maxBufferSize="123"'
I want to find all the lines containing maxFileSize and then update the unknown value ?????? to 123456.
So far, I came up with this:
cls
$Files = "C:\temp1\file.config","C:\temp2\file.config"
$newMaxFileSize = "123456"
ForEach ($File in $Files) {
If ((Test-Path $File -PathType Leaf) -eq $False) {
Write-Host "File $File doesn't exist"
} Else {
# Check content of file and select all lines where maxFileSize isn't equal to 123456 yet
$Result = Get-Content $File | Select-String -Pattern "maxFileSize" -AllMatches | Select-String -Pattern "123456" -NotMatch -AllMatches
Write-Host $Result
<#
ROUTINE TO UPDATE THE SIZE
#>
}
}
Yet, I have no clue how to find the word "maxFileSize", let alone how to update the value behind it...

Assuming the input file is actually XML, use the following XPath expression to locate all nodes that have a maxFileSize attribute (regardless of value):
# Parse input file as XML
$configXml = [xml](Get-Content $file)
# Use Select-Xml to find relevant nodes ('@' selects an attribute in XPath)
$configXml |Select-Xml '//*[@maxFileSize]' |ForEach-Object {
    # Update maxFileSize attribute value
    $_.Node.SetAttribute('maxFileSize','123456')
}
# Overwrite original file with updated XML ($file is a path string in the loop above, so resolve it to a full path for .NET)
$configXml.Save((Convert-Path $file))
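To try the XPath without touching a real file, here is a small sketch that loads a made-up config from a here-string and updates every maxFileSize attribute. The element names (configuration, logger, other) are hypothetical stand-ins for whatever the real file uses:

```powershell
# Hypothetical sample config; element names are made up for illustration
[xml]$configXml = @'
<configuration>
  <logger schema="anon" maxFileSize="??????" maxBufferSize="123" />
  <logger schema="anon" maxFileSize="42" maxBufferSize="123" />
  <other name="untouched" />
</configuration>
'@

$newMaxFileSize = '123456'

# In XPath, '@' addresses an attribute: match every element that has maxFileSize, whatever its value
$configXml | Select-Xml -XPath '//*[@maxFileSize]' | ForEach-Object {
    $_.Node.SetAttribute('maxFileSize', $newMaxFileSize)
}

# List the updated values (both should now be 123456)
$configXml.SelectNodes('//*[@maxFileSize]') | ForEach-Object { $_.GetAttribute('maxFileSize') }
```

Elements without a maxFileSize attribute, like <other>, are left alone. To persist the change, call $configXml.Save() with a full path, as in the answer.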
If the config file is some archaic format for which no readily available parser exists, use the -replace operator to update the value where appropriate:
$Results = @(Get-Content $File) -creplace '(?<=maxFileSize=")[^"]*(?=")','123456'
# Write the updated lines back to the file
$Results | Set-Content $File
The pattern used above, (?<=maxFileSize=")[^"]*(?="), describes:
(?<= # Positive look-behind assertion, this pattern MUST precede the match
maxFileSize=" # literal string `maxFileSize="`
) # Close look-behind
[^"]* # Match 0 or more non-" characters
(?= # Positive look-ahead assertion, this pattern MUST follow the match
" # literal string `"`
) # Close look-ahead
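A quick way to try the pattern is on a couple of in-memory lines. The <target .../> element below is a made-up stand-in for whatever the real config contains. -creplace works element by element on an array, and lines without maxFileSize come through unchanged:

```powershell
# Hypothetical sample lines standing in for the contents of the config file
$lines = @(
    '<target schema="anon" maxFileSize="??????" maxBufferSize="123" />'
    '<target schema="anon" maxBufferSize="123" />'
)

# Replace whatever sits between maxFileSize=" and the closing quote
$updated = $lines -creplace '(?<=maxFileSize=")[^"]*(?=")', '123456'
$updated
```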

Related

Manipulate strings in a txt file with Powershell - Search for words in each line containing "." and save them in new txt file

I have a text file with different entries. From each line, I want to extract the word that contains a dot (using PowerShell).
$file = "C:\Users\test_folder\test.txt"
Get-Content $file
Output:
Compass Zype.Compass 1.1.0 thisisaword
Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword
......
......
......
Bla Word Program.Name 1.1.1 this is another entry
As you can see, in all lines, the "second" "word" contains a dot, like "Program.Name".
I want to create a new file, which contains just those words, each line one word.
So my file should look something like:
Zype.Compass
zxch3n.PomodoroLogger
Program.Name
What I have tried so far:
Clear-Host
$folder = "C:\Users\test_folder"
$file = "C:\Users\test_folder\test.txt"
$content_txtfile = Get-Content $file
foreach ($line in $content_textfile)
{
if ($line -like "*.*"){
$line | Out-File "$folder\test_filtered.txt"
}
}
But my output is not what I want.
I hope you get what my problem is.
Thanks in advance! :)
Here is a solution using Select-String to find sub strings by RegEx pattern:
(Select-String -Path $file -Pattern '\w+\.\w+').Matches.Value |
Set-Content "$folder\test_filtered.txt"
You can find an explanation and the ability to experiment with the RegEx pattern at RegEx101.
Note that while the RegEx101 demo also shows matches for the version numbers, Select-String gives you only the first match per line (unless argument -AllMatches is passed).
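To check the pattern without a file, the same expression can be run against a few of the sample lines held in an array. Select-String accepts strings from the pipeline as well as files via -Path:

```powershell
# A few of the sample lines from the question
$lines = @(
    'Compass Zype.Compass 1.1.0 thisisaword'
    'Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword'
    'Bla Word Program.Name 1.1.1 this is another entry'
)

# Without -AllMatches only the first match per line is kept, so the version numbers are skipped
$words = ($lines | Select-String -Pattern '\w+\.\w+').Matches.Value
$words
```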
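This looks like fixed-width fields, and if so you can reduce it to the following. Here 29 is the zero-based start of the column and 36 is its width, so adjust both to your file. Substring() throws on any line shorter than 65 characters.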
Get-Content $file | # Read the file
%{ $_.Substring(29,36).Trim()} | # Extract the column
?{ $_.Contains(".") } | # Filter for values with "."
Set-Content "$folder\test_filtered.txt" # Write result
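Get-Content can be slow on large files, and -like is sometimes slower than -match. This version therefore reads the file with [System.IO.File]::ReadLines. I prefer -match, but some people prefer -like.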
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if ($line -match "\.") {
$line | out-file $output -append
}
}
Otherwise, for a shorter option:
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
Get-Content $filename | Where-Object { $_ -match "\." } | Out-File $output
To match only in a particular column, use one of two approaches. You can split the line into named columns (not what you are doing here), or you can use a more specific pattern:
"\." matches a period anywhere in the line.
"^\." matches only when the first character of the line is a period.
"^[^\t]*\.[^\t]*" is for a period that always comes before the first tab. It matches any number of non-tab characters from the start of the line, then a period, then any number of non-tab characters.

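The following sketch compares the three patterns on made-up tab-separated lines. The sample lines are assumptions. The question's real file may use spaces rather than tabs, in which case the tab-based pattern does not apply:

```powershell
# Made-up sample lines; `t is a tab character inside double quotes
$samples = @(
    "Zype.Compass`t1.1.0"        # period in the first tab-separated field
    "Compass`tZype.Compass"      # period only after the first tab
    ".hidden`tfile"              # period as the very first character
    "NoPeriodHere`tStill none"   # no period at all
)

$table = foreach ($s in $samples) {
    [pscustomobject]@{
        Line       = $s -replace "`t", '<TAB>'     # show tabs visibly
        Anywhere   = $s -match '\.'
        FirstChar  = $s -match '^\.'
        FirstField = $s -match '^[^\t]*\.[^\t]*'
    }
}
$table
```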
Powershell look for a string between two string

How could I check whether there is a "data" line between two "values" lines? The number of lines between them varies, and the last "data" has no "values" after it.
values
xxx
xxx
data
xxx
values
xxx
xxx
values
xxx
data
values
xxx
data
xxx
All I have is:
if (xxx) {
xxx | Add-Content -Path $DATA
} else {
Add-Content -Path $DATA -Value 'N/A'
}
In the end, the result I would like is:
data
N/A
data
data
You can use a Select-String approach:
# $s contains a single string of your data
($s | Select-String -Pattern '(?s)(?<=values).*?(?=values|$)' -AllMatches).Matches.Value |
Foreach-Object {
    if ($_ -match 'data') { $Matches[0] } else { 'N/A' }
}
Explanation:
-Pattern without -SimpleMatch uses regex. (?s) is a single-line modifier, so . also matches newline characters. (?<=values) is a positive lookbehind for the string values. (?=values|$) is a positive lookahead for the string values or (|) the end of the string ($). Because the lookaround assertions do not consume values, each occurrence stays available as the boundary for the next match. Without them, an occurrence of values that had already been matched could not be used again in another match. .*? lazily matches any characters, as few as possible, so each match stops at the next values.
Inside the Foreach-Object, the current match object $_ is checked for data. If data is found, the automatic variable $matches is updated with its value, and that value is output. Otherwise, N/A is output. Since $matches is a hash table, you need to access capture group 0 (0 is the key name) for the value.
It is imperative that $s be a single string here. If you are reading it from a text file, use $s = Get-Content file.txt -Raw.
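Here is the same pattern run end to end on the sample from the question. In this sketch, the data is inlined as a here-string instead of being read with Get-Content -Raw, and the N/A-or-data choice is written as an explicit if/else:

```powershell
# The question's sample, as one multi-line string (what Get-Content -Raw would return)
$s = @'
values
xxx
xxx
data
xxx
values
xxx
xxx
values
xxx
data
values
xxx
data
xxx
'@

# One match per block: from just after "values" up to the next "values" or the end of the string
$result = ($s | Select-String -Pattern '(?s)(?<=values).*?(?=values|$)' -AllMatches).Matches.Value |
    ForEach-Object {
        if ($_ -match 'data') { $Matches[0] } else { 'N/A' }
    }
$result
```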
You could make this a bit more dynamic using variables:
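$Start = 'values'
$End = 'values'
$data = 'data'
# Escape the variables so any regex metacharacters in them are matched literally
$pattern = "(?s)(?<=$([regex]::Escape($Start))).*?(?=$([regex]::Escape($End))|$)"
($s | Select-String -Pattern $pattern -AllMatches).Matches.Value |
Foreach-Object {
    if ($_ -match [regex]::Escape($data)) { $Matches[0] } else { 'N/A' }
}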
You'll want to read the lines one by one, and keep track of 1) when you see a values line and 2) when you see a data line.
I prefer switch statements for this kind of top-down parsing:
# In real life you would probably read from a file on disk, e.g.
# $data = Get-Content some\file.txt
$data = -split 'values xxx xxx data xxx values xxx xxx values xxx data values xxx data xxx'
# Use this to keep track of whether the first "values" line has been observed
$inValues = $false
# Use this to keep track of whether a "data" line has been observed since last "values" line
$hasData = $false
switch($data)
{
    'values' {
        # don't output anything the first time we see `values`
        if($inValues){
            if($hasData){
                'data'
            } else {
                'N/A'
            }
        }
        $inValues = $true
        # reset our 'data' monitor
        $hasData = $false
    }
    'data' {
        # we got one!
        $hasData = $true
    }
}
if($inValues){
    # remember to output for the last `values` block
    if($hasData){
        'data'
    } else {
        'N/A'
    }
}

How do I check a string exist in a file using PowerShell?

My first text file looks like this: 12AB34.US. The second text file contains CD 34 EF.
I want to find out whether the content of my second text file exists in the first text file.
First, I cut off the last 3 characters of the first text file (.US). Then I split it into groups of 2 characters, because the second text file consists of 2-character groups. Then I tried this code, and it always returns "Not Found".
$String = Get-Content "C:\Users\te2.txt"
$Data = Get-Content "C:\Users\Fixed.txt"
$Split = $Data -split '(..)'
$Cut = $String.Substring(0,6)
$String_Split = $Cut -split '(..)'
$String_Split
$Check= $String_Split | %{$_ -match $Split}
if ($Check-contains $true) {
Write-Host "0"
} else {
Write-Host "1"
}
There are a number of problems with your current approach.
The 2-char groups don't align:
# strings split into groups of two
'12' 'AB' '34' # first string
'CD' ' 3' '4 ' # second string
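When you test multiple strings with -match, you need to
escape the input string to avoid matching on metacharacters (like .),
place the collection on the left-hand side of the operator and the pattern on the right, and
skip the empty strings that -split '(..)' produces between the groups, because an empty pattern matches everything.
With a collection on the left, -match returns the matching elements instead of $true/$false, so check whether anything came back rather than using -contains $true:
$Compare = $String_Split | Where-Object { $_ } | ForEach-Object { $Split -match [regex]::Escape($_) }
if ($Compare) {
    Write-Host "Found"
} else {
    Write-Host "Not Found"
}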
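The following sketch shows both points on a made-up array of 2-character groups. An unescaped . acts as a wildcard. With a collection on the left, -match returns the matching elements, so -contains $true never sees a boolean:

```powershell
# Made-up 2-character groups
$groups = @('CD', '4.', 'EF')

# Unescaped, '.' is a wildcard, so '.F' also matches 'EF'
$unescaped = $groups -match '.F'
# Escaped, the pattern only matches a literal period followed by F
$escaped = $groups -match [regex]::Escape('.F')

$unescaped                      # the matching element: EF
$escaped.Count                  # 0: nothing contains a literal ".F"
$unescaped -contains $true      # False: the result holds strings, not booleans
```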
For a more general solution to find out if any substring of N chars of one string is also a substring of another, you could probably do something like this instead:
$a = '12AB34.US'
$b = 'CD 34 EF'
# we want to test all substrings of length 2
$n = 2
$possibleSubstrings = 0..($n - 1) | ForEach-Object {
# grab substrings of length $n at every offset from 0 to $n - 1
$a.Substring($_) -split "($('.'*$n))" | Where-Object Length -eq $n |ForEach-Object {
# escape the substring for later use with `-match`
[regex]::Escape($_)
}
} |Sort-Object -Unique
# We can construct a single regex pattern for all possible substrings:
$pattern = $possibleSubstrings -join '|'
# And finally we test if it matches
if($b -match $pattern){
Write-Host "Found!"
}
else {
Write-Host "Not found!"
}
This approach will give you the correct answer, but it'll become extremely slow on large inputs, at which point you may want to look at non-regex based strategies like Boyer-Moore.
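For comparison, here is a non-regex sketch: a plain sliding window that checks each $Length-character piece of the first string with String.Contains. This is not Boyer-Moore. It only avoids building and escaping a large alternation pattern, and its worst-case cost still grows with the product of the two string lengths. Test-SharedSubstring is a hypothetical name. Unlike -match, String.Contains is case-sensitive.

```powershell
function Test-SharedSubstring {
    # Hypothetical helper: $true if any $Length-character substring of $First also occurs in $Second
    param(
        [string]$First,
        [string]$Second,
        [int]$Length = 2
    )
    for ($i = 0; $i -le $First.Length - $Length; $i++) {
        # String.Contains is a literal, case-sensitive search, so no regex escaping is needed
        if ($Second.Contains($First.Substring($i, $Length))) {
            return $true
        }
    }
    return $false
}

Test-SharedSubstring -First '12AB34.US' -Second 'CD 34 EF'
```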

Extract substrings where match is found

I have a text file with a number of lines. I would like to search each line individually for a particular pattern and, if that pattern is found, output a substring at a particular position relative to where the pattern was found.
i.e. if a line contains the pattern at position 20, I would like to output the substring that begins at position 25 on the same line and lasts for five characters.
The following code will output every line that contains the pattern:
select-string -path C:\Scripts\trimatrima\DEBUG.txt -pattern $PATTERN
Where do I go from here?
You can use the $Matches automatic variable.
The text matched by the last successful -match is stored in $Matches[0], but you can also use named capture groups, like this:
"test","fest","blah" |ForEach-Object {
if($_ -match "^[bf](?<groupName>es|la).$"){
$Matches["groupName"]
}
}
returns es (from "fest") and la (from "blah")
A couple of options.
Keeping Select-String, you'll want to use the .line property to get your substrings. Substring() takes a zero-based start index and a length, so adjust 19 to the column you need:
select-string -path C:\Scripts\trimatrima\DEBUG.txt -pattern $PATTERN |
foreach { $_.line.Substring(19,5) }
For large files, Get-Content with -ReadCount and -match may be faster:
Get-Content C:\Scripts\trimatrima\DEBUG.txt -ReadCount 1000 |
foreach {
$_ -match $pattern |
foreach { $_.substring(19,5) }
}
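The question also asks for a substring at a position relative to where the pattern was found. Select-String records that position in each match's Index property (zero-based), so a sketch could look like the following. Get-RelativeSubstring is a hypothetical helper. Its defaults (start 5 characters after the match, 5 characters long) come from the question's example of a pattern at position 20 and a substring at position 25:

```powershell
function Get-RelativeSubstring {
    # Hypothetical helper: for each line matching $Pattern, return $Length characters
    # starting $Offset characters after the start of the (first) match
    param(
        [string[]]$Line,
        [string]$Pattern,
        [int]$Offset = 5,
        [int]$Length = 5
    )
    $Line | Select-String -Pattern $Pattern | ForEach-Object {
        $start = $_.Matches[0].Index + $Offset
        # skip lines too short to contain the requested substring instead of throwing
        if ($start + $Length -le $_.Line.Length) {
            $_.Line.Substring($start, $Length)
        }
    }
}

Get-RelativeSubstring -Line 'xxxxKEY..ABCDEyyyy', 'no match here', 'KEY' -Pattern 'KEY'
```

With a file, the same idea works on Select-String -Path output, because each result carries both Line and Matches.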

Replacing a set of strings containing any character

Is there a way to replace a set of strings no matter what the strings contain?
I am trying to replace one string containing: quotes(""), brackets([]), #, e.
gci C:\test *.txt -recurse | ForEach {(Get-Content $_ | ForEach {$_ -replace '"my"', "money"}) | Set-Content $_ }
but what if a string I want to replace has EVERYTHING in <>:
PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[1234#1]" Author="PPWIN" Version="4.0"
So you want to replace everything in [] in the sample text you included in your question. If you were not aware (although I think you are now), -replace supports regular expressions, and a simple regex can find the text you are looking for. I am also going to remove some of the redundancy in your code.
Get-ChildItem C:\test -Filter *.txt -Recurse | ForEach-Object{
$file = $_.FullName
(Get-Content $file) -replace "\[.*?]","[bagel]" | Set-Content $file
}
Explanation borrowed from regex101.com
\[ matches the character [ literally
.*? matches any character (except newline). Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
] matches the character ] literally
So that line would then appear as the following inside the source file.
PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[bagel]" Author="PPWIN" Version="4.0"
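The following sketch shows three ways to do the replacement on the sample line, held in a variable here rather than read from files:

- the regex from this answer;
- -replace with [regex]::Escape(), which takes the search text literally and so addresses "no matter what the string contains";
- the .NET String.Replace() method, which does no regex processing at all.

Note that -replace also treats $ in the replacement text specially, which String.Replace() does not.

```powershell
$line = 'PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[1234#1]" Author="PPWIN" Version="4.0"'

# 1. Regex from the answer: lazily match anything between [ and ]
$viaRegex = $line -replace '\[.*?]', '[bagel]'

# 2. Literal search text: Escape() neutralises [, ], # and any other regex metacharacters
$old = 'report[1234#1]"'
$viaEscape = $line -replace [regex]::Escape($old), 'report[bagel]"'

# 3. No regex at all: String.Replace does a case-sensitive literal replacement
$viaReplace = $line.Replace($old, 'report[bagel]"')

$viaRegex
```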
