Extract substrings where match is found - string

I have a text file with a number of lines. I would like to search each line individually for a particular pattern and, if that pattern is found, output a substring at a particular position relative to where the pattern was found.
For example, if a line contains the pattern at position 20, I would like to output the substring that begins at position 25 on the same line and is five characters long.
The following code will output every line that contains the pattern:
select-string -path C:\Scripts\trimatrima\DEBUG.txt -pattern $PATTERN
Where do I go from here?

You can use the $Matches automatic variable.
The text matched by the most recent successful -match is stored in $Matches[0], but you can also use named capture groups, like this:
"test","fest","blah" | ForEach-Object {
    if ($_ -match "^[bf](?<groupName>es|la).$") {
        $Matches["groupName"]
    }
}
returns es (from "fest") and la (from "blah")
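Applied to the original question, a named group can capture the characters that follow the pattern. A minimal sketch, assuming the pattern is the literal text ERROR (a hypothetical stand-in for $PATTERN) and that the five wanted characters start right after it, which is position 25 when a five-character pattern starts at position 20:

```powershell
# Hypothetical sample data; in practice the lines would come from Get-Content.
$padded = ('x' * 19) + 'ERROR12345abcde'   # "ERROR" starts at position 20
$lines = @($padded, 'no pattern on this line', 'ERROR67890fghij')

$values = foreach ($line in $lines) {
    # $Matches is only refreshed when -match succeeds, so read it inside the if
    if ($line -match 'ERROR(?<value>.{5})') {
        $Matches['value']
    }
}
$values   # 12345, 67890
```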

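A couple of options.
Keeping Select-String, use the .Line property of each result to get your substrings. String.Substring takes a zero-based start index and a length, so Substring(19,5) returns the five characters starting at position 20, always at that fixed column:
Select-String -Path C:\Scripts\trimatrima\DEBUG.txt -Pattern $PATTERN |
    ForEach-Object { $_.Line.Substring(19,5) }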
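If the substring's position depends on where the pattern was found rather than on a fixed column, each MatchInfo object also exposes the match itself through .Matches, and .Index is the zero-based position of the match within the line. A sketch, assuming a hypothetical literal pattern ERROR and an offset of 5 characters from the start of the match; piped strings stand in for -Path:

```powershell
# Hypothetical sample data; with a real file use Select-String -Path instead.
$padded = ('x' * 19) + 'ERROR12345abcde'
$lines = @($padded, 'no pattern on this line', 'ERROR67890fghij')
$pattern = 'ERROR'
$offset = 5   # characters between the start of the match and the substring
$length = 5   # length of the substring to extract

$results = $lines | Select-String -Pattern $pattern | ForEach-Object {
    $start = $_.Matches[0].Index + $offset
    # Skip lines that are too short to hold the whole substring
    if ($start + $length -le $_.Line.Length) {
        $_.Line.Substring($start, $length)
    }
}
$results   # 12345, 67890
```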
For large files, Get-Content with -ReadCount and -match may be faster:
Get-Content C:\Scripts\trimatrima\DEBUG.txt -ReadCount 1000 |
    ForEach-Object {
        $_ -match $pattern |
            ForEach-Object { $_.Substring(19,5) }
    }
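The -ReadCount version relies on Get-Content emitting arrays of lines: when the left-hand side of -match is a collection, the operator returns the matching elements instead of $true or $false. A small in-memory illustration, with made-up lines standing in for one batch:

```powershell
$batch = @('alpha ERROR 1', 'beta ok', 'gamma ERROR 2')

$single = 'beta ok' -match 'ERROR'   # scalar input: a Boolean ($false)
$filtered = $batch -match 'ERROR'    # array input: the matching elements
$filtered   # alpha ERROR 1, gamma ERROR 2
```

Note that $Matches is not populated when the input is a collection, so the batch form only works for filtering, not for reading capture groups.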

Related

Manipulate strings in a txt file with Powershell - Search for words in each line containing "." and save them in new txt file

I have a text file with different entries. Using PowerShell, I want to extract from each line the word that contains a dot.
$file = "C:\Users\test_folder\test.txt"
Get-Content $file
Output:
Compass Zype.Compass 1.1.0 thisisaword
Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword
......
......
......
Bla Word Program.Name 1.1.1 this is another entry
As you can see, in all lines, the "second" "word" contains a dot, like "Program.Name".
I want to create a new file that contains just those words, one word per line.
So my file should look something like:
Zype.Compass
zxch3n.PomodoroLogger
Program.Name
What I have tried so far:
Clear-Host
$folder = "C:\Users\test_folder"
$file = "C:\Users\test_folder\test.txt"
$content_txtfile = Get-Content $file
foreach ($line in $content_textfile)
{
    if ($line -like "*.*"){
        $line | Out-File "$folder\test_filtered.txt"
    }
}
But my output is not what I want.
I hope you get what my problem is.
Thanks in advance! :)
Here is a solution using Select-String to find substrings by a RegEx pattern:
(Select-String -Path $file -Pattern '\w+\.\w+').Matches.Value |
Set-Content "$folder\test_filtered.txt"
You can find an explanation and the ability to experiment with the RegEx pattern at RegEx101.
Note that while the RegEx101 demo also shows matches for the version numbers, Select-String gives you only the first match per line (unless argument -AllMatches is passed).
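To see the difference -AllMatches makes, here is a sketch run against three of the sample lines from the question, held in memory rather than read from $file:

```powershell
$sample = @(
    'Compass Zype.Compass 1.1.0 thisisaword',
    'Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword',
    'Bla Word Program.Name 1.1.1 this is another entry'
)

# Without -AllMatches: only the first match on each line
$firstPerLine = ($sample | Select-String -Pattern '\w+\.\w+').Matches.Value
# With -AllMatches: every non-overlapping match, including parts of version numbers
$everyMatch = ($sample | Select-String -Pattern '\w+\.\w+' -AllMatches).Matches.Value

$firstPerLine   # Zype.Compass, zxch3n.PomodoroLogger, Program.Name
$everyMatch     # Zype.Compass, 1.1, zxch3n.PomodoroLogger, 0.6, Program.Name, 1.1
```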
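This looks like fixed-width fields, and if so you can reduce it to the following. Substring(29,36) takes the 36 characters starting at zero-based index 29, so adjust both numbers to your file's column layout: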
Get-Content $file | # Read the file
%{ $_.Substring(29,36).Trim()} | # Extract the column
?{ $_.Contains(".") } | # Filter for values with "."
Set-Content "$folder\test_filtered.txt" # Write result
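String.Substring throws an exception for any line shorter than start + length (65 characters here), such as a blank line or a short header. A guarded sketch, assuming a hypothetical layout in which the dotted column starts at index 29 and is 36 characters wide; the rows are built with PadRight only to simulate fixed-width text:

```powershell
# Hypothetical fixed-width rows: the Id column starts at index 29 and is 36 wide
$rows = @(
    ('Compass'.PadRight(29) + 'Zype.Compass'.PadRight(36) + '1.1.0'),
    ('Pomodoro Logger'.PadRight(29) + 'zxch3n.PomodoroLogger'.PadRight(36) + '0.6.3'),
    'short line'
)
$start = 29
$width = 36

$ids = $rows | ForEach-Object {
    if ($_.Length -gt $start) {
        # Take at most $width characters so short rows do not throw
        $_.Substring($start, [Math]::Min($width, $_.Length - $start)).Trim()
    }
} | Where-Object { $_.Contains('.') }
$ids   # Zype.Compass, zxch3n.PomodoroLogger
```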
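Get-Content can be slow on large files, and -like is sometimes slower than -match. I prefer -match, but some prefer -like. This version reads the file with [System.IO.File]::ReadLines, which streams the lines without going through the PowerShell pipeline (use a full path, because .NET resolves relative paths against the process working directory, not the current PowerShell location):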
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
    if ($line -match "\.") {
        $line | Out-File $output -Append
    }
}
Otherwise, for a shorter option, maybe:
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
Get-Content $filename | Where-Object { $_ -match "\." } | Out-File $output
If you only want matches in the first column, either name the columns (which you are not doing here) or use a more specific search pattern:
"\." matches a period anywhere in the line.
If the period is always at the beginning of the line, anchor the pattern to the start of the line:
"^\." means the first character is a period.
If the period always comes before the first tab, match anything except a tab, then a period, then anything except a tab:
"^[^\t]*\.[^\t]*" means: from the start of the line, any number of characters that are not tabs, then a period, then any number of characters that are not tabs.

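The three patterns can be compared on a few made-up tab-separated lines (in double-quoted PowerShell strings `t is a tab; in the regex, \t is a tab):

```powershell
# Made-up tab-separated lines
$lines = @("Zype.Compass`t1.1.0", "Compass`tZype.Compass", ".hidden`tvalue")

$anyPeriod = $lines -match '\.'                     # all three lines
$leadingPeriod = $lines -match '^\.'                # only the ".hidden" line
$periodBeforeTab = $lines -match '^[^\t]*\.[^\t]*'  # first and third lines
```

The second line is excluded by the last pattern because its only period comes after the tab, and [^\t]* cannot cross the tab.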
Replacing specific strings in all matching files content with the file's basename using PowerShell

Get-ChildItem 'C:\Users\Zac\Downloads\script test\script test\*.txt' -Recurse | ForEach {(Get-Content $_ | ForEach { $_ -replace '1000', $fileNameOnly}) | Set-Content $_ }
I have been trying to use a simple PowerShell script to replace the value 1000 in my documents with the name of the .nc1/.txt file being edited.
For example, a file called BM3333.nc1 has a line with the value 1000, which needs to be replaced with BM3333, and so on. This will be used for batch editing.
What variable do I use to replace the 1000 with the file name?
So far, I can get this to run, but it doesn't replace the 1000 value; it removes it.
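Your problem is that $fileNameOnly is never assigned, so it is $null and -replace substitutes an empty string, which is why 1000 disappears. Inside the ScriptBlock of a ForEach-Object invocation, the current item is $_ (also known as $PSItem); in the nested ForEach-Object, $_ is the current line, so the inner script has no automatic variable that refers to the outer file.
You need to store the value under its own name in the outer script first. The ScriptBlock argument to ForEach-Object does not need to be a single expression; you can use multiple lines or separate statements with ;.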
1..3 | ForEach-Object { $a = $_; 100..105 | ForEach-Object { $_ * $a } }
For your use case, you need this variable to be the name of the file. The values in the outer ScriptBlock are System.IO.FileInfo objects (a subclass of System.IO.FileSystemInfo) returned by Get-ChildItem.
PowerShell makes iterating on work like this very easy; try seeing which properties are available:
Get-ChildItem 'C:\Users\Zac\Downloads\script test\script test\*.txt' -Recurse | Select-Object -First 1 | Format-List *
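Putting that together: BaseName is the FileInfo property holding the file name without its extension. A sketch that runs against a throwaway temporary folder; the folder and the file BM3333.nc1 are made up for the demonstration, so point Get-ChildItem at your own path and filter (*.txt in the question) instead:

```powershell
# Create a throwaway folder with one sample file (names are hypothetical)
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -LiteralPath (Join-Path $dir 'BM3333.nc1') -Value 'ST', '1000', 'EN'

Get-ChildItem -Path $dir -Filter *.nc1 -Recurse | ForEach-Object {
    # File name without extension, e.g. BM3333 for BM3333.nc1
    $fileNameOnly = $_.BaseName
    # Parentheses make Get-Content finish reading before Set-Content writes
    (Get-Content -LiteralPath $_.FullName) -replace '1000', $fileNameOnly |
        Set-Content -LiteralPath $_.FullName
}

$updated = Get-Content -LiteralPath (Join-Path $dir 'BM3333.nc1')
$updated   # ST, BM3333, EN
Remove-Item -LiteralPath $dir -Recurse
```

Because the pattern is a regular expression, '1000' would also match inside 10000; '\b1000\b' restricts it to the standalone number.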

PowerShell: How to find a word within a line of text and modify the data behind it

I'm looking for a way to find a word with a value behind it in a piece of text and then update the value.
Example:
In the file there are multiple occurrences of 'schema="anon" maxFileSize="??????" maxBufferSize="123"'
I want to find all the lines containing maxFileSize and then update the unknown value ?????? to 123456.
So far, I came up with this:
cls
$Files = "C:\temp1\file.config","C:\temp2\file.config"
$newMaxFileSize = "123456"
ForEach ($File in $Files) {
    If ((Test-Path $File -PathType Leaf) -eq $False) {
        Write-Host "File $File doesn't exist"
    } Else {
        # Check content of file and select all lines where maxFileSize isn't equal to 123456 yet
        $Result = Get-Content $File | Select-String -Pattern "maxFileSize" -AllMatches | Select-String -Pattern "123456" -NotMatch -AllMatches
        Write-Host $Result
        <#
        ROUTINE TO UPDATE THE SIZE
        #>
    }
}
Yet, I have no clue how to find the word "maxFileSize", let alone how to update the value behind it...
Assuming the input file is actually XML, use the following XPath expression to locate all nodes that have a maxFileSize attribute (regardless of value):
# Parse input file as XML
$configXml = [xml](Get-Content $file)
# Use Select-Xml to find relevant nodes
$configXml | Select-Xml '//*[@maxFileSize]' | ForEach-Object {
    # Update maxFileSize attribute value
    $_.Node.SetAttribute('maxFileSize','123456')
}
# Overwrite original file with updated XML; Save needs a full path, since .NET ignores the PowerShell location
$configXml.Save((Resolve-Path $file).ProviderPath)
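To check the XPath without touching a real file, the same update can be applied to an in-memory document. The XML below is a made-up stand-in for the asker's config; //*[@maxFileSize] selects every element that carries a maxFileSize attribute:

```powershell
# Hypothetical config document parsed from a here-string instead of a file
$configXml = [xml]@'
<configuration>
  <logging schema="anon" maxFileSize="999" maxBufferSize="123" />
  <archive maxFileSize="42" />
  <unrelated value="1" />
</configuration>
'@

$configXml | Select-Xml -XPath '//*[@maxFileSize]' | ForEach-Object {
    $_.Node.SetAttribute('maxFileSize', '123456')
}

# Read the attribute back from every matching element
$sizes = $configXml.SelectNodes('//*[@maxFileSize]') | ForEach-Object { $_.GetAttribute('maxFileSize') }
$sizes   # 123456, 123456
```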
If the config file is some archaic format for which no readily available parser exists, use the -replace operator (here its case-sensitive form, -creplace) to update the value where appropriate, then write the result back:
$Results = @(Get-Content $File) -creplace '(?<=maxFileSize=")[^"]*(?=")','123456'
$Results | Set-Content $File
The pattern used above, (?<=maxFileSize=")[^"]*(?="), describes:
(?<= # Positive look-behind assertion, this pattern MUST precede the match
maxFileSize=" # literal string `maxFileSize="`
) # Close look-behind
[^"]* # Match 0 or more non-" characters
(?= # Positive look-ahead assertion, this pattern MUST follow the match
" # literal string `"`
) # Close look-ahead
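A quick check of the lookaround pattern on the sample fragment from the question; the second, lower-case input is made up to show that -creplace is case-sensitive, whereas -replace would also rewrite it:

```powershell
$pattern = '(?<=maxFileSize=")[^"]*(?=")'
$line = 'schema="anon" maxFileSize="??????" maxBufferSize="123"'

# Only the text between the quotes is replaced; the lookarounds are not consumed
$updated = $line -creplace $pattern, '123456'
# -creplace is case-sensitive, so a differently cased attribute name is left alone
$untouched = 'maxfilesize="1"' -creplace $pattern, '123456'

$updated     # schema="anon" maxFileSize="123456" maxBufferSize="123"
$untouched   # maxfilesize="1"
```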

How do I check a string exist in a file using PowerShell?

My first text file contains 12AB34.US and the second text file contains CD 34 EF.
I want to find out whether the content of the second text file exists in the first text file.
I tried cutting off the last 3 characters of the first text file (.US), then splitting the strings into 2-character groups (because the second text file consists of 2-character groups). Then I tried this code, and it always returns "Not Found".
$String = Get-Content "C:\Users\te2.txt"
$Data = Get-Content "C:\Users\Fixed.txt"
$Split = $Data -split '(..)'
$Cut = $String.Substring(0,6)
$String_Split = $Cut -split '(..)'
$String_Split
$Check= $String_Split | %{$_ -match $Split}
if ($Check-contains $true) {
Write-Host "0"
} else {
Write-Host "1"
}
There are a number of problems with your current approach.
The 2-char groups don't align:
# strings split into groups of two
'12' 'AB' '34'       # first string
'CD' ' 3' '4 ' 'EF'  # second string
When you test multiple strings with -match, you need to
escape the input string to avoid matching on metacharacters (like .), and
place the collection on the left-hand side of the operator and the pattern on the right; -match then returns the matching elements rather than $true or $false:
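# Skip the empty strings that -split '(..)' produces; an empty pattern would match everything
$Compare = $String_Split | Where-Object { $_ } | ForEach-Object { $Split -match [regex]::Escape($_) }
if (@($Compare).Count -gt 0) {
    Write-Host "Found"
} else {
    Write-Host "Not Found"
}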
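Both rules can be checked in isolation. A small sketch using the second string from the question; the other inputs are made up:

```powershell
# Split into 2-character chunks; -ne '' drops the empty strings -split leaves between captures
$chunks = ('CD 34 EF' -split '(..)') -ne ''   # 'CD', ' 3', '4 ', 'EF'

# Collection on the left: -match returns the matching elements
$aligned = $chunks -match [regex]::Escape('34')   # nothing: "34" straddles two chunks
$shifted = $chunks -match [regex]::Escape('4 ')   # '4 '

# Without escaping, "." is a wildcard and matches any character
$unescaped = @('xU', '.U') -match '.U'                 # 'xU', '.U'
$escaped = @('xU', '.U') -match [regex]::Escape('.U')  # '.U'
```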
For a more general solution to find out if any substring of N chars of one string is also a substring of another, you could probably do something like this instead:
$a = '12AB34.US'
$b = 'CD 34 EF'
# we want to test all substrings of length 2
$n = 2
$possibleSubstrings = 0..($n - 1) | ForEach-Object {
    # grab substrings of length $n at every offset from 0 to $n - 1
    $a.Substring($_) -split "($('.'*$n))" | Where-Object Length -eq $n | ForEach-Object {
        # escape the substring for later use with `-match`
        [regex]::Escape($_)
    }
} | Sort-Object -Unique
# We can construct a single regex pattern for all possible substrings:
$pattern = $possibleSubstrings -join '|'
# And finally we test if it matches
if($b -match $pattern){
Write-Host "Found!"
}
else {
Write-Host "Not found!"
}
This approach will give you the correct answer, but it will become extremely slow on large inputs, at which point you may want to look at non-regex-based strategies like Boyer-Moore.
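A simpler non-regex strategy is to slide a window over the first string and look each window up with String.Contains. This is a plain brute-force search, not Boyer-Moore; Test-SharedSubstring is a hypothetical helper name, and unlike -match the comparison is ordinal and case-sensitive:

```powershell
function Test-SharedSubstring {
    param(
        [string]$First,
        [string]$Second,
        [int]$Length
    )
    # Try every window of $Length characters in $First
    for ($i = 0; $i -le $First.Length - $Length; $i++) {
        if ($Second.Contains($First.Substring($i, $Length))) {
            return $true
        }
    }
    return $false
}

Test-SharedSubstring -First '12AB34.US' -Second 'CD 34 EF' -Length 2   # True ("34")
Test-SharedSubstring -First '12AB34.US' -Second 'CD 34 EF' -Length 3   # False
```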

Replacing a set of strings containing any character

Is there a way to replace a set of strings no matter what characters they contain?
I am trying to replace a string containing quotes (""), brackets ([]), #, etc.
gci C:\test *.txt -recurse | ForEach {(Get-Content $_ | ForEach {$_ -replace '"my"', "money"}) | Set-Content $_ }
but what if a string I want to replace has EVERYTHING in <>:
PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[1234#1]" Author="PPWIN" Version="4.0"
So you want to replace everything in [] in the sample text you included in your question. In case you were not aware (although I think you are now), -replace supports regular expressions, and a simple regex can find the text you are looking for. I am also going to remove some of the redundancy in your code.
Get-ChildItem C:\test -Filter *.txt -Recurse | ForEach-Object{
$file = $_.FullName
(Get-Content $file) -replace "\[.*?]","[bagel]" | Set-Content $file
}
Explanation borrowed from regex101.com
\[ matches the character [ literally
.*? matches any character (except newline). Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
] matches the character ] literally
So that line would then appear as the following inside the source file.
PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[bagel]" Author="PPWIN" Version="4.0"
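The same substitution can be tried on the sample line in memory. The second pair of made-up inputs shows why the lazy .*? matters: a greedy .* would run from the first [ to the last ] on the line:

```powershell
$line = 'PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[1234#1]" Author="PPWIN" Version="4.0"'
$replaced = $line -replace '\[.*?]', '[bagel]'

# Lazy vs greedy on a string with two bracketed parts
$lazy = 'a[1]b[2]' -replace '\[.*?]', '[bagel]'   # a[bagel]b[bagel]
$greedy = 'a[1]b[2]' -replace '\[.*]', '[bagel]'  # a[bagel]
```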
