PowerShell FINDSTR equivalent?

What's the DOS FINDSTR equivalent for PowerShell? I need to search a bunch of log files for "ERROR".

Here's the quick answer
Get-ChildItem -Recurse -Include *.log | select-string ERROR
I found it here, which has a great in-depth answer!

For example, find all instances of "#include" in the c files in this directory and all sub-directories.
gci -r -i *.c | select-string "#include"
gci is an alias for get-childitem

Just to expand on Monroecheeseman's answer. gci is an alias for Get-ChildItem (which is the equivalent to dir or ls), the -r switch does a recursive search and -i means include.
Piping the result of that query to select-string has it read each file and look for lines matching a regular expression (the provided one in this case is ERROR, but it can be any .NET regular expression).
The result will be a collection of match objects, showing the matching line, the file, and other related information.
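To make the shape of those match objects concrete, here is a minimal sketch. The temporary folder and the sample log content are made up for illustration; Path, LineNumber and Line are properties of the MatchInfo objects that Select-String emits.

```powershell
# Build a throwaway folder with one sample log file (names and content are illustrative only)
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("findstr-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demoDir | Out-Null
Set-Content -Path (Join-Path $demoDir 'app.log') -Value 'INFO start', 'ERROR disk full', 'INFO stop'

# Same kind of pipeline as in the answers above; each result is a MatchInfo object
$hits = Get-ChildItem -Path $demoDir -Recurse -Include *.log | Select-String -Pattern 'ERROR'

# Pick the properties you care about instead of parsing text output
$hits | Select-Object Path, LineNumber, Line

# Clean up the sample data
Remove-Item -Path $demoDir -Recurse -Force
```

Because the results are objects, they can be sorted, grouped or exported like any other pipeline output.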

if ($entry.EntryType -eq "Error")
Being Object Oriented, you want to test the property in question with one of the standard comparison operators you can find here.
I have a PS script watching logs remotely for me right now - some simple modification should make it work for you.
edit: I suppose I should also add that there is a cmdlet built for this already if you don't want to unroll it the way I did. Check out:
man Get-EventLog
Get-EventLog -newest 5 -logname System -EntryType Error

On a related note, here's a search that will list all the files containing a particular regex search or string. It could use some improvement so feel free to work on it. Also if someone wanted to encapsulate it in a function that would be welcome.
I'm new here so if this should go in its own topic just let me know. I figured I'd put it here since this looks mostly related.
# Search in Files Script
# ---- Set these before you begin ----
$FolderToSearch="C:\" # UNC paths are ok, but remember you're mass reading file contents over the network
$Search="Looking For This" # accepts regex format
$IncludeSubfolders=$True #BUG: if this is set $False then $FileIncludeFilter must be "*" or you will always get 0 results
$AllMatches=$False
$FileIncludeFilter="*".split(",") # Restricting to specific file types is faster than excluding everything else
$FileExcludeFilter="*.exe,*.dll,*.wav,*.mp3,*.gif,*.jpg,*.png,*.ghs,*.rar,*.iso,*.zip,*.vmdk,*.dat,*.pst,*.gho".split(",")
# ---- Initialize ----
if ($AllMatches -eq $True) {$SelectParam=@{AllMatches=$True}}
else {$SelectParam=@{List=$True}}
if ($IncludeSubfolders -eq $True) {$RecurseParam=@{Recurse=$True}}
else {$RecurseParam=@{Recurse=$False}}
# ---- Build File List ----
#$Files=Get-Content -Path="$env:userprofile\Desktop\FileList.txt" # For searching a manual list of files
Write-Host "Building file list..." -NoNewline
$Files=Get-ChildItem -Include $FileIncludeFilter -Exclude $FileExcludeFilter -Path $FolderToSearch -ErrorAction silentlycontinue @RecurseParam|Where-Object{-not $_.psIsContainer} # @RecurseParam is basically -Recurse=[$True|$False]
#$Files=$Files|Out-GridView -PassThru -Title 'Select the Files to Search' # Manually choose files to search, requires powershell 3.0
Write-Host "Done"
# ---- Begin Search ----
Write-Host "Searching Files..."
$Files|
Select-String $Search @SelectParam| # The @ instead of $ lets me pass the hashtable as a list of parameters. @SelectParam is either -List or -AllMatches
Tee-Object -Variable Results|
Select-Object Path
Write-Host "Search Complete"
#$Results|Group-Object path|ForEach-Object{$path=$_.name; $matches=$_.group|%{[string]::join("`t", $_.Matches)}; "$path`t$matches"} # Show results including the matches separated by tabs (useful if using regex search)
<# Other Stuff
#-- Saving and restoring results
$Results|Export-Csv "$env:appdata\SearchResults.txt" # $env:appdata can be replaced with any UNC path, this just seemed like a logical place to default to
$Results=Import-Csv "$env:appdata\SearchResults.txt"
#-- alternate search patterns
$Search="(\d[-|]{0,}){15,19}" #Rough CC Match
#>
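Since the post asks for someone to wrap this in a function, here is one possible sketch. The function name Find-InFiles and its parameters are my own choices, not part of the original script. It uses the same splatting idea, and it filters file names with -like rather than -Include, which is one way to sidestep the BUG noted in the script's settings.

```powershell
# Hypothetical wrapper around the script above; name and parameters are illustrative
function Find-InFiles {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $Pattern,   # regular expression
        [string[]] $Include = @('*'),                       # wildcard filters on the file name
        [switch] $Recurse
    )

    # Splatting: a hashtable passed with @ instead of $ is expanded into named parameters
    $gciParams = @{
        Path        = $Path
        File        = $true
        Recurse     = [bool]$Recurse
        ErrorAction = 'SilentlyContinue'
    }

    Get-ChildItem @gciParams |
        Where-Object {
            $name = $_.Name
            # Keep the file if its name matches any of the wildcard filters
            @($Include | Where-Object { $name -like $_ }).Count -gt 0
        } |
        Select-String -Pattern $Pattern -List |   # -List stops at the first hit per file
        Select-Object -ExpandProperty Path
}
```

A call could look like: Find-InFiles -Path C:\Logs -Pattern 'ERROR' -Include *.log, *.txt -Recurse (the folder is only an example). The function returns the paths of the files that contain at least one match.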

This is not the best way to do this:
gci <the_directory_path> -filter *.csv | where { $_.OpenText().ReadToEnd().Contains("|") -eq $true }
This helped me find all csv files which had the | character in them.
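A sketch of how the same check might be done with Select-String, which avoids reading each file to the end by hand. The temporary folder and file names are made up for illustration.

```powershell
# Sample data in a temporary folder (file names and contents are illustrative only)
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("csv-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demoDir | Out-Null
Set-Content -Path (Join-Path $demoDir 'piped.csv') -Value 'a|b|c'
Set-Content -Path (Join-Path $demoDir 'plain.csv') -Value 'a,b,c'

# -SimpleMatch treats "|" as a literal character rather than regex alternation;
# -List stops reading a file after its first matching line
$withPipe = Get-ChildItem -Path $demoDir -Filter *.csv |
    Select-String -SimpleMatch -Pattern '|' -List |
    Select-Object -ExpandProperty Path

$withPipe

# Clean up the sample data
Remove-Item -Path $demoDir -Recurse -Force
```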

PowerShell has basically precluded the need for findstr.exe as the previous answers demonstrate. Any of these answers should work fine.
However, if you actually need to use findstr.exe (as was my case) here is a PowerShell wrapper for it:
Use the -Verbose option to output the findstr command line.
function Find-String
{
[CmdletBinding(DefaultParameterSetName='Path')]
param
(
[Parameter(Mandatory=$true, Position=0)]
[string]
$Pattern,
[Parameter(ParameterSetName='Path', Mandatory=$false, Position=1, ValueFromPipeline=$true)]
[string[]]
$Path,
[Parameter(ParameterSetName='LiteralPath', Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
[Alias('PSPath')]
[string[]]
$LiteralPath,
[Parameter(Mandatory=$false)]
[switch]
$IgnoreCase,
[Parameter(Mandatory=$false)]
[switch]
$UseLiteral,
[Parameter(Mandatory=$false)]
[switch]
$Recurse,
[Parameter(Mandatory=$false)]
[switch]
$Force,
[Parameter(Mandatory=$false)]
[switch]
$AsCustomObject
)
begin
{
$value = $Pattern.Replace('\', '\\\\').Replace('"', '\"')
$findStrArgs = @(
'/N'
'/O'
@('/R', '/L')[[bool]$UseLiteral]
"/c:$value"
)
if ($IgnoreCase)
{
$findStrArgs += '/I'
}
function GetCmdLine([array]$argList)
{
($argList | foreach { @($_, "`"$_`"")[($_.Trim() -match '\s')] }) -join ' '
}
}
process
{
$PSBoundParameters[$PSCmdlet.ParameterSetName] | foreach {
try
{
$_ | Get-ChildItem -Recurse:$Recurse -Force:$Force -ErrorAction Stop | foreach {
try
{
$file = $_
$argList = $findStrArgs + $file.FullName
Write-Verbose "findstr.exe $(GetCmdLine $argList)"
findstr.exe $argList | foreach {
if (-not $AsCustomObject)
{
return "${file}:$_"
}
$split = $_.Split(':', 3)
[pscustomobject] @{
File = $file
Line = $split[0]
Column = $split[1]
Value = $split[2]
}
}
}
catch
{
Write-Error -ErrorRecord $_
}
}
}
catch
{
Write-Error -ErrorRecord $_
}
}
}
}

FYI:
If you update to PowerShell version 7 you can use grep, provided it is installed on the system...
I know egrep is available in PowerShell on Azure CLI...
But Select-String is there!
An old article here: [https://devblogs.microsoft.com/powershell/select-string-and-grep/]

Related

How to recursively search all files in a directory and sub-directories using PowerShell?

I'm not understanding where the recursion is occurring nor how it's used in the below tree function (which is meant to emulate some of the linux tree command results).
From the tree function, how are files (or file names and their path) passed to, here, a SearchString function?
For context, here's a REPL session demonstrating the end goal on a single file: getting the PSPath property for a file, and using that property for a simple regex.
Session transcript:
posh> $dir = "/home/nicholas/Calibre Library/Microsoft Office User/549 (1476)"
posh> $files = Get-ChildItem -Path $dir -File
posh> $files.Length
3
posh> $files[0].Extension
.txt
posh> $files[0].PSPath
Microsoft.PowerShell.Core\FileSystem::/home/nicholas/Calibre Library/Microsoft Office User/549 (1476)/549 - Microsoft Office User.txt
posh> $pattern = '(?=.*?foo)(?=.*?bar)'
posh> $string = Get-Content $files[0]
posh> $string | Select-String $pattern
This file doesn't have any "foo" and "bar" matches. The goal is to search the entire Calibre library using PowerShell as above.
Large output from a tree of the Calibre library, trimmed to a single result:
    Directory: /home/nicholas/Calibre Library/Microsoft Office User/548 (1474)
Mode    LastWriteTime        Length  Name
----    -------------        ------  ----
-----   2/20/2021 3:22 AM    159883  548 - Microsoft Office User.txt
-----   2/20/2021 2:13 AM    351719  cover.jpg
-----   2/20/2021 2:39 AM      1126  metadata.opf
posh> ./worker.ps1
How is the above file and path passed to the SearchString function?
The goal is to iterate through the entire library and search all plain-text files. (The assumption is that plain-text files have a ".txt" extension.)
library code:
function SearchFile($dir,$file)
{
$path = [string]::Concat($dir,"/",$file)
$pattern='(?=.*?foo)(?=.*?bar)'
$string = Get-Content $path
$result = $string | Select-String $pattern
$result
}
function tree($dir)
{
"$dir"
$tree = Get-ChildItem -Recurse
$tree = Get-ChildItem -Path $dir -Recurse
# get any files and invoke SearchFile here ?
$tree
}
worker code:
. /home/nicholas/powershell/functions/library.ps1
$dir = "/home/nicholas/Calibre Library"
tree $dir
The execution of the SearchFile function should be triggered when a ".txt" file is found. That logic is missing. But the larger missing piece is how to invoke SearchFile from the tree function so that every file gets searched.
How is that done? Leaving aside the file-type or file extension. Not seeing where the recursion occurs.
You are really overcomplicating things. You can do this very easily by using Get-ChildItem to find your txt files recursively in $dir path and then piping these FileInfo objects directly to Select-String cmdlet which accepts pipeline input and will grab the PSPath from the FileInfo object being passed to it and do its thing. Select-String will do this for every object that Get-ChildItem sends to it which are FileInfo objects for all txt files found recursively in your $dir path.
$dir = '/home/nicholas/Calibre Library/Microsoft Office User/549 (1476)'
Get-ChildItem -Recurse -Path $dir -Filter *.txt |
Select-String -Pattern '(?=.*?foo)(?=.*?bar)'
Get-ChildItem already does the recursion for you when you specify the -Recurse argument. For your code it doesn't make any difference. You get a flat list of FileInfo objects that you can process using ForEach-Object in the same way as if you didn't specify -Recurse.
The SearchFile function should be executed when a ".txt" file is found.
Use the -Filter parameter to specify *.txt. Also, when you want to get files only, always pass -File. This lets the filesystem provider skip directories up front, which is faster and also more correct (in theory there could be directories named e.g. foo.txt, which would make SearchFile run into an error).
function tree($dir)
{
"$dir"
Get-ChildItem -Path $dir -Recurse -File -Filter *.txt | ForEach-Object {
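# DirectoryName is the plain path of the containing folder ($_.Directory is a raw .NET object without a PSPath property)
SearchFile -dir $_.DirectoryName -file $_.Name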
}
}
I don't know why your function SearchFile has separate parameters for directory and file name. The Get-ChildItem already outputs the full path in $_.PSPath. It doesn't make much sense to split the path apart and join it together again in SearchFile. I suggest you replace them by a single Path parameter.
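A sketch of what that single-parameter version might look like. The function names are kept from the question; the parameter name Path and the use of -LiteralPath are my assumptions, not something the original code contains.

```powershell
# Sketch of SearchFile with a single parameter; the lookahead pattern is the one from the question
function SearchFile {
    param([Parameter(Mandatory = $true)] [string] $Path)

    # Select-String reads the file itself, so a separate Get-Content is not needed
    Select-String -LiteralPath $Path -Pattern '(?=.*?foo)(?=.*?bar)'
}

function tree {
    param([Parameter(Mandatory = $true)] [string] $dir)

    # -Recurse already walks every sub-directory; the loop just sees a flat stream of files
    Get-ChildItem -LiteralPath $dir -Recurse -File -Filter *.txt | ForEach-Object {
        SearchFile -Path $_.FullName
    }
}
```

Calling tree with the library folder then returns one match object per line that contains both "foo" and "bar", in any order.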

Powershell: Extract several strings from txt and create table out of it

I need to create a csv file out of values that are spread over many txt files. Here is an example for one of the txt files (they are all formatted the same way and stored in one folder, let's say c:\user\txtfiles):
System: asdf
Store: def
processid: 00001
Language: english
prodid: yellowshoes12
email: asdf@asdf.com
prodid: blueshoes34
some
other
text blabla
The result csv should look like this (I added values from another sample txt just to make it clear):
processid, prodid
00001, yellowshoes12
00001, blueshoes34
00002, redtshirt12
00002, greensocks34
That means that every product ID in the txt should be assigned to the one processid in the txt and added as single line to the csv.
I tried to reach the result as follows:
$pathtofiles = Get-ChildItem c:\user\txtfiles | select -ExpandProperty FullName
$parsetxt = $pathtofiles |
ForEach {
$orderdata = Import-Csv $_ |
Where-Object {($_ -like '*processid*') -or ($_ -like '*prodid*')} |
foreach {
write-output $orderdata -replace 'processid: ','' -replace 'prodid: ',''
}
}
$orderdata
So my intention was to isolate the relevant lines, delete everything that is not wanted, assign the values to variables and build a table out of it. One problem is that if I replace $orderdata from the end of the code into the end of the first foreach-loop nothing is printed. But after deliberating quite a while I am not sure if my approach is a good one anyway. So any help would be very appreciated!
Daniel
I think this is best done using a switch -Regex -File construct while iterating over the files in your folder.
# get the files in the folder and loop over them
$result = Get-ChildItem -Path 'c:\user\txtfiles' -Filter '*.txt' -File | ForEach-Object {
# the switch processes each line of a file and matches the regex to it
switch -Regex -File $_.FullName {
'^processid:\s+(\d+)' { $id = $matches[1] }
'^prodid:\s+(\w+)' { [PsCustomObject]@{'processid' = $id; 'prodid' = $matches[1]}}
}
} | Sort-Object processid, prodid
# output on console screen
$result
# output to CSV file
$result | Export-Csv -Path 'c:\user\txtfiles\allids.csv' -NoTypeInformation
Result on screen:
processid prodid
--------- ------
00001 blueshoes34
00001 yellowshoes12
00002 greenshoes56
00002 purpleshoes88

How can I modify this PowerShell script to continue looking for one string after another?

I want this PowerShell script to search for the occurrence of multiple strings, one after the other, and to append the results to a .txt file.
Currently I am specifying the string that I want to look for, waiting for the script to finish looking for that string and transferring the results into a spreadsheet. This is taking a lot of time as I have to keep specifying the string I want to look for, especially since there are well over 100 that I need to look for.
#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path = "C:\Users\username\Documents\FileName"
$files = Get-Childitem $path -Include *.docx,*.doc,*.ppt, *.xls,
*.xlsx, *.pptx, *.eap -Recurse | Where-Object { !($_.psiscontainer) }
$output =
"C:\Users\username\Documents\FileName\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First_String"
Function getStringMatch
{
# Loop through all *.doc files in the $path directory
Foreach ($file In $files)
{
$document = $application.documents.open($file.FullName,$false,$true)
$range = $document.content
$wordFound = $range.find.execute($findText)
if($wordFound)
{
"$file.fullname has found the string called $findText and it is
$wordfound" | Out-File $output -Append
}
}
$document.close()
$application.quit()
}
getStringMatch
This script will look for 'First_String' successfully, I was hoping to be able to specify 'Second_String', 'Third_String' etc rather than replace First_String every time.
As an alternative to the suggestion from @Mathias, you could use a regex to query the document text instead.
Read the content of the document as a string ($text = $document.content.text) and then use Select-String $findtext -AllMatches to evaluate the matches, with $findtext holding the string representation of a regular expression.
Example:
# pipe delimited string as a regular expression
$findtext = "First_String|Second_String|Third_String"
Function getStringMatch
{
    # Loop through all files in $files (collected from the $path directory)
    Foreach ($file In $files)
    {
        # Open read-only, without the conversion prompt
        $document = $application.documents.open($file.FullName,$false,$true)
        $text = $document.content.text
        $result = $text | Select-String $findtext -AllMatches
        if($result)
        {
            "$($file.FullName) has found the strings called $($result.Matches.Value) at indexes $($result.Matches.Index)" | Out-File $output -Append
        }
        # Close each document before opening the next one
        $document.close()
    }
    $application.quit()
}
Note that if you're trying to find strings that contain reserved regex characters, you'll need to escape them first.
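One way to do that escaping, sketched with made-up search terms: [regex]::Escape turns each literal string into a safe pattern, and the escaped pieces are then joined with | into a single alternation. The $text value here only stands in for $document.content.text.

```powershell
# Literal strings to look for; the entries are made up, and two contain regex metacharacters
$terms = 'First_String', 'Price (USD)', 'C++'

# Escape each term, then join them with | to get one alternation pattern
$findtext = ($terms | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Stand-in for $document.content.text
$text = 'Written in C++; Price (USD) pending.'

$result = $text | Select-String -Pattern $findtext -AllMatches
$result.Matches.Value
```

With well over 100 strings, the $terms list could also be read from a text file with Get-Content, one string per line.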

PowerShell script to look for particular word in the file and add “4” at the beginning of the line

I am the best. Page
I am good. Page
I am funny. Page
Output:
4 I am the best. Page
4 I am good. Page
4 I am funny. Page
PowerShell script needs to look for “page” and add “4” at the beginning of the line. I have created this script:
powershell -Command “sed ‘s/^Page/4 &/‘c:\users*.txt >test.txt”
but it didn't work in PowerShell.
This ought to do it:
$content = Get-Content "C:\path\to\my\file.txt"
# Prefix "4 " to every line that contains "page" (case-insensitive); leave other lines unchanged
$newcontent = Foreach($line in $content)
{
    if($line -match 'page')
    {
        "4 $line"
    }
    else
    {
        $line
    }
}
Set-Content -Path "C:\path\to\my\file.txt" -Value $newcontent
powershell -command "foreach($ln in cat 'c:\users*.txt'){if($ln -match 'page'){write-host '4'$ln}}"
or
powershell -command "foreach($ln in cat 'c:\users*.txt'){if($ln -match 'page'){write-host '4'$ln}else{echo $ln}}"
depending on whether you only want to output lines with "page" in them.
Note also that PowerShell does not have a built-in alias for sed, and your /^Page/ would only have matched "Page" at the beginning of a line in any case.
sed is a Unix commandline tool that isn't commonly installed on Windows (although there are Windows ports of it).
The PowerShell way of doing what you're asking is
(Get-Content 'c:\users\*.txt') -replace '.*page','4 $&' | Set-Content 'test.txt'
or (using aliases and redirection for reduced typing):
(cat 'c:\users\*.txt') -replace '.*page','4 $&' > 'test.txt'
If you want to update each file separately (note: that is NOT what your Unix code snippet does) you'd do something like this:
Get-ChildItem 'C:\users\*.txt' | ForEach-Object {
(Get-Content $_.FullName) -replace '.*page','4 $&' | Set-Content $_.FullName
}
or (again using aliases):
ls 'c:\users\*.txt' | %{(cat $_.FullName) -replace '.*page','4 $&' | sc $_.FullName}
Note that you cannot use redirection in this case, because the redirection operator would open the file for writing before cat could read it, which would effectively truncate the file.
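The read-then-write order can be tried safely on a throwaway file. This sketch uses a lookahead pattern, which is my own variation rather than the pattern from the answer, and a temporary file path that is purely illustrative.

```powershell
# Temporary sample file (content based on the question, path is illustrative)
$file = Join-Path ([System.IO.Path]::GetTempPath()) ("page-demo-" + [guid]::NewGuid() + ".txt")
Set-Content -Path $file -Value 'I am the best. Page', 'No match here', 'I am funny. Page'

# The parentheses make Get-Content finish reading (and release the file)
# before Set-Content opens the same file for writing
(Get-Content -Path $file) -replace '^(?=.*page)', '4 ' | Set-Content -Path $file

$updated = Get-Content -Path $file
$updated

# Clean up the sample file
Remove-Item -Path $file
```

The lookahead matches the empty position at the start of any line that contains "page" (case-insensitively, as -replace is by default), so "4 " is inserted there and other lines pass through unchanged.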

Optimizing simple search script in PowerShell

I need to create a script to search through just below a million files of text, code, etc. to find matches and then output all hits on a particular string pattern to a CSV file.
So far I made this;
$location = 'C:\Work*'
$arr = "foo", "bar" #Where "foo" and "bar" are string patterns I want to search for (separately)
for($i=0;$i -lt $arr.length; $i++) {
Get-ChildItem $location -recurse | select-string -pattern $($arr[$i]) | select-object Path | Export-Csv "C:\Work\Results\$($arr[$i]).txt"
}
This returns to me a CSV file named "foo.txt" with a list of all files with the word "foo" in it, and a file named "bar.txt" with a list of all files containing the word "bar".
Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster?
All input appreciated!
If your files are not huge and can be read into memory, then this version should work considerably faster (and my quick and dirty local test seems to confirm that):
$location = 'C:\ROM'
$arr = "Roman", "Kuzmin"
# remove output files
foreach($test in $arr) {
Remove-Item ".\$test.txt" -ErrorAction 0 -Confirm
}
Get-ChildItem $location -Recurse | .{process{ if (!$_.PSIsContainer) {
# read all text once
$content = [System.IO.File]::ReadAllText($_.FullName)
# test patterns and output paths once
foreach($test in $arr) {
if ($content -match $test) {
$_.FullName >> ".\$test.txt"
}
}
}}}
Notes: 1) mind changed paths and patterns in the example; 2) output files are not CSV but plain text; there is not much reason in CSV if you are interested just in paths - plain text files one path per line will do.
Let's suppose that 1) the files are not too big and you can load it into memory, 2) you really just want the Path of the file, that matches (not the line etc.).
I tried to read the file only once and then iterate through the regexes. There is some gain (it's faster than the original solution), but the final result will depend on other factors like file sizes, count of files etc.
Also removing 'ignorecase' makes it faster a little bit.
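$res = @{}
$arr | % { $res[$_] = @() }
Get-ChildItem $location -recurse |
    ? { !$_.PsIsContainer } |
    % { $file = $_
        $text = [Io.File]::ReadAllText($file.FullName)
        $arr |
            % { $regex = $_
                if ([Regex]::IsMatch($text, $regex, 'ignorecase')) {
                    # append, so earlier matches for this pattern are kept
                    $res[$regex] += $file.FullName
                }
            }
    }
$res.GetEnumerator() | % {
    $entry = $_
    # wrap each path in an object so Export-Csv writes a Path column instead of the string's Length
    $entry.Value | % { New-Object PSObject -Property @{ Path = $_ } } | Export-Csv "d:\temp\so-res$($entry.Key).txt" -NoTypeInformation
}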
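Another option that may be worth measuring is to keep Select-String but walk the files only once, passing all patterns in a single call. This is a sketch under assumptions: the sample folder, file names and contents are made up, and it relies on the Pattern property that each MatchInfo object carries.

```powershell
# Sample tree (paths and contents are illustrative only)
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("search-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demoDir | Out-Null
Set-Content -Path (Join-Path $demoDir 'one.txt') -Value 'has foo', 'foo again'
Set-Content -Path (Join-Path $demoDir 'two.txt') -Value 'has bar'

$arr = 'foo', 'bar'

# One walk over the files: Select-String accepts several patterns at once,
# and each MatchInfo records which pattern matched in its Pattern property
$byPattern = Get-ChildItem -Path $demoDir -Recurse -File |
    Select-String -Pattern $arr |
    Group-Object -Property Pattern

foreach ($group in $byPattern) {
    # One CSV per pattern, with a single Path column and no duplicate paths
    $group.Group |
        Select-Object -Property Path -Unique |
        Export-Csv -Path (Join-Path $demoDir "$($group.Name).csv") -NoTypeInformation
}

$fooRows = Import-Csv -Path (Join-Path $demoDir 'foo.csv')
$fooRows

# Clean up the sample data
Remove-Item -Path $demoDir -Recurse -Force
```

One caveat: a line that matches several patterns is reported only once, under the first pattern in the list that matches it. If your patterns can occur on the same line, the read-once approaches above are the safer choice.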

Resources