PowerShell: Searching content of files and writing results to a text file

I'm new to PowerShell, so I don't know where to start. I want a script that searches the content of all files (PDF, Word, Excel, PowerPoint, ...) for a specific string combination.
I tried this script but it doesn't work:
function WordSearch ($sample, $staining, $sampleID, $patientID, $folder)
{
$objConnection = New-Object -com ADODB.Connection
$objRecordSet = New-Object -com ADODB.Recordset
$objConnection.Open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
$objRecordSet.Open("SELECT System.ItemPathDisplay FROM SYSTEMINDEX WHERE ((Contains(Contents,'$sample')) or (Contains(Contents,'$sampleID') and Contains(Contents,'$staining')) or (Contains(Contents,'$staining') and Contains(Contents,'$patientID'))) AND System.ItemPathDisplay LIKE '$folder\%'", $objConnection)
if ($objRecordSet.EOF -eq $false) {$objRecordSet.MoveFirst() }
while ($objRecordset.EOF -ne $true) {
$objRecordset.Fields.Item("System.ItemPathDisplay").Value
$objRecordset.MoveNext()
}
}
Can someone help me?

You should try this, but first make sure you're in the folder you want to start searching from. (If you're trying to search your whole computer, start in C:\, but I imagine the script will take a decent amount of time to run.)
$Paths = @()
$Paths = gci . *.* -rec | where { ! $_.PSIsContainer } |? {($_.Extension -eq ".doc") -or ($_.Extension -eq ".ppt") -or ($_.Extension -eq ".pdf") -or ($_.Extension -eq ".xls")} | resolve-path
This will retrieve all the file paths of those file types. If you have Microsoft Office 2007 or above, you may want to add searches for ".xlsx", ".docx" or ".pptx".
Then you can begin looking through those files for your "specific string combination":
$array = @()
foreach($path in $Paths)
{$array += Select-String -Path $Path -Pattern "Search String"}
This will give you all the lines and paths where that string exists in those files. The actual line output you get may be a little distorted, though, because Office and PDF files are binary or compressed formats rather than plain text, so a plain-text search can also miss matches. Use $array | get-member -MemberType Property to find which properties you can select, and the Select-Object cmdlet to pull those items out.
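To cover the "write results to text file" part of the question, here is a minimal sketch that filters by extension, searches, and writes one line per hit to a report file. The folder, file names, search term and report name are all made up for the demo, and it uses plain-text files because Select-String only reads text reliably.

```powershell
# Hypothetical demo folder, search term and report file; replace with your own.
$root    = Join-Path ([System.IO.Path]::GetTempPath()) 'search-demo'
$pattern = 'Sample 42'
$report  = Join-Path $root 'results.out'

# Create two small text files so the sketch runs on its own.
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.log') -Value 'nothing here', 'Sample 42 stained'
Set-Content -Path (Join-Path $root 'b.txt') -Value 'unrelated'

# Keep only the extensions of interest, then search each file for the literal string.
$extensions = '.log', '.txt'
$hits = Get-ChildItem -Path $root -Recurse -File |
    Where-Object { $extensions -contains $_.Extension } |
    Select-String -Pattern $pattern -SimpleMatch

# One "path:line number: text" entry per match, written to the report file.
$lines = $hits | ForEach-Object { '{0}:{1}: {2}' -f $_.Path, $_.LineNumber, $_.Line }
$lines | Set-Content -Path $report
```

-SimpleMatch treats the pattern as literal text; drop it if you want a regular expression instead.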

Related

Issues pulling value of cell using excel com objects in powershell

I am writing a script that scans each cell in an excel file for PII. I've got most of it working, but I am experiencing two issues which may be related.
First of all, I am not convinced that the "Do" loop is performing as intended. The goal here is if the text in a cell matches the regex string, create a PSCustomObject with the location information, then use the object to add a line to a csv file.
It appears that the loop is running for every file, regardless of whether or not it actually found a match.
The other issue is that I can't seem to actually pull the cell value for the matched cell. I've tried several different variables and methods, the latest attempt being "$target.text," but the value of the variable is always null.
I've been racking my brain on this for days, but I'm sure it'll be obvious once I see it.
Any help here would be appreciated.
Thanks.
$searchtext = "\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
$xlsFiles = Get-ChildItem $searchpath -recurse -include *.xlsx, *.xls, *.xlsm | Select-object -Expand FullName
$Excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false;
$excel.AskToUpdateLinks = $false;
foreach ($xlsfile in $xlsfiles) {
Write-host (Get-Date -f yyyyMMdd:HHmm) $xlsfile
try{
$Workbook = $Excel.Workbooks.Open($xlsFile, 0, 0, 5, "password")
}
Catch {
Write-host $xlsfile 'is password protected. Skipping...' -ForegroundColor Yellow
continue
}
ForEach ($Sheet in $($Workbook.Sheets)) {
$i = $sheet.Index
$Range = $Workbook.Sheets.Item($i).UsedRange
$Target = $Sheet.UsedRange.Find($Searchtext)
$First = $Target
Do {
$Target = $Range.Find($Target)
$Violation = [PSCustomObject]@{
Path = $xlsfile
Line = "SSN Found" + $target.text
LineNumber = "Sheet: " + $i
}
$Violation | Select-Object Path, Line, LineNumber | export-csv $outputpath\$PIIFile -append -NoTypeInformation
}
While ($NULL -ne $Target -and $Target.AddressLocal() -ne $First.AddressLocal())
}
$Excel.Quit()
}
Figured it out. Just a simple case of faulty logic in the loops.
Thanks to everyone who looked at this.
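The fix itself was never posted, so this is only a guess at what it involved. Two things stand out in the code above: Range.Find looks for text (with wildcards), not regular expressions, and the Do loop writes a CSV row before checking whether anything matched. A rough sketch of the "test first, record only on a match" logic, using a hashtable of made-up cell values in place of a worksheet (with Excel you would presumably loop over the used range and test each cell's Text property instead):

```powershell
# The SSN pattern from the question.
$searchtext = '\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b'

# Stand-in for worksheet cells: address -> displayed text (hypothetical data).
$cells = [ordered]@{
    'A1' = 'Name'
    'B2' = 'SSN 123-45-6789'
    'C3' = '000-12-3456'
}

# Build a result object only when the cell text actually matches the regex.
$violations = foreach ($address in $cells.Keys) {
    $text = $cells[$address]
    if ($text -match $searchtext) {
        [PSCustomObject]@{
            Cell  = $address
            Match = $Matches[0]   # the matched text, so it is never null here
        }
    }
}

$violations | Format-Table -AutoSize
```

Because the object is created inside the if block, files and sheets without a match produce no rows at all.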

Automate creating configurations from template and excel

I'm having trouble in automation of a configuration.
I have a template of a configuration and need to change all the hostname (marked as YYY) and IP (marked as XXX(only 3rd octet needs replacement)) according to a list of excel values.
Now I have a list of 100 different sites and IPs and I want to have also 100 different configurations.
A friend suggested using the following PowerShell code, but it doesn't create any files:
$replaceValues = Import-Csv -Path "\\ExcelFile.csv"
$file = "\\Template.txt"
$contents = Get-Content -Path $file
foreach ($replaceValue in $replaceValues)
{
$contents = $contents -replace "YYY", $replaceValue.hostname
$contents = $contents -replace "XXX", $replaceValue.site
Copy-Item $file "$($file.$replaceValue.hostname)"
Set-Content -Path "$($file.$replaceValue.hostname)" -Value $contents
echo "$($file.$replaceValue.hostname)"
}
Your code overwrites the same $contents variable in the loop, so once the values are replaced in the first iteration, there are no YYY or XXX placeholders left to replace.
You need to keep the template text intact and create a new copy from the template inside the loop. That copy can then be altered the way you want. Every subsequent iteration will then start off with a fresh copy of the template.
There is no need to first copy the template text to a new location and then overwrite this file with the new contents. Set-Content is happy to create a new file for you if it does not already exist.
Try
$replaceValues = Import-Csv -Path 'D:\Test\Values.csv'
$template = Get-Content -Path 'D:\Test\Template.txt'
foreach ($item in $replaceValues) {
$content = $template -replace 'YYY', $item.hostname -replace 'XXX', $item.site
$newFile = Join-Path -Path 'D:\Test' -ChildPath ('{0}.txt' -f $item.hostname)
Write-Host "Creating file '$newFile'"
$content | Set-Content -Path $newFile
}
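To see why the original loop only ever produces the first site's values, here is a small in-memory comparison. The template lines and the two rows are invented for the demo; no files are read or written.

```powershell
# Hypothetical template and CSV rows, held in memory.
$template = 'hostname YYY', 'ip address 10.0.XXX.1'
$rows = @(
    [PSCustomObject]@{ hostname = 'site-a'; site = '11' }
    [PSCustomObject]@{ hostname = 'site-b'; site = '12' }
)

# Pattern from the question: the working copy is overwritten on every pass,
# so after the first row there is no YYY or XXX left to replace.
$contents = $template
$buggy = foreach ($row in $rows) {
    $contents = $contents -replace 'YYY', $row.hostname -replace 'XXX', $row.site
    $contents -join "`n"
}

# Pattern from the answer: every pass starts from the untouched template.
$fixed = foreach ($row in $rows) {
    ($template -replace 'YYY', $row.hostname -replace 'XXX', $row.site) -join "`n"
}

$buggy[1]   # still shows site-a
$fixed[1]   # shows site-b
```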

How can I copy a column value from excel to csv file without using ComObject

I'm new to PowerShell. I have a number of Excel files (500+) with a column Animal Count that I would like to save in a new '.csv' file. I have code that does this using Excel COM objects.
I want to achieve the same without using COM objects. Could anyone help me achieve this?
Download the PSExcel module from
https://github.com/RamblingCookieMonster/PSExcel
and import it using Import-Module.
Then use the following code:
$AnimalCount = @()
$Source = 'D:\Test' # the path to where the Excel files are
ForEach ($File in Get-ChildItem -Path $Source -Filter '*.xlsx' -File) {
    # Pass the full path; a FileInfo object can stringify to just the file name
    $Excel = New-Excel -Path $File.FullName
    $Cell = ($Excel | Get-WorkSheet | % {$_.Cells | ? {$_.Text -eq "AnimalCount"}})
    $count = (($Excel | Get-WorkSheet -Name $Cell.Worksheet).Cells | ? {($_.Start.Row -eq $Cell.Start.Row) -and ($_.Start.Column -eq $Cell.Start.Column + 1)}).Text
    $AnimalCount += [PsCustomObject] @{'File' = $File.FullName; 'AnimalCount' = $count }
}
$AnimalCount | Format-Table -AutoSize
$AnimalCount | Export-Csv -Path 'D:\Test\AnimalCount.csv' -UseCulture -NoTypeInformation
The best thing here is that Excel does not need to be installed on the machine that runs this script.

Editing Connection String in a Boatload of XLSB Documents

I have a couple hundred .xlsb files that need their connection string and command text changed in an easily programmable way. They are all buried in different folders deep in the file system. How can I use Powershell or some other program to go through and edit them all so I don't have to do it manually?
I've started looking into Powershell and Format-Hex. I figured I could ask and someone else may be able to set me on the right track. What needs to be done is recursively searching the filesystem from a certain point, detect if "this string" and this number "11111" are in the connection string and command text (respectively) of all xlsb files, and if they are replace them with "that string" and this number "22222". All in xlsb files. I've also looked into using python, but the libraries I found did not mention editing this setting, so I figured some sort of hex detection and replacement would be easier.
Could you give more information on what the "connection string" is? To my knowledge this is not part of the properties of an xlsb file.
I assume it is the string used to create an ODBC connection, so the text you want to modify will be within the code of a macro.
So three issues:
Recursively find all xlsb files within a folder
$Fllt = gci "*.xlsb" -r
Open them in Excel
$Excl = New-Object -ComObject Excel.Application
$Fllt | %{$Excl.Workbooks.Open($_.Fullname)}
Replace "this string" by "that string" and "11111" by "22222" in every macro. This is much more difficult.
My suggestion:
#Generation of a test file
$Excl = New-Object -ComObject Excel.Application
$xlve = $Excl.Version
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name AccessVBOM -Value 1 -Force | Out-Null
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name VBAWarnings -Value 1 -Force | Out-Null
@'
Sub Co()
ConnectionString = "this string"
CommandText = "11111"
End Sub
'@ | Out-File C:\Temp\Test.txt -Encoding ascii
$Wkbk = $Excl.Workbooks.Add()
$Wkbk.VBProject.VBComponents.Import("C:\Temp\Test.txt") | Out-Null
$Wkbk.SaveAs("C:\Temp\Test.xlsb", 50)
$Excl.Quit()
#Get The files
$Fllt = gci -Path C:\Temp\ -Include *.xlsb -r
#Open Excel and set the security parameters to be able to modify macros
$Excl = New-Object -ComObject Excel.Application
$xlve = $Excl.Version
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name AccessVBOM -Value 1 -Force | Out-Null
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name VBAWarnings -Value 1 -Force | Out-Null
#Loop through the files and modify the macros
$path = "C:\Temp\ModuleVBATemp.txt" #Temp text file to copy and modify the macros
foreach ($File in $Fllt) {
$Wkbk = $Excl.Workbooks.Open($File.Fullname)
if ($Wkbk.HasVBProject) <# Test if any macro #> {
foreach ($Vbco in $Wkbk.VBProject.VBComponents) {
if ($Vbco.Type -eq '1') <# Only modify the modules #> {
#Modification of the script
$Vbco.Export($path) | Out-Null
(gc $path) -replace "this string","that string" -replace "11111","22222" `
| Out-File $path -Encoding ascii
$Wkbk.VBProject.VBComponents.Remove($Vbco)
$Wkbk.VBProject.VBComponents.Import($path) | Out-Null
}}}
$Wkbk.Close($true) #Save the file
}
$Excl.Quit()
It is working on my test file, I hope that your configuration is similar.
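One caveat on the replacement step: -replace treats its first argument as a regular expression, so a real connection string containing characters like . or ( may not match literally. A small sketch of the text substitution on its own, with an invented module body and invented server names, escaping the old string first:

```powershell
# Hypothetical exported module text; the connection string and number are made up.
$moduleText = @'
Sub Co()
    ConnectionString = "Provider=SQLOLEDB;Data Source=old.server(1)"
    CommandText = "11111"
End Sub
'@

$oldConn = 'Data Source=old.server(1)'
$newConn = 'Data Source=new.server(2)'

# [regex]::Escape makes the dots and parentheses match literally;
# \b keeps 11111 from matching inside a longer number.
$updated = $moduleText -replace ([regex]::Escape($oldConn)), $newConn -replace '\b11111\b', '22222'

$updated
```

If the new string ever contains a $ sign, that needs care too, since $ has a special meaning in the replacement text.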

Remove known Excel passwords with PowerShell

I have this PowerShell code that loops through Excel files in a specified directory; references a list of known passwords to find the correct one; and then opens, decrypts, and saves that file to a new directory.
But it's not executing as quickly as I'd like (it's part of a larger ETL process and it's a bottleneck). At this point I can remove the passwords faster manually as the script takes ~40 minutes to decrypt 40 workbooks while referencing a list of ~50 passwords.
Is there a cmdlet or function (or something) that's missing which would speed this up, an overlooked flaw in the processing, or is PowerShell, perhaps, just not the right tool for this job?
Original Code (updated code can be found below):
$ErrorActionPreference = "SilentlyContinue"
CLS
# Paths
$encrypted_path = "C:\PoShTest\Encrypted\"
$decrypted_Path = "C:\PoShTest\Decrypted\"
$original_Path = "C:\PoShTest\Originals\"
$password_Path = "C:\PoShTest\Passwords\Passwords.txt"
# Load Password Cache
$arrPasswords = Get-Content -Path $password_Path
# Load File List
$arrFiles = Get-ChildItem $encrypted_path
# Create counter to display progress
[int] $count = ($arrfiles.count -1)
# Loop through each file
$arrFiles| % {
$file = get-item -path $_.fullname
# Display current file
write-host "Processing" $file.name -f "DarkYellow"
write-host "Items remaining: " $count `n
# Excel xlsx
if ($file.Extension -eq ".xlsx") {
# Loop through password cache
$arrPasswords | % {
$passwd = $_
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
# Attempt to open file
$Workbook = $ExcelObj.Workbooks.Open($file.fullname,1,$false,5,$passwd)
$Workbook.Activate()
# if password is correct - Save new file without password to $decrypted_Path
if ($Workbook.Worksheets.count -ne 0) {
$Workbook.Password=$null
$savePath = $decrypted_Path+$file.Name
write-host "Decrypted: " $file.Name -f "DarkGreen"
$Workbook.SaveAs($savePath)
# Close document and Application
$ExcelObj.Workbooks.close()
$ExcelObj.Application.Quit()
# Move original file to $original_Path
move-item $file.fullname -Destination $original_Path -Force
}
else {
# Close document and Application
write-host "PASSWORD NOT FOUND: " $file.name -f "Magenta"
$ExcelObj.Close()
$ExcelObj.Application.Quit()
}
}
}
$count--
# Next File
}
Write-host "`n Processing Complete" -f "Green"
Updated code:
# Get current Excel process IDs so they are not affected by the script's cleanup
# SilentlyContinue in case there are no active Excels
$currentExcelProcessIDs = (Get-Process excel -ErrorAction SilentlyContinue).Id
$a = Get-Date
$ErrorActionPreference = "SilentlyContinue"
CLS
# Paths
$encrypted_path = "C:\PoShTest\Encrypted"
$decrypted_Path = "C:\PoShTest\Decrypted\"
$processed_Path = "C:\PoShTest\Processed\"
$password_Path = "C:\PoShTest\Passwords\Passwords.txt"
# Load Password Cache
$arrPasswords = Get-Content -Path $password_Path
# Load File List
$arrFiles = Get-ChildItem $encrypted_path
# Create counter to display progress
[int] $count = ($arrfiles.count -1)
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
# Loop through each file
$arrFiles| % {
$file = get-item -path $_.fullname
# Display current file
write-host "`n Processing" $file.name -f "DarkYellow"
write-host "`n Items remaining: " $count `n
# Excel xlsx
if ($file.Extension -like "*.xls*") {
# Loop through password cache
$arrPasswords | % {
$passwd = $_
# Attempt to open file
$Workbook = $ExcelObj.Workbooks.Open($file.fullname,1,$false,5,$passwd)
$Workbook.Activate()
# if password is correct, remove $passwd from array and save new file without password to $decrypted_Path
if ($Workbook.Worksheets.count -ne 0)
{
$Workbook.Password=$null
$savePath = $decrypted_Path+$file.Name
write-host "Decrypted: " $file.Name -f "DarkGreen"
$Workbook.SaveAs($savePath)
# Added to keep Excel process memory utilization in check
$ExcelObj.Workbooks.close()
# Move original file to $processed_Path
move-item $file.fullname -Destination $processed_Path -Force
}
else {
# Close Document
$ExcelObj.Workbooks.Close()
}
}
}
$count--
# Next File
}
# Close Document and Application
$ExcelObj.Workbooks.close()
$ExcelObj.Application.Quit()
Write-host "`nProcessing Complete!" -f "Green"
Write-host "`nFiles w/o a matching password can be found in the Encrypted folder."
Write-host "`nTime Started : " $a.ToShortTimeString()
Write-host "Time Completed : " $(Get-Date).ToShortTimeString()
Write-host "`nTotal Duration : "
New-TimeSpan -Start $a -End $(Get-Date)
# Remove any stale Excel processes created by this script's execution
Get-Process excel -ErrorAction SilentlyContinue | Where-Object{$currentExcelProcessIDs -notcontains $_.id} | Stop-Process
If nothing else, I do see one glaring performance issue that should be easy to address. You are opening a new Excel instance to test each individual password for each document. 40 workbooks with 50 passwords means you have opened 2000 Excel instances, one at a time.
You should be able to keep using the same one without a functionality hit. Get this code out of your innermost loop:
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
as well as the snippet that closes the process. It needs to be out of the loop as well:
$ExcelObj.Close()
$ExcelObj.Application.Quit()
If that does not help enough you would have to consider doing some sort of parallel processing with jobs etc. I have a basic solution in a CodeReview.SE answer of mine doing something similar.
Basically, it runs several Excel instances at once, where each one works on a chunk of documents, which runs faster than one Excel doing them all. Just as I do in the linked answer, I caution against automating Excel COM with PowerShell. COM objects don't always get released properly, and locks can be left on files or processes.
You are looping for all 50 passwords regardless of success or not. That means you could find the right password on the first go but you are still going to try the other 49! Set a flag in the loop to break that inner loop when that happens.
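A sketch of that early exit, with a hypothetical Test-Password function standing in for the Workbooks.Open attempt (the passwords are just sample values). It uses the foreach statement rather than ForEach-Object (%), because break inside a ForEach-Object script block does not simply leave that block.

```powershell
# Hypothetical stand-in for the Workbooks.Open call: succeeds only for the right password.
function Test-Password {
    param([string]$Candidate)
    return $Candidate -ceq 'kDZq737b'
}

$arrPasswords = 'dj7F9vsm', 'kDZq737b', 'wrzCgTWk', 'DqP2KtZ4'
$found    = $null
$attempts = 0

foreach ($passwd in $arrPasswords) {
    $attempts++
    if (Test-Password -Candidate $passwd) {
        $found = $passwd
        break   # stop trying the remaining passwords for this file
    }
}

"Found '$found' after $attempts attempt(s)"
```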
As far as the password logic goes you say that
At this point I can remove the passwords faster manually since the script takes ~40 minutes
Why can you do it faster? What do you know that the script does not? I don't see you being able to outperform the script by doing exactly what it does.
From what I see, another suggestion would be to keep track of successful passwords and the associated file names. That way, when a file gets processed again, you would know the first password to try.
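One way that tracking could look, as a rough sketch: a hashtable maps file names to the password that worked last time, and a helper puts that password at the front of the list. The file name, the cache contents and the Get-PasswordOrder function are all invented for illustration.

```powershell
# Hypothetical cache: file name -> password that worked on a previous run.
$known = @{ 'report-q1.xlsx' = 'wrzCgTWk' }
$arrPasswords = 'dj7F9vsm', 'kDZq737b', 'wrzCgTWk', 'DqP2KtZ4'

function Get-PasswordOrder {
    param([string]$FileName, [string[]]$Passwords, [hashtable]$Cache)
    $first = $Cache[$FileName]
    if ($first) {
        # Known password first, then the rest; -cne keeps the comparison case-sensitive.
        return @($first) + @($Passwords | Where-Object { $_ -cne $first })
    }
    return $Passwords
}

$order = Get-PasswordOrder -FileName 'report-q1.xlsx' -Passwords $arrPasswords -Cache $known
$order
```

After a successful open you would store the password back with $known[$file.Name] = $passwd, and the hashtable could be saved between runs if the files come back regularly.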
This solution uses the modules ImportExcel for easier working with Excel files, and PoshRSJob for multithreaded processing.
If you do not have these, install them by running:
Install-Module ImportExcel -scope CurrentUser
Install-Module PoshRSJob -scope CurrentUser
I've raised an issue on the ImportExcel module GitHub page where I've proposed a solution to open encrypted Excel files. The author may propose a better solution (and consider the impact across other functions in the module), but this works for me. For now, you'll need to make a modification to the Import-Excel function yourself:
Open: C:\Username\Documents\WindowsPowerShell\Modules\ImportExcel\2.4.0\ImportExcel.psm1 and scroll to the Import-Excel function. Replace:
[switch]$DataOnly
With
[switch]$DataOnly,
[String]$Password
Then replace the following line:
$xl = New-Object -TypeName OfficeOpenXml.ExcelPackage -ArgumentList $stream
With the code suggested here. This will let you call the Import-Excel function with a -Password parameter.
Next we need a function that repeatedly tries to open a single Excel file using a known set of passwords. Open a PowerShell window and paste in the following function (note: this function has a default output path defined and also outputs passwords in the verbose stream, so make sure no one is looking over your shoulder, or just remove that if you'd prefer):
function Remove-ExcelEncryption
{
[CmdletBinding()]
Param
(
[Parameter(Mandatory=$true)]
[String]
$File,
[Parameter(Mandatory=$false)]
[String]
$OutputPath = 'C:\PoShTest\Decrypted',
[Parameter(Mandatory=$true)]
[Array]
$PasswordArray
)
$filename = Split-Path -Path $file -Leaf
foreach($Password in $PasswordArray)
{
Write-Verbose "Attempting to open $file with password: $Password"
try
{
$ExcelData = Import-Excel -path $file -Password $Password -ErrorAction Stop
Write-Verbose "Successfully opened file."
}
catch
{
Write-Verbose "Failed with error $($Error[0].Exception.Message)"
continue
}
try
{
$null = $ExcelData | Export-Excel -Path $OutputPath\$filename
return "Success"
}
catch
{
Write-Warning "Could not save to $OutputPath\$filename"
}
}
}
Finally, we can run code to do the work:
$Start = get-date
$PasswordArray = @('dj7F9vsm','kDZq737b','wrzCgTWk','DqP2KtZ4')
$files = Get-ChildItem -Path 'C:\PoShTest\Encrypted'
$files | Start-RSJob -Name {$_.Name} -ScriptBlock {
Remove-ExcelEncryption -File $_.Fullname -PasswordArray $Using:PasswordArray -Verbose
} -FunctionsToLoad Remove-ExcelEncryption -ModulesToImport ImportExcel | Wait-RSJob | Receive-RSJob
$end = Get-Date
New-TimeSpan -Start $Start -End $end
For me, if the correct password is first in the list it runs in 13 seconds against 128 Excel files. If I call the function in a standard foreach loop, it takes 27 seconds.
To view which files were successfully converted we can inspect the output property on the RSJob objects (this is the output of the Remove-ExcelEncryption function where I've told it to return "Success"):
Get-RSJob | Select-Object -Property Name,Output
Hope that helps.

Resources