I have this PowerShell code that loops through Excel files in a specified directory; references a list of known passwords to find the correct one; and then opens, decrypts, and saves that file to a new directory.
But it's not executing as quickly as I'd like (it's part of a larger ETL process and it's a bottleneck). At this point I can remove the passwords faster manually as the script takes ~40 minutes to decrypt 40 workbooks while referencing a list of ~50 passwords.
Is there a cmdlet or function (or something) that's missing which would speed this up, an overlooked flaw in the processing, or is PowerShell, perhaps, just not the right tool for this job?
Original Code (updated code can be found below):
$ErrorActionPreference = "SilentlyContinue"
CLS
# Paths
$encrypted_path = "C:\PoShTest\Encrypted\"
$decrypted_Path = "C:\PoShTest\Decrypted\"
$original_Path = "C:\PoShTest\Originals\"
$password_Path = "C:\PoShTest\Passwords\Passwords.txt"
# Load Password Cache
$arrPasswords = Get-Content -Path $password_Path
# Load File List
$arrFiles = Get-ChildItem $encrypted_path
# Create counter to display progress
[int] $count = ($arrfiles.count -1)
# Loop through each file
$arrFiles| % {
$file = get-item -path $_.fullname
# Display current file
write-host "Processing" $file.name -f "DarkYellow"
write-host "Items remaining: " $count `n
# Excel xlsx
if ($file.Extension -eq ".xlsx") {
# Loop through password cache
$arrPasswords | % {
$passwd = $_
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
# Attempt to open file
$Workbook = $ExcelObj.Workbooks.Open($file.fullname,1,$false,5,$passwd)
$Workbook.Activate()
# if password is correct - Save new file without password to $decrypted_Path
if ($Workbook.Worksheets.count -ne 0) {
$Workbook.Password=$null
$savePath = $decrypted_Path+$file.Name
write-host "Decrypted: " $file.Name -f "DarkGreen"
$Workbook.SaveAs($savePath)
# Close document and Application
$ExcelObj.Workbooks.close()
$ExcelObj.Application.Quit()
# Move original file to $original_Path
move-item $file.fullname -Destination $original_Path -Force
}
else {
# Close document and Application
write-host "PASSWORD NOT FOUND: " $file.name -f "Magenta"
$ExcelObj.Close()
$ExcelObj.Application.Quit()
}
}
}
$count--
# Next File
}
Write-host "`n Processing Complete" -f "Green"
Updated code:
# Get current Excel process IDs so they are not affected by the script's cleanup
# SilentlyContinue in case there are no active Excels
$currentExcelProcessIDs = (Get-Process excel -ErrorAction SilentlyContinue).Id
$a = Get-Date
$ErrorActionPreference = "SilentlyContinue"
CLS
# Paths
$encrypted_path = "C:\PoShTest\Encrypted"
$decrypted_Path = "C:\PoShTest\Decrypted\"
$processed_Path = "C:\PoShTest\Processed\"
$password_Path = "C:\PoShTest\Passwords\Passwords.txt"
# Load Password Cache
$arrPasswords = Get-Content -Path $password_Path
# Load File List
$arrFiles = Get-ChildItem $encrypted_path
# Create counter to display progress
[int] $count = ($arrfiles.count -1)
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
# Loop through each file
$arrFiles| % {
$file = get-item -path $_.fullname
# Display current file
write-host "`n Processing" $file.name -f "DarkYellow"
write-host "`n Items remaining: " $count `n
# Excel xlsx
if ($file.Extension -like "*.xls*") {
# Loop through password cache
$arrPasswords | % {
$passwd = $_
# Attempt to open file
$Workbook = $ExcelObj.Workbooks.Open($file.fullname,1,$false,5,$passwd)
$Workbook.Activate()
# if password is correct, remove $passwd from array and save new file without password to $decrypted_Path
if ($Workbook.Worksheets.count -ne 0)
{
$Workbook.Password=$null
$savePath = $decrypted_Path+$file.Name
write-host "Decrypted: " $file.Name -f "DarkGreen"
$Workbook.SaveAs($savePath)
# Added to keep Excel process memory utilization in check
$ExcelObj.Workbooks.close()
# Move original file to $processed_Path
move-item $file.fullname -Destination $processed_Path -Force
}
else {
# Close Document
$ExcelObj.Workbooks.Close()
}
}
}
$count--
# Next File
}
# Close Document and Application
$ExcelObj.Workbooks.close()
$ExcelObj.Application.Quit()
Write-host "`nProcessing Complete!" -f "Green"
Write-host "`nFiles w/o a matching password can be found in the Encrypted folder."
Write-host "`nTime Started : " $a.ToShortTimeString()
Write-host "Time Completed : " $(Get-Date).ToShortTimeString()
Write-host "`nTotal Duration : "
New-TimeSpan -Start $a -End $(Get-Date)
# Remove any stale Excel processes created by this script's execution
Get-Process excel -ErrorAction SilentlyContinue | Where-Object{$currentExcelProcessIDs -notcontains $_.id} | Stop-Process
If nothing else, I do see one glaring performance issue that should be easy to address. You are opening a new Excel instance to test each individual password for each document. 40 workbooks with 50 passwords means you have opened 2,000 Excel instances, one at a time.
You should be able to keep using the same instance without losing any functionality. Move this code out of your innermost loop:
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
as well as the snippet that would close the process. It would need to be out of the loop as well.
$ExcelObj.Close()
$ExcelObj.Application.Quit()
If that does not help enough, you would have to consider some sort of parallel processing with jobs. I have a basic solution in a CodeReview.SE answer of mine doing something similar.
Basically, it runs several Excel instances at once, each working on a chunk of the documents, which is faster than one Excel instance doing them all. Just as I do in the linked answer, I urge caution when automating Excel COM with PowerShell. COM objects don't always get released properly, and locks can be left on files or processes.
You are looping through all 50 passwords regardless of success. That means you could find the right password on the first try but still go on to try the other 49! Set a flag in the loop to break out of that inner loop when that happens.
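A minimal sketch of that early exit, with the Excel call replaced by a stub so the control flow can be run on its own. Test-Password and Find-WorkingPassword are hypothetical names, not part of the original script; in the real script the test would be the Workbooks.Open attempt.

```powershell
# Hypothetical stand-in for the Workbooks.Open attempt: returns $true when the
# candidate password matches. Replace with the real open-and-verify logic.
function Test-Password {
    param([string]$Candidate, [string]$Actual)
    return $Candidate -ceq $Actual
}

function Find-WorkingPassword {
    param([string[]]$Passwords, [string]$Actual)

    $attempts = 0
    $found = $null
    # A foreach statement is used instead of ForEach-Object (%) so that
    # 'break' exits only this loop.
    foreach ($passwd in $Passwords) {
        $attempts++
        if (Test-Password -Candidate $passwd -Actual $Actual) {
            $found = $passwd
            break   # stop trying the remaining passwords
        }
    }
    [PSCustomObject]@{ Password = $found; Attempts = $attempts }
}

$result = Find-WorkingPassword -Passwords 'alpha', 'bravo', 'charlie' -Actual 'bravo'
```

Note that the question's inner loop uses the ForEach-Object alias (%), where break does not behave like a loop exit, so switching to a foreach statement is part of the change.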
As far as the password logic goes, you say that:
At this point I can remove the passwords faster manually since the script takes ~40 minutes
Why can you do it faster? What do you know that the script does not? I don't see how you could outperform the script by doing exactly what it does.
Given what I see, another suggestion would be to track each successful password along with its file name. That way, when the file gets processed again, you would know which password to try first.
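One way this tracking might look, as a sketch only: a hashtable maps file names to the password that last worked, and the candidate list is reordered so that password is tried first. Get-OrderedPasswords and the sample file name are hypothetical.

```powershell
# Returns the password list reordered so that a previously successful password
# for this file name (if any) is tried first.
function Get-OrderedPasswords {
    param(
        [string]$FileName,
        [string[]]$Passwords,
        [hashtable]$KnownPasswords
    )
    if ($KnownPasswords.ContainsKey($FileName)) {
        $known = $KnownPasswords[$FileName]
        # Known password first, then every other candidate in original order
        return @($known) + @($Passwords | Where-Object { $_ -cne $known })
    }
    return $Passwords
}

# Hypothetical cache entry recorded during an earlier run
$cache = @{ 'Budget.xlsx' = 'charlie' }
$ordered = Get-OrderedPasswords -FileName 'Budget.xlsx' -Passwords 'alpha', 'bravo', 'charlie' -KnownPasswords $cache
```

After a successful open, the script would add or update the entry for that file name; the table could be persisted between runs, for example with Export-Clixml and Import-Clixml.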
This solution uses the modules ImportExcel for easier working with Excel files, and PoshRSJob for multithreaded processing.
If you do not have these, install them by running:
Install-Module ImportExcel -scope CurrentUser
Install-Module PoshRSJob -scope CurrentUser
I've raised an issue on the ImportExcel module's GitHub page, where I've proposed a solution for opening encrypted Excel files. The author may propose a better solution (and consider the impact across other functions in the module), but this works for me. For now, you'll need to modify the Import-Excel function yourself:
Open: C:\Username\Documents\WindowsPowerShell\Modules\ImportExcel\2.4.0\ImportExcel.psm1 and scroll to the Import-Excel function. Replace:
[switch]$DataOnly
With
[switch]$DataOnly,
[String]$Password
Then replace the following line:
$xl = New-Object -TypeName OfficeOpenXml.ExcelPackage -ArgumentList $stream
With the code suggested here. This will let you call the Import-Excel function with a -Password parameter.
Next we need a function that repeatedly tries to open a single Excel file using a known set of passwords. Open a PowerShell window and paste in the following function (note: this function has a default output path defined, and also writes passwords to the verbose stream; make sure no one is looking over your shoulder, or just remove that if you'd prefer):
function Remove-ExcelEncryption
{
[CmdletBinding()]
Param
(
[Parameter(Mandatory=$true)]
[String]
$File,
[Parameter(Mandatory=$false)]
[String]
$OutputPath = 'C:\PoShTest\Decrypted',
[Parameter(Mandatory=$true)]
[Array]
$PasswordArray
)
$filename = Split-Path -Path $file -Leaf
foreach($Password in $PasswordArray)
{
Write-Verbose "Attempting to open $file with password: $Password"
try
{
$ExcelData = Import-Excel -path $file -Password $Password -ErrorAction Stop
Write-Verbose "Successfully opened file."
}
catch
{
Write-Verbose "Failed with error $($_.Exception.Message)"
continue
}
try
{
$null = $ExcelData | Export-Excel -Path $OutputPath\$filename
return "Success"
}
catch
{
Write-Warning "Could not save to $OutputPath\$filename"
}
}
}
Finally, we can run code to do the work:
$Start = get-date
$PasswordArray = @('dj7F9vsm','kDZq737b','wrzCgTWk','DqP2KtZ4')
$files = Get-ChildItem -Path 'C:\PoShTest\Encrypted'
$files | Start-RSJob -Name {$_.Name} -ScriptBlock {
Remove-ExcelEncryption -File $_.Fullname -PasswordArray $Using:PasswordArray -Verbose
} -FunctionsToLoad Remove-ExcelEncryption -ModulesToImport ImportExcel | Wait-RSJob | Receive-RSJob
$end = Get-Date
New-TimeSpan -Start $Start -End $end
For me, if the correct password is first in the list it runs in 13 seconds against 128 Excel files. If I call the function in a standard foreach loop, it takes 27 seconds.
To view which files were successfully converted we can inspect the output property on the RSJob objects (this is the output of the Remove-ExcelEncryption function where I've told it to return "Success"):
Get-RSJob | Select-Object -Property Name,Output
Hope that helps.
Related
I am writing a script that scans each cell in an excel file for PII. I've got most of it working, but I am experiencing two issues which may be related.
First of all, I am not convinced that the "Do" loop is performing as intended. The goal here is if the text in a cell matches the regex string, create a PSCustomObject with the location information, then use the object to add a line to a csv file.
It appears that the loop is running for every file, regardless of whether or not it actually found a match.
The other issue is that I can't seem to actually pull the cell value for the matched cell. I've tried several different variables and methods, the latest attempt being "$target.text," but the value of the variable is always null.
I've been racking my brain on this for days, but I'm sure it'll be obvious once I see it.
Any help here would be appreciated.
Thanks.
$searchtext = "\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
$xlsFiles = Get-ChildItem $searchpath -recurse -include *.xlsx, *.xls, *.xlsm | Select-object -Expand FullName
$Excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false;
$excel.AskToUpdateLinks = $false;
foreach ($xlsfile in $xlsfiles) {
Write-host (Get-Date -Format 'yyyyMMdd:HHmm') $xlsfile
try{
$Workbook = $Excel.Workbooks.Open($xlsFile, 0, 0, 5, "password")
}
Catch {
Write-host $xlsfile 'is password protected. Skipping...' -ForegroundColor Yellow
continue
}
ForEach ($Sheet in $($Workbook.Sheets)) {
$i = $sheet.Index
$Range = $Workbook.Sheets.Item($i).UsedRange
$Target = $Sheet.UsedRange.Find($Searchtext)
$First = $Target
Do {
$Target = $Range.Find($Target)
$Violation = [PSCustomObject]@{
Path = $xlsfile
Line = "SSN Found" + $target.text
LineNumber = "Sheet: " + $i
}
$Violation | Select-Object Path, Line, LineNumber | export-csv $outputpath\$PIIFile -append -NoTypeInformation
}
While ($NULL -ne $Target -and $Target.AddressLocal() -ne $First.AddressLocal())
}
$Excel.Quit()
}
Figured it out. Just a simple case of faulty logic in the loops.
Thanks to everyone who looked at this.
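The corrected loop was not posted. As a rough sketch of the intended behavior (create an object only when a cell actually matches), the following assumes the cell values have already been read into a list, and that the regex is evaluated with PowerShell's -match rather than passed to Range.Find, which to my knowledge does not interpret regular expressions. Find-SsnViolation and the sample cells are hypothetical.

```powershell
# SSN pattern copied from the question.
$searchtext = '\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b'

# Hypothetical helper: $Cells is a list of objects with Address and Text
# properties, standing in for the cells of one worksheet.
function Find-SsnViolation {
    param([string]$Path, [int]$SheetIndex, [object[]]$Cells, [string]$Pattern)

    foreach ($cell in $Cells) {
        # Emit an object only when the cell text actually matches.
        if ([string]$cell.Text -match $Pattern) {
            [PSCustomObject]@{
                Path       = $Path
                Line       = "SSN Found: $($cell.Text)"
                LineNumber = "Sheet: $SheetIndex, Cell: $($cell.Address)"
            }
        }
    }
}

# Sample data in place of a real worksheet
$cells = @(
    [PSCustomObject]@{ Address = '$A$1'; Text = 'Name' },
    [PSCustomObject]@{ Address = '$B$2'; Text = '123-45-6789' },
    [PSCustomObject]@{ Address = '$C$3'; Text = '000-12-3456' }
)
$violations = @(Find-SsnViolation -Path 'C:\Temp\Test.xlsx' -SheetIndex 1 -Cells $cells -Pattern $searchtext)
```

The resulting objects could then be piped to Export-Csv with -Append and -NoTypeInformation, as in the question. Note also that Quit should be called once after the file loop rather than inside it.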
I have a couple hundred .xlsb files that need their connection string and command text changed in an easily programmable way. They are all buried in different folders deep in the file system. How can I use Powershell or some other program to go through and edit them all so I don't have to do it manually?
I've started looking into Powershell and Format-Hex. I figured I could ask and someone else may be able to set me on the right track. What needs to be done is recursively searching the filesystem from a certain point, detect if "this string" and this number "11111" are in the connection string and command text (respectively) of all xlsb files, and if they are replace them with "that string" and this number "22222". All in xlsb files. I've also looked into using python, but the libraries I found did not mention editing this setting, so I figured some sort of hex detection and replacement would be easier.
Could you give more detail on what you mean by a "connection string"? To my knowledge, this is not a property of an xlsb file.
I assume it is the string used to create an ODBC connection, so the text you want to modify will be within the code of a macro.
So three issues:
Recursively find all xlsb files within a folder
$Fllt = gci "*.xlsb" -r
Open them in Excel
$Excl = New-Object -ComObject Excel.Application
$Fllt | %{$Excl.Workbooks.Open($_.Fullname)}
Replace "this string" by "that string" and "11111" by "22222" in every macro. This is much more difficult.
My suggestion:
#Generation of a test file
$Excl = New-Object -ComObject Excel.Application
$xlve = $Excl.Version
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name AccessVBOM -Value 1 -Force | Out-Null
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name VBAWarnings -Value 1 -Force | Out-Null
@'
Sub Co()
ConnectionString = "this string"
CommandText = "11111"
End Sub
'@ | Out-File C:\Temp\Test.txt -Encoding ascii
$Wkbk = $Excl.Workbooks.Add()
$Wkbk.VBProject.VBComponents.Import("C:\Temp\Test.txt") | Out-Null
$Wkbk.SaveAs("C:\Temp\Test.xlsb", 50)
$Excl.Quit()
#Get The files
$Fllt = gci -Path C:\Temp\ -Include *.xlsb -r
#Open Excel and set the security parameters to be able to modify macros
$Excl = New-Object -ComObject Excel.Application
$xlve = $Excl.Version
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name AccessVBOM -Value 1 -Force | Out-Null
New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\$xlve\Excel\Security" `
-Name VBAWarnings -Value 1 -Force | Out-Null
#Loop through the files and modify the macros
$path = "C:\Temp\ModuleVBATemp.txt" #Temp text file to copy and modify the macros
foreach ($File in $Fllt) {
$Wkbk = $Excl.Workbooks.Open($File.Fullname)
if ($Wkbk.HasVBProject) <# Test if any macro #> {
foreach ($Vbco in $Wkbk.VBProject.VBComponents) {
if ($Vbco.Type -eq '1') <# Only modify the modules #> {
#Modification of the script
$Vbco.Export($path) | Out-Null
(gc $path) -replace "this string","that string" -replace "11111","22222" `
| Out-File $path -Encoding ascii
$Wkbk.VBProject.VBComponents.Remove($Vbco)
$Wkbk.VBProject.VBComponents.Import($path) | Out-Null
}}}
$Wkbk.Close($true) #Save the file
}
$Excl.Quit()
It is working on my test file, I hope that your configuration is similar.
I'm writing something in powershell to watch file 'A' for changes and open file 'B' when it does. The only problem is the excel file (B) has vba code to run on open that has to copy over the data on file 'A.' From my research it seems like I have to open file 'A' to do this since it has to be a .xlsx and opening it starts a continuous loop.
I've tried the sleep command, but it seems like it's still watching the file during or before the sleep, and then just opens the file again once it has waited the amount of time I tell it.
How do I make the watcher stop watching for just a minute or so?
Here is the code I'm working with currently:
Function Register-Watcher {
param ($folder)
$filter = "*.xlsx"
$folder = "\\powershell\watcher\test\folder"
$watcher = New-Object IO.FileSystemWatcher $folder, $filter -Property @{
IncludeSubdirectories = $false
EnableRaisingEvents = $true
}
$changeAction = [scriptblock]::Create('
$path = $Event.SourceEventArgs.FullPath
$name = $Event.SourceEventArgs.Name
$changeType = $Event.SourceEventArgs.ChangeType
$timeStamp = $Event.TimeGenerated
Write-Host "The file $name was $changeType at $timeStamp"
$Excel = New-Object -ComObject Excel.Application
$Excel.Workbooks.Open("\\powershell\watcher\test\folder\fileB.xlsm")
sleep 60
')
Register-ObjectEvent $Watcher "Changed" -Action $changeAction
}
Register-Watcher "\\powershell\watcher\test\folder\fileA.xlsx"
$Change
You can stop it from raising events using
$watcher.EnableRaisingEvents = $false
You can read more about it here from Microsoft
https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher.enableraisingevents?view=netframework-4.7.2#System_IO_FileSystemWatcher_EnableRaisingEvents
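A small sketch of how that property might be used to pause the watcher around a block of work and then resume it. Invoke-WithWatcherPaused is a hypothetical helper, and the temp folder stands in for the watched share.

```powershell
# Hypothetical helper: pauses a FileSystemWatcher while $Action runs, then
# restores the previous state even if the action throws.
function Invoke-WithWatcherPaused {
    param(
        [System.IO.FileSystemWatcher]$Watcher,
        [scriptblock]$Action
    )
    $wasEnabled = $Watcher.EnableRaisingEvents
    $Watcher.EnableRaisingEvents = $false
    try {
        & $Action
    }
    finally {
        $Watcher.EnableRaisingEvents = $wasEnabled
    }
}

# Demonstration against the temp folder; the real script would use the watched share.
$tempPath = [System.IO.Path]::GetTempPath()
$watcher = New-Object System.IO.FileSystemWatcher -ArgumentList $tempPath, '*.xlsx'
$watcher.EnableRaisingEvents = $true
$stateDuring = Invoke-WithWatcherPaused -Watcher $watcher -Action { $watcher.EnableRaisingEvents }
$stateAfter = $watcher.EnableRaisingEvents
$watcher.Dispose()
```

Inside a Register-ObjectEvent action, the watcher itself should be reachable through the automatic $Sender variable, so the action could disable it before opening file B and re-enable it once the copy has finished.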
I'm new to PowerShell and I need some help here. Below is a script I wrote to locate an Excel file in a folder. The file names listed in the Excel sheet are compared to the contents of another folder on the same machine (currently "C:\MKS_DEV\"), and the matched files are zipped and put in another location, as shown in the script. The script will be used by other people on different machines, so the locations of both folders could differ from machine to machine.
I want to pass the locations of both folders as arguments or parameters so that I don't have to specify them in the script every time I run it, but I can't figure out how to implement this.
The script works perfectly; I just need to incorporate arguments/parameters into it. Any help would be very much appreciated.
Thanks.
Here is the code:
# Creating an object for the Excel COM addin
$ExcelObject = New-Object -ComObject Excel.Application
# Opening the Workbook
$ExcelWorkbook = $ExcelObject.Workbooks.Open("C:\Eric_Source\Test.xls")
# Opening the Worksheet by using the index (1 for the first worksheet)
$ExcelWorksheet = $ExcelWorkbook.Worksheets.Item(1)
# The folder where the files will be copied/The folder which will be zipped
# later
$a = Get-Date
$targetfolder = "C:\"+$a.Day+$a.Month+$a.Year+$a.Hour+$a.Minute+$a.Second
# Check if the folder already exists. Command Test-Path $targetfolder returns
# true or false.
if(Test-Path $targetfolder)
{
# delete the folder if it already exists. The following command deletes a
# particular directory
Remove-Item $targetfolder -Force -Recurse -ErrorAction SilentlyContinue
}
# The following command is used to create a particular directory
New-Item -ItemType directory -Path $targetfolder
# Declaration of variables, COlumn value = 6 for Column F
$row = 1
$col = 6
# Read a value from the worksheet with the following command
$filename = $ExcelWorksheet.Cells.Item($row,$col).Value2
$filename
# change the folder value below to specify the folder where the powershell
# needs to search for the filename that it reads from excel file.
$folder = "C:\MKS_DEV\"
$null = ""
There are multiple ways to parameterize your script.
The first is to use the $args[n] automatic variable.
If your script is called MyScript.ps1, you can call it with:
.\MyScript.ps1 "C:\Eric_Source\Test.xls"
Then, inside your script, use $args[0] for the first argument.
Another way is to use the reserved word Param at the beginning of your script:
Param ($MyParam1, $MyParam2)
When you call your script $MyParam1 will contain the first param and so on.
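As an illustration, a param block can also declare types, mark a parameter as mandatory, and supply a default. The sketch below wraps the block in a function so it can be run as-is; Get-ScriptSettings and the parameter names are hypothetical, and the same param(...) block would work at the top of a .ps1 file.

```powershell
# Hypothetical wrapper showing a param block with types and a default value.
function Get-ScriptSettings {
    [CmdletBinding()]
    param(
        # Path to the workbook that lists the file names
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$ExcelWorkbook,

        # Folder to search; falls back to the value used in the question
        [Parameter(Position = 1)]
        [string]$SearchFolder = 'C:\MKS_DEV\'
    )
    [PSCustomObject]@{
        ExcelWorkbook = $ExcelWorkbook
        SearchFolder  = $SearchFolder
    }
}

# Positional call using the default folder, then a named call overriding it
$byPosition = Get-ScriptSettings 'C:\Eric_Source\Test.xls'
$byName = Get-ScriptSettings -ExcelWorkbook 'C:\Eric_Source\Test.xls' -SearchFolder 'D:\Other\'
```

With a default in place, users on machines with the usual layout can omit the folder, while others pass their own path.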
You could create it as a function and load it.
Function Folder-Deletion ($ExcelWorkbook,$targetfolder) {
$ExcelObject = New-Object -ComObject Excel.Application
$ExcelOpen = $ExcelObject.Workbooks.Open($ExcelWorkbook)
$ExcelWorksheet = $ExcelOpen.Worksheets.Item(1)
$a = Get-Date
if(Test-Path $targetfolder)
{
Remove-Item $targetfolder -Force -Recurse -ErrorAction SilentlyContinue
}
New-Item -ItemType directory -Path $targetfolder
$row = 1
$col = 6
$filename = $ExcelWorksheet.Cells.Item($row,$col).Value2
$filename
}
Then run the function against the spreadsheet and folder like so:
Folder-Deletion C:\Eric_Source\Test.xls C:\MKS_DEV
Or you could create a PowerShell script file (e.g. FolderDeletion.ps1) with the following contents:
param($ExcelWorkbook,$targetfolder)
$ExcelObject = New-Object -ComObject Excel.Application
$ExcelOpen = $ExcelObject.Workbooks.Open($ExcelWorkbook)
$ExcelWorksheet = $ExcelOpen.Worksheets.Item(1)
$a = Get-Date
if(Test-Path $targetfolder)
{
Remove-Item $targetfolder -Force -Recurse -ErrorAction SilentlyContinue
}
New-Item -ItemType directory -Path $targetfolder
$row = 1
$col = 6
$filename = $ExcelWorksheet.Cells.Item($row,$col).Value2
$filename
Then run the script against the spreadsheet and folder like so:
.\FolderDeletion.ps1 C:\Eric_Source\Test.xls C:\MKS_DEV
I'm new to PowerShell, so I don't know where to start. I want a script that searches the content of all files (PDF, Word, Excel, PowerPoint, ...) for a specific string combination.
I tried this script but it doesn't work:
function WordSearch ($sample, $staining, $sampleID, $patientID, $folder)
{
$objConnection = New-Object -com ADODB.Connection
$objRecordSet = New-Object -com ADODB.Recordset
$objConnection.Open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
$objRecordSet.Open("SELECT System.ItemPathDisplay FROM SYSTEMINDEX WHERE ((Contains(Contents,'$sample')) or (Contains(Contents,'$sampleID') and Contains(Contents,'$staining')) or (Contains(Contents,'$staining') and Contains(Contents,'$patientID'))) AND System.ItemPathDisplay LIKE '$folder\%'", $objConnection)
if ($objRecordSet.EOF -eq $false) {$objRecordSet.MoveFirst() }
while ($objRecordset.EOF -ne $true) {
$objRecordset.Fields.Item("System.ItemPathDisplay").Value
$objRecordset.MoveNext()
}
}
Can someone help me?
You should try this, but first make sure you're in the folder you want to start searching from. (If you're trying to search your whole computer, start in C:\, but I imagine the script will take a decent amount of time to run.)
$Paths = @()
$Paths = gci . *.* -rec | where { ! $_.PSIsContainer } |? {($_.Extension -eq ".doc") -or ($_.Extension -eq ".ppt") -or ($_.Extension -eq ".pdf") -or ($_.Extension -eq ".xls")} | resolve-path
This will retrieve the paths of all files of those types. If you have Microsoft Office 2007 or above, you may want to add searches for ".xlsx", ".docx", or ".pptx".
Then you can begin looking through those files for your "specific string combination":
$array = @()
foreach($path in $Paths)
{$array += Select-String -Path $Path -Pattern "Search String"}
This will give you all the lines and paths where that string occurs in those files. The actual line output may look distorted, though, because Office and PDF files are binary formats rather than plain text (the newer .docx, .xlsx, and .pptx files are ZIP-compressed, so Select-String may not find the text in them at all). Use $array | Get-Member -MemberType Property to find which properties are available, and the Select-Object cmdlet to pull those out.
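The two steps above can be combined into a single pipeline. This is a sketch only: Find-StringInFiles is a hypothetical helper, and the demonstration uses plain-text stand-ins, so it shows the mechanics rather than real Office or PDF parsing.

```powershell
# Hypothetical helper: recursively searches files with the given extensions
# for a literal string and returns path, line number and line text.
function Find-StringInFiles {
    param(
        [string]$Root,
        [string]$SearchString,
        [string[]]$Extensions = @('.doc', '.ppt', '.pdf', '.xls')
    )
    Get-ChildItem -Path $Root -Recurse |
        Where-Object { -not $_.PSIsContainer -and $Extensions -contains $_.Extension } |
        Select-String -Pattern $SearchString -SimpleMatch |
        Select-Object Path, LineNumber, Line
}

# Small demonstration with plain-text stand-ins for real documents
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $demo | Out-Null
Set-Content -Path (Join-Path $demo 'a.doc') -Value 'first line', 'has Search String here'
Set-Content -Path (Join-Path $demo 'b.txt') -Value 'Search String but wrong extension'
$hits = @(Find-StringInFiles -Root $demo -SearchString 'Search String')
Remove-Item $demo -Recurse -Force
```

Piping the files straight into Select-String avoids building the array with += in a loop, and -SimpleMatch treats the search text literally instead of as a regular expression.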