PowerShell multithreading

I have a PowerShell script that converts Office documents to PDF. I would like to multithread it, but I cannot figure out how from the other examples I have seen. The main script (OfficeToPDF.ps1) scans through a list of files and calls a separate script for each file type/Office application (e.g. for .doc files, WordToPDF.ps1 is called to do the conversion). The main script passes one file name at a time to the child script (I did this for a couple of reasons).
Here is an example of the main script:
$documents_path = "C:\Documents\Test_Docs"
$pdf_out_path = "C:\Documents\Converted_PDFs"
$failed_path = "C:\Documents\Failed_to_Convert"
# Sets the root directory of this script
$PSScriptRoot = Split-Path -parent $MyInvocation.MyCommand.Definition
$date = Get-Date -Format "MM_dd_yyyy"
$Logfile = "$PSScriptRoot\logs\OfficeToTiff_$Date.log"
$word2PDF = "$PSScriptRoot\WordToPDF.ps1"
$arguments = "'$documents_path'", "'$pdf_out_path'", "'$Logfile'"
# Function to write to log file
Function LogWrite
{
Param ([string]$logstring)
$time = Get-Date -Format "hh:mm:ss:fff"
Add-content $Logfile -value "$date $time $logstring"
}
################################################################################
# Word to PDF #
################################################################################
LogWrite "*** BEGIN CONVERSION FROM DOC, DOCX, RTF, TXT, HTM, HTML TO PDF ***"
Get-ChildItem -Path $documents_path\* -Include *.docx, *.doc, *.rtf, *.txt, *.htm? -recurse | ForEach-Object {
$original_document = "$($_.FullName)"
# Verifies that a document exists before calling the convert script
If (Test-Path -LiteralPath $original_document)
{
Invoke-Expression "$word2PDF $arguments"
#checks to see if document was successfully converted and deleted. If not, doc is moved to another directory
If(Test-Path -path $original_document)
{
Move-Item $original_document $failed_path
}
}
}
$original_document = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
Here is the script (WordToPDF.ps1) that is called by the main script:
Param($documents, $pdf_out_path, $Logfile)
# Function to write to the log file
Function LogWrite
{
Param ([string]$logstring)
$time = Get-Date -Format "hh:mm:ss:fff"
Add-content $Logfile -value "$date $time $logstring"
}
$word_app = New-Object -ComObject Word.Application
$document = $word_app.Documents.Open($_.FullName)
$original_document = "$($_.FullName)"
# Creates the output file name with path
$pdf_document = "$($pdf_out_path)\$($_.BaseName).pdf"
LogWrite "Converting: $original_document to $pdf_document"
$document.SaveAs([ref] $pdf_document, [ref] 17)  # 17 = wdFormatPDF
$document.Close()
# Deletes the original document after it has been converted
LogWrite "Deleting: $original_document"
Remove-Item $original_document
$word_app.Quit()
Any suggestions would be appreciated.
Thanks.

I was just going to comment and link you to this question: Can PowerShell run commands in Parallel. Then I noticed the date of that question and its answers; PowerShell v3.0 added some new features that might work better for you.
That question covers PowerShell jobs, which work but require you to keep track of job status, so they add some extra code to manage.
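To give a feel for that bookkeeping, here is a minimal sketch using Start-Job with a simple throttle. The $maxJobs value and the placeholder file names are assumptions for illustration, and the script block only returns a string where the real code would run the conversion for one file.

```powershell
# Placeholder inputs; in the question these would come from Get-ChildItem.
$files   = 'a.docx', 'b.docx', 'c.docx'
$maxJobs = 2   # assumed throttle, tune to taste

$jobs = foreach ($file in $files) {
    # Wait until fewer than $maxJobs jobs are running before starting another one.
    while (@(Get-Job -State Running).Count -ge $maxJobs) {
        Start-Sleep -Milliseconds 200
    }
    Start-Job -ArgumentList $file -ScriptBlock {
        param($path)
        # Placeholder for the real work, e.g. running WordToPDF.ps1 for $path.
        "processed $path"
    }
}

# Wait for all jobs, collect their output, then remove them from the session.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```

Each Start-Job job runs in its own background PowerShell process, which isolates the jobs from each other but makes them relatively heavy to start.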
PowerShell v3 opened the door a bit wider with workflows, which are based on Windows Workflow Foundation. A good article on the basics of how this works can be found on the Scripting Guy blog here. You can adjust your code to run the conversion in a workflow, and it will perform it in parallel:
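workflow foreachfile {
    param([string[]]$files)
    foreach -parallel ($f in $files) {
        # Put your code here that does the work on $f
    }
}
# Usage (Windows PowerShell 3.0-5.1 only): foreachfile -files (Get-ChildItem C:\Documents\Test_Docs -Recurse -File).FullName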
From what I can find, this has a limit of 5 threads at a time. I am not sure how accurate that is, but the blog post here noted the limitation. However, since the Word and Excel Application COM objects can be very CPU-intensive, running 5 threads at a time would probably work well.
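Workflows exist only in Windows PowerShell 3.0 to 5.1 and are not available in PowerShell 6 or 7. On PowerShell 7 or later, ForEach-Object -Parallel fills the same role, and its -ThrottleLimit also defaults to 5. A minimal sketch follows; the file names are placeholders, and the script block only builds the output path where the real conversion would go.

```powershell
# Requires PowerShell 7.0+. Placeholder file names stand in for the Get-ChildItem results.
$files      = 'report1.docx', 'report2.doc', 'notes.rtf'
$pdfOutPath = 'C:\Documents\Converted_PDFs'

$results = $files | ForEach-Object -ThrottleLimit 5 -Parallel {
    # Each iteration runs in its own runspace, so outer variables need the $using: prefix.
    $outDir  = $using:pdfOutPath
    $pdfName = [System.IO.Path]::GetFileNameWithoutExtension($_) + '.pdf'
    # Placeholder: the real conversion of $_ would go here.
    "$_ -> $outDir\$pdfName"
}
$results
```

Output order is not guaranteed with -Parallel, so anything that depends on order should sort the results afterwards.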

I have a multithreaded PowerShell environment for indicator-of-compromise scanning on all AD devices, threaded 625 times with Gearman: http://gearman.org
It is open source and gives you the option to go cross-platform. It threads with a server/worker flow and runs via Python. Extremely recommended by yours truly, someone who has abused threading in PowerShell. This isn't so much an answer as something I had never heard of before but now love and use daily. Pass it forward. Open source for the win :)
I have also used PS jobs before, and they are great up to a certain scale. Maybe it is my lack of .NET expertise, but PowerShell has some quirky, subtle memory nuances that can create some nasty effects at large scale.

Related

Need to parse thousands of files for thousands of results - prefer powershell

I am consistently being pinged by our government contract holder to search for IP addresses in our logs. I have three firewalls, 30-plus servers, etc., so you can imagine how unwieldy it becomes. To amplify the problem, I have been given a list of over 1500 IP addresses to search for in all the log files...
I have all of the logs downloaded and can use PowerShell to go through them one by one, but it takes forever. I need to run the search multithreaded in PowerShell but cannot figure out the logic to do so. Here's my one-by-one script...
Any help would be appreciated!
$log = Import-Csv C:\temp\FWLogs\IPSearch.csv
# Search every log file once per IP address
ForEach ($entry in $log) {
    Get-ChildItem -Recurse -Path C:\temp\FWLogs -Filter *.log |
        Select-String -Pattern $entry.IP -SimpleMatch -List |
        Select-Object Path
}
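One way to parallelize this, sketched under the assumption that PowerShell 7+ is available (for ForEach-Object -Parallel): read the IP list once, then search each log file in parallel with all the patterns at once, since Select-String accepts an array of patterns. Find-IpInLogs and the throttle value of 8 are hypothetical choices, not anything from the question. -SimpleMatch makes the dots in the addresses literal instead of regex wildcards; it is still a substring match, so 10.0.0.1 would also match a line containing 10.0.0.10.

```powershell
# Hypothetical helper; requires PowerShell 7.0+ for ForEach-Object -Parallel.
function Find-IpInLogs {
    param(
        [string]   $LogRoot,
        [string[]] $IpList,
        [int]      $ThrottleLimit = 8   # assumed value, tune to your CPU/disk
    )
    Get-ChildItem -Path $LogRoot -Recurse -Filter *.log |
        ForEach-Object -ThrottleLimit $ThrottleLimit -Parallel {
            $ips = $using:IpList
            # One pass per file with every IP as a literal pattern.
            # Without -List, every matching line is returned, and Pattern shows which IP hit.
            Select-String -LiteralPath $_.FullName -Pattern $ips -SimpleMatch |
                Select-Object Path, LineNumber, @{ Name = 'IP'; Expression = { $_.Pattern } }
        }
}

# Usage with the paths from the question:
# $ips = (Import-Csv C:\temp\FWLogs\IPSearch.csv).IP
# Find-IpInLogs -LogRoot C:\temp\FWLogs -IpList $ips | Export-Csv C:\temp\FWLogs\hits.csv -NoTypeInformation
```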

PowerShell: update O365 AD bulk attributes through csv file

We are trying to bulk update our Azure Active Directory. We have an Excel CSV list of UserPrincipalNames for which we will update the Title, Department, and Office attributes:
# Get List of Clinical CMs
$PATH = "C:\Users\cs\Documents\IT Stuff\Project\Azure AD Update\AD-Update-ClinicalCMs-Test.csv"
$CMs = Import-csv $PATH
# Pass CMs into Function
ForEach ($UPN in $CMs) {
# Do AD Update Task Here
Set-Msoluser -UserPrincipalName $UPN -Title "Case Manager" -Department "Clinical" -Office "Virtual"
}
The CSV:
User.1@domain.com
User.2@domain.com
User.3@domain.com
The Set-MsolUser command works on its own, but it does not work as intended in this foreach loop. Any help or insight is greatly appreciated.
As Jim Xu suggested, here is my comment as an answer.
The input file you show us is not a CSV file; it is a list of UPN values, each on a separate line.
To read these values as a string array, the easiest thing to do is use Get-Content:
$PATH = "C:\Users\cs\Documents\IT Stuff\Project\Azure AD Update\AD-Update-ClinicalCMs-Test.csv"
$CMs = Get-Content -Path $PATH
Of course, although massive overkill, it can be done using the Import-Csv cmdlet:
$CMs = (Import-Csv -Path $PATH -Header upn).upn
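To see why the original loop misbehaves, here is a small self-contained demo; the temporary file and its three UPNs are made up to mirror the question. Without -Header, Import-Csv treats the first line as the column name, so User.1 disappears and each remaining row is an object rather than a UPN string.

```powershell
# Hypothetical header-less file like the one in the question.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "cms-$([guid]::NewGuid()).csv"
Set-Content -Path $path -Value 'User.1@domain.com', 'User.2@domain.com', 'User.3@domain.com'

# Wrong: the first line becomes the header, leaving 2 row objects instead of 3 strings.
$wrong = Import-Csv -Path $path

# Right: both of these give one plain string per line.
$viaContent = Get-Content -Path $path
$viaCsv     = (Import-Csv -Path $path -Header upn).upn

Remove-Item -Path $path

# Then loop over the strings, e.g.:
# foreach ($upn in $viaContent) { Set-MsolUser -UserPrincipalName $upn -Title "Case Manager" -Department "Clinical" -Office "Virtual" }
```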

What is the best practice to capture an open Excel document as a COM Object (not creating a new one) on Windows using PowerShell

I am trying to automate my day-to-day workflow at work. Since Excel is used frequently, using PowerShell for Excel automation is crucial for me.
Most of the time I already have my Excel windows open, so I use the code block below to capture them for further automation:
$FullPath = Join-Path $FilePath $WorkBookName
# Attach to the running Excel instance (throws if Excel is not running)
$excel = [Runtime.InteropServices.Marshal]::GetActiveObject("Excel.Application")
# Reuse the workbook if it is already open in that instance
$wb = $excel.Workbooks | Where-Object { $_.FullName -eq $FullPath }
if (-not $wb) {
    # If the document is not open, open it from the file path (Open-NewCliExcel is my own helper)
    $excel, $wb = Open-NewCliExcel -TypeOrPath $FullPath
}
I'm pretty sure that there must be more stable and reliable practices in terms of scalability. My plan is to build an intra-company library to start the automation of other people's processes.
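For an internal library, one option is to split "find the application" from "find the workbook". Below is a sketch of the second half, assuming you already hold an Excel.Application object (for example from GetActiveObject as above); Get-ExcelWorkbook is a hypothetical name. It reuses the workbook if that instance already has it open and otherwise opens it in the same instance instead of starting another Excel process.

```powershell
function Get-ExcelWorkbook {
    # Hypothetical helper: $Excel is an Excel.Application COM object you already hold.
    param(
        [Parameter(Mandatory)] $Excel,
        [Parameter(Mandatory)] [string] $FullPath
    )
    # Reuse the workbook if this instance already has it open.
    # PowerShell's -eq is case-insensitive for strings, which suits Windows paths.
    foreach ($wb in $Excel.Workbooks) {
        if ($wb.FullName -eq $FullPath) {
            return $wb
        }
    }
    # Not open yet: open it in the same instance rather than starting another Excel process.
    $Excel.Workbooks.Open($FullPath)
}

# Usage (Windows PowerShell, Excel already running):
# $excel = [Runtime.InteropServices.Marshal]::GetActiveObject("Excel.Application")
# $wb = Get-ExcelWorkbook -Excel $excel -FullPath (Join-Path $FilePath $WorkBookName)
```

Because the helper only relies on the Workbooks collection and FullName, it can also be exercised with stand-in objects when Excel is not available.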

How can I stop Excel processes from running in the background after a PowerShell script?

No matter what I try, Excel 2013 continues to run in the background on Windows 10, whatever commands I put at the end of my PowerShell script. I've added every suggestion I've found to the end of my script, and the only Excel object I open still remains open. Here is what I have at the end of my script. Any other suggestions?
## Quit Excel and Terminate Excel Application process:
# Methods need () to run; without them PowerShell only returns the method definition
$xlsxwb.Close()
$xlsxobj.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsxobj)
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsxwb)
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsxSh1)
Start-Sleep 1
'Excel processes: {0}' -f @(Get-Process excel -ea 0).Count
I ran into the same problem and tried various solutions without success. I got closer when I started releasing all of the COM objects I saved as variables, not just the ones for the workbook, worksheet, and Excel application.
For example, take the following code:
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $False
$Workbook = $Excel.Workbooks.Open("C:\Temp\test.xlsx")
$Worksheet = $Workbook.Worksheets.Item(1)
$UsedRange = $Worksheet.UsedRange
$Range = $Worksheet.Range("A1:B10")
$Workbook.Close()
$Excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($Range)
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($UsedRange)
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($Worksheet)
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($Workbook)
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel)
[GC]::Collect()
If you were to take out just one of the ReleaseComObject statements, the Excel process would remain open. In my code I release all the objects like ranges, tables, etc. first, then the worksheet, the workbook, and finally the Excel application itself. Because that only seemed to work about 90% of the time, I added the garbage collection command at the end, and finally had a solution that seems to work every time without having to kill the process.
Note: My system is Windows 8.1 with PowerShell v5 and Office 2013.
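The release-everything approach from this answer can be wrapped in a reusable helper; below is a sketch, where Remove-ComReference is a hypothetical name. It skips anything that is not a COM object (including $null) and uses Marshal.FinalReleaseComObject, which sets the wrapper's reference count to zero in one call, where the answer above calls ReleaseComObject once per variable.

```powershell
function Remove-ComReference {
    # Hypothetical helper: pass the objects innermost first (ranges, sheets, workbook, application).
    param([object[]] $Objects)

    foreach ($obj in $Objects) {
        # Skip $null and anything that is not a COM wrapper (strings, numbers, ...).
        if ($null -ne $obj -and [System.Runtime.InteropServices.Marshal]::IsComObject($obj)) {
            [void][System.Runtime.InteropServices.Marshal]::FinalReleaseComObject($obj)
        }
    }
    # Let the runtime clean up any wrappers that are no longer referenced.
    [GC]::Collect()
    [GC]::WaitForPendingFinalizers()
}

# Usage with the variables from the example above:
# $Workbook.Close()
# $Excel.Quit()
# Remove-ComReference $Range, $UsedRange, $Worksheet, $Workbook, $Excel
```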
Here's a simple example below. It will likely require some additional code for more complex procedures.
function _FullQuit {
while ( $this.Workbooks.Count -gt 0 ) {
$this.Workbooks.Item(1).Close()
}
$this.Quit()
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($this)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
function New-Excel {
$object = New-Object -ComObject "Excel.Application"
$object | Add-Member -MemberType ScriptMethod -Name FullQuit -Value {_FullQuit}
$object
}
$xl = New-Excel
$wb1 = $xl.Workbooks.Open("C:\Data1.csv")
$wb2 = $xl.Workbooks.Open("C:\Data2.csv")
$xl.FullQuit()
1. Create the Excel application.
2. Make it visible.
3. Get the process ID of the application.
4. Hide the Excel application.
5. Stop the process by its process ID.
Sample Code
# Create Excel Application
$excel = New-Object -comobject Excel.Application
# Make it visible
$excel.Visible = $true
# Get Windows handle of the application
$excelWinHwnd = $excel.Hwnd
# Get Process Id of the application
$process = Get-Process Excel | Where-Object {$_.MainWindowHandle -eq $excelWinHwnd}
$excelProcessId = $process.Id
# Hide the application : Run In background
$excel.Visible = $false
# Kill/Stop the process by id
Stop-Process -Id $excelProcessId
The above solutions did not work for me. In my sequence the final step was .SaveAs(file.xlsx), which meant that the remaining unsaved document still popped up a GUI dialog requiring user interaction (Save / Don't Save / Cancel).
I ended up with the following, which is admittedly rough, but worked for me.
At the beginning of the script:
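$existingExcel = @(Get-Process EXCEL -ErrorAction SilentlyContinue | ForEach-Object { $_.Id })
function Stop-Excel
{
    # Kill only the Excel processes that were not already running when the script started
    Get-Process EXCEL -ErrorAction SilentlyContinue |
        Where-Object { $existingExcel -notcontains $_.Id } |
        Stop-Process
}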
and at the end of the script
Stop-Excel
This has the advantage of completely destroying any lingering Excel processes without terminating any other live Excel processes that may be in use by the users who have to run this script.
The disadvantage is that the next time you open Excel, you are presented with the document recovery dialog for the crashed session.
Alternative answer:
I know my reply is late, but consider the following approach to get this done.
1. At the beginning, get the PIDs of all existing instances of excel.exe using this command:
tasklist /FI "imagename eq excel.exe"
2. Run the part of the script that creates its own instance of excel.exe. Run the command from step 1 again to identify and save the PID of the newly created instance (say wxyz); it will differ from the PIDs saved in step 1.
3. At the end of the script, close that specific instance (the PID saved in step 2) using:
TASKKILL /f /PID wxyz
where wxyz is the PID saved in step 2.
If nothing else works reliably, try setting the Excel application's DisplayAlerts to $true just before quitting.
$xlapp.DisplayAlerts=$true
$xlapp.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject([System.__ComObject]$xlapp)
$xlapp=$null
remove-variable xlapp
[GC]::Collect()
This always works for me.

powershell: if excel already running, get instance, otherwise start it - with exception handling

Using PowerShell, I want to import some tab-separated ASCII files into MS Excel. I use a loop for doing so, and right now I have a simple solution that works:
for each file: start Excel, import the TSV file, close Excel.
(This assumes Excel is in the PATH and is the right version of Excel, Excel 2010.)
Now I want to switch to a more efficient version: keep excel open.
for each file: grab the running instance of Excel if there is one; if not, try to start Excel. Process the file. Keep Excel open. At the end, keep it open (I want to look at the Excel files while the script is running, which could take a while; annoyingly, the current version of the script closes Excel while I am looking at the output).
I haven't found a comprehensive solution for this, neither here nor elsewhere on the internet. By "comprehensive" I mean "with exception handling". In PowerShell this is a bit confusing: there are two ways of dealing with exceptions, trap and try-catch blocks.
Here is my code, thrown together from several internet sources. How can I improve it?
I want to wrap it in a function, but COM objects as return values are problematic. (What I want is a combination of "simple factory" and "singleton".)
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Interop.Excel")
try {
$excelApp = [System.Runtime.InteropServices.Marshal]::GetActiveObject("Excel.Application")
} catch [System.Runtime.InteropServices.COMException], [System.Management.Automation.RuntimeException]{
write-host
write-host $("TRAPPED: " + $_.Exception.GetType().FullName);
write-host $("TRAPPED: " + $_.Exception.Message);
write-host "Excel is not running, trying to start it";
$excelApp = New-Object -ComObject "Excel.Application"
if (-not $excelApp){
# excel not installed, not in path
Write-Error "Excel not running, and cannot be started, exiting."
# Todo: test if excel version is correct, e.g. english Excel 2007 or 2010., if not set outfile extension xls.
exit;
}
}
catch [System.Exception]{
write-host $("EXCEPTION: " + $_.Exception.GetType().FullName);
write-host $("EXCEPTION: " + $_.Exception.Message);
Write-Error 'Something went wrong during creation of "Excel.Application" object, => Exit.'
exit;
}
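A minimal sketch of the "get it if running, otherwise start it" part as a function, assuming Windows PowerShell 5.1; Get-ExcelApplication is a hypothetical name. Marshal.GetActiveObject does not exist in PowerShell 7 (.NET Core), so there the first try always fails and the function falls through to starting a new instance. The application object itself is not a collection, so it should come back from the function as a single object.

```powershell
function Get-ExcelApplication {
    # Hypothetical helper: returns the running Excel instance, or starts a new one.
    [CmdletBinding()]
    param()
    try {
        # Reuse a running instance if one is registered.
        return [System.Runtime.InteropServices.Marshal]::GetActiveObject('Excel.Application')
    }
    catch {
        Write-Verbose "No running Excel instance: $($_.Exception.Message)"
    }
    try {
        # Otherwise start a new one; -ErrorAction Stop turns any failure into a catchable error.
        return New-Object -ComObject Excel.Application -ErrorAction Stop
    }
    catch {
        throw "Excel is not running and could not be started: $($_.Exception.Message)"
    }
}

# Usage:
# $excelApp = Get-ExcelApplication -Verbose
# $excelApp.Visible = $true   # keep it visible so you can watch the output while the script runs
```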
It's been a while since you asked this, but I have run into the same problem. Your code appears to be the only good example of a reasonable solution. I do question your concern, however: once Excel is running, is there a need to call this routine again in a function? Once Excel is open, you can treat it as a service to open and close workbooks, access the worksheets within them, and eventually call $excelApp.Quit() (although in PowerShell v1.0 .Quit() won't exit the app... but that is OK since I am grabbing a running instance anyway).
The link below discusses a way of starting up and grabbing the PID of Excel instance in order to explicitly kill it if needed.
Discussion on Quitting/Killing Excel from within PowerShell
Set all variables to $null. Excel.Workbooks contains all open workbooks, so check its Count: if it is 0, quit. After PowerShell closes, Excel will close too.
$Workbook.Close()
$WorkSheet = $null;
$Workbook = $null;
If($excelApp.Workbooks.Count -eq 0)
{$excelApp.Quit()}
$excelApp = $null;
