PowerShell concurrency with Start-ThreadJob and ForEach-Object -Parallel - multithreading

I've been trying to implement a producer-consumer pattern with multiple producers using BlockingCollection<>, Start-ThreadJob and ForEach-Object -Parallel. The results were mixed: some code runs, some freezes, and some just crashes PowerShell. So I think I must be doing something fundamentally wrong:
using namespace System.Collections.Concurrent
class TestProducerConsumer
{
[int] $result = 0
[BlockingCollection[int]] $queue =
[BlockingCollection[int]]::new()
[void] producer([int]$i) { $this.queue.Add($i) }
[void] consumer() {
$sum = 0
$it = $this.queue.GetConsumingEnumerable()
foreach( $i in $it ) { $sum += $i }
$this.result = $sum
}
[void] Run() {
$job = Start-ThreadJob { ($using:this).consumer() }
1..10 | ForEach-Object -Parallel {
#($using:this).producer($_) # freezing
($using:this).queue.Add($_) # working
}
#Start-Sleep -Seconds 1 # freezing
$this.queue.CompleteAdding()
$job | Receive-Job -Wait
}
}
$t = [TestProducerConsumer]::new(); $t.Run(); $t
In the simplified test case above, there are two lines doing the same thing: One is getting the queue member from the instance and adding the value directly; the other is calling a method on the instance to add the value to the queue. The former works, the latter freezes!?
Also, adding back in the line with Start-Sleep freezes the process.
Tested on Windows 10 with various PowerShell 7.* versions.
EDIT: Probably related to "ForEach-Object -Parallel situationally drops pipeline input" and similar issues.

Related

Register tracking event for all background jobs?

Good afternoon,
I've been trying to register an event that fires when all jobs are completed. I'm able to successfully register one for a single job, but I'd like to get a message pop-up once all background jobs are completed. Is anyone familiar with how to do so?
I attempted the following, but it errors out saying $jobs is null:
1..10 | ForEach-Object -Process {
Start-Job { Start-Sleep $_ } -Name "$_" | Out-Null} -OutVariable $jobs
Register-ObjectEvent $jobs StateChanged -Action {
[System.Windows.MessageBox]::Show('Done')
$eventSubscriber | Unregister-Event
$eventSubscriber.Action | Remove-Job
} | Out-Null
I feel like a Do{}Until() loop could do it, but I'm not sure how to register that to check until the jobs have completed. I also tried to follow along with ways other people have done it in different languages, but I can't pick it up.
I don't want to post everything I've tried, so this post doesn't bore anyone. I searched on Google as well, but I couldn't find much on registering an object event for multiple jobs.
EDIT
Here's what does work:
$job = Start-Job -Name GetLogFiles { Start-Sleep 10 }
Register-ObjectEvent $job StateChanged -Action {
[System.Windows.MessageBox]::Show('Done')
$eventSubscriber | Unregister-Event
$eventSubscriber.Action | Remove-Job
} | Out-Null
Which is what I'd like to happen, but evaluating all jobs, not just one.
This is what I personally use when monitoring running jobs:
$jobs= 1..10 | ForEach-Object -Process {
Start-Job { Start-Sleep $using:_ ; "job {0} done" -f $using:_ } -Name "$_"
}
do{
$i = (Get-Job -State Completed).count
$progress = @{
Activity = 'Jobs completed'
Status = "$i of {0}" -f $jobs.Count
PercentComplete = $i / $jobs.count * 100
}
Write-Progress @progress
Start-Sleep -Milliseconds 10
}
until($i -eq $jobs.Count)
$result = Get-Job | Receive-Job
$jobs | Remove-Job
Of course, under certain scenarios where I know some jobs might fail I change the until(...) condition for something different and the do {...} contains the logic for restarting failing jobs.
Edit 1:
It's worth mentioning that Start-Job is not worth your time if you're interested in multithreading; it has been shown to be slower than a linear loop in many scenarios. You should be looking at the ThreadJob module instead.
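As a rough illustration of the ThreadJob suggestion, here is a minimal sketch assuming the ThreadJob module is available (it ships with PowerShell 7; on Windows PowerShell 5.1 it has to be installed separately). The sleep durations and the output strings are made up for the example.

```powershell
# Start five thread jobs; each sleeps briefly and then emits a string.
$jobs = 1..5 | ForEach-Object {
    Start-ThreadJob -ArgumentList $_ -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds (100 * $n)
        "job $n done"
    }
}

# Block until every job has finished, collect the output, and remove the jobs.
$result = $jobs | Receive-Job -Wait -AutoRemoveJob
$result
```

Thread jobs run in the current process, so they avoid the per-job process startup that Start-Job pays.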
Edit 2:
After some testing, this worked for me:
# Clear the Event queue
Get-EventSubscriber|Unregister-Event
# Clear the Job queue
Get-Job|Remove-Job
1..10 | ForEach-Object -Process {
$job = Start-Job { Sleep -Seconds (1..20|Get-Random) } -Name "$_"
Register-ObjectEvent -InputObject $job -EventName StateChanged -Action {
$eventSubscriber | Unregister-Event
$eventSubscriber.Action | Remove-Job
if(-not (Get-EventSubscriber))
{
[System.Windows.MessageBox]::Show('Done')
}
} | Out-Null
}
At first I didn't even know this was possible so thanks for pointing this out. Great question :)

Basic multithreaded Consumers algorithm in PowerShell

I want to create a queue of ffmpeg commands and then let N threads consume the queue, each launching an instance of ffmpeg with the queued parameters.
Here is some code:
#looping on $items to prepare the params
foreach($input in $items){
{...}
#adding a param in queue
$global:jobsQueue.Enqueue($ffmpegParam)
}
#block code to be executed in different thread
$block = {
Param($queue, $ffmpegDir)
while($true){
if($queue.TryDequeue($params)){
$pinfo = New-Object System.Diagnostics.ProcessStartInfo($ffmpegDir, $params)
$pinfoMap.$input.UseShellExecute = $false
$pinfoMap.$input.CreateNoWindow = $true
$p = New-Object System.Diagnostics.Process
$p.StartInfo = $pinfo
$p.Start()
$p.WaitForExit()
} else {break}
}
}
#miserably failing to start the previous block code
for($i = 0; $i -lt 1; $i++){
Start-Job -Name "process $i" -ScriptBlock $block -ArgumentList $global:jobsQueue,$ffmpegDir
}
A job is actually started, but it does nothing and I don't understand why. I read these pages but wasn't able to come up with a solution:
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/start-job?view=powershell-5.1
How do I Start a job of a function i just defined?
https://social.technet.microsoft.com/Forums/scriptcenter/en-US/b68c1c68-e0f0-47b7-ba9f-749d06621a2c/calling-a-function-using-startjob

PoshRsJob Performance Issue

Why is multithreading in PowerShell so unbelievably slow? Am I doing anything wrong? I am using the PoshRSJob module.
RSJobs:
(Measure-Command {
$output = Start-RSJob -InputObject $shortDump -ScriptBlock {
Param($out, $shortDump)
$retObj = [pscustomobject]@{
UserMail = $_.Mail
Type = $_.Type
}
# return $retObj
$retObj
} | Wait-RSJob
$out.Add( $( Get-RSJob | Receive-RSJob) )
# $out += $( Get-RSJob | Receive-RSJob )
}).TotalSeconds
and
Standard foreach:
(Measure-Command {
foreach ($obj in $shortDump) {
$retObj = [pscustomobject]@{
UserMail =$obj.Mail
Type = $obj.Type
}
# $out+= $retObj
$out.Add($retObj)
}
}).TotalSeconds
My goal is to build objects faster, because I have ~300,000 objects to build.
Edit: Here is another example. It's really slow!
fast
$out = New-Object System.Collections.ArrayList
"default"
(Measure-Command {
for ($x = 0; $x -lt 100000; $x++)
{
$retObj = [pscustomobject]@{
UserMail = 'test'
Type = 'test2'
Test = 'default'
}
$out.Add($retObj)
}
}).TotalSeconds
$out2 = $out
horribly slow
$out = New-Object System.Collections.ArrayList
$Test = "RSJobs"
"RSJobs"
$ScriptBlock = {
[pscustomobject]@{
UserMail = 'test'
Type = 'test2'
Test = $Using:Test
}
}
(Measure-Command {
1..100000 | Start-RSJob -Name {$_} -ScriptBlock $ScriptBlock
$out.Add( $( Get-RSJob | Receive-RSJob) )
}).TotalSeconds
Creating a new runspace has overhead, so with many small jobs you pay that overhead every single time.
(measure-command {[pscustomobject]@{'a'='b'}}).totalmilliseconds
0.1773
(measure-command {start-rsjob -scriptblock {[pscustomobject]@{'a'='b'}}}).totalmilliseconds
93.0173
Then you are adding even more overhead retrieving all of the returned data from the individual jobs into one object, which was basically your goal in the first place.
Basically, build 1 object from 100,000 objects vs create a runspace 100,000 times each creating 1 object then return all of these objects to build 1 object from 100,000 objects.
I don't see how you are going to get any gain in efficiency using runspaces in this application. If there was an expensive calculation to determine each object, and then you made just a few runspaces and ran a subset of your array in each, maybe.
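The "few runspaces, each handling a subset of the array" idea could look roughly like the sketch below. It is an illustration under assumptions, not a benchmark: it uses Start-ThreadJob (bundled with PowerShell 7) rather than PoshRSJob, the batch count of 4 is arbitrary, and the Id property is added only so the objects differ. For objects this cheap a plain loop will most likely still win; batching only pays off when each item is expensive to compute.

```powershell
# Hypothetical workload: 100,000 inputs split into a handful of batches.
$items      = 1..100000
$batchCount = 4   # assumed number of worker threads
$batchSize  = [int][math]::Ceiling($items.Count / $batchCount)

$jobs = for ($b = 0; $b -lt $batchCount; $b++) {
    $start = $b * $batchSize
    $end   = [math]::Min($start + $batchSize, $items.Count) - 1
    $batch = $items[$start..$end]

    # One thread job per batch; (,$batch) passes the whole array as a single argument.
    Start-ThreadJob -ArgumentList (,$batch) -ScriptBlock {
        param($batch)
        foreach ($x in $batch) {
            [pscustomobject]@{ UserMail = 'test'; Type = 'test2'; Id = $x }
        }
    }
}

# Wait for all batches, gather every object, and clean up the jobs.
$out = $jobs | Receive-Job -Wait -AutoRemoveJob
$out.Count
```

The runspace startup cost is paid four times here instead of 100,000 times, which is the point the answer is making.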

Passing relative paths of scripts to powershell jobs

I have functions in separate files I need to run as jobs in one main file.
I need to be able to pass these functions arguments.
Right now my problem is figuring out how to pass the path of the function files to the jobs in a way that is not completely awful.
I need to have the functions defined at the top of the file for readability (just having a static comment that says "script uses somefunc.ps1" is not adequate)
I also need to refer to the scripts relative path (they will all be in the same folder).
Right now I am using env: to store the path of the scripts, but doing this I need to refer to the script in like 5 places!
This is what I have:
testJobsMain.ps1:
#Store path of functions in env so jobs can find them
$env:func1 = "$PSScriptRoot\func1.ps1"
$env:func2 = "$PSScriptRoot\func2.ps1"
$arrOutput = @()
$Jobs = @()
foreach($i in ('aaa','bbb','ccc') ) {
$Import = {. $env:func1}
$Execute = {func1 -myArg $Using:i}
$Jobs += Start-Job -InitializationScript $Import -ScriptBlock $Execute
}
$JobsOutput = $Jobs | Wait-Job | Receive-Job
$JobsOutput
$Jobs | Remove-Job
#Clean up env
Remove-Item env:\func1
$arrOutput
func1.ps1
function func1( $myArg ) { write-output $myArg }
func2.ps1
function func2( $blah ) { write-output $blah }
You can simply make an array of paths, and then pass one or all of them to the job through the -ArgumentList parameter of Start-Job:
#func1.ps1
function add($inp) {
return $inp + 1
}
#func2.ps1
function add($inp) {
return $inp + 2
}
$paths = "$PSScriptRoot\func1.ps1", "$PSScriptRoot\func2.ps1"
$i = 0
ForEach($singlePath in $paths) {
$Execute = {
Param(
[Parameter(Mandatory=$True, Position=1)]
[String]$path
)
Import-Module $path
return add 1
}
Start-Job -Name "Job$i" -ScriptBlock $Execute -ArgumentList $singlePath
$i++
}
for ($i = 0; $i -lt 2; $i++) {
Wait-Job "Job$i"
[int]$result = Receive-Job "Job$i"
}
You can skip all those $i iterators and names: PowerShell names jobs automatically and predictably (Job1, Job2, ...), which would make the code a lot prettier.

Utilize Results from Synchronized Hashtable (Runspacepool 6000+ clients)

I'm adapting a script to perform multiple functions, starting with Test-Connection to gather data. It will be hitting 6000+ machines, so I am using runspace pools adapted from the site below:
http://learn-powershell.net/2013/04/19/sharing-variables-and-live-objects-between-powershell-runspaces/
The data comes out as below, I would like to get it sorted into an array (I think that's the terminology), so I can sort the data via results. This will be adapted to multiple other functions pulling anything from Serial Numbers to IAVM data.
Is there any way I can use the comma delimited data and have it spit the Values below into columns? IE
Name IPAddress ResponseTime Subnet
x qwe qweeqwe qweqwe
The added values aren't so important at the moment, just the ability to add the values and pull them.
Name Value
---- -----
x-410ZWG \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-410ZWG",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-47045Q \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-47045Q",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-440J26 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-440J26",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-410Y45 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-410Y45",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-DJKVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DJKVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
nonexistant
x-DDMVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DDMVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-470481 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-470481",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-DHKVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DHKVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-430XXF \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-430XXF",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-DLKVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DLKVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-410S86 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-410S86",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-SCH004 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-SCH004",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-431KMS
x-440J22 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-440J22",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
Thanks for any help!
Code currently
Function Get-RunspaceData {
[cmdletbinding()]
param(
[switch]$Wait
)
Do {
$more = $false
Foreach($runspace in $runspaces) {
If ($runspace.Runspace.isCompleted) {
$runspace.powershell.EndInvoke($runspace.Runspace)
$runspace.powershell.dispose()
$runspace.Runspace = $null
$runspace.powershell = $null
} ElseIf ($runspace.Runspace -ne $null) {
$more = $true
}
}
If ($more -AND $PSBoundParameters['Wait']) {
Start-Sleep -Milliseconds 100
}
#Clean out unused runspace jobs
$temphash = $runspaces.clone()
$temphash | Where {
$_.runspace -eq $Null
} | ForEach {
Write-Verbose ("Removing {0}" -f $_.computer)
$Runspaces.remove($_)
}
Write-Host ("Remaining Runspace Jobs: {0}" -f ((@($runspaces | Where {$_.Runspace -ne $Null}).Count)))
} while ($more -AND $PSBoundParameters['Wait'])
}
#Begin
#What each runspace will do
$ScriptBlock = {
Param ($computer,$hash)
$Ping = test-connection $computer -count 1 -ea 0
$hash[$Computer]= $Ping
}
#Setup the runspace
$Script:runspaces = New-Object System.Collections.ArrayList
# Data table for all of the runspaces
$hash = [hashtable]::Synchronized(@{})
$sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$runspacepool = [runspacefactory]::CreateRunspacePool(1, 100, $sessionstate, $Host)
$runspacepool.Open()
#Process
ForEach ($Computer in $Computername) {
#Create the powershell instance and supply the scriptblock with the other parameters
$powershell = [powershell]::Create().AddScript($scriptBlock).AddArgument($computer).AddArgument($hash)
#Add the runspace into the powershell instance
$powershell.RunspacePool = $runspacepool
#Create a temporary collection for each runspace
$temp = "" | Select-Object PowerShell,Runspace,Computer
$Temp.Computer = $Computer
$temp.PowerShell = $powershell
#Save the handle output when calling BeginInvoke() that will be used later to end the runspace
$temp.Runspace = $powershell.BeginInvoke()
Write-Verbose ("Adding {0} collection" -f $temp.Computer)
$runspaces.Add($temp) | Out-Null
}
# Wait for all runspaces to finish
#End
Get-RunspaceData -Wait
$stoptimer = Get-Date
#Display info, and display in GridView
Write-Host
Write-Host "Availability check complete!" -ForegroundColor Cyan
"Execution Time: {0} Minutes" -f [math]::round(($stoptimer - $starttimer).TotalMinutes , 2)
$hash | ogv
When you use runspaces, you write the scriptblock for the runspace pretty much the same way you would for a function. You write whatever you want the return to be to the pipeline, and then either assign it to a variable, pipe it to another cmdlet or function, or just let it output to the console. The difference is that while the function returns its results automatically, with the runspace they collect in the runspace output buffer and aren't returned until you do the .EndInvoke() on the runspace handle.
As a general rule, the objective of a Powershell script is (or should be) to create objects, and the objective of using the runspaces is to speed up the process by multi-threading. You could return string data from the runspaces back to the main script and then use that to create objects there, but that's going to be a single threaded process. Do your object creation in the runspace, so that it's also multi-threaded.
Here's a sample script that uses a runspace pool to do a pingsweep of a class C subnet:
Param (
[int]$timeout = 200
)
$scriptPath = (Split-Path -Path $MyInvocation.MyCommand.Definition -Parent)
While (
($network -notmatch "\d{1,3}\.\d{1,3}\.\d{1,3}\.0") -and -not
($network -as [ipaddress])
)
{ $network = read-host 'Enter network to scan (ex. 10.106.31.0)' }
$scriptblock =
{
Param (
[string]$network,
[int]$LastOctet,
[int]$timeout
)
$options = new-object system.net.networkinformation.pingoptions
$options.TTL = 128
$options.DontFragment = $false
$buffer=([system.text.encoding]::ASCII).getbytes('a'*32)
$Address = $($network.trim("0")) + $LastOctet
$ping = new-object system.net.networkinformation.ping
$reply = $ping.Send($Address,$timeout,$buffer,$options)
Try { $hostname = ([System.Net.Dns]::GetHostEntry($Address)).hostname }
Catch { $hostname = 'No RDNS' }
if ( $reply.status -eq 'Success' )
{ $ping_result = 'Yes' }
else { $ping_result = 'No' }
[PSCustomObject]@{
Address = $Address
Ping = $ping_result
DNS = $hostname
}
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(100,100)
$RunspacePool.Open()
$Jobs =
foreach ( $LastOctet in 1..254 )
{
$Job = [powershell]::Create().
AddScript($ScriptBlock).
AddArgument($Network).
AddArgument($LastOctet).
AddArgument($Timeout)
$Job.RunspacePool = $RunspacePool
[PSCustomObject]@{
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host 'Working..' -NoNewline
Do {
Write-Host '.' -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host ' Done! Writing output file.'
Write-host "Output file is $scriptPath\$network.Ping.csv"
$(ForEach ($Job in $Jobs)
{ $Job.Pipe.EndInvoke($Job.Result) }) |
Export-Csv $scriptPath\$network.ping.csv -NoTypeInformation
$RunspacePool.Close()
$RunspacePool.Dispose()
The runspace script pings each address and attempts to resolve the host name from DNS. Then it builds a custom object from that data, which is output to the pipeline. At the end, those objects are returned when the .EndInvoke() is done on the runspace jobs and piped directly into Export-CSV, but they could just as easily be output to the console or saved into a variable.
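To illustrate the "saved into a variable" case and the column layout asked for in the question, here is a minimal sketch that swaps the ping for a made-up workload (squaring a number). The property names Input and Square and the pool size of 4 are assumptions for the example only.

```powershell
# Small runspace pool; each runspace emits one object to its output buffer.
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

$work = foreach ($n in 1..5) {
    $ps = [powershell]::Create().AddScript({
        param($n)
        [pscustomobject]@{ Input = $n; Square = $n * $n }
    }).AddArgument($n)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ Pipe = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke() blocks until the runspace finishes and returns the buffered objects.
$results = foreach ($w in $work) {
    $w.Pipe.EndInvoke($w.Handle)
    $w.Pipe.Dispose()
}
$pool.Close()
$pool.Dispose()

# Because these are objects, they sort and format as columns like any other output.
$results | Sort-Object Square -Descending | Format-Table -AutoSize
```

Returning objects from each runspace this way avoids having to parse a string representation out of a shared hashtable afterwards.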

Resources