Basic multithreaded Consumers algorithm in PowerShell - multithreading

I want to create a queue of ffmpeg commands and then let N threads consume the queue, each launching an instance of ffmpeg with the dequeued parameters.
Here is some code:
# loop over $items to prepare the parameters
foreach($input in $items){
    {...}
    # add a parameter string to the queue
    $global:jobsQueue.Enqueue($ffmpegParam)
}
# code block to be executed in a different thread
$block = {
    Param($queue, $ffmpegDir)
    $params = $null
    while($true){
        # TryDequeue hands the item back through a [ref] argument
        if($queue.TryDequeue([ref]$params)){
            $pinfo = New-Object System.Diagnostics.ProcessStartInfo($ffmpegDir, $params)
            $pinfo.UseShellExecute = $false
            $pinfo.CreateNoWindow = $true
            $p = New-Object System.Diagnostics.Process
            $p.StartInfo = $pinfo
            $p.Start()
            $p.WaitForExit()
        } else {break}
    }
}
# miserably failing to start the previous code block
for($i = 0; $i -lt 1; $i++){
    Start-Job -Name "process $i" -ScriptBlock $block -ArgumentList $global:jobsQueue,$ffmpegDir
}
A job is actually started, but it does nothing and I don't understand why. I read these pages but wasn't able to come up with a solution:
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/start-job?view=powershell-5.1
How do I Start a job of a function i just defined?
https://social.technet.microsoft.com/Forums/scriptcenter/en-US/b68c1c68-e0f0-47b7-ba9f-749d06621a2c/calling-a-function-using-startjob
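One likely reason the job appears to do nothing: Start-Job runs the script block in a separate process, so the queue passed through -ArgumentList arrives as a serialized copy rather than as the live queue object. A minimal sketch of the same consumer idea with a runspace pool, where all threads share the live queue, might look like the following. It assumes the queue is a ConcurrentQueue; the sample parameter strings, the thread count and the placeholder "work" are hypothetical and stand in for the real ffmpeg invocation.

```powershell
# Hypothetical work items standing in for the ffmpeg parameter strings.
$queue = [System.Collections.Concurrent.ConcurrentQueue[string]]::new()
'-i a.mp4 a.mkv', '-i b.mp4 b.mkv', '-i c.mp4 c.mkv', '-i d.mp4 d.mkv' |
    ForEach-Object { $queue.Enqueue($_) }

# Consumer logic: keep dequeuing until the queue is empty.
$consumer = {
    param($queue)
    $params = $null
    # TryDequeue needs a [ref] argument to hand the item back.
    while ($queue.TryDequeue([ref]$params)) {
        # A real consumer would launch ffmpeg here and wait for it to exit.
        "processed: $params"
    }
}

$threadCount = 2   # assumption: N = 2 consumers
$pool = [runspacefactory]::CreateRunspacePool(1, $threadCount)
$pool.Open()

# Start one consumer per thread; every consumer receives the same live queue.
$workers = foreach ($i in 1..$threadCount) {
    $ps = [powershell]::Create().AddScript($consumer).AddArgument($queue)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ Instance = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke() blocks until the consumer finishes and returns its output.
$processed = foreach ($w in $workers) {
    $w.Instance.EndInvoke($w.Handle)
    $w.Instance.Dispose()
}
$pool.Dispose()
$processed
```

Because the runspaces live in the same process, TryDequeue operates on the one shared queue, so each item is handed to exactly one consumer.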

Related

How to search for a string in multiple text files in an active running log

I am trying to search for a string in multiple text files to trigger an event. The log file is being actively added to by a program. The following script successfully achieves that goal, but it only works for one text file at a time:
$PSDefaultParameterValues = @{"Get-Date:format"="yyyy-MM-dd HH:mm:ss"}
Get-Content -Path "C:\Log 0.txt" -Tail 1 -Wait | ForEach-Object {
    If ($_ -match 'keyword') {
        Write-Host "Down : $_" -ForegroundColor Green
        Add-Content "C:\log.txt" "$(Get-Date) down"
    }
}
Unfortunately, this means I have to run three instances of this script to search the three log files (C:\log 0.txt, C:\log 1.txt and C:\log 2.txt).
What I want is to run one PowerShell script that searches for that string across all three text files, not three scripts.
I tried using a wildcard in the path ("C:\log*.txt").
I also tried adding a foreach loop:
$PSDefaultParameterValues = @{"Get-Date:format"="yyyy-MM-dd HH:mm:ss"}
$LogGroup = ('C:\log 0.txt', 'C:\Log 1.txt', 'C:\Log 2.txt')
ForEach ($log in $LogGroup) {
    Get-Content $log -Tail 1 -Wait | ForEach-Object {
        If ($_ -match 'keyword') {
            Write-Host "Down: $_" -ForegroundColor Green
            Add-Content -Path "C:\log.txt" "$(Get-Date) down"
            Add-Content -Path "C:\log.txt" "$(Get-Date) down"
        }
    }
}
This got me no errors but it also didn't work.
I saw others use Get-ChildItem instead of Get-Content but since this worked with one file... shouldn't it work with multiple? I assume it's my lack of scripting ability. Any help would be appreciated. Thanks.
This is how you can apply the logic you already have for one file to multiple logs at the same time. Your foreach attempt never got past the first file because Get-Content -Wait blocks indefinitely. The idea is to spawn as many PowerShell instances as there are log paths in the $LogGroup array. Each instance monitors one log path and, when the keyword is matched, appends to the main log file.
The instances share the same RunspacePool, which lets us initialize all of them with a single SemaphoreSlim instance that ensures thread safety (only one thread can write to the main log at a time).
using namespace System.Management.Automation.Runspaces
using namespace System.Threading
# get the log files here
$LogGroup = ('C:\log 0.txt', 'C:\Log 1.txt', 'C:\Log 2.txt')
# this helps us write to the main log file in a thread safe manner
$lock = [SemaphoreSlim]::new(1, 1)
# define the logic used for each thread, this is very similar to the
# initial script except for the use of the SemaphoreSlim
$action = {
    param($path)
    $PSDefaultParameterValues = @{ "Get-Date:format" = "yyyy-MM-dd HH:mm:ss" }
    Get-Content $path -Tail 1 -Wait | ForEach-Object {
        if($_ -match 'down') {
            # can I write to this file?
            $lock.Wait()
            try {
                Write-Host "Down: $_ - $path" -ForegroundColor Green
                Add-Content "path\to\mainLog.txt" -Value "$(Get-Date) Down: $_ - $path"
            }
            finally {
                # release the lock so other threads can write to the file
                $null = $lock.Release()
            }
        }
    }
}
try {
    $iss = [initialsessionstate]::CreateDefault2()
    $iss.Variables.Add([SessionStateVariableEntry]::new('lock', $lock, $null))
    $rspool = [runspacefactory]::CreateRunspacePool(1, $LogGroup.Count, $iss, $Host)
    $rspool.ApartmentState = [ApartmentState]::STA
    $rspool.ThreadOptions = [PSThreadOptions]::UseNewThread
    $rspool.Open()
    $res = foreach($path in $LogGroup) {
        $ps = [powershell]::Create($iss).AddScript($action).AddArgument($path)
        $ps.RunspacePool = $rspool
        @{
            Instance = $ps
            AsyncResult = $ps.BeginInvoke()
        }
    }
    # block the main thread
    do {
        $id = [WaitHandle]::WaitAny($res.AsyncResult.AsyncWaitHandle, 200)
    }
    while($id -eq [WaitHandle]::WaitTimeout)
}
finally {
    # clean all the runspaces
    $res.Instance.ForEach('Dispose')
    $rspool.ForEach('Dispose')
}

Powershell concurrency with Start-ThreadJob and ForEach-Object -Parallel

I've been trying to implement a producer-consumer pattern with multiple producers using BlockingCollection<>, Start-ThreadJob and ForEach-Object -Parallel. The results were mixed. Some code runs, some freezes and some just crashes powershell. So I'm thinking, I must be doing something fundamentally wrong:
using namespace System.Collections.Concurrent
class TestProducerConsumer
{
    [int] $result = 0
    [BlockingCollection[int]] $queue =
        [BlockingCollection[int]]::new()
    [void] producer([int]$i) { $this.queue.Add($i) }
    [void] consumer() {
        $sum = 0
        $it = $this.queue.GetConsumingEnumerable()
        foreach( $i in $it ) { $sum += $i }
        $this.result = $sum
    }
    [void] Run() {
        $job = Start-ThreadJob { ($using:this).consumer() }
        1..10 | ForEach-Object -Parallel {
            #($using:this).producer($_) # freezing
            ($using:this).queue.Add($_) # working
        }
        #Start-Sleep -Seconds 1 # freezing
        $this.queue.CompleteAdding()
        $job | Receive-Job -Wait
    }
}
$t = [TestProducerConsumer]::new(); $t.Run(); $t
In the simplified test case above, there are two lines doing the same thing: One is getting the queue member from the instance and adding the value directly; the other is calling a method on the instance to add the value to the queue. The former works, the latter freezes!?
Also, adding back in the line with Start-Sleep freezes the process.
Tested on Windows 10 with various PowerShell 7.* versions.
EDIT: Probably related to ForEach-Object -Parallel situationally drops pipeline input and similar issues
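A possible explanation, offered as an assumption rather than a confirmed diagnosis: methods of a PowerShell class are bound to the runspace that created the instance, so a call to producer() or consumer() from another thread has to be marshaled back to the original runspace, which is busy inside Run(). Calling Add() on the .NET collection directly does not go through a PowerShell method. The sketch below sidesteps the class entirely and lets thread jobs touch only the BlockingCollection; it requires PowerShell 7 for Start-ThreadJob, and the variable names are illustrative.

```powershell
# Shared, thread-safe collection; no PowerShell class methods are involved.
$queue = [System.Collections.Concurrent.BlockingCollection[int]]::new()

# Consumer: sums items until CompleteAdding() is called and the queue drains.
$consumer = Start-ThreadJob {
    $q = $using:queue
    $sum = 0
    foreach ($i in $q.GetConsumingEnumerable()) { $sum += $i }
    $sum
}

# Producers: one thread job per value, each adding directly to the collection.
$producers = foreach ($n in 1..10) {
    Start-ThreadJob { ($using:queue).Add($using:n) }
}
$producers | Wait-Job | Remove-Job

# Signal that no more items will arrive, then collect the consumer's result.
$queue.CompleteAdding()
$total = $consumer | Receive-Job -Wait -AutoRemoveJob
$total
```

Thread jobs run in the same process, so `$using:queue` refers to the one live collection rather than a copy.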

Passing relative paths of scripts to powershell jobs

I have functions in separate files I need to run as jobs in one main file.
I need to be able to pass these functions arguments.
Right now my problem is figuring out how to pass the path of the function files to the jobs in a way that is not completely awful.
I need to have the functions defined at the top of the file for readability (just having a static comment that says "script uses somefunc.ps1" is not adequate)
I also need to refer to the scripts relative path (they will all be in the same folder).
Right now I am using env: to store the path of the scripts, but doing this I need to refer to the script in like 5 places!
This is what I have:
testJobsMain.ps1:
#Store path of functions in env so jobs can find them
$env:func1 = "$PSScriptRoot\func1.ps1"
$env:func2 = "$PSScriptRoot\func2.ps1"
$arrOutput = @()
$Jobs = @()
foreach($i in ('aaa','bbb','ccc') ) {
    $Import = {. $env:func1}
    $Execute = {func1 -myArg $Using:i}
    $Jobs += Start-Job -InitializationScript $Import -ScriptBlock $Execute
}
$JobsOutput = $Jobs | Wait-Job | Receive-Job
$JobsOutput
$Jobs | Remove-Job
#Clean up env
Remove-Item env:\func1
$arrOutput
func1.ps1
function func1( $myArg ) { write-output $myArg }
func2.ps1
function func2( $blah ) { write-output $blah }
You can simply build an array of paths and then pass one (or all) of them to the job through the -ArgumentList parameter of Start-Job:
#func1.ps1
function add($inp) {
    return $inp + 1
}
#func2.ps1
function add($inp) {
    return $inp + 2
}
#main script
$paths = "$PSScriptRoot\func1.ps1", "$PSScriptRoot\func2.ps1"
$i = 0
ForEach($singlePath in $paths) {
    $Execute = {
        Param(
            [Parameter(Mandatory=$True, Position=1)]
            [String]$path
        )
        # load the function file inside the job, then call the function
        Import-Module $path
        return add 1
    }
    Start-Job -Name "Job$i" -ScriptBlock $Execute -ArgumentList $singlePath
    $i++
}
for ($i = 0; $i -lt 2; $i++) {
    Wait-Job -Name "Job$i"
    [int]$result = Receive-Job -Name "Job$i"
}
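You can skip all the $i counters and explicit names: PowerShell names jobs automatically. The generated names are not always consecutive, so it is cleaner to keep the job objects that Start-Job returns than to rely on the names. That makes the code a lot prettier.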
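A minimal sketch of that simplification, assuming trivial inline script blocks instead of the func1.ps1/func2.ps1 files (the input values are hypothetical): the job objects returned by Start-Job are collected and piped to Wait-Job and Receive-Job, so no names or counters are needed.

```powershell
# Start one job per input value and keep the returned job objects.
$jobs = foreach ($n in 1, 2, 3) {
    Start-Job -ScriptBlock { param($value) $value + 1 } -ArgumentList $n
}

# Wait for every job, collect the output, then clean up.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```

Each job returns its input plus one, so `$results` holds 2, 3 and 4.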

Powershell runspaces won't execute

I'm at a bit of a loss with the script I am trying to pull off.
In short: I want to scan my domain computers for WinRM connectivity, and I can do that just fine. The problem is that it takes up to 5 minutes to finish, which is why I want to multithread the task.
Working NON MULTITHREAD code:
# this is normally a text file with lots of machine hostnames
$computers = "PC100","PC106","PC124","PC115","PC21"
function checkMachine($computers){
    $ErrorActionPreference = "Stop"
    foreach ($item in $computers){
        # the function ConTest only performs a ping and returns $true or $false
        $connection = ConTest($item)
        if($connection){
            try{
                $winRM = test-wsman -ComputerName $item
                if($winRM){
                    write-host "winRM"
                    [void] $objListboxLeft.Items.Add($item)
                }
            }catch{
                write-host "NO winRM"
                [void] $objListboxCenter.Items.Add($item)
            }
        }else{
            write-host "offline"
            [void] $objListboxRight.Items.Add($item)
        }
    }
}
This is just a small portion of what my script does (or will do), but it's the part that takes ages.
My runspace test fails: I get no results at all. Nothing in the text boxes, no output on my command line, and I have no idea what I am doing wrong.
Multithread code:
function MulticheckMachine($computers){
    $ErrorActionPreference = "Stop"
    $runspaceCollection = @()
    $runspacePool = [RunspaceFactory]::CreateRunspacePool(1,5)
    $runspacePool.open()
    $scriptBlock = {
        Param($item)
        $connection = ConTest($item)
        if($connection){
            try{
                test-wsman -ComputerName $item
                $winRM = test-wsman -ComputerName $item
                if($winRM){
                    write-host "winRM"
                    [void] $objListboxLeft.Items.Add($item)
                }
            }catch{
                write-host "NO winRM"
                [void] $objListboxCenter.Items.Add($item)
            }
        }else{
            write-host "offline"
            [void] $objListboxRight.Items.Add($item)
        }
    }
    Foreach($item in $computers){
        $powershell = [PowerShell]::Create().AddScript($scriptBlock).AddArgument($item)
        $powershell.runspacePool = $runspacePool
        [Collections.Arraylist]$runspaceCollection += New-Object -TypeName PSObject -Property @{
            Runspace = $powershell.BeginInvoke()
            PowerShell = $powershell
        }
        $runspaceCollection
    }
    While($runspaceCollection){
        Foreach($runspace in $runspaceCollection.ToArray()){
            If($runspace.Runspace.IsCompleted){
                $runspace.PowerShell.EndInvoke($runspace.Runspace)
                $runspace.PowerShell.Dispose()
                $runspaceCollection.Remove($runspace)
            }
        }
    }
}
The runspace code comes from a mix of these guides:
http://blogs.technet.com/b/heyscriptingguy/archive/2013/09/29/weekend-scripter-max-out-powershell-in-a-little-bit-of-time-part-2.aspx
http://newsqlblog.com/2012/05/22/concurrency-in-powershell-multi-threading-with-runspaces/
I hope someone can help me out and tell me where/why I fail. Thanks!
Well, thanks for the hints, but the problem was far more basic.
I was trying to get my data at the wrong place. I also simplified my script a bit: I no longer call functions within functions.
Note 1: I did not realize that I can (and need to) work with return values within the script block for the runspace.
Note 2: I now collect my data and insert it into my list boxes (or wherever else I want it) at the end of my function, within the while loop where I tear down my runspaces.
Note 3: All the GUI parts I reference are located in a different file and do exist!
I got the duration down to roughly 20 seconds (from almost 5 minutes before).
The number of threads I use is somewhat arbitrary; it's one of the combinations that works fastest.
Code:
function multiCheckMachine($computers){
    $ErrorActionPreference = "Stop"
    $runspaceCollection = @()
    $runspacePool = [RunspaceFactory]::CreateRunspacePool(1,50)
    $runspacePool.open()
    $scriptBlock = {
        Param($item)
        $FQDNitem = "$item.domain.com"
        $address = nslookup $FQDNitem
        if($address -like "addresses*"){
            $address = $address[5] -replace ".* ",""
        }else{
            $address = $address[4] -replace ".* ",""
        }
        $con = ping -n 1 $address
        if($con[2] -like "*Bytes*"){
            $winRM = test-wsman -ComputerName $item
            if($winRM){
                return "$item.winRM"
            }else{
                return "$item.NOremote"
            }
        }else{
            return "$item.offline"
        }
    }
    Foreach($item in $computers){
        $powershell = [PowerShell]::Create().AddScript($scriptBlock).AddArgument($item)
        $powershell.runspacePool = $runspacePool
        [Collections.Arraylist]$runspaceCollection += New-Object -TypeName PSObject -Property @{
            Runspace = $powershell.BeginInvoke()
            PowerShell = $powershell
        }
    }
    While($runspaceCollection){
        Foreach($runspace in $runspaceCollection.ToArray()){
            If($runspace.Runspace.IsCompleted){
                # collect the "<name>.<status>" result once and reuse it
                $result = $runspace.PowerShell.EndInvoke($runspace.Runspace)
                if($result -like "*winrm"){
                    [void] $objListboxOnline.Items.Add($result.split(".")[0])
                }elseif($result -like "*NOremote"){
                    [void] $objListboxNoWinRM.Items.Add($result.split(".")[0])
                }elseif($result -like "*offline"){
                    [void] $objListboxOffline.Items.Add($result.split(".")[0])
                }
                $runspace.PowerShell.Dispose()
                $runspaceCollection.Remove($runspace)
            }
        }
    }
}

Utilize Results from Synchronized Hashtable (Runspacepool 6000+ clients)

I am adapting a script to perform multiple functions, starting with Test-Connection to gather data. It will be hitting 6000+ machines, so I am using runspace pools adapted from the site below:
http://learn-powershell.net/2013/04/19/sharing-variables-and-live-objects-between-powershell-runspaces/
The data comes out as shown below. I would like to get it into an array (I think that's the terminology) so I can sort the data by result. This will be adapted to multiple other functions pulling anything from serial numbers to IAVM data.
Is there any way I can use the comma-delimited data and have it split the values below into columns? For example:
Name IPAddress ResponseTime Subnet
x qwe qweeqwe qweqwe
The added values aren't so important at the moment, just the ability to add the values and pull them.
Name     Value
----     -----
x-410ZWG \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-410ZWG",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-47045Q \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-47045Q",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-440J26 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-440J26",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-410Y45 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-410Y45",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-DJKVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DJKVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
nonexistant
x-DDMVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DDMVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-470481 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-470481",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-DHKVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DHKVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-430XXF \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-430XXF",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-DLKVV1 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-DLKVV1",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-410S86 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-410S86",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-SCH004 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-SCH004",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
x-431KMS
x-440J22 \\x-DHMVV1\root\cimv2:Win32_PingStatus.Address="x-440J22",BufferSize=32,NoFragmentation=false,RecordRoute=0,…
Thanks for any help!
Code currently:
Function Get-RunspaceData {
    [cmdletbinding()]
    param(
        [switch]$Wait
    )
    Do {
        $more = $false
        Foreach($runspace in $runspaces) {
            If ($runspace.Runspace.isCompleted) {
                $runspace.powershell.EndInvoke($runspace.Runspace)
                $runspace.powershell.dispose()
                $runspace.Runspace = $null
                $runspace.powershell = $null
            } ElseIf ($runspace.Runspace -ne $null) {
                $more = $true
            }
        }
        If ($more -AND $PSBoundParameters['Wait']) {
            Start-Sleep -Milliseconds 100
        }
        #Clean out unused runspace jobs
        $temphash = $runspaces.clone()
        $temphash | Where {
            $_.runspace -eq $Null
        } | ForEach {
            Write-Verbose ("Removing {0}" -f $_.computer)
            $Runspaces.remove($_)
        }
        Write-Host ("Remaining Runspace Jobs: {0}" -f ((@($runspaces | Where {$_.Runspace -ne $Null}).Count)))
    } while ($more -AND $PSBoundParameters['Wait'])
}
#Begin
#What each runspace will do
$ScriptBlock = {
    Param ($computer,$hash)
    $Ping = test-connection $computer -count 1 -ea 0
    $hash[$Computer]= $Ping
}
#Setup the runspace
$Script:runspaces = New-Object System.Collections.ArrayList
# Data table for all of the runspaces
$hash = [hashtable]::Synchronized(@{})
$sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$runspacepool = [runspacefactory]::CreateRunspacePool(1, 100, $sessionstate, $Host)
$runspacepool.Open()
#Process
ForEach ($Computer in $Computername) {
    #Create the powershell instance and supply the scriptblock with the other parameters
    $powershell = [powershell]::Create().AddScript($scriptBlock).AddArgument($computer).AddArgument($hash)
    #Add the runspace into the powershell instance
    $powershell.RunspacePool = $runspacepool
    #Create a temporary collection for each runspace
    $temp = "" | Select-Object PowerShell,Runspace,Computer
    $Temp.Computer = $Computer
    $temp.PowerShell = $powershell
    #Save the handle output when calling BeginInvoke() that will be used later to end the runspace
    $temp.Runspace = $powershell.BeginInvoke()
    Write-Verbose ("Adding {0} collection" -f $temp.Computer)
    $runspaces.Add($temp) | Out-Null
}
# Wait for all runspaces to finish
#End
Get-RunspaceData -Wait
$stoptimer = Get-Date
#Display info, and display in GridView
Write-Host
Write-Host "Availability check complete!" -ForegroundColor Cyan
"Execution Time: {0} Minutes" -f [math]::round(($stoptimer - $starttimer).TotalMinutes , 2)
$hash | ogv
When you use runspaces, you write the scriptblock for the runspace pretty much the same way you would for a function. You write whatever you want the return to be to the pipeline, and then either assign it to a variable, pipe it to another cmdlet or function, or just let it output to the console. The difference is that while a function returns its results automatically, with a runspace they collect in the runspace output buffer and aren't returned until you call .EndInvoke() on the runspace handle.
As a general rule, the objective of a Powershell script is (or should be) to create objects, and the objective of using the runspaces is to speed up the process by multi-threading. You could return string data from the runspaces back to the main script and then use that to create objects there, but that's going to be a single threaded process. Do your object creation in the runspace, so that it's also multi-threaded.
Here's a sample script that uses a runspace pool to do a pingsweep of a class C subnet:
Param (
    [int]$timeout = 200
)
$scriptPath = (Split-Path -Path $MyInvocation.MyCommand.Definition -Parent)
While (
    ($network -notmatch "\d{1,3}\.\d{1,3}\.\d{1,3}\.0") -and -not
    ($network -as [ipaddress])
)
{ $network = read-host 'Enter network to scan (ex. 10.106.31.0)' }
$scriptblock =
{
    Param (
        [string]$network,
        [int]$LastOctet,
        [int]$timeout
    )
    $options = new-object system.net.networkinformation.pingoptions
    $options.TTL = 128
    $options.DontFragment = $false
    $buffer=([system.text.encoding]::ASCII).getbytes('a'*32)
    $Address = $($network.trim("0")) + $LastOctet
    $ping = new-object system.net.networkinformation.ping
    $reply = $ping.Send($Address,$timeout,$buffer,$options)
    Try { $hostname = ([System.Net.Dns]::GetHostEntry($Address)).hostname }
    Catch { $hostname = 'No RDNS' }
    if ( $reply.status -eq 'Success' )
    { $ping_result = 'Yes' }
    else { $ping_result = 'No' }
    [PSCustomObject]@{
        Address = $Address
        Ping = $ping_result
        DNS = $hostname
    }
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(100,100)
$RunspacePool.Open()
$Jobs =
foreach ( $LastOctet in 1..254 )
{
    $Job = [powershell]::Create().
        AddScript($ScriptBlock).
        AddArgument($Network).
        AddArgument($LastOctet).
        AddArgument($Timeout)
    $Job.RunspacePool = $RunspacePool
    [PSCustomObject]@{
        Pipe = $Job
        Result = $Job.BeginInvoke()
    }
}
Write-Host 'Working..' -NoNewline
Do {
    Write-Host '.' -NoNewline
    Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host ' Done! Writing output file.'
Write-host "Output file is $scriptPath\$network.Ping.csv"
$(ForEach ($Job in $Jobs)
    { $Job.Pipe.EndInvoke($Job.Result) }) |
    Export-Csv $scriptPath\$network.ping.csv -NoTypeInformation
$RunspacePool.Close()
$RunspacePool.Dispose()
The runspace script pings each address and attempts to resolve the host name from DNS. It then builds a custom object from that data and writes it to the pipeline. At the end, those objects are returned when .EndInvoke() is called on the runspace jobs, and they are piped directly into Export-Csv, but they could just as easily be output to the console or saved into a variable.
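The same idea in a smaller form, as a sketch rather than a drop-in replacement for the script in the question: each runspace builds a custom object with named properties, and the objects only reach the main script when EndInvoke() is called. The input names and the NameLength property are hypothetical placeholders for real per-machine data such as an IP address or response time.

```powershell
# Work done in each runspace: build the object there, not in the main thread.
$scriptBlock = {
    param([string]$name)
    [pscustomobject]@{
        Name       = $name
        NameLength = $name.Length   # placeholder for real per-machine data
    }
}

$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Queue one PowerShell instance per input and keep its async handle.
$jobs = foreach ($name in 'alpha', 'beta', 'gamma') {
    $ps = [powershell]::Create().AddScript($scriptBlock).AddArgument($name)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ Pipe = $ps; Result = $ps.BeginInvoke() }
}

# EndInvoke() waits for completion and releases the buffered output objects.
$objects = foreach ($job in $jobs) {
    $job.Pipe.EndInvoke($job.Result)
    $job.Pipe.Dispose()
}
$pool.Close()
$pool.Dispose()

# The results are ordinary objects, so they can be sorted or shown as columns.
$objects | Sort-Object NameLength
```

Because each result carries named properties, cmdlets such as Sort-Object, Format-Table or Export-Csv can treat them as columns without any string parsing.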

Resources