I want to parallelize some file-parsing actions with network activity in PowerShell. A quick Google search suggested start-thread as a solution, but:
The term 'start-thread' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
The same thing happened when I tried start-job.
I also tried fiddling around with System.Threading.Thread
[System.Reflection.Assembly]::LoadWithPartialName("System.Threading")
#This next line errors; something about the arguments that I can't figure out from the .NET documentation
$tstart = new-object System.Threading.ThreadStart({DoSomething})
$thread = new-object System.Threading.Thread($tstart)
$thread.Start()
So, I think the best thing would be to know what I'm doing wrong when I use start-thread, because it seems to work for other people. I'm using v2.0 and I don't need backward compatibility.
PowerShell does not have a built-in command named Start-Thread.
V2.0 does, however, have PowerShell jobs, which can run in the background, and can be considered the equivalent of a thread. You have the following commands at your disposal for working with jobs:
Name        Category  Synopsis
----        --------  --------
Start-Job   Cmdlet    Starts a Windows PowerShell background job.
Get-Job     Cmdlet    Gets Windows PowerShell background jobs that are running in the current session.
Receive-Job Cmdlet    Gets the results of the Windows PowerShell background jobs in the current session.
Stop-Job    Cmdlet    Stops a Windows PowerShell background job.
Wait-Job    Cmdlet    Suppresses the command prompt until one or all of the Windows PowerShell background jobs running in the session are complete.
Remove-Job  Cmdlet    Deletes a Windows PowerShell background job.
Here is an example of how to work with jobs. To start a job, use Start-Job and pass a script block containing the code you want to run asynchronously:
$job = start-job { get-childitem . -recurse }
This command starts a job that recursively gets all items under the current directory, and you are returned to the command line immediately.
You can examine the $job variable to see if the job has finished, etc. If you want to wait for a job to finish, use:
wait-job $job
Finally, to receive the results from a job, use:
receive-job $job
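Putting those pieces together for the original goal, here is a minimal sketch that runs file parsing and network activity as two concurrent jobs (the path, pattern, and host are illustrative placeholders, not your actual workload):

# Two placeholder jobs: parse files and do network work in parallel
$parseJob = Start-Job { Get-ChildItem C:\data -Recurse | Select-String 'ERROR' }
$netJob = Start-Job { Test-Connection example.com -Count 4 }

# Block until both jobs finish, then collect their output and clean up
Wait-Job $parseJob, $netJob | Out-Null
$parseResults = Receive-Job $parseJob
$netResults = Receive-Job $netJob
Remove-Job $parseJob, $netJob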
You can't use threads directly like this, but you can't be blamed for trying since once the whole BCL is lying in front of you it's not entirely silly to expect most of it to work :)
PowerShell runs script blocks in pipelines, which in turn require runspaces to execute them. I blogged about how to roll your own multithreaded scripts some time ago for v2 CTP3, but the technique (and API) is still the same. The main tools are the [runspacefactory] and [powershell] types. Take a look here:
http://www.nivot.org/2009/01/22/CTP3TheRunspaceFactoryAndPowerShellAccelerators.aspx
The above is the most lightweight way to approach multithreaded scripting. There is background job support in v2 by way of Start-Job and Get-Job, but I figured you had already spotted that and seen that jobs are fairly heavyweight.
The thing that comes closest to threads and is way more performant than jobs is PowerShell runspaces.
Here is a very basic example:
# the number of threads
$count = 10
# the pool will manage the parallel execution
$pool = [RunspaceFactory]::CreateRunspacePool(1, $count)
$pool.Open()
try {
    # create and run the jobs to be run in parallel
    $jobs = New-Object object[] $count
    for ($i = 0; $i -lt $count; $i++) {
        $ps = [PowerShell]::Create()
        $ps.RunspacePool = $pool
        # add the script block to run
        [void]$ps.AddScript({
            param($Index)
            Write-Output "Index: $Index"
        })
        # optional: add parameters
        [void]$ps.AddParameter("Index", $i)
        # start async execution
        $jobs[$i] = [PSCustomObject]@{
            PowerShell  = $ps
            AsyncResult = $ps.BeginInvoke()
        }
    }
    foreach ($job in $jobs) {
        try {
            # wait for completion
            [void]$job.AsyncResult.AsyncWaitHandle.WaitOne()
            # get results
            $job.PowerShell.EndInvoke($job.AsyncResult)
        }
        finally {
            $job.PowerShell.Dispose()
        }
    }
}
finally {
    $pool.Dispose()
}
It also allows you to do more advanced things, such as:
- throttle the number of parallel runspaces on the pool
- import functions and variables from the current session (see the sketch below)
- etc.
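For instance, importing a function from the current session into every runspace in the pool can be done through an InitialSessionState. This is a minimal sketch; Get-Data is a hypothetical stand-in for whatever function you want the runspaces to see:

# A function defined in the current session (hypothetical example)
function Get-Data { param($n) "data $n" }

# Build an initial session state that carries the function definition
$iss = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$entry = New-Object System.Management.Automation.Runspaces.SessionStateFunctionEntry -ArgumentList 'Get-Data', ${function:Get-Data}.ToString()
$iss.Commands.Add($entry)

# Every runspace drawn from this pool now has Get-Data available
$pool = [RunspaceFactory]::CreateRunspacePool(1, 4, $iss, $Host)
$pool.Open()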
Nowadays the answer is quite simple: use the ThreadJob module, as described in the Microsoft docs.
Install-Module -Name ThreadJob -Confirm:$true
$Job1 = Start-ThreadJob `
    -FilePath $YourThreadJob `
    -ArgumentList @("A", "B")
$Job1 | Get-Job
$Job1 | Receive-Job
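Start-ThreadJob also accepts a script block directly and can throttle concurrency via -ThrottleLimit, which maps nicely onto the file-parsing-plus-network scenario. A small sketch (the file names are illustrative):

# Run up to 5 thread jobs at once, one per input file
$jobs = 'a.log', 'b.log', 'c.log' | ForEach-Object {
    Start-ThreadJob -ScriptBlock { param($file) Get-Content $file } -ArgumentList $_ -ThrottleLimit 5
}
$jobs | Wait-Job | Receive-Job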
I'm currently learning about runspaces in PowerShell (my end goal is to set up a job scheduling system). To do this, I wrote a basic script in order to learn and use runspaces.
What I Expected To Happen:
I expected that when I ran the code up to the commented line, it would queue up the 8 jobs and run them within the RunspacePool, a maximum of 2 at a time.
Running the single line $JobList.AsynchronousObject a few times, I should then see more and more IsCompleted flags turning from false to true as the jobs complete, since they take 20 seconds each due to the Start-Sleep command.
The BeginInvoke command apparently returns an object implementing the IAsyncResult interface.
https://learn.microsoft.com/en-us/dotnet/api/system.iasyncresult?redirectedfrom=MSDN&view=netframework-4.8#examples
The IAsyncResult remarks mention polling the IsCompleted property to see whether an asynchronous operation has completed, which, although not ideal, is what I was trying to do below for learning purposes.
Actual:
All the IsCompleted flags are true a second after running the top portion of the code, which is not what I expected.
Question:
Does the IsCompleted flag represent just whether the script has started executing, and maybe that is why they're all true a second after queuing up?
I'm grateful for any assistance or references to further reading anyone is able to provide.
Many Thanks
Nick
#Set up runspace pool
$RunspacePool = [runspacefactory]::CreateRunspacePool()
$RunspacePool.SetMinRunspaces(1)
$RunspacePool.SetMaxRunspaces(2)

#Create an ArrayList to hold references to all the instances running jobs
$JobList = New-Object System.Collections.ArrayList

#Queue up 8 jobs that will take 20 seconds each to complete
#Add the job details to the list so I can poll its IsCompleted property
$RunspacePool.Open()
1..8 | ForEach {
    Write-Verbose "Counter: $_" -Verbose
    $PowershellInstance = [powershell]::Create()
    $PowershellInstance.RunspacePool = $RunspacePool
    [void]$PowershellInstance.AddScript({
        Start-Sleep -Seconds 20
        $ThreadID = [appdomain]::GetCurrentThreadId()
        Write-Verbose "$ThreadID thread completed" -Verbose
    })
    $AsynchronousObject = $PowershellInstance.BeginInvoke()
    $JobList.Add(([PSCustomObject]@{
        Id = $_
        PowerShellInstance = $PowershellInstance
        AsynchronousObject = $AsynchronousObject
    }))
}
#----------------------------------------------
#List IsCompleted should show true as jobs become complete
$JobList.AsynchronousObject
#Clean up
$RunspacePool.Close()
$RunspacePool.Dispose()
There is no issue. You are forgetting what asynchronous really means.
When you launch asynchronous jobs, they don't block the current thread (i.e. your current PowerShell prompt); instead, they run on new threads. The whole point of asynchronous jobs is that you can run multiple things at once.
So what happens is that the runspace pool is created, everything gets set up, the jobs are queued and start to run on new threads, and the script keeps going (everything is async and running on separate threads). It then goes right on to execute the last three lines:
#List IsCompleted should show true as jobs become complete
$JobList.AsynchronousObject
#Clean up
$RunspacePool.Close()
$RunspacePool.Dispose()
That closes and disposes of the runspace pool, thereby "completing" the jobs.
If you run everything up to the commented line first and then start watching $JobList.AsynchronousObject from the PowerShell prompt, you will see it stepping through the jobs as expected.
Once they are complete, you can execute the final two lines to close and dispose of your runspace pool.
If you want the script itself to block until the jobs finish, you will have to look at the wait functions; a minimal polling sketch follows.
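For example, this sketch polls the IsCompleted property of the handles stored in the question's own $JobList, collects the output, and only then tears the pool down (the polling interval is arbitrary):

# Poll until every queued job reports completion
while (@($JobList | Where-Object { -not $_.AsynchronousObject.IsCompleted }).Count -gt 0) {
    Start-Sleep -Milliseconds 250
}

# Collect each job's output, then clean up
foreach ($job in $JobList) {
    $job.PowerShellInstance.EndInvoke($job.AsynchronousObject)
    $job.PowerShellInstance.Dispose()
}
$RunspacePool.Close()
$RunspacePool.Dispose()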
I have a little issue. I'm creating a WPF GUI with PowerShell code. I have a function that performs a task on multiple computers (using parallel workflows). The issue is that while the task is running, my UI freezes until the task completes.
I would like to work around this with jobs, but I am unable to receive my job's output when the task ends.
Here is my simplified call:
parallelPingComputer -ips $ip_list | Select-Object date, Computer, result | out-gridview
The function:
workflow parallelPingComputer {
    Param($ips)
    foreach -parallel ($ip in $ips)
    {
        PingComputer($ip)
    }
}
And finally, PingComputer($ip) is only a ping plus another task on multiple targets.
I tried adding -AsJob after the parallel ping, but I'm not able to retrieve the job result when it ends (and not before...).
Can you please help me? :)
Thanks a lot
I have been banging my head against a wall for a couple of days now trying to get Start-Job and background jobs working, to no avail.
I have tried ScriptBlock and ScriptFile, but neither seems to do what I want, or I can't seem to get the syntax right.
I have a number of recursive functions and need to split up the script to work in parallel across many chunks of a larger data set.
No matter how I arrange the Start-Job call, nothing seems to work, and the recursive functions seem to be making everything twice as hard.
Can anyone give me a working example of Start-Job calling a recursive function and having multiple parameters, or point me somewhere where one exists?
Any help appreciated
This works for me:
$sb = {
    param($path, $currentDepth, $maxDepth)
    function EnumFiles($dir, $currentDepth, $maxDepth) {
        if ($currentDepth -gt $maxDepth) { return }
        Get-ChildItem $dir -File
        Get-ChildItem $dir -Dir | Foreach { EnumFiles $_.FullName ($currentDepth + 1) $maxDepth }
    }
    EnumFiles $path $currentDepth $maxDepth
}
$job = Start-Job -ScriptBlock $sb -ArgumentList $pwd,0,2
Wait-Job $job | Out-Null
Receive-Job $job
Keep in mind your functions have to be defined in the script block, because the script runs in a completely separate PowerShell process. The same goes for any externally defined variables: they have to be passed into Start-Job via the -ArgumentList parameter. The values are serialized, passed to the PowerShell process executing the job, and then provided to the script block.
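For example, a minimal sketch of an externally defined variable traveling through -ArgumentList into the job (the path is illustrative):

$root = 'C:\logs'  # defined in the parent session; invisible to the job by default
$job = Start-Job -ScriptBlock { param($path) Get-ChildItem $path -Recurse } -ArgumentList $root
Wait-Job $job | Out-Null
Receive-Job $job
Remove-Job $job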
I need to execute some operations in a PS script that should run in parallel. Using PS jobs is not a real option, since the tasks that must be parallelized depend on custom functions defined inside a separate module. Although I know I can use the -InitializationScript flag and import the module that contains my custom functions, I think I'd lose speed, since importing the whole module is a time-consuming operation.
Bearing all of that in mind, I'm trying to launch those "tasks" in separate threads that share the runspace. My code looks like this:
$ps = [Powershell]::Create().AddScript({ Get-CustomADDomain -dnsdomain $env: })
$threadRes = $ps.BeginInvoke()
$ps.EndInvoke($threadRes)
The drawback of this approach is that, since I'm creating a new PowerShell instance, its runspace does not have my custom modules loaded, and thus I'm in the same situation as with jobs.
If I try to attach the current runspace to the newly created $ps using the following code:
$ps = [Powershell]::Create()
$ps.runspace = $host.runspace
$ps.AddScript({ Get-CustomADDomain -dnsdomain $env: })
$threadRes = $ps.BeginInvoke()
$ps.EndInvoke($threadRes)
I get an error because I'm trying to close the current pipeline (a bad thing).
I think my second attempt is on the right track, but I cannot retrieve the results of the script invocation, or at least I can't see how to do it.
It's obvious that I must be missing something, so any advice you may have will be very appreciated!
A new job or runspace isn't going to inherit functions from a module that was imported into the current session. That being said, you don't have to import the entire module. If you've got specific functions in the current session you need to have available in the job, you can add just those functions like this:
function test_function {'This is a test'}
function test_function2 {'This is also a test'}

$job_functions = 'test_function','test_function2'

$init = [scriptblock]::Create(
    $(foreach ($job_function in $job_functions)
    {
@"
function $job_function
{$((get-item function:$job_function).definition)}
"@
    }))
$init
function test_function
{'This is a test'}
function test_function2
{'This is also a test'}
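The resulting script block can then be handed to Start-Job through the -InitializationScript parameter, making those functions available inside the job; a quick sketch:

$job = Start-Job -InitializationScript $init -ScriptBlock { test_function; test_function2 }
Wait-Job $job | Out-Null
Receive-Job $job   # 'This is a test' and 'This is also a test'
Remove-Job $job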
Hi, I have a simple script that takes a file and runs another Perl script on it. The script does this for every picture file in the current folder. It runs on a machine with 2 quad-core Xeon processors and 16 GB of RAM, running Red Hat Linux.
The first script, work.pl, basically calls magicPlate.pl, passing some parameters and the name of the file for magicPlate.pl to process. magicPlate.pl takes about a minute to process each image. Because work.pl performs the same function over 100 times, and because the system has multiple processors and cores, I was thinking about splitting the task up so that it could run multiple times in parallel. I could split the images into different folders if necessary. Any help would be great. Thank you
Here is what I have so far:
use strict;
use warnings;
my @initialImages = <*>;

foreach my $file (@initialImages) {
    if ($file =~ /.png/) {
        print "processing $file...\n";
        my @tmp = split(/\./, $file);
        my $name = "";
        for (my $i = 0; $i < (@tmp - 1); $i++) {
            if ($name eq "") { $name = $tmp[$i]; } else { $name = $name . "." . $tmp[$i]; }
        }
        my $exten = $tmp[@tmp - 1];
        my $orig = $name . "." . $exten;
        system("perl magicPlate.pl -i " . $orig . " -min 4 -max 160 -d 1");
    }
}
You should consider NOT creating a new process for each file that you want to process; it's horribly inefficient, and probably what is taking most of your time here. Just loading up Perl and whatever modules you use, over and over, is bound to create some overhead. I recall a poster on PerlMonks who did something similar and ended up transforming his second script into a module, reducing the work time from an hour to a couple of minutes. Not that you should expect such a dramatic improvement, but one can dream...
With the second script refactored as a module, here's an example of thread usage, in which BrowserUK creates a thread pool, feeding it jobs through a queue.
You could use Parallel::ForkManager (set $MAX_PROCESSES to the number of files processed at the same time):
use Parallel::ForkManager;
use strict;
use warnings;

# Create the manager once, before the loop; creating it per file would defeat the throttling
my $MAX_PROCESSES = 4;
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

my @initialImages = <*>;

foreach my $file (@initialImages) {
    if ($file =~ /.png/) {
        print "processing $file...\n";
        my @tmp = split(/\./, $file);
        my $name = "";
        for (my $i = 0; $i < (@tmp - 1); $i++) {
            if ($name eq "") { $name = $tmp[$i]; } else { $name = $name . "." . $tmp[$i]; }
        }
        my $exten = $tmp[@tmp - 1];
        my $orig = $name . "." . $exten;
        my $pid = $pm->start and next;  # fork; the parent moves straight on to the next file
        system("perl magicPlate.pl -i " . $orig . " -min 4 -max 160 -d 1");
        $pm->finish;  # terminates the child process
    }
}
$pm->wait_all_children;
But as suggested by Hugmeir, running the Perl interpreter again and again for each new file is not a good idea.
Import "maigcplate" and use threading.
Start magicplate.pl in the background (you would need to add process throttling)
Import "magicplate" and use fork (add process throttling and a kiddy reaper)
Make "maigcplate" a daemon with a pool of workers = # of CPUs
use an MQ implementation for communication
use sockets for communication
Use webserver(nginx, apache, ...) and wrap in REST for a webservice
etc...
All of these center around creating multiple workers that can each run on their own CPU. Certain implementations use resources better (those that don't start a new process) and are easier to implement and maintain.