I am having an issue with threading and hoping someone can clear this up for me. My thread is returning duplicate entries in an array, and I have been going in circles trying to figure out why. Here is the code:
$arrayofinfo | Start-RSJob -Name {"Command_$($_)"} -throttle 10 -ScriptBlock {
$command = $_
$array_1 = @()
$array_1 = Invoke-Expression " & $command" -EA SilentlyContinue
if(($array_1.count) -gt 20)
{
$array_1 += $command
$array_1 += $array_1
return $array_1
}
} ## end of scriptblock
get-rsjob | wait-rsjob #-Timeout 7
$array_complete = get-rsjob -HasMoreData -ErrorAction SilentlyContinue | Receive-RSJob -ErrorAction SilentlyContinue | Select-Object -ErrorAction SilentlyContinue
What is happening is that either $command is executed twice or the results are put in $array_1 twice. Somehow $array_complete is double in size and contains a duplicate of each entry. How? Please also comment on anything else that looks like it could be improved. Thanks.
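One possible source of the doubling, sketched here under the assumption that the command returns a plain array: the line `$array_1 += $array_1` appends the array to itself, so every element ends up in it twice. The sample values below are made up for illustration.

```powershell
# Hypothetical stand-in for the output of the command
$array_1 = @('a', 'b', 'c')

# Append the command name, as the script block does
$array_1 += 'the-command'
$before = $array_1.Count

# Appending the array to itself doubles its contents
$array_1 += $array_1
$after = $array_1.Count

"$before -> $after"
```

If this is indeed the cause, dropping that self-append line should be enough for each entry to appear once.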
I am using PowerShell 2.0 on a Windows 7 desktop. I am attempting to search the enterprise CIFS shares for keywords/regex. I already have a simple single threaded script that will do this but a single keyword takes 19-22 hours. I have created a multithreaded script, first effort at multithreading, based on the article by Surly Admin.
Can Powershell Run Commands in Parallel?
Powershell Throttle Multi thread jobs via job completion
and the links related to those posts.
I decided to use runspaces rather than background jobs, as the prevailing wisdom says this is more efficient. The problem is that I am only getting partial output with the multithreaded script I have. I am not sure if it is an I/O thing, a memory thing, or something else. Hopefully someone here can help. Here is the code.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$KeywordInfo = Select-String -pattern KEYWORD -AllMatches -InputObject $File
$KeywordOut = New-Object PSObject -Property @{
Matches = $KeywordInfo.Matches
Path = $KeywordInfo.Path
}
Return $KeywordOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host "All jobs completed!"
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
$Job.Pipe.EndInvoke($Job.Result) | Where {$_.Path} | Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Invoke-Item C:\Users\user\Desktop\results.txt
Get-Date
This is the single threaded version I am using that works, including the regex I am using for socials.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($file in $files) {
Select-String -pattern '[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}' -AllMatches -InputObject $file | Select-Object matches, path |
Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Get-Date
Invoke-Item C:\Users\user\Desktop\results.txt
I am hoping to build this answer over time, as I don't want to over-comment. I don't know yet why you are losing data with the multithreading, but I think we can increase performance with an updated regex. For starters, you have many greedy quantifiers that I think we can shrink down.
[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}
Select-String is case-insensitive by default, so you don't need the character classes at the beginning. Do you have to check for multiple colons? As written, :* looks for zero or more colons. The same goes for the hyphens. Perhaps these would be better with ?, which matches zero or one.
ssn:?\s*\d{3}-?\d{2}-?\d{4}
This assumes you are looking for mostly properly formatted SSNs. If people are hiding them in text, maybe you need to look for other delimiters as well.
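As a quick sanity check of the simplified pattern, here is a small sketch that runs it against a few made-up sample strings (none of them real numbers). It uses the -match operator, which is case-insensitive by default, just like Select-String.

```powershell
$pattern = 'ssn:?\s*\d{3}-?\d{2}-?\d{4}'

# Made-up sample lines for illustration only
$samples = @(
    'SSN: 123-45-6789',
    'ssn 123456789',
    'Ssn:123-45-6789',
    'phone 555-1234'
)

# Keep only the lines the pattern matches; @() forces an array result
$hits = @($samples | Where-Object { $_ -match $pattern })
$hits
```

The first three samples should match regardless of case, colon, or hyphens, while the phone number line should not.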
I would also suggest adding the text to separate files and maybe combining them after execution. If nothing else just to test.
Hoping this will be the start of a proper solution.
It turns out that for some reason the Select-String cmdlet was having problems with the multithreading. I don't have enough of a developer background to be able to tell what is happening under the hood. However I did discover that by using the -quiet option in Select-String, which turns it into a boolean output, I was able to get the results I wanted.
The first pattern match in each document gives a true value. When I get a true then I return the Path of the document to an array. When that is finished I run the pattern match against the paths that were output from the scriptblock. This is not quite as effective performance wise as I had hoped for but still a pretty dramatic improvement over singlethread.
The other issue I ran into was the read/writes to disk by trying to output results to a document at each stage. I have changed that to arrays. While still memory intensive, it is much quicker.
Here is the resulting code. Any additional tips on performance improvement are appreciated:
cls
Remove-Item C:\Users\user\Desktop\output.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$Match = Select-String -pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}' -Quiet -InputObject $File
if ( $Match -eq $true ) {
$MatchObjects = Select-Object -InputObject $File
$MatchOut = New-Object PSObject -Property @{
Path = $MatchObjects.FullName
}
}
Return $MatchOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -Path I:\ -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
}
$PathValue = @()
ForEach ($Line in $Results) {
$PathValue += $Line.psobject.properties | % {$_.Value}
}
$UniqValues = $PathValue | sort | Get-Unique
$Output = ForEach ( $Path in $UniqValues ) {
Select-String -Pattern '\d{3}-?\d{2}-?\d{4}' -AllMatches -Path $Path | Select-Object -Property Matches, Path
}
$Output | Out-File -FilePath C:\Users\user\Desktop\output.txt -Append -Encoding UTF8 -Width 512
Invoke-Item C:\Users\user\Desktop\output.txt
I have access to a single-core single-processor VM with which to do logging for my team. I have the following code:
$sb = {
Param($_)
if($_.CONTROLLER -ne ".xx" ){
$posIP = "10." + $_.IP + $_.CONTROLLER
if (Test-Connection -ComputerName $posIP -Count 1 -Quiet) {
$mapPath = "\\" + $posIP + "\c$"
net use $mapPath $password /user:$userName | Out-Null
if(Test-Path $mapPath$dataFile) {
[xml]$periods = Get-Content $mapPath$dataFile
$endDate = $periods.IndataDbf.ingredient.PeriodDetail.PeriodEndDate | select -last 1
$output = "$($_.STORE);$endDate" }
else {
$outPut = $_.STORE + ';' + "$dataFile Not Found" }
net use $mapPath /de | Out-Null
}
else {
$outPut = $_.STORE + ';' + "Map FAILED" }
Write-Output $OutPut
}
}
Import-Csv $inFile | ForEach-Object {
while ((Get-Job -State Running).Count -ge 100) {
Start-Sleep -Seconds 5;
}
Write-Output $_.STORE
Start-Job -Scriptblock $sb -ArgumentList $_ | Write-Verbose
Get-Job -State Completed -HasMoreData 1 | Receive-Job | Out-File -Append -FilePath $outLog
}
Get-Job | Wait-Job | Receive-Job | Out-File -Append -FilePath $outLog
This runs well, but takes the same amount of time as running the same code without Start-Job and just a loop. However, the previous logging command used BATCH files and automatically opened a couple dozen child command windows to process data, then return, and it runs in under half the time. The code used is the same, so I don't understand why adding more threads didn't make the script run faster. Can anyone tell me why a BATCH file program with a couple dozen child windows runs so much faster with arguably the same code? And why does the Start-Job command not improve the speed at all? I would think it would try to execute multiple threads simultaneously.
Because there is a lot of overhead when using Start-Job (each job runs in its own PowerShell process) and whenever you use the pipeline.
If you use runspaces instead, it may be faster. Take a look at http://newsqlblog.com/2012/05/22/concurrency-in-powershell-multi-threading-with-runspaces/
I have a script that pulls one line of data from a file on multiple servers. I have a single-threaded version that works just fine, but I want to get it to run faster. Since I only need one line of one file from each server, I'm sure I could run this in parallel. I pulled code from multiple places to get a multi-threaded script running, but when I try to get all the results to print to one output file, nothing prints. I wonder if anyone can look at my code to tell me why this same script, without the Jobs, works fine, but after adding jobs, it doesn't.
$sb = {
Param($computer, $fileName, $outLog)
net use "\\$computer\c$" **** /user:****
if(test-path \\$computer\c$\sc\$fileName){
[xml]$periods = Get-Content \\$computer\c$\sc\$fileName
$endDate = $periods.PeriodDetail | select -last 1
$output = "$computer;$endDate"
}
Else {
$output = "$computer;$fileName Not Found"
}
#Synchronize file usage
$mutex = new-object System.Threading.Mutex $false,'SomeUniqueName'
$mutex.WaitOne() > $null
#Write data to log
Out-File -Append -InputObject $output -FilePath $outLog
#Release file hold
$mutex.ReleaseMutex()
net use "\\$computer\c$" /de
}
foreach($computer in $computerName){
while ((Get-Job -State Running).Count -ge 20) {
Start-Sleep -Seconds 5;
}
Start-Job -Scriptblock $sb -ArgumentList $computer,$fileName,$outLog
}
Get-Job | Wait-Job | Receive-Job
Thank you for all the assistance. Here is the resulting code that works pretty well:
$sb = {
Param($computer, $fileName, $outLog)
net use "\\$computer\c$" $password /user:$userName | Out-Null
if(test-path \\$computer\c$\sc\$fileName){
[xml]$periods = Get-Content \\$computer\c$\sc\$fileName
$endDate = $periods.IndataDbf.ingredient.PeriodDetail.PeriodEndDate | select -last 1
$output = "$computer;$endDate"
}
Else {
$output = "$computer;$fileName Not Found"
}
Write-Output -InputObject $output
net use "\\$computer\c$" /de | Out-Null
}
foreach($computer in $computerName){
while ((Get-Job -State Running).Count -ge 20) {
Start-Sleep -Seconds 5;
}
Start-Job -Scriptblock $sb -ArgumentList $computer,$fileName,$outLog
}
Get-Job | Wait-Job | Receive-Job | Out-File -Append -FilePath $outLog
I'm thinking of doing another Get-Job right before the Start-Job, getting only jobs that are complete with more data, but I haven't tested it yet.
I'm trying to use Start-Job to run a command to collect security logs from some servers.
I'm parsing a .ini file to get the list of servers, number of days etc.
#___Collect Logs from Servers___#
$servList = $iniContent["SERVERS"]["svr"]
$days = $iniContent["DAYS"]["days"]
$date = $(get-date -format ddMMyyyy)
$err = "Error Collecting $($logType) from $($server) or the Event Log is empty! | $(Get-Date -format g) "
$serv = $servList.Split(",")
foreach ($server in $serv){
$outfile = "D:\DCLogs\$($date)_$($server)_$logType.txt"
$ScriptBlock = cmd /c "D:\CollectLog\Dumpel.exe -f $($outFile) -l $($logType) -s $($server) -d $($days)"
Start-Job -ScriptBlock $ScriptBlock
Get-Job | Wait-Job
$file = Get-ChildItem D:\DCLogs -Filter "$($date)_$($server)*" -Name
$len = $file.length/1KB # Check LogFile Size
if ($len -eq 0){
$errCount = 1
write-output $err | Out-File $errLog -append
}
}
It's only starting one job at a time so I know I'm doing something wrong. If someone could please point out the problem I'd greatly appreciate it.
Thank you.
Amelia
Calling Get-Job | Wait-Job inside the loop just serializes the jobs. Use the loop to start the jobs, and then call Get-Job | Wait-Job outside the loop.
Also, try defining your ScriptBlock using braces:
$ScriptBlock = {...}
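Putting both suggestions together, a minimal sketch might look like the following. The server names and the placeholder work inside the script block are assumptions for illustration; the real script would read the list from the .ini file and call Dumpel.exe instead.

```powershell
# Hypothetical server names; the real list comes from the .ini file
$serv = @('server1', 'server2', 'server3')

# Braces make this a script block; nothing runs until a job invokes it
$ScriptBlock = {
    param($server)
    # Placeholder for the real work (for example, the Dumpel.exe call)
    "Collected logs from $server"
}

# Start all jobs first; values are passed in through -ArgumentList
$jobs = foreach ($server in $serv) {
    Start-Job -ScriptBlock $ScriptBlock -ArgumentList $server
}

# Wait once, outside the loop, then collect the output and clean up
$results = @($jobs | Wait-Job | Receive-Job)
$jobs | Remove-Job
$results
```

Because a job runs in a separate process, variables from the calling script are not visible inside the script block, which is why the server name is passed with -ArgumentList. Any per-server checks, such as the log file size test, would then run after the wait.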
Basic script idea:
Hello. I've created a powershell script which I use to check the filesizes of certain executables, and then keep them in a text file. Next time the script runs, if a filesize differs it will replace the one in the text file with the new one.
The structure:
I have a main script and a folder which contains many scripts, each for every executable of which I want to check the filesize. So the scripts in the folder will return a string containing the link to the executable, which will be fed to the main script.
The code:
$progdir = "C:\script\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\updatechecker\programms\filesizes
if ($filesizes.length -ne $items.length) {
if ($filesizes.length -eq $null) {
Write-Host ("Building filesize database...") -nonewline
}
else {
Write-Host ("Rebuilding filesize database...") -nonewline
}
clear-content C:\programms\filesizes
for ($i=0; $i -le $items.length-1; $i++) {
$command = "c:\programms\" + $items[$i].name
$link = & $command
$webclient.OpenRead($link) | Out-Null
$filesize = $webclient.ResponseHeaders["Content-Length"]
$filesize >> C:\programms\filesizes
}
echo "Done."
}
else {
...
Question:
This for loop is the one I want to run in parallel. I need your advice on how to do this since I'm new to powershell. I tried to implement a few things I found but they didn't work correctly (took very long to finish, output errors, multiple entries of filesizes in my filesizes file). I suspect it's a synchronization issue and somehow I need to lock the critical parts. Isn't there anything like omp parallel for in powershell? :P
Any help or advice on how to achieve this would be appreciated :)
edit:
Get-Job | Remove-Job -Force
$progdir = "C:\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\programms\filesizes
$jobWork = {
param ($MyInput)
$command = "c:\programms\" + $MyInput
$link = & $command
$webclient.OpenRead($link) | Out-Null
$filesize = $webclient.ResponseHeaders["Content-Length"]
$filesize >> C:\programms\filesizes
}
foreach ($item in $items) {
Start-Job -ScriptBlock $jobWork -ArgumentList $item.name | out-null
}
Get-Job | Wait-Job
Get-Job | Receive-Job | Out-GridView | out-null
echo "Done."
Edit 2: Used code I found here: http://ryan.witschger.net/?p=22
$mutex = new-object -TypeName System.Threading.Mutex -ArgumentList $false, "RandomGlobalMutexName";
$MaxThreads = 4
$SleepTimer = 500
$jobWork = {
param ($MyInput)
$webclient = New-Object System.Net.WebClient
$command = "c:\programms\" + $MyInput
$link = & $command
$webclient.OpenRead($link) | Out-Null
$result = $mutex.WaitOne();
$file = $webclient.ResponseHeaders["Content-Length"]
$file >> C:\programms\filesizes
$mutex.ReleaseMutex();
}
$progdir = "C:\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\programms\filesizes
Get-Job | Remove-Job -Force
$i = 0
ForEach ($item in $items){
While ($(Get-Job -state running).count -ge $MaxThreads){
Start-Sleep -Milliseconds $SleepTimer
}
$i++
Start-Job -ScriptBlock $jobWork -ArgumentList $item.name | Out-Null
}
You can run each iteration of the loop in a background job, which is not the same as a separate thread: each job is a whole other PowerShell.exe process. Data is passed back from the background processes through serialization.
To approach it using background jobs, you'll need to define a script block that does the actual work and then call the script block with parameters in each iteration of the loop. The script block can report back status via Write-Output or by throwing an exception.
You'll probably want to throttle how many concurrent background jobs are running. Here's an example of how to throttle:
$jobItems = "a", "b", "c", "d", "e"
$jobMax = 2
$jobs = @()
$jobWork = {
param ($MyInput)
if ($MyInput -eq "d") {
throw "an example of an error"
} else {
write-output "Processed $MyInput"
}
}
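foreach ($jobItem in $jobItems) {
# Block while the number of running jobs is at the limit
while (($running = @($jobs | Where-Object { $_.State -eq 'Running' })).Count -ge $jobMax) {
$running | Wait-Job -Any | Out-Null
}
# A slot is free, so start the job for this item
$jobs += Start-Job -ScriptBlock $jobWork -ArgumentList $jobItem
}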
$jobs | Wait-Job
As an alternative you might try eventing. Take a look at this thread for some examples of how to implement concurrency using events.
PowerShell: Runspace problem with DownloadFileAsync
You might be able to replace DownloadFileAsync with OpenReadAsync