Decreased output with PowerShell multithreading compared with single-threaded script - multithreading

I am using PowerShell 2.0 on a Windows 7 desktop. I am attempting to search the enterprise CIFS shares for keywords/regex. I already have a simple single-threaded script that will do this, but a single keyword takes 19-22 hours. I have created a multithreaded script, my first effort at multithreading, based on the article by Surly Admin.
Can Powershell Run Commands in Parallel?
Powershell Throttle Multi thread jobs via job completion
and the links related to those posts.
I decided to use runspaces rather than background jobs, as the prevailing wisdom says this is more efficient. The problem is that I am only getting partial output with the multithreaded script. Not sure if it is an I/O thing, a memory thing, or something else. Hopefully someone here can help. Here is the code.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$KeywordInfo = Select-String -pattern KEYWORD -AllMatches -InputObject $File
$KeywordOut = New-Object PSObject -Property @{
Matches = $KeywordInfo.Matches
Path = $KeywordInfo.Path
}
Return $KeywordOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host "All jobs completed!"
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
$Job.Pipe.EndInvoke($Job.Result) | Where {$_.Path} | Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Invoke-Item C:\Users\user\Desktop\results.txt
Get-Date
This is the single-threaded version I am using that works, including the regex I am using for socials (Social Security numbers).
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($file in $files) {
Select-String -pattern '[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}' -AllMatches -InputObject $file | Select-Object matches, path |
Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Get-Date
Invoke-Item C:\Users\user\Desktop\results.txt

I am hoping to build this answer over time, as I don't want to over-comment. I don't know yet why you are losing data from the multithreading, but I think we can increase performance with an updated regex. For starters, you have several greedy quantifiers that I think we can shrink down.
[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}
Select-String is case-insensitive by default, so you don't need the [sS][sS][nN] portion at the beginning. Do you have to check for multiple colons? :* matches 0 or many colons, and the same goes for the hyphens. Perhaps these would be better as ?, which matches 0 or 1.
ssn:?\s*\d{3}-?\d{2}-?\d{4}
This is assuming you are looking for mostly properly formatted SSNs. If people are hiding them in text, maybe you need to look for other delimiters as well.
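For a quick sanity check of the tightened pattern, you can pipe a few sample strings (hypothetical here) straight into Select-String:
# The first two sample strings match; the third produces no output
'SSN: 123-45-6789', 'ssn 123456789', 'no match here' |
Select-String -Pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}'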
I would also suggest writing the output to separate files per thread and combining them after execution, if nothing else just to test; a sketch follows below.
Hoping this will be the start of a proper solution.
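A minimal sketch of the separate-files idea, assuming a scratch folder C:\Users\user\Desktop\parts exists (the folder and variable names here are hypothetical):
# Inside the scriptblock: write this thread's results to its own uniquely named file
$partFile = "C:\Users\user\Desktop\parts\$([guid]::NewGuid()).txt"
$KeywordOut | Out-File -FilePath $partFile -Encoding UTF8
# After all jobs complete: merge the pieces into a single results file
Get-ChildItem C:\Users\user\Desktop\parts\*.txt | Get-Content |
Out-File -FilePath C:\Users\user\Desktop\results.txt -Encoding UTF8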

It turns out that, for some reason, the Select-String cmdlet was having problems with the multithreading. I don't have enough of a developer background to tell what is happening under the hood. However, I did discover that by using the -Quiet option in Select-String, which turns its output into a boolean, I was able to get the results I wanted.
The first pattern match in each document gives a true value. When I get a true, I return the path of the document to an array. When that is finished, I run the pattern match against the paths that were output from the scriptblock. This is not quite as effective performance-wise as I had hoped, but still a pretty dramatic improvement over single-threaded.
The other issue I ran into was the disk read/writes caused by outputting results to a file at each stage. I have changed that to arrays. While still memory intensive, it is much quicker.
Here is the resulting code. Any additional tips on performance improvement are appreciated:
cls
Remove-Item C:\Users\user\Desktop\output.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$Match = Select-String -pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}' -Quiet -InputObject $File
if ( $Match -eq $true ) {
$MatchObjects = Select-Object -InputObject $File
$MatchOut = New-Object PSObject -Property @{
Path = $MatchObjects.FullName
}
}
Return $MatchOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -Path I:\ -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
}
$PathValue = @()
ForEach ($Line in $Results) {
$PathValue += $Line.psobject.properties | % {$_.Value}
}
$UniqValues = $PathValue | sort | Get-Unique
$Output = ForEach ( $Path in $UniqValues ) {
Select-String -Pattern '\d{3}-?\d{2}-?\d{4}' -AllMatches -Path $Path | Select-Object -Property Matches, Path
}
$Output | Out-File -FilePath C:\Users\user\Desktop\output.txt -Append -Encoding UTF8 -Width 512
Invoke-Item C:\Users\user\Desktop\output.txt

Related

Powershell export CSV looks weird

I have an issue with my CSV export to Excel with PowerShell. When I import it, it looks pretty bad, and I can't find any information that helps me solve it.
Here I attach an image of the import and the code. Other CSV imports I have seen look normal, with their categories separated into columns in Excel, but I don't know how to achieve that.
Image of my workbook
$Computers = Get-ADComputer -Filter {OperatingSystem -like "*Server*"} -Properties OperatingSystem | Select-Object -ExpandProperty Name
Foreach($computer in $computers){
if(!(Test-Connection -Cn $computer -BufferSize 16 -Count 1 -ea 0 -quiet))
{write-host "cannot reach $computer offline" -f red}
else {
$outtbl = @()
Try{
$sr=Get-WmiObject win32_bios -ComputerName $Computer -ErrorAction Stop
$Xr=Get-WmiObject -Class Win32_processor -ComputerName $computer -ErrorAction Stop
$ld=get-adcomputer $computer -properties Name,Lastlogondate,operatingsystem,ipv4Address,enabled,description,DistinguishedName -ErrorAction Stop
$r="{0} GB" -f ((Get-WmiObject Win32_PhysicalMemory -ComputerName $computer |Measure-Object Capacity -Sum).Sum / 1GB)
$x = gwmi win32_computersystem -ComputerName $computer |select @{Name = "Type";Expression = {if (($_.pcsystemtype -eq '2') )
{'Laptop'} Else {'Desktop Or Other something else'}}},Manufacturer,@{Name = "Model";Expression = {if (($_.model -eq "$null") ) {'Virtual'} Else {$_.model}}},username -ErrorAction Stop
$t= New-Object PSObject -Property @{
serialnumber = $sr.serialnumber
computername = $ld.name
Ipaddress=$ld.ipv4Address
Enabled=$ld.Enabled
Description=$ld.description
Ou=$ld.DistinguishedName.split(',')[1].split('=')[1]
Type = $x.type
Manufacturer=$x.Manufacturer
Model=$x.Model
Ram=$R
ProcessorName=($xr.name | Out-String).Trim()
NumberOfCores=($xr.NumberOfCores | Out-String).Trim()
NumberOfLogicalProcessors=($xr.NumberOfLogicalProcessors | Out-String).Trim()
Addresswidth=($xr.Addresswidth | Out-String).Trim()
Operatingsystem=$ld.operatingsystem
Lastlogondate=$ld.lastlogondate
LoggedinUser=$x.username
}
$outtbl += $t
}
catch [Exception]
{
"Error communicating with $computer, skipping to next"
}
$outtbl | select Computername,enabled,description,ipAddress,Ou,Type,Serialnumber,Manufacturer,Model,Ram,ProcessorName,NumberOfCores,NumberOfLogicalProcessors,Addresswidth,Operatingsystem,loggedinuser,Lastlogondate |export-csv -Append C:\temp\VerynewAdinventory.csv -nti
}
}
As commented, your local computer uses a different delimiter character than the one Export-Csv uses by default (that is the comma).
You can check what character your computer (and thus your Excel) uses like this:
[cultureinfo]::CurrentCulture.TextInfo.ListSeparator
To use Export-Csv in a way that lets you simply double-click the output csv file to open it in Excel, you need to either append the switch -UseCulture, OR tell it what the delimiter should be (if not a comma) by appending the parameter -Delimiter followed by the character you got from the above code line.
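For example, on a machine whose ListSeparator is the semicolon, either of these lines (with a hypothetical $data collection) produces a csv file that Excel opens correctly on double-click:
$data | Export-Csv -Path 'C:\temp\out.csv' -UseCulture -NoTypeInformation
$data | Export-Csv -Path 'C:\temp\out.csv' -Delimiter ';' -NoTypeInformation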
That said, your code does not produce the full table, because the export to the csv file is in the wrong place. As Palle Due commented, you could have seen that if you had indented your code properly.
Also, I would advise using more self-describing variable names, so not $r or $x, but $memory and $machine for instance.
Nowadays, you should use Get-CimInstance rather than Get-WmiObject.
Also, adding to an array with += should be avoided, as it is both time and memory consuming: on every addition to an array, which is of fixed size, the entire array has to be rebuilt in memory. See the comparison below.
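As a minimal illustration with a hypothetical loop, let the foreach statement itself emit the objects and collect them in one assignment:
# Slow: the fixed-size array is rebuilt in memory on every iteration
$outtbl = @()
foreach ($i in 1..1000) { $outtbl += [PsCustomObject]@{ Value = $i } }
# Fast: the loop emits the objects and PowerShell collects them once
$result = foreach ($i in 1..1000) { [PsCustomObject]@{ Value = $i } }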
Your code revised:
# set the $ErrorActionPreference to Stop, so you don't have to add -ErrorAction Stop everywhere in the script
# remember the current value, so you can restore it afterwards.
$oldErrorPref = $ErrorActionPreference
$ErrorActionPreference = 'Stop'
# get an array of computers, gathering all properties you need
$computers = Get-ADComputer -Filter "OperatingSystem -like '*Server*'" -Properties OperatingSystem, LastLogonDate, IPv4Address, Description
$result = foreach ($computer in $computers) {
$serverName = $computer.Name
if(!(Test-Connection -ComputerName $serverName -BufferSize 16 -Count 1 -ErrorAction SilentlyContinue -Quiet)) {
Write-Host "cannot reach $serverName offline" -ForegroundColor Red
continue # skip this computer and proceed with the next one
}
try {
# instead of Get-WmiObject, nowadays you should use Get-CimInstance
$bios = Get-WmiObject -Class Win32_bios -ComputerName $serverName
$processor = Get-WmiObject -Class Win32_Processor -ComputerName $serverName
$memory = Get-WmiObject -Class Win32_PhysicalMemory -ComputerName $serverName
$disks = Get-WmiObject -Class Win32_LogicalDisk -ComputerName $serverName
$machine = Get-WmiObject -Class Win32_ComputerSystem -ComputerName $serverName |
Select-Object @{Name = "Type"; Expression = {
if ($_.pcsystemtype -eq '2') {'Laptop'} else {'Desktop Or Other something else'}}
},
Manufacturer,
#{Name = "Model"; Expression = {
if (!$_.model) {'Virtual'} else {$_.model}}
},
UserName
# output an object to be collected in variable $result
# put the properties in the order you would like in the output
[PsCustomObject] @{
ComputerName = $serverName
Enabled = $computer.Enabled
Description = $computer.description
IpAddress = $computer.IPv4Address
Ou = $computer.DistinguishedName.split(',')[1].split('=')[1]
Type = $machine.type
SerialNumber = $bios.serialnumber
Manufacturer = $machine.Manufacturer
Model = $machine.Model
Ram = '{0} GB' -f (($memory | Measure-Object Capacity -Sum).Sum / 1GB)
ProcessorName = $processor.Name
NumberOfCores = $processor.NumberOfCores
NumberOfLogicalProcessors = $processor.NumberOfLogicalProcessors
Addresswidth = $processor.Addresswidth
OperatingSystem = $computer.OperatingSystem
# {0:N2} returns the number formatted with two decimals
TotalFreeDiskSpace = '{0:N2} GB' -f (($disks | Measure-Object FreeSpace -Sum).Sum / 1GB)
LoggedInUser = $machine.UserName
Lastlogondate = $computer.LastLogonDate
}
}
catch {
Write-Warning "Error communicating with computer $serverName, skipping to next"
}
}
# restore the ErrorActionPreference to its former value
$ErrorActionPreference = $oldErrorPref
# output the completed array in a CSV file
# (using the delimiter character your local machine has set as ListSeparator)
$result | Export-Csv -Path 'C:\temp\VerynewAdinventory.csv' -UseCulture -NoTypeInformation

using threads to delete files with specific extensions ignoring files over a certain date

In my profession I make forensic images of "foreign" PCs, which I later extract onto my local storage.
To clean up the data I want to delete all files that aren't relevant to me (including, but not limited to: audio, movies, system files, ...).
Since we're speaking of multiple TB of data, I hope to use threads, especially since my storage is all flash, so the disk is less of a limitation.
To speed the process up after an initial manual run, I want the script to exclude files older than 1 day (since I have already covered those in the manual run).
What I have so far:
$IncludeFiles = "*.log", "*.sys", "*.avi", "*.mpg", "*.mkv", ".mp3", "*.mp4",
"*.mpeg", "*.mov", "*.dll", "*.mof", "*.mui", "*.zvv", "*.wma",
"*.wav", "*.MPA", "*.MID", "*.M4A", "*.AIF", "*.IFF", "*.M3U",
"*.3G2", "*.3GP", "*.ASF", "*.FLV", "*.M4V", "*.RM", "*.SWF",
"*.VOB"
$ScriptBlock = {
Param($mypath = "D:\")
Get-ChildItem -Path $mypath -Recurse -File -Include $file | Where-Object {
$_.CreationTime -gt (Get-Date).AddDays(-1)
}
}
foreach ($file in $IncludeFiles) {
Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file
}
Get-Job | Wait-Job
$out = Get-Job | Receive-Job
Write-Host $out
The only thing that doesn't work is the limitation that it should only look at files "younger" than 1 day. If I run the script without it, it seems to work perfectly (it gives me a list of files with the extensions I want to remove).
Parameter passing doesn't work the way you seem to expect. Param($mypath = "D:\") defines a parameter mypath with a default value of D:\. That default value is superseded by the value you pass into the scriptblock via -ArgumentList. Also, the variable $file inside the scriptblock and the variable $file outside the scriptblock are not the same. Because of that, an invocation
Start-Job -ScriptBlock $ScriptBlock -ArgumentList '*.log'
will run the command
Get-ChildItem -Path '*.log' -Recurse -File -Include $null | ...
Change your code to something like this to make it work:
$ScriptBlock = {
Param($extension)
$mypath = "D:\"
Get-ChildItem -Path $mypath -Recurse -File -Filter $extension | Where-Object {
$_.CreationTime -gt (Get-Date).AddDays(-1)
}
}
foreach ($file in $IncludeFiles) {
Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file
}
Get-Job | Wait-Job | Receive-Job
Using -Filter should provide better performance than -Include, but accepts only a single string (not a list of strings like -Include), so you can only filter one extension at a time.
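If you would rather make a single pass over the disk, a sketch of an alternative (using the $IncludeFiles list from the question) is to keep -Include, which does accept the whole list, trading -Filter's speed for one traversal:
# Single-pass alternative: -Include accepts the entire extension list
Get-ChildItem -Path D:\ -Recurse -File -Include $IncludeFiles | Where-Object {
$_.CreationTime -gt (Get-Date).AddDays(-1)
}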

Powershell Script to loop through folders checking security permissions

I have part of the code; at the moment the CSV file comes out empty. I also need a command to specify the path/folders to look at. How do I modify the script for that purpose?
Param(
[String]$path,
[String]$outfile = ".\outfile.csv"
)
$output = @()
ForEach ($item in (Get-ChildItem -Path $path -Recurse -Directory)) {
ForEach ($acl in ($item.GetAccessControl().Access)){
$output += $acl |
Add-Member `
-MemberType NoteProperty `
-Name 'Folder' `
-Value $item.FullName `
-PassThru
}
}
$output | Export-Csv -Path $outfile -NoTypeInformation
Ok, let's do this. I've made it into a function, and removed the OutFile part of it. If you want to output it to a file, pipe it to Export-CSV. If you want it saved as a variable, assign it to a variable. Just simpler this way.
Function Get-RecursiveACLs{
Param(
[String]$Path=$(Throw "You must specify a path")
)
GCI $Path -Recurse -Directory|%{
$PathName=$_.FullName
$_.GetAccessControl().Access|%{
Add-Member -InputObject $_ -NotePropertyName "Path" -NotePropertyValue $PathName -PassThru
}
}
}
Then it's a simple matter of storing it in a variable like:
$ACLList = Get-RecursiveACLs "C:\Example\Path"
Or piping it to output to a CSV if you would prefer:
Get-RecursiveACLs "C:\Example\Path" | Export-CSV "C:\Results.csv" -NoType
Put the function at the top of your script and call it as needed.

PowerShell Write to Same File Multiple Jobs

I have a script that pulls one line of data from a file on multiple servers. I have a single-threaded version that works just fine, but I want it to run faster. Since I only need one line of one file from each server, I'm sure I could run this in parallel. I pulled code from multiple places to get a multithreaded script running, but when I try to get all the results to print to one output file, nothing prints. I wonder if anyone can look at my code and tell me why the same script that works fine without jobs prints nothing after adding jobs.
$sb = {
Param($computer, $fileName, $outLog)
net use "\\$computer\c$" **** /user:****
if(test-path \\$computer\c$\sc\$fileName){
[xml]$periods = Get-Content \\$computer\c$\sc\$fileName
$endDate = $periods.PeriodDetail | select -last 1
$output = "$computer;$endDate"
}
Else {
$output = "$computer;$fileName Not Found"
}
#Synchronize file usage
$mutex = new-object System.Threading.Mutex $false,'SomeUniqueName'
$mutex.WaitOne() > $null
#Write data to log
Out-File -Append -InputObject $output -FilePath $outLog
#Release file hold
$mutex.ReleaseMutex()
net use "\\$computer\c$" /de
}
foreach($computer in $computerName){
while ((Get-Job -State Running).Count -ge 20) {
Start-Sleep -Seconds 5;
}
Start-Job -Scriptblock $sb -ArgumentList $computer,$fileName,$outLog
}
Get-Job | Wait-Job | Receive-Job
Thank you for all the assistance. Here is the resulting code that works pretty well:
$sb = {
Param($computer, $fileName, $outLog)
net use "\\$computer\c$" $password /user:$userName | Out-Null
if(test-path \\$computer\c$\sc\$fileName){
[xml]$periods = Get-Content \\$computer\c$\sc\$fileName
$endDate = $periods.IndataDbf.ingredient.PeriodDetail.PeriodEndDate | select -last 1
$output = "$computer;$endDate"
}
Else {
$output = "$computer;$fileName Not Found"
}
Write-Output -InputObject $output
net use "\\$computer\c$" /de | Out-Null
}
foreach($computer in $computerName){
while ((Get-Job -State Running).Count -ge 20) {
Start-Sleep -Seconds 5;
}
Start-Job -Scriptblock $sb -ArgumentList $computer,$fileName,$outLog
}
Get-Job | Wait-Job | Receive-Job | Out-File -Append -FilePath $outLog
I'm thinking of doing another Get-Job right before the Start-Job, receiving only jobs that are completed and have more data, but I haven't tested it yet.
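A minimal sketch of that untested idea: drain any already-completed jobs inside the throttle loop, so their output lands in the log while the remaining jobs keep running.
# Inside the foreach, before Start-Job: harvest jobs that have already finished
Get-Job -State Completed | ForEach-Object {
Receive-Job -Job $_ | Out-File -Append -FilePath $outLog
Remove-Job -Job $_
}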

Powershell v2.0 Using multiple threads

Basic script idea:
Hello. I've created a PowerShell script which I use to check the file sizes of certain executables and then keep them in a text file. The next time the script runs, if a file size differs, it will replace the one in the text file with the new one.
The structure:
I have a main script and a folder which contains many scripts, one for every executable whose file size I want to check. Each script in the folder returns a string containing the link to the executable, which is fed to the main script.
The code:
$progdir = "C:\script\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\updatechecker\programms\filesizes
if ($filesizes.length -ne $items.length) {
if ($filesizes.length -eq $null) {
Write-Host ("Building filesize database...") -nonewline
}
else {
Write-Host ("Rebuilding filesize database...") -nonewline
}
clear-content C:\programms\filesizes
for ($i=0; $i -le $items.length-1; $i++) {
$command = "c:\programms\" + $items[$i].name
$link = & $command
$webclient.OpenRead($link) | Out-Null
$filesize = $webclient.ResponseHeaders["Content-Length"]
$filesize >> C:\programms\filesizes
}
echo "Done."
}
else {
...
Question:
This for loop is the one I want to run in parallel. I need your advice on how to do this, since I'm new to PowerShell. I tried to implement a few things I found, but they didn't work correctly (took very long to finish, output errors, multiple entries of file sizes in my filesizes file). I suspect it's a synchronization issue and somehow I need to lock the critical parts. Isn't there anything like omp parallel for in PowerShell? :P
Any help or advice on how to achieve this would be appreciated :)
edit:
Get-Job | Remove-Job -Force
$progdir = "C:\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\programms\filesizes
$jobWork = {
param ($MyInput)
$command = "c:\programms\" + $MyInput
$link = & $command
$webclient.OpenRead($link) | Out-Null
$filesize = $webclient.ResponseHeaders["Content-Length"]
$filesize >> C:\programms\filesizes
}
foreach ($item in $items) {
Start-Job -ScriptBlock $jobWork -ArgumentList $item.name | out-null
}
Get-Job | Wait-Job
Get-Job | Receive-Job | Out-GridView | out-null
echo "Done."
Edit 2: Used code I found here: http://ryan.witschger.net/?p=22
$mutex = new-object -TypeName System.Threading.Mutex -ArgumentList $false, "RandomGlobalMutexName";
$MaxThreads = 4
$SleepTimer = 500
$jobWork = {
param ($MyInput)
$webclient = New-Object System.Net.WebClient
$command = "c:\programms\" + $MyInput
$link = & $command
$webclient.OpenRead($link) | Out-Null
$result = $mutex.WaitOne();
$file = $webclient.ResponseHeaders["Content-Length"]
$file >> C:\programms\filesizes
$mutex.ReleaseMutex();
}
$progdir = "C:\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\programms\filesizes
Get-Job | Remove-Job -Force
$i = 0
ForEach ($item in $items){
While ($(Get-Job -state running).count -ge $MaxThreads){
Start-Sleep -Milliseconds $SleepTimer
}
$i++
Start-Job -ScriptBlock $jobWork -ArgumentList $item.name | Out-Null
}
You can run each iteration of the loop in a background job, which is not the same as a separate thread in that it is a whole other PowerShell.exe process. Data is passed back from the background processes through serialization.
To approach it using background jobs, you'll need to define a script block that does the actual work and then call the script block with parameters in each iteration of the loop. The script block can report back status via Write-Output or by throwing an exception.
You'll probably want to throttle how many concurrent background jobs are running. Here's an example of how to throttle:
$jobItems = "a", "b", "c", "d", "e"
$jobMax = 2
$jobs = @()
$jobWork = {
param ($MyInput)
if ($MyInput -eq "d") {
throw "an example of an error"
} else {
write-output "Processed $MyInput"
}
}
foreach ($jobItem in $jobItems) {
# wait until a slot frees up before starting the next job
while (@($jobs | Where-Object { $_.State -eq 'Running' }).Count -ge $jobMax) {
$jobs | Wait-Job -Any | Out-Null
}
$jobs += Start-Job -ScriptBlock $jobWork -ArgumentList $jobItem
}
$jobs | Wait-Job
As an alternative you might try eventing. Take a look at this thread for some examples of how to implement concurrency using events.
PowerShell: Runspace problem with DownloadFileAsync
You might be able to replace DownloadFileAsync with OpenReadAsync.
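A rough, untested sketch of that eventing idea inside the job scriptblock (it assumes WebClient's OpenReadCompleted event and the $link variable built as in the loop above):
$webclient = New-Object System.Net.WebClient
Register-ObjectEvent -InputObject $webclient -EventName OpenReadCompleted -SourceIdentifier WcDone | Out-Null
$webclient.OpenReadAsync([Uri]$link)
Wait-Event -SourceIdentifier WcDone | Remove-Event
$filesize = $webclient.ResponseHeaders["Content-Length"]
Unregister-Event -SourceIdentifier WcDone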
