PoshRSJob Looping through file directory - multithreading

I am trying to loop through a directory (sorted smallest file first), get the path and the file name, and then feed those results into a utility.exe program.
I am trying to do this with multithreading via PoshRSJob, but I am not even seeing the utility program show up in Task Manager. Instead I get the error "A null key is not allowed in a hash literal." for every file that exists (if 50 files are in the directory, I get 50 errors). I also cannot test whether the throttling works, because nothing is actually running.
Import-Module C:\PoshRSJob.psm1
Function MultiThread($SourcePath,$DestinationPath,$CommandArg, $MaxThreads){
if($CommandArg -eq "import") {
$fileExt = "txt"
}else{
$fileExt = "ini"
}
$ScriptBlock = {
Param($outfile, $cmdType, $fileExtension)
[pscustomobject] @{
#get the full path
$filepath = $_.fullname
#get file name (minus extension)
$filename = $_.basename
#build output directory
$destinationFile = "$($outfile)\$($filename).$($fileExtension)"
#command to run
$null = .\utility.exe $cmdType -source `"$filepath`" -target `"$destinationFile`"
}
}
#get the object of the passed source directory, and pipe it into start-rsjob
Get-ChildItem $SourcePath | Sort-Object length | Start-RSJob -ScriptBlock $ScriptBlock -ArgumentList $DestinationPath, $CommandArg, $fileExt -Throttle $MaxThreads
Wait-RSJob -ShowProgress | Receive-RSJob
Get-RSJob | Receive-RSJob
}
MultiThread "D:\input" "D:\output" "import" 3

Your scriptblock is creating an object where you are defining $null = .\utility.exe ... as a property. As the error says, $null (nothing) can't be a property name. I would suggest just running the lines directly instead of wrapping them in a hash literal.
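A minimal repro of that error, for reference:
# inside @{ }, every entry must be 'key = value'; a statement like
# '$filepath = $_.fullname' makes PowerShell evaluate $filepath (not yet
# defined, i.e. $null) as the key, hence the error:
[pscustomobject]@{
$filepath = 'some value' # -> A null key is not allowed in a hash literal.
}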
You might also want to change the Wait-RSJob part. You don't pipe or specify any jobs to it, so it never waits for anything.
Try changing the script to:
Import-Module C:\PoshRSJob.psm1
Function MultiThread($SourcePath,$DestinationPath,$CommandArg, $MaxThreads){
if($CommandArg -eq "import") {
$fileExt = "txt"
}else{
$fileExt = "ini"
}
$ScriptBlock = {
Param($outfile, $cmdType, $fileExtension)
#get the full path
$filepath = $_.fullname
#get file name (minus extension)
$filename = $_.basename
#build output directory
$destinationFile = "$($outfile)\$($filename).$($fileExtension)"
#command to run
$null = .\utility.exe $cmdType -source `"$filepath`" -target `"$destinationFile`"
}
#get the object of the passed source directory, and pipe it into start-rsjob
Get-ChildItem $SourcePath | Sort-Object length | Start-RSJob -ScriptBlock $ScriptBlock -ArgumentList $DestinationPath, $CommandArg, $fileExt -Throttle $MaxThreads
Get-RSJob | Wait-RSJob -ShowProgress | Receive-RSJob
}
MultiThread "D:\input" "D:\output" "import" 3

Related

How to use dash or hyphen to join a string array in Powershell

I wrote code like this:
$count = 0
$path = "C:\Videos\"
$oldvids = Get-ChildItem -Path $path -Include *.* -Recurse
foreach ($oldvid in $oldvids) {
$curpath = $oldvid.DirectoryName
$name = [System.IO.Path]::GetFileNameWithoutExtension($oldvid)
$names = $name.Split(" - ")
$names[0] = ""
$metadata_title = $names -join "-"
$ext = [System.IO.Path]::GetExtension($oldvid)
if ($name.StartsWith("new_") -eq $false)
{
$newvid = $curpath + "/new_" + $name + ".mp4"
if ([System.IO.File]::Exists($newvid) -eq $false)
{
$count++
Write-Output $metadata_title
}
}
}
But this code causes a file name like this:
Chapter 1 - New Video
to become:
Chapter 1---New Video
How can I make sure a single - is actually only one? Do I have to escape it?
The idea is to eliminate first part of the file names, so from:
01 - Chapter 1 - Video 1
to:
Chapter 1 - Video 1
So I wanted to split using " - " and then join everything back without the first element in the split array.
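For background on what is happening: in Windows PowerShell, $name.Split(" - ") binds to the char[] overload of String.Split, so it splits on every space and every hyphen individually, and the empty fragments between consecutive delimiter characters become extra hyphens when you -join. The -split operator takes a pattern instead, so it treats " - " as a single delimiter:
$name = '01 - Chapter 1 - Video 1'
# String.Split(" - ") splits on ' ', '-' and ' ' as individual characters:
$name.Split(" - ") # -> 01, '', '', Chapter, 1, '', '', Video, 1
# the -split operator treats ' - ' as one delimiter:
$name -split ' - ' # -> 01, Chapter 1, Video 1
# drop the first element and re-join:
($name -split ' - ' | Select-Object -Skip 1) -join ' - ' # -> Chapter 1 - Video 1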
Looking at your example and your explanation of changing metadata with ffmpeg on each file, I guess this is what you need:
$count = 0
$path = 'C:\Videos'
# get a list of old video files (these do not start with 'new_')
$oldvids = Get-ChildItem -Path $path -Filter *.mp4 -File -Recurse |
Where-Object { $_.Name -notmatch '^new_' }
foreach ($oldvid in $oldvids) {
# if the file is called 'C:\Videos\01 - Chapter 1 - Video 1.mp4'
$tempName = $oldvid.Name -replace '^\d+\s*-\s*(.+)', 'new_$1' # --> new_Chapter 1 - Video 1.mp4
# or do
# $tempName = 'new_' + ($oldvid.Name -split '-', 2)[-1].Trim() # --> new_Chapter 1 - Video 1.mp4
# or
# $tempName = $oldvid.Name -replace '^\d+\s*-\s*', 'new_' # --> new_Chapter 1 - Video 1.mp4
# combine the current file path with the temporary name
$outputFile = Join-Path -Path $oldvid.DirectoryName -ChildPath $tempName
#######################################################################
# next do your ffmpeg command to change metadata
# for input you use $oldvid.FullName and for output you use $outputFile
Write-Host "Updated file $($oldvid.Name) as $tempName"
#######################################################################
# when done with ffmpeg, delete the original (or for safety move it to somewhere else)
Write-Host "Deleting file '$($oldvid.Name)'"
$oldvid | Remove-Item -WhatIf
# and rename the updated file by removing the 'new_' part from its name
$newName = ($tempName -replace '^new_').Trim()
Write-Host "Renaming updated file to '$newName'"
Rename-Item -Path $outputFile -NewName $newName
# all done, proceed with the next file
$count++
}
Note: I have added switch -WhatIf to the Remove-Item line. This is a safety measure that will only display what file would be deleted without actually deleting it.
If you are sure the correct file will be deleted, remove that -WhatIf switch so the original file gets destroyed after manipulating it with ffmpeg.
As per your comment, to send items to the Recycle Bin instead of destroying them like Remove-Item does, here are two ways of achieving that:
Method 1: Use COM
function RemoveTo-RecycleBin {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
[Alias('FullName')]
[string[]]$Path
)
begin {
$shell = New-Object -ComObject 'Shell.Application'
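# 0xa (ssfBITBUCKET) is the Shell special-folder constant for the Recycle Bin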
$Recycler = $Shell.NameSpace(0xa)
}
process {
foreach ($item in $Path) {
[void]$Recycler.MoveHere($item)
}
}
end {
# clean-up the used COM objects
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Recycler)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($shell)
$null = [System.GC]::Collect()
$null = [System.GC]::WaitForPendingFinalizers()
}
}
# usage example, remove all files from the D:\Test directory
Get-ChildItem -Path 'D:\Test' -Filter '*.*' -File | RemoveTo-RecycleBin
# usage example, remove all files and subdirectories from the D:\Test directory
Get-ChildItem -Path 'D:\Test' | RemoveTo-RecycleBin
Method 2: Use the Microsoft.VisualBasic assembly
function RemoveTo-RecycleBin {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
[Alias('FullName')]
[string[]]$Path,
[switch]$ShowConfirmationDialog
)
begin {
Add-Type -AssemblyName Microsoft.VisualBasic
$showUI = if ($ShowConfirmationDialog) { 'AllDialogs' } else { 'OnlyErrorDialogs' }
}
process {
foreach ($item in $Path) {
Write-Host $item
# detect if this is a file or a directory
if ((Get-Item -Path $item) -is [System.IO.DirectoryInfo]) {
# first parameter: the absolute full path
# second parameter: one of Microsoft.VisualBasic.FileIO.UIOption values: OnlyErrorDialogs or AllDialogs
# third parameter: one of Microsoft.VisualBasic.FileIO.RecycleOption values: DeletePermanently or SendToRecycleBin
[Microsoft.VisualBasic.FileIO.FileSystem]::DeleteDirectory($item, $showUI, 'SendToRecycleBin')
}
else {
# first parameter: the absolute full path and file name
# second parameter: one of Microsoft.VisualBasic.FileIO.UIOption values: OnlyErrorDialogs or AllDialogs
# third parameter: one of Microsoft.VisualBasic.FileIO.RecycleOption values: DeletePermanently or SendToRecycleBin
[Microsoft.VisualBasic.FileIO.FileSystem]::DeleteFile($item,$showUI, 'SendToRecycleBin')
}
}
}
}
# usage example, remove all files from the D:\Test directory
Get-ChildItem -Path 'D:\Test' -Filter '*.*' -File | RemoveTo-RecycleBin
# usage example, remove all files and subdirectories from the D:\Test directory
Get-ChildItem -Path 'D:\Test' | RemoveTo-RecycleBin
Just choose either of the above functions, put it at the top of your script and then change the line
$oldvid | Remove-Item -WhatIf
into
$oldvid | RemoveTo-RecycleBin

I am parsing RoboCopy logs from millions of files, how can I make my code run faster?

New to StackOverflow, I'll do my best to post correctly :)
Hoping someone can help me to get my code running faster.
The code is run against RoboCopy Migration logs from a massive DFS server migration (20 DFS servers being migrated).
The code first captures the source/destination of the log in question and then looks for the 'Newer', 'Older', 'New File' and 'Extra File' entries/rows. It then checks to see if these files exist at each side, what attributes they have and does a DFSR hash check against both sides (as the files are now being replicated via DFSR).
The main concern is if the hashes match for source and destination and if the temporary attribute is in place.
The problem I am having is that there are millions of files logged under these types (the migration was gargantuan), so the script is taking forever to run. To add to this, the client will not allow the ports needed for PSRemoting/Invoke-Command.
At present I am running my code without multithreading, with a copy on each of the DFS servers looking at their respective logs, but it is still slow.
I have been looking at running a foreach -parallel on the loop through each log row (not the loop over log files), but:
With so much data within each log/loop, my understanding is that I have to write results out rather than keep them in a PSCustomObject, otherwise I would run out of RAM?
I don't really understand how to use mutexes to get multiple writers appending to the same CSV (a sketch of that pattern is at the end of this section).
Can someone please advise me on the above 2 points? And maybe give me some more ideas on what I can do to optimise things?
My full code is below..
#Get Start Time
$ReportStartTime = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
If(!(test-path "C:\Temp\MasterReport_$ReportStartTime\")){
new-item -type directory -path "C:\Temp\MasterReport_$ReportStartTime\" | Out-Null
}
"Script Started:$ReportStartTime" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
#Get Logs from folder (Recursive)
$Logs = Try{
Get-ChildItem -path 'C:\Temp\RoboCopyLogs\*\*.log' -Recurse -ErrorAction Stop | Select FullName
}
catch{
$_.Exception >> "C:\Temp\MasterReport_$ReportStartTime\Errors_$ReportStartTime.log"
}
#Initialise Log Counters
$NumberOfFiles = 0
$DesktopFile = 0
$ProcessedFiles = 0
$Totalsize = 0
#Count Logs
$Logs | foreach {
$SourceLog = $_
#Get Logfile
$Log = Get-Content $SourceLog.FullName
#Get Log rows for required Error Types and begin loop
$Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' `
|foreach {
$NumberOfFiles=$NumberOfFiles+1
If($_ | Select-String -pattern 'Desktop.ini' -SimpleMatch){
$DesktopFile=$DesktopFile+1
}
}
}
$Expected = $NumberOfFiles - $DesktopFile
"Total Files To Check = $NumberOfFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files Excluded = $DesktopFile" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files To Ingest = $Expected" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
$Main = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
"Main Script:$Main" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
$Logs | foreach {
$SourceLog = $_
#Get Logfile
$Log = Get-Content $SourceLog.FullName
#Collect Source and Destination
$S = $Log | Select-String -Pattern 'Source :'
$D = $Log | Select-String -Pattern 'Dest :'
$SourceLocation = $S -replace '\s+Source : ',''
$DestLocation = $D -replace '\s+Dest : ',''
#Get Log rows for required Error Types and begin loop
$Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' | Select-String -pattern 'Desktop.ini' -SimpleMatch -NotMatch `
|foreach {
#This loop could be a foreach -parallel???
#Check Percent Completed
If($ProcessedFiles -gt 0){ # '-gt', because '>' would be parsed as output redirection
$PercentComplete=[Math]::Ceiling(($ProcessedFiles/$Expected)*100)
If($PercentComplete -match ('([0-9]0)')){
"$($PercentComplete)% Completed" > "C:\Temp\MasterReport_$ReportStartTime\PercentComplete.Log"
($ProcessedFiles/$Expected)*100
}
}
#Count Logs Processed
$ProcessedFiles=$ProcessedFiles+1
#Populate FilePath
$FilePath = $_ -Replace '.*(?=\\\\)', ''
#Populate Error type
$RoboErrorRaw = $_ -replace '\s+','|'
$RoboError = $RoboErrorRaw.split("|")[1]
#Check if file path relates to Source or the Destination and set path variables
if($FilePath -like "$SourceLocation*"){
$SourceFilePath = $FilePath
$DestFilePath = $FilePath.replace($SourceLocation,$DestLocation)
}
Elseif($FilePath -like "$DestLocation*"){
$DestFilePath = $FilePath
$SourceFilePath = $FilePath.replace($DestLocation,$SourceLocation)
$IsAtPartner = Test-Path $SourceFilePath
}
Else{
$DestFilepath = "Could Not Resolve UNC to Source or Destination"
}
#Check if file exists at source and destination
Try{
$IsAtPartner = Test-Path $DestFilePath -ErrorAction Stop
}
catch{
$IsAtPartner = $_.Exception
}
Try{
$IsAtSource = Test-path $SourceFilePath -ErrorAction Stop
}
catch{
$IsAtSource = $_.Exception
}
If($IsAtSource){
#Get the file details
Try{
$SourceFileDetails = Get-ChildItem $FilePath -Hidden -ErrorAction Stop
}
catch{
$SourceFileDetails = 'Failed'
}
if($SourceFileDetails -ne 'Failed'){
#Check has temp attribute
if((($SourceFileDetails).Attributes -band 0x100) -eq 0x100){
$TempAttribute = "Yes"
}
Else{
$TempAttribute = "No"
}
#Get attributes and last modified
Try{
$AllAttributes = ($SourceFileDetails).Attributes
}
catch{
$AllAttributes = $_.Exception
}
Try{
$Modified = ($SourceFileDetails).LastWriteTime.ToString()
}
catch{
$Modified = $_.Exception
}
}
}
#Check if .bak file
if($filePath -match '\.bak$'){
$Bakfile = "Yes"
}
Else{
$Bakfile = "No"
}
#Get Hashes
If($IsAtPartner -and $IsAtSource){
$HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
$HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
}
ElseIf(!$IsAtSource -and !$IsAtPartner){
$HashSource = 'File Does not Exist at Source'
$HashDest = 'File Does not Exist At Partner'
}
ElseIf(!$IsAtPartner){
$HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
$HashDest = 'File Does not Exist At Partner'
}
ElseIf(!$IsAtSource){
$HashSource = 'File Does not Exist at Source'
$HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
}
Else{
$HashSource = 'ERROR'
$HashDest = 'ERROR'
}
#Compare Valid Hashes
If($HashSource -eq $HashDest){
$HashMatch = 'Yes'
}
Else{
$HashMatch = 'No'
}
#Check Filesize where hashes do not match
If($HashMatch -eq 'No'){ # '-eq', because '=' would assign instead of compare
$FileSizeMB = ($SourceFileDetails).length/1MB
}
#Create output object
$Obj = [PSCustomObject]@{
ErrorType = $RoboError
FilePath = $SourceFilePath
PartnerUNC = $DestFilePath
IsAtSource = $IsAtSource
IsAtDestination = $IsAtPartner
BakFile = $Bakfile
TempAttribute = $TempAttribute
LastModified = $Modified
AllAttributes = $AllAttributes
HashSource = $HashSource
HashDest = $HashDest
HashMatch = $HashMatch
RoboSource = $SourceLocation
RoboDest = $DestLocation
FileSizeMB = $FileSizeMB
SourceLog = $SourceLog.FullName
}
$Source = $SourceLocation.split('\\')[2]
$Destination = $DestLocation.split('\\')[2]
if(!(test-path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)")){
new-item -type directory -path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)" | Out-Null
}
#export to csv
$obj | Export-Csv -Path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
$obj | Export-Csv -Path "C:\Temp\MasterReport_$ReportStartTime\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
#Increment total size of data
If($HashMatch -eq "Yes"){
$Totalsize = $Totalsize + $SourceFileDetails.Length
}
clear-variable -name RoboError,SourceFilePath,DestFilePath,IsAtSource,IsAtPartner,Bakfile,TempAttribute,Modified,AllAttributes,HashSource,HashDest,HashMatch,FileSizeMB,Source,Destination
if($SourceFileDetails){
Remove-Variable -name SourceFileDetails
}
}
}
$Completion = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
"Script Completed:$Completion Excluded Processed = $DesktopFile ,Total Processed = $ProcessedFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Files without Matching Hashses amount to $($Totalsize/1GB)GB" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
Here is some example log data (it could be put in C:\Temp\RoboCopyLogs\Logs\ to run with the above code):
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : 24 April 2022 17:29:57
Source : \\Test01\
Dest : \\Test02\
Files : *.*
Exc Files : ~*.*
*.TMP
Exc Dirs : \\Test01\DfsrPrivate
Options : *.* /FFT /TS /L /S /E /DCOPY:DA /COPY:DAT /PURGE /MIR /B /NP /XJD /MT:8 /R:0 /W:0
------------------------------------------------------------------------------
Newer 30720 2021/07/20 14:49:36 \\Test01\Test2121.xls
Older 651776 2020/10/25 21:49:32 \\Test01\testppt.ppt
Older 94720 2019/06/10 11:46:03 \\Test01\Thumbs.db
*EXTRA File 1.7 m 2020/09/17 10:36:57 \\Test02\months.jpg
*EXTRA File 1.8 m 2020/09/17 10:36:57 \\Test02\happy.jpg
New File 6421 2020/10/26 10:32:43 \\Test01\26-10-20.pdf
New File 6321 2020/10/26 10:32:43 \\Test01\Testing20.pdf
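On the mutex question above: the usual pattern is to open the mutex by name inside each parallel worker and hold it only around the Export-Csv -Append call, so the hash checks still run concurrently and only the file write is serialized. A minimal sketch, with an illustrative mutex name and output path (not code from the post):
# open (or create) the named, system-wide mutex inside each worker;
# nothing needs to be passed across runspace or process boundaries
$mutex = New-Object System.Threading.Mutex($false, 'Global\RoboLogCsvMutex')
[void]$mutex.WaitOne() # block until this worker owns the mutex
try {
$Obj | Export-Csv -Path "C:\Temp\MasterReport\RoboCopyLogChecks.csv" -NoTypeInformation -Append
}
finally {
$mutex.ReleaseMutex() # always release, even if Export-Csv throws
}
An alternative that avoids locking altogether is to give each worker its own temporary CSV and concatenate the pieces once at the end of the run.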

Excel.Application: Microsoft Excel cannot access the file '[<filename>]' There are several possible reasons:

I have a PowerShell script that works; it helps me run multiple queries against multiple servers, save each output in a different CSV, and then merge them together into an Excel file.
$Servers = get-content -Path "Servers.txt"
$DatabaseName ="master"
#$credential = Get-Credential #Prompt for user credentials
$secpasswd = ConvertTo-SecureString "MyPassword" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential ("sa", $secpasswd)
$QueriesFolder = "Queries\"
$ResultFolder = "Results\"
ForEach($Server in $Servers)
{
$DateTime = (Get-Date).tostring("yyyy-MM-dd")
ForEach ($filename in get-childitem -path $QueriesFolder -filter "*.sql" | sort-object {if (($i = $_.BaseName -as [int])) {$i} else {$_}} )
{
$oresults = invoke-sqlcmd -ServerInstance $Server -Database $DatabaseName -Credential $credential -InputFile $filename.fullname
write-host "Executing $filename on $Server"
$BaseNameOnly = Get-Item $filename.fullname | Select-Object -ExpandProperty BaseName
$oresults | export-csv $ResultFolder$BaseNameOnly.csv -NoTypeInformation -Force
}
$All_CSVs = get-childitem -path $ResultFolder -filter "*.csv" | sort-object {if (($i = $_.BaseName -as [int])) {$i} else {$_}}
$Count_CSVs = $All_CSVs.Count
Write-Host "Detected the following CSV files: ($Count_CSVs)"
Write-Host " "$All_CSVs.Name"`n"
$ExcelApp = New-Object -ComObject Excel.Application
$ExcelApp.SheetsInNewWorkbook = $All_CSVs.Count
$output = "C:\Users\FrancescoM\Desktop\CSV\Results\" + $Server + " $DateTime.xlsx"
if (Test-Path $output)
{
Remove-Item $output
Write-Host Removing: $output because it exists already
}
$xlsx = $ExcelApp.Workbooks.Add()
for($i=1;$i -le $Count_CSVs;$i++)
{
$worksheet = $xlsx.Worksheets.Item($i)
$worksheet.Name = $All_CSVs[$i-1].Name
$file = (Import-Csv $All_CSVs[$i-1].FullName)
$file | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | Clip
$worksheet.Cells.Item(1).PasteSpecial()|out-null
}
$xlsx.SaveAs($output)
Write-Host Creating: $output
$ExcelApp.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsx) | Out-Null;
Write-Host "Closing all worksheet"
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($ExcelApp) | Out-Null;
Write-Host "Closing Excel"
[System.GC]::Collect();
[System.GC]::WaitForPendingFinalizers()
Remove-Item "$ResultFolder\*" -Include *.csv
Write-Host "Cleaning all *.csv"
Start-Sleep -Seconds 3
}
In order to make this script more portable I want all the paths mentioned into it to be stored into a variable and then concatenated.
But as soon as I change:
$output = "C:\Users\FrancescoM\Desktop\CSV\Results\" + $Server + " $DateTime.xlsx"
into:
$output = $ResultFolder + $Server + " $DateTime.xlsx"
things get nasty and I receive the error:
Microsoft Excel cannot access the file 'C:\Users\FrancescoM\Documents\Results\0DC80000'.
There are several possible reasons:
• The file name or path does not exist.
• The file is being used by another program.
• The workbook you are trying to save has the same name as a currently open workbook.
At C:\Users\FrancescoM\Desktop\CSV\QueryLauncher.ps1:50 char:2
+ $xlsx.SaveAs($output)
+ ~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException
I don't understand, I think I'm concatenating things right.
I also followed this StackOverflow post and restarted my computer after adding "C:\Windows\SysWOW64\config\systemprofile\desktop" but the problem isn't fixed.
How can a variable path mess things up with Excel?
Because you are not defining the full path in the $ResultFolder variable, it gets expanded against a default location rather than the folder you intend (see P.S.2 below about Excel's default file path).
Just look at the path you want it to be:
"C:\Users\FrancescoM\Desktop\CSV\Results\" + $Server + " $DateTime.xlsx"
and the resulting path using the partial $ResultFolder variable:
C:\Users\FrancescoM\Documents\Results\0DC80000
Since you want the output file in a folder on your desktop, set the $output to
$output = Join-Path $([Environment]::GetFolderPath("Desktop")) "CSV\Results\$Server $DateTime.xlsx"
EDIT
From your last comment I understand that you want the output to be in a subfolder called "Results" that resides inside the folder the script itself is in.
In that case do this:
# get the folder this script is running from
$ScriptFolder = if ($PSScriptRoot) { $PSScriptRoot } else { Split-Path $MyInvocation.MyCommand.Path }
# the following paths are relative to the path this script is in
$QueriesFolder = Join-Path -Path $ScriptFolder -ChildPath 'Queries'
$ResultFolder = Join-Path -Path $ScriptFolder -ChildPath 'Results'
# make sure the 'Results' folder exists; create if not
if (!(Test-Path -Path $ResultFolder -PathType Container)) {
New-Item -Path $ResultFolder -ItemType Directory | Out-Null
}
Then, when it becomes time to save the xlsx file, create the full path and filename using:
$output = Join-Path -Path $ResultFolder -ChildPath "$Server $DateTime.xlsx"
$xlsx.SaveAs($output)
P.S. I advise using the Join-Path cmdlet to combine file paths, or making use of [System.IO.Path]::Combine(), instead of concatenating paths the way you do in the line $oresults | export-csv $ResultFolder$BaseNameOnly.csv. Plain concatenation can lead to unforeseen path names if you ever forget to postfix the first path part with a backslash.
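A quick illustration of that pitfall (the file name is made up):
$ResultFolder = 'Results' # note: no trailing backslash
$ResultFolder + 'query1.csv' # -> 'Resultsquery1.csv' (parts silently merged)
Join-Path -Path $ResultFolder -ChildPath 'query1.csv' # -> 'Results\query1.csv'
[System.IO.Path]::Combine($ResultFolder, 'query1.csv') # -> 'Results\query1.csv'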
P.S.2 Excel has its own default output path, set in Tools->Options->General->Default File Location, and it has no idea of paths relative to the script. This is why you should save using a full path and filename.

Decreased output with PowerShell multithreading than with singlethread script

I am using PowerShell 2.0 on a Windows 7 desktop. I am attempting to search the enterprise CIFS shares for keywords/regex matches. I already have a simple single-threaded script that will do this, but a single keyword takes 19-22 hours. I have created a multithreaded script, my first effort at multithreading, based on the article by Surly Admin.
Can Powershell Run Commands in Parallel?
Powershell Throttle Multi thread jobs via job completion
and the links related to those posts.
I decided to use runspaces rather than background jobs, as the prevailing wisdom says this is more efficient. The problem is that I am only getting partial output from the multithreaded script I have. I am not sure if it is an I/O thing, a memory thing, or something else. Hopefully someone here can help. Here is the code.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$KeywordInfo = Select-String -pattern KEYWORD -AllMatches -InputObject $File
$KeywordOut = New-Object PSObject -Property @{
Matches = $KeywordInfo.Matches
Path = $KeywordInfo.Path
}
Return $KeywordOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host "All jobs completed!"
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
$Job.Pipe.EndInvoke($Job.Result) | Where {$_.Path} | Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Invoke-Item C:\Users\user\Desktop\results.txt
Get-Date
This is the single-threaded version I am using that works, including the regex I am using for socials (SSNs).
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($file in $files) {
Select-String -pattern '[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}' -AllMatches -InputObject $file | Select-Object matches, path |
Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Get-Date
Invoke-Item C:\Users\user\Desktop\results.txt
I am hoping to build this answer over time as I don't want to over-comment. I don't know yet why you are losing data from the multithreading, but I think we can increase performance with an updated regex. For starters, you have several greedy quantifiers that I think we can tighten up.
[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}
Select-String is case-insensitive by default, so you don't need the [sS][sS][nN] portion at the beginning. Do you have to check for multiple colons? :* matches 0 or many colons, and the same goes for the hyphens. Perhaps these would be better as ?, which matches 0 or 1.
ssn:?\s*\d{3}-?\d{2}-?\d{4}
This assumes you are looking for mostly properly formatted SSNs. If people are hiding them in text, maybe you need to look for other delimiters as well.
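A quick way to sanity-check the tightened pattern (the sample strings are made up):
# Select-String is case-insensitive by default, so 'SSN' matches too
'SSN: 123-45-6789', 'ssn 123456789', 'no match here' |
Select-String -Pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}'
# -> the first two strings match; the third does not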
I would also suggest writing the output to separate files and maybe combining them after execution. If nothing else, just to test.
Hoping this will be the start of a proper solution.
It turns out that for some reason the Select-String cmdlet was having problems with the multithreading. I don't have enough of a developer background to tell what is happening under the hood. However, I did discover that by using the -Quiet option of Select-String, which turns it into a boolean output, I was able to get the results I wanted.
The first pattern match in each document gives a true value. When I get a true, I return the path of the document to an array. When that is finished, I run the pattern match against the paths that were output from the scriptblock. This is not quite as effective performance-wise as I had hoped, but still a pretty dramatic improvement over a single thread.
The other issue I ran into was the reads/writes to disk caused by outputting results to a document at each stage. I have changed that to arrays. While still memory intensive, it is much quicker.
Here is the resulting code. Any additional tips on performance improvement are appreciated:
cls
Remove-Item C:\Users\user\Desktop\output.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$Match = Select-String -pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}' -Quiet -InputObject $File
if ( $Match -eq $true ) {
$MatchObjects = Select-Object -InputObject $File
$MatchOut = New-Object PSObject -Property @{
Path = $MatchObjects.FullName
}
}
Return $MatchOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -Path I:\ -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
}
$PathValue = @()
ForEach ($Line in $Results) {
$PathValue += $Line.psobject.properties | % {$_.Value}
}
$UniqValues = $PathValue | sort | Get-Unique
$Output = ForEach ( $Path in $UniqValues ) {
Select-String -Pattern '\d{3}-?\d{2}-?\d{4}' -AllMatches -Path $Path | Select-Object -Property Matches, Path
}
$Output | Out-File -FilePath C:\Users\user\Desktop\output.txt -Append -Encoding UTF8 -Width 512
Invoke-Item C:\Users\user\Desktop\output.txt

Powershell v2.0 Using multiple threads

Basic script idea:
Hello. I've created a PowerShell script which I use to check the file sizes of certain executables and keep them in a text file. The next time the script runs, if a file size differs it replaces the one in the text file with the new one.
The structure:
I have a main script and a folder which contains many scripts, each for every executable of which I want to check the filesize. So the scripts in the folder will return a string containing the link to the executable, which will be fed to the main script.
The code:
$progdir = "C:\script\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\updatechecker\programms\filesizes
if ($filesizes.length -ne $items.length) {
if ($filesizes.length -eq $null) {
Write-Host ("Building filesize database...") -nonewline
}
else {
Write-Host ("Rebuilding filesize database...") -nonewline
}
clear-content C:\programms\filesizes
for ($i=0; $i -le $items.length-1; $i++) {
$command = "c:\programms\" + $items[$i].name
$link = & $command
$webclient.OpenRead($link) | Out-Null
$filesize = $webclient.ResponseHeaders["Content-Length"]
$filesize >> C:\programms\filesizes
}
echo "Done."
}
else {
...
Question:
This for loop is the one I want to run in parallel. I need your advice on how to do this since I'm new to powershell. I tried to implement a few things I found but they didn't work correctly (took very long to finish, output errors, multiple entries of filesizes in my filesizes file). I suspect it's a synchronization issue and somehow I need to lock the critical parts. Isn't there anything like omp parallel for in powershell? :P
Any help,advice on how to achieve this would be appreciated :)
edit:
Get-Job | Remove-Job -Force
$progdir = "C:\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\programms\filesizes
$jobWork = {
param ($MyInput)
$command = "c:\programms\" + $MyInput
$link = & $command
$webclient.OpenRead($link) | Out-Null
$filesize = $webclient.ResponseHeaders["Content-Length"]
$filesize >> C:\programms\filesizes
}
foreach ($item in $items) {
Start-Job -ScriptBlock $jobWork -ArgumentList $item.name | out-null
}
Get-Job | Wait-Job
Get-Job | Receive-Job | Out-GridView | out-null
echo "Done."
Edit 2: Used code I found here: http://ryan.witschger.net/?p=22
$mutex = new-object -TypeName System.Threading.Mutex -ArgumentList $false, "RandomGlobalMutexName";
$MaxThreads = 4
$SleepTimer = 500
$jobWork = {
param ($MyInput)
$webclient = New-Object System.Net.WebClient
$command = "c:\programms\" + $MyInput
$link = & $command
$webclient.OpenRead($link) | Out-Null
$result = $mutex.WaitOne();
$file = $webclient.ResponseHeaders["Content-Length"]
$file >> C:\programms\filesizes
$mutex.ReleaseMutex();
}
$progdir = "C:\programms"
$items = Get-ChildItem -filter *.ps1 -Path $progdir
$webclient = New-Object System.Net.WebClient
$filesizes = get-content C:\programms\filesizes
Get-Job | Remove-Job -Force
$i = 0
ForEach ($item in $items){
While ($(Get-Job -state running).count -ge $MaxThreads){
Start-Sleep -Milliseconds $SleepTimer
}
$i++
Start-Job -ScriptBlock $jobWork -ArgumentList $item.name | Out-Null
}
You can run each iteration of the loop in a background job, which is not the same as a separate thread in that it is a whole other PowerShell.exe process. Data is passed back from the background processes through serialization.
To approach it using background jobs, you'll need to define a script block that does the actual work and then call the script block with parameters in each iteration of the loop. The script block can report status back via Write-Output or by throwing an exception.
You'll probably want to throttle how many concurrent background jobs are running. Here's an example of how to throttle so that a job is still started for every item:
$jobItems = "a", "b", "c", "d", "e"
$jobMax = 2
$jobs = @()
$jobWork = {
param ($MyInput)
if ($MyInput -eq "d") {
throw "an example of an error"
} else {
write-output "Processed $MyInput"
}
}
foreach ($jobItem in $jobItems) {
# wait while the maximum number of jobs is still running, then start the next one
while (@(Get-Job -State Running).Count -ge $jobMax) {
Start-Sleep -Milliseconds 250
}
$jobs += Start-Job -ScriptBlock $jobWork -ArgumentList $jobItem
}
$jobs | Wait-Job
As an alternative you might try eventing. Take a look at this thread for some examples of how to implement concurrency using events.
PowerShell: Runspace problem with DownloadFileAsync
You might be able to replace DownloadFileAsync with OpenReadAsync
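A rough sketch of that event-driven variant, still grabbing the Content-Length header like the original loop does (the URL is a placeholder; in practice it would come from one of the .ps1 scripts):
$webclient = New-Object System.Net.WebClient
# run this handler when the asynchronous open completes
Register-ObjectEvent -InputObject $webclient -EventName OpenReadCompleted -Action {
$size = $Event.Sender.ResponseHeaders["Content-Length"]
$size >> C:\programms\filesizes
$EventArgs.Result.Close() # close the response stream
} | Out-Null
$webclient.OpenReadAsync([Uri]'http://example.com/setup.exe')
Note this still needs a loop over the items and something to wait until all events have fired before the script exits.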
