using threads to delete files with specific extensions ignoring files over a certain date - multithreading

In my profession I make forensic images from "foreign" PCs which I extract later on my local storage.
To clean up the data I'd hope to delete all files that aren't relevant for me. (not limited to: audio, movies, systemfiles,...)
Since we're speaking of multiple TB of data, I'd have hoped to use threads, especially since my storage is all flash and the limitation on the disk is somewhat less of a problem.
To speed the process up after an initial manual run, I would want the script to exclude files older than 1 day (since I have already handled those in the manual run).
What I have so far:
$IncludeFiles = "*.log", "*.sys", "*.avi", "*.mpg", "*.mkv", "*.mp3", "*.mp4",
"*.mpeg", "*.mov", "*.dll", "*.mof", "*.mui", "*.zvv", "*.wma",
"*.wav", "*.MPA", "*.MID", "*.M4A", "*.AIF", "*.IFF", "*.M3U",
"*.3G2", "*.3GP", "*.ASF", "*.FLV", "*.M4V", "*.RM", "*.SWF",
"*.VOB"
$ScriptBlock = {
Param($mypath = "D:\")
Get-ChildItem -Path $mypath -Recurse -File -Include $file | Where-Object {
$_.CreationTime -gt (Get-Date).AddDays(-1)
}
}
foreach ($file in $IncludeFiles) {
Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file
}
Get-Job | Wait-Job
$out = Get-Job | Receive-Job
Write-Host $out
The only thing that doesn't work is the restriction to files "younger" than 1 day. If I run the script without it, it seems to work perfectly (it gives me a list of files with the extensions I want to remove).

Parameter passing doesn't work the way you seem to expect. Param($mypath = "D:\") defines a parameter mypath with a default value of D:\. That default value is superseded by the value you pass into the scriptblock via -ArgumentList. Also the variable $file inside the scriptblock and the variable $file outside the scriptblock are not the same. Because of that an invocation
Start-Job -ScriptBlock $ScriptBlock -ArgumentList '*.log'
will run the command
Get-ChildItem -Path '*.log' -Recurse -File -Include $null | ...
Change your code to something like this to make it work:
$ScriptBlock = {
Param($extension)
$mypath = "D:\"
Get-ChildItem -Path $mypath -Recurse -File -Filter $extension | Where-Object {
$_.CreationTime -gt (Get-Date).AddDays(-1)
}
}
foreach ($file in $IncludeFiles) {
Start-Job -ScriptBlock $ScriptBlock -ArgumentList $file
}
Get-Job | Wait-Job | Receive-Job
Using -Filter should provide better performance than -Include, but accepts only a single string (not a list of strings like -Include), so you can only filter one extension at a time.
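As a minimal sketch of the same idea, the variant below passes every value the job needs through -ArgumentList, so nothing depends on variables from the calling scope. The temp folder, the file names, and the parameter names ($path, $extension, $newerThan) are assumptions made for the demo; a throwaway folder stands in for D:\.

```powershell
# Hypothetical demo folder standing in for D:\
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("jobdemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
'a.log', 'b.mp3', 'c.txt' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $root $_) | Out-Null
}

$cutoff     = (Get-Date).AddDays(-1)
$extensions = '*.log', '*.mp3'

# One job per extension; arguments bind positionally to Param()
$jobs = foreach ($ext in $extensions) {
    Start-Job -ArgumentList $root, $ext, $cutoff -ScriptBlock {
        Param($path, $extension, $newerThan)
        Get-ChildItem -Path $path -Recurse -File -Filter $extension |
            Where-Object { $_.CreationTime -gt $newerThan } |
            Select-Object -ExpandProperty FullName
    }
}

# Wait once for all jobs, collect the paths, then clean up
$found = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
Remove-Item -LiteralPath $root -Recurse -Force
$found
```

Each job is a separate process, so for a small number of files the startup cost can outweigh any gain; the pattern pays off only when each job has a lot of work to do.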

Related

How to modify the file creation time of multiple files in single script

I'm stumped on this. I am making my way through a file share migration to SharePoint. There have been errors stating the "The item created time or modified time is not supported". No worries as I found a script to edit this in PowerShell:
cd "Directory"
Get-ChildItem -force | Select-Object Mode, Name, CreationTime, LastAccessTime, LastWriteTime | ft
$modifyfiles = Get-ChildItem -force | Where-Object {! $\_.PSIsContainer}
foreach($object in $modifyfiles)
{
$object.CreationTime=("1/3/2023 12:00:00")
$object.LastAccessTime=("1/3/2023 12:01:00")
$object.LastWritetime=("1/3/2023 12:02:00")
}
My question is how do I run this so I don't have to cd to each new directory every time. I have quite a few files in different folders that all need editing. I have the list of paths I need changed and I was hoping there would be a way to "pass" those paths in or somehow run this script in a loop.
Assuming your list of folders looks like this and can be placed in a separate text file:
C:\folderpath\folder1
C:\folderpath\folder2
C:\folderpath\folder3
Then you could just do something like this:
get-content -Path "C:\folderpath\FileContainingFolderPaths.txt" | ForEach-Object {
$folderpath = $_
Get-ChildItem -Path $folderpath -force | Select-Object Mode, Name, CreationTime, LastAccessTime, LastWriteTime | ft
$modifyfiles = Get-ChildItem -Path $folderpath -force | Where-Object {! $\_.PSIsContainer}
foreach($object in $modifyfiles)
{
$object.CreationTime=("1/3/2023 12:00:00")
$object.LastAccessTime=("1/3/2023 12:01:00")
$object.LastWritetime=("1/3/2023 12:02:00")
}
}
Also I'm gonna have to question you on the $\_.PSIsContainer, is that a mistake? I would think it should be $_.PSIsContainer instead?
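If the folder list is already in a variable rather than a text file, the same loop can be wrapped in a function. This is only a sketch: the function name Set-FolderFileTime and its parameters are made up for illustration, it applies one timestamp to all three properties for brevity, and it assumes PowerShell 3 or later for the -File switch.

```powershell
function Set-FolderFileTime {
    Param(
        [string[]]$Path,    # folders whose files should be re-stamped
        [datetime]$Stamp    # timestamp to apply
    )
    foreach ($folder in $Path) {
        # -File skips directories, replacing the PSIsContainer test
        Get-ChildItem -LiteralPath $folder -File -Force | ForEach-Object {
            $_.CreationTime   = $Stamp
            $_.LastAccessTime = $Stamp
            $_.LastWriteTime  = $Stamp
        }
    }
}

# Demo against two throwaway temp folders (hypothetical names)
$base    = Join-Path ([System.IO.Path]::GetTempPath()) ("stampdemo_" + [guid]::NewGuid())
$folders = 'folder1', 'folder2' | ForEach-Object { Join-Path $base $_ }
foreach ($f in $folders) {
    New-Item -ItemType Directory -Path $f | Out-Null
    New-Item -ItemType File -Path (Join-Path $f 'sample.txt') | Out-Null
}

$stamp = [datetime]'2023-01-03 12:00:00'
Set-FolderFileTime -Path $folders -Stamp $stamp
$writeTimes = Get-ChildItem -LiteralPath $base -Recurse -File |
    ForEach-Object { $_.LastWriteTime }
Remove-Item -LiteralPath $base -Recurse -Force
```

Calling it with the contents of the text file would then be Set-FolderFileTime -Path (Get-Content 'C:\folderpath\FileContainingFolderPaths.txt') -Stamp $stamp.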

Powershell | How can I use Multi Threading for my File Deleter Powershell script?

So I've written a Script to delete files in a specific folder after 5 days. I'm currently implementing this in a directory with hundreds of thousands of files and this is taking a lot of time.
This is currently my code:
#Variables
$path = "G:\AdeptiaSuite\AdeptiaSuite-6.9\AdeptiaServer\ServerKernel\web\repository\equalit\PFRepository"
$age = (Get-Date).AddDays(-5) # Defines the 'x days old' (today's date minus x days)
# Get all the files in the folder and subfolders | foreach file
Get-ChildItem $path -Recurse -File | foreach{
# if creationtime is 'le' (less or equal) than $age
if ($_.CreationTime -le $age){
Write-Output "Older than $age days - $($_.name)"
Remove-Item $_.fullname -Force -Verbose # remove the item
}
else{
Write-Output "Less than $age days old - $($_.name)"
}
}
I've searched around the internet for some time now to find out how to use
Runspaces, however I find it very confusing and I'm not sure how to implement it with this script. Could anyone please give me an example of how to use Runspaces for this code?
Thank you very much!
EDIT:
I've found this post: https://adamtheautomator.com/powershell-multithreading/
And ended up changing my script to this:
$Scriptblock = {
# Variables
$path = "G:\AdeptiaSuite\AdeptiaSuite-6.9\AdeptiaServer\ServerKernel\web\repository\equalit\PFRepository"
$age = (Get-Date).AddDays(-5) # Defines the 'x days old' (today's date minus x days)
# Get all the files in the folder and subfolders | foreach file
Get-ChildItem $path -Recurse -File | foreach{
# if creationtime is 'le' (less or equal) than $age
if ($_.CreationTime -le $age){
Write-Output "Older than $age days - $($_.name)"
Remove-Item $_.fullname -Force -Verbose # remove the item
}
else{
Write-Output "Less than $age days old - $($_.name)"
}
}
}
$MaxThreads = 5
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads)
$RunspacePool.Open()
$Jobs = @()
1..10 | Foreach-Object {
$PowerShell = [powershell]::Create()
$PowerShell.RunspacePool = $RunspacePool
$PowerShell.AddScript($ScriptBlock).AddArgument($_)
$Jobs += $PowerShell.BeginInvoke()
}
while ($Jobs.IsCompleted -contains $false) {
Start-Sleep 1
}
However, I'm not sure if this works correctly now. I don't get any errors, but the terminal doesn't show anything, so I'm not sure whether it works or just doesn't do anything.
I'd love any feedback on this!
The easiest answer is: get PowerShell v7.2.5 (look in the assets for PowerShell-7.2.5-win-x64.zip), download and extract it. It's a no-install PowerShell 7 which has easy multithreading and lets you change foreach { to foreach -parallel {. The executable is pwsh.exe.
But, if it's severely overloading the server, running it several times will only make things worse, right? And I think the Get-ChildItem will be the slowest part, putting the most load on the server, and so doing the delete in parallel probably won't help.
I would first try changing the script to this shape:
$path = "G:\AdeptiaSuite\AdeptiaSuite-6.9\AdeptiaServer\ServerKernel\web\repository\equalit\PFRepository"
$age = (Get-Date).AddDays(-5)
$logOldFiles = [System.IO.StreamWriter]::new('c:\temp\log-oldfiles.txt')
$logNewFiles = [System.IO.StreamWriter]::new('c:\temp\log-newfiles.txt')
Get-ChildItem $path -Recurse -File | foreach {
if ($_.CreationTime -le $age){
$logOldFiles.WriteLine("Older than $age days - $($_.name)")
$_ # send file down pipeline to remove-item
}
else{
$logNewFiles.WriteLine("Less than $age days old - $($_.name)")
}
} | Remove-Item -Force
$logOldFiles.Close()
$logNewFiles.Close()
So it pipelines into remove-item and doesn't send hundreds of thousands of text lines to the console (also a slow thing to do).
If that doesn't help, I would switch to robocopy /L and maybe look at robocopy /L /MINAGE... to do the file listing, then process that to do the removal.
(I also removed the comments which just repeat the lines of code # removed comments which repeat what the code says.
The code tells you what the code says # read the code to see what the code does. Comments should tell you why the code does things, like who wrote the script and what business case was it solving, what is the PFRepository, why is there a 5 day cutoff, or whatever.)
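For completeness, here is a rough sketch of what the PowerShell 7 -Parallel variant mentioned at the top of this answer could look like. The function name Remove-OldFile and its parameters are invented for the example, and the demo runs against a throwaway temp folder rather than the real repository path. Only the deletions run in parallel; the Get-ChildItem enumeration is still a single stream.

```powershell
# Requires PowerShell 7 or later (ForEach-Object -Parallel)
function Remove-OldFile {
    Param(
        [string]$Path,
        [datetime]$OlderThan,
        [int]$ThrottleLimit = 5
    )
    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Where-Object { $_.CreationTime -le $OlderThan } |
        ForEach-Object -Parallel {
            # $_ is the current file inside the parallel block
            Remove-Item -LiteralPath $_.FullName -Force
        } -ThrottleLimit $ThrottleLimit
}

# Demo: create 20 files, then use a cutoff in the future so all of them qualify
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("pardemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
1..20 | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $root "f$_.tmp") | Out-Null
}

Remove-OldFile -Path $root -OlderThan (Get-Date).AddMinutes(1)
$remaining = @(Get-ChildItem -LiteralPath $root -File).Count
Remove-Item -LiteralPath $root -Recurse -Force
```

For the real job the call would use (Get-Date).AddDays(-5) as the cutoff; whether it is actually faster than the pipelined version depends on where the storage spends its time.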

Display Data on Two Columns Within Excel

I am trying to display data within an Excel document where Column A displays the server name and column B displays the .NET version. I'm running into an issue exporting to a .csv because it says that the file path does not exist. I would like some guidance on how I can resolve that issue and how I can display data on the two columns within Excel.
$Servers =
(
"test"
)
foreach ($Server in $Servers)
{
Invoke-Command -ComputerName $Server -ScriptBlock {
Write-Output "$(hostname)"
Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP' -Recurse | Get-ItemProperty -Name Version,Release -EA 0 | where { $_.PSChildName -match '^(?!S)\p{L}'} | select PSChildName, Version, Release | Select -ExpandProperty Version | Sort-Object Version | Export-Csv -Path C:\Users\User\Desktop\example.csv
}
}
The main issue is that Export-Csv sits inside the Invoke-Command script block, so it runs on the remote hosts; the likely cause of the error is that the export path doesn't exist on those hosts.
It's also worth noting that Invoke-Command can run in parallel: -ComputerName, like -Session, accepts an array, which removes the need for the foreach loop and is much faster.
Invoke-Command -ComputerName $servers -ScriptBlock {
Write-Host "Working on $($env:COMPUTERNAME)..."
Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP' -Recurse |
Get-ItemProperty -Name Version, Release -EA 0 |
ForEach-Object {
if($_.PSChildName -notmatch '^(?!S)\p{L}') {
return # skip this
}
[pscustomobject]@{
HostName = $env:COMPUTERNAME
Version = $_.Version
}
} | Sort-Object Version
} -HideComputerName | Select-Object * -ExcludeProperty RunspaceID |
Export-Csv -Path C:\Users\User\Desktop\example.csv -NoTypeInformation

Decreased output with PowerShell multithreading than with singlethread script

I am using PowerShell 2.0 on a Windows 7 desktop. I am attempting to search the enterprise CIFS shares for keywords/regex. I already have a simple single threaded script that will do this but a single keyword takes 19-22 hours. I have created a multithreaded script, first effort at multithreading, based on the article by Surly Admin.
Can Powershell Run Commands in Parallel?
Powershell Throttle Multi thread jobs via job completion
and the links related to those posts.
I decided to use runspaces rather than background jobs, as the prevailing wisdom says this is more efficient. The problem is that I am only getting partial output from the multithreaded script. I'm not sure if it is an I/O thing, a memory thing, or something else. Hopefully someone here can help. Here is the code.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$KeywordInfo = Select-String -pattern KEYWORD -AllMatches -InputObject $File
$KeywordOut = New-Object PSObject -Property @{
Matches = $KeywordInfo.Matches
Path = $KeywordInfo.Path
}
Return $KeywordOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host "All jobs completed!"
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
$Job.Pipe.EndInvoke($Job.Result) | Where {$_.Path} | Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Invoke-Item C:\Users\user\Desktop\results.txt
Get-Date
This is the single threaded version I am using that works, including the regex I am using for socials.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($file in $files) {
Select-String -pattern '[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}' -AllMatches -InputObject $file | Select-Object matches, path |
Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Get-Date
Invoke-Item C:\Users\user\Desktop\results.txt
I am hoping to build this answer over time, as I don't want to over-comment. I don't know yet why you are losing data with the multithreaded version, but I think we can increase performance with an updated regex. For starters, you have many greedy quantifiers that I think we can shrink down.
[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}
Select-String is case-insensitive by default, so you don't need the character classes at the beginning. Do you have to check for multiple colons? Right now you are looking for zero or more :. The same goes for the hyphens. Perhaps these would be better with ?, which matches 0 or 1.
ssn:?\s*\d{3}-?\d{2}-?\d{4}
This is assuming you are looking for mostly proper formatted SSN's. If people are hiding them in text maybe you need to look for other delimiters as well.
I would also suggest adding the text to separate files and maybe combining them after execution. If nothing else just to test.
Hoping this will be the start of a proper solution.
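A quick way to sanity-check the simplified pattern is to run it against a few strings. The sample strings below are made up for illustration; none of them come from the original shares.

```powershell
$pattern = 'ssn:?\s*\d{3}-?\d{2}-?\d{4}'

# Made-up test lines: three SSN-like strings and one that should not match
$samples = 'SSN: 123-45-6789',
           'ssn123456789',
           'Ssn:  123-456789',
           'phone 555-1234'

# Select-String ignores case unless -CaseSensitive is given
$hits = $samples | Select-String -Pattern $pattern | ForEach-Object { $_.Line }
$hits
```

The first three lines match regardless of case, optional colon, or missing hyphens; the phone line does not.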
It turns out that for some reason the Select-String cmdlet was having problems with the multithreading. I don't have enough of a developer background to be able to tell what is happening under the hood. However I did discover that by using the -quiet option in Select-String, which turns it into a boolean output, I was able to get the results I wanted.
The first pattern match in each document gives a true value. When I get a true then I return the Path of the document to an array. When that is finished I run the pattern match against the paths that were output from the scriptblock. This is not quite as effective performance wise as I had hoped for but still a pretty dramatic improvement over singlethread.
The other issue I ran into was the read/writes to disk by trying to output results to a document at each stage. I have changed that to arrays. While still memory intensive, it is much quicker.
Here is the resulting code. Any additional tips on performance improvement are appreciated:
cls
Remove-Item C:\Users\user\Desktop\output.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$Match = Select-String -pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}' -Quiet -InputObject $File
if ( $Match -eq $true ) {
$MatchObjects = Select-Object -InputObject $File
$MatchOut = New-Object PSObject -Property @{
Path = $MatchObjects.FullName
}
}
Return $MatchOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -Path I:\ -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
}
$PathValue = @()
ForEach ($Line in $Results) {
$PathValue += $Line.psobject.properties | % {$_.Value}
}
$UniqValues = $PathValue | sort | Get-Unique
$Output = ForEach ( $Path in $UniqValues ) {
Select-String -Pattern '\d{3}-?\d{2}-?\d{4}' -AllMatches -Path $Path | Select-Object -Property Matches, Path
}
$Output | Out-File -FilePath C:\Users\user\Desktop\output.txt -Append -Encoding UTF8 -Width 512
Invoke-Item C:\Users\user\Desktop\output.txt
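Stripped of the file searching, the runspace-pool pattern used above comes down to the sketch below. The squaring work item is a placeholder, not part of the original script, and [pscustomobject] assumes PowerShell 3 or later (on 2.0, New-Object PSObject -Property would be used instead). It calls EndInvoke once per handle, keeps the result, and disposes each pipeline afterwards.

```powershell
$pool = [runspacefactory]::CreateRunspacePool(1, 5)   # at most 5 concurrent runspaces
$pool.Open()

# Placeholder work item: square the argument
$work = {
    Param($n)
    $n * $n
}

# Queue every work item and remember its pipeline and async handle
$jobs = foreach ($n in 1..10) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript($work).AddArgument($n)
    [pscustomobject]@{ Pipe = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until that item has finished, so no polling loop is needed
$results = foreach ($job in $jobs) {
    $job.Pipe.EndInvoke($job.Handle)
    $job.Pipe.Dispose()
}

$pool.Close()
$pool.Dispose()
$total = ($results | Measure-Object -Sum).Sum
```

Collecting into $results from the foreach statement also avoids the repeated array copying that += causes on large file sets.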

Powershell Multi-Threading Start-Job Help Please

I'm trying to use Start-Job to run a command to collect security logs from some servers.
I'm parsing a .ini file to get the list of servers, number of days etc.
#___Collect Logs from Servers___#
$servList = $iniContent["SERVERS"]["svr"]
$days = $iniContent["DAYS"]["days"]
$date = $(get-date -format ddMMyyyy)
$err = "Error Collecting $($logType) from $($server) or the Event Log is empty! | $(Get-Date -format g) "
$serv = $servList.Split(",")
foreach ($server in $serv){
$outfile = "D:\DCLogs\$($date)_$($server)_$logType.txt"
$ScriptBlock = cmd /c "D:\CollectLog\Dumpel.exe -f $($outFile) -l $($logType) -s $($server) -d $($days)"
Start-Job -ScriptBlock $ScriptBlock
Get-Job | Wait-Job
$file = Get-ChildItem D:\DCLogs -Filter "$($date)_$($server)*" -Name
$len = $file.length/1KB # Check LogFile Size
if ($len -eq 0){
$errCount = 1
write-output $err | Out-File $errLog -append
}
}
It's only starting one job at a time so I know I'm doing something wrong. If someone could please point out the problem I'd greatly appreciate it.
Thank you.
Amelia
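Calling Get-Job | Wait-Job inside the loop serializes the jobs: each iteration waits for the job it just started before the next one begins. Use the loop only to start the jobs, then call Get-Job | Wait-Job once after the loop.
Also define your ScriptBlock with braces; without them, the cmd /c command runs immediately at assignment instead of inside the job: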
$ScriptBlock = {...}
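Put together, the shape could look like the sketch below. The server names are made up and a short sleep stands in for the Dumpel.exe call, so this only illustrates the start-all-then-wait pattern, not the log collection itself. Values from the loop are handed to the job through -ArgumentList because the job cannot see the caller's variables.

```powershell
$servers = 'srv1', 'srv2', 'srv3'   # hypothetical server names

# Start all jobs first; nothing waits inside the loop
$jobs = foreach ($server in $servers) {
    Start-Job -ArgumentList $server -ScriptBlock {
        Param($name)
        Start-Sleep -Milliseconds 200   # stand-in for the Dumpel.exe call
        "collected $name"
    }
}

# Wait once, after every job has been started
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```

The per-server file size check from the original loop would then move into a second loop after the wait.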

Resources