My script is now nearly finished. I use it to find big folders / subfolders etc.
I did not understand the format parameter -f yet, even though I checked the examples three times or more :).
Right now my script orders the sizes as strings, so 15 MB is considered smaller than 2 MB.
I have around 300 folders to check, so it would be better to convert that string to a number.
Thank you in advance!
Here is the part of my script which does this:
function Folders-Size($folders)
{
    $directories = @()
    foreach ($i in $folders)
    {
        # Sum the length of every file under this folder, recursively
        $childItems = (Get-ChildItem $i.FullName -Recurse | Measure-Object -Property Length -Sum)
        # Note: -f returns a string, so Size ends up as text here
        $size = "{0:N2}" -f ($childItems.Sum / 1MB)
        $name = $i.FullName
        $data = New-Object PSObject -Property @{ Name = $name; Size = $size }
        $directories += $data
    }
    $directories = $directories | Sort-Object Size -Descending
    $directories
}
Try:
$directories | Sort-Object { [decimal]::Parse($_.Size) } -Descending
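Alternatively, keep Size numeric from the start: the -f format operator always returns a string, which is why "15" sorts before "2". A minimal sketch of that change, assuming the rest of the function stays the same:
# Store the size as a rounded number instead of a formatted string,
# so Sort-Object Size -Descending compares numerically
$size = [math]::Round($childItems.Sum / 1MB, 2)
$data = New-Object PSObject -Property @{ Name = $name; Size = $size }
You can still apply the "{0:N2}" format when displaying the results, after sorting.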
Related
I need to sort the words in a text file and output them to a file
Function AnalyseTo-Doc{
param ([Parameter(Mandatory=$true)][string]$Pad )
$Lines = Select-String -Path $Pad -Pattern '\b[A-Za-zA-Яа-я]{2,}\b' -AllMatches
$Words = ForEach($Line in $Lines){
ForEach($Match in $Line.Matches){
[PSCustomObject]@{
LineNumber = $Line.LineNumber
Word = $Match.Value
}
}
}
$Words | Group-Object Word | ForEach-Object {
[PSCustomObject]@{
Count= $_.Count
Word = $_.Name
Longest= $_.Lenght
}
}
| Sort-Object -Property Count | Select-Object -Last 10
}
AnalyseTo-Doc 1.txt
#Get-Content 1.txt | Sort-Bubble -Verbose | Write-Host Sorted Array: | Select-Object -Last 10 | Out-File .\dz11-11.txt
It doesn't work.
Sort by the Longest property (consider renaming it to Length), which is intended to contain the word length but must be redefined as $_.Group[0].Word.Length (tip of the hat to Daniel):
$Words | Group-Object Word | ForEach-Object {
[PSCustomObject]@{
Count= $_.Count
Word = $_.Name
Longest = $_.Group[0].Word.Length
}
} |
Sort-Object -Descending -Property Longest |
Select-Object -First 10
Note that, for conceptual clarity, I've used -Descending to sort by longest words first, which then requires -First 10 instead of -Last 10 to get the top 10.
As for what you tried:
Sorting by the Count property sorts by frequency of occurrence instead, i.e. by how often each word appears in the input file, due to use of Group-Object.
Longest = $_.Length (note that your code had a typo there, .Lenght) accesses the .Length property of each group object, which is an instance of Microsoft.PowerShell.Commands.GroupInfo, not that of the word being grouped by.
(A GroupInfo instance has no type-native .Length property, but PowerShell automatically provides one as an intrinsic member, in the interest of unified handling of collections and scalars. Because a group object itself is considered a scalar (a single object), .Length returns 1. PowerShell also provides an intrinsic .Count with the same value, unless a type-native property is already present, which is indeed the case here: a GroupInfo object's type-native .Count property returns the number of elements in the group.)
The [pscustomobject] instances wrapping the word at hand are stored in the .Group property, and since they all contain the same word here, .Group[0].Word.Length can be used to return the length of the word at hand.
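A quick way to see the difference in a console session (illustrative values only):
$g = 'apple','apple' | Group-Object
$g.Length            # 1 - intrinsic member: the single group object is treated as a scalar
$g.Count             # 2 - type-native GroupInfo property: number of items in the group
$g.Group[0].Length   # 5 - length of the string 'apple' stored inside the group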
Function AnalyseTo-Doc{
param ([Parameter(Mandatory=$true)][string]$Pad )
$Lines = Select-String -Path $Pad -Pattern '\b[A-Za-zA-Яа-я]{2,}\b' -AllMatches
$Words = ForEach($Line in $Lines){
ForEach($Match in $Line.Matches){
[PSCustomObject]@{
LineNumber = $Line.LineNumber
Word = $Match.Value
}
}
}
$Words | Group-Object Word | ForEach-Object {
[PSCustomObject]@{
#Count= $_.Count
Word = $_.Name
Longest = $_.Group[0].Word.Length
}
} |
Sort-Object -Descending -Property Longest | Select-Object -First 10 | Out-File .\dz11-11.txt
}
AnalyseTo-Doc 1.txt
I'm looking for a translation of my Excel formula into a script in PowerShell, VBScript, or Excel VBA. I'm trying to get the list of column headers and the max length of string under each one.
Normally, what I do is manually open the .txt file in Excel; from there I can get the header names. Next, I create an array formula, for example =MAX(LEN(A1:A100000)). This gets the max length of string in that column. I then apply the same formula to the other columns.
Right now I can't do this since the files have increased to 1 GB in size and I can't open them anymore; my desktop crashes. It may also be because they have more than 1 million rows, which Excel can't handle. My friend suggested PowerShell, but I have limited knowledge there. I don't know if it can be done in VBScript or Excel VBA.
Thanks in advance for your help.
The code below works for .csv files but does not with delimited .txt files:
$fileName = "C:\Desktop\EFile.csv"
<#
Sample format of c:\temp\data.csv
"id","name","grade","address"
"1","John","Grade-9","test1"
"2","Ben","Grade-9","test12222"
"3","Cathy","Grade-9","test134343"
#>
$colCount = (Import-Csv $fileName | Get-Member | Where-Object {$_.MemberType -eq 'NoteProperty'} | Measure-Object).Count
$csv = Import-Csv $fileName
$csvHeaders = ($csv | Get-Member -MemberType NoteProperty).name
$dict = @{}
foreach($header in $csvHeaders) {
$dict.Add($header,0)
}
foreach($row in $csv)
{
foreach($header in $csvHeaders)
{
if($dict[$header] -le ($row.$header).Length)
{
$dict[$header] =($row.$header).Length
}
}
}
$dict.Keys | % { "key = $_ , Column Length = " + $dict.Item($_) }
This is how I get my data.
$data = #"
"id","name","grade","address"
"1","John","Grade-9","test1"
"2","Ben","Grade-9","test12222"
"3","Cathy","Grade-9","test134343"
"#
$csv = ConvertFrom-Csv -Delimiter ',' $data
But you should get your data like this
$fileName = "C:\Desktop\EFile.csv"
$csv = Import-Csv -Path $fileName
And then
# Extract the header names
$headers = $csv | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
# Capture output in $result variable
$result = foreach($header in $headers) {
# Select all items in $header column, find the longest, and select the item for output
$maximum = $csv | Select-Object -ExpandProperty $header | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
# Generate new object holding the information.
# This will end up in $results
[pscustomobject]@{
Header = $header
Max = $maximum.Length
String = $maximum
}
}
# Simple output
$result | Format-Table
This is what I get:
Header Max String
------ --- ------
address 10 test134343
grade 7 Grade-9
id 1 3
name 4 John
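Since the question mentions delimited .txt files: Import-Csv also accepts a -Delimiter parameter, so the same approach should work against the .txt file directly. A minimal sketch, assuming a tab delimiter and a hypothetical file path:
# Hypothetical path - point this at the actual .txt file
$fileName = "C:\Desktop\EFile.txt"
# Tab-delimited; use ';', '|', etc. to match the file's actual delimiter
$csv = Import-Csv -Path $fileName -Delimiter "`t"
The $headers / $result loop shown above can then stay exactly the same.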
Alternatively, if you have memory issues dealing with large files, you may have to get a bit more dirty with the .NET framework. This snippet processes one csv line at a time, instead of reading the entire file into memory.
$fileName = "$env:TEMP\test.csv"
$delimiter = ','
# Open a StreamReader
$reader = [System.IO.File]::OpenText($fileName)
# Read the headers and turn it into an array, and trim away any quotes
$headers = $reader.ReadLine() -split $delimiter | % { $_.Trim('"''') }
# Prepare a hashtable for the results
$result = @{}
# So long as there's more data, keep running
while(-not $reader.EndOfStream) {
# Read a single line and process it as csv
$csv = $reader.ReadLine() | ConvertFrom-Csv -Header $headers -Delimiter $delimiter
# Determine if the item in the result hashtable is smaller than the current, using the header as a key
foreach($header in $headers) {
$item = $csv | Select-Object -ExpandProperty $header
if($result[$header].Maximum -lt $item.Length) {
$result[$header] = [pscustomobject]@{
Header = $header
Maximum = $item.Length
String = $item
}
}
}
}
# Clean up our spent resource
$reader.Close()
# Simple output
$result.Values | Format-Table
I previously had asked a question regarding adding together files and folders with a common name and having them summed up with a total size (Sum of file folder size based on file/folder name). This was successfully answered with the PS script below:
$root = 'C:\DBFolder'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = (Get-ChildItem "$root\$_*\*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum ) / 1GB
}
}
This now leaves me with a list that is ordered by the 'Database' property by default. I have attempted to use a Sort-Object suffix to sort on the 'Size' property, with no joy. I have also attempted to use Export-Csv, with confounding results.
Ideally, if I could pass the results of this script to Excel/CSV so I can rinse/repeat across multiple SQL Servers and collate the data and sort within Excel, I would be laughing all the way to the small dark corner of the office where I can sleep.
Just for clarity, the output is looking along the lines of this:
Database Size
-------- ----
DBName1 2.5876876
DBName2 4.7657657
DBName3 3.5676578
OK, it was one pipe character that I had missed when using the Export-Csv cmdlet. This resolved my problem.
$root = 'C:\DB\Databases'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = (Get-ChildItem "$root\$_*\*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum ) / 1GB
}
} | Export-Csv 'C:\Test\test.csv'
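If the exported CSV should also be ordered by size rather than by database name, a Sort-Object can be slotted in before Export-Csv (a sketch based on the example above; -NoTypeInformation just suppresses the #TYPE header line that Windows PowerShell adds):
} | Sort-Object Size -Descending | Export-Csv 'C:\Test\test.csv' -NoTypeInformation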
I am using PowerShell 2.0 on a Windows 7 desktop. I am attempting to search the enterprise CIFS shares for keywords/regex. I already have a simple single threaded script that will do this but a single keyword takes 19-22 hours. I have created a multithreaded script, first effort at multithreading, based on the article by Surly Admin.
Can Powershell Run Commands in Parallel?
Powershell Throttle Multi thread jobs via job completion
and the links related to those posts.
I decided to use runspaces rather than background jobs, as the prevailing wisdom says this is more efficient. The problem is, I am only getting partial output with the multithreaded script I have. I am not sure if it is an I/O thing, a memory thing, or something else. Hopefully someone here can help. Here is the code.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$KeywordInfo = Select-String -pattern KEYWORD -AllMatches -InputObject $File
$KeywordOut = New-Object PSObject -Property @{
Matches = $KeywordInfo.Matches
Path = $KeywordInfo.Path
}
Return $KeywordOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
Write-Host "Waiting.." -NoNewline
Do {
Write-Host "." -NoNewline
Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host "All jobs completed!"
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
$Job.Pipe.EndInvoke($Job.Result) | Where {$_.Path} | Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Invoke-Item C:\Users\user\Desktop\results.txt
Get-Date
This is the single threaded version I am using that works, including the regex I am using for socials.
cls
Get-Date
Remove-Item C:\Users\user\Desktop\results.txt
$files = Get-ChildItem -recurse -erroraction silentlycontinue
ForEach ($file in $files) {
Select-String -pattern '[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}' -AllMatches -InputObject $file | Select-Object matches, path |
Format-List | Out-File -FilePath C:\Users\user\Desktop\results.txt -Append -Encoding UTF8 -Width 512
}
Get-Date
Invoke-Item C:\Users\user\Desktop\results.txt
I am hoping to build this answer over time, as I don't want to over-comment. I don't know yet why you are losing data from the multithreading, but I think we can increase performance with an updated regex. For starters, you have many greedy quantifiers that I think we can shrink down.
[sS][sS][nN]:*\s*\d{3}-*\d{2}-*\d{4}
Select-String is case-insensitive by default, so you don't need the [sS][sS][nN] portion at the beginning. Do you have to check for multiple colons? :* matches 0 or more colons. The same goes for the hyphens. Perhaps these would be better with ?, which matches 0 or 1.
ssn:?\s*\d{3}-?\d{2}-?\d{4}
This assumes you are looking for mostly properly formatted SSNs. If people are hiding them in text, maybe you need to look for other delimiters as well.
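A quick way to sanity-check the tightened pattern from a console (the sample strings are made up):
'SSN: 123-45-6789', 'ssn 123456789', 'no match here' |
    Select-String -Pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}'
# Only the first two strings come back as matches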
I would also suggest writing the output to separate files and maybe combining them after execution, if nothing else just to test.
Hoping this will be the start of a proper solution.
It turns out that for some reason the Select-String cmdlet was having problems with the multithreading. I don't have enough of a developer background to be able to tell what is happening under the hood. However I did discover that by using the -quiet option in Select-String, which turns it into a boolean output, I was able to get the results I wanted.
The first pattern match in each document gives a true value. When I get a true, I return the Path of the document to an array. When that is finished, I run the pattern match against the paths that were output from the script block. This is not quite as effective performance-wise as I had hoped for, but still a pretty dramatic improvement over single-threading.
The other issue I ran into was the reads/writes to disk caused by trying to output results to a file at each stage. I have changed that to arrays. While still memory-intensive, it is much quicker.
Here is the resulting code. Any additional tips on performance improvement are appreciated:
cls
Remove-Item C:\Users\user\Desktop\output.txt
$Throttle = 5 #threads
$ScriptBlock = {
Param (
$File
)
$Match = Select-String -pattern 'ssn:?\s*\d{3}-?\d{2}-?\d{4}' -Quiet -InputObject $File
if ( $Match -eq $true ) {
$MatchObjects = Select-Object -InputObject $File
$MatchOut = New-Object PSObject -Property @{
Path = $MatchObjects.FullName
}
}
Return $MatchOut
}
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()
$Files = Get-ChildItem -Path I:\ -recurse -erroraction silentlycontinue
ForEach ($File in $Files) {
$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($File)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
File = $File
Pipe = $Job
Result = $Job.BeginInvoke()
}
}
$Results = @()
ForEach ($Job in $Jobs) {
$Results += $Job.Pipe.EndInvoke($Job.Result)
}
$PathValue = @()
ForEach ($Line in $Results) {
$PathValue += $Line.psobject.properties | % {$_.Value}
}
$UniqValues = $PathValue | sort | Get-Unique
$Output = ForEach ( $Path in $UniqValues ) {
Select-String -Pattern '\d{3}-?\d{2}-?\d{4}' -AllMatches -Path $Path | Select-Object -Property Matches, Path
}
$Output | Out-File -FilePath C:\Users\user\Desktop\output.txt -Append -Encoding UTF8 -Width 512
Invoke-Item C:\Users\user\Desktop\output.txt
I'm trying to set up a script to monitor IIS 7.5 logs for 500 errors. I can get it to do that OK, but I would like it to check every 30 minutes, and naturally I don't want it to warn me about the previous 500 errors it has already reported.
As you can see from the script below, I have added a $time variable to take this into account; however, I can't seem to find a way to use this variable. Any help would be appreciated.
#Set Time Variable -30
$time = (Get-Date -Format hh:mm:ss (Get-Date).addminutes(-30))
# Location of IIS LogFile
$File = "C:\Users\here\Documents\IIS-log\"+"u_ex"+(get-date).ToString("yyMMdd")+".log"
# Get-Content gets the file, pipe to Where-Object and skip the first 3 lines.
$Log = Get-Content $File | where {$_ -notLike "#[D,S-V]*" }
# Replace unwanted text in the line containing the columns.
$Columns = (($Log[0].TrimEnd()) -replace "#Fields: ", "" -replace "-","" -replace "\(","" -replace "\)","").Split(" ")
# Count available Columns, used later
$Count = $Columns.Length
# Strip out the other rows that contain the header (happens on iisreset)
$Rows = $Log | where {$_ -like "*500 0 0*"}
# Create an instance of a System.Data.DataTable
#Set-Variable -Name IISLog -Scope Global
$IISLog = New-Object System.Data.DataTable "IISLog"
# Loop through each Column, create a new column through Data.DataColumn and add it to the DataTable
foreach ($Column in $Columns) {
$NewColumn = New-Object System.Data.DataColumn $Column, ([string])
$IISLog.Columns.Add($NewColumn)
}
# Loop Through each Row and add the Rows.
foreach ($Row in $Rows) {
$Row = $Row.Split(" ")
$AddRow = $IISLog.newrow()
for($i=0;$i -lt $Count; $i++) {
$ColumnName = $Columns[$i]
$AddRow.$ColumnName = $Row[$i]
}
$IISLog.Rows.Add($AddRow)
}
$IISLog | select time,csuristem,scstatus
OK, with KevinD's help, PowerGUI, and a fair bit of trial and error, I got it working as I expected. Here's the finished product.
#Set Time Variable -30
$time = (Get-Date -Format "HH:mm:ss"(Get-Date).addminutes(-30))
# Location of IIS LogFile
$File = "C:\Users\here\Documents\IIS-log\"+"u_ex"+(get-date).ToString("yyMMdd")+".log"
# Get-Content gets the file, pipe to Where-Object and skip the first 3 lines.
$Log = Get-Content $File | where {$_ -notLike "#[D,S-V]*" }
# Replace unwanted text in the line containing the columns.
$Columns = (($Log[0].TrimEnd()) -replace "#Fields: ", "" -replace "-","" -replace "\(","" -replace "\)","").Split(" ")
# Count available Columns, used later
$Count = $Columns.Length
# Strip out the other rows that contain the header (happens on iisreset)
$Rows = $Log | where {$_ -like "*500 0 0*"}
# Create an instance of a System.Data.DataTable
#Set-Variable -Name IISLog -Scope Global
$IISLog = New-Object System.Data.DataTable "IISLog"
# Loop through each Column, create a new column through Data.DataColumn and add it to the DataTable
foreach ($Column in $Columns) {
$NewColumn = New-Object System.Data.DataColumn $Column, ([string])
$IISLog.Columns.Add($NewColumn)
}
# Loop Through each Row and add the Rows.
foreach ($Row in $Rows) {
$Row = $Row.Split(" ")
$AddRow = $IISLog.newrow()
for($i=0;$i -lt $Count; $i++) {
$ColumnName = $Columns[$i]
$AddRow.$ColumnName = $Row[$i]
}
$IISLog.Rows.Add($AddRow)
}
$IISLog | select @{n="Time"; e={Get-Date -Format "HH:mm:ss"("$($_.time)")}},csuristem,scstatus | ? { $_.time -ge $time }
Thanks again Kev, you're a good man. Hope this code helps someone else out there.
Try changing your last line to:
$IISLog | select @{n="DateTime"; e={Get-Date ("$($_.date) $($_.time)")}},csuristem,scstatus | ? { $_.DateTime -ge $time }
In the select, we're concatenating the date and time fields, and converting them to a date object, then selecting rows where this field is greater than your $time variable.
You'll also need to change your $time variable:
$time = (Get-Date).AddMinutes(-30)
You want a DateTime object here, not a string.
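The difference matters because string comparison is character-by-character, not chronological. A quick illustration (made-up values):
'9:00:00' -ge '10:00:00'                    # True  - string comparison: '9' sorts after '1'
(Get-Date '9:00') -ge (Get-Date '10:00')    # False - actual points in time are compared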