Write a script to get the 10 longest words and put them in a separate file

I need to sort the words in a text file by length and output the 10 longest ones to a file.
Function AnalyseTo-Doc{
param ([Parameter(Mandatory=$true)][string]$Pad )
$Lines = Select-String -Path $Pad -Pattern '\b[A-Za-zA-Яа-я]{2,}\b' -AllMatches
$Words = ForEach($Line in $Lines){
ForEach($Match in $Line.Matches){
[PSCustomObject]@{
LineNumber = $Line.LineNumber
Word = $Match.Value
}
}
}
$Words | Group-Object Word | ForEach-Object {
[PSCustomObject]@{
Count= $_.Count
Word = $_.Name
Longest= $_.Lenght
}
}
| Sort-Object -Property Count | Select-Object -Last 10
}
AnalyseTo-Doc 1.txt
#Get-Content 1.txt | Sort-Bubble -Verbose | Write-Host Sorted Array: | Select-Object -Last 10 | Out-File .\dz11-11.txt
It doesn't work.

Sort by the Longest property (consider renaming it to Length), which is intended to contain the word length but must be defined as $_.Group[0].Word.Length (tip of the hat to Daniel):
$Words | Group-Object Word | ForEach-Object {
[PSCustomObject]@{
Count= $_.Count
Word = $_.Name
Longest = $_.Group[0].Word.Length
}
} |
Sort-Object -Descending -Property Longest |
Select-Object -First 10
Note that, for conceptual clarity, I've used -Descending to sort by longest words first, which then requires -First 10 instead of -Last 10 to get the top 10.
As for what you tried:
Sorting by the Count property sorts by frequency of occurrence instead, i.e. by how often each word appears in the input file, due to use of Group-Object.
Longest= $_.Length (note that your code had a typo there) accesses the .Length property of each group object, which is an instance of Microsoft.PowerShell.Commands.GroupInfo, not that of the word being grouped by.
(A GroupInfo instance has no type-native .Length property, but PowerShell automatically provides one as an intrinsic member, in the interest of unified handling of collections and scalars. Since a group object itself is considered a scalar (a single object), .Length returns 1. PowerShell provides an intrinsic .Count with the same value as well, unless the type has its own .Count property, which is indeed the case here: a GroupInfo object's type-native .Count property returns the number of elements in the group.)
The [pscustomobject] instances wrapping each occurrence of the word are stored in the .Group property, and since their .Word values are all the same, .Group[0].Word.Length can be used to get the length of the word at hand.
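The members discussed above can be compared side by side on a few in-memory objects. This is a minimal sketch with made-up sample words; it assumes PowerShell 3.0 or later (where the intrinsic .Length and .Count members exist) and that strict mode is off. The last line shows that, because Name holds the grouped-by word as a string, $_.Name.Length would also give the word length here.

```powershell
# Made-up sample input shaped like the $Words objects above
$words = 'alpha', 'be', 'alpha', 'gamma', 'alpha' | ForEach-Object {
    [pscustomobject]@{ LineNumber = 1; Word = $_ }
}

$groups = $words | Group-Object Word
$alphaGroup = $groups | Where-Object Name -eq 'alpha'

$alphaGroup.Count                 # 3: type-native GroupInfo.Count (number of occurrences)
$alphaGroup.Length                # 1: intrinsic member, the group itself is a single object
$alphaGroup.Group[0].Word.Length  # 5: length of the word itself
$alphaGroup.Name.Length           # 5: Name holds the grouped-by value as a string
```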

Function AnalyseTo-Doc{
param ([Parameter(Mandatory=$true)][string]$Pad )
$Lines = Select-String -Path $Pad -Pattern '\b[A-Za-zA-Яа-я]{2,}\b' -AllMatches
$Words = ForEach($Line in $Lines){
ForEach($Match in $Line.Matches){
[PSCustomObject]@{
LineNumber = $Line.LineNumber
Word = $Match.Value
}
}
}
$Words | Group-Object Word | ForEach-Object {
[PSCustomObject]@{
#Count= $_.Count
Word = $_.Name
Longest = $_.Group[0].Word.Length
}
} |
Sort-Object -Descending -Property Longest | Select-Object -First 10 | Out-File .\dz11-11.txt
}
AnalyseTo-Doc 1.txt
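To try the corrected pipeline without a 1.txt on disk, the same Group-Object/Sort-Object logic can run over a string. This is a sketch under stated assumptions, not the answer's code: the function name Get-LongestWord and its parameters are hypothetical, [regex]::Matches stands in for Select-String, and the pattern \p{L} (any Unicode letter, so Latin and Cyrillic alike) replaces the original character class.

```powershell
function Get-LongestWord {   # hypothetical helper name
    param(
        [Parameter(Mandatory)][string]$Text,
        [int]$First = 10
    )
    # Every run of 2+ letters, in any script
    $matchList = [regex]::Matches($Text, '\b\p{L}{2,}\b')
    $matchList | ForEach-Object { $_.Value } |
        Group-Object |
        ForEach-Object {
            [pscustomobject]@{
                Count   = $_.Count        # occurrences of the word
                Word    = $_.Name
                Longest = $_.Name.Length  # length of the word itself
            }
        } |
        Sort-Object -Descending -Property Longest |
        Select-Object -First $First
}

Get-LongestWord -Text 'a cat sees an elephant and a cat' -First 2
```

Against a real file, the text would come from Get-Content -Raw, for example Get-LongestWord -Text (Get-Content .\1.txt -Raw) | Out-File .\dz11-11.txt.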

Related

Export-Csv Writes Results to a Single Line; How Do I Get Multiple Lines?

I am looking for duplicates on a share drive so I can let the users know and they can clean it up before we use anything automated. My largest duplicate is close to 400 copies, but the info is all on a single line.
My query is getting the correct results:
$a = Get-ChildItem -Path "S:\" -File -Recurse |
Select-Object -Property Fullname, @{N='Hash';E={(Get-FileHash $_.FullName).Hash}}
$cnt = $a | Group-Object -Property Hash
$cnt |
Select-Object Count, @{N='FullName';E={($_.Group).FullName}}, @{N='Hash';E={($_.Group).Hash}} |
Sort-Object -Property Count -Descending |
Export-Csv C:\Temp\S_Drive_Counts.csv
Here is an example of my results where each entry is on a single line:
"Count","FullName","Hash"
"2","S:\Generation 1\Certification Authority.txt S:\Generation 2\Certification Authority.txt","498868376A5377F731593E9F96EC99F34C69F47537C81B9B32DBAC9321462B83 498868376A5377F731593E9F96EC99F34C69F47537C81B9B32DBAC9321462B83"
I need to pass this info on though, so I'd like to have each entry on a line by itself, similar to this:
"Count","FullName","Hash"
"2","S:\Generation 1\Certification Authority.txt","498868376A5377F731593E9F96EC99F34C69F47537C81B9B32DBAC9321462B83"
"2","S:\Generation 2\Certification Authority.txt","498868376A5377F731593E9F96EC99F34C69F47537C81B9B32DBAC9321462B83"
I can do some string manipulation to the CSV if needed, but I am looking for a way to get it in the correct format before exporting to the CSV.
Unroll your groups. Also, make better use of the pipeline.
Get-ChildItem -Path 'S:\' -File -Recurse |
Select-Object Fullname, @{n='Hash';e={(Get-FileHash $_.FullName).Hash}} |
Group-Object Hash |
ForEach-Object {
$cnt = $_.Count
$_.Group | Select-Object @{n='Count';e={$cnt}}, FullName, Hash
} |
Sort-Object Count, Hash, FullName -Descending |
Export-Csv 'C:\Temp\S_Drive_Counts.csv' -NoType
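The unrolling step can be checked without scanning S:\ by feeding in made-up objects with the same FullName/Hash shape (the paths and hash values below are placeholders, not real data), and ConvertTo-Csv stands in for Export-Csv so nothing is written to disk. Each group is expanded back into its member rows, and every row carries its group's count:

```powershell
# Placeholder rows shaped like the output of the Select-Object step above
$rows = @(
    [pscustomobject]@{ FullName = 'S:\Generation 1\Certification Authority.txt'; Hash = 'AAA' }
    [pscustomobject]@{ FullName = 'S:\Generation 2\Certification Authority.txt'; Hash = 'AAA' }
    [pscustomobject]@{ FullName = 'S:\Other\notes.txt'; Hash = 'BBB' }
)

$unrolled = $rows |
    Group-Object Hash |
    ForEach-Object {
        $cnt = $_.Count   # size of this group, captured before unrolling
        $_.Group | Select-Object @{n='Count';e={$cnt}}, FullName, Hash
    } |
    Sort-Object Count, Hash, FullName -Descending

# One CSV line per file instead of one line per group
$unrolled | ConvertTo-Csv -NoTypeInformation
```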

String manipulation on each noteproperty value

I've tried many different methods to get this working. This is the closest non-working example I can come up with.
I want to remove NoteProperty items with a null value, and I also want to remove $ and ; characters from any NoteProperty value in an object while leaving the rest of the value intact. Could someone please advise me what is wrong with the following code?
$JournalObject | Get-Member -MemberType NoteProperty | ForEach-Object {
if ($JournalObject.$_.Value -like ';')
{
$JournalObject.$_.Value.Replace(';', '')
}
if ($JournalObject.$_.Value -like '$')
{
$JournalObject.$_.Value.Replace('$', '')
}
if ($JournalObject.$_.Value -eq $null)
{
$JournalObject.PSObject.Properties.Remove($_)
}
}
Something else to note: if your replace operates on objects that come from Get-Member, you aren't touching the original object to begin with.
You can see this for yourself by running another Get-Member inside your ForEach-Object to see what's being passed through the pipeline (spoiler: it's Microsoft.PowerShell.Commands.MemberDefinition).
You can see this more clearly by running it against a string, piping it to one Get-Member and then comparing that with piping it through a second Get-Member:
"asfdasf" | Get-Member (this returns the expected String type)
VS
"asfdasf" | Get-Member | Get-Member (this comes back as a MemberDefinition object, since you're literally getting the members of the Get-Member result)
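The same point applies to the question's $JournalObject.$_ lookups: each $_ is a MemberDefinition describing a property, so the property's value has to be fetched from the original object via the definition's .Name. A minimal sketch with a made-up object standing in for $JournalObject:

```powershell
# Made-up stand-in for $JournalObject
$obj = [pscustomobject]@{ Amount = '$5;'; Memo = 'ok' }

foreach ($def in ($obj | Get-Member -MemberType NoteProperty)) {
    # $def describes a property; it does not hold the property's value
    $def.GetType().FullName      # Microsoft.PowerShell.Commands.MemberDefinition
    # Look the value up on the original object by the member's name
    $obj.($def.Name)             # '$5;' for Amount, 'ok' for Memo
}
```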
This is what I came up with. First, prune out the properties you aren't returning, so that you aren't looping over members that will no longer exist: find those members, then exclude them with Select-Object.
Then you can loop through and fix the values.
$members = $JournalObject | Get-Member -MemberType NoteProperty | select -expandproperty name
$removelist = $JournalObject | % {
foreach ($member in $members) {
if ($_.$member -eq $null) {
$member
}
}
}
$uremovelist = $removelist | select -unique
$prunedJournalObject = $JournalObject | select * -ExcludeProperty $uremovelist
$members = $prunedJournalObject | Get-Member -MemberType NoteProperty | select -expandproperty name
$prunedJournalObject | % {
foreach ($member in $members) {
if ($_.$member -match ';') {
$_.$member = $_.$member.Replace(';', '')
}
if ($_.$member -match '\$') {  # escape $: on its own it is an end-of-string anchor and always matches
$_.$member = $_.$member.Replace('$', '')
}
}
$_
}
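A more compact variant works on each object's PSObject.Properties directly, so changes land on the original object rather than on Get-Member output. This is a swapped-in technique, not the answer's approach, and the function name Repair-NoteProperty is hypothetical. Unlike the answer, it removes a null property per object rather than from every object whenever any one of them has it null, so objects in a collection may end up with different property sets.

```powershell
function Repair-NoteProperty {   # hypothetical helper name
    param([Parameter(Mandatory, ValueFromPipeline)][psobject]$InputObject)
    process {
        # Snapshot the NoteProperties first, because some are removed inside the loop
        $props = @($InputObject.PSObject.Properties | Where-Object MemberType -eq 'NoteProperty')
        foreach ($prop in $props) {
            if ($null -eq $prop.Value) {
                # Drop properties whose value is null
                $InputObject.PSObject.Properties.Remove($prop.Name)
            }
            elseif ($prop.Value -is [string]) {
                # Inside the character class [$;] the $ is literal, so this strips $ and ;
                $prop.Value = $prop.Value -replace '[$;]', ''
            }
        }
        $InputObject
    }
}

[pscustomobject]@{ A = 'x$y;z'; B = $null; C = 5 } | Repair-NoteProperty
```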

Passing PowerShell Output to CSV/Excel

I previously had asked a question regarding adding together files and folders with a common name and having them summed up with a total size (Sum of file folder size based on file/folder name). This was successfully answered with the PS script below:
$root = 'C:\DBFolder'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = (Get-ChildItem "$root\$_*\*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum ) / 1GB
}
}
This now leaves me with a list that is ordered by the 'Database' Property by default. I have attempted to use a Sort-Object suffix to use the 'Size' property with no joy. I have also attempted to use Export-Csv with confounding results.
Ideally, if I could pass the results of this script to Excel/CSV so I can rinse/repeat across multiple SQL Servers and collate the data and sort within Excel, I would be laughing all the way to the small dark corner of the office where I can sleep.
Just for clarity, the output is looking along the lines of this:
Database Size
-------- ----
DBName1 2.5876876
DBName2 4.7657657
DBName3 3.5676578
OK, it was a single pipe character that I had missed when using Export-Csv; adding it resolved my problem.
$root = 'C:\DB\Databases'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = (Get-ChildItem "$root\$_*\*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum ) / 1GB
}
} | Export-Csv 'C:\Test\test.csv'
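The question also mentioned trying Sort-Object on Size; in the pipeline above it would sit between the ForEach-Object block and Export-Csv. A sketch using objects built from the sample output in the question, with ConvertTo-Csv in place of Export-Csv so nothing is written to disk (in the real script you would pipe to Export-Csv 'C:\Test\test.csv' -NoTypeInformation instead):

```powershell
# Sample objects using the values from the question's output
$results = @(
    [pscustomobject]@{ Database = 'DBName1'; Size = 2.5876876 }
    [pscustomobject]@{ Database = 'DBName2'; Size = 4.7657657 }
    [pscustomobject]@{ Database = 'DBName3'; Size = 3.5676578 }
)

# Sort before exporting, so the rows arrive already ordered by size
$csv = $results |
    Sort-Object -Property Size -Descending |
    ConvertTo-Csv -NoTypeInformation
$csv
```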

Export-CSV only gets the "Length"

When I try to export to a CSV file, I only get numbers under "Length".
.Count property until the split point is reached, then split the CSV array to a new file with a new name that will be used from this point on. What might be the issue?
$RootFolder = Get-Content "c:\DRIVERS\myfile.txt"
foreach ($arrayOfPaths in $RootFolder){
$csv = $arrayofPaths -replace '^\\\\[^\\]+\\([^\\]+)\\([^\\]+).*', 'C:\output\Company_name_${1}_${2}.csv'
$csvIndex = 1
$maxRows = 1000000
$rowsLeft = $maxRows
Get-ChildItem $arrayOfPaths -Recurse | Where-Object {$_.mode -match "d"} | ForEach-Object {
#$csv = $_.FullName -replace '^\\\\[^\\]+\\([^\\]+)\\([^\\]+).*', 'C:\output\Company_name_${1}_${2}.csv'# <- construct CSV path here
$path = $_.FullName
$thisCSV = Get-Acl $path | Select-Object -Expand Access |
Select-Object @{n='Path';e={$path}}, IdentityReference, AccessControlType,
FileSystemRights |
ConvertTo-Csv
if ($thisCSV.count -lt $rowsLeft) {
$thisCSV | Export-Csv $csv -append -noType
$rowsLeft -= $thisCSV.count
} else {
$thisCSV[0..($rowsLeft - 1)] | Export-Csv $csv -append -noType
$csvIndex++
$csv = $csv -replace '\.csv$', "$csvIndex.csv"
if ($thisCSV.count -gt $rowsLeft) {
$thisCSV[$rowsLeft..($thisCSV.count - 1)] | Export-Csv $csv -append -noType
}
$rowsLeft = $maxRows - ($thisCSV.count - $rowsLeft)
}
}
}
Export-CSV is built to take PSCustomObjects as input, not lines of text.
$thisCSV = Get-Acl $path | Select-Object -Expand Access |
Select-Object @{n='Path';e={$path}}, IdentityReference, AccessControlType,
FileSystemRights |
ConvertTo-Csv
The output of this line will be something like:
#TYPE Selected.System.Security.AccessControl.FileSystemAccessRule
"Path","IdentityReference","AccessControlType","FileSystemRights"
"c:\test","BUILTIN\Administrators","Allow","FullControl"
At least three lines, i.e. an array of strings. What properties do those strings have?
PS C:\> 'a','b' | Get-Member -MemberType Property
TypeName: System.String
Name MemberType Definition
---- ---------- ----------
Length Property int Length {get;}
Length: the only property you see in the CSV, because Export-Csv exports all of an object's properties, and for a string that's the only one.
Fix: Remove | ConvertTo-CSV from the Get-ACL line, leave your custom objects as custom objects and let the export handle converting them.
(This should also fix the counting, because .Count will then count the data rows themselves instead of 3+ lines of text, including the #TYPE and header lines, for every folder.)
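The difference is easy to reproduce with ConvertTo-Csv, which produces the same CSV text that Export-Csv would write. A minimal sketch with made-up paths; -NoTypeInformation is included so the output is the same on Windows PowerShell 5.1 and PowerShell 7:

```powershell
# Strings: the only property the CSV cmdlets can see is Length
$fromStrings = 'c:\test', 'c:\temp' | ConvertTo-Csv -NoTypeInformation
$fromStrings     # "Length", "7", "7"

# Custom objects: every property becomes a column
$fromObjects = [pscustomobject]@{ Path = 'c:\test'; AccessControlType = 'Allow' } |
    ConvertTo-Csv -NoTypeInformation
$fromObjects     # "Path","AccessControlType" then "c:\test","Allow"
```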

How to format data acquired using PowerShell Import-Csv

I imported a CSV file using Import-Csv. I only wanted the account numbers from the list.
I used
$Numbers = Import-Csv csv.csv |Select-Object -ExpandProperty 'Account Number' |Where-Object {$_ -ne "0"} | Out-Null
but I want to enclose each account number in single quotes (') and separate the account numbers with commas (,). Ideally the list should look like: 'account1','account2',...,'accountlast'. I could not manipulate the $Numbers variable like an array.
Some string manipulation and a -join should get this for you, although the array returned by Import-Csv csv.csv | Select-Object -ExpandProperty 'Account Number' | Where-Object {$_ -ne "0"} would be considered more versatile! (The trailing | Out-Null in the question discards the pipeline output, which is why $Numbers ended up empty.)
$numbers = "'{0}'" -f ((Import-Csv csv.csv | Select-Object -ExpandProperty 'Account Number' | Where-Object {$_ -ne "0"}) -join "','")
We join the account numbers with ',' and then use the format operator to wrap the result in the outer single quotes.
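Keeping the array and formatting it separately can be sketched with in-memory CSV lines. ConvertFrom-Csv stands in for Import-Csv so the example runs without csv.csv, and the account values are made up:

```powershell
# Made-up CSV lines standing in for csv.csv
$csvLines = 'Account Number,Name', 'A100,Alice', '0,Nobody', 'A200,Bob'

# Keep the filtered numbers as an array (the more versatile form)...
$accounts = $csvLines | ConvertFrom-Csv |
    Select-Object -ExpandProperty 'Account Number' |
    Where-Object { $_ -ne '0' }

# ...and build the quoted, comma-separated string from it only when needed
$numbers = "'{0}'" -f ($accounts -join "','")
$numbers    # 'A100','A200'
```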
You are very close. You can import the CSV, expand the property, then use ForEach-Object to format each value, and finally write everything to the host on one line instead of one line per value.
Import-Csv csv.csv | Select-Object -ExpandProperty 'Account Number' | Where-Object {$_ -ne "0"} | ForEach{"'" + $_ + "', "} | Write-Host -NoNewLine
