Powershell Excel find value better performance alternative - excel

I have to loop through an Excel workbook using the COM object (no additional modules are allowed in the environment beyond what ships with PowerShell 5).
In each iteration I have to search a worksheet (taken from a list of variables) for a particular set of values, then pull and append data accordingly.
My problem isn't so much accomplishing it as the performance hit I get every time I run a Find on Value2 in each worksheet.
The list of worksheets is expected to grow massively, and the existing ones will keep gaining columns to parse and work on, so how can I make this smoother and faster?
What I currently do is the following:
$Exl = New-Object -ComObject "Excel.Application"
$Exl.Visible = $false
$Exl.DisplayAlerts = $false
$WB = $Exl.Workbooks.Open($excel)
Foreach ($name in $names) {
$ws = $WB.worksheets | where {$_.name -like "*$name*"}
$range = $ws.Range("C:C")
$findstuff = $range.find($item)
$stuffrow = $findstuff.row
$stuffcolumn = $findstuff.column
}
This last part is what takes A LOT of time, and I only see it growing with each additional sheet and more columns, to the point where it might take 10-20 minutes.
What can be done to optimize this?
On a side note: I only need the one row and column result for now, but Find only returns the first match. If in the future I need every row and column where Value2 equals $variable, what should I do? (That's less important, though; I ask in case it's related.)

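Anytime the pipeline is used, there's a performance hit. Instead of using Where-Object, try something like this (a plain loop with an if statement):
foreach ($name in $names) {
    # Find the first matching sheet without using the pipeline
    $ws = $null
    foreach ($sheet in $WB.Worksheets) {
        if ($sheet.Name -like "*$name*") { $ws = $sheet; break }
    }
    if ($null -eq $ws) { continue }   # no sheet matched this name
    $range = $ws.Range("C:C")
    $findstuff = $range.Find($item)
    if ($null -eq $findstuff) { continue }   # Find returns nothing when there is no match
    $stuffrow = $findstuff.Row
    $stuffcolumn = $findstuff.Column
}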
Note that your line may have a typo in the part where {$_.name -like "*$names*"}. Maybe it should read where {$_.name -like "*$name*"}?
I found my basis from the following bookmark I had: http://community.idera.com/powershell/powershell_com_featured_blogs/b/tobias/posts/speeding-up-your-scripts

So I found a very simple answer... which is somehow simultaneously EXTREMELY obvious and EXTREMELY unintuitive.
When defining the $range variable, add a pipe to select ONLY the properties you need.
Instead of:
$range = $ws.Range("C:C")
do:
$range = $ws.Range("C:C") | Select Row, text, value2, column
Why is this unintuitive?
1) Normally piping would make things slower, especially if you're pushing many objects through to filter down to a few.
2) One would expect, especially since this goes through the COM object, that the work ACTUALLY runs when the variable is assigned rather than the variable merely being defined. But that is not what happens here. The work runs AFTER the variable has been defined: the data is gathered THE MOMENT the variable is first used [I tested this, and saw resource usage at that particular period only], and is saved after that first use. (WHICH IS VERY WEIRD)
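On the side question about getting every match rather than only the first, one option (a sketch only, not tested against the asker's workbook) is to read the cell values once and search them in memory. When a multi-cell range is read through COM, Value2 comes back as a two-dimensional array, so the function below takes such an array and returns the position of every matching cell. Find-AllInGrid is a hypothetical helper name, and the sample grid stands in for something like $ws.UsedRange.Value2.

```powershell
function Find-AllInGrid {
    param(
        [Parameter(Mandatory)] [object[,]] $Grid,   # e.g. the array returned by a range's Value2
        [Parameter(Mandatory)] $Value
    )
    # Use the array's own bounds: arrays coming from Excel are 1-based,
    # arrays created in PowerShell are 0-based.
    for ($r = $Grid.GetLowerBound(0); $r -le $Grid.GetUpperBound(0); $r++) {
        for ($c = $Grid.GetLowerBound(1); $c -le $Grid.GetUpperBound(1); $c++) {
            if ($Grid[$r, $c] -eq $Value) {
                [pscustomobject]@{ Row = $r; Column = $c }
            }
        }
    }
}

# Sample data standing in for a worksheet range (3 rows x 2 columns).
$grid = New-Object 'object[,]' 3, 2
$grid[0, 0] = 'apple';  $grid[0, 1] = 'pear'
$grid[1, 0] = 'banana'; $grid[1, 1] = 'apple'
$grid[2, 0] = 'cherry'; $grid[2, 1] = 'plum'

$hits = @(Find-AllInGrid -Grid $grid -Value 'apple')
```

The returned Row and Column are positions within the array; if the range does not start at cell A1 they would need to be offset by the range's first row and column to get worksheet coordinates.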

Related

How do I optimise performance when selecting first 2 rows per group from Excel spreadsheet in PowerShell?

I have a requirement to select the first X rows for each unique agent ID. My method below works, but it runs into performance issues when the spreadsheet has over about 5k results to consider. I am hopeful that you very smart people can help me optimise the approach to require less processing.
I am using ImportExcel to import the spreadsheet of call records, then I filter out uninteresting rows and I'm left with $UsableCalls as my pool of calls to be evaluated. Sometimes, this pool has only 2k rows. Sometimes it has 16k. It's possible that it might have even more. Unfortunately, it seems like the max this method can support is around 5k-ish results. Anything over that and the process hangs. So if I have 5k rows, then I can really only handle getting the first 1 call per agent. If I have 2k, then I can get the first 2 calls per agent. The number of calls per agent is selectable, and I'd like to have the option to get up to the first 5 calls per agent, but that simply won't work with the way it processes right now.
Ultimately, the goal is to select the first X# calls (rows) for each agent. I then export that as a second spreadsheet. This is the only method I could come up with, but I am certainly open to suggestion.
Here is what I have, how can I improve it?
# this custom function allows the user to select a digit, uses wpf messagebox
$numberagent = select-numberperagent
# collect the unique agent IDs
$UniqueAgentIDs = @()
$UniqueAgentIDs += ($UsableCalls | Select-Object -Property "Agent ID" -Unique )."Agent ID"
# select first X# of each agent's calls
$CallsPerAgent = @()
foreach ($UniqueAgent in $UniqueAgentIDs) {
$CallsPerAgent += ($UsableCalls | ? {$_."Agent ID" -eq "$UniqueAgent"}) | select -First $numberagent
} #close foreach uniqueagent
And here is an example of one of the custom PS Objects in the variable $usableCalls:
PS C:\> $usableCalls[0]
DateTime : 2022-03-03 11:06:16.063
DigitDialed : 781
Agent ID : 261
Agent Name : CCM
Skill group : PAYE.
CallType : PAYE
PPSN : 81
DNIS : 10
ANI : 772606677789
Disposition : Handled
Duration : 818
RingTime : 12
DelayTime : 0
HoldTime : 0
TalkTime : 14
WorkTime : 31
The first thing to improve the speed is to not use += to add stuff to an array.
By doing so, on each addition, the entire array needs to be rebuilt in memory. Better let PowerShell do the collecting of data for you:
# collect the unique agent IDs
$UniqueAgentIDs = ($UsableCalls | Select-Object -Property 'Agent ID' -Unique).'Agent ID'
# select first X# of each agent's calls
$CallsPerAgent = foreach ($UniqueAgent in $UniqueAgentIDs) {
$UsableCalls | Where-Object {$_.'Agent ID' -eq $UniqueAgent} | Select-Object -First $numberagent
}
Without really knowing what objects are in your variable $UsableCalls, you might even be better off using Group-Object to group all calls by agent ID and then loop over these groups:
$CallsPerAgent = $UsableCalls | Group-Object -Property 'Agent ID' | ForEach-Object {
$_.Group | Select-Object -First $numberagent
}
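If grouping still turns out to be slow on very large inputs, a different technique from the one above is a single pass that counts calls per agent in a hashtable, so the list is never rescanned. This is only a sketch: the sample rows stand in for $UsableCalls with most fields omitted, and $numberagent is set by hand instead of coming from select-numberperagent.

```powershell
# Sample rows standing in for $UsableCalls (only the fields needed here).
$UsableCalls = @(
    [pscustomobject]@{ 'Agent ID' = 261; Duration = 818 }
    [pscustomobject]@{ 'Agent ID' = 305; Duration = 120 }
    [pscustomobject]@{ 'Agent ID' = 261; Duration = 45 }
    [pscustomobject]@{ 'Agent ID' = 261; Duration = 300 }
    [pscustomobject]@{ 'Agent ID' = 305; Duration = 60 }
)
$numberagent = 2   # stands in for the value chosen by the user

$seen = @{}        # agent ID -> number of calls kept so far
$CallsPerAgent = foreach ($call in $UsableCalls) {
    $id = $call.'Agent ID'
    $count = [int]$seen[$id]      # a missing key gives $null, which casts to 0
    if ($count -lt $numberagent) {
        $seen[$id] = $count + 1
        $call                     # emitted here and collected by the assignment
    }
}
```

Unlike the Group-Object version, the rows come out in their original order rather than grouped by agent.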

Powershell runspace output behaves differently depending on how returning custom object is defined

I am experimenting with Powershell runspaces and have noticed a difference in how output is written to the console depending on where I create my custom object. If I create the custom object directly in my script block, the output is written to the console in a table format. However, the table appears to be held open while the runspace pool still has open threads, i.e. it creates a table but I can see the results from finished jobs being appended dynamically to the table. This is the desired behavior. I'll refer to this as behavior 1.
The discrepancy occurs when I add a custom module to the runspace pool and then call a function contained in that module, which then creates a custom object. This object is printed to the screen in a list format for each returned object. This is not the desired behavior. I'll call this behavior 2
I have tried piping the output from behavior 2 to Format-Table but this just creates a new table for each returned object. I can achieve the desired effect somewhat by using Write-Host to print a line of the object values but I don't think this is appropriate considering it seems there is a built in behavior that can achieve my desired result if I can understand it.
My thoughts on the matter are that it has something to do with the asynchronous behavior of the runspace. I'm new to PowerShell, but perhaps when the custom object comes directly from the script block there is a hidden method or type declaration telling PowerShell to hold the table open and wait for results? This would be overridden when using the second technique because it's coming from my custom function?
I would like to understand why this is occurring and how I can achieve behavior 1 while being able to use the custom module, which will eventually be very large. I'm open to a different technique as well, so long as it's possible to essentially see the table of outputs grow as jobs finish. The code used is below.
$ISS = [InitialSessionState]::CreateDefault()
[void]$ISS.ImportPSModule(".\Modules\Test-Item.psm1")
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 5, $ISS, $Host)
$Pool.Open()
$Runspaces = @()
# Script block to run code in
$ScriptBlock = {
Param ( [string]$Server, [int]$Count )
Test-Server -Server $Server -Count $Count
# Uncomment the three lines below and comment out the two
# lines above to test behavior 1.
#[int] $SleepTime = Get-Random -Maximum 4 -Minimum 1
#Start-Sleep -Seconds $SleepTime
#[pscustomobject]@{Server=$Server; Count=$Count;}
}
# Create runspaces and assign to runspace pool
1..10 | ForEach-Object {
$ParamList = @{ Server = "Server A"; Count = $_ }
$Runspace = [PowerShell]::Create()
[void]$Runspace.AddScript($ScriptBlock)
[void]$Runspace.AddParameters($ParamList)
$Runspace.RunspacePool = $Pool
$Runspaces += [PSCustomObject]@{
Id = $_
Pipe = $Runspace
Handle = $Runspace.BeginInvoke()
Object = $Object
}
}
# Check for things to be finished
while ($Runspaces.Handle -ne $null)
{
$Completed = $Runspaces | Where-Object { $_.Handle.IsCompleted -eq $true }
foreach ($Runspace in $Completed)
{
$Runspace.Pipe.EndInvoke($Runspace.Handle)
$Runspace.Handle = $null
}
Start-Sleep -Milliseconds 100
}
$Pool.Close()
$Pool.Dispose()
The custom module I'm using is as follows.
function Test-Server {
Param ([string]$Server, [int]$Count )
[int] $SleepTime = Get-Random -Maximum 4 -Minimum 1
Start-Sleep -Seconds $SleepTime
[pscustomobject]@{Server = $Server;Item = $Count}
}
What you have mentioned sounds completely normal to me. That is how powershell is designed because it shares the burden of display. If the user has not specified how to display, PowerShell decides how to.
I couldn't reproduce your issue with the code provided but I think this will solve your problem.
$FinalTable = foreach ($Runspace in $Completed)
{
$Runspace.Pipe.EndInvoke($Runspace.Handle)
$Runspace.Handle = $null
}
$FinalTable will now have the table format you expect.
It appears that my primary issue, aside from errors in my code, was a lack of understanding of PowerShell's default object handling. PowerShell displays objects as a table when they have four or fewer properties and as a list when they have five or more.
The custom object returned in my test module had more than four key-value pairs, while the custom object I returned directly had only two. This resulted in what I thought was odd behavior. I compounded the issue by removing some key-value pairs from my posted code to shorten it and then didn't test it (sorry).
This stackoverflow post has a lengthy answer explaining the behavior some and providing examples for changing the default output.
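The property-count rule can be checked with a small sketch. The property names beyond Server and Count are made up for illustration, and the objects have no custom format data, so PowerShell's default choice applies.

```powershell
# Two properties: with no format data defined, this renders as a table.
$small = [pscustomobject]@{ Server = 'Server A'; Count = 1 }

# Five properties: the same kind of object now renders as a list.
$large = [pscustomobject]@{ Server = 'Server A'; Count = 1; Item = 2; State = 'Up'; Note = 'n/a' }

# Out-String captures exactly what would have been written to the console.
$smallText = $small | Out-String
$largeText = $large | Out-String

# Asking for a table explicitly overrides the default, whatever the property count.
$forcedText = $large | Format-Table -AutoSize | Out-String
```

Keeping the returned object to four properties or fewer, or piping the collected results to Format-Table once at the end, are two ways to get the table layout.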

Modx TV multi select list not saving values

I have a TV multi select list type that is evaluating a snippet:
@EVAL return $modx->runSnippet('getFeaturedResourceTree');
Evaluating this snippet:
<?php
$output = array();
$context = $modx->resource->get('context_key');
$sql = "select * from modx_site_content where context_key = '$context' order by `pagetitle`;";
$results = $modx->query($sql);
foreach($results as $result){
$output[] = $result['pagetitle'].'=='.$result['id'];
}
$output = implode('||', $output);
echo $output;
return;
This does work in the manager: I can select and pick multiple resources in the list. However, when I save the TV, nothing is actually saved. The TV values are not present in the database, and when I reload the resource the TV field is blank.
What could the problem be here?
I'm fairly certain you can accomplish what you're trying to do with a @SELECT binding rather than @EVAL. This has 2 potential benefits:
1) @EVAL is Evil, LOL. Not all the time, mind you; there are certainly legitimate uses of @EVAL, but I've personally always tried very hard to find an alternative whenever I've considered using @EVAL.
2) The method I'm about to show you has worked for me in the past, so I'm speculating it will work for you.
@SELECT pagetitle, id FROM modx_site_content WHERE context_key = 'web' ORDER BY `pagetitle`
If you're using @EVAL because you have multiple contexts and you want the context of the Resource currently being edited, then you could use your Snippet, but I would try:
1) Rather than echo-ing your output, return it.
2) Call the snippet in a Chunk, and render the Chunk on a test page to ensure it has the output you want, formatted for the TV Input Options exactly the way it should be.
3) If the Chunk output passes the test, call it into the TV Input Options field with the @CHUNK binding.
One more note: I can't remember if the current Resource is available in the TV as $modx->resource or $resource, but that might be something you want to double check.

powershell code for comparing two xls files

I'm stuck coding the requirement below.
I have two Excel (xls) files (old and new user lists). Each file has 4 fields: "Userid", "UserName", "Costcenter", "Approving Manager". I need to check whether each Userid from the New user list exists in the Old user list. If so, I have to replace the values of "Costcenter" and "Approving Manager" in the New user list with the values from the same columns in the Old user list. If there is no matching record in the Old user list, I have to highlight the entire row for that Userid in the New user list. Last but not least, the New user list has to be saved. There are about 2000+ Userids.
Below, I started coding to get the Userid list from the New user list into an array. I will be doing the same for the Old user list. From there on, how do I go about modifying the New user list as explained above?
$objExcel = New-Object -ComObject Excel.Application
$UserWorkBook = $objExcel.Workbooks.Open("O:\UserCert\New_Users.xls")
$UserWorksheet = $UserWorkBook.Worksheets.Item(1)
$NewUsers = @()
$intRow = 2 # start at row 2 to skip the header
# Append each Userid until the first empty cell in column 1
while ($null -ne $UserWorksheet.Cells.Item($intRow, 1).Value2) {
    $NewUsers += $UserWorksheet.Cells.Item($intRow, 1).Value2
    $intRow++
}
Any help Greatly appreciated...
If the userIDs in each list are in some type of regular order, then it should be easy to open both workbooks at the same time, maintain 2 pointers (Row_for_old_userlist and Row_for_new_userlist) and compare each New user ID with the old one.
If they aren't in some semblance of order, then for each item in the new userlist, you'll have to scan the entire old userlist to find them and then take your action.
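For the unordered case, an alternative to rescanning the old list for every new user is to index the old list in a hashtable first. The sketch below uses in-memory sample objects in place of the worksheet rows; the sample values are made up, and writing the results back to the sheet and highlighting rows is left out.

```powershell
# Sample rows standing in for the two worksheets.
$oldUsers = @(
    [pscustomobject]@{ Userid = 'u1'; Costcenter = 'CC10'; 'Approving Manager' = 'Ann' }
    [pscustomobject]@{ Userid = 'u2'; Costcenter = 'CC20'; 'Approving Manager' = 'Bob' }
)
$newUsers = @(
    [pscustomobject]@{ Userid = 'u2'; Costcenter = ''; 'Approving Manager' = '' }
    [pscustomobject]@{ Userid = 'u3'; Costcenter = ''; 'Approving Manager' = '' }
)

# Index the old list once so each lookup is a single hashtable access.
$oldById = @{}
foreach ($old in $oldUsers) { $oldById[$old.Userid] = $old }

$unmatched = foreach ($new in $newUsers) {
    $old = $oldById[$new.Userid]
    if ($null -ne $old) {
        # Match found: copy the two fields over from the old record.
        $new.Costcenter = $old.Costcenter
        $new.'Approving Manager' = $old.'Approving Manager'
    }
    else {
        $new.Userid   # collected so the matching row can be highlighted later
    }
}
```

With about 2000 users this replaces up to 2000 x 2000 comparisons with one pass over each list.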
I'm not sure that saving in CSV is a valid approach from your requirements - you may not be able to get the data that way.
However - I think you're really asking how to set the value of an Excel spreadsheet cell.
If so, here's some code I use to insert a list of defects into a spreadsheet...and then set the last cell's column A to a different color. Strangely enough, that "value2" part...it's non-negotiable. Don't ask me why it's "value2", but it is.
for ($x=0; $x -lt $defects.count; $x++)
{
$row++
[string] $ENGIssue = $defects[$x]
$sheet.Cells.Item($row,1).value2 = $ENGIssue
}
$range = $sheet.range(("A{0}" -f $row), ("B{0}" -f $row))
$range.select() | Out-Null
$range.Interior.ColorIndex=6

Setting ForceCheckout on an SPList

I'm trying to set the ForceCheckout property on an SPList item and it's just not taking. I'm calling the Update() command as required. All it should take, in essence, is the following two lines.
$myList.ForceCheckout = $false
$myList.Update()
Any ideas why this isn't working? It remains $true no matter what.
Are you really using $myList, or are you doing something like:
$web.lists["foo"].forcecheckout = $false
$web.lists["foo"].update()
...because the above won't work. Each time you use the Lists collection with an indexer like this, you're getting a new instance of the list. The second line doesn't know about the first line's changes. Ensure you do:
$myList = $web.Lists["foo"]
$myList.forcecheckout = $false
$myList.update()
This will work because you're using the same instance.
-Oisin
