PowerShell Excel: deleting a group of rows

I have an .xlsx file with thousands of entries.
I can filter a column to show only certain information with $workbook.AutoFilter("DATA"), and that takes about a second.
Deleting all rows whose first column equals "DATA" with a loop, however, takes forever.
Is there a way to capture an array of the hidden rows, or a range, or anything else that I could call .Delete() on?
I tried this:
[void] [Reflection.Assembly]::LoadWithPartialName( 'System.Windows.Forms' )
$Excel = New-Object -Com Excel.Application
$WorkBook = $Excel.Workbooks.Open($filename)
$Excel.visible = $true
$Excel.selection.autofilter(1,"DATA")
$sheet = $workbook.Sheets.Item(1)
$max = $sheet.UsedRange.Rows.Count
for ($i=2; $i -le $max; $i++)
{
$row = $sheet.Cells.Item($i,1).EntireRow
if ($row.hidden -eq $false)
{
$row.Delete()
}
}
Edit: fixed by looping backwards ($i--). Deleting a row shifts the rows below it up by one, so a forward loop skips the row that follows each deletion.
However, the forward loop failed me miserably because it leaves roughly 10% of the visible rows undeleted. If I run it twice it works, but this would become a bigger issue when scaling up.
In a perfect world I would like something like this:
$Excel.selection.autofilter(1,"DATA").DELETE()
Thanks in advance for any hints or tricks you geniuses may have.
Update: Thanks Graimer, you are right, I have to loop in the other direction. This still takes quite some time with 10,000+ entries, so I am looking for a way to do it without the manual loop.
If I set $Excel.visible = $true, run $Excel.selection.autofilter(1,"DATA"), and then as a user press Ctrl+A and delete the selected rows, it is quicker than the looping process. I can't help but think there MUST be some way to script that action.

It turned out to be pretty easy.
After applying the filter, select a range from the first row to the last row and delete that range.
Because the filter shows only the rows with that one value, the delete affects only the visible rows and leaves the hidden ones alone.
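The approach described above can be sketched roughly as follows. This is not the poster's actual code: the helper name Remove-FilteredRows and its parameters are made up for illustration, and the sketch assumes that the used range starts at A1 with a header in row 1. It asks Excel for the visible cells explicitly (SpecialCells with xlCellTypeVisible) instead of relying on how a plain delete treats filtered rows.

```powershell
# Hypothetical helper (not from the original post): filters one column of the
# sheet's used range and deletes every matching data row in a single call.
# Assumes the used range starts at A1 and that row 1 is a header row.
function Remove-FilteredRows {
    param(
        [Parameter(Mandatory)] $Sheet,    # Excel Worksheet COM object
        [int]    $Field    = 1,           # column index inside the used range
        [string] $Criteria = 'DATA'
    )
    $xlCellTypeVisible = 12               # XlCellType.xlCellTypeVisible

    $used = $Sheet.UsedRange
    $last = $used.Rows.Count
    if ($last -lt 2) { return }           # header only, nothing to delete

    [void]$used.AutoFilter($Field, $Criteria)
    try {
        $visible = $null
        try {
            # SpecialCells throws when no data row is visible (nothing matched).
            $visible = $Sheet.Range("A2:A$last").SpecialCells($xlCellTypeVisible)
        }
        catch {
            Write-Verbose "No rows matched '$Criteria'."
        }
        if ($null -ne $visible) {
            [void]$visible.EntireRow.Delete()
        }
    }
    finally {
        $Sheet.AutoFilterMode = $false    # remove the filter again
    }
}
```

With the objects from the question, the call would look like Remove-FilteredRows -Sheet $sheet -Field 1 -Criteria 'DATA'. The whole deletion is one COM call, which is where the time saving over a row-by-row loop comes from.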

Related

Pull data from a specified row in Excel spreadsheet

I'm working on a PS script to take a row of data from an Excel spreadsheet and populate that data in certain places in a Word document. To elaborate, we have a contract tracking MASTER worksheet that among other things contains data such as name of firm, address, services, contact name. Additionally, we have another TASK worksheet in the same workbook that tracks information such as project owner, project name, contract number, task agree number.
I'm writing a script that does the following:
Ask the user through a message box what kind of contract is being written ("Master", or "Task")
Opens the workbook with the appropriate worksheet opened ("Master" tab or "Task" tab)
Asks the user through a VB InputBox from which Excel row of data they want to use to populate the Word contract
Extracts that row of data from Excel
Outputs certain portions of that row of data to certain location in a Word document
Saves the Word document
Opens the Word document so the user can continue editing it
My question is this: using something like PSExcel, how do I extract that row of data into variables that can be placed in a Word document? For reference, in case you're going to reply with a snippet of code, here is how the variables are defined in the Excel portion of my script:
$Filepath = "C:\temp\ContractScript\Subconsultant Information Spreadsheet.xlsx"
$Excel = New-Object -ComObject Excel.Application
$Workbook = $Excel.Workbooks.Open($Filepath)
$Worksheet = $Workbook.sheets.item($AgreementType)
$Excel.Visible = $true
#Choosing which row of data
[int]$RowNumber = [Microsoft.VisualBasic.Interaction]::InputBox("Enter the row of data from $AgreementType worksheet you wish to use", "Row")
Additionally, the first row of data in the excel worksheets are the column headings, in case it matters.
I've gotten this far so far:
import-module psexcel
$Consultant = new-object System.Collections.Arraylist
foreach ($data in (Import-XLSX -path $Filepath -Sheet $AgreementType -RowStart $RowNumber))
{
$Consultant.add($data)
}
But I'm currently stuck because I can't figure out how to reference the data being added to $consultant.$data. Somehow I need to read in the column headings first so the $data variable can be defined in some way, so when I add the variable $consultant.Address in Word it finds it. Right now I think the variable name is going to end up "$Consultant.1402 S Broadway" which obviously won't work.
Thanks for any help. I'm fairly new to powershell scripting, so anything is much appreciated.
I have the same issue, and searching online for solutions is a royal PITA.
I'd love to find a simple way to loop through all of the rows like you're doing:
$myData = Import-XLSX -Path "path to the file"
foreach ($row in $myData.Rows)
{
$row.ColumnName
}
But sadly something logical like that doesn't seem to work. I see examples online that use ForEach-Object and Where-Object which is cumbersome. So any good answers to the OP's question would be helpful for me too.
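A small sketch of what is probably going on, under one assumption that should be checked against the PSExcel version in use: Import-XLSX emits one object per data row, with the header row supplying the property names. To keep the example runnable without a workbook, ConvertFrom-Csv stands in for Import-XLSX here, and the sample rows are made up (only the address "1402 S Broadway" comes from the question).

```powershell
# Stand-in for: $rows = Import-XLSX -Path $Filepath -Sheet $AgreementType
# (assumption: Import-XLSX returns one object per data row, named by the headers,
# much like ConvertFrom-Csv does for CSV text). Sample data is invented.
$rows = @'
Firm,Address,Contact
Acme Engineering,1402 S Broadway,J. Smith
Birch Surveying,77 Elm St,A. Jones
'@ | ConvertFrom-Csv

# The result is already the collection of rows, so there is no .Rows property
# to go through; loop over the variable itself.
foreach ($row in $rows) {
    '{0} is at {1}' -f $row.Firm, $row.Address
}

# Excel row 1 holds the headings, so Excel row N is element N - 2 of the array.
$RowNumber  = 3
$Consultant = $rows[$RowNumber - 2]
$Consultant.Address
```

If that assumption holds, $Consultant.Address and similar expressions can be used directly when filling in the Word document, and the ArrayList from the question is not needed.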
UPDATE:
Matthew, thanks for coming back and updating the OP with the solution you found. I appreciate it! That will help in the future.
For my current project, I went about this a different way since I ran into lack of good examples for Import-XLSX. It's just quick code to do a local task when needed, so it's not in a production environment. I changed var names, etc. to show an example:
$myDataField1 = New-Object Collections.Generic.List[String]
$myDataField2 = New-Object Collections.Generic.List[String]
# ...
$myDataField10 = New-Object Collections.Generic.List[String]
# PSExcel is a third-party module; install it first if it is not present
Import-Module PSExcel
# Get spreadsheet, workbook, then sheet
try
{
$mySpreadsheet = New-Excel -Path "path to my spreadsheet file"
$myWorkbook = $mySpreadsheet | Get-Workbook
$myWorksheet = $myWorkbook | Get-Worksheet -Name Sheet1
}
catch {
# whatever error handling code you want
}
# calculate total number of records
$recordCount = $myWorksheet.Dimension.Rows
$itemCount = $recordCount - 1
# specify column positions
$r, $my1stColumn = 1, 1
$r, $my2ndColumn = 1, 2
# ...
$r, $my10thColumn = 1, 10
if ($recordCount -gt 1)
{
# loop through all rows and get data for each cell's value according to column
for ($i = 1; $i -le $recordCount - 1; $i++)
{
$myDataField1.Add($myWorksheet.Cells.Item($r + $i, $my1stColumn).text)
$myDataField2.Add($myWorksheet.Cells.Item($r + $i, $my2ndColumn).text)
# ...
$myDataField10.Add($myWorksheet.Cells.Item($r + $i, $my10thColumn).text)
}
}
#loop through all imported cell values
for ([int]$i = 0; $i -lt $itemCount; $i++)
{
# use the data
$myDataField1[$i]
$myDataField2[$i]
# ...
$myDataField10[$i]
}

Adding Same Column Header Names to all worksheets in an Excel Book

I have a script that creates worksheets based on the number of files it finds in a directory. From there it changes the names of the sheets to the file names. During that process, I am attempting to add two column header values, "Hostname" and "IP Address", to every sheet. I can achieve this by activating each sheet individually, but that becomes rather cumbersome once there are more than 20 sheets, so I am trying to find a dynamic way of doing this regardless of the number of sheets present.
This is the code that I have to do everything up to the column header portion:
$WorksheetCount = (Get-ChildItem $Path\Info*.txt).count
$TabNames = Get-ChildItem $Path\Info*.txt
$NewTabNames = Foreach ($file IN $TabNames.Name){$file.Substring(0,$file.Length-4)}
$Break = 0
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $true
$Workbook = $Excel.Workbooks.Add()
$null = $Excel.Worksheets.Add($MissingType, $Excel.Worksheets.Item($Excel.Worksheets.Count),
$WorksheetCount - $Excel.Worksheets.Count, $Excel.Worksheets.Item(1).Type)
1..$WorksheetCount
Start-Sleep -s 1
ForEach ($Name In $NewTabNames){
$Break++
$Excel.Worksheets.Item($Break).Name = $Name
}
I have attempted to insert my code as such:
ForEach ($Name In $NewTabNames){
$Break++
$Excel.Worksheets.Item($Break).Name = $Name
$cells=$Name.Cells
$cells.item(1,1)="Hostname"
$cells.item(1,2)="IP Address"
}
When I attempt to run the script, I get the following error..
You cannot call a method on a null-valued expression.
And then it proceeds to list each line of the code that I had put in. I thought that since I created a variable during the operation, that it was the issue:
$cells=$Name.Cells
I thought That perhaps if I moved it before the ForEach command that it would resolve it but I still receive the same issue. I have looked through various ways of trying to select ranges of sheets within excel via powershell but have not found anything helpful.
Would appreciate any assistance on this.
This is actually my first post on Stack Overflow ever, and I feel pretty excited to finally help out. I made some small modifications to your code and it seems to work fine. I noticed some odd behavior when I removed the $null assignment, which I had taken out because I could not see why it was there: after removing it, Outlook opened by itself every time I ran the script. I also found the site where you got the code from, just to see if there were any changes to the original code.
I found this Microsoft documentation very helpful to figure this out.
This is what I modified
ForEach ($Name In $NewTabNames){
    $Break++
    # Fetch the worksheet object once; .Item() is the explicit form of the indexer
    $Sheet = $Excel.Worksheets.Item($Break)
    $Sheet.Name = $Name
    $Sheet.Cells.Item(1,1) = "Hostname"
    $Sheet.Cells.Item(1,1).Font.Bold = $true
    $Sheet.Cells.Item(1,2) = "IP Address"
    $Sheet.Cells.Item(1,2).Font.Bold = $true
}
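As for why the original attempt failed: $Name is the tab name, a plain string, not a worksheet object. The snippet below reproduces the error without Excel. The tab name is made up, and the behavior shown assumes strict mode is off, which is the default.

```powershell
$Name  = 'Info-server01'   # made-up tab name; a plain [string], not a worksheet
$cells = $Name.Cells       # strings have no Cells property, so this yields $null
                           # (with Set-StrictMode -Version 2 or later it throws here instead)

$message = $null
try {
    $cells.Item(1, 1)      # calling a method on $null fails, as in the question
}
catch {
    # On an English-language system this is the text quoted in the question:
    # "You cannot call a method on a null-valued expression."
    $message = $_.Exception.Message
}
$message
```

The Cells property lives on the worksheet object, so inside the loop it has to be reached through the sheet that $Excel.Worksheets.Item($Break) returns rather than through $Name.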

Powershell: Hiding columns and bordering cells in Excel

I've been diving into how Powershell can use Excel as a COM object, have most of it down but there are two things I'd like to be able to do that I haven't been able to find anywhere, hoping someone can help.
1/ Would like to be able to script hiding a range of columns in the generated Excel spreadsheet.
2/ Would like to be able to have Excel add a border around all cells in the script as well.
Thanks!
Hiding a column:
Here is an example that you can adapt. This is hiding the first column in the active work sheet.
$file = "C:\Users\Micky\Desktop\not locked.xlsx"
[Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Interop.Excel")|Out-Null
$excel = New-Object Microsoft.Office.Interop.Excel.ApplicationClass
$excel.Visible = $true
$wb = $excel.Workbooks.Open($file)
$ws = $wb.ActiveSheet
$c = $ws.Columns
$c.Item(1).hidden = $true
Cell border:
For the example I use a double border and apply to the first cell, A1.
The XlLineStyle Enum can be found here
$xlDouble = -4119
$item = $ws.Range("A1")
$item.Borders.LineStyle = $xlDouble
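The question asked about a range of columns and about all cells, so here is a hedged extension of the two snippets above. The function names and the 'B:D' default are illustrative only; $Sheet is expected to be a worksheet object such as $ws.

```powershell
# Hypothetical helpers; $Sheet is an Excel Worksheet COM object such as $ws above.
function Hide-ColumnRange {
    param(
        [Parameter(Mandatory)] $Sheet,
        [string] $Columns = 'B:D'     # A1-style column span; 'B:D' is only an example
    )
    $Sheet.Range($Columns).EntireColumn.Hidden = $true
}

function Add-CellBorders {
    param([Parameter(Mandatory)] $Sheet)
    $xlContinuous = 1                 # XlLineStyle.xlContinuous (single solid line)
    # Setting LineStyle on the Borders of a multi-cell range draws the edges
    # of every cell in that range, inside lines included.
    $Sheet.UsedRange.Borders.LineStyle = $xlContinuous
}
```

Typical calls would be Hide-ColumnRange -Sheet $ws -Columns 'B:D' and Add-CellBorders -Sheet $ws. UsedRange limits the borders to cells that contain something; a fixed address such as $ws.Range("A1:F50") could be used instead.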

Speed up reading an Excel File in Powershell

I wonder if there is any way to speed up reading an Excel file with powershell. Many would say I should stop using the do until, but the problem is I need it badly, because in my Excel sheet there can be 2 rows or 5000 rows. I understand that 5000 rows needs some time. But 2 rows shouldn't need 90sec+.
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Excel.DisplayAlerts = $false
$Path = EXCELFILEPATH
$Workbook = $Excel.Workbooks.open($Path)
$Sheet1 = $Workbook.Worksheets.Item(test)
$URows = @()
Do {$URows += $Sheet1.Cells.Item($Row,1).Text; $row = $row + [int] 1} until (!$Sheet1.Cells.Item($Row,1).Text)
$URows | foreach {
$MyParms = @{};
$SetParms = @{};
And I have this about 30 times in the script too:
If ($Sheet1.Cells.Item($Row,2).Text){$var1 = $Sheet1.Cells.Item($Row,2).Text
$MyParms.Add("PAR1",$var1)
$SetParms.Add("PAR1",$var1)}
}
I have the idea of running the $MyParms stuff in parallel, but I have no idea how. Any suggestions?
Or
Increase the speed of reading, but I have no clue how to achieve that without destroying the "read until nothing is there".
Or
The speed is normal and I shouldn't complain.
Don't use Excel.Application in the first place if you need speed. You can use an Excel spreadsheet as an ODBC data source - the file is analogous to a database, and each worksheet a table. The speed difference is immense. Here's an intro on using Excel spreadsheets without Excel
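A rough sketch of that idea follows. Note that it uses the OLE DB provider (System.Data.OleDb) rather than ODBC, which is a swapped-in but closely related route. It assumes Windows and an installed Microsoft ACE OLE DB provider whose bitness matches the PowerShell process. The function name is hypothetical.

```powershell
# Hypothetical helper: reads one worksheet into a DataTable without starting Excel.
# Assumes Windows plus an installed Microsoft ACE OLE DB provider (same bitness
# as the PowerShell process). HDR=NO treats the first row as data.
function Read-ExcelSheet {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $SheetName = 'Sheet1'
    )
    $connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$Path;" +
                        "Extended Properties='Excel 12.0 Xml;HDR=NO'"
    $connection = New-Object System.Data.OleDb.OleDbConnection($connectionString)
    $table      = New-Object System.Data.DataTable
    try {
        $connection.Open()
        $command = $connection.CreateCommand()
        # A worksheet is addressed like a table, by its name followed by a dollar sign.
        $command.CommandText = "SELECT * FROM [$SheetName`$]"
        $adapter = New-Object System.Data.OleDb.OleDbDataAdapter($command)
        [void]$adapter.Fill($table)
    }
    finally {
        $connection.Dispose()
    }
    , $table   # the comma keeps PowerShell from unrolling the table into rows
}
```

Usage would be along the lines of $table = Read-ExcelSheet -Path $Path -SheetName 'test', followed by a foreach over $table.Rows. With HDR=NO the provider should name the columns F1, F2 and so on, so the first column of a row is $row.F1; that naming is worth confirming on the machine in question.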
Appending to an array with the += operator is terribly slow, because it will copy all elements from the existing array to a new array. Use something like this instead:
$MyParms = @{}
$SetParms = @{}
# Keep looping while column 1 of the current row still contains text
$URows = for ($row = 1; $Sheet1.Cells.Item($row, 1).Text; $row++) {
    if ($Sheet1.Cells.Item($row, 2).Text) {
        $MyParms['PAR1'] = $Sheet1.Cells.Item($row, 2).Text
        $SetParms['PAR1'] = $Sheet1.Cells.Item($row, 2).Text
    }
    $Sheet1.Cells.Item($row, 1).Text
}
Your Do loop is basically a counting loop. The canonical form for such loops is
for (init counter; condition; increment counter) {
...
}
so I changed the loop accordingly. Of course you'd achieve the same result like this:
$row = 1
$URows = Do {
...
$row += 1
} until (!$Sheet1.Cells.Item($row, 1).Text)
but that would just mean more code without any benefits. This modification doesn't have any performance impact, though.
Relevant in terms of performance are the other two changes:
I moved the code filling the hashtables inside the first loop, so the code won't loop twice over the data. Using index and assignment operators instead of the Add method for assigning values to the hashtable prevents the code from raising an error when a key already exists in the hashtable.
Instead of appending to an array (which has the abovementioned performance impact), the code now simply echoes the cell text in the loop, and PowerShell automatically collects that output into an array. The array is then assigned to the variable $URows.
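The difference between the collection patterns can be measured without Excel. The sketch below is a stand-alone timing comparison with made-up data; the item count is arbitrary, and the actual numbers depend on the machine and the PowerShell version (newer releases have made += cheaper), so no expected timings are shown.

```powershell
$count = 5000   # arbitrary size; raise it to make the gap more visible

# Pattern from the question: each += builds a new array and copies the old one.
$sw   = [System.Diagnostics.Stopwatch]::StartNew()
$slow = @()
for ($i = 1; $i -le $count; $i++) { $slow += "row $i" }
$slowMs = $sw.ElapsedMilliseconds

# Pattern from the answer: let the loop emit values and capture its output.
$sw.Restart()
$fast = for ($i = 1; $i -le $count; $i++) { "row $i" }
$fastMs = $sw.ElapsedMilliseconds

# A third option when items must be added from several places: a generic list.
$sw.Restart()
$list = New-Object 'System.Collections.Generic.List[string]'
for ($i = 1; $i -le $count; $i++) { $list.Add("row $i") }
$listMs = $sw.ElapsedMilliseconds

'{0} items: += {1} ms, loop output {2} ms, List {3} ms' -f $count, $slowMs, $fastMs, $listMs
```

All three variables end up holding the same items; only the cost of getting there differs.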

Why is writing Excel cell values fast in VBScript but slow in PowerShell?

Why is it that writing cell values to Excel is a lot faster in VBScript than in PowerShell?
Isn't PowerShell the new thing, and VBScript the deprecated MS scripting language?
VBScript example (save to filename.vbs)
This runs in a split second.
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = false
Set objWorkbook = objExcel.Workbooks.Add()
' Edit: increased number of writes to 500 to make speed difference more noticeable
For row = 1 To 500
'Edit: using .cells(row,1) instead of .cells(50,1) - this was a mistake
objWorkbook.workSheets(1).cells(row,1).value = "test"
Next
objWorkbook.SaveAs(CreateObject("Scripting.FileSystemObject").GetParentFolderName(WScript.ScriptFullName) & "\test.xlsx")
objExcel.Quit
msgbox "Done."
PowerShell example (save to filename.ps1) This takes multiple seconds to run (problematic on thousands of records)
#need this to work around bug if you use a non-US locale: http://support.microsoft.com/default.aspx?scid=kb;en-us;320369
[System.Threading.Thread]::CurrentThread.CurrentCulture = "en-US"
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $False
$xls_workbook = $excel.Workbooks.Add()
# Edit: using foreach instead of for
# Edit: increased number of writes to 500 to make speed difference more noticeable
foreach ($row in 1..500) {
# Edit: Commented out print-line, slows down the script
#"Row " + $row
# This is very slow! - http://forums.redmondmag.com/forums/forum_posts.asp?tid=4037&pn=7
$xls_workbook.sheets.item(1).cells.item($row,1) = "test"
}
$xls_workbook.SaveAs($MyInvocation.MyCommand.Definition.Replace($MyInvocation.MyCommand.Name, "") + "test.xlsx")
$excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
I want to use this for thousands of records. If there is no fast way to do this, PowerShell is not an option. Are there better alternatives?
You can speed things up by not looping through individual cells:
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $True
$xls_workbook = $excel.Workbooks.Add()
$range = $xls_workbook.sheets.item(1).Range("A1:A100")
$range.Value2 = "test"
If you want to write an array of values to a range, here is a nice blog post that demonstrates similar technique:
How to Get Data into an Excel Spreadsheet Very Quickly with PowerShell
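A minimal sketch of the array variant, not taken from that blog post: the values are first copied into a two-dimensional object[,] array, which is then assigned to the range in a single COM call. The helper name ConvertTo-ColumnArray is hypothetical, and the Excel part is shown only as comments so that the block runs without Excel installed.

```powershell
# Hypothetical helper: copies a flat list of values into a rows-by-1 object[,]
# array, the shape a single-column range expects when assigned in one go.
function ConvertTo-ColumnArray {
    param([object[]] $Values)
    $rowCount = $Values.Count
    $grid = New-Object 'object[,]' -ArgumentList $rowCount, 1
    for ($i = 0; $i -lt $rowCount; $i++) {
        $grid[$i, 0] = $Values[$i]
    }
    , $grid   # the comma stops PowerShell from flattening the array on output
}

# Usage sketch (needs Excel; variable names follow the question):
#   $data  = 1..500 | ForEach-Object { "test $_" }
#   $range = $xls_workbook.Sheets.Item(1).Range("A1:A500")
#   $range.Value2 = ConvertTo-ColumnArray -Values $data
```

The range and the array need the same dimensions (500 rows by 1 column in the usage sketch). The loop that fills the array runs entirely inside PowerShell, so only one call crosses the COM boundary.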
Some things don't add up here:
Your VBScript writes to ONE cell over and over, while your PowerShell code writes to 100 cells:
objWorkbook.workSheets(1).cells(50,1).value = "test"
$xls_workbook.sheets.item(1).cells.item($row,1) = "test"
You are also executing "Row " + $row in PowerShell, which might skew the comparison too.
If you want to write to multiple cells, you should think about using arrays and writing to whole ranges, because this performs better.
You can shave a little time off the PowerShell version by eliminating the for loop test and using a foreach.
for ($row = 1; $row -le 100; $row++)
goes to:
foreach ($row in 1..100)
By doing this you eliminate the comparison and increment.
But aside from that, my observations match yours (see my comments on Jook's answer).
You're still interfacing with Excel through COM though. That's adding some overhead due to COMInterop processing.
PowerShell, by its very design and use of cmdlets is a non-standard mess, at least for basic things. VBScript, which any programmer should be able to use and understand, has a general way of doing basic things that does not require special cmdlets to be installed or included with the deployed code. I believe this is a step backwards in many respects.
Before anyone trashes me and says I just don't get PowerShell, I must mention I have a long history of UNIX shell scripting behind me. PowerShell is similar, obviously, but to me it's not nearly as well implemented.
I do know that reality dictates that I will end up using PowerShell sooner or later - I just hope it evolves into a more "standard" replacement in the future.
