I have a small Excel file (28 KB, XLSX format) that I would like to modify with PowerShell. The file contains 59 rows and 366 columns.
My code walks down the first column searching for a specific entry, then walks across the row it found and outputs the content of that row together with the first row. This is the code:
# Define some parameters.
$year = "2015"
$filename = "C:\...\file.xlsx"
$person = "Lastname, Firstname"
# Open Excel file and select worksheet.
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
$workbook = $excel.Workbooks.Open($filename)
$worksheet = $workbook.sheets.item($year)
$cells = $worksheet.cells
# Search person name in first column.
$rows = $worksheet.UsedRange.Rows.count
"Rows: $rows"
$row = 1
while ($row -le $rows)
{
$cell = $cells.item($row,1).value2
if ($person -eq $cell) {
break
}
$row++
}
# List row
$cols = $worksheet.UsedRange.Columns.count
"Cols: $cols"
foreach ($col in 2..$cols)
{
$date = $cells.item(1,$col).value2
$data = $cells.item($row,$col).value2
$date = [DateTime]::FromOADate($date)
$msg = $date.ToString("yyyy-MM-dd") + " " + $data
"$msg"
}
# Close workbook and Excel file and release COM object.
$workbook.close()
$excel.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
My problem: the program is terribly slow. It takes more than 5 minutes to iterate over the 366 columns!
PS C:\...> Measure-Command { .\program.ps1 }
Days : 0
Hours : 0
Minutes : 5
Seconds : 33
Milliseconds : 580
Ticks : 3335806616
TotalDays : 0,00386088728703704
TotalHours : 0,0926612948888889
TotalMinutes : 5,55967769333333
TotalSeconds : 333,5806616
TotalMilliseconds : 333580,6616
I can hardly believe that this is normal. Instead I think that there is something really wrong with my program. But I have no idea what it is.
What do I have to change to make it faster?
Using a loop with Find to replace cell values in Excel takes forever: I have 111 cells to replace and it takes about 40 seconds to complete.
However, you can use the Replace method, which is considerably faster. To fill in a value from a relative cell, you have to switch the Excel application's reference style to xlR1C1.
Below is how I replace every cell containing the string "No registered hostname" with the value of the cell to its left, which in my data is the IP address.
I have commented out the while loop that I used previously.
Since you intend to do an update, you may consider this:
$Range=$WorkSheet.Range("B1").EntireColumn
# Replace Cells with No registered hostname
$SearchString="No registered hostname"
# Using Excel reference style xlR1C1 (documented constant -4150) to set the formula for replace
$xls.Application.ReferenceStyle = -4150
$Range.Replace($SearchString, "=RC[-1]")
# while ($NoDNS=$Range.find($SearchString))
# {
# $NoDNS.Activate()
# $RefRow=$NoDNS.Row
# $NoDNS.value()=$WorkSheet.Cells.Item($RefRow, 1).Text
# }
$xls.Application.ReferenceStyle=1
Using Replace takes only a split second to complete all the necessary changes, compared to the previous while loop.
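For the read-only loop in the question, a commonly used alternative (not part of the answer above) is to reduce the number of COM calls: every $cells.item(...).value2 is a separate round trip into Excel, whereas UsedRange.Value2 returns the whole block as one two-dimensional, 1-based array that can be scanned in memory. The sketch below uses a stand-in array instead of a workbook so that it runs without Excel. Find-FirstColumnRow and the sample names are hypothetical, and the sketch assumes the used range starts at A1 and covers more than one cell.

```powershell
# Hypothetical helper: index of the first row whose first column equals $Name, or -1.
function Find-FirstColumnRow {
    param($Data, [string]$Name)
    $firstCol = $Data.GetLowerBound(1)
    for ($r = $Data.GetLowerBound(0); $r -le $Data.GetUpperBound(0); $r++) {
        if ($Name -eq $Data.GetValue($r, $firstCol)) { return $r }
    }
    return -1
}

# With Excel this would be a single COM call:
#   $data = $worksheet.UsedRange.Value2
# Stand-in with the same 1-based shape: a header row of date serials, then one row per person.
$lengths     = [int[]](3, 3)
$lowerBounds = [int[]](1, 1)
$data = [Array]::CreateInstance([object], $lengths, $lowerBounds)
$data.SetValue('Name', 1, 1)
$data.SetValue([datetime]::new(2015, 1, 1).ToOADate(), 1, 2)
$data.SetValue([datetime]::new(2015, 1, 2).ToOADate(), 1, 3)
$data.SetValue('Roe, Richard', 2, 1)
$data.SetValue('home', 2, 2)
$data.SetValue('home', 2, 3)
$data.SetValue('Doe, Jane', 3, 1)
$data.SetValue('office', 3, 2)
$data.SetValue('vacation', 3, 3)

# Search and output happen entirely in memory, with no further calls into Excel.
$row = Find-FirstColumnRow -Data $data -Name 'Doe, Jane'
$lines = for ($c = 2; $c -le $data.GetUpperBound(1); $c++) {
    $date = [datetime]::FromOADate($data.GetValue(1, $c))
    $date.ToString('yyyy-MM-dd') + ' ' + $data.GetValue($row, $c)
}
$lines
```

In the original script only the line that fills $data would touch Excel; the search and the output loop would then run on the array.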
PowerShell: Format second sheet within xlsx file
I am working with an xlsx file that has two sheets within it.
I am uploading data into these sheets and formatting it.
I am able to successfully format the first sheet but not the second sheet.
This is the code for how I format the first sheet:
# Format Data: Autofit Columns
$lgTime = "[{0:HH:mm:ss}]" -f (Get-Date)
Write-Host "$lgTime Autofitting Data Columns..."
$range2autofit = $worksheet.UsedRange
$rowCount = $range2autofit.Rows.Count
[void] $range2autofit.EntireColumn.Autofit()
$lgTime = "[{0:HH:mm:ss}]" -f (Get-Date)
write-host "$lgTime Creating Excel Table Format ..."
$tableStyle = "TableStyleMedium9"
$tableStyle = "TableStyleLight21"
$Worksheet.Columns.Item("A").NumberFormat = "MM/DD/YYYY"
$ListObject = $WorkBook.ActiveSheet.ListObjects.Add(1, $range2autofit, $null , 1, $null, $tableStyle)
I would like the same formatting to be applied to the second sheet within the file but am having trouble doing that. I have tried using that same code again but with small changes such as:
Workbook.Worksheet.Item(2).UsedRange
Workbook.Worksheet.Item("Sheet2Name").UsedRange
My thought process is that I should be able to reuse the same code and just point it at the second sheet; I think I am simply not accessing the second sheet correctly. That could be completely wrong, though.
Edit:
This is where I defined $workbook and added a sheet to the xlsx file which is followed by the renaming of each sheet
$dataFile = "FILE LOCATION.xlsx"
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $false
$workbook = $objExcel.Workbooks.Open($dataFile)
$worksheet = $workbook.Worksheets.Add()
$worksheetOne = $workbook.Worksheets.Item(1)
$worksheetOne.Name = "Sheet1Name"
$worksheetTwo = $workbook.Worksheets.Item(2)
$worksheetTwo.Name = "Sheet2Name"
From the code you added in your edit, it looks like you're getting tripped up by adding a sheet and then using the wrong index later. Called with no arguments, Worksheets.Add() inserts the blank worksheet before the active sheet, not at the end. For example, with Sheet2 active:
# A good way to check and see what you're doing with your sheets:
$workbook.Worksheets | select Index,Name
Index Name
----- ----
1 Sheet1
2 Sheet2
# add a new sheet
$worksheet = $workbook.Worksheets.Add()
# check again
$workbook.Worksheets | select Index,Name
Index Name
----- ----
1 Sheet1
2 Sheet3 # whoops!
3 Sheet2
To add before/after a specific worksheet instead, you can specify like this:
# missing value is required for COM functions
$newSheet = $workbook.Worksheets.add(
[System.Reflection.Missing]::Value, ## before index n
$workbook.Worksheets.Item(2) ## after index n
)
Then just be careful when you select your sheets, and you should be good to go!
# Setting sheet properties by referring to variable
$ws1 = $workbook.Worksheets.Item(1)
$ws1.Name = "Sheet1Name"
$ws1.UsedRange.EntireColumn.AutoFit()
$ws2 = $workbook.Worksheets.Item(2)
$ws2.Name = "Sheet2Name"
$ws2.UsedRange.EntireColumn.AutoFit()
I found some similar solutions to my issue here, but they do not exactly cover my problem, and since I'm still very new to PowerShell, I couldn't adapt them to my use case. Hence my question.
I make a weekly manual Excel export (.xlsx) from a system that doesn't offer any filtering options. The result is a table with around 500 entries.
My goal is to write a PowerShell script that automatically removes ALL rows containing a value DIFFERENT from "01.01.2099" in the "Valid until" column (F1 in the Excel sheet).
I haven't written any code yet, since I'm not sure where or how to start. I'm sure this is a very simple task, and any help from a more experienced PowerSheller will be highly appreciated. Thanks!
The biggest challenge here is that you need to test if a cell contains a certain date value or not.
From your image, you can see the dates are formatted in different ways, so comparing the cell's value to a date in a specific format is tricky.
Luckily, the DateTime object has a static method FromOADate() that can do the conversion for you.
Also, you need to delete rows from the bottom up; otherwise each deletion moves the rows below it up by one and changes their indices.
$file = 'D:\Test\Export.xlsx'
# create a datetime variable to check against
$checkDate = [datetime]::new(2099, 1, 1) # or do (Get-Date -Year 2099 -Month 1 -Day 1).Date
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
# open the Excel file
$workbook = $excel.Workbooks.Open($file)
$sheet = $workbook.Worksheets.Item(1)
# get the number of rows in the sheet
$rowMax = $sheet.UsedRange.Rows.Count
# loop through the rows to test if the value in column 6 is date 01/01/2099
# do the loop BACKWARDS, otherwise the indices will change on every deletion.
for ($row = $rowMax; $row -ge 2; $row--) {
# convert the formatted date in the cell to real DateTime object with time values set all to 0
# Column 6 is the 'Valid until' column
$cellDate = [datetime]::FromOADate($sheet.Cells.Item($row, 6).Value2).Date
if ($cellDate -ne $checkDate) {
$null = $sheet.Rows($row).EntireRow.Delete()
}
}
# save and exit
$workbook.Close($true)
$excel.Quit()
# clean up the COM objects used
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($sheet)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
In the above code, the column index is hardcoded as 6.
If you aren't sure about that but do know the column's name, you can insert this snippet:
# get the column index for column named 'Valid until'
$colMax = $sheet.UsedRange.Columns.Count
for ($col = 1; $col -le $colMax; $col++) {
if ($sheet.Cells.Item(1, $col).Value() -eq 'Valid until') { break } # assuming the first row has the headers
}
Insert it above the $rowMax = $sheet.UsedRange.Rows.Count line, and inside the loop change $sheet.Cells.Item($row, 6).Value2 into $sheet.Cells.Item($row, $col).Value2
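The two ideas in this answer, converting the stored serial number with FromOADate() and deleting from the bottom up, can be tried without Excel. A minimal sketch, assuming a plain list stands in for the 'Valid until' column; the sample dates are invented:

```powershell
# The date every surviving row must carry.
$keep = [datetime]::new(2099, 1, 1)

# Stand-in for the column: OLE Automation date serials, as Value2 returns them for date cells.
$serials = [System.Collections.Generic.List[double]]::new()
foreach ($d in @($keep, [datetime]::new(2024, 5, 17), $keep, [datetime]::new(2023, 12, 31))) {
    $serials.Add($d.ToOADate())
}

# Walk from the last index to the first so that a removal never shifts
# an index that still has to be visited.
for ($i = $serials.Count - 1; $i -ge 0; $i--) {
    if ([datetime]::FromOADate($serials[$i]).Date -ne $keep) {
        $serials.RemoveAt($i)
    }
}

$remaining = $serials | ForEach-Object { [datetime]::FromOADate($_).ToString('yyyy-MM-dd') }
$remaining
```

Running the same loop forward would skip the entry that slides into the slot of a removed one, which is the index shift the answer warns about.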
I'm working on a PS script to take a row of data from an Excel spreadsheet and populate that data in certain places in a Word document. To elaborate, we have a contract tracking MASTER worksheet that among other things contains data such as name of firm, address, services, contact name. Additionally, we have another TASK worksheet in the same workbook that tracks information such as project owner, project name, contract number, task agree number.
I'm writing a script that does the following:
1. Asks the user through a message box what kind of contract is being written ("Master" or "Task")
2. Opens the workbook with the appropriate worksheet opened ("Master" tab or "Task" tab)
3. Asks the user through a VB InputBox which Excel row of data they want to use to populate the Word contract
4. Extracts that row of data from Excel
5. Outputs certain portions of that row of data to certain locations in a Word document
6. Saves the Word document
7. Opens the Word document so the user can continue editing it
My question is this: using something like PSExcel, how do I extract that row of data into variables that can be placed in a Word document? For reference, in case you're going to reply with a snippet of code, here is how the variables are defined in the Excel portion of my script:
$Filepath = "C:\temp\ContractScript\Subconsultant Information Spreadsheet.xlsx"
$Excel = New-Object -ComObject Excel.Application
$Workbook = $Excel.Workbooks.Open($Filepath)
$Worksheet = $Workbook.sheets.item($AgreementType)
$Excel.Visible = $true
#Choosing which row of data
[int]$RowNumber = [Microsoft.VisualBasic.Interaction]::InputBox("Enter the row of data from $AgreementType worksheet you wish to use", "Row")
Additionally, the first row of data in the excel worksheets are the column headings, in case it matters.
I've gotten this far so far:
import-module psexcel
$Consultant = new-object System.Collections.Arraylist
foreach ($data in (Import-XLSX -path $Filepath -Sheet $AgreementType -RowStart $RowNumber))
{
$Consultant.add($data)
}
But I'm currently stuck because I can't figure out how to reference the data being added to $consultant.$data. Somehow I need to read in the column headings first so the $data variable can be defined in some way, so when I add the variable $consultant.Address in Word it finds it. Right now I think the variable name is going to end up "$Consultant.1402 S Broadway" which obviously won't work.
Thanks for any help. I'm fairly new to powershell scripting, so anything is much appreciated.
I have the same issue, and searching online for solutions is a royal PITA.
I'd love to find a simple way to loop through all of the rows like you're doing.
$myData = Import-XLSX -Path "path to the file"
foreach ($row in $myData.Rows)
{
$row.ColumnName
}
But sadly something logical like that doesn't seem to work. I see examples online that use ForEach-Object and Where-Object which is cumbersome. So any good answers to the OP's question would be helpful for me too.
UPDATE:
Matthew, thanks for coming back and updating the OP with the solution you found. I appreciate it! That will help in the future.
For my current project, I went about this a different way since I ran into lack of good examples for Import-XLSX. It's just quick code to do a local task when needed, so it's not in a production environment. I changed var names, etc. to show an example:
$myDataField1 = New-Object Collections.Generic.List[String]
$myDataField2 = New-Object Collections.Generic.List[String]
# ...
$myDataField10 = New-Object Collections.Generic.List[String]
# PSExcel, the third party library, might want to install it first
Import-Module PSExcel
# Get spreadsheet, workbook, then sheet
try
{
$mySpreadsheet = New-Excel -Path "path to my spreadsheet file"
$myWorkbook = $mySpreadsheet | Get-Workbook
$myWorksheet = $myWorkbook | Get-Worksheet -Name Sheet1
}
catch {
# whatever error handling code you want
}
# calculate total number of records
$recordCount = $myWorksheet.Dimension.Rows
$itemCount = $recordCount - 1
# specify column positions
$r, $my1stColumn = 1, 1
$r, $my2ndColumn = 1, 2
# ...
$r, $my10thColumn = 1, 10
if ($recordCount -gt 1)
{
# loop through all rows and get data for each cell's value according to column
for ($i = 1; $i -le $recordCount - 1; $i++)
{
$myDataField1.Add($myWorksheet.Cells.Item($r + $i, $my1stColumn).text)
$myDataField2.Add($myWorksheet.Cells.Item($r + $i, $my2ndColumn).text)
# ...
$myDataField10.Add($myWorksheet.Cells.Item($r + $i, $my10thColumn).text)
}
}
#loop through all imported cell values
for ([int]$i = 0; $i -lt $itemCount; $i++)
{
# use the data
$myDataField1[$i]
$myDataField2[$i]
# ...
$myDataField10[$i]
}
I am trying to replicate an end user's experience by monitoring the time it takes to copy a large file off of a network directory.
I'm using Measure-Command to find the total time it takes to copy an item from a network directory onto the computer that this script is scheduled on (using Windows Scheduler) and output the time in an xlsx.
The issue I'm running into is that every time this script runs from the scheduler (daily), it overwrites the previous day's data instead of writing the result to the next row. When I run it manually multiple times, it works just fine and posts separate results under each other. I think the issue is that it goes by the instance: running the code a handful of times in the same PowerShell instance means $previousRow keeps counting up, but the daily schedule opens a new instance every time and writes over the old data in cells (1,1) and (1,2).
Any suggestions on how to keep historical data?
$seconds = Measure-Command {
Copy-Item -Path X:\shareddrive\test.pdf -Destination C:\Users\Me\Desktop
} | select TotalSeconds
$erroractionpreference = "SilentlyContinue"
$a = New-Object -ComObject Excel.Application
$dt = Get-Date -Format "MM/dd/yyyy hh:mm:ss"
$a.Workbooks.Open("X:\shareddrive\output\timesample.xlsx")
$a.Visible = $true
$a.Worksheets.Item(1)
$previousRow += 1
$a.Cells.Item($previousRow,1) = "OfficeLocation - " + $dt
$a.Cells.Item($previousRow,2) = $seconds.TotalSeconds
$a.ActiveWorkbook.Save()
$a.Workbooks.Close()
$a.Quit()
Remove-Item -Path C:\Users\Me\Desktop\test.pdf
Not sure whether you copied all of your code in, but it looks like you aren't actually defining $previousRow anywhere. So your existing code runs $previousRow += 1, which sets $previousRow to 1 in a fresh session... which means each scheduled run writes to row 1 again.
To find the last row that has data in it (i.e. the value that $previousRow should be one more than), you can use this code:
$filepath = "C:\Folder\ExcelFile.xlsx"
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $False
$WorkBook = $objExcel.Workbooks.Open($filepath)
$WorkSheet = $objExcel.WorkSheets.item(1)
$WorkSheet.activate()
[int]$lastRowvalue = $WorkSheet.UsedRange.Rows.Count
$lastrow = $WorkSheet.Cells.Item($lastRowvalue, 1).Value2
Write-Host $lastrow
Write-Host $lastRowvalue
Copy/paste from here (with a few slight modifications): To get the value of last cell used in Excel
That tells you the last row with data, so add 1 to it to get the value for $previousRow:
$previousRow = $lastRowvalue + 1
I'm trying to count the cells in the first row (A1-D1), which is the header, and use that count as a counter.
Most of the examples I have found use UsedRange to count the columns:
$headercolcount=($worksheet.UsedRange.Columns).count
But UsedRange captures the maximum column count across the whole active sheet, which differs from the column count of the first row if there is extra data below the header.
I only want to count the first row:
Update:
For a clearer view, here is an example.
F1 and G1 hold no value, so the answer should be 5, because A1-E1 contain data. How do I get the 5 correctly?
Get-Process excel | Stop-Process -Force
# Specify the path to the Excel file and the WorkSheet Name
$FilePath = "C:\temp\A_A.xlsx"
$SheetName = "Blad1" # In english this is probably Sheet1
# Create an Object Excel.Application using Com interface
$objExcel = New-Object -ComObject Excel.Application
# Disable the 'visible' property so the document won't open in excel
$objExcel.Visible = $false
$objExcel.DisplayAlerts = $false
# Open the Excel file and store it in $WorkBook
$WorkBook = $objExcel.Workbooks.Open($FilePath)
# Load WorkSheet 'Blad 1' in variable Worksheet
$WorkSheet = $WorkBook.sheets.item($SheetName)
$xlup = -4162
$lastRow = $WorkSheet.cells.Range("A1048576").End($xlup).row
# get the highest amount of columns
$colMax = ($WorkSheet.UsedRange.Columns).count
# initialize a counter
$count = 0
# set the row you'd like to count
$row = 1
# check columns 1..$colMax of that row
for ($i = 1; $i -le $colMax; $i++){
if($worksheet.rows.Item("$row").columns.Item($i).text){
$count++
}
}
$count
This should work. It takes the highest column count from the used range and loops up to that number. During the loop it checks whether the cell in that row is filled; if it is, it increments the counter.
If you have millions of rows, this might not be the best way, but it works for me.
I've tested it with an Excel file:
With
$row = 1 this will give : 5
$row = 2 this will give : 6
$row = 3 this will give : 7
$row = 4 this will give : 8
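If only the header row matters, another option is to read that single row once and count the non-blank entries in memory. A minimal sketch, assuming the row has already been pulled into an array (how it is fetched from Excel is left open here and is not something this answer does); the sample header names are invented:

```powershell
# Stand-in for the first row A1:G1, where F1 and G1 are empty.
$headerRow = @('ID', 'Name', 'Date', 'Qty', 'Price', $null, '')

# Count only the cells that hold something other than blanks.
$headerColCount = @($headerRow | Where-Object { -not [string]::IsNullOrWhiteSpace($_) }).Count
$headerColCount
```

For the sample row this yields 5, matching the A1-E1 case described in the question.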
# Specify the path to the Excel file and the WorkSheet Name
$FilePath = "C:\temp\A_A.xlsx"
$SheetName = "Blad1" # In english this is probably Sheet1
# Create an Object Excel.Application using Com interface
$objExcel = New-Object -ComObject Excel.Application
# Disable the 'visible' property so the document won't open in excel
$objExcel.Visible = $false
$objExcel.DisplayAlerts = $false
# Open the Excel file and store it in $WorkBook
$WorkBook = $objExcel.Workbooks.Open($FilePath)
# Load WorkSheet 'Blad 1' in variable Worksheet
$WorkSheet = $WorkBook.sheets.item($SheetName)
$xlup = -4162
$lastRow = $WorkSheet.cells.Range("A1048576").End($xlup).row
$amountofcolumns = $worksheet.UsedRange.Rows(1).Columns.Count
#OUTPUT
write-host "Last Used row:" $lastRow
Write-host "Amount of columns" $amountofcolumns
#show all columnnames
for($i = 1 ; $i -le $amountofcolumns; $i++){
$worksheet.Cells.Item(1,$i).text
}
This will show you how many rows you have and will print all values in the first row, i.e. your column titles.