Compare two Excel files to delete duplicate data using PowerShell

I have an Excel sheet with a number of columns and rows. I need to delete all columns except one, and delete the rows whose data is listed in another Excel sheet (that sheet has a single column, with the values to delete in its rows).
I have done the first part, deleting all the columns except one, but I was only able to do it by column number. I want to do it by header name, i.e. "Product Name", because the column number may change in other sheets.
I also want to do the same with rows, so I can compare the row data with my reference excel.xlsx file and delete the rows that match.
$file = "C:\TE.xlsx" # here goes the path and name of the excel file.
$ColumnsToKeep = 4 # Specify the column number to keep.
$excel = New-Object -comobject Excel.Application # Creating object of excel in powershell.
$excel.DisplayAlerts = $False
$excel.visible = $False
$workbook = $excel.Workbooks.Open($file)
$sheet = $workbook.Sheets.Item(1) # Referring to first sheet.
$maxColumns = $sheet.UsedRange.Columns.Count
$ColumnsToRemove = Compare-Object $ColumnsToKeep (1..$maxColumns) | Where-Object{$_.SideIndicator -eq "=>"} | Select-Object -ExpandProperty InputObject
0..($ColumnsToRemove.Count - 1) | %{$ColumnsToRemove[$_] = $ColumnsToRemove[$_] - $_}
$ColumnsToRemove | ForEach-Object{
[void]$sheet.Cells.Item(1,$_).EntireColumn.Delete()
}
$workbook.SaveAs("C:\data1.XLSX")
$workbook.Close($true)
$excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
Remove-Variable excel
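The `0..($ColumnsToRemove.Count - 1)` line above works because every deletion shifts the remaining columns one position to the left, so the i-th column to remove has to be addressed i positions further left. A minimal sketch, using a plain .NET list in place of the worksheet so it runs without Excel, compares that offset trick with the simpler alternative of deleting in descending order. The function names and sample headers are hypothetical.

```powershell
# A plain list stands in for the worksheet's columns so the index arithmetic
# can be checked without Excel. Column numbers are 1-based, as in Excel.

function Remove-ColumnsWithOffset {   # hypothetical helper
    param([string[]]$Columns, [int[]]$ColumnsToRemove)   # $ColumnsToRemove must be ascending
    $list = [System.Collections.Generic.List[string]]::new($Columns)
    for ($i = 0; $i -lt $ColumnsToRemove.Count; $i++) {
        # $i deletions have already happened, each shifting later columns one place left
        $list.RemoveAt($ColumnsToRemove[$i] - $i - 1)
    }
    $list.ToArray()
}

function Remove-ColumnsDescending {   # hypothetical helper
    param([string[]]$Columns, [int[]]$ColumnsToRemove)
    $list = [System.Collections.Generic.List[string]]::new($Columns)
    foreach ($col in ($ColumnsToRemove | Sort-Object -Descending)) {
        # deleting from the right never moves a column that is still waiting to be deleted
        $list.RemoveAt($col - 1)
    }
    $list.ToArray()
}

$headers  = 'Id', 'Date', 'Region', 'Product Name', 'Price', 'Stock'   # sample headers
$toRemove = 1, 2, 3, 5, 6
$viaOffset     = Remove-ColumnsWithOffset -Columns $headers -ColumnsToRemove $toRemove
$viaDescending = Remove-ColumnsDescending -Columns $headers -ColumnsToRemove $toRemove
```

In the Excel script, the descending variant would correspond to `$ColumnsToRemove | Sort-Object -Descending | ForEach-Object { [void]$sheet.Cells.Item(1, $_).EntireColumn.Delete() }`, which makes the offset line unnecessary.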

Provided all your columns have headers in the first row, you can use this:
$file = "C:\TE.xlsx"
$columnToKeep = 'Product Name'   # header text to keep (-ne compares strings case-insensitively)
$excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false
$excel.Visible = $false
$workbook = $excel.Workbooks.Open($file)
$sheet = $workbook.Worksheets.Item(1)
$maxColumns = $sheet.UsedRange.Columns.Count
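# remove all columns except the one with header named $columnToKeep
# loop from right to left: deleting a column shifts everything to its right one place left,
# so a left-to-right loop would skip the column that follows each deleted one
for ($col = $maxColumns; $col -ge 1; $col--) {
    if ($sheet.Cells.Item(1, $col).Value() -ne $columnToKeep) {
        [void]$sheet.Columns.Item($col).EntireColumn.Delete()
    }
}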
$workbook.SaveAs("C:\data1.XLSX")
$workbook.Close($true)
$excel.Quit()
## clean-up used Com objects
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($sheet)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
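The answer covers the columns. For the row part of the question, one possible approach (a sketch, not tested against real workbooks) is to read the kept column's values and the reference file's values into arrays, work out which row numbers match, and delete those rows from the bottom up. The matching step is shown below as a hypothetical function `Get-RowsToDelete` that runs without Excel; it assumes row 1 is the header and that matching should ignore case.

```powershell
function Get-RowsToDelete {   # hypothetical helper
    param(
        [object[]]$Values,             # kept column's values; index 0 is row 1
        [string[]]$ReferenceValues,    # values read from the reference workbook
        [int]$FirstDataRow = 2         # row 1 holds the header
    )
    # Case-insensitive lookup, matching how -eq compares strings in PowerShell
    $lookup = [System.Collections.Generic.HashSet[string]]::new(
        $ReferenceValues, [System.StringComparer]::OrdinalIgnoreCase)
    $rows = for ($i = $FirstDataRow - 1; $i -lt $Values.Count; $i++) {
        if ($null -ne $Values[$i] -and $lookup.Contains([string]$Values[$i])) {
            $i + 1                     # convert the 0-based index to an Excel row number
        }
    }
    # Bottom-up, so deleting one row never shifts a row that is still to be deleted
    if ($null -ne $rows) { $rows | Sort-Object -Descending }
}

$productNames = 'Product Name', 'Apple', 'Pear', 'Plum', 'pear'   # sample column values
$reference    = 'PEAR', 'Plum'                                    # sample reference list
$rowsToDelete = Get-RowsToDelete -Values $productNames -ReferenceValues $reference
```

With the COM objects from the script above, `$Values` could be filled from `$sheet.Cells.Item($r, 1).Value2` for each used row (once the other columns are gone, the kept column is column 1), and each returned number passed to `[void]$sheet.Rows.Item($r).Delete()`.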

Related

Powershell - Filter an Excel File

I would like to add a filter for an Excel file within a PowerShell script.
In other words: if column D contains the entry "Listener", that whole row should be hidden (filtered out), so that only rows without "Listener" are shown.
Can I implement this with PowerShell somehow? I tried, but it didn't work.
I tried it with this:
$column = 4 # column D
$filename = "C:\Users\xxxxx\Desktop\Test.XLS"
$criteria = "Listener"
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$workbook = $Excel.Workbooks.Open($filename)
$worksheet = $workbook.Worksheets.Item(1)
$usedrange = $worksheet.UsedRange
$usedrange.EntireColumn.AutoFilter()
$usedrange.AutoFilter($column, $criteria)
$worksheet.UsedRange.offset($column,4).EntireLine.Delete()
Thanks!
This should work for you:
$column = 4 # column D
$filename = "C:\Users\xxxxx\Desktop\Test.XLS"
$criteria = "Listener"
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$workbook = $Excel.Workbooks.Open($filename)
$worksheet = $workbook.Worksheets.Item(1)
$worksheet.Activate()
# find the number of used rows in the table
$rowMax = $worksheet.UsedRange.Rows.Count
# hide rows that match the criteria
# go from bottom to top
# if your file does not have column headers, use $row -gt 0
for ($row = $rowMax; $row -gt 1; $row--) {
if ($worksheet.Cells.Item($row, $column).Value() -eq $criteria) {
$worksheet.Rows.Item($row).Hidden = $true
}
}
# save and close the workbook
$workbook.Close($true)
# quit Excel and clean up the COM objects from memory
$Excel.Quit()
# IMPORTANT: clean-up used Com objects
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($worksheet)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
Edit
I may have misinterpreted your question: instead of hiding the rows that match the criteria as in the code above, all you may need is to turn on an AutoFilter on column 'D':
$column = 4 # column D
$filename = "C:\Users\xxxxx\Desktop\Test.XLS"
$criteria = "Listener"
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$workbook = $Excel.Workbooks.Open($filename)
$worksheet = $workbook.Worksheets.Item(1)
$worksheet.Activate()
# find the number of used rows in the table
$rowMax = $worksheet.UsedRange.Rows.Count
# add the autofilter to column D, showing only rows that are not equal to $criteria
[void]$worksheet.Range("D1:D$rowMax").AutoFilter(1, "<>$criteria")
# save and close the workbook
$workbook.Close($true)
# quit Excel and clean up the COM objects from memory
$Excel.Quit()
# IMPORTANT: clean-up used Com objects
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($worksheet)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

Copying multiple excel columns one after another with powershell

I'm using PowerShell to manipulate a group of Excel files. I'm copying one column from each file into another file that will contain all the data. The problem is that I need the copied columns to be placed one after another.
I have managed to write this to copy a column and paste it into another Excel file:
ForEach ($item in $files) {
$FullName = [System.IO.Path]::GetFileName("$item")
$Excel = New-Object -ComObject Excel.Application
$Excel.visible = $true
$Workbook = $Excel.workbooks.open($Path + [System.IO.Path]::GetFileName("$item"))
$Worksheets = $Workbooks.worksheets
$Worksheet = $Workbook.Worksheets.Item(1)
$range = $WorkSheet.Range($range1).EntireColumn
$range.Copy() | out-null
$Worksheet = $Workbook.Worksheets.item($merge)
$Range = $Worksheet.Range($range2)
$Worksheet.Paste($range)
$WorkBook.Save()
$WorkBook.Close()
$Excel.quit()
}
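To paste each file's column next to the previous one, the destination column has to advance with each file. When the destination is addressed with an A1-style string such as `$range2`, a per-file counter has to be turned into a column letter; the hypothetical helper below sketches that conversion (1 to A, 26 to Z, 27 to AA) and runs without Excel.

```powershell
function ConvertTo-ColumnLetter {   # hypothetical helper
    param([int]$Number)             # 1-based column number
    if ($Number -lt 1) { throw "Column number must be 1 or greater, got $Number" }
    $n = $Number
    $letters = ''
    while ($n -gt 0) {
        $rem = ($n - 1) % 26                        # 0..25 maps to A..Z
        $letters = "$([char](65 + $rem))$letters"   # prepend this base-26 digit
        $n = [int][math]::Floor(($n - 1) / 26)      # move on to the next digit
    }
    $letters
}
```

Inside the loop, something like `$target = "$(ConvertTo-ColumnLetter $i)1"` would give the top cell of the i-th destination column. Alternatively, `$Worksheet.Columns.Item($i)` accepts a column number directly, which avoids building letters at all.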

Merge content of multiple Excel files into one using PowerShell

I have multiple Excel files with different names in one folder,
e.g. C:\Users\XXXX\Downloads\report
Each file has a fixed number of columns,
e.g. Date | Downtime | Response
I want to create a new Excel file that merges the data of all these files. A new Client name column should be added, filled with the name of the file each row came from, and each file's data should be appended below the previous one.
e.g. Client name | Date | Downtime | Response
The code below is able to append all the Excel data, but I still need to add the Client name column.
$path = "C:\Users\XXXX\Downloads\report"
# Launch Excel and make it do as it's told (suppress confirmations)
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $True
$Excel.DisplayAlerts = $False
$Files = Get-ChildItem -Path $path
#Open up a new workbook
$Dest = $Excel.Workbooks.Add()
#Loop through files, opening each, selecting the Used range, and only grabbing the first 5 columns of it. Then find next available row on the destination worksheet and paste the data
ForEach($File in $Files)
{
$Source = $Excel.Workbooks.Open($File.FullName,$true,$true)
If(($Dest.ActiveSheet.UsedRange.Count -eq 1) -and ([String]::IsNullOrEmpty($Dest.ActiveSheet.Range("A1").Value2)))
{
#If there is only 1 used cell and it is blank select A1
[void]$source.ActiveSheet.Range("A1","E$(($Source.ActiveSheet.UsedRange.Rows|Select -Last 1).Row)").Copy()
[void]$Dest.Activate()
[void]$Dest.ActiveSheet.Range("A1").Select()
}
Else
{
#If there is data go to the next empty row and select Column A
[void]$source.ActiveSheet.Range("A2","E$(($Source.ActiveSheet.UsedRange.Rows|Select -Last 1).Row)").Copy()
[void]$Dest.Activate()
[void]$Dest.ActiveSheet.Range("A$(($Dest.ActiveSheet.UsedRange.Rows|Select -last 1).row+1)").Select()
}
[void]$Dest.ActiveSheet.Paste()
$Source.Close()
}
$Dest.SaveAs("$path\Merge.xls")
$Dest.close()
$Excel.Quit()
Please suggest an effective way to do this, with links if available.
Convert XLS to XLSX:
$xlFixedFormat = 51 # xlWorkbookDefault (.xlsx); the [Microsoft.Office.Interop.Excel.XlFileFormat] type only resolves if the interop assembly is loaded
$excel = New-Object -ComObject excel.application
$excel.visible = $true
$folderpath = "C:\Users\xxxx\Downloads\report\*"
$filetype ="*xls"
Get-ChildItem -Path $folderpath -Include $filetype |
ForEach-Object `
{
$path = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
"Converting $path to xlsx..."
$workbook = $excel.workbooks.open($_.fullname)
$workbook.saveas($path, $xlFixedFormat)
$workbook.close()
}
$excel.Quit()
$excel = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
If you are willing to use the external ImportExcel module, you could simply loop through the files like so:
$report_directory = ".\reports"
$merged_reports = @()
# Loop through each XLSX-file in $report_directory
foreach ($report in (Get-ChildItem "$report_directory\*.xlsx")) {
    # Loop through each row of the "current" XLSX-file
    $report_content = foreach ($row in Import-Excel $report.FullName) {
        # Create "custom" row
        [PSCustomObject]@{
            "Client name" = $report.Name
            "Date"        = $row."Date"
            "Downtime"    = $row."Downtime"
            "Response"    = $row."Response"
        }
    }
    # Add the "custom" data to the results-array
    $merged_reports += @($report_content)
}
# Create final report
$merged_reports | Export-Excel ".\merged_report.xlsx"
Please note that this code is not optimized for performance, but it should get you started.
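One common performance tweak is to collect the loop's output directly instead of growing an array with `+=`, which copies the array on every append. A minimal sketch of that pattern follows; the in-memory `$reports` table is hypothetical sample data standing in for what `Import-Excel` would return for each file, `Add-ClientName` is a hypothetical helper, and using the file name without its extension as the client name is an assumption.

```powershell
function Add-ClientName {   # hypothetical helper: prepend a Client name column to each row
    param([string]$ClientName, [object[]]$Rows)
    foreach ($row in $Rows) {
        [PSCustomObject]@{
            'Client name' = $ClientName
            'Date'        = $row.Date
            'Downtime'    = $row.Downtime
            'Response'    = $row.Response
        }
    }
}

# Hypothetical stand-in for: Import-Excel <file> for each report file
$reports = [ordered]@{
    'clientA.xlsx' = @([PSCustomObject]@{ Date = '2020-01-01'; Downtime = 5; Response = 'OK' })
    'clientB.xlsx' = @(
        [PSCustomObject]@{ Date = '2020-01-02'; Downtime = 0; Response = 'OK' },
        [PSCustomObject]@{ Date = '2020-01-03'; Downtime = 12; Response = 'Slow' })
}

# Everything the loop outputs is gathered into $merged in one go (no += copying)
$merged = foreach ($name in $reports.Keys) {
    $client = [System.IO.Path]::GetFileNameWithoutExtension($name)
    Add-ClientName -ClientName $client -Rows $reports[$name]
}
```

With real files, the loop would run over `Get-ChildItem "$report_directory\*.xlsx"` with `-Rows (Import-Excel $report.FullName)`, and `$merged` would be piped to `Export-Excel` as in the answer.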

Powershell with Excel

What I need to do is extract the data from an Excel row and output it into separate rows in Excel. After that, I need to apply certain conditions to the extracted data.
This is my current script:
To open Excel and apply the formulas:
$excel = New-Object -ComObject excel.application
$filepath = 'D:\testexcel.xlsx'
$workbook = $excel.workbooks.open("$filepath")
$worksheet = $workbook.worksheets.item(1)
$excel.Visible = $true
$rows = $worksheet.range("A1").currentregion.rows.count
$worksheet.range("S1:S$rows").formula = $worksheet.range("S1").formula
Function to find the row, apply the formula and output it:
function test123(){
param([string]$test123)
$sourcefile = "D:\testexcel.xlsx"
$sheetname = "abc"
$excel = new-object -comobject excel.application
$excel.Visible = $true
$excelworkbook = $excel.Workbooks.open($sourcefile, 2, $true)
$excelworksheet = $excelworkbook.worksheets.item($sheetname)
$row = 1
$column = 1
$found = $false
while(($excelworksheet.cells.item($row, $column).value() -ne $null) -and($found -eq $false)){
if(($excelworksheet.cells.item($row, $column).value()).toupper() -eq $test123.ToUpper()){
write-host $excelworksheet.cells.item($row, $column).value() $excelworksheet.cells.item($row, $column+1).value() $excelworksheet.cells.item($row, $column+2).value()
$found = $true
}
$row += 1
}
#close workbook
$excelworkbook.close()
$excel.quit()
}
test123 -test123 "Test123"
Please guide me and tell me whether this is the right way to do it. Thanks
Please have a look at the ImportExcel module by Doug Finke. This module can do what you need.
Get it from the PowerShell Gallery: Install-Module -Name ImportExcel
GitHub link: https://github.com/dfinke/ImportExcel
You can then run Get-Help Import-Excel -Examples, which has pretty good examples.
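As a rough illustration of how the lookup in `test123` could look with that module: `Import-Excel` returns one object per row, with properties named after the header row, so the `while` loop reduces to a filter. The sample objects below stand in for `Import-Excel -Path 'D:\testexcel.xlsx' -WorksheetName 'abc'`; the header names `Name`, `Value1` and `Value2` are assumptions, and the sketch assumes the sheet has a header row.

```powershell
# Hypothetical rows standing in for: Import-Excel -Path 'D:\testexcel.xlsx' -WorksheetName 'abc'
$rows = @(
    [PSCustomObject]@{ Name = 'Other';   Value1 = 1; Value2 = 'x' }
    [PSCustomObject]@{ Name = 'TEST123'; Value1 = 2; Value2 = 'y' }
)

# -eq on strings is case-insensitive, so the ToUpper() calls are unnecessary;
# Select-Object -First 1 stops at the first match, like the $found flag did.
$match = $rows | Where-Object { $_.Name -eq 'Test123' } | Select-Object -First 1
"$($match.Name) $($match.Value1) $($match.Value2)"
```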

Need some help getting PowerShell to delete cell content and insert a formula

Here is the script as it stands. $of is set to the name of the file that gets downloaded with wget. I am getting "Exception from HRESULT: 0x800A03EC", which seems to have something to do with the range portion: it sees a 0-based range, even though I am giving it rows 2 through the number of rows.
How can I get PowerShell to clear the range and then insert the formula?
#open the downloaded worksheet
$Excel = New-Object -Com Excel.Application
$Workbook = $Excel.Workbooks.Open($of)
$page = 'Project Summary'
$ws = $Workbook.Worksheets | Where-Object {$_.Name -eq $page}
# Set variables for the worksheet cells, and for navigation
$cells = $ws.Cells
$row = 1
$col = 5
# Add the header to the worksheet
$headers = "Region"
$headers | foreach {
$cells.Item($row, $col) = $_
}
# Add the formula to each occupied row
$rows = $worksheet.UsedRange.Rows.Count
$ws.Range("E2:E" + $rows).Clear()
$ws.Range("E2:E" + $rows).Formula = "=SUM(E2:E5)"
$Excel.Visible = $true
$Excel.DisplayAlerts = $false
$Excel.ActiveWorkbook.SaveAs('W:\test.xlsx')
I also tried
$rows | foreach {
$ws.Cells("E" + $rows).Clear()
$ws.Cells("E" + $rows).Formula = "=SUM(E2:E5)"
}
That gives me a Value does not fall within the expected range error.
How can I get this to clear each cell in the range E2:E<lastRow> and then insert a formula?
I have gone back to the following after finding a way to escape the entire formula using a here-string (@' ... '@). There was clearly an error in my escape sequence earlier, but the following does not behave exactly as expected either:
$ws.Range("E2:E$rows").Clear();
$ws.Range("E2:E$rows").Formula = $Formula1
This now populates the formula, but it comes in as text rather than as a formula; I still have to solve that.
#open the downloaded worksheet (a single Excel instance)
$excel = New-Object -Com Excel.Application
$Workbook = $excel.Workbooks.Open($of)
$page = 'Project Summary'
$ws = $Workbook.Worksheets | Where-Object {$_.Name -eq $page}
# here-string holding the formula
$Formula1 = @"
=SUM(E2:E5)
"@
# Set variables for the worksheet cells, and for navigation
$cells = $ws.Cells
$row = 1
$col = 5
# clear column E below the header and write the formula into every used row
$range = $ws.UsedRange
$rows = $range.Rows.Count
$ws.Range("E2:E$rows").Clear()
$ws.Range("E2:E$rows").Formula = $Formula1
# Add the header to the worksheet
$headers = "Region"
$headers | foreach {
    $cells.Item($row, $col) = $_
}
$excel.Visible = $true
$excel.DisplayAlerts = $false
$excel.ActiveWorkbook.SaveAs('W:\Test.xlsx')
$excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
Remove-Item $of
This is the outcome. Escaping may or may not have been the original issue, but the help here guided me to a better way of handling it if it was. I never worked out why the loop gave the value error, but this works, so I won't have to tackle that yet; that is the next script, which this post will also help me with.
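A likely explanation for the 0x800A03EC error in the first version of the script: `$rows` was read from `$worksheet.UsedRange`, but the sheet object was stored in `$ws`. In PowerShell 3.0 and later, member access on `$null` returns `$null` and `$null.Count` is 0, so the address became `E2:E0`, which is not a valid range (this would fit the "0" mentioned in the question). The sketch below reproduces just the address-building step without Excel; the `$null` assignment stands in for the never-assigned variable.

```powershell
# $worksheet is never assigned in the posted script (the sheet lives in $ws),
# so it is represented here as $null.
$worksheet = $null

# Member access on $null yields $null, and PowerShell 3+ reports $null.Count as 0
$rows = $worksheet.UsedRange.Rows.Count
$badAddress = "E2:E" + $rows        # "E2:E0": row 0 does not exist in Excel

# Reading the count from the variable that actually holds the sheet avoids this:
#   $rows = $ws.UsedRange.Rows.Count
#   $ws.Range("E2:E$rows").Formula = '=SUM(E2:E5)'
```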
