Checking file names in a directory against entries in an Excel spreadsheet; what am I doing wrong?

I'm attempting to write a PowerShell script (my first ever, so be gentle) to go through all the file names in a directory and check whether they exist in an Excel spreadsheet that I have. If a file name exists in both, I want to move/copy that file to a new directory.
Right now it runs with no errors, but nothing actually happens.
So far I have:
#open excel sheet
$objexcel = New-Object -com excel.application
$workbook = $objexcel.Workbooks.Open("<spreadsheet location>")
#use Sheet2
$worksheet = $workbook.Sheets.Item(2)
#outer loop: loop through each file in directory
foreach ($_file in (Get-ChildItem -Path "<directory to search>"))
{
    $filename = [System.IO.Path]::GetFileNameWithoutExtension($_)
    #inner loop: check with every entry in excel sheet (if is equal)
    $intRowCount = ($worksheet.UsedRange.Rows).Count
    for ($intRow = 2; $intRow -le $intRowCount; $intRow++)
    {
        $excelname = $worksheet.Cells.Item($intRow,1).Value2
        if ($excelname -eq $filename)
        { #move to separate folder
            Copy-Item -Path $_file -Destination "<directory for files to be copied to>"
        }
        #else do nothing
    }
}
#close excel sheet
$workbook.Close()
$objexcel.Quit()

You're trying to define $filename based on the current object ($_), but that variable isn't populated in a foreach loop:
$filename = [system.IO.path]::GetFileNameWithoutExtension($_)
Because of that $filename is always $null and therefore never equal to $excelname.
Replace the foreach loop with a ForEach-Object loop if you want to use $_. I'd also recommend reading the Excel cell values into an array outside that loop. That improves performance and allows you to use it in a -contains filter, which removes the need for the inner loop in the first place.
$intRowCount = ($worksheet.UsedRange.Rows).Count
$excelnames = for ($intRow = 2; $intRow -le $intRowCount; $intRow++) {
    $worksheet.Cells.Item($intRow,1).Value2
}

Get-ChildItem -Path "<directory to search>" |
    Where-Object { $excelnames -contains $_.BaseName } |
    Copy-Item -Destination "<directory for files to be copied to>"
On a more general note: you shouldn't use variable names starting with an underscore. They're too easily confused with properties of the current object variable ($_name vs. $_.name).
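As a self-contained sketch of that -contains filter, using literal string arrays as hypothetical stand-ins for the Excel column values and the directory's file base names (no spreadsheet or filesystem involved):

```powershell
# Hypothetical stand-ins for the Excel column values and file base names
$excelnames    = 'report-jan', 'report-feb', 'report-mar'
$fileBaseNames = 'report-jan', 'notes', 'report-mar', 'scratch'

# -contains tests whether the left-hand array holds the right-hand value
$common = $fileBaseNames | Where-Object { $excelnames -contains $_ }
$common   # report-jan, report-mar
```

The same shape carries over directly once `$fileBaseNames` is replaced by `Get-ChildItem` output and `$excelnames` by the values read from the worksheet.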

Related

Powershell: search & replace in xlsx except first 3 columns

I need to run a PS script on multiple xlsx files where I need to search and replace certain values. The script has to check the entire sheet, but needs to ignore the first 4 columns, i.e. it has to "start" from column number 5. Is there a way to do that with the script below, please? Those first 4 columns need to be present when exporting/saving the final xlsx file. Thank you.
# variables
$name = "1stname"
$surname = "Surname"
# xlsx to work with
$filename = Get-ChildItem -Path .\*.xlsx
$xldata = Import-Excel -Path $filename -WorksheetName "Sheet1"
$columns = $xldata[0].psobject.Properties.Name
#script
foreach ($row in $xldata) {
    foreach ($cell in $columns) {
        $oldvalue = $row."$cell"
        $newvalue = $oldvalue -replace $name, $surname
        $row."$cell" = $newvalue
    }
}
# save xlsx file
$xldata | Export-Excel -Path $filename -WorksheetName "Sheet1" -ClearSheet
You could replace your second foreach loop with a for loop instead, as you'll then be able to skip the first x records as desired.
It would look like this to skip the first 4 columns:
# xlsx to work with
$filename = Get-ChildItem -Path .\*.xlsx
$xldata = Import-Excel -Path $filename -WorksheetName "Sheet1"
$columns = $xldata[0].psobject.Properties.Name
foreach ($row in $xldata) {
    for ($i = 4; $i -lt $columns.Count; $i++)
    {
        $cell = $columns[$i]
        $oldvalue = $row."$cell"
        $newvalue = $oldvalue -replace $name, $surname
        $row."$cell" = $newvalue
    }
}
# save xlsx file
$xldata | Export-Excel -Path $filename -WorksheetName "Sheet1" -ClearSheet
Replace the $i = 4 with another number if you want to start on a different column number instead.
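The effect of starting the counter at 4 can be seen with a plain array (hypothetical header names standing in for the spreadsheet columns):

```powershell
$columns = 'A','B','C','D','E','F','G'   # hypothetical header names

# The counter starts at index 4, so entries 0-3 (the first four columns) are skipped
$kept = for ($i = 4; $i -lt $columns.Count; $i++) { $columns[$i] }
$kept   # E F G
```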

Issues pulling value of cell using excel com objects in powershell

I am writing a script that scans each cell in an excel file for PII. I've got most of it working, but I am experiencing two issues which may be related.
First of all, I am not convinced that the "Do" loop is performing as intended. The goal here is if the text in a cell matches the regex string, create a PSCustomObject with the location information, then use the object to add a line to a csv file.
It appears that the loop is running for every file, regardless of whether or not it actually found a match.
The other issue is that I can't seem to actually pull the cell value for the matched cell. I've tried several different variables and methods, the latest attempt being "$target.text," but the value of the variable is always null.
I've been racking my brain on this for days, but I'm sure it'll be obvious once I see it.
Any help here would be appreciated.
Thanks.
$searchtext = "\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
$xlsFiles = Get-ChildItem $searchpath -recurse -include *.xlsx, *.xls, *.xlsm | Select-object -Expand FullName
$Excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false;
$excel.AskToUpdateLinks = $false;
foreach ($xlsfile in $xlsfiles) {
Write-host (Get-Date -Format yyyyMMdd:HHmm) $xlsfile
try{
$Workbook = $Excel.Workbooks.Open($xlsFile, 0, 0, 5, "password")
}
Catch {
Write-host $xlsfile 'is password protected. Skipping...' -ForegroundColor Yellow
continue
}
ForEach ($Sheet in $($Workbook.Sheets)) {
$i = $sheet.Index
$Range = $Workbook.Sheets.Item($i).UsedRange
$Target = $Sheet.UsedRange.Find($Searchtext)
$First = $Target
Do {
$Target = $Range.Find($Target)
$Violation = [PSCustomObject]@{
Path = $xlsfile
Line = "SSN Found" + $target.text
LineNumber = "Sheet: " + $i
}
$Violation | Select-Object Path, Line, LineNumber | export-csv $outputpath\$PIIFile -append -NoTypeInformation
}
While ($NULL -ne $Target -and $Target.AddressLocal() -ne $First.AddressLocal())
}
$Excel.Quit()
}
Figured it out. Just a simple case of faulty logic in the loops.
Thanks to everyone who looked at this.
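Independent of the COM issues, the SSN regex from the question can be exercised against plain strings, which helps confirm the pattern itself is sound before debugging the Find loop:

```powershell
# Same pattern as in the question (single-quoted so PowerShell leaves it untouched)
$searchtext = '\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b'

'123-45-6789' -match $searchtext   # True:  well-formed SSN
'000-12-3456' -match $searchtext   # False: area number 000 is excluded
'123-45-0000' -match $searchtext   # False: serial 0000 is excluded
```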

Delete extra rows in an excel file with powershell?

I have been tasked with automating part of the logging process on an SPLA server owned by the company. My task is to date, archive, and remove the old files, then move on to generating a report to be emailed to the department. This task is supposed to be run at the end of every week.
I figured PowerShell would be the best option to complete this task. This is my first time working with PowerShell, so I had a bit of learning to do.
My question:
Is it possible to loop through an excel worksheet and delete unused rows using this script?
My condition would be if there are two empty rows -> delete one row and keep going
I am taking info from the log and splitting it into a CSV then converting the CSV to an excel for formatting.
Sample of the Excel spreadsheet, others vary in excess rows between information
Get-ChildItem C:\ScriptsDirectory1\*.log | foreach{
$input = Get-Content $_.FullName #Initialize input
$a = Get-Date #Save the current date (for if/else wrapper)
#=============================================#
# File Name Changer #
#=============================================#
$x = $_.LastWriteTime.ToShortDateString() #Save a temp variable with the LastWriteTime and send it to a string
$new_folder_name = Get-Date $x -Format yyyy.MM.dd #Create a new folder that contains the string information
$des_path = "C:\Archive\ArchivedLogs\$new_folder_name" #Send the new folder to the archive directory
#=============================================#
$data = $input[1..($input.Length - 1)] #Initialize Array and set it to the length of the input file.
$maxLength = 0
$objects = ForEach($record in $data) { #Loop through each object within the array
$split = $record -split ": " #Split objects within array at the ": " string
If($split.Length -gt $maxLength){
$maxLength = $split.Length
}
$properties = @{}
For($i=0; $i -lt $split.Length; $i++) { #Adds the split information to the strings array
$properties.Add([String]($i+1),$split[$i])
}
New-Object -TypeName PSObject -Property $properties
}
$objects | format-table
$headers = [String[]](1..$maxLength)
$objects |
Select-Object $headers |
Export-Csv -NoTypeInformation -Path "C:\Archive\CSVReports\$new_folder_name.csv"#Export CSV path using the new folder name to prevent overwrite
if (test-path $des_path){ #Test if the path exists, and fill the directory with the file to be archived
move-item $_.fullname $des_path
} else {
new-item -ItemType directory -Path $des_path
move-item $_.fullname $des_path
}
} #End of Parser
#===============================================================================#
#======================================#========================================#
#===============================================================================#
# File Archiver and Zipper (After Parse/CSV) #
#===============================================================================#
#======================================#========================================#
#===============================================================================#
$files = Get-ChildItem C:\Archive\ArchivedLogs #Fill the $files variable with the new files in the Archive directory
#********************************#
#Loop Through and Compress/Delete#
#********************************#
foreach ($file in $files) {
Write-Zip $file "C:\Archive\ArchivedLogs\$file.zip" -Level 9 #Write compressed file
} #End of Archiver
Remove-Item C:\Archive\ArchivedLogs\* -exclude *.zip -recurse #Remove the un-needed files within the archive folder
#Run the Formatting and Conversion script for the CSV-to-XLSX
#C:\ScriptsDirectory1\Script\TestRunner1.ps1 #<---Can be Ran using a Invoke call
#===============================================================================#
#======================================#========================================#
#===============================================================================#
# CSV to XLSX Format/Conversion #
#===============================================================================#
#======================================#========================================#
#===============================================================================#
Get-ChildItem C:\Archive\CSVReports | foreach{
$excel_file_path = $_.FullName #Create the file path variable to initialize for formating
$Excel = New-Object -ComObject Excel.Application #Start a new excel application
$Excel.Visible = $True
$Excel.DisplayAlerts=$False
$Excel_Workbook = $Excel.Workbooks.Open($excel_file_path) #Create workbook variable and open a workbook in the path
$FileName = $_.BaseName #Save the base file name of the current value
$Excel.ActiveSheet.ListObjects.add(1,$Excel_Workbook.ActiveSheet.UsedRange,0,1)
$Excel_Workbook.ActiveSheet.UsedRange.EntireColumn.AutoFit()
$SPLA1wksht = $Excel_Workbook.Worksheets.Item(1) #Create the new Sheet (SPLA1wksht)
#*******************************************************#
# Formating for Title Cell #
#*******************************************************#
$SPLA1wksht.Name = 'SPLA Info Report' #Change worksheet name
$SPLA1wksht.Cells.Item(1,1) = $FileName #Title (Date of log) in cell A1
$SPLA1wksht.Cells.Item(1,2) = 'SPLA Weekly Report' #Title for all Excel reports
$SPLA1wksht.Cells.Item(1,2).Font.Size = 18
$SPLA1wksht.Cells.Item(1,2).Font.Bold=$True
$SPLA1wksht.Cells.Item(1,2).Font.Name="Cambria"
$SPLA1wksht.Cells.Item(1,2).Font.ThemeFont = 1
$SPLA1wksht.Cells.Item(1,2).Font.ThemeColor = 5
$SPLA1wksht.Cells.Item(1,2).Font.Color = 8210719
#*******************************************************#
#************************************#
# Adjust and Merge Cell B1 #
#************************************#
$range = $SPLA1wksht.Range("b1","h2")
$range.Style = 'Title'
$range = $SPLA1wksht.Range("b1","g2")
$range.VerticalAlignment = -4108 #Center align vertically (Value -4108 is center)
#************************************#
#***********************************************************************#
# Horizontal Centering for all cells #
#***********************************************************************#
$ColumnRange = $SPLA1wksht.Range("a1","a500").horizontalAlignment =-4108 #Center all cells in this range as -4108
$ColumnRange = $SPLA1wksht.Range("b1","b500").horizontalAlignment =-4108
#**********************************************#
# Delete Blank Rows - Ineffective: logs that have different
# data end up with a different number of rows, which offsets this
# deletion. This method deletes the first blank row, then moves on
# to the next-in-line blank rows, deleting one line at a time until
# the blank spots are in perfect format.
#**********************************************#
#$SPLA1wksht.Cells.Item(2,1).EntireRow.Delete()
#
#$SPLA1wksht.Cells.Item(4,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(4,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(4,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(4,1).EntireRow.Delete()
#
#$SPLA1wksht.Cells.Item(19,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(19,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(19,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(19,1).EntireRow.Delete()
#
#$SPLA1wksht.Cells.Item(25,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(25,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(25,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(25,1).EntireRow.Delete()
#
#$SPLA1wksht.Cells.Item(33,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(33,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(33,1).EntireRow.Delete()
#$SPLA1wksht.Cells.Item(33,1).EntireRow.Delete()
#**********************************************#
#*****************************************************************#
# Final Export as a CSV-to-XLSX file #
#*****************************************************************#
$Excel_Workbook.SaveAs("C:\Archive\ExcelReports\$FileName.xlsx",51) #Save the file in the proper location
$Excel_Workbook.Saved = $True
$Excel.Quit()
# Find a way to optimize this process
#Potential optimization places:
# 1.) Don't open and close excel file and instead just write changes and save
# 2.) Change way empty rows are formatted instead of seperate calls each time
} #End of Format/Converter
#******End******#
#---------------#--#--------------#
#---------------------------------#
# What to Add to the Script #
#---------------------------------#
#---------------#--#--------------#
# -[/] <-Complete -[] <- Incomplete
# -[] Archive or delete CSV Files
# -[] Add a If/Else statement that checks if files are >7 days old
# -[] Compile a weekender report that indicates any SPLA programs changed to keep compliance
# -[] Filter for only SPLA files (Need a list)
# -[] Loop through CSV/Excel file and delete empty rows
The following code worked to run through the program:
for ($i = 350; $i -ge 0; $i--) {
    If ($SPLA1wksht.Cells.Item($i, 1).Text -eq "") {
        $Range = $SPLA1wksht.Cells.Item($i, 1).EntireRow
        [void]$Range.Delete()
        echo $i
    }
    If ($SPLA1wksht.Cells.Item($i, 2).Text -eq "") {
        $Range = $SPLA1wksht.Cells.Item($i, 2).EntireRow
        [void]$Range.Delete()
        echo $i
    }
    If ($i -eq 2) { break }
}
This should be relatively straightforward:
$file = "C:\path\to\file.csv"
$csv = Import-Csv $file
foreach ($row in $csv) {
# logic to delete row
# $csv is an array, so you could set a row to $null to delete it
}
# spit out updated sheet
$csv | Export-Csv $file -NoTypeInformation
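As a minimal, self-contained sketch of that round trip (using a hypothetical file in the temp directory, and a Where-Object filter rather than nulling rows in place, which tends to be cleaner):

```powershell
# Hypothetical CSV written to the temp directory
$file = Join-Path ([IO.Path]::GetTempPath()) 'demo.csv'
@'
Name,Value
keep,1
drop,2
keep,3
'@ | Set-Content $file

# Import, filter out unwanted rows, and write the result back
$csv = Import-Csv $file
$csv | Where-Object Name -eq 'keep' | Export-Csv $file -NoTypeInformation

(Import-Csv $file).Count   # 2
```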

Is there a faster way to parse an excel document with Powershell?

I'm interfacing with an MS Excel document via Powershell. There is a possibility of each excel document of having around 1000 rows of data.
Currently this script seems to read the Excel file and write a value to screen at a rate of 1 record every .6 seconds. At first glance that seems extremely slow.
This is my first time reading an Excel file with Powershell, is this the norm? Is there a faster way for me to read and parse the Excel data?
Here is the script output (trimmed for readability)
PS P:\Powershell\ExcelInterfaceTest> .\WRIRMPTruckInterface.ps1 test.xlsx
3/20/2013 4:46:01 PM
---------------------------
2 078110
3 078108
4 078107
5 078109
<SNIP>
242 078338
243 078344
244 078347
245 078350
3/20/2013 4:48:33 PM
---------------------------
PS P:\Powershell\ExcelInterfaceTest>
Here is the Powershell script:
########################################################################################################
# This is a common function I am using which will release excel objects
########################################################################################################
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
########################################################################################################
# Variables
########################################################################################################
########################################################################################################
# Creating excel object
########################################################################################################
$objExcel = new-object -comobject excel.application
# Set to false to not open the app on screen.
$objExcel.Visible = $False
########################################################################################################
# Directory location where we have our excel files
########################################################################################################
$ExcelFilesLocation = "C:/ShippingInterface/" + $args[0]
########################################################################################################
# Open our excel file
########################################################################################################
$UserWorkBook = $objExcel.Workbooks.Open($ExcelFilesLocation)
########################################################################################################
# Here Item(1) refers to sheet 1 of the workbook. If we want to access sheet 10, we have to modify the code to Item(10)
########################################################################################################
$UserWorksheet = $UserWorkBook.Worksheets.Item(2)
########################################################################################################
# This is a counter which will help us iterate through the loop. It is simply a row counter.
# I am starting the row count at 2 because the first row in my case is a header, so we don't need to read the header data
########################################################################################################
$intRow = 2
$a = Get-Date
write-host $a
write-host "---------------------------"
Do {
# Reading the first column of the current row
$TicketNumber = $UserWorksheet.Cells.Item($intRow, 1).Value()
write-host $intRow " " $TicketNumber
$intRow++
} While ($UserWorksheet.Cells.Item($intRow,1).Value() -ne $null)
$a = Get-Date
write-host $a
write-host "---------------------------"
########################################################################################################
# Exiting the excel object
########################################################################################################
$objExcel.Quit()
########################################################################################################
#Release all the objects used above
########################################################################################################
$a = Release-Ref($UserWorksheet)
$a = Release-Ref($UserWorkBook)
$a = Release-Ref($objExcel)
In his blog entry Speed Up Reading Excel Files in PowerShell, Robert M. Toups, Jr. explains that while loading to PowerShell is fast, actually reading the Excel cells is very slow. On the other hand, PowerShell can read a text file very quickly, so his solution is to load the spreadsheet in PowerShell, use Excel’s native CSV export process to save it as a CSV file, then use PowerShell’s standard Import-Csv cmdlet to process the data blazingly fast. He reports that this has given him up to a 20 times faster import process!
Leveraging Toups’ code, I created an Import-Excel function that lets you import spreadsheet data very easily.
My code adds the capability to select a specific worksheet within an Excel workbook, rather than just using the default worksheet (i.e. the active sheet at the time you saved the file). If you omit the –SheetName parameter, it uses the default worksheet.
function Import-Excel([string]$FilePath, [string]$SheetName = "")
{
$csvFile = Join-Path $env:temp ("{0}.csv" -f (Get-Item -path $FilePath).BaseName)
if (Test-Path -path $csvFile) { Remove-Item -path $csvFile }
# convert Excel file to CSV file
$xlCSVType = 6 # SEE: http://msdn.microsoft.com/en-us/library/bb241279.aspx
$excelObject = New-Object -ComObject Excel.Application
$excelObject.Visible = $false
$workbookObject = $excelObject.Workbooks.Open($FilePath)
SetActiveSheet $workbookObject $SheetName | Out-Null
$workbookObject.SaveAs($csvFile,$xlCSVType)
$workbookObject.Saved = $true
$workbookObject.Close()
# cleanup
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbookObject) |
Out-Null
$excelObject.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) |
Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
# now import and return the data
Import-Csv -path $csvFile
}
These supplemental functions are used by Import-Excel:
function FindSheet([Object]$workbook, [string]$name)
{
$sheetNumber = 0
for ($i=1; $i -le $workbook.Sheets.Count; $i++) {
if ($name -eq $workbook.Sheets.Item($i).Name) { $sheetNumber = $i; break }
}
return $sheetNumber
}
function SetActiveSheet([Object]$workbook, [string]$name)
{
if (!$name) { return }
$sheetNumber = FindSheet $workbook $name
if ($sheetNumber -gt 0) { $workbook.Worksheets.Item($sheetNumber).Activate() }
return ($sheetNumber -gt 0)
}
If the data is static (no formulas involved, just data in cells), you can access the spreadsheet as an ODBC data source and execute SQL (or at least SQL-like) queries against it. Have a look at this reference for setting up your connection string (each worksheet in a workbook will be a "table" for this exercise), and use System.Data to query it the same as you would a regular database (Don Jones wrote a wrapper function for this which may help).
This should be faster than launching Excel & picking through cell by cell.

Using Parameters in Powershell to assign folder locations

I'm new to PowerShell and I need some help here. Below is a script I wrote to locate an Excel file in a folder ("C:\Eric_Source\Test.xls"). The file names in the Excel sheet are compared to the contents of another folder on the same machine ("C:\MKS_DEV\"). The resultant matched files are zipped and put in another location, as shown in the script. These scripts will be used by other people on different machines, so the locations of both folders could differ from machine to machine.
I want to use arguments or parameters for the locations of both folders so that I don't have to specify the locations every time I run the script, and I can't figure out how to implement this.
The script works perfectly; I just need to incorporate arguments/parameters into it. Any help would be very much appreciated.
Thanks.
Here is the code:
# Creating an object for the Excel COM addin
$ExcelObject = New-Object -ComObject Excel.Application
# Opening the Workbook
$ExcelWorkbook = $ExcelObject.Workbooks.Open("C:\Eric_Source\Test.xls")
# Opening the Worksheet by using the index (1 for the first worksheet)
$ExcelWorksheet = $ExcelWorkbook.Worksheets.Item(1)
# The folder where the files will be copied/The folder which will be zipped
# later
$a = Get-Date
$targetfolder = "C:\"+$a.Day+$a.Month+$a.Year+$a.Hour+$a.Minute+$a.Second
# Check if the folder already exists. Command Test-Path $targetfolder returns
# true or false.
if(Test-Path $targetfolder)
{
# delete the folder if it already exists. The following command deletes a
# particular directory
Remove-Item $targetfolder -Force -Recurse -ErrorAction SilentlyContinue
}
# The following command is used to create a particular directory
New-Item -ItemType directory -Path $targetfolder
# Declaration of variables, Column value = 6 for Column F
$row = 1
$col = 6
# Read a value from the worksheet with the following command
$filename = $ExcelWorksheet.Cells.Item($row,$col).Value2
$filename
# change the folder value below to specify the folder where the powershell
# needs to search for the filename that it reads from excel file.
$folder = "C:\MKS_DEV\"
$null = ""
You have multiple ways to pass parameters to your script.
The first one is to use the $args automatic variable.
If your script is called MyScript.PS1 you can call it with:
MyScript.PS1 "C:\Eric_Source\Test.xls"
Then inside your script use $args[0] for the first argument.
Another way is to use the reserved word Param at the beginning of your script:
Param ($MyParam1, $MyParam2)
When you call your script $MyParam1 will contain the first param and so on.
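A quick way to see Param binding in action without creating a separate script file is a script block, which binds parameters the same way a .ps1 file does (the parameter names here are made up for illustration):

```powershell
# A script block binds parameters exactly like a .ps1 file would;
# the parameter names are hypothetical
$script = {
    param($SourceXls, $TargetFolder)
    "Reading $SourceXls, writing to $TargetFolder"
}

& $script 'C:\Eric_Source\Test.xls' 'C:\MKS_DEV'
# Reading C:\Eric_Source\Test.xls, writing to C:\MKS_DEV
```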
You could create it as a function and load it.
Function Folder-Deletion ($ExcelWorkbook,$targetfolder) {
$ExcelObject = New-Object -ComObject Excel.Application
$ExcelOpen = $ExcelObject.Workbooks.Open($ExcelWorkbook)
$ExcelWorksheet = $ExcelOpen.Worksheets.Item(1)
$a = Get-Date
if(Test-Path $targetfolder)
{
Remove-Item $targetfolder -Force -Recurse -ErrorAction SilentlyContinue
}
New-Item -ItemType directory -Path $targetfolder
$row = 1
$col = 6
$filename = $ExcelWorksheet.Cells.Item($row,$col).Value2
$filename
}
Then run the function against the spreadsheet and folder like so:
Folder-Deletion C:\Eric_Source\Test.xls C:\MKS_DEV
Or you could create a PowerShell script file (e.g. FolderDeletion.ps1) with the following contents:
param($ExcelWorkbook,$targetfolder)
$ExcelObject = New-Object -ComObject Excel.Application
$ExcelOpen = $ExcelObject.Workbooks.Open($ExcelWorkbook)
$ExcelWorksheet = $ExcelOpen.Worksheets.Item(1)
$a = Get-Date
if(Test-Path $targetfolder)
{
Remove-Item $targetfolder -Force -Recurse -ErrorAction SilentlyContinue
}
New-Item -ItemType directory -Path $targetfolder
$row = 1
$col = 6
$filename = $ExcelWorksheet.Cells.Item($row,$col).Value2
$filename
Then run the script against the spreadsheet and folder like so:
FolderDeletion.ps1 C:\Eric_Source\Test.xls C:\MKS_DEV
