How to search an entire Excel workbook for a particular string using PowerShell

I need to search for a particular string in an Excel spreadsheet that has multiple sheets in it. I am looking for a way to search the entire contents of the Excel file, similar to the Find All option in Excel with the scope set to the workbook and not just the worksheet.
It would be really nice if there were something similar to searching a regular file for a particular string, i.e.:
gci xcelfile.xls | select-string -pattern $mySearchString
I have searched the internet and I don't see much existing information for searching the contents of an existing excel file using powershell. I am hoping I can get some pointers here to get me to my goal.
Any assistance is much appreciated.
Thanks
Don

The script below:
Opens Excel
Loads the file
Loops through each worksheet
Searches a range (columns A:Z)
Loops through FindNext and outputs the sheet index and cell address ($Column$Row)
Exits Excel
$File = "C:\TEST.xlsx"
$SearchString = "TEST"
$Excel = New-Object -ComObject Excel.Application
$Workbook = $Excel.Workbooks.Open($File)
for ($i = 1; $i -le $Workbook.Sheets.Count; $i++) {
    $Range = $Workbook.Sheets.Item($i).Range("A:Z")
    $Target = $Range.Find($SearchString)
    # Find returns $null when the sheet has no match, so skip the loop in that case
    if ($null -ne $Target) {
        $First = $Target.AddressLocal()
        do {
            Write-Host "$i $($Target.AddressLocal())"
            $Target = $Range.FindNext($Target)
        } while ($null -ne $Target -and $Target.AddressLocal() -ne $First)
    }
}
# Close without saving, then exit Excel
$Workbook.Close($false)
$Excel.Quit()

Related

Issues pulling value of cell using excel com objects in powershell

I am writing a script that scans each cell in an excel file for PII. I've got most of it working, but I am experiencing two issues which may be related.
First of all, I am not convinced that the "Do" loop is performing as intended. The goal here is if the text in a cell matches the regex string, create a PSCustomObject with the location information, then use the object to add a line to a csv file.
It appears that the loop is running for every file, regardless of whether or not it actually found a match.
The other issue is that I can't seem to actually pull the cell value for the matched cell. I've tried several different variables and methods, the latest attempt being "$target.text," but the value of the variable is always null.
I've been racking my brain on this for days, but I'm sure it'll be obvious once I see it.
Any help here would be appreciated.
Thanks.
$searchtext = "\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
$xlsFiles = Get-ChildItem $searchpath -recurse -include *.xlsx, *.xls, *.xlsm | Select-object -Expand FullName
$Excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false;
$excel.AskToUpdateLinks = $false;
foreach ($xlsfile in $xlsfiles) {
Write-host (Get-Date -f yyyyMMdd:HHmm) $xlsfile
try{
$Workbook = $Excel.Workbooks.Open($xlsFile, 0, 0, 5, "password")
}
Catch {
Write-host $xlsfile 'is password protected. Skipping...' -ForegroundColor Yellow
continue
}
ForEach ($Sheet in $($Workbook.Sheets)) {
$i = $sheet.Index
$Range = $Workbook.Sheets.Item($i).UsedRange
$Target = $Sheet.UsedRange.Find($Searchtext)
$First = $Target
Do {
$Target = $Range.Find($Target)
$Violation = [PSCustomObject]@{
Path = $xlsfile
Line = "SSN Found" + $target.text
LineNumber = "Sheet: " + $i
}
$Violation | Select-Object Path, Line, LineNumber | export-csv $outputpath\$PIIFile -append -NoTypeInformation
}
While ($NULL -ne $Target -and $Target.AddressLocal() -ne $First.AddressLocal())
}
$Excel.Quit()
}
Figured it out. Just a simple case of faulty logic in the loops.
Thanks to everyone who looked at this.
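The corrected loop was not posted. One point worth keeping in mind is that Excel's Range.Find matches text (with wildcards), not regular expressions, so a pattern like $searchtext is unlikely to work through Find. The following is only a sketch of one alternative, under the assumption that the cell values are first read into a two-dimensional array (for a multi-cell range, $Sheet.UsedRange.Value2 returns one) and then tested in PowerShell. Find-PatternInGrid is a hypothetical helper name, and the sample grid and simplified pattern are stand-ins for real worksheet data.

```powershell
# Hypothetical helper: scan a 2-D array of cell values with a .NET regex
# and emit one object per matching cell.
function Find-PatternInGrid {
    param(
        [Parameter(Mandatory)] [Array]  $Grid,
        [Parameter(Mandatory)] [string] $Pattern
    )
    # Use the array's own bounds: arrays returned by Excel are 1-based,
    # arrays created in PowerShell are 0-based.
    $rowLo = $Grid.GetLowerBound(0); $rowHi = $Grid.GetUpperBound(0)
    $colLo = $Grid.GetLowerBound(1); $colHi = $Grid.GetUpperBound(1)
    for ($r = $rowLo; $r -le $rowHi; $r++) {
        for ($c = $colLo; $c -le $colHi; $c++) {
            $value = $Grid[$r, $c]
            if ($null -ne $value -and "$value" -match $Pattern) {
                [PSCustomObject]@{ Row = $r; Column = $c; Text = "$value" }
            }
        }
    }
}

# Stand-in data; with Excel this would be $Sheet.UsedRange.Value2
$grid = New-Object 'object[,]' 2, 2
$grid[0, 0] = 'name'; $grid[0, 1] = 'id'
$grid[1, 0] = 'Ann';  $grid[1, 1] = '123-45-6789'
$hits = @(Find-PatternInGrid -Grid $grid -Pattern '\b\d{3}-\d{2}-\d{4}\b')
$hits | ForEach-Object { "Row $($_.Row), column $($_.Column): $($_.Text)" }
```

Because the helper only emits an object when a cell actually matches, a file with no matches produces no CSV rows, which addresses the first symptom described in the question.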

Excel add Row Grouping using powershell

I have the CSV file below. I want to import it into Excel and add row grouping for the child items using PowerShell. I was able to open the file and format a cell, but I am not sure how to add row grouping.
Data
name,,
one,,
,value1,value2
,value3 ,value4
two,,
,value4,sevalue4
,value5,sevalue5
,value6,sevalue6
,value7,sevalue7
three,,
,value8,sevalue8
,value9,sevalue9
,value10,sevalue10
,value11,sevalue11
I want Excel to show the child rows (those with an empty first column) grouped under their parent row.
Here is the code I have so far to open the file in Excel:
$a = New-Object -comobject Excel.Application
$a.visible = $True
$b = $a.Workbooks.Open("C:\shared\c1.csv")
$c = $b.Worksheets.Item(1)
$d = $c.Cells(1,1)
$d.Interior.ColorIndex = 19
$d.Font.ColorIndex = 11
$d.Font.Bold = $True
$b.SaveAs("C:\shared\c1.xlsx", 51)
How do I add row grouping for this data?
Thanks
SR
Logic Applied:
Group all the consecutive rows for which the value in column A is blank
In the following code, I have opened a CSV file, made the required grouping as per the data shared by you and saved it. While saving it, because of the row grouping, I was not able to save it in csv format. So, I had to change the format to a normal workbook. But, it works.
Code
$objExl = New-Object -ComObject Excel.Application
$objExl.Visible = $true
$objExl.DisplayAlerts = $false
$strPath = "C:\Users\gurmansingh\Documents\a.csv" # Enter the path of the CSV
$objBook = $objExl.Workbooks.Open($strPath)
$objSheet = $objBook.Worksheets.Item(1)
$intRowCount = $objSheet.UsedRange.Rows.Count
for ($i = 1; $i -le $intRowCount; $i++)
{
    # A blank cell in column A starts a run of child rows
    if ($objSheet.Cells.Item($i, 1).Text -eq "")
    {
        $startRow = $i
        # Extend the run while the next row's column A is also blank
        while ($i -lt $intRowCount -and $objSheet.Cells.Item($i + 1, 1).Text -eq "")
        {
            $i++
        }
        # ${...} keeps PowerShell from reading "$startRow:A" as a scoped variable
        $objSheet.Range("A${startRow}:A${i}").Rows.Group()
    }
}
# 51 = xlOpenXMLWorkbook (.xlsx); the CSV format cannot store row grouping
$objBook.SaveAs("C:\Users\gurmansingh\Documents\b", 51)
$objBook.Close()
$objExl.Quit()
Before: a.csv
Output after running the code: b.xlsx
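The grouping rule ("group all the consecutive rows for which the value in column A is blank") can be checked without starting Excel. The sketch below assumes the column A values have already been read into a plain string array. Get-BlankRunRange is a hypothetical name, and the sample array mirrors column A of the CSV in the question.

```powershell
# Hypothetical helper: return the 1-based start and end row of every run of
# consecutive blank entries, which are the ranges to pass to Rows.Group().
function Get-BlankRunRange {
    param([string[]] $ColumnA)
    $start = $null
    for ($row = 1; $row -le $ColumnA.Count; $row++) {
        $isBlank = [string]::IsNullOrEmpty($ColumnA[$row - 1])
        if ($isBlank -and $null -eq $start) { $start = $row }
        if (-not $isBlank -and $null -ne $start) {
            [PSCustomObject]@{ StartRow = $start; EndRow = $row - 1 }
            $start = $null
        }
    }
    # A run that reaches the last row is closed here
    if ($null -ne $start) {
        [PSCustomObject]@{ StartRow = $start; EndRow = $ColumnA.Count }
    }
}

# Column A of the sample CSV: parent names, with blanks for the child rows
$columnA = 'name', 'one', '', '', 'two', '', '', '', '', 'three', '', '', '', ''
$ranges = @(Get-BlankRunRange -ColumnA $columnA)
$ranges | ForEach-Object { "A$($_.StartRow):A$($_.EndRow)" }
```

For the sample data this yields the ranges A3:A4, A6:A9 and A11:A14, one per parent row.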
Also, check out how easy it is to do using my Excel PowerShell module.
Install-Module ImportExcel
https://github.com/dfinke/ImportExcel/issues/556#issuecomment-469897886

Checking file names in a directory with entries in an excel spreadsheet; What am I doing wrong?

I'm attempting to write a PowerShell script (my first ever, so be gentle) to go through all the file names in a directory and check if they exist in an excel spreadsheet that I have. If a file name does exist in both, I want to move/copy that file to a new directory.
Right now it runs with no errors, but nothing actually happens.
So far I have:
#open excel sheet
$objexcel=new-object -com excel.application
$workbook=$objexcel.workbooks.open("<spreadsheet location>")
#use Sheet2
$worksheet = $workbook.sheets.Item(2)
#outer loop: loop through each file in directory
foreach ($_file in (get-childitem -path "<directory to search>"))
{
$filename = [system.IO.path]::GetFileNameWithoutExtension($_)
#inner loop: check with every entry in excel sheet (if is equal)
$intRowCount = ($worksheet.UsedRange.Rows).count
for ($intRow = 2 ; $intRow -le $intRowCount ; $intRow++)
{
$excelname = $worksheet.cells.item($intRow,1).value2
if ($excelname -eq $filename)
{ #move to separate folder
Copy-Item -path $_file -Destination "<directory for files to be copied to>"
}
#else do nothing
}
}
#close excel sheet
$workbook.close()
$objexcel.quit()
You're trying to define $filename based on the current object ($_), but that variable isn't populated in a foreach loop:
$filename = [system.IO.path]::GetFileNameWithoutExtension($_)
Because of that $filename is always $null and therefore never equal to $excelname.
Replace the foreach loop with a ForEach-Object loop if you want to use $_. I'd also recommend reading the Excel cell values into an array outside that loop. That improves performance and allows you to use the array in a -contains filter, which removes the need for an inner loop in the first place.
$intRowCount = ($worksheet.UsedRange.Rows).count
$excelnames = for ($intRow = 2; $intRow -le $intRowCount; $intRow++) {
$worksheet.cells.item($intRow,1).value2
}
Get-ChildItem -Path "<directory to search>" |
Where-Object { $excelnames -contains $_.BaseName } |
Copy-Item -Destination "<directory for files to be copied to>"
On a more general note: you shouldn't use variable names starting with an underscore. They're too easily confused with properties of the current object variable ($_name vs. $_.name).
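The difference between the two loop forms can be tried without Excel or a real directory. In this sketch the file names and the $excelNames array are made-up stand-ins for the directory listing and the values read from column A.

```powershell
# Stand-in data (assumptions, not taken from the question)
$files      = 'alpha.txt', 'beta.csv', 'gamma.log'
$excelNames = 'alpha', 'gamma'

# foreach statement: the current item is in the variable you name ($file), not in $_
$viaForeach = foreach ($file in $files) {
    $base = [System.IO.Path]::GetFileNameWithoutExtension($file)
    if ($excelNames -contains $base) { $file }
}

# Pipeline: Where-Object and ForEach-Object populate $_ for each item
$viaPipeline = $files | Where-Object {
    $excelNames -contains [System.IO.Path]::GetFileNameWithoutExtension($_)
}

"foreach : $($viaForeach -join ', ')"
"pipeline: $($viaPipeline -join ', ')"
```

Both forms select the same two files; the pipeline version is the one that maps directly onto the Get-ChildItem | Where-Object | Copy-Item chain above.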

Finding content of Excel file in Powershell

I am currently working on a fairly large powershell script. However, I got stuck at one part. The issue is the following.
I have various reports with the same file name; they differ only in the time stamp at the end. Within each report, there is a field showing the period the report covers.
---> 2/1/2015 5:00:00AM to 3/1/2015 5:00:00AM <--- This is what it looks like.
This field is randomly placed on the Excel sheet, somewhere in the range A5 to Z16. What I would like the script to do is:
Read the file and check the range of cells for the dates.
If the date is found and matches my search criteria, close the sheet and move the file to a different folder.
If the date does not match, close the file and check the next XLS file.
This is what I got so far:
$File = "C:\test.XLS"
$SheetName = "Sheet1"
# Set up Excel, open $File and select the worksheet named $SheetName
$Excel = New-Object -ComObject Excel.Application
$Excel.visible = $true
$Workbook = $Excel.workbooks.open($file)
$Worksheets = $Workbook.worksheets
$WorkSheet = $WorkBook.sheets.item($SheetName)
$SearchString = "AM" #just for test purposes since it is in every report
$Range = $Worksheet.Range("A1:Z1").EntireColumn
$Search = $Range.find($SearchString)
If you want it to search the entire column for A to Z you would specify the range:
$Range = $Worksheet.Range("A:Z")
Then you should be able to execute a $Range.Find($SearchText) and if the text is found it will spit back the first cell it finds it in, otherwise it returns nothing. So start Excel like you did, then do a ForEach loop, and inside that open a workbook, search for your text, if it is found close it, move it, stop the loop. If it is not found close the workbook, and move to the next file. The following worked just fine for me:
$Destination = 'C:\Temp\Backup'
$SearchText = '3/23/2015 10:12:19 AM'
$Excel = New-Object -ComObject Excel.Application
$Files = Get-ChildItem "$env:USERPROFILE\Documents\*.xlsx" | Select -Expand FullName
$counter = 1
ForEach($File in $Files){
Write-Progress -Activity "Checking: $file" -Status "File $counter of $($files.count)" -PercentComplete ($counter*100/$files.count)
$Workbook = $Excel.Workbooks.Open($File)
If($Workbook.Sheets.Item(1).Range("A:Z").Find($SearchText)){
$Workbook.Close($false)
Move-Item -Path $File -Destination $Destination
"Moved $file to $destination"
break
}
$workbook.close($false)
$counter++
}
$Excel.Quit()
I even got ambitious enough to add a progress bar in there so you can see how many files it has to potentially look at, how many it's done, and what file it's looking at right then.
Now this does all assume that you know exactly what the string is going to be (at least a partial) in that cell. If you're wrong, then it doesn't work. Checking for ambiguous things takes much longer, since you can't use Excel's matching function and have to have PowerShell check each cell in the range one at a time.
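When the exact string is not known in advance, the per-cell check has to happen in PowerShell. As a rough sketch of that check, assuming the cell text follows the "start to end" layout shown in the question, a regular expression can pull out both time stamps. Get-ReportPeriod is a hypothetical helper name, and the pattern is an assumption based on that single sample.

```powershell
# Hypothetical helper: extract the two time stamps from a cell such as
# '2/1/2015 5:00:00AM to 3/1/2015 5:00:00AM'. Emits nothing if the text does not fit.
function Get-ReportPeriod {
    param([string] $CellText)
    # One time stamp: M/d/yyyy h:mm:ss followed by AM or PM, with or without a space
    $stamp = '\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s*[AP]M'
    if ($CellText -match "(?<from>$stamp)\s+to\s+(?<to>$stamp)") {
        [PSCustomObject]@{ From = $Matches['from']; To = $Matches['to'] }
    }
}

$period  = Get-ReportPeriod '2/1/2015 5:00:00AM to 3/1/2015 5:00:00AM'
$noMatch = Get-ReportPeriod 'Quarterly totals'
"From: $($period.From)  To: $($period.To)"
```

Each cell's Text in the A5:Z16 range could be passed through such a helper, and the file moved when From or To equals the period being searched for.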

Is there a faster way to parse an excel document with Powershell?

I'm interfacing with an MS Excel document via PowerShell. Each Excel document may have around 1000 rows of data.
Currently this script seems to read the Excel file and write a value to screen at a rate of 1 record every .6 seconds. At first glance that seems extremely slow.
This is my first time reading an Excel file with Powershell, is this the norm? Is there a faster way for me to read and parse the Excel data?
Here is the script output (trimmed for readability)
PS P:\Powershell\ExcelInterfaceTest> .\WRIRMPTruckInterface.ps1 test.xlsx
3/20/2013 4:46:01 PM
---------------------------
2 078110
3 078108
4 078107
5 078109
<SNIP>
242 078338
243 078344
244 078347
245 078350
3/20/2013 4:48:33 PM
---------------------------
PS P:\Powershell\ExcelInterfaceTest>
Here is the Powershell script:
########################################################################################################
# This is a common function I am using which will release excel objects
########################################################################################################
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
########################################################################################################
# Variables
########################################################################################################
########################################################################################################
# Creating excel object
########################################################################################################
$objExcel = new-object -comobject excel.application
# Set to false to not open the app on screen.
$objExcel.Visible = $False
########################################################################################################
# Directory location where we have our excel files
########################################################################################################
$ExcelFilesLocation = "C:/ShippingInterface/" + $args[0]
########################################################################################################
# Open our excel file
########################################################################################################
$UserWorkBook = $objExcel.Workbooks.Open($ExcelFilesLocation)
########################################################################################################
# Here Item(2) refers to sheet 2 of the workbook. To access sheet 10, change the code to Item(10)
########################################################################################################
$UserWorksheet = $UserWorkBook.Worksheets.Item(2)
########################################################################################################
# This counter is used to iterate through the loop; it is simply a row counter
# I start the row count at 2 because the first row in my case is a header, so we don't need to read it
########################################################################################################
$intRow = 2
$a = Get-Date
write-host $a
write-host "---------------------------"
Do {
# Reading the first column of the current row
$TicketNumber = $UserWorksheet.Cells.Item($intRow, 1).Value()
write-host $intRow " " $TicketNumber
$intRow++
} While ($UserWorksheet.Cells.Item($intRow,1).Value() -ne $null)
$a = Get-Date
write-host $a
write-host "---------------------------"
########################################################################################################
# Exiting the excel object
########################################################################################################
$objExcel.Quit()
########################################################################################################
#Release all the objects used above
########################################################################################################
$a = Release-Ref($UserWorksheet)
$a = Release-Ref($UserWorkBook)
$a = Release-Ref($objExcel)
In his blog entry Speed Up Reading Excel Files in PowerShell, Robert M. Toups, Jr. explains that while loading the spreadsheet in PowerShell is fast, actually reading the Excel cells is very slow. PowerShell can, on the other hand, read a text file very quickly, so his solution is to load the spreadsheet in PowerShell, use Excel's native CSV export to save it as a CSV file, then process the data with PowerShell's standard Import-Csv cmdlet. He reports that this made his import process up to 20 times faster.
Leveraging Toups’ code, I created an Import-Excel function that lets you import spreadsheet data very easily.
My code adds the capability to select a specific worksheet within an Excel workbook, rather than just using the default worksheet (i.e. the active sheet at the time you saved the file). If you omit the -SheetName parameter, it uses the default worksheet.
function Import-Excel([string]$FilePath, [string]$SheetName = "")
{
$csvFile = Join-Path $env:temp ("{0}.csv" -f (Get-Item -path $FilePath).BaseName)
if (Test-Path -path $csvFile) { Remove-Item -path $csvFile }
# convert Excel file to CSV file
$xlCSVType = 6 # SEE: http://msdn.microsoft.com/en-us/library/bb241279.aspx
$excelObject = New-Object -ComObject Excel.Application
$excelObject.Visible = $false
$workbookObject = $excelObject.Workbooks.Open($FilePath)
SetActiveSheet $workbookObject $SheetName | Out-Null
$workbookObject.SaveAs($csvFile,$xlCSVType)
$workbookObject.Saved = $true
$workbookObject.Close()
# cleanup
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbookObject) |
Out-Null
$excelObject.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) |
Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
# now import and return the data
Import-Csv -path $csvFile
}
These supplemental functions are used by Import-Excel:
function FindSheet([Object]$workbook, [string]$name)
{
$sheetNumber = 0
for ($i=1; $i -le $workbook.Sheets.Count; $i++) {
if ($name -eq $workbook.Sheets.Item($i).Name) { $sheetNumber = $i; break }
}
return $sheetNumber
}
function SetActiveSheet([Object]$workbook, [string]$name)
{
if (!$name) { return }
$sheetNumber = FindSheet $workbook $name
if ($sheetNumber -gt 0) { $workbook.Worksheets.Item($sheetNumber).Activate() }
return ($sheetNumber -gt 0)
}
If the data is static (no formulas involved, just data in cells), you can access the spreadsheet as an ODBC data source and execute SQL (or at least SQL-like) queries against it. Have a look at this reference for setting up your connection string (each worksheet in a workbook will be a "table" for this exercise), and use System.Data to query it the same as you would a regular database (Don Jones wrote a wrapper function for this which may help).
This should be faster than launching Excel & picking through cell by cell.
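As a rough illustration of the idea, the sketch below uses the OLE DB classes in System.Data rather than ODBC. That is a substitution on my part, and it assumes the Microsoft ACE OLE DB provider is installed with the same bitness as the PowerShell process. New-ExcelConnectionString and Invoke-ExcelQuery are hypothetical helper names, not the wrapper mentioned above.

```powershell
# Hypothetical helper: build an ACE OLE DB connection string for an .xlsx file.
# HDR=YES treats the first row of each sheet as column names.
function New-ExcelConnectionString {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [switch] $NoHeader
    )
    $hdr = if ($NoHeader) { 'NO' } else { 'YES' }
    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$Path;Extended Properties=`"Excel 12.0 Xml;HDR=$hdr`""
}

# Hypothetical helper: run a query and emit the resulting rows.
# Worksheets are addressed as tables, e.g. 'SELECT * FROM [Sheet2$]'.
function Invoke-ExcelQuery {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Query
    )
    $connection = New-Object System.Data.OleDb.OleDbConnection (New-ExcelConnectionString -Path $Path)
    try {
        $adapter = New-Object System.Data.OleDb.OleDbDataAdapter ($Query, $connection)
        $table = New-Object System.Data.DataTable
        # Fill opens and closes the connection itself
        [void] $adapter.Fill($table)
        $table.Rows
    }
    finally {
        $connection.Dispose()
    }
}

# Usage (not run here; requires the provider and a real file):
# Invoke-ExcelQuery -Path 'C:\ShippingInterface\test.xlsx' -Query 'SELECT * FROM [Sheet2$]'
```

The whole sheet comes back in one call, which avoids the per-cell COM round trips that make the original Do loop slow.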

Resources