Issues pulling value of cell using Excel COM objects in PowerShell

I am writing a script that scans each cell in an excel file for PII. I've got most of it working, but I am experiencing two issues which may be related.
First of all, I am not convinced that the "Do" loop is performing as intended. The goal here is if the text in a cell matches the regex string, create a PSCustomObject with the location information, then use the object to add a line to a csv file.
It appears that the loop is running for every file, regardless of whether or not it actually found a match.
The other issue is that I can't seem to actually pull the cell value for the matched cell. I've tried several different variables and methods, the latest attempt being "$target.text," but the value of the variable is always null.
I've been racking my brain on this for days, but I'm sure it'll be obvious once I see it.
Any help here would be appreciated.
Thanks.
$searchtext = "\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
$xlsFiles = Get-ChildItem $searchpath -recurse -include *.xlsx, *.xls, *.xlxm | Select-object -Expand FullName
$Excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false;
$excel.AskToUpdateLinks = $false;
foreach ($xlsfile in $xlsfiles) {
    Write-host (Get-Date -f yyyymmdd:hhmm) $xlsfile
    try{
        $Workbook = $Excel.Workbooks.Open($xlsFile, 0, 0, 5, "password")
    }
    Catch {
        Write-host $xlsfile 'is password protected. Skipping...' -ForegroundColor Yellow
        continue
    }
    ForEach ($Sheet in $($Workbook.Sheets)) {
        $i = $sheet.Index
        $Range = $Workbook.Sheets.Item($i).UsedRange
        $Target = $Sheet.UsedRange.Find($Searchtext)
        $First = $Target
        Do {
            $Target = $Range.Find($Target)
            $Violation = [PSCustomObject]@{
                Path = $xlsfile
                Line = "SSN Found" + $target.text
                LineNumber = "Sheet: " + $i
            }
            $Violation | Select-Object Path, Line, LineNumber | export-csv $outputpath\$PIIFile -append -NoTypeInformation
        }
        While ($NULL -ne $Target -and $Target.AddressLocal() -ne $First.AddressLocal())
    }
    $Excel.Quit()
}

Figured it out. Just a simple case of faulty logic in the loops.
Thanks to everyone who looked at this.
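The follow-up does not say what the faulty logic was, so what follows is only one possible shape for a corrected scan, not the author's actual fix. Two facts are relevant: a Do/While loop always runs its body at least once, so a CSV row is written even when nothing matched, and Excel's Range.Find searches for literal (optionally wildcard) text rather than a regular expression. A minimal sketch, assuming the cell values have already been read into a 2-D array (for example from UsedRange.Value2); the helper name Find-PatternInValues and its parameters are hypothetical:

```powershell
# Hypothetical helper: scans a 2-D array of cell values and emits one object
# per cell whose text matches the regex. Nothing is emitted when no cell matches.
function Find-PatternInValues {
    param(
        [Parameter(Mandatory)] [Array]  $Values,   # 2-D array of cell values
        [Parameter(Mandatory)] [string] $Pattern,  # .NET regular expression
        [string] $Path  = '',
        [string] $Sheet = ''
    )
    # Use the array's own bounds: arrays built in PowerShell are 0-based,
    # arrays handed back by COM are typically 1-based.
    for ($r = $Values.GetLowerBound(0); $r -le $Values.GetUpperBound(0); $r++) {
        for ($c = $Values.GetLowerBound(1); $c -le $Values.GetUpperBound(1); $c++) {
            $cell = $Values[$r, $c]
            if ($null -ne $cell -and "$cell" -match $Pattern) {
                [PSCustomObject]@{
                    Path   = $Path
                    Sheet  = $Sheet
                    Row    = $r        # index within the array, not a sheet address
                    Column = $c
                    Text   = "$cell"
                }
            }
        }
    }
}

# Same pattern as in the question, single-quoted so nothing is expanded.
$ssn = '\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b'

# Stand-in data instead of a real worksheet.
$demo = New-Object 'object[,]' 2,2
$demo[0,0] = 'name'; $demo[0,1] = '123-45-6789'
$demo[1,0] = 42;     $demo[1,1] = '000-12-3456'

$hits = @(Find-PatternInValues -Values $demo -Pattern $ssn -Path 'demo.xlsx' -Sheet 'Sheet1')
```

Because the function only emits objects for matching cells, piping its output to Export-Csv -Append writes nothing for a sheet without matches. Reading a whole range into an array also avoids one COM call per cell.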

Related

Powershell - Creating Excel Workbook - Getting "Insufficient memory to continue the execution of the program"

I'm trying to create an Excel workbook, then populate the cells with data found from searching many txt files.
I read a file and extract all comments AFTER I find "IDENTIFICATION DIVISION" and BEFORE I find "ENVIRONMENT DIVISION"
I then populate two cells in my Excel workbook: cell one is the file name and cell two is the extracted comments.
I have 256GB of memory on the work server. Less than 5% is being used before PowerShell throws the memory error.
Can anyone see where I'm going wrong?
Thanks,
-Ron
$excel = New-Object -ComObject excel.application
$excel.visible = $False
$workbook = $excel.Workbooks.Add()
$diskSpacewksht= $workbook.Worksheets.Item(1)
$diskSpacewksht.Name = "XXXXX_Desc"
$col1=1
$diskSpacewksht.Cells.Item(1,1) = 'Program'
$diskSpacewksht.Cells.Item(1,2) = 'Description'
$CBLFileList = Get-ChildItem -Path 'C:\XXXXX\XXXXX' -Filter '*.cbl' -File -Recurse
$Flowerbox = @()
ForEach($CBLFile in $CBLFileList) {
$treat = $false
Write-Host "Processing ... $CBLFile" -foregroundcolor green
Get-content -Path $CBLFile.FullName |
ForEach-Object {
if ($_ -match 'IDENTIFICATION DIVISION') {
# Write-Host "Match IDENTIFICATION DIVISION" -foregroundcolor green
$treat = $true
}
if ($_ -match 'ENVIRONMENT DIVISION') {
# Write-Host "Match ENVIRONMENT DIVISION" -foregroundcolor green
$col1++
$diskSpacewksht.Cells.Item($col1,1) = $CBLFile.Name
$diskSpacewksht.Cells.Item($col1,2) = [String]$Flowerbox
$Flowerbox = @()
continue
}
if ($treat) {
if ($_ -match '\*(.{62})') {
Foreach-Object {$Flowerbox += $matches[1] + "`r`n"}
$treat = $false
}
}
}
}
$excel.DisplayAlerts = 'False'
$ext=".xlsx"
$path="C:\Desc.txt"
$workbook.SaveAs($path)
$workbook.Close
$excel.DisplayAlerts = 'False'
$excel.Quit()
Not knowing what the contents of the .CBL files could be, I would suggest not to try and do all of this using an Excel COM object, but create a CSV file instead to make things a lot easier.
When finished, you can simply open that csv file in Excel.
# create a List object to collect the 'flowerbox' strings in
$Flowerbox = [System.Collections.Generic.List[string]]::new()
$treat = $false
# get a list of the .cbl files and loop through. Collect all output in variable $result
$CBLFileList = Get-ChildItem -Path 'C:\XXXXX\XXXXX' -Filter '*.cbl' -File -Recurse
$result = foreach ($CBLFile in $CBLFileList) {
    Write-Host "Processing ... $($CBLFile.FullName)" -ForegroundColor Green
    # using switch -File is an extremely fast way of testing a file line by line.
    # instead of '-Regex' you can also do '-WildCard', but then add asterisks around the strings
    switch -Regex -File $CBLFile.FullName {
        'IDENTIFICATION DIVISION' {
            # start collecting Flowerbox lines from here
            $treat = $true
        }
        'ENVIRONMENT DIVISION' {
            # stop collecting Flowerbox lines and output what we already have
            # output an object with the two properties you need
            [PsCustomObject]@{
                Program = $CBLFile.Name # or $CBLFile.FullName
                Description = $Flowerbox -join [environment]::NewLine
            }
            $Flowerbox.Clear() # empty the list for the next run
            $treat = $false
        }
        default {
            # as I have no idea what these lines may look like, I have to
            # assume your regex '\*(.{62})' is correct..
            if ($treat -and ($_ -match '\*(.{62})')) {
                $Flowerbox.Add($Matches[1])
            }
        }
    }
}
# now you have everything in an array of PSObjects so you can save that as Csv
$result | Export-Csv -Path 'C:\Desc.csv' -UseCulture -NoTypeInformation
The -UseCulture parameter makes Export-Csv use the list separator of your current culture as the delimiter, so the file opens correctly in Excel on the same machine when you double-click it.
You can also create an Excel file from this csv programmatically like:
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
$workbook = $excel.Workbooks.Open('C:\Desc.csv')
$worksheet = $workbook.Worksheets.Item(1)
$worksheet.Name = "XXXXX_Desc"
# save as .xlsx
# 51 ==> [Microsoft.Office.Interop.Excel.XlFileFormat]::xlWorkbookDefault
# see: https://learn.microsoft.com/en-us/office/vba/api/excel.xlfileformat
$workbook.SaveAs('C:\Desc.xlsx', 51)
# quit Excel and remove all used COM objects from memory
$excel.Quit()
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($worksheet)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

How to use powershell to select range and dump that to csv file

Actually, this is a version of question here:
How to use powershell to select and copy columns and rows in which data is present in new workbook.
The goal is to grab certain columns from multiple Excel workbooks and dump everything to one csv file. Columns are always the same.
I'm doing that manually:
$xl = New-Object -ComObject Excel.Application
$xl.Visible = $false
$xl.DisplayAlerts = $false
$counter = 0
$input_folder = "C:\Users\user\Documents\excelfiles"
$output_folder = "C:\Users\user\Documents\csvdump"
Get-ChildItem $input_folder -File |
    Foreach-Object {
        $counter++
        $wb = $xl.Workbooks.Open($_.FullName, 0, 1, 5, "")
        try {
            $ws = $wb.Worksheets.item('Calls') # => This specific worksheet
            $rowMax = ($ws.UsedRange.Rows).count
            for ($i=1; $i -le $rowMax-1; $i++) {
                $newRow = New-Object -Type PSObject -Property @{
                    'Type' = $ws.Cells.Item(1+$i,1).text
                    'Direction' = $ws.Cells.Item(1+$i,2).text
                    'From' = $ws.Cells.Item(1+$i,3).text
                    'To' = $ws.Cells.Item(1+$i,4).text
                }
                $newRow | Export-Csv -Path $("$output_folder\$ESO_Output") -Append -noType -Force
            }
        } catch {
            Write-host "No such workbook" -ForegroundColor Red
            # Return
        }
    }
Question:
This works, but is extremely slow because Excel has to select every cell, copy that, then Powershell has to create array and save row by row in output csv file.
Is there a method to select a range in Excel (number of columns times ($ws.UsedRange.Rows).count), cut header line and just append this range (array?) to csv file to make everything much faster?
So here is the final solution.
The script is 22 times faster than the original solution!
Hope somebody will find it useful :)
PasteSpecial is used to filter out empty rows; there is no need to save them into the csv.
$xl = New-Object -ComObject Excel.Application
$xl.Visible = $false
$xl.DisplayAlerts = $false
$counter = 0
$input_folder = "C:\Users\user\Documents\excelfiles"
$output_folder = "C:\Users\user\Documents\csvdump"
Get-ChildItem $input_folder -File |
    Foreach-Object {
        $counter++
        try {
            # open the workbook as in the question (this line was missing from the posted snippet)
            $wb = $xl.Workbooks.Open($_.FullName, 0, 1, 5, "")
            $new_ws1 = $wb.Worksheets.add()
            $ws = $wb.Worksheets.item('Calls')
            $rowMax = ($ws.UsedRange.Rows).count
            $range = $ws.Range("A1:O$rowMax")
            $x = $range.copy()
            $y = $new_ws1.Range("A1:O$rowMax").PasteSpecial([System.Type]::Missing,[System.Type]::Missing,$true,$false)
            $wb.SaveAs("$($output_folder)\$($_.Basename)",[Microsoft.Office.Interop.Excel.XlFileFormat]::xlCSVWindows)
        } catch {
            Write-host "No such workbook" -ForegroundColor Red
            # Return
        }
    }
$xl.Quit()
Part above will generate a bunch of csv files.
Part below will read these files in separate loop and combine them together into one.
-exclude is an array of something I want to omit
Remove-Item to remove temporary files
Answer below is based on this post: https://stackoverflow.com/a/27893253/6190661
$getFirstLine = $true
Get-ChildItem "$output_folder\*.csv" -exclude $excluded | foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "$($output_folder)\MERGED_CSV_FILE.csv" $linesToWrite
Remove-Item $_.FullName
}

How do I delete Excel files column on Powershell?

I want to delete a column from many Excel files in a folder at once.
So I wrote the code below, but when it runs, the terminal shows a "Delete method of Range class failed" error.
More confusingly, the delete succeeds on some worksheets.
I think it is caused by PowerShell not releasing the sheet object properly.
Can anybody help me? Regards.
# Launch Excel
$excel = New-Object -ComObject Excel.Application -Property @{Visible = $false}
$baseDir = Convert-Path $(Split-Path $MyInvocation.InvocationName -Parent)
$files = Get-ChildItem -Recurse | ? { $_.Extension -eq ".xlsx" }
# "${baseDir}\{$_.name}"
# Open Book
$files|
%{
Write-Host $_.Name
$excel.Workbooks.Open("${baseDir}\" + $_.name) | %{
$_.Worksheets | %{
# Delete Column
# $_.Activate
Write-Host $_.Name
#$_.Columns.Item("J").Delete()
#$_.Columns("J:J").EntireColumn.Delete()
#$_.Columns.item(3).Insert()
#$_.Range("J:J").Delete()
$_.Columns("J").Delete()
}
$_.Save()
}
}
# Excel
$excel.Quit()
[System.Runtime.InteropServices.Marshal]::FinalReleaseComObject($excel) | Out-Null
If you are looking to delete the J Column in every sheet of every workbook, this might help
$files| % {
# Prints name of File
Write-Host $_.Name
# There is always one workbook.
$workbook = $excel.Workbooks.Open($_.FullName) # FullName has the complete path.
$workbook.Worksheets | % {
# prints name of each worksheet
Write-Host $_.Name
# Deletes the column; Delete() returns True if successful.
# Call Delete() only once per sheet: a second call would remove the column that shifted into J.
if ($_.Range("J:J").EntireColumn.Delete()) {
Write-Host "Column J Deleted successfully"
}
# else print that it didn't for the $_.Name worksheet.
}
$workbook.Save()
}

Read csv stream line by line to create an array for Excel Range

This is my first post - I will be happy to make any corrections required for any mistakes made in the post.
I have been looking through the forums here for a few months and have learned a lot but I cannot seem to accomplish my goal with what I have found.
I need to read a CSV file (read-only) whenever it changes and place the resulting array into an active, open Excel 2016 tab. I can do this using COM and System.IO.WatcherChangeTypes, but this is too slow and requires copy and paste.
I need to read the CSV as fast as possible (under a second) and convert the lines into a usable array for Excel. This whole process has to take under 2 seconds MAX. Some of the CSVs will exceed 180,000 lines as the day goes on.
I work for a trading company.
I would be happy with a single column, tab-delimited, and multiple rows. I can't get the multiple rows.
I have to write the range line by line and that takes too long.
I was looking at this one but I am not clear on how to make the whole thing dynamic. There is no set amount of headers and the rows will change as well. I cannot work with any static data at all.
This is the post which prompted me to ask for help: How to use powershell to reorder CSV columns
$export = "\\UNC\to\file\Name.csv"
#$excel = New-Object -ComObject Excel.Application
#$excel.visible = $true
#$workbook = $excel.Workbooks.Add()
$reader = [System.IO.File]::OpenText($export)
$writer = New-Object System.IO.StreamWriter "data2.csv"
for(;;) {
$line = $reader.ReadLine()
if ($null -eq $line) {
break
}
$i=1
$data = $line.Split(",") | %{
if($_ -ne $null)
{
Write-Host $_ $i
++$i
}
}
[void]$data.Length
# $data.GetValue()
#$writer.WriteLine('{0},{1},{2}', $data[0], $data[1], $data[2])
}
$reader.Close()
#$writer.Close()
Any help will be greatly appreciated!
UPDATE:
I figured it out. The result is probably not the most efficient but it gets me what I need for now while i explore how to better accomplish it with what I have learned.
(Measure-Command { $data = [System.io.File]::Open($export, 'Open', 'Read', 'ReadWrite')
$reader = New-Object System.IO.StreamReader($data)
$count = 0
While($text = $reader.Readline())
{
If($text -eq $null)
{
$reader.Close()
$data.close()
}
++$count
}
}).TotalSeconds
$array2 = New-Object 'object[,]' $count,1
$end = ++$count
$file = New-Object System.IO.StreamReader -ArgumentList $export
$stringBuilder = New-Object System.Text.StringBuilder
$list = New-Object System.Collections.Generic.List[System.String]
$a = 0
Measure-Command {
While ($i = $file.ReadLine() -Replace ",","`t")
{
if ($i -eq $null)
{
$file.close()
break loop
}
$null = $stringBuilder.Append($i)
$list.Add($i)
$array2[$a,0] = $i
++$a
}
$outputString = $stringBuilder.ToString()
$array = $list.ToArray()
}
You can do something like this
import numpy as np
import pandas as pd

data = pd.read_csv("data1.csv", sep=r'\s+', header=None)
dataarraynew13phse = np.array(data)
dataarraynew13phse = dataarraynew13phse.flatten()
sep = '\s+' can be useful to decode tabs, in multiple lines
And then flatten() can make it in a single row or array
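Since the question is about PowerShell and a dynamic number of rows and columns, here is a minimal sketch of building the 2-D array in one pass. It assumes a plain comma-separated file with no quoted fields containing commas; the function name ConvertTo-RangeArray is hypothetical, and the Excel side is not exercised here.

```powershell
# Hypothetical helper: turns CSV lines into a rectangular object[,] whose
# size is derived from the data (row count = lines, column count = widest line).
function ConvertTo-RangeArray {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string[]] $Lines
    )
    # Split each line once; the leading comma keeps every row as its own array.
    $rows = @(foreach ($line in $Lines) { ,($line.Split(',')) })

    $rowCount = $rows.Count
    $colCount = 0
    foreach ($row in $rows) {
        if ($row.Length -gt $colCount) { $colCount = $row.Length }
    }

    $grid = New-Object 'object[,]' $rowCount, $colCount
    for ($r = 0; $r -lt $rowCount; $r++) {
        for ($c = 0; $c -lt $rows[$r].Length; $c++) {
            $grid[$r, $c] = $rows[$r][$c]   # short rows leave trailing cells as $null
        }
    }
    # The leading comma stops PowerShell from flattening the 2-D array on output.
    return ,$grid
}

# Stand-in for [System.IO.File]::ReadAllLines($export)
$grid = ConvertTo-RangeArray -Lines @('a,b,c', '1,2')
```

An array like this can be assigned to the Value2 property of a range with the same number of rows and columns, which writes the whole block in a single COM call instead of one call per cell.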

PowerShell saving excel sheet in unreadable format

I have the below piece of code that checks for Files to Tapes jobs for a database and gives the output in an excel sheet.
$date = Get-Date
$day = $date.Day
$hour = $date.Hour
$Excel = New-Object -ComObject Excel.Application
$Excel.visible = $true
$Excel.DisplayAlerts = $false
$Workbook = $Excel.Workbooks.Add()
$Sheet = $Excel.Worksheets.Item(1)
#Counter variable for rows and columns
$intRow = 1
$intCol = 1
$Sheet.Cells.Item($intRow,1) = "Tasks/Servers"
$Sheet.Cells.Item($intRow,2) = "DateLastRun"
$Sheet.Cells.Item($intRow,3) = "PRX1CSDB01"
$Sheet.Cells.Item($intRow,4) = "PRX1CSDB02"
$Sheet.Cells.Item($intRow,5) = "PRX1CSDB03"
$Sheet.Cells.Item($intRow,6) = "PRX1CSDB11"
$Sheet.Cells.Item($intRow,7) = "PRX1CSDB12"
$Sheet.Cells.Item($intRow,8) = "PRX1CSDB13"
$Sheet.Cells.Item($intRow+1,1) = "File To Tape weekly Full Backup"
$Sheet.UsedRange.Rows.Item(1).Borders.LineStyle = 1
#FTT.txt contains the path for a list of servers
$path = Get-Content D:\Raghav\DB_Integrated\FTT.txt
foreach ($server in $path)
{
If (Test-Path $server)
{
$BckpWeek = gci -path $server | select-object | where {$_.Name -like "*logw*"} | sort LastWriteTime | select -last 1
$Sheet.Cells.Item($intRow+1,$intCol+1) = $BckpWeek.LastWriteTime.ToString('MMddyyyy')
$Sheet.UsedRange.Rows.Item($intRow).Borders.LineStyle = 1
$x = (get-date) - ([datetime]$BckpWeek.LastWriteTime)
if( $x.days -gt 7){$status_week = "Failed"}
else{$status_week = "Successful"}
$Sheet.Cells.Item($intRow+1,$intCol+2) = $status_week
$intCol++
}
else
{
$Sheet.Cells.Item($intRow+1,$intCol+2) = "Path Not Found"
$intCol++
}
}
$Sheet.UsedRange.EntireColumn.AutoFit()
$workBook.SaveAs("C:\Users\Output.xlsx",51)
$excel.Quit()
However, when I try to import the contents of Output.xlsx into a variable, say $cc, I get data in an unreadable format.
$cc = Import-Csv "C:\Users\Output.xlsx"
Attached is the image of what I get when importing Output.xlsx into $cc. I also tried to write the output in CSV format, but that doesn't seem to help either. Does anybody have any idea about this, or has anybody faced a similar situation before?
@ZevSpitz - Looking for the OleDbConnection class, I landed at https://blogs.technet.microsoft.com/pstips/2014/06/02/get-excel-data-without-excel/ . This is what I was looking for. Thank you for pointing me in the right direction.
@MikeGaruccio - Unfortunately, I didn't find the Import-Excel command in the Get-Help menu. I am using PowerShell 4.0. Anyway, thank you for the suggestion.
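For context on why the import looks unreadable: an .xlsx workbook is a ZIP container (Office Open XML), not delimited text, so Import-Csv ends up parsing binary data. A minimal sketch that checks a file for the ZIP signature before treating it as CSV; the function name Test-ZipSignature is hypothetical, and the check only looks at the first two bytes.

```powershell
# Hypothetical helper: returns $true when the file starts with the ZIP
# signature bytes 'PK' (0x50 0x4B), as .xlsx workbooks do.
function Test-ZipSignature {
    param(
        [Parameter(Mandatory)] [string] $Path   # use a full path; .NET ignores the PowerShell location
    )
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object byte[] 2
        $read = $stream.Read($buffer, 0, 2)
        return ($read -eq 2 -and $buffer[0] -eq 0x50 -and $buffer[1] -eq 0x4B)
    }
    finally {
        $stream.Dispose()
    }
}

# Example call (path taken from the question):
# Test-ZipSignature -Path 'C:\Users\Output.xlsx'
```

If the check returns true, the file needs a reader that understands workbooks, such as the OleDbConnection approach mentioned in the comment above, rather than Import-Csv.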
