How to search for a string in an Excel file using PowerShell

I have a folder containing multiple Excel files. I want to search for a string like "2501", find which of my files contain it, and then output the file name to a file (or display it). I wrote some scripts and googled, but did not find the answer.
I wrote this :
$Location = "C:\1.xlsx"
$SearchStr = "Mike"
$Sel = Select-String -Pattern $SearchStr -Path $Location
If ($Sel -eq $null)
{
    Write-Host "$Location does not contain $SearchStr" -ForegroundColor Cyan
}
Else
{
    Write-Host "Found $SearchStr `n$Sel in $Location"
}
Write-Host "end" -ForegroundColor Yellow
This works only if I specify a .txt file; it does not work with Excel files.

This may help: https://gallery.technet.microsoft.com/office/How-to-search-text-in-e16373b8
You will also need to loop through the folder so that every Excel file gets searched.
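As a rough illustration (not the gallery script itself), the loop could use the Excel COM object's Find method on each worksheet. The folder path and output file below are placeholders:
$SearchStr = "2501"
$Folder    = "C:\ExcelFiles"              # placeholder: folder containing the .xlsx files
$Results   = "C:\ExcelFiles\matches.txt"  # placeholder: where to write the matching file names

$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $false

Get-ChildItem -Path $Folder -Filter *.xlsx | ForEach-Object {
    # Open read-only so nothing gets modified
    $workbook = $Excel.Workbooks.Open($_.FullName, 0, $true)
    $found = $false
    foreach ($sheet in $workbook.Worksheets) {
        if ($sheet.UsedRange.Find($SearchStr)) { $found = $true; break }
    }
    if ($found) {
        Write-Host "$($_.FullName) contains $SearchStr"
        $_.FullName | Out-File -FilePath $Results -Append
    }
    $workbook.Close($false)
}

$Excel.Quit()
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($Excel)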

Related

Exporting Charts from Excel

I am using Import-Excel (great module BTW) to create an Excel spreadsheet with one chart in it. However, I would like to export this chart as a .png file. There are lots of examples that use the MS COM object, but I am running on a Mac, where COM objects don't exist.
I have been able to get this far in my PowerShell script:
$package = Export-Excel -InputObject $data -Path $filename -TableName Xyz -ExcelChartDefinition $c -AutoNameRange -PassThru
Write-Host "Package is " $package.ToString()
$wb = $package.Workbook
Write-Host "Workbook is " $wb.ToString()
foreach ($ws in $wb.Worksheets) {
    Write-Host "Worksheet is " $ws.ToString()
    foreach ($excelchart in $ws.Drawings) {
        Write-Host "Chart is " $excelchart.ToString()
    }
}
with the following output:
Package is OfficeOpenXml.ExcelPackage
Workbook is OfficeOpenXml.ExcelWorkbook
Worksheet is Sheet1
Chart is OfficeOpenXml.Drawing.Chart.ExcelBarChart
Now, I would just like to know how to save this as a .png file.
Check out https://github.com/dfinke/ImportExcel/blob/master/Export-charts.ps1
It is an example, so your mileage may vary. If you have issues with it, please post them on the GitHub repo and I can track them.
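For reference, if you do have access to Excel itself (the COM object on Windows, which unfortunately won't help on a Mac), an embedded chart can be written to a .png with the Chart.Export method. A minimal sketch with placeholder paths:
# Windows-only sketch: requires Excel installed, uses COM (paths are placeholders)
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
$wb = $excel.Workbooks.Open('C:\temp\report.xlsx')
foreach ($ws in $wb.Worksheets) {
    foreach ($chartObject in $ws.ChartObjects()) {
        # Export each embedded chart as a PNG named after the chart object
        $chartObject.Chart.Export("C:\temp\$($chartObject.Name).png", 'PNG') | Out-Null
    }
}
$wb.Close($false)
$excel.Quit()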

New-item "Illegal Characters in path" when I use a Variable that contains a here string

foreach ($Target in $TargetUSBs)
{
    $LogPath = @"
$SourceUSB\$(((Get-CimInstance -ClassName Win32_volume)|where {$_.DriveType -eq "2" -and $_.DriveLetter -eq $Target}).SerialNumber)_
$(((Get-CimInstance -ClassName Win32_OperatingSystem).LocalDateTime).Year)$(((Get-CimInstance -ClassName Win32_OperatingSystem).LocalDateTime).Month)
$(((Get-CimInstance -ClassName Win32_OperatingSystem).LocalDateTime).Day)_$(((Get-CimInstance -ClassName Win32_OperatingSystem).LocalDateTime).Hour)
$(((Get-CimInstance -ClassName Win32_OperatingSystem).LocalDateTime).Minute)$(((Get-CimInstance -ClassName Win32_OperatingSystem).LocalDateTime).Second).txt
"@
    $LogPath = $LogPath.Replace("`n","").Trim()
    New-Item -Path "$LogPath"
}
The irony is that when I copy the contents of my variable and manually run New-Item -Path with that pasted value, it works, but when I use the variable it does not...
Brief summary of my goal: I am taking a USB drive labelled ORIGINAL, obtaining the serial number of every USB drive plugged in at the time, and creating a separate log file for each one, named SERIALNUMBER_DATE_TIME.txt, on the ORIGINAL USB drive.
$LogPath contains, for example, the following: E:\Mattel\1949721369_2018912_93427.txt
Yet when I use the variable in New-Item it reports "Illegal characters in path".
FYI, $LogPath is a System.String, not some other object type.
$TargetUSBs is filled with all USB drives plugged into the system.
This method of using a variable for a path usually works fine for me; the only difference is the here-string I used this time around. Does that cause my problem? I hope not, because I really don't want to fill that variable all on one line. New-Item's help shows <String[]> for the -Path parameter; does this mean I have to use a string array, and if so, how do I convert this to make it work?
Your problem is that Windows uses CRLF line endings (Unix uses only LF), so you still have CR characters in your path.
To fix this just use:
.Replace("`r`n","")
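Applied to the line from the question, that is:
$LogPath = $LogPath.Replace("`r`n","").Trim()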
However, you can easily simplify your code so you do not need the messy here-string or the Replace/Trim calls at all.
By using a single Get-Date call you can format the date to your desired output. This means you can build the path as a simple string with much less code:
foreach ($Target in $TargetUSBs)
{
    $SerialNumber = Get-CimInstance -ClassName Win32_Volume |
        Where-Object { $_.DriveType -eq "2" -and $_.DriveLetter -eq $Target } |
        Select-Object -ExpandProperty SerialNumber
    $DateTime = Get-Date -Format "yyyyMd_Hms"
    # Braces around the variable name keep the underscore from being read as part of it
    New-Item -Path "$SourceUSB\${SerialNumber}_$DateTime.txt"
}

Remove known Excel passwords with PowerShell

I have this PowerShell code that loops through Excel files in a specified directory; references a list of known passwords to find the correct one; and then opens, decrypts, and saves that file to a new directory.
But it's not executing as quickly as I'd like (it's part of a larger ETL process and it's a bottleneck). At this point I can remove the passwords faster manually as the script takes ~40 minutes to decrypt 40 workbooks while referencing a list of ~50 passwords.
Is there a cmdlet or function (or something) that's missing which would speed this up, an overlooked flaw in the processing, or is PowerShell, perhaps, just not the right tool for this job?
Original Code (updated code can be found below):
$ErrorActionPreference = "SilentlyContinue"
CLS
# Paths
$encrypted_path = "C:\PoShTest\Encrypted\"
$decrypted_Path = "C:\PoShTest\Decrypted\"
$original_Path  = "C:\PoShTest\Originals\"
$password_Path  = "C:\PoShTest\Passwords\Passwords.txt"
# Load Password Cache
$arrPasswords = Get-Content -Path $password_Path
# Load File List
$arrFiles = Get-ChildItem $encrypted_path
# Create counter to display progress
[int]$count = ($arrFiles.Count - 1)
# Loop through each file
$arrFiles | % {
    $file = Get-Item -Path $_.FullName
    # Display current file
    Write-Host "Processing" $file.Name -f "DarkYellow"
    Write-Host "Items remaining: " $count `n
    # Excel xlsx
    if ($file.Extension -eq ".xlsx") {
        # Loop through password cache
        $arrPasswords | % {
            $passwd = $_
            # New Excel Object
            $ExcelObj = $null
            $ExcelObj = New-Object -ComObject Excel.Application
            $ExcelObj.Visible = $false
            # Attempt to open file
            $Workbook = $ExcelObj.Workbooks.Open($file.FullName,1,$false,5,$passwd)
            $Workbook.Activate()
            # If password is correct - save new file without password to $decrypted_Path
            if ($Workbook.Worksheets.Count -ne 0) {
                $Workbook.Password = $null
                $savePath = $decrypted_Path + $file.Name
                Write-Host "Decrypted: " $file.Name -f "DarkGreen"
                $Workbook.SaveAs($savePath)
                # Close document and Application
                $ExcelObj.Workbooks.Close()
                $ExcelObj.Application.Quit()
                # Move original file to $original_Path
                Move-Item $file.FullName -Destination $original_Path -Force
            }
            else {
                # Close document and Application
                Write-Host "PASSWORD NOT FOUND: " $file.Name -f "Magenta"
                $ExcelObj.Close()
                $ExcelObj.Application.Quit()
            }
        }
    }
    $count--
    # Next File
}
Write-Host "`n Processing Complete" -f "Green"
Updated code:
# Get current Excel process IDs so they are not affected by the script's cleanup
# SilentlyContinue in case there are no active Excel processes
$currentExcelProcessIDs = (Get-Process excel -ErrorAction SilentlyContinue).Id
$a = Get-Date
$ErrorActionPreference = "SilentlyContinue"
CLS
# Paths
$encrypted_path = "C:\PoShTest\Encrypted"
$decrypted_Path = "C:\PoShTest\Decrypted\"
$processed_Path = "C:\PoShTest\Processed\"
$password_Path  = "C:\PoShTest\Passwords\Passwords.txt"
# Load Password Cache
$arrPasswords = Get-Content -Path $password_Path
# Load File List
$arrFiles = Get-ChildItem $encrypted_path
# Create counter to display progress
[int]$count = ($arrFiles.Count - 1)
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
# Loop through each file
$arrFiles | % {
    $file = Get-Item -Path $_.FullName
    # Display current file
    Write-Host "`n Processing" $file.Name -f "DarkYellow"
    Write-Host "`n Items remaining: " $count `n
    # Excel xls/xlsx
    if ($file.Extension -like "*.xls*") {
        # Loop through password cache
        $arrPasswords | % {
            $passwd = $_
            # Attempt to open file
            $Workbook = $ExcelObj.Workbooks.Open($file.FullName,1,$false,5,$passwd)
            $Workbook.Activate()
            # If password is correct, save new file without password to $decrypted_Path
            if ($Workbook.Worksheets.Count -ne 0)
            {
                $Workbook.Password = $null
                $savePath = $decrypted_Path + $file.Name
                Write-Host "Decrypted: " $file.Name -f "DarkGreen"
                $Workbook.SaveAs($savePath)
                # Added to keep Excel process memory utilization in check
                $ExcelObj.Workbooks.Close()
                # Move original file to $processed_Path
                Move-Item $file.FullName -Destination $processed_Path -Force
            }
            else {
                # Close Document
                $ExcelObj.Workbooks.Close()
            }
        }
    }
    $count--
    # Next File
}
# Close Document and Application
$ExcelObj.Workbooks.Close()
$ExcelObj.Application.Quit()
Write-Host "`nProcessing Complete!" -f "Green"
Write-Host "`nFiles w/o a matching password can be found in the Encrypted folder."
Write-Host "`nTime Started : " $a.ToShortTimeString()
Write-Host "Time Completed : " $(Get-Date).ToShortTimeString()
Write-Host "`nTotal Duration : "
New-TimeSpan -Start $a -End $(Get-Date)
# Remove any stale Excel processes created by this script's execution
Get-Process excel -ErrorAction SilentlyContinue | Where-Object { $currentExcelProcessIDs -notcontains $_.Id } | Stop-Process
If nothing else, I do see one glaring performance issue that should be easy to address: you are opening a new Excel instance to test each individual password for each document. 40 workbooks with 50 passwords means you open up to 2,000 Excel instances, one at a time.
You should be able to keep reusing the same instance without losing any functionality. Move this code out of your innermost loop
# New Excel Object
$ExcelObj = $null
$ExcelObj = New-Object -ComObject Excel.Application
$ExcelObj.Visible = $false
as well as the snippet that closes the application; it needs to move out of the loop as well.
$ExcelObj.Close()
$ExcelObj.Application.Quit()
If that does not help enough, you would have to consider some sort of parallel processing with jobs, etc. I have a basic solution in a CodeReview.SE answer of mine that does something similar.
Basically, it runs several Excel instances at once, each working on a chunk of documents, which is faster than one Excel instance working through them all. As in the linked answer, I caution against automating Excel COM with PowerShell: COM objects don't always get released properly, and locks can be left on files or processes.
You are also looping over all 50 passwords regardless of success. That means you could find the right password on the first try but still test the other 49! Set a flag in the loop and break out of the inner loop when that happens.
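A rough sketch of that flag, reusing the names from the question's code (note the foreach statements rather than ForEach-Object, since break only behaves as expected in the former):
foreach ($file in $arrFiles) {
    $decrypted = $false
    foreach ($passwd in $arrPasswords) {
        $Workbook = $ExcelObj.Workbooks.Open($file.FullName, 1, $false, 5, $passwd)
        if ($Workbook -and $Workbook.Worksheets.Count -ne 0) {
            # Correct password found: save/move the file as before, then stop trying
            $decrypted = $true
        }
        $ExcelObj.Workbooks.Close()
        if ($decrypted) { break }   # skip the remaining passwords for this file
    }
}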
As far as the password logic goes, you say that
At this point I can remove the passwords faster manually since the script takes ~40 minutes
Why can you do it faster? What do you know that the script does not? I don't see how you can outperform the script by doing exactly what it does.
From what I can see, another suggestion would be to track each successful password and its associated file name, so that the next time a file is processed you know which password to try first.
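As a hypothetical sketch of that idea (the cache file name and location are assumptions, not part of the original script):
# Load any previously discovered file/password pairs
$cachePath = 'C:\PoShTest\Passwords\KnownGood.csv'   # assumed location for the cache
$knownGood = @{}
if (Test-Path $cachePath) {
    Import-Csv $cachePath | ForEach-Object { $knownGood[$_.File] = $_.Password }
}

foreach ($file in $arrFiles) {
    # Try the cached password (if any) first, then the remaining ones
    $cached = $knownGood[$file.Name]
    $passwordsToTry = @($cached) + ($arrPasswords | Where-Object { $_ -ne $cached }) | Where-Object { $_ }
    foreach ($passwd in $passwordsToTry) {
        # ... attempt to open/decrypt as in the question's code ...
        # on success: $knownGood[$file.Name] = $passwd; break
    }
}

# Persist the cache so the next run tries known-good passwords first
$knownGood.GetEnumerator() |
    Select-Object @{ n = 'File'; e = { $_.Key } }, @{ n = 'Password'; e = { $_.Value } } |
    Export-Csv -Path $cachePath -NoTypeInformation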
This solution uses the modules ImportExcel for easier working with Excel files, and PoshRSJob for multithreaded processing.
If you do not have these, install them by running:
Install-Module ImportExcel -scope CurrentUser
Install-Module PoshRSJob -scope CurrentUser
I've raised an issue on the ImportExcel module's GitHub page where I've proposed a solution for opening encrypted Excel files. The author may propose a better solution (and consider the impact across other functions in the module), but this works for me. For now, you'll need to modify the Import-Excel function yourself:
Open: C:\Username\Documents\WindowsPowerShell\Modules\ImportExcel\2.4.0\ImportExcel.psm1 and scroll to the Import-Excel function. Replace:
[switch]$DataOnly
With
[switch]$DataOnly,
[String]$Password
Then replace the following line:
$xl = New-Object -TypeName OfficeOpenXml.ExcelPackage -ArgumentList $stream
With the code suggested here. This will let you call the Import-Excel function with a -Password parameter.
Next we need a function that repeatedly tries to open a single Excel file using a known set of passwords. Open a PowerShell window and paste in the following function (note: this function has a default output path defined and also writes passwords to the verbose stream, so make sure no one is looking over your shoulder, or remove that if you'd prefer):
function Remove-ExcelEncryption
{
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory=$true)]
        [String]
        $File,

        [Parameter(Mandatory=$false)]
        [String]
        $OutputPath = 'C:\PoShTest\Decrypted',

        [Parameter(Mandatory=$true)]
        [Array]
        $PasswordArray
    )
    $filename = Split-Path -Path $File -Leaf
    foreach ($Password in $PasswordArray)
    {
        Write-Verbose "Attempting to open $File with password: $Password"
        try
        {
            $ExcelData = Import-Excel -Path $File -Password $Password -ErrorAction Stop
            Write-Verbose "Successfully opened file."
        }
        catch
        {
            Write-Verbose "Failed with error $($Error[0].Exception.Message)"
            continue
        }
        try
        {
            $null = $ExcelData | Export-Excel -Path $OutputPath\$filename
            return "Success"
        }
        catch
        {
            Write-Warning "Could not save to $OutputPath\$filename"
        }
    }
}
Finally, we can run code to do the work:
$Start = get-date
$PasswordArray = @('dj7F9vsm','kDZq737b','wrzCgTWk','DqP2KtZ4')
$files = Get-ChildItem -Path 'C:\PoShTest\Encrypted'
$files | Start-RSJob -Name {$_.Name} -ScriptBlock {
    Remove-ExcelEncryption -File $_.FullName -PasswordArray $Using:PasswordArray -Verbose
} -FunctionsToLoad Remove-ExcelEncryption -ModulesToImport ImportExcel | Wait-RSJob | Receive-RSJob
$end = Get-Date
New-TimeSpan -Start $Start -End $end
For me, if the correct password is first in the list it runs in 13 seconds against 128 Excel files. If I call the function in a standard foreach loop, it takes 27 seconds.
To view which files were successfully converted we can inspect the output property on the RSJob objects (this is the output of the Remove-ExcelEncryption function where I've told it to return "Success"):
Get-RSJob | Select-Object -Property Name,Output
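For example, to list just the files whose job output includes "Success":
Get-RSJob | Where-Object { $_.Output -contains 'Success' } | Select-Object -ExpandProperty Name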
Hope that helps.

PowerShell - parsing a PDF file for a literal or image

Using PowerShell and running PowerGUI. I have a PDF file that I need to search through in order to find whether an attachment is referenced within the content of a particular page. Either that, or I need to search the document for images, such as a Microsoft Word or Excel icon or a PDF icon.
I am using the following code to read in the page:
Add-Type -Path "c:\itextsharp-all-5.4.5\itextsharp-dll-core\itextsharp.dll"
$reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList "c:\files\searchfile.pdf"
for ($page = 1; $page -le 3; $page++) {
    $lines = [char[]]$reader.GetPageContent($page) -join "" -split "`n"
    foreach ($line in $lines) {
        if ($line -match "^\[") {
            $line = $line -replace "\\([\S])", $matches[1]
            $line -replace "^\[\(|\)\]TJ$", "" -split "\)\-?\d+\.?\d*\(" -join ""
        }
    }
}
However, the above yields a few bits of text but mostly unprintable characters.
How can you search a PDF file using PowerShell for a literal (like ".doc" or ".xlsx")? And can a PDF be searched for a graphic (like the Excel or Word icon)?
Without seeing the PDF's raw content it's not easy to give specific help, so if you can share a sample PDF or its contents, that would be helpful.
Once you know what to look for in the stream, you can search by reading in the file line by line and using the -match operator:
$file = [io.file]::ReadAllLines('C:\test.pdf')
$title = ($file -match "<rdf:li")[0].Split(">")[1].Split("<")[0]
$description = ($file -match "<rdf:li")[2].Split(">")[1].Split("<")[0]
write-host ("Title: " + $title)
write-host ("Description: " + $description)
I doubt very much that the contents of the file will tell you much more than that an image exists at particular page coordinates (although I'm by no means a PDF expert), but it may also include the binary file stream, in which case you may be able to save that stream as a file (I haven't tried that yet).
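As a side note for the literal-search part of the question, iTextSharp 5.x also ships a text-extraction helper that usually gives more readable output than the raw page content stream. A minimal sketch, assuming the same itextsharp.dll path as in the question (the search pattern is a placeholder):
Add-Type -Path "c:\itextsharp-all-5.4.5\itextsharp-dll-core\itextsharp.dll"

$reader    = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList "c:\files\searchfile.pdf"
$searchFor = "\.docx?|\.xlsx?"   # placeholder: regex for the literals you care about

for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
    # Extract readable text for the page, then match against it
    $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
    if ($text -match $searchFor) {
        Write-Host "Match found on page $page"
    }
}
$reader.Close()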

PowerShell: Searching the content of files and writing the results to a text file

I'm new to PowerShell, so I don't know where to start. I want a script that searches the content of all files (PDF, Word, Excel, PowerPoint, ...) for a specific string combination.
I tried this script but it doesn't work:
function WordSearch ($sample, $staining, $sampleID, $patientID, $folder)
{
    $objConnection = New-Object -com ADODB.Connection
    $objRecordSet = New-Object -com ADODB.Recordset
    $objConnection.Open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
    $objRecordSet.Open("SELECT System.ItemPathDisplay FROM SYSTEMINDEX WHERE ((Contains(Contents,'$sample')) or (Contains(Contents,'$sampleID') and Contains(Contents,'$staining')) or (Contains(Contents,'$staining') and Contains(Contents,'$patientID'))) AND System.ItemPathDisplay LIKE '$folder\%'", $objConnection)
    if ($objRecordSet.EOF -eq $false) { $objRecordSet.MoveFirst() }
    while ($objRecordset.EOF -ne $true) {
        $objRecordset.Fields.Item("System.ItemPathDisplay").Value
        $objRecordset.MoveNext()
    }
}
Can someone help me?
You should try this, but first make sure you're in the folder where you want to start searching (if you're trying to search your whole computer, start in C:\, but I imagine the script will take a decent amount of time to run).
$Paths = @()
$Paths = gci . *.* -rec | where { ! $_.PSIsContainer } |
    ? { ($_.Extension -eq ".doc") -or ($_.Extension -eq ".ppt") -or ($_.Extension -eq ".pdf") -or ($_.Extension -eq ".xls") } |
    Resolve-Path
This will retrieve the paths of all files of those types. If you have Microsoft Office 2007 or above, you may want to add searches for ".xlsx", ".docx", or ".pptx".
Then you can begin looking through those files for your "specific string combination":
$array = @()
foreach ($path in $Paths)
{ $array += Select-String -Path $path -Pattern "Search String" }
This will give you all the lines and paths on which that string exists in those files. The actual line output may be a little distorted, though, because the Office and PDF formats are binary/compressed rather than plain text. Use $array | Get-Member -MemberType Property to see which properties you can access, and the Select-Object cmdlet to pull them out.
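Putting it together, a short end-to-end sketch that writes the matches to a text file (the root folder, search string, and output file name are placeholders):
$root    = "C:\DataToSearch"              # placeholder: folder to search from
$pattern = "Search String"                # placeholder: the string combination
$outFile = "C:\DataToSearch\results.txt"  # placeholder: where to write the results

# Collect candidate files, then search their content and record path/line info
$paths = Get-ChildItem -Path $root -Recurse -File |
    Where-Object { $_.Extension -in ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx", ".pdf" }

$paths | Select-String -Pattern $pattern |
    Select-Object Path, LineNumber, Line |
    Out-File -FilePath $outFile

Write-Host "Results written to $outFile"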
