PowerShell function to use Excel to convert between file formats

I need to use scheduled tasks or cron jobs to convert files between xlsx and csv, and between xml and xlsx.
I didn't find anything I loved, so I wrote a module to make my life easier.
The initial requirements assumed I would know the static full file names when the job was scheduled. After writing a simple script, I expanded it with features I thought might come in handy.

I saved this into a psm1 file and load it with my profile.
Features:
- Pipeline safe(ish)
- Flexible outputs
- Extension option if you like the name but not the file type
- Delete option in case you want to clean up after yourself
Comments up front create workable help in PowerShell, but are fairly readable too.
<#
.SYNOPSIS
This function will take files and use the excel application to convert them.
.DESCRIPTION
This function allows you to use the full power of Excel to open and save files. The infile can be any file that Excel can open. The outfile can be xlsx, xls, xlsm, xml, or csv. Also, there is an option to delete the destination file before running the save operation to avoid prompts when overwriting, and to erase the origin file after the process has completed.
.EXAMPLE
Convert-Excel -infile 'Source.xml' -outfile 'destination.xlsx' -delete $true
Converts source.xml to xlsx file type.
Deletes source.xml when done.
Deletes destination.xlsx before it converts.
.EXAMPLE
Convert-Excel -infile 'Source.xlsx' -outfile 'destination.csv'
Converts xlsx to csv.
Leaves both files behind when done.
.EXAMPLE
Convert-Excel -infile 'Source.csv'
Converts infile Source.csv (or whatever format) to xlsx of the same name.
Leaves both behind when done.
.EXAMPLE
Convert-Excel -infile 'Source.xlsx' -Extension '.csv'
Converts xlsx to csv. By passing just the extension it will use the same base file name.
Leaves both files behind when done.
.EXAMPLE
Convert-Excel -infile 'C:\Users\notI\PRD-06661-12082017 - Copy.xml' -outfile 'C:\Users\notI\PRD-06661-12082017-Copy.csv'
Loads full path xml
Saves full path csv
.EXAMPLE
dir *.xml | Sort-Object -Property LastWriteTime | Convert-Excel -Extension ".csv"
Similar to above but uses the pipeline to do multiple conversions.
If full outfile name is given, it will create just one file over and over again. In this example it would go in chronological order creating csv files.
.EXAMPLE
Dir *.xml | Convert-Excel -extension ".csv" -delete $True | Convert-Excel -extension ".xml"
That's just weird, but it might solve your problem, and it works.
.PARAMETER infile
Name of the origin file to use. If the full path is not given it will be opened from the context the script is running in.
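.PARAMETER outfile
Name of the destination file to create (alias: Extension). If the value starts with a dot, it is treated as an extension and applied to the infile's base name. If the full path is not given it will save in the default destination of Excel.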
.PARAMETER delete
If $true it will delete the target location file if it exists before conversion and the origin file after conversion. Functions like a move with clobber.
If anything else or blank it will leave origin in place and if destination exists it will prompt for overwriting.
#>
function Convert-Excel{
param(
[parameter(ValueFromPipeLineByPropertyName=$True,ValueFromPipeline=$True)]
[Alias('FullName')]
[string] $infile,
[Alias('Extension')]
[string] $outfile,
[bool] $delete
)
Begin {
$begin_outfile = $outfile
Try {$ExcelWB = new-object -comobject excel.application}
Catch {Write-Host "This host does not seem to have Excel installed"}
}
Process{
#Check infile
if (-not($infile)) {
Write-Output "You must supply a value for -infile"
break
}
else {
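Try {$file = Get-Item $infile -ErrorAction Stop} #-ErrorAction Stop makes a missing file a terminating error so the Catch below runs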
Catch {Write-Output "$infile does not seem to exist, or I can't get to it"; break}
}
#Check outfile
#Reset value for pipeline loop
$outfile = $begin_outfile
#If blank just presume xlsx
if (-not($outfile)) {
$outfile = $file.FullName -replace '\.[^.]+$',".xlsx"
Write-Verbose "No outfile supplied, setting outfile to $outfile"
}
#If startswith a dot, use as an extension.
If ($outfile.StartsWith(".")) {
$outfile = $file.FullName -replace '\.[^.]+$',$outfile
Write-Verbose "Extension supplied, setting outfile to $outfile"
}
#derive XlFileFormat from extension
if($outfile -cmatch '\.[^.]+$') {
$extens="" #Reset for pipeline loop
switch ($Matches[0])
{
".xlsx" {$extens = 51}
".csv" {$extens = 6}
".xml" {$extens = 46}
".xls" {$extens = -4143}
".xlsm" {$extens = 52}
default {$extens = 51}
}
}
else {
break #if it can't find an extension in regex
}
if ($file.FullName -eq $outfile) {
#Nobody needs us to create a copy of an existing file.
write-verbose "Goal already achieved, moving on"
}
Else {
if(Test-Path ($outfile)){
#Avoid prompting to overwrite by removing an existing file of the same name
Remove-Item -path ($outfile)
}
Try {
Write-Verbose "Loop Check $infile"
if ($file.Extension -eq ".xml") {
#Make assumptions for XML. If you need more control don't automate
$Workbook = $ExcelWB.Workbooks.OpenXML($file.FullName,1)
}
else {
#Act Normal
$Workbook = $ExcelWB.Workbooks.Open($file.FullName)
}
$Workbook.SaveAs($outfile,$extens)
$Workbook.Close($false)
}
Catch {
Write-Host "Unable to convert file $file because Excel cannot open or save it without help"
break
}
if ($delete) {#If asked to delete
if(Test-Path ($outfile)){ #And a file now exists where outfile said it should be
if(Test-Path ($infile)){ #And there is a file at infile
Remove-Item -path ($infile) #Delete it
}
}
}
#Mostly to keep from breaking the pipeline, but not bad as an output of a file creator
Return $outfile
}
}
End{
#Cleanup
$ExcelWB.quit()
}
}
export-modulemember -function Convert-Excel
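
The path and format decisions made in the Process block can be tried out without starting Excel. The following is a minimal sketch of that logic, assuming a hypothetical helper name `Resolve-ExcelTarget` that is not part of the module above. The format codes are the same XlFileFormat values used in the function; unlike the original regex replace, `ChangeExtension` also appends the extension when the infile has none.

```powershell
# Hypothetical helper (not part of the original module): works out the output
# path and the XlFileFormat code roughly the way Convert-Excel does.
function Resolve-ExcelTarget {
    param(
        [Parameter(Mandatory = $true)][string] $InFile,  # full path of the source file
        [string] $OutFile                                # full name, bare extension, or empty
    )
    # No outfile supplied: default to .xlsx next to the source file.
    if (-not $OutFile) { $OutFile = '.xlsx' }
    # A leading dot means "same base name, new extension".
    if ($OutFile.StartsWith('.')) {
        $OutFile = [System.IO.Path]::ChangeExtension($InFile, $OutFile)
    }
    # Map the extension to an XlFileFormat value; unknown extensions fall back to xlsx (51).
    $formats = @{ '.xlsx' = 51; '.csv' = 6; '.xml' = 46; '.xls' = -4143; '.xlsm' = 52 }
    $ext = [System.IO.Path]::GetExtension($OutFile).ToLowerInvariant()
    $format = if ($formats.ContainsKey($ext)) { $formats[$ext] } else { 51 }
    [pscustomobject]@{ OutFile = $OutFile; Format = $format }
}
```

For instance, `Resolve-ExcelTarget -InFile 'C:\data\Source.xml' -OutFile '.csv'` returns the path `C:\data\Source.csv` together with format 6.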

Related

Using Powershell with MICROSOFT.ACE.OLEDB.12.0 to convert between CSV XML XLS XLSX XLSM

How do I convert files between CSV, XLS, XLSM, and XLSX to CSV, XLS, XLSX, and XML in Powershell without using Excel.Application? I would like to just use MICROSOFT.ACE.OLEDB.12.0.
I wrote this as a module first, but I now prefer it as a ps1 file that I dot-source.
You could also pop it in the top of a file and call it underneath.
A note about creating excel spreadsheets without using excel: I am not very good at it. It seems to be working with the data I throw at it. If you have a better way of setting up the schema and writing into the file I would be game, but I only needed success, not elegance.
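
The function that follows picks an ACE connection string from the input file's extension. Taken in isolation, that step might look like the sketch below. The helper name `Get-AceConnectionString` and the default provider name are assumptions for illustration; the Extended Properties values are the ones used in the function.

```powershell
# Hypothetical helper: builds the ACE OLEDB connection string for a source file.
function Get-AceConnectionString {
    param(
        [Parameter(Mandatory = $true)][string] $Path,       # full path of the file to read
        [string] $Provider = 'Microsoft.ACE.OLEDB.12.0'     # assumed default provider name
    )
    $ext = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    switch ($ext) {
        '.xls'  { $source = $Path; $props = 'Excel 12.0;HDR=Yes;IMEX=1' }
        '.xlsx' { $source = $Path; $props = 'Excel 12.0 Xml;HDR=Yes;IMEX=1' }
        '.xlsm' { $source = $Path; $props = 'Excel 12.0 Macro;HDR=Yes;IMEX=1' }
        default {
            # Text files: the folder acts as the "database" and the file name as the table.
            $source = [System.IO.Path]::GetDirectoryName($Path)
            $props  = 'text;HDR=Yes;IMEX=1'
        }
    }
    'Provider={0};Data Source="{1}";Extended Properties="{2}";' -f $Provider, $source, $props
}
```

For text input the query then selects from the file name (for example `select * from [Source.csv]`), which is why only the directory goes into the Data Source.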
<#
.SYNOPSIS
This function takes two file names and uses the Microsoft ACE OLEDB provider to convert from the first file to the second.
.DESCRIPTION
This function reads and writes files through the ACE OLEDB provider, without the Excel application. The infile can be xls, xlsx, xlsm, or a delimited text file such as csv. The outfile can be xlsx, xls, csv, or xml (written as PowerShell CLIXML). Also, there is an option to delete the destination file before running the save operation, and to erase the origin file after the process has completed.
.EXAMPLE
Convert-OLEDB -infile 'Source.xls' -outfile 'destination.xlsx' -delete $true
Converts source.xls to xlsx file type.
Deletes source.xls when done.
Deletes destination.xlsx before it converts.
.EXAMPLE
Convert-OLEDB -infile 'Source.xlsx' -outfile 'destination.csv'
Converts xlsx to csv.
Leaves both files behind when done.
.EXAMPLE
Convert-OLEDB -infile 'Source.csv'
Converts infile Source.csv (or whatever format) to xlsx of the same name.
Leaves both behind when done.
.EXAMPLE
Convert-OLEDB -infile 'Source.xlsx' -Extension '.csv'
Converts xlsx to csv. By passing just the extension it will use the same base file name.
Leaves both files behind when done.
.EXAMPLE
dir *.xls | Sort-Object -Property LastWriteTime | Convert-OLEDB -Extension ".csv"
Similar to above but uses the pipeline to do multiple conversions.
If full outfile name is given, it will create just one file over and over again. In this example it would go in chronological order creating csv files.
.EXAMPLE
Dir *.xls | Convert-OLEDB -extension ".csv" -delete $True | Convert-OLEDB -extension ".xls"
That's just weird, but it might solve your problem, and it works.
.PARAMETER infile
Name of the origin file to use. If the full path is not given it will be opened from the context the script is running in.
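.PARAMETER outfile
Name of the destination file to create (alias: Extension). If the value starts with a dot, it is treated as an extension and applied to the infile's full path. A relative name is not resolved by this function, so prefer a full path.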
.PARAMETER delete
If $true it will delete the target location file if it exists before conversion and the origin file after conversion. Functions like a move with clobber.
If anything else or blank it will leave origin in place and if destination exists it will prompt for overwrite.
#>
function Convert-OLEDB{
param(
[parameter(ValueFromPipeLineByPropertyName=$True,ValueFromPipeline=$True)]
[Alias('FullName')]
[string] $infile,
[Alias('Extension')]
[string] $outfile,
[bool] $delete
)
Begin {
$begin_outfile = $outfile
$oledb = (New-Object system.data.oledb.oledbenumerator).GetElements() | Sort -Property SOURCES_NAME | where {$_.SOURCES_NAME -like "Microsoft.ACE.OLEDB*"} | Select -Last 1
If ($oledb -eq $null) {
Write-Output "MICROSOFT.ACE.OLEDB does not seem to exist on this computer."
Write-Output "Please see https://www.microsoft.com/en-us/download/details.aspx?id=54920"
Write-Output "This can also happen if you have not installed the 32 or 64 bit version "
Write-Output "and you are running this script in the architecture that is missing it."
Write-Output "Solution is to do silent install on missing driver"
break
} else {
$Provider=$Oledb.SOURCES_NAME
Write-Verbose "Provider $Provider found on this computer"
}
}
Process{
#Check infile
if (-not($infile)) {
Write-Output "You must supply a value for -infile"
break
}
else {
Try {
$file = Get-Item $infile
$OleDbConn = New-Object "System.Data.OleDb.OleDbConnection";
$OleDbCmd = New-Object "System.Data.OleDb.OleDbCommand";
$OleDbAdapter = New-Object "System.Data.OleDb.OleDbDataAdapter";
$DataTable = New-Object "System.Data.DataTable";
$Source = $file.FullName
Switch($file.Extension){
".xls" {$OleDbConn.ConnectionString = "Provider=$Provider;Data Source=""$Source"";Extended Properties=""EXCEL 12.0;HDR=Yes;IMEX=1"";"}
".xlsx" {$OleDbConn.ConnectionString = "Provider=$Provider;Data Source=""$Source"";Extended Properties=""EXCEL 12.0 XML;HDR=Yes;IMEX=1"";"}
".xlsm" {$OleDbConn.ConnectionString = "Provider=$Provider;Data Source=""$Source"";Extended Properties=""EXCEL 12.0 MACRO;HDR=Yes;IMEX=1"";"}
Default {
$Source = $file.DirectoryName
$OleDbConn.ConnectionString = "Provider=$Provider;Data Source=""$Source"";Extended Properties=""text;HDR=Yes;IMEX=1"";" #Use the provider detected in Begin instead of a hard-coded version
}
}
$OleDbCmd.Connection = $OleDbConn;
$OleDbConn.Open();
Switch($file.Extension){
{$_ -in (".xls", ".xlsx",".xlsm")} {$FirstSheet = $OleDbConn.GetSchema("Tables").Rows[0].Table_Name}
Default {$FirstSheet = $file.Name}
}
$OleDbCmd.CommandText = "select * from [$FirstSheet];"
$OleDbAdapter.SelectCommand = $OleDbCmd;
$RowsReturned = $OleDbAdapter.Fill($DataTable);
$OleDbConn.Close();
}
Catch {Write-Output "$infile does not seem to exist, or I can't get to it"; break}
}
#Check outfile
#Reset value for pipeline loop
$outfile = $begin_outfile
#If blank just presume xlsx
if (-not($outfile)) {
$outfile = $file.FullName -replace '\.[^.]+$',".xlsx"
Write-Verbose "No outfile supplied, setting outfile to $outfile"
}
#If startswith a dot, use as an extension.
If ($outfile.StartsWith(".")) {
$outfile = $file.FullName -replace '\.[^.]+$',$outfile
Write-Verbose "Extension supplied, setting outfile to $outfile"
}
if ($file.FullName -eq $outfile) {
#Nobody needs us to create a copy of an existing file.
write-verbose "Goal already achieved, moving on"
}
Else {
if(Test-Path ($outfile)){
#Avoid prompting to overwrite by removing an existing file of the same name
Remove-Item -path ($outfile)
}
Try {
#derive XlFileFormat from extension
if($outfile -cmatch '\.[^.]+$') {
$extens="" #Reset for pipeline loop
switch ($Matches[0])
{
".csv" {
#Out to CSV
$DataTable | Export-Csv "$outfile" -NoTypeInformation
}
".xml" {
#Simple, poorly formed:
#$DataTable.TableName = $FirstSheet
#$DataTable.WriteXml("$outfile")
#BetterForm, strongly Typed, empty/Null differentiation
Export-Clixml -Path "$outfile" -InputObject $DataTable
}
Default {
#Out to xlsx
$OleDbConnOut = New-Object "System.Data.OleDb.OleDbConnection";
$OleDbCmdOut = New-Object "System.Data.OleDb.OleDbCommand";
$OleDbAdapterOut = New-Object "System.Data.OleDb.OleDbDataAdapter";
$DataTable2 = New-Object "System.Data.DataTable";
If($Matches[0] -eq ".xls"){$ext_swap = ""}
If($Matches[0] -eq ".xlsx"){$ext_swap = " XML"}
$OleDbConnOut.ConnectionString = "Provider=$Provider;Data Source=""$outfile"";MODE=ReadWrite;Extended Properties=""Excel 12.0$ext_swap"";"
$OleDbCmdOut.Connection = $OleDbConnOut;
$OleDbConnOut.Open();
$Create = "CREATE TABLE [Sheet2] ("
$Stuff = "INSERT INTO [Sheet2] ("
$Stuff2 = "VALUES ("
$DataTable.Columns | % {
$name = $_.ColumnName.Replace("#",".")
$Type = $_.DataType
$Stuff += "[$name], "
$Stuff2 += "?, "
$Create += "[$name] $Type, "
$atname = "#" + $name
}
$Stuff = $Stuff.TrimEnd(", ") + ")"
$Stuff2 = $Stuff2.TrimEnd(", ") + ")"
$Create = $Create.TrimEnd(", ") + ")"
$OleDbCmdOut.CommandText = $Create
$UpdateCount = $OleDbCmdOut.ExecuteNonQuery()
$OleDbAdapterOut.InsertCommand = $Stuff + " " + $Stuff2
$DataTable | % {
$Insert = "INSERT INTO [Sheet2] VALUES ("
$_.ItemArray | % {
$val = $_
$Type = $_.GetType().Name
switch ($Type) {
"Double" {
#Write "$Type"
$Insert += "$val, "
}
"DBNull" {
#Write "$Type"
$Insert += "NULL, "
}
Default {
#Write "$Type"
$val = $val.Replace("""","""""")
$Insert += """$val"", "
}
}
}
$Insert = $Insert.TrimEnd(", ") + ")"
$OleDbCmdOut.CommandText = $Insert
Try {
Write-Verbose "Attempting to run this command: $Insert"
$updatedcount = $OleDbCmdOut.ExecuteNonQuery()
}
Catch{Write-Output "$($Error[0])`nIt seems like there was a problem with this command: $Insert"}
}
$OleDbConnOut.Close();
}
}
}
else {
Write-Output "Unable to determine the output file extension, this really shouldn't happen"
break #if it can't find an extension in regex
}
}
Catch {
Write-Host "Unable to convert file $file because powershell with OLEDB cannot open or save it without help"
break
}
if ($delete) {#If asked to delete
if(Test-Path ($outfile)){ #And a file now exists where outfile said it should be
if(Test-Path ($infile)){ #And there is a file at infile
Remove-Item -path ($infile) #Delete it
}
}
}
}
}
End{
#Cleanup
}
}

Rename multiple files with string from .txt file using PowerShell

I'm currently working on a program that takes an .xml file, reads it into an Oracle database, and afterwards exports a new .xml file. The problem is that the new file has to have exactly the same name as the original file.
I saved the original filenames into a .txt file, and I'm now trying to search for a keyword inside the lines to rename the right files with the correct names from the .txt file. Here is an example:
My 4 files (exported from the Database):
PM.Data_information.xml
PM.Data_location.xml
PM.Cover_quality.xml
PM.Cover_adress.xml
Content of Namefile.txt (original names):
PM.Data_information_provide_SE-R-SO_V0220_657400509_3_210.xml
PM.Data_location_provide_SE-R-SO_V0220_9191200509_3_209.xml
PM.Cover_quality_provide_SE-R-SO_V0220_354123509_3_211.xml
PM.Cover_adress_provide_SE-R-SO_V0220_521400509_3_212.xml
I only worked out how to get a line by selecting the linenumber:
$content = Get-Content C:\Namefile.txt
$informationname = $content[0]
Rename-Item PM.Data_information.xml -NewName $informationname
Isn't there a way to select that line by searching for the keyword inside the string?
$content = Get-Content C:\temp\ps\NewFile.txt
$files = Get-ChildItem c:\temp\ps\
$content |
%{
$currentLine = $_
$file = $files | Where-Object { $currentLine.StartsWith($_.Name.Replace(".xml", "")) }
Rename-Item $file.FullName $currentLine # FullName so the rename works regardless of the current directory
}
This code should do the trick. Note you will need to have all of your files that need renaming in one folder. Set the folder path to the $files variable (currently set to c:\temp\ps). Set the path where your NewFile.txt is to the $content path.
The code works by looping around each line in the NewFile.txt and finding any file where the name matches the start of the line (if there are any files that do not follow this pattern you will obviously need to update the code but hopefully gives you a good starting point).
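
The prefix matching described above can also be isolated into a small lookup. This is a minimal sketch, assuming a hypothetical helper name `Get-OriginalName` and assuming every original name is the exported base name followed by an underscore, as in the example list in the question.

```powershell
# Hypothetical helper: returns the one original name that starts with "<base name>_".
function Get-OriginalName {
    param(
        [Parameter(Mandatory = $true)][string]   $ExportedName,  # e.g. PM.Data_location.xml
        [Parameter(Mandatory = $true)][string[]] $OriginalNames  # the lines of Namefile.txt
    )
    $base = [System.IO.Path]::GetFileNameWithoutExtension($ExportedName)
    $prefix = $base + '_'
    # Keep only the lines that begin with the prefix (case-insensitive).
    $hits = @($OriginalNames | Where-Object {
        $_.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)
    })
    # Refuse to guess when nothing matches or when the match is ambiguous.
    if ($hits.Count -ne 1) {
        throw "Expected exactly one match for '$ExportedName', found $($hits.Count)."
    }
    $hits[0]
}
```

The returned string can be passed straight to `Rename-Item -NewName`. Throwing on zero or multiple matches avoids renaming a file to the wrong name.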
other solution ;)
gci -Path "c:\temp" -File -Filter "*.xml" | % { rni $_.fullname (sls "C:\temp\Namefile.txt" -Pattern ([System.IO.Path]::GetFileNameWithoutExtension($_.fullname))).Line }

Powershell - Pulling string from txt, splitting it, then concatenating it for archive

I have an application where I get a list of new/modified files from git status, take the partial paths from that list, concatenate them with the root directory path, and then move those files to an archive. I have it half working, but the way I am running PowerShell does not surface error reports, and the process is obviously erroring out.
Here is the code I am trying to use (it has gone through several iterations, so please excuse the commented-out portions). Basically, I am trying to Get-Content from the txt file, replace / with \ (for some reason the process that creates the txt loves forward slashes...), and then split that string at the spaces. The only part of the string I am interested in is the last part, which I am trying to concatenate with the known working root directory; then I attempt to move those files to an archive location.
Before you ask, this is something we are not willing to track in git due to the nature of the files (they are time-stamped test outputs that we want to save on a per-test-run basis, not in git). I am still fairly new to PowerShell and have been banging my head against this rock for far too long.
Get-Content $outfile | Foreach-Object
{
#$_.Replace("/","\")
#$lineSplit = $_.Split(' ')
$_.Split(" ")
$filePath = "$repo_dir\$_[-1]"
$filePath.Replace('/','\')
"File Path Created: $filePath"
$untrackedLegacyTestFiles += $filePath
}
Get-Content $untrackedLegacyTestFiles | Foreach-Object
{
Copy-Item $_ $target_root -force
"Copying File: $_ to $target_root"
}
}
The $outfile is a text file where each line has a partial file path leading to a txt file generated by a test application we use. This info is provided by git, so it looks like this in the $outfile txt file:
!! Some/File/Path/Doc.txt
The "!!" means git sees it as a new file; however, it could be any of several status codes, from " M" to "??". That is why I am trying to split it on the spaces and take only the last element.
My desired output would be to take the last element of the split string from the $outfile (Some/File/Path/Doc.txt) and concatenate it with the $repo_dir to form a complete file path, then move Doc.txt to an archive location ($target_root).
To combine a path in PowerShell, you should use the Join-Path cmdlet. To extract the path from your string, you can use a regex:
$extractedPath = [regex]::Match('!! Some/File/Path/Doc.txt', '.*\s(.+)$').Groups[1].Value
$filePath = Join-Path $repo_dir $extractedPath
The Join-Path cmdlet will also convert all forward slashes to backslashes, so there is no need to replace them :-).
Your whole script could look like this:
Get-Content $outfile | Foreach-Object {
$path = Join-Path $repo_dir ([regex]::Match($_, '.*\s(.+)$').Groups[1].Value)
Copy-Item $path $target_root -force
}
If you don't like to use regex in your code, you can also extract the path using:
$extractedPath = '!! Some/File/Path/Doc.txt' -split ' ' | select -Last 1
or
$extractedPath = ('!! Some/File/Path/Doc.txt' -split ' ')[-1]
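
Splitting on spaces breaks as soon as a path itself contains a space. Git's short status format is fixed-width (two status characters, one space, then the path), so taking everything from the fourth character on is another option. A minimal sketch, assuming a hypothetical helper name `Get-StatusPath` and assuming the input contains no rename entries (`old -> new`) and no quoted paths:

```powershell
# Hypothetical helper: turns one short-status line into a full path under the repo root.
function Get-StatusPath {
    param(
        [Parameter(Mandatory = $true)][string] $Line,     # e.g. "!! Some/File/Path/Doc.txt"
        [Parameter(Mandatory = $true)][string] $RepoDir   # root directory of the repository
    )
    # Two status characters plus a space come first, so a valid line has at least 4 characters.
    if ($Line.Length -lt 4) { throw "Not a short-status line: '$Line'" }
    # Everything after the third character is the path, spaces included.
    $relative = $Line.Substring(3)
    Join-Path $RepoDir $relative
}
```

Inside the loop this would replace the regex call, for example `Copy-Item (Get-StatusPath $_ $repo_dir) $target_root -Force`.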

powershell convert all excel worksheets in csv just by pasting the file into a monitored folder

It's not really a question, because I already have an answer. I just wanted to give something back to everyone who posts and helps on the internet. It took me a good while to put together everything I found.
This script covers:
- Folder monitoring through events: you just need to paste an Excel file into a specified folder (or one of its subfolders) to automatically trigger the conversion.
- Excel (xls/xlsx) to csv. It picks up all worksheets from all Excel files within that folder and merges all the content into a single .csv file. If the content differs between the files/worksheets, it throws an exception. It obviously also works in the simpler scenario of a single Excel file with a single worksheet.
- Logs into a file for all different actions/events caught.
- Flushes memory to avoid event listening issues.
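
The merge-or-fail behaviour in the second bullet comes from `Export-Csv -Append`, which refuses to append objects whose properties do not match the header already in the file. A minimal sketch with made-up data and a temporary file (all names here are illustrative, not taken from the script):

```powershell
# Temporary target file for the demonstration.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) ("merge-demo-{0}.csv" -f [guid]::NewGuid())

# Two "worksheets" with the same columns merge into one file.
$sheet1 = [pscustomobject]@{ Id = 1; Name = 'alpha' }
$sheet2 = [pscustomobject]@{ Id = 2; Name = 'beta' }
$sheet1 | Export-Csv -Path $csv -NoTypeInformation
$sheet2 | Export-Csv -Path $csv -NoTypeInformation -Append

# A "worksheet" with a different structure makes the append fail.
$other = [pscustomobject]@{ Code = 'x' }
$appendFailed = $false
try {
    $other | Export-Csv -Path $csv -NoTypeInformation -Append -ErrorAction Stop
} catch {
    $appendFailed = $true
}

# Read the merged rows back, then clean up the temporary file.
$rows = @(Import-Csv -Path $csv)
Remove-Item -Path $csv
```

Adding `-Force` to the append would suppress the error and write the mismatched rows anyway, which is usually not what you want when the goal is a single consistent csv.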
Here we go:
<#-----------------------------------------------------------------------#
Purpose: To enable users to paste excel/txt files into a single folder and automatically convert them to .csv.
The resulting .csv adopts its parent's folder name.
If more than 1 excel/txt file is pasted in the same folder, these multiple files are merged into a single .csv file as long as they have the same structure, otherwise
an exception is raised.
This script watches constantly these events on a defined folder in this script itself:
-Creation of files/folders -> Currently enabled.
-Modification of files/folders -> Currently disabled. (see main region to activate this event)
-Deletion of files/folders -> Currently disabled. (see main region to activate this event)
-Renaming of files/folders -> Currently disabled. (see main region to activate this event)
This script, when run from powershell, will output to console some messages, but it also maintains a physical log file on the server.
The location of this log file is defined within the writeToLogFile function.
#------------------------------------------------------------------------#>
#------------------------------------------------------------------------#
########################## FUNCTIONS REGION ##############################
#------------------------------------------------------------------------#
function writeToLogFile {
param ([string]$strLogLine = $(throw 'must supply a log line to be inserted!')) #param1: the log line to be inserted/appended to the log
$logFile = "D:\OneLoader\OneLoaderLog.txt" #Path and name of log file. Modify this if required.
#checks if log file is greater than 5mb and renames it with system date to allow the creation of a new log file.
if ((Test-Path $logFile) -and ((Get-Item $logFile).Length -gt 5mb)) { #Checks that the log exists and is greater than 5 megabytes.
$renamedLog = $logFile -replace '\.txt$', "_$(Get-Date -Format yyyyMMMdd).txt" #prepares new log file name, it suffixes sysdate to allow the creation of a new log file.
rename-item -path $logFile -newname $renamedLog #renames the current log file.
}
$line = "$(Get-Date): $strLogLine" #Prepares the log line to be inserted: It prefixes the system date to whatever you want to insert in the log.
Write-host $line
Add-content $logFile -value $line #Appends a log line to the the log file.
}
<#-----------------------------------------------------------------------#
Function getCSVFileName
Purpose: Given an excel file, it checks for the file's existence and returns a new file name with the name equal to the parent folder and the extension .csv
Parameters:
1) parameter $strFileName: Full path name to an excel file. e.g.: D:\Folder1\ChildFolder\ExcelFile.xlsx
#------------------------------------------------------------------------#>
function getCSVFileName {
param ( [string]$strFileName = $(throw 'must supply a file name!')) #parm1: The excel file name.
#Test if the path to the excel file is correct. if not, exits the function.
if (-not (Test-Path $strFileName)) {
throw "Path $strFileName does not exist."
}
$parentFolder= Split-Path (Split-Path $strFileName -Parent) -Leaf #Obtains the most inner folder name. E.g: C:\Folder1\Folder2\file.txt --> RESULT: Folder2
#$justFileName = split-path $strFileName -leaf -resolve #Obtains just the file name. E.g: C:\Folder1\Folder2\file.txt --> RESULT: file.txt
$baseFolder = Split-path $strFileName #Obtains the file's base folder name. E.g: C:\Folder1\Folder2\file.txt --> RESULT: C:\Folder1\Folder2
$fileNameToCSV = $baseFolder + '\' + $parentFolder + '.csv' #Build a string for the new .csv file name. The file is renamed to match the parent's folder name.
return $fileNameToCSV
} #End of function getCSVFileName
<#-----------------------------------------------------------------------#
Function xls-csv
Purpose: Given an excel file and a sheet name within that file, it converts the contents of that sheet into a .csv file and places it in the same location as the excel file
Parameters:
1) parameter $strFileName: Full path name to an excel file. e.g.: D:\Folder1\ChildFolder\ExcelFile.xlsx
#------------------------------------------------------------------------#>
function xls-csv {
param (
[string]$strFileName = $(throw 'Must supply a file name!') #parm 1: Excel file. full path.
)
try{
$newFileNameCSV = getCSVFileName $strFileName #Obtains new .csv file name from function
}
Catch {
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
$logLine="Exception occurred while renaming file to csv. FailedItem: $FailedItem. The error message was $ErrorMessage"
writeToLogFile $logLine #Writes a line to the log
return
}
#Checking if the excel file was not already converted to CSV. If it was, it appends the content of the xls. This is done in case there are more than one excel file in the same directory.
writeToLogFile "Converting $strFileName" #Writes a line to the log
#BEGIN SECTION CONFIG
#These parameters are required to setup the connection to the OLEDB adapter. Must not change unless necessary.
$strProvider = "Provider=Microsoft.ACE.OLEDB.12.0"
$strDataSource = "Data Source = $strFileName"
$strExtend = "Extended Properties='Excel 12.0;HDR=Yes;IMEX=1';"
#END SECTION CONFIG
#BEGIN SECTION CONNECTION
Try {
#These steps establish the connection and the query command that is passed to the OleDB adapter. Must not change unless necessary.
$objConn = New-Object System.Data.OleDb.OleDbConnection("$strProvider;$strDataSource;$strExtend")
$sqlCommand = New-Object System.Data.OleDb.OleDbCommand
$sqlCommand.Connection = $objConn
$objConn.open()
#END SECTION CONNECTION
#BEGIN SECTION SELECT QUERY
#Obtains all worksheets within the excel file and converts the content of each one of them into csv
$objConn.GetSchema("Tables") |
ForEach-Object {
if($_.Table_Type -eq "TABLE")
{
$wrksheet= $_.Table_Name
$strQuery = "Select * from [$wrksheet]" #Query to read all content from worksheet
$sqlCommand.CommandText = $strQuery
$da = New-Object system.Data.OleDb.OleDbDataAdapter($sqlCommand)
$dt = New-Object system.Data.datatable
[void]$da.fill($dt) #fills a datatable with the content of the worksheet
if (-not (Test-Path $newFileNameCSV )) {
#Pipes the contents of the datatable into a NEW CSV File. Export-Csv function is a native PowerShell function.
$dt | Export-Csv $newFileNameCSV -Delimiter ',' -NoTypeInformation
} else {
#Pipes the contents of the datatable and APPENDS it to the existing CSV File. Export-Csv function is a native PowerShell function.
$dt | Export-Csv $newFileNameCSV -Delimiter ',' -NoTypeInformation -Append
} #end of if-else
#END SECTION SELECT QUERY
}
}
$objConn.close()
}
Catch {
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
$logLine = "Exception occurred while converting excel file: '$strFileName', worksheet name: $wrksheet --> FailedItem: $FailedItem. The error message was $ErrorMessage"
writeToLogFile $logLine #Writes a line to the log
$objConn.close() #close the file in case of errors so it doesn't get locked by a user.
return
}
try {
#Once an excel file is converted successfully, the extension is changed so it's not picked up again by the script.
if ($strFileName -like '*.xlsx') {
$renamedExcel = $strFileName -replace '\.xlsx$', '.old' #Renames xlsx to .old extension
}else {
$renamedExcel = $strFileName -replace '\.xls$', '.old' #Renames xls to .old extension
}
rename-item -path $strFileName -newname $renamedExcel #changes the extension of the recently converted excel file.
}
Catch {
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
$logLine = "Exception occurred while renaming the original file. FailedItem: $FailedItem. The error message was $ErrorMessage"
writeToLogFile $logLine #Writes a line to the log
return
}
writeToLogFile "Converted $strFileName to $newFileNameCSV" #Writes a line to the log
return
} #end of function xls-csv
<#-----------------------------------------------------------------------#
Function txt-csv
Purpose: Given a txt file, it converts the contents of that txt into a .csv file and places it in the same location as the txt file
Parameters:
1) parameter $strFileName: Full path name to a txt file. e.g.: D:\Folder1\ChildFolder\ExcelFile.txt
#------------------------------------------------------------------------#>
function txt-csv {
param ( [string]$strFileName = $(throw 'Must supply a file name!') ) #parm 1: txt file. full path.
try{
$newFileNameCSV = getCSVFileName $strFileName #Obtains new .csv file name from function
}
Catch {
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
$logLine="Exception occurred while renaming file to csv. FailedItem: $FailedItem. The error message was $ErrorMessage"
writeToLogFile $logLine #Writes a line to the log
return
}
#Checking if the txt file was not already converted to CSV. If it was, it appends the content of the txt. This is done in case there are more than one txt files in the same directory.
writeToLogFile "Converting $strFileName" #Writes a line to the log
try {
if (-not (Test-Path $newFileNameCSV )) {
#Pipes the contents of the txt file into a NEW CSV File. Export-Csv function is a native PowerShell function.
Import-Csv -Path $strFileName | Export-Csv -Path $newFileNameCSV -Delimiter ',' -NoTypeInformation
} else {
#Pipes the contents of the txt file and APPENDS it to the existing CSV File. Export-Csv function is a native PowerShell function.
Import-Csv -Path $strFileName | Export-Csv -Path $newFileNameCSV -Delimiter ',' -NoTypeInformation -Append
} #end of if-else
#Once a txt file is converted successfully, the extension is changed so it's not picked up again by the script.
$renamedFile = $strFileName -replace '\.txt$', '.old' #Renames txt to .old extension
rename-item -path $strFileName -newname $renamedFile #changes the extension of the recently converted txt file.
}
Catch {
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
$logLine = "Exception occurred while exporting to csv. FailedItem: $FailedItem. The error message was $ErrorMessage"
writeToLogFile $logLine #Writes a line to the log
return
}
writeToLogFile "Converted $strFileName to $newFileNameCSV" #Writes a line to the log
return
} #end of function txt-csv
<#-----------------------------------------------------------------------#
Function convertDirectory
Purpose: Given directory, it looks for .xls and .xlsx files recursively and calls the xls-csv function to perform the conversion
Parameters:
1) parameter $Directory: Full directory path to scan for excel files
#------------------------------------------------------------------------#>
function convertDirectory {
param ( [string]$Directory = $(throw 'Must supply a folder name!'))
if (-not (Test-Path $Directory)) {
throw "Path '$Directory' does not exist."
return
}
#### BEGIN EXCEL CONVERSION SECTION ###
#Gets list of files within the folder and filters by .xls and .xlsx extensions
$dir = Get-ChildItem -path $($Directory + "\*") -include *.xls,*.xlsx
foreach($file in $dir) #Loops through all excel files
{
writeToLogFile "Found file $file candidate to conversion" #Writes a line to the log
xls-csv $file.FullName #Calls the function to convert the excel file into csv. xls-csv takes only the file name and converts every worksheet it finds.
}
#### END EXCEL CONVERSION SECTION ###
#### BEGIN TXT CONVERSION SECTION ###
#Gets list of files within the folder and filters by .txt extension
$dir = Get-ChildItem -path $($Directory + "\*") -include *.txt
foreach($file in $dir) { #Loops through all txt files
writeToLogFile "Found file $file candidate to conversion" #Writes a line to the log
txt-csv $file.FullName #Calls the function to convert the txt file into csv.
}
#### END EXCEL CONVERSION SECTION ##
} #end of function convertDirectory
<#-----------------------------------------------------------------------#
Function flushMemory
Purpose: since this is a While{true} script, it may end abruptly. This function is called at the beginning of MAIN REGION to clear all possible allocated
space of memory and, more importantly, to unregister all possible IO event handlers on the OneLoader directory.
Parameters: None.
#------------------------------------------------------------------------#>
Function flushMemory {
# Find out how much memory is being consumed by your Session:
#[System.gc]::gettotalmemory("forcefullcollection") /1MB #Uncomment in case of debugging a memory leak
# Force a collection of memory by the garbage collector:
[System.gc]::collect()
# Dump all variables not locked by the system:
foreach ($i in (ls variable:/*)) {rv -ea 0 $i.Name} # -verbose $i.Name} #you can include the verbose argument to get the list of variables out of bound.
#Check memory usage again and force another collection:
#[System.gc]::gettotalmemory("forcefullcollection") /1MB #Uncomment in case of debugging a memory leak
[System.gc]::collect()
#Check Memory once more:
#[System.gc]::gettotalmemory("forcefullcollection") /1MB #Uncomment in case of debugging a memory leak
#Unregister events created by previous instances of this script
get-eventsubscriber -force | unregister-event -force #THIS LINE IS REALLY IMPORTANT
}
#------------------------------------------------------------------------#
############################ MAIN REGION #################################
#------------------------------------------------------------------------#
writeToLogFile "Script 'OneLoader monitor' initiated at $(Get-Date)" #Writes a line to the log
###IMPORTANT! KEEP CALL TO flushMemory function
flushMemory #!!!!!!IMPORTANT
### SET FOLDER TO WATCH + FILES TO WATCH + SUBFOLDERS YES/NO
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = "D:\OneLoader" #IMPORTANT: Defines the OneLoader folder to be monitored. Don't put a slash "\" on the end or a puppy will die.
$watcher.Filter = "*.*"
$watcher.IncludeSubdirectories = $true
$watcher.EnableRaisingEvents = $true
### DEFINE ACTIONS AFTER AN EVENT IS DETECTED
$action = {
$eventFullPath = $Event.SourceEventArgs.FullPath #obtains full path of file that got created
$changeType = $Event.SourceEventArgs.ChangeType #obtains the type of event captured.
$logLine = "$changeType, $eventFullPath" #Prepares the log line that will be inserted in the log file.
writeToLogFile $logLine #Writes a line to the log
try{
$eventBaseDirectory = split-path $eventFullPath #extracts the base directory from the full path of the file recently created.
convertDirectory $eventBaseDirectory #calls the function to convert all excel files within the directory caught by the event.
}
Catch {
$ErrorMessage = $_.Exception.Message
$FailedItem = $_.Exception.ItemName
$logLine = "Exception occurred while discovering excel files in folder. FailedItem: $FailedItem. The error message was $ErrorMessage"
writeToLogFile $logLine #Writes a line to the log
}
} #End of $action
### DECIDE WHICH EVENTS SHOULD BE WATCHED + SET CHECK FREQUENCY.
#Uncomment the events you want this script to monitor over the folder.
$created = Register-ObjectEvent $watcher "Created" -Action $action
#$changed = Register-ObjectEvent $watcher "Changed" -Action $action
#$deleted = Register-ObjectEvent $watcher "Deleted" -Action $action
#$renamed = Register-ObjectEvent $watcher "Renamed" -Action $action
while ($true) {sleep 5}
function Save-CSVasExcel {
param (
[string]$CSVFile = $(Throw 'No file provided.')
)
BEGIN {
function Resolve-FullPath ([string]$Path) {
if ( -not ([System.IO.Path]::IsPathRooted($Path)) ) {
# $Path = Join-Path (Get-Location) $Path
$Path = "$PWD\$Path"
}
[IO.Path]::GetFullPath($Path)
}
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
$CSVFile = Resolve-FullPath $CSVFile
$xl = New-Object -com 'Excel.Application'
}
PROCESS {
$wb = $xl.workbooks.open($CSVFile)
$xlOut = $CSVFile -replace '\.csv$', '.xlsx'
$ws = $wb.Worksheets.Item(1)
$range = $ws.UsedRange
[void]$range.EntireColumn.Autofit()
$num = 1
$dir = Split-Path $xlOut
$base = $(Split-Path $xlOut -Leaf) -replace '\.xlsx$'
$nextname = $xlOut
while (Test-Path $nextname) {
$nextname = Join-Path $dir $($base + "-$num" + '.xlsx')
$num++
}
$wb.SaveAs($nextname, 51)
}
END {
$xl.Quit()
$null = $ws, $wb, $xl | % {Release-Ref $_}
# del $CSVFile
}
}
 
function Save-ExcelasCSV {
param (
[string[]]$files = $(Throw 'No files provided.'),
[string]$OutFolder,
[switch]$Overwrite
)
BEGIN {
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
$xl = New-Object -ComObject 'Excel.Application'
$xl.DisplayAlerts = $false
$xl.Visible = $false
}
PROCESS {
foreach ($file in $files) {
$file = Get-Item $file | ? {$_.Extension -match '^\.xlsx?$'}
if (!$file) {continue}
$wb = $xl.Workbooks.Open($file.FullName)
if ($OutFolder) {
$CSVfilename = Join-Path $OutFolder ($file.BaseName + '.csv')
} else {
$CSVfilename = $file.DirectoryName + '\' + $file.BaseName + '.csv'
}
if (!$Overwrite -and (Test-Path $CSVfilename)) {
# Pick the next free name (base-1.csv, base-2.csv, ...) instead of overwriting
$num = 1
$folder = Split-Path $CSVfilename
$base = [IO.Path]::GetFileNameWithoutExtension($CSVfilename)
$ext = [IO.Path]::GetExtension($CSVfilename)
while (Test-Path $CSVfilename) {
$CSVfilename = Join-Path $folder $($base + "-$num" + $ext)
$num += 1
}
}
$wb.SaveAs($CSVfilename, 6) # 6 -> csv
$wb.Close($True)
$CSVfilename
}
}
END {
$xl.Quit()
$null = $wb, $xl | % {try{ Release-Ref $_ }catch{}}
}
}
Also, this might be useful if the Excel workbook has multiple worksheets:
http://www.codeproject.com/Articles/451744/Extract-worksheets-from-Excel-into-separate-files
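The same idea can be sketched with the COM approach used in the functions above: copy each worksheet into its own temporary workbook and save that copy as CSV. This is only a minimal sketch and not the code from the linked article. The names Export-WorksheetsToCsv and Get-SheetCsvPath are hypothetical, and it assumes Excel is installed and that $WorkbookPath is a full path.

```powershell
# Hypothetical helper: builds "<OutFolder>\<workbook>-<sheet>.csv" and replaces
# characters that are not allowed in file names with an underscore.
function Get-SheetCsvPath ([string]$WorkbookPath, [string]$SheetName, [string]$OutFolder) {
    $base = [IO.Path]::GetFileNameWithoutExtension($WorkbookPath)
    $safe = $SheetName -replace '[\\/:*?"<>|]', '_'
    [IO.Path]::Combine($OutFolder, "$base-$safe.csv")
}

# Hypothetical function: writes one CSV per worksheet (requires Excel).
function Export-WorksheetsToCsv ([string]$WorkbookPath, [string]$OutFolder) {
    $xl = New-Object -ComObject 'Excel.Application'
    $xl.DisplayAlerts = $false   # suppress overwrite prompts
    $xl.Visible = $false
    try {
        $wb = $xl.Workbooks.Open($WorkbookPath)
        foreach ($ws in $wb.Worksheets) {
            [void]$ws.Copy()             # no arguments: the copy lands in a new workbook
            $copy = $xl.ActiveWorkbook   # the new single-sheet workbook
            $csv = Get-SheetCsvPath $WorkbookPath $ws.Name $OutFolder
            $copy.SaveAs($csv, 6)        # 6 -> csv
            $copy.Close($false)
            $csv                         # emit the path that was written
        }
        $wb.Close($false)
    }
    finally {
        $xl.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($xl)
    }
}
```

Usage would look something like Export-WorksheetsToCsv 'C:\data\Book1.xlsx' 'C:\data\out', where both paths are placeholders.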

Powershell - parsing a PDF file for a literal or image

Using Powershell & running PowerGUI. I have a PDF file that I need to search through in order to find if there was an attachment referenced within the content of a particular page. Either that, or I need to search for images, such as a Microsoft Word or Excel icon or a PDF icon within the document.
I am using the following code to read in the page:
Add-Type -Path "c:\itextsharp-all-5.4.5\itextsharp-dll-core\itextsharp.dll"
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList "c:\files\searchfile.pdf"
for ($page = 1; $page -le 3; $page++) {
$lines = [char[]]$reader.GetPageContent($page) -join "" -split "`n"
foreach ($line in $lines) {
if ($line -match "^\[") {
$line = $line -replace "\\([\S])", '$1'
$line -replace "^\[\(|\)\]TJ$", "" -split "\)\-?\d+\.?\d*\(" -join ""
}
}
}
However, the above gives a few bits of text, but mostly unprintable characters.
How can I search a PDF file with PowerShell for a literal (like ".doc" or ".xlsx")? Can a PDF also be searched for a graphic (like the Excel or Word icon)?
Without seeing the raw PDF content, it's not easy to give specific help, so if you can share a sample PDF or its contents, that would be helpful.
Once you know what to look for in the stream, you can search by reading in the file line by line and using the -match operator:
$file = [io.file]::ReadAllLines('C:\test.pdf')
$title = ($file -match "<rdf:li")[0].Split(">")[1].Split("<")[0]
$description = ($file -match "<rdf:li")[2].Split(">")[1].Split("<")[0]
write-host ("Title: " + $title)
write-host ("Description: " + $description)
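The same approach can be applied to literals such as ".doc" or ".xlsx". The sketch below decodes the raw bytes one-to-one as Latin-1 text and counts case-insensitive matches. The function name Find-PdfLiteral is hypothetical, and the sketch assumes the text you are looking for is stored uncompressed in the file; anything inside a compressed stream will not be found this way.

```powershell
function Find-PdfLiteral {
    param (
        [string]$Path,
        [string[]]$Literal = @('.doc', '.xlsx')
    )
    # Resolve to a full path because .NET file APIs ignore the PowerShell location
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [IO.File]::ReadAllBytes($full)
    # Latin-1 maps every byte to exactly one character, so nothing is lost
    $text = [Text.Encoding]::GetEncoding('iso-8859-1').GetString($bytes)
    foreach ($item in $Literal) {
        $pattern = [regex]::Escape($item)   # treat the literal as plain text
        [pscustomobject]@{
            Literal = $item
            Count   = [regex]::Matches($text, $pattern, 'IgnoreCase').Count
        }
    }
}
```

For example, Find-PdfLiteral -Path 'c:\files\searchfile.pdf' returns one row per literal with the number of hits. Note that '.doc' also matches '.docx'. Passing -Literal '/EmbeddedFiles' is another thing to try, since that key is how a PDF normally declares file attachments.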
I doubt very much that the contents of the file will tell you much more than that an image exists at particular page coordinates (although I'm by no means a PDF expert). It may also include the binary file stream, in which case you may be able to save that stream as a file (I haven't tried that yet).
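Whether a PDF contains embedded images at all can often be checked from the raw file, because image objects are declared with a /Subtype /Image entry in their dictionary. The sketch below counts those markers. The function name Get-PdfImageMarkerCount is hypothetical, and it assumes the object dictionaries are not packed into compressed object streams. It only tells you that images exist, not what they depict, so it cannot tell a Word or Excel icon from any other picture.

```powershell
function Get-PdfImageMarkerCount ([string]$Path) {
    # Resolve to a full path because .NET file APIs ignore the PowerShell location
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [IO.File]::ReadAllBytes($full)
    # Latin-1 maps every byte to exactly one character
    $text = [Text.Encoding]::GetEncoding('iso-8859-1').GetString($bytes)
    # Whitespace between the two names is optional in PDF syntax
    [regex]::Matches($text, '/Subtype\s*/Image\b').Count
}
```

For example, Get-PdfImageMarkerCount 'c:\files\searchfile.pdf' returning 0 suggests either no images or a file that stores its objects compressed.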

Resources