How to change the delimiter when saving a CSV from Excel using PowerShell

Using PowerShell and Excel 2016, I'm trying to open a .xlsx file, extract a single page, and save this page as a .csv with a ";" delimiter. The problem is that while Excel expects a ";" delimiter when opening a CSV file, it always saves them with a "," delimiter.
I'd prefer not to have to change any settings; this is a script I'm writing for a project that needs to work natively on any PC, so having to go and change settings every time it runs on another computer would be problematic.
I already checked that the list delimiter setting in Windows is indeed ";", and it is.
I tried every type of CSV saving described in the Microsoft doc (https://learn.microsoft.com/fr-fr/office/vba/api/excel.xlfileformat).
What's weird is that when saving a file from the GUI version, I only see 3 variants of CSV instead of the 5 listed on that page, and one of them, "CSV with ';' delimiter", works as intended, but I can't seem to use that file type when saving through Excel via PowerShell.
There's apparently a "Local" flag that can be activated so that Excel uses the delimiter settings of Windows, but I'm not sure how to set it from PowerShell, and I'd prefer not to use it anyway, since the script then wouldn't work on a Windows installation with a different delimiter configuration.
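For reference, here is what I believe setting that flag would look like, based on the Worksheet.SaveAs parameter order (FileName, FileFormat, Password, WriteResPassword, ReadOnlyRecommended, CreateBackup, AddToMru, TextCodepage, TextVisualLayout, Local). I'm including it only for completeness, since I'd rather not depend on regional settings; $excel_worksheet is the same variable as in my script below:
$missing = [Type]::Missing
# 6 = xlCSV; the final $true is the Local flag (use the system's list separator)
$excel_worksheet.SaveAs($args[1], 6, $missing, $missing, $missing, $missing, $missing, $missing, $missing, $true)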
# Args[0] : file to open
# [1] : file to save
# page_to_extract : name of the page I need
# I open an Excel session
$excel_session = New-Object -Com Excel.Application
$excel_session.displayAlerts = $false
# I open the file I need to extract the page from
$excel_workbook = $excel_session.workbooks.open($args[0])
# I load in the page
$excel_worksheet = $excel_workbook.worksheets($page_to_extract)
# I save the page using a csv type (6,22,24,62,23)
$excel_worksheet.saveAs($args[1], 6)
$excel_session.quit()
This code always saves my CSV with a "," delimiter; I need ";" instead.
I need to use PowerShell and ONLY PowerShell for this: no Windows settings, no Excel settings.

I had success with the following code with my own data. This uses your COM Object assignment code. I added logic to extract the cells that contain data, add that data to a new custom object on each row iteration, store each custom object in an array, and finally pipe the array into Export-Csv. Your specified delimiter ; is used in the Export-Csv command.
$excel_session = New-Object -Com Excel.Application
$excel_session.displayAlerts = $false
# I open the file I need to extract the page from
$excel_workbook = $excel_session.workbooks.open($args[0])
# I load in the page
$excel_worksheet = $excel_workbook.worksheets($page_to_extract)
# Get Range of Used Cells in Worksheet
$range = $excel_worksheet.usedrange
# Get First Row Column Text to be Used as Object Properties
$headers = $range.rows.item(1).value2
# Loop through Rows and Columns to Extract Data
# First loop traverses rows
# Second loop traverses columns
$output = for ($i = 2; $i -le $range.rows.count; $i++) {
    $hash = [ordered]@{}
    for ($j = 1; $j -le $range.columns.count; $j++) {
        [void]$hash.Add($headers.GetValue(1,$j), $range.rows.item($i).columns.item($j).Text)
    }
    [pscustomobject]$hash
}
$output | Export-Csv file.csv -NoType -Delimiter ';'
# Quit Excel and Clean Up COM Objects
$excel_session.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel_workbook)
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel_session)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

An inefficient, but simple and pragmatic workaround is to:
Use your code as-is to let Excel temporarily produce an interim ,-separated CSV file.
Import that file with Import-Csv (which uses , by default), and export again with Export-Csv -Delimiter ';'.
In the context of your code:
(Import-Csv $args[1]) | Export-Csv $args[1] -Delimiter ';' -NoTypeInformation
Note:
The Import-Csv call is enclosed in (...) to ensure that the input file is read in full up front, which enables writing back to the same file in the same pipeline.
Export-Csv, sadly, defaults to ASCII(!) encoding in Windows PowerShell; if your data contains non-ASCII characters, specify an appropriate encoding with -Encoding.
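For example, assuming UTF-8 is acceptable for whatever consumes the file:
(Import-Csv $args[1]) | Export-Csv $args[1] -Delimiter ';' -NoTypeInformation -Encoding UTF8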

The List Separator is a Windows regional setting.
To change it, please see:
https://support.office.com/en-us/article/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba
Change the separator in all .csv text files:
In Microsoft Windows, click the Start button, and then click Control Panel.
Open the dialog box for changing Regional and Language settings.
Type a new separator in the List separator box.
Click OK twice.
Note: After you change the list separator character for your
computer, all programs use the new character as a list separator. You
can change the character back to the default character by following
the same procedure.
You should now be able to change the CSV delimiter character.
Please note that you'll need to restart your computer for the change to take effect. You can check the current List Separator value in your PowerShell session with (Get-Culture).TextInfo.ListSeparator.
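For example, on a machine whose list separator is configured as a semicolon, you would see:
PS C:\> (Get-Culture).TextInfo.ListSeparator
;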
You can also check this post, which has a lot of screenshots and several other options on how to do so: https://superuser.com/questions/606272/how-to-get-excel-to-interpret-the-comma-as-a-default-delimiter-in-csv-files

My recommendation is to avoid Excel and query the workbook through the OLEDB database objects instead. Example:
[CmdletBinding()]
param(
    [Parameter(Position = 0, Mandatory = $true)]
    [ValidateNotNullOrEmpty()]
    $ExcelFileName,
    [Parameter(Position = 1, Mandatory = $true)]
    [ValidateNotNullOrEmpty()]
    $SheetName
)
$queryString = 'SELECT * FROM [{0}$A1:end]' -f $SheetName
# NOTE: for .xlsx workbooks, "Excel 12.0 Xml" may be required instead of "Excel 8.0"
$connectionString = ("Provider=Microsoft.ACE.OLEDB.12.0;" +
    "Data Source=$((Get-Item -LiteralPath $ExcelFileName -ErrorAction Stop).FullName);" +
    "Extended Properties=Excel 8.0;")
try {
    $connection = New-Object Data.OleDb.OleDbConnection($connectionString)
    $command = New-Object Data.OleDb.OleDbCommand($queryString)
    $command.Connection = $connection
    $connection.Open()
    $adapter = New-Object Data.OleDb.OleDbDataAdapter($command)
    $dataTable = New-Object Data.DataTable
    [Void] $adapter.Fill($dataTable)
    $dataTable
}
catch [Management.Automation.MethodInvocationException] {
    Write-Error $_
}
finally {
    $connection.Close()
}
If the above script is Import-ExcelSheet.ps1, you could export to a ;-delimited CSV file by running a command such as:
Import-ExcelSheet "C:\Import Files\ExcelFile.xlsx" "Sheet1" |
    Export-Csv "C:\Import Files\Test.csv" -Delimiter ';' -NoTypeInformation
If you have the 32-bit version of Excel installed, you will need to run the above script in the 32-bit version of PowerShell.
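A sketch of one way to do that on 64-bit Windows, where the 32-bit PowerShell executable lives under SysWOW64 (the script path here is a placeholder; doing the Export-Csv inside the 32-bit process keeps the objects intact):
& "$env:windir\SysWOW64\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -Command {
    & 'C:\Scripts\Import-ExcelSheet.ps1' "C:\Import Files\ExcelFile.xlsx" "Sheet1" |
        Export-Csv "C:\Import Files\Test.csv" -Delimiter ';' -NoTypeInformation
}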
If you don't want to license Excel or can't install it on some computer where you want to run the script, you can install the Access database engine instead:
https://www.microsoft.com/en-us/download/details.aspx?id=54920
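To check whether the ACE OLEDB provider is already present before installing anything, something like this should work (the SOURCES_* column names come from the standard OLEDB enumerator schema):
(New-Object System.Data.OleDb.OleDbEnumerator).GetElements().Rows |
    Where-Object { $_.SOURCES_NAME -like 'Microsoft.ACE.OLEDB*' } |
    Select-Object SOURCES_NAME, SOURCES_DESCRIPTION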

Related

Use powershell clipboard to copy rows from excel in browser

I hope you can help me with my PowerShell problem. I want to use the Set-Clipboard and Get-Clipboard commands from PowerShell to copy specific rows from my Excel file into two different text fields on a specific browser site.
The problem is that the content copied from the rows in my Excel file always ends up in the first text field only. For example: the content of cell A1 is "Hello" and the content of cell A2 is "World". If I copy the rows into the text fields of the browser site, both strings "Hello" "World" are displayed only in textfield1. My goal is to have the string "Hello" in textfield1 and the string "World" in textfield2, and later to have a command bound to keys on my keyboard to paste the content, like KeePass does with credentials.
Here's what I've done so far. I used a little help from the site https://lazywinadmin.com/2014/03/powershell-read-excel-file-using-com.html and tried to split both rows into two different strings, then tried to copy both strings with "Get-Clipboard" into the browser site.
#Specify the path of the excel file
$FilePath = "PathToMyExcelFile\Test-Excel-Auto2.xlsx"
#Specify the Sheet name
$SheetName = "table1"
# Create an Object Excel.Application using Com interface
$objExcel = New-Object -ComObject Excel.Application
#$objExcel = new-object -c excel.application
# Disable the 'visible' property so the document won't open in excel
$objExcel.Visible = $false
# Open the Excel file and save it in $WorkBook
$WorkBook = $objExcel.Workbooks.Open($FilePath)
# Load the worksheet specified in $SheetName
$WorkSheet = $WorkBook.sheets.item($SheetName)
Set-Clipboard $WorkSheet.Range("A1").Text
Set-Clipboard $WorkSheet.Range("A2").Text -Append
(Get-Clipboard) -split "'n'"
# Get-Clipboard) -split '\t|\r?\n'
# Get-Clipboard.Split ( "'\t|\r?\n'")
# Set-Clipboard $WorkSheet.Range("A1").Text
# $variable = Get-Clipboard
# Set-Clipboard $WorkSheet.Range("A2").Text -Append
# $variable2 = Get-Clipboard
What I'm missing is how to get the two strings copied into the two different text fields on my browser site.
Thanks in advance for your help.
MarT22
Setting the clipboard to $WorkSheet.Range("A2").Text means you're only copying that cell's text ("World"). With -Append you're adding it to what's already in your clipboard ("Hello"), so a single paste drops both strings into the first text field.
You need to store the cell values in variables for later instead of putting everything on the clipboard up front. Depending on how many different fields you need, you'll either want a temporary string or two, or you will want to look into hashtables (see the sketch after the example below).
It looks like you were going in the right direction in your commented out code. You'll want to do something like:
$Hello = $Worksheet.Range("A1").Text
$World = $Worksheet.Range("A2").Text
Then later when you need to paste the content you need to put the variables back into your clipboard, then paste the content.
For example, with something like KeePass you're going to be looking at username/password combos, so what you'd get would be:
PS C:\> $Username = $Worksheet.Range("A1").Text # ("Hello")
PS C:\> $Password = $Worksheet.Range("A2").Text # ("World")
PS C:\> Set-Clipboard $Username
PS C:\> Get-Clipboard
Hello
PS C:\> Set-Clipboard $Password
PS C:\> Get-Clipboard
World
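If you end up needing more than a couple of fields, a hashtable keeps the values together. A minimal sketch (the key names are just placeholders):
$fields = @{
    TextField1 = $WorkSheet.Range("A1").Text
    TextField2 = $WorkSheet.Range("A2").Text
}
Set-Clipboard $fields['TextField1']   # paste into the first field
Set-Clipboard $fields['TextField2']   # then set the clipboard for the second field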

Complicated Transposed Table With Powershell or Excel

I'm trying to manipulate the layout of a file-share report. Basically what the layout looks like now is this:
Path,Username/Group
path1,user1
path2,user1
path3,user1
path1,user2
path3,group1
path2,group2
It's showing folder paths and what users have access to them.
I'd like to change this to the following layout:
user1,user2,group1,group2
path1,path1,path3,path2
path2
path3
Whether it be importing the data into Excel and manipulating it there, or using a PowerShell script to manipulate the data, I'm not quite sure what to do to get it the way I want.
I've tried importing this text file into Excel and transposing it, but I can't figure out how to show a list of file paths for each user. I've messed around in Access with it as well, but I'm not experienced enough in Access to get it to display properly. I tried a few things in PowerShell, but it amounted to a bunch of text documents named after the users, each with a list of file paths in it. Not quite as neat as I'd like, unfortunately.
PowerShell could do it. Assuming the data is what you show in the question, it looks like a CSV file. You could do:
$DataIn = Import-CSV $file
$HTOut = @{}
$DataIn | Group 'Username/Group' | ForEach{$HTOut.Add($_.Name,$_.Group.Path)}
New-Object PSObject -Prop $HTOut | Export-CSV $file
I thought about it, and this doesn't do exactly what I described: it would make one object with a property for each user/group, and that property's value would be all of the paths for that user/group. What you really want is X objects that iterate through all of those paths. For that, the first 3 lines stay the same, except that we also capture the number of paths for the user/group with the most paths. Then we make that many objects, iterating through the paths for each user.
$DataIn = Import-CSV $file
$HTOut = @{}
$MaxPaths = $DataIn | Group 'Username/Group' | ForEach{$HTOut.Add($_.Name,$_.Group.Path);$_} | % Count | Sort -Descending | Select -First 1
$Results = For($i=0; $i -lt $MaxPaths; $i++){
    $Record = New-Object PSObject
    $HTOut.Keys | ForEach{Add-Member -InputObject $Record -NotePropertyName $_ -NotePropertyValue $(([array]$HTOut["$_"])[$i])}
    $Record
}
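Then you can write the transposed result back out, e.g. (a sketch, reusing the same $file path as in the snippets above):
$Results | Export-Csv $file -NoTypeInformation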

Why does PowerShell not want to save my file as a CSV

$Path = 'D:/ETL_Data/TwitchTVData.xlsx'
$csvPath = 'D:/ETL_Data/TwitchTVData2.csv'
# Open the Excel document and pull in the 'Sheet1' worksheet
$Excel = New-Object -Com Excel.Application
$Workbook = $Excel.Workbooks.Open($Path)
$page = 'Sheet1'
$ws = $Workbook.Worksheets | Where-Object {$_.Name -eq $page}
$Excel.Visible = $true
$Excel.DisplayAlerts = $false
# Set variables for the worksheet cells, and for navigation
$cells = $ws.Cells
$row = 1
$col = 4
$formula = @"
=NOW()
"@
# Add the formula to the worksheet
$range = $ws.UsedRange
$rows = $range.Rows.Count
for ($i=0; $i -ne $rows; $i++) {
    $cells.Item($row, $col) = $formula
    $row++
}
$ws.Columns.Item("A:D").EntireColumn.AutoFit() | Out-Null
$ws.Columns.Range("D1:D$rows").NumberFormat = "yyyy-MM-dd hh:mm"
$Excel.ActiveWorkbook.SaveAs($csvPath)
$Excel.Quit()
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
https://www.experts-exchange.com/questions/27996530/How-to-convert-and-xlsx-spreadsheet-into-CSV.html#answer38780402-20
I was attempting to follow that, but for some reason for me the SaveAs() doesn't work. It gives me an error
cannot access the file 'D://ETL_Data/E567DF00
What do I have to do to get this to save over to CSV?
Edit:
Exact error without the fileformat parameter 6 as suggested in the comments:
Microsoft Excel cannot access the file 'D:\//ETL_Data/8011FF00'. There are
several possible reasons:
o The file name or path does not exist.
o The file is being used by another program.
o The workbook you are trying to save has the same name as a currently open
workbook.
At D:\PS_Scripts\twitchExcelAddSnapShot.ps1:32 char:1
+ $Excel.ActiveWorkbook.SaveAs($csvPath)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException
Exact error with fileformat parameter 6:
The file could not be accessed. Try one of the following:
o Make sure the specified folder exists.
o Make sure the folder that contains the file is not read-only.
o Make sure the file name does not contain any of the following characters:
< > ? [ ] : | or *
o Make sure the file/path name doesn't contain more than 218 characters.
At D:\PS_Scripts\twitchExcelAddSnapShot.ps1:32 char:1
+ $Excel.ActiveWorkbook.SaveAs($csvPath,6)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException
While PowerShell is pretty forgiving when it comes to path separators, COM servers (like Excel.Application) might not be.
Change the $csvPath variable value to use \ instead of /:
$csvPath = 'D:\ETL_Data\TwitchTVData2.csv'
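Alternatively, if you don't control how the path arrives, you could normalize it just before saving (a sketch based on the script above):
$csvPath = $csvPath -replace '/', '\'
$Excel.ActiveWorkbook.SaveAs($csvPath, 6)  # 6 = xlCSV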
To complement Mathias R. Jessen's helpful answer with background information:
It seems that Excel's application-specific behavior is the cause of your problem, unrelated to the underlying foundational subsystems or APIs used.
(Excel's automation API happens to be a COM server.)
I'm unclear on why Excel acts this way - it also does so in interactive use, though you could argue that at least in programmatic use it should allow / too.
To offer generalized advice:
On Windows, to be safe, use \, especially when dealing with application-level automation APIs, though at the level of system APIs / should work as well - see below.
In cross-platform code, use / (but watch out for the exceptions above; [System.IO.Path]::DirectorySeparatorChar reports the platform-appropriate (primary) character).
Though rarely used, interchangeable use of \ and / is allowed by Windows at the API level (which apparently goes back to the DOS 2.0 days, when support for directories was introduced), and that is also reflected in higher-level subsystems, as the following examples demonstrate.
# PS: OK
# Should output c:\windows\win.ini
(Get-Item c:/windows/win.ini).FullName
# .NET: OK
# Should return the 1st child dir.'s path.
# Note that the child directory names will be appended with "\", not "/"
[System.IO.Directory]::GetDirectories('c:/windows/system32') | Select-Object -First 1
# COM (see Excel exception below): OK
# Should return $true
(New-Object -ComObject Scripting.FileSystemObject).FileExists('c:/windows/win.ini')
# Windows API: OK
# Should return a value such as 32.
(Add-Type -PassThru WinApiHelper -MemberDefinition '[DllImport("kernel32.dll")] public static extern uint GetFileAttributes(string lpFileName);')::GetFileAttributes('c:/windows/win.ini')
# cmd.exe: INCONSISTENTLY SUPPORTED
# Note: *quoting matters* here, so that tokens with / aren't mistaken for options.
# attrib: works
cmd /c 'attrib "c:/windows/win.ini"'
# dir: works with DIRECTORIES, but fails with FILES
cmd /c 'dir /b "c:/windows/system32"' # OK
cmd /c 'dir /b "c:/windows/win.ini"' # FAILS with 'File not found'
cmd /c 'dir /b "c:/windows\win.ini"' # Using \ for the FILE component (only) works.
Here's a minimal example that demonstrates Excel's problem with /:
# Create temporary dir.
$null = mkdir c:\tmp -force
$xl=New-Object -Com Excel.Application
$wb = $xl.Workbooks.Add()
# OK - with "\"
$wb.SaveAs('c:\tmp\t1.xlsx')
# FAILS - with "/":
# "Microsoft Excel cannot access the file 'C:\//tmp/F5D39400'"
$wb.SaveAs('c:/tmp/t2.xlsx')
$xl.Quit()
# Clean up temp. dir.

save as "proper" csv / delete quotes from CSV except for where comma exists

I am downloading a CSV from a SharePoint site. It comes with a .csv file extension.
When I inspect the file's contents by opening it in Notepad, I see data that looks like this sample row:
"TITLE",OFFICE CODE,="","CUSTOMER'S NAME",ACCOUNT
I want the data look like this:
TITLE,OFFICE CODE,,"CUSTOMER'S NAME",ACCOUNT
One way to solve this problem is manually. When I open the file in Excel and save it (without altering anything), it prompts me with the following: fileOrig.csv may contain features that are not compatible with CSV (Comma delimited). Do you want to keep the workbook in this format? When I save it, and then inspect it in Notepad, the data is formatted according to how I want it do look.
Is there a quick way to resave the original CSV with PowerShell?
If there is no quick way to resave the file with PowerShell, I would like to use PowerShell to parse it.
These are the parsing rules I want to introduce:
Remove encapsulating doublequote from cells that do not contain a , char
Remove the = char
I tried writing a test script that just looks at the column that potentially contains , chars. It is supposed to find the cells that do not contain a , char, and remove the doublequotes that encapsulate the text. It does not work, because I think it tosses the doublequote upon Import-Csv
$source = 'I:\dir\fileOrig.csv'
$dest = 'I:\dir\fileStaging.csv'
$dest2 = 'I:\dir\fileFinal.csv'
get-content $source |
    select -Skip 1 |
    set-content "$source-temp"
move "$source-temp" $dest -Force
$testcsv = Import-Csv $dest
foreach($test in $testcsv)
{
    #Write-Host $test."CUSTOMER NAME"
    if($test."CUSTOMER NAME" -NotLike "*,*") {
        $test."CUSTOMER NAME" -replace '"', ''
    }
}
$testcsv | Export-Csv -path $dest2 -Force
Can someone please help me either with implementing the logic above, or if you know of a better way to save the file as a proper CSV, can you please let me know?
Since Excel can handle the problem, why not use a vbs script to automate it? Use notepad to create "Fix.vbs" with the following lines:
Set objExcel = CreateObject("Excel.Application")
Set objWorkbook = objExcel.Workbooks.Open("C:\test\test.csv")
objworkbook.Application.DisplayAlerts = False
objworkbook.Save
objexcel.quit
Run it from a command prompt and it should do the trick.
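You could also do the same thing natively from PowerShell with the Excel COM object, which avoids the separate .vbs file (a sketch, using the same C:\test\test.csv path as above):
$excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false
$workbook = $excel.Workbooks.Open('C:\test\test.csv')
$workbook.Save()
$workbook.Close()
$excel.Quit()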
I see that there's already an approved answer, I'm just offering an alternative.
If you want to keep it in PowerShell you could do this:
$File = 'I:\dir\fileOrig.csv'
$dest = 'I:\dir\fileStaging.csv'
$Output = 'I:\dir\fileFinal.csv'
$CSV = Import-Csv $File
$Members = $CSV | gm -MemberType Properties | select -ExpandProperty Name
$CSV | %{$row=$_;$Members | %{if(!($row.$_ -match "\w+")){$row.$_=$null}};$_=$row} | export-csv $dest -NoTypeInformation -Force
gc $dest | %{($_.split(",") -replace "^`"(.*)`"$","`$1") -join ","} | Out-File $Output
That imports the CSV, makes sure that there are words (letters, numbers, and/or underscores... don't ask me why underscores are considered word characters, regex demands that it be so!) in each property of each entry, and exports the CSV. It then runs through the exported file again as plain text, splitting each line at commas; any token enclosed in double quotes has them stripped, the line is re-joined, and the result is written to a file. The only thing that I don't think matches your preferred output in the OP is that instead of "CUSTOMER'S NAME" you get CUSTOMER'S NAME.

Powershell - parsing a PDF file for a literal or image

Using PowerShell and running PowerGUI. I have a PDF file that I need to search through in order to find whether an attachment was referenced within the content of a particular page. Either that, or I need to search for images, such as a Microsoft Word or Excel icon or a PDF icon, within the document.
I am using the following code to read in the page:
Add-Type -Path "c:\itextsharp-all-5.4.5\itextsharp-dll-core\itextsharp.dll"
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList "c:\files\searchfile.pdf"
for ($page = 1; $page -le 3; $page++) {
    $lines = [char[]]$reader.GetPageContent($page) -join "" -split "`n"
    foreach ($line in $lines) {
        if ($line -match "^\[") {
            $line = $line -replace "\\([\S])", $matches[1]
            $line -replace "^\[\(|\)\]TJ$", "" -split "\)\-?\d+\.?\d*\(" -join ""
        }
    }
}
However, the above gives a few bits of text, but mostly unprintable characters.
How can you search a PDF file using PowerShell for a literal string (like ".doc" or ".xlsx")? And can a PDF be searched for a graphic (like the Excel or Word icon)?
Without seeing the PDF raw content, it's not easy to give specific help, so if you can share a sample PDF or its contents, that would be helpful.
Once you know what to look for in the stream, you can search by reading in the file line by line and using the -match operator:
$file = [io.file]::ReadAllLines('C:\test.pdf')
$title = ($file -match "<rdf:li")[0].Split(">")[1].Split("<")[0]
$description = ($file -match "<rdf:li")[2].Split(">")[1].Split("<")[0]
write-host ("Title: " + $title)
write-host ("Description: " + $description)
I doubt very much that the contents of the file will tell you much more than that an image exists at particular page coordinates (although I'm by no means a PDF expert), but it may also include the binary file stream, in which case you may be able to save that stream as a file (I haven't tried that yet).
