save as "proper" csv / delete quotes from CSV except for where comma exists - excel

I am downloading a CSV from a SharePoint site. It comes with a .csv file extension.
When I inspect the file's contents by opening it in Notepad, I see data that looks like this sample row:
"TITLE",OFFICE CODE,="","CUSTOMER'S NAME",ACCOUNT
I want the data to look like this:
TITLE,OFFICE CODE,,"CUSTOMER'S NAME",ACCOUNT
One way to solve this problem is manually. When I open the file in Excel and save it (without altering anything), it prompts me with the following: fileOrig.csv may contain features that are not compatible with CSV (Comma delimited). Do you want to keep the workbook in this format? When I save it and then inspect it in Notepad, the data is formatted the way I want it to look.
Is there a quick way to resave the original CSV with PowerShell?
If there is no quick way to resave the file with PowerShell, I would like to use PowerShell to parse it.
These are the parsing rules I want to introduce:
Remove encapsulating doublequote from cells that do not contain a , char
Remove the = char
I tried writing a test script that just looks at the column that potentially contains , chars. It is supposed to find the cells that do not contain a , char, and remove the doublequotes that encapsulate the text. It does not work, because I think it tosses the doublequote upon Import-Csv
$source = 'I:\dir\fileOrig.csv'
$dest = 'I:\dir\fileStaging.csv'
$dest2 = 'I:\dir\fileFinal.csv'
get-content $source |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $dest -Force
$testcsv = Import-Csv $dest
foreach($test in $testcsv)
{
#Write-Host $test."CUSTOMER NAME"
if($test."CUSTOMER NAME" -NotLike "*,*") {
$test."CUSTOMER NAME" -replace '"', ''
}
}
$testcsv | Export-Csv -path $dest2 -Force
Can someone please help me either with implementing the logic above, or if you know of a better way to save the file as a proper CSV, can you please let me know?

Since Excel can handle the problem, why not use a vbs script to automate it? Use notepad to create "Fix.vbs" with the following lines:
Set objExcel = CreateObject("Excel.Application")
Set objWorkbook = objExcel.Workbooks.Open("C:\test\test.csv")
objworkbook.Application.DisplayAlerts = False
objworkbook.Save
objexcel.quit
Run it from a command prompt and it should do the trick.

I see that there's already an approved answer, I'm just offering an alternative.
If you want to keep it in PowerShell you could do this:
$File = 'I:\dir\fileOrig.csv'
$dest = 'I:\dir\fileStaging.csv'
$Output = 'I:\dir\fileFinal.csv'
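$CSV = Import-Csv $File
# Collect the column names
$Members = $CSV|gm -MemberType Properties|select -ExpandProperty name
# Blank out any cell that contains no word characters (such as the ="" cells), then export
$CSV|%{$row=$_;$Members|%{if(!($row.$_ -match "\w+")){$row.$_=$null}};$row}|export-csv $dest -NoTypeInformation -Force
# Re-read the staged file as plain text and strip the double quotes that enclose each field
gc $dest|%{($_.split(",") -replace "^`"(.*)`"$","`$1") -join ","}|Out-File $Output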
That imports the CSV, makes sure that there are words (letters, numbers, and/or underscores... don't ask me why underscores are considered words, RegEx demands that it be so!) in each property for each entry, exports the CSV, then runs through the file again as plain text, splitting at commas; if a field is enclosed in double quotes it strips them, re-joins the line, and then outputs it to a file. The only thing that I don't think shows up like your "preferred output" in the OP is that instead of "CUSTOMER'S NAME" you get CUSTOMER'S NAME.
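If fields that contain a comma must keep their quotes (the first rule in the question), splitting on every comma is not enough. Below is a minimal sketch of a text-only approach, assuming no field spans several lines; Convert-CsvLine is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: rewrites one CSV line using the two rules from the question.
function Convert-CsvLine {
    param([string]$Line)

    # Split only on commas outside double quotes
    # (a comma counts when an even number of quotes follows it).
    $fields = $Line -split ',(?=(?:[^"]*"[^"]*")*[^"]*$)'

    $clean = foreach ($field in $fields) {
        # Rule 2: drop a leading = (as in ="")
        $value = $field -replace '^=', ''

        # Rule 1: remove enclosing quotes unless the content has a comma or a quote
        if ($value -match '^"(.*)"$') {
            $inner = $Matches[1]
            if ($inner -notmatch '[,"]') { $value = $inner }
        }
        $value
    }
    $clean -join ','
}

Convert-CsvLine '"TITLE",OFFICE CODE,="","CUSTOMER''S NAME",ACCOUNT'
Convert-CsvLine '"SMITH, JOHN",="",X'
```

To process a whole file, something like Get-Content $source | ForEach-Object { Convert-CsvLine $_ } | Set-Content $dest2 should work, with $source and $dest2 as defined in the question.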


What does @"String"@ syntax mean in powershell

Not sure what this is and can't find it on the internet
$myString = @"
string
"@
$myRegularString = "string"
write-output $myString.getType() # outputs System.String
write-output $myRegularString.getType() # outputs System.String
I encountered the latter when someone converted the text of a string to a .pbk file. They originally said there was a problem converting it to UTF-8, but when I imported the .pbk properties from a text file using Get-Content $myString -Encoding utf8, it was fine.
So what's the difference? Is one fancier?
Those are here-strings. They let you preserve formatting, including line breaks:
$string = @"
this is
a test
string
"@
Also please be aware that when you use Get-Content -Path test.txt the result is an array of strings (each line is an array item). If you want to get the file content as a single string object you will need to use Get-Content -Path test.txt -Raw.
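A small sketch of that difference, using a temporary file (the variable names are only illustrative):

```powershell
# A here-string keeps its line breaks
$text = @"
this is
a test
string
"@

# Write it to a temporary file, then read it back both ways
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value $text

$asLines = Get-Content -Path $tmp        # array of strings, one item per line
$asOne   = Get-Content -Path $tmp -Raw   # the whole file as a single string

Remove-Item $tmp

$asLines.Count        # 3
$asOne.GetType().Name # String
```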

How to change delimiter in excel CSV saving using powershell

Using Powershell and Excel 2016, I'm trying to open a .xlsx file, extract a single page, and save this page as a .csv with a " ; " delimiter. The problem is that while Excel expects " ; " delimiter when opening a csv file, it always saves them with a " , " delimiter.
I'd prefer not to change any settings. This is a script I'm writing for a project that needs to work natively on any PC, so having to change settings every time it runs on another computer would be problematic.
I already checked that the list delimiter setting in Windows is indeed " ; ", and it is.
I tried every type of CSV saving described in the Microsoft doc (https://learn.microsoft.com/fr-fr/office/vba/api/excel.xlfileformat).
What's weird is that when saving a file from the GUI, I only see 3 CSV types instead of the 5 listed on the website, and one of them is "CSV with " ; " delimiter", which works as intended, but I can't seem to use this file type when saving from Excel via PowerShell.
There's apparently a "local" flag that can be activated for Excel to use the delimiter settings of Windows, but I have no idea how to activate it in PowerShell, and I'd prefer not to use it since it means the program wouldn't work on a Windows installation with a different delimiter configuration.
# Args[0] : file to open
# [1] : file to save
# page_to_extract : name of the page I need
# I open an Excel session
$excel_session = New-Object -Com Excel.Application
$excel_session.displayAlerts = $false
# I open the file I need to extract the page from
$excel_workbook = $excel_session.workbooks.open($args[0])
# I load in the page
$excel_worksheet = $excel_workbook.worksheets($page_to_extract)
# I save the page using a csv type (6,22,24,62,23)
$excel_worksheet.saveAs($args[1], 6)
$excel_session.quit()
This code always saves my csv with a " , " delimiter, I need " ; " instead.
I need to use Powershell and ONLY Powershell for this, no windows settings, no excel settings.
I had success with the following code with my own data. This uses your COM Object assignment code. I added logic to extract the cells that contain data, add that data to a new custom object on each row iteration, store each custom object in an array, and finally pipe the array into Export-Csv. Your specified delimiter ; is used in the Export-Csv command.
$excel_session = New-Object -Com Excel.Application
$excel_session.displayAlerts = $false
# I open the file I need to extract the page from
$excel_workbook = $excel_session.workbooks.open($args[0])
# I load in the page
$excel_worksheet = $excel_workbook.worksheets($page_to_extract)
# Get Range of Used Cells in Worksheet
$range = $excel_worksheet.usedrange
# Get First Row Column Text to be Used as Object Properties
$headers = $range.rows.item(1).value2
# Loop through Rows and Columns to Extract Data
# First loop traverses rows
# Second loop traverses columns
$output = for ($i = 2; $i -le $range.rows.count; $i++) {
$hash = [ordered]@{}
for ($j = 1; $j -le $range.columns.count; $j++) {
[void]$hash.Add($headers.GetValue(1,$j),$range.rows.item($i).columns.item($j).Text)
}
[pscustomobject]$hash
}
$output | Export-Csv file.csv -NoType -Delimiter ';'
# Clean Up COM Objects
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel_workbook)
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel_session)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
An inefficient, but simple and pragmatic workaround is to:
Use your code as-is to let Excel temporarily produce an interim ,-separated CSV file.
Import that file with Import-Csv (which uses , by default), and export again with Export-Csv -Delimiter ';'.
In the context of your code:
(Import-Csv $args[1]) | Export-Csv $args[1] -Delimiter ';' -NoTypeInformation
Note:
The Import-Csv call is enclosed in (...) to ensure that the input file is read in full up front, which enables writing back to the same file in the same pipeline.
In Windows PowerShell, Export-Csv, sadly, defaults to ASCII(!) encoding; if your data contains non-ASCII characters, specify an appropriate encoding with -Encoding.
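A self-contained sketch of that round trip, using a temporary file in place of the CSV that Excel would produce (the file name and sample rows are made up for illustration):

```powershell
# Stand-in for the comma-separated file written by Excel
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'delimiter-demo.csv'
Set-Content -Path $tmp -Value 'Name,City', 'Ann,Paris'

# Read the file fully (note the parentheses), then rewrite it in place with ; as the delimiter
(Import-Csv $tmp) | Export-Csv $tmp -Delimiter ';' -NoTypeInformation -Encoding UTF8

$result = Get-Content $tmp
$result
Remove-Item $tmp
```

With default settings Export-Csv quotes every field, so the first line of the rewritten file reads "Name";"City".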
The List Separator is a Windows regional setting.
To change it, please see :
https://support.office.com/en-us/article/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba
Change the separator in all .csv text files In Microsoft Windows,
click the Start button, and then click Control Panel.
Open the dialog box for changing Regional and Language settings.
Type a new separator in the List separator box.
Click OK twice.
Note: After you change the list separator character for your
computer, all programs use the new character as a list separator. You
can change the character back to the default character by following
the same procedure.
You should now be able to change the csv character delimiter.
Please note that you'll need to restart your computer for the change to take effect. You can check your current List Separator value in your PowerShell session with (Get-Culture).TextInfo.ListSeparator
You can also check this post, which has many screenshots and several other options for doing so: https://superuser.com/questions/606272/how-to-get-excel-to-interpret-the-comma-as-a-default-delimiter-in-csv-files
My recommendation is to avoid Excel and use the database objects instead. Example:
[CmdletBinding()]
param(
[Parameter(Position = 0,Mandatory = $true)]
[ValidateNotNullOrEmpty()]
$ExcelFileName,
[Parameter(Position = 1,Mandatory = $true)]
[ValidateNotNullOrEmpty()]
$SheetName
)
$queryString = 'SELECT * FROM [{0}$A1:end]' -f $SheetName
$connectionString = ("Provider=Microsoft.ACE.OLEDB.12.0;" +
"Data Source=$((Get-Item -LiteralPath $ExcelFileName -ErrorAction Stop).FullName);" +
"Extended Properties=Excel 8.0;")
try {
$connection = New-Object Data.OleDb.OleDbConnection($connectionString)
$command = New-Object Data.OleDb.OleDbCommand($queryString)
$command.Connection = $connection
$connection.Open()
$adapter = New-Object Data.OleDb.OleDbDataAdapter($command)
$dataTable = New-Object Data.DataTable
[Void] $adapter.Fill($dataTable)
$dataTable
}
catch [Management.Automation.MethodInvocationException] {
Write-Error $_
}
finally {
$connection.Close()
}
If the above script is Import-ExcelSheet.ps1, you could export to a ;-delimited CSV file by running a command such as:
Import-ExcelSheet "C:\Import Files\ExcelFile.xlsx" "Sheet1" |
Export-Csv "C:\Import Files\Test.Csv" -Delimiter ';' -NoTypeInformation
If you have the 32-bit version of Excel installed, you will need to run the above script in the 32-bit version of PowerShell.
If you don't want to license Excel or can't install it on some computer where you want to run the script, you can install the Access database engine instead:
https://www.microsoft.com/en-us/download/details.aspx?id=54920

Complicated Transposed Table With Powershell or Excel

I'm trying to manipulate the layout of a file-share report. Basically what the layout looks like now is this:
Path,Username/Group
path1,user1
path2,user1
path3,user1
path1,user2
path3,group1
path2,group2
It's showing folder paths and what users have access to them.
I'd like to change this to the following layout:
user1,user2,group1,group2
path1,path1,path3,path2
path2
path3
Whether it be importing the data into excel and manipulating it in excel or using powershell script to manipulate the data, I'm not quite sure what to do to get it the way I want.
I've tried importing this text file into excel and trying to transpose but I can't figure out how to show a list of file paths for each user. I've messed around in Access with it as well, but I'm not experienced enough in access to get it to display properly. I tried a few things in powershell but it amounted to a bunch of text documents named after the users with a list of file paths in each document. Not quite as neat as I'd like it unfortunately.
PowerShell could do it. Assuming the data is what you show in the question it looks like a CSV file. You could do:
$DataIn = Import-CSV $file
$HTOut = @{}
$DataIn | Group 'Username/Group' | ForEach{$HTOut.add($_.Name,$_.Group.Path)}
New-Object PSObject -Prop $HTOut | Export-CSV $file
I thought about it, and this doesn't do exactly what I had said, it would make one object with a property for each user/group, and that property's value would be all of the paths for that person/group. What you really want is X objects that iterate through all of those paths. For that the first 3 lines stay the same, except that we capture the number of paths for the user/group with the most paths. Then we make that many objects iterating through paths for each user.
$DataIn = Import-CSV $file
$HTOut = @{}
$MaxPaths = $DataIn | Group 'Username/Group' | ForEach{$HTOut.add($_.Name,$_.Group.Path);$_} |% Count |Sort -Descend |Select -first 1
$Results = For($i=0;$i -lt $MaxPaths;$i++){
$Record = New-Object PSObject
$HTOut.Keys|ForEach{Add-Member -InputObject $Record -NotePropertyName $_ -NotePropertyValue $(([array]$HTOut["$_"])[$i])}
$Record
}
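For reference, here is a self-contained sketch of the same idea run against the sample data from the question. It is a variation rather than the exact code above: it uses an ordered hashtable and ConvertFrom-Csv so it can run without an input file, and the variable names are only illustrative.

```powershell
# Sample data from the question, parsed in memory
$DataIn = @'
Path,Username/Group
path1,user1
path2,user1
path3,user1
path1,user2
path3,group1
path2,group2
'@ | ConvertFrom-Csv

# One entry per user/group, holding the array of paths they can access
$byUser = [ordered]@{}
$DataIn | Group-Object 'Username/Group' |
    ForEach-Object { $byUser[$_.Name] = @($_.Group.Path) }

# The longest list determines how many output rows are needed
$max = ($byUser.Values | ForEach-Object { $_.Count } | Measure-Object -Maximum).Maximum

# Row i holds the i-th path of every user/group (empty when a list is shorter)
$rows = for ($i = 0; $i -lt $max; $i++) {
    $row = [ordered]@{}
    foreach ($key in $byUser.Keys) { $row[$key] = $byUser[$key][$i] }
    [pscustomobject]$row
}

$rows | ConvertTo-Csv -NoTypeInformation
```

Piping $rows to Export-Csv instead of ConvertTo-Csv would write the result to a file. Column order follows the order in which Group-Object returns the groups, which may differ from the order in the desired layout.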

Powershell - Pulling string from txt, splitting it, then concatenating it for archive

I have an application where I get a list of new/modified files from git status, take the partial paths from that file, concatenate them with the root directory path, and then move those files to an archive. I have it half working, but the way I am using PowerShell does not surface error reports, and the process is obviously erroring out. Here is the code I am trying to use. (It has gone through several iterations; please excuse the commented-out portions.) Basically I am trying to Get-Content from the txt file, replace / with \ (for some reason the process that creates the txt file loves forward slashes...), then split that string at the spaces. The only part of the string I am interested in is the last part, which I am trying to concatenate with the known working root directory; then I am attempting to move those files to an archive location. Before you ask, this is something we are not willing to track in git, due to the nature of the files (they are time-stamped test outputs that we want to save on a per-test-run basis, not in git). I am still fairly new to PowerShell and have been banging my head against this rock for far too long.
Get-Content $outfile | Foreach-Object
{
#$_.Replace("/","\")
#$lineSplit = $_.Split(' ')
$_.Split(" ")
$filePath = "$repo_dir\$_[-1]"
$filePath.Replace('/','\')
"File Path Created: $filePath"
$untrackedLegacyTestFiles += $filePath
}
Get-Content $untrackedLegacyTestFiles | Foreach-Object
{
Copy-Item $_ $target_root -force
"Copying File: $_ to $target_root"
}
}
the $outfile is a text file where each line has a partial file path leading to a txt file generated by a test application we use. This info is provided by git, so it looks like this in the $outfile txt file:
!! Some/File/Path/Doc.txt
The "!!" mean git sees it as a new file, however it could be several characters from a " M" to "??". Which is why I am trying to split it on the spaces and take only the last element.
My desired output would be to take the last element of the split string from the $outfile (Some/File/Path/Doc.txt) and concatenate it with the $repo_dir to form a complete file path, then move the Doc.txt to an archive location ($target_root).
To combine a path in PowerShell, you should use the Join-Path cmdlet. To extract the path from your string, you can use a regex:
$extractedPath = [regex]::Match('!! Some/File/Path/Doc.txt', '.*\s(.+)$').Groups[1].Value
$filePath = Join-Path $repo_dir $extractedPath
The Join-Path cmdlet will also convert all forward slashes to backslashes so no need to replace them :-).
Your whole script could look like this:
Get-Content $outfile | Foreach-Object {
$path = Join-Path $repo_dir ([regex]::Match($_, '.*\s(.+)$').Groups[1].Value)
Copy-Item $path $target_root -force
}
If you don't like to use regex in your code, you can also extract the path using:
$extractedPath = '!! Some/File/Path/Doc.txt' -split ' ' | select -Last 1
or
$extractedPath = ('!! Some/File/Path/Doc.txt' -split ' ')[-1]
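Note that both split-based variants return only the last word, so a path that itself contains a space would be cut short. A possible alternative, sketched under the assumption that every line is two status characters, one space, and then the path (the layout shown in the question); Get-StatusPath is a hypothetical helper name:

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Get-StatusPath {
    param(
        [string]$StatusLine,  # e.g. '!! Some/File/Path/Doc.txt'
        [string]$RepoDir      # root directory of the repository
    )
    # Assumption: two status characters, one space, then the path.
    # Trim('"') removes surrounding double quotes in case the path is quoted.
    $relative = $StatusLine.Substring(3).Trim('"')
    Join-Path $RepoDir $relative
}

Get-StatusPath -StatusLine ' M Some/File/My Doc.txt' -RepoDir $PWD.Path
```

Inside the pipeline this could replace the regex call, for example Copy-Item (Get-StatusPath $_ $repo_dir) $target_root -Force.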

PowerShell Import-CSV adding spaces between characters

I have a CSV file which looks fine until it is imported into PowerShell; when it's imported, each character is followed by a space, like C : \ instead of C:\.
It would be easy enough to format the cells as text in Excel (which works), but this CSV file is created on multiple servers by an automation policy, so going through each of these files and formatting them will take a while, as you can imagine.
I was wondering if there was a way in which I can format the cells first in PowerShell then import the CSV.
PowerShell code I am using:
$data = import-csv -Path $path -UseCulture -Header @("Path", "Folder", "Size")
CSV Snippet:
C:\,C:\,14.0GB
C:\Program Files,Program Files,4.5GB
C:\Program Files\Microsoft Office,Microsoft Office,2.8GB
It sounds like the file might be Unicode, but without the proper byte order marks, which would cause PowerShell to use the default ASCII encoding. If that is the case, you'll need to specify the encoding:
$data = import-csv -Encoding Unicode -Path $path ...
Another option is to convert the file to ASCII prior to the import [credit to OP for the command]:
Get-content C:\path\TestXml.csv | Set-Content -Encoding Ascii TestXml.csv
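To illustrate the first option, here is a minimal sketch that writes a UTF-16 file without a byte order mark and then imports it with an explicit encoding. The temporary file name is made up, and the row is taken from the snippet in the question.

```powershell
# Create a UTF-16 LE file without a byte order mark (stand-in for the problem file)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf16-demo.csv'
$utf16NoBom = New-Object System.Text.UnicodeEncoding($false, $false)
[System.IO.File]::WriteAllText($path, "C:\,C:\,14.0GB`r`n", $utf16NoBom)

# Tell Import-Csv how the file is encoded instead of letting it guess
$data = Import-Csv -Path $path -Encoding Unicode -Header 'Path', 'Folder', 'Size'

Remove-Item $path
$data
```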
This might present a different problem, but it may work for removing all the spaces:
$data = import-csv -Path $path -UseCulture -Header @("Path", "Folder", "Size")
$data | % {
$_.path = $_.path -replace '\s'
$_.folder = $_.folder -replace '\s'
$_.size = $_.size -replace '\s'
}
$data
It would be beneficial to get a snippet of what the CSV looks like. Can you provide the header and 1 or 2 rows?
Is the header specified in the CSV file?
When using the Header parameter, delete the original header row from the CSV file. Otherwise, Import-Csv creates an extra object from the items in the header row.
https://technet.microsoft.com/en-us/library/hh849891.aspx
You are specifying the UseCulture switch which will use the default delimiter specified by the environment. You can run the following command to find your culture's delimiter:
(Get-Culture).TextInfo.ListSeparator
https://technet.microsoft.com/en-us/library/hh849891.aspx
@Tony Hinkle sent me in the right direction, so I have marked his answer as correct. Here is the code that I used:
Set the content of the CSV to ASCII encoding
Get-content C:\Users\sam\Desktop\TestXml.csv | Set-Content -Encoding Ascii TestXml.csv
Then import the Csv
$data = Import-Csv C:\Users\sam\Desktop\TestXml.csv -Header @("Path", "Folder", "Size")
