Merge multiple CSV into one without using Excel.Application - excel

I created a PowerShell script that allows me to merge multiple .CSV files into one .XLSX file.
It works well on my computer:
$path = "C:\Users\Francesco\Desktop\CSV\Results\*"
$csvs = Get-ChildItem $path -Include *.csv
$y = $csvs.Count
Write-Host "Detected the following CSV files: ($y)"
Write-Host " "$csvs.Name"`n"
$outputfilename = "Final Registry Results"
Write-Host Creating: $outputfilename
$excelapp = New-Object -ComObject Excel.Application
$excelapp.SheetsInNewWorkbook = $csvs.Count
$xlsx = $excelapp.Workbooks.Add()
for ($i = 1; $i -le $y; $i++) {
    $worksheet = $xlsx.Worksheets.Item($i)
    $worksheet.Name = $csvs[$i-1].Name
    $file = (Import-Csv $csvs[$i-1].FullName)
    $file | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | clip
    $worksheet.Cells.Item(1).PasteSpecial() | Out-Null
}
$output = "C:\Users\Francesco\Desktop\CSV\Results\Results.xlsx"
$xlsx.SaveAs($output)
$excelapp.Quit()
The problem is that I need to run this on several servers, and servers typically don't have Office installed, so I cannot use Excel.Application.
Is there a way to merge multiple CSV files into one CSV or XLSX, with each CSV going into a different sheet, without using Excel.Application?

@AnsgarWiechers is right: ImportExcel is powerful and not difficult to use. However, for your specific case you can use a more limited approach, using OleDb (or ODBC, or ADO) to write to an Excel file as if it were a database. Here is some sample code showing how to write to an Excel file using OleDb.
$provider = 'Microsoft.ACE.OLEDB.12.0'
$dataSource = 'C:\users\user\OleDb.xlsb'
$connStr = "Provider=$provider;Data Source=$dataSource;Extended Properties='Excel 12.0;HDR=YES'"
$objConn = [Data.OleDb.OleDbConnection]::new($connStr)
$objConn.Open()
$cmd = $objConn.CreateCommand()
$sheetName = 'Demo'
$cmd.CommandText = "CREATE TABLE $sheetName (Name TEXT, Age NUMBER)"
$cmd.ExecuteNonQuery()
$cmd.CommandText = "INSERT INTO $sheetName (Name, Age) VALUES ('Adam', 20)"
$cmd.ExecuteNonQuery()
$cmd.CommandText = "INSERT INTO $sheetName (Name, Age) VALUES ('Bob', 30)"
$cmd.ExecuteNonQuery()
$cmd.Dispose()
$objConn.Close()
$objConn.Dispose()
You didn't say much about the CSV files you'll be processing. If column data varies, to create the table you'll have to get the attribute (column) names from the CSV header (either by reading the first line of the CSV file, or by enumerating the properties of the first item returned by Import-CSV).
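For example, a rough sketch of the second approach (enumerating the properties of the first item returned by Import-Csv) and turning them into a CREATE TABLE statement; the path is a placeholder, $cmd is the command object from the snippet above, and every column is simply declared as TEXT because the real types aren't known:
$csvPath  = 'C:\Users\Francesco\Desktop\CSV\Results\sample.csv'   # placeholder path
$firstRow = Import-Csv $csvPath | Select-Object -First 1
$columns  = $firstRow.PSObject.Properties.Name
# Build a CREATE TABLE statement with every column typed as TEXT
$columnDefs = ($columns | ForEach-Object { "[$_] TEXT" }) -join ', '
$cmd.CommandText = "CREATE TABLE [Sheet1] ($columnDefs)"
$cmd.ExecuteNonQuery()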
If your CSV files have a large number of lines, writing one line at a time may be slow. In that case using a DataSet and OleDbDataAdapter might improve performance (but I haven't tested it). At that point you might as well use OleDb to read the .csv directly into a DataSet, create an OleDbDataAdapter, set the adapter's InsertCommand property, and finally call the adapter's Update method. I don't have time to write and test all that.
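As an illustration of just the read half of that idea (untested, assuming the ACE text driver is available), a CSV can be loaded straight into a DataTable; the folder and file name are placeholders and the query syntax assumes a plain comma-delimited file:
$csvFolder = 'C:\Users\Francesco\Desktop\CSV\Results'   # placeholder folder
$csvFile   = 'sample.csv'                               # placeholder file name
$connStr   = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$csvFolder;Extended Properties='Text;HDR=YES;FMT=Delimited'"
$conn    = [Data.OleDb.OleDbConnection]::new($connStr)
$adapter = [Data.OleDb.OleDbDataAdapter]::new("SELECT * FROM [$csvFile]", $conn)
$table   = [Data.DataTable]::new()
[void]$adapter.Fill($table)   # Fill opens and closes the connection itself
$adapter.Dispose()
$conn.Dispose()
$table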
This is not intended as a full solution, just a demo of how to use OleDb to write to an Excel file.
Note: I tested this on a server that didn't have Office or Excel installed. The Office data providers pre-installed on that machine were 32-bit, but I was using 64-bit PowerShell. To get 64-bit drivers I installed the Microsoft Access Database Engine 2016 Redistributable and that's what I used for testing.
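As an aside, you can check which OLE DB providers are visible to the current (32- or 64-bit) PowerShell session like this:
# List the OLE DB providers registered for this PowerShell session's bitness
(New-Object System.Data.OleDb.OleDbEnumerator).GetElements() |
    Select-Object SOURCES_NAME, SOURCES_DESCRIPTION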

Time has passed and I have found a new solution: Install-Module -Name ImportExcel
This way the module takes care of the job like in this script.
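For example, something along these lines (a rough sketch, reusing the paths from the question) gives each CSV its own worksheet in a single workbook via Export-Excel:
Import-Module ImportExcel
$csvs   = Get-ChildItem 'C:\Users\Francesco\Desktop\CSV\Results' -Filter *.csv
$output = 'C:\Users\Francesco\Desktop\CSV\Results\Results.xlsx'
foreach ($csv in $csvs) {
    # Each call adds (or overwrites) a worksheet named after the CSV file
    Import-Csv $csv.FullName |
        Export-Excel -Path $output -WorksheetName $csv.BaseName -AutoSize
}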

Related

Performance increase for PowerShell dataset to Excel

I have a PowerShell script which pulls data from a DB and pushes it to an Excel sheet. I am facing slowness (approx. 45 minutes) while copying the records from the dataset to the Excel sheet, as the number of records exceeds 200K. I am looping through them one by one using the snippet below, which takes too much time. Is there a way to transfer the data from the dataset to Excel more efficiently?
$cells = $Worksheet.Cells
$row = 1
foreach ($rec in $dataset.Tables[0].Rows)
{
    $row++
    $col = 1
    $cells.Item($row, $col) = $rec.ID
    $col++
    $cells.Item($row, $col) = $rec.Name
    $col++
    $cells.Item($row, $col) = $rec.Age
    $col++
}
You should try the PSExcel module. There's no need to create a COM object or even have Excel installed. Your example would look like this and be lightning fast:
$dataset.Tables[0] |
Select-Object ID,Name,Age |
Export-XLSX -Path $FullName -AutoFit -WorksheetName 'MyData'
A nice little workaround I saw some time ago was to format the rows as a CSV string and simply paste them in. For the environment I was using, this proved more efficient than creating a file with Export-Csv and then loading it into Excel.
#Row data joined with tabs
$data = @("[A1]", "[A2]", "[A3]", "[A4]", "[A5]", "[A6]") -join "`t"
#Multiple rows joined with new lines
$dataToPaste = "{0}`n{1}`n{2}" -f $data, $data.replace("A", "B"), $data.replace("A", "C")
$excel = New-Object -ComObject Excel.Application
$book = $excel.Workbooks.Add()
$sheet = $book.Worksheets.Add()
#Activate where to put data
$sheet.Range("B2").Activate() | Out-Null
#Copy data to clipboard and paste into sheet.
$dataToPaste | Clip
$sheet.Paste()
$excel.Visible = $true
#Cleanup
[Runtime.InteropServices.Marshal]::ReleaseComObject($excel) | Out-Null
$excel = $null
I did find that, very rarely, the Paste method throws an error, which was fixed by retrying a second time if it failed:
try {
    $sheet.Paste()
} catch {
    $sheet.Paste()
}
This may not be a preferred option if you are running something on a PC being used by someone, as the user could copy something to the clipboard after the script does (but before $sheet.Paste()) and invalidate your data.

Process .xlsx to csv with Powershell using rename and set delimiter

I have an Excel file that I receive and want to process to a CSV using PowerShell.
I have to alter it quite specifically so it can be reliable input for a program that will process the CSV info.
I don't know the exact headers, but I know there can be duplicates.
What I do is open the .xlsx file with Excel and save it as CSV:
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $True
$objExcel.DisplayAlerts = $True
$Workbook = $objExcel.Workbooks.open($xlsx1)
$WorkSheet = $WorkBook.sheets.item($sheet)
$xlCSV = 6
$Workbook = $objExcel.Workbooks.open($xlsx2)
$WorkSheet = $WorkBook.sheets.item($sheet)
$WorkBook.SaveAs($csv2,$xlCSV)
Now, the XLSX file will have commas, so first I want to change them to dots.
I tried this, but it's not working:
$objRange = $worksheet.UsedRange
$objRange.Replace ",", "."
It errors out saying: Unexpected token '", "'.
Then, when saving, I want to set the delimiter to a comma, as Excel uses ";" by default.
With something like:
$WorkBook.SaveAs($csv2,$xlCSV) -delimiter ","
The last problem is the duplicate headers; this prevents PowerShell from using Import-Csv. Here is what I tried; when the file is comma-separated it works:
Get-Content $downloads\BBKS_DIR_AUTO_COMMA.csv -totalcount 1 >$downloads\Headers.txt
But then I need to rename the duplicate names; for example, I can have Regio, Regio, Regio.
I want to change this to Regio, Regio2, Regio3.
My plan was to look up the data in the txt file, search for duplicates, and then add an incremental number.
In the end I need to add a column with incremental numbers, always four digits, like 0001, 0002, 0010, 0020, 0200, 1500; I won't exceed 9999. How can this be done?
If you can help me, even only partially, I'm very happy.
Further, I'm running Windows 7 x64, PowerShell 3.0, Excel 2016 (if relevant).
If easier, it's fine to fall back to the command prompt for some tasks.
Personally, I wouldn't try to work with Excel sheets via Excel itself and COM - I'd use the excellent ImportExcel module: https://github.com/dfinke/ImportExcel
Then you can import from the sheet straight into a native PowerShell object array, and re-export with Export-Csv -Delimiter.
Edit: to answer the follow-ups:
Once you've loaded the module you can run "Get-Module ImportExcel | Select-Object -ExpandProperty ExportedCommands" to see what it makes available.
To import your Excel file in the first place, do something like:
$WorkBook = Import-Excel -Path $xlsx1
And if you need to take care of duplicate column names, you can do :
$WorkBook = Import-Excel -Header @("Regio1", "Regio2", "Regio")
Where the array you pass to -Header needs to include every column you want from the workbook.
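As for the duplicate headers, the four-digit counter column, and the comma-delimited re-export, here is a rough sketch; it reuses the $downloads variable from the question, assumes plain unquoted comma-separated headers, and the output file name is a placeholder:
$csvIn  = "$downloads\BBKS_DIR_AUTO_COMMA.csv"
$csvOut = "$downloads\BBKS_clean.csv"           # placeholder output name
# Build unique header names: Regio, Regio, Regio -> Regio, Regio2, Regio3
$headers = (Get-Content $csvIn -TotalCount 1) -split ','
$seen = @{}
$unique = foreach ($h in $headers) {
    if ($seen.ContainsKey($h)) { $seen[$h]++; "$h$($seen[$h])" }
    else                       { $seen[$h] = 1; $h }
}
# Re-import with the de-duplicated names (skipping the original header line),
# add a zero-padded counter column, and export comma-delimited
$i = 0
Get-Content $csvIn | Select-Object -Skip 1 |
    ConvertFrom-Csv -Header $unique |
    ForEach-Object {
        $i++
        $_ | Add-Member -NotePropertyName Counter -NotePropertyValue ('{0:D4}' -f $i) -PassThru
    } |
    Export-Csv $csvOut -Delimiter ',' -NoTypeInformation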

Excel com-object via powershell

I am outputting data to a CSV file via PowerShell. Generally things go well.
I have exported the data to a CSV file. It contains about 10 columns. When I open it with MS Excel, it's all contained in the first column. I want to split it into several columns programmatically via PowerShell (the same thing the GUI version offers). I could loop over every row, split it, and put the values into the appropriate cells, but that would take way too much time.
I believe there should be an elegant solution that splits one column into multiple. Is there a way to do it in one simple step, without looping?
This is what I came up with so far:
PS: the CSV file is 100% fine. The delimiter is ','.
Get-Service | Export-Csv -NoTypeInformation c:\1.csv -Encoding UTF8
$xl = New-Object -comobject Excel.Application
$xl.Visible = $true
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open('c:\1.csv')
$ws = $wb.Sheets|?{$_.name -eq '1'}
$ws.Activate()
$col = $ws.Cells.Item(1,1).EntireColumn
This will get you the desired functionality; add it to your code. Check out the MSDN page for TextToColumns for more information.
# Select column
$columnA = $ws.Range("A1").EntireColumn
# Enumerations
$xlDelimited = 1
$xlTextQualifier = 1
# Convert Text To Columns
$columnA.texttocolumns($ws.Range("A1"),$xlDelimited,$xlTextQualifier,$true,$false,$false,$true,$false)
$ws.columns.autofit()
I had to create a CSV which used "","" as the delimiter to test this out; the file delimited with plain "," already opened fine in Excel.
# Opens with all fields in column A, used to test TextToColumns works
"Name,""field1"",""field2"",""field3"""
"Test,""field1"",""field.2[]"",""field3"""
# Opens fine in Excel
Name,"field1","field2","field3"
Test,"field1","field.2[]","field3"
Disclaimer: Tested with $ws = $wb.Worksheets.item(1)

How to convert .xls to .csv using Powershell without Excel installed

Is there a way to convert .xls to .csv using PowerShell, without Excel being installed?
I don't have access to Excel on this particular machine, so I get an error when I try:
New-Object -ComObject excel.application
New-Object : Retrieving the COM class factory for component with CLSID
{00000000-0000-0000-0000-000000000000} failed due to the following
error: 80040154 Class not registered (Exception from HRESULT:
0x80040154 (REGDB_E_CLASSNOTREG)).
Foreword
Depending on what you already have installed on your system, you might need the Microsoft Access Database Engine 2010 Redistributable for this solution to work. That will give you access to the "Microsoft.ACE.OLEDB.12.0" provider.
Disclaimer: I'm not super impressed with the result, and someone with more background could make this answer better, but here it goes.
Code
$strFileName = "C:\temp\Book1.xls"
$strSheetName = 'Sheet1$'
$strProvider = "Provider=Microsoft.ACE.OLEDB.12.0"
$strDataSource = "Data Source = $strFileName"
$strExtend = "Extended Properties='Excel 8.0;HDR=Yes;IMEX=1';"
$strQuery = "Select * from [$strSheetName]"
$objConn = New-Object System.Data.OleDb.OleDbConnection("$strProvider;$strDataSource;$strExtend")
$sqlCommand = New-Object System.Data.OleDb.OleDbCommand($strQuery)
$sqlCommand.Connection = $objConn
$objConn.open()
$da = New-Object system.Data.OleDb.OleDbDataAdapter($sqlCommand)
$dt = New-Object system.Data.datatable
[void]$da.fill($dt)
$objConn.Close()
$dt
Create an OLE DB connection to the Excel file $strFileName. You need to know your sheet name and populate $strSheetName, which helps build $strQuery. We then use several objects to create a connection and extract the data from the sheet as a System.Data.DataTable. In my test file, with one populated sheet, I had two columns of data. After running the code, the output of $dt is:
letter number
------ ------
a           2
d          34
b           0
e           4
You could then take that table and export it with Export-Csv:
$dt | Export-Csv c:\temp\data.csv -NoTypeInformation
This was built based on information gathered from:
Scripting Guy
PowerShell Code Repository

powershell excel access without installing Excel

I need to be able to read an existing (password protected) Excel spreadsheet (an .xlsx file) from Powershell - but I don't want to install Excel. Every approach I've found assumes that Excel is installed on the workstation where the script is running.
I've tried the Excel viewer, but it doesn't seem to work; it won't invoke properly. I've looked at other solutions on stackoverflow, but all of them seem to want to update the excel spreadsheet, and I'm hoping I don't have to go that far.
Am I missing something obvious?
See the detailed article from the Scripting Guy here. The trick is to read the file through an OLE DB connection in your PowerShell script.
Hey, Scripting Guy! How Can I Read from Excel Without Using Excel?
Relevant Powershell Snippet:
$strFileName = "C:\Data\scriptingGuys\Servers.xls"
$strSheetName = 'ServerList$'
$strProvider = "Provider=Microsoft.Jet.OLEDB.4.0"
$strDataSource = "Data Source = $strFileName"
$strExtend = "Extended Properties=Excel 8.0"
$strQuery = "Select * from [$strSheetName]"
$objConn = New-Object System.Data.OleDb.OleDbConnection("$strProvider;$strDataSource;$strExtend")
$sqlCommand = New-Object System.Data.OleDb.OleDbCommand($strQuery)
$sqlCommand.Connection = $objConn
$objConn.open()
$DataReader = $sqlCommand.ExecuteReader()
While ($DataReader.Read())
{
    $ComputerName = $DataReader[0].ToString()
    "Querying $ComputerName ..."
    Get-WmiObject -Class Win32_Bios -ComputerName $ComputerName
}
$dataReader.close()
$objConn.close()
That said, you have stated that your Excel file is password protected.
According to this Microsoft Support article, you cannot open password protected Excel files using OLEDB Connections.
From the Article:
On the Connection tab, browse to your workbook file. Ignore the "User ID" and "Password" entries, because these do not apply to an Excel connection. (You cannot open a password-protected Excel file as a data source. There is more information on this topic later in this article.)
If you don't have Excel installed, EPPlus is the best solution I know of to access Excel files from PowerShell. Refer to my answer here to set up EPPlus for PowerShell.
The following code creates a password-protected Excel file containing the output of Get-Process and then reads the process information back from the password-protected file:
# Load EPPlus
$DLLPath = "C:\Windows\System32\WindowsPowerShell\v1.0\Modules\EPPlus\EPPlus.dll"
[Reflection.Assembly]::LoadFile($DLLPath) | Out-Null
$FileName = "$HOME\Downloads\Processes.xlsx"
$Passwort = "Excel"
# Create Excel file with password
$ExcelPackage = New-Object OfficeOpenXml.ExcelPackage
$Worksheet = $ExcelPackage.Workbook.Worksheets.Add("FromCSV")
$ProcessesString = Get-Process | ConvertTo-Csv -NoTypeInformation | Out-String
$Format = New-Object -TypeName OfficeOpenXml.ExcelTextFormat -Property @{TextQualifier = '"'}
$null=$Worksheet.Cells.LoadFromText($ProcessesString,$Format)
$ExcelPackage.SaveAs($FileName,$Passwort)
# Open Excel file with password
$ExcelPackage = New-Object OfficeOpenXml.ExcelPackage -ArgumentList $FileName,$Passwort
# Select First Worksheet
$Worksheet = $ExcelPackage.Workbook.Worksheets[1]
# Get Process data from Cells
$Processes = 0..($Worksheet.Dimension.Rows - 1) | % {
# Get all Cells in a row
$Row = $Worksheet.Cells[($Worksheet.Dimension.Start.Row+$_),$Worksheet.Dimension.Start.Column,($Worksheet.Dimension.Start.Row+$_),$Worksheet.Dimension.End.Column]
# Join values of all Cells in a row to a comma separated string
($Row | select -ExpandProperty Value) -join ','
} | ConvertFrom-Csv
Refer to my answer here for more options to protect Excel files.
