The following PowerShell snippet lists all worksheets and named ranges in an Excel workbook via OleDbConnection.GetOleDbSchemaTable():
$file = "C:\Users\zippy\Documents\Foo.xlsx";
$cnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$($file)`";Extended Properties=`"Excel 12.0 Xml;HDR=YES`";";
$cn = New-Object System.Data.OleDb.OleDbConnection $cnStr;
$cn.Open();
# to list the sheets
$worksheets = $cn.GetOleDbSchemaTable([System.Data.OleDb.OleDbSchemaGuid]::Tables,$null);
$cn.Close();
$cn.Dispose();
$worksheets | Format-List;
This will not, however, list tables (called lists in Excel 2003) or a named range that refers to a list.
If I pass an OleDbSchemaGuid of Procedures or Views, I get a MethodInvocationException with the message "Operation is not supported for this type of object."
Is it possible to list the tables by tweaking the connection string or the restrictions parameter?
Try this simple source:
using (var connection = (OleDbConnection)GetConnection())
{
connection.Open();
var dt = connection.GetSchema("TABLES");
var list = dt.Select().Select(r => r["TABLE_NAME"].ToString()).ToList(); // project each row to its table name (needs System.Linq)
//TODO:
}
Background
I am writing a PowerShell script to write some data to an Excel file (.xlsx) with Microsoft.ACE.OLEDB like this:
$fileName = "C:\tmp\createtest.xlsx"
$sheetName = "record"
$provider = "Provider=Microsoft.ACE.OLEDB.12.0"
$dataSource = "Data Source = $fileName"
$extend = "Extended Properties=Excel 12.0"
$ddlSQL = "CREATE TABLE [$sheetName] (ID CHAR(4), NAME VARCHAR(20))"
$conn = New-Object System.Data.OleDb.OleDbConnection("$provider;$dataSource;$extend")
$sqlCommand = New-Object System.Data.OleDb.OleDbCommand
$sqlCommand.Connection = $conn
$conn.open()
$sqlCommand.CommandText = $ddlSQL
$sqlCommand.ExecuteNonQuery()
...
$conn.close()
Problem
If C:\tmp\createtest.xlsx already exists and contains a manually created, empty sheet named record, the CREATE TABLE statement silently creates a new sheet named record1.
I want to stop this behavior and have the CREATE TABLE statement throw an exception, as an ordinary RDBMS would.
Question
Is there any way to stop OLEDB from automatically creating the (sheet name)1 sheet when the Excel file already has a sheet with the same name?
I found a solution. Execute CREATE TABLE with a $ suffix on the table name:
$ddlSQL = "CREATE TABLE [${sheetName}$] (ID CHAR(4), NAME VARCHAR(20))"
If the book and the sheet record already exist, this statement finishes without error. If the book or the sheet record doesn't exist, the statement throws an OleDbException. Either way, no new sheet is created, so you can safely test for the existence of the sheet.
If you want to use the sheet record whether or not it already exists, and avoid creating a record1 sheet automatically, you can do it like this:
$checkExistenceSQL = "CREATE TABLE [${sheetName}$] (ID CHAR(4), NAME VARCHAR(20))"
$ddlSQL = "CREATE TABLE [$sheetName] (ID CHAR(4), NAME VARCHAR(20))"
$conn.open()
try {
    try {
        # Check existing sheet and open if it exists
        $sqlCommand.CommandText = $checkExistenceSQL
        $sqlCommand.ExecuteNonQuery() > $null
    } catch {
        try {
            # Create new sheet if it doesn't exist
            $sqlCommand.CommandText = $ddlSQL
            $sqlCommand.ExecuteNonQuery() > $null
        } catch {
            throw $PSItem
        }
    }
    $insertSQL = "INSERT INTO [${sheetName}$] VALUES (...)"
    $sqlCommand.CommandText = $insertSQL
    $sqlCommand.ExecuteNonQuery() > $null
} finally {
    $conn.close()
}
Note that you have to execute CREATE TABLE first even if the sheet already exists, because it determines whether the data type constraints take effect. You also have to suffix the table name with $ in the INSERT statement; without the suffix, the INSERT fails if the sheet record already exists. See https://satob.hatenablog.com/entry/2021/11/24/003818 and https://satob.hatenablog.com/entry/2021/11/25/012835 for details.
I'm trying to figure out why the behavior I'm seeing and the "documented" behavior are different. I've read both of these articles: Read and Write Excel Documents Using OLEDB and Working with MS Excel(xls / xlsx) Using MDAC and Oledb.
The second article says:
To Retrieve Schema Information of Excel Workbook :
You can get the worksheets that are present in the excel workbook using GetOleDbSchemaTable. Use the following snippet.
DataTable dtSchema = null;
dtSchema = conObj.GetOleDbSchemaTable(
OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
Here dtSchema will hold the list of all workbooks. Say we have two workbooks : wb1, wb2. The above code will return a list of wb1, wb1$,wb2,wb2$. We need to filter out $ elements.
However, when I run this code I only get "wb1$" and "wb2$". I can easily remove the $ in code, but I want to be sure the code won't break when I run it on a different computer, OS, or environment where it behaves as documented. Can somebody tell me whether something has changed since these articles were written, or whether I'm missing some key piece? Note that this is being developed in VS2015 on Windows 7 Pro with Office 2010 installed.
// Connection string
// string connstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties='Excel 8.0;HDR=NO;IMEX=1';"; // For Office 2007 and later versions. Extra blank spaces are not allowed, and the semicolons need care.
// string connstring = "Provider = Microsoft.JET.OLEDB.4.0; Data Source = " + path + "; Extended Properties = 'Excel 8.0;HDR=NO;IMEX=1'; "; // Appropriate for Office 2007 and older versions. Pick the connection string that matches the Office version or the program.
using (OleDbConnection conn = new OleDbConnection(_connectionString))
{
    conn.Open();
    // DataTable sheetNames = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" }); // Get all sheet names
    DataTable sheetNames = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null); // Get all sheet names
    // Loop through all sheets and print their names
    foreach (DataRow dr in sheetNames.Rows)
    {
        string sheetName = dr["TABLE_NAME"].ToString();
        // if (!sheetName.EndsWith("$"))
        //     continue;
        Debug.Print(sheetName);
    }
    return sheetNames;
}
Thanks
dbl
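One way to make the calling code indifferent to whether the provider returns both wb1 and wb1$ or only wb1$ is to normalize the TABLE_NAME values before using them. The sketch below is written in PowerShell and is only an illustration: Get-WorksheetName is a hypothetical helper, the sample names are made up, and it assumes that worksheet entries end in $ (optionally wrapped in single quotes) while named ranges and helper ranges do not.

```powershell
# Hypothetical helper: reduce raw TABLE_NAME values to plain worksheet names.
function Get-WorksheetName {
    param([string[]]$TableName)

    foreach ($name in $TableName) {
        # Assumption: names with spaces or punctuation come back wrapped in single quotes.
        $trimmed = $name.Trim("'")
        # Assumption: worksheet entries end in '$'; anything else (named ranges,
        # helper ranges such as 'wb1$_xlnm.Print_Area') is skipped.
        if ($trimmed.EndsWith('$')) {
            $trimmed.Substring(0, $trimmed.Length - 1)
        }
    }
}

# Made-up sample values covering both reported behaviors.
$raw = 'wb1', 'wb1$', "'My Sheet$'", 'SomeNamedRange', 'wb2$', 'wb1$_xlnm.Print_Area'
$sheets = Get-WorksheetName -TableName $raw
$sheets
```

Because entries without the $ suffix are dropped rather than relied upon, the result should be the same whichever of the two behaviors a given machine shows.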
Is there a way to convert .xls to .csv using PowerShell without Excel being installed?
I don't have access to Excel on a particular machine so I get an error when I try:
New-Object -ComObject excel.application
New-Object : Retrieving the COM class factory for component with CLSID
{00000000-0000-0000-0000-000000000000} failed due to the following
error: 80040154 Class not registered (Exception from HRESULT:
0x80040154 (REGDB_E_CLASSNOTREG)).
Foreword
Depending on what you already have installed on your system you might need the Microsoft Access Database Engine 2010 Redistributable for this solution to work. That will give you access to the provider: "Microsoft.ACE.OLEDB.12.0"
Disclaimer: Not super impressed with the result and someone with more background could make this answer better but here it goes.
Code
$strFileName = "C:\temp\Book1.xls"
$strSheetName = 'Sheet1$'
$strProvider = "Provider=Microsoft.ACE.OLEDB.12.0"
$strDataSource = "Data Source = $strFileName"
$strExtend = "Extended Properties='Excel 8.0;HDR=Yes;IMEX=1';"
$strQuery = "Select * from [$strSheetName]"
$objConn = New-Object System.Data.OleDb.OleDbConnection("$strProvider;$strDataSource;$strExtend")
$sqlCommand = New-Object System.Data.OleDb.OleDbCommand($strQuery)
$sqlCommand.Connection = $objConn
$objConn.open()
$da = New-Object system.Data.OleDb.OleDbDataAdapter($sqlCommand)
$dt = New-Object system.Data.datatable
[void]$da.fill($dt)
# No data reader was created above; the adapter handles reading, so only the connection is closed
$objConn.close()
$dt
This creates an OLE DB connection to the Excel file $strFileName. You need to know your sheet name and put it in $strSheetName, which is used to build $strQuery. We then use several objects to open the connection and extract the data from the sheet as a System.Data.DataTable. In my test file, with one populated sheet, I had two columns of data. After running the code, the output of $dt is:
letter number
------ ------
a 2
d 34
b 0
e 4
You could then take that table and export it with Export-Csv:
$dt | Export-Csv c:\temp\data.csv -NoTypeInformation
This was built based on information gathered from:
Scripting Guy
PowerShell Code Repository
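A small refinement to the export step, sketched under assumptions: DataRow objects can carry bookkeeping properties (such as RowError and RowState) into the CSV, so it may help to select only the table's own columns first. The in-memory table below stands in for the one filled from Excel, and ConvertTo-Csv is used instead of Export-Csv only so the result can be inspected without writing a file.

```powershell
# Build a small in-memory DataTable standing in for the one filled from the sheet.
$dt = New-Object System.Data.DataTable
[void]$dt.Columns.Add('letter', [string])
[void]$dt.Columns.Add('number', [int])
[void]$dt.Rows.Add('a', 2)
[void]$dt.Rows.Add('d', 34)

# Collect the column names defined on the table itself.
$columns = $dt.Columns | ForEach-Object { $_.ColumnName }

# Keep only those columns so DataRow bookkeeping properties are left out of the CSV.
$csv = $dt.Rows | Select-Object -Property $columns | ConvertTo-Csv -NoTypeInformation
$csv
```

To write a file instead, the last pipeline stage can be swapped for Export-Csv with a path and -NoTypeInformation, as in the answer above.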
I wonder if there is any way to speed up reading an Excel file with PowerShell. Many would say I should stop using the Do...Until loop, but I really need it, because my Excel sheet can have anywhere from 2 to 5000 rows. I understand that 5000 rows take some time, but 2 rows shouldn't need 90+ seconds.
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Excel.DisplayAlerts = $false
$Path = EXCELFILEPATH
$Workbook = $Excel.Workbooks.open($Path)
$Sheet1 = $Workbook.Worksheets.Item(test)
$URows = @()
Do {$URows += $Sheet1.Cells.Item($Row,1).Text; $row = $row + [int] 1} until (!$Sheet1.Cells.Item($Row,1).Text)
$URows | foreach {
$MyParms = @{};
$SetParms = @{};
And I have this about 30 times in the script too:
If ($Sheet1.Cells.Item($Row,2).Text){$var1 = $Sheet1.Cells.Item($Row,2).Text
$MyParms.Add("PAR1",$var1)
$SetParms.Add("PAR1",$var1)}
}
I had the idea of running the $MyParms parts in parallel, but I have no idea how. Any suggestions?
Or
Increase the speed of reading, but I have no clue how to achieve that without destroying the "read until nothing is there".
Or
The speed is normal and I shouldn't complain.
Don't use Excel.Application in the first place if you need speed. You can use an Excel spreadsheet as an ODBC data source: the file is analogous to a database, and each worksheet to a table. The speed difference is immense. Here's an intro on using Excel spreadsheets without Excel.
Appending to an array with the += operator is terribly slow, because it will copy all elements from the existing array to a new array. Use something like this instead:
$URows = for ($row = 1; $Sheet1.Cells.Item($row, 1).Text; $row++) {
    # The loop runs while column 1 still contains text
    if ($Sheet1.Cells.Item($row, 2).Text) {
        $MyParms['PAR1'] = $Sheet1.Cells.Item($row, 2).Text
        $SetParms['PAR1'] = $Sheet1.Cells.Item($row, 2).Text
    }
    # Emit the cell text; PowerShell collects the loop output into $URows
    $Sheet1.Cells.Item($row, 1).Text
}
Your Do loop is basically a counting loop. The canonical form for such loops is
for (init counter; condition; increment counter) {
...
}
so I changed the loop accordingly. Of course you'd achieve the same result like this:
$row = 1
$URows = Do {
...
$row += 1
} until (!$Sheet1.Cells.Item($row, 1).Text)
but that would just mean more code without any benefits. This modification doesn't have any performance impact, though.
Relevant in terms of performance are the other two changes:
I moved the code filling the hashtables inside the first loop, so the code won't loop twice over the data. Using index and assignment operators instead of the Add method for assigning values to the hashtable prevents the code from raising an error when a key already exists in the hashtable.
Instead of appending to an array (which has the performance impact mentioned above), the code now simply echoes the cell text in the loop, and PowerShell automatically collects that output into an array. The array is then assigned to the variable $URows.
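The difference between the patterns can be tried without Excel. The following is a minimal sketch with made-up data (2000 generated strings in place of cell values); the generic list is an extra option not used in the answer above, shown for cases where items must be added from several places.

```powershell
# Pattern 1: += builds a new array and copies all existing elements on every pass.
$slow = @()
foreach ($i in 1..2000) { $slow += "row$i" }

# Pattern 2: emit each value and let PowerShell collect the loop output once.
$fast = foreach ($i in 1..2000) { "row$i" }

# Pattern 3: a generic list grows in place, so Add() does not copy the whole collection.
$list = [System.Collections.Generic.List[string]]::new()
foreach ($i in 1..2000) { $list.Add("row$i") }

"{0} / {1} / {2} items" -f $slow.Count, $fast.Count, $list.Count
```

All three end up with the same items; wrapping each loop in Measure-Command is one way to compare timings on a given machine.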
I'm trying to use PowerShell and SharePoint 2013 CSOM to copy the attachments of one item to a new item in another list. I've been able to generate an attachments folder for the new item, so in theory all I need to do is move the files from the old attachments folder to the new one. CopyTo and MoveTo only seem to work for moving files within a list, so I thought I would use OpenBinaryDirect and SaveBinaryDirect with the site context. However, in PowerShell, calling either of these methods results in the following error: Method invocation failed because [System.RuntimeType] doesn't contain a method named 'OpenBinaryDirect'.
$attachments = $item.AttachmentFiles
if($attachments.Count -gt 0)
{
#Creates a temporary attachment for the new item to generate a folder, will be deleted later.
$attCI = New-Object Microsoft.SharePoint.Client.AttachmentCreationInformation
$attCI.FileName = "TempAttach"
$enc = New-Object System.Text.ASCIIEncoding
$buffer = [byte[]] $enc.GetBytes("Temp attachment contents")
$memStream = New-Object System.IO.MemoryStream (,$buffer)
$attCI.contentStream = $memStream
$newItem.AttachmentFiles.Add($attCI)
$ctx.load($newItem)
$sourceIN = $sourceList.Title
$archIN = $archList.Title
$sourcePath = "/" + "Lists/$sourceIN/Attachments/" + $item.Id
$archPath = "/" + "Lists/$archIN/Attachments/" + $newItem.Id
$sFolder = $web.GetFolderByServerRelativeUrl($sourcePath)
$aFolder = $web.GetFolderByServerRelativeURL($archPath)
$ctx.load($sFolder)
$ctx.load($aFolder)
$ctx.ExecuteQuery()
$sFiles = $sFolder.Files
$aFiles = $aFolder.Files
$ctx.load($sFiles)
$ctx.load($aFiles)
$ctx.ExecuteQuery()
foreach($file in $sFiles)
{
$fileInfo = [Microsoft.SharePoint.Client.File].OpenBinaryDirect($ctx, $file.ServerRelativeUrl)
[Microsoft.Sharepoint.Client.File].SaveBinaryDirect($ctx, $archPath, $fileInfo.Stream, $true)
}
}
$ctx.ExecuteQuery()
Any help on either getting the BinaryDirect methods to work or just a generalized strategy for copying attachments across lists using powershell + CSOM would be greatly appreciated.
You have the wrong syntax for invoking a static method. You want [Microsoft.SharePoint.Client.File]::OpenBinaryDirect( ... )
Note the double-colon syntax :: between the type name and the method name. The same applies to the SaveBinaryDirect invocation.
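The same distinction can be seen with a built-in .NET type, which needs no SharePoint assemblies. This is only an illustration using [Math]; the variable names are arbitrary.

```powershell
# Static members are reached with '::' on the type literal.
$max = [Math]::Max(3, 7)

# With '.', the call is made on the System.RuntimeType object that describes the type.
# That object has no Max method, so the call fails in the same way as in the question.
$failure = $null
try {
    [Math].Max(3, 7)
} catch {
    $failure = $_.Exception.Message
}

$max
$failure
```

Here $max receives the result of the static call, while $failure holds the "Method invocation failed" message produced by the dot form.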