I'm using OleDb to read from an excel workbook with many sheets.
I need to read the sheet names, but I need them in the order they are defined in the spreadsheet; so If I have a file that looks like this;
|_____|_____|____|____|____|____|____|____|____|
|_____|_____|____|____|____|____|____|____|____|
|_____|_____|____|____|____|____|____|____|____|
\__GERMANY__/\__UK__/\__IRELAND__/
Then I need to get the dictionary
1="GERMANY",
2="UK",
3="IRELAND"
I've tried using OleDbConnection.GetOleDbSchemaTable(), and that gives me the list of names, but it alphabetically sorts them. The alpha-sort means I don't know which sheet number a particular name corresponds to. So I get;
GERMANY, IRELAND, UK
which has changed the order of UK and IRELAND.
The reason I need it to be sorted is that I have to let the user choose a range of data by name or index; they can ask for 'all the data from GERMANY to IRELAND' or 'data from sheet 1 to sheet 3'.
Any ideas would be greatly appreciated.
if I could use the office interop classes, this would be straightforward. Unfortunately, I can't because the interop classes don't work reliably in non-interactive environments such as windows services and ASP.NET sites, so I needed to use OLEDB.
Can you not just loop through the sheets from 0 to Count of names -1? that way you should get them in the correct order.
Edit
I noticed through the comments that there are a lot of concerns about using the Interop classes to retrieve the sheet names. Therefore here is an example using OLEDB to retrieve them:
/// <summary>
/// This method retrieves the excel sheet names from
/// an excel workbook.
/// </summary>
/// <param name="excelFile">The excel file.</param>
/// <returns>String[]</returns>
private String[] GetExcelSheetNames(string excelFile)
{
OleDbConnection objConn = null;
System.Data.DataTable dt = null;
try
{
// Connection String. Change the excel file to the file you
// will search.
String connString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + excelFile + ";Extended Properties=Excel 8.0;";
// Create connection object by using the preceding connection string.
objConn = new OleDbConnection(connString);
// Open connection with the database.
objConn.Open();
// Get the data table containg the schema guid.
dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if(dt == null)
{
return null;
}
String[] excelSheets = new String[dt.Rows.Count];
int i = 0;
// Add the sheet name to the string array.
foreach(DataRow row in dt.Rows)
{
excelSheets[i] = row["TABLE_NAME"].ToString();
i++;
}
// Loop through all of the sheets if you want too...
for(int j=0; j < excelSheets.Length; j++)
{
// Query each excel sheet.
}
return excelSheets;
}
catch(Exception ex)
{
return null;
}
finally
{
// Clean up.
if(objConn != null)
{
objConn.Close();
objConn.Dispose();
}
if(dt != null)
{
dt.Dispose();
}
}
}
Extracted from Article on the CodeProject.
Since above code do not cover procedures for extracting list of sheet name for Excel 2007,following code will be applicable for both Excel(97-2003) and Excel 2007 too:
public List<string> ListSheetInExcel(string filePath)
{
OleDbConnectionStringBuilder sbConnection = new OleDbConnectionStringBuilder();
String strExtendedProperties = String.Empty;
sbConnection.DataSource = filePath;
if (Path.GetExtension(filePath).Equals(".xls"))//for 97-03 Excel file
{
sbConnection.Provider = "Microsoft.Jet.OLEDB.4.0";
strExtendedProperties = "Excel 8.0;HDR=Yes;IMEX=1";//HDR=ColumnHeader,IMEX=InterMixed
}
else if (Path.GetExtension(filePath).Equals(".xlsx")) //for 2007 Excel file
{
sbConnection.Provider = "Microsoft.ACE.OLEDB.12.0";
strExtendedProperties = "Excel 12.0;HDR=Yes;IMEX=1";
}
sbConnection.Add("Extended Properties",strExtendedProperties);
List<string> listSheet = new List<string>();
using (OleDbConnection conn = new OleDbConnection(sbConnection.ToString()))
{
conn.Open();
DataTable dtSheet = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drSheet in dtSheet.Rows)
{
if (drSheet["TABLE_NAME"].ToString().Contains("$"))//checks whether row contains '_xlnm#_FilterDatabase' or sheet name(i.e. sheet name always ends with $ sign)
{
listSheet.Add(drSheet["TABLE_NAME"].ToString());
}
}
}
return listSheet;
}
Above function returns list of sheet in particular excel file for both excel type(97,2003,2007).
Can't find this in actual MSDN documentation, but a moderator in the forums said
I am afraid that OLEDB does not preserve the sheet order as they were in Excel
Excel Sheet Names in Sheet Order
Seems like this would be a common enough requirement that there would be a decent workaround.
This is short, fast, safe, and usable...
public static List<string> ToExcelsSheetList(string excelFilePath)
{
List<string> sheets = new List<string>();
using (OleDbConnection connection =
new OleDbConnection((excelFilePath.TrimEnd().ToLower().EndsWith("x"))
? "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + excelFilePath + "';" + "Extended Properties='Excel 12.0 Xml;HDR=YES;'"
: "provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + excelFilePath + "';Extended Properties=Excel 8.0;"))
{
connection.Open();
DataTable dt = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drSheet in dt.Rows)
if (drSheet["TABLE_NAME"].ToString().Contains("$"))
{
string s = drSheet["TABLE_NAME"].ToString();
sheets.Add(s.StartsWith("'")?s.Substring(1, s.Length - 3): s.Substring(0, s.Length - 1));
}
connection.Close();
}
return sheets;
}
Another way:
a xls(x) file is just a collection of *.xml files stored in a *.zip container.
unzip the file "app.xml" in the folder docProps.
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-<Properties xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
<TotalTime>0</TotalTime>
<Application>Microsoft Excel</Application>
<DocSecurity>0</DocSecurity>
<ScaleCrop>false</ScaleCrop>
-<HeadingPairs>
-<vt:vector baseType="variant" size="2">
-<vt:variant>
<vt:lpstr>Arbeitsblätter</vt:lpstr>
</vt:variant>
-<vt:variant>
<vt:i4>4</vt:i4>
</vt:variant>
</vt:vector>
</HeadingPairs>
-<TitlesOfParts>
-<vt:vector baseType="lpstr" size="4">
<vt:lpstr>Tabelle3</vt:lpstr>
<vt:lpstr>Tabelle4</vt:lpstr>
<vt:lpstr>Tabelle1</vt:lpstr>
<vt:lpstr>Tabelle2</vt:lpstr>
</vt:vector>
</TitlesOfParts>
<Company/>
<LinksUpToDate>false</LinksUpToDate>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0300</AppVersion>
</Properties>
The file is a german file (Arbeitsblätter = worksheets).
The table names (Tabelle3 etc) are in the correct order. You just need to read these tags;)
regards
I have created the below function using the information provided in the answer from #kraeppy (https://stackoverflow.com/a/19930386/2617732). This requires the .net framework v4.5 to be used and requires a reference to System.IO.Compression. This only works for xlsx files and not for the older xls files.
using System.IO.Compression;
using System.Xml;
using System.Xml.Linq;
static IEnumerable<string> GetWorksheetNamesOrdered(string fileName)
{
//open the excel file
using (FileStream data = new FileStream(fileName, FileMode.Open))
{
//unzip
ZipArchive archive = new ZipArchive(data);
//select the correct file from the archive
ZipArchiveEntry appxmlFile = archive.Entries.SingleOrDefault(e => e.FullName == "docProps/app.xml");
//read the xml
XDocument xdoc = XDocument.Load(appxmlFile.Open());
//find the titles element
XElement titlesElement = xdoc.Descendants().Where(e => e.Name.LocalName == "TitlesOfParts").Single();
//extract the worksheet names
return titlesElement
.Elements().Where(e => e.Name.LocalName == "vector").Single()
.Elements().Where(e => e.Name.LocalName == "lpstr")
.Select(e => e.Value);
}
}
I like the idea of #deathApril to name the sheets as 1_Germany, 2_UK, 3_IRELAND. I also got your issue to do this rename for hundreds of sheets. If you don't have a problem to rename the sheet name then you can use this macro to do it for you. It will take less than seconds to rename all sheet names. unfortunately ODBC, OLEDB return the sheet name order by asc. There is no replacement for that. You have to either use COM or rename your name to be in the order.
Sub Macro1()
'
' Macro1 Macro
'
'
Dim i As Integer
For i = 1 To Sheets.Count
Dim prefix As String
prefix = i
If Len(prefix) < 4 Then
prefix = "000"
ElseIf Len(prefix) < 3 Then
prefix = "00"
ElseIf Len(prefix) < 2 Then
prefix = "0"
End If
Dim sheetName As String
sheetName = Sheets(i).Name
Dim names
names = Split(sheetName, "-")
If (UBound(names) > 0) And IsNumeric(names(0)) Then
'do nothing
Else
Sheets(i).Name = prefix & i & "-" & Sheets(i).Name
End If
Next
End Sub
UPDATE:
After reading #SidHoland comment regarding BIFF an idea flashed. The following steps can be done through code. Don't know if you really want to do that to get the sheet names in the same order. Let me know if you need help to do this through code.
1. Consider XLSX as a zip file. Rename *.xlsx into *.zip
2. Unzip
3. Go to unzipped folder root and open /docprops/app.xml
4. This xml contains the sheet name in the same order of what you see.
5. Parse the xml and get the sheet names
UPDATE:
Another solution - NPOI might be helpful here
http://npoi.codeplex.com/
FileStream file = new FileStream(#"yourexcelfilename", FileMode.Open, FileAccess.Read);
HSSFWorkbook hssfworkbook = new HSSFWorkbook(file);
for (int i = 0; i < hssfworkbook.NumberOfSheets; i++)
{
Console.WriteLine(hssfworkbook.GetSheetName(i));
}
file.Close();
This solution works for xls. I didn't try xlsx.
Thanks,
Esen
This worked for me. Stolen from here: How do you get the name of the first page of an excel workbook?
object opt = System.Reflection.Missing.Value;
Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
Excel.Workbook workbook = app.Workbooks.Open(WorkBookToOpen,
opt, opt, opt, opt, opt, opt, opt,
opt, opt, opt, opt, opt, opt, opt);
Excel.Worksheet worksheet = workbook.Worksheets[1] as Microsoft.Office.Interop.Excel.Worksheet;
string firstSheetName = worksheet.Name;
Try this. Here is the code to get the sheet names in order.
private Dictionary<int, string> GetExcelSheetNames(string fileName)
{
Excel.Application _excel = null;
Excel.Workbook _workBook = null;
Dictionary<int, string> excelSheets = new Dictionary<int, string>();
try
{
object missing = Type.Missing;
object readOnly = true;
Excel.XlFileFormat.xlWorkbookNormal
_excel = new Excel.ApplicationClass();
_excel.Visible = false;
_workBook = _excel.Workbooks.Open(fileName, 0, readOnly, 5, missing,
missing, true, Excel.XlPlatform.xlWindows, "\\t", false, false, 0, true, true, missing);
if (_workBook != null)
{
int index = 0;
foreach (Excel.Worksheet sheet in _workBook.Sheets)
{
// Can get sheet names in order they are in workbook
excelSheets.Add(++index, sheet.Name);
}
}
}
catch (Exception e)
{
return null;
}
finally
{
if (_excel != null)
{
if (_workBook != null)
_workBook.Close(false, Type.Missing, Type.Missing);
_excel.Application.Quit();
}
_excel = null;
_workBook = null;
}
return excelSheets;
}
As per MSDN, In a case of spreadsheets inside of Excel it might not work because Excel files are not real databases. So you will be not able to get the sheets name in order of their visualization in workbook.
Code to get sheets name as per their visual appearance using interop:
Add reference to Microsoft Excel 12.0 Object Library.
Following code will give the sheets name in the actual order stored in workbook, not the sorted name.
Sample Code:
using Microsoft.Office.Interop.Excel;
string filename = "C:\\romil.xlsx";
object missing = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook wb =excel.Workbooks.Open(filename, missing, missing, missing, missing,missing, missing, missing, missing, missing, missing, missing, missing, missing, missing);
ArrayList sheetname = new ArrayList();
foreach (Microsoft.Office.Interop.Excel.Worksheet sheet in wb.Sheets)
{
sheetname.Add(sheet.Name);
}
I don't see any documentation that says the order in app.xml is guaranteed to be the order of the sheets. It PROBABLY is, but not according to the OOXML specification.
The workbook.xml file, on the other hand, includes the sheetId attribute, which does determine the sequence - from 1 to the number of sheets. This is according to the OOXML specification. workbook.xml is described as the place where the sequence of the sheets is kept.
So reading workbook.xml after it is extracted form the XLSX would be my recommendation. NOT app.xml. Instead of docProps/app.xml, use xl/workbook.xml and look at the element, as shown here -
`
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<fileVersion appName="xl" lastEdited="5" lowestEdited="5" rupBuild="9303" />
<workbookPr defaultThemeVersion="124226" />
- <bookViews>
<workbookView xWindow="120" yWindow="135" windowWidth="19035" windowHeight="8445" />
</bookViews>
- <sheets>
<sheet name="By song" sheetId="1" r:id="rId1" />
<sheet name="By actors" sheetId="2" r:id="rId2" />
<sheet name="By pit" sheetId="3" r:id="rId3" />
</sheets>
- <definedNames>
<definedName name="_xlnm._FilterDatabase" localSheetId="0" hidden="1">'By song'!$A$1:$O$59</definedName>
</definedNames>
<calcPr calcId="145621" />
</workbook>
`
I'm using an OleDbConnection to create an Excel file:
String bewegungenDateiname = System.IO.Path.ChangeExtension(System.IO.Path.GetTempFileName(), ".xls");
string strConnectionString = #"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ System.IO.Path.GetDirectoryName(bewegungenDateiname) + #"\" + System.IO.Path.GetFileName(bewegungenDateiname)
+ #";Extended Properties='Excel 8.0;HDR=YES'";
using (System.Data.OleDb.OleDbConnection objConn = new System.Data.OleDb.OleDbConnection(strConnectionString))
using (System.Data.OleDb.OleDbCommand cmd = new System.Data.OleDb.OleDbCommand("", objConn))
{
objConn.Open();
cmd.CommandText = "CREATE TABLE [Test] ([MyDecimal] DECIMAL NULL)";
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
Decimal value = 12.34m;
cmd.Parameters.AddWithValue("#P01", value);
cmd.CommandText = "INSERT INTO [Test$] ([MyDecimal]) VALUES (#P01)";
cmd.ExecuteNonQuery();
}
System.Diagnostics.Process.Start(bewegungenDateiname);
Now when Excel 2013 opens the Excel file it will Show:
MyDecimal
1234
So in my case Excel is losing the dot. Now I'm running a german Version of Windows/Office and if I use the following line to add the Parameter it will work:
cmd.Parameters.AddWithValue("#P01", value.ToString());
German localization of numbers uses a colon instead of the dot to separate the fractions from the number value (meaning 12,34 instead of 12.34). So it seems the OleDbConnection uses the wrong culture variant to write the Excel file?
I fear my Version might break with a different Version of Excel or a different locale - is there a way to fix this and get decimal values to Excel without such risks?
I would use some other way to create Excel files, if it is without this flaw.
With Excel 2013 try using the following:
strConnectionString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0;HDR=No;IMEX=1""", _filePath)
I have no idea if it will solve the problem.
This is my first attempt to read an Excel 2007 file via ADO.net, and I must be missing something b/c when I try to run the query, I get an exception. When I started looking, it's b/c the table (worksheet) isn't there. Can someone please tell me what I'm doing wrong?
Here is my code:
string cs = #"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=My File.xlsx;Extended Properties=""Excel 12.0;IMEX=1;""";
using (OleDbConnection con = new OleDbConnection(cs))
{
con.Open();
string query = "SELECT * FROM [Sheet1$]";
OleDbCommand cmd = new OleDbCommand(query, con);
OleDbDataAdapter adapter = new OleDbDataAdapter(cmd);
DataTable dt = new DataTable();
DataTable worksheets = con.GetSchema("Tables");
adapter.Fill(dt);
.
.
.
}
Take a look at the accepted answer here
The First Column of the excel file to put in string variable C#?
It works for Excel 2003 but I think it could easily be adapted to work with 2007.
I'm working on a SharePoint workflow, and the first step requires me to open an Excel workbook and read two things: a range of categories (from a range named, conveniently enough, Categories) and a category index (in the named range CategoryIndex). Categories is a list of roughly 100 cells, and CategoryIndex is a single cell.
I'm using ADO.NET to query the workbook
string connectionString =
"Provider=Microsoft.ACE.OLEDB.12.0;" +
"Data Source=" + temporaryFileName + ";" +
"Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";
OleDbConnection connection = new OleDbConnection(connectionString);
connection.Open();
OleDbCommand categoryIndexCommand = new OleDbCommand();
categoryIndexCommand.Connection = connection;
categoryIndexCommand.CommandText = "Select * From CategoryIndex";
OleDbDataReader indexReader = categoryIndexCommand.ExecuteReader();
if (!indexReader.Read())
throw new Exception("No category selected.");
object indexValue = indexReader[0];
int categoryIndex;
if (!int.TryParse(indexValue.ToString(), out categoryIndex))
throw new Exception("Invalid category manager selected");
OleDbCommand selectCommand = new OleDbCommand();
selectCommand.Connection = connection;
selectCommand.CommandText = "SELECT * FROM Categories";
OleDbDataReader reader = selectCommand.ExecuteReader();
if (!reader.HasRows || categoryIndex >= reader.RecordsAffected)
throw new Exception("Invalid category/category manager selected.");
connection.Close();
Don't judge the code itself too harshly; it's been through a lot. Anyway, the first command never executes correctly. It doesn't throw an exception. It just returns an empty data set. (HasRows is true, and Read() returns false, but there is no data there) The second command works perfectly. These are both named ranges.
They are populated differently, however. There's a web service call that fills up Categories. Those values are displayed in a dropdown box. The selected index goes into CategoryIndex. After hours of banging my head, I decided to write a couple of lines of code so that the dropdown's value goes into a different cell, then I copy the value using a couple of lines of C# into CategoryIndex, so that the data is set identically. That turned out to be a blind alley, too.
Am I missing something? Why would one query work perfectly and the other fail to return any data?
I have found the issue. Excel was apparently unable to parse the value in the cell, so it was returning nothing. What I had to do was adjust the connection string to the following:
string connectionString =
"Provider=Microsoft.ACE.OLEDB.12.0;" +
"Data Source=" + temporaryFileName + ";" +
"Extended Properties=\"Excel 12.0 Xml;HDR=NO;IMEX=1\"";
It would have been helpful if it would have thrown an exception or given any indication of why it was failing, but that's beside the point now. The option IMEX=1 tells Excel to treat all values as strings only. I'm quite capable of parsing my own integers, thankyouverymuch, Excel, so I didn't need its assistance.
I have one problem. I need to get the excel sheet name in a work book, which looks on the very left sheets tab -the first one from my point of view.
I am using this code:
public static string GetFirstExcelSheetName(OleDbConnection connToExcel)
{
DataTable dtSheetName =
connToExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
List<String> lstExcelSheet = new List<string>(dtSheetName.Rows.Count);
foreach (DataRow row in dtSheetName.Rows)
lstExcelSheet.Add(row["TABLE_NAME"].ToString());
return lstExcelSheet[0];
}
The problem here is it is returning the rows not in the visual tab order but in a very different order - most probably the row created date.
How can it be possible to get the sheetnames table ordered according to their tab order so that I can easily get the 1st excel sheet name?
Thanks,
kalem keki
It should be the zero-th item in the workbooks(?) collection.
I think you have the right index, wrong collection.
Sorry, didn't notice you're using the rows collection of a datatable.
That's a different problem.
How do you create the datatable?
You might have to change the sort property of the dataview.
Dim dtSheetnames As DataTable = oleDBExcelConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, New Object() {Nothing, Nothing, Nothing, "TABLE"})
Dim FirstSheetName As String = dtSheetnames.Rows(0)!TABLE_NAME.ToString
The row 0 is not the first sheet in the excel file, rows are sorted by alphabetical order in this collection :/
I recommend using the NPOI library (http://npoi.codeplex.com/) rather than OleDB to retrieve data (including metadata) from Excel.
IIRC, OleDB will also fail for sheet names that include spaces or dollar signs.
OleDbConnection oconn = new OleDbConnection(#"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + Session["path"].ToString() + "; Extended Properties=Excel 12.0;Persist Security Info=False;");
oconn.Open();
myCommand.Connection = oconn;
DataTable dbSchema = oconn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dbSchema == null || dbSchema.Rows.Count < 1)
{
throw new Exception("Error: Could not determine the name of the first worksheet.");
}
string firstSheetName = dbSchema.Rows[0]["TABLE_NAME"].ToString();