I'm using OleDb to read from an excel workbook with many sheets.
I need to read the sheet names, but I need them in the order they are defined in the spreadsheet; so If I have a file that looks like this;
|_____|_____|____|____|____|____|____|____|____|
|_____|_____|____|____|____|____|____|____|____|
|_____|_____|____|____|____|____|____|____|____|
\__GERMANY__/\__UK__/\__IRELAND__/
Then I need to get the dictionary
1="GERMANY",
2="UK",
3="IRELAND"
I've tried using OleDbConnection.GetOleDbSchemaTable(), and that gives me the list of names, but it alphabetically sorts them. The alpha-sort means I don't know which sheet number a particular name corresponds to. So I get;
GERMANY, IRELAND, UK
which has changed the order of UK and IRELAND.
The reason I need it to be sorted is that I have to let the user choose a range of data by name or index; they can ask for 'all the data from GERMANY to IRELAND' or 'data from sheet 1 to sheet 3'.
Any ideas would be greatly appreciated.
if I could use the office interop classes, this would be straightforward. Unfortunately, I can't because the interop classes don't work reliably in non-interactive environments such as windows services and ASP.NET sites, so I needed to use OLEDB.
Can you not just loop through the sheets from 0 to Count of names -1? that way you should get them in the correct order.
Edit
I noticed through the comments that there are a lot of concerns about using the Interop classes to retrieve the sheet names. Therefore here is an example using OLEDB to retrieve them:
/// <summary>
/// This method retrieves the excel sheet names from
/// an excel workbook.
/// </summary>
/// <param name="excelFile">The excel file.</param>
/// <returns>String[]</returns>
private String[] GetExcelSheetNames(string excelFile)
{
OleDbConnection objConn = null;
System.Data.DataTable dt = null;
try
{
// Connection String. Change the excel file to the file you
// will search.
String connString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + excelFile + ";Extended Properties=Excel 8.0;";
// Create connection object by using the preceding connection string.
objConn = new OleDbConnection(connString);
// Open connection with the database.
objConn.Open();
// Get the data table containg the schema guid.
dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if(dt == null)
{
return null;
}
String[] excelSheets = new String[dt.Rows.Count];
int i = 0;
// Add the sheet name to the string array.
foreach(DataRow row in dt.Rows)
{
excelSheets[i] = row["TABLE_NAME"].ToString();
i++;
}
// Loop through all of the sheets if you want too...
for(int j=0; j < excelSheets.Length; j++)
{
// Query each excel sheet.
}
return excelSheets;
}
catch(Exception ex)
{
return null;
}
finally
{
// Clean up.
if(objConn != null)
{
objConn.Close();
objConn.Dispose();
}
if(dt != null)
{
dt.Dispose();
}
}
}
Extracted from Article on the CodeProject.
Since above code do not cover procedures for extracting list of sheet name for Excel 2007,following code will be applicable for both Excel(97-2003) and Excel 2007 too:
public List<string> ListSheetInExcel(string filePath)
{
OleDbConnectionStringBuilder sbConnection = new OleDbConnectionStringBuilder();
String strExtendedProperties = String.Empty;
sbConnection.DataSource = filePath;
if (Path.GetExtension(filePath).Equals(".xls"))//for 97-03 Excel file
{
sbConnection.Provider = "Microsoft.Jet.OLEDB.4.0";
strExtendedProperties = "Excel 8.0;HDR=Yes;IMEX=1";//HDR=ColumnHeader,IMEX=InterMixed
}
else if (Path.GetExtension(filePath).Equals(".xlsx")) //for 2007 Excel file
{
sbConnection.Provider = "Microsoft.ACE.OLEDB.12.0";
strExtendedProperties = "Excel 12.0;HDR=Yes;IMEX=1";
}
sbConnection.Add("Extended Properties",strExtendedProperties);
List<string> listSheet = new List<string>();
using (OleDbConnection conn = new OleDbConnection(sbConnection.ToString()))
{
conn.Open();
DataTable dtSheet = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drSheet in dtSheet.Rows)
{
if (drSheet["TABLE_NAME"].ToString().Contains("$"))//checks whether row contains '_xlnm#_FilterDatabase' or sheet name(i.e. sheet name always ends with $ sign)
{
listSheet.Add(drSheet["TABLE_NAME"].ToString());
}
}
}
return listSheet;
}
Above function returns list of sheet in particular excel file for both excel type(97,2003,2007).
Can't find this in actual MSDN documentation, but a moderator in the forums said
I am afraid that OLEDB does not preserve the sheet order as they were in Excel
Excel Sheet Names in Sheet Order
Seems like this would be a common enough requirement that there would be a decent workaround.
This is short, fast, safe, and usable...
public static List<string> ToExcelsSheetList(string excelFilePath)
{
List<string> sheets = new List<string>();
using (OleDbConnection connection =
new OleDbConnection((excelFilePath.TrimEnd().ToLower().EndsWith("x"))
? "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + excelFilePath + "';" + "Extended Properties='Excel 12.0 Xml;HDR=YES;'"
: "provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + excelFilePath + "';Extended Properties=Excel 8.0;"))
{
connection.Open();
DataTable dt = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drSheet in dt.Rows)
if (drSheet["TABLE_NAME"].ToString().Contains("$"))
{
string s = drSheet["TABLE_NAME"].ToString();
sheets.Add(s.StartsWith("'")?s.Substring(1, s.Length - 3): s.Substring(0, s.Length - 1));
}
connection.Close();
}
return sheets;
}
Another way:
a xls(x) file is just a collection of *.xml files stored in a *.zip container.
unzip the file "app.xml" in the folder docProps.
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-<Properties xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
<TotalTime>0</TotalTime>
<Application>Microsoft Excel</Application>
<DocSecurity>0</DocSecurity>
<ScaleCrop>false</ScaleCrop>
-<HeadingPairs>
-<vt:vector baseType="variant" size="2">
-<vt:variant>
<vt:lpstr>Arbeitsblätter</vt:lpstr>
</vt:variant>
-<vt:variant>
<vt:i4>4</vt:i4>
</vt:variant>
</vt:vector>
</HeadingPairs>
-<TitlesOfParts>
-<vt:vector baseType="lpstr" size="4">
<vt:lpstr>Tabelle3</vt:lpstr>
<vt:lpstr>Tabelle4</vt:lpstr>
<vt:lpstr>Tabelle1</vt:lpstr>
<vt:lpstr>Tabelle2</vt:lpstr>
</vt:vector>
</TitlesOfParts>
<Company/>
<LinksUpToDate>false</LinksUpToDate>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0300</AppVersion>
</Properties>
The file is a german file (Arbeitsblätter = worksheets).
The table names (Tabelle3 etc) are in the correct order. You just need to read these tags;)
regards
I have created the below function using the information provided in the answer from #kraeppy (https://stackoverflow.com/a/19930386/2617732). This requires the .net framework v4.5 to be used and requires a reference to System.IO.Compression. This only works for xlsx files and not for the older xls files.
using System.IO.Compression;
using System.Xml;
using System.Xml.Linq;
static IEnumerable<string> GetWorksheetNamesOrdered(string fileName)
{
//open the excel file
using (FileStream data = new FileStream(fileName, FileMode.Open))
{
//unzip
ZipArchive archive = new ZipArchive(data);
//select the correct file from the archive
ZipArchiveEntry appxmlFile = archive.Entries.SingleOrDefault(e => e.FullName == "docProps/app.xml");
//read the xml
XDocument xdoc = XDocument.Load(appxmlFile.Open());
//find the titles element
XElement titlesElement = xdoc.Descendants().Where(e => e.Name.LocalName == "TitlesOfParts").Single();
//extract the worksheet names
return titlesElement
.Elements().Where(e => e.Name.LocalName == "vector").Single()
.Elements().Where(e => e.Name.LocalName == "lpstr")
.Select(e => e.Value);
}
}
I like the idea of #deathApril to name the sheets as 1_Germany, 2_UK, 3_IRELAND. I also got your issue to do this rename for hundreds of sheets. If you don't have a problem to rename the sheet name then you can use this macro to do it for you. It will take less than seconds to rename all sheet names. unfortunately ODBC, OLEDB return the sheet name order by asc. There is no replacement for that. You have to either use COM or rename your name to be in the order.
Sub Macro1()
'
' Macro1 Macro
'
'
Dim i As Integer
For i = 1 To Sheets.Count
Dim prefix As String
prefix = i
If Len(prefix) < 4 Then
prefix = "000"
ElseIf Len(prefix) < 3 Then
prefix = "00"
ElseIf Len(prefix) < 2 Then
prefix = "0"
End If
Dim sheetName As String
sheetName = Sheets(i).Name
Dim names
names = Split(sheetName, "-")
If (UBound(names) > 0) And IsNumeric(names(0)) Then
'do nothing
Else
Sheets(i).Name = prefix & i & "-" & Sheets(i).Name
End If
Next
End Sub
UPDATE:
After reading #SidHoland comment regarding BIFF an idea flashed. The following steps can be done through code. Don't know if you really want to do that to get the sheet names in the same order. Let me know if you need help to do this through code.
1. Consider XLSX as a zip file. Rename *.xlsx into *.zip
2. Unzip
3. Go to unzipped folder root and open /docprops/app.xml
4. This xml contains the sheet name in the same order of what you see.
5. Parse the xml and get the sheet names
UPDATE:
Another solution - NPOI might be helpful here
http://npoi.codeplex.com/
FileStream file = new FileStream(#"yourexcelfilename", FileMode.Open, FileAccess.Read);
HSSFWorkbook hssfworkbook = new HSSFWorkbook(file);
for (int i = 0; i < hssfworkbook.NumberOfSheets; i++)
{
Console.WriteLine(hssfworkbook.GetSheetName(i));
}
file.Close();
This solution works for xls. I didn't try xlsx.
Thanks,
Esen
This worked for me. Stolen from here: How do you get the name of the first page of an excel workbook?
object opt = System.Reflection.Missing.Value;
Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
Excel.Workbook workbook = app.Workbooks.Open(WorkBookToOpen,
opt, opt, opt, opt, opt, opt, opt,
opt, opt, opt, opt, opt, opt, opt);
Excel.Worksheet worksheet = workbook.Worksheets[1] as Microsoft.Office.Interop.Excel.Worksheet;
string firstSheetName = worksheet.Name;
Try this. Here is the code to get the sheet names in order.
private Dictionary<int, string> GetExcelSheetNames(string fileName)
{
Excel.Application _excel = null;
Excel.Workbook _workBook = null;
Dictionary<int, string> excelSheets = new Dictionary<int, string>();
try
{
object missing = Type.Missing;
object readOnly = true;
Excel.XlFileFormat.xlWorkbookNormal
_excel = new Excel.ApplicationClass();
_excel.Visible = false;
_workBook = _excel.Workbooks.Open(fileName, 0, readOnly, 5, missing,
missing, true, Excel.XlPlatform.xlWindows, "\\t", false, false, 0, true, true, missing);
if (_workBook != null)
{
int index = 0;
foreach (Excel.Worksheet sheet in _workBook.Sheets)
{
// Can get sheet names in order they are in workbook
excelSheets.Add(++index, sheet.Name);
}
}
}
catch (Exception e)
{
return null;
}
finally
{
if (_excel != null)
{
if (_workBook != null)
_workBook.Close(false, Type.Missing, Type.Missing);
_excel.Application.Quit();
}
_excel = null;
_workBook = null;
}
return excelSheets;
}
As per MSDN, In a case of spreadsheets inside of Excel it might not work because Excel files are not real databases. So you will be not able to get the sheets name in order of their visualization in workbook.
Code to get sheets name as per their visual appearance using interop:
Add reference to Microsoft Excel 12.0 Object Library.
Following code will give the sheets name in the actual order stored in workbook, not the sorted name.
Sample Code:
using Microsoft.Office.Interop.Excel;
string filename = "C:\\romil.xlsx";
object missing = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook wb =excel.Workbooks.Open(filename, missing, missing, missing, missing,missing, missing, missing, missing, missing, missing, missing, missing, missing, missing);
ArrayList sheetname = new ArrayList();
foreach (Microsoft.Office.Interop.Excel.Worksheet sheet in wb.Sheets)
{
sheetname.Add(sheet.Name);
}
I don't see any documentation that says the order in app.xml is guaranteed to be the order of the sheets. It PROBABLY is, but not according to the OOXML specification.
The workbook.xml file, on the other hand, includes the sheetId attribute, which does determine the sequence - from 1 to the number of sheets. This is according to the OOXML specification. workbook.xml is described as the place where the sequence of the sheets is kept.
So reading workbook.xml after it is extracted form the XLSX would be my recommendation. NOT app.xml. Instead of docProps/app.xml, use xl/workbook.xml and look at the element, as shown here -
`
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<fileVersion appName="xl" lastEdited="5" lowestEdited="5" rupBuild="9303" />
<workbookPr defaultThemeVersion="124226" />
- <bookViews>
<workbookView xWindow="120" yWindow="135" windowWidth="19035" windowHeight="8445" />
</bookViews>
- <sheets>
<sheet name="By song" sheetId="1" r:id="rId1" />
<sheet name="By actors" sheetId="2" r:id="rId2" />
<sheet name="By pit" sheetId="3" r:id="rId3" />
</sheets>
- <definedNames>
<definedName name="_xlnm._FilterDatabase" localSheetId="0" hidden="1">'By song'!$A$1:$O$59</definedName>
</definedNames>
<calcPr calcId="145621" />
</workbook>
`
This is my code that extracts value from an xlsx file and print it on Eclipse console
public class testcode {
public void readexcel(String filepath, String filename, String sheetname) throws IOException
{
//Create an object of file class to open xlsx file
File file = new File(filepath+"\\"+filename);
//Create an object of FileInputStream to read an xlsx file
FileInputStream inputstream = new FileInputStream(file);
Workbook workbook = null;
//Find file extension name by using substring
String FileExtensionName = filename.substring(filename.indexOf("."));
//Check condition whether file is xlsx or xls
if(FileExtensionName.equalsIgnoreCase("xlsx"))
workbook = new XSSFWorkbook(inputstream);
else
workbook = new HSSFWorkbook(inputstream);
//Read sheet inside the workbook by its name
Sheet sheet = workbook.getSheet(sheetname);
//Find number of rows in sheet
int rowCount = sheet.getLastRowNum() - sheet.getFirstRowNum();
//Create a loop over all the rows of excel file to read it
for(int i = 0; i<rowCount+1;i++)
{
Row row = sheet.getRow(i);
//Create loop to print cell values in a row
for(int j = 0; j<row.getLastCellNum();j++)
{
//Print excel value in console System.out.println(row.getCell(j).getStringCellValue()+"||");
}
System.out.println();
}
}
public static void main(String[] args) throws IOException {
testcode objExcelFile = new testcode();
//Prepare path of excel file
String filepath = "C:\\Users\\malfoy\\Desktop";
objExcelFile.readexcel(filepath,"testfile.xlsx", "read");
}
}
I am using office 2007 edition and I am getting an exception which says
"The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)"
How to fix it?
The line
String FileExtensionName = filename.substring(filename.indexOf("."));
returns a value with the dot (in your case ".xlsx")
So the following if statement returns a HSSFWorkbook instance instead of XSSFWorkbook.
To correct it use
String FileExtensionName = filename.substring(filename.lastIndexOf(".")+1);
I want to import Excel data to Microsoft Dynamics Ax. I've used this code from this site:
http://axpedia.blogspot.com.tr/2013/02/import-from-excel-file-using-x-in-ax.html
but when i try to run i get this error:
SysExcellworksheet could not start (C)\Classes\SysExcellWorksheets\cells(C)\Jobs\ProductType (line 35)
I am newbie Can you help me about this?
static void ProductType(Args _args)
{
SysExcelApplication application;
SysExcelWorkbooks workbooks;
SysExcelWorkbook workbook;
SysExcelWorksheets worksheets;
SysExcelWorksheet worksheet;
SysExcelCells cells;
COMVariantType type;
Name name;
FileName filename;
ProductType productType;
int row;
int _productTypeId;
str _productType;
str _description;
;
application = SysExcelApplication::construct();
workbooks = application.workbooks();
//specify the file path that you want to read
filename = "Path\\filename.xlsx";
try
{
workbooks.open(filename);
}
catch (Exception::Error)
{
throw error("File cannot be opened.");
}
workbook = workbooks.item(1);
worksheets = workbook.worksheets();
worksheet = worksheets.itemFromNum(4); //Here 3 is the worksheet Number
cells = worksheet.cells();
do
{
row++;
_productTypeId = any2int(cells.item(row, 1).value().toString());
_productType = cells.item(row, 2).value().bStr();
_description = cells.item(row, 3).value().bStr();
productType.ID = _productTypeId;
productType.ProductType = _productType;
productType.Description = _description;
productType.insert();
type = cells.item(row+1, 1).value().variantType();
}
while (type != COMVariantType::VT_EMPTY);
application.quit();
}
You probably translated the error, the exact error when I test this is:
SysExcelWorksheet object not initialized.
This is probably because on the following line you are trying to retrieve the 4th worksheet of the excel, but your excel file has less:
worksheet = worksheets.itemFromNum(4);
If your data is on the first worksheet, use '1' instead of '4':
worksheet = worksheets.itemFromNum(1);
I tried to read the existing excel file and copied to another path, and then i tried to insert data to the file i cloned. But I can' insert data. I don't know where I made mistake. But Here I've given my code.
Error occured in the line sheetData.InsertAt<Row>(row,++rowIndex);. Please help me out.
Package spreadsheetPackage = Package.Open(destinationFile, FileMode.Open, FileAccess.ReadWrite);
using (var document = SpreadsheetDocument.Open(spreadsheetPackage))
{
foreach (System.Data.DataTable table in ds.Tables)
{
var workbookPart = document.WorkbookPart;
var workbook = workbookPart.Workbook;
var sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault();
Worksheet ws = ((WorksheetPart)(workbookPart.GetPartById(sheet.Id))).Worksheet;
SheetData sheetData = ws.GetFirstChild<SheetData>();
//Sheet sheet = sheets.FirstOrDefault();
if(sheet==null)
throw new Exception("No sheed found in the template file. Please add the sheet");
int rowIndex = 10, colIndex = 0;
bool flag = false;
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
var sharedStringPart = workbookPart.SharedStringTablePart;
var values = sharedStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();
var rows = worksheetPart.Worksheet.Descendants<Row>();
List<String> columns = new List<string>();
foreach (System.Data.DataColumn column in table.Columns)
{
columns.Add(column.ColumnName);
}
foreach (System.Data.DataRow dsrow in table.Rows)
{
Row row = new Row();
foreach (String col in columns)
{
DocumentFormat.OpenXml.Spreadsheet.Cell cell = new Cell();
cell.DataType = DocumentFormat.OpenXml.Spreadsheet.CellValues.String;
cell.CellValue = new DocumentFormat.OpenXml.Spreadsheet.CellValue(dsrow[col].ToString());
row.AppendChild<Cell>(cell);
}
sheetData.InsertAt<Row>(row,++rowIndex);
}
}
Before you append Row, you should mentioned the row index to the instance...
row.RowIndex = (UInt32)rowIndex++;
I need to read data from a single worksheet in an Excel 2007 workbook using the Open XML SDK 2.0. I have spent a lot of time searching for basic guidelines to doing this, but I have only found help on creating spreadsheets.
How do I iterate rows in a worksheet and then iterate the cells in each row, using this SDK?
The other answer seemed more like a meta-answer. I have been struggling with this since using LINQ does work with separated document parts. The following code includes a wrapper function to get the value from a Cell, resolving any possible string lookups.
public void ExcelDocTest()
{
Debug.WriteLine("Running through sheet.");
int rowsComplete = 0;
using (SpreadsheetDocument spreadsheetDocument =
SpreadsheetDocument.Open(#"path\to\Spreadsheet.xlsx", false))
{
WorkbookPart workBookPart = spreadsheetDocument.WorkbookPart;
foreach (Sheet s in workBookPart.Workbook.Descendants<Sheet>())
{
WorksheetPart wsPart = workBookPart.GetPartById(s.Id) as WorksheetPart;
Debug.WriteLine("Worksheet {1}:{2} - id({0}) {3}", s.Id, s.SheetId, s.Name,
wsPart == null ? "NOT FOUND!" : "found.");
if (wsPart == null)
{
continue;
}
Row[] rows = wsPart.Worksheet.Descendants<Row>().ToArray();
//assumes the first row contains column names
foreach (Row row in wsPart.Worksheet.Descendants<Row>())
{
rowsComplete++;
bool emptyRow = true;
List<object> rowData = new List<object>();
string value;
foreach (Cell c in row.Elements<Cell>())
{
value = GetCellValue(c);
emptyRow = emptyRow && string.IsNullOrWhiteSpace(value);
rowData.Add(value);
}
Debug.WriteLine("Row {0}: {1}", row,
emptyRow ? "EMPTY!" : string.Join(", ", rowData));
}
}
}
Debug.WriteLine("Done, processed {0} rows.", rowsComplete);
}
public static string GetCellValue(Cell cell)
{
if (cell == null)
return null;
if (cell.DataType == null)
return cell.InnerText;
string value = cell.InnerText;
switch (cell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the shared strings table.
// Get worksheet from cell
OpenXmlElement parent = cell.Parent;
while (parent.Parent != null && parent.Parent != parent
&& string.Compare(parent.LocalName, "worksheet", true) != 0)
{
parent = parent.Parent;
}
if (string.Compare(parent.LocalName, "worksheet", true) != 0)
{
throw new Exception("Unable to find parent worksheet.");
}
Worksheet ws = parent as Worksheet;
SpreadsheetDocument ssDoc = ws.WorksheetPart.OpenXmlPackage as SpreadsheetDocument;
SharedStringTablePart sstPart = ssDoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// lookup value in shared string table
if (sstPart != null && sstPart.SharedStringTable != null)
{
value = sstPart.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
break;
//this case within a case is copied from msdn.
case CellValues.Boolean:
switch (value)
{
case "0":
value = "FALSE";
break;
default:
value = "TRUE";
break;
}
break;
}
return value;
}
Edit: Thanks #Nitin-Jadhav for the correction to GetCellValue().
The way I do this is with Linq. There are lots of sample around on this subject from using the SDK to just going with pure Open XML (no SDK). Take a look at:
Office Open XML Formats: Retrieving
Excel 2007 Cell Values (uses pure
OpenXML, not SDK, but the concepts
are really close)
Using LINQ to Query Tables in Excel
2007 (uses Open XML SDK, assumes
ListObject)
Reading Data from SpreadsheetML
(probably best "overall introduction"
article)