I'm using OleDb to read from an Excel workbook with many sheets.
I need to read the sheet names, but I need them in the order they are defined in the spreadsheet. So if I have a file that looks like this:
|_____|_____|____|____|____|____|____|____|____|
|_____|_____|____|____|____|____|____|____|____|
|_____|_____|____|____|____|____|____|____|____|
\__GERMANY__/\__UK__/\__IRELAND__/
Then I need to get the dictionary
1="GERMANY",
2="UK",
3="IRELAND"
I've tried using OleDbConnection.GetOleDbSchemaTable(), and that gives me the list of names, but it alphabetically sorts them. The alpha-sort means I don't know which sheet number a particular name corresponds to. So I get;
GERMANY, IRELAND, UK
which has changed the order of UK and IRELAND.
The reason I need it to be sorted is that I have to let the user choose a range of data by name or index; they can ask for 'all the data from GERMANY to IRELAND' or 'data from sheet 1 to sheet 3'.
Any ideas would be greatly appreciated.
If I could use the Office interop classes, this would be straightforward. Unfortunately, I can't, because the interop classes don't work reliably in non-interactive environments such as Windows services and ASP.NET sites, so I need to use OLEDB.
Can you not just loop through the sheets from 0 to the count of names - 1? That way you should get them in the correct order.
Edit
I noticed through the comments that there are a lot of concerns about using the Interop classes to retrieve the sheet names. Therefore here is an example using OLEDB to retrieve them:
/// <summary>
/// This method retrieves the excel sheet names from
/// an excel workbook.
/// </summary>
/// <param name="excelFile">The excel file.</param>
/// <returns>String[]</returns>
private String[] GetExcelSheetNames(string excelFile)
{
OleDbConnection objConn = null;
System.Data.DataTable dt = null;
try
{
// Connection String. Change the excel file to the file you
// will search.
String connString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + excelFile + ";Extended Properties=Excel 8.0;";
// Create connection object by using the preceding connection string.
objConn = new OleDbConnection(connString);
// Open connection with the database.
objConn.Open();
// Get the data table containing the schema guid.
dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if(dt == null)
{
return null;
}
String[] excelSheets = new String[dt.Rows.Count];
int i = 0;
// Add the sheet name to the string array.
foreach(DataRow row in dt.Rows)
{
excelSheets[i] = row["TABLE_NAME"].ToString();
i++;
}
// Loop through all of the sheets if you want to...
for(int j=0; j < excelSheets.Length; j++)
{
// Query each excel sheet.
}
return excelSheets;
}
catch(Exception ex)
{
return null;
}
finally
{
// Clean up.
if(objConn != null)
{
objConn.Close();
objConn.Dispose();
}
if(dt != null)
{
dt.Dispose();
}
}
}
Extracted from an article on CodeProject.
Since the above code does not cover extracting the list of sheet names for Excel 2007, the following code works for both Excel 97-2003 and Excel 2007:
public List<string> ListSheetInExcel(string filePath)
{
OleDbConnectionStringBuilder sbConnection = new OleDbConnectionStringBuilder();
String strExtendedProperties = String.Empty;
sbConnection.DataSource = filePath;
if (Path.GetExtension(filePath).Equals(".xls"))//for 97-03 Excel file
{
sbConnection.Provider = "Microsoft.Jet.OLEDB.4.0";
strExtendedProperties = "Excel 8.0;HDR=Yes;IMEX=1";//HDR=ColumnHeader,IMEX=InterMixed
}
else if (Path.GetExtension(filePath).Equals(".xlsx")) //for 2007 Excel file
{
sbConnection.Provider = "Microsoft.ACE.OLEDB.12.0";
strExtendedProperties = "Excel 12.0;HDR=Yes;IMEX=1";
}
sbConnection.Add("Extended Properties",strExtendedProperties);
List<string> listSheet = new List<string>();
using (OleDbConnection conn = new OleDbConnection(sbConnection.ToString()))
{
conn.Open();
DataTable dtSheet = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drSheet in dtSheet.Rows)
{
if (drSheet["TABLE_NAME"].ToString().Contains("$"))//checks whether row contains '_xlnm#_FilterDatabase' or sheet name(i.e. sheet name always ends with $ sign)
{
listSheet.Add(drSheet["TABLE_NAME"].ToString());
}
}
}
return listSheet;
}
The above function returns the list of sheets in a given Excel file for both formats (97-2003 and 2007).
Can't find this in actual MSDN documentation, but a moderator in the forums said
I am afraid that OLEDB does not preserve the sheet order as they were in Excel
Excel Sheet Names in Sheet Order
Seems like this would be a common enough requirement that there would be a decent workaround.
This is short, fast, safe, and usable...
public static List<string> ToExcelsSheetList(string excelFilePath)
{
List<string> sheets = new List<string>();
using (OleDbConnection connection =
new OleDbConnection((excelFilePath.TrimEnd().ToLower().EndsWith("x"))
? "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + excelFilePath + "';" + "Extended Properties='Excel 12.0 Xml;HDR=YES;'"
: "provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + excelFilePath + "';Extended Properties=Excel 8.0;"))
{
connection.Open();
DataTable dt = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drSheet in dt.Rows)
if (drSheet["TABLE_NAME"].ToString().Contains("$"))
{
string s = drSheet["TABLE_NAME"].ToString();
sheets.Add(s.StartsWith("'")?s.Substring(1, s.Length - 3): s.Substring(0, s.Length - 1));
}
connection.Close();
}
return sheets;
}
Another way:
An .xlsx file is just a collection of *.xml files stored in a *.zip container (the older binary .xls format is not).
Unzip it and open the file "app.xml" in the folder docProps:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Properties xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
<TotalTime>0</TotalTime>
<Application>Microsoft Excel</Application>
<DocSecurity>0</DocSecurity>
<ScaleCrop>false</ScaleCrop>
<HeadingPairs>
<vt:vector baseType="variant" size="2">
<vt:variant>
<vt:lpstr>Arbeitsblätter</vt:lpstr>
</vt:variant>
<vt:variant>
<vt:i4>4</vt:i4>
</vt:variant>
</vt:vector>
</HeadingPairs>
<TitlesOfParts>
<vt:vector baseType="lpstr" size="4">
<vt:lpstr>Tabelle3</vt:lpstr>
<vt:lpstr>Tabelle4</vt:lpstr>
<vt:lpstr>Tabelle1</vt:lpstr>
<vt:lpstr>Tabelle2</vt:lpstr>
</vt:vector>
</TitlesOfParts>
<Company/>
<LinksUpToDate>false</LinksUpToDate>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0300</AppVersion>
</Properties>
The file is a German file (Arbeitsblätter = worksheets).
The table names (Tabelle3 etc) are in the correct order. You just need to read these tags;)
regards
I have created the function below using the information provided in the answer from @kraeppy (https://stackoverflow.com/a/19930386/2617732). It requires .NET Framework 4.5 and a reference to System.IO.Compression. It only works for xlsx files, not for the older xls files.
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
static IEnumerable<string> GetWorksheetNamesOrdered(string fileName)
{
//open the excel file
using (FileStream data = new FileStream(fileName, FileMode.Open))
{
//unzip
ZipArchive archive = new ZipArchive(data);
//select the correct file from the archive
ZipArchiveEntry appxmlFile = archive.Entries.SingleOrDefault(e => e.FullName == "docProps/app.xml");
//read the xml
XDocument xdoc = XDocument.Load(appxmlFile.Open());
//find the titles element
XElement titlesElement = xdoc.Descendants().Where(e => e.Name.LocalName == "TitlesOfParts").Single();
//extract the worksheet names
return titlesElement
.Elements().Where(e => e.Name.LocalName == "vector").Single()
.Elements().Where(e => e.Name.LocalName == "lpstr")
.Select(e => e.Value);
}
}
I like the idea from @deathApril of naming the sheets 1_Germany, 2_UK, 3_IRELAND. I also understand that doing this rename for hundreds of sheets is a problem. If you don't mind renaming the sheets, you can use this macro to do it for you; it renames all the sheets in a matter of seconds. Unfortunately, ODBC and OLEDB return the sheet names in ascending alphabetical order, and there is no way around that. You have to either use COM or rename your sheets so that the names sort in the right order.
Sub Macro1()
'
' Macro1 Macro
'
'
Dim i As Integer
For i = 1 To Sheets.Count
Dim prefix As String
' Zero-pad the index to four digits (0001, 0002, ...) so the names sort in sheet order
prefix = Format(i, "0000")
Dim sheetName As String
sheetName = Sheets(i).Name
Dim names
names = Split(sheetName, "-")
If (UBound(names) > 0) And IsNumeric(names(0)) Then
'already prefixed: do nothing
Else
Sheets(i).Name = prefix & "-" & sheetName
End If
Next
End Sub
UPDATE:
After reading @SidHoland's comment regarding BIFF, an idea occurred to me. The following steps can be done through code. I don't know if you really want to go that far to get the sheet names in the same order. Let me know if you need help doing this through code.
1. Consider XLSX as a zip file. Rename *.xlsx into *.zip
2. Unzip
3. Go to unzipped folder root and open /docprops/app.xml
4. This xml contains the sheet name in the same order of what you see.
5. Parse the xml and get the sheet names
UPDATE:
Another solution - NPOI might be helpful here
http://npoi.codeplex.com/
FileStream file = new FileStream(@"yourexcelfilename", FileMode.Open, FileAccess.Read);
HSSFWorkbook hssfworkbook = new HSSFWorkbook(file);
for (int i = 0; i < hssfworkbook.NumberOfSheets; i++)
{
Console.WriteLine(hssfworkbook.GetSheetName(i));
}
file.Close();
This solution works for xls. I didn't try xlsx.
Thanks,
Esen
This worked for me. Stolen from here: How do you get the name of the first page of an excel workbook?
object opt = System.Reflection.Missing.Value;
Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
Excel.Workbook workbook = app.Workbooks.Open(WorkBookToOpen,
opt, opt, opt, opt, opt, opt, opt,
opt, opt, opt, opt, opt, opt, opt);
Excel.Worksheet worksheet = workbook.Worksheets[1] as Microsoft.Office.Interop.Excel.Worksheet;
string firstSheetName = worksheet.Name;
Try this. Here is the code to get the sheet names in order.
private Dictionary<int, string> GetExcelSheetNames(string fileName)
{
Excel.Application _excel = null;
Excel.Workbook _workBook = null;
Dictionary<int, string> excelSheets = new Dictionary<int, string>();
try
{
object missing = Type.Missing;
object readOnly = true;
_excel = new Excel.ApplicationClass();
_excel.Visible = false;
_workBook = _excel.Workbooks.Open(fileName, 0, readOnly, 5, missing,
missing, true, Excel.XlPlatform.xlWindows, "\\t", false, false, 0, true, true, missing);
if (_workBook != null)
{
int index = 0;
foreach (Excel.Worksheet sheet in _workBook.Sheets)
{
// Can get sheet names in order they are in workbook
excelSheets.Add(++index, sheet.Name);
}
}
}
catch (Exception e)
{
return null;
}
finally
{
if (_excel != null)
{
if (_workBook != null)
_workBook.Close(false, Type.Missing, Type.Missing);
_excel.Application.Quit();
}
_excel = null;
_workBook = null;
}
return excelSheets;
}
As per MSDN, this might not work for spreadsheets because Excel files are not real databases, so you will not be able to get the sheet names in the order in which they appear in the workbook.
Code to get the sheet names in their visual order using interop:
Add a reference to the Microsoft Excel 12.0 Object Library.
The following code returns the sheet names in the actual order stored in the workbook, not sorted by name.
Sample code:
using System.Collections; // for ArrayList
using Microsoft.Office.Interop.Excel;
string filename = "C:\\romil.xlsx";
object missing = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook wb =excel.Workbooks.Open(filename, missing, missing, missing, missing,missing, missing, missing, missing, missing, missing, missing, missing, missing, missing);
ArrayList sheetname = new ArrayList();
foreach (Microsoft.Office.Interop.Excel.Worksheet sheet in wb.Sheets)
{
sheetname.Add(sheet.Name);
}
I don't see any documentation that says the order in app.xml is guaranteed to be the order of the sheets. It PROBABLY is, but the OOXML specification does not say so.
The workbook.xml file, on the other hand, is described in the OOXML specification as the place where the sequence of the sheets is kept: the order of the <sheet> elements inside <sheets> is the tab order. Each element also carries a sheetId attribute, but that is only a unique identifier and need not match the position once sheets have been moved or deleted.
So reading workbook.xml after it is extracted from the XLSX would be my recommendation, NOT app.xml. Instead of docProps/app.xml, use xl/workbook.xml and look at the <sheets> element, as shown here:
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<fileVersion appName="xl" lastEdited="5" lowestEdited="5" rupBuild="9303" />
<workbookPr defaultThemeVersion="124226" />
<bookViews>
<workbookView xWindow="120" yWindow="135" windowWidth="19035" windowHeight="8445" />
</bookViews>
<sheets>
<sheet name="By song" sheetId="1" r:id="rId1" />
<sheet name="By actors" sheetId="2" r:id="rId2" />
<sheet name="By pit" sheetId="3" r:id="rId3" />
</sheets>
<definedNames>
<definedName name="_xlnm._FilterDatabase" localSheetId="0" hidden="1">'By song'!$A$1:$O$59</definedName>
</definedNames>
<calcPr calcId="145621" />
</workbook>
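Reading xl/workbook.xml can be sketched in plain Java with only the standard library. This is a minimal sketch, not a definitive implementation: the class name SheetOrder and its method names are made up for illustration, it assumes the file is a zip-based .xlsx, and it takes the document order of the sheet elements as the tab order. It matches on the local name "sheet" in any namespace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists sheet names in the order they appear in xl/workbook.xml.
final class SheetOrder {

    // Parses the content of workbook.xml and returns the "name" attribute of every
    // sheet element, in document order.
    static List<String> sheetNames(InputStream workbookXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(workbookXml);
        // "*" matches any namespace; getElementsByTagNameNS returns nodes in document order.
        NodeList sheets = doc.getElementsByTagNameNS("*", "sheet");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < sheets.getLength(); i++) {
            names.add(((Element) sheets.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Opens the .xlsx as a zip archive and reads xl/workbook.xml from it.
    static List<String> sheetNames(String xlsxPath) throws Exception {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry("xl/workbook.xml");
            if (entry == null) {
                throw new IOException("xl/workbook.xml not found in " + xlsxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return sheetNames(in);
            }
        }
    }
}
```

Calling SheetOrder.sheetNames("C:\\some\\workbook.xlsx") (a placeholder path) would return the names as a list whose index + 1 is the sheet number, which is the dictionary the question asks for.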
The following code creates one file with sheet 1 (Year 2019). Now I would like to run another page and store its result in the same file as sheet 2 (Year 2020).
XSSFWorkbook wb = new XSSFWorkbook();
XSSFSheet sheet = wb.createSheet("Year 2019");
FileOutputStream fos = new FileOutputStream("C:\\Users\\dp\\Desktop\\MData1_Test_1.xlsx");
Row header_1 = sheet.createRow(rowNum[0]);
rowNum[0] = (rowNum[0] + 1);
header_1.createCell(0).setCellValue("Analytics");
Any help will be appreciated.
You can do something like this:
// open your first file
try (InputStream inp = new FileInputStream(fut.get())) {
Workbook wb = WorkbookFactory.create(inp);
// get sheet by name
Sheet sheet = wb.getSheet("Year 2019");
if (null != sheet) {
// if sheet exists => delete it
int index = wb.getSheetIndex(sheet);
wb.removeSheetAt(index);
}
// create new sheet
sheet = wb.createSheet("Year 2019");
// proceed adding data
// row = sheet.createRow(0);
// row.createCell(0).setCellValue("data");
// ...
}
You can repeat these steps as many times as you need to manage as many sheets as you want.
This is my code that extracts values from an xlsx file and prints them on the Eclipse console:
public class testcode {
public void readexcel(String filepath, String filename, String sheetname) throws IOException
{
//Create an object of file class to open xlsx file
File file = new File(filepath+"\\"+filename);
//Create an object of FileInputStream to read an xlsx file
FileInputStream inputstream = new FileInputStream(file);
Workbook workbook = null;
//Find file extension name by using substring
String FileExtensionName = filename.substring(filename.indexOf("."));
//Check condition whether file is xlsx or xls
if(FileExtensionName.equalsIgnoreCase("xlsx"))
workbook = new XSSFWorkbook(inputstream);
else
workbook = new HSSFWorkbook(inputstream);
//Read sheet inside the workbook by its name
Sheet sheet = workbook.getSheet(sheetname);
//Find number of rows in sheet
int rowCount = sheet.getLastRowNum() - sheet.getFirstRowNum();
//Create a loop over all the rows of excel file to read it
for(int i = 0; i<rowCount+1;i++)
{
Row row = sheet.getRow(i);
//Create loop to print cell values in a row
for(int j = 0; j<row.getLastCellNum();j++)
{
//Print excel value in console
System.out.println(row.getCell(j).getStringCellValue()+"||");
}
System.out.println();
}
}
public static void main(String[] args) throws IOException {
testcode objExcelFile = new testcode();
//Prepare path of excel file
String filepath = "C:\\Users\\malfoy\\Desktop";
objExcelFile.readexcel(filepath,"testfile.xlsx", "read");
}
}
I am using office 2007 edition and I am getting an exception which says
"The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)"
How to fix it?
The line
String FileExtensionName = filename.substring(filename.indexOf("."));
returns the extension including the dot (in your case ".xlsx").
So the comparison with "xlsx" in the following if statement fails, and an HSSFWorkbook instance is created instead of an XSSFWorkbook.
To correct it, use
String FileExtensionName = filename.substring(filename.lastIndexOf(".")+1);
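The difference between the two substring calls can be checked in isolation. The sketch below is only an illustration: the class FileExtensions and its methods are hypothetical names, and it adds a guard for file names without any dot, which the one-liner above does not handle.

```java
// Hypothetical helper illustrating the extension fix.
final class FileExtensions {

    // What the original code computed: everything from the FIRST dot, dot included.
    static String fromFirstDot(String filename) {
        return filename.substring(filename.indexOf("."));
    }

    // The corrected version: everything after the LAST dot, or "" if there is no dot.
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1);
    }

    // True when the extension is "xlsx", ignoring case.
    static boolean isXlsx(String filename) {
        return extensionOf(filename).equalsIgnoreCase("xlsx");
    }

    public static void main(String[] args) {
        System.out.println(fromFirstDot("testfile.xlsx")); // includes the leading dot
        System.out.println(extensionOf("testfile.xlsx"));  // extension only
        System.out.println(isXlsx("testfile.xlsx"));
    }
}
```

With a check like isXlsx, the workbook selection in readexcel would pick XSSFWorkbook for "testfile.xlsx" and fall back to HSSFWorkbook otherwise.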
I have exported my DataTable as shown in the code below. It works fine when exporting large data, but when I try to open the xls file it prompts me with this error:
"Excel found unreadable content in filename.xls. Do you want to recover the contents of this workbook? If you trust the source of this workbook, click Yes."
DataTable exportTable = Record.GetAll();
Workbook workbook = new Workbook();
Worksheet sheet = new Worksheet("Test");
for (int x = 0; x < exportTable.Columns.Count; x++)
{
sheet.Cells[0, x] = new Cell(exportTable.Columns[x].ColumnName.ToString());
}
for (int i = 0; i < exportTable.Rows.Count; i++)
{
for (int j = 0; j < exportTable.Columns.Count; j++)
{
sheet.Cells[(i + 1), j] = new Cell(exportTable.Rows[i][j].ToString());
}
}
workbook.Worksheets.Add(sheet);
workbook.Save(@"C:\temp1\Test.xls");
System.IO.FileInfo file = new System.IO.FileInfo(@"C:\temp1\Test.xls");
Response.ClearContent();
Response.AddHeader("Content-Disposition", "attachment; filename=" + file.Name);
Response.AddHeader("Content-Length", file.Length.ToString());
Response.ContentEncoding = System.Text.Encoding.UTF8;
Response.ContentType = "application/ms-excel";
Response.TransmitFile(file.FullName);
Response.End();
If you are using Excel 2010 or above ( I am not sure about 2007) the file will be saved as XLSX even if you set the extension to XLS. Change the extension to XLSX and it should be fine. Otherwise Excel will detect the difference between the extension and the file structure and will show the message above.
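One way to see which structure a file really has, regardless of its extension, is to look at its first bytes: zip containers (which .xlsx files are) start with the bytes 50 4B 03 04, while the older binary .xls format is an OLE2 compound file starting with D0 CF 11 E0 A1 B1 1A E1. The following is a rough sketch under those assumptions; ExcelFormatSniffer and its members are hypothetical names, and a zip signature only means "zip-based", not necessarily a valid workbook.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses the container format of a file from its leading bytes.
final class ExcelFormatSniffer {

    enum Format { OLE2_BINARY, ZIP_BASED, UNKNOWN }

    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // True when data begins with the given signature bytes.
    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a buffer holding the first bytes of a file.
    static Format detect(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP_BASED;
        }
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2_BINARY;
        }
        return Format.UNKNOWN;
    }

    // Reads up to eight bytes from the file and classifies them.
    static Format detect(String path) throws IOException {
        byte[] header = new byte[8];
        int read = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int n;
            while (read < header.length
                    && (n = in.read(header, read, header.length - read)) > 0) {
                read += n;
            }
        }
        return detect(Arrays.copyOf(header, read));
    }
}
```

If a file named Test.xls is reported as ZIP_BASED, its extension and content disagree, which is the kind of mismatch described above.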
This source code:
using (ExcelPackage xlPackage = new ExcelPackage(newFile, template))
{
ExcelWorksheet worksheet = null;
foreach (DataTable dt in dsExcel.Tables)
{
worksheet = xlPackage.Workbook.Worksheets.Add(dt.TableName);
worksheet = xlPackage.Workbook.Worksheets[dt.TableName];
ExcelCell cell;
const int startRow = 9;
int row = startRow;
int col = 1;
foreach (DataRow dr in dt.Rows)
{
foreach (DataColumn dc in dt.Columns)
{
worksheet.Cell(row, col).Value = dr[dc].ToString();
col++;
}
col = 1;
row++;
}
}
xlPackage.Save();
}
I am getting an error at xlPackage.Save(), i.e. "object reference not set to an instance of an object".
How can I generate an Excel file with multiple sheets using an Excel template?
Looks like this is a bug, documented here. Unfortunately, it looks like the fix is to edit the source code of ExcelPackage itself.