Using OleDBDataReader to read strings from an Excel spreadsheet - excel

I have an Excel document and one column is of type text. Some cells contain combinations of letters and digits, while other cells contain numbers only. I found out that if the cell value contains only digits, the reader interprets the values in the cells as a long integer and it truncates it. So, rather than getting "120500923", I get back "1.20501e+008"
Here is my code:
OleDbConnection myConnection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source="myfile.xls";Extended Properties=\"Excel 8.0;HDR=NO;IMEX=1\"");
OleDbCommand myCommand = new OleDbCommand("Select * from [Sheet1];");
myConnection.Open();
myCommand.Connection = myConnection;
OleDbDataReader myReader = myCommand.ExecuteReader();
string[] myLine = new string[5];
while (myReader.Read())
{
for (int i = 0; i < 5; i++) //read the first 5 columns
{
myLine[i] = (string)myReader[i].ToString();
}
}
How can I instruct the reader to treat all the cell values as strings and not as numbers? I prefer not to modify the excel spreadsheet, because it is edited by people and I want to keep it simple.
Thanks,
Nick

After some googling and experimenting, I found out some useful stuff.
I discovered that if I replace the connection string from Jet to ACE, the cell value will be considered a string.
So, I changed the first line in the code as below:
OleDbConnection myConnection = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=\"myfile.xls\";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"");
Since my computer is a Windows 10/64 bit, the default MDAC driver installed was 64 bit. I had to uninstall it and install the 32 bit version (I use SharpDevelop and apparently I can only compile 32 bit applications), from http://www.microsoft.com/en-us/download/details.aspx?id=13255. I am able to use version 12, even if I don't have Office 2003 or later installed.

Related

Excel and Libre Office conflict over Open XML output

Open XML is generating .xlsx files that can be read by Open Office, but not by Excel itself.
With this as my starting point( Export DataTable to Excel with Open Xml SDK in c#) I have added code to create a .xlsx file. Attempting to open with Excel, I'm asked if I want to repair the file. Saying yes gets "The workbook cannot be opened or repaired by Microsoft Excel because it's corrupt." After many hours of trying to jiggle the data from my table to make this work, I finally threw up my hands in despair and made a spreadsheet with a single number in the first cell.
Still corrupt.
Renaming it to .zip and exploring shows intact .xml files. On a whim, I took a legit .xlsx file created by Excel, unzipped it, rezipped without changing contents and renamed back to .xlsx. Excel declared it corrupt. So this is clearly not a content issue, but file a format issue. Giving up on Friday, I sent some of the sample files home and opened them there with Libre Office. There were no issues at all. File content was correct and Calc had no problem. I'm using Excel for Office 365, 32 bit.
// ignore the bits (var list) that get data from the database. I've reduced this to just the output of a single header line
List< ReportFilingHistoryModel> list = DB.Reports.Report.GetReportClientsFullHistoryFiltered<ReportFilingHistoryModel>(search, client, report, signature);
MemoryStream memStream = new MemoryStream();
using (SpreadsheetDocument workbook = SpreadsheetDocument.Create(memStream, SpreadsheetDocumentType.Workbook))
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
var sheetData = new SheetData();
sheetPart.Worksheet = new Worksheet(sheetData);
Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();
string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
uint sheetId = 1;
if (sheets.Elements<Sheet>().Count() > 0)
{
sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
}
Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "History" };
sheets.Append(sheet);
Row headerRow = new Row();
foreach( var s in "Foo|Bar".Split('|'))
{
var cell = new Cell();
cell.DataType = CellValues.Number;
cell.CellValue = new CellValue("5");
headerRow.AppendChild(cell);
}
sheetData.AppendChild(headerRow);
}
memStream.Seek(0, SeekOrigin.Begin);
Guid result = DB.Reports.Report.AddClientHistoryList( "test.xlsx", memStream.GetBuffer(), "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
return Ok(result);
This should just work. I've noticed other stack overflow discussions that direct back to the first link I mentioned above. I seem to be doing it right (and Calc concurs). There have been discussions of shared strings and whatnot, but by using plain numbers I shouldn't be having issues. What am I missing here?
In working on this, I went with the notion that some extraneous junk on the end of a .zip file is harmless. 7-Zip, Windows Explorer and Libre Office all seem to agree (as does some other zip program I used at home whose name escapes me). Excel, however, does not. Using the pointer at memStream.GetBuffer() was fine, but using its length was not. (The preceding Seek() was unnecessary.) Limiting the write of the data to a length equal to the current output position keeps Excel from going off the rails.

Proper way to get excel sheet names using C# and oledb

I'm trying to figure out why the behavior I'm seeing and the "documented" behavior are different. I've read both of these articles:Read and Write Excel Documents Using OLEDB and Working with MS Excel(xls / xlsx) Using MDAC and Oledb and this is text from the second link.
If you read in the second link it says:
To Retrieve Schema Information of Excel Workbook :
You can get the worksheets that are present in the excel workbook using GetOleDbSchemaTable. Use the following snippet.
DataTable dtSchema = null;
dtSchema = conObj.GetOleDbSchemaTable(
OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
Here dtSchema will hold the list of all workbooks. Say we have two workbooks : wb1, wb2. The above code will return a list of wb1, wb1$,wb2,wb2$. We need to filter out $ elements.
However when I run this code I only get "wb1$ and wb2$". I can easily remove the $ in code but I'm trying to make sure I'm not going to have code that breaks when I put it on a different computers/OS/environment and it behaves as is documented. Can somebody tell my what or if something changed since these were written or if I'm missing some key piece. Something to note this is being developed in VS2015, Windows 7 Pro, and Office 2010 installed.
//Connection String
//string connstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties='Excel 8.0;HDR=NO;IMEX=1';"; // Extra blank space cannot appear in Office 2007 and the last version. And we need to pay attention on semicolon.
//string connstring = Provider = Microsoft.JET.OLEDB.4.0; Data Source = " + path + "; Extended Properties = 'Excel 8.0;HDR=NO;IMEX=1'; "; //This connection string is appropriate for Office 2007 and the older version. We can select the most suitable connection string according to Office version or our program.
using (OleDbConnection conn = new OleDbConnection(_connectionString))
{
conn.Open();
//DataTable sheetNames = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" }); //Get All Sheets Name
DataTable sheetNames = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null); //Get All Sheets Name
// Loop through all Sheets to get data
foreach (DataRow dr in sheetNames.Rows)
{
string sheetName = dr["TABLE_NAME"].ToString();
//if (!sheetName.EndsWith("$"))
// continue;
Debug.Print(sheetName);
}
return sheetNames;
Thanks
dbl

Excel file created with OleDbConnection uses invalid CultureInfo

I'm using an OleDbConnection to create an Excel file:
String bewegungenDateiname = System.IO.Path.ChangeExtension(System.IO.Path.GetTempFileName(), ".xls");
string strConnectionString = #"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ System.IO.Path.GetDirectoryName(bewegungenDateiname) + #"\" + System.IO.Path.GetFileName(bewegungenDateiname)
+ #";Extended Properties='Excel 8.0;HDR=YES'";
using (System.Data.OleDb.OleDbConnection objConn = new System.Data.OleDb.OleDbConnection(strConnectionString))
using (System.Data.OleDb.OleDbCommand cmd = new System.Data.OleDb.OleDbCommand("", objConn))
{
objConn.Open();
cmd.CommandText = "CREATE TABLE [Test] ([MyDecimal] DECIMAL NULL)";
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
Decimal value = 12.34m;
cmd.Parameters.AddWithValue("#P01", value);
cmd.CommandText = "INSERT INTO [Test$] ([MyDecimal]) VALUES (#P01)";
cmd.ExecuteNonQuery();
}
System.Diagnostics.Process.Start(bewegungenDateiname);
Now when Excel 2013 opens the Excel file it will Show:
MyDecimal
1234
So in my case Excel is losing the dot. Now I'm running a german Version of Windows/Office and if I use the following line to add the Parameter it will work:
cmd.Parameters.AddWithValue("#P01", value.ToString());
German localization of numbers uses a colon instead of the dot to separate the fractions from the number value (meaning 12,34 instead of 12.34). So it seems the OleDbConnection uses the wrong culture variant to write the Excel file?
I fear my Version might break with a different Version of Excel or a different locale - is there a way to fix this and get decimal values to Excel without such risks?
I would use some other way to create Excel files, if it is without this flaw.
With Excel 2013 try using the following:
strConnectionString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0;HDR=No;IMEX=1""", _filePath)
I have no idea if it will solve the problem.

Determine first sheet name of an Excel file when reading xls as a database

I have a need to read in some client data from an Excel sheet. I found it very quick and easy to read the *.xls as data, this way:
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\MyExcelFile.xls;Extended Properties=\"Excel 8.0;HDR=YES\"";
using (var conn = new System.Data.OleDb.OleDbConnection(connString)) {
conn.Open();
System.Data.OleDb.OleDbCommand cmd = new System.Data.OleDb.OleDbCommand("Select * From [SheetName$]", conn);
OleDbDataReader reader = cmd.ExecuteReader();
int firstNameOrdinal = reader.GetOrdinal("First Name");
int lastNameOrdinal = reader.GetOrdinal("Last Name");
while (reader.Read()) {
Console.WriteLine("First Name: {0}, Last Name: {1}",
reader.GetString(firstNameOrdinal),
reader.GetString(lastNameOrdinal));
}
}
The problem comes in at the 4th line: SELECT * FROM [SheetName$]. In this example, you need to replace "SheetName" with the name of the worksheet you are trying to read. Most of the time, the client names this the same, and all is fine.
But sometimes they don't, and that means some manual work on our part to fix it. There is always just one sheet (in this case anyway) and so I'd rather just tell the code to "use the first sheet, whatever it is named".
I haven't been able to find a way to accomplish this using this technique, so I'm reaching out to the community. Thanks in advance for any advice.
PS - We have to go through a length approval process for external files (e.g. FileHelpers) and I prefer using this process because it does not require a third-party library. I'd prefer to limit answers to this method if possible.
Thanks.
The answer here might suit your needs:
http://social.msdn.microsoft.com/forums/en-US/adodotnetdataproviders/thread/711cf5f9-75fc-4a02-9a96-08aec48dad69/
It uses .GetOleDbSchemaTable() to get a list of sheet names in a workbook.

Can I import INTO excel from a data source without iteration?

Currently I have an application that takes information from a SQLite database and puts it to Excel. However, I'm having to take each DataRow, iterate through each item, and put each value into it's own cell and determine highlighting. What this is causing is 20 minutes to export a 9000 record file into Excel. I'm sure it can be done quicker than that. My thoughts are that I could use a data source to fill the Excel Range and then use the column headers and row numbers to format only those rows that need to be formatted. However, when I look online, no matter what I seem to type, it always shows examples of using Excel as a database, nothing about importing into excel. Unless I'm forgetting a key word or to. Now, this function has to be done in code as it's part of a bigger application. Otherwise I would just have Excel connect to the DB and pull the information itself. Unfortunately that's not the case. Any information that could assist me in quick loading an excel sheet would be appreciated. Thanks.Additional Information:Another reason why the pulling of the information from the DB has to be done in code is that not every computer this is loaded on will have Excel on it. The person using the application may simply be told to export the data and email it to their supervisor. The setup app includes the needed dlls for the application to make the proper format.Example Code (Current):
For Each strTemp In strColumns
excelRange = worksheet.Cells(1, nCounter)
excelRange.Select()
excelRange.Value2 = strTemp
excelRange.Interior.Color = System.Drawing.Color.Gray.ToArgb()
excelRange.BorderAround(Excel.XlLineStyle.xlContinuous, Excel.XlBorderWeight.xlThin, Excel.XlColorIndex.xlColorIndexAutomatic, Type.Missing)
nCounter += 1
Next
Now, this is only example code in terms of the iteration I'm doing. Where I'm really processing the information from the database I'm iterating through a dataTable's Rows, then iterating through the items in the dataRow and doing essentially the same as above; value by value, selecting the range and putting the value in the cell, formatting the cell if it's part of a report (not always gray), and moving onto the next set of data. What I'd like to do is put all of the data in the excel sheet (A2:??, not a row, but multiple rows) then iterate through the reports and format each row then. That way, the only time I iterate through all of the records is when every record is part of a report.
Ideal Code
excelRange = worksheet.Cells("A2", "P9000")
excelRange.DataSource = ds 'ds would be a queried dataSet, and I know there is no excelRange.DataSource.
'Iteration code to format cells
Update:
I know my examples were in VB, but it's because I was also trying to write a VB version of the application since my boss prefers VB. However, here's my final code using a Recordset. The ConvertToRecordset function was obtained from here.
private void CreatePartSheet(Excel.Worksheet excelWorksheet)
{
_dataFactory.RevertDatabase();
excelWorksheet.Name = "Part Sheet";
string[] strColumns = Constants.strExcelPartHeaders;
CreateSheetHeader(excelWorksheet, strColumns);
System.Drawing.Color clrPink = System.Drawing.Color.FromArgb(203, 192, 255);
System.Drawing.Color clrGreen = System.Drawing.Color.FromArgb(100, 225, 137);
string[] strValuesAndTitles = {/*...Column Names...*/};
List<string> lstColumns = strValuesAndTitles.ToList<string>();
System.Data.DataSet ds = _dataFactory.GetDataSet(Queries.strExport);
ADODB.Recordset rs = ConvertToRecordset(ds.Tables[0]);
excelRange = excelWorksheet.get_Range("A2", "ZZ" + rs.RecordCount.ToString());
excelRange.Cells.CopyFromRecordset(rs, rs.RecordCount, rs.Fields.Count);
int nFieldCount = rs.Fields.Count;
for (int nCounter = 0; nCounter < rs.RecordCount; nCounter++)
{
int nRowCounter = nCounter + 2;
List<ReportRecord> rrPartReports = _lstReports.FindAll(rr => rr.PartID == nCounter).ToList<ReportRecord>();
excelRange = (Excel.Range)excelWorksheet.get_Range("A" + nRowCounter.ToString(), "K" + nRowCounter.ToString());
excelRange.Select();
excelRange.NumberFormat = "#";
if (rrPartReports.Count > 0)
{
excelRange.Interior.Color = System.Drawing.Color.FromArgb(230, 216, 173).ToArgb(); //Light Blue
foreach (ReportRecord rr in rrPartReports)
{
if (lstColumns.Contains(rr.Title))
{
excelRange = (Excel.Range)excelWorksheet.Cells[nRowCounter, lstColumns.IndexOf(rr.Title) + 1];
excelRange.Interior.Color = rr.Description.ToUpper().Contains("TAG") ? clrGreen.ToArgb() : clrPink.ToArgb();
if (rr.Description.ToUpper().Contains("TAG"))
{
rs.Find("PART_ID=" + (nCounter + 1).ToString(), 0, ADODB.SearchDirectionEnum.adSearchForward, "");
excelRange.AddComment(Environment.UserName + ": " + _dataFactory.GetTaggedPartPrevValue(rs.Fields["POSITION"].Value.ToString(), rr.Title));
}
}
}
}
if (nRowCounter++ % 500 == 0)
{
progress.ProgressComplete = ((double)nRowCounter / (double)rs.RecordCount) * (double)100;
Notify();
}
}
rs.Close();
excelWorksheet.Columns.AutoFit();
progress.Message = "Done Exporting to Excel";
Notify();
_dataFactory.RestoreDatabase();
}
Can you use ODBC?
''http://www.ch-werner.de/sqliteodbc/
dbName = "c:\docs\test"
scn = "DRIVER=SQLite3 ODBC Driver;Database=" & dbName _
& ";LongNames=0;Timeout=1000;NoTXN=0;SyncPragma=NORMAL;StepAPI=0;"
Set cn = CreateObject("ADODB.Connection")
cn.Open scn
Set rs = CreateObject("ADODB.Recordset")
rs.Open "select * from test", cn
Worksheets("Sheet3").Cells(2, 1).CopyFromRecordset rs
BTW, Excel is quite happy with HTML and internal style sheets.
I have used the Excel XML file format in the past to write directly to an output file or stream. It may not be appropriate for your application, but writing XML is much faster and bypasses the overhead of interacting with the Excel Application. Check out this Introduction to Excel XML post.
Update:
There are also a number of libraries (free and commercial) which can make creating excel document easier for example excellibrary which doesn't support the new format yet. There are others mentioned in the answers to Create Excel (.XLS and .XLSX) file from C#
Excel has the facility to write all the data from a ADO or DAO recordset in a single operation using the CopyFromRecordset method.
Code snippet:
Sheets("Sheet1").Range("A1").CopyFromRecordset rst
I'd normally recommend using Excel to pull in the data from SQLite. Use Excel's "Other Data Sources". You could then choose your OLE DB provider, use a connection string, what-have-you.
It sounds, however, that the real value of your code is the formatting of the cells, rather than the transfer of data.
Perhaps refactor the process to:
have Excel import the data
use your code to open the Excel spreadsheet, and apply formatting
I'm not sure if that is an appropriate set of processes for you, but perhaps something to consider?
Try this out:
http://office.microsoft.com/en-au/excel-help/use-microsoft-query-to-retrieve-external-data-HA010099664.aspx
Perhaps post some code, and we might be able to track down any issues.
I'd consider this chain of events:
query the SQLite database for your dataset.
move the data out of ADO.NET objects, and into POCO objects. Stop using DataTables/Rows.
use For Each to insert into Excel.

Resources