Is there an equivalent to SCHEMA.INI for reading Excel Workbooks?

I am currently working on a project that will import data from multiple sources in a variety of formats and structures, e.g., CSV, fixed-length, other-delimited (tab, pipe, etc.) plain text, and Excel worksheets/workbooks. For this, I'm attempting to build "generic" readers for these files that will put the files' contents into a DataTable/DataSet I can use in other methods. The plain-text files are fairly simple because I've created a large SCHEMA.INI file that contains field definitions for each of the files the system will handle. That SCHEMA.INI resides in a "processing folder" where the files are temporarily stored until their data has been integrated with other systems. A defined text file's data can easily be extracted using this method:
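Private Function TextFileToDataTable(ByVal TextFile As IO.FileInfo) As DataTable
    Dim TextFileData As New DataTable("TextFileData")
    ' The Jet Text driver treats the folder as the "database" and each file in it as a table,
    ' applying the SCHEMA.INI section whose name matches the file.
    Using TapeFileConnect As New OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0;Data Source='" & TextFile.DirectoryName & "';Extended Properties='Text';")
        ' Brackets keep file names that contain spaces or extra dots valid as table names.
        Using TapeAdapter As New OleDb.OleDbDataAdapter(String.Format("SELECT * FROM [{0}];", TextFile.Name), TapeFileConnect)
            Try
                TapeAdapter.Fill(TextFileData)
            Catch ex As Exception
                ' Any failure (missing schema section, bad path, etc.) is reported as Nothing.
                TextFileData = Nothing
            End Try
        End Using
    End Using
    Return TextFileData
End Function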
This works well because a plain-text file isn't terribly complex in its data structure. A single file generally (at least for my requirements) contains, at most, one single table's worth of data - unless, of course, it's some sort of complex XML or JSON structure file, which can/should be handled completely differently anyway - so there's no need to go iterating through different elements beyond this.
NOTE: The code above is dependent on the SCHEMA.INI file being present in the same directory as the plain-text file being read and there being a section within that SCHEMA.INI defined with the same name as that plain-text file.
EXAMPLE:
[EXAMPLE_TEXT_FILE.TXT]
CharacterSet=ANSI
Format=FixedLength
ColNameHeader=FALSE
DateTimeFormat="YYYYMMDD"
COL1=CUSTOMER_NUMBER TEXT WIDTH 20
COL2=CUSTOMER_FIRSTNAME TEXT WIDTH 30
COL3=CUSTOMER_LASTNAME TEXT WIDTH 40
COL4=CUSTOMER_ADDR1 TEXT WIDTH 40
COL5=CUSTOMER_ADDR2 TEXT WIDTH 40
COL6=CUSTOMER_ADDR3 TEXT WIDTH 40
...
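To make the fixed-length section above concrete, here is a small Python sketch of what its COLn entries mean: each column is cut from the record by its WIDTH, in COLn order. This only illustrates the layout. It is not how the Jet Text driver is implemented. The function names are hypothetical, and the parser assumes every column entry has the form `NAME TYPE WIDTH n`.

```python
import configparser
import re


def fixed_width_columns(ini_text: str, section: str) -> list[tuple[str, int]]:
    """Return (column name, width) pairs from a SCHEMA.INI-style section, in COLn order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written (COL1, Format, ...)
    parser.read_string(ini_text)
    columns = []
    for key, value in parser.items(section):
        match = re.fullmatch(r"COL(\d+)", key, flags=re.IGNORECASE)
        if not match:
            continue  # Format, CharacterSet, DateTimeFormat, ... are not columns
        # A column entry looks like: CUSTOMER_NUMBER TEXT WIDTH 20
        parts = value.split()
        upper = [part.upper() for part in parts]
        width = int(parts[upper.index("WIDTH") + 1])
        columns.append((int(match.group(1)), parts[0], width))
    columns.sort()  # order by the number in COLn, not by position in the file
    return [(name, width) for _, name, width in columns]


def split_fixed_width(line: str, columns: list[tuple[str, int]]) -> dict[str, str]:
    """Cut one fixed-length record into named, trimmed fields."""
    record = {}
    position = 0
    for name, width in columns:
        record[name] = line[position:position + width].strip()
        position += width
    return record
```

For example, with `COL1=CUSTOMER_NUMBER TEXT WIDTH 5` and `COL2=CUSTOMER_FIRSTNAME TEXT WIDTH 6`, the record `"00042Alice "` splits into `"00042"` and `"Alice"`.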
Excel workbooks, however, can be a bit trickier. Several of the workbooks I have to process contain multiple worksheets' worth of data that I want to consolidate into a single DataSet with a DataTable for each worksheet. The basic functionality is, again, fairly straightforward, and I've come up with the following method to read any and all sheets into a DataSet:
Private Function ExcelFileToDataSet(ByVal ExcelFile As IO.FileInfo, ByVal HasHeaderRow As Boolean) As DataSet
    Dim ExcelFileData As New DataSet("ExcelFileData")
    Dim ExcelConnectionString As String = String.Empty
    Dim UseHeaders As String = "NO"
    Select Case ExcelFile.Extension.ToUpper.Trim
        Case ".XLS"
            ' Excel 97-2003 binary workbooks
            ExcelConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR={1}'"
        Case ".XLSX"
            ' Excel 2007+ Open XML workbooks: ACE provider with "Excel 12.0 Xml"
            ExcelConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR={1}'"
    End Select
    If HasHeaderRow Then
        UseHeaders = "YES"
    End If
    ExcelConnectionString = String.Format(ExcelConnectionString, ExcelFile.FullName, UseHeaders)
    Try
        Using ExcelConnection As New OleDb.OleDbConnection(ExcelConnectionString)
            ExcelConnection.Open()
            ' One row per worksheet (and named range) in the workbook
            Dim ExcelSchema As DataTable = ExcelConnection.GetOleDbSchemaTable(OleDb.OleDbSchemaGuid.Tables, Nothing)
            For Each ExcelSheet As DataRow In ExcelSchema.Rows
                Dim SheetName As String = ExcelSheet("TABLE_NAME").ToString
                Dim SheetTable As New DataTable
                ' TABLE_NAME looks like Sheet1$ or, for names with spaces, 'My Sheet$';
                ' strip the quotes and the trailing $ to get a readable table name.
                SheetTable.TableName = SheetName.Trim("'"c).TrimEnd("$"c)
                Using ExcelCommand As New OleDb.OleDbCommand(String.Format("SELECT * FROM [{0}]", SheetName), ExcelConnection)
                    Using ExcelAdapter As New OleDb.OleDbDataAdapter(ExcelCommand)
                        ExcelAdapter.Fill(SheetTable)
                    End Using
                End Using
                ExcelFileData.Tables.Add(SheetTable)
            Next ExcelSheet
        End Using
    Catch ex As Exception
        ExcelFileData = Nothing
    End Try
    Return ExcelFileData
End Function
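The TABLE_NAME values returned by the schema query are not always plain sheet names. A name such as Sheet1$ is a worksheet, a name containing spaces comes back quoted (for example 'My Sheet$'), and named ranges also appear. Below is a hedged Python sketch of one way to keep only worksheets and derive readable names. It assumes the common convention that worksheet entries end in "$" and that embedded single quotes are doubled inside the quoted form. The function name is hypothetical.

```python
def worksheet_names(table_names: list[str]) -> list[tuple[str, str]]:
    """Return (raw TABLE_NAME, readable sheet name) pairs for worksheet entries only.

    Assumes worksheet entries end in "$" and that quoted names use doubled
    single quotes for embedded apostrophes.
    """
    sheets = []
    for raw in table_names:
        name = raw
        if len(name) >= 2 and name.startswith("'") and name.endswith("'"):
            name = name[1:-1].replace("''", "'")  # undo the quoting
        if not name.endswith("$"):
            continue  # named ranges and other non-sheet entries
        sheets.append((raw, name[:-1]))
    return sheets
```

The raw value is what goes into the SELECT statement. The readable value is only for naming the DataTable.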
The above code will work in a majority of the cases I deal with, but my "difficulty" is that there may be some worksheets that have header rows and some that don't within the same workbook. Also, for those worksheets that do not have a header row, I'd like to be able to define the field names and data types similar to how I can with the plain-text SCHEMA.INI. The only thing I have going for me in these cases is that the "client" provides me with a data map to help me identify what data elements are in each field.
What I'd like to know is whether there is a way, similar to the text file's SCHEMA.INI, to define the structure of an Excel workbook and the worksheet(s) it contains ahead of time, including column data types to keep the OleDb driver from "misinterpreting" a column's data. I imagine this could be any sort of structured file such as INI, XML, or whatever, but it would need to be capable of identifying whether or not a particular sheet contains a header row or, in lieu of such a row, the (expected) column definitions. Does any such "standard definition" file exist for Excel workbooks?
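If no standard definition file turns up, one home-grown option is an INI file of your own, modeled on SCHEMA.INI. Each section names a workbook and sheet and states whether the sheet has a header row or lists its columns. Every sheet would then be read with HDR=NO and the definition applied afterwards. The Python sketch below shows the idea. The `[WORKBOOK|Sheet]` section format, the `ColNameHeader`/`COLn` keys for Excel, and the TEXT/DOUBLE/LONG type names are all hypothetical conventions for this sketch, not an existing standard.

```python
import configparser
import re

# Hypothetical type names for this sketch, each mapped to a Python converter.
CONVERTERS = {"TEXT": str, "DOUBLE": float, "LONG": int}


def load_sheet_definition(ini_text, workbook, sheet):
    """Read the [WORKBOOK|Sheet] section: (has_header, [(column, type), ...])."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    parser.read_string(ini_text)
    section = parser[f"{workbook}|{sheet}"]
    has_header = section.getboolean("ColNameHeader", fallback=True)
    columns = []
    for key, value in section.items():
        match = re.fullmatch(r"COL(\d+)", key, flags=re.IGNORECASE)
        if match:
            name, type_name = value.split()  # e.g. "BALANCE Double"
            columns.append((int(match.group(1)), name, type_name.upper()))
    columns.sort()
    return has_header, [(name, type_name) for _, name, type_name in columns]


def apply_definition(raw_rows, has_header, columns):
    """Turn rows read with HDR=NO (every row is data) into dicts."""
    if has_header:
        # First row supplies the names; the rest are data rows.
        names = [str(value) for value in raw_rows[0]]
        return [dict(zip(names, row)) for row in raw_rows[1:]]
    records = []
    for row in raw_rows:
        record = {}
        for (name, type_name), value in zip(columns, row):
            convert = CONVERTERS.get(type_name, str)
            record[name] = None if value is None else convert(value)
        records.append(record)
    return records
```

Usage: a section `[BOOK.XLSX|Sheet2]` with `ColNameHeader=FALSE`, `COL1=CUSTOMER_NUMBER Text` and `COL2=BALANCE Double` turns the raw row `["00042", "12.5"]` into `{"CUSTOMER_NUMBER": "00042", "BALANCE": 12.5}`.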
One thing to note: As you may have noticed in the code for the ExcelFileToDataSet() method, I may be dealing with the older .XLS (97-03) format or the .XLSX (07+) format, so I can't necessarily rely on the workbook being Open XML compliant. I suppose I could try breaking the methods out to one for each extension, but I'd rather find something that I can use regardless of which file format the Excel file is using.
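Because the file extension may not match the actual format, another option is to check the first bytes of the file. Excel 97-2003 .xls files are stored in the OLE2 Compound File container, which starts with the signature D0 CF 11 E0 A1 B1 1A E1. .xlsx packages are ZIP files, which start with "PK\x03\x04". This Python sketch only guesses the container: other Office documents (.doc, .docx) share these signatures. The function names are hypothetical.

```python
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary (.xls 97-2003)
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP local file header (.xlsx/.xlsm are ZIP packages)


def classify_header(header: bytes) -> str:
    """Guess the workbook container from its first bytes."""
    if header.startswith(OLE2_SIGNATURE):
        return "xls"
    if header.startswith(ZIP_SIGNATURE):
        return "xlsx"
    return "unknown"


def detect_excel_format(path: str) -> str:
    """Read the first 8 bytes of the file and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(8))
```

The result could then pick the connection string ("Excel 8.0" with Jet for "xls", "Excel 12.0 Xml" with ACE for "xlsx") instead of relying on the extension.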

Related

How to import Excel into a DataGridView, then filter by database values

My question is about importing Excel data into a DataGridView, but there is an extra twist.
I also have an OleDb database with store codes and store names.
After the import, I want the grid to show only the rows whose store codes exist in that database.
My code:
Dim conn As OleDbConnection
Dim dtr As OleDbDataReader
Dim dta As OleDbDataAdapter
Dim cmd As OleDbCommand
Dim dts As DataSet
Dim excel As String
' OpenFileDialog1 is the dialog component placed on the form in the designer,
' so no separate New OpenFileDialog is needed.
OpenFileDialog1.FileName = ""
OpenFileDialog1.InitialDirectory = My.Computer.FileSystem.SpecialDirectories.Desktop
OpenFileDialog1.Filter = "All Files (*.*)|*.*|Excel files (*.xlsx)|*.xlsx|CSV Files (*.csv)|*.csv|XLS Files (*.xls)|*.xls"
If (OpenFileDialog1.ShowDialog(Me) = System.Windows.Forms.DialogResult.OK) Then
    DataGridView1.Columns.Clear()
    Dim fi As New FileInfo(OpenFileDialog1.FileName)
    excel = fi.FullName
    conn = New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excel + ";Extended Properties=Excel 12.0;")
    dta = New OleDbDataAdapter("Select * From [Sheet1$]", conn)
    dts = New DataSet
    ' Fill opens and closes the connection automatically
    dta.Fill(dts, "[Sheet1$]")
    DataGridView1.DataSource = dts
    DataGridView1.DataMember = "[Sheet1$]"
    conn.Close()
End If
First, sorry for my terrible English :)
The images are as follows:
[Image: Main Form]
[Image: Store List Form]
I want only the stores that are in the store list to be displayed in the DataGridView. :\
It's not exactly clear what your current presentation/display looks like, what the problem is, and what your desired presentation/display should look like. But you have asked about selecting only one part of the data you are importing, which is presumably found in only one column of the imported Excel data.
When the DataTable is created, it has the columns and rows from the Excel worksheet. The column names come from the first row, and the rows are the records from the succeeding rows in the worksheet. You can access both the header data and the row data easily. The code below is VERY rough, but it shows how to access the data in the DataTable that you have already successfully imported with the limited code shown above.
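Dim datatable As DataTable = dts.Tables("[Sheet1$]") ' the table filled by dta.Fill above
Dim columns = datatable.Columns
Dim rows = datatable.Rows
Dim columns1 = columns(0)   ' first column; its ColumnName is the header text
Dim rows1 = rows(0)         ' first data row
Dim element1 = rows1(0)     ' value of the first column in the first row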
Columns will have all the headers, so you can locate the column with the store codes or store names. Then the rows will have the data for each store. So above, rows1 is the first row of data and element1 is the data in that row from columns1, and so on. The (0) is the index into the respective collections.
You will, of course, have to write code to extract the data you want and if necessary eliminate duplicates, but the data is all there already.
Hopefully getting the data into a list and then sorting, filtering and selecting the data should be relatively straightforward, but if not, add a comment. That's kind of a different problem. You asked about getting only the store codes.
Added: Based on your additional images and explanation, you are looking to perform an SQL INNER JOIN operation. From the w3schools.com page on SQL INNER JOIN, "The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns." This is something you will have to study and learn, but it should provide what you need in this case. You will need to define and construct both tables and then perform the JOIN.
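To show what the INNER JOIN does with the two tables here, below is a hedged Python sketch over rows represented as dicts. It is a plain hash join. The STORE_CODE, STORE_NAME and QTY column names are hypothetical stand-ins for whatever the worksheet and the store table actually use. Rows from the imported sheet whose store code has no match in the store list are dropped, which is exactly the filtering asked for.

```python
def inner_join(left_rows, right_rows, key):
    """Equivalent of: SELECT * FROM left INNER JOIN right ON left.key = right.key."""
    # Index the lookup table (e.g. the store list) by the join key.
    by_key = {}
    for right in right_rows:
        by_key.setdefault(right[key], []).append(right)
    joined = []
    for left in left_rows:
        # Rows with no match in right_rows are dropped, as in an INNER JOIN.
        # Columns present in both rows take the right-hand value.
        for right in by_key.get(left[key], []):
            joined.append({**left, **right})
    return joined
```

With the imported rows `[{"STORE_CODE": "S1", "QTY": 5}, {"STORE_CODE": "S9", "QTY": 2}]` and the store list `[{"STORE_CODE": "S1", "STORE_NAME": "Main"}]`, only the S1 row survives, with its store name attached.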
And, by the way, you could also follow the link provided in the first comment by T.S., and if that solves your problem, it's a far simpler solution.

Excel 2010 macro: creating .txt files named from column A with content from column B (Stack Overflow solutions don't work)

I have found some answers/examples here on Stack Overflow for this problem: in Microsoft Excel 2010, I want to create a .txt file for each cell in, e.g., column A, which contains the file names, with column B containing the text to put in each file. However, one example doesn't work at all, and the second fails after a few files are created.
You can use the CreateTextFile method, which creates your file and returns a TextStream object you can use to write to it (see Microsoft Docs).
Here's a code example that will do what you asked.
Sub CreateTxt()
    Dim my_range As Range
    Dim x As Range
    Dim pth As String
    Dim file_sys As Object
    Dim txt_file As Object
    Set my_range = Selection
    ' Create the FileSystemObject once, outside the loop
    Set file_sys = CreateObject("Scripting.FileSystemObject")
    For Each x In my_range.Rows
        pth = "C:\excel_test\" & x.Cells(1).Value & ".txt" 'file name in column A
        Set txt_file = file_sys.CreateTextFile(pth, True) 'True = overwrite an existing file
        txt_file.WriteLine x.Cells(2).Value 'content in column B
        txt_file.Close
    Next x
End Sub
Just remember that to create a file you need adequate permissions on the path you're writing to; I had to run Excel as administrator to get it working.
Also, the True value passed to CreateTextFile is its overwrite argument, which lets it replace an existing file with the same name. If it is set to False, CreateTextFile raises an error when the file already exists.
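For comparison, here is the same "one file per row" logic as a Python sketch that runs outside Excel. It takes (name, content) pairs as they would come from columns A and B. The function name and its overwrite parameter are hypothetical. The overwrite=False behaviour mirrors what CreateTextFile does with its overwrite argument set to False.

```python
from pathlib import Path


def create_text_files(pairs, folder, overwrite=True):
    """Write one <name>.txt per (name, content) pair, like the CreateTxt macro.

    With overwrite=False an existing file raises FileExistsError.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for name, content in pairs:
        if name is None or str(name).strip() == "":
            continue  # a blank column A cell gives no usable file name
        target = folder / f"{name}.txt"
        if target.exists() and not overwrite:
            raise FileExistsError(target)
        text = "" if content is None else str(content)
        target.write_text(text + "\n", encoding="utf-8")  # one line, like WriteLine
        created.append(target)
    return created
```

Skipping blank names is a design choice here. Without it, an empty column A cell would produce a file called ".txt".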

Importing data from many workbooks in different folders

I am looking to import/copy data from many workbooks into a summary workbook. The workbooks are arranged in different subfolders, e.g.:
C:\data1\results_2001.xlm
C:\data2\results_2002.xlm
C:\data3\results_2003.xlm
The names are similar but differ slightly to differentiate them. At present, I import the files individually, and I want to automate the process. The results files (above) are mixed in with other Excel files, so I cannot target them by file type.
How would I import these files by partial file name?
One option is to create an array of the filepaths to your excel sheets and then loop over the array and get the data you want into your summary sheet.
Sub CreateSummary()
    Dim wkbs() As Variant, wkb As Integer, owb As Workbook
    wkbs = Array("C:\data1\results_2001.xlm", "C:\data2\results_2002.xlm", "C:\data3\results_2003.xlm")
    For wkb = 0 To UBound(wkbs)
        Set owb = Application.Workbooks.Open(wkbs(wkb)) 'Open each workbook
        With owb
            'Get the data you want into your summary workbook
            .Close SaveChanges:=False 'close without prompting to save
        End With
    Next wkb
End Sub
Another way, especially if this is only a one-time operation: go into Cmd.exe, run a dir for the files you're looking for, and send the output to a text file (e.g., something like dir c:\results_*.xlm /s /b > c:\myList.txt). Then import the text file into your worksheet, step through each cell in the list, and open each workbook in turn.
You can do this in any language, but since it may be a little challenging for you, here is what you need to do:
create a function that lists the files and folders under a given path
loop through all the items found; if an item is a folder, call the function recursively on it
if an item matches your target (name, extension, ...), read it and load its content into the summary
I believe you can do this easily in VBA; look here.
In outline it would look like this (note that this is not valid code, just something I wrote down to help you figure it out):
function loopthepath (string pathtoloop)
foreach(dirItem item in pathtoloop.getdirItem)
{
if (item is folder)
{
loopthepath(pathtoloop + item)
}
else
{
if (item fits mydescription)
{
load the content to the summary
}
}
}
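The recursive outline above can be sketched in runnable Python as follows. Here `Path.rglob("*")` does the "if it's a folder, recurse" part, and the partial-name test is a case-insensitive substring check. The function name and the "results_" fragment with the ".xlm" extension filter are illustrative choices taken from the question's example paths. Each returned path would then be opened and its data copied into the summary, as in the VBA loop earlier.

```python
from pathlib import Path


def find_files_by_partial_name(root, fragment, extensions=None):
    """Recursively list files under root whose name contains fragment.

    extensions is an optional set of lowercase suffixes such as {".xlm"}.
    """
    matches = []
    for path in sorted(Path(root).rglob("*")):  # walks every subfolder
        if not path.is_file():
            continue
        if fragment.lower() not in path.name.lower():
            continue  # not one of the results files
        if extensions and path.suffix.lower() not in extensions:
            continue
        matches.append(path)
    return matches
```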

Unexpected null members when reading XLSX with Excel Data Reader (.Net)

I'm reading an XLSX (Microsoft Excel XML file) using the Excel Data Reader from http://exceldatareader.codeplex.com/ and am having a problem with missing data. Data which is in the source Excel spreadsheet is missing from the data set returned by the library.
Here's a bit more detail of what I'm doing:
Created a simple test spreadsheet in Excel with one sheet, a header row and two data rows. Save and close Excel.
Open the file and pass the stream into the CreateOpenXmlReader() method and get back an IExcelDataReader.
Call the AsDataSet() method on the IExcelDataReader and get back a DataSet.
Get the ItemArray from row 1 of table 0.
Loop through the ItemArray. Discovered there is data missing (i.e. there are System.DBNull members where I expected System.string members).
Here's a bit more analysis...
I debugged the code and looked inside the ExcelDataReader object model. Found a non-public string array called "SST" which appears to contain the data from the spreadsheet as a single linear (one-dimensional) array.
On closer inspection, I found that the data I was looking for was also missing from this array. In this raw data, the member does not exist at all.
My guess is that for some reason the parser is not picking up the data from the OOXML and concluding that the cell is empty. Looking at the OOXML itself, the data seems to be split across the sharedStrings.xml and sheet1.xml files, so perhaps the parser is having a tough time putting all this together.
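To make that split concrete: in the Open XML format, a cell with t="s" in the sheet part stores only an index into the shared-string table in xl/sharedStrings.xml, while a cell with t="inlineStr" carries its text inline. The Python sketch below uses only the standard library and shows how the two parts fit together, so you can check what a file really contains. It is a way to inspect the file, not a fix for Excel Data Reader. It assumes the sheet lives at xl/worksheets/sheet1.xml (Excel's usual naming; the authoritative mapping is in xl/workbook.xml and its relationships) and that every cell carries its r reference.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_sheet_cells(xlsx_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: text} for one sheet, resolving shared strings."""
    with zipfile.ZipFile(xlsx_file) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in sst.findall("m:si", NS):
                # An <si> holds either one <t> or several rich-text runs <r><t>.
                runs = si.findall("m:t", NS) or si.findall("m:r/m:t", NS)
                shared.append("".join(t.text or "" for t in runs))
        sheet = ET.fromstring(zf.read(sheet_part))
    cells = {}
    for c in sheet.iterfind("m:sheetData/m:row/m:c", NS):
        kind = c.get("t")
        if kind == "s":  # shared string: <v> is an index into the table
            value = shared[int(c.find("m:v", NS).text)]
        elif kind == "inlineStr":  # text stored directly in the cell
            runs = c.findall("m:is/m:t", NS) or c.findall("m:is/m:r/m:t", NS)
            value = "".join(t.text or "" for t in runs)
        else:  # numbers, booleans, formula results: raw <v> text
            v = c.find("m:v", NS)
            value = None if v is None else v.text
        cells[c.get("r")] = value
    return cells
```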
Saving the file in binary format (Excel 97 to 2003) and reading that in solves the problem so on the surface that seems to confirm my suspicion is with reading the OOXML format.
Suggestions?
As a stop gap I'm converting all files to binary format, but that seems like a kludge. Is there some way to get my OOXML formatted Excel files to read in properly with Excel Data Reader?
To retrieve an Excel spreadsheet (.xlsx) and load it into a DataSet, you don't need to mess with XML readers or a separate library like Excel Data Reader. The code for reading an entire spreadsheet into a DataSet is pretty simple when using the normal OleDb functions in .NET:
' Requires Imports System.Data.OleDb at the top of the file.
' Returns the DataSet so the caller can use it (a Sub would discard it).
Function ReadInMyExcelFile(ByVal xlsFile As String) As DataSet
    ' xlsFile is the full path to the .xlsx file, including the extension
    Dim conStr As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & xlsFile & ";Extended Properties=""Excel 12.0 Xml;HDR=YES"""
    Dim dtSheets As DataTable
    Dim tmp As String
    Dim sqlText As String
    Using cn As New OleDbConnection(conStr)
        cn.Open()
        dtSheets = cn.GetSchema("Tables")
    End Using
    'Dataset for the spreadsheet
    Dim ds As New DataSet
    'Loop through the names of all the worksheets in the file.
    For Each rw As DataRow In dtSheets.Rows
        tmp = rw("TABLE_NAME").ToString()
        Dim dt As New DataTable(tmp)
        Using cn As New OleDbConnection(conStr)
            cn.Open()
            'Retrieve all the records from the worksheet.
            sqlText = "SELECT * FROM [" & tmp & "]"
            Using adp As New OleDbDataAdapter(sqlText, cn)
                'Fill the data table with all the data.
                adp.Fill(dt)
            End Using
        End Using
        ds.Tables.Add(dt)
    Next
    Return ds
End Function
It seems there is a bug in Excel Data Reader (this is the first time I have heard of it). Do you have to use it? If not, EPPlus would be a better choice.
Excel Data Reader from CodePlex is used for reading data from an Excel file directly in a web application, without any sort of caching on the server; the code above only works when the Excel file can be stored somewhere. I have faced similar problems with Excel Data Reader where some of the data is missing. Most importantly, I couldn't find any specific pattern. All I could see is that if all the rows have values, there is no problem. The best option is to convert the .xlsx to .xls.

Can I import INTO excel from a data source without iteration?

Currently I have an application that takes information from a SQLite database and puts it into Excel. However, I have to take each DataRow, iterate through each item, put each value into its own cell, and determine highlighting. As a result, exporting a 9000-record file to Excel takes 20 minutes. I'm sure it can be done faster than that. My thought is that I could use a data source to fill the Excel Range and then use the column headers and row numbers to format only the rows that need formatting. However, when I search online, no matter what I type, the results always show examples of using Excel as a database and nothing about importing into Excel, unless I'm missing a keyword or two. This function has to be done in code because it's part of a bigger application; otherwise I would just have Excel connect to the DB and pull the information itself. Unfortunately, that's not the case. Any information that could help me load an Excel sheet quickly would be appreciated. Thanks.
Additional Information:
Another reason why pulling the information from the DB has to be done in code is that not every computer this is installed on will have Excel. The person using the application may simply be told to export the data and email it to their supervisor. The setup app includes the DLLs the application needs to produce the proper format.
Example Code (Current):
For Each strTemp In strColumns
excelRange = worksheet.Cells(1, nCounter)
excelRange.Select()
excelRange.Value2 = strTemp
excelRange.Interior.Color = System.Drawing.Color.Gray.ToArgb()
excelRange.BorderAround(Excel.XlLineStyle.xlContinuous, Excel.XlBorderWeight.xlThin, Excel.XlColorIndex.xlColorIndexAutomatic, Type.Missing)
nCounter += 1
Next
Now, this is only example code showing the kind of iteration I'm doing. Where I actually process the information from the database, I iterate through a DataTable's Rows, then through the items in each DataRow, doing essentially the same as above: value by value, selecting the range, putting the value in the cell, formatting the cell if it's part of a report (not always gray), and moving on to the next set of data. What I'd like to do is put all of the data into the Excel sheet at once (A2:??, not a single row but multiple rows) and then iterate through the reports and format the affected rows. That way, the only time I iterate through all of the records is when every record is part of a report.
Ideal Code
excelRange = worksheet.Cells("A2", "P9000")
excelRange.DataSource = ds 'ds would be a queried dataSet, and I know there is no excelRange.DataSource.
'Iteration code to format cells
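There is no DataSource property on a Range. A commonly used alternative is to build a two-dimensional array of values and assign it to a Range of the same size in a single call (in the interop API, Range.Value2 accepts a 2-D object array), then format only the rows that need it. The Python sketch below covers the language-neutral preparation: turning the query rows into a rectangular grid and computing the matching A1-style address. The helper names are hypothetical, and the final assignment to Value2 would still be done in VB.NET or C#.

```python
def column_letters(index: int) -> str:
    """1-based column number to Excel column letters (1 -> A, 16 -> P, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def block_address(first_row: int, n_rows: int, n_cols: int, first_col: int = 1) -> str:
    """A1-style address of an n_rows x n_cols block starting at (first_row, first_col)."""
    top_left = f"{column_letters(first_col)}{first_row}"
    bottom_right = f"{column_letters(first_col + n_cols - 1)}{first_row + n_rows - 1}"
    return f"{top_left}:{bottom_right}"


def to_value_grid(records: list[dict], columns: list[str]) -> list[list]:
    """Rectangular rows x columns grid; missing values become None (empty cells)."""
    return [[record.get(column) for column in columns] for record in records]
```

For 9000 records and 16 columns written below a header row, the block is `A2:P9001`. The grid has exactly that shape, so one assignment fills it.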
Update:
I know my examples were in VB, but it's because I was also trying to write a VB version of the application since my boss prefers VB. However, here's my final code using a Recordset. The ConvertToRecordset function was obtained from here.
private void CreatePartSheet(Excel.Worksheet excelWorksheet)
{
_dataFactory.RevertDatabase();
excelWorksheet.Name = "Part Sheet";
string[] strColumns = Constants.strExcelPartHeaders;
CreateSheetHeader(excelWorksheet, strColumns);
System.Drawing.Color clrPink = System.Drawing.Color.FromArgb(203, 192, 255);
System.Drawing.Color clrGreen = System.Drawing.Color.FromArgb(100, 225, 137);
string[] strValuesAndTitles = {/*...Column Names...*/};
List<string> lstColumns = strValuesAndTitles.ToList<string>();
System.Data.DataSet ds = _dataFactory.GetDataSet(Queries.strExport);
ADODB.Recordset rs = ConvertToRecordset(ds.Tables[0]);
excelRange = excelWorksheet.get_Range("A2", "ZZ" + rs.RecordCount.ToString());
excelRange.Cells.CopyFromRecordset(rs, rs.RecordCount, rs.Fields.Count);
int nFieldCount = rs.Fields.Count;
for (int nCounter = 0; nCounter < rs.RecordCount; nCounter++)
{
int nRowCounter = nCounter + 2;
List<ReportRecord> rrPartReports = _lstReports.FindAll(rr => rr.PartID == nCounter).ToList<ReportRecord>();
excelRange = (Excel.Range)excelWorksheet.get_Range("A" + nRowCounter.ToString(), "K" + nRowCounter.ToString());
excelRange.Select();
excelRange.NumberFormat = "#";
if (rrPartReports.Count > 0)
{
excelRange.Interior.Color = System.Drawing.Color.FromArgb(230, 216, 173).ToArgb(); //Light Blue
foreach (ReportRecord rr in rrPartReports)
{
if (lstColumns.Contains(rr.Title))
{
excelRange = (Excel.Range)excelWorksheet.Cells[nRowCounter, lstColumns.IndexOf(rr.Title) + 1];
excelRange.Interior.Color = rr.Description.ToUpper().Contains("TAG") ? clrGreen.ToArgb() : clrPink.ToArgb();
if (rr.Description.ToUpper().Contains("TAG"))
{
rs.Find("PART_ID=" + (nCounter + 1).ToString(), 0, ADODB.SearchDirectionEnum.adSearchForward, "");
excelRange.AddComment(Environment.UserName + ": " + _dataFactory.GetTaggedPartPrevValue(rs.Fields["POSITION"].Value.ToString(), rr.Title));
}
}
}
}
if (nRowCounter++ % 500 == 0)
{
progress.ProgressComplete = ((double)nRowCounter / (double)rs.RecordCount) * (double)100;
Notify();
}
}
rs.Close();
excelWorksheet.Columns.AutoFit();
progress.Message = "Done Exporting to Excel";
Notify();
_dataFactory.RestoreDatabase();
}
Can you use ODBC?
''http://www.ch-werner.de/sqliteodbc/
dbName = "c:\docs\test"
scn = "DRIVER=SQLite3 ODBC Driver;Database=" & dbName _
& ";LongNames=0;Timeout=1000;NoTXN=0;SyncPragma=NORMAL;StepAPI=0;"
Set cn = CreateObject("ADODB.Connection")
cn.Open scn
Set rs = CreateObject("ADODB.Recordset")
rs.Open "select * from test", cn
Worksheets("Sheet3").Cells(2, 1).CopyFromRecordset rs
BTW, Excel is quite happy with HTML and internal style sheets.
I have used the Excel XML file format in the past to write directly to an output file or stream. It may not be appropriate for your application, but writing XML is much faster and bypasses the overhead of interacting with the Excel Application. Check out this Introduction to Excel XML post.
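As a rough illustration of that approach, the Python sketch below writes an Excel 2003 XML Spreadsheet (SpreadsheetML) document as a string. It covers a header row, typed data cells, and a gray fill on selected rows, so it also handles the highlighting without automating Excel. The function names and the "hl" style ID are hypothetical. The sketch assumes the output is saved with a .xml extension and opened in Excel. Newer Excel versions may warn if it is saved under a different extension.

```python
from xml.sax.saxutils import escape

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def _cell(value, style=None):
    """One <Cell>: numbers as Number, everything else as escaped String."""
    style_attr = f' ss:StyleID="{style}"' if style else ""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return f'<Cell{style_attr}><Data ss:Type="Number">{value}</Data></Cell>'
    text = "" if value is None else escape(str(value))
    return f'<Cell{style_attr}><Data ss:Type="String">{text}</Data></Cell>'


def to_spreadsheetml(sheet_name, header, rows, highlight_rows=()):
    """Build an Excel 2003 XML Spreadsheet document as one string.

    highlight_rows holds 0-based indexes into rows whose cells get a gray fill.
    """
    name_attr = escape(sheet_name, {'"': "&quot;"})
    lines = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">',
        '<Styles><Style ss:ID="hl"><Interior ss:Color="#C0C0C0" ss:Pattern="Solid"/></Style></Styles>',
        f'<Worksheet ss:Name="{name_attr}"><Table>',
        "<Row>" + "".join(_cell(str(h)) for h in header) + "</Row>",
    ]
    for index, row in enumerate(rows):
        style = "hl" if index in highlight_rows else None
        lines.append("<Row>" + "".join(_cell(v, style) for v in row) + "</Row>")
    lines.append("</Table></Worksheet></Workbook>")
    return "\n".join(lines)
```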
Update:
There are also a number of libraries (free and commercial) that can make creating Excel documents easier, for example excellibrary, which doesn't support the new format yet. Others are mentioned in the answers to Create Excel (.XLS and .XLSX) file from C#.
Excel can write all the data from an ADO or DAO recordset in a single operation using the CopyFromRecordset method.
Code snippet:
Sheets("Sheet1").Range("A1").CopyFromRecordset rst
I'd normally recommend using Excel to pull in the data from SQLite. Use Excel's "Other Data Sources". You could then choose your OLE DB provider, use a connection string, what-have-you.
It sounds, however, that the real value of your code is the formatting of the cells, rather than the transfer of data.
Perhaps refactor the process to:
have Excel import the data
use your code to open the Excel spreadsheet, and apply formatting
I'm not sure if that is an appropriate set of processes for you, but perhaps something to consider?
Try this out:
http://office.microsoft.com/en-au/excel-help/use-microsoft-query-to-retrieve-external-data-HA010099664.aspx
Perhaps post some code, and we might be able to track down any issues.
I'd consider this chain of events:
query the SQLite database for your dataset.
move the data out of ADO.NET objects, and into POCO objects. Stop using DataTables/Rows.
use For Each to insert into Excel.
