Read the content of a Word document via its XML - excel

Context
I am trying to build a Word document browser in Excel to sift through a large number of documents (around 1000).
Opening a Word document proves to be rather slow (around 4 seconds per document, so in this case it takes 2 hours to look through all the items, which is far too slow for a single query), even after disabling everything that could slow down the opening. Hence I open:
As read only
Without the open and repair mode (which can happen on some documents)
Disabling the display of the document
My attempt so far
These documents are tricky to look through because some keywords appear every single time but not in the same context (not the core of the problem here, since I can handle that once the text is loaded in arrays). Hence the often-used Windows Explorer solution (like in this link) cannot be used in my case.
For the moment, I have a working macro that analyzes the content of the Word documents by opening them.
Code
Here is a sample of the code.
Note that I used the Microsoft Word 14.0 Object Library reference
' Analyzing all the word document within the same folder '
Sub extractFile()
Dim i As Long, j As Long
Dim sAnalyzedDoc As String, sLibName As String, sKeyword As String
Dim aOut()
Dim oWordApp As Word.Application
Dim oDoc As Word.Document
Set oWordApp = CreateObject("Word.Application")
sLibName = ThisWorkbook.Path & "\"
sAnalyzedDoc = Dir(sLibName)
sKeyword = "example of a word"
With Application
.DisplayAlerts = False
.ScreenUpdating = False
End With
ReDim aOut(2, 2)
aOut(1, 1) = "Document name"
aOut(2, 1) = "Text"
While (sAnalyzedDoc <> "")
' Analyzing documents only with the .doc and .docx extension '
If Not InStr(sAnalyzedDoc, ".doc") = 0 Then
' Opening the document as mentioned above, in read only mode, without repair and invisible '
' sLibName already ends with "\" and the document is opened through the oWordApp instance created above '
Set oDoc = oWordApp.Documents.Open(sLibName & sAnalyzedDoc, ReadOnly:=True, OpenAndRepair:=False, Visible:=False)
With oDoc
For i = 1 To .Sentences.Count
' Searching for the keyword within the document '
If Not InStr(LCase(.Sentences.Item(i)), LCase(sKeyword)) = 0 Then
If Not IsEmpty(aOut(1, 2)) Then
ReDim Preserve aOut(2, UBound(aOut, 2) + 1)
End If
aOut(1, UBound(aOut, 2)) = sAnalyzedDoc
aOut(2, UBound(aOut, 2)) = .Sentences.Item(i)
GoTo closingDoc ' A dubious programming choice but that works for the moment '
End If
Next i
closingDoc:
' Intending to make the closing faster by not saving the document '
.Close SaveChanges:=False
End With
End If
'Moving on to the next document '
sAnalyzedDoc = Dir
Wend
exitSub:
' Quitting the hidden Word instance so it does not keep running in the background '
oWordApp.Quit
With Output
.Range(.Cells(1, 1), .Cells(UBound(aOut, 1), UBound(aOut, 2))) = aOut
End With
With Application
.DisplayAlerts = True
.ScreenUpdating = True
End With
End Sub
My question
My idea was to go through the XML inside the document to access its content directly (in newer versions of Word you can reach it by renaming any document with a .zip extension and opening nameOfDocument.zip\word\document.xml).
That would be a lot faster than loading all the images, charts and tables of the Word document, which are of no use in a text search.
Thus, I wanted to ask whether there is a way in VBA to open a Word document like a zip file and access that XML document, then process it like a normal string of characters in VBA, since I already have the path and the name of the file from the code above.

Do note that there is no easy answer to the above problem, and the plain VBA code in my initial question will do the job perfectly as long as you do not have a load of documents to browse; otherwise go for another tool (there is a Python Dynamic Link Library (DLL) that does that very well).
Ok, I'll try to make my answer as explanatory as possible.
First of all, this question led me on an endless journey through XML in C# and XPath, which I chose not to pursue at some point.
The solution below reduced the time needed to analyze the files from roughly 2 hours to 10 seconds.
Context
The backbone of reading XML documents, and therefore the inner XML of Word documents, is the OpenXML library from Microsoft.
Keep in mind what I said above: the method I was trying to implement cannot be done solely in VBA and thus must be done another way.
This is probably because VBA is implemented for Office and is thus limited in accessing the core structure of Office documents, but I have no information about this limitation (any information is welcome).
The answer I will give here is to write a C# DLL for VBA.
For writing a DLL in C# and referencing it in VBA, I refer you to the following link, which summarizes this specific process better than I could: Tutorial for creating DLL in C#
Let's start
First of all, you will need to reference the WindowsBase library and DocumentFormat.OpenXML in your project to make the solution work, as explained in the MSDN articles Manipulate Office Open XML Formats Documents and Open and add text to a word processing document (Open XML SDK).
These articles broadly explain how the OpenXML library works for manipulating Word documents.
The C# code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.IO.Packaging;
namespace BrowserClass
{
public class SpecificDirectory
{
public string[,] LookUpWord(string nameKeyword, string nameStopword, string nameDirectory)
{
string sKeyWord = nameKeyword;
string sStopWord = nameStopword;
string sDirectory = nameDirectory;
sStopWord = sStopWord.ToLower();
sKeyWord = sKeyWord.ToLower();
string sDocPath = Path.GetDirectoryName(sDirectory);
// Looking for all the documents with the .docx extension
string[] sDocName = Directory.GetFiles(sDocPath, "*.docx", SearchOption.AllDirectories);
string[] sDocumentList = new string[1];
string[] sDocumentText = new string[1];
// Cycling the documents retrieved in the folder
for (int i = 0; i < sDocName.Count(); i++)
{
string docWord = sDocName[i];
// Opening the documents as read only, no need to edit them
Package officePackage = Package.Open(docWord, FileMode.Open, FileAccess.Read);
const String officeDocRelType = @"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
PackagePart corePart = null;
Uri documentUri = null;
// We are extracting the part with the document content within the files
foreach (PackageRelationship relationship in officePackage.GetRelationshipsByType(officeDocRelType))
{
documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), relationship.TargetUri);
corePart = officePackage.GetPart(documentUri);
break;
}
// Here enter the proper code
if (corePart != null)
{
string cpPropertiesSchema = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
string dcPropertiesSchema = "http://purl.org/dc/elements/1.1/";
string dcTermsPropertiesSchema = "http://purl.org/dc/terms/";
// Construction of a namespace manager to handle the different parts of the xml files
NameTable nt = new NameTable();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
nsmgr.AddNamespace("dc", dcPropertiesSchema);
nsmgr.AddNamespace("cp", cpPropertiesSchema);
nsmgr.AddNamespace("dcterms", dcTermsPropertiesSchema);
// Loading the xml document's text
XmlDocument doc = new XmlDocument(nt);
doc.Load(corePart.GetStream());
// I chose to directly load the inner text because I could not parse the way I wanted the document, but it works so far
string docInnerText = doc.DocumentElement.InnerText;
docInnerText = docInnerText.Replace("\\* MERGEFORMAT", ".");
docInnerText = docInnerText.Replace("DOCPROPERTY ", "");
docInnerText = docInnerText.Replace("Glossary.", "");
try
{
Int32 iPosKeyword = docInnerText.ToLower().IndexOf(sKeyWord);
Int32 iPosStopWord = docInnerText.ToLower().IndexOf(sStopWord);
if (iPosStopWord == -1)
{
iPosStopWord = docInnerText.Length;
}
if (iPosKeyword != -1 && iPosKeyword <= iPosStopWord)
{
// Redimensions the array if there was already a document loaded
if (sDocumentList[0] != null)
{
Array.Resize(ref sDocumentList, sDocumentList.Length + 1);
Array.Resize(ref sDocumentText, sDocumentText.Length + 1);
}
sDocumentList[sDocumentList.Length - 1] = docWord.Substring(sDocPath.Length, docWord.Length - sDocPath.Length);
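// Taking the small context around the keyword, clamped so Substring never runs past the end of the text
int iContextLength = Math.Min(sKeyWord.Length + 60, docInnerText.Length - iPosKeyword);
sDocumentText[sDocumentText.Length - 1] = ("(...) " + docInnerText.Substring(iPosKeyword, iContextLength) + " (...)");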
}
}
catch (ArgumentOutOfRangeException)
{
Console.WriteLine("Error reading inner text.");
}
}
// Closing the package to enable opening a document right after
officePackage.Close();
}
if (sDocumentList[0] != null)
{
// Preparing the array for output
string[,] sFinalArray = new string[sDocumentList.Length, 2];
for (int i = 0; i < sDocumentList.Length; i++)
{
sFinalArray[i, 0] = sDocumentList[i].Replace("\\", "");
sFinalArray[i, 1] = sDocumentText[i];
}
return sFinalArray;
}
else
{
// Preparing the array for output
string[,] sFinalArray = new string[1, 1];
sFinalArray[0, 0] = "NO MATCH";
return sFinalArray;
}
}
}
}
The VBA code associated
Option Explicit
Const sLibname As String = "C:\pathToYourDocuments\"
Sub tester()
Dim aFiles As Variant
Dim LookUpDir As BrowserClass.SpecificDirectory
Set LookUpDir = New BrowserClass.SpecificDirectory
' The array will contain all the files which contain the "searchedPhrase" '
aFiles = LookUpDir.LookUpWord("searchedPhrase", "stopWord", sLibname)
' Add here any necessary processing if needed '
End Sub
So in the end you get a tool that can scan .docx documents much faster than the classic open-read-close approach in VBA, at the cost of writing more code.
Above all, you get a simple solution for users who just want to perform a simple search, especially when there is a huge number of Word documents.
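The matching rule used in the C# code (keep a document only if the keyword occurs before the stop word, then return a short context) can be sketched on its own. This is a minimal Python sketch rather than the DLL's code; the function name find_context and the width parameter are made up for illustration.

```python
def find_context(text, keyword, stopword, width=60):
    """Return a snippet starting at keyword if it occurs before stopword, else None."""
    lowered = text.lower()
    key_pos = lowered.find(keyword.lower())
    stop_pos = lowered.find(stopword.lower())
    if stop_pos == -1:
        # No stop word in the text: the whole text is searchable.
        stop_pos = len(text)
    if key_pos == -1 or key_pos > stop_pos:
        return None
    # Clamp the end of the snippet so it never runs past the text.
    end = min(len(text), key_pos + len(keyword) + width)
    return "(...) " + text[key_pos:end] + " (...)"


print(find_context("Intro. The Budget is approved. Glossary: budget terms", "budget", "glossary"))
```

Clamping the snippet length is the one deliberate difference from the C# version, which relies on catching ArgumentOutOfRangeException instead.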
Note
Parsing Word XML files can be nightmarish in VBA, as pointed out by @Mikegrann.
Thankfully OpenXML has an XML parser (see "C#, xml parsing. get data between tags") that will do the work for you in C# and pick out the <w:t></w:t> tags that hold the text of the document. I found these answers so far but couldn't make them work:
Parsing a MS Word generated XML file in C#, Reading specific XML elements from XML file
So I went for the .InnerText solution provided in my code above to access the internal text, at the cost of picking up some field-formatting text (like \* MERGEFORMAT).

Related

How to get Header / Footer parts from Excel Document

I'm trying to get the header / footer parts from an excel document so that I can do something with their contents, however I cannot seem to get anything from them.
I thought this would be pretty simple... Consider this code:
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filePath, true))
{
var headers = spreadsheet.GetPartsOfType<HeaderPart>().ToList();
foreach (var header in headers)
{
//do something
}
}
Even with a file that contains a header, headers will always be empty. I've tried drilling down into the workbook -> worksheets -> etc. but I get nothing back. My test Excel file definitely has a header (headers are ghastly in Excel!).
Annoyingly, the APIs for Excel in OpenXML seem to be worse, as in a .docx you can get the header by calling:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
{
MainDocumentPart documentPart = wordDoc.MainDocumentPart;
var headerParts = wordDoc.MainDocumentPart.HeaderParts.ToList();
foreach (var headerPart in headerParts)
{
//do something
}
}
I've seen some people on google saying that I should query the worksheet's descendants (code from this link):
HeaderFooter hf = ws.Descendants<HeaderFooter>().FirstOrDefault();
if (hf != null)
{
//here you can add your code
//I just try to append here for demo
hf = new HeaderFooter();
ws.AppendChild<HeaderFooter>(hf);
}
But I cannot see any way of querying the workbook/sheet/anything with .Descendants and obviously none of the code examples on google show how they got ws 🙃.
Any ideas? Thanks
HeaderFooter, as per your second example, is the correct way to read a Header or Footer from a Spreadsheet using OpenXML. The ws in your example refers to a Worksheet.
The following is an example that reads the HeaderFooter and dumps the InnerText to the console.
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
WorkbookPart workbookPart = document.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet ws = worksheetPart.Worksheet;
HeaderFooter hf = ws.Descendants<HeaderFooter>().FirstOrDefault();
if (hf != null)
{
Console.WriteLine(hf.InnerText);
}
}
I would highly recommend that you read the documentation for the HeaderFooter element as it's more complex than you might imagine. The documentation can be found in section 18.3.1.46 of the Fifth Edition of the Ecma Office Open XML Part 1 - Fundamentals And Markup Language Reference which can be found here.

Download xls file to client from Datatable

I am developing a website in Visual Studio using VB. In one section of my site I run a database query, store the result in a DataTable and display it. I give the user the option of downloading the information; what I would like to do is send an XLS file containing the information in the DataTable to the client without creating the XLS file on the server.
I currently have the following code section to send the file to the user
Dim fileToDownload = Server.MapPath("~/docs/QuejometroVF.pdf")
Response.ContentType = "application/octet-stream"
Dim cd = New ContentDisposition()
cd.Inline = False
cd.FileName = Path.GetFileName(fileToDownload)
Response.AppendHeader("Content-Disposition", cd.ToString())
Dim fileData = System.IO.File.ReadAllBytes(fileToDownload)
Response.OutputStream.Write(fileData, 0, fileData.Length)
But it requires a path to a local file in order to send it.
First I would like to know how to create an XLS file from the DataTable (only in memory) and then send that object as a file to the client's computer. If that is not possible, could you tell me how to write the XLS file on my server so I can then send it using the code above? I have not really figured out how to do it yet.
I was thinking of doing it that way because I don't want to keep files on the server when I already have that information in the database, and I don't intend to keep that file stored.
Thank you
I export data to xls file using the following code, my backend is an Oracle database and that's where I get the data:
Dim MyConnection As OracleConnection = OpenConnection(Session("USERNAME"), Session("PASSWORD"))
Dim MyDataSet As New DataSet
MyDataSet = GetExportData(MyConnection, Session("UserDataKey"), Session("CompoundKey"), Session("LastOfCompoundKey"))
'I rename the dataset's table columns to what I want in the xls file
MyDataSet.Tables!data.Columns(0).ColumnName = "IDNumber"
MyDataSet.Tables!data.Columns(1).ColumnName = "FirstName"
MyDataSet.Tables!data.Columns(2).ColumnName = "LastName"
MyDataSet.Tables!data.Columns(3).ColumnName = "Address"
MyDataSet.Tables!data.Columns(4).ColumnName = "City"
MyDataSet.Tables!data.Columns(5).ColumnName = "State"
MyDataSet.Tables!data.Columns(6).ColumnName = "ZipCode"
MyDataSet.Tables!data.Columns(7).ColumnName = "Phone_Area"
MyDataSet.Tables!data.Columns(8).ColumnName = "Phone_Prefix"
MyDataSet.Tables!data.Columns(9).ColumnName = "Phone_Suffix"
MyDataSet.Tables!data.Columns(10).ColumnName = "Email"
MyDataSet.Tables!data.Columns(11).ColumnName = "BirthDay"
Response.ClearContent()
'I create the filename I want the data to be saved to and set up the response
Response.AddHeader("content-disposition", "attachment; filename=" & Replace(Session("Key0"), " ", "-") & "-" & Session("Key1") & "-" & Replace(Replace(Trim(Session("Key2")), ".", ""), " ", "-") & ".xls")
Response.ContentType = "application/excel"
Response.Charset = ""
EnableViewState = False
Dim tw As New System.IO.StringWriter
Dim hw As New System.Web.UI.HtmlTextWriter(tw)
'Create and bind table to a datagrid
Dim dgTableForExport As New DataGrid
If MyDataSet.Tables.Count > 0 Then
If MyDataSet.Tables(0).Rows.Count > 0 Then
dgTableForExport.DataSource = MyDataSet.Tables(0) ' .DefaultView
dgTableForExport.DataBind()
'Finish building response
Dim strStyle As String = "<style>.text { mso-number-format:\@; } </style>"
For intTemp As Integer = 0 To MyDataSet.Tables(0).Rows.Count - 1
For intTemp2 As Integer = 0 To MyDataSet.Tables(0).Columns.Count - 1
dgTableForExport.Items(intTemp).Cells(intTemp2).Attributes.Add("class", "text")
Next
Next
End If
End If
dgTableForExport.RenderControl(hw)
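' Write the style block that forces every cell to be treated as text
Response.Write("<style>.text { mso-number-format:\@; } </style>")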
' Write the HTML back to the browser.
Response.Write(tw.ToString())
Response.End()
'Close, clear and dispose
MyConnection.Close()
MyConnection.Dispose()
MyConnection = Nothing
I copied and pasted this from one of my projects; it's untested and may contain errors, but it should get you started.
You can use a MemoryStream, or write the file to the response stream using the Response.Write method.
Creating an excel file from a data table is fairly easy as you can just create a GridView and bind the table to it.
Here is a code snippet that does what you need.
Public Sub DownloadExcel(outputTable as System.Data.DataTable)
Dim gv As New GridView
Dim tw As New StringWriter
Dim hw As New HtmlTextWriter(tw)
Dim sheetName As String = "OutputFilenameHere"
gv.DataSource = outputTable
gv.DataBind()
gv.RenderControl(hw)
Response.AddHeader("content-disposition", "attachment; filename=" & sheetName & ".xls")
Response.ContentType = "application/octet-stream"
Response.Charset = ""
EnableViewState = False
Response.Write(tw.ToString)
Response.End()
End Sub
There are a few issues with this method:
This doesn't output a native Excel file. Instead, it outputs the HTML for a GridView; Excel will detect this and warn the user that the content doesn't match the extension. However, it WILL display correctly in Excel if the user selects 'Yes' in the dialog box.
Earlier versions of Firefox and Chrome didn't like this method and instead downloaded the file with a .html extension. I just tested it in both browsers and it worked with the most up-to-date versions.
Ideally, you should probably use Excel on your webserver to create native spreadsheets, but this will work if you (like me) don't have the means to do so.
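What the GridView approach sends is just an HTML table with an .xls file name. A language-neutral way to see this is the following Python sketch, which builds the same kind of payload as a string. The function name rows_to_excel_html is made up for illustration, and the sketch covers only the markup, not the HTTP response.

```python
from html import escape


def rows_to_excel_html(headers, rows):
    """Build an HTML table that Excel can open when served with an .xls file name."""
    # mso-number-format:\@ asks Excel to treat the cell as text (keeps leading zeros).
    style = "<style>.text { mso-number-format:\\@; }</style>"
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>"
        + "".join(f'<td class="text">{escape(str(value))}</td>' for value in row)
        + "</tr>"
        for row in rows
    )
    return (
        f"<html><head>{style}</head><body>"
        f"<table><tr>{head}</tr>{body}</table></body></html>"
    )


print(rows_to_excel_html(["ZipCode"], [["01234"], ["a<b"]]))
```

Escaping every value matters here: unescaped angle brackets in the data would otherwise break the table that Excel has to parse.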

How can I send a Sharepoint List item ID to an Excel file?

I have a Sharepoint (2007) list with some items in it. When I click on one of these items, it will open an Excel (2003) file with a lot of macros. I need to get the ID of this (Sharepoint) item and send it to a cell of my Excel file... Then a macro will be executed and get all the data we need for this ID.
How can I send the item's ID to my Excel file ?
Any idea ?
Thanks
I once wrote the tables of a DataSet into a new Excel file. You can change the function parameter from DataSet to SPList/SPListItem and write to an existing file (my current implementation writes to a new Excel file every time I execute this function). Also, make sure you add references to the Excel (COM) objects, e.g. Microsoft Excel 12.0 Object Library. If you need more help, let me know.
public void excelgenerate(DataSet ds)
{
Microsoft.Office.Interop.Excel.Application oAppln;
//declaring work book
Microsoft.Office.Interop.Excel.Workbook oWorkBook;
//declaring worksheet
Microsoft.Office.Interop.Excel.Worksheet oWorkSheet;
oAppln = new Microsoft.Office.Interop.Excel.Application();
oWorkBook = (Microsoft.Office.Interop.Excel.Workbook)(oAppln.Workbooks.Add(true));
Microsoft.Office.Interop.Excel.Range wRange;
foreach (DataTable table in ds.Tables)
{
oWorkSheet = (Microsoft.Office.Interop.Excel.Worksheet)(oWorkBook.Worksheets.Add(Type.Missing, Type.Missing, Type.Missing, Type.Missing));
oWorkSheet.Name = table.TableName;
oWorkSheet.Activate();
DataRow dr = table.Rows[0];
string path = dr["Path"].ToString();
if (path.Length > 0)
{
string[] mylist = path.Split('\\');
var features = Array.FindLastIndex(mylist, str => str.Equals("Features"));
string stringmine = "Type ---> " + mylist[4]
+ "/" + mylist[5]
+ " Project Name ---> " + mylist[6]
+ " Feature Name ---> " + mylist[features + 1];
oWorkSheet.Cells[1, 1] = stringmine;
Microsoft.Office.Interop.Excel.Range colrange = oWorkSheet.get_Range(oWorkSheet.Cells[1, 1], oWorkSheet.Cells[1, 8]);
colrange.Merge(true);
}
int ColumnIndex = 0;
foreach (DataColumn col in table.Columns)
{
ColumnIndex++;
oWorkSheet.Cells[2, ColumnIndex] = col.ColumnName;
wRange = (Microsoft.Office.Interop.Excel.Range)oWorkSheet.Cells[2, ColumnIndex];
wRange.Font.Bold = true;
}
int rowIndex = 1;
foreach (DataRow row in table.Rows)
{
rowIndex++;
ColumnIndex = 0;
foreach (DataColumn col in table.Columns)
{
ColumnIndex++;
oWorkSheet.Cells[rowIndex + 1, ColumnIndex] = row[col.ColumnName].ToString();
}
}
oWorkSheet.Columns.AutoFit();
oWorkSheet.Rows.AutoFit();
}
string fileName = System.Guid.NewGuid().ToString().Replace("-", "") + ".xls";
Console.WriteLine("Number of sheets written : " + oWorkBook.Worksheets.Count);
oWorkBook.SaveAs(fileName, Microsoft.Office.Interop.Excel.XlFileFormat.xlWorkbookNormal, null, null, false, false, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlShared, false, false, null, null, null);
oWorkBook.Close(null, null, null);
oAppln.Quit();
}
For executing the macro using C# ASP.NET and SharePoint, I would recommend this article.
Hope it will answer your question!
Unless there is a reason why you cannot link the SharePoint list data directly into a worksheet and bring your macros into that spreadsheet, I think the steps below will get you what you need. It seems too simple... there must be a reason this does not work for what you're trying to do. In any case, here are the steps to make this work:
1) Make sure the SharePoint list actually has an indexed column that enforces unique values. You can check this by looking at the document library settings. Look to make sure there is an index column under the columns listing. If there is not one, you can create one by selecting the "create new column" action, select your data type and make sure that you select the radio button that says "enforce unique values".
2) Export the library to excel using the "export to excel" options in the library's main page menu. This will establish a data link by default and store an excel query file at a default location on your machine that you can discover by going to the data tab and selecting "connections".
3) Copy the macro into the spreadsheet that is linked to your data source and adjust the references in your macro to extract the information you need from the SharePoint list.
Hope this helps.

Can I import INTO excel from a data source without iteration?

Currently I have an application that takes information from a SQLite database and puts it into Excel. However, I have to take each DataRow, iterate through each item, put each value into its own cell and determine highlighting. As a result it takes 20 minutes to export a 9000-record file into Excel. I'm sure it can be done quicker than that. My thought is that I could use a data source to fill the Excel Range and then use the column headers and row numbers to format only those rows that need to be formatted. However, when I look online, no matter what I type, I always find examples of using Excel as a database and nothing about importing into Excel, unless I'm forgetting a keyword or two. Now, this function has to be done in code as it's part of a bigger application. Otherwise I would just have Excel connect to the DB and pull the information itself. Unfortunately that's not the case. Any information that could help me load an Excel sheet quickly would be appreciated. Thanks.
Additional Information:
Another reason why the pulling of the information from the DB has to be done in code is that not every computer this is loaded on will have Excel on it. The person using the application may simply be told to export the data and email it to their supervisor. The setup app includes the DLLs the application needs to produce the proper format.
Example Code (Current):
For Each strTemp In strColumns
excelRange = worksheet.Cells(1, nCounter)
excelRange.Select()
excelRange.Value2 = strTemp
excelRange.Interior.Color = System.Drawing.Color.Gray.ToArgb()
excelRange.BorderAround(Excel.XlLineStyle.xlContinuous, Excel.XlBorderWeight.xlThin, Excel.XlColorIndex.xlColorIndexAutomatic, Type.Missing)
nCounter += 1
Next
Now, this is only example code in terms of the iteration I'm doing. Where I'm really processing the information from the database I'm iterating through a dataTable's Rows, then iterating through the items in the dataRow and doing essentially the same as above; value by value, selecting the range and putting the value in the cell, formatting the cell if it's part of a report (not always gray), and moving onto the next set of data. What I'd like to do is put all of the data in the excel sheet (A2:??, not a row, but multiple rows) then iterate through the reports and format each row then. That way, the only time I iterate through all of the records is when every record is part of a report.
Ideal Code
excelRange = worksheet.Cells("A2", "P9000")
excelRange.DataSource = ds 'ds would be a queried dataSet, and I know there is no excelRange.DataSource.
'Iteration code to format cells
Update:
I know my examples were in VB, but it's because I was also trying to write a VB version of the application since my boss prefers VB. However, here's my final code using a Recordset. The ConvertToRecordset function was obtained from here.
private void CreatePartSheet(Excel.Worksheet excelWorksheet)
{
_dataFactory.RevertDatabase();
excelWorksheet.Name = "Part Sheet";
string[] strColumns = Constants.strExcelPartHeaders;
CreateSheetHeader(excelWorksheet, strColumns);
System.Drawing.Color clrPink = System.Drawing.Color.FromArgb(203, 192, 255);
System.Drawing.Color clrGreen = System.Drawing.Color.FromArgb(100, 225, 137);
string[] strValuesAndTitles = {/*...Column Names...*/};
List<string> lstColumns = strValuesAndTitles.ToList<string>();
System.Data.DataSet ds = _dataFactory.GetDataSet(Queries.strExport);
ADODB.Recordset rs = ConvertToRecordset(ds.Tables[0]);
excelRange = excelWorksheet.get_Range("A2", "ZZ" + rs.RecordCount.ToString());
excelRange.Cells.CopyFromRecordset(rs, rs.RecordCount, rs.Fields.Count);
int nFieldCount = rs.Fields.Count;
for (int nCounter = 0; nCounter < rs.RecordCount; nCounter++)
{
int nRowCounter = nCounter + 2;
List<ReportRecord> rrPartReports = _lstReports.FindAll(rr => rr.PartID == nCounter).ToList<ReportRecord>();
excelRange = (Excel.Range)excelWorksheet.get_Range("A" + nRowCounter.ToString(), "K" + nRowCounter.ToString());
excelRange.Select();
excelRange.NumberFormat = "#";
if (rrPartReports.Count > 0)
{
excelRange.Interior.Color = System.Drawing.Color.FromArgb(230, 216, 173).ToArgb(); //Light Blue
foreach (ReportRecord rr in rrPartReports)
{
if (lstColumns.Contains(rr.Title))
{
excelRange = (Excel.Range)excelWorksheet.Cells[nRowCounter, lstColumns.IndexOf(rr.Title) + 1];
excelRange.Interior.Color = rr.Description.ToUpper().Contains("TAG") ? clrGreen.ToArgb() : clrPink.ToArgb();
if (rr.Description.ToUpper().Contains("TAG"))
{
rs.Find("PART_ID=" + (nCounter + 1).ToString(), 0, ADODB.SearchDirectionEnum.adSearchForward, "");
excelRange.AddComment(Environment.UserName + ": " + _dataFactory.GetTaggedPartPrevValue(rs.Fields["POSITION"].Value.ToString(), rr.Title));
}
}
}
}
if (nRowCounter++ % 500 == 0)
{
progress.ProgressComplete = ((double)nRowCounter / (double)rs.RecordCount) * (double)100;
Notify();
}
}
rs.Close();
excelWorksheet.Columns.AutoFit();
progress.Message = "Done Exporting to Excel";
Notify();
_dataFactory.RestoreDatabase();
}
Can you use ODBC?
''http://www.ch-werner.de/sqliteodbc/
dbName = "c:\docs\test"
scn = "DRIVER=SQLite3 ODBC Driver;Database=" & dbName _
& ";LongNames=0;Timeout=1000;NoTXN=0;SyncPragma=NORMAL;StepAPI=0;"
Set cn = CreateObject("ADODB.Connection")
cn.Open scn
Set rs = CreateObject("ADODB.Recordset")
rs.Open "select * from test", cn
Worksheets("Sheet3").Cells(2, 1).CopyFromRecordset rs
BTW, Excel is quite happy with HTML and internal style sheets.
I have used the Excel XML file format in the past to write directly to an output file or stream. It may not be appropriate for your application, but writing XML is much faster and bypasses the overhead of interacting with the Excel Application. Check out this Introduction to Excel XML post.
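As a rough illustration of that idea, here is a Python sketch that writes rows as Excel 2003 XML (SpreadsheetML) without touching the Excel application. The function name rows_to_spreadsheetml is hypothetical, and only the bare minimum of the format (Workbook, Worksheet, Table, Row, Cell, Data) is produced; styles and column widths are left out.

```python
from xml.sax.saxutils import escape


def rows_to_spreadsheetml(rows, sheet_name="Sheet1"):
    """Serialize a list of rows as a minimal Excel 2003 XML workbook."""
    lines = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
        ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
        '<Worksheet ss:Name="%s"><Table>' % escape(sheet_name, {'"': "&quot;"}),
    ]
    for row in rows:
        cells = []
        for value in row:
            # Numbers get ss:Type="Number"; everything else is written as text.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            kind = "Number" if is_number else "String"
            cells.append(f'<Cell><Data ss:Type="{kind}">{escape(str(value))}</Data></Cell>')
        lines.append("<Row>" + "".join(cells) + "</Row>")
    lines.append("</Table></Worksheet></Workbook>")
    return "\n".join(lines)


print(rows_to_spreadsheetml([["Part", "Qty"], ["Bolt & nut", 12]]))
```

The output is plain text, so it can be streamed row by row to a file, which is where the speed advantage over cell-by-cell automation comes from.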
Update:
There are also a number of libraries (free and commercial) that can make creating Excel documents easier, for example excellibrary, which doesn't support the new format yet. Others are mentioned in the answers to Create Excel (.XLS and .XLSX) file from C#
Excel can write all the data from an ADO or DAO recordset in a single operation using the CopyFromRecordset method.
Code snippet:
Sheets("Sheet1").Range("A1").CopyFromRecordset rst
I'd normally recommend using Excel to pull in the data from SQLite. Use Excel's "Other Data Sources". You could then choose your OLE DB provider, use a connection string, what-have-you.
It sounds, however, that the real value of your code is the formatting of the cells, rather than the transfer of data.
Perhaps refactor the process to:
have Excel import the data
use your code to open the Excel spreadsheet, and apply formatting
I'm not sure if that is an appropriate set of processes for you, but perhaps something to consider?
Try this out:
http://office.microsoft.com/en-au/excel-help/use-microsoft-query-to-retrieve-external-data-HA010099664.aspx
Perhaps post some code, and we might be able to track down any issues.
I'd consider this chain of events:
query the SQLite database for your dataset.
move the data out of ADO.NET objects, and into POCO objects. Stop using DataTables/Rows.
use For Each to insert into Excel.
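The first two steps of that chain can be sketched as follows in Python, with an in-memory SQLite database standing in for the real one. The table parts, its columns and the Part class are invented for the example. The last helper goes one step beyond the answer: instead of inserting value by value, it builds a single two-dimensional block, which is the shape a bulk write to a range would need.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Part:
    """Plain object holding one record (hypothetical columns)."""
    part_id: int
    position: str


def load_parts(connection):
    """Query the database and move the rows into plain objects."""
    cursor = connection.execute("SELECT part_id, position FROM parts ORDER BY part_id")
    return [Part(*row) for row in cursor]


def to_block(parts):
    """Turn the objects into one rows-by-columns block for a single bulk write."""
    return [list(astuple(part)) for part in parts]


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE parts (part_id INTEGER, position TEXT)")
connection.executemany("INSERT INTO parts VALUES (?, ?)", [(1, "A1"), (2, "B7")])
block = to_block(load_parts(connection))
connection.close()
print(block)
```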

How to get a value out of an Excel workbook stored in a SharePoint document library?

I have some data that's currently stored in an Excel workbook. It makes sense for the data to be in Excel (in that it's easy to manage, easy to extend, do calcs, etc.) but some of the data there is required by an automated process, so from that point of view it would be more convenient if it were in a database.
To give the information more visibility, workflow, etc. I'm thinking of moving it to SharePoint. Actually turning it into a SharePoint form would be tedious & time-consuming, and then the flexibility/convenience would be lost; instead, I'm thinking of simply storing the current Excel file within a SharePoint library.
My problem then would be: how can the automated process extract the values it needs from the Excel workbook that now lives within the SharePoint library? Is this something that Excel Services can be used for? Or is there another/better way? And even if it can be done, is it a sensible thing to do?
Having gone through something similar, I can tell you it actually isn't that bad getting values out of an Excel file in a document library. I ended up writing a custom workflow action (used within a SharePoint Designer workflow) that reads values out of the Excel file for processing. I ended up choosing NPOI to handle all of the Excel operations.
Using NPOI, you can do something like this:
// get the document in the document library
SPList myList = web.Lists[listGuid];
SPListItem myItem = myList.GetItemById(ListItem);
SPFile file = myItem.File;
using (Stream stream = file.OpenBinaryStream())
{
HSSFWorkbook workbook = new HSSFWorkbook(stream);
HSSFSheet sheet = workbook.GetSheet("Sheet1");
CellReference c = new CellReference("A1");
HSSFRow row = sheet.GetRow(c.Row);
HSSFCell cell = row.GetCell(c.Col);
string cellValue = cell.StringCellValue;
// etc...
}
You could easily put this in a console application as well.
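The CellReference step above turns an A1-style address into zero-based row and column indexes. The conversion itself is simple enough to sketch in Python; this is an illustration of the arithmetic, not NPOI's implementation, and parse_cell_reference is a made-up name.

```python
import re


def parse_cell_reference(reference):
    """Convert an A1-style reference (optionally with $ signs) to zero-based (row, col)."""
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?(\d+)", reference)
    if match is None:
        raise ValueError(f"not an A1-style reference: {reference!r}")
    letters, digits = match.groups()
    col = 0
    for letter in letters.upper():
        # Column letters are a base-26 number where A=1 ... Z=26.
        col = col * 26 + (ord(letter) - ord("A") + 1)
    return int(digits) - 1, col - 1


print(parse_cell_reference("A1"))
print(parse_cell_reference("$AZ$100"))
```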
Yes, I am trying to extract a range of cells on several sheets within a workbook. I was able to use some of the code below in a console application and view the data within the command window. Now I need to dump the data to a SQL Table and was looking for some examples on how to accomplish this and make sure I am going down the correct coding path.
Here is a snapshot of the code I am using.
protected override ActivityExecutionStatus Execute(ActivityExecutionContext executionContext)
{
using (SPSite site = new SPSite(SPContext.Current.Site.Url))
{
using (SPWeb web = site.RootWeb)
{
SPList docList = web.Lists[__ListId];
SPListItem docItem = docList.GetItemById(__ListItem);
SPFile docFile = docItem.File;
using (Stream stream = docFile.OpenBinaryStream())
{
HSSFWorkbook wb = new HSSFWorkbook(stream);
//loop through each sheet in file, ignoring the first sheet
for (int i = 1; i < wb.NumberOfSheets; i++)
{
NPOI.SS.UserModel.Name name = wb.GetNameAt(i);
String sheet = wb.GetSheetName(i);
NPOI.SS.UserModel.Name nameRange = wb.CreateName();
nameRange.NameName = ("DispatchCells");
//start at a specific area on the sheet
nameRange.RefersToFormula = (sheet + "!$A$11:$AZ$100");
}
wb.Write(stream);
}
}
}
return ActivityExecutionStatus.Closed;
}

Resources