Importing data from many workbooks in different folders - Excel

I am looking to import/copy data from many workbooks into a summary workbook. The workbooks are arranged in different sub-folders, e.g.:
C:\data1\results_2001.xlm
C:\data2\results_2002.xlm
C:\data3\results_2003.xlm
The names are similar but differ slightly to differentiate them. At present, I import the files individually, and I want to automate the process. The results files (above) are among other Excel files, so I cannot target them by file type.
How would I import these files by partial file name?

One option is to create an array of the file paths to your Excel workbooks, then loop over the array and copy the data you want into your summary sheet.
Sub CreateSummary()
    Dim wkbs() As Variant, wkb As Long, owb As Workbook
    ' Full paths of the workbooks to summarize
    wkbs = Array("C:\data1\results_2001.xlm", "C:\data2\results_2002.xlm", "C:\data3\results_2003.xlm")
    For wkb = LBound(wkbs) To UBound(wkbs)
        Set owb = Application.Workbooks.Open(wkbs(wkb), ReadOnly:=True) ' Open each workbook
        With owb
            ' Get the data you want into your summary workbook
            .Close SaveChanges:=False
        End With
    Next wkb
End Sub

Another way, especially for a one-time operation: open Cmd.exe, run dir for the files you're looking for, and redirect the output to a text file (e.g. something like dir c:\results_*.xlm /s /b > c:\myList.txt). Then import the text file into your worksheet and step through each cell in the list, opening each workbook in turn.
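If Python happens to be available, the same "build a list first, then open each file" idea can be sketched without Cmd.exe. This is a hedged Python sketch, not part of the original answer. The helper names `build_file_list` and `read_file_list` are hypothetical, and the `results_*.xlm` pattern is taken from the question's example paths. Note that `rglob` matches case-insensitively on Windows but case-sensitively on most other systems.

```python
from pathlib import Path


def build_file_list(root: str, pattern: str, list_file: str) -> int:
    """Rough equivalent of `dir <root>\\<pattern> /s /b > list_file`.

    Searches root and all of its sub-folders, writes one full path per line,
    and returns how many paths were written.
    """
    matches = sorted(str(p.resolve()) for p in Path(root).rglob(pattern) if p.is_file())
    Path(list_file).write_text("".join(m + "\n" for m in matches), encoding="utf-8")
    return len(matches)


def read_file_list(list_file: str) -> list[str]:
    """Read the list back, skipping blank lines, like stepping through the imported cells."""
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip()]
```

For example, `build_file_list("C:\\", "results_*.xlm", "C:\\myList.txt")` followed by a loop over `read_file_list("C:\\myList.txt")` gives the paths to open one by one.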

You can do this in any language, but for the asker I think it will be a little challenging, so here is what you need to do:
1) Create a function that lists the files and folders in a given path.
2) Loop through all the items found; if an item is a folder, recurse into it.
3) If an item fits your target (name, extension, ...), read it and load its content into the summary.
Something like this should be easy to achieve in VBA; look here.
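Roughly, it will look like this. This is a minimal VBA sketch; the "results_*.xlm" name pattern is only an example taken from the question, so adjust it to your files:
Sub LoopThePath(ByVal pathToLoop As String)
    ' Late-bound FileSystemObject, so no library reference is needed
    Dim fso As Object, subFolder As Object, fileItem As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' If the item is a folder, recurse into it
    For Each subFolder In fso.GetFolder(pathToLoop).SubFolders
        LoopThePath subFolder.Path
    Next subFolder
    ' Otherwise check whether the file fits the description (case-insensitive)
    For Each fileItem In fso.GetFolder(pathToLoop).Files
        If LCase$(fileItem.Name) Like "results_*.xlm" Then
            ' Load the content into the summary here, e.g. open fileItem.Path
        End If
    Next fileItem
End Sub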
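The same recursion can be sketched in Python to show the control flow on its own. This is an illustration, not the answerer's code. `loop_the_path` mirrors the pseudocode's name, and `results_file` is a hypothetical example filter. Collecting the path stands in for "load the content to the summary".

```python
import fnmatch
import os
from typing import Callable


def loop_the_path(path_to_loop: str, fits: Callable[[str], bool]) -> list[str]:
    """Recurse into every folder and collect the files whose names pass `fits`."""
    found: list[str] = []

    def walk(folder: str) -> None:
        with os.scandir(folder) as entries:
            for item in entries:
                if item.is_dir(follow_symlinks=False):
                    walk(item.path)  # "if (item is folder) loopthepath(...)"
                elif fits(item.name):
                    found.append(item.path)  # stand-in for loading into the summary

    walk(path_to_loop)
    return found


def results_file(name: str) -> bool:
    """Example filter: names like results_2001.xlm, matched case-insensitively."""
    return fnmatch.fnmatch(name.lower(), "results_*.xlm")
```

Passing the filter as a function keeps the walking logic separate from the "fits my description" rule, so the same walker works for other name patterns.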

Related

Is there an equivalent to SCHEMA.INI for reading Excel Workbooks

I am currently working on a project that will import data from multiple different sources in a variety of formats and structures - e.g., CSV, fixed-length, other-delimited (tab, pipe, etc.) plain-text, and Excel worksheets/workbooks. For this, I'm attempting to build "generic" readers for these files that will load the files' contents into a DataTable/DataSet I can use in other methods. The plain-text files are pretty simple, as I've created a large SCHEMA.INI file that contains field definitions for each of the files the system will handle. That SCHEMA.INI resides in a "processing folder" where the files are temporarily stored until their data has been integrated with other systems. The data from a text file defined there can be extracted easily using this method:
Private Function TextFileToDataTable(ByVal TextFile As IO.FileInfo) As DataTable
    Dim TextFileData As New DataTable("TextFileData")
    ' The Jet text driver treats the folder as the "database" and reads SCHEMA.INI from it
    Using TapeFileConnect As New OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0;Data Source='" & TextFile.DirectoryName & "';Extended Properties='Text';")
        ' Each file is a "table"; bracket the name so dots or spaces in it don't break the SQL
        Using TapeAdapter As New OleDb.OleDbDataAdapter(String.Format("SELECT * FROM [{0}];", TextFile.Name), TapeFileConnect)
            Try
                TapeAdapter.Fill(TextFileData)
            Catch ex As Exception
                TextFileData = Nothing
            End Try
        End Using
    End Using
    Return TextFileData
End Function
This works well because a plain-text file isn't terribly complex in its data structure. A single file generally (at least for my requirements) contains, at most, one single table's worth of data - unless, of course, it's some sort of complex XML or JSON structure file, which can/should be handled completely differently anyway - so there's no need to go iterating through different elements beyond this.
NOTE: The code above is dependent on the SCHEMA.INI file being present in the same directory as the plain-text file being read and there being a section within that SCHEMA.INI defined with the same name as that plain-text file.
EXAMPLE:
[EXAMPLE_TEXT_FILE.TXT]
CharacterSet=ANSI
Format=FixedLength
ColNameHeader=FALSE
DateTimeFormat="YYYYMMDD"
COL1=CUSTOMER_NUMBER TEXT WIDTH 20
COL2=CUSTOMER_FIRSTNAME TEXT WIDTH 30
COL3=CUSTOMER_LASTNAME TEXT WIDTH 40
COL4=CUSTOMER_ADDR1 TEXT WIDTH 40
COL5=CUSTOMER_ADDR2 TEXT WIDTH 40
COL6=CUSTOMER_ADDR3 TEXT WIDTH 40
...
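To make the role of the COLn entries concrete, here is a small Python sketch of how a FixedLength section like the one above maps onto a record. It is only an illustration of the idea, not how the Jet driver works internally. It handles only simple unquoted `NAME TYPE WIDTH n` definitions, and the function names are hypothetical.

```python
import configparser
import re


def read_fixed_length_columns(schema_ini: str, section: str) -> list[tuple[str, str, int]]:
    """Return (name, type, width) for the COLn entries of a section, in column order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(schema_ini)
    columns = []
    for key, value in parser.items(section):
        m = re.fullmatch(r"col(\d+)", key)  # configparser lower-cases option names
        if not m:
            continue  # CharacterSet, Format, ColNameHeader, ...
        spec = re.fullmatch(r"\s*(\S+)\s+(\S+)\s+WIDTH\s+(\d+)\s*", value, re.IGNORECASE)
        if spec is None:
            raise ValueError(f"Unsupported column definition: {key}={value}")
        columns.append((int(m.group(1)), spec.group(1), spec.group(2).upper(), int(spec.group(3))))
    columns.sort()  # numeric order, so COL10 comes after COL2
    return [(name, typ, width) for _, name, typ, width in columns]


def split_fixed_length(line: str, columns: list[tuple[str, str, int]]) -> dict[str, str]:
    """Slice one record into fields using the widths, trimming the padding."""
    record, pos = {}, 0
    for name, _typ, width in columns:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record
```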
Excel workbooks, however, can be a bit trickier. Several of the workbooks I have to process contain multiple worksheets worth of data that I want to consolidate into a single DataSet with a DataTable for each worksheet. The basic functionality is, again, fairly straightforward and I've come up with the following method to read any and all sheets into a DataSet:
Private Function ExcelFileToDataSet(ByVal ExcelFile As IO.FileInfo, ByVal HasHeaderRow As Boolean) As DataSet
    Dim ExcelFileData As New DataSet("ExcelFileData")
    Dim ExcelConnectionString As String = String.Empty
    Dim UseHeaders As String = "NO"
    Select Case ExcelFile.Extension.ToUpper.Trim
        Case ".XLS"
            ExcelConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR={1}'"
        Case ".XLSX"
            ' .xlsx files need "Excel 12.0 Xml" rather than "Excel 8.0"
            ExcelConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR={1}'"
    End Select
    If HasHeaderRow Then
        UseHeaders = "YES"
    End If
    ExcelConnectionString = String.Format(ExcelConnectionString, ExcelFile.FullName, UseHeaders)
    Try
        Using ExcelConnection As New OleDb.OleDbConnection(ExcelConnectionString)
            ExcelConnection.Open()
            ' One row per worksheet (or named range) in the workbook
            Dim ExcelSchema As DataTable = ExcelConnection.GetOleDbSchemaTable(OleDb.OleDbSchemaGuid.Tables, Nothing)
            For Each ExcelSheet As DataRow In ExcelSchema.Rows
                Dim SheetName As String = ExcelSheet("TABLE_NAME").ToString
                ' "Sheet1$" -> "Sheet1", "'My Sheet$'" -> "My Sheet"
                Dim SheetTable As New DataTable(SheetName.Trim("'"c).TrimEnd("$"c))
                Using ExcelCommand As New OleDb.OleDbCommand(String.Format("SELECT * FROM [{0}]", SheetName), ExcelConnection)
                    Using ExcelAdapter As New OleDb.OleDbDataAdapter(ExcelCommand)
                        ExcelAdapter.Fill(SheetTable)
                    End Using
                End Using
                ExcelFileData.Tables.Add(SheetTable)
            Next ExcelSheet
        End Using
    Catch ex As Exception
        ExcelFileData = Nothing
    End Try
    Return ExcelFileData
End Function
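Two small pieces of this method are easy to get wrong: picking Extended Properties per extension, and turning a schema TABLE_NAME back into a sheet name. The Python sketch below shows both. The helper names are hypothetical. The provider strings are the ones commonly used with Jet/ACE: `.xls` with `Excel 8.0`, and `.xlsx` with `Excel 12.0 Xml`. Verify them against the drivers actually installed on the machine.

```python
from pathlib import PureWindowsPath

# (provider, Extended Properties format) per extension; verify against installed drivers
_PROVIDERS = {
    ".xls": ("Microsoft.Jet.OLEDB.4.0", "Excel 8.0"),
    ".xlsx": ("Microsoft.ACE.OLEDB.12.0", "Excel 12.0 Xml"),
}


def excel_connection_string(path: str, has_header_row: bool) -> str:
    """Build an OLE DB connection string for an .xls or .xlsx workbook."""
    ext = PureWindowsPath(path).suffix.lower()
    if ext not in _PROVIDERS:
        raise ValueError(f"Unsupported Excel extension: {ext!r}")
    provider, props = _PROVIDERS[ext]
    hdr = "YES" if has_header_row else "NO"
    return f"Provider={provider};Data Source={path};Extended Properties='{props};HDR={hdr}'"


def sheet_name_from_table_name(table_name: str) -> str:
    """'Sheet1$' -> 'Sheet1'; "'My Sheet$'" -> 'My Sheet'; named ranges are left alone."""
    return table_name.strip("'").rstrip("$")
```

Raising on an unknown extension is a design choice. With an empty connection string, the failure would otherwise surface later as a less obvious connection error.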
The above code will work in a majority of the cases I deal with, but my "difficulty" is that there may be some worksheets that have header rows and some that don't within the same workbook. Also, for those worksheets that do not have a header row, I'd like to be able to define the field names and data types similar to how I can with the plain-text SCHEMA.INI. The only thing I have going for me in these cases is that the "client" provides me with a data map to help me identify what data elements are in each field.
What I'd like to know is whether there is a way, similar to the text file's SCHEMA.INI, to define the structure of an Excel workbook and the worksheet(s) it contains ahead of time, including column data types to prevent the OleDb driver from "misinterpreting" a column's data. I imagine this could be any sort of structured file such as INI, XML, or whatever, but it would need to be capable of identifying whether or not a particular sheet contains a header row or, in lieu of such a row, the (expected) column definitions. Does any such "standard definition" file exist for Excel workbooks?
One thing to note: As you may have noticed in the code for the ExcelFileToDataSet() method, I may be dealing with the older .XLS (97-03) format or the .XLSX (07+) format, so I can't necessarily rely on the workbook being Open XML compliant. I suppose I could try breaking the methods out to one for each extension, but I'd rather find something that I can use regardless of which file format the Excel file is using.
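To make the requirement concrete, here is a Python sketch of what such a per-sheet definition could look like if it were home-grown. The layout is hypothetical and NOT an existing standard. It uses one INI section per `<workbook>|<sheet>`, a `HasHeader` flag, and `COLn=NAME TYPE` entries that are used only when the sheet has no header row. The type names and converters are assumptions chosen for the example.

```python
import configparser
from typing import Any

# Hypothetical type names -> Python converters used for headerless sheets
_CONVERTERS = {"TEXT": str, "INTEGER": int, "DOUBLE": float}


def rows_to_records(schema_ini: str, workbook: str, sheet: str, rows: list[list[Any]]) -> list[dict[str, Any]]:
    """Turn raw sheet rows into dicts using either the header row or the COLn definitions."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(schema_ini)
    section = parser[f"{workbook}|{sheet}"]
    if not rows:
        return []
    if section.getboolean("HasHeader", fallback=True):
        names = [str(h) for h in rows[0]]  # first row supplies the field names
        return [dict(zip(names, row)) for row in rows[1:]]
    # No header row: names and types come from COL1, COL2, ... in numeric order
    cols = sorted(
        (int(key[3:]), value.split())
        for key, value in section.items()
        if key.startswith("col") and key[3:].isdigit()
    )
    names = [spec[0] for _, spec in cols]
    convs = [_CONVERTERS[spec[1].upper()] for _, spec in cols]
    return [{n: c(v) for n, c, v in zip(names, convs, row)} for row in rows]
```

Converting explicitly per column is what keeps a value such as `"001"` as text instead of letting it be guessed as a number.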

Excel 2010 macro - creating .txt files named from column A with content from column B (existing Stack Overflow solutions don't work)

I have found some answers/examples here on Stack Overflow for a problem I have in Microsoft Excel 2010. I want to create a .txt file for each cell in, e.g., column A, which contains the file names, with column B containing what goes inside each text file. However, one example doesn't work at all, and the second fails after a few files have been created.
You can use the CreateTextFile method, which creates your file and returns a TextStream object that you can use to write to it (Microsoft Docs).
Here's a code example that will do what you asked.
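Sub CreateTxt()
    Dim my_range As Range, x As Range
    Dim pth As String
    Dim file_sys As Object, txt_file As Object
    Set my_range = Selection
    Set file_sys = CreateObject("Scripting.FileSystemObject") ' create once, reuse for every row
    For Each x In my_range.Rows
        ' Use & (not +) so numeric file names in column A don't cause a type mismatch
        pth = "C:\excel_test\" & x.Cells(1).Value & ".txt" 'file name in column A
        Set txt_file = file_sys.CreateTextFile(pth, True) 'True = overwrite an existing file
        txt_file.WriteLine x.Cells(2).Value 'content in column B
        txt_file.Close
    Next x
End Sub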
Just remember that to create a file you need adequate permissions on the path you're writing to; I had to run Excel as administrator to get it working.
Also, the True value (the overwrite argument) of the CreateTextFile method is needed to overwrite any existing file with the same name; if it is set to False, CreateTextFile raises an error when the file already exists.
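The overwrite flag has a direct counterpart in Python's file modes, which makes the behaviour easy to see outside Excel. The sketch below is hypothetical and not a replacement for the macro. `create_text_files` takes (name, content) pairs in place of columns A and B. Mode `"w"` overwrites like `CreateTextFile(path, True)`. Mode `"x"` refuses to replace an existing file, which roughly matches passing False.

```python
from pathlib import Path


def create_text_files(rows, folder: str, overwrite: bool = True) -> list[Path]:
    """Write one <name>.txt per (name, content) pair and return the paths written.

    overwrite=True replaces existing files; overwrite=False raises FileExistsError
    if a file with that name is already there.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in rows:
        path = out_dir / f"{name}.txt"  # f-string also handles numeric names
        with open(path, "w" if overwrite else "x", encoding="utf-8") as f:
            f.write(f"{content}\n")  # like WriteLine: content followed by a line break
        written.append(path)
    return written
```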

Determine Image in Excel Comment

I have found plenty of VBA for inserting images into a comment:
Selection.ShapeRange.Fill.UserPicture "C:\Temp\Pictures\ewe.jpg"
How can you determine the image already used for a comment?
I would like to extract the embedded image names if possible.
Is there not a property to access that will give me this?
In the comment Fill Effects dialog box the image name somehow seems to be accessible.
Sorry, I didn't have the reputation to just comment on your question for clarification.
I made a test file, inserted a comment and image in that comment, and then extracted the base files. I then checked them all for the original file name. I also found the embedded JPEG and decoded it to get the metadata. As you've noted, the original file names are stored in xl\drawings\vmlDrawing1.vml (once you've extracted the xml files from the excel file by appending .zip to the filename and then running an unzip utility on it). I did find the file name, but not the path or file type, so I'm fairly certain that the path and file type aren't preserved.
If just the file name is sufficient for you, then that file contains information for each drawing you have, including the cell location. The row and column are 0-based, though, so you'd have to add one to get the actual row and column. My question is two-part:
1) Is the file name alone sufficient, or did you need the entire path? If you needed the entire path, I think you're out of luck, since the paths are on a different computer and you can't even search for them if you do extract the file name.
2) If that is all you need, does the solution have to be VBA? In the past, I have programmatically unzipped and manipulated the XML base files, but it's a little tricky. Here it's simplified by the fact that you only have to read out the data. I did it in .NET before, and I'm sure it could be done in VBA if it had to be, but it would be simpler if you were open to other types of solution.
Let me know, I'd be happy to help you out.
====================================================================================
Try this: make a copy of the spreadsheet, append .zip (test.xlsm.zip), and then extract the files manually. Change vmlPath to the location of your xl\drawings\vmlDrawing1.vml file, then run the macro. I did make some assumptions. For instance, I assumed that the order of the nodes and attributes would always be the same, so I used hardcoded indexes (shp.Attributes(0), etc.) instead of logic that makes sure I have the correct node or attribute. You seem to know your way around VBA, so I'm only going to write a bare-bones version. This needs a reference to Microsoft XML, v6.0.
Sub vmlParse()
    ' Requires a reference to Microsoft XML, v6.0 (which exposes DOMDocument60)
    Dim vmlPath As String: vmlPath = "C:\Users\Lenovo\Desktop\test - Copy.xlsm\xl\drawings\vmlDrawing1.vml"
    Dim this As Worksheet: Set this = ActiveSheet
    Dim doc As New DOMDocument60, shps As IXMLDOMNodeList
    Dim shp As IXMLDOMNode, n As IXMLDOMNode, a As IXMLDOMAttribute
    Dim fileName As String, productID As String
    Dim rng As Range, r As Long, c As Long
    doc.async = False ' load synchronously so the nodes are available right away
    If Not doc.Load(vmlPath) Then Exit Sub
    Set shps = doc.getElementsByTagName("x:ClientData")
    For Each shp In shps
        If shp.Attributes(0).nodeValue = "Note" Then
            r = 0: c = 0: fileName = ""
            ' The picture's original name is stored in an o:title attribute
            For Each a In shp.ParentNode.FirstChild.Attributes
                If a.nodeName = "o:title" Then
                    fileName = a.nodeValue
                    Exit For
                End If
            Next
            ' x:Row and x:Column are 0-based
            For Each n In shp.childNodes
                If n.nodeName = "x:Row" Then r = CLng(n.Text)
                If n.nodeName = "x:Column" Then c = CLng(n.Text)
            Next
            Set rng = this.Cells(r + 1, c + 1)
            productID = rng.Value
            'now you have the productID, the fileName, and the cell location
        End If
    Next
End Sub
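For comparison, the workbook can also be read as a zip archive directly, without renaming or extracting it by hand. The Python sketch below is not part of the original answer. It assumes the VML part parses as well-formed XML and that the picture name sits in an `o:title` attribute on a `v:fill` child of each comment shape. This matches what the answer observed, but the element layout should be treated as an assumption. `comment_pictures` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "v": "urn:schemas-microsoft-com:vml",
    "o": "urn:schemas-microsoft-com:office:office",
    "x": "urn:schemas-microsoft-com:office:excel",
}


def comment_pictures(workbook_path: str, vml_name: str = "xl/drawings/vmlDrawing1.vml"):
    """Return (row, column, picture title) for each comment shape, 1-based like Excel."""
    with zipfile.ZipFile(workbook_path) as zf:  # an .xlsx/.xlsm file is a zip archive
        root = ET.fromstring(zf.read(vml_name))
    results = []
    for shape in root.iter(f"{{{NS['v']}}}shape"):
        data = shape.find("x:ClientData", NS)
        if data is None or data.get("ObjectType") != "Note":
            continue  # not a comment shape
        fill = shape.find("v:fill", NS)
        title = fill.get(f"{{{NS['o']}}}title") if fill is not None else None
        row = int(data.findtext("x:Row", default="0", namespaces=NS))
        col = int(data.findtext("x:Column", default="0", namespaces=NS))
        results.append((row + 1, col + 1, title))  # VML stores 0-based positions
    return results
```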
Let me know how that worked out for you.
If C4 contains your comment, you can check how its shape is filled:
Dim shp As Shape
Set shp = Range("C4").Comment.Shape
If shp.Fill.TextureType = msoTextureUserDefined Then
    ' the comment has a user-defined fill
End If

Excel VBA global variables "lifetime"?

Sorry about the non-descriptive title; I just didn't know how to describe my goal.
I'm new to VBA and don't yet understand how things really work.
I've written a function which gets a directory from the user, and displays data from the first file in the directory. Now, I want to add a "next" button.
When the "next" button is pressed, my code should display data from the next file in the directory.
I tried to use global variables but they seem to get initialized each time the button is pressed.
What is the best way to achieve my goal? Do I have to use the spreadsheet as memory and write and read everything from there? Or does Excel VBA have some other "live memory" mechanism?
Thanks,
Li
Globals will not normally be reinitialized when you click a button. They will be reinitialized if you recompile your VBA project. Therefore, while debugging, you may see a global being reinitialized.
You can use the spreadsheet as memory. One way to do this is to have a worksheet whose Visible property you set to xlSheetVeryHidden (you can do this only from the VBA project, e.g. the Properties window or code, not from the Excel UI). This worksheet won't be visible to users, so your VBA application can use it to store data.
This could be approached many ways, as with any problem I guess!
You could break the problem up into two subroutines:
1) Retrieve all the file names in the selected directory and display the first file's data
2) If it's not the last file, get the next file's data and display it
You could use a global variable to store the filenames and an index to remember where you are up to in the collection of filenames.
Global filenames As Collection
Global fileIndex As Integer

Public Sub GetFilenames()
    Dim selectedDirectory As String
    Dim currentFile As String
    selectedDirectory = "selected\directory\"
    currentFile = Dir$(selectedDirectory)
    Set filenames = New Collection
    While currentFile <> ""
        filenames.Add selectedDirectory & currentFile
        currentFile = Dir$()
    Wend
    ' Make sure there were files
    If filenames.Count >= 1 Then
        fileIndex = 1
        ' Call a method to display data
        DisplayData filenames(fileIndex)
    Else
        ' No files
    End If
End Sub

Public Sub GetNextFile()
    ' Make sure we have a filenames object
    If Not filenames Is Nothing Then
        If fileIndex < filenames.Count Then
            fileIndex = fileIndex + 1
            ' Call the display method again
            DisplayData filenames(fileIndex)
        Else
            ' Decide what to do after reaching the final file
        End If
    Else
        ' No filenames
    End If
End Sub
I didn't include the DisplayData procedure because I'm not sure what type of files you're grabbing or what you are doing with them. If they were, say, Excel files, it could be something like:
Public Sub DisplayData(ByVal filename As String)
    Dim displayWb As Workbook
    Set displayWb = Workbooks.Open(filename)
    ' Do things with displayWb
End Sub
You could then set the macro of the button to "GetNextFile" and it will cycle through the files after each click. As for the lifetime of global variables, they only reinitialize when the VBA project is reset or when they are specifically initialized through a procedure or the immediate window.
Perhaps these two can also help you; they store values in the Windows registry, so the values survive between sessions:
SaveSetting
GetSetting
as shown here: http://www.j-walk.com/ss/excel/tips/tip60.htm
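The "list plus current position" design from the answers above can be sketched in Python to show how the state lives between button clicks and how it can be made to survive a reset. This is a hypothetical illustration. `FileCursor` and the JSON state file are stand-ins for the global variables and for SaveSetting/GetSetting or a very hidden sheet, not Excel APIs.

```python
import json
from pathlib import Path


class FileCursor:
    """Holds the file list and current index between "Next" clicks.

    The index is also saved to a small JSON file, so a new cursor created after
    a reset resumes where the previous one stopped.
    """

    def __init__(self, filenames, state_file: str):
        self.filenames = list(filenames)
        self.state_file = Path(state_file)
        self.index = 0
        if self.state_file.exists():
            saved = json.loads(self.state_file.read_text(encoding="utf-8"))
            # Clamp in case the list got shorter since the state was saved
            self.index = min(int(saved.get("index", 0)), max(len(self.filenames) - 1, 0))

    def current(self):
        return self.filenames[self.index] if self.filenames else None

    def advance(self):
        """Move to the next file if there is one; stay on the last file at the end."""
        if self.index < len(self.filenames) - 1:
            self.index += 1
            self.state_file.write_text(json.dumps({"index": self.index}), encoding="utf-8")
        return self.current()
```

The index is 0-based here, unlike the 1-based VBA Collection above.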

How to add more than 3 sheets to an Excel workbook from within MATLAB

How do I add more sheets to an excel workbook from within matlab?
I set up the workbook like so (based on code I got from someone else's post in this forum):
%# create Excel COM Server
Excel = actxserver('Excel.Application');
Excel.Visible = true;
%# create new XLS file
wb = Excel.Workbooks.Add();
wsheet=1;
wb.Sheets.Item(wsheet).Activate();
That's fine. Then later on inside the loop I open a new sheet after so many loops:
...
if loop==sheetlimit,
wsheet=wsheet+1;
wb.Sheets.Item(wsheet).Activate();
end
This works up to sheet 3. But when wsheet=4 I get this error message:
??? Invoke Error, Dispatch Exception: Invalid index.
Error in ==> filename at 97
wb.Sheets.Item(wsheet).Activate();
Appreciate any help. Thanks.
I don't know MATLAB, but I would be surprised if wb.Sheets.Item(wsheet).Activate(); actually adds any new worksheets. Most likely it just selects/activates the existing worksheets in your wb workbook, and your default Excel template has three worksheets. That is why it errors once you go past three.
Something like this might add a new Excel worksheet:
wb.sheets.Add();
Yes, wb.sheets.Add(); will work. You can query the available methods of an interface like this:
methods(wb.sheets)
which gives:
Methods for class Interface.000208D7_0000_0000_C000_000000000046:
Add FillAcrossSheets PrintOut addproperty events loadobj set
Copy Item PrintPreview delete get release
Delete Move Select deleteproperty invoke saveobj
