I have an Excel file that pulls data from multiple CSV files via the "Connections" menu. The problem I'm running into is I need to be able to change the path to the CSV files from within VBA.
After repeated Binging (that's almost a bad word) I came across some solutions, but they involve an SQL connection rather than a text connection. Since the files are CSV, Excel makes it a text connection, and thus there is no ODBC connection string to modify (I get an error when trying to modify it from VBA). I've also dug through the MSDN docs to no avail.
Does anyone know of a way to change a "Text" connection path in Excel, from within VBA?
Also, since I'm on the topic, is it possible to have relative paths to files as opposed to the full file path (such as "\data\some_report.csv" rather than "c:\somedir\data\some_report.csv")?
As you mentioned...
"I came across some solutions, but they involve an SQL connection rather than a text connection."
So use .TextConnection.Connection instead of .ODBCConnection.Connection :)
Here is a quick example. Please amend it as applicable.
Sub Sample()
    Dim Conn As WorkbookConnection
    Dim ConString As String
    Dim oldPath As String, NewPath As String
    NewPath = "C:\MyPath.Csv"
    ' First connection in the workbook; change the index (or use the name) as needed
    Set Conn = ActiveWorkbook.Connections.Item(1)
    Debug.Print Conn.TextConnection.Connection
    '==> TEXT;C:\Users\Siddharth\Desktop\Delete Later\Output.csv
    ConString = Conn.TextConnection.Connection
    ' The part after "TEXT;" is the current file path
    oldPath = Split(ConString, ";")(1)
    ConString = Replace(ConString, oldPath, NewPath)
    ' Write the modified string back to the connection
    Conn.TextConnection.Connection = ConString
End Sub
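The answer above does not touch the relative-path part of the question. Assuming the connection itself needs a full path, one workaround is to build that path at run time from the workbook's folder (ThisWorkbook.Path in VBA) and then write it into the connection string as shown above. The string handling is sketched here in Python rather than VBA so it can be run and checked outside Excel; the function names and the sample folder are hypothetical, not part of any Excel API.

```python
import ntpath  # Windows path rules, whatever OS runs this sketch


def retarget_text_connection(con_string: str, new_path: str) -> str:
    """Keep the 'TEXT;' prefix and replace everything after it with new_path."""
    prefix, sep, _old_path = con_string.partition(";")
    if not sep or prefix.upper() != "TEXT":
        raise ValueError("not a text connection string: " + con_string)
    return prefix + sep + new_path


def resolve_relative(workbook_dir: str, relative_path: str) -> str:
    """Turn a path such as '\\data\\some_report.csv' into a full path under workbook_dir."""
    # Drop leading separators so the path is joined onto workbook_dir
    return ntpath.normpath(ntpath.join(workbook_dir, relative_path.lstrip("\\/")))


if __name__ == "__main__":
    old = r"TEXT;C:\Users\Siddharth\Desktop\Delete Later\Output.csv"
    full = resolve_relative(r"c:\somedir", r"\data\some_report.csv")
    print(retarget_text_connection(old, full))
```

Splitting only on the first semicolon keeps the whole remainder as the path, so a path that itself contains a semicolon is still replaced in full.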
Related
I am currently working on a project that will import data from multiple different sources in a variety of formats and structures - e.g., CSV, fixed-length, other-delimited (tab, pipe, etc.) plain-text, and Excel worksheets/workbooks. For this, I'm attempting to build "generic" readers for these files which will throw the files' contents into a DataTable/DataSet I can use in other methods. The plain-text files are pretty simple, as I've created a large SCHEMA.INI file which contains field definitions for each of the files the system will handle. That SCHEMA.INI resides in a "processing folder" where the files are temporarily stored until their data has been integrated with other systems. The data of any text file defined there can be easily extracted using this method:
Private Function TextFileToDataTable(ByVal TextFile As IO.FileInfo) As DataTable
    Dim TextFileData As New DataTable("TextFileData")
    ' The folder acts as the "database"; each text file in it is a table described by SCHEMA.INI
    Using TapeFileConnect As New OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0;Data Source='" & TextFile.DirectoryName & "';Extended Properties='Text';")
        ' Brackets keep file names that contain spaces valid in the query
        Using TapeAdapter As New OleDb.OleDbDataAdapter(String.Format("SELECT * FROM [{0}];", TextFile.Name), TapeFileConnect)
            Try
                TapeAdapter.Fill(TextFileData)
            Catch ex As Exception
                ' Signal failure to the caller by returning Nothing
                TextFileData = Nothing
            End Try
        End Using
    End Using
    Return TextFileData
End Function
This works well because a plain-text file isn't terribly complex in its data structure. A single file generally (at least for my requirements) contains, at most, one single table's worth of data - unless, of course, it's some sort of complex XML or JSON structure file, which can/should be handled completely differently anyway - so there's no need to go iterating through different elements beyond this.
NOTE: The code above is dependent on the SCHEMA.INI file being present in the same directory as the plain-text file being read and there being a section within that SCHEMA.INI defined with the same name as that plain-text file.
EXAMPLE:
[EXAMPLE_TEXT_FILE.TXT]
CharacterSet=ANSI
Format=FixedLength
ColNameHeader=FALSE
DateTimeFormat="YYYYMMDD"
COL1=CUSTOMER_NUMBER TEXT WIDTH 20
COL2=CUSTOMER_FIRSTNAME TEXT WIDTH 30
COL3=CUSTOMER_LASTNAME TEXT WIDTH 40
COL4=CUSTOMER_ADDR1 TEXT WIDTH 40
COL5=CUSTOMER_ADDR2 TEXT WIDTH 40
COL6=CUSTOMER_ADDR3 TEXT WIDTH 40
...
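To make the layout concrete, here is a rough Python sketch of how a FixedLength definition like the one above maps onto a record: each COLn entry claims the next WIDTH characters of the line. This is a simplification that only understands lines of the form COLn=NAME TYPE WIDTH n; it is not how the Jet text driver is actually implemented, and the function names and the sample record are made up for illustration.

```python
import re

# First three column definitions from the example section above
SCHEMA_LINES = [
    "COL1=CUSTOMER_NUMBER TEXT WIDTH 20",
    "COL2=CUSTOMER_FIRSTNAME TEXT WIDTH 30",
    "COL3=CUSTOMER_LASTNAME TEXT WIDTH 40",
]

COL_PATTERN = re.compile(r"^COL\d+=(\S+)\s+(\S+)\s+WIDTH\s+(\d+)$", re.IGNORECASE)


def parse_columns(lines):
    """Return (name, width) pairs for the COLn entries, in file order."""
    columns = []
    for line in lines:
        match = COL_PATTERN.match(line.strip())
        if match:
            columns.append((match.group(1), int(match.group(3))))
    return columns


def split_record(record, columns):
    """Slice one fixed-width record into a dict, trimming the padding."""
    values, offset = {}, 0
    for name, width in columns:
        values[name] = record[offset:offset + width].strip()
        offset += width
    return values


if __name__ == "__main__":
    cols = parse_columns(SCHEMA_LINES)
    # Made-up record padded to the declared widths
    sample = "12345".ljust(20) + "Ada".ljust(30) + "Lovelace".ljust(40)
    print(split_record(sample, cols))
```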
Excel workbooks, however, can be a bit trickier. Several of the workbooks I have to process contain multiple worksheets worth of data that I want to consolidate into a single DataSet with a DataTable for each worksheet. The basic functionality is, again, fairly straightforward and I've come up with the following method to read any and all sheets into a DataSet:
Private Function ExcelFileToDataSet(ByVal ExcelFile As IO.FileInfo, ByVal HasHeaderRow As Boolean) As DataSet
    Dim ExcelFileData As New DataSet("ExcelFileData")
    Dim ExcelConnectionString As String = String.Empty
    Dim UseHeaders As String = "NO"
    ' Pick the provider and format keyword that match the file extension
    Select Case ExcelFile.Extension.ToUpper.Trim
        Case ".XLS"
            ExcelConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR={1}'"
        Case ".XLSX"
            ' The ACE provider expects "Excel 12.0 Xml" for .xlsx files, not "Excel 8.0"
            ExcelConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR={1}'"
    End Select
    If HasHeaderRow Then
        UseHeaders = "YES"
    End If
    ExcelConnectionString = String.Format(ExcelConnectionString, ExcelFile.FullName, UseHeaders)
    Try
        Using ExcelConnection As New OleDb.OleDbConnection(ExcelConnectionString)
            Dim ExcelSchema As New DataTable
            ExcelConnection.Open()
            ' One schema row per table the provider exposes for the workbook
            ExcelSchema = ExcelConnection.GetOleDbSchemaTable(OleDb.OleDbSchemaGuid.Tables, Nothing)
            For Each ExcelSheet As DataRow In ExcelSchema.Rows
                Dim SheetTable As New DataTable
                Using ExcelAdapter As New OleDb.OleDbDataAdapter
                    Dim SheetName As String = ExcelSheet("TABLE_NAME").ToString
                    Dim ExcelCommand As New OleDb.OleDbCommand
                    ' Drop the trailing "$" that the provider appends to sheet names
                    SheetTable.TableName = SheetName.Substring(0, SheetName.Length - 1)
                    ExcelCommand.Connection = ExcelConnection
                    ExcelCommand.CommandText = String.Format("SELECT * FROM [{0}]", SheetName)
                    ExcelAdapter.SelectCommand = ExcelCommand
                    ExcelAdapter.Fill(SheetTable)
                End Using
                ExcelFileData.Tables.Add(SheetTable)
            Next ExcelSheet
        End Using
    Catch ex As Exception
        ' Unsupported extensions also end up here, because the connection string stays empty
        ExcelFileData = Nothing
    End Try
    Return ExcelFileData
End Function
The above code will work in a majority of the cases I deal with, but my "difficulty" is that there may be some worksheets that have header rows and some that don't within the same workbook. Also, for those worksheets that do not have a header row, I'd like to be able to define the field names and data types similar to how I can with the plain-text SCHEMA.INI. The only thing I have going for me in these cases is that the "client" provides me with a data map to help me identify what data elements are in each field.
What I'd like to know is whether there is a way, similar to the text file's SCHEMA.INI, to define the structure of an Excel workbook and the worksheet(s) it contains ahead of time - including column data types, to prevent the OleDb driver from "misinterpreting" a column's data. I imagine this could be any sort of structured file such as INI, XML, or whatever, but it would need to be capable of identifying whether or not a particular sheet contains a header row or, in lieu of such a row, the (expected) column definitions. Does any such "standard definition" file exist for Excel workbooks?
One thing to note: As you may have noticed in the code for the ExcelFileToDataSet() method, I may be dealing with the older .XLS (97-03) format or the .XLSX (07+) format, so I can't necessarily rely on the workbook being Open XML compliant. I suppose I could try breaking the methods out to one for each extension, but I'd rather find something that I can use regardless of which file format the Excel file is using.
I have a worksheet named DATA with a table populated from a JSON file through Microsoft Query.
There are several JSON files, so I need a separate connection to each of them.
I also have a cell on another worksheet where I would like to specify a parameter (for example Yesterday, Today, or Tomorrow).
Depending on the selected parameter, the table in the DATA worksheet should be populated from the matching data connection (yesterday.json, today.json, tomorrow.json).
Is this possible? If so, what would the procedure be?
My idea is that it might be possible by changing the file name inside the query.
For example, this is my query:
let
FilePath = Excel.CurrentWorkbook(){[Name="FilePath"]}[Content]{0}[Column1],
FullPathToFile1 = FilePath & "\json\today.json",
Source = Json.Document(File.Contents(FullPathToFile1)),
So I am wondering whether there is some way to "inject" the file name into the above query based on the value of a cell.
I would appreciate any help, links, etc.
Thanks!
UPDATE:
I have created a named cell, jsonPath, and put the file name in it.
Then I modified the above query as follows, but it gives me an error.
FilePath = Excel.CurrentWorkbook(){[Name="FilePath"]}[Content]{0}[Column1],
FullPathToFile1 = FilePath & "\json\" & [jsonPath],
Source = Json.Document(File.Contents(FullPathToFile1)),
I got it working by modifying my query as follows:
FilePath = Excel.CurrentWorkbook(){[Name="FilePath"]}[Content]{0}[Column1],
FileName = Excel.CurrentWorkbook(){[Name="jsonPath"]}[Content]{0}[Column1],
FullPathToFile1 = FilePath & "\json\" & FileName,
Source = Json.Document(File.Contents(FullPathToFile1)),
My problem: I have many text files that I want to rename.
I have been using the ADODB.Stream object to open, read, and write the files because they are encoded in UTF-8. Now, if possible, I want to rename the files without the workaround of copying their content into a new file with the desired name and deleting the old one. The time stamp on the documents is valuable information for me, which is why I do not want to create new files.
Here is my current workaround, which creates new files and deletes the old ones.
Issues with the code:
1) Copied files have new time stamps
2) Line breaks don't get copied into the new files. As the files contain some kind of XML code, the generated files become hard to read. I would need to write a piece of code that inserts line breaks at all appropriate positions after copying.
Sub renameModules()
    Dim currentTXT As Variant, newTxt As Variant
    Dim currentPath As String, newPath As String
    Dim currentContent As String
    currentPath = "C:\Users\Me\Desktop\Test\MyCurrent.txt"
    newPath = "C:\Users\Me\Desktop\Test\Target001.txt"
    ' Read the existing file as UTF-8
    Set currentTXT = CreateObject("ADODB.Stream")
    currentTXT.Charset = "utf-8"
    currentTXT.Open
    currentTXT.LoadFromFile (currentPath)
    currentContent = currentTXT.ReadText()
    ' Write the content to a new file under the new name
    Set newTxt = CreateObject("ADODB.Stream")
    newTxt.Charset = "utf-8"
    newTxt.Open
    newTxt.WriteText currentContent
    newTxt.SaveToFile newPath, 1 ' 1 = adSaveCreateNotExist
    ' Delete the original
    Kill currentPath
End Sub
For simplicity I have only included the essential steps and omitted all error handling.
My goal: Finding some method to simply rename the current file without fiddling with its content.
No need to deal with ADODB.Stream: you can use the VBA Name statement not only to rename a file but also to move it to a different folder:
Name currentPath As newPath
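The reason a plain rename addresses both issues from the question is that the file is never rewritten: its content (line breaks included) and its modified time stay as they were. As a quick way to check that claim outside VBA, here is a small Python sketch in which os.rename stands in for the Name statement. The file names are borrowed from the question, the file content and the helper name are made up, and the check runs in a temporary folder.

```python
import os
import tempfile

SAMPLE = "<a>\r\n<b/>\r\n</a>\r\n"  # made-up XML-like content with line breaks


def rename_keeps_mtime(directory):
    """Rename a file inside directory; report whether content and modified time survived."""
    current_path = os.path.join(directory, "MyCurrent.txt")
    new_path = os.path.join(directory, "Target001.txt")
    with open(current_path, "w", encoding="utf-8", newline="") as handle:
        handle.write(SAMPLE)
    # Backdate the file so a rewritten copy would be easy to spot
    os.utime(current_path, (1_000_000_000, 1_000_000_000))
    before = os.stat(current_path).st_mtime
    os.rename(current_path, new_path)  # plays the role of: Name currentPath As newPath
    after = os.stat(new_path).st_mtime
    with open(new_path, "rb") as handle:
        content_intact = handle.read() == SAMPLE.encode("utf-8")
    return before == after and content_intact and not os.path.exists(current_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(rename_keeps_mtime(tmp))
```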
I'm trying to expose the column names from an Excel file in a VB.NET application. The code looks like this:
Dim EXCEL_CONNECTION_TEMPLATE As String =
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR=YES;'"
Using con As OleDbConnection = New OleDbConnection(String.Format(EXCEL_CONNECTION_TEMPLATE, savePath))
    con.Open()
    Dim schema As DataTable = con.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, Nothing)
    DoStuffWith(schema)
End Using
After execution I get an exception thrown by GetOleDbSchemaTable:
The Microsoft Jet database engine could not find the object
''sheetname $'Print_Area'. Make sure the object exists and that you
spell its name and the path name correctly.
System.Data.OleDb.OleDbException
It looks like GetOleDbSchemaTable has a problem with sheet names that contain a space and have a print area defined at the same time.
I tried supplying restrictions like this:
Dim schema As DataTable = con.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, New Object() {Nothing, Nothing, "sheetname $"})
This way it didn't throw the exception, but it returned no data.
Any tip, workaround, or suggestion is welcome. Asking the users not to include spaces or print areas in the Excel files they upload is obviously not an option.
Found the solution.
It seems that if a sheet name in the uploaded Excel file contains a space, OLE DB wraps it in single quotes. This seems to be working now:
Dim schema As DataTable = con.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, New Object() {Nothing, Nothing, "'sheetname $'"})
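When the sheet names are not known in advance, the same quoting rule has to be applied to whatever names the schema returns. The Python sketch below shows one way to tell whole-sheet entries from named ranges such as print areas. It rests on assumptions taken from the error message and the fix above: worksheet entries end in "$" (or "$'" when the provider quotes them), and a print area shows up as the quoted sheet name followed by Print_Area. The helper names are hypothetical and the "Sheet1$" sample is made up.

```python
def is_worksheet(table_name: str) -> bool:
    """True for entries that name a whole sheet, e.g. Sheet1$ or 'sheetname $'."""
    return table_name.endswith("$") or table_name.endswith("$'")


def display_name(table_name: str) -> str:
    """Strip the surrounding single quotes and the trailing $ from a worksheet entry."""
    name = table_name
    if len(name) >= 2 and name.startswith("'") and name.endswith("'"):
        name = name[1:-1]
    return name[:-1] if name.endswith("$") else name


if __name__ == "__main__":
    # Made-up schema entries shaped like the ones discussed above
    entries = ["Sheet1$", "'sheetname $'", "'sheetname $'Print_Area"]
    for entry in entries:
        if is_worksheet(entry):
            print(repr(entry), "->", repr(display_name(entry)))
```

Entries that pass is_worksheet can be handed to the schema call unchanged, quotes included, which is what the fix above does by hand.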
I'm reading an XLSX (Microsoft Excel XML file) using the Excel Data Reader from http://exceldatareader.codeplex.com/ and am having a problem with missing data. Data which is in the source Excel spreadsheet is missing from the data set returned by the library.
Here's a bit more detail of what I'm doing:
Created a simple test spreadsheet in Excel with one sheet, a header row and two data rows. Save and close Excel.
Open the file and pass the stream into the CreateOpenXmlReader() method and get back an IExcelDataReader.
Call the AsDataSet() method on the IExcelDataReader and get back a DataSet.
Get the ItemArray from row 1 of table 0.
Loop through the ItemArray. Discovered there is data missing (i.e. there are System.DBNull members where I expected System.String members).
Here's a bit more analysis...
I debugged the code and looked inside the ExcelDataReader object model. Found a non-public string array called "SST" which appears to contain the data from the spreadsheet as a single linear (one-dimensional) array.
On closer inspection, I found that the data I was looking for was also missing from this array. In this raw data, the member does not exist at all.
My guess is that for some reason the parser is not picking up the data from the OOXML and concluding that the cell is empty. Looking at the OOXML itself, the data seems to be split across the sharedStrings.xml and sheet1.xml files, so perhaps the parser is having a tough time putting all this together.
Saving the file in binary format (Excel 97 to 2003) and reading that in solves the problem, so on the surface that seems to confirm my suspicion that the issue lies in reading the OOXML format.
Suggestions?
As a stop gap I'm converting all files to binary format, but that seems like a kludge. Is there some way to get my OOXML formatted Excel files to read in properly with Excel Data Reader?
To retrieve an Excel spreadsheet (.xlsx) and load it into a DataSet, you don't need to mess with XML readers or a separate library like Excel Data Reader. The code for reading an entire spreadsheet into a DataSet is pretty simple when using the normal OleDb functions in .NET:
Imports System.Data
Imports System.Data.OleDb
Sub readInMyExcelFile()
    Dim xlsFile As String = "myexcelfile"
    Dim conStr As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & xlsFile & ";Extended Properties=""Excel 12.0 Xml;HDR=YES"""
    Dim dtSheets As New DataTable
    Dim tmp As String
    Dim sqlText As String
    Using cn As New OleDbConnection(conStr)
        cn.Open()
        dtSheets = cn.GetSchema("Tables")
    End Using
    ' Dataset for the spreadsheet
    Dim ds As New DataSet
    ' Loop through the names of all the worksheets in the file.
    For Each rw As DataRow In dtSheets.Rows
        tmp = rw("TABLE_NAME").ToString()
        tmp = "[" & tmp & "]"
        Dim dt As New DataTable
        Using cn As New OleDbConnection(conStr)
            cn.Open()
            ' Retrieve all the records from the worksheet.
            sqlText = "SELECT * FROM " & tmp
            Dim adp As New OleDbDataAdapter(sqlText, cn)
            ' Fill the data table with all the data.
            adp.Fill(dt)
        End Using
        ds.Tables.Add(dt)
    Next
End Sub
It seems there is a bug in Excel Data Reader (this is the first time I have heard of it). Do you have to use it? If not, EPPlus would be a better choice.
Excel Data Reader from CodePlex is used for reading data from an Excel file directly in a web application, without any caching on the server. The code above only applies when the Excel file can be stored somewhere. I have faced similar problems with Excel Data Reader where some of the data is missing. Most importantly, I couldn't find any specific pattern. All I could see was that if all the rows have values, there is no problem. The best option is to convert the XLSX file to XLS.