Using ADO to query text files - terrible performance - excel

I'm writing some code in VBA behind Excel to pull some summary numbers out of potentially huge text files (10M+ rows) out on a network drive. In the past, these numbers have been pulled using greps in linux, but I was hoping to implement something that could be done with a click of a button in Excel for ease of use.
My solution works, but it's like 25 times slower than a linux grep - takes 4 minutes to query 10M records, while the grep can do it in 10 seconds. Should I not be using ADO for this? Why is it so slow, aside from the fact that text files obviously aren't indexed? Is there a better solution that could still be coded without too much hassle in VBA, or is it a lost cause? I'm using Excel 2007 and the ADO 6.0 library. Here is some sample code:
Sub RunSQL()
    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Extended Properties=""text;HDR=YES;FMT=Delimited"";" & _
            "Data Source=\\network\share\path\;"
    rs.Open "select count(*) from Customers.tab where CHANGE_FLAG = 'Y'", cn
    Range("A1").CopyFromRecordset rs
    rs.Close
    cn.Close
End Sub

Related

Using ADODB, SQL on recordset in VBA, strings are truncated [to 255]. How to resolve this in VBA?

I'm trying to import data from a CSV file using VBA. I want to retrieve the data without opening the .csv file in Excel.
Connection used
con.Open ("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & short_path & ";" & _
"Extended Properties=""text; HDR=Yes;ReadOnly=true; FMT=Delimited; IMEX=1;ImportMixedTypes=Text;""")
and the SQL used with it
str_sql = "Select * from [" & str_filename & "]"
rst.Open str_sql, con, adOpenStatic, adLockOptimistic, adCmdText
OLEDB reads the data, but after pasting it from the recordset into Excel, the comments (i.e. long strings) are truncated. I believe that by default it scans the first 8 rows to determine each column's data type.
This works perfectly when a comment/long string of more than 255 characters appears in the first 8 rows.
However, the problem comes when the first comment/long string of more than 255 characters is on the 9th row or later. Then all later comments are truncated to 255 characters.
I've been searching the web for a solution but got nowhere.
Note: if the CSV file were small, I could do the open/copy/paste/filter/calculation very easily by using VBA to open the file and then do the rest.
However, in my scenario the file size is 100+ MB, so I can't open it. I tried, but Excel goes into "not responding" mode and crashes later.
So how can I achieve this in VBA and Excel?

Writing to SQL server - too slow

I have an Excel sheet (using Excel version 1902) where i use VBA and an ADODB.connection (using Microsoft ActiveX Data objects 2.8 library) to read/write to an sql server (2016).
At first both the read/write operations were very slow, taking approximately 15 seconds each. There will be multiple read/write sessions so 15 seconds is not acceptable.
The amount of data to be read/written varies but is approximately 500 rows in 15 columns in total (split between 7 tables, so the number of rows and columns varies per table). In other words, the amount of data to be transferred is not massive.
At first I thought the problem was in my VBA code (loops, searching for text etc). But by removing those steps, I narrowed the problem down to moving through the recordset (.MoveNext row by row to read or write to the database).
For the read operation I managed to get acceptable speeds by doing the following:
Changed the CursorType from adOpenKeyset to adOpenForwardOnly
Changed the LockType from adLockOptimistic to adLockReadOnly
This reduced read times from 15 seconds to 5 seconds, which is acceptable.
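A minimal sketch of the read-side change described above, assuming a reference to a Microsoft ActiveX Data Objects library. The connection string values, the table name SomeTable, and the use of integrated security are placeholders that are not from the original post, and CopyFromRecordset is only one possible way to avoid a row-by-row .MoveNext loop.

```vba
Sub ReadForwardOnly()
    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset

    ' Placeholder connection string: the names and the authentication
    ' mode are assumptions for illustration only.
    cn.Open "Provider=sqloledb;" & _
            "Data Source=datasourcename;" & _
            "Initial Catalog=catalogname;" & _
            "Integrated Security=SSPI;"

    ' Forward-only cursor with a read-only lock, as in the post.
    ' SomeTable is a hypothetical table name.
    rs.Open "SELECT * FROM SomeTable", cn, _
            adOpenForwardOnly, adLockReadOnly, adCmdText

    ' Dump all rows to the sheet in one call instead of looping.
    If Not rs.EOF Then ActiveSheet.Range("A2").CopyFromRecordset rs

    rs.Close
    cn.Close
End Sub
```

A forward-only recordset can only be walked once from start to finish, so this only fits code that does not need MovePrevious or RecordCount.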
However, I have not managed to achieve any improved speeds for the write operations which are still at 15 seconds.
I first tried:
Changed the CursorType from adOpenKeyset to adOpenStatic
Changed the LockType from adLockOptimistic to adLockBatchOptimistic
And then altered the .update command to .updatebatch.
I thought maybe that updating all in a batch would speed things up, but that did nothing.
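For reference, the batch attempt described above would look roughly like the sketch below. The connection string, table name, and column names are hypothetical, and the client-side cursor location is an assumption that is not stated in the post. As reported above, this pattern did not improve the write time in the poster's case.

```vba
Sub WriteWithUpdateBatch()
    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset
    Dim i As Long

    ' Placeholder connection string (assumption, not from the post).
    cn.Open "Provider=sqloledb;" & _
            "Data Source=datasourcename;" & _
            "Initial Catalog=catalogname;" & _
            "Integrated Security=SSPI;"

    ' Assumption: use a client-side cursor so changes are cached locally.
    rs.CursorLocation = adUseClient

    ' WHERE 1 = 0 opens an empty recordset with the right columns.
    ' SomeTable, Col1 and Col2 are hypothetical names.
    rs.Open "SELECT Col1, Col2 FROM SomeTable WHERE 1 = 0", cn, _
            adOpenStatic, adLockBatchOptimistic, adCmdText

    For i = 1 To 10
        rs.AddNew
        rs.Fields("Col1").Value = i
        rs.Fields("Col2").Value = "row " & i
        rs.Update          ' in batch mode this only updates the local cache
    Next i

    rs.UpdateBatch         ' send all pending rows to the server

    rs.Close
    cn.Close
End Sub
```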
Then I tried changing the connection .open statement from:
cn.Open "Provider = sqloledb;" & _
"Data Source=datasourcename;" & _
"Initial Catalog=catalogname;" & _
"User ID=UserIdname;"
etc. to:
cn.Open "Driver={SQL Server Native Client 11.0};" & _
"Server=servername;" & _
"Database=databasename;" & _
"Uid=UserIDname;"
Again that did nothing. With the updatebatch the native sql server connection performed worse (28 seconds) than the oledb connection. But with .update (no batch update) both connections were similar in performance at about 15 seconds.
Does anyone have any tips on how to possibly speed up the write operation ?

ADODB Recordset cannot get whole multivalued field from Access

I have a problem with one project in my work. I have a database on Sharepoint. It's hooked into .accdb file (Access 2007/2010). So far, I used ADODB Connection with standard ConnectionString (only Provider - ACEDB 12.0).
When I try to get data from one of multivalued field from database the recordset is empty for this column. Example:
I have to get few columns: ID, Location, Name, People (MVF), Trainers (MVF).
When a single record in the People column has MORE than 3-4 values, the recordset is empty for this column. If there are fewer than 3-4 values, I get semicolon-separated values. (Even a LEFT JOIN statement to get the source data of the MVF doesn't make any difference.)
I'm working on Excel - the End-user uses ONLY Excel.
When I watch the Recordset, it has empty values where the People values should be. Based on this, I think the problem is caused by the type of connection or something similar. I've also tried a DAO connection, with no positive results.
I've also tried to make a temporary database in an .accdb file only to execute SQL (INSERT INTO tmpDB SELECT People FROM inputDB; this is pseudo-code, the real syntax is good). Then I get "Cannot execute INSERT INTO for multivalued field".
I know that MVFs are not recommended, but it's a SharePoint DB, and my role is only to get data from the DB into Excel.
Update
I tried to use the ODBC driver ...
objConn.ConnectionString = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=" & myconn & ";Uid=Admin;Pwd=;"
... instead of the OLEDB provider ...
objConn.Provider = "Microsoft.ACE.OLEDB.12.0"
objConn.Open myconn
... but now the MVF are always empty.
I resolved this problem. Here's what I've done. The code may contain syntax errors: I'm posting it from memory, it's not a copy of my working code.
The main and most important thing is the type of connection. After some research I found that Microsoft recommends using an ADO connection. As I posted earlier, DAO requires additional looping through the recordset, which could be a problem, and using DAO with a connection string doesn't work any better than ADO.
The best (and only) way to get data from MVFs is DAO, but the connection MUST be obtained with the "OpenDatabase" method. In that case there are no problems with MVFs that have a large number of values.
Sub ImportMVFs()
    'Needs a reference to a DAO library (for .accdb, the Access database engine object library).
    Dim ws As DAO.Workspace
    Dim db As DAO.Database
    Dim rsRecord As DAO.Recordset
    Dim rsChild As DAO.Recordset
    Dim strPath As String
    Dim strSQL As String

    strPath = "Path to database - works with .accdb too"
    Set ws = DBEngine.Workspaces(0)
    Set db = ws.OpenDatabase(strPath) 'This type of connection is the best way to import from MVFs.
    strSQL = "SELECT * FROM tblToImport;"
    Set rsRecord = db.OpenRecordset(strSQL)
    Do Until rsRecord.EOF
        Debug.Print rsRecord.Fields("Column1").Value
        Debug.Print rsRecord.Fields("Column2").Value
        'The value of a multivalued field is itself a child recordset
        Set rsChild = rsRecord.Fields("MultiValuedFieldColumn").Value
        Do Until rsChild.EOF
            Debug.Print rsChild.Fields(0).Value 'We have to iterate through all MVF values
            'Here it's possible to fill a temporary table in Access to reorganize MVFs into simple records
            'For example: build an INSERT statement in a string (SQLQuery) and run it with the Execute method:
            'db.Execute SQLQuery
            rsChild.MoveNext
        Loop
        rsChild.Close
        rsRecord.MoveNext
    Loop
    rsRecord.Close
    db.Close
    Set rsChild = Nothing
    Set rsRecord = Nothing
    Set db = Nothing
End Sub

ADODB won't connect to my macros enabled excel file

I am trying to export data from a Word document that has Forms. I can export the data to an Excel file with the extension .xlsx; however, I get an error message (Data Type Mismatch in criteria expression) when trying to export the data to a macro-enabled Excel file with the extension .xlsm.
Does anyone know why I would be getting this error message? Why does it export to .xlsx but not to .xlsm?
Here is my code:
Set cnn = New ADODB.Connection
With cnn
.Provider = "Microsoft.ACE.OLEDB.12.0"
.ConnectionString = "Data Source=C:\Users\Desktop\OG Database\OG Database.xlsm;" & _
"Extended Properties=Excel 12.0 Macro;"
.Open
.Execute strSQL
End With
I had a similar issue a couple months ago and was able to replicate the issue by defining a column as 'date' formatting in Excel and then trying to insert text.
Excel will sometimes get locked on to the first data-type that is put into it. Excel then will define the whole column as that type. When trying to manipulate that data via ADO it causes this error.
In order to ensure that Excel doesn't get 'confused', try inserting a phony row of data that is not likely to have fields that get confused, like this.
"INSERT INTO [Sheet1$](Colors, SelectedDate, Balance) VALUES('red', #6/25/1950#, 24.76)"
Of course, don't forget to delete it afterwards.

Excel-VBA code that moves Excel sheets to Microsoft access?

I was wondering if it is possible to move data from an Excel sheet and store it in a Microsoft Access database. I have a lot of sheets of data with a similar format, and I would like a table for each of them in Access. I would also like to retrieve data from the database, but I figure I should learn how to store data first. I found this code; could someone explain how it works (or tell me if it is nothing like what I'm looking for)? I have read Power Programming in Excel with VBA, so I know basic VBA, but not this database content (and probably more).
Public Sub DoTrans()
Set cn = CreateObject("ADODB.Connection")
dbPath = Application.ActiveWorkbook.Path & "\FDData.mdb"
dbWb = Application.ActiveWorkbook.FullName
dbWs = Application.ActiveSheet.Name
scn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath
dsh = "[" & Application.ActiveSheet.Name & "$]"
cn.Open scn
ssql = "INSERT INTO fdFolio ([fdName], [fdOne], [fdTwo]) "
ssql = ssql & "SELECT * FROM [Excel 8.0;HDR=YES;DATABASE=" & dbWb & "]." & dsh
cn.Execute ssql
End Sub
Also if you have any book recommendations that would cover this/links, that would also be appreciated.
I'm sure it can be done in Excel, but I don't know it off the top of my head.
But it's fairly easy to do in Access (also uses VBA). Look at the TransferSpreadsheet method. If you combine it with saved import specs, it can do a lot.
You also have the choice of importing the data into a new table, or you can just link to the spreadsheet and have it act like a table. Linking is useful when you don't want all the spreadsheet info and want to query it.
Here's a link on the command syntax: http://msdn.microsoft.com/en-us/library/office/ff844793(v=office.14).aspx
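A minimal sketch of both options mentioned above (importing and linking), meant to run inside Access VBA rather than Excel. The file path and the table names tblImported and tblLinked are hypothetical, and the spreadsheet type constant assumes an .xlsx workbook whose first row holds column headers.

```vba
Sub ImportOrLinkSheet()
    ' Hypothetical workbook path, for illustration only.
    Const XLSX_PATH As String = "C:\Data\Book1.xlsx"

    ' Import: copies the worksheet data into an Access table.
    DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12Xml, _
                              "tblImported", XLSX_PATH, True

    ' Link: creates a linked table that reads from the workbook,
    ' so queries see the spreadsheet's current contents.
    DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel12Xml, _
                              "tblLinked", XLSX_PATH, True
End Sub
```

The fifth argument (True) tells Access to treat the first spreadsheet row as field names.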
The code that you have found transfers data to an already existing database named FDData.mdb, which is probably already set up to look exactly like your Excel worksheet. Can I ask why you don't just use Access? It is easier to use VBA to create Excel sheets from Access than it is to do the opposite. Access also has a feature for importing from an Excel worksheet. Are you trying to automate this process for a vast number of Excel worksheets? Otherwise you are better off just using the wizard. We might be able to help more if you can tell us exactly what you are trying to do. Linking up Excel and Access via VBA might be more counterproductive than just picking one and dealing with the downsides, unless you are prepared to write a whole lot of code.
Since you know how to access cell values in Excel and how to open an Access DB as a recordset in VBA, this won't be too hard for you.
I'm sure Google will point you in the right direction!
And I found this link for you: http://www.ozgrid.com/forum/showthread.php?t=76110
