rows disappearing on datasets of over 100,000 when importing with VBA - excel

I have a macro that I use in Excel 2010 to loop through some xls files, extracting the data from each into an xlsm file. There should be about 195,000 rows across all of the xls files, but after running it I end up with closer to 90,000. If I run it on only a few of the files at once I get the correct number, so it seems to be related to the volume I'm trying to incorporate. As I understand it, an xlsm can handle up to a million rows, so that shouldn't be a problem, should it? I've split the source files into batches in the past, but I'd rather avoid doing that if possible. Ultimately, I'm trying to compile a csv to import into a SQL database. If anybody has any suggestions, I'd be very grateful.
Thanks.
PS I've asked about this before a month or so ago but as I'd totally misdiagnosed the issue and was asking about the wrong thing, I'm writing a fresh question so that I don't set people off on the wrong track. I was rightly chastised for not including enough code last time. This is the subroutine which extracts the data:
Sub import_data()
Dim wk As Workbook
Dim shRead As Worksheet, ws As Worksheet
Dim i As Integer
Dim reportLocation As String
Dim report As String
Dim reportList As String
Dim reportArray() As String
Dim shReadLastColumn As Long
Dim shReadLastRow As Long
'generate list of xls to open
reportLocation = "C:\Foo"
report = Dir(reportLocation & "\*.xls")
reportList = ""
Do While Len(report) > 0
reportList = report & "," & reportList
report = Dir
Loop
reportArray() = Split(reportList, ",")
'loop through list of xls files
For i = UBound(reportArray) To LBound(reportArray) Step -1
If reportArray(i) <> "" Then
Set wk = Workbooks.Open(reportLocation & "\" & reportArray(i), ReadOnly:=True)
Set shRead = wk.Worksheets(1)
With shRead
shReadLastColumn = .Cells(10, shRead.Columns.count).End(xlToLeft).Column
shReadLastRow = .Cells(shRead.Rows.count, "A").End(xlUp).Row
End With
'copy list over on to xlsm compilation
Dim target_row As Long
Set ws = ThisWorkbook.Worksheets(1)
If IsEmpty(ws.Cells(1, 1)) Then
target_row = 1
shRead.Range(shRead.Cells(10, 1), shRead.Cells(shReadLastRow, shReadLastColumn)).Copy ws.Cells(target_row, 1)
Else
target_row = ws.Cells(Rows.count, 1).End(xlUp).Row + 1
shRead.Range(shRead.Cells(10 + 1, 1), shRead.Cells(shReadLastRow, shReadLastColumn)).Copy ws.Cells(target_row, 1)
End If
wk.Activate
ActiveWorkbook.Close False
End If
Set wk = Nothing
Set shRead = Nothing
Next i
End Sub
Thanks for any help!

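The Rows.count in your target_row line is unqualified, so it refers to the active sheet rather than to ws. At that point the active workbook is the .xls file you have just opened, which has only 65,536 rows. Once the compilation sheet grows past row 65,536, the last-row lookup starts from the wrong cell and new data overwrites rows that were already imported: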
target_row = ws.Cells(Rows.count, 1).End(xlUp).Row + 1
which should be
target_row = ws.Cells(ws.Rows.count, 1).End(xlUp).Row + 1
Also, consider using a With block and avoid any Activate or ActiveWorkbook calls:
' WITH BLOCK (no use of ws)
With ThisWorkbook.Worksheets(1)
If IsEmpty(.Cells(1, 1)) Then
target_row = 1
shRead.Range(shRead.Cells(10, 1), shRead.Cells(shReadLastRow, shReadLastColumn)).Copy .Cells(target_row, 1)
Else
target_row = .Cells(.Rows.count, 1).End(xlUp).Row + 1
shRead.Range(shRead.Cells(10 + 1, 1), shRead.Cells(shReadLastRow, shReadLastColumn)).Copy .Cells(target_row, 1)
End If
End With
' ADJUSTED LINE
wk.Close False
Also, if you only need data without formats, consider range assignment:
With ThisWorkbook.Worksheets(1)
...
target_row = .Cells(.Rows.count, 1).End(xlUp).Row + 1
.Cells(target_row, 1).Resize(shReadLastRow - 10, shReadLastColumn).Value = shRead.Range( _
shRead.Cells(10 + 1, 1), shRead.Cells(shReadLastRow, shReadLastColumn) _
).Value
...
End With
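Putting those pieces together, the whole import could look roughly like the sketch below. It is not a drop-in replacement: the folder path, the header row (10) and the sheet indexes are taken from the question, while the procedure name, the constants and the variable names are made up for illustration. It copies values only, so formats are not carried over.

```vba
Sub ImportDataValuesOnly()
    ' Sketch only: folder and header row come from the question,
    ' all names in this procedure are illustrative.
    Const SOURCE_FOLDER As String = "C:\Foo"
    Const HEADER_ROW As Long = 10

    Dim target As Worksheet
    Dim wk As Workbook
    Dim src As Worksheet
    Dim fileName As String
    Dim lastRow As Long, lastCol As Long
    Dim firstRow As Long, targetRow As Long

    Set target = ThisWorkbook.Worksheets(1)

    fileName = Dir(SOURCE_FOLDER & "\*.xls")
    Do While Len(fileName) > 0
        Set wk = Workbooks.Open(SOURCE_FOLDER & "\" & fileName, ReadOnly:=True)
        Set src = wk.Worksheets(1)

        ' Rows and Columns are qualified with the sheet they belong to.
        lastCol = src.Cells(HEADER_ROW, src.Columns.Count).End(xlToLeft).Column
        lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row

        If IsEmpty(target.Cells(1, 1).Value) Then
            firstRow = HEADER_ROW          ' first file: keep the header row
            targetRow = 1
        Else
            firstRow = HEADER_ROW + 1      ' later files: data rows only
            targetRow = target.Cells(target.Rows.Count, 1).End(xlUp).Row + 1
        End If

        ' Skip files that have no rows in the wanted range.
        If lastRow >= firstRow Then
            target.Cells(targetRow, 1).Resize(lastRow - firstRow + 1, lastCol).Value = _
                src.Range(src.Cells(firstRow, 1), src.Cells(lastRow, lastCol)).Value
        End If

        wk.Close SaveChanges:=False
        fileName = Dir
    Loop
End Sub
```

Because the destination is sized with Resize from the same row and column counts as the source range, the two ranges always match and nothing depends on which workbook happens to be active.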

Related

Combining Excel Sheets into the summary sheet for specific sheet names

I'm trying to run this code, but only for worksheets with specific names. Each name has the form "X_Score_" & CurrentDate, where CurrentDate changes every time the loop runs. Right now it runs for every worksheet in the workbook, which is quite messy.
Set wsMaster = ThisWorkbook.Worksheets("XXX_SCORE_TOTAL")
RowTracker = 2
flag = False
For Each bs In ActiveWorkbook.Worksheets
If UCase(bs.Name) <> "XXX_SCORE_TOTAL" Then
LastRow = bs.Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = bs.Cells(1, Columns.Count).End(xlToLeft).Column
End If
bs.Range(bs.Cells(2, 1), bs.Cells(LastRow, LastColumn)).Copy wsMaster.Cells(RowTracker, 1)
RowTracker = RowTracker + LastRow
Next bs
I tried to predefine something like Set MyCollection = ThisWorkbook.Worksheets("X_Score_" & CurrentDate) and then use it in the For loop, like For Each bs In MyCollection, but it didn't work.
If the goal is to process all sheets in the workbook whose names start with "X_Score_", this can be done with Left():
Sub test1()
Set wsMaster = ThisWorkbook.Worksheets("XXX_SCORE_TOTAL")
RowTracker = 2
flag = False
For Each bs In ThisWorkbook.Worksheets ' or ActiveWorkbook?
If Left(UCase(bs.Name), 8) = "X_SCORE_" Then
LastRow = bs.Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = bs.Cells(1, Columns.Count).End(xlToLeft).Column
Set Rng = bs.Range(bs.Cells(2, 1), bs.Cells(LastRow, LastColumn))
Rng.Copy wsMaster.Cells(RowTracker, 1)
RowTracker = RowTracker + Rng.Rows.Count ' shift RowTracker according to the number of rows copied
End If
Next
End Sub

Slowness on my code while looping through rows

Set WB = Thisworkbook
Set wk = WB.Sheets("Final")
Set ws = WB.Sheets("OIT-Temp")
irow = wk.Range("B65000").End(xlUp).Row 'Find last row
For i = 2 To irow Step 1
If not wk.Cells(i, 2).FormulaR1C1 = "#N/A" Then
wk.cells(i, 2).copy ws.Cells(i, 2)
wk.cells(i,3).Value = 128
Else
wk.Cells(i, 3).Value = 198
End if
Next i
This code lags badly when I run it, and I can't work out what's wrong with it.
Please try the adapted code below. It reads the ranges into arrays, does the processing in memory, and writes each processed array back to the sheet in a single operation. That's why it should be very fast:
Sub speedUpCode()
Dim WB As Workbook, wk As Worksheet, ws As Worksheet, iRow As Long, arr, arrCopy, i As Long
Set WB = ThisWorkbook
Set wk = WB.Sheets("Final")
Set ws = WB.Sheets("OIT-Temp")
iRow = wk.Range("B" & wk.rows.count).End(xlUp).row 'Find last row
arr = wk.Range("B2:C" & iRow).Value2 'place the range in an array for faster iteration
arrCopy = ws.Range("B2:B" & iRow).Value2 'place the range in an array for faster iteration
For i = 1 To UBound(arr)
If Not CStr(arr(i, 1)) = "Error 2042" Then ''#N/A
arrCopy(i, 1) = arr(i, 1)
arr(i, 2) = 128
Else
arr(i, 2) = 198
End If
Next i
'drop the processed array values:
wk.Range("B2").Resize(UBound(arr), 2).Value2 = arr
ws.Range("B2").Resize(UBound(arrCopy), 1).Value2 = arrCopy
End Sub
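The string comparison against "Error 2042" works because arrays read from a range hold error cells as Variant/Error values. A slightly more explicit way to express the same test is sketched below. The helper name IsNA is invented for this example; IsError, CVErr and xlErrNA are standard VBA and Excel members.

```vba
Function IsNA(ByVal v As Variant) As Boolean
    ' True only when v holds the #N/A error value.
    ' Any other error (for example #DIV/0!) or ordinary value returns False.
    If IsError(v) Then
        IsNA = (v = CVErr(xlErrNA))
    End If
End Function
```

In the loop above, the condition would then read If Not IsNA(arr(i, 1)) Then.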
Some general points:
Indent your code.
Give your variables meaningful names.
It is conventional to put the positive branch of an If/Else first.
I'm not sure how you're doing an xlUp on an entire row.
For this specific problem, reading and writing cells is computationally expensive. You can remove a lot of this by reading and writing data in bulk and working with your data in arrays in memory.
Try this and see if it speeds things up.
Sub FastReadWrite()
' Set the range references.
With ThisWorkbook
With .Sheets("Final")
Dim maxRow As Long
maxRow = .Range("B65000").End(xlUp).Row
Dim finalData As Range
Set finalData = .Range("B2:C" & maxRow)
End With
Dim oitTemData As Range
Set oitTemData = .Sheets("OIT-Temp") _
.Range("B2:C" & maxRow)
End With
' Read the data from the ranges into memory.
Dim fArray() As Variant: fArray = finalData.Formula2R1C1
Dim tArray() As Variant: tArray = oitTemData.Formula2R1C1
' Main logic loop.
Dim i As Long
For i = 1 To UBound(fArray, 1)
If fArray(i, 1) = "#N/A" Then
fArray(i, 2) = 198
Else
tArray(i, 1) = fArray(i, 1)
fArray(i, 2) = 128
End If
Next i
' Write the data to the ranges.
finalData.Formula2R1C1 = fArray
oitTemData.Formula2R1C1 = tArray
End Sub

Split at empty line to pdf, using same headers and footers on all pdf

I'm totally new to VBA.
I have the following code. I need to split the data at each empty line, save each part as a PDF document, and have my headers and footers carried through to all of the split files. Each file should be saved under the ID number in cell C4.
I just can't get this right. Please help if you can.
My code.
Sub spreaddate()
Dim totalrows As Integer
Dim countworksheet As Integer
Dim lastcopy As Integer
Dim Sht As Worksheet
Set Sht = ActiveSheet
countworksheet = 1
lastcopy = 2
totalrows = Cells(Rows.Count, 1).End(xlUp).Row
For i = 2 To totalrows + 1
If Cells(i, 1).Value = "" Then
Sheets.Add.Name = countworksheet
Sht.Range("1:1").Copy Sheets(countworksheet).Cells(1, 1)
Sht.Range(lastcopy & ":" & i - 1).Copy Sheets(countworksheet).Cells(2, 1)
lastcopy = i + 1
countworksheet = countworksheet + 1
Sht.Activate
End If
Next i
End Sub

Exceeding row limit - create new sheet

I have 2 columns on a sheet "list", one column that lists all business entities, the other lists all org units. The functionality of the code below works perfectly but returns an error because it exceeds the sheet row limit.
The data is pasted onto a sheet "cc_act". Is there a way, at the point of error, to create a new sheet called "cc_act1", then "cc_act2", and so on until the script is complete?
Declare Function HypMenuVRefresh Lib "HsAddin" () As Long
Sub cc()
Application.ScreenUpdating = False
Dim list As Worksheet: Set list = ThisWorkbook.Worksheets("list")
Dim p As Worksheet: Set p = ThisWorkbook.Worksheets("p")
Dim calc As Worksheet: Set calc = ThisWorkbook.Worksheets("calc")
Dim cc As Worksheet: Set cc = ThisWorkbook.Worksheets("cc_act")
Dim cc_lr As Long
Dim calc_lr As Long: calc_lr = calc.Cells(Rows.Count, "A").End(xlUp).Row
Dim calc_lc As Long: calc_lc = calc.Cells(1, calc.Columns.Count).End(xlToLeft).Column
Dim calc_rg As Range
Dim ctry_rg As Range
Dim i As Integer
Dim x As Integer
list.Activate
For x = 2 To Range("B" & Rows.Count).End(xlUp).Row
If list.Range("B" & x).Value <> "" Then
p.Cells(17, 3) = list.Range("B" & x).Value
End If
For i = 2 To Range("A" & Rows.Count).End(xlUp).Row
If list.Range("A" & i).Value <> "" Then
p.Cells(17, 4) = list.Range("A" & i).Value
p.Calculate
End If
p.Activate
Call HypMenuVRefresh
p.Calculate
'''changes country on calc table
calc.Cells(2, 2) = p.Cells(17, 4)
calc.Cells(2, 3) = p.Cells(17, 3)
calc.Calculate
'''copy the calc range and past under last column
With calc
Set calc_rg = calc.Range("A2:F2" & calc_lr)
End With
With cc
cc_lr = cc.Cells(Rows.Count, "A").End(xlUp).Row + 1
calc_rg.Copy
cc.Cells(cc_lr, "A").PasteSpecial xlPasteValues
End With
Next i
Next x
Application.ScreenUpdating = True
End Sub
I suppose there are a few ways to handle something like this. See the code sample below, and adapt it to your specific needs.
Sub LongColumnToAFewColumns()
Dim wsF As Worksheet, WST As Worksheet
Dim rf As Range, rT As Range
Dim R As Long, j As Integer
' initialize
Set wsF = ActiveSheet
Set WST = Sheets.Add
WST.Name = "Results"
j = 1
For R = 1 To wsF.Cells(wsF.Rows.Count, 1).End(xlUp).Row Step 65536
wsF.Cells(R, 1).Resize(65536).Copy
WST.Cells(1, j).PasteSpecial xlPasteValues ' each 65,536-row chunk goes into its own column
j = j + 1
Next R
End Sub
As an aside, you may want to consider using MS Access for this kind of thing. Or, better yet, Python or even R. Good luck with your project.
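For the rollover that was actually asked about ("cc_act", then "cc_act1", "cc_act2", and so on), one possible approach is to check for space before each paste rather than waiting for the error. The function below is only a sketch: the name GetOutputSheet and its parameters are invented for this example, and it assumes that column A is filled on every pasted row, as in the question's code.

```vba
Function GetOutputSheet(ByVal baseName As String, ByVal rowsNeeded As Long) As Worksheet
    ' Returns the first sheet in the series baseName, baseName1, baseName2, ...
    ' that has room for rowsNeeded more rows below its last used cell in column A.
    ' A missing sheet in the series is created at the end of the workbook.
    Dim ws As Worksheet
    Dim sheetName As String
    Dim n As Long

    Do
        If n = 0 Then sheetName = baseName Else sheetName = baseName & n

        ' Look the sheet up by name; ws stays Nothing if it does not exist.
        Set ws = Nothing
        On Error Resume Next
        Set ws = ThisWorkbook.Worksheets(sheetName)
        On Error GoTo 0

        If ws Is Nothing Then
            Set ws = ThisWorkbook.Worksheets.Add( _
                After:=ThisWorkbook.Worksheets(ThisWorkbook.Worksheets.Count))
            ws.Name = sheetName
        End If

        ' Guard against a block that could never fit, which would loop forever.
        If rowsNeeded > ws.Rows.Count - 2 Then
            Err.Raise 5, , "Block is too large for a single sheet"
        End If

        ' Strict "<" keeps the very last row empty so End(xlUp) stays reliable.
        If ws.Cells(ws.Rows.Count, "A").End(xlUp).Row + rowsNeeded < ws.Rows.Count Then
            Set GetOutputSheet = ws
            Exit Function
        End If

        n = n + 1
    Loop
End Function
```

In the question's inner loop this would be called just before the paste, for example Set cc = GetOutputSheet("cc_act", calc_rg.Rows.Count), followed by the existing cc_lr lookup written as cc.Cells(cc.Rows.Count, "A").End(xlUp).Row + 1.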

Excel VBA opening and merging many workbooks

I have many data sets, over two dozen and counting, with 15,000 rows and 36 columns each, that I would like to combine. These data sets have the same columns and more or less the same rows. They are monthly snapshots of the same data, with some data leaving and some entering (hence the marginally different number of rows).
I would like the user to select some of them and combine them. The name of each file contains the date, and my code extracts the date and adds it in a new column at the end. Right now, my code works. I collect all the data in a three-dimensional array and then paste it into a new workbook. The problem is that since each workbook has a different number of rows, I am creating a data array with more rows than needed, so my data has a lot of empty rows right now. I guess I can delete the empty rows at the end. I am new to Excel VBA and new to doing data work, so I was wondering if there was a smarter, more efficient way of constructing my panel.
Dim DataArray As Variant
Sub test()
Dim filespec As Variant, i As Integer
ReDim DataArray(0 To 20000, 0 To 36, 0 To 0)
' Here the user gets to select the files
On Error GoTo EndNow
filespec = Application.GetOpenFilename(FileFilter:="microsoft excel files (*.xls), *.xls", Title:="Get File", MultiSelect:=True)
For i = 1 To UBound(filespec)
ReDim Preserve DataArray(0 To 20000, 0 To 36, 0 To i)
Set wbSource = Workbooks.Open(filespec(i))
Set ws1 = wbSource.Worksheets("Sheet1")
With ws1
'now I store the values in my array
FinalColumn = .Cells(1, .Columns.Count).End(xlToLeft).Column
FinalRow = .Range("B" & .Rows.Count).End(xlUp).Row
For j = 1 To FinalRow
For k = 1 To FinalColumn
DataArray(j, k, i) = .Cells(j, k).Value
Next k
' Now I extract the date data from the file name and store it in the last column of my array.
DataArray(j, FinalColumn + 1, i) = piece(piece(GetFileName(CStr(filespec(i))), "_", 3), ".", 1)
Next j
End With
ActiveWorkbook.Close
Next i
Set wb2 = Application.Workbooks.Add
Set ws2 = wb2.Worksheets("Sheet1")
With ws2
For i = 1 To UBound(DataArray, 3)
FinalRow2 = 20000
FinalColumn2 = 36
For k = 1 To FinalColumn2
' I did this If loop so as to not copy headers every time.
If i = 1 Then
For j = 1 To FinalRow2
.Cells(j, k).Value = DataArray(j, k, i)
Next j
Else
For j = 2 To FinalRow2
.Cells(FinalRow2 * (i - 1) + j, k).Value = DataArray(j, k, i)
Next j
End If
Next k
Next i
wb2.Sheets(1).Name = "FolderDetails Panel Data"
wb2.SaveAs ThisWorkbook.Path & "Folder_Details_Panel_Data" & "_" & Format(Date, "yyyy_mm_dd"), _
FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False
End With
EndNow:
End Sub
' MsgBox GetFileName(filespec(0))
Function GetFileName(filespec As String)
Dim fso As Object
Set fso = CreateObject("Scripting.FileSystemObject")
GetFileName = fso.GetFileName(filespec)
End Function
Function piece(Searchstring As String, Separator As String, IndexNum As Integer) As String
Dim t
t = Split(Searchstring, Separator)
If UBound(t) > 0 Then piece = t(IndexNum - 1)
End Function
To answer your direct question, I would copy the data from each workbook into the merged workbook as each is processed. I see no advantage in collecting all the data into a 3D array.
There are also many other issues with your code. What follows is a refactor of your code, with changes highlighted.
Option Explicit ' <-- Force declaration of all variables (must be first line in module)
Sub Demo()
Dim filespec As Variant
Dim i As Long ' --> Long is preferred over Integer
Dim DataArray As Variant ' <-- no need to be Module scoped
' --> Declare all your variables
Dim j As Long, k As Long
Dim wbSource As Workbook
Dim ws As Worksheet
Dim wbMerged As Workbook
Dim wsMerged As Worksheet
Dim DataHeader As Variant
Dim FinalRow As Long, FinalColumn As Long
Dim sDate As String
Dim rng As Range
' Here the user gets to select the files
On Error GoTo EndNow
filespec = Application.GetOpenFilename(FileFilter:="microsoft excel files (*.xls), *.xls", Title:="Get File", MultiSelect:=True)
If Not IsArray(filespec) Then
' <-- User canceled
Exit Sub
End If
' Speed up processing <--
' -- Comment these out for debugging purposes
'Application.ScreenUpdating = False
'Application.Calculation = xlCalculationManual
' Create Merged Workbook
Set wbMerged = Application.Workbooks.Add
Set wsMerged = wbMerged.Sheets(1)
wsMerged.Name = "FolderDetails Panel Data"
For i = 1 To UBound(filespec)
Set wbSource = Workbooks.Open(filespec(i))
Set ws = wbSource.Worksheets("Sheet1")
With ws
FinalColumn = .Cells(1, .Columns.Count).End(xlToLeft).Column
FinalRow = .Cells(.Rows.Count, 2).End(xlUp).Row
If i = 1 Then
' Get header from first workbook only
DataHeader = .Range(.Cells(1, 1), .Cells(1, FinalColumn)).Value ' <-- Get data header (Range qualified with the source sheet)
ReDim Preserve DataHeader(1 To 1, 1 To UBound(DataHeader, 2) + 1) ' <-- Range.Value arrays are 1 based
k = UBound(DataHeader, 2)
DataHeader(1, k) = "Date" ' <-- Header
End If
' Get all data in one go, excluding header
DataArray = .Range(.Cells(2, 1), .Cells(FinalRow, FinalColumn)).Value ' <-- Array size matches data size
End With
wbSource.Close False
' Add Date to data
sDate = GetDateFromFileName(filespec(i)) '<-- do it once
' resize data array
ReDim Preserve DataArray(1 To UBound(DataArray, 1), 1 To UBound(DataArray, 2) + 1) ' <-- Range.Value arrays are 1 based
' Add date data
For j = 1 To UBound(DataArray, 1)
DataArray(j, k) = sDate
Next j
' Complete processing of each workbook as its opened
With wsMerged
' Add header row from first workbook
If i = 1 Then
.Range(.Cells(1, 1), .Cells(1, UBound(DataArray, 2))).Value = DataHeader
End If
' <-- Add data to end of sheet
' Size the destination range to match the data
Set rng = .Cells(.Rows.Count, 2).End(xlUp).Offset(1, -1)
Set rng = rng.Resize(UBound(DataArray, 1), UBound(DataArray, 2))
rng = DataArray
End With
Next i
' <-- append \ to path
wbMerged.SaveAs ThisWorkbook.Path & "\" & "Folder_Details_Panel_Data" & "_" & Format(Date, "yyyy_mm_dd"), _
FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False
CleanUp:
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Exit Sub
EndNow:
MsgBox "Oh dear"
GoTo CleanUp
End Sub
' Simplified
' <-- Not entirely sure if this will match your file name pattern.
' Please check
' Assumed file name
' Some\Path\Some_Words_YYYYMMDD.xls
Function GetDateFromFileName(Nm As Variant) As String
Dim str As String
str = Mid$(Nm, InStrRev(Nm, "\") + 1)
str = Left$(str, InStrRev(str, ".") - 1)
str = Mid$(str, InStrRev(str, "_") + 1)
GetDateFromFileName = str
End Function
