Related
I have been given a work task where im to find and replace 8 digits numbers with a corresponding new values coming from a 2 column table....basically a vlookup then replace the old value with a new one...
The challenge im facing is.... the 2 column table is 882k rows, and the cells im trying to replace is about 120 million (41,000 rows x 3000 columns)...
I tried running a vba code i found somewhere...
Option Explicit
Sub Replace_Overwrite()
Dim LRow As Long, i As Long
Dim varSearch As Variant
With Sheets("Sheet2")
LRow = .Cells(.Rows.Count, 1).End(xlUp).Row
varSearch = .Range("A2:B" & LRow)
End With
With Sheets("Sheet1").UsedRange
For i = LBound(varSearch) To UBound(varSearch)
.Replace what:=varSearch(i, 1), replacement:=varSearch(i, 2), lookat:=xlWhole
Next
End With
End Sub
I tried using this and it ran it for 8 hours and my work laptop crashed....
I'm not sure anymore if this is still possible with MS Excel alone...
I wonder if anyone can help me with a code that can process it.. i can leave my system open over the weekend if its stable and does work.. it only has 8GB ram btw, running excel 2013...
To speed up things, do as much as possible in memory and minimize the interaction between VBA and Excel (as this makes things really slow).
The following attempt reads the lookup-list into a dictionary and then processes the data column by column.
I did a test, creating 880.000 lookup rows and 40.000 x 100 cells of data. Building the dictionary took less than a minute, processing the columns took 3-4 seconds per column. I added a logic that after every 10 columns, the whole workbook is saved, that increased the processing time but ensures that after a crash you can more or less continue where you left (the yellow color tells you where, just replace the 1 in for col=1 with the column where you want to restart).
I have added some DoEvents, that in theory slows down the process a little bit. Advantage is that you can see the output of the debug.print and the whole Excel process is not displayed as unresponsive in the task manager.
To build the dictionary, I read the complete data into an array at once (if you are not familiar with Dictionaries: You need to add a reference to the Microsoft Scripting Runtime).
Function createDict() As Dictionary
Dim d As New Dictionary
Dim rowCount As Long
Dim list()
Debug.Print Now, "Read data from Lookup sheet"
With ThisWorkbook.Sheets(1)
rowCount = .Cells(.Rows.Count, 1).End(xlUp).row
list = .Range("A1:B" & rowCount).Value
End With
Debug.Print Now, "Build dictionary."
Dim row As Long
For row = 1 To UBound(list)
If Not d.Exists(list(row, 1)) Then d.Add list(row, 1), list(row, 2)
If row Mod 1000 = 0 Then DoEvents
Next row
Set createDict = d
End Function
As said, replacing the data is done column by column. Again, I read the whole column at once into an array, do the replace on this array and then write it back to the sheet.
Sub replaceAll()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim d As Dictionary
Set d = createDict
Dim row As Long, col As Long
Dim rowCount As Long, colCount As Long
With ThisWorkbook.Sheets(2)
rowCount = .Cells(.Rows.Count, 1).End(xlUp).row
colCount = .Cells(1, .Columns.Count).End(xlToLeft).Column
For col = 1 To colCount
Debug.Print Now & "processing col " & col
DoEvents
Dim data
data = .Range(.Cells(1, col), .Cells(rowCount, col))
For row = 1 To rowCount
If d.Exists(data(row, 1)) Then data(row, 1) = d(data(row, 1))
Next row
.Range(.Cells(1, col), .Cells(rowCount, col)) = data
.Cells(1, col).Interior.Color = vbYellow
If col Mod 10 = 0 Then ThisWorkbook.Save
Next
End With
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
One remark: You should consider to use a database for such amount of data.
I am attempting to run a VBA macro that iterates down about 67,000 rows with 100 columns in each row. For each of the cells in these rows, the value is compared against a column with 87 entries in another sheet. There are no errors noted when the code is run but Excel crashes every time. The odd thing is that the code seems to work; I have it set to mark each row in which a match is found and it does so before crashing. I have attempted to run it many times and it has gotten through between 800 and 11,000 rows before crashing, depending on the attempt.
My first suspect was memory overflow due to the volume of calculations but my system shows CPU utilization at 100% and memory usage around 50% while running this code:
Sub Verify()
Dim codes As String
Dim field As Object
For i = 2 To Sheets("DSaudit").Rows.Count
For Each field In Sheets("Dsaudit").Range(Cells(i, 12), Cells(i, 111))
r = 1
While r <= 87
codes = ThisWorkbook.Sheets("287 Denominator CPT").Cells(r, 1).Value
If field = codes Then
Cells(i, 112).Value = "True"
r = 88
Else
r = r + 1
End If
Wend
Next field
i = i + 1
Next i
End Sub
It should also be noted that I am still very new to VBA so it's likely I've made some sort of egregious rookie mistake. Can I make some alterations to this code to avoid a crash or should I scrap it and take a more efficient approach?
When ever possible iterate variant arrays. This limits the number of times vba needs to access the worksheet.
Every time the veil between vba and Excel is pierced cost time. This only pierces that veil 3 times not 9,031,385,088
Sub Verify()
With Sheets("DSaudit")
'Get last row of Data
Dim lastrow As Long
lastrow = .Cells(.Rows.Count, 12).End(xlUp).Row 'if column 12 ends before the last row of data change to column that has them all.
'Load Array with input Values
Dim rng As Variant
rng = .Range(.Cells(2, 12), .Cells(lastrow, 111)).Value
'Create output array
Dim outpt As Variant
ReDim outpt(1 To UBound(rng, 1), 1 To 1)
'Create Match array
Dim mtch As Variant
mtch = Worksheets("287 Denominator CPT").Range("A1:A87").Value
'Loop through first dimension(Row)
Dim i As Long
For i = LBound(rng, 1) To UBound(rng, 1)
'Loop second dimension(Column)
Dim j As Long
For j = LBound(rng, 2) To UBound(rng, 2)
'Loop Match array
Dim k As Long
For k = LBound(mtch, 1) To UBound(mtch, 1)
'If eqaul set value in output and exit the inner loop
If mtch(k, 1) = rng(i, j) Then
outpt(i, 1) = "True"
Exit For
End If
Next k
'If filled true then exit this for
If outpt(i, 1) = "True" Then Exit For
Next j
Next i
'Assign the values to the cells.
.Cells(2, 112).Resize(UBound(outpt, 1), 1).Value = outpt
End With
End Sub
I need some help with a particular macro I am working on.
The macro processes columns of data that have been imported from a pdf file. The import process produces multiple sheets of consistent data, all variables stay in the same columns across multiple sheets. This macro needs to read the three columns of numbers, subtract all cells in two columns one from another, place solved value in an empty column at the end of each row. Then repeat with another combination of two columns. After that, it needs to compare the solved values against a margin value, and generate a new sheet that pulls the whole row of data that the failed margin value is in to a new sheet at the front of the workbook.
This is what I have so far.
I can preform the function on one sheet so far, but don't know how to automate this to the other sheets. Numbers populate columns B, C, and D, Answers should be placed in G, H and any other columns after H are empty.
Private Sub FindAndCreateSheet3dBm()
' Declare variables
Dim eWs As Worksheet
Dim rMargin As Range
Dim myUnion As Range
'Column G: subrtact max and measured values
Worksheets("page 6").Range("G1:G21").Formula = "=(C1-D1)"
'*need to fix sheet reference, make all sheets, add flexible range to
'end of G range
'Column H: subrtact measured and min values
Worksheets("page 6").Range("H1:H21").Formula = "=(D1-B1)"
'*need to fix sheet reference, make all sheets, add flexible range to
'end of H range
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Create the report sheet at first position then name it "Less than 3dBm"
Dim wsReport As Worksheet
Dim rCellwsReport As Range
Set wsReport = ThisWorkbook.Sheets.Add(Before:=ThisWorkbook.Sheets(1))
wsReport.Name = "Less than 3dBm"
Set rCellwsReport = wsReport.Cells(1, 1)
'Create union of columns to search G and H?
Set myUnion = Union(Columns("G"), Columns("H"))
'Check whole Workbook, union G and H for values less than rMargin
NextSheet:
Next
End Sub
Thank you
This should work for your needs. Before I get into my code, I just want to note that usually the response you'll get from the community when asking a 'how do I do this' question is that SO is not a code for me site. We are happy to help fix broken code, but these kinds of problems can generally be solved with Google.
That being said, I wanted a break from the project I was working on, so I threw this together. My hope here is that you can use it as a learning opportunity of how to write better code (and maybe get some kudos from your boss in the process).
Here's the code:
Private Sub FindAndCreateSheet3dBm()
' Ideally, you wouldnt even use something like this. For your purposes
' it will get you going. I highly recommend finding a dynamic way of
' determining the positions of the data. It may be consistent now, but
' in the world of programming, everything changes, especially when
' you think it wont.
Const FIRST_INPUT_COL As Long = 3 ' Column C
Const SECOND_INPUT_COL As Long = 4 ' D
Const THIRD_INPUT_COL As Long = 2 ' B
Const FIRST_OUTPUT_COL As Long = 7 ' G
Const SECOND_OUTPUT_COL As Long = 8 ' H
Dim marginReport As Worksheet
Set marginReport = ThisWorkbook.Sheets.Add(Before:=ThisWorkbook.Sheets(1))
marginReport.Name = "Less than 3dBm"
Dim targetWorksheet As Worksheet
For Each targetWorksheet In ThisWorkbook.Worksheets
If Not targetWorksheet Is marginReport Then
Dim inputData As Variant
inputData = targetWorksheet.UsedRange.value
Dim outputData As Variant
' I resize the array to be the exact same as the first, but to add two additional columns
ReDim outputData(LBound(inputData, 1) To UBound(inputData, 1), LBound(inputData, 2) To UBound(inputData, 2) + 2)
Dim i As Long
Dim j As Long
' Loop through rows
For i = LBound(inputData, 1) To UBound(inputData, 1)
' Loop through columns
For j = LBound(inputData, 2) To UBound(inputData, 2)
' Essentially, just copy the data
outputData(i, j) = inputData(i, j)
Next
Next
Dim offSetValue As Long
If LBound(outputData, 2) = 1 Then offSetValue = -1
' For your purposes I will use hardcoded indices here, but it is far more ideal to manage this in a more flexible manner
For i = LBound(outputData, 1) To UBound(outputData, 1)
outputData(i, FIRST_OUTPUT_COL) = outputData(i, FIRST_INPUT_COL) - outputData(i, SECOND_INPUT_COL)
outputData(i, SECOND_OUTPUT_COL) = outputData(i, FIRST_OUTPUT_COL) - outputData(i, THIRD_INPUT_COL)
If LessThanMargin(outputData(i, SECOND_OUTPUT_COL)) Then
For j = LBound(outputData, 2) To UBound(outputData, 2)
' I start with the output worksheet, and use the 'End(xlUp) to find the first
' non-blank row. I then iterate columnwise and add values to the row beneath it.
' The offSetValue variable ensures I am not skipping any cells if the array
' is 1-Based versus the default 0-Base.
marginReport.Range("A1048576").End(xlUp).Offset(1, j + offSetValue).value = outputData(i, j)
Next
End If
Next
OutputArray outputData, targetWorksheet, "UpdatedData_" & UCase(Replace(targetWorksheet.Name, " ", "_"))
End If
Next
End Sub
' I am just checking for a negative number here, but change this to use the logic you need
Public Function LessThanMargin(ByVal InputValue As Double)
LessThanMargin = InputValue < 0
End Function
Public Sub OutputArray(ByVal InputArray As Variant, ByVal InputWorksheet As Worksheet, ByVal TableName As String)
Dim AddLengthH As Long
Dim AddLengthW As Long
If NumberOfArrayDimensions(InputArray) = 2 Then
If LBound(InputArray, 1) = 0 Then AddLengthH = 1
If LBound(InputArray, 2) = 0 Then AddLengthW = 1
Dim r As Range
If Not InputWorksheet Is Nothing Then
With InputWorksheet
.Cells.Clear
Set r = .Range("A1").Resize(UBound(InputArray, 1) + AddLengthH, UBound(InputArray, 2) + AddLengthW)
r.value = InputArray
.ListObjects.Add(xlSrcRange, r, , xlYes).Name = TableName
With .ListObjects(1).Sort
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
End With
End If
End If
End Sub
I use arrays to solve the problem since they are far more efficient when processing data versus using excel-formulas. While this is unlikely to make a performance boost on a ~200 row project, it makes tremendous differences when you're dealing with a few thousand rows or even more.
I also used constants for the column positions to make it easier for you to adjust these in the future. This comes with a caution though, even constants (for this purpose) are terrible habit so dont get used to them. Learn how to calculate where the data is.
Finally, please (for the love of all that is programmatic) don't just copy and paste this code and never look back. I put this up here for you (and others) to learn from it. Not for it to be some sort of quick fix. I hope you can use it to grow.
I have this macro to delete the entire rows for those that are not "chr9". I have a total of 401,094 rows. It seems to compile fine, but my Excel freezes and I have to Force Quit.
I think it may be an inefficient algorithm or maybe some error in the code?
Sub deleteNonChr9()
Dim lastrow As Long
Dim firstrow As Long
Dim i As Long
lastrow = 401094
firstrow = 0
' Increment bottom of sheet to upwards
For i = lastrow To firstrow Step -1
If (Range("C1").Offset(i, 0) <> "chr9") Then
Range("C1").Offset(i, 0).EntireRow.Delete
End If
Next i
End Sub
The fastest way to conditionally delete rows is to have them all at the bottom of the data block. Sorting them into that position and deleting is faster than individual looping or even compiling a discontiguous Union of rows to delete.
When any group or cells is contiguous (i.e. all together) Excel does not have to work as hard to get rid of them. If they are at the bottom of the Worksheet.UsedRange property, Excel doesn't have to calculate what to fill the empty space with.
Your original code did not allow for a column header text label in row 1 but I will account for that. Modify to suit if you do not have one.
These will turn off the three primary parasites of computing power. Two have already been addressed in the comments and answers, the third Application.EnableEvents property can also make a valid contribution to Sub procedure efficiency whether you have event driven routines or not. See the helper Sub procedure at the bottom for details.
Sample data²: 500K rows of random data in A:Z. ~33% Chr9 in column C:C. Approximately 333K randomly discontiguous rows to delete.
Union and delete
Option Explicit
Sub deleteByUnion()
Dim rw As Long, dels As Range
On Error GoTo bm_Safe_Exit
appTGGL bTGGL:=False 'disable parasitic environment
With Worksheets("Sheet1")
Set dels = .Cells(.Rows.Count, "C").End(xlUp).Offset(1)
For rw = .Cells(.Rows.Count, "C").End(xlUp).Row To 2 Step -1
If LCase$(.Cells(rw, "C").Value2) <> "chr9" Then
Set dels = Union(dels, .Cells(rw, "C"))
End If
Next rw
If Not dels Is Nothing Then
dels.EntireRow.Delete
End If
End With
bm_Safe_Exit:
appTGGL
End Sub
Elapsed time: <It has been 20 minutes... I'll update this when it finishes...>
Bulk load from worksheet to variant array, change, load back, sort and delete
Sub deleteByArrayAndSort()
Dim v As Long, vals As Variant
On Error GoTo bm_Safe_Exit
appTGGL bTGGL:=False 'disable parasitic environment
With Worksheets("Sheet1")
With .Cells(1, 1).CurrentRegion
.EntireRow.Hidden = False
With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
'bulk load column C values
vals = .Columns(3).Value2
'change non-Chr9 values into vbNullStrings
For v = LBound(vals, 1) To UBound(vals, 1)
If LCase$(vals(v, 1)) <> "chr9" Then _
vals(v, 1) = vbNullString
Next v
End With
'dump revised array back into column C
.Cells(2, "C").Resize(UBound(vals, 1), UBound(vals, 2)) = vals
'sort all of blank C's to the bottom
.Cells.Sort Key1:=.Columns(3), Order1:=xlAscending, _
Orientation:=xlTopToBottom, Header:=xlYes
'delete non-Chr9 contiguous rows at bottom of currentregion
.Range(.Cells(.Rows.Count, "C").End(xlUp), .Cells(.Rows.Count, "C")).EntireRow.Delete
End With
.UsedRange 'reset the last_cell property
End With
bm_Safe_Exit:
appTGGL
End Sub
Elapsed time: 11.61 seconds¹ (166,262 rows of data remaining²)
Original code
Elapsed time: <still waiting...>
Summary
There are obvious advantages to working within a variant array as well as deleting contiguous ranges. My sample data had ~66% of the rows to delete so it was a harsh task master. If there were 5 or 20 rows to delete, using an array to parse data for a sort may not be the best solution. You will have to make your own decisions based on your own data.
appTGGL helper Sub procedure
Public Sub appTGGL(Optional bTGGL As Boolean = True)
With Application
.ScreenUpdating = bTGGL
.EnableEvents = bTGGL
.Calculation = IIf(bTGGL, xlCalculationAutomatic, xlCalculationManual)
End With
Debug.Print Timer
End Sub
¹ Environment: old business class laptop with a mobile i5 and 8gbs of DRAM running WIN7 and Office 2013 (version 15.0.4805.1001 MSO 15.0.4815.1000 32-bit) - typical of the low end of the scale for performing this level of procedure.
² Sample data temporarily available at Deleting entire row cannot handle 400,000 rows.xlsb.
Toggling ScreenUpdating and Calculation will help. But as Jeeped stated, applying a custom sort order is the way to go.
Sub deleteNonChr9()
Dim lastrow As Long
Dim firstrow As Long
Dim i As Long
lastrow = 401094
firstrow = 1
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
' Increment bottom of sheet to upwards
For i = lastrow To firstrow Step -1
If (Cells(i, "C") <> "chr9") Then
Rows(i).EntireRow.Delete
End If
Next i
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
Major Progress
The following code for dealing with deleting a very large number of rows is inspired by Ron de Bruin - Excel Automation.
Sub QuickDeleteRows()
Dim Sheet_Data As Worksheet, NewSheet_Data As Worksheet
Dim Sheet_Name As String, ZeroTime As Double, Data As Range
On Error GoTo Error_Handler
SpeedUp True
Set Sheet_Data = Sheets("Test")
Sheet_Name = Sheet_Data.Name
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = Cells(1, Columns.Count).End(xlToLeft).Column
Set Data = Sheet_Data.Range("A1", Cells(LastRow, LastColumn))
Set NewSheet_Data = Sheets.Add(After:=Sheet_Data)
Data.AutoFilter Field:=3, Criteria1:="=Chr9"
Data.Copy
With NewSheet_Data.Cells
.PasteSpecial xlPasteColumnWidths
.PasteSpecial xlPasteAll
.Cells(1, 1).Select
.Cells(1, 1).Copy
End With
Sheet_Data.Delete
NewSheet_Data.Name = Sheet_Name
Safe_Exit:
SpeedUp False
Exit Sub
Error_Handler:
Resume Safe_Exit
End Sub
Sub SpeedUp(SpeedUpOn As Boolean)
With Application
If SpeedUpOn Then
.ScreenUpdating = False
.EnableEvents = False
.Calculation = xlCalculationManual
.DisplayStatusBar = False
.DisplayAlerts = False
Else
.ScreenUpdating = True
.EnableEvents = True
.Calculation = xlCalculationAutomatic
.DisplayStatusBar = True
.DisplayAlerts = True
End If
End With
End Sub
While my old version of code takes time rather long (about 130 seconds on average) for handling sample data provided by Jeeped, but the code above completes less than 4.6 seconds for handling 400,000 rows of sample data on my machine. It's a very significant gain in performance!
System Information of my PC (Very Minimum Computer Configurations for Students)
Operating System: Windows 7 Professional 32-bit (6.1, Build 7601)
Service Pack 1
System Manufacturer: Hewlett-Packard
System Model: HP Pro 3330 MT
Processor: Intel(R) Core(TM) i3-2120 CPU # 3.30GHz (4
CPUs), ~3.3GHz
Memory: 2048MB RAM
Original Answer
I'm aware that this answer is not really what the OP wants, but maybe this answer can be useful for other users and helpful to future users, if not the OP. Please see this answer as the alternative method.
Copy/paste, cut/insert, and delete entire row operations in Excel can take an excessively long time even when doing it in VBA Excel. For copy/paste and cut/insert operations the cause of the slowness is the formatting of the data itself. Memory over-allocation is another cause of those operations. So how do we resolve a situation like this? There are several things you can look for speeding up your code.
Use arrays instead of the range of cells. It's usually considered to be faster than working on the range of cells because it ignores the formatting of the data in cells.
Use .Value2 rather than the default property (.Value) because .Value2 will only treat all formatting numbers (currency, accounting, date, scientific, etc) as Doubles.
Suppose we have 10,000 rows of dummy data like the following dataset:
Instead of deleting entire rows of "non-chr9" data, I simply ignore those data and only consider the "chr9" data by copying all the "chr9" data into an array. How to code to implement such task? First of all, we must make a copy of our data to avoid data loss because we cannot undo all changes to recover the original data after running VBA Excel.
It seems you have done all the preparations needed. Now, we may start coding by first declaring every variable we need to the appropriate type of data.
Dim i As Long, j As Long, k As Long
Dim LastRow As Long, LastColumn As Long, LengthDataChr9 As Long
If you don't declare the variables, your code will run with those variables defaulting to the Variant type. While Variant can be enormously useful, but it can make your code slow. So, make sure each variable is declared with a sensible type. This is good programming practice and considerably faster.
Next, we determine all variables we will use to construct the size of arrays. We will need
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = Cells(1, Columns.Count).End(xlToLeft).Column
LastRow and LastColumn are the row and column number of the last cell with data in one row or one column. Keep in mind, LastRow and LastColumn may not give you the desired row and column number if you are not setting them up properly or using a well-formatted data sheet. What I mean by a "well-formatted data sheet", is a worksheet with data that starts in cell A1 and the number of the rows in column A and columns in row 1 must be equal to the range of all data. In other words, the size of the range of all data must be equal to LastRowxLastColumn.
We also need the length of the array for storing all the "chr9" data. This can be done by counting all the "chr9" data using the following statement:
LengthDataChr9 = Application.CountIf(Columns("C"), "chr9")
We now know the size of the arrays and we can redimension it. Add the following code lines:
ReDim Data(1 To LastRow, 1 To LastColumn)
ReDim DataChr9(1 To LengthDataChr9, 1 To LastColumn)
We use ReDim instead of Dim because we use the dynamic arrays. VBA Excel has automatically declared the arrays defaulting to the Variant type, but they have no size yet. Next, we copy the data into the array Data by using statement
Data = Range("A1", Cells(LastRow, LastColumn)).Value2
We use .Value2 to improve the performance of the code (See speeding up tips point 2 above). Since the data has already copied to the array Data we may clear the worksheet data so we can use it to paste DataChr9.
Rows("1:" & Rows.Count).ClearContents
To clear everything (all contents, formats, etc.) on the worksheet, we may use Sheets("Sheet1").Cells.Clear or Sheet1.Cells.Clear. Next, we want the code to loop through the elements array Data in column 3 by using For ... Next statement because the desired data we're looking for are located there. If the element of array Data contains string "chr9" is found, the code then copying all the elements in the row where "chr9" is located into DataChr9. Again we use For ... Next statement. Here are the lines for implementing those procedures.
For i = 1 To UBound(Data)
If Data(i, 3) = "chr9" Then
j = j + 1
For k = 1 To LastColumn
DataChr9(j, k) = Data(i, k)
Next k
End If
Next i
where j = j + 1 is a counter for looping through the rows of DataChr9. The final step, we paste back all the elements of DataChr9 to the worksheet by adding this line to the code:
Range("A1", Cells(LengthDataChr9, LastColumn)) = DataChr9
and then you're done! Yay, finally!
OK, let's compile all the lines code above. We obtain
Sub DeleteNonChr9()
Dim i As Long, j As Long, k As Long
Dim LastRow As Long, LastColumn As Long, LengthDataChr9 As Long
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = Cells(1, Columns.Count).End(xlToLeft).Column
LengthDataChr9 = Application.CountIf(Columns("C"), "chr9")
ReDim Data(1 To LastRow, 1 To LastColumn)
ReDim DataChr9(1 To LengthDataChr9, 1 To LastColumn)
Data = Range("A1", Cells(LastRow, LastColumn)).Value2
Rows("1:" & Rows.Count).ClearContents
For i = 1 To UBound(Data)
If Data(i, 3) = "chr9" Then
j = j + 1
For k = 1 To LastColumn
DataChr9(j, k) = Data(i, k)
Next k
End If
Next i
Range("A1", Cells(LengthDataChr9, LastColumn)) = DataChr9
End Sub
The performance of the code above is satisfying. It takes less than 0.5 seconds on average to complete the process of extracting all "chr9" data from 10,000 rows dummy data on my machine.
Sub xLator2()
Dim s1 As Worksheet, s2 As Worksheet
Dim N As Long, i As Long
Dim from(), too()
Set s1 = Sheets("Sheet1") ' contains the data
Set s2 = Sheets("Sheet2") ' contains the translation table
s2.Activate
N = Cells(Rows.Count, 1).End(xlUp).Row
ReDim from(1 To N)
ReDim too(1 To N)
For i = 1 To N
from(i) = Cells(i, 1).Value
too(i) = Cells(i, 2).Value
Next i
s1.Activate
' -------------- Modification starts here --------------------------
' Replace from from(i) to __MYREPLACEMENTi__ (where i is the counter)
For i = LBound(from) To UBound(from)
Cells.Replace What:=from(i), Replacement:="__MYREPLACEMENT" + Str(i) + "__"
Next i
' Replace from __MYREPLACEMENTi__ to too(i) (where i is the counter)
For i = LBound(from) To UBound(from)
Cells.Replace What:="__MYREPLACEMENT" + Str(i) + "__", Replacement:=too(i)
Next i
' -------------- Modification ends here --------------------------
End Sub
I am using above code to find and replace multiple words (in "Column A Sheet1" with words in "Column B Sheet 2") in below mentioned sheet:
https://docs.google.com/spreadsheets/d/14ba9pQDjMPWJd4YFpGffhtVcHxml0LdUUVQ0prrOEUY/edit?usp=sharing
However when I apply this in another sheet (as mentioned below) for another data then code fails i.e. I get distorted words in sheet1:
https://docs.google.com/spreadsheets/d/1spvZAzxT1kB1bytCQaNQH7tl1DJSpLITYgW6P5dxbQE/edit?usp=sharing
Please help me out so that I can replace words in "Column A Sheet1" with words in "Column B Sheet2"
Note: Above links have been given of google spreadsheet however I am having problem in Excel 2007 sheet.
I request you to help me by giving whole revised code as I am not good in VBA
I agree with sous2817: the more you solve your own problems, the faster you will develop. However, I think a little advice will help.
You must learn VBA if you are going to use it. Finding, and attempting to use, a piece of code which you do not understand is never going to end well. This is particularly true when the author of this code only knows a little more than you.
Search the web for "Excel VBA Tutorial". There are many to choose from so try a few and complete one that matches your learning style. I prefer books. I visited a large library, reviewed their Excel VBA Primers and borrowed those I liked. After trying them out at home, I bought the one that suited me best.
Where did you get this code? It contains typical beginner mistakes and it does not work even with the first example.
The first word in Sheet1 is "it". Sheet2 indicates "it" is to be replaced by "that". The code correctly replaces "it" by "that" Unfortunately, it replaces all "it"s by "that"s so "with" is translated to "wthath" not "having". Since you complain about the second pair of sheets, I assume you did not notice that mistranslation. Such mistranslations will be must more obvious in the second pair with the first word "the" appearing in "they", "there", "their", "them", "then" and "these".
If you look up the Replace Method, you should see a quick correction to this error. Note: Replace Method not Replace Function.
I will end by giving a couple examples of how a less junior programmer might have coded this routine.
Consider:
s2.Activate
N = Cells(Rows.Count, 1).End(xlUp).Row
ReDim from(1 To N)
ReDim too(1 To N)
For i = 1 To N
from(i) = Cells(i, 1).Value
too(i) = Cells(i, 2).Value
Next i
One of the first things you learn when you start learning VBA is “Do not activate worksheets or select cells”. These are slow commands and even if you use ScreenUpdating = False there will be some screen rewriting. More importantly, your code can become very confusing. The following is better:
With s2
N = .Cells(Rows.Count, 1).End(xlUp).Row
ReDim from(1 To N)
ReDim too(1 To N)
For i = 1 To N
from(i) = .Cells(i, 1).Value
too(i) = .Cells(i, 2).Value
Next i
End With
Note 1: the periods before the three Cells. Cells operates on the active worksheet. .Cells operates on the worksheet specified in the With statement.
Note 2: I have not placed a period before Rows.Count. Rows.Count returns the number of rows in the active worksheet. .Rows.Count returns the number of rows in the specified worksheet. The number of rows depends on the version of Excel and does not vary from worksheet to worksheet so most programmers do not bother with the period.
I would have made the two Worksheets parameters:
Sub xLator2(s1 As Worksheet, s2 As Worksheet)
This makes the subroutine much more flexible. I would probably go further and make the parameters Ranges which would give even more flexibility.
I could continue but I believe that is enough to start with. Good luck and welcome to the joys of programming.
Edit: Tutorial and complete solution
Part 1 - One reason not use Activate
Please study the following blocks of code which show why only the most junior VBA programmers use Activate without a very good reason. I do not want you to get too uptight about saving a few seconds or milliseconds. There are programmers who will spend half-an-hour to optimise a routine that is only run now and again. Only if a routine is to be run hundreds of times a day can that time be justified. I want you to appreciate that Application.ScreenUpdating = False can save a lot of time so you use it automatically. When Application.ScreenUpdating = False is included, the difference between Activate and With is a lot less but enough to justify making With the default choice.
For Count = 1 To 10000 ' This takes 148 seconds
Worksheets("Sheet1").Activate
Worksheets("Sheet2").Activate
Next
Application.ScreenUpdating = False ' This takes 11 seconds
For Count = 1 To 10000
Worksheets("Sheet1").Activate
Worksheets("Sheet2").Activate
Next
Application.ScreenUpdating = False ' This takes .07 seconds
For Count = 1 To 10000
With Worksheets("Sheet1")
End With
With Worksheets("Sheet2")
End With
Next
Application.ScreenUpdating = False ' This takes 12 seconds
For Count = 1 To 10000
Worksheets("Sheet1").Activate
Cells(23, 1).Value = "A"
Worksheets("Sheet2").Activate
Cells(23, 1).Value = "A"
Next
Application.ScreenUpdating = False ' This takes 1.16 seconds
For Count = 1 To 10000
With Worksheets("Sheet1")
.Cells(23, 1).Value = "A"
End With
With Worksheets("Sheet2")
.Cells(23, 1).Value = "A"
End With
Next
Application.ScreenUpdating = False ' This takes 0.96 seconds
Set Wsht1 = Worksheets("Sheet1")
Set Wsht2 = Worksheets("Sheet2")
For Count = 1 To 10000
With Wsht1
.Cells(23, 1).Value = "A"
End With
With Wsht2
.Cells(23, 1).Value = "A"
End With
Next
Part 2 - Copying values from a worksheet to an array
Macro LoadFromTo1() is based on the opening code of your macro which loads the From and Too tables. It is slightly different because my test data is slightly different. It takes six-tenths of a seconds to load the From and Too table
Sub LoadFromTo1()
' Takes about .594 seconds for 50,000 rows * 2 columns
Dim s1 As Worksheet
Dim N As Long, i As Long
Dim From(), too()
Dim InxFromTo As Long
Dim TimeStart As Single
TimeStart = Timer
Set s1 = Sheets("Test1") ' contains the data
s1.Activate
N = Cells(Rows.Count, 3).End(xlUp).Row
ReDim From(1 To N - 1)
ReDim too(1 To N - 1)
For i = 2 To N
From(i - 1) = Cells(i, 3).Value
too(i - 1) = Cells(i, 4).Value
Next i
Debug.Print "M1: " & Timer - TimeStart
For InxFromTo = 1 To 20
Debug.Print Right(" " & InxFromTo, 5) & " " & From(InxFromTo) & " " & too(InxFromTo)
Next
For InxFromTo = UBound(From) - 20 To UBound(From)
Debug.Print Right(" " & InxFromTo, 5) & " " & From(InxFromTo) & " " & too(InxFromTo)
Next
End Sub
Macro LoadFromTo2() (not shown) used With instead of Activate. With only only one Activate or With there was no noticiable change in performance.
Macro LoadFromTo3() uses a different technique for loading the tables. Insteading of importing the cell values one at a time, it imports then in a single statement: CellValue = .Range(.Cells(2, 3), .Cells(RowMax, 4)).Value. This statement will probably seems strange to you now. However, if you study and practice the technique, it will become second nature. I find LoadFromTo3() easier to code and understand than LoadFromTo1() and it is ten times as fast. I have read that importing a range as a unit rather than cell by cell can be fifty times faster although I have never achieved that level of performance improvement.
Sub LoadFromTo3()
' Takes about .0625 seconds for 50,000 rows * 2 columns
Const ColFrom As Long = 1
Const ColTo As Long = 2
Dim s1 As Worksheet
Dim RowMax As Long, RowCrnt As Long
Dim InxFromTo As Long
Dim TimeStart As Single
Dim CellValue As Variant
TimeStart = Timer
Set s1 = Sheets("Test1") ' contains the data
With s1
RowMax = .Cells(Rows.Count, 3).End(xlUp).Row
CellValue = .Range(.Cells(2, 3), .Cells(RowMax, 4)).Value
Debug.Print "M3: " & Timer - TimeStart
End With
For InxFromTo = 1 To 20
Debug.Print Right(" " & InxFromTo, 5) & " " & CellValue(InxFromTo, ColFrom) & _
" " & CellValue(InxFromTo, ColTo)
Next
For InxFromTo = UBound(CellValue, 1) - 20 To UBound(CellValue, 1)
Debug.Print Right(" " & InxFromTo, 5) & " " & CellValue(InxFromTo, ColFrom) & _
" " & CellValue(InxFromTo, ColTo)
Next
End Sub
Part 3 - Analysis of original code
First some definitions. The Target range is the column of words to be translated. The Target table is the the Target range loaded into memory. The FromTo Range is the From and To columns. The FromTo table is the the FromTo range loaded into memory.
This section has taken longer than I originally planned. Initially I only intended to conduct a proper test of my code. With the error in the code provided in an earlier answer and the implication there was another error, I was especially careful. I coded a routine that generated test data of varying sizes. I used that routine to generate the 50,000 row FromTo table I used for the timings in Part 2. I made some minor changes to the translate routine you published and ran it against my test data to give the following durations:
FromTo Target Target Duration
Rows Rows Cols in secs
50,000 20 1 103
50,000 10,000 1 486
10,000 3,000 2 60
The key section of my modified code was:
With RngTgt
For RowFromTo = 1 To UBound(FromToTable, 1)
.Replace What:=FromToTable(RowFromTo, ColFrom), _
Replacement:="__" + Str(RowFromTo), _
LookAt:=xlWhole
Next
For RowFromTo = 1 To UBound(FromToTable, 1)
.Replace What:="__" + Str(RowFromTo), _
Replacement:=FromToTable(RowFromTo, ColTo), _
LookAt:=xlWhole
Next
End With
I use a range so the Target range can be in any worksheet and be multi-column. I did not see the point of having such a large prefix and suffix which must result in a time penalty so I reduced to a prefix of two underscores. I include the LookAt:=xlWhole correction. I use a ToFrom table loaded from range. I have replaced i by the meaningful name RowFromTo to make the code more understandable.
There are no comments saying what this code does and why it does it this way. It is essential to include adequate comments in a macro. Will you remember what this macro does in six or twelve months? What if a colleague needs to amend it?
In your first FromTo table, "for" translates to "on" and "on" translates" to "upon". Without the two pass solution. "for" might be translated to "upon". This is my guess and it seems reasonable but there should have been a comment so I did not need to guess. A new programmer taking responsibility for this macro might be unable to guess and might "improve" the code by removing the "unnecessary" second pass.
The code behind the Replace method will be as efficient as its authors can manage but it will still have to examine every cell in the range. I have improved the situation by searching a specified range rather than the whole worksheet. However, it still has to search that range twice for every row in the FromTo table.
Studying this code I saw an immediate improvement. The first pass replaces the words in the Target table with "__1", "__2", "__n" and so on where 1, 2 and n are indices into the FromTo table. The second pass searches for "__1", "__2" and "__n". A better technique is to extract the 1, 2 and n and use them to access the correct entry in the FromTo table. With this better technique the duration were:
First Second
FromTo Target Target duration duration
Rows Rows Cols in secs in secs
50,000 20 1 103 52
50,000 10,000 1 486 257
10,000 3,000 2 60 32
That is, I halved the duration by changing the code to:
With RngTgt
For RowFromTo = 1 To UBound(ToFromTable, 1)
.Replace What:=ToFromTable(RowFromTo, ColFrom), _
Replacement:="__" + Str(RowFromTo), _
LookAt:=xlWhole
Next
End With
For Each Cell In RngTgt
Test = Mid(Cell.Value, 3)
If IsNumeric(Test) Then
Cell.Value = ToFromTable(Val(Test), ColTo)
End If
Next
However, I believed the basic approach to be wrong. If there are FT entries in the FromTo range and T entries in the Target range then:
With approach 1 there are 2 * FT replaces that require the searching of T cells.
With approach 2 there are FT replaces that require the searching of T cells and then T replaces.
Since the same word may repeat within the Target range, the search of the Target range must examine every cell. But what if we searched the From column of the FromTo rangw for each entry in the Target range? Unless there are a large number of words in the Target range that are missing from the FromTo range, an average FT / 2 entries will be examined before a match is found. Also there is no need for a second pass. We would expect a third approach based on this logical, to half the approach 2 durations.
I recoded the main part of my routine and repeated the tests
First Second Third
FromTo Target Target duration duration duration
Rows Rows Cols in secs in secs in secs
50,000 20 1 103 52 .13
50,000 10,000 1 486 257 61.51
10,000 3,000 2 60 32 7.54
This as a much bigger reduction in duration than I was expecting. I have some guesses as to the reason but I have not investigated further. I believe the final durations are acceptable. I have one further idea but I do not think it is worth the time to imvestigate.
The major lesson of the above is: think about your implementation startegy before implementing it. To me technique 3 is obviously superior to techniques 1 and 2 and I would have started with that technique. Some time spent on initial design can repay itself hansomely.
Part 4 - Final solution
You posted two workbooks each with the Target range in Sheet1 and the FromTo range in Sheet2. I created one workbook with the data from the second workbook copied to Sheet3 and Sheet3.
I amended your macro to call my macro:
Option Explicit
Sub xLator2()
Dim RngTgt As Range
Dim RngFromTo As Range
Dim RowMax As Long
Dim TimeStart As Single
With Worksheets("Sheet1")
RowMax = .Cells(Rows.Count, "A").End(xlUp).Row
Set RngTgt = .Range(.Cells(1, "A"), .Cells(RowMax, "A"))
End With
With Worksheets("Sheet2")
RowMax = .Cells(Rows.Count, "A").End(xlUp).Row
Set RngFromTo = .Range(.Cells(1, "A"), .Cells(RowMax, "B"))
End With
TimeStart = Timer
Call Translate3(RngTgt, RngFromTo)
Debug.Print "Sheet1 technique 3 duration: " & Timer - TimeStart
With Worksheets("Sheet3")
RowMax = .Cells(Rows.Count, "A").End(xlUp).Row
Set RngTgt = .Range(.Cells(1, "A"), .Cells(RowMax, "A"))
End With
With Worksheets("Sheet4")
RowMax = .Cells(Rows.Count, "A").End(xlUp).Row
Set RngFromTo = .Range(.Cells(1, "A"), .Cells(RowMax, "B"))
End With
TimeStart = Timer
Call Translate3(RngTgt, RngFromTo)
Debug.Print "Sheet3 technique 3 duration: " & Timer - TimeStart
End Sub
My macro for technique 3 takes two ranges as its parameters so there can be multiple Target and FromTo ranges which can be placed where ever convenient:
Sub Translate3(ByVal RngTgt As Range, ByVal RngFromTo As Range)
' RngTgt A rectangle containing words to be translated
' RngFromTo Two columns with the left column containing the original values
' for words and the right column containing the values to replace
' the original values.
' Constants numbering the From and To columns within RngFromTo. This makes the
' code easier to understand than if 1 and 2 had been used.
Const ColFrom As Long = 1
Const ColTo As Long = 2
Dim ColTgtCrnt As Long
Dim Test As String
Dim RngFindStart As Range
Dim RngFrom As Range
Dim RngTemp As Range
Dim RowFromTo As Long
Dim RowTgtCrnt As Long
Dim TgtTable As Variant
' Check FromTo range has two columns
If RngFromTo.Columns.Count <> 2 Then
Call MsgBox("ToFrom table must have two columns", vbOKOnly)
Exit Sub
End If
' Load Target range to array
TgtTable = RngTgt.Value
' Set RngFrom to the From column of RngFromTo
Set RngFrom = RngFromTo.Columns(ColFrom)
' Set RngFindStart to first cell of RngFrom
Set RngFindStart = RngFrom.Rows(1)
' Loop for every entry in Target table
For RowTgtCrnt = 1 To UBound(TgtTable, 1)
For ColTgtCrnt = 1 To UBound(TgtTable, 2)
Set RngTemp = RngFrom.Find(What:=TgtTable(RowTgtCrnt, ColTgtCrnt), _
After:=RngFindStart, _
LookAt:=xlWhole)
If Not RngTemp Is Nothing Then
' This target cell is to be translated
' Replace value in Target table with value for To column of FromTo table
TgtTable(RowTgtCrnt, ColTgtCrnt) = RngTemp.Offset(0, ColTo - ColFrom).Value
End If
Next
Next
' Upload updated array back to Target range
RngTgt.Value = TgtTable
End Sub
There is a lot to study here. Take it slowly and look up any statement you do not understand. Come back with questions if necessary but the more you understand on your own, the faster you will develop.