Possible faster VBA lookup with wildcards? - excel

I need to perform quite a lot of lookups with wildcards on a worksheet using a macro (mainly looking up a value and returning the value from another column, though with proper adjustment it can also be just looking for a value with a wildcard, and some lookups serve only as checks that the value exists in the dataset). My data can't be sorted and all the lookups are within a loop A or loops within loop A; wildcards are included mostly for the condition "string begins with...". I often have to find a value in one row and then find the corresponding value in a row N rows below or above.
I have working code, but I wonder if it can be done faster. (In response to the comment about posting it on Code Review, sorry, I cannot comment yet :) Preparing the whole code for posting would take too much time for me, given confidentiality etc., so I prefer to treat it as a general question worked through on this example.)
Example data (I can add more columns, if I need any helper column):
Example Data picture at Imgur
Assume 100 000 rows (max xPagesCount = 1000, typically around 400; all values for a given xPage are in one block). Because there can be many rows with additional data, I can't simply find one value and add fixed offsets to the found row to locate the other values by position.
Example lookups to perform while looping through consecutive xPages (so, for each given xPage):
value in row just below row with "RESTRICTIONS:" text
find name (which is always given with height (column C) = 35)
find RSW number (which can be in several rows depending on page content, but always below name)
find all rows starting with the same four digits as RSW, in two formats: DDDD.LLL.DD and DDDD.DDDDD.DD (L letter, D digit) (I use internal loop here)
check if there is a text "MASTER" (or "MASTER " etc.)
find all values between values "DOCUMENTS:" and "OPTIONS:", which quantity can be from 1 to 50 (I use internal loop here)
I was wondering, what is the fastest way to do such lookups?
What I tried:
using a dictionary on the whole dataset (keys in column A or C, with values from col. D), but as a dictionary can't work with wildcards, I had to add ifs for the case of not finding a key to perform an additional Application.Match lookup... and then realized the code mostly relied on these Match lookups, so I am not sure I even need a dictionary. I also have duplicate values within a page and the dictionary was getting only the first value, regardless of position (for example, several attachments could have value 1). The main use remained dict.exists("MASTER"), but when I removed the dictionary and changed it to IsError(Application.Match(...)) the code worked slightly faster.
Application.Match in whole range, typical example: Application.Match(xPage & "4???.*", sh.Range("A1:A" & LastRow), 0)
in few places I use If xValue Like "????.???.??" Then construction
I have dictionary lookups with ifs redirecting to Application.Match:
xValue = dict(xPage & "ATH.416")
If dict(xPage & "ATH.416") = "" Then xValue = Application.Match("ATH.*", Sheets(1).Range("D:D"), 0)
What I consider, but not sure it's worth the effort:
altering the code that at the beginning of the iteration I find the first and the last row for xPage, and then each later check is performed in this range
'xStartPage and xEndPage hold row numbers, so take the Match result directly
xStartPage = Application.Match(xPage, sh.Range("A1:A" & LastRow), 0)
'or, I guess better:
xStartPage = xEndPage + 1
If xPage = xPagesCount Then
    xEndPage = LastRow
Else
    xEndPage = Application.Match(xPage + 1, sh.Range("A1:A" & LastRow), 0) - 1
End If
'Match returns a position relative to the searched range, so add the offset of its first row
xValue = sh.Range("D" & xStartPage + Application.Match("4???.*", sh.Range("D" & xStartPage & ":D" & xEndPage), 0) - 1).Value
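One way to try the per-page range idea without repeated Application.Match calls is to read the column into a Variant array once and test each row of the block with Like. This is only a sketch, not a measured benchmark: the names FirstLikeInBlock and DemoFirstLikeInBlock are hypothetical, and it assumes that the data starts in row 1 (so array index equals worksheet row) and that the column has at least two rows (so Range.Value returns an array).

```vba
' Hypothetical helper: returns the row of the first entry in rows
' startRow..endRow of colData whose text matches the Like pattern,
' or 0 if no entry matches. colData is the 2-D array from Range.Value.
Function FirstLikeInBlock(colData As Variant, ByVal startRow As Long, _
                          ByVal endRow As Long, ByVal pattern As String) As Long
    Dim r As Long
    For r = startRow To endRow
        If Not IsError(colData(r, 1)) Then          ' skip error cells such as #N/A
            If CStr(colData(r, 1)) Like pattern Then
                FirstLikeInBlock = r
                Exit Function
            End If
        End If
    Next r
    FirstLikeInBlock = 0
End Function

Sub DemoFirstLikeInBlock()
    Dim sh As Worksheet, colD As Variant, LastRow As Long, hit As Long
    Set sh = ActiveSheet
    LastRow = sh.Cells(sh.Rows.Count, "D").End(xlUp).Row
    colD = sh.Range("D1:D" & LastRow).Value         ' one read for the whole column
    ' Like equivalent of the Match pattern "4???.*"
    hit = FirstLikeInBlock(colD, 1, LastRow, "4???.*")
    If hit > 0 Then Debug.Print "First match in row " & hit
End Sub
```

In the real loop the 1 and LastRow arguments would be replaced by xStartPage and xEndPage. Note that Like is case sensitive unless the module uses Option Compare Text, whereas Application.Match is not.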

Related

Comparing Strings Producing Unexpected Results

I have a list of data and I created a form to enter new data to be added to the list. Upon the click of a button it will take the information (name and email address) from the form and add it to the corresponding sheets in alphabetical order. There are linked cells involved so I can't just add this to the bottom and sort. Instead, I have it searching the last name cell in the correct sheets to insert a row into the correct location.
This was working as expected for the most part until I came along a possibly unique situation that I can't figure out.
Basically, I have an if statement checking to see if the name is a duplicate and afterwards checking to see if the new name should be inserted.
For i = 2 To lastrow
''^^IF STATEMENT CHECKING FOR DUPLICATE^^''
'''''''''''''''''''''''''''''''''''''''''''
'''vvIF STATEMENT CHECKING TO ADD DATAvv'''
ElseIf StrComp(lastname, searchl) = 1 And StrComp(lastname, searchl2) = -1 Then
Sheets("Master List").Range("A" & i).Offset(1).EntireRow.Insert (xlDown)
Sheets("Master List").Range("A" & i + 1).Value = firstname
Sheets("Master List").Range("B" & i + 1).Value = lastname
Sheets("Master List").Range("C" & i + 1).Value = fullname
Variables searchl and searchl2 are the last names from search rows i and i + 1, respectively.
My problem is that when I tried to add the last name "Kralik" it tried to insert the data between the last names "Day" and "de Castro"
Originally, I tried comparing the names using the line of code below:
ElseIf lastname > searchl And lastname < searchl2 Then
This executed the exact same way as the code outlined above. I then inserted a break point and decided to use the StrComp method for troubleshooting. Comparing "Kralik" to "Day" produced results expected but the problem occurs when comparing "Kralik" to "de Castro". For some reason, the code thinks "Kralik" is less than "de Castro" and enters the if statement to insert the data at that location. Even more head scratching for me is that I opened a new workbook and quickly typed "Kralik" into A1, "de Castro" into A2 and the formula "=A1>A2" into A3. The formula gave a result TRUE which is what I would have expected from VBA as well.
EDIT: After more tests, I think it must have something to do with the capitalization of "Kralik" vs. "de Castro". My code works as expected as long as the "k" in "Kralik" is uncapitalized. I will use the UCase method on my variables and come back with the results.
EDIT 2: Using UCase works as well. Outlined by GSerg's answer below as to why my original method was not working.
Excel formulas use case insensitive comparisons by default.
VBA uses case sensitive comparisons by default.
If you want case insensitive comparisons, either put
Option Compare Text
at the beginning of the code module to make all text comparisons in that code module case insensitive by default, or request a comparison type in each specific comparison:
ElseIf StrComp(lastname, searchl, vbTextCompare) = 1 And StrComp(lastname, searchl2, vbTextCompare) = -1 Then
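The difference between the two comparison modes can be seen with the names from the question. This is a small illustrative sketch; the procedure name DemoCompareModes is made up for the example.

```vba
Sub DemoCompareModes()
    ' Binary comparison: uppercase "K" (code 75) sorts before lowercase "d" (code 100)
    Debug.Print StrComp("Kralik", "de Castro", vbBinaryCompare)   ' -1
    ' Text comparison ignores case, so "K" sorts after "d"
    Debug.Print StrComp("Kralik", "de Castro", vbTextCompare)     ' 1
End Sub
```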
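On top of that, you should be using binary search in your particular case to find the position to insert. MATCH with match_type = 1 returns the position of the largest value that is less than or equal to the lookup value in a list sorted in ascending order, which is the entry after which the new value should go.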
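A minimal sketch of the MATCH approach, assuming the layout implied by the question (last names in column B of "Master List", header in row 1, data sorted ascending). The procedure name DemoInsertPosition and the hard-coded example name are illustrative only, and the duplicate check would still be done separately.

```vba
Sub DemoInsertPosition()
    Dim ws As Worksheet, lastRow As Long, pos As Variant, lastname As String
    Set ws = Sheets("Master List")
    lastname = "Kralik"                      ' example value from the question
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    ' match_type 1: position of the largest value <= lastname (case-insensitive)
    pos = Application.Match(lastname, ws.Range("B2:B" & lastRow), 1)
    If IsError(pos) Then
        ' lastname sorts before every existing entry: insert at the top of the data
        Debug.Print "Insert at row 2"
    Else
        ' pos is relative to B2, so the matched row is pos + 1; insert just below it
        Debug.Print "Insert at row " & (pos + 2)
    End If
End Sub
```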

Prevent Excel converting number range to Date format

I'm using Excel to translate data from one system which outputs .csv files to another which can read in .xls files. One column is sizes, which frequently includes terms like 4-6, 6-8, etc. When the csv is opened in Excel, these are automatically(?) converted to dates (6-Apr, 8-Jun, etc.) I can use
.NumberFormat = "m-d"
to get the data to look like it should, but of course it's still a date value, and any operations I try to perform with it convert back to date. Is there any way to convert it to text in exactly that format? If I declare the .NumberFormat as "#", it just changes the value to a serial. I use the size, along with several other columns to create one long product ID code for inventory purposes.... when the size changes to a date value, I end up missing some of the data.
I found a solution to this problem that runs entirely from VBA - i.e., doesn't depend on special import conditions. I would appreciate any pointers or recommendations, especially if this has the potential to create unforeseen errors:
Dim Size As String
For i = 2 To LastRow
    If IsDate(Range("H" & i).Value) Then
        Size = Month(Range("H" & i)) & "-" & Day(Range("H" & i))
        Range("H" & i).NumberFormat = "#"
        Range("H" & i) = Size
    End If
Next i
The column I'm testing includes size values that are string (XL, SM, 12T, etc), along with values that get converted to dates (4-6, 6-8). This seems to convert it back to a string value, and then properly treats it as a string value on all future uses (concatenations and logical tests)
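A possible variant of the loop above sets the cell to the Text format ("@") before writing the rebuilt string, so that Excel stores the entry as typed instead of parsing it again. This is a sketch under the same assumptions as the original (sizes in column H, data from row 2); the procedure name RestoreSizeText and the way LastRow is found are illustrative.

```vba
Sub RestoreSizeText()
    Dim i As Long, LastRow As Long, sizeText As String
    LastRow = Cells(Rows.Count, "H").End(xlUp).Row
    For i = 2 To LastRow
        With Range("H" & i)
            If IsDate(.Value) Then
                ' "4-6" was read as 6-Apr, so month = 4 and day = 6
                sizeText = Month(.Value) & "-" & Day(.Value)
                .NumberFormat = "@"   ' Text format, set before writing the value
                .Value = sizeText
            End If
        End With
    Next i
End Sub
```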

Macro to Group Rows Throughout Excel Document

I have looked around and I feel like I am going crazy for not understanding how to do this, or what to do. It seems simple and yet I cannot figure out the best method.
I have an excel document that has 8 rows of data and that is supported by individual data from individuals that is 16 rows long. In total, there are 600 individuals in the dataset.
I was trying to locate a macro that would simply allow me to group every 16 rows in my excel sheet together. Whatever I have tried though, has not worked.
I am using Microsoft Excel 2016 for Mac.
Range("1:16").Select
Selection.Rows.Group
Range("17:32").Select
Selection.Rows.Group
Repeat!
If there is a predictable data pattern (i.e. always exactly 16 rows) you might wish to put this inside a loop, where Range.Select is offset a further 16 rows each time.
For example:
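Dim i As Long, j As Long
'Replace with your own upper limit (here: 600 individuals x 16 rows)
Const UPPER_LIMIT As Long = 9600
i = 1
j = 16
While i < UPPER_LIMIT
    Range(i & ":" & j).Select
    Selection.Rows.Group
    i = i + 16
    j = j + 16
Wend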
Note that if you group contiguous rows without a break, your groupings will automatically combine. You will need to play around with the line Range(i & ":" & j).Select to either use i+1 or j-1 depending on what you want as your display row on the grouping.
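As a sketch of the i+1 adjustment mentioned above, the following loop leaves the first row of each 16-row block ungrouped so that neighbouring groups stay separate. It groups the rows directly rather than through Select. The procedure name GroupEvery16Rows is hypothetical, and the last used row is taken from column A as an assumption about the sheet.

```vba
Sub GroupEvery16Rows()
    Const BLOCK_SIZE As Long = 16
    Dim lastRow As Long, i As Long
    lastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
    For i = 1 To lastRow Step BLOCK_SIZE
        ' Leave row i visible as the display row and group the 15 rows below it
        ActiveSheet.Rows((i + 1) & ":" & (i + BLOCK_SIZE - 1)).Group
    Next i
End Sub
```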

Prevent Partial Duplicates in Excel

I have a worksheet with products where the people in my office can add new positions. The problem we're running into is that the products have specifications but not everybody puts them in (or inputs them wrong).
Example:
"cool product 14C"
Is there a way to configure the Data Validation option so that it warns me when I enter "very cool product 14B" or anything that contains an already existing string of characters (say, longer than 4), like "cool produKt 14C" but also "good product 15" and so on?
I know that I can prevent 100% matches using COUNTIF and spot words that start/end in the same way using LEFT/RIGHT but I need to spot partial matches within the entries as well.
Thanks a lot!
If you want to cover typos, word wraps, figure permutations etc., maybe a SOUNDEX algorithm would suit your problem. Here's an implementation for Excel ...
So if you insert this as a user defined function, and create a column =SOUNDEX(A1) for each product row, upon entry of a new product name you can filter for all product rows with same SOUNDEX value. You can further automate this by letting user enter the new name into a dialog form first, do the validation, present them a Combo Box dropdown with possible duplicates, etc. etc. etc.
edit:
small function to find parts of strings terminated by blanks in a range (in answer to your comment)
Function FindSplit(Arg As Range, LookRange As Range) As String
    Dim LookFor() As String, LookCell As Range
    Dim Idx As Long
    LookFor = Split(Arg)
    FindSplit = ""
    For Idx = 0 To UBound(LookFor)
        For Each LookCell In LookRange.Cells
            If InStr(1, LookCell, LookFor(Idx)) <> 0 Then
                If FindSplit <> "" Then FindSplit = FindSplit & ", "
                FindSplit = FindSplit & LookFor(Idx) & ":" & LookCell.Row
            End If
        Next LookCell
    Next Idx
    If FindSplit = "" Then FindSplit = "Cool entry!"
End Function
This is a bit crude ... but what it does is the following
split a single cell argument in pieces and put it into an array --> split()
process each piece --> For Idx = ...
search another range for strings that contain the piece --> For Each ...
add piece and row number of cell where it was found into a result string
You can enter/copy this as a formula next to each cell input and know immediately if you've done a cool input or not.
Value of cell D8 is [asd:3, wer:4]
Note the use of absolute addressing in the start of lookup range; this way you can copy the formula well down.
edit 17-Mar-2015
further to comment Joanna 17-Mar-2015, if the search argument is part of the range you're scanning, e.g. =FINDSPLIT(C5; C1:C12) you want to make sure that the If Instr(...) doesn't hit if LookCell and LookFor(Idx) are really the same cell as this would create a false positive. So you would rewrite the statement to
...
...
If InStr(1, LookCell, LookFor(Idx)) <> 0 And _
Not (LookCell.Row = Arg.Row And LookCell.Column = Arg.Column) _
Then
hint
Do not use a complete column (e.g. $C:$C) as the second argument as the function tends to become very slow without further precautions

Excel: create sortable compound ID

All, I asked a question "Excel VBA: Sort, then Copy and Paste" and received two excellent answers. However, because I failed to provide sufficient user requirements, they won't work: I asked for a fix to the existing solution I created, instead of specifying the actual business need and seeing if anyone has a better way.
(sigh) Here goes:
My boss asked me to create a ss to log issues. He wants a compound ID that concatenates the "Assigned Date" with a number that indicates what number issue it is for that day only. A new day, the count must restart at 1. E.g.:
Assigned Date & Issue Count = Concatenated ID
5/11/2011     & 1           = 5112011-1
5/11/2011     & 2           = 5112011-2
5/11/2011     & 3           = 5112011-3
5/12/2011     & 1           = 5122011-1
I solved this with a hidden column C that calculates =IF(D2<>D1,1,C1+1), thus calculating the Issue Count by incrementing the previous issue count if the assigned date in column D is the same as the previous date, and starting over at 1 when the date changes. Another column concatenates the assigned date and the issue count, and I have my issue ID.
Quick, easy, elegant, in, out, and done. Right? But when I delivered the ss, he pointed out that if you (that is, he) sorts any part of the spreadsheet, the issue ID goes out of sequence. Of course---each formula isn't referencing the previous date in sequence if the rows are sorted out of Assigned Date order.
My immediate thought, which prompted my previous question, was to first re-sort the Assigned Date order correctly, then copy and paste the value of the calculated Issue Count to lock it in, and thus preserve the concatenated ID.
The only other way I can see to do this (in VBA, natch) is to:
evaluate all the dates in the Assigned Date column
evaluate all the numbers in the Issue Count column
calculate the latest sequential Issue Count for a new item assigned on a given Assigned Date
Assign that sequential Issue Count to the new item
It'd be nice to then place the cursor into the next cell that the user would ordinarily go to, which would be the one right adjacent to the just-entered Assigned Date; however, that isn't necessary
That would avoid the need to re-sort the physical ss. However, besides a hazy guess that this would involve VLOOKUP, I got nothing. I couldn't find anything through searching.
Can anyone help? Or suggest a place to go? Thanks!!!
Sounds like you just want to automate a Paste Special action. The following replaces the formulas in a1:a100 with their calculated values:
Set src = ActiveSheet.Range("a1:a100")
src.Copy
src.Select
Selection.PasteSpecial Paste:=xlPasteValues, _
Operation:=xlNone, _
SkipBlanks:=False, _
Transpose:=False
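The same replacement of formulas by their values can also be written without the clipboard or the selection. This is a short sketch using the same a1:a100 range; the procedure name FormulasToValues is made up for the example.

```vba
Sub FormulasToValues()
    Dim src As Range
    Set src = ActiveSheet.Range("a1:a100")
    src.Value = src.Value   ' overwrite each formula with its current result
End Sub
```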
I think the formula =IF(D2<>D1,1,C1+1) could be improved as this relies on dates being in order. The following will preserve the count for any order that is sorted
Assume
      ColA           ColB                   ColC
Row1  Assigned_Date  Issue Count            Concatenate
Row2  05/11/2011     =COUNTIF($A$1:A2,A2)   =TEXT(A2,"ddmmyyyy")&"-"&B2
Row3  05/11/2011     =COUNTIF($A$1:A3,A3)   =TEXT(A3,"ddmmyyyy")&"-"&B3
Row4  05/12/2011     =COUNTIF($A$1:A4,A4)   =TEXT(A4,"ddmmyyyy")&"-"&B4
Row5  05/11/2011     =COUNTIF($A$1:A5,A5)   =TEXT(A5,"ddmmyyyy")&"-"&B5
Essentially, enter the B2 and C2 formulae and drag down. You might need to swap ddmmyyyy to mmddyyyy, as we put days first rather than months :)
Also, note the locking of the first part of the range only using $ - $A$1:Ax
This works perfectly for your current question but does not work if the Issue Count is assigned in time order per date.
How about using a procedure? Just click a button to add the next entry.
I've assumed that the entries will be given today's date and that the sheet layout is:
Rows: 1 = Title / 2 = left blank / 3 = Headings of the data block
Columns: A = Date / B = Issue Count / C = Combined ID / D etc = other data
Sub AddEntry()
    Dim iDayRef As Long, iNumRows As Long, n As Long
    With Range("A3")
        iNumRows = .CurrentRegion.Rows.Count
        For n = 2 To iNumRows
            If .Cells(n, 1).Value = Date Then
                If .Cells(n, 2).Value > iDayRef Then iDayRef = .Cells(n, 2).Value
            End If
        Next
        .Cells(iNumRows + 1, 1).Value = Date
        .Cells(iNumRows + 1, 2).Value = iDayRef + 1
        .Cells(iNumRows + 1, 3).Value = Format(Date, "mm/dd/yyyy") & " - " & iDayRef + 1
        .Cells(iNumRows + 1, 4).Select
    End With
End Sub
And do you really need three columns for Date, Count, and Combined ID? If you went with a
yyyy/mm/dd - xx
ID format, one column could replace all three, and you could easily sort on it.
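A single-column ID in that format could be built as in the sketch below. The function name MakeIssueId is hypothetical, and padding the count to two digits is an assumption drawn from the "xx" in the format (so plain text sorting stays correct only up to 99 issues per day).

```vba
Function MakeIssueId(ByVal assignedDate As Date, ByVal issueCount As Long) As String
    ' "\/" forces a literal slash regardless of the regional date separator
    MakeIssueId = Format(assignedDate, "yyyy\/mm\/dd") & " - " & Format(issueCount, "00")
End Function
```

For example, MakeIssueId(DateSerial(2011, 5, 11), 1) returns "2011/05/11 - 01".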

Resources