Find the nth instance of a number in a range of cells - excel

I have a spreadsheet that contains entries for a general ledger account that we need to reconcile.
Ideally every entry starts with a set of numbers but sometimes due to human error or other reasons, the entries might have the number within the string of characters. For example, this is how the entry should look:
33333333 Name 12345
But sometimes it can look like this:
Name 33333333 12345
Or
12345 Name 33333333 Fee
What I'm trying to do is have a search for the number and tell me all the different cells that contain but don't necessarily start or end with that 33333333 number. Right now I have figured out how to get the first instance of the number, return the cell reference and then pull the value from the cell reference. Here is the formula for the cell reference:
=CELL("address",INDEX(C9:C199,MATCH("*"&B1&"*",C9:C199,0)))
Where C9:C199 is the range of cells I'm looking up and B1 is the Lookup cell, which will change depending on what number the Searcher wants to lookup. For example, the result could be $C$19 and then I have the Indirect function giving me the value of that cell reference. But I want to be able to pull the 1st, 2nd, 3rd, etc instance of a cell reference because there could be many, up to 5 or 6 cells that contain the number in some way (not all of which are relevant to what we're trying to reconcile but we want to see all the instances). So how do I get the 2nd instance of this formula, such that if there was a second answer of $C$26, I could also do an indirect reference to that cell and get that value as well.
=CELL("address",INDEX(C9:C199,MATCH("*"&B1&"*",C9:C199,0)))
My ultimate goal is to be able to easily see the following such that visually I can easily see that they offset. I've thrown in a random "fee" word because some users like to throw in random stuff like that:
33333333 Name 12345 Debit $500
Name 33333333 12345 Fee Credit $500
33333333 Name 12345 Other Fee Debit $250
Visually you would see the two $500 fees offset and then the remaining would be the $250 entry. Ultimately I would delete the two $500 entries to have the remaining $250 as the leftover to be offset at a future date. I can do the deleting of entries once I figure out how to get the nth cell reference instance.
Thank you!

You can try a custom function like this if you don't mind messing around with VBA and have Excel 365 or 2021 (dynamic array support is required for this function to be used in a cell). Add the function to a new module in the project and call it with =GetMatches(search range, search term). Also note, I've made this case insensitive with the use of LCase on both the search range and search term.
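Function GetMatches(rng As Range, searchTerm As String) As Variant
    Dim c As Range
    Dim matches As New Collection
    Dim output() As Variant
    Dim iMax As Long
    Dim i As Long
    'Limit the search to the used part of the sheet so whole-column ranges stay fast
    Set rng = Intersect(rng.Worksheet.UsedRange, rng)
    If Not rng Is Nothing Then
        For Each c In rng
            'Case-insensitive "contains" test
            If InStr(1, LCase(c.Value), LCase(searchTerm)) > 0 Then matches.Add c
        Next c
    End If
    iMax = matches.Count
    'Nothing found: return #N/A instead of failing on ReDim
    If iMax = 0 Then
        GetMatches = CVErr(xlErrNA)
        Exit Function
    End If
    'Column 0 holds the cell value, column 1 holds its address
    ReDim output(1 To iMax, 0 To 1)
    For i = 1 To iMax
        output(i, 0) = matches(i).Value
        output(i, 1) = matches(i).Address
    Next i
    GetMatches = output
End Function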

One way to do this, with Power Query:
Name cell B1 as 'tblSearchTerm' (create a ListObject), then name the source table 'Table1' (also a ListObject).
If it is not shown, open the 'Queries & Connections' pane (Data > Queries & Connections).
In 'ResultingSearchTable' right click and choose 'Load to...' then choose a starting cell to load the resulting search table.
If any term is found, the data will be listed as follows:
|RowNumber|NumberInRow|
RowNumber is the DataBodyRange row number of the ListObject.
let
tblSearchTerm = Excel.CurrentWorkbook(){[Name="tblSearchTerm"]}[Content],
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"AddedIndex" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type),
#"DividedColumnByDelimiter" = Table.ExpandListColumn(Table.TransformColumns(AddedIndex, {{"TextLines", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "TextLines"),
#"ChangedType" = Table.TransformColumnTypes(#"DividedColumnByDelimiter",{{"TextLines", type text}}),
#"ReplacedValue" = Table.ReplaceValue(#"ChangedType","$","",Replacer.ReplaceText,{"TextLines"}),
#"ChangedType1" = Table.TransformColumnTypes(#"ReplacedValue",{{"TextLines", type number}}),
RemovedErrors = Table.RemoveRowsWithErrors(ChangedType1, {"TextLines"}),
#"RenamedColumns" = Table.RenameColumns(RemovedErrors,{{"Index", "RowNumber"}, {"TextLines", "NumberInRow"}}),
#"FinalFilter" = Table.NestedJoin(#"RenamedColumns", {"NumberInRow"}, tblSearchTerm, {"SearchTerm"}, "tblSearchTerm", JoinKind.Inner),
#"ResultingSearchTable" = Table.SelectColumns(FinalFilter,{"RowNumber", "NumberInRow"})
in
#"ResultingSearchTable"
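For readers who want to check the logic outside Power Query, here is a minimal Python sketch of the same steps: split each line on spaces, strip the "$", keep only the numeric tokens, and keep the rows whose number equals the search term. The function names and the sample lines are illustrative assumptions, not part of the query above.

```python
def numbers_by_row(lines):
    """Yield (row_number, number) for every numeric token in each line."""
    for row_number, line in enumerate(lines, start=1):
        for token in line.split(" "):
            token = token.replace("$", "")
            try:
                yield row_number, float(token)
            except ValueError:
                # Non-numeric tokens are dropped, much like removing error rows
                continue


def search(lines, term):
    """Return (row_number, number) pairs whose number equals the search term."""
    wanted = float(term)
    return [(row, num) for row, num in numbers_by_row(lines) if num == wanted]


# Sample data taken from the question
lines = [
    "33333333 Name 12345 Debit $500",
    "Name 33333333 12345 Fee Credit $500",
    "12345 Name 44444444 Fee",
]
print(search(lines, "33333333"))
```

Like the query, this matches whole space-separated numbers only, so a number embedded in a longer token would not be found.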

With data in C2:C199 (adjust as needed)
B2: (the address) =ADDRESS(AGGREGATE(15,6,1/ISNUMBER(FIND($B$1, $C$2:$C$199))*ROW($C$2:$C$199), ROW(INDEX($A:$A,1):INDEX($A:$A, COUNTIF(C2:C199,"*"&$B$1&"*")))),3)
A2: (the contents) =INDEX($C:$C,AGGREGATE(15,6,1/ISNUMBER(FIND($B$1, $C$2:$C$199))*ROW($C$2:$C$199), ROW(INDEX($A:$A,1):INDEX($A:$A, COUNTIF(C2:C199,"*"&$B$1&"*")))))
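The idea behind the AGGREGATE formulas is "collect the row numbers of every matching cell, then take the k-th smallest". A rough Python sketch of that idea follows; it is only an illustration of the logic, and the names `matching_rows`, `nth_match` and the sample ledger are assumptions made for the example.

```python
def matching_rows(cells, term, first_row=2):
    """Return the worksheet row numbers of every cell whose text contains term."""
    # FIND is case-sensitive, so a plain substring test mirrors it
    return [first_row + i for i, text in enumerate(cells) if term in str(text)]


def nth_match(cells, term, n, first_row=2):
    """Return (row, text) of the n-th match (1-based), or None if there are fewer."""
    rows = matching_rows(cells, term, first_row)
    if n < 1 or n > len(rows):
        return None
    row = rows[n - 1]
    return row, cells[row - first_row]


# Stand-in for C2:C5
ledger = [
    "33333333 Name 12345 Debit $500",
    "44444444 Other 99999 Debit $75",
    "Name 33333333 12345 Fee Credit $500",
    "33333333 Name 12345 Other Fee Debit $250",
]
print(matching_rows(ledger, "33333333"))  # [2, 4, 5]
print(nth_match(ledger, "33333333", 2))
```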

Related

How to combine multiple columns under one column with no change in column name in Excel?

Currently have an input file of the format shown below
The expected output should
How to achieve this thru Excel?
Here's a VBA solution to your problem. It can handle any number of columns and any number of rows. It would require modification if your columns aren't blank below the significant data shown in your sample.
Sub ColumnsToList()
' 148
Dim Arr As Variant ' data array
Dim Fun As Variant ' output array
Dim C As Long ' loop counter: columns
Dim R As Long ' loop counter: rows
Dim i As Long ' Fun index
With ActiveSheet.UsedRange
Arr = .Value
ReDim Fun(1 To .Cells.Count * 2)
End With
For C = 1 To UBound(Arr, 2)
For R = 2 To UBound(Arr)
If Len(Arr(R, C)) Then
i = i + 2
Fun(i) = Arr(R, C)
Fun(i - 1) = Arr(1, C)
End If
Next R
Next C
If i Then
ReDim Preserve Fun(1 To i)
With ActiveSheet
' this specifies the first empty column on the source sheet for output
' specify another cell like this:-
' Worksheets("Sheet1").cells(1, "A").Resize( ... continue as below
.Cells(1, .UsedRange.Columns.Count + 1).Resize(UBound(Fun)).Value = Application.Transpose(Fun)
End With
End If
End Sub
Edit 30 Dec 2020 In response to the claim that the above code returns #N/A errors from column 1750 onward I used the code below to create a set of data which I believe might be similar to the actual data you use.
Private Sub CreateData()
' 148
Dim C As Long
Dim R As Long
Dim L As Integer
Application.ScreenUpdating = False
For C = 1 To 5000
Cells(1, C).Value = "PIN " & C
For R = 2 To Int(4 * Rnd) + 2
With Cells(R, C)
.Value = .Address(0, 0)
End With
Next R
Next C
Application.ScreenUpdating = True
End Sub
I then ran my above procedure ColumnsToList on the data thus created. I was amazed at the speed with which more than 25000 rows were produced, instantly and without any errors.
If you have Office 365, you can try the formula below. Otherwise, VBA is the better bet.
=IFERROR(FILTERXML("<t><d>"&TEXTJOIN("",TRUE,IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>",""),IF($B$2:$B$6<>"",$B$1&"</d><d>"&$B$2:$B$6&"</d><d>",""),IF($C$2:$C$6<>"",$C$1&"</d><d>"&$C$2:$C$6&"</d><d>",""),IF($D$2:$D$6<>"",$D$1&"</d><d>"&$D$2:$D$6&"</d><d>",""))&"</d></t>","//d["&ROWS($A$1:$A1)&"]"),"")
First portion of this formula is IF where we build a concatenated expression for each column which builds up data in PIN1</d><d>main1.txt</d><d> form:
IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>","")
All 4 columns are joined together by TEXTJOIN formula to build a valid XML data. And then we extract using the FILTERXML formula. This may seem a little complicated to begin with but it is fairly straightforward once you read information in below link:
Excel - Extract substring(s) from string using FILTERXML
Here is a Power Query solution (available in Excel 2010+).
It should adapt to any changes in number of rows or columns.
It should also ignore blank entries in the table.
See the comments in the code for the algorithm, and explore the Applied Steps window to see what happens at each step.
To open the PQ editor, in later versions of Excel
select some cell in the data table
Data => Get & Transform => From Table/Range
Be sure to change the table name in the Source step to match the real name in your workbook
You can then paste the code into the Advanced Editor accessible from the HOME / Query tab of the UI.
M Code
let
//Change table name in next line to match the REAL table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//Unpivot all the columns to generate a two column table
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
//Sort by Attribute (column Header), then by Value (column data)
//May need to create a Custom Sort for the data if it does not sort readily into what you want
#"Sorted Rows" = Table.Sort(#"Unpivoted Columns",{{"Attribute", Order.Ascending}, {"Value", Order.Ascending}}),
//combine the two columns into an alternating List of Header/Data
zipList = List.Zip({Table.Column(#"Sorted Rows","Attribute"),
Table.Column(#"Sorted Rows","Value")}),
#"Converted to Table" = Table.FromList(zipList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
//Expand the list into new rows
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1")
in
#"Expanded Column1"
This is a very fiddly problem to do in Excel efficiently... but I have an inefficient solution for you.
If you know the range you're extracting from, this will provide you with the right answer:
=LET(
InputRange, Sheet1!A1:D7,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
InputRowCount, ROWS(InputRange)-1,
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(
RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
So, what is this doing? I'm using a LET function to make it a bit more readable.
The first steps are to take in the input range, and calculate how many rows and columns we have, and store those in variables (InputRowCount and InputColumnCount, respectively):
=LET(
InputRange, Sheet1!A1:D7,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
InputRowCount, ROWS(InputRange)-1,
The next step will be to design a number of ranges to help us. Excel now handles dynamic ranges, so, we can generate some ranges to help us pick out the right data
I use a lot of MOD functions. MOD is basically the remainder after division, and if you take the MOD of a default SEQUENCE, the result will be a list of numbers that increases from 0 to one less than the divisor, then goes back to 0 and rises again. So, if I were to take MOD(SEQUENCE(Rows * Columns,, 0), Rows), the resulting list of numbers would increase from 0 to Rows - 1, and repeat that Columns times.
That's what these are for:
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0)
RowIndexSkip, MOD(RowSequence,2)
RowSequenceSkip, (RowSequence-RowIndexSkip)/2
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1
RowIndexSkip is basically a list of alternating 0's and 1's, that we're going to use below to track whether we're inserting a header or a data item.
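To see what these helper sequences contain, the arithmetic can be replayed in a few lines of Python. This is only a sketch of the index math, assuming one header row followed by `row_count` data rows; the function name is made up for the example.

```python
def interleave_indices(row_count, column_count):
    """Return (RowIndexSkip, RowIndex, ColumnIndex) for each output slot."""
    result = []
    for seq in range(row_count * column_count * 2):  # RowSequence: 0, 1, 2, ...
        skip = seq % 2                    # RowIndexSkip: 0 = header slot, 1 = data slot
        pair = (seq - skip) // 2          # RowSequenceSkip: one value per header/data pair
        row_index = pair % row_count + 2  # RowIndex: 2 .. row_count + 1
        column_index = (pair - pair % row_count) // row_count + 1  # ColumnIndex: 1 .. column_count
        result.append((skip, row_index, column_index))
    return result


# Two data rows and two columns give eight slots
for slot in interleave_indices(2, 2):
    print(slot)
```

Each data cell gets two consecutive slots: one for its column header and one for its own value, column by column.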
And for the final options, bringing it all together:
UnfilteredResult,
IF(
RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
The above basically creates a listing of every data item, interleaved with the headers, as you requested. However… we still have to apply a filter, because your original data shows that there can be blank spaces.
If, on the other hand, you only know the columns, but you don't know how many rows, this should do the trick:
=LET(
InputRange, Sheet1!A:D,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
This time, you can see that I've slightly changed how it finds the final row.
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,
That MMULT isn't very efficient, but it will at least ensure that this formula won't be thrown off by blank spaces or columns with different amounts of data.
And finally, if you don't know how many columns OR rows there are, but only know the name of the sheet, this is how I would do it:
=LET(
InputSheet, "Sheet1",
InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),
Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),
LongRange, INDIRECT(InputSheet&"!A:"&LastColumnLetters),
CheckIsBlank_FullRange, 1-ISBLANK(LongRange),
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(InputColumnCount,,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(LongRange)),CheckIsBlank_ByRow>0)),
InputRange, INDIRECT(InputSheet&"!A1:"&LastColumnLetters&InputRowCount),
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
This code is even slower and more confusing, because there's so much to be done, but it will check to see that it has every column. It does it with this:
InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),
This code searches the first row of the target worksheet, looking for the last column. It does this with the -1 search mode argument in the XMATCH, which tells Excel to start looking from the right/bottom and work up/left.
Then I've included some code for addressing the relevant columns:
Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),
You could use an INDEX to generate this, but it will actually slow down quite a bit, because Excel would be essentially loading all 1048576 * ColumnCount cells into memory (and this code already does this once... no need to do it twice).
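As a cross-check for the column-letter arithmetic, here is a short Python sketch that converts a column number to its letters. Note that it uses a simple loop (repeated division by 26) rather than the three counters above, so it is a different technique that should give the same letters; the function name is an assumption for the example.

```python
def column_letters(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n > 0:
        # Subtract 1 first because there is no "zero" letter in column names
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print(column_letters(4))      # D
print(column_letters(16384))  # XFD, the last column in a modern worksheet
```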

Excel VBA: Conditional Format of Pivot Table based on Column Label

(Note: This is my first question. Please let me know if I can improve how I ask or explain)
I have created a module that can look through a pivot table on any excel worksheet and apply conditional formatting to each column to create quartile performances.
The pivot table can be any size and have different columns at any point - so I can't explicitly reference a particular column name or caption etc
Additionally, the quartile reporting identifies the top 25% for a particular figure, the next 25, next 25 and bottom 25.
For some figures high is better; for others, low is better.
So I list all the values that are high-to-low in an array, then run a quick "if name in array, rank this way; otherwise, rank this way" function.
All of this works like a dream - until we come to certain fields that result in an ambiguous name issue (and error). It seems that some caption names are similar to the source or database names.
The code reads like this (below).
Any ideas how I can refer dynamically to the column name and identify the column beneath it, please?
Dim myColumnNames As Variant
myColumnNames = VBA.Array("CONTRACTGROSSVOLUME", _
"MigrationVolume") 'etc
' Set colour choices for quartiles
Dim myQuartile1 As Long
Dim myQuartile2 As Long
Dim myQuartile3 As Long
Dim myQuartile4 As Long
myQuartile1 = RGB(146, 208, 80) 'Top Quartile - Green
myQuartile2 = RGB(255, 255, 0) '2nd Quartile - yellow
myQuartile3 = RGB(255, 192, 0) '3rd Quartile - orange
myQuartile4 = RGB(255, 0, 0) 'Bottom Quartile - red
Dim myRankingFactor As Boolean
myRankingFactor = True 'true is low to high / false is high to low ranking
Dim myPivotTable As PivotTable
Dim myPivotTableName As String
For Each myPivotTable In ActiveSheet.PivotTables
myPivotTableName = myPivotTable.Name
Next
Dim myPivotField As PivotField
Dim myPivotSourceName As String
Set myPivotTable = ActiveSheet.PivotTables(myPivotTableName)
For Each myPivotField In myPivotTable.DataFields
'get the source name for the pivot field
myPivotSourceName = myPivotField.Name
'Check if column name is in our list to rank high-to-low
If Not IsError(Application.Match(myPivotSourceName, myColumnNames, False)) Then
'This column name is in our list of names that should be ranked high-to-low (ie. Higher is better)
myRankingFactor = True
Else
'This column name is not in our list and should be ranked low-to-high (ie. lower is better)
myRankingFactor = False
End If
The error then comes on the following line:
myPivotTable.PivotSelect (myPivotSourceName), xlDataOnly, True
I've tried referring to the column with .caption, .name etc, to no avail.
Any ideas on what I need to do to dynamically get the column name, check if it's in the array of names, then refer to that entire column to apply my formatting, please?
Thanks
Additional info:
The value is passing (apparently) correctly.
The column name displayed is "My Volume" and the variable is displaying "My Volume" as the value.
Its source name is "MYVOLUME" (one word), which I've also tried referencing without success.
The error generated is:
Run-time error '1004': An item name is ambiguous. Another field in the
pivottable report may have an item with the same name. Use the syntax
field [item]. For example, if the item is Oranges and the field is
Product, you would use Product[Oranges].
As an addition, I just manually changed one of the column names to a unique, single word ("ABCDEFG") that is not present in the database, object or anywhere in the data output to see if it would be picked up.
Changing that alias/caption value worked fine and didn't error.
Summary:
It's behaving as if the column name is already used elsewhere in the pivot - but it is not.
How do I explicitly refer to the column name/caption/label, but on-the-fly? :)
FIXED!
I ensured that the pivot table column name was passed as a string:
myPivotSourceName = myPivotField.Name
Then rather than referencing the data field with the pivot field object, I referenced the DataRange with the string:
myPivotTable.PivotFields(myPivotSourceName).DataRange.Select
Works perfectly and is completely portable for any pivottable on any sheet with any fields
I could reproduce the error by having a data item with the same name as the one of the data fields. For example, if you have the following table from which you create a pivot:
Product Price
Cola 123
Fanta 456
Sum of Price 789
then by creating a pivot table, you will have these items: Cola, Fanta, 'Sum of Price', and the following field labels: 'Row labels', 'Sum of Price'.
If you try to use 'Sum of Price' in the PivotTable.PivotSelect function, then the error message in the question will appear.
I think the Name parameter in PivotSelect is not clear enough, I did not find the documentation of the naming convention used in it, so I recommend referring to the datafield explicitly:
myPivotTable.PivotFields("Sum of Price").DataRange.Select
Note: There are many stylistic errors and superfluous parts in your code, e.g. the parentheses in the line that causes the error are not required, and the loops just select the last item.
myPivotTable.PivotFields(myPivotSourceName).DataRange.Select
This references the name of the column as a string. Works on all pivot tables I've tested it on.

How to fill data in excel sheet where date lies between a series of date ranges given in another sheet ? Also, a particular column should match

I'm working on a social survey project. Due to discrepancies in the data, I'm stuck at a certain place. The volunteers conducting the survey were given tablets with unique IDs. On different dates, the tablets were used in different cities.
Sheet 1 contains a list of the tablets in use in different cities on different dates, and Sheet 2 contains a list of thousands of responses for which the city names are missing.
Sheet 1
City DeviceID StartDate EndDate
Delhi 25 21-08-2014 26-08-2014
Mumbai 39 14-05-2014 21-05-2014
Chennai 91 17-11-2014 21-11-2014
Bangalore 91 11-10-2014 21-10-2014
Delhi 91 26-05-2015 29-05-2015
Hyderabad 25 23-05-2015 28-05-2015
Sheet 2
S.Id DeviceId SurveyDate City
203 91 15-10-2014 ?
204 25 24-08-2014 ?
I need to somehow fill up the values for the city column in Sheet 2.
I tried using Vlookup but being a beginner to excel, was unable to get things working. I managed to format the string in date columns as date.
But am unsure about how to pursue this further.
From my understanding, Vlookup requires that the date ranges to be continuous, with no missing values in between. It is not so in this case. This is real world data and hence imperfect.
What would be the right approach to this problem ? Can this be done with excel macros ?
I also read up a bit about nested if statements but am confused being a beginner to excel formulas and data manipulation.
There are two ways to do what you want.
The first is to use VBA and create a macro to do the job, BUT you will have to iterate through all your data multiple times (n1*n2 loops in the worst case, where n1 and n2 are the numbers of rows in the two tables), which is really slow if you have a lot of data.
The other way is a little more complicated and involves array formulas, but it is much faster than VBA because it uses Excel's built-in functions (which are already optimized).
So I will use a much simpler example and you can use that as you wish on your data.
I have the following tables:
Table1
city ID start end
A 1 3 5
B 3 4 6
C 3 5 8
Table 2
ID point city
3 5 ?
So we want a formula that completes the second table, where the ID matches exactly and the point lies between start and end. We are going to use MATCH and INDEX to get it.
Here it is:
=INDEX(A$2:A$4;MATCH(1;(B$2:B$4=G2)*(C$2:C$4<=H2)*(D$2:D$4>=H2);0))
First of all, after you type the formula, do not press Enter; press Ctrl+Shift+Enter instead to tell Excel to evaluate it as an array formula, otherwise it will not work.
Now we got that out of the way let me explain what is going on here:
The MATCH does the following:
It looks for the value 1 (TRUE) in the array I build, as an exact match. But how is that array built? Take the first part as an example:
B$2:B$4=G2 gives {1;3;3}=3 --> {FALSE;TRUE;TRUE}
Similarly, the second condition in the MATCH gives {TRUE;TRUE;TRUE}
So now we have (keep in mind that * acts like a logical AND and turns TRUE/FALSE into 1/0):
{FALSE;TRUE;TRUE}*{TRUE;TRUE;TRUE} --> {0;1;1}
and combining this with the third condition, which is also {TRUE;TRUE;TRUE}, gives {0;1;1}
So now we have MATCH(1;{0;1;1};0) --> 2, because MATCH returns the first row that matches the 1, which here is the second row.
So now we just use index to get from another range whatever is on row 2.
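The same multi-condition lookup can be replayed in Python to check the intermediate arrays. This is just a sketch of the logic using the small example tables; the names `table1` and `lookup_city` are made up for the illustration.

```python
# (city, ID, start, end) rows of Table1
table1 = [
    ("A", 1, 3, 5),
    ("B", 3, 4, 6),
    ("C", 3, 5, 8),
]


def lookup_city(rows, wanted_id, point):
    """Return the city of the first row where the ID matches and start <= point <= end."""
    # Multiplying the three tests gives 1 only when all of them are true
    flags = [
        int(row_id == wanted_id) * int(start <= point) * int(end >= point)
        for _, row_id, start, end in rows
    ]
    if 1 not in flags:
        return None  # the worksheet formula would show #N/A here
    # list.index returns the first position holding 1, like MATCH(1; ...; 0)
    return rows[flags.index(1)][0]


print(lookup_city(table1, 3, 5))  # B
```

Rows B and C both satisfy the conditions for ID 3 and point 5, and the first one wins, exactly as with MATCH.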
You can use the above on your own data to get the expected results.
Good luck!
If the deviceId values should match and the survey date should be between the start date and end date, VLookup won't suffice. The following pointers, however, should get you started:
1) Define the date ranges from which the date comparisons should be made.
2) Use an overlap date checking function to determine if the date in question overlaps the start and end dates.
3) Loop through the date ranges and insert in Sheet2 when a match is found, i.e. when the deviceId values match and the date overlaps.
The following function takes as parameters the date to be checked, the start and end date and returns True, if dateVal overlaps the start and end date:
Function dateOverlap(dateVal As String, startDate As String, endDate As String) As Boolean
If DateDiff("d", startDate, dateVal) >= 0 And DateDiff("d", endDate, dateVal) <= 0 Then _
dateOverlap = True
End Function
Example usage
Debug.Print dateOverlap("05-10-2016", "01-10-2016", "10-10-2016") (returns true).
Here we use MEDIAN() as an easy way to test for "in-between".
Sub FillInTheBlanks()
Dim s1 As Worksheet, s2 As Worksheet
Dim N1 As Long, N2 As Long, i As Long, j As Long
Dim rc As Long, DeId As Long, sDate As Date
Dim wf As WorksheetFunction
Set s1 = Sheets("Sheet1")
Set s2 = Sheets("Sheet2")
Set wf = Application.WorksheetFunction
rc = Rows.Count
N1 = s1.Cells(rc, "A").End(xlUp).Row
N2 = s2.Cells(rc, "A").End(xlUp).Row
For i = 2 To N2
DeId = s2.Cells(i, "B").Value
sDate = s2.Cells(i, "C").Value
For j = 2 To N1
If DeId = s1.Cells(j, 2).Value Then
If sDate = wf.Median(sDate, s1.Cells(j, "C").Value, s1.Cells(j, "D").Value) Then
s2.Cells(i, "D").Value = s1.Cells(j, "A").Value
End If
End If
Next j
Next i
End Sub
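The MEDIAN trick works because the median of three values is the middle one, so a date equals the median of itself and the two range ends only when it lies inside the range. A small Python sketch of the same test, using the sample data from the question, is shown below. The function names are assumptions, and unlike the macro (which keeps the last matching row) this sketch returns the first match.

```python
from datetime import date
from statistics import median

# (city, device ID, start date, end date) rows from Sheet 1 in the question
assignments = [
    ("Delhi", 25, date(2014, 8, 21), date(2014, 8, 26)),
    ("Mumbai", 39, date(2014, 5, 14), date(2014, 5, 21)),
    ("Chennai", 91, date(2014, 11, 17), date(2014, 11, 21)),
    ("Bangalore", 91, date(2014, 10, 11), date(2014, 10, 21)),
    ("Delhi", 91, date(2015, 5, 26), date(2015, 5, 29)),
    ("Hyderabad", 25, date(2015, 5, 23), date(2015, 5, 28)),
]


def is_between(value, start, end):
    """True when value lies in [start, end]: it is then the middle of the three."""
    return median([value, start, end]) == value


def city_for(device_id, survey_date):
    """Return the city where the device was in use on survey_date, or None."""
    for city, device, start, end in assignments:
        if device == device_id and is_between(survey_date, start, end):
            return city
    return None


print(city_for(91, date(2014, 10, 15)))  # Bangalore
print(city_for(25, date(2014, 8, 24)))   # Delhi
```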

Prevent Partial Duplicates in Excel

I have a worksheet with products where the people in my office can add new positions. The problem we're running into is that the products have specifications but not everybody puts them in (or inputs them wrong).
Example:
"cool product 14C"
Is there a way to set up the Data Validation option so that it warns me if I enter "very cool product 14B" or anything that contains an already existing string of characters (say, longer than 4), like "cool produKt 14C" but also "good product 15" and so on?
I know that I can prevent 100% matches using COUNTIF and spot words that start/end in the same way using LEFT/RIGHT but I need to spot partial matches within the entries as well.
Thanks a lot!
If you want to cover typos, word wraps, figure permutations etc., maybe a SOUNDEX algorithm would suit your problem. Here's an implementation for Excel ...
So if you insert this as a user defined function, and create a column =SOUNDEX(A1) for each product row, upon entry of a new product name you can filter for all product rows with same SOUNDEX value. You can further automate this by letting user enter the new name into a dialog form first, do the validation, present them a Combo Box dropdown with possible duplicates, etc. etc. etc.
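To give a feel for what a SOUNDEX code looks like, here is a minimal Python sketch of the common American Soundex rules. It is not the Excel implementation referred to above, only an illustration under stated assumptions: non-letters (including digits) are ignored and the whole entry is encoded as one word, which is a coarse comparison for product names.

```python
# Letter groups that share a Soundex digit
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of name, or "" if it has no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip vowels and repeats of the previous code
        if code and code != previous:
            result += code
        # H and W do not separate two letters with the same code; vowels do
        if ch not in "HW":
            previous = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("cool product 14C") == soundex("cool produKt 14C"))  # True
```

Because only the first few consonants contribute, entries that differ late in the name get the same code, so a match is a hint to review, not proof of a duplicate.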
edit:
small function to find parts of strings terminated by blanks in a range (in answer to your comment)
Function FindSplit(Arg As Range, LookRange As Range) As String
Dim LookFor() As String, LookCell As Range
Dim Idx As Long
LookFor = Split(Arg)
FindSplit = ""
For Idx = 0 To UBound(LookFor)
For Each LookCell In LookRange.Cells
If InStr(1, LookCell, LookFor(Idx)) <> 0 Then
If FindSplit <> "" Then FindSplit = FindSplit & ", "
FindSplit = FindSplit & LookFor(Idx) & ":" & LookCell.Row
End If
Next LookCell
Next Idx
If FindSplit = "" Then FindSplit = "Cool entry!"
End Function
This is a bit crude ... but what it does is the following
split a single cell argument in pieces and put it into an array --> split()
process each piece --> For Idx = ...
search another range for strings that contain the piece --> For Each ...
add piece and row number of cell where it was found into a result string
You can enter/copy this as a formula next to each cell input and know immediately if you've done a cool input or not.
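The same steps can be sketched in Python to try the logic on sample data. This is an illustration only: `find_split` and the `existing` dictionary (row number to cell text) are made-up names, and Python's `split()` silently drops empty pieces caused by double spaces, which VBA's `Split` does not.

```python
def find_split(entry, existing):
    """List every piece of entry found inside an existing cell as "piece:row"."""
    hits = []
    for piece in entry.split():                # split the entry on blanks
        for row, text in existing.items():     # scan the lookup range
            if piece in text:                  # case-sensitive "contains" test
                hits.append(f"{piece}:{row}")
    return ", ".join(hits) if hits else "Cool entry!"


# Hypothetical lookup range: row number -> cell text
existing = {3: "asd qwe", 4: "wer tyu"}
print(find_split("asd wer", existing))  # asd:3, wer:4
print(find_split("zzz", existing))      # Cool entry!
```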
Value of cell D8 is [asd:3, wer:4]
Note the use of absolute addressing in the start of lookup range; this way you can copy the formula well down.
edit 17-Mar-2015
further to comment Joanna 17-Mar-2015, if the search argument is part of the range you're scanning, e.g. =FINDSPLIT(C5; C1:C12) you want to make sure that the If Instr(...) doesn't hit if LookCell and LookFor(Idx) are really the same cell as this would create a false positive. So you would rewrite the statement to
...
...
If InStr(1, LookCell, LookFor(Idx)) <> 0 And _
Not (LookCell.Row = Arg.Row And LookCell.Column = Arg.Column) _
Then
hint
Do not use a complete column (e.g. $C:$C) as the second argument as the function tends to become very slow without further precautions

Categorizing bank statements in Excel

I'm wanting to categorize a bank statement from a list of rules in excel. I've tried using a vlookup but I would like to be able to have non exact matches and as far as I know vlookup is not suited to this.
For instance if my statement looked like this and was located in worksheet "Statement"
Date | Transaction description | Amount
7/3/2013 | The Home Depot | $345.00
7/4/2013 | McDonald's #27 | $4.50
And I had a list of rules located in worksheet "Rules"
Rule | Category
The Home Depot | Home improvements
McDonald's * | Fast food
Is there a simple way to add another column using vba to the sheet "Statement" called Category that uses the rules to generate categories for each transaction?
Simple, no. I've done something similar in the past, this is how I did it.
1) Set up a rules page; I've called mine 'Patterns'. These patterns are set up with the A column (from A2 on) being the 'name' and the B column being the regex pattern.
2) Load these into a public variable with the following code (I put a button on the patterns sheet to run this macro and load them into memory.):
Option Explicit
Public Patterns() As String

Sub LoadPatterns()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Set ws = Sheets("Patterns")
    'find the last used row in column A (names start at A2)
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub 'no patterns defined
    ReDim Patterns(lastRow - 2, 1)
    For r = 2 To lastRow
        'column A (name) goes into index 0 of the second dimension
        Patterns(r - 2, 0) = ws.Cells(r, "A").Value
        'column B (regex pattern) goes into index 1 of the second dimension
        Patterns(r - 2, 1) = ws.Cells(r, "B").Value
    Next r
End Sub
3) Create the following UDF, load the VB regex library as a reference in vba (Microsoft VBScript Regular Expressions 5.5, see http://www.macrostash.com/2011/10/08/simple-regular-expression-tutorial-for-excel-vba/) and call it on your transaction description as a formula after running step 2:
Public Function rxBanking(sName As String)
Dim x As Integer
'Get & load Patterns
Dim regex As New VBScript_RegExp_55.RegExp
Dim match
For x = 0 To UBound(Patterns)
regex.Pattern = Patterns(x, 1)
regex.ignorecase=True
match = regex.Test(sName)
If match Then
rxBanking = Patterns(x, 0)
Exit For
Else
rxBanking = "Unknown"
End If
Next
End Function
so for example, after you've loaded a pattern such as:
Category | RegEx pattern
--------------------------------
Home loan | INTEREST[\s]CHARGED
If your transaction data was in cell D1, then you could categorise it using the formula
=rxBanking(D1)
If you reload your pattern, you will need to re-copy your formulas down on the spreadsheet, as it doesn't automatically recalculate.
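The rule matching itself is simple enough to try outside Excel first. Below is a minimal Python sketch of the same "first matching pattern wins, otherwise Unknown" logic. The rule list is an assumption built from the examples in the question and this answer, and Python's `re` module stands in for the VBScript regex library.

```python
import re

# (category, regex pattern) pairs, checked in order; hypothetical rules
PATTERNS = [
    ("Home improvements", r"the home depot"),
    ("Fast food", r"mcdonald's"),
    ("Home loan", r"INTEREST[\s]CHARGED"),
]


def categorize(description, patterns=PATTERNS):
    """Return the category of the first pattern found in description."""
    for category, pattern in patterns:
        # IGNORECASE mirrors regex.ignorecase = True in the UDF
        if re.search(pattern, description, flags=re.IGNORECASE):
            return category
    return "Unknown"


print(categorize("The Home Depot"))   # Home improvements
print(categorize("McDonald's #27"))   # Fast food
print(categorize("Corner Bakery"))    # Unknown
```

Testing patterns this way before loading them into the 'Patterns' sheet can save a few round trips, since rule order matters when more than one pattern matches.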
For help using regex (which even if you are familiar, you might need) I find a great testing ground is http://regexpal.com/
