I am trying to achieve a fairly simple task in Excel, but I do not get the results that I want. I have a simple schedule in which I assign one of a pool of coaches to a series of matches by filling out a simple table. Here is a scaled-down version:
Match | John | Pete | Chris | Selected coach
-------|------|------|-------|-------
1 | X | | | John
2 | | X | | Pete
3 | A | | X | Chris
4 | X | | A | Chris (!)
5 | | X | A | Pete
Legend: X: will coach; A: is available.
I used the table to register availability and then changed one A to an X in each row, to select the person that will actually coach the match.
For an overview, I decided to add a column in which the selected coach would appear. For row 2 I used the formula =LOOKUP("X"; B2:D2; B$1:D$1) and filled it down, so that in each copy the row references point to the row the formula is in.
To my surprise, match 4 was assigned to Chris, even though John has an X and Chris only an A.
When I read Microsoft's documentation on LOOKUP, I noticed a few things:
1. LOOKUP has a vector form and an array form. Microsoft recommends using HLOOKUP instead of the array form, but I use the vector form. I do not think HLOOKUP is useful for me anyway, as it only looks for values in the first row of the specified array, whereas my first row contains the values to be returned.
2. The description of lookup_vector reads: "A range that contains only one row or one column.", but also: "Important: The values in lookup_vector must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z, FALSE, TRUE; otherwise, LOOKUP might not return the correct value. Uppercase and lowercase text are equivalent."
I think that 2. is what causes the issue. I am not sure how to sort a range parameter like A2:A4. Microsoft documents a SORT function, but it is still in beta. Also, I think that sorting the search row would break its correspondence with the header row anyway.
The workaround I found is changing my codes to A: assigned and B: backup, in which A and B are chosen to be alphabetically ascending. If I change the formulas to use lookup value "A", this gives me:
Match | John | Pete | Chris | Selected coach
-------|------|------|-------|-------
1 | A | | | John
2 | | A | | Pete
3 | B | | A | Chris
4 | A | | B | John
5 | | A | B | Pete
which is the result I want.
Can anyone shed some light on this completely counter-intuitive behavior and/or describe alternative ways to achieve this?
Notes:
I make sure to put only one X in each row.
I have seen this in Microsoft Excel for Office 365 MSO (16.0.11328.20362) 64-bit and in Microsoft Excel 2016 MSO (16.0.12130.20232) 32-bit.
LOOKUP is doing exactly what it was designed to do, per the documentation. You should use INDEX and MATCH:
=INDEX($B$1:$D$1;MATCH("X";B2:D2;0))
The final 0 argument to MATCH means that you are looking for an exact match, so the data doesn't need to be sorted.
LOOKUP does a binary search, and that is why it returns the column with the A (Chris). We had a long discussion about this on the Chandoo.org forum, which you can read here:
https://chandoo.org/forum/threads/how-vlookup-works.18378/
And here is another discussion: http://www.ashishmathur.com/return-an-exact-value-via-the-lookup-function/
Basically, it repeatedly halves the range it is searching, moving toward a value that is equal to or lower than the lookup value, which is why it needs the data to be sorted.
You can still use LOOKUP with a small tweak, like below:
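To see why sorting matters, here is a minimal Python sketch (not Excel's actual implementation) that contrasts an exact-match linear scan, which is what MATCH(...;0) does, with an approximate-match binary search that assumes ascending data, which is the behaviour the LOOKUP documentation describes. It uses Python's `bisect.bisect_right` as the binary search and, as a simplifying assumption, leaves Pete's blank cell out of the row for match 4.

```python
from bisect import bisect_right


def exact_match(row, key):
    """Linear scan for the first cell equal to key, like MATCH(key; row; 0)."""
    for i, value in enumerate(row):
        if value == key:
            return i
    return None  # Excel would show #N/A


def approximate_match(row, key):
    """Binary search for the last cell <= key; only reliable if row is sorted ascending."""
    i = bisect_right(row, key) - 1
    return i if i >= 0 else None


coaches = ["John", "Chris"]  # Pete's blank cell is left out of this sketch
row = ["X", "A"]             # match 4: John has X, Chris has A (not sorted)

print(coaches[exact_match(row, "X")])        # John
print(coaches[approximate_match(row, "X")])  # Chris
```

With the A/B codes from the question, the same row becomes ["A", "B"], which happens to be sorted, so both functions agree on John. That explains why the workaround appears to work, but nothing guarantees it for every arrangement of codes.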
=LOOKUP(2;1/(B2:D2="X");B$1:D$1)
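The tweak works because 1/(B2:D2="X") produces 1 where the cell is X and a #DIV/0! error everywhere else. LOOKUP skips error values, so the only values it considers are 1s, which are trivially sorted, and looking up 2 (larger than any of them) lands on the last 1. Below is a rough Python model of that reasoning, with `None` standing in for the error values; the function name `lookup_2_trick` is hypothetical.

```python
def lookup_2_trick(row, headers, key="X"):
    """Model of =LOOKUP(2; 1/(row=key); headers)."""
    # 1/(row=key): 1 where the cell equals key, #DIV/0! (modelled as None) elsewhere.
    ratios = [1 if cell == key else None for cell in row]
    # LOOKUP skips the errors; the remaining values are all 1 (so already sorted),
    # and searching for 2 ends on the last position that holds a 1.
    positions = [i for i, value in enumerate(ratios) if value is not None]
    if not positions:
        return None  # Excel would show #N/A
    return headers[positions[-1]]


coaches = ["John", "Pete", "Chris"]
print(lookup_2_trick(["X", "", "A"], coaches))  # John (match 4)
```

One difference from INDEX/MATCH: if a row ever contains more than one X, this form returns the last one, while MATCH(...;0) returns the first.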
I am in the process of matching a number of data sets. These are passenger arrivals from several different systems, and I need to match them as well as possible. About 2% of the records in each set are unique; the rest are common.
I am not trying to merge, deduplicate, or standardise the data, as is normally the case with fuzzy lookup. I am trying to find the quality, value and location of the closest match. Other than the common fields, the data sets have a whole bunch of unique fields. Essentially, I am trying to find a link between them so that I can create reports from the different data sets, each of which has information I need. Each set has over 100k rows.
I have combined the common fields into a single string to simplify the calculations. The fields are arrival date (as an Excel date number), DOB, passport number and full name, e.g. "44250 | 15-JAN-80 | UK1234567 | JOHN AMITH"
Starting with Table 1, I want to add three columns: the nearest match as text; the ID associated with that value in the second table (or its row number, so I can INDEX/MATCH the data); and finally the percentage similarity, as in the example.
I have found functions that find the nearest match, but not its location or associated ID. Any ideas on how the below would work, or any other ideas?
Made-up values:
TABLE 1 REF | TABLE 1 ID
"44054 | 29-Aug-1960 | CL-F2944458 | JOHN THOMSON" | ID1-010739
"44054 | 09-Dec-1989 | LM389990 | EDWARD SMITH" | ID1-010737
"44054 | 09-Dec-1991 | LL556699 | RICHARD FREEMAN" | ID1-010738
"44054 | 06-May-1960 | LK9915782 | JEAN HAMILTON" | ID1-010740
"44054 | 05-Nov-1954 | US 9910505 | BEN JONES" | ID1-010753
TABLE 2 REF | TABLE 2 ID
"44054 | 05-Nov-1954 | US 9910505 | BENJAMIN JONES" | ID2-0001
"44059 | 19-Aug-1960 | CL-F2944458 | JOHN THOMSON" | ID2-0002
"44054 | 09-Dec-1991 | LL556666 | RICHARD FREEMAN" | ID2-0003
"44054 | 06-May-1960 | LK9915782 | JEAN HAMILTON" | ID2-0004
"44054 | 09-Nov-1989 | AU-LM389990 | EDWARD SMTH" | ID2-0005
Levenshtein Distance in VBA
Fuzzy matching Mr Excel
github Fuzzy
I needed to use fuzzy matching in Excel for work, and I also needed to know string similarity, be able to partition sentences, etc.
I created a VBA module for doing just this, I think it may help you: https://github.com/kyledeer-32/vba_fuzzymatching.git
Basically, importing it into your workbook will give you access to several UDFs, e.g., fuzzy match, string similarity, etc.
Note: it won't return the cell index of a matched value, but you could configure the scripts to do this fairly easily. For example, with just a few changes you could make the "=fuzzy_match" function return the array position of the best match instead of the value itself.
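As an illustration of returning the ID (i.e. the location) of the best match together with a similarity score, here is a minimal Python sketch. It uses Levenshtein distance, one of the approaches linked in the question, and assumes similarity is defined as 1 - distance / length of the longer string; that scoring rule and the helper names are assumptions, not what the linked VBA module does. It compares every pair of rows, so for 100k x 100k rows you would want to narrow the candidates first (for example, only rows with the same arrival date).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling toward 0.0 as the edit distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def best_match(ref, table):
    """Return (matched REF, matched ID, similarity) for the closest row of table.

    table is a list of (ref, id) pairs; the ID serves as the location of the match.
    """
    best_ref, best_id = max(table, key=lambda row: similarity(ref, row[0]))
    return best_ref, best_id, similarity(ref, best_ref)


TABLE_2 = [
    ("44054 | 05-Nov-1954 | US 9910505 | BENJAMIN JONES", "ID2-0001"),
    ("44059 | 19-Aug-1960 | CL-F2944458 | JOHN THOMSON", "ID2-0002"),
    ("44054 | 09-Dec-1991 | LL556666 | RICHARD FREEMAN", "ID2-0003"),
    ("44054 | 06-May-1960 | LK9915782 | JEAN HAMILTON", "ID2-0004"),
    ("44054 | 09-Nov-1989 | AU-LM389990 | EDWARD SMTH", "ID2-0005"),
]

matched_ref, matched_id, score = best_match("44054 | 05-Nov-1954 | US 9910505 | BEN JONES", TABLE_2)
print(matched_id, f"{score:.0%}")  # ID2-0001 90%
```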
Hope this helps!
I have a sheet that looks like this:
A | B | C | D | E | F
1 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
2 DROPDOWN_2 | move | NY, xy_street | Ann | 1 | ...
3 DROPDOWN_2 | fill | CA, yx_street | Rose | 3 | ...
...
100 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
101 DROPDOWN_1
102
103 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
104 DROPDOWN_1
INITIALLY:
Rows 1-99 contain the tasks, with one column (NAME) left empty.
Rows 100 and below contain "tickets" which can be printed (2 rows each, for example 100-101).
THEN
1. The ORGANISER (me) creates tickets with names by copying and pasting (Ctrl+C/Ctrl+V) the "ticket structure" and choosing a name from the DROPDOWN_1 list.
2. Then I assign the tasks (rows 1-99) to people by choosing them from the DROPDOWN_2 list. (Note that both dropdown lists contain the same names.)
After this, I would like Excel to fill in each ticket with the rows that contain the same name as the ticket. One person can be assigned to several tasks, but each task can be assigned to only one person (so a ticket has one NAME but may have several rows, depending on the 1-99 list).
Could you help me write a formula or function for this "autofill" of tickets? I have been searching for a solution for days but couldn't find a proper one.
In the Similar problems and solutions section you can find 2 links with the closest answers I found. Unfortunately, neither of them involves dropdown lists. I tried to solve the problem with INDEX(MATCH()), but it cannot handle changes to the names.
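The grouping described above (one ticket per name, one line per task assigned to that name) can be sketched in Python as follows. The task rows and the names Alice and Bob are hypothetical; the column names come from the sheet. Because the tickets are rebuilt from the task list every time, changing an assignee simply moves that task to another ticket. In Excel versions with dynamic arrays, FILTER can express the same "rows whose NAME equals this ticket's name" condition.

```python
from collections import defaultdict


def build_tickets(tasks):
    """Group task rows by assignee: one ticket per NAME, one line per assigned task."""
    tickets = defaultdict(list)
    for task in tasks:
        name = task["NAME"]
        if name:  # tasks with an empty NAME are not assigned yet, so they go on no ticket
            tickets[name].append(task)
    return dict(tickets)


# Hypothetical task rows using the sheet's column headers.
tasks = [
    {"NAME": "Alice", "TASK": "move", "ADRESS": "NY, xy_street", "ORDER_GIVER": "Ann", "COUNT": 1},
    {"NAME": "Bob", "TASK": "fill", "ADRESS": "CA, yx_street", "ORDER_GIVER": "Rose", "COUNT": 3},
    {"NAME": "Alice", "TASK": "fill", "ADRESS": "CA, yx_street", "ORDER_GIVER": "Rose", "COUNT": 2},
    {"NAME": "", "TASK": "move", "ADRESS": "NY, xy_street", "ORDER_GIVER": "Ann", "COUNT": 1},
]

for name, rows in build_tickets(tasks).items():
    print(name, [row["TASK"] for row in rows])
```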
Thank you,
Max
Similar problems and solutions:
https://www.get-digital-help.com/2009/09/28/extract-all-rows-from-a-range-that-meet-criteria-in-one-column-in-excel/
Extracting all rows based on a value of cell without VBA
Select A101:F392 and enter this as an array formula (ctrl+shift+enter):
=IFERROR(INDEX(A1:F99,ROUND(MOD(SMALL(IFERROR(CHOOSE({1,2},SMALL(IFERROR(1/(1/MMULT(IF(SMALL(COUNTIF(A2:A99,"<="&A2:A99),ROW(INDIRECT("2:98")))=SMALL(COUNTIF(A2:A99,"<="&A2:A99),ROW(INDIRECT("1:97"))),0,ROW(A2:A98)),{1,1}))+{0.001,-0.001},FALSE),ROW(INDIRECT("1:196"))),COUNTIF(A2:A99,"<="&A2:A99)+ROW(A2:A99)/1000),FALSE),ROW(INDIRECT("1:292"))),1)*1000,0),{1,2,3,4,5,6}),"")
I'm trying to add a custom 'discount' list to my spreadsheet.
I've got a table that contains all the data, and has costs for the standard 'used' value, then also the values at a 5% discount and a 10% discount.
Example:
+---------+-------------------+------+------------+-------------+
| Code | Role | Used | Used - 5% | Used - 10% |
+=========+===================+======+============+=============+
| Test001 | Employee | 5.67 | | |
+---------+-------------------+------+------------+-------------+
| Test002 | Junior Technician | 9.80 | 9.31 | 8.38 |
+---------+-------------------+------+------------+-------------+
| Test003 | Project Manager | 15 | | |
+---------+-------------------+------+------------+-------------+
| Test004 | Engineer | 20 | 19 | 17.10 |
+---------+-------------------+------+------------+-------------+
I've then got a data validation list that returns all of the 'Roles' to select from. Based on the selected role, the Cost cell is populated.
Example:
+----------+----------+----------+-------+
| Role | VLOOKUP | Discount | Cost |
+==========+==========+==========+=======+
| Employee | | | 5.67 |
+----------+----------+----------+-------+
| Engineer | 5%,10% | 10% | 15.10 |
+----------+----------+----------+-------+
What I want is for the Discount list to be populated with 5% and/or 10% when those options exist. I'd like to achieve this without VBA (I could easily do it with VBA, but I'm trying to keep it all in the worksheet).
My VLOOKUP Column is populated using:
=CONCATENATE(IF(VLOOKUP(A2,INDIRECT("Test[[Role]:[Used - 10%]]"), 3, FALSE) <> "", "5%", ""),
IF(VLOOKUP(A2,INDIRECT("Test[[Role]:[Used - 10%]]"), 4, FALSE) <> "", ",10%", ""))
The issue comes when I try to do the data validation. It accepts the formula (I also tried using the formula above directly in the data validation, to no avail), but it populates the dropdown list with a single value, "5%,10%", instead of interpreting it as a comma-separated list.
I'm currently using this to try to populate the Discount dropdown:
=OFFSET(INDIRECT(ADDRESS(ROW(), COLUMN())),0, -1)
It is possible, assuming your version of Excel has the dynamic array functions FILTER and UNIQUE. Let's go through a few things; here is a Google Doc where this is demonstrated. I also included an online Excel file*.
1. It isn't necessary to calculate the cost in the setup table (A:E). You can just use a character to mark availability (in some versions it was difficult to make FILTER work with comparisons like <>"", whereas ="x" worked fine).
2. You can get an array of available discounts by using FILTER, INDEX and MATCH. See Col P. You use INDEX/MATCH to return the single row of the discount columns (in this case D:E) for the selected role, and then use that row to filter the top row (D1:E1), which holds the friendly discount names, and return the result as an array.
3. It isn't necessary to concatenate the discount list the way you're doing. You can use TEXTJOIN, FILTER, INDEX and MATCH. See Col I. You just wrap the calculation that generates the array of discount names (step 2) in TEXTJOIN to get a string.
4. The validation is accomplished by referencing the output of step 2. I don't think the data validation dialog can handle the full formula, so I pointed it to Cols O:Q. Col O is included in the validation so that you get an empty entry at the top of the list, but Google Docs seems to strip it out.
5. You can just calculate the discounted cost from the selected option. See Col K. I included the original cost in Col L so you can compare.
*You will need a Microsoft account to view it.
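The discount-list steps boil down to: pick the role's row, keep the discount names whose cell is marked, and join them with commas. A rough Python model follows; the "x" marks reflect which roles have discounted prices in the question's table, and the function names are hypothetical stand-ins for the INDEX/MATCH, FILTER and TEXTJOIN parts.

```python
# Friendly discount names, i.e. the header row of the discount columns.
DISCOUNT_NAMES = ["5%", "10%"]

# "x" marks availability; based on which roles have discounted prices in the question's table.
AVAILABILITY = {
    "Employee": ["", ""],
    "Junior Technician": ["x", "x"],
    "Project Manager": ["", ""],
    "Engineer": ["x", "x"],
}


def available_discounts(role):
    """INDEX/MATCH picks the role's row; FILTER keeps the names whose cell is marked."""
    marks = AVAILABILITY[role]
    return [name for name, mark in zip(DISCOUNT_NAMES, marks) if mark == "x"]


def discount_text(role):
    """TEXTJOIN-style: the same list as one comma-separated string."""
    return ",".join(available_discounts(role))


print(available_discounts("Engineer"))  # ['5%', '10%']
print(discount_text("Engineer"))        # 5%,10%
```

The list form is what a dropdown needs (one entry per item); the joined string is only useful for display, which is why the single "5%,10%" entry appears when the string is fed to data validation.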
I have a range containing values such as:
169.7978
168.633
168.5479
168.7819
167.7407
165.4146
165.1232
I don't need the maximum value of the range (i.e., the first cell in this example), but the last relative maximum, which in this case is the fourth cell. Is there a way to get this value without having to write a VBA macro? The formula must be general enough to work with any number of maxima.
It may be a bit limited, but you could start with something like the following.
The array stated in the question is:
+----------+---+
| y | x |
+----------+---+
| 169.7978 | 1 |
| 168.633 | 2 |
| 168.5479 | 3 |
| 168.7819 | 4 |
| 167.7407 | 5 |
| 165.4146 | 6 |
| 165.1232 | 7 |
+----------+---+
Given this, you can find relative minima/maxima by comparing each point with its immediate neighbours in helper columns.
Add a Global_Rank helper column (where rank 1 is the largest y) and compare each point's rank with those of its two neighbours using the formulas below (this assumes your data is sorted by the x index; enter the formulas in row 2 and fill down).
RelativeMax:
=IF(AND(D2<=D1,D2<=D3),"RelativeMax","")
RelativeMin:
=IF(AND(D2>=D1,D2>=D3),"RelativeMin","")
Modify as needed. Hope this helps.
Edit:
Although...
If you're going to assume the data is ordered properly, you could also compare the raw values directly with =IF(AND(B2>=B1,B2>=B3),"RelativeMax",IF(AND(B2<=B1,B2<=B3),"RelativeMin","")) and skip all the malarkey. This should work with multiple maxima/minima. Please report back with results from your dataset!
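The same neighbour comparison can be sketched in Python to pick out the last relative maximum the question asks for. This sketch only checks interior points (the first and last values have just one neighbour) and treats ties as maxima, like the >= comparisons above; both are choices you may want to change.

```python
def relative_maxima(values):
    """Indices of interior points that are >= both immediate neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] >= values[i - 1] and values[i] >= values[i + 1]
    ]


def last_relative_max(values):
    """(index, value) of the last relative maximum, or None if there is none."""
    peaks = relative_maxima(values)
    if not peaks:
        return None
    i = peaks[-1]
    return i, values[i]


values = [169.7978, 168.633, 168.5479, 168.7819, 167.7407, 165.4146, 165.1232]
print(last_relative_max(values))  # (3, 168.7819): the fourth cell, as in the question
```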
I'm trying to generate a table that shows a count of how many items are in any given status on any given day. My result table has a set of dates down column A, and the column headers are the various statuses. A sample of my data table with headers looks like this:
Product | Notice | Assigned | Complete | In Office | In Accounting
1 | 5/5/13 | 5/7/13 | 5/9/13 | 5/10/13 | 5/11/13
2 | 5/5/13 | 5/6/13 | 5/8/13 | 5/9/13 | 5/10/13
3 | 5/6/13 | 5/9/13 | 5/10/13 | 5/10/13 | 5/10/13
4 | 5/4/13 | 5/5/13 | 5/7/13 | 5/8/13 | 5/9/13
5 | 5/7/13 | 5/8/13 | 5/10/13 | 5/11/13 | 5/11/13
If my output table contains a set of dates in the first column and the statuses as headers, I need a count of how many rows were at the given status and had not yet transitioned to the next status. For example, in the Notice column I'd have a count of rows where the Notice date is <= X AND the Assigned, Complete, In Office and In Accounting dates are all greater than X.
I've used a SUM(IF(FREQUENCY(IF(... formula that gets me REALLY close, but I feel like I need an AND inside the second IF, like this: =SUM(IF(FREQUENCY(IF(AND
Here's what I have, which doesn't work:
=SUM(IF(FREQUENCY(IF(AND(Table1[Assigned]<=A279,Table1[[Complete]:[In Accounting]]<=A279),ROW(Table1[[Complete]:[In Accounting]])),ROW(Table1[[Complete]:[In Accounting]]))>0,1))
If I take the "AND" portion out, this works fine, except that I need it to count ONLY rows where the given status actually has a date: if an "Assigned" date is empty, I don't want that row to be counted for the Assigned column.
Here's an example of what I'd expect to see in the results. I've listed the count in each column, with the corresponding product numbers in parentheses. The product numbers are for reference only and won't actually be in the result table.
Date | Notice | Assigned | Complete
5/6 | 2 (1,3) | 2 (2,4) | 0
5/7 | 2 (3,5) | 2 (1,2) | 1 (4)
5/8 | 1 (3) | 2 (1,5) | 1 (2)
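The rule described above (the status date is on or before X, and every later status is after X or still blank) can be sketched in Python and checked against the expected results. Dates are May 2013, and a row only counts for a status that actually has a date, per the question's requirement; `None` stands for a blank cell, and the helper names are hypothetical.

```python
from datetime import date

STATUSES = ["Notice", "Assigned", "Complete", "In Office", "In Accounting"]


def d(day):
    """Shorthand for a date in May 2013, matching the sample data."""
    return date(2013, 5, day)


# Sample data from the question; None would represent a blank cell.
PRODUCTS = {
    1: [d(5), d(7), d(9), d(10), d(11)],
    2: [d(5), d(6), d(8), d(9), d(10)],
    3: [d(6), d(9), d(10), d(10), d(10)],
    4: [d(4), d(5), d(7), d(8), d(9)],
    5: [d(7), d(8), d(10), d(11), d(11)],
}


def in_status(products, status, day):
    """Products that reached `status` on or before `day` and have not moved on yet."""
    col = STATUSES.index(status)
    result = []
    for product, dates in products.items():
        reached = dates[col]
        if reached is None or reached > day:
            continue  # no date for this status, or not reached yet
        if all(later is None or later > day for later in dates[col + 1:]):
            result.append(product)
    return result


for day in (6, 7, 8):
    row = {s: in_status(PRODUCTS, s, d(day)) for s in ("Notice", "Assigned", "Complete")}
    print(f"5/{day}", {s: (len(p), p) for s, p in row.items()})
```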
OK, assuming you have the original data in A1:F6, with the second table's headers in B9:D9 and row labels in A10:A12, you can use this "array formula" in B10:
=SUM((B$2:B$6<=$A10)*(MMULT((C$2:$F$6>$A10)+(C$2:$F$6=""),TRANSPOSE(COLUMN(C$2:$F$6)^0))=COLUMNS(C$2:$F$6)))
confirmed with CTRL+SHIFT+ENTER and copied down and across.
The results match your requirements, and if you replace dates with blanks it will still work.
MMULT is a way to get a single value from each row even when you are looking at multiple columns.
I used cell references because I think that's easier, especially when copying the formula across with a shrinking range, but you can use structured references if you want.
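To make the MMULT step concrete, here is a small Python model. TRANSPOSE(COLUMN(C$2:$F$6)^0) is a column of 1s, so multiplying the grid of 1/0 results from (C>X)+(C="") by it gives one sum per row, and a row passes when that sum equals COLUMNS(C$2:$F$6). The plain-Python `mmult` is a hypothetical stand-in for Excel's MMULT, and the flag values are made up for illustration.

```python
def mmult(a, b):
    """Matrix product of a (n x k) and b (k x m), like Excel's MMULT."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Made-up 1/0 flags for (C>X)+(C="") over four later-status columns, two rows.
flags = [[1, 1, 1, 1],
         [0, 1, 1, 1]]
ones = [[1] for _ in range(4)]  # like TRANSPOSE(COLUMN(C$2:$F$6)^0): a 4 x 1 column of 1s

row_sums = mmult(flags, ones)                 # one sum per row
all_later_ok = [r[0] == 4 for r in row_sums]  # compare with COLUMNS(C$2:$F$6) = 4
print(row_sums, all_later_ok)                 # [[4], [3]] [True, False]
```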
Have you tried using COUNTIFS to count based on multiple criteria? It is fairly well documented here: http://office.microsoft.com/en-us/excel-help/countifs-function-HA010047494.aspx (Excel 2007+ only)
Basically, you use it like this:
=COUNTIFS(first_range_to_check, value_you_want_in_first_range, ...)
where ... represents as many additional range/criteria pairs as you want (up to 127 pairs in total). Note that the conditions are combined with AND, so if you have two pairs, the first pair AND the second pair must be true for a row to be counted.
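The AND behaviour of COUNTIFS can be modelled in a few lines of Python: a row counts only when every (range, criterion) pair holds for that row. The helper below is a hypothetical sketch that takes predicates instead of Excel criteria strings; the sample uses the Notice and Assigned days (May 2013) from the question.

```python
def countifs(*pairs):
    """Count positions where every (values, predicate) pair is satisfied (AND logic)."""
    ranges = [values for values, _ in pairs]
    predicates = [predicate for _, predicate in pairs]
    return sum(
        all(pred(value) for pred, value in zip(predicates, row))
        for row in zip(*ranges)
    )


notice = [5, 5, 6, 4, 7]    # Notice day in May 2013, products 1-5
assigned = [7, 6, 9, 5, 8]  # Assigned day in May 2013, products 1-5

# Roughly =COUNTIFS(notice_range, "<="&X, assigned_range, ">"&X) with X = 5/6
print(countifs((notice, lambda v: v <= 6), (assigned, lambda v: v > 6)))  # 2
```

For this sample the result matches the expected Notice count for 5/6, because each product's dates never decrease, so checking only the next status is enough. With blanks or out-of-order dates you would need a criteria pair for each later status.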