Search formula for best text match among two excel lists - excel

I have a long list of products (+20,000 items) of surgical instruments. Sometimes I receive requests for different names of these products which is impossible to manually match in my list.
I was thinking of a formula to find or suggest the closest result of match for the common words in each cell.
I have created this formula:
=INDEX('Products'!G:G,MATCH((("*"&LEFT(A2,5),'Products'!G:G,0))
(where Products G:G refers to my long list.
it gave some results correctly but more than 80% of the result came back with false results.
please see the attached image to show you the result.
is there is a way I can get more accurate result?
or I was thinking of finding major category of each item such as:
Category 1: Scissors, Retractors, Knives, etc.
Category 2: Straight, Curved, Angeled, etc.
Category 3: Sharp, Blunt, etc.
Category 4: 10mm, 130mm, 24cm, etc. (size)
which is easy for me to do it.
then use the same formula but with referring to the common words..
something like:
=INDEX(Products!G:G,MATCH("*"&LEFT(E2,5)&"*"&F2&"*"&G2&"*",Products!G:G,0))
where E2, F2, G2 refers to the categories..
I tried but it gave false results as well.

I urge you in the strongest sense of the word to spend some time creating a good quality master table and then spinning off 1 table for each category.
make use of clean(), trim(), proper(), heck, if you need to copy the data in notepad++ and enable view all symbols then switch between ansi utf utf8 wtf omgwtf or any other encoding to ensure you dont have any hidden special characters than do it.
you have 4 categories, so that's 4 1 column tables. name them. no duplicates. no trash. no junk. sort your data. nice clean names/words/whatever you categorize by. if you absolutely must add an index or key column then go ahead but do yourself a favor and stop there. use a different table to deepen your relationships.
next step is to to create comboboxes. i'm not sure why but the combobox in excel is not the same combobox in the vba editor. you want the one in the editor. you can make a fancy user form or you can make a minimalist text box design. whatever you fancy. just make sure the combobox has a field for RowSource in the properties. for whatever reason i don't get that option if i am not in the vba editor when i create the box.
you're almost gauratneed to want drawmodal = false on every user form you make for these boxes
you probably really don't need more than 4 boxes but it depends what you're doing so that's up to you. name your combo boxes.
verify each box has: matchentry = 1-fmMatchEntryComplete
i recommend: style = 0 - fmStyleDropDownCombo
this will allow you to begin typing and autocomplete the first match and also let you select from a drop down list, starting with the the first match of the name you've typed.
you can set the number of elements in the list. default is 8. if you have a slow computer than i wouldn't push it much. if you have a best then give it a shot.
you can also change fonts for easier reading and a bunch of other format changes.
now the this is the most important part - RowSource will be one of those 4 tables
now that i've given instructions, let me explain why. some businesses don't have the best practices and i'm currently with one that's using an oracle erp solution for data management but the front end isn't used. data entry is done in excel and loaded into oracle using batches. lookups in oracle continue to be a psychological barrier for the ap/ar teams so i did exactly what i suggested here but took it a few steps farther.
i pulled the vendor master and i pulled teh customer master
i cleaned the data and compiles simple pure clean 1 column tables
then i created a form for the comboboxes
first came ap with vendor name
then vendor number
then vendor remittance location
our orderering facility number
selecting a vendor name populates the vendor number box with the possible vendor numbers. same for remittance location and ordering facility
i pulled a year's worth of transaction data and created a gl table. some vendors have only ever used 1 gl acount. some several. there's a % number next to each gl. that represents the value of transactions posted from that vendor to that gl for the last year.
next up a date picker and text boxes for invoice fields.
get it nice and tight on a form, set the tab stops, add a commit button and all of a sudden we have a front end excel platform that performs better than oracle - because people use
i haven't finished the ar side but it'll get done and i'll the angels will be rejoicing and singing my name for years to come.
1:1 matching with autofill and drop down functionality you can't beat it. that's what you get with my suggestion.
best of luck!

Related

Looking for suggestions on data filtering options in Excel via VBA

I have a project where I want to filter down a lot of data based on user's selections of 8 different criteria. In my case it's kitchen cabinets but a more universal example would be cars. The user is presented with a userform containing 8 listboxes, one for Make, Model, Package, Color, Transmission and so on. Double-clicking on a listbox item removes all other choices in that listbox and filters out invalid choices from all the other listboxes. The choices compound so the user could choose "Green" for Color and "Ford" for Make.
I've done this two different ways already and I'm not happy with either.
1) I created a matrix where I listed Models down the left hand side and each of the other 7 features across the top with "X" at the intersect points. Then I taught VBA how to basically read it like a human would using a ton of nested loops. This was pretty slow, where each selection caused the program to halt for 1-2 minutes while it sorted everything out.
2) I listed out every single possible combination (some 22,000) and used Excel's built in filtering options to filter the data, then copy each column of the results onto a second sheet, filtering out duplicates and re-sorting them, and using that to re-populate the listboxes. This worked really well but for some reason I can't fathom, Excel gets hung up for 3-10 minutes doing something when I filter certain columns. I debug/traced it and the hang up happens after all my code is run, but before it returns control to the user.
At this point I'm open to any suggestions. Ultimately I need the user to choose one complete (all 8 options narrowed down to one choice) configuration. The tricky part is the user may start with color "Green" which would eliminate all other colors from the "Color" choices list, as well as any Make, or Model, or Package, etc that doesn't come in Green from their respective lists. Maybe data would help explain it?
Make Model Package Color Transmission
Ford Focus LT Blue Manual
Ford Focus LT Green Manual
...
Ford Focus ST Blue Manual
...
Ford Focus LT Blue Automatic
...
And so on for 22,000 rows.
So if the user chooses "Manual" I need to drop the "Automatic" rows. If the user then chooses "ST" I need to drop the "LT" rows (while assuming the Automatic rows are still being filtered out, and so on until there's only one row left.
Just an idea that came to mind: Are you familiar with a highly efficient method of searching for words in a dictionary of words called Trie?
If you code each option properly you could enter the whole option map in a Trie. When a user makes a selection you follow that path in the Trie. If no matching option exist then the Trie returns Null and you can inform the user about that.
Good explanation of a practical Trie implementation for words:
https://www.geeksforgeeks.org/trie-insert-and-search/

Generating a unique identifier list with multiple tables & criteria

I'm not a coder, just someone who uses excel for basic estimating functions at work. However I've found myself in need of a complex list or index system.
Background/Intent: (Skip below if doesn't matter.) In an apartment building construction they build buildings like opened books - mirror images of 2 bed 2 bath apartments, for example. There is a standard "typical" unit and then the mirror image across the hall, the "reverse" unit. The door swings are all opposite from one to the other. My job is to figure out how to give each door a unique identifier code based on: Bldg No., Unit Type, Door No., Door Swing (left or right.) The raw data tables are provided below.
I've attempted to clean this up as much as possible, but there are two steps (I think) to this process.
Step 1:
The raw data table is on the left. My output field is on the right. I want to be able to select a drag down box, like data validation list, and select the building. Then a formula (which one?) spits out a list of every unit type per building. For example, Bldg 5 has 2 each of "A1 Typ." How do I get the formula to recognize that if there are 2 of them, to produce 2 separate lines for "A1 Typ." And so on and so forth until all 41 occurences/units have been accounted for and labeled appropriately. Some occur once, some multiple times, and some zero.
Step 1
From there, Step 2.
I want to use this output field again to automate another sequence, this time pulling from a different table, see picture. Now, depending on the unit type under the "type" column, I want it to expand each unit type showing each indivudual door number (1,2,3 etc through 12) and if it's an L (LH) or R (RH), and if there is more than one, to list out each occurence. (what formula?)
Then the decriptor text that will pop up under "DOOR LABEL" column would just be a joining of several fields to give a unique identifier. (suggestions?)
Step 2.
Easy right? Is this too much for excel, or can this be done?
Thanks so much for considering helping me out!

Sharepoint Lookup

I have been absolutely stymied by Sharepoint lookups and the complete lack of information anywhere that relates to my problem so one last stab at seeing if anyone has a clue if not I am going to find another job which doesnt involve the use of this very good but highly complicated system.
My problem is that I want to add a column in list 1 that looks up a column in list 2, all very easy you may think and the nice videos on you tube make it seem very easy. So imagine the frustration when I create the column choose the appropriate list from "get information from" drop down and then go to the "in this column" drop down to find the fields i want are missing. I have tried this many ways and could not resolve it. So I decided to start again and set up a brand new list 2 which contains just 4 columns each created manually one column is "just single line of text" called Financial year the other three are a "number" and called Population, Dwellings and Non Domestic. Now having done that I would expect to see thos 4 columns (Financial year, Population, Dwellings and Non Domestic) all appear in the "in this column" drop down when setting up the lookup. But no of course not bloody Sharepoint will only show: Title, Financial year, ID , Content Type, version, and Title (linked to item) none of which I am the slightest bit interested in. I want to look up Population, or Dwellings or Non Domestic. Why is this so easy to do in Excel but in Sharepoint it seems as if Bill Gates has decided he's in charge of what people lookup!! in a word its crap.
I should have added that the same happens whatever list I select to look up from.

How can this lookup (find the last relevant item) be improved?

One of the reports that wastes a bunch of my time at work is the Roster. It's a multi-site, multi-contract listing of every employee currently assigned to a specific client. Currently, it has a little over 6,000 lines by 20-something columns, indexed against 3 different datasets. Not the largest mess in the world, but still a pain. And it's almost all in excel, because I somehow don't have a business case for Access.
But one part of this monster stands apart. One tab per site Site Totals, listing off every time any agent has gone through training. A second tab (again, one per site) Site Data displaying only the most recent training class, and the credentials they had during that class.
That second tab is driven by variations of this array formula - Last_Row is a named range on another tab, and column A is a pivot of the UID column on Site Totals. I've broken it apart for readability:
=IF(INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1))="Trainer",
"",
INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1)))
I know what this formula does, but I don't know how to improve it. This formula needs to be changed, because it currently is on the order of 500 Million calculations (I'm not allowed to delete historical data), and it takes me 3 hours to calculate the workbook ... if it doesn't crash Excel first.
I'm open to VBA and / or custom functions, but would prefer to have native Excel functions. I'm not able to install anything, so any solution must be native Excel, and Must be compatible to Excel 2007.
If your source is a pivot table, try is the GETPIVOTDATA function. You might be able to accomplish what you want without INDIRECT and INDEX.
What i have understood is that every person has/has not attended a training and you want to retrive the name of that training, in case he has not, you want a blank space in the cell. If this description is correct you can try this formua, press ctrl+shift+enter to execute.
=IFERROR(INDEX('Site Totals'!B$1:B$12,MATCH(A2&"Trainer",'Site Totals'!A$1:A$12&'Site Totals'!B$1:B$12)),"")
Here A2 contians the name of the person. I can be more precise with this formula if you can provide some sample data butI would recommend to not to use entire B & Columns in Site Total workssheete as this will definately slow down computing process, instead you can use B1:B8000 or smaller range, to speed up process. Hope that helps.

Book ordering comparison between spreadsheets for existing catalogue of a Library

I have recently asked this question of google's spreadsheet page.
I a significant data comparison problem I would like to solve. It relates to purchasing books for a Library. We have a catalogue of over 11,000 books. When we order new books we need to compare our proposed purchases to the current stock. Currently we can manually compare them to our catalogue, very laboriously book by book.
We need to do 3 things to make our life easier -
1 easily clean out bad data/characters in the ISBN's - these are either spaces, - (hyphen's) or . (period mark or full stops). A simple formula to run over all ISBN fields would be great.
2 I need to compare data between 1 spreadsheet with 11,000 books in it (current library stock), a second with up to 1000 books in it (currently on order) and finally the third currently active one (about to be ordered) with 50 to 200 books listed in it.
All spreadsheets use the same column configuration as below
Library orders
Title Author Publisher ISBN (long version) US$ UKgpd HK$ Other$ P/O no. Date ordered
UNNATURAL SELECTION MARA HVISTENDAHL Public Affairs Publishing; Reprint edition (May 1, 2012) 978610391511
Finally, the out put of these comparisons should quickly and easily identify on what lines we have matches. and what type of match it is, Author only, Author and Title, or Author, title and ISBN etc for all the possible combinations. To make this easier assume spreadsheet 1 is an unalterable master table, with spreadsheet two similar. It is really only on Spreadsheet 3 we need to be clear if we are starting to reorder materials.
If it is possible to have these as different sheets in a workbook it would be ideal. The only additional feature is that any scripts that run need to be able to cope with spreadsheet 1 increasing in size as new acquisitions arrive and are included. Both spreadsheets 2 and 3 will vary (increase and decrease) as the ordering process proceeds.
Finally the absolute ideal would be for this comparison process to be instant (live) and ongoing as data is included.
If anyone would like to take this on 3 Library staff will be eternally grateful.
regards
Nick
This would be very much easier had you one sheet rather than three (simply add a column to each existing sheet to show whether in stock, on order or to be ordered – three individual letters would be sufficient, then append each of the smaller two files to the largest). Then for example you could apply Conditional Formatting to highlight duplicates one column at a time (Author, Title etc). Apart from the initial data cleansing it would mean in the future switching ‘between sheets’ would merely involve changing a one-letter flag. Filtering would allow you and your colleagues to appear to have three separate sheets and if anyone asks for a particular Title the search would be one-time, not in triplicate.
Also, http://www.microsoft.com/en-gb/download/details.aspx?id=15011 may be of interest, also =SUBSTITUTE.And with data validation you would prevent entry of a new ISBN that already is in your list.

Resources