Looking for suggestions on data filtering options in Excel via VBA - excel

I have a project where I want to filter down a lot of data based on user's selections of 8 different criteria. In my case it's kitchen cabinets but a more universal example would be cars. The user is presented with a userform containing 8 listboxes, one for Make, Model, Package, Color, Transmission and so on. Double-clicking on a listbox item removes all other choices in that listbox and filters out invalid choices from all the other listboxes. The choices compound so the user could choose "Green" for Color and "Ford" for Make.
I've done this two different ways already and I'm not happy with either.
1) I created a matrix where I listed Models down the left hand side and each of the other 7 features across the top with "X" at the intersect points. Then I taught VBA how to basically read it like a human would using a ton of nested loops. This was pretty slow, where each selection caused the program to halt for 1-2 minutes while it sorted everything out.
2) I listed out every single possible combination (some 22,000) and used Excel's built in filtering options to filter the data, then copy each column of the results onto a second sheet, filtering out duplicates and re-sorting them, and using that to re-populate the listboxes. This worked really well but for some reason I can't fathom, Excel gets hung up for 3-10 minutes doing something when I filter certain columns. I debug/traced it and the hang up happens after all my code is run, but before it returns control to the user.
At this point I'm open to any suggestions. Ultimately I need the user to choose one complete (all 8 options narrowed down to one choice) configuration. The tricky part is the user may start with color "Green" which would eliminate all other colors from the "Color" choices list, as well as any Make, or Model, or Package, etc that doesn't come in Green from their respective lists. Maybe data would help explain it?
Make Model Package Color Transmission
Ford Focus LT Blue Manual
Ford Focus LT Green Manual
...
Ford Focus ST Blue Manual
...
Ford Focus LT Blue Automatic
...
And so on for 22,000 rows.
So if the user chooses "Manual" I need to drop the "Automatic" rows. If the user then chooses "ST" I need to drop the "LT" rows (while assuming the Automatic rows are still being filtered out, and so on until there's only one row left.

Just an idea that came to mind: Are you familiar with a highly efficient method of searching for words in a dictionary of words called Trie?
If you code each option properly you could enter the whole option map in a Trie. When a user makes a selection you follow that path in the Trie. If no matching option exist then the Trie returns Null and you can inform the user about that.
Good explanation of a practical Trie implementation for words:
https://www.geeksforgeeks.org/trie-insert-and-search/

Related

Match and Conditional Formatting from Matrix Table

I am looking for some decent help with my matrix table, and is there a good or best approach to properly match dependent instances in certain matrix using drop downs.
This picture represents my matrix table (Picture 1):
As you can see there are a lot of instances, but horizontally and vertically they got the same number of "headers". Those "1`s" are representing not compatibility in my case but lets call it simply "match". That is on one sheet that is gonna be populated with some new values from time to time.
On another sheet which is actually sheet for showing the data and their compatibility possibilities is equipped with drop downs. There you got "Groups (Group1, Group2...)" in a sense of main parts and "dependent groups (AA1, BB2..)" as small components that are part of main parts. To avoid misunderstanding here you have explanations, I used for the sake of this example fictional values:
Groups aka. Main Parts
Dependent groups aka. components
As you can see beneath, is my fictional table but exactly the same concept as I should use in my real case.
I PUT AN EXPLANATION IN THE PICTURE 2 SO YOU CAN FOLLOW ALONG AND SEE EXACTLY WHERE/WHAT I DID!
What I used firstly there are =match functions, one for vertical position (A3) and one for horizontal (B4). This boolean row is done using =or(index) but reffering to the match positions as you can see. And from there I should use true/false for coloring my group boxes in a case compatibility is possible - thats all the science.
So, my question is if there is another approach to this problem? As you can see I have 3 different rows of functions at one place, or imagine if I will have more "groups" that can rise in many more rows and calculations.
Picture 2
EDITED:
This is screenshot of the original sheet, I just hid some rows that were with Infos that is reason the number is not consistent. As you can see it is almost the same as dummy example I provided above. Underneath every "box" you got three rows of calculations as I mentioned before. The two times number "2" that you see here is the position of some value that I found using =match function, one is for horizontal and another for vertical lookup. In this case it is model type, 070FX is position 2, 100FX is 3 and 200FX is 4th position in the matrix table, and so on for all the other groups. And those groups (Model, Endpoint, Gas sensor...) are defined separately on another sheet where I had to make unique list and dependent list so I can reference those to my drop down list.
EDIT Nr 4! So this formula I used for true/false:
=SUMPRODUCT(('0359-matrix'!$A$2:$A$101=F10)*(('0359-matrix'!$B$1:$CW$1=$B$10)+('0359-matrix'!$B$1:$CW$1=$C$10)+('0359-matrix'!$B$1:$CW$1=$D$10)+('0359-matrix'!$B$1:$CW$1=$E$10)+('0359-matrix'!$B$1:$CW$1=$F$10)+('0359-matrix'!$B$1:$CW$1=$G$10)+('0359-matrix'!$B$1:$CW$1=$H$10)+('0359-matrix'!$B$1:$CW$1=$I$10)+('0359-matrix'!$B$1:$CW$1=$J$10)+('0359-matrix'!$B$1:$CW$1=$K$10)+('0359-matrix'!$B$1:$CW$1=$L$10)+('0359-matrix'!$B$1:$CW$1=$M$10)+('0359-matrix'!$B$1:$CW$1=$N$10)+('0359-matrix'!$B$1:$CW$1=$O$10)+('0359-matrix'!$B$1:$CW$1=$P$10)+('0359-matrix'!$B$1:$CW$1=$Q$10)+('0359-matrix'!$B$1:$CW$1=F13)+('0359-matrix'!$B$1:$CW$1=G13)+('0359-matrix'!$B$1:$CW$1=H13)+('0359-matrix'!$B$1:$CW$1=I13)+('0359-matrix'!$B$1:$CW$1=J13))*'0359-matrix'!$B$2:$CW$101)>0
I copied only last part, or when it starts from second row..Because it is too long to write whole funciton - it cuts down automatically.
('0359-matrix'!$B$1:$CW$1=$Q$10)+('0359-matrix'!$B$1:$CW$1=$B$13)+('0359-matrix'!$B$1:$CW$1=$C$13)+('0359-matrix'!$B$1:$CW$1=$D$13)+('0359-matrix'!$B$1:$CW$1=$E$13)+('0359-matrix'!$B$1:$CW$1=$F$13))*'0359-matrix'!$B$2:$CW$101)>0
But on marked cells I am getting the same results: B22 - F22 has the same as B21 - F21 (boolean) what shouldnt be like that but to follow color, green is False, it has to be something with an array reference.
Checkout the following. A1 to E5 is the matrix that shows which pieces are incompatible (=1). The others have to be empty or 0.
In cell I8 I used the following formula (and copied it down up to I11):
=SUMPRODUCT(($A$2:$A$5=H8)*(($B$1:$E$1=$H$8)+($B$1:$E$1=$H$9)+($B$1:$E$1=$H$10)+($B$1:$E$1=$H$11))*$B$2:$E$5)
The formula result shows you the amount of incompatibilities a part has. Eg AA1 has one incompatibility with BB2 but BB2 is incompatible with 2 AA1 and CC3.
To get the TRUE/FALSE use the same formula and append >0: like =SUMPRODUCT(…)>0
For any additinonal "group" (Model, Endpoint, …) you need to add another +($B$1:$E$1=$H$12) where $B$1:$E$1 points to your matrix data and $H$12 to your selected group value.
Overview of the formula ranges:
Note that this kind of calculation can only tell the amount of incompatibilites a part has but not the names of the parts that are incompatible.
Edited horizontal version
Formula in the selected cell is
=SUMPRODUCT(($A$2:$A$5=G8)*(($B$1:$E$1=$G$8)+($B$1:$E$1=$H$8)+($B$1:$E$1=$I$8)+($B$1:$E$1=$J$8))*$B$2:$E$5)
you can pull it to the right.

Column to rows and highlight difference between values in the same group

I have a huge table with data structured like this:
And I would like to display them in Spotfire Analyst 7.11 as follows:
Basically I need to display the columns that contain "ANTE" below the others in order to make a comparison. Values that have variations for the same ID must be highlighted.
I also have the fields "START_DATE_ANTE" and "END_DATE_ANTE" which have been omitted in the example image.
Amusingly, if you were limited to just what the title asks, this would be a very simple answer.
If you wanted this in a table where the rows are displayed as usual, and the cells are highlighted, you can do this by going to properties, adding a newGrouping where you select VAL_1 and VAL_1_ANTE and add a Rule, Rule type "Boolean expression", where the value is:
[VAL_1] - [VAL_1_ANTE] <> 0
This will highlight the affected cells, which you can place next to each other. You can even throw in a calculated column showing the difference between the two columns, and slap it on right next to it. This gives you the further option to filter down to only showing rows with discrepancies, or sorting by these values.
However, if you actually need it to display the POSTs on different lines from the ANTEs, as formatted above, things get a little tricky.
My personal preference would be to pivot (split/union/etc) the data before pulling it in to Spotfire, with an indicator flag on "is this different", yes/no. However, I know a lot of Spotfire users either aren't using a database or don't have leeway to perform the SQL themselves.
In fact, if you try to do it in Spotfire using custom expressions alone, it becomes so tricky, I'm not sure how to answer it right off. I'm inclined to think you should be able to do it in a cross table, using Subsets, but I haven't figured out a way to identify which subset you're in while inside the custom expressions.
Other options include generating a table using IronPython, if you're up to that.

Search formula for best text match among two excel lists

I have a long list of products (+20,000 items) of surgical instruments. Sometimes I receive requests for different names of these products which is impossible to manually match in my list.
I was thinking of a formula to find or suggest the closest result of match for the common words in each cell.
I have created this formula:
=INDEX('Products'!G:G,MATCH((("*"&LEFT(A2,5),'Products'!G:G,0))
(where Products G:G refers to my long list.
it gave some results correctly but more than 80% of the result came back with false results.
please see the attached image to show you the result.
is there is a way I can get more accurate result?
or I was thinking of finding major category of each item such as:
Category 1: Scissors, Retractors, Knives, etc.
Category 2: Straight, Curved, Angeled, etc.
Category 3: Sharp, Blunt, etc.
Category 4: 10mm, 130mm, 24cm, etc. (size)
which is easy for me to do it.
then use the same formula but with referring to the common words..
something like:
=INDEX(Products!G:G,MATCH("*"&LEFT(E2,5)&"*"&F2&"*"&G2&"*",Products!G:G,0))
where E2, F2, G2 refers to the categories..
I tried but it gave false results as well.
I urge you in the strongest sense of the word to spend some time creating a good quality master table and then spinning off 1 table for each category.
make use of clean(), trim(), proper(), heck, if you need to copy the data in notepad++ and enable view all symbols then switch between ansi utf utf8 wtf omgwtf or any other encoding to ensure you dont have any hidden special characters than do it.
you have 4 categories, so that's 4 1 column tables. name them. no duplicates. no trash. no junk. sort your data. nice clean names/words/whatever you categorize by. if you absolutely must add an index or key column then go ahead but do yourself a favor and stop there. use a different table to deepen your relationships.
next step is to to create comboboxes. i'm not sure why but the combobox in excel is not the same combobox in the vba editor. you want the one in the editor. you can make a fancy user form or you can make a minimalist text box design. whatever you fancy. just make sure the combobox has a field for RowSource in the properties. for whatever reason i don't get that option if i am not in the vba editor when i create the box.
you're almost gauratneed to want drawmodal = false on every user form you make for these boxes
you probably really don't need more than 4 boxes but it depends what you're doing so that's up to you. name your combo boxes.
verify each box has: matchentry = 1-fmMatchEntryComplete
i recommend: style = 0 - fmStyleDropDownCombo
this will allow you to begin typing and autocomplete the first match and also let you select from a drop down list, starting with the the first match of the name you've typed.
you can set the number of elements in the list. default is 8. if you have a slow computer than i wouldn't push it much. if you have a best then give it a shot.
you can also change fonts for easier reading and a bunch of other format changes.
now the this is the most important part - RowSource will be one of those 4 tables
now that i've given instructions, let me explain why. some businesses don't have the best practices and i'm currently with one that's using an oracle erp solution for data management but the front end isn't used. data entry is done in excel and loaded into oracle using batches. lookups in oracle continue to be a psychological barrier for the ap/ar teams so i did exactly what i suggested here but took it a few steps farther.
i pulled the vendor master and i pulled teh customer master
i cleaned the data and compiles simple pure clean 1 column tables
then i created a form for the comboboxes
first came ap with vendor name
then vendor number
then vendor remittance location
our orderering facility number
selecting a vendor name populates the vendor number box with the possible vendor numbers. same for remittance location and ordering facility
i pulled a year's worth of transaction data and created a gl table. some vendors have only ever used 1 gl acount. some several. there's a % number next to each gl. that represents the value of transactions posted from that vendor to that gl for the last year.
next up a date picker and text boxes for invoice fields.
get it nice and tight on a form, set the tab stops, add a commit button and all of a sudden we have a front end excel platform that performs better than oracle - because people use
i haven't finished the ar side but it'll get done and i'll the angels will be rejoicing and singing my name for years to come.
1:1 matching with autofill and drop down functionality you can't beat it. that's what you get with my suggestion.
best of luck!

Generating a unique identifier list with multiple tables & criteria

I'm not a coder, just someone who uses excel for basic estimating functions at work. However I've found myself in need of a complex list or index system.
Background/Intent: (Skip below if doesn't matter.) In an apartment building construction they build buildings like opened books - mirror images of 2 bed 2 bath apartments, for example. There is a standard "typical" unit and then the mirror image across the hall, the "reverse" unit. The door swings are all opposite from one to the other. My job is to figure out how to give each door a unique identifier code based on: Bldg No., Unit Type, Door No., Door Swing (left or right.) The raw data tables are provided below.
I've attempted to clean this up as much as possible, but there are two steps (I think) to this process.
Step 1:
The raw data table is on the left. My output field is on the right. I want to be able to select a drag down box, like data validation list, and select the building. Then a formula (which one?) spits out a list of every unit type per building. For example, Bldg 5 has 2 each of "A1 Typ." How do I get the formula to recognize that if there are 2 of them, to produce 2 separate lines for "A1 Typ." And so on and so forth until all 41 occurences/units have been accounted for and labeled appropriately. Some occur once, some multiple times, and some zero.
Step 1
From there, Step 2.
I want to use this output field again to automate another sequence, this time pulling from a different table, see picture. Now, depending on the unit type under the "type" column, I want it to expand each unit type showing each indivudual door number (1,2,3 etc through 12) and if it's an L (LH) or R (RH), and if there is more than one, to list out each occurence. (what formula?)
Then the decriptor text that will pop up under "DOOR LABEL" column would just be a joining of several fields to give a unique identifier. (suggestions?)
Step 2.
Easy right? Is this too much for excel, or can this be done?
Thanks so much for considering helping me out!

Dynamic Range with Categorical Variables

I'd like to sort a time series of exam performance by one of three categories:
Ideally, a function would sort the scores by "difficulty" while still preserving chronological order. I'd like to do this without filters etc. Something like this is very close, but not quite there. Do I need to use dynamic ranges? Or can I just define data ranges in the table dialog with VLOOKUP or INDEX/MATCH?
I'm thinking a bar graph would be the easiest way to illustrate the data, but I'm open to suggestions. New scores are added every day, with varying difficulties.
Here is the spreadsheet if anyone would like to look it over.
EDIT:
The output visualization could be, for example, a clustered bar graph, but with only one label per category. The idea is that I'd like to preserve chronological order without necessarily having to mark it on the graph.
Would there, for instance, be a quick-and easy and formula-driven way to put these 14 and 17 values for "score" all together under one label? I feel like 17 bar graphs clustered too closely would be hard to read.
I realize this is more of a formatting than a formula issue, but I appreciate input with regards to both.
I would recommend you add a Table over the data in the workbook. One for verbal and one for math. The upside is that it will automatically grow with your data as you add new rows. This is very helpful because charts and other things will automatically refer to the new data. Add one with CTRL+T or Insert->Table on the Ribbon.
Once you have the Table, you can easily do the sorting bit by adding a two column sort onto the Table. This menu is accessible by right clicking in the Table and doing Sort->Custom Sort. Again, the Table is nice here because it will only sort the data within it (not the whole sheet) and will remember your settings. This lets you add new data and simply do Data->Reapply to get it to sort again. Your sort on Difficulty is going to be alphabetic unless you add a number at the front. Here is the sorting step:
With this done, you can create a quick chart based on that data. For the "implicit chronology" you can simply plot score vs. difficulty for all of them since they are sorted.
To get closer to that matrix style display, you can easily create a PivotTable based on this Table and let it do the organizing by date/difficulty. Here is the result of that. I am using Average as the aggregation function since it appears that no dates have more than 1 score. If they did, it would be a better choice than Sum.

Resources