Custom sort with wildcard - excel

I'm working on automating removing some rows of data from multiple files. One of the criteria is to exclude all products except a specific list. Currently I'm doing that by iterating through the rows and deleting rows that aren't for the right product.
I'm thinking that sorting the rows so that the products to be kept are first, and the ones to be deleted are clumped together lower down in the data would be faster, as I could then just find the first bad row and clear contents (as opposed to Range("<XX:XX>").Delete Shift:=xlUp).
The trouble I'm having is that the products actually present in any of the files varies, as does the list of products to be kept. It's pretty much a unique list of products to keep for each of over a dozen files.
So what I'm hoping is that there is a way when I specify the custom sort list such that I can have a single item for all other products I didn't explicitly list.
For example, if I wanted sort the alphabet "V, C, R, ", would there be a way to do so without specifically listing every other letter in the alphabet?
I'm avoiding specifying the remainder of the list for two reasons. One, the list is fairly long and retyping the full list for each of the files I need to sort would be pretty error prone and hard to maintain. Two, the product list is not necessarily static going forward, so I don't want to have to update the macro each time a new product is added.

I'm not sure I fully understood your question, but perhaps using a column with a formula might help?
It can be automated, by coding to add a column, set the formulas in the newly created range, and clear or sort based on a filter on the new range.

Related

Return Column Index for Each Matching Value in a Row?

Excel Novice using Excel for Business (Online) here.
I am having a difficult time wrapping my head around a way to write a function that does three things for me.
Check a specific row of data in a large table, preferably matching ID's between the two tables (I've been using XLOOKUP to ensure that results are keyed to a specific ID)
Find and return the Column index for every cell within the row with the string "Yes"
SUM associated points tied to the column indexes.
I am creating a new QA Scoring system, and all of the questions share the potential for "Yes" and "No" but depending on the question the number of points will be different. I have been approaching this with the idea that I could return the column indexes, convert them to the points associated to the column indexes, and then SUM them for a score, but I am open to different ideas.
Click Here for a Demo of what I am trying to do, included is the actual data set I was using.
Possible Points
=SUMIFS($O$2:$AH$2,INDEX(Raw_Data!D:W,MATCH(Scoring!A4,Raw_Data!A:A,0),0),"*")
Points Earned
=SUMIFS($O$2:$AH$2,INDEX(Raw_Data!D:W,MATCH(Scoring!A4,Raw_Data!A:A,0),0),"Yes")
These will work with your present setup, but in the long run it would probably be best to just add Possible Points and Points Earned columns to your Table3, if possible.

How do I create a list that automatically pulls only unique instances from a second list - even when the second list is changed?

I know how to do this by hand, but I'd like to automate it since my 2nd list will change. My list is relatively long 5k+ with 2k+ uniques. I have found an array solution on exceljet but after trying it, it looks like the operation would take hours to execute.
Thank you,
Excel 2016 Windows.
You can Create an Excel Advanced Filter (tutorials)
leave the criteria values blank not the header,
and apply unique records
this is the fastest way to get unique records for long lists

How do one extract information from a dynamic table, automatically through excel functions?

I have been searching high and low for a way to solve my dilemma, in different ways, so I am trying to post both of the things I've been trying to do:
The challenge version 1:
I want to extract the entire row with information tied to the name which is the latest entry of that name in the table. So from the table below I would want to collect the entire row which contains the information: "A, Jack Black, 01.01.2029, 10:20". I simply want to copy the entire row to another sheet. But one important factor is that it has to happen automatically.
So i need functions which can check if: Is there another entry with the same name, higher up in the table? If so, DO NOT COPY THE ROW. If there ain't another entry with the exact same name higher up in the table, COPY THE ENTIRE ROW, to another table, within another sheet.
The challenge version 2:
What I really want to do is count the number of unique people(unique names) per. department, and summarize this in another table. Basically this means that "Jack Black" should be counted as 1 person, in department A.
So the result I want, is a table looking like this (the one beneath), where the number of people does not contain any duplicate people (names). OR it does not function with a dynamic table, which updates the information it contains on the fly. I can make this happen if I am copying from a static table, but as stated above, the table is dynamic and updates with new information every minute...
So far i've tried excel's built in filtering, but this does not work automatically. I've also tried using functions like in this guide: https://excel-bytes.com/how-to-extract-a-dynamic-list-from-a-data-range-based-on-a-criteria-without-filters-in-excel/. However every solution i find seems to need criteria for filtering out duplicates or does not function when copying information from a dynamic table.
Does anyone know how to reach my desired result, without implementing criteria for selecting the rows or counting rows as stated above? VBA code is not an option at the moment :(
In advance, THANK YOU, I've really tried solving this, but I feel like this just might break my head wide open soon if I can't solve it. HEEEEELP!
Sincerely
haakonlu

List of items find almost duplicates

Within excel I have a list of artists, songs, edition.
This list contains over 15000 records.
The problem is the list does contain some "duplicate" records. I say "duplicate" as they aren't a complete match. Some might have a few typo's and I'd like to fix this up and remove those records.
So for example some records:
ABBA - Mamma Mia - Party
ABBA - Mama Mia! - Official
Each dash indicates a separate column (so 3 columns A, B, C are filled in)
How would I mark them as duplicates within Excel?
I've found out about the tool Fuzzy Lookup. Yet I'm working on a mac and since it's not available on mac I'm stuck.
Any regex magic or vba script what can help me out?
It'd also be alright to see how much similar the row is (say 80% similar).
One of the common methods for fuzzy text matching is the Levenshtein (distance) algorithm. Several nice implementations of this exist here:
https://stackoverflow.com/a/4243652/1278553
From there, you can use the function directly in your spreadsheet to find similarities between instances:
You didn't ask, but a database would be really nice here. The reason is you can do a cartesian join (one of the very few valid uses for this) and compare every single record against every other record. For example:
select
s1.group, s2.group, s1.song, s2.song,
levenshtein (s1.group, s2.group) as group_match,
levenshtein (s1.song, s2.song) as song_match
from
songs s1
cross join songs s2
order by
group_match, song_match
Yes, this would be a very costly query, depending on the number of records (in your example 225,000,000 rows), but it would bubble to the top the most likely duplicates / matches. Not only that, but you can incorporate "reasonable" joins to eliminate obvious mismatches, for example limit it to cases where the group matches, nearly matches, begins with the same letter, etc, or pre-filtering out groups where the Levenschtein is greater than x.
You could use an array formula, to indicate the duplicates, and you could modify the below to show the row numbers, this checks the rows beneath the entry for any possible 80% dupes, where 80% is taken as left to right, not total comparison. My data is a1:a15000
=IF(NOT(ISERROR(FIND(MID($A1,1,INT(LEN($A1)*0.8)),$A2:$A$15000))),1,0)
This way will also look back up the list, to indicate the ones found
=SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A1)*0.8)),$A3:$A$15000,1)),0,1))+SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A$1:$A1,1)),0,1))
The first entry i.e. row 1 is the first part of the formula, and the last row will need the last part after the +
try this worksheet fucntions in your loop:
=COUNTIF(Range,"*yourtexttofind*")

Grouping rows by area codes

I have a table of customers to which my company ships products. The problem is that these customers need to be sorted by their area codes, so that the products can be sent to the appropriate shipping companies (we have two partner companies that ship to certain parts of the country). Each company sent us a list of area code numbers to which they can ship and I need to divide the Excel sheet into two sheets, each containing the customers with the area codes compatible with the respective company.
I tried to solve this problem with VLOOKUP function, but it only works on individual row basis, and I need a solution that will find all rows that contain a number from the specified group of area codes.
Another way would be IF function that would put a True or False (one IF function for each company) value in new column and then I could sort by that value, and copy the data into a new sheet. This approach would work, but the IF function would be extremely long and hard to control.
Can you suggest a way to solve this problem?
Edit to incorporate details provided via Comment:
Presently I have about 5,000 rows but in future it might be more though I doubt over 10,000 rows.
A VLOOKUP seems very promising, of the kind =VLOOKUP($B2,F:G,1,0) in C2 copied across and down as required, with a layout as below:
This does not group as you say you require (but do you really need to?) because it seems possible some locations will be served by both shippers. You might resolve this by flagging those rows where both are viable and then by sorting to split into three groups (Shipper1 only, Shipper2 only, both) before transferring the ranges as desired.
Edit in response to OPs comment
If you can be certain there is no overlap between Shippers, a single column with this formula, say in E2copied down, might be preferable:
=IF(ISERROR(MATCH(B2,F:F,0)>0),"Shipper2","Shipper1")
and would not routinely show #N/A. (This assumes no area is outside the range of both shippers.)

Resources