Relabeling Duplicates in Excel of Cells in direct proximity - excel

Apologies for the title gore - was trying to be descriptive.
I have a large lab result data set, and it has been found that one analyte was screened for twice per sample, and I need to capture both sets of results. This leaves me with a table similar to the one below, where Antimony is listed twice. Is there a way to automate something, either to flag the instances where I have two rows like that or to rename them to antimony-1 and antimony-2? Since I have 300 sites screened for the same things, everything shows up as a duplicate and I can't use the simple methods. The main trigger is the proximity to another row where everything matches except the results.

Assuming the data in your screenshot starts in cell A1 (with Soil as your site), I'd add two columns. The first combines Site & Element (Column F in my example):
=A1&C1
Result: SoilAluminium
In the column next to that I'd have a formula:
=F1&COUNTIF($F$1:F1,F1)
Result - SoilAluminium1, SoilAntimony1, SoilAntimony2 etc
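Note: Pay attention to the $ signs. $F$1 stays fixed while F1 moves with the row, so the counted range grows as the formula is copied down.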
I hope that works
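For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same running-count idea. The function name `label_duplicates` and the sample rows are made up for illustration; the two commented lines correspond to the two helper columns described above.

```python
from collections import defaultdict


def label_duplicates(rows):
    """Append a running occurrence count to each site+analyte key."""
    seen = defaultdict(int)
    labels = []
    for site, analyte in rows:
        key = f"{site}{analyte}"      # mirrors =A1&C1
        seen[key] += 1                # mirrors COUNTIF($F$1:F1,F1)
        labels.append(f"{key}{seen[key]}")
    return labels


# Hypothetical sample data: one site, Antimony screened twice.
rows = [("Soil", "Aluminium"), ("Soil", "Antimony"), ("Soil", "Antimony")]
print(label_duplicates(rows))
```

Because the count only looks at rows seen so far, the first occurrence always gets 1 and each repeat gets the next number, which is what the anchored `$F$1:F1` range achieves in the sheet.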

Related

Match and Conditional Formatting from Matrix Table

I am looking for help with my matrix table: is there a good (or best) approach to properly matching dependent instances in a matrix using drop-downs?
This picture represents my matrix table (Picture 1):
As you can see, there are a lot of instances, but the table has the same number of "headers" horizontally and vertically. The 1s actually represent incompatibility in my case, but let's simply call it a "match". This is on one sheet, which is going to be populated with new values from time to time.
Another sheet, which shows the data and their compatibility possibilities, is equipped with drop-downs. There you have "Groups" (Group1, Group2, ...) as main parts and "dependent groups" (AA1, BB2, ...) as the small components that are part of the main parts. To avoid misunderstanding, here are the explanations; for the sake of this example I used fictional values:
Groups aka. Main Parts
Dependent groups aka. components
Below is my fictional table, which follows exactly the same concept I need to use in my real case.
I put an explanation in Picture 2 so you can follow along and see exactly where and what I did.
First I used =MATCH functions, one for the vertical position (A3) and one for the horizontal (B4). The boolean row is done using =OR(INDEX(...)), referring to the match positions, as you can see. From there I use TRUE/FALSE to color my group boxes when compatibility is possible - that's all the science.
So my question is whether there is another approach to this problem. As you can see, I have 3 different rows of functions in one place, and with more "groups" that could grow into many more rows and calculations.
Picture 2
EDITED:
This is a screenshot of the original sheet. I hid some rows that contained info, which is why the row numbers are not consecutive. As you can see, it is almost the same as the dummy example I provided above. Underneath every "box" there are three rows of calculations, as I mentioned before. The two "2"s that you see here are the positions of a value that I found using the =MATCH function, one for the horizontal and one for the vertical lookup. In this case it is the model type: 070FX is position 2, 100FX is position 3 and 200FX is position 4 in the matrix table, and so on for all the other groups. Those groups (Model, Endpoint, Gas sensor...) are defined separately on another sheet, where I had to make a unique list and a dependent list so I can reference them in my drop-down lists.
EDIT no. 4: This is the formula I used for TRUE/FALSE:
=SUMPRODUCT(('0359-matrix'!$A$2:$A$101=F10)*(('0359-matrix'!$B$1:$CW$1=$B$10)+('0359-matrix'!$B$1:$CW$1=$C$10)+('0359-matrix'!$B$1:$CW$1=$D$10)+('0359-matrix'!$B$1:$CW$1=$E$10)+('0359-matrix'!$B$1:$CW$1=$F$10)+('0359-matrix'!$B$1:$CW$1=$G$10)+('0359-matrix'!$B$1:$CW$1=$H$10)+('0359-matrix'!$B$1:$CW$1=$I$10)+('0359-matrix'!$B$1:$CW$1=$J$10)+('0359-matrix'!$B$1:$CW$1=$K$10)+('0359-matrix'!$B$1:$CW$1=$L$10)+('0359-matrix'!$B$1:$CW$1=$M$10)+('0359-matrix'!$B$1:$CW$1=$N$10)+('0359-matrix'!$B$1:$CW$1=$O$10)+('0359-matrix'!$B$1:$CW$1=$P$10)+('0359-matrix'!$B$1:$CW$1=$Q$10)+('0359-matrix'!$B$1:$CW$1=F13)+('0359-matrix'!$B$1:$CW$1=G13)+('0359-matrix'!$B$1:$CW$1=H13)+('0359-matrix'!$B$1:$CW$1=I13)+('0359-matrix'!$B$1:$CW$1=J13))*'0359-matrix'!$B$2:$CW$101)>0
I copied only the last part, starting from the second row, because the whole function is too long to write out - it gets cut off automatically.
('0359-matrix'!$B$1:$CW$1=$Q$10)+('0359-matrix'!$B$1:$CW$1=$B$13)+('0359-matrix'!$B$1:$CW$1=$C$13)+('0359-matrix'!$B$1:$CW$1=$D$13)+('0359-matrix'!$B$1:$CW$1=$E$13)+('0359-matrix'!$B$1:$CW$1=$F$13))*'0359-matrix'!$B$2:$CW$101)>0
But in the marked cells I am getting the same results: B22 - F22 has the same (boolean) values as B21 - F21, which shouldn't be the case. Going by the colors, green is FALSE. It must have something to do with an array reference.
Check out the following. A1 to E5 is the matrix that shows which pieces are incompatible (=1). The others have to be empty or 0.
In cell I8 I used the following formula (and copied it down up to I11):
=SUMPRODUCT(($A$2:$A$5=H8)*(($B$1:$E$1=$H$8)+($B$1:$E$1=$H$9)+($B$1:$E$1=$H$10)+($B$1:$E$1=$H$11))*$B$2:$E$5)
The formula result shows the number of incompatibilities a part has. E.g. AA1 has one incompatibility (with BB2), but BB2 is incompatible with two parts, AA1 and CC3.
To get TRUE/FALSE, use the same formula and append >0, like =SUMPRODUCT(…)>0
For any additional "group" (Model, Endpoint, …) you need to add another +($B$1:$E$1=$H$12), where $B$1:$E$1 points to your matrix header row and $H$12 to your selected group value.
Overview of the formula ranges:
Note that this kind of calculation can only tell you the number of incompatibilities a part has, not the names of the parts that are incompatible.
Edited horizontal version
Formula in the selected cell is
=SUMPRODUCT(($A$2:$A$5=G8)*(($B$1:$E$1=$G$8)+($B$1:$E$1=$H$8)+($B$1:$E$1=$I$8)+($B$1:$E$1=$J$8))*$B$2:$E$5)
You can fill it to the right.
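To make it easier to see what the SUMPRODUCT formula is adding up, here is a rough Python sketch of the same count. The part names, the 4x4 matrix values and the function name `incompatibility_counts` are illustrative assumptions, chosen so that AA1 has one incompatibility and BB2 has two, as in the example above.

```python
def incompatibility_counts(parts, matrix, selected):
    """For each selected part, sum the 1s in its matrix row under the
    columns of the selected parts (the role of the + terms in SUMPRODUCT)."""
    index = {name: i for i, name in enumerate(parts)}
    counts = {}
    for part in selected:
        row = index.get(part)
        if row is None:
            counts[part] = 0  # unknown part: no row matches, so the sum is 0
            continue
        counts[part] = sum(
            matrix[row][index[other]] for other in selected if other in index
        )
    return counts


# Hypothetical matrix: 1 marks an incompatible pair, 0 a compatible one.
parts = ["AA1", "BB2", "CC3", "DD4"]
matrix = [
    [0, 1, 0, 0],  # AA1 clashes with BB2
    [1, 0, 1, 0],  # BB2 clashes with AA1 and CC3
    [0, 1, 0, 0],  # CC3 clashes with BB2
    [0, 0, 0, 0],  # DD4 clashes with nothing
]
counts = incompatibility_counts(parts, matrix, parts)
flags = {part: count > 0 for part, count in counts.items()}  # the ">0" step
print(counts)
print(flags)
```

As with the formula, the result is only a count per part; the names of the clashing parts would need a separate lookup.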

Create an Excel Formula that uses filtered data

I'm trying to design a second page that shows % results of my data on page 1.
For example, Column F & G allow manual entry of numbers 1-4 which are based off data the user types in at another location.
This is being used for trade tracking in investments, so there will be quite a few numbers, but the end result is that each row will show a specific stock, its subsequent data, whether it made or lost money, etc.
What I want to do in page 2 is using the numbers 1-4 which were typed in at columns F & G, translate that into an edge on page 2.
For example, if there were 50 columns of data typed out for trades executed, I could take the number of winning trades of a certain setup (say number 3) and divide that by the total trades of 50 to come out with a win % for that setup.
However, I have no clue how to translate that formula into a filter formula so that on page 2, for the numbers 1-4 (4 different setups), I could easily see the highest and lowest win % and determine the best setup to use.
I'm not the best in Excel, but I understand enough to code most of that. I simply have no idea how to take that end formula and add a filter to it so that it only uses partial results. I've got 4 other formulas I want to use on page 2 as well, to help build something that could really benefit me, but if someone could just show me how to filter data into a formula, I think I could take it from there.
Thanks for the help
Ben
You can also do something like this with array formulas
=MAX(IF(Sheet1!$F$2:$F$50=$A2,Sheet1!$E$2:$E$50))
(Press Ctrl+Shift+Enter [CSE], instead of just Enter when entering Array Formulas)
Also, take a look at the SUMPRODUCT function. It comes in very handy for filtering data. Here are some helpful links...
https://www.get-digital-help.com/2017/12/07/sumproduct-multiple-criteria/
https://www.get-digital-help.com/2017/12/08/sumproduct-and-if-function/
https://www.get-digital-help.com/2010/09/01/extract-a-unique-distinct-list-by-matching-items-that-meet-a-criterion-in-excel/
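As a rough illustration of what "filtering inside a formula" means, the Python sketch below mirrors the MAX(IF(...)) array formula and adds a per-setup win rate. The column contents, the sample numbers and both function names are assumptions for illustration, and the win rate here divides by the trades of that setup only, which may differ from the denominator you want.

```python
def max_if(criteria, values, target):
    """Largest value among rows whose criterion equals target.
    Returns None when nothing matches (Excel's MAX(IF(...)) would show 0)."""
    matches = [v for c, v in zip(criteria, values) if c == target]
    return max(matches) if matches else None


def win_rate(criteria, values, target):
    """Share of matching rows with a positive result, or None if no rows match."""
    matches = [v for c, v in zip(criteria, values) if c == target]
    if not matches:
        return None
    return sum(1 for v in matches if v > 0) / len(matches)


# Hypothetical data: setup number per trade (column F) and profit per trade (column E).
setups = [1, 3, 3, 2, 3]
profits = [10.0, -5.0, 20.0, 7.5, 15.0]
print(max_if(setups, profits, 3))
print(win_rate(setups, profits, 3))
```

The pattern is the same in both functions: keep only the rows where the criterion matches, then aggregate what is left.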

List of items find almost duplicates

Within Excel I have a list of artists, songs, and editions.
This list contains over 15000 records.
The problem is that the list contains some "duplicate" records. I say "duplicate" because they aren't a complete match. Some might have a few typos, and I'd like to fix this and remove those records.
So for example some records:
ABBA - Mamma Mia - Party
ABBA - Mama Mia! - Official
Each dash indicates a separate column (so 3 columns A, B, C are filled in)
How would I mark them as duplicates within Excel?
I've found out about the Fuzzy Lookup tool. Yet I'm working on a Mac, and since it's not available on Mac I'm stuck.
Any regex magic or VBA script that can help me out?
It'd also be all right to see how similar the rows are (say 80% similar).
One of the common methods for fuzzy text matching is the Levenshtein (distance) algorithm. Several nice implementations of this exist here:
https://stackoverflow.com/a/4243652/1278553
From there, you can use the function directly in your spreadsheet to find similarities between instances:
You didn't ask, but a database would be really nice here. The reason is you can do a cartesian join (one of the very few valid uses for this) and compare every single record against every other record. For example:
    select
      s1.group, s2.group, s1.song, s2.song,
      levenshtein (s1.group, s2.group) as group_match,
      levenshtein (s1.song, s2.song) as song_match
    from
      songs s1
      cross join songs s2
    order by
      group_match, song_match
Yes, this would be a very costly query, depending on the number of records (in your example 225,000,000 rows), but it would bubble to the top the most likely duplicates / matches. Not only that, but you can incorporate "reasonable" joins to eliminate obvious mismatches, for example limiting it to cases where the group matches, nearly matches, begins with the same letter, etc., or pre-filtering out groups where the Levenshtein distance is greater than x.
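For anyone who wants to see what the distance actually computes, here is a small Python sketch of the standard dynamic-programming Levenshtein distance, plus a similarity ratio. The `similarity` helper (one minus distance over the longer length) is just one common way to get an "80% similar" style figure and is an assumption, not something from the linked implementations.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("Mamma Mia", "Mama Mia!"))
print(similarity("Mamma Mia", "Mama Mia!"))
```

A threshold on `similarity` can then be used to flag candidate pairs for manual review rather than deleting them outright.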
You could use an array formula to flag the duplicates, and you could modify the one below to show the row numbers. It checks the rows beneath the entry for any possible 80% dupes, where 80% means the first 80% of the characters read left to right, not a comparison of the whole string. My data is in A1:A15000.
=IF(NOT(ISERROR(FIND(MID($A1,1,INT(LEN($A1)*0.8)),$A2:$A$15000))),1,0)
This version also looks back up the list, to flag the ones already found:
=SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A3:$A$15000,1)),0,1))+SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A$1:$A1,1)),0,1))
The first entry (row 1) only needs the first part of the formula, and the last row only needs the last part, after the +.
Try this worksheet function in your loop:
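The same prefix check can be sketched in Python to show what the two SUM terms count. The function name `prefix_match_counts` and the sample list are invented for illustration; like FIND, the `in` test is case-sensitive.

```python
def prefix_match_counts(entries, ratio=0.8):
    """For each entry, count the other entries that contain the first
    `ratio` share of its characters (rows below and rows above)."""
    counts = []
    for i, entry in enumerate(entries):
        prefix = entry[: int(len(entry) * ratio)]  # mirrors MID(A,1,INT(LEN(A)*0.8))
        counts.append(sum(
            1 for j, other in enumerate(entries)
            if j != i and prefix in other          # mirrors NOT(ISERROR(FIND(...)))
        ))
    return counts


# Hypothetical single-column data.
entries = [
    "ABBA - Mamma Mia",
    "ABBA - Mamma Mia!",
    "Queen - Bohemian Rhapsody",
]
print(prefix_match_counts(entries))
```

Because only the leading characters are compared, a typo near the start of an entry will not be caught by this approach.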
=COUNTIF(Range,"*yourtexttofind*")

Break-Down Data in Excel without VBA (Formula Only)

I am often required to provide some type of break-down to customers - an example is shown in the attached figure.
I have a table of data ("TABLE DATA", which is some type of pivot), and the customer provides its official form, whose structure must be preserved (highlighted in yellow). Basically, I need to separate the cost details of CODE "A" and CODE "B" into 2 separate sections.
The customer requires me to provide details for each individual part (the example shows Part A - "Break-Down Part A").
Is there any way to put an "ITEM" from "TABLE DATA" into Code A and Code B? The rest (Price, Quantity) can be solved with VLOOKUP. Note: "ITEM" contains non-duplicated values. Thank you very much.
Number your rows in the breakout using =1 and =A1+1 and then just use the formula ="B-ITEM"&TEXT(A1,"000"). If you want to skip making a counter column you could use ="B-ITEM"&TEXT(ROW()-1,"000") to just use the current row number (minus 1 or however many you need).
If your items aren't sequential like that, but are still unique, I would recommend adding counters on the original tab similar to what you have, which would let you quickly find the 5th A or 7th B: something that counts the previous instances of your current type and then adds 1. For Row 6 you could do =COUNTIF(A$1:A5,A6)+1.
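Here is a short Python sketch of both ideas, the zero-padded label and the per-code counter that lets you pick out the nth item of a code. All function names and the sample codes and items are hypothetical.

```python
def item_label(code, n):
    """Mirrors ="B-ITEM"&TEXT(A1,"000") for a given code and counter."""
    return f"{code}-ITEM{n:03d}"


def running_counters(codes):
    """Mirrors =COUNTIF(A$1:A5,A6)+1: previous instances of the code, plus 1."""
    seen = {}
    counters = []
    for code in codes:
        seen[code] = seen.get(code, 0) + 1
        counters.append(seen[code])
    return counters


def nth_item(codes, items, code, n):
    """Return the item on the row where `code` appears for the nth time."""
    for row_code, counter, item in zip(codes, running_counters(codes), items):
        if row_code == code and counter == n:
            return item
    return None


# Hypothetical source table: a code column and an item column.
codes = ["A", "B", "A", "A", "B"]
items = ["bolt", "nut", "washer", "screw", "rivet"]
print(item_label("B", 7))
print(running_counters(codes))
print(nth_item(codes, items, "A", 3))
```

In the sheet, the counter column plays the role of `running_counters`, and a lookup on code plus counter plays the role of `nth_item`.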

How to search for a partial and an absolute in excel to get an answer?

I have a worksheet where I need a search that does more than one query. The problem I am running into is this:
The workbook has two tabs: the first is Jobs, the second is OOR. In OOR there are several empty columns: Order Qty., Orig Promise Date, and Shop Order.
Now, I know there are duplicates, and this is fine. What I am looking at now is using Column B in OOR as a reference. So in this case, use B3 as the reference point, which is a partial number of 48900421 Rev 2. What I want to do is use two reference points.
I want to look up B3 in OOR and use two reference points to guarantee the correct job is referenced. Those two reference columns are in Jobs. The first is Column B, which will always equal Dakota Systems, Inc., and the other is Column C, but this is where I don't know what to do: since C3 in OOR only shows 48900421, it will never find 48900421 Rev 2. I thought about using something like this:
=IFERROR(INDEX(Jobs!$E:$E,MATCH(1,INDEX((OOR!$C:$C=$B3)*(Jobs!$C:$C="Dakota Systems, Inc."),1),0)),"")
But for some reason I am getting a blank when I don't think I should be. I'm losing my sanity this late in the week; can someone help?
https://dl.dropbox.com/u/3327208/Excel/twosearches.xlsx
You don't seem to be referencing the right columns, and you also need a zero in the second INDEX function, not a 1.
Try this version in OOR!I3, copied down, using ISNUMBER(FIND(...)) to find your part number within other text:
=IFERROR(INDEX(Jobs!E$3:E$1000,MATCH(1,INDEX(ISNUMBER(FIND(B3,Jobs!C$3:C$1000))*(Jobs!B$3:B$1000="Dakota Systems, Inc."),0),0)),"")
Format the result cell in the required date format.
Revised in response to the comment below:
=IFERROR(INDEX(Jobs!E$3:E$1000,MATCH(1,INDEX(ISNUMBER(FIND(B3,Jobs!C$3:C$1000))*(Jobs!B$3:B$1000="Dakota Systems, Inc.")*(Jobs!A$3:A$1000=M3),0),0)),"")
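To show what the INDEX/MATCH combination is doing, here is a loose Python sketch of a first-match lookup with a substring test on one column and an exact test on another. The dictionary keys, the sample rows and the function name `lookup_job` are all hypothetical. One difference to keep in mind: Excel's = comparison on text ignores case, while Python's == does not.

```python
def lookup_job(jobs, part_number, customer):
    """Return the promise date of the first job whose part text contains
    part_number and whose customer matches exactly; "" if none is found."""
    for job in jobs:
        contains_part = part_number in job["part"]     # ISNUMBER(FIND(B3, Jobs!C...))
        same_customer = job["customer"] == customer    # Jobs!B... = "Dakota Systems, Inc."
        if contains_part and same_customer:            # product of the two tests equals 1
            return job["promise_date"]
    return ""                                          # the IFERROR(..., "") fallback


# Hypothetical rows standing in for the Jobs tab.
jobs = [
    {"customer": "Other Co.", "part": "48900421 Rev 2", "promise_date": "2013-01-10"},
    {"customer": "Dakota Systems, Inc.", "part": "48900421 Rev 2", "promise_date": "2013-02-01"},
]
print(lookup_job(jobs, "48900421", "Dakota Systems, Inc."))
```

Adding a third criterion, as in the revised formula, just means adding one more condition to the `if`.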

Resources