Compare multiple columns, pull out only cells that appear in every column - excel

I have 10 or so columns in my worksheet. Each column contains about 200 names, and there is no other data on the sheet.
What I'd like to do is create a new column that contains only the names common to all the columns. So essentially compare each cell in each column to all the other cells in all the other columns, and return only the common cells.
For example:
Column1 : name_A, name_C, name_F
Column2: name_C, name_B, name_D
Column3: name_C, name_Z, name_X
So in this example, the new column would only contain name_C, because it's the only value common to all three columns.
Is there any way to do this? My knowledge of Excel is quite poor, and I can't find anything similar to my problem online so I would appreciate any help.
Thanks for reading,
N

Putting everything on a single worksheet and creating a pivot table is probably more efficient than the algorithm you have in mind.

Here is my mock-up. I added extra names to demonstrate better.
D(formula) has the easiest version. It lists only the values that appear in all columns, but each one appears on the same row as the corresponding name in column A, with blanks in between and unsorted (giving D(result)).
If you would like all the names to appear at the top, as shown here in column E, you can either sort your table (you will have to re-sort if the columns change) or use my solution below:
Get yourself the MoreFunc add-in for Excel (here is the last working download link I found, and here is a good installation walk-through video).
Once that is done, select cells E1:E8, click the formula bar and type the following: =UNIQUEVALUES(IF(COUNTIF(A2:C9,A2:A9)=3,A2:A9,""))
Accept the formula by pressing Ctrl+Shift+Enter (this creates an array formula, and curly braces will appear around your formula).
    A       B       C       D(formula)                            D(result)   E(result - sorted)
-------------------------------------------------------------------------------------------------------
1 | name_A  name_C  name_C  =IF(COUNTIF($A$1:$C$8,A1)=3,A1,"")                name_m
2 | name_C  name_B  name_Z  =IF(COUNTIF($A$1:$C$8,A2)=3,A2,"")    name_C      name_C
3 | name_F  name_D  name_X  =IF(COUNTIF($A$1:$C$8,A3)=3,A3,"")
4 | name_t  name_o  name_g  =IF(COUNTIF($A$1:$C$8,A4)=3,A4,"")
5 | name_y  name_p  name_h  =IF(COUNTIF($A$1:$C$8,A5)=3,A5,"")
6 | name_u  name_k  name_7  =IF(COUNTIF($A$1:$C$8,A6)=3,A6,"")
7 | name_i  name_5  name_9  =IF(COUNTIF($A$1:$C$8,A7)=3,A7,"")
8 | name_m  name_m  name_m  =IF(COUNTIF($A$1:$C$8,A8)=3,A8,"")    name_m
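
For comparison outside Excel, the same idea can be sketched in Python as a set intersection. This is only an illustration, not part of the answer above: the function name common_names and the sample lists (taken from the question) are assumptions. One difference worth noting is that the COUNTIF(...)=3 test relies on each name appearing at most once per column, whereas an intersection is unaffected by duplicates within a column.

```python
def common_names(columns):
    """Return the names present in every column, in first-column order."""
    if not columns:
        return []
    # Names shared by all columns; duplicates within a column do not matter.
    shared = set(columns[0]).intersection(*columns[1:])
    # Walk the first column so the output order is stable, skipping repeats.
    result = []
    for name in columns[0]:
        if name in shared and name not in result:
            result.append(name)
    return result


columns = [
    ["name_A", "name_C", "name_F"],
    ["name_C", "name_B", "name_D"],
    ["name_C", "name_Z", "name_X"],
]
print(common_names(columns))  # ['name_C']
```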

Related

How to replace text in column by the value contained in the columns named in this text

In PySpark, I'm trying to replace the column names that appear in the text of the calc column (a formula) with the values held in those columns.
To be clear, here is an example:
Input:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |param_1-param_2
|Cell 3 |Cell 4 |param_2/param_1
Output needed:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |Cell 1-Cell 2
|Cell 3 |Cell 4 |Cell 4/Cell 3
In the calc column, the default value is a formula. It can be as simple as the ones above, or it can be something like "2*(param_8-param_4)/param_2-(param_3/param_7)".
What I'm looking for is a way to substitute every param_x with the value in the column of the same name.
I've tried a lot of things but nothing works, and most of the time when I use replace or regexp_replace with a column as the replacement value, I get the error "Column is not iterable".
Moreover, the columns param_1, param_2, ..., param_x are generated dynamically, and a calc value may reference some of these columns but not necessarily all of them.
Could you help me with a dynamic solution?
Thank you so much.
Best regards
Update: It turned out I misunderstood the requirement. This would work:
from pyspark.sql import functions as F
# Replace each column name found in calc with that row's value from the column
for exp in ["regexp_replace(calc, '"+col+"', "+col+")" for col in df.schema.names]:
    df = df.withColumn("calc", F.expr(exp))
Yet another update: to handle null values, add coalesce:
for exp in ["coalesce(regexp_replace(calc, '"+col+"', "+col+"), calc)" for col in df.schema.names]:
    df = df.withColumn("calc", F.expr(exp))
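
One caveat with the loop above: the column name is used as a plain pattern, so param_1 would also match inside param_10. The sketch below is plain Python rather than PySpark, and the helper name substitute_params and the sample row are assumptions; it only illustrates the per-row substitution with a whole-word match.

```python
import re


def substitute_params(row, formula_key="calc"):
    """Replace each column name in row[formula_key] with that row's value."""
    names = [k for k in row if k != formula_key]
    if not names:
        return row[formula_key]
    # Longest names first, with word boundaries, so param_1 never
    # matches inside param_10.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return pattern.sub(lambda m: str(row[m.group(1)]), row[formula_key])


row = {
    "param_1": "Cell 1",
    "param_2": "Cell 2",
    "param_10": "Cell 10",
    "calc": "param_10/param_1-param_2",
}
print(substitute_params(row))  # Cell 10/Cell 1-Cell 2
```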
------- Keeping the below section for a while just for reference -------
You can't do that directly, because you can't use a column's value as an expression unless you collect it into a Python object (which is obviously not recommended).
That said, this would work:
from pyspark.sql import functions as F, Window

df = spark.createDataFrame([["1", "2", "param_1 - param_2"], ["3", "4", "2*param_1 + param_2"]]).toDF("param_1", "param_2", "calc")
df.show()
# Number the rows so that each row's formula can be addressed in a CASE expression
df = df.withColumn("row_num", F.row_number().over(Window.orderBy(F.lit("dummy"))))
# Collect the formulas to the driver (only viable for small DataFrames)
as_dict = {row["row_num"]: row["calc"] for row in df.select("row_num", "calc").collect()}
# Build one CASE branch per row, embedding that row's formula as SQL
branches = " ".join(f"WHEN row_num = '{k}' THEN ({v})" for k, v in as_dict.items())
expression = f"CASE {branches} ELSE NULL END"
df.withColumn("Result", F.expr(expression)).show()

Comparing two cells and show both of them when it is not equal

I have a table in EXCEL that looks like this:
ATTRIBUTE_IN_1|TYPE_IN_1|ATTRIBUTE_IN_2 |TYPE_IN_2 |
________________________________________________________
advance |String | accounts |decimal(20,10) |
education |String | Job_category |String |
date of birth |date | advance |decimal(20,10) |
etc.
I want to compare TYPE_IN_1 and TYPE_IN_2 (2nd and 4th columns) wherever ATTRIBUTE_IN_1 and ATTRIBUTE_IN_2 (1st and 3rd columns) hold the same value.
In the result I want to see the rows where ATTRIBUTE_IN_1 and ATTRIBUTE_IN_2 are equal but TYPE_IN_1 and TYPE_IN_2 differ.
For the example above, the result should be:
ATTRIBUTE_IN_1|TYPE_IN_1|ATTRIBUTE_IN_2 |TYPE_IN_2 |
_______________________________________________________
advance |String | advance |decimal(20,10) |
What formula can I use in MS Excel to solve this problem?
Excel doesn't really lend itself to this type of question. You can cobble something together, perhaps along these lines (for attribute 1):
=IFERROR(INDEX($A$2:$A$10,INT(SMALL(IF(($A$2:$A$10=TRANSPOSE($C$2:$C$10))*($B$2:$B$10<>TRANSPOSE($D$2:$D$10)),
ROWS($A$2:$A$10)*(ROW($A$2:$A$10)-ROW($A$2))+TRANSPOSE(ROW($A$2:$A$10)-ROW($A$2))),ROW(1:1))/ROWS($A$2:$A$10))+1),"")
This sets up a 2D array where the rows are the attribute 1's and the columns are the attribute 2's. The array elements where there is a conflict are set to 1. The rest of the formula is to get the corresponding values out in row/column order.
For type 1:
=IFERROR(INDEX($B$2:$B$10,INT(SMALL(IF(($A$2:$A$10=TRANSPOSE($C$2:$C$10))*($B$2:$B$10<>TRANSPOSE($D$2:$D$10)),
ROWS($A$2:$A$10)*(ROW($A$2:$A$10)-ROW($A$2))+TRANSPOSE(ROW($A$2:$A$10)-ROW($A$2))),ROW(1:1))/ROWS($A$2:$A$10))+1),"")
For type 2:
=IFERROR(INDEX($D$2:$D$10,MOD(SMALL(IF(($A$2:$A$10=TRANSPOSE($C$2:$C$10))*($B$2:$B$10<>TRANSPOSE($D$2:$D$10)),
ROWS($A$2:$A$10)*(ROW($A$2:$A$10)-ROW($A$2))+TRANSPOSE(ROW($A$2:$A$10)-ROW($A$2))),ROW(1:1)),ROWS($A$2:$A$10))+1),"")
It's interesting to compare this with the SQL for doing the same thing e.g. in SQLite:
.mode csv
.import book2.csv test1
select a.attribute_in_1,a.type_in_1,b.type_in_2
from
test1 a inner join test1 b
on a.attribute_in_1=b.attribute_in_2 and a.type_in_1<>b.type_in_2;
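
The same self-join can also be sketched in plain Python, which may make the row-by-column comparison in the array formulas easier to follow. This is an illustration only; the function name type_conflicts is an assumption, and the sample rows are the ones from the question.

```python
def type_conflicts(rows):
    """rows: (attribute_in_1, type_in_1, attribute_in_2, type_in_2) tuples.

    Pair every attribute 1 with every attribute 2 and keep the pairs
    whose names match but whose types differ.
    """
    conflicts = []
    for attr1, type1, _, _ in rows:
        for _, _, attr2, type2 in rows:
            if attr1 == attr2 and type1 != type2:
                conflicts.append((attr1, type1, attr2, type2))
    return conflicts


rows = [
    ("advance", "String", "accounts", "decimal(20,10)"),
    ("education", "String", "Job_category", "String"),
    ("date of birth", "date", "advance", "decimal(20,10)"),
]
print(type_conflicts(rows))
# [('advance', 'String', 'advance', 'decimal(20,10)')]
```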

Excel: Convert single column of repeating text into multiple rows

Easy excel question? (I've looked, and looked...)
How do you take a single column with groupings of TEXT (e.g. A1:A5, A6:A10, A11:A15) and convert those groups into rows (e.g. A1:A5 converts to B1:F1; A6:A10 converts to B2:F2) Thanks. (Note: can't upload image b/c I'm new)
Here's a text example:
CURRENT:
(Convert this column (2000 entries in groups of 5))
Trinity
Neo
Morpheous
Agent Smith
Oracle
Dozer
Mouse
Zion
Nebuchadnezzar
The Twins
REFORMATTED:
(Data in rows from the groups of 5)
Trinity | Neo | Morpheous | Agent Smith | Oracle
Dozer | Mouse | Zion | Nebuchadnezzar | The Twins
With data in column A, in C1 enter:
=INDEX($A$1:$A$2000,COLUMNS($A:A)+5*(ROWS($1:1)-1),1)
Then copy C1 to D1 through G1
Then copy C1 through G1 downwards.
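
The index arithmetic in that formula, COLUMNS($A:A)+5*(ROWS($1:1)-1), can be sketched in Python to show which source entry each output cell picks up. The function names cell and reshape are assumptions made for this illustration.

```python
def cell(values, row, col, width=5):
    """Value for output row/col (both 1-based), using col + width*(row-1)."""
    return values[(col + width * (row - 1)) - 1]


def reshape(values, width=5):
    """Split a flat list into rows of `width` items (the last row may be shorter)."""
    return [values[i:i + width] for i in range(0, len(values), width)]


names = [
    "Trinity", "Neo", "Morpheous", "Agent Smith", "Oracle",
    "Dozer", "Mouse", "Zion", "Nebuchadnezzar", "The Twins",
]
print(cell(names, 2, 3))  # Zion (entry 3 + 5*(2-1) = 8)
for line in reshape(names):
    print(" | ".join(line))
```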
Obviously Gary's way is much quicker, but an easy-to-understand, if dirty, way is to do it like so:
Add formulas like so (if you are not putting a header row in, then use =MOD(ROW(A2),5)).
The data will look like so:
Then filter the last column to 1.

Extract data from two columns of one worksheet to another depending on presence of character in third column of a worksheet

I am looking for some help with a formula. On one worksheet, I have data set up similar to:
A B C D E
1 Test1 Test2 Test3
2 ID Name
3 T01 Confucius X
4 T02 Newton X
5 T03 Enstein X
6 T04 Plato X X
7 T05 Da Vinci X
It is similar to a comparison of different products against a set of features.
What I would like is, on a separate worksheet, when a test name (i.e. row1) is selected, information from column A and column B is returned ONLY if there is a check mark in the column for the respective test selected. In other words, if on a separate worksheet, I select Test3, then T04 Plato and T05 Da Vinci are returned. I would like this dynamic as my data spreadsheet will continue to grow with new IDs/Names and test(x).
I am open to formulaic or filter-type solutions.
I suggest a slight rearrangement of your labels and using a PivotTable:
This though would require refreshing of the PT if further values are added.
Edit re supplementary
I think what you want is a more amenable dataset. Replace Xs with the Test# in a single column (so two rows for Plato), then pivot:
You can use VLOOKUP, but you would need to put the values you are looking for (i.e. Test3) in the first column (i.e. "A"); this can be done by using VLOOKUP to a second sheet to put them in order.
A lot of VLOOKUPs get resource-heavy, so a better solution is probably to do something with Access, if you own it.
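
The "one row per check mark" layout suggested above can be sketched in Python to show why it is easier to filter. Only the Test3 marks for T04 and T05 come from the question; the other test assignments, and the function name rows_for_test, are assumptions for illustration.

```python
# (ID, Name, Test) records: one row per check mark, so Plato appears twice.
# Only the Test3 marks for T04 and T05 come from the question; the rest
# are assumed, since the column positions of the Xs are not recoverable.
records = [
    ("T01", "Confucius", "Test1"),
    ("T02", "Newton", "Test2"),
    ("T03", "Enstein", "Test1"),
    ("T04", "Plato", "Test2"),
    ("T04", "Plato", "Test3"),
    ("T05", "Da Vinci", "Test3"),
]


def rows_for_test(records, test):
    """Return (ID, Name) pairs that have a check mark for the selected test."""
    return [(id_, name) for id_, name, t in records if t == test]


print(rows_for_test(records, "Test3"))  # [('T04', 'Plato'), ('T05', 'Da Vinci')]
```

New IDs or tests only add rows to the list, so the filter itself does not need to change as the data grows.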

Match text from column within a certain cell - Excel

I have a column of a few thousand filenames that are not uniform. For instance:
| Column A | Column B |
===============================
| junk_City1_abunc | City1 |
-------------------------------
| nunk_City1_blahb | City1 |
-------------------------------
| small=City2_jdjf | City2 |
-------------------------------
| mozrmcity3_somet | City3 |
I would like to identify the city within the text in column A and return it in Column B.
I've come up with a complex formula that does the trick, but it is difficult to adjust if more cities are added within the filenames in new entries within column A.
Here is an example:
=IF(ISNA(MATCH("*"&$W$3&"*",I248,0)),IF(ISNA(MATCH("*"&$W$4&"*",I248,0)),IF(ISNA(MATCH("*"&$W$5&"*",I248,0)),IF(ISNA(MATCH("*"&$W$6&"*",I248,0)),IF(ISNA(MATCH("*"&$W$7&"*",I248,0)),IF(ISNA(MATCH("*"&$W$8&"*",I248,0)),"Austin","Orlando"),"Las Vegas"),"Chicago"),"Boston"),"Las Angeles"),"National")
It seems like there should be an easier way to do it, but I just can't figure it out.
(To make matters worse, not only am I identifying a city within the filename, I'm looking for other attributes to populate other columns)
Can anyone help?
Use the formula =IFERROR(LOOKUP(1E+100,SEARCH($E$2:$E$11,A2),$E$2:$E$11),A2)
This does NOT have to be array-entered.
Here $E$2:$E$11 is the list of names you want returned and A2 is the cell to test.
If no match is found, the formula returns the full name in column B instead of an error.
If you want errors, or expect never to have any, you can just use:
=LOOKUP(1E+100,SEARCH($E$2:$E$11,A2),$E$2:$E$11)
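
As I understand the formula, SEARCH does a case-insensitive substring test for every term in the list, and LOOKUP with a very large lookup value then returns the last term that matched. A Python sketch of that behaviour follows; the function name find_city and the sample list are assumptions, and skipping empty list entries is a choice made here rather than something the formula does.

```python
def find_city(filename, cities):
    """Return the last city in `cities` found in `filename`, ignoring case."""
    lowered = filename.lower()
    match = None
    for city in cities:
        # Case-insensitive substring test; empty entries are skipped.
        if city and city.lower() in lowered:
            match = city  # keep going so that the last match wins
    # Fall back to the full filename when nothing matched.
    return match if match is not None else filename


cities = ["City1", "City2", "City3"]
print(find_city("junk_City1_abunc", cities))   # City1
print(find_city("mozrmcity3_somet", cities))   # City3
print(find_city("no_match_here", cities))      # no_match_here
```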
Here's a roundabout way that works. It's not all my own work but a mishmash of bits from other sources:
Assuming the sheet is set up as follows:
The formula to use is below; it must be entered using Ctrl+Shift+Enter:
=INDEX($C$2:$C$8,MAX(IF(ISERROR(SEARCH($C$2:$C$8,A2)),-1,1)*(ROW($C$2:$C$8)-ROW($C$2)+1)))
Annotated version:
=INDEX([List of search terms],MAX(IF(ISERROR(SEARCH([List of search terms],[Cell to search])),-1,1)*(ROW([List of search terms])-ROW([Cell containing first search term])+1)))
