MS Excel formulae to Concatenate on the basis of string content - excel

How do I write a formula for the sample problem below?
Jack1
Jill1
Mike1
Mike2
Mike3
Dave1
Dave2
Max1
This should be written as-
Jack1
Jill1
Mike1,2,3
Dave1,2
Max1

In order to manipulate data, that data needs to be relatively uniform (the less uniform it is, the more work is required to make it uniform before it can be manipulated).
In this case, I will make the following assumptions about the uniformity of this data:
(1) The number value will always exist;
(2) The number value will always be at the end of each name;
(3) The number value will always be a single digit;
(4) The number value will always start at 1; and
(5) The names will always be in order.
If any of these assumptions is false, then a VBA solution is required. If they are accurate, then a few helper columns will allow for an Excel formula solution.
Assuming your data is in column A, starting at A1, first use this formula in column B, starting at B1 and dragged down:
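=VALUE(RIGHT(A1))
This pulls the right-most character from each cell in column A and converts it to a number. RIGHT on its own returns text, and the text "1" is not equal to the number 1 in the comparisons used below. Then put the following formula in column C, starting at C2 and dragged down (C1 will need to be changed to just "=A1"):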
=IF(B2=1, A2, C1&","&B2)
This will create an arranged list that counts up at each row, until there is a new name. To use this to create a list which only shows the data you want, there are shorter ways but I'll show one of the simpler (but longer) methods:
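In D1 and dragged down, put the following formula:
=IF(OR(RIGHT(A2)="1",A2=""),COUNTIF($B$1:B1,1),"")
This will create a column which increases by 1 on the last row of each name, i.e. wherever the next row starts a new name or is blank. The blank check is there so that the final name in the list also gets a number.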
Then in column E (or on a new sheet, or wherever you want your final list to be) put this starting in row 1 and drag down:
=IF(ROW()<=MAX(D:D),INDEX(C:C,MATCH(ROW(),D:D,0)),"")
This checks the highest number reached in column D (i.e. the number of names so far) and pulls the entry from column C (the formatted name) whose row has the current row number in column D. For example, if this formula is in row 5 and column D lists at least 5 names, it finds the number 5 in column D and returns the value of column C from that same row.
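For comparison, the same grouping can be sketched outside Excel. This is only an illustration under the assumptions listed above (a single trailing digit, names already in order); the function name is made up for the example, and it groups on the name itself rather than on the digit being 1.

```python
import re


def collapse_numbered_names(items):
    """Merge consecutive entries that share a name, e.g. Mike1, Mike2 -> Mike1,2."""
    merged = []  # each element is [name, [digit, digit, ...]]
    for item in items:
        # Split into the name part and the single trailing digit.
        match = re.fullmatch(r"(.*?)(\d)", item)
        if match is None:
            raise ValueError(f"expected a trailing digit in {item!r}")
        name, digit = match.groups()
        if merged and merged[-1][0] == name:
            merged[-1][1].append(digit)   # same name as the previous row
        else:
            merged.append([name, [digit]])  # a new name starts here
    return [name + ",".join(digits) for name, digits in merged]


data = ["Jack1", "Jill1", "Mike1", "Mike2", "Mike3", "Dave1", "Dave2", "Max1"]
print(collapse_numbered_names(data))
```

With the sample list from the question this prints Jack1, Jill1, Mike1,2,3, Dave1,2 and Max1.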

Related

How to find non matching records from two columns while accounting for duplicate values in Excel

I have two large columns.
Column A contains 100,000 different numbers/rows. Column B contains 100,210 numbers/rows. They have the same numbers, except that column B has 210 extra rows. I need to be able to get the values of those extra 210 rows.
The issue I'm having is that the numbers in these rows are not unique.
For example,
Column A contains the following numbers: 2,1,3,4,5,5,6,7
Column B contains the following numbers: 1,2,3,4,5,5,5,5,6,6,7,8
I want the outcome result to be: 5,5,6,8
I can't seem to wrap my head around a way to do this.
I have the two columns in a text file that I'm importing into Excel. If there are better ways to do it outside of Excel, I am open to that too.
With the Dynamic Array formula Filter:
=FILTER(B1:B12,COUNTIF(OFFSET(B1,0,,SEQUENCE(ROWS(B1:B12))),B1:B12)>COUNTIF(A:A,B1:B12))
Without FILTER:
Put this in the first cell and copy down:
=IFERROR(INDEX(B:B,AGGREGATE(15,7,ROW(B1:B12)/(COUNTIF(OFFSET(B1,0,,ROW(INDEX($ZZ:$ZZ,1):INDEX($ZZ:$ZZ,ROWS(B1:B12)))),B1:B12)>COUNTIF(A:A,B1:B12)),ROW($ZZ1))),"")
Try to follow these steps, supposing that Column A has fewer values than Column B and the rows start at 1:
A. Create Column C.
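In the cell C1 place the function: =COUNTIF(A:A,B1)
Copy this function to the rest of the cells, for all items of Column B. So, cell C2 will have the function =COUNTIF(A:A,B2) and so on. (If your regional settings use a semicolon as the list separator, replace the commas in these formulas with semicolons.)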
B. Create column D.
In the cell D1 place the function: =COUNTIF($B$1:$B1,B1)
Copy this function to the rest of the cells, for all items of Column B. So, cell D2 will have the function =COUNTIF($B$1:$B2,B2) and so on.
C. Create column E.
In the cell E1 place the function: =IF(D1<=C1,"Exists","Missing")
Copy this function to the rest of cells, for all items of Column B. So, cell E2 will have the function =IF(D2<=C2,"Exists","Missing") and so on.
D. Filter to show only the rows that Column E values are "Missing".
Of course you can combine all above 3 columns to one (e.g. in Column F), so these cells will have the functions:
F1: =IF(COUNTIF($B$1:$B1,B1)<=COUNTIF(A:A,B1),"Exists","Missing")
F2: =IF(COUNTIF($B$1:$B2,B2)<=COUNTIF(A:A,B2),"Exists","Missing")
and so on
Explanation:
In Column C we count how many times the value of the respective cell of Column B exists in the whole of Column A.
In Column D we count how many times we have "met" this value in Column B so far.
In Column E we check whether we have "met" the value more times than it exists in Column A. If we have indeed "met" it more times, then we mark the cell as "Missing".
Tested with the example you provided and works okay.
I hope it helps!
Good luck!
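Since the question is also open to approaches outside Excel, the running-count logic explained above can be sketched in Python. This is a minimal illustration, not a tested import pipeline: the function name is made up, and reading the two columns from the text file is left out.

```python
from collections import Counter


def extra_values(col_a, col_b):
    """Return the entries of col_b that are not accounted for by col_a,
    treating duplicates as separate items."""
    remaining = Counter(col_a)  # how many copies of each value column A can still "cover"
    extras = []
    for value in col_b:
        if remaining[value] > 0:
            remaining[value] -= 1   # this copy exists in column A as well
        else:
            extras.append(value)    # seen more often in B than in A
    return extras


col_a = [2, 1, 3, 4, 5, 5, 6, 7]
col_b = [1, 2, 3, 4, 5, 5, 5, 5, 6, 6, 7, 8]
print(extra_values(col_a, col_b))
```

For the example columns in the question this yields 5, 5, 6, 8.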

Excel: An array in an array?

So, I would like to return the contents of all rows where the value in column A is, let's say, 1.
My thought process is that I could use:
=INDEX(row_range,MATCH(1,A:A,0),0)
But Match will only return one value here, i.e. the number of the first row which contains a 1 in column A.
Is there a way of creating an array with the Match formula (thus returning the multiple row numbers, all of which contain '1' in column A) and then place that in the Index array so that it then runs through each of the Match-array values and creates a big long list of values in one array which I can then list out on a separate sheet?
Hope this makes sense...
Here is a demonstration of what I'm hoping for, if that helps! The idea would be that the array as shown would be created, which could then be extended down the column as per the part underneath.
https://i.stack.imgur.com/nCusM.png
Use the file you showed in your example (as "Sheet1") and put these formulas into indicated cells in Sheet2:
Into cell A2 put
=AGGREGATE(15;6;ROW(Sheet1!A:A)/((Sheet1!A:A=1)*1);ROW(A1))
This will give you all the row numbers where the value in column A of Sheet1 equals 1.
Into cell B2 put
=COUNTA(INDIRECT("Sheet1!"&A2&":"&A2))-1
this will give you how many cells are filled in that row.
Into cell C2 put
=TEXTJOIN(",";TRUE;OFFSET(Sheet1!$A$1;A2-1;1;1;B2))
This will give you all the cells with data from that row concatenated. If you don't have TEXTJOIN (it was added in Excel 2019 and Microsoft 365; Excel 2016 only has it with a subscription), you can use the OFFSET function to list the values in separate columns and then CONCATENATE them.
Copy these three down as many times as you want and into cell C1 put
=TEXTJOIN(",";TRUE;OFFSET(C2;0;0;COUNTIF(Sheet1!A:A;1);1))
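To make the intent of the three helper formulas easier to follow, here is a rough Python sketch of the same idea: find every row whose first cell equals 1, then join the filled cells of that row. The sample data and the function name are hypothetical and are not taken from the screenshot in the question.

```python
def rows_where_first_column_equals(rows, wanted):
    """Return one comma-joined string per row whose first cell equals `wanted`."""
    joined = []
    for row in rows:
        if row and row[0] == wanted:
            # Skip empty cells, as TEXTJOIN does when its second argument is TRUE.
            cells = [str(cell) for cell in row[1:] if cell not in ("", None)]
            joined.append(",".join(cells))
    return joined


# Hypothetical stand-in for Sheet1: column A is the first item of each row.
sheet1 = [
    [1, "a", "b"],
    [2, "c"],
    [1, "d", "", "e"],
]
matches = rows_where_first_column_equals(sheet1, 1)
print(matches)            # one string per matching row
print(",".join(matches))  # everything in a single string, like the formula in C1
```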

How to find the maximum value of a given range, dependent on the value in a separate column

I'm working with historic stock prices, and using eight columns I have:
Column A: High
Column B: Low
Column C: Close
Column D: C(x) - C(x-4), i.e. the Close in row x minus the Close four rows earlier
Column E: Counts the number of consecutive positive numbers in column D
Column F: Counts the number of consecutive negative numbers in column D
Column G: Calculate the difference between the maximum of column A and minimum of column B within a given sequence.
As an example G1 should equal:
=max(A1:A5)-min(B1:B5)
G6 should equal:
=max(A6:A8)-min(B6:B8)
G9 should equal:
=max(A9:A11)-min(B9:B11)
And so on.
I'd like to know if it is possible to automate this calculation, possibly with the use of one or more additional columns.
Welcome to SO!
This may not be the most efficient solution as you need to add two helper columns, but if I understand your requirements correctly, then this idea should work well enough.
First, let's assume that there are 100 rows in your data set. Given that, enter the formula "=A100" in cell G100 and the formula "=B100" in cell H100. This sets up the boundary condition for the formulas in columns G and H. Now, in cell G99, enter this formula:
"=IF(E99="",G100,IF(E100="",A99,MAX(A99,G100)))"
What this formula does is set up a "running maximum" with the following logic:
If the cell in E99 is blank, copy the running maximum from G100, else:
If the cell in E99 is not blank but the cell in E100 is, set up a new running maximum from the cell in A99, else:
Take the maximum of A99 and G100 as the new running maximum.
Similarly, copy the following formula into cell H99:
"=IF(F99="",H100,IF(F100="",B99,MIN(B99,H100)))"
This follows the same logic as the previous formula, but takes the minimum of column B.
Copy or autofill these formulas to the top of the data set. This should now give you running maximum for column A and a running minimum for column B.
The next step is to calculate the difference. I notice from your question, that you only seem to be interested in calculating this difference at the top of each range (G1, G6, G9, etc.), rather than doing it in every row. Given that, we need a slightly more complicated formula.
The boundary condition for this formula is simply "=G1-H1" entered in cell I1. In cell I2, enter this:
"=IF(OR(AND(E2<>"",E1=""),AND(F2<>"",F1="")),G2-H2,"")"
This works by checking two conditions that indicate a range boundary:
E1 is blank and E2 is not
or
F1 is blank and F2 is not
If either of these conditions holds, the IF statement is true and "G2-H2" is displayed, otherwise a blank cell is displayed. Now copy or autofill this formula to the bottom of the data set.
As a final step, you can now hide columns G and H if you don't need them displayed. This should now give you the results I think you're looking for. Please let me know if this doesn't work out for you.
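As a cross-check of what the helper columns are meant to produce, the calculation can also be sketched in Python. This assumes that a "sequence" means a run of consecutive rows where column D keeps the same sign, which is my reading of the question; the numbers below are invented sample data and the function names are made up.

```python
from itertools import groupby


def sign(x):
    """Return 1, -1 or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def range_per_run(highs, lows, deltas):
    """For each run of same-signed deltas, return (start_index, max(high) - min(low))."""
    results = []
    start = 0
    for _, run in groupby(deltas, key=sign):
        end = start + len(list(run))
        results.append((start, max(highs[start:end]) - min(lows[start:end])))
        start = end
    return results


# Invented sample data: five rising rows followed by three falling rows.
highs = [10, 12, 11, 13, 14, 9, 8, 7]
lows = [8, 9, 9, 10, 11, 6, 5, 4]
deltas = [1, 2, 1, 3, 1, -1, -2, -1]
print(range_per_run(highs, lows, deltas))
```

Each tuple holds the zero-based index of the first row of a run, which corresponds to the rows where the question expects a value (G1, G6, and so on).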

Excel Index & Match two columns while pulling data for a third

I am trying to create a formula which compares numerical values of two columns and when a match is found return data from a different column.
Here is the problem I am having: the formula I have come up with seems to go row by row. For example, suppose columns A and B are the two columns I want to compare, with column C holding the data I want if a match is made. If I have a value of 1 in A1 and a value of 1 in B1, it will successfully return the data in column C.
The problem is that my numbers jump around and the rows do not match. For example, column A is 1,2,3 but column B is 1,3,2. The end result is that I get data for the value 1, but on the mismatch in the second row I get no value.
Basically the formula I made seems to do a hard comparison based just off the two rows of each column - meaning it will only compare A1 to B1. What I really need is it to compare the ENTIRE column and disregard the rows completely
Here is the formula I have been fooling around with - this formula works if A1 and B1 match
=INDEX(M:M,MATCH($L:L,$V:V,0))
In this formula M has the data I need, while columns L and V have numerical values, I need it to not 'hard check' row by row and instead evaluate the entire column and when a match is found return the result (so if both columns have a '2' return that value REGARDLESS of the fact that the '2' may be in rows A2 and B9)
Hopefully I explained my issue well, and I appreciate all the help I can get
EDIT
Sorry for failing to explain it properly on my end -- I will base my explanation off the picture link below.
I need data from column B to show up in column D. What is happening is that row 2 matches, so the data is successfully retrieved for number 1. However, on row 3, where column A switches the numbering, it compares the number '3' to the number '2', recognizes it is not a match and returns NA, even though there IS a match in cells A4 and C2. In this situation, for C3 I would need the data from B4 to show up in D3.
http://i.stack.imgur.com/nrKJp.png
Here are two formulas that will each give you the result you want. Put one of the following in D2 and copy down:
=VLOOKUP(C2,A:B,2,FALSE)
OR
=INDEX(B:B,MATCH(C2,A:A,0))
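Both formulas look up each value of column C in the whole of column A, regardless of which row it sits in. As an illustration only, the same idea in Python is a dictionary lookup; the sample values and the function name are hypothetical.

```python
def lookup_column(keys, values, wanted):
    """For each item in `wanted`, return the value paired with its first
    occurrence in `keys`, or None if it does not occur (Excel would show #N/A)."""
    first_match = {}
    for key, value in zip(keys, values):
        # Keep only the first occurrence, as an exact-match VLOOKUP or MATCH does.
        first_match.setdefault(key, value)
    return [first_match.get(item) for item in wanted]


col_a = [1, 3, 2]          # lookup keys
col_b = ["a", "c", "b"]    # data to return
col_c = [1, 2, 3, 4]       # values to look up
print(lookup_column(col_a, col_b, col_c))
```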

Excel: Print Values from Column A not found in Column B in Column C

I have tried finding this solution on the web but have not had success for this specific problem. In Excel 2010 I have some data in column A where each value may partially contain data in column B.
EX:
Column A might contain "http://google.com/webmasters"
Column B might contain "google.com"
This should give me a match.
I want to print in Column C all values in column A that do not contain any values from column B.
EX:
Column A
http://dir.mydomain.tdl
http://myotherdomain.tdl
http://blog.otherdomain.tdl
http://www.lastdomain.tdl
Column B
mydomain.tdl
lastdomain.tdl
Column C (results required)
http://myotherdomain.tdl
http://blog.otherdomain.tdl
Any help would be greatly appreciated.
I think I have a solution using an ARRAY formula. Assuming your input is as shown AND that columns A-C have titles, or simply that the strings are listed starting in cells A2 and B2, do the following:
C2: type the formula =IF(OR(NOT(ISERROR(SEARCH(INDIRECT("B2:B"&(COUNTA($B:$B))),$A2)))),"",$A2) but press CTRL+SHIFT+ENTER instead of usual ENTER - this will define an ARRAY formula and will result in {} brackets around it (but do NOT type them manually!).
Autofill formula in C2 until the end of list in column A, e.g. if the last value is in A100, then autofill up to C100 (how long column B does not matter here).
You may then copy & paste obtained results as values and sort out empty strings.
Here you go! The key here is that we check every string in column A for at least one match among the array of strings in column B, and return an empty string if at least one match is found.
For your convenience sample file is shared: https://www.dropbox.com/s/janf0xxon4z2yh5/DomainsLookup.xlsx
Maybe not the most efficient, but you could simply use two arrays: one for Column A and one for Column B. Iterate through the ColumnA array to see if each value exists in the ColumnB array (use Array.IndexOf or .contains). If it does, you could remove it from the ColumnA array and output the remaining values in Column C as the remainder.
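A minimal sketch of that two-array idea, written in Python rather than in the language the answer hints at. It uses a substring test instead of an exact lookup, because the question asks for partial matches, and it ignores case, as the SEARCH function in the array formula does. The function name is made up.

```python
def unmatched(urls, fragments):
    """Return the urls that do not contain any of the given fragments."""
    needles = [fragment.lower() for fragment in fragments if fragment]
    return [
        url for url in urls
        if not any(needle in url.lower() for needle in needles)
    ]


column_a = [
    "http://dir.mydomain.tdl",
    "http://myotherdomain.tdl",
    "http://blog.otherdomain.tdl",
    "http://www.lastdomain.tdl",
]
column_b = ["mydomain.tdl", "lastdomain.tdl"]
for url in unmatched(column_a, column_b):
    print(url)
```

With the sample data from the question this prints http://myotherdomain.tdl and http://blog.otherdomain.tdl.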
