Create a table splitting comma and finding unique elements


I have the following data:
Person  Week1
P1      L,L
P2      M,H
The output I would like is:
Person  Week1
        L  M  H
P1      2  0  0
P2      0  1  1
My intention is to create a chart based on the output so I can figure out how many codes a person got per week. Pivot tables do not seem to work for this case.
Thanks

This is a pure formula approach.
It's based on two basic formulas. The first formula counts the number of times string A occurs within string B. This is done by counting the number of characters in string B, then counting the number of characters in string B after string A has been replaced by nothing (""). The difference is the number of characters removed, so if string A is more than 1 character long you need to divide the result by the length of string A. That gives us this formula:
=(LEN(STRING B)-LEN(SUBSTITUTE(STRING B, STRING A, "")))/LEN(STRING A)
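For anyone who wants to sanity-check the counting idea outside of Excel, here is a minimal Python sketch of the same arithmetic. The function name count_occurrences is just something I made up for illustration; str.replace stands in for SUBSTITUTE (both are case-sensitive).

```python
def count_occurrences(string_b: str, string_a: str) -> int:
    """Count non-overlapping occurrences of string_a inside string_b.

    Mirrors (LEN(B) - LEN(SUBSTITUTE(B, A, ""))) / LEN(A).
    """
    if not string_a:
        raise ValueError("string_a must not be empty")
    # Length lost by deleting every occurrence of string_a ...
    removed = string_b.replace(string_a, "")
    # ... divided by the length of one occurrence.
    return (len(string_b) - len(removed)) // len(string_a)


print(count_occurrences("L,L", "L"))        # P1, code L
print(count_occurrences("M,H", "L"))        # P2, code L
print(count_occurrences("AB,CD,AB", "AB"))  # multi-character code
```

With the sample data from the question this gives 2 for P1/L and 0 for P2/L, which matches the desired output table.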
Now we know how to count the number of times L, M or H occurs, as they are string A. Next we need to determine string B.
If we look at the first table, it has nice row headers and column headers. We could take a shortcut and just assume everything is in order; however, I am going to go with the more generic approach in case the headers happen to be in a random order.
Basically we need to find out which column in the first table matches the header in our second table, i.e. is Week2 really the second column? Is P1 still the first row? To do that we use the following:
=MATCH("WEEK X",$B$1:$D$1,0)
and
=MATCH("PX",$A$2:$A$3,0)
Those will return integers, which we can then drop into an INDEX function to locate the text in the first table:
=INDEX($B$2:$D$3,MATCH("PX",$A$2:$A$3,0),MATCH("WEEK X",$B$1:$D$1,0))
AWESOME, we now know how to find the text from the table to drop into the counting formula that we started with. That last formula gets substituted in wherever there is STRING B!
=(LEN(INDEX($B$2:$D$3,MATCH("PX",$A$2:$A$3,0),MATCH("WEEK X",$B$1:$D$1,0)))-LEN(SUBSTITUTE(INDEX($B$2:$D$3,MATCH("PX",$A$2:$A$3,0),MATCH("WEEK X",$B$1:$D$1,0)), STRING A, "")))/LEN(STRING A)
Yeah, it's getting a little ugly, isn't it! String A is then whatever cell L is in in your second table. Replace "WEEK X" with the cell holding the week header in your second table. Replace "PX" with the cell holding the person's name in your second table.
I would do the first formula, then copy it over under the M and under the H. Go into the M and H formulas and adjust them so each is pointing at the right week header. Lock the row but not the column references for the week header and the string A cells. Lock the column but not the row for the person's name. Once you have that set up, copy all three formulas and paste them under each week. Then just copy your entire first row of table two down for the number of people you have and voila!
Proof of concept
The formulas I used in cells H3, I3, and J3 respectively:
=(LEN(INDEX($B$2:$D$3,MATCH($G3,$A$2:$A$3,0),MATCH(H$1,$B$1:$D$1,0)))-LEN(SUBSTITUTE(INDEX($B$2:$D$3,MATCH($G3,$A$2:$A$3,0),MATCH(H$1,$B$1:$D$1,0)),H$2,"")))/LEN(H$2)
=(LEN(INDEX($B$2:$D$3,MATCH($G3,$A$2:$A$3,0),MATCH(H$1,$B$1:$D$1,0)))-LEN(SUBSTITUTE(INDEX($B$2:$D$3,MATCH($G3,$A$2:$A$3,0),MATCH(H$1,$B$1:$D$1,0)),I$2,"")))/LEN(I$2)
=(LEN(INDEX($B$2:$D$3,MATCH($G3,$A$2:$A$3,0),MATCH(H$1,$B$1:$D$1,0)))-LEN(SUBSTITUTE(INDEX($B$2:$D$3,MATCH($G3,$A$2:$A$3,0),MATCH(H$1,$B$1:$D$1,0)),J$2,"")))/LEN(J$2)
Here is another proof of concept with an expanded range, showing rows out of order, multiple-letter strings to search for, and more than two entries per cell. Same formulas; I just had to adjust the lookup ranges for the increased table size.

If using VBA is acceptable, splitting the comma separated data using TextToColumns should help as a first processing step.
Then using a pivot table gives you the output you want.

Related

Condensing nested if-statements with multiple criteria

The blue columns are the data given and the red columns are what is being calculated. The table to the right is what I am referencing. So, F2 will be calculated by the following steps:
Look at the Machinery column (D), if the cell contains LF, select column K, otherwise select column L
Look at the Grade column (E), if the cell contains RG, select rows 4:8, otherwise select rows 9:12.
Look at the Species column (A), if the cell contains MS, select rows 5 and 10, otherwise.......
Whichever cell in columns K and L ends up selected by the steps above, copy it into column F.
Multiply column F by column C.
I don't want to make another column for my final result. I did in the picture to show the two steps separately. So column F should be the final answer (F2 = 107.33). The reference table can be formatted differently as well.
At first, I tried using nested IF statements, but realized that I would need 20+ IF statements to cover all the different outcomes. I think I would want to use the SEARCH function to find whether or not the cell contains a specific piece of information. Then I would probably use some combination of MATCH, IF, VLOOKUP, INDEX, and SEARCH, but I am not sure how to condense these.
Any suggestion?

SUMPRODUCT is the function you need. I quickly created some test data on the lines of what you shared like this:
Then I entered the below formula in cell F2
=SUMPRODUCT(($I$4:$I$9=E2)*($J$4:$J$9=LEFT(A2,FIND(" ",A2)-1))*IF(ISERROR(FIND("LF",D2,1)),$L$4:$L$9,$K$4:$K$9))
The formula may look a little scary but is indeed very simple as each sub formula checks for a condition that you would want to evaluate. So, for example,
($I$4:$I$9=E2)
is looking for rows that match GRADE of the current row in range $I$4:$I$9 and so on. The * ensures that the arrays thus returned are multiplied and only the value where all conditions are true remains.
Since some of your conditions require looking for partial content like in Species and Machine, I have used Left and Find functions within Sumproduct
This formula simply returns the value from either column K or L based on the matching conditions and you may easily extend it or add more conditions.
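If it helps to see the SUMPRODUCT logic spelled out step by step, here is a rough Python sketch of the same idea: each condition becomes a True/False (1/0) factor, and only the reference row where every factor is 1 contributes its value. The reference rows, the codes "HS" and "PG", and the numbers are all invented for illustration; they are not the data from the question.

```python
# Hypothetical reference table rows:
# (grade, species, value used when machinery contains "LF", value otherwise)
reference = [
    ("RG", "MS", 10.0, 20.0),
    ("RG", "HS", 11.0, 21.0),
    ("PG", "MS", 12.0, 22.0),
]


def lookup(grade, species_cell, machinery_cell):
    # Rough equivalent of LEFT(A2, FIND(" ", A2) - 1): text before the first space.
    # (Unlike the Excel version, this does not error when there is no space.)
    species = species_cell.split(" ")[0]
    # Rough equivalent of NOT(ISERROR(FIND("LF", D2))).
    use_lf = "LF" in machinery_cell
    total = 0.0
    for ref_grade, ref_species, lf_value, other_value in reference:
        value = lf_value if use_lf else other_value
        # Booleans multiply like 1/0, so non-matching rows add nothing.
        total += (ref_grade == grade) * (ref_species == species) * value
    return total


print(lookup("RG", "MS something", "LF-200"))
```

If no reference row matches, the result is 0, which is also what SUMPRODUCT returns in that situation.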

Why am I obtaining this strange result adding all the values in 2 Excel columns?

I am not very familiar with Excel and I have this problem trying to sum the values of 2 different columns and put the result into a cell.
So basically I have the D column containing 2 values (at the moment only 2, but it will grow without a specific limit; I have to sum all the values in this column). These values are decimal values (in my example they are: 0,3136322400 and 0,1000000000).
Then I have an I column containing the same type of value (at the moment only one, but the values in this column can also grow without a specific limit; in my example at this time I have this value: −0,335305).
Then I have the K3 cell where I have to put the sum of all the values in the D column and all the values in the I column (following my example it will contain the result of this sum: 0,3136322400 + 0,1000000000 − 0,335305).
Following a tutorial I tried to set this simple formula in the K3 cell:
=SUM(A:I)
The problem is that this cell does not show the expected result (0.07832724); instead I am getting this value: 129236,1636322400.
It is very strange. I think that maybe it is because the D and I columns don't contain only numbers: both have a textual "header" (the string "QUANTITY" in both cells). So I think that maybe it is also adding a numeric conversion of this string (but I am absolutely not sure about this).
So how can I handle this type of situation?
Can I do one of these 2 things:
1) Add the column values starting from a specific cell in the column (for example: sum all the values under a cell without specifying a lower limit).
2) Exclude in some way the "header" cells from my sum so the textual values are not considered in my sum.
What could be a smart solution for my problem? How can I fix this issue?

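The SUM function can take several arguments, so you can sum just the ranges you need, starting below the headers:
=SUM(D2:D10000, I2:I10000)
Add more ranges as extra arguments if needed. This keeps the header cells out of the calculation, and unlike =SUM(A:I) it does not add up every numeric cell in the other columns from A through I.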

If you turn your data into an Excel Table (Insert > Table), you can use structured referencing to address a table column, excluding the header.
=SUM(Table1[This Header],Table1[That Header])
Then you don't need to reference whole columns. If you add new data to the table, the formula will take that into account.

Excel-how to convert multiple rows into a single row for each unique identifier in column A

I have a large data set(over 1 million rows) of patient names, problems/diagnoses, and the dates these diagnoses were entered(with each variable as a column header).
I would like to pull data from this source file to add to an existing file which has about 900 unique patient names with other demographics(in columns).
I am not able to use the vlookup function because most patients have multiple problems.
Are there any other functions or tricks which might be helpful?
Thanks in advance for your time and efforts.
Sample of what Data Currently looks like:
Name Diagnosis Date of Dx
A Head 11/15/12
B Leg 09/08/14
B Elbow 10/11/15
C Hand 02/23/16
A Toe 04/11/13
A Eye 05/25/15
C Ear 12/21/14
What I would like Data Set to Look like:
Name Dx#1 Date#1 Dx#2 Date#2 Dx#3 Date#3
A Head 11/15/12 Toe 04/11/13 Eye 05/25/15
B Leg 09/08/14 Elbow 10/11/15 n/a n/a
C Hand 02/23/16 Ear 12/21/14 n/a n/a

I'm not sure if you're familiar with the index and match functions, but you can use those to create the sheet. The easiest way would be to add several helper columns (in your example 3) and use the match function to get the reference row that you want.
From there you can offset the search range by previous match to find the next match. You can do this as many times as necessary depending on the number of conditions a patient has.
After that it's a simple index function to fill in the rows of the table with the desired values. You can clean up the extra cells with iferror if you want.
Assuming your data is in columns A1:C8, and your output dataset is in columns E1:K4, the following formulas will give you the desired output. The helper columns are found in L1:N4. These formulas would go in row 2, but you can drag them down to calculate for the rest of the rows.
I'll add the column above each formula:
E
No formula, list all patient names
F
=INDEX(B:B,L2)
G
=INDEX(C:C,L2)
H
=IFERROR(INDEX(B:B,M2),"")
I
=IFERROR(INDEX(C:C,M2),"")
J
=IFERROR(INDEX(B:B,N2),"")
K
=IFERROR(INDEX(C:C,N2),"")
L
=MATCH(E2,$A$1:$A$8,0)
M
=IFERROR(MATCH($E2,OFFSET($A$1,L2,0,COUNTA($A:$A)-L2),0)+L2,"")
N
=IFERROR(MATCH($E2,OFFSET($A$1,M2,0,COUNTA($A:$A)-M2),0)+M2,"")
Hope this helps, and let me know if you have any questions about the formulas.
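As a cross-check on the expected result, here is a short Python sketch of the same reshaping (group the rows by name, then lay each patient's entries out side by side). The function name to_wide and its parameters are my own invention; the rows are the sample data from the question.

```python
rows = [
    ("A", "Head", "11/15/12"),
    ("B", "Leg", "09/08/14"),
    ("B", "Elbow", "10/11/15"),
    ("C", "Hand", "02/23/16"),
    ("A", "Toe", "04/11/13"),
    ("A", "Eye", "05/25/15"),
    ("C", "Ear", "12/21/14"),
]


def to_wide(rows, max_dx=3, filler="n/a"):
    """Return one list per name: [name, dx1, date1, dx2, date2, ...]."""
    grouped = {}
    for name, dx, date in rows:
        # Entries are kept in the order they appear in the source,
        # like the chained MATCH/OFFSET helper columns.
        grouped.setdefault(name, []).append((dx, date))
    wide = []
    for name, entries in grouped.items():
        kept = entries[:max_dx]
        line = [name]
        for dx, date in kept:
            line += [dx, date]
        # Pad patients with fewer than max_dx diagnoses.
        line += [filler] * (2 * (max_dx - len(kept)))
        wide.append(line)
    return wide


for line in to_wide(rows):
    print(line)
```

Patients with more than max_dx diagnoses are truncated here, just as the spreadsheet version only shows as many diagnoses as it has helper columns for.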

Sum every 11 rows excel

I have a table with 2600+ rows, related to towns in my region and their population; each town has 11 rows, one for each age class (0-9, 10-19, and so on).
I need to get the sum of the population of each town; of course I can do it manually but it's a never ending job; I wonder if there's some kind of command that tells excel to do the sum every 11 rows and do it for all the towns.
I think it's a kind of loop but I have no idea about how to do it.

The problem can be reduced by using the SUMIF function. The question then becomes how to apply this to your dataset.
Assuming one of the columns in your 2600+ rows contains the town name (or another unique identifier), and you have a list of towns (or other unique identifier), the below method can be used.
The formula in E2 is =SUMIF(A:A,D2,C:C), and in E3 =SUMIF(A:A,D3,C:C). A to C is the list of all data, D is a list of towns.
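To make it clear what SUMIF is doing here, this is roughly the same calculation sketched in Python: add up the population of every row whose town matches. The town names and numbers are made up, and sum_by_town is just an illustrative name.

```python
from collections import defaultdict

# Hypothetical rows: (town, age class, population)
rows = [
    ("Alpha", "0-9", 1000),
    ("Alpha", "10-19", 2000),
    ("Beta", "0-9", 500),
    ("Alpha", "20-29", 2500),
    ("Beta", "10-19", 700),
]


def sum_by_town(rows):
    """Total population per town, like =SUMIF(A:A, D2, C:C) for each town in D."""
    totals = defaultdict(int)
    for town, _age_class, population in rows:
        totals[town] += population
    return dict(totals)


print(sum_by_town(rows))
```

Like SUMIF, this does not care whether the rows are sorted or how many rows each town has.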

For a VBA solution, you can use Step in the loop to move 11 rows at a time:
Public Sub IterateRows()
    Dim rData As Range, rPtr As Range
    Dim dSum As Double
    Dim i As Long
    ' Adjust the sheet and range to match your data
    Set rData = Sheet1.Range("A1:A1000")
    For i = 1 To rData.Rows.Count Step 11
        ' 11-row block starting at row i of rData
        Set rPtr = rData(1).Resize(11).Offset(i - 1)
        dSum = Application.WorksheetFunction.Sum(rPtr)
        ' Print each block total to the Immediate window
        Debug.Print rPtr.Address, dSum
    Next i
End Sub
If you want a worksheet function solution, you will probably have to use the MOD function on the row number and check for when the result is zero.
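The stepping idea from the loop above can be sketched in a couple of lines of Python, in case that makes the logic easier to follow. The name block_sums is hypothetical, and the sketch assumes the values are already in one flat list with every town occupying exactly block_size consecutive entries.

```python
def block_sums(values, block_size=11):
    """Sum consecutive blocks of block_size values (a shorter last block is summed as-is)."""
    return [
        sum(values[start:start + block_size])
        for start in range(0, len(values), block_size)  # like For ... Step 11
    ]


# Two "towns" with 11 rows each: 1..11 and 12..22
print(block_sums(list(range(1, 23))))
```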

You can also try this manual method, which is not a never-ending job:
in E11 put your formula as =SUM(C1:C11)
copy range E1:E11
select range E12:E2600 and use Paste Special > Formulas

Do you have any reference columns? As in...say for example Column A has the Town Name, and Column B has the Age Class, and Column C has the values.
Going down the rows Column A will have repeating town names, yes?
Like this:
Town - Age Class - Pop
Wherever - 0-9 - 1000
Wherever - 10-19 - 2000
Wherever - 20-29 - 2500
Assuming you have maintained the data structure (NO GAPS) a possible solution in Column D (or whatever column just make sure you change the references) could be (putting this in D2 and dragging it down the length of your sheet):
=IF(A1<>A2,SUM(INDIRECT("C"&ROW(A2)&":C"&SUMPRODUCT(MAX((A:A=A2)*(ROW(A:A)))))),D1)
This works if you have any amount of rows per town, so long as you SORT by town name so the same names are next to each other in the list and there are no gaps.
In the above test data set I subtracted 250 from each value per town going down (each age class has 250 less than the same class in the previous town) just to show some variation in the output. You can see each town has 2750 (250 * 11) less population than the previous one.
Basically it builds an array with a starting position of "not the town above" in the first row it encounters a new town name to an ending position of "last (max) position of new town in same list" so that is how it doesn't matter how many rows you have per town. From 1 to memory limit basically, I think. :)
ALTHOUGH, this also works:
=SUMIF(A:A,A2,C:C)
Yep. Not kidding just drag that down Column D...

Assuming the following structure
This is a very easy task using a pivot table.
For LibreOffice Calc:
Just select the complete data area including the column headers (in my example: A1:C13);
Menu Data -> Pivot Table;
Current selection;
Following settings for Pivot table:
(drag the Town field into the Row Fields area, and the Count field into the Data Fields area. LO Calc will offer to calculate the Sum of the count entries by default).
Hit OK - the resulting pivot table will look like this:
This solution has the advantage that the source data area does not have to be sorted by town, and it doesn't matter if some towns don't have nine value rows each. Additionally, you don't need any formulas.
EDIT:
You can work with the contents of the pivot table the same way as with calculated results. For example, you could use the pivot table values to calculate the sum for some of the towns (in my example, calculate the sum for town B and C based on the pivot table values B3 and B4 respectively):

You could do this with the MOD function, which gives the remainder of division. You could look at each row number and if its MOD of 11 equals zero, then it's the row you're looking for.
I am counting 10 items in each section in your example, so I'm not sure I completely understand. Let's assume you need to sum every 9th row (A9, A18, A27, etc.). You can replace the 9's below with an 11.
=ROW(A9) gets you the row number.
=MOD(ROW(A9),9) gets you the remainder of that row number divided by 9, which tells you whether the number is divisible by 9. If it is a multiple of 9, it will return the number 0.
Now use the SUM function and hit CONTROL+SHIFT+ENTER to complete it. Note that the formula bar indicates that this is an array function by using curly braces. You don't need to type those in yourself.
{=SUM(A1:A9*(MOD(ROW(A1:A9),9)=0))}
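Here is the same MOD trick as a small Python sketch, only to show which rows the array formula actually picks up. The function name sum_every_nth_row is made up; rows are numbered from 1 to match ROW().

```python
def sum_every_nth_row(values, n):
    """Sum the values whose 1-based row number is a multiple of n.

    Mirrors SUM(range * (MOD(ROW(range), n) = 0)).
    """
    return sum(
        value
        for row, value in enumerate(values, start=1)
        if row % n == 0
    )


# Rows 1..18 holding the values 1..18: only rows 9 and 18 are included.
print(sum_every_nth_row(list(range(1, 19)), 9))
```

Note that this adds up the values sitting in every n-th row; it does not total each block of n rows.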

Sort Order formula to alphabetise in Excel

I am currently drawing up a spreadsheet that will automatically remove duplicates and alphabetize a list:
I am using the COUNTIF() function in column G to create a sort order and then VLOOKUP() to find the sort in column J.
The problem I am having is that I can't seem to get my SortOrder column to function properly. At the moment it creates an index for two number 1's meaning the cell highlighted in yellow is missed out and the last entry in the sorted list is null:
If anyone can find and rectify this mistake for me I'll be very grateful as it has been driving me insane all day! Many thanks.

I'll provide my usual method for doing an automatic pulling-in of raw data into a sorted, duplicate-removed list:
Assume raw data is in column A. In column B, use this formula to increase the counter each time the row shows a non-duplicate item in column A. Hard-code B2 to be "1", and use this formula in B3 and drag down.
=if(iserror(match(A3,$A$2:A2,0)),B2+1,B2)
This takes advantage of the fact that when we refer to this row counter in our revised list, we will use the MATCH function, which only returns the first matching row. Then say you want your new list of data in column D (usually I do this for display purposes, so either 'group-out' [hide] the columns that hold the helper formulas, or do this on another tab). You can avoid this step, but if you are already using helper columns I usually do each step in a different column, as it is easier to document. In column C, starting in C3 [C2 hard-coded to 1] and dragged down, just have a simple counter, which error-checks so that it stops at the end of your list:
=if(C2<max(B:B),C2+1," ")
Then in column D, starting at D2 and dragged down:
=iferror(index(A:A,match(C2,B:B,0)),"")
The index function is like half of the vlookup function - it pulls the result out of a given array, when you provide it with a row number. The match function is like the other half of the vlookup function - it provides you with the row number where an item appears in a given array.
Hope this helps you in the future as well.
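For reference, this is what the helper columns work out, written as a tiny Python sketch: keep each item the first time it appears and skip later repeats. The function name unique_in_order and the sample items are only for illustration. One difference to keep in mind is that Excel's MATCH ignores case, while this comparison is case-sensitive.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-appearance order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # same test as ISERROR(MATCH(...)) on the rows above
            seen.add(item)
            result.append(item)
    return result


raw = ["pear", "apple", "pear", "fig", "apple"]
deduped = unique_in_order(raw)
print(deduped)          # first-appearance order, as in column D
print(sorted(deduped))  # alphabetised version of the same list
```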

The actual reason that this is going wrong as implied by Jeeped's comment is that you can't meaningfully compare a string to a number unless you do a conversion because they are stored differently. So COUNTIF counts numbers and text separately.
20212 will give a count of 1 because it is the only (or lowest) number.
CS10Z002 will give a count of 1 because it is the first text string in alphabetical order.
Another approach is to add the count of numbers to the count if the current cell contains text:-
=COUNTIF(INDIRECT("$D$2:$D$"&$F$3),"<="&D2)+ISTEXT(D2)*COUNT(INDIRECT("$D$2:$D$"&$F$3))
It's easier to show the result of three different conversions with some test data:-
(0) No conversion - just use COUNTIF
=COUNTIF(D$2:D$7,"<="&D2)
"999"<"abc"<"def", 999<1000
(1) Count everything as text
=SUMPRODUCT(--(D$2:D$7&""<=D2&""))
"1000"<"999"
(2) Count numbers before text
=COUNTIF(D$2:D$7,"<="&D2)+ISTEXT(D2)*COUNT(D$2:D$7)
999<1000<"999"
(3) Count everything as text but convert numbers with leading zeroes
=SUMPRODUCT(--(TEXT(D$2:D$7,"000000")<=TEXT(D2,"000000")))
"000999" = "000999", "000999"<"001000"
