Count occurrences of multiple values in Excel - excel

IF I have a table like this:
[1968][?]
[1968][?]
[1968][?]
[1969][?]
[1969][?]
[1970][?]
[1970][?]
I want to count the number of times each year occurs in the next column.
=COUNTIF(A1:A7,"1968")
How can I do this automatically for each year? (because the table is not this small).

I reduced your example data by one row with year 1969 for better result display
[1968][=COUNTIF($A$1:$A$6, $A1)]
[1968][=COUNTIF($A$1:$A$6, $A2)]
[1968][=COUNTIF($A$1:$A$6, $A3)]
[1969][=COUNTIF($A$1:$A$6, $A4)]
[1970][=COUNTIF($A$1:$A$6, $A5)]
[1970][=COUNTIF($A$1:$A$6, $A6)]
Just enter the formula in B1 and drag-copy it until end of the column B.
Results in
[1968][3]
[1968][3]
[1968][3]
[1969][1]
[1970][2]
[1970][2]

Related

Sort data contained in blocks in excel

I have a large amount of reference data in excel, which I am trying to manipulate in a variety of ways. I'm having some problems with the way it is structured and sorting into a more manageable format.
Problem number 1:
I have three columns. Column A contains first a date, and then a designator of high or low. Column B contains times, Column C contains heights.
I would like to sort the data by column B (easy enough) EXCEPT I would like the date headings in Column A preserved. It's almost as though I have 365 tables, each with between 3 and 5 pieces of data - I'm looking to sort the 3 - 5 pieces of data within each date only.
This is what I have currently:
There's no issue with me taking the data and manipulating it some other way first - this is ultimately around me being able to take a batch of data (5x different reference points, each for 365 days) and develop a process to sanitise it and get it displayed in time order, as well as being able to get it into a usable format for problem 2 (I need to adjust some other data points by the sorted data once I have it).
This is what I would like it to look like (I manually went through each of these blocks and sorted them):
It is possible to do it in Excel as follows in cell E2:
=LET(rng, A1:C11, set, FILTER(rng, (INDEX(rng,,1) <>"")),
dates, SCAN("", INDEX(set,,1), LAMBDA(acc, item, IF(ISNUMBER(item), item, acc))),
in, FILTER(HSTACK(dates, set), INDEX(set,,2)<>""), inDates, INDEX(in,,1),
out, REDUCE("", UNIQUE(inDates), LAMBDA(acc, date,
LET(sorted, VSTACK(date, DROP(SORT(FILTER(in, inDates = date),3),,1), {"","",""}),
VSTACK(acc, sorted)
))), IFERROR(DROP(DROP(out,1),-1),"")
)
Here is the output:
You can avoid the clean-up process except for removing the last row as follow:
=LET(rng, A1:C11, set, FILTER(rng, (INDEX(rng,,1) <>"")),
dates, SCAN("", INDEX(set,,1), LAMBDA(acc, item, IF(ISNUMBER(item), item, acc))),
in, FILTER(HSTACK(dates, set), INDEX(set,,2)<>""), inDates, INDEX(in,,1),
out, REDUCE("", UNIQUE(inDates), LAMBDA(acc, date,
LET(sorted, VSTACK(HSTACK(date,"",""), DROP(SORT(FILTER(in, inDates = date),3),,1),
{"","",""}), IF(MAX(LEN(acc))=0, sorted, VSTACK(acc, sorted))
))), DROP(out, -1)
)
Explanation
Basically is to carry out the manual steps but using excel functions. The name set, is the same as the input data (rng) but we removed the empty rows. The name dates, is a column with the same size as rng, repeating all the dates. The condition in the SCAN function to identify a new date is ISNUMBER because dates are stored in Excel as whole numbers. The name in has the data in the format we want for doing the sorting and filter by date removing the date header and adding as the first column the dates.
Now we use DROP/REDUCE/VSTACK pattern (check the answer to the question: how to transform a table in Excel from vertical to horizontal but with different length provided by David Leal) to append each sorted data for a given unique date. We add the date as the first row, then sorted data, and finally an empty row to separate each group of data. Finally, we do a clean-up via IFERROR/DROP to remove the #N/A values and the first and the last empty row.

How can I extract a total count or sum of agents who made their first sale in a specified month?

I am trying to extract some data out of a large table of data in Excel. The table consists of the month, the agent's name, and either a 1 if they made a sale or a 0 if they did not.
What I would like to do is plug in a Month value into one cell, then have it spit out a count of how many agents made their first sale that month.
Sample Data and Input Output area
I found success by creating a secondary table for processing a minif and matching to agent name, then countif in that table's data how many sales months matched the input month. However I would like to not have a secondary table and do this all in one go.
=IF(MINIFS(E2ERawData[Date Group],E2ERawData[Agent],'Processed Data'!B4,E2ERawData[E2E Participation],1)=0,"No Sales",MINIFS(E2ERawData[Date Group],E2ERawData[Agent],'Processed Data'!B4,E2ERawData[E2E Participation],1))
=COUNTIFS(ProcessedData[Month of First E2E Sale],H4)
Formula in column F is:
=MAX(0;COUNTIFS($A$2:$A$8;E3;$C$2:$C$8;1)-SUM(COUNTIFS($A$2:$A$8;"<"&E3;$C$2:$C$8;1;$B$2:$B$8;IF($A$2:$A$8=E3;$B$2:$B$8))))
This is how it works (we'll use 01/03/2022 as example)
COUNTIFS($A$2:$A$8;E3;$C$2:$C$8;1) This counts how many 1 there are for the proper month (in our example this part will return 2)
COUNTIFS($A$2:$A$8;"<"&E3;$C$2:$C$8;1;$B$2:$B$8;SI($A$2:$A$8=E3;$B$2:$B$8)) will count how many 1 you got in previous months of the same agents (in our example, it will return 1)
Result from step 2, because it's an array formula, we sum up using SUM() (in our example, this return 1)
We do result from step 1 minus result from step 3 (so we get 1)
Finally, everything is inside a MAX function to avoid negative results (February would return -1 because there were no sales at all and agent B did a sale on January, so it would return -1. To avoid this, we force Excel to show biggest value between 0 and our calculation)
NOTE: Because it's an array formula, depending on your Excel version maybe you must introduce pressing CTRL+ENTER+SHIFT
If one has got access to the newest functions:
=LET(X,UNIQUE(C3:C9),VSTACK({"Month","Total of First time sales"},HSTACK(X,BYROW(X,LAMBDA(a,SUM((C3:C9=a)*(MINIFS(C3:C9,D3:D9,D3:D9,E3:E9,1)=C3:C9)))))))

Excel - Using start_date and end_date values with {=SUM(IF ...)}

I am trying to count data in a range named 'democracy' based on a date start and date end. I have attempted to get this done, but can't seem to fit the pieces together
Component 1
In the SumIf I have:
{=SUM(IF((democracy_highlighted=1)+(democracy_shown=1),1,0))}
The named range democracy contains 2 columns, "highlighted" and "shown", this works in checking both the columns for the value of 1 to be present in one of them.
Component 2
In the countif I have:
=COUNTIFS(democracy_shown,1,DateList,">="&$B$3, DateList,"<="&$B$4)
=COUNTIFS(democracy_highlighted,1,DateList,">="&$B$3, DateList,"<="&$B$4)
This shows data between the start and end dates.
What I need to do is use the {=SUM(IF ...)} component, and limit the results based on start date and end date values?
You could use SUMPRODUCT - something like:
=SUMPRODUCT(SIGN(((democracy_highlighted=1)+(democracy_shown=1)))*(DateList>=$B$3)*( DateList<=$B$4))

Excel AVERAGEIFS looking up ONE of the criteria columns

I have built a large data set and I need to see the average results given many different criteria. I've done this with the AVERAGEIFS function and it works just fine, however the more and more I add its getting really time intensive.
I'm wondering if there is a way to nest a vlookup or index match or anything like that in the AVERAGEIFS that read the criteria column heading and criteria in a cell (or 2 if they need to be separated) to be added to the AVERAGEIFS.
Here is an example of my spreadsheet:
The first 3 sets of criteria I want to stay locked.
I want it to read what the 4th criteria column and criteria should be by referencing the I11 cell. The highlighted portion in the formula bar is the part that I want to reference I11 so it reads it and knows that the 4th criteria is the 'code' column and the criteria is '>7'. I can separate this into 2 separate cells if need be.
I've tried a few combinations of VLOOKUP and INDEX MATCH but cannot get it to work.
Data as Text:
Price,Type,sub cat,Time,code,amount,Result,,
,,,,,,,,
9.95,t2,d,ac,2.18," 22,780,893 ",0.73,,T2 and D and AC
118.94,u2,d,bo,2.78," 172,110,893 ",4.07,,
57.63,t1,u,ac,7.09," 128,419,877 ",-2.16,,code
8.88,t2,d,ac,1.50," 62,634,868 ",12.72,,amount < 100 000 000
11.61,u1,u,ac,2.14," 146,982,736 ",1.07,,price >10
13.46,u3,u,ac,0.93," 17,513,672 ",-13.93,,
31.53,t1,u,ac,0.89," 47,170,877 ",1.39,,
16.34,t3,d,bo,1.07," 1,914,767,076 ",-1.42,,
111.59,u1,d,bo,0.62," 2,283,546,000 ",0.67,,
72.4,u3,d,bo,10.37," 951,541,514 ",1.13,,
34.55,u3,d,bo,0.77," 951,541,514 ",-2.52,,
42.25,t1,d,bo,1.05," 63,748,352 ",8.88,,
17.18,u3,u,ac,2.64," 140,217,257 ",4.35,,
97.66,t1,d,bo,3.45," 1,070,383,954 ",1.33,,
58.49,t2,u,bo,8.64," 151,876,559 ",-0.92,,
64.48,t2,d,ac,2.35," 291,967,334 ",3.03,,
38.4,t1,u,ac,17.05," 83,478,472 ",-4.31,,
20.87,u3,d,ac,28.92," 214,080,937 ",-2.16,,
36.53,t1,d,ac,1.43," 73,438,589 ",-2.07,,
89.16,t3,u,ac,1.41," 26,786,958 ",-1.75,,
15.84,t1,u,bo,2.90," 133,560,818 ",1.76,,
3.2,u3,u,bo,2.95," 215,677,667 ",-1.06,,
25.46,t1,d,bo,3.92," 57,148,431 ",1.89,,
40,t2,d,ac,8.00," 65,274,903 ",0.61,,
27.72,t1,u,ac,2.50," 381,400,886 ",6.46,,
29.07,u3,u,ac,2.32," 52,632,107 ",-0.78,,
173.31,t1,d,ac,3.58," 31,547,380 ",-4.92,,
18.22,u3,d,ac,0.58," 292,669,493 ",4.06,,
9.59,t1,d,bo,3.60," 266,883,020 ",3.16,,
115.22,t2,d,bo,4.51," 132,376,476 ",0.78,,
64.48,u3,d,ac,3.03," 338,360,104 ",-0.95,,
41.74,t1,u,bo,25.65," 245,766,436 ",-3.42,,
5.99,t3,u,bo,2.15," 175,054,713 ",-4.37,,
Use INDEX/MATCH to return the correct column. This will require that you separate the column name and the criteria:
=AVERAGEIFS(G:G,B:B,"T2",C:C,"D",D:D,"AC",INDEX(A:F,0,MATCH(I11,$A$7:$G$7,0)),J11)
An idea:
I10 - "Write down the limitation. (You have to use <,>,=,<> AND the value, for e.g.: <5)"
I11 - The user can use relations and values.
In J11, you can reference to I11 ;) It works for me.

Find values occurring in multiple columns in excel

I have sets of gene probes that are upregulated when put under different chemical stresses. Each column contains all of the upregulated gene probes. I have 12 columns, how do I get a list of gene probes that appear in all 12 columns?
I've been able to find similarities between two columns using the formula
=IF(ISERROR(MATCH(A2,$C$2:$C$21473,0)),"",A2)
but cant work out how to adapt it to include 12 columns
G.Ac G.As G.At G.Ac.At G.As.Ac G.As.At G.Cd G.Cu G.Ni
G.Cd.Cu G.Cd.Ni G.Ni.Cu
GENE:JGI_V11_3346220103 GENE:JGI_V11_2653050203 GENE:JGI_V11_3299790103
GENE:JGI_V11_359040103 GENE:JGI_V11_2228010103 GENE:JGI_V11_2662750203
GENE:JGI_V11_1926920303 GENE:JGI_V11_3134270303 GENE:JGI_V11_3119540303
GENE:JGI_V11_3134270203 GENE:JGI_V11_1926920303 GENE:JGI_V11_3134270303
GENE:JGI_V11_3164760203 GENE:JGI_V11_565470303 GENE:JGI_V11_2296170203
GENE:JGI_V11_2045300203 GENE:JGI_V11_2421620203 GENE:JGI_V11_2228010303
GENE:JGI_V11_2196580303 GENE:JGI_V11_3134270203 GENE:JGI_V11_3119540203
GENE:JGI_V11_1926920103 GENE:JGI_V11_1926920103 GENE:JGI_V11_1014720202
GENE:JGI_V11_478830203 GENE:JGI_V11_3168730303 GENE:JGI_V11_3311070202
GENE:JGI_V11_3216620102 GENE:JGI_V11_2653050303 GENE:JGI_V11_3300140202
GENE:JGI_V11_2653050303 GENE:JGI_V11_1159220202 GENE:JGI_V11_2024180303
GENE:JGI_V11_1926920303 GENE:JGI_V11_2196580303 GENE:JGI_V11_1159220202
GENE:JGI_V11_3164760303 GENE:JGI_V11_2228010203 GENE:JGI_V11_2341670203
GENE:JGI_V11_1938910303 GENE:JGI_V11_3026230203 GENE:JGI_V11_2449230203
GENE:JGI_V11_3134270303 GENE:JGI_V11_2235750203 GENE:JGI_V11_1981410203
GENE:JGI_V11_3251310202 GENE:JGI_V11_977750103 GENE:JGI_V11_954070203
GENE:JGI_V11_2267320203 GENE:JGI_V11_2268000303 GENE:JGI_V11_2226270101
GENE:JGI_V11_3003640303 GENE:JGI_V11_223520203 GENE:JGI_V11_2662750103
GENE:JGI_V11_2228010103 GENE:JGI_V11_3251310202 GENE:JGI_V11_3198630203
GENE:JGI_V11_3134270303 GENE:JGI_V11_1926920203 GENE:JGI_V11_287750103
GENE:JGI_V11_465160203 GENE:JGI_V11_2268000203 GENE:JGI_V11_2473230303
GENE:JGI_V11_3192220102 GENE:JGI_V11_3026230303 GENE:JGI_V11_3039310303
GENE:JGI_V11_1926920103 GENE:JGI_V11_1159220102 GENE:JGI_V11_3052790202
GENE:JGI_V11_3075830303 GENE:JGI_V11_2196580203 GENE:JGI_V11_3134280203
GENE:JGI_V11_3142970303 GENE:JGI_V11_503720303 GENE:JGI_V11_2236410103
GENE:JGI_V11_3042230103 GENE:JGI_V11_2228010203 GENE:JGI_V11_3028210101
GENE:JGI_V11_2105710303 GENE:JGI_V11_1926920303 GENE:JGI_V11_2131620103
GENE:JGI_V11_1002840203 GENE:JGI_V11_2088480203 GENE:JGI_V11_3196120102
Heres the first 8 rows of the 12 columns. There are 21473 rows in total.
Thanks
You could use an array formula like this to count how many columns a particular gene probe occurs in
=SUM(--(MMULT(TRANSPOSE(ROW(A$2:L$10000)^0),N(A$2:L$10000=A2))>0))
This is a standard way of getting column totals for a 2D array - in this case an array of true/false values corresponding to instances of an array element being equal/unequal to A2.
It is rather a brute force approach - it needs ~120K multiplications for each row. If you copy the formula down for ~10K rows, there is a delay of ~100 seconds on my computer while Excel works out the results.
Must be entered as an array formula using CtrlShiftEnter
In this dummy data C is the only value that occurs in all 12 columns.

Resources