I have a table with an AGE column containing numbers. I want to cluster consecutive identical numbers and count the clusters - Excel

AGE  CARD  SCORE
10   1     20000
10   1     3000
25   0     2000
10   1     20000
18   1     3000
10   0     2000
12   1     20000
10   1     3000
10   0     2000
I want the count for age 10 to be 4.
The first two rows (one group) should count as 1, each 10 that appears on its own between other ages counts individually, and the last two rows (another group of 10s) should count as 1.

Assuming that the data is in a table named "Table1":
=COUNT(1/FREQUENCY(IF(Table1[AGE]=10,ROW(Table1[AGE])),IF(Table1[AGE]<>10,ROW(Table1[AGE]))))
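The formula counts runs. FREQUENCY buckets the row numbers of the 10s, using the row numbers of all other ages as bin edges, so each unbroken run of 10s lands in a single bucket; COUNT(1/FREQUENCY(...)) then counts only the non-empty buckets, because the empty ones become #DIV/0!, which COUNT ignores. The Python sketch below (not Excel; the function names are hypothetical) checks the same logic on the sample AGE column, once directly with itertools.groupby and once by mirroring FREQUENCY's bucketing:

```python
from bisect import bisect_left
from itertools import groupby


def count_runs(values, target):
    """Count runs of consecutive cells equal to target; each run counts once."""
    return sum(1 for key, _ in groupby(values) if key == target)


def count_runs_frequency_style(values, target):
    """Mimic COUNT(1/FREQUENCY(IF(x=target,ROW), IF(x<>target,ROW)))."""
    # data_array: row numbers of the target cells
    hits = [row for row, v in enumerate(values, start=1) if v == target]
    # bins_array: row numbers of every other cell (already ascending)
    bins = [row for row, v in enumerate(values, start=1) if v != target]
    # FREQUENCY returns one bucket per bin edge plus one overflow bucket
    buckets = [0] * (len(bins) + 1)
    for row in hits:
        buckets[bisect_left(bins, row)] += 1  # first edge >= row
    # 1/0 gives #DIV/0!, which COUNT skips, so only non-empty buckets count
    return sum(1 for b in buckets if b > 0)


ages = [10, 10, 25, 10, 18, 10, 12, 10, 10]  # AGE column from the question
print(count_runs(ages, 10))                  # 4
print(count_runs_frequency_style(ages, 10))  # 4
```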


Calculate production capacity per product/day up to goal

I have the following data.
Available resources data per day:
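Layout: B2 = "resources", C2 = "day", C3:N3 = days 1 to 12, employee names in B4:B6 and their hours per day in C4:N6.
empl.1 (row 4): 8, 8, 4, 2, 2, 4, 4, 8, 8 (left to right; blank days omitted)
empl.2 (row 5): 8, 4, 4, 8, 4, 8 (left to right; blank days omitted)
empl.3 (row 6): no hours entered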
And different products with their production per hour (per employee) and the required quantity per product:
    Q        R                S
2   product  production/hour  required qty
3
4   prod.1   1                60
5   prod.2   1                6
6   prod.3   2                4
From this data I want to calculate the number of products that can be produced per day, based on the employees available that day and each product's production rate, until the goal for that product is reached.
Edit: the calculation in the original post only computed the hours spent per product per day, not the quantity produced; also, the MOD part gave wrong results when the daily production exceeded the goal.
I use the following formula to calculate the above (used in C11 and dragged to the right):
=LET(
prod,BYROW($B11:B13,LAMBDA(r,SUM(r))),
reached,--(prod<$S$4:$S$6),
dayprod,IFERROR(SUM(C4:C6)/SUM(reached*$R$4:$R$6),0)*reached*$R$4:$R$6,
IF(prod+dayprod>$S$4:$S$6,dayprod-((prod+dayprod)-$S$4:$S$6),dayprod))
This results in the following:
    B        C   D   E   F   G   H   I   J   K   L   M   N
9   product  day
10           1   2   3   4   5   6   7   8   9   10  11  12
11  prod.1   2   8   4   6   2   8   4   0   8   12  6   0
12  prod.2   2   4   0   0   0   0   0   0   0   0   0   0
13  prod.3   4   0   0   0   0   0   0   0   0   0   0   0
This formula sums the hours of the employees available that day and divides them over the products that have not reached their goal yet.
Once a product reaches its goal, the available hours are divided over the remaining products.
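A worked check of the day 2 values, written out from the formula above. The symbols are shorthand introduced here: H is the day's pooled hours, a_i the `reached` flag (1 while product i is still below its goal), r_i the production/hour in R4:R6, P_i the cumulative production before the day and G_i the goal in S4:S6. Note that dayprod-((prod+dayprod)-goal) simplifies to goal-prod, i.e. G_i - P_i.

```latex
\text{dayprod}_i = \frac{H}{\sum_j a_j r_j}\, a_i r_i,
\qquad
\text{result}_i =
\begin{cases}
G_i - P_i & \text{if } P_i + \text{dayprod}_i > G_i \\
\text{dayprod}_i & \text{otherwise}
\end{cases}

% Day 2: H = 16, r = (1, 1, 2), P = (2, 2, 4), G = (60, 6, 4), hence a = (1, 1, 0)
\text{dayprod} = \frac{16}{1 + 1}\,(1, 1, 0) = (8, 8, 0)

\text{prod.2: } 2 + 8 > 6 \;\Rightarrow\; 6 - 2 = 4,
\qquad
\text{prod.1: } 2 + 8 \le 60 \;\Rightarrow\; 8
```

This reproduces the 8 and 4 in the table: the 4 hours that prod.2 was allotted but did not need are never moved over to prod.1.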
Now the problem I'm having is the following:
If a product reaches its goal partway through the day, the dayprod-((prod+dayprod)-$S$4:$S$6) part of the formula calculates that product's remaining production for that day, but the employees' available hours are still divided over every product that needed production at the start of the day. Take the following example:
prod.1, day 2: value 8
prod.2, day 2: value 4
The 8 for prod.1 is calculated on the basis that both prod.1 and prod.2 still need production, and both take 1 hour per person to produce one unit.
Having 16 hours available that day means a capacity of 8 for each.
But the challenge is that the goal for prod.2 is reached partway through the day.
In fact, the first 4 hours are used by both employees to produce 4 of each product.
In the last 4 hours both employees can focus on prod.1, so those hours yield not 4 units of prod.1 but 4 + 4, for a total of 12 instead of the 8 currently calculated.
How can I get the formula to add the remaining time to the remaining products?
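One iterative way to handle this, sketched in Python rather than as an Excel formula: share the day's hours in proportion to production rate (the same weighting as `reached*$R$4:$R$6` in the formula), and every time a product reaches its goal, re-share the hours that are left among the products still open. The function name `allocate_day` and its arguments are hypothetical, and rates are assumed to be positive. Because it loops, it also covers a second product reaching its goal later on the same day.

```python
def allocate_day(hours, remaining, rate, eps=1e-9):
    """Split one day's pooled employee hours over unfinished products.

    hours:     pooled employee hours for the day
    remaining: quantity still needed per product (goal minus output so far)
    rate:      units produced per employee-hour per product (assumed > 0)
    Hours are shared in proportion to rate; whenever a product reaches its
    goal, the hours left in the day are re-shared among the open products.
    """
    remaining = list(remaining)
    produced = [0.0] * len(remaining)
    while hours > eps:
        active = [i for i, q in enumerate(remaining) if q > eps]
        if not active:
            break  # every goal reached; leftover hours stay unused
        total_rate = sum(rate[i] for i in active)
        share = {i: rate[i] / total_rate for i in active}
        # Pooled hours until the first active product hits its goal
        to_finish = min(remaining[i] / (rate[i] * share[i]) for i in active)
        step = min(hours, to_finish)
        for i in active:
            qty = step * share[i] * rate[i]
            produced[i] += qty
            remaining[i] = max(remaining[i] - qty, 0.0)
        hours -= step
    return produced


# Day 2 of the example: 16 hours; still needed 60-2, 6-2 and 4-4 units
print(allocate_day(16, [58, 4, 0], [1, 1, 2]))  # [12.0, 4.0, 0.0]
```

Running this day by day, subtracting each day's output from `remaining` before the next call, gives the whole schedule.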
Original post, before the edit (it contained an error: it calculated the hours spent per product per day, not the number of products):
I use the following formula to calculate the above (used in C11 and dragged to the right):
=LET(
prod,BYROW($B11:B13,LAMBDA(r,SUM(r))),
reached,--(prod<$S$4:$S$6),
dayprod,IFERROR(SUM(C4:C6)/SUM(reached*$R$4:$R$6),0)*reached*$R$4:$R$6,
IF(prod+dayprod>$S$4:$S$6,dayprod-MOD(prod+dayprod,$S$4:$S$6),dayprod))
This results in the following:
    B        C   D   E   F   G   H   I   J   K   L   M   N
9   product  day
10           1   2   3   4   5   6   7   8   9   10  11  12
11  prod.1   2   8   4   6   2   8   4   0   8   12  6   0
12  prod.2   2   4   0   0   0   0   0   0   0   0   0   0
13  prod.3   4   0   0   0   0   0   0   0   0   0   0   0
This formula sums the hours of the employees available that day and divides them over the products that have not reached their goal yet.
Once a product reaches its goal, the available hours are divided over the remaining products.
Now the problem I'm having is the following:
If a product reaches its goal partway through the day, the MOD part of the formula calculates that product's remaining quantity for that day, but the employees' available hours are still divided over every product that needed production at the start of the day. Take the following example:
prod.1, day 2: value 8
prod.2, day 2: value 4
The 8 for prod.1 is calculated on the basis that both prod.1 and prod.2 still need production, and both take 1 hour per person to produce one unit.
Having 16 hours available that day means a capacity of 8 for each.
But the challenge is that the goal for prod.2 is reached partway through the day.
In fact, the first 4 hours are used by both employees to produce 4 of each product.
In the last 4 hours both employees can focus on prod.1, resulting in a total of 12 for prod.1, not 8.
It took me a lot of effort to get this far, but from here I could use some help.
How can I get the MOD part of the formula to add the remaining time to the remaining products?
I was able to find a solution to my problem.
I had to take the formula's result so far and check whether the cumulative production up to and including the current day exceeds the goal. If so, I needed the time difference between the day's calculated production and the production actually needed that day to reach the goal. That difference is the production time to add, on that day, to the remaining product(s) that have not reached their goal, even after adding the day's production.
This results in the following formula in C11 dragged to the right:
=LET(
prod,BYROW($B11:B13,LAMBDA(r,SUM(r))),
prodhour,$R$4:$R$6,
goal,$S$4:$S$6,
reached,--(prod<goal),
dayprod,(IFERROR(SUM(C$4:C$6)/SUM(reached*prodhour),0)*reached*prodhour)*prodhour,
preres,IF(prod+dayprod>goal,dayprod-((prod+dayprod)-goal),dayprod),
timecorr,(dayprod*(dayprod<>preres)-preres*(dayprod<>preres))/prodhour,
reachedcorr,reached*(timecorr=0),
dayprodcorr,(IFERROR(SUM(timecorr)/SUM(reachedcorr*prodhour),0)*reachedcorr*prodhour)*prodhour,
IF(prod+dayprod>=goal,dayprod-((prod+dayprod)-goal),dayprod+dayprodcorr))
Here preres is the previous result (where I got stuck in the opening post).
The corr parts handle the correction when a product reaches its goal while production time is still left that day.
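To trace the final LET formula outside Excel, here is a rough Python translation of it for a single day (one column). It is a sketch: the function name `day_production` and the list-based arguments are hypothetical stand-ins for the ranges the formula uses. Like the formula, it makes one correction pass, so freed-up time is redistributed once and the corrected total is not capped at the goal again if a second product would overshoot it on the same day.

```python
def day_production(hours, prod, rate, goal):
    """Python translation of the final LET formula for one day (one column).

    hours: pooled employee hours for the day, SUM(C$4:C$6)
    prod:  cumulative production per product before this day (prod)
    rate:  production per hour per product (prodhour, R4:R6)
    goal:  required quantity per product (goal, S4:S6)
    """
    # reached: 1 while the product is still below its goal
    reached = [1 if p < g else 0 for p, g in zip(prod, goal)]
    weight = sum(a * r for a, r in zip(reached, rate))
    base = hours / weight if weight else 0  # IFERROR(..., 0)
    dayprod = [base * a * r * r for a, r in zip(reached, rate)]

    # preres: cap at the goal; dayprod-((prod+dayprod)-goal) equals goal-prod
    preres = [g - p if p + d > g else d for p, d, g in zip(prod, dayprod, goal)]

    # timecorr: hours freed up by products that hit their goal today
    timecorr = [(d - pr) / r for d, pr, r in zip(dayprod, preres, rate)]
    reachedcorr = [a if tc == 0 else 0 for a, tc in zip(reached, timecorr)]
    weight_corr = sum(a * r for a, r in zip(reachedcorr, rate))
    base_corr = sum(timecorr) / weight_corr if weight_corr else 0
    dayprodcorr = [base_corr * a * r * r for a, r in zip(reachedcorr, rate)]

    return [
        g - p if p + d >= g else d + dc
        for p, d, dc, g in zip(prod, dayprod, dayprodcorr, goal)
    ]


# Day 2 example from the question: 16 hours, cumulative output (2, 2, 4) after day 1
print(day_production(16, [2, 2, 4], [1, 1, 2], [60, 6, 4]))  # [12.0, 4, 0]
```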

pandas - sort columns and group by a particular field

I have a list of objects
[
{
"companyid": long,
"parentid": long,
"score": long,
...
}
]
The parentid is just the cid of the parent company.
Sample data looks like this:
cid parentid score
1 10 1000
2 10 100
3 10 1001
10 10 20
11 100 1000
12 100 100
100 100 200
111 1000 10
112 1000 100
1000 100 2000
I need to sort the rows by score, but I want to group them by parentid.
I tried the following, which doesn't fit my requirements because it groups first and then sorts within each group:
df.groupby('parentid').apply(lambda x: x.sort_values('score'))
Sorting by score will give this result:
cid parentid score
1000 100 2000
3 10 1001
1 10 1000
11 100 1000
100 100 200
2 10 100
112 1000 100
12 100 100
10 10 20
111 1000 10
Grouping by parentid on the sorted data (my end goal) should give this result:
cid parentid score
1000 100 2000
11 100 1000 // since 100 is the parentid, it needs to be pushed up in the result set
100 100 200 // if multiple records are pushed up, then sorting should be based on score
12 100 100
3 10 1001 // 2nd group by is based on 10, since 1001 is the next highest score which
1 10 1000 // doesn't belong to the 100 parentid group
2 10 100
10 10 20
112 1000 100
111 1000 10
I am using pandas v0.24.2 and Python 3.7, if it matters.
Try this:
df.sort_values(['parentid', 'score'], ascending=[False, False])
Output:
cid parentid score
8 112 1000 100
7 111 1000 10
9 1000 100 2000
4 11 100 1000
6 100 100 200
5 12 100 100
2 3 10 1001
0 1 10 1000
1 2 10 100
3 10 10 20
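The sort above orders the groups by the parentid value itself (1000, 100, 10), whereas the desired output orders them by each group's highest score (100, 10, 1000). A sketch of one way to get that order, using the sample data: broadcast each group's maximum score to its rows with `groupby(...).transform('max')`, then sort by that maximum first and by score second. The helper column name `group_max` is only for illustration; parentid is added as a tie-breaker so two groups with the same top score stay together.

```python
import pandas as pd

df = pd.DataFrame({
    "cid":      [1, 2, 3, 10, 11, 12, 100, 111, 112, 1000],
    "parentid": [10, 10, 10, 10, 100, 100, 100, 1000, 1000, 100],
    "score":    [1000, 100, 1001, 20, 1000, 100, 200, 10, 100, 2000],
})

# Highest score in each parentid group, repeated on every row of that group
df["group_max"] = df.groupby("parentid")["score"].transform("max")

result = (
    df.sort_values(["group_max", "parentid", "score"], ascending=[False, True, False])
      .drop(columns="group_max")
      .reset_index(drop=True)
)
print(result)
```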

Taking the top 3 in a groupby and lumping the rest into an 'other' category

I am currently doing a groupby in pandas like this:
df.groupby(['grade'])['students'].nunique()
and the result I get is this:
grade
grade 1 12
grade 2 8
grade 3 30
grade 4 2
grade 5 600
grade 6 90
Is there a way to get the output so that I see the top 3 groups, with everything else classified under "other"?
This is what I am looking for:
grade
grade 3 30
grade 5 600
grade 6 90
other (3 other grades) 22
I think you can add a helper column to the df and call it something like "Grouping".
Give the top 3 rows their original name and the remaining rows the name "other", then group by the "Grouping" column.
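A minimal sketch of that helper-column idea, assuming the per-grade counts are already in a DataFrame with columns `grade` and `unique` (names borrowed from the next answer); the label text for the lumped rows mirrors the desired output:

```python
import pandas as pd

# Per-grade unique-student counts, i.e. the output of the question's groupby
df = pd.DataFrame({
    "grade": ["grade 1", "grade 2", "grade 3", "grade 4", "grade 5", "grade 6"],
    "unique": [12, 8, 30, 2, 600, 90],
})

top3 = df.nlargest(3, "unique")["grade"]
n_other = len(df) - len(top3)

# Helper column: keep the grade name for the top 3, lump the rest together
df["Grouping"] = df["grade"].where(df["grade"].isin(top3), f"other ({n_other} other grades)")
result = df.groupby("Grouping")["unique"].sum()
print(result)
```

If the top 3 should come out in a particular order rather than alphabetically, sort `result` afterwards.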
Can't do much without the actual input data, but if this is your starting dataframe (df) after your groupby -
grade unique
0 grade_1 12
1 grade_2 8
2 grade_3 30
3 grade_4 2
4 grade_5 600
5 grade_6 90
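You can do a few more steps to get to your table (pd.concat is used here because DataFrame.append was removed in pandas 2.0):
import pandas as pd
ddf = df.nlargest(3, 'unique')
other = pd.DataFrame({'grade': ['Other'], 'unique': [df['unique'].sum() - ddf['unique'].sum()]})
ddf = pd.concat([ddf, other], ignore_index=True)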
grade unique
0 grade_5 600
1 grade_6 90
2 grade_3 30
3 Other 22

How do I replicate rows a specific number of times according to a condition?

I'm trying to create a dataframe for a game simulation to calculate how many points each player would make according to a set of parameters.
I have this dataframe:
  PLAYER TYPE  Quantity in my base  STRENGTH  POWER  Number of Matches (min)  Number of Matches (max)
0           A                    2        15    200                        3                        5
1           B                    3        80     20                        0                        2
df
Each row in this df represents one type of player. The column "Quantity in my base" holds the number of times each player type appears in my base, and the "Number of Matches" columns hold the minimum and maximum number of matches each player type is expected to play in one day.
I need to replicate each player type's row, with its "STRENGTH" and "POWER", a number of times equal to "Quantity in my base" times a random number between that type's minimum and maximum number of matches. That way, each row of the new data frame represents one match for one specific player in my base.
For instance, if:
PLAYERTYPE Quantity_in_my_base Rand_Num_Matches Number_of_rows
0 A1 2 4 8
1 A2 3 3 9
Number of rows to be replicated
Then I want to create a second df like this:
PLAYERTYPE STRENGTH POWER
0 A 15 200
1 A 15 200
2 A 15 200
3 A 15 200
4 A 15 200
5 A 15 200
6 A 15 200
7 A 15 200
8 A 15 200
9 A 15 200
10 A 15 200
11 A 15 200
12 A 15 200
13 A 15 200
14 A 15 200
15 A 15 200
16 A 15 200
New df
But I want to do this for players A1, A2, B1, B2, B3 and so on, so that each one is replicated according to its own random number.
Thank you so much!
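You could use .repeat() on a column that holds the number of rows for each player type (here Number_of_rows, as in the example table above):
repeat_df = df.loc[df.index.repeat(df['Number_of_rows'])].reset_index(drop=True)
repeat_df[['PLAYER TYPE', 'STRENGTH', 'POWER']]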
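To build the whole thing from the original df, one possible approach (a sketch, assuming the column names shown in the question and NumPy's `Generator.integers` for the random draw, which needs NumPy 1.17 or later): first expand each player type into individual players (A1, A2, B1, ...), draw one random match count per player between min and max inclusive, and then repeat each player's row that many times. The `PLAYER` and `Number_of_rows` columns and the `match_rows` name are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "PLAYER TYPE": ["A", "B"],
    "Quantity in my base": [2, 3],
    "STRENGTH": [15, 80],
    "POWER": [200, 20],
    "Number of Matches (min)": [3, 0],
    "Number of Matches (max)": [5, 2],
})

rng = np.random.default_rng(seed=0)  # seeded only to make runs reproducible

# Step 1: one row per individual player (A1, A2, B1, B2, B3)
players = df.loc[df.index.repeat(df["Quantity in my base"])].reset_index(drop=True)
players["PLAYER"] = players["PLAYER TYPE"] + (
    players.groupby("PLAYER TYPE").cumcount() + 1
).astype(str)

# Step 2: each player gets its own random match count, min and max inclusive
players["Number_of_rows"] = rng.integers(
    players["Number of Matches (min)"].to_numpy(),
    players["Number of Matches (max)"].to_numpy(),
    endpoint=True,
)

# Step 3: repeat each player's row once per match
match_rows = (
    players.loc[players.index.repeat(players["Number_of_rows"]), ["PLAYER", "STRENGTH", "POWER"]]
    .reset_index(drop=True)
)
print(players[["PLAYER", "Number_of_rows"]])
print(match_rows)
```

Drop the seed to get a fresh draw on every run.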

Returning the total value for each month - VBA

I need to produce a total value for each month of the year from a generated report. The data is split into two columns, one with a value and the other with a date.
I need to return a total for each month.
The data is output like this:
100 21/01/2019
200 21/06/2019
150 01/01/2019
300 14/09/2019
8 08/05/2019
I need it returned as:
1 2 3 4 5 6 7 8 9 10 11 12
250 0 0 0 8 200 0 0 300 0 0 0
With a further column for the following year. The original data and dates can be removed as this can be reproduced when running the next report.
You could try the following:
Add a helper column next to your date column to get the month of each date:
=MONTH(B3)
and then use the following, where F2 holds a month number from the 1 to 12 header row:
=SUMPRODUCT(($C$3:$C$7=F2)*($A$3:$A$7))
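The same month-by-month totals can be sketched in pandas (Python, not VBA), assuming the values and dd/mm/yyyy dates shown in the question: parse the dates, sum per year and month, and spread the months across columns 1 to 12, filling empty months with 0, which gives one row per year:

```python
import pandas as pd

data = pd.DataFrame({
    "value": [100, 200, 150, 300, 8],
    "date": ["21/01/2019", "21/06/2019", "01/01/2019", "14/09/2019", "08/05/2019"],
})
data["date"] = pd.to_datetime(data["date"], format="%d/%m/%Y")

# Sum per (year, month), then pivot the months into columns 1-12
totals = (
    data.groupby([data["date"].dt.year.rename("year"), data["date"].dt.month.rename("month")])["value"]
        .sum()
        .unstack("month", fill_value=0)
        .reindex(columns=range(1, 13), fill_value=0)
)
print(totals)
```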
