I want to write a formula for a large data table. The criteria I need to match are in both the rows and the columns.
I have attached the file with the manually calculated results.
|PRODUCT|01-feb|02-feb|03-feb|04-feb|05-feb|06-feb|07-feb|08-feb|09-feb|10-feb|11-feb|12-feb|
|PRODUCT 1|4|3|1|5|2|9|1|3|5|8|0|5|
|PRODUCT 3|2|5|7|4|4|8|3|5|7|4|4|8|
|PRODUCT 1|1|0|5|3|1|1|8|0|5|3|1|1|
|PRODUCT 2|5|4|6|6|0|7|4|4|6|6|0|7|
|PRODUCT 5|8|7|8|7|1|9|2|7|8|7|1|9|
|PRODUCT 4|4|2|9|3|5|1|7|2|9|3|5|1|
|PRODUCT 1|9|8|1|4|4|6|5|8|1|4|4|6|
|PRODUCT 2|6|4|4|7|2|8|6|4|4|7|2|8|
|PRODUCT 5|2|6|1|8|3|9|3|6|1|8|3|9|
|PRODUCT 3|3|9|5|1|7|4|7|9|5|1|7|4|
|PRODUCT 4|7|6|5|5|8|2|1|6|5|5|8|2|
The compact table that I need to get:
|PRODUCT|04-feb|08-feb|12-feb|
|PRODUCT 1|44|48|43|
|PRODUCT 2|42|35|40|
|PRODUCT 3|36|47|40|
|PRODUCT 4|41|32|38|
|PRODUCT 5|47|40|46|
The formula that I think should work (SUMAR.SI.CONJUNTO is the Spanish-locale name of SUMIFS):
=SUMAR.SI.CONJUNTO(C5:N15,B5:B15,H20,C4:N4,"=<"&J19)
because I want the new 04-feb column to show the sum over the date range 01-feb to 04-feb from the first table.
Please help me.
The following might help you. The formula in the upper left cell of the table of the summary is
{=SUM((($B$1:$M$1<=B$14)*($B$1:$M$1>=A$14)*$B$2:$M$13)*($A15=$A$2:$A$13))}
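and can be copied over to the other cells. The 31.01 in the summary table is used as a "helper cell", so that you don't have to alter the formula for the different cells. Note that both date comparisons are inclusive, so each boundary date (04. Feb, 08. Feb) is also counted in the following column; change >= to > if the periods should not overlap.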
|Product|01. Feb|02. Feb|03. Feb|04. Feb|05. Feb|06. Feb|07. Feb|08. Feb|09. Feb|10. Feb|11. Feb|12. Feb|
|Product1|5|2|3|3|5|5|3|3|5|3|3|5|
|Product3|5|4|2|4|5|1|5|3|3|5|3|3|
|Product4|3|1|2|2|4|5|5|1|5|5|1|5|
|Product1|4|1|4|3|4|1|4|1|3|4|1|3|
|Product3|1|2|2|4|5|2|5|1|1|5|1|1|
|Product4|3|2|4|1|1|4|3|5|2|3|5|2|
|Product1|4|3|5|1|1|1|2|2|2|2|2|2|
|Product3|3|2|4|3|5|1|1|1|4|1|1|4|
|Product4|2|1|4|2|2|1|4|4|3|4|4|3|
|Product1|4|5|5|2|3|4|3|4|5|3|4|5|
|Product3|4|2|3|1|4|1|1|3|1|1|3|1|
|Product4|3|5|3|3|1|4|1|1|3|1|1|3|
|31. Jan|04. Feb|08. Feb|12. Feb|
|Product1|54|55|62|
|Product2|0|0|0|
|Product3|46|56|46|
|Product4|41|54|61|
|Product5|0|0|0|
You can use SUMPRODUCT for this. B2:E12 is the range of data for Feb 1 through Feb 4, and O2 holds the criterion you are searching for; in my case O2 was equal to Product 1. When you want the sum for the Feb 8 column, just change B2:E12 to the range of data corresponding to Feb 5 to Feb 8.
=SUMPRODUCT(B2:E12*(A2:A12=O2))
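For readers who want to check the logic outside Excel, the mask multiplication behind the SUMPRODUCT formula can be sketched in Python with NumPy. This is only an illustration, not part of either answer: the names `products`, `days`, `values` and `period_sum` are made up here, and the data is just the PRODUCT 1 and PRODUCT 2 rows from the question's table.

```python
import numpy as np

# Subset of the question's table: only the PRODUCT 1 and PRODUCT 2 rows,
# in their original order. Columns are 01-feb .. 12-feb.
products = np.array(["PRODUCT 1", "PRODUCT 1", "PRODUCT 2", "PRODUCT 1", "PRODUCT 2"])
days = np.arange(1, 13)
values = np.array([
    [4, 3, 1, 5, 2, 9, 1, 3, 5, 8, 0, 5],
    [1, 0, 5, 3, 1, 1, 8, 0, 5, 3, 1, 1],
    [5, 4, 6, 6, 0, 7, 4, 4, 6, 6, 0, 7],
    [9, 8, 1, 4, 4, 6, 5, 8, 1, 4, 4, 6],
    [6, 4, 4, 7, 2, 8, 6, 4, 4, 7, 2, 8],
])


def period_sum(product, first_day, last_day):
    """Sum the values of one product over an inclusive range of days."""
    row_mask = products == product                           # like (A2:A12=O2)
    col_mask = (days >= first_day) & (days <= last_day)      # the date window
    # Multiplying by the two 0/1 masks zeroes every cell outside the selection.
    return int((values * row_mask[:, None] * col_mask[None, :]).sum())


# Three non-overlapping four-day periods: 1-4, 5-8 and 9-12.
summary = {
    name: [period_sum(name, start, start + 3) for start in (1, 5, 9)]
    for name in ("PRODUCT 1", "PRODUCT 2")
}
print(summary)
```

With these rows the result agrees with the first two lines of the compact table in the question (44, 48, 43 and 42, 35, 40).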
As the title says, my dataframe looks as follows:
|ID|Follow up month|Value-x|value -y|
|1|0|12|12|
|1|0|11|14|
|2|0|10|11|
|2|3|11|0|
|2|0|12|1|
|1|3|13|12|
|2|3|11|5|
I want to add another column called Timepoint, which would make the table look as follows:
|ID|Follow up month|Value-x|value -y|Timepoint|
|1|0|12|12|1|
|1|0|11|14|1|
|2|0|10|11|1|
|2|3|11|0|2|
|2|0|12|1|1|
|1|3|13|12|2|
|2|3|11|5|2|
So far I have tried to group the rows by their ID and follow up month and then assign a timepoint using cumcount, but this didn't give me the result I wanted. Any help on how to handle this would be appreciated.
From your table I can only infer that you want to create the Timepoint column based on the corresponding values in Follow up month, which will look like:
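from io import StringIO
import pandas as pd

# Sample data; columns are separated by two or more spaces so that
# multi-word headers such as "Follow up month" stay intact.
wt = StringIO("""ID  Follow up month  Value-x  value -y
1  0  12  12
1  0  11  14
2  0  10  11
2  3  11  0
2  0  12  1
1  3  13  12
2  3  11  5""")
# A regex separator needs a raw string and the python parsing engine.
df = pd.read_csv(wt, sep=r'\s\s+', engine='python')
# Month 0 maps to timepoint 1, any other month to timepoint 2.
df['Timepoint'] = df['Follow up month'].apply(lambda x: 1 if x == 0 else 2)
df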
Output:
ID Follow up month Value-x value -y Timepoint
0 1 0 12 12 1
1 1 0 11 14 1
2 2 0 10 11 1
3 2 3 11 0 2
4 2 0 12 1 1
5 1 3 13 12 2
6 2 3 11 5 2
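The same mapping can also be written without a row-wise `apply`. The following is a minimal sketch, assuming (as the answer above does) that month 0 means timepoint 1 and every other month means timepoint 2; the small frame is rebuilt here with only the two columns that matter.

```python
import numpy as np
import pandas as pd

# Only the columns needed for the mapping, in the same row order as above.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 1, 2],
    "Follow up month": [0, 0, 0, 3, 0, 3, 3],
})

# Vectorised equivalent of the lambda: 1 where the month is 0, otherwise 2.
df["Timepoint"] = np.where(df["Follow up month"].eq(0), 1, 2)
print(df)
```

On a large frame this avoids calling a Python function once per row.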
Edit
Based on your comment, this should be what you want:
def timepoint(s):
    if not s.isin([0]).any() and s.iloc[0] == 3:
        return 1
    else:
        return s.apply(lambda x: 1 if x==0 else 2)
df['Timepoint'] = df.groupby('ID')['Follow up month'].transform(timepoint)
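If the underlying intent is "number the distinct follow-up months within each ID in ascending order", a dense rank per group is one possible alternative. This is a different technique from the function above and rests on that assumption about the intent; the rows with ID 3 are hypothetical and were added only to show an ID that has no month-0 visit.

```python
import pandas as pd

df = pd.DataFrame({
    # ID 3 is a hypothetical patient whose first recorded visit is at month 3.
    "ID": [1, 1, 2, 2, 2, 1, 2, 3, 3],
    "Follow up month": [0, 0, 0, 3, 0, 3, 3, 3, 3],
})

# Dense rank within each ID: the smallest month seen for that ID becomes 1,
# the next distinct month becomes 2, and so on.
df["Timepoint"] = (
    df.groupby("ID")["Follow up month"].rank(method="dense").astype(int)
)
print(df)
```

For the sample data this gives the same timepoints as the function above. The two approaches differ once an ID has more than two distinct months, or has no month 0 but several later months, so which one is right depends on the actual requirement.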
Given a dataframe df as follows:
df = pd.DataFrame({'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05', '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'], 'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'], 'Value': [11, 8, 10, 15, 110, 60, 100, 40]})
Out:
Date Sym Value
0 2015-05-08 aapl 11
1 2015-05-07 aapl 8
2 2015-05-06 aapl 10
3 2015-05-05 aapl 15
4 2015-05-08 aaww 110
5 2015-05-07 aaww 60
6 2015-05-06 aaww 100
7 2015-05-05 aaww 40
I want to create a new column Group that labels consecutive groups with integers starting from 1. Each group should have 3 rows, except for the last group, which may have fewer than 3 rows.
The final result should look like this:
Date Sym Value Group
0 2015-05-08 aapl 11 1
1 2015-05-07 aapl 8 1
2 2015-05-06 aapl 10 1
3 2015-05-05 aapl 15 2
4 2015-05-08 aaww 110 2
5 2015-05-07 aaww 60 2
6 2015-05-06 aaww 100 3
7 2015-05-05 aaww 40 3
How could I achieve that with Pandas or Numpy? Thanks.
My trial code:
n = 3
for g, chunk in df.groupby(np.arange(len(df)) // n):
    print(chunk.shape)
You are close: assign the array you pass to groupby directly to a new column and add 1:
n = 3
df['Group'] = np.arange(len(df)) // n + 1
print (df)
Date Sym Value Group
0 2015-05-08 aapl 11 1
1 2015-05-07 aapl 8 1
2 2015-05-06 aapl 10 1
3 2015-05-05 aapl 15 2
4 2015-05-08 aaww 110 2
5 2015-05-07 aaww 60 2
6 2015-05-06 aaww 100 3
7 2015-05-05 aaww 40 3
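The integer division works on row positions, not on index labels, so it behaves the same for any index. A small sketch, where the helper name `add_group_column` and the letter index are made up for the example:

```python
import numpy as np
import pandas as pd


def add_group_column(frame, n, column="Group"):
    """Return a copy with rows labelled 1, 2, 3, ... in blocks of n rows."""
    out = frame.copy()
    # np.arange gives positions 0..len-1; // n turns them into block numbers.
    out[column] = np.arange(len(out)) // n + 1
    return out


# A frame whose index is not 0..7, to show that only row order matters.
df = pd.DataFrame(
    {"Value": [11, 8, 10, 15, 110, 60, 100, 40]},
    index=list("hgfedcba"),
)
grouped = add_group_column(df, 3)
print(grouped)
```

The last block simply ends up shorter when the number of rows is not a multiple of `n`.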
import pandas as pd
import numpy as np
import ast
pd.options.display.max_columns = 20
I have a dataframe column season that looks like this (first 20 entries):
season
0 2006-07
1 2007-08
2 2008-09
3 2009-10
4 2010-11
5 2011-12
6 2012-13
7 2013-14
8 2014-15
9 2015-16
10 2016-17
11 2017-18
12 2018-19
13 Career
14 season
15 2018-19
16 Career
17 season
18 2017-18
19 2018-19
Each block starts with season and ends with Career. I want to replace the years with numbers starting at 1 and stopping at Career. I want it to look like this:
season
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 Career
14 season
15 1
16 Career
17 season
18 1
19 2
So the counting should reset every time season appears in the column and stop every time Career appears.
Create consecutive groups by comparing the mask returned by Series.isin with its shifted values, then use GroupBy.cumcount as the counter:
s = df['season'].isin(['Career', 'season'])
df['new'] = np.where(s, df['season'], df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1)
print (df)
season new
0 2006-07 1
1 2007-08 2
2 2008-09 3
3 2009-10 4
4 2010-11 5
5 2011-12 6
6 2012-13 7
7 2013-14 8
8 2014-15 9
9 2015-16 10
10 2016-17 11
11 2017-18 12
12 2018-19 13
13 Career Career
14 season season
15 2018-19 1
16 Career Career
17 season season
18 2017-18 1
19 2018-19 2
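To see why this works, it can help to look at the intermediate block ids. The sketch below uses a shortened version of the column; the names `is_marker`, `block_id` and `counter` are chosen here for readability, and the counter is converted to strings so that the new column holds a single type.

```python
import pandas as pd

df = pd.DataFrame({"season": [
    "2006-07", "2007-08", "Career", "season",
    "2018-19", "Career", "season", "2017-18", "2018-19",
]})

# True on the marker rows, False on the rows that hold a year.
is_marker = df["season"].isin(["Career", "season"])

# A new block starts wherever the mask differs from the previous row,
# so the running sum of those change points gives one id per block.
block_id = is_marker.ne(is_marker.shift()).cumsum()

# Position inside each block, starting at 1.
counter = df.groupby(block_id).cumcount() + 1

# Keep the counter on year rows and the original text on marker rows.
df["new"] = counter.astype(str).where(~is_marker, df["season"])
print(df.assign(block_id=block_id))
```

Because marker rows and year rows always fall into different blocks, the counter restarts at 1 after every Career/season pair.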
To replace the values in the season column itself:
s = df['season'].isin(['Career', 'season'])
df.loc[~s, 'season'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1
print (df)
season
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 Career
14 season
15 1
16 Career
17 season
18 1
19 2
The following is a portion of my table in Excel:
A B C D E
5 10 1 18316 3
5 11 1 18313 3
5 11 2 18002 3
5 11 3 10825 3
5 12 1 18316 3
5 12 2 18001 3
5 12 3 10825 3
5 13 1 18313 3
5 13 2 18002 3
5 14 1 18316 3
5 14 2 18001 3
5 14 3 18002 3
5 15 1 18313 3
5 16 1 18316 3
5 16 2 18002 3
5 16 3 18313 3
5 17 1 18313 3
5 17 2 18002 3
5 17 3 18316 3
5 20 1 18313 3
5 21 1 18316 3
5 21 2 18001 3
5 21 3 18313 3
15 10 1 47009 3
15 10 2 40802 3
15 11 1 47009 3
15 12 1 47010 3
15 12 2 47009 3
15 13 1 47009 3
15 13 2 47010 3
15 14 1 47010 3
What I want to achieve is the following:
To be able to calculate the count of a number in column D for every unique B and A with respect to C (if D is at the Max of C or not)
Output something like:
Filter: 18001 on Column D
5
12 1 Non-Max
14 1 Non-Max
21 1 Non-Max
Similarly if the filter is changed to 18316:
5
10 1 Max
12 1 Non-Max
14 1 Non-Max
16 1 Non-Max
17 1 Max
21 1 Non-Max
I have 20K rows of data that needs processing.
I seem to be able to get close to the results you indicate from the data you have provided, but have no idea what you mean by "for every unique B and A with respect to C (if D is at the Max of C or not)". I applied a PivotTable as below:
Max and Non-Max are indicated by the relationship between Count of E and Max of C, which could be used in a simple formula to display Max or Non-Max outside the PivotTable.
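One reading of the question that reproduces its expected output is: for each (A, B) pair, count the rows whose D equals the filter value, and flag each hit as Max when its C is the largest C in that pair and Non-Max otherwise. The pandas sketch below is an illustration of that reading only, not a PivotTable recipe; the function name `count_with_max_flag` and the column names Position and Count are made up, and the data is a small subset of the question's rows.

```python
import pandas as pd

# Rows for A = 5 and B in (10, 12, 17), copied from the question's table.
rows = [
    (5, 10, 1, 18316),
    (5, 12, 1, 18316), (5, 12, 2, 18001), (5, 12, 3, 10825),
    (5, 17, 1, 18313), (5, 17, 2, 18002), (5, 17, 3, 18316),
]
df = pd.DataFrame(rows, columns=["A", "B", "C", "D"])


def count_with_max_flag(frame, value):
    """Count rows with D == value per (A, B), flagged Max or Non-Max by C."""
    # Largest C within each (A, B) pair, computed over all rows of the pair.
    group_max = frame.groupby(["A", "B"])["C"].transform("max")
    hits = frame.loc[frame["D"] == value, ["A", "B", "C"]].copy()
    is_max = hits["C"] == group_max.loc[hits.index]
    hits["Position"] = is_max.map({True: "Max", False: "Non-Max"})
    return hits.groupby(["A", "B", "Position"]).size().reset_index(name="Count")


result = count_with_max_flag(df, 18316)
print(result)
```

For the filter 18316 this flags B = 10 and B = 17 as Max and B = 12 as Non-Max, which agrees with the corresponding lines of the expected output in the question.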
Suppose I have a file like this:
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2
The first and the last lines of the input are identical.
I want the output to look like this:
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2 1
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3 1
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2 1
The extra last column in the output gives the number of times each line occurs in the input.
How can I do this in bash?
I know the sort command, but it only works with one number per line.
Coming from sehe's suggestion, what about this?
sort your_file | uniq -c | awk '{for(i=2;i<=NF;i++) printf "%s\t", $i; print $1}'
Output:
1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2 1
2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3 1
4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2 2
4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2 1
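The pipeline above sorts the lines, so the output order differs from the order requested in the question. If the first-seen order matters, the same counting can be sketched in Python with `collections.Counter`. This is an alternative to the shell pipeline, not a translation of it, and the function name `count_duplicate_lines` is made up for the example.

```python
from collections import Counter


def count_duplicate_lines(lines):
    """Return each distinct line once, followed by its occurrence count.

    Lines are kept in the order in which they first appear.
    """
    # Counter is a dict subclass, so it remembers insertion order.
    counts = Counter(line.strip() for line in lines if line.strip())
    return [f"{line} {count}" for line, count in counts.items()]


sample = [
    "4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2",
    "4 3 10 2 14 2 18 2 20 3 22 2 28 2 32 2",
    "2 3 10 3 12 2 16 2 18 3 20 2 24 2 26 3",
    "1 3 3 3 17 3 19 3 26 2 28 2 30 2 32 2",
    "4 2 8 2 12 3 18 2 22 2 26 2 28 3 30 2",
]
for row in count_duplicate_lines(sample):
    print(row)
```

To run it on a file, the lines of an opened file object can be passed in place of `sample`.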