Difference between value from multiple sheets - excel

I would like to find the difference between the matched serial numbers from multiple excel sheets
sheet 1
June July
B 10 20
A 50 90
Sheet 2
June July
A 6 3
C 5 9
B 10 5
Sheet 3(results)
June July
A 44 87
B 0 15

Related

How to find again the index after pivoting dataframe?

I created a dataframe form a csv file containing data on number of deaths by year (running from 1946 to 2021) and month (within year):
dataD = pd.read_csv('MY_FILE.csv', sep=',')
First rows (out of 902...) of output are :
dataD
Year Month Deaths
0 2021 2 55500
1 2021 1 65400
2 2020 12 62800
3 2020 11 64700
4 2020 10 56900
As expected, the dataframe contains an index numbered 0,1,2, ... and so on.
Now, I pivot this dataframe in order to have only 1 row by year and months in column, using the following code:
dataDW = dataD.pivot(index='Year', columns='Month', values='Deaths')
The first rows of the result are now:
Month 1 2 3 4 5 6 7 8 9 10 11 12
Year
1946 70900.0 53958.0 57287.0 45376.0 42591.0 37721.0 37587.0 34880.0 35188.0 37842.0 42954.0 49596.0
1947 60453.0 56891.0 56442.0 45121.0 42605.0 37894.0 38364.0 36763.0 35768.0 40488.0 41361.0 46007.0
1948 46161.0 45412.0 51983.0 43829.0 42003.0 37084.0 39069.0 35272.0 35314.0 39588.0 43596.0 53899.0
1949 87861.0 58592.0 52772.0 44154.0 41896.0 39141.0 40042.0 37372.0 36267.0 40534.0 47049.0 47918.0
1950 51927.0 47749.0 50439.0 47248.0 45515.0 40095.0 39798.0 38124.0 37075.0 42232.0 44418.0 49860.0
My question is:
What do I have to change in the previous pivoting code in order to find again the index 0,1,2,..etc. when I output the pivoted file? I think I need to specify index=*** in order to make the pivot instruction run. But afterwards, I would like to recover an index "as usual" (if I can say), exactly like in my first file dataD.
Any possibility?
You can reset_index() after pivoting:
dataDW = dataD.pivot(index='Year', columns='Month', values='Deaths').reset_index()
This would give you the following:
Month Year 1 2 3 4 5 6 7 8 9 10 11 12
0 1946 70900.0 53958.0 57287.0 45376.0 42591.0 37721.0 37587.0 34880.0 35188.0 37842.0 42954.0 49596.0
1 1947 60453.0 56891.0 56442.0 45121.0 42605.0 37894.0 38364.0 36763.0 35768.0 40488.0 41361.0 46007.0
2 1948 46161.0 45412.0 51983.0 43829.0 42003.0 37084.0 39069.0 35272.0 35314.0 39588.0 43596.0 53899.0
3 1949 87861.0 58592.0 52772.0 44154.0 41896.0 39141.0 40042.0 37372.0 36267.0 40534.0 47049.0 47918.0
4 1950 51927.0 47749.0 50439.0 47248.0 45515.0 40095.0 39798.0 38124.0 37075.0 42232.0 44418.0 49860.0
Note that the "Month" here might look like the index name but is actually df.columns.name. You can unset it if preferred:
df.columns.name = None
Which then gives you:
Year 1 2 3 4 5 6 7 8 9 10 11 12
0 1946 70900.0 53958.0 57287.0 45376.0 42591.0 37721.0 37587.0 34880.0 35188.0 37842.0 42954.0 49596.0
1 1947 60453.0 56891.0 56442.0 45121.0 42605.0 37894.0 38364.0 36763.0 35768.0 40488.0 41361.0 46007.0
2 1948 46161.0 45412.0 51983.0 43829.0 42003.0 37084.0 39069.0 35272.0 35314.0 39588.0 43596.0 53899.0
3 1949 87861.0 58592.0 52772.0 44154.0 41896.0 39141.0 40042.0 37372.0 36267.0 40534.0 47049.0 47918.0
4 1950 51927.0 47749.0 50439.0 47248.0 45515.0 40095.0 39798.0 38124.0 37075.0 42232.0 44418.0 49860.0

Aggregate values based on lookup IDs while meeting multiple criteria

I am given sales data as per "sheet 1" below. Each row contains the quarter, the product, the region ID, and the sales figure. Each region can have multiple IDs, and I have a lookup table (sheet 2) which denote which region each ID belongs to.
My goal is to get to sheet 3. Essentially I am trying to write a formula that will reference the product name in column A, the quarter in cell A1 (which the user will input), and aggregate the relevant sales figures under each region in column B.
I have tried nesting an INDEX & MATCH function within a SUMIFS within a SUMPRODUCT as per below, but I am getting a #VALUE! error:
=SUMPRODUCT(SUMIFS(INDEX(sheet1!$D:$D$,MATCH(1,($A4=sheet1!$B:$B)*($A$1=Sheet1!$A:$A),0),0),sheet1$C:$C,sheet2!$A$2:$A$8)*(sheet2!$B$2:$B$8=$B4))
Does anyone know what is wrong with my formula, or if there is a better approach to this problem?
Sheet 1 (Raw data)
A
B
C
D
1
Quarter
Product
ID
2021 Sales
2
Q1
A
1
39
3
Q1
A
3
41
4
Q1
A
7
20
5
Q1
A
14
7
6
Q1
A
25
2
7
Q1
A
27
2
8
Q1
A
44
45
9
Q1
B
1
28
10
Q1
B
3
34
11
Q1
B
7
29
12
Q1
B
14
48
13
Q1
B
25
5
14
Q1
B
27
15
15
Q1
B
44
32
16
Q2
A
1
19
17
Q2
A
3
28
and so forth…
Sheet 2 (region ID lookup table)
A
B
1
ID
Region
2
1
East
3
3
East
4
7
Central
5
14
Central
6
25
Central
7
27
West
8
44
West
Sheet 3 (Report)
A
B
C
1
Q1
2
3
Product
Region
Sales
4
A
East
29
5
A
Central
42
6
A
West
31
A little rework:
=SUMPRODUCT(SUMIFS(Sheet1!D:D,Sheet1!C:C,IF(Sheet2!$B$2:$B$8=$B4,Sheet2!$A$2:$A$8),Sheet1!A:A,$A$1,Sheet1!B:B,A4))
Depending on version one may need to confirm with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
With the dynamic array formula FILTER we can replace the IF() part with FILTER():
=SUMPRODUCT(SUMIFS(Sheet1!D:D,Sheet1!C:C,FILTER(Sheet2!$A$2:$A$8,Sheet2!$B$2:$B$8=$B4),Sheet1!A:A,$A$1,Sheet1!B:B,A4))
And it will save a few iterations.

How to select a set of values in pandas data frame (multiple colums with multiple row conditions)

I have a huge ass csv file like given below which I opened as dataframe using pandas. I want to extract data from multiple columns at different date sets.
I want to select from a particular date and hour to another for the last 3 column values. The slicing options I tried and googled were for single column.
date heure PM10 NO2 O3
0 01/01/2016 1 27 22 36
1 01/01/2016 2 25 29 27
2 01/01/2016 3 26 47 10
3 01/01/2016 4 16 40 13
4 01/01/2016 5 15 34 13
5 02/01/2016 1 15 34 13
6 02/01/2016 2 15 34 13
Target output - taking data from a particular data and hour to another one.
3 01/01/2016 4 16
4 01/01/2016 5 15
Thank you. The data set is obviously way bigger than 4 No.
You can do this:
df_selected = df[(df.date >= "01/01/2016") &
(df['hour']>=4) &
(df.date < "02/01/2016") &
(df['hour']<6)
].iloc[:,:3] #first three columns
Alternatively, for the columns selection you can use .loc[:,['name', 'of', 'columns']] or for the last n columns .iloc[:,-n:].
Be careful with date because I'm not sure what happens with an "English" date, maybe you have to change the date using df['date'] = pd.to_datetime(df.date).

Excel indexmatch, vlookup

I have a holiday calendar for several years in one table. Can anyone help – How to arrange this data by week and show holiday against week? I want to reference this data in other worksheets and hence arranging this way will help me to use formulae on other sheets. I want the data to be: col A having week numbers and column B showing holiday for year 1, col. C showing holiday for year 2, etc.
Fiscal Week
2015 2014 2013 2012
Valentine's Day 2 2 2 3
President's Day 3 3 3 4
St. Patrick's Day 7 7 7 7
Easter 10 12 9 11
Mother's Day 15 15 15 16
Memorial Day 17 17 17 18
Flag Day 20 19 19 20
Father's Day 21 20 20 21
Independence Day 22 22 22 23
Labor Day 32 31 31 32
Columbus Day 37 37 37 37
Thanksgiving 43 43 43 43
Christmas 47 47 47 48
New Year's Day 48 48 48 49
ML King Day 51 51 51 52
It's not too clear what year 1 is, so I'm going to assume that's 2015, and year 2 is 2014, etc.
Here's how you could set it up, if I understand correctly. Use this index/match formula (psuedo-formula):
=Iferror(Index([holiday names range],match([week number],[2015's week numbers in your table],0)),"")
It looks like this:
(=IFERROR(INDEX($A$3:$A$17,MATCH($H3,B$3:B$17,0)),""), in the cell next to the week numbers)
You can then drag the formula over, and the matching group (in above picture, B3:B17) will "slide over" as you drag the formula over.

Sum number according to date and name in excel

To sum the third column (numbers o companies) I've used this
=SUM(1/COUNTIF(Names;Names))
Names is name of array in C column and CTRL+SHIFT+ENTER and it works perfectly.
Now I'd like to sum earnings but only for each company once and with the latest data. For example, the result shoud be like this
=C4+C6+C7+C8+C9+C10
(93)
Thanks
A B C D
1 # company earnings date
2 1 ISB 12 10/11/2011
3 2 DTN 15 11/11/2011
4 3 ABC 13 12/11/2011
5 4 ISB 17 13/11/2011
6 5 RTV 18 14/11/2011
7 6 DTN 22 15/11/2011
8 7 PVS 11 16/11/2011
9 8 ISB 19 17/11/2011
10 9 ANH 10 18/11/2011
Sum 6 93
Assuming ascending dates, you could try with CTRL+SHIFT+ENTER in C11:
=SUM((MAX(A2:A10)-MATCH(B2:B10,LOOKUP(MAX(A2:A10)-A2:A10,A2:A10-1,B2:B10),0)=A2:A10-1)*C2:C10)
I'd suggest using a helper column as the easiest approach. In E2 use this formula
=IF(COUNTIF(B2:B$1000,B2)=1,C2,"")
and copy down the column. Now sum column D for the required answer.
Note that the above formula assumes 1000 rows of data maximum, increase if required.

Resources