Excel: compare two columns and shift cells down where the values do not match

I have a large set of data in Excel that needs to be matched by column.
The data currently looks like this:
column a    column b
1           1
2           2
3           4
5           5
6           6
8           7
9           9
10          10
14          11
15          12
16          13
17          14
19          15
20          17
            18
            20
And I want it to look like this afterwards:
column a    column b
1           1
2           2
3
            4
5           5
6           6
            7
8
9           9
10          10
            11
            12
            13
14          14
15          15
16
17          17
            18
19
20          20
Thanks for helping me

Copy one column and append the other to it, then remove duplicates and sort. Assuming the combined list is in column A starting at A1 and your original data is in two ranges named RangeA and RangeB, enter in B1:
=IF(ISERROR(VLOOKUP($A1,RangeA,1,0)),"",$A1)
and in C1:
=IF(ISERROR(VLOOKUP($A1,RangeB,1,0)),"",$A1)
Then copy both formulas down as far as needed.
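For anyone doing the same thing outside Excel, here is a rough pandas sketch of the same two steps (build the combined sorted key list, then look each key up in the original columns). The variable names are only for illustration, and the values are the ones from the question:

```python
import pandas as pd

# Sample values from the question; the two columns have different lengths.
col_a = [1, 2, 3, 5, 6, 8, 9, 10, 14, 15, 16, 17, 19, 20]
col_b = [1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20]

# Step 1: append one column to the other, remove duplicates, and sort.
set_a, set_b = set(col_a), set(col_b)
keys = sorted(set_a | set_b)

# Step 2: for each key, keep the value only where the original column has it
# (the role VLOOKUP plays in the worksheet formulas); otherwise leave a blank.
aligned = pd.DataFrame({
    "column a": [k if k in set_a else "" for k in keys],
    "column b": [k if k in set_b else "" for k in keys],
})
print(aligned.to_string(index=False))
```

Each row of aligned then holds the same number in both columns, or a blank where one column is missing that number.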

Related

resampling a pandas dataframe and filling new rows with zero

I have a time series as a dataframe. The first column is the week number and the second holds the value for that week. The first week (22) and the last week (48) are the lower and upper bounds of the time series. Some weeks are missing; for example, there are no weeks 27 and 28. I would like to resample this series so that there are no missing weeks. Where a week is inserted, I would like the corresponding value to be zero. This is my data:
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 29 3
6 30 3
7 31 3
8 32 7
9 33 4
10 34 5
11 35 4
12 36 2
13 37 3
14 38 10
15 39 5
16 40 7
17 41 10
18 42 11
19 43 15
20 44 9
21 45 13
22 46 5
23 47 6
24 48 2
I am wondering if this can be achieved in pandas without writing a loop from scratch. I have looked into DataFrame.resample, but can't get the results I am looking for.
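I would set week as the index, then reindex over the full range of weeks with the fill_value option:
import numpy as np
# first and last week present in the data
start, end = df['week'].agg(['min','max'])
df.set_index('week').reindex(np.arange(start, end+1), fill_value=0).reset_index()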
Output (head):
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 27 0
6 28 0
7 29 3
8 30 3
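If this is needed in more than one place, the same reindex idea can be wrapped in a small helper. This is only a sketch: fill_missing_weeks is a made-up name, and the four-row frame is a slice of the question's data around the gap.

```python
import numpy as np
import pandas as pd

def fill_missing_weeks(df, fill_value=0):
    """Return df with one row per week between the min and max week."""
    start, end = df["week"].min(), df["week"].max()
    full_range = np.arange(start, end + 1)
    return (
        df.set_index("week")
          .reindex(full_range, fill_value=fill_value)  # inserted weeks get fill_value
          .rename_axis("week")                         # keep the column name on reset
          .reset_index()
    )

# Weeks 27 and 28 are missing, as in the question.
df = pd.DataFrame({"week": [25, 26, 29, 30], "value": [3, 2, 3, 3]})
filled = fill_missing_weeks(df)
print(filled)
```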

Sum two dataframes for equal entries

I have two dataframes with same entries in column A, but different entries in columns B and C.
One dataframe has multiple entries for one entry in A.
df1
A B C
0 this 3 4
1 is 4 6
2 an 7 9
3 example 12 20
df2
A B C
0 this 11 11
1 this 5 9
2 this 18 7
3 is 12 14
4 an 1 4
5 an 8 12
6 example 3 17
7 example 9 5
8 example 19 6
9 example 7 1
I want to sum the two dataframes for matching entries in column A. The result should look like this:
df3
A B C
0 this 14 15
1 this 8 13
2 this 21 11
3 is 16 20
4 an 8 13
5 an 15 21
6 example 15 37
7 example 21 25
8 example 31 26
9 example 19 21
How can I calculate this efficiently in pandas?
Use DataFrame.merge to left-merge column A of df2 with df1 on column A, then add columns B and C of df2 to columns B and C of the merged result df3:
df3 = df2[['A']].merge(df1, on='A', how='left')
df3[['B', 'C']] += df2[['B', 'C']]
Result:
print(df3)
A B C
0 this 14 15
1 this 8 13
2 this 21 11
3 is 16 20
4 an 8 13
5 an 15 21
6 example 15 37
7 example 21 25
8 example 31 26
9 example 19 21
Or, if row order is not important:
df3 = df2.set_index('A').add(df1.set_index('A')).reset_index()
print(df3)
A B C
0 an 8 13
1 an 15 21
2 example 15 37
3 example 21 25
4 example 31 26
5 example 19 21
6 is 16 20
7 this 14 15
8 this 8 13
9 this 21 11
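For reference, a self-contained sketch of the merge approach on a cut-down version of the two frames (only the first rows for "this" and "is"). It also shows why the in-place addition lines up: the merged frame and df2 share the same default 0..n-1 index.

```python
import pandas as pd

# Cut-down versions of the frames from the question.
df1 = pd.DataFrame({"A": ["this", "is"], "B": [3, 4], "C": [4, 6]})
df2 = pd.DataFrame({"A": ["this", "this", "is"], "B": [11, 5, 12], "C": [11, 9, 14]})

# Left merge on the key column only: each df2 row picks up the matching
# df1 values, and the row order of df2 is preserved.
df3 = df2[["A"]].merge(df1, on="A", how="left")

# df3 and df2 have the same default index, so += adds row by row.
df3[["B", "C"]] += df2[["B", "C"]]
print(df3)
```

Note that a key in df2 with no match in df1 would produce NaN in the merged columns, so this assumes every value of A in df2 also appears in df1.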

Replace values from one column in dataframe

import pandas as pd
import numpy as np
import ast
pd.options.display.max_columns = 20
I have a dataframe column season that looks like this (first 20 entries):
season
0 2006-07
1 2007-08
2 2008-09
3 2009-10
4 2010-11
5 2011-12
6 2012-13
7 2013-14
8 2014-15
9 2015-16
10 2016-17
11 2017-18
12 2018-19
13 Career
14 season
15 2018-19
16 Career
17 season
18 2017-18
19 2018-19
Each block starts with season and ends with Career. I want to replace the years with numbers starting at 1 and stopping at Career. I want it to look like this:
season
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 Career
14 season
15 1
16 Career
17 season
18 1
19 2
So the count should reset every time season appears in the column and stop every time Career appears.
Create consecutive groups by comparing the mask from Series.isin with its shifted values, then use GroupBy.cumcount as the counter:
s = df['season'].isin(['Career', 'season'])
df['new'] = np.where(s, df['season'], df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1)
print (df)
season new
0 2006-07 1
1 2007-08 2
2 2008-09 3
3 2009-10 4
4 2010-11 5
5 2011-12 6
6 2012-13 7
7 2013-14 8
8 2014-15 9
9 2015-16 10
10 2016-17 11
11 2017-18 12
12 2018-19 13
13 Career Career
14 season season
15 2018-19 1
16 Career Career
17 season season
18 2017-18 1
19 2018-19 2
To replace the season column in place:
s = df['season'].isin(['Career', 'season'])
df.loc[~s, 'season'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1
print (df)
season
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 Career
14 season
15 1
16 Career
17 season
18 1
19 2
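To see why the grouping works, it can help to print the intermediate pieces. A small sketch on a shortened version of the column (the names block_id, counter and steps are mine, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({"season": ["2006-07", "2007-08", "Career", "season",
                              "2018-19", "Career", "season", "2017-18", "2018-19"]})

# True on the marker rows, False on the year rows.
s = df["season"].isin(["Career", "season"])

# A new block starts wherever the mask differs from the previous row;
# the running sum of those change points gives each block its own id.
block_id = s.ne(s.shift()).cumsum()

# Position within each block, starting at 1.
counter = df.groupby(block_id).cumcount() + 1

steps = pd.DataFrame({"season": df["season"], "marker": s,
                      "block_id": block_id, "counter": counter})
print(steps)
```

The marker rows also receive a counter value here, which is why the answer only writes the counter back where the mask is False.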

PivotTable with multiple conditions to count unique items

Following is the portion of my table in Excel:
A B C D E
5 10 1 18316 3
5 11 1 18313 3
5 11 2 18002 3
5 11 3 10825 3
5 12 1 18316 3
5 12 2 18001 3
5 12 3 10825 3
5 13 1 18313 3
5 13 2 18002 3
5 14 1 18316 3
5 14 2 18001 3
5 14 3 18002 3
5 15 1 18313 3
5 16 1 18316 3
5 16 2 18002 3
5 16 3 18313 3
5 17 1 18313 3
5 17 2 18002 3
5 17 3 18316 3
5 20 1 18313 3
5 21 1 18316 3
5 21 2 18001 3
5 21 3 18313 3
15 10 1 47009 3
15 10 2 40802 3
15 11 1 47009 3
15 12 1 47010 3
15 12 2 47009 3
15 13 1 47009 3
15 13 2 47010 3
15 14 1 47010 3
What I want to achieve is the following:
I want to calculate the count of a number in column D for every unique B and A with respect to C (if D is at the Max of C or not).
Output something like:
Filter: 18001 on Column D
5
12 1 Non-Max
14 1 Non-Max
21 1 Non-Max
Similarly if the filter is changed to 18316:
5
10 1 Max
12 1 Non-Max
14 1 Non-Max
16 1 Non-Max
17 1 Max
21 1 Non-Max
I have 20K rows of data that need processing.
I seem to be able to achieve close to the results you indicate from the data you have provided - but have no idea what you mean by "for every unique B and A with respect to C (if D is at the Max of C or not)". I applied a PivotTable as below:
Max and Non-Max being indicated by the relationship between Count of E and Max of C - which could be used in a simple formula to display Max or Non-Max outside the PivotTable.
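One possible reading of the requirement, which happens to reproduce the sample output in the question, is: within each (A, B) group, a row is "Max" when its C equals the largest C in that group. Under that assumption (it is a guess at the intent, not something the question confirms), a pandas sketch could look like this. The helper name summarize and the columns max_c and position are made up for the example:

```python
import pandas as pd

# A few rows from the question's table (columns A-E), enough to show the idea.
rows = [
    (5, 10, 1, 18316, 3),
    (5, 12, 1, 18316, 3), (5, 12, 2, 18001, 3), (5, 12, 3, 10825, 3),
    (5, 17, 1, 18313, 3), (5, 17, 2, 18002, 3), (5, 17, 3, 18316, 3),
]
df = pd.DataFrame(rows, columns=list("ABCDE"))

# Largest C within each (A, B) group, broadcast back to every row.
df["max_c"] = df.groupby(["A", "B"])["C"].transform("max")
df["position"] = df["C"].eq(df["max_c"]).map({True: "Max", False: "Non-Max"})

def summarize(frame, d_value):
    """Count rows with D == d_value per (A, B, position)."""
    hits = frame[frame["D"] == d_value]
    return hits.groupby(["A", "B", "position"]).size().reset_index(name="count")

result = summarize(df, 18316)
print(result)
```

Changing the argument to 18001 plays the role of changing the filter on column D.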

Excel Pivot table - get maximum for a period of 24 hours

I have an Excel workbook with:
Days of the week and 24 hours for each day.
Each hour I get some points.
I would like to calculate the maximum cumulative points I can get within 24 hours.
[TEST.XLSX]
2 Columns:
Monday Points
0 34
1 32
2 4
3 54
4 12
5 55
6 4
7 4
8 555
9 787
10 8
11 76
12 78
13 8
14 656
15 7
16 4
17 45
18 54
19 543
20 56
21 65
22 4
23 3
Tuesday
0 56
1 7
2 333
3 9
4 876
5 3333
6 3333
7 76
8 3333
9 465
10 7
11 6
12 5
13 6
14 7
15 6
16 7
17 65
18 555555555
19 6
20 5
21 4
22 6
23 6
Wednesday
0 6
1 7
...
Thanks for your help!
Use real date/time values in your hours column. Delete the rows that contain the day text. Instead, use a formula that increments from a starting date/time. For example, cell A2 contains the date and midnight time for Nov 17, and cell A3 (copied down) contains the formula
=A2+TIME(1,0,0)
which increments by one hour.
Now you can build a pivot table. Group the date/time values by day and hour. Show the subtotal for the day and set its value field settings to Max.
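The same idea (real hourly timestamps, then a per-day Max) can be sketched in pandas for comparison. The year in the start date is an assumption, since only "Nov 17" is given, and the points are the Monday and Tuesday values from the question. The last line is one alternative reading of "within 24 hours", a sliding 24-row window, and is not what the pivot table above does.

```python
import numpy as np
import pandas as pd

# Monday and Tuesday points from the question, one value per hour.
points = [34, 32, 4, 54, 12, 55, 4, 4, 555, 787, 8, 76,
          78, 8, 656, 7, 4, 45, 54, 543, 56, 65, 4, 3,
          56, 7, 333, 9, 876, 3333, 3333, 76, 3333, 465, 7, 6,
          5, 6, 7, 6, 7, 65, 555555555, 6, 5, 4, 6, 6]

# Real timestamps: start at midnight and step by one hour,
# like =A2+TIME(1,0,0) copied down. The year is hypothetical.
start = pd.Timestamp("2014-11-17 00:00")
timestamps = start + pd.to_timedelta(np.arange(len(points)), unit="h")
series = pd.Series(points, index=timestamps)

# Group by calendar day and take the largest hourly value per day.
daily_max = series.groupby(series.index.normalize()).max()
print(daily_max)

# Alternative reading: best total over any 24 consecutive hours.
rolling_best = series.rolling(24).sum().max()
print(rolling_best)
```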
