Excel: Average values where column values match - excel

What I am trying to accomplish is this: where Report ID matches, I need to calculate the average of Value, and then fill every row sharing that Report ID with the average of its Value entries.
The data essentially looks like this:
Report ID | Report Instance | Value
11111 1 20
11112 1 50
11113 1 40
11113 2 30
11113 3 20
11114 1 40
11115 1 20
11116 1 30
11116 2 40
11117 1 20
The end goal should look like this:
Report ID | Report Instance | Value | Average
11111 1 20 20
11112 1 50 50
11113 1 40 30
11113 2 30 30
11113 3 20 30
11114 1 40 40
11115 1 20 20
11116 1 30 35
11116 2 40 35
11117 1 20 20
I have tried using average(if()), index(match()), vlookup(match()) and similar combinations of functions, but I haven't had much luck getting my final output. I'm relatively new to using arrays in Excel, and I don't have a strong grasp of how they're evaluated just yet, so any help is much appreciated.

Keep it simple :-)
Why not use =SUMIF(...)/COUNTIF(...)?
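To make the sum-over-count idea concrete, here is a minimal Python sketch of the same arithmetic on the sample data from the question. In a worksheet this would probably correspond to something like =SUMIF(A:A, A2, C:C)/COUNTIF(A:A, A2), where the column letters are an assumption (Report ID in A, Value in C), not something the question states.

```python
from collections import defaultdict

# Sample rows from the question: (Report ID, Report Instance, Value).
rows = [
    (11111, 1, 20), (11112, 1, 50), (11113, 1, 40), (11113, 2, 30),
    (11113, 3, 20), (11114, 1, 40), (11115, 1, 20), (11116, 1, 30),
    (11116, 2, 40), (11117, 1, 20),
]

totals = defaultdict(int)  # plays the role of SUMIF per Report ID
counts = defaultdict(int)  # plays the role of COUNTIF per Report ID
for report_id, _instance, value in rows:
    totals[report_id] += value
    counts[report_id] += 1

# One average per row, repeated for every row that shares the Report ID.
averages = [totals[report_id] / counts[report_id] for report_id, _, _ in rows]
```

Every row with Report ID 11113 gets 30.0 and every row with 11116 gets 35.0, matching the Average column in the question.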


Append columns to DataFrame from another DataFrame

Hi everyone!
Can you please help me with the problem below?
I have a first DataFrame, df_1:
key end
1 10
1 20
2 30
2 40
And the second df_2:
key time
1 13
1 25
2 35
2 45
I need to add columns from df_1 to df_2 under this condition:
df_1['key'] == df_2['key'] and df_2['time'] > df_1['end']
The final solution should look like:
key time end_1 end_2
1 13 10
1 25 10 20
2 35 30
2 45 30 40
I was thinking of solving it as in the example below:
for index_1, row_1 in df_2.iterrows():
    for index_2, row_2 in df_1.iterrows():
        if row_1[0] == row_2[0] and row_1[1] > row_2[2]:
            row_1.append(row_2)
But it doesn't work.
I would appreciate it if someone could help me.
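One possible approach, sketched here under the assumption that the two frames are exactly the samples shown above, is to merge on key, keep the pairs where time is greater than end, and then spread the surviving end values into numbered columns. The helper column names `index` and `n` are illustrative only.

```python
import pandas as pd

df_1 = pd.DataFrame({"key": [1, 1, 2, 2], "end": [10, 20, 30, 40]})
df_2 = pd.DataFrame({"key": [1, 1, 2, 2], "time": [13, 25, 35, 45]})

# Pair every df_2 row with every df_1 row that has the same key,
# remembering the original df_2 row position in the "index" column.
pairs = df_2.reset_index().merge(df_1, on="key")

# Keep only the pairs that satisfy time > end.
pairs = pairs[pairs["time"] > pairs["end"]].copy()

# Number the matches of each df_2 row: 1, 2, ...
pairs["n"] = pairs.groupby("index").cumcount() + 1

# Turn the numbered matches into end_1, end_2, ... columns.
wide = pairs.pivot(index="index", columns="n", values="end").add_prefix("end_")

# Attach the new columns to df_2; rows with fewer matches get NaN.
result = df_2.join(wide)
print(result)
```

The cross-pairing step grows with the number of rows per key, so for large frames with many repeats per key this sketch may need a more economical strategy.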

Compare hourly data using python pandas

I have a dataframe with two columns: a timestamp and a temperature reading, which arrives every 5 minutes. The data looks like this:
timestamp temp
2021-03-21 00:02:17 35
2021-03-21 00:07:17 32
2021-03-21 00:12:17 33
2021-03-21 00:17:17 34
...
2021-03-21 00:57:19 33
2021-03-21 01:02:19 30
2021-03-21 01:07:19 31
...
I want to compare every reading with the reading one hour later. How can I do that? I have tried the df.resample() method, but it gives only one result per hour.
The result which I am expecting is like:
The reading at 00:02:17 is 35 and at 01:02:19 is 30, so the answer is 35 - 30 = 5.
For the second one, the reading at 00:07:17 is 32 and at 01:07:19 is 31, so the answer is 32 - 31 = 1.
How can I do this dynamically, so that it computes the hour-to-hour difference for every reading?
Any help would be great.
Thanks a lot.
Use:
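import pandas as pd
# Walk the rows from latest to earliest, group the readings by minute-of-hour,
# and take the difference between consecutive readings inside each group.
result_df = df.assign(minute_diff=df.sort_values('timestamp', ascending=False)
                                    .groupby(pd.to_datetime(df['timestamp']).dt.minute)['temp']
                                    .diff())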
print(result_df)
timestamp temp minute_diff
0 2021-03-21 00:02:17 35 5.0
1 2021-03-21 00:07:17 32 1.0
2 2021-03-21 00:12:17 33 NaN
3 2021-03-21 00:17:17 34 NaN
4 2021-03-21 00:57:19 33 NaN
5 2021-03-21 01:02:19 30 NaN
6 2021-03-21 01:07:19 31 NaN
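Grouping by the minute relies on each reading landing in the same minute of every hour. As an alternative sketch (not the answer's method), each reading can be matched to the nearest reading about one hour later with pd.merge_asof. The 2-minute tolerance and the column names `target`, `later_ts`, `later_temp` and `hour_diff` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-21 00:02:17", "2021-03-21 00:07:17",
        "2021-03-21 01:02:19", "2021-03-21 01:07:19",
    ]),
    "temp": [35, 32, 30, 31],
})

# For each reading, the moment one hour later that we want to look up.
# The cast keeps the same datetime dtype on both merge keys.
target = (df["timestamp"] + pd.Timedelta(hours=1)).astype(df["timestamp"].dtype)
probe = df.assign(target=target)

# The same readings, renamed so they can serve as the lookup table.
later = df.rename(columns={"timestamp": "later_ts", "temp": "later_temp"})

# Match each target time to the nearest actual reading within 2 minutes.
paired = pd.merge_asof(
    probe, later,
    left_on="target", right_on="later_ts",
    direction="nearest", tolerance=pd.Timedelta(minutes=2),
)
paired["hour_diff"] = paired["temp"] - paired["later_temp"]
print(paired[["timestamp", "temp", "later_ts", "later_temp", "hour_diff"]])
```

Both frames must be sorted by their merge keys, which holds here because the readings are already in time order. Readings with no partner inside the tolerance get NaN.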

Grouping data based on month-year in pandas and then dropping all entries except the latest one- Python

Below is my example dataframe
Date Indicator Value
0 2000-01-30 A 30
1 2000-01-31 A 40
2 2000-03-30 C 50
3 2000-02-27 B 60
4 2000-02-28 B 70
5 2000-03-31 C 90
6 2000-03-28 C 100
7 2001-01-30 A 30
8 2001-01-31 A 40
9 2001-03-30 C 50
10 2001-02-27 B 60
11 2001-02-28 B 70
12 2001-03-31 C 90
13 2001-03-28 C 100
Desired Output
Date Indicator Value
2000-01-31 A 40
2000-02-28 B 70
2000-03-31 C 90
2001-01-31 A 40
2001-02-28 B 70
2001-03-31 C 90
I want to write code that groups the data by month-year and then keeps only the entry with the latest date in each month-year, dropping the rest. The data runs through 2020.
I was only able to get the count by month-year. I have not been able to write code that groups the data by month-year and indicator and returns the correct result.
Use Series.dt.to_period for monthly periods, get the index of the maximal date per group with DataFrameGroupBy.idxmax, and then pass it to DataFrame.loc:
df['Date'] = pd.to_datetime(df['Date'])
print (df['Date'].dt.to_period('M'))
0 2000-01
1 2000-01
2 2000-03
3 2000-02
4 2000-02
5 2000-03
6 2000-03
7 2001-01
8 2001-01
9 2001-03
10 2001-02
11 2001-02
12 2001-03
13 2001-03
Name: Date, dtype: period[M]
df = df.loc[df.groupby(df['Date'].dt.to_period('M'))['Date'].idxmax()]
print (df)
Date Indicator Value
1 2000-01-31 A 40
4 2000-02-28 B 70
5 2000-03-31 C 90
8 2001-01-31 A 40
11 2001-02-28 B 70
12 2001-03-31 C 90
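The question also asks for grouping by indicator as well as month-year. A minimal sketch of that variant, assuming the sample frame from the question, adds the Indicator column to the grouping keys. On this sample it returns the same rows, because each month holds only one indicator.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2000-01-30", "2000-01-31", "2000-03-30", "2000-02-27", "2000-02-28",
        "2000-03-31", "2000-03-28", "2001-01-30", "2001-01-31", "2001-03-30",
        "2001-02-27", "2001-02-28", "2001-03-31", "2001-03-28",
    ]),
    "Indicator": list("AACBBCC") * 2,
    "Value": [30, 40, 50, 60, 70, 90, 100] * 2,
})

# Group by (month period, Indicator) and find the row label of the
# latest date in each group.
month = df["Date"].dt.to_period("M")
latest_idx = df.groupby([month, "Indicator"])["Date"].idxmax()

# Select only those rows from the original frame.
result = df.loc[latest_idx]
print(result)
```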

How to select a set of values in pandas data frame (multiple columns with multiple row conditions)

I have a very large CSV file, like the one shown below, which I opened as a dataframe using pandas. I want to extract data from multiple columns over different date ranges.
I want to select the rows from one particular date and hour to another, for the last 3 column values. The slicing options I tried and found online only covered a single column.
date heure PM10 NO2 O3
0 01/01/2016 1 27 22 36
1 01/01/2016 2 25 29 27
2 01/01/2016 3 26 47 10
3 01/01/2016 4 16 40 13
4 01/01/2016 5 15 34 13
5 02/01/2016 1 15 34 13
6 02/01/2016 2 15 34 13
Target output, taking the data from one particular date and hour to another:
3 01/01/2016 4 16
4 01/01/2016 5 15
Thank you. The real data set is obviously much bigger than this sample.
You can do this:
df_selected = df[(df.date >= "01/01/2016") &
                 (df['heure'] >= 4) &
                 (df.date < "02/01/2016") &
                 (df['heure'] < 6)
                ].iloc[:, :3]  # first three columns
Alternatively, for the column selection you can use .loc[:,['name', 'of', 'columns']] or, for the last n columns, .iloc[:,-n:].
Be careful with the date column: if it was read in as strings, the comparisons above are lexicographic rather than chronological. You may have to convert it first with df['date'] = pd.to_datetime(df.date), and check that the day/month order is parsed the way you expect.
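As a hedged illustration of the date caveat, the sketch below parses the dates explicitly and filters on a combined date-and-hour value. It assumes the dates are day-first (dd/mm/yyyy) and that heure is the hour of the day. Neither is confirmed by the question, and the `when`, `start` and `end` names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["01/01/2016"] * 5 + ["02/01/2016"] * 2,
    "heure": [1, 2, 3, 4, 5, 1, 2],
    "PM10": [27, 25, 26, 16, 15, 15, 15],
    "NO2": [22, 29, 47, 40, 34, 34, 34],
    "O3": [36, 27, 10, 13, 13, 13, 13],
})

# Assumption: day-first dates, so spell out the format instead of guessing.
day = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Combine the date and the hour into one comparable timestamp per row.
when = day + pd.to_timedelta(df["heure"], unit="h")

start = pd.Timestamp("2016-01-01 04:00")
end = pd.Timestamp("2016-01-01 05:00")

# between() is inclusive on both ends by default.
selected = df.loc[when.between(start, end), ["date", "heure", "PM10"]]
print(selected)
```

Filtering on a single timestamp avoids combining separate date and hour conditions, which stops working as intended once the range spans more than one day.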

Find total no of links to and from node based on data in csv

I have a csv with the following info
Src Rx LinkId Weight
===================================
2 1 4000 10
2 1 4056 15
3 1 4100 10
3 1 4156 15
28 1 10650 8
113 2 15051 205
113 3 15058 205
1 4 3952 9
1 4 3951 5
1 4 3950 34
2 4 4052 9
47 4 18672 44
47 4 18670 38
69 4 4701 11
69 4 4700 21
70 4 4801 11
The LinkId is unique. Each row represents a link between two devices. For example, Src 2 and Rx 1 means that a link goes from device 2 to device 1.
I intend to compute the total weight of all the links originating from each device and coming into each device like so:
Device Out weight In weight
=============================
2 34 205
1 48 58
and so on.
I would like to know whether this is possible in Excel and, if so, how.
A pivot table may be the best solution here. I think that if you select this table and click PivotTable, it will give you your answer.
Alternatively, you can make one column for outgoing and one for incoming weight, use a formula such as =SUMIF(Src, 1, Weight) in each, and then use the totals at the bottom of each column.
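The per-device sums that either approach produces can be sketched in a few lines of Python, shown here only to make the arithmetic explicit. The rows are the sample from the question, and the `out_weight`, `in_weight` and `table` names are illustrative.

```python
from collections import defaultdict

# Sample rows from the question: (Src, Rx, LinkId, Weight).
links = [
    (2, 1, 4000, 10), (2, 1, 4056, 15), (3, 1, 4100, 10), (3, 1, 4156, 15),
    (28, 1, 10650, 8), (113, 2, 15051, 205), (113, 3, 15058, 205),
    (1, 4, 3952, 9), (1, 4, 3951, 5), (1, 4, 3950, 34), (2, 4, 4052, 9),
    (47, 4, 18672, 44), (47, 4, 18670, 38), (69, 4, 4701, 11),
    (69, 4, 4700, 21), (70, 4, 4801, 11),
]

out_weight = defaultdict(int)  # like SUMIF over Src
in_weight = defaultdict(int)   # like SUMIF over Rx
for src, rx, _link_id, weight in links:
    out_weight[src] += weight
    in_weight[rx] += weight

# One entry per device that appears on either side of a link.
devices = sorted(set(out_weight) | set(in_weight))
table = {d: (out_weight.get(d, 0), in_weight.get(d, 0)) for d in devices}

for device, (out_w, in_w) in table.items():
    print(device, out_w, in_w)
```

A device that only receives links, such as device 4 in this sample, shows 0 as its out weight.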
