Append columns to DataFrame from another DataFrame - python-3.x

Hi everyone!
Can you please help me with the problem below?
I have the first DataFrame, df_1:
key  end
1    10
1    20
2    30
2    40
And the second df_2:
key  time
1    13
1    25
2    35
2    45
I need to add columns from df_1 to df_2 under the condition:
df_1['key'] == df_2['key'] and df_2['time'] > df_1['end']
The final solution should look like:
key  time  end_1  end_2
1    13    10
1    25    10     20
2    35    30
2    45    30     40
I was thinking of solving it as in the example below:
for index_1, row_1 in df_2.iterrows():
    for index_2, row_2 in df_1.iterrows():
        if row_1[0] == row_2[0] and row_1[1] > row_2[2]:
            row_1.append(row_2)
But it doesn't work.
I would appreciate it if someone could help me.
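One possible approach, offered as a sketch rather than a definitive answer: merge the two frames on key, keep only the pairs where time > end, number the surviving matches for each df_2 row, and pivot those numbers into end_1, end_2, ... columns. The sample frames are rebuilt from the tables in the question; the names pairs, n, wide and result are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df_1 = pd.DataFrame({"key": [1, 1, 2, 2], "end": [10, 20, 30, 40]})
df_2 = pd.DataFrame({"key": [1, 1, 2, 2], "time": [13, 25, 35, 45]})

# Pair every df_2 row with every df_1 row that shares its key.
# reset_index() keeps the original df_2 row label in a column named "index".
pairs = df_2.reset_index().merge(df_1, on="key")

# Keep only the pairs that satisfy time > end.
pairs = pairs[pairs["time"] > pairs["end"]].sort_values(["index", "end"])

# Number the matches of each df_2 row: 1, 2, ...
pairs["n"] = pairs.groupby("index").cumcount() + 1

# One column per match number, one row per df_2 row.
wide = pairs.pivot(index="index", columns="n", values="end").add_prefix("end_")

# Rows of df_2 with fewer matches get NaN in the unused columns.
result = df_2.join(wide)
print(result)
```

Because the merge builds every key-matched pair before filtering, this can get memory-hungry when a key repeats many times in both frames. The end_* columns come back as floats because missing matches are stored as NaN.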

Related

Compare hourly data using python pandas

Currently I have two columns in a dataframe: one is a timestamp and the other is a temperature reading that is received every 5 minutes. The data looks like:
timestamp temp
2021-03-21 00:02:17 35
2021-03-21 00:07:17 32
2021-03-21 00:12:17 33
2021-03-21 00:17:17 34
...
2021-03-21 00:57:19 33
2021-03-21 01:02:19 30
2021-03-21 01:07:19 31
...
Now I want to compare each reading with the reading taken one hour later. I have tried the df.resample() method, but it gives just one result per hour.
The result I am expecting is like:
Data at 00:02:17 is 35 and at 01:02:19 is 30, so the answer will be 35 - 30 = 5.
For the second one, 00:07:17 is 32 and 01:07:19 is 31, so the answer will be 32 - 31 = 1.
How can I do this dynamically, so that it compares the hourly data difference?
Any help would be great.
Thanks a lot.
Use:
result_df = df.assign(minute_diff=df.sort_values('timestamp', ascending=False)
                                    .groupby(pd.to_datetime(df['timestamp']).dt.minute)['temp']
                                    .diff())
print(result_df)
timestamp temp minute_diff
0 2021-03-21 00:02:17 35 5.0
1 2021-03-21 00:07:17 32 1.0
2 2021-03-21 00:12:17 33 NaN
3 2021-03-21 00:17:17 34 NaN
4 2021-03-21 00:57:19 33 NaN
5 2021-03-21 01:02:19 30 NaN
6 2021-03-21 01:07:19 31 NaN
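For anyone who wants to run this end to end, here is a self-contained variant of the same idea, assuming only the seven rows shown in the question (the elided rows are left out). It sorts ascending and uses diff(-1) instead of sorting descending, and the column name hour_diff is made up. Note that grouping on the minute of the hour only pairs readings correctly while the readings stay on the same minute from hour to hour.

```python
import pandas as pd

# The seven readings visible in the question; the "..." rows are omitted.
df = pd.DataFrame({
    "timestamp": [
        "2021-03-21 00:02:17", "2021-03-21 00:07:17", "2021-03-21 00:12:17",
        "2021-03-21 00:17:17", "2021-03-21 00:57:19", "2021-03-21 01:02:19",
        "2021-03-21 01:07:19",
    ],
    "temp": [35, 32, 33, 34, 33, 30, 31],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Readings taken at the same minute of different hours share a group.
minute = df["timestamp"].dt.minute

# diff(-1) subtracts the next reading in the same group from the current one,
# i.e. "this hour minus the following hour".
df["hour_diff"] = df.sort_values("timestamp").groupby(minute)["temp"].diff(-1)
print(df)
```

Rows without a later reading at the same minute get NaN.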

Creating an aggregate columns in pandas dataframe

I have a pandas dataframe as below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ORDER':["A", "A", "B", "B"], 'var1':[2, 3, 1, 5],'a1_bal':[1,2,3,4], 'a1c_bal':[10,22,36,41], 'b1_bal':[1,2,33,4], 'b1c_bal':[11,22,3,4], 'm1_bal':[15,2,35,4]})
df
ORDER var1 a1_bal a1c_bal b1_bal b1c_bal m1_bal
0 A 2 1 10 1 11 15
1 A 3 2 22 2 22 2
2 B 1 3 36 33 3 35
3 B 5 4 41 4 4 4
I want to create new columns as below:
a1_final_bal = sum(a1_bal, a1c_bal)
b1_final_bal = sum(b1_bal, b1c_bal)
m1_final_bal = m1_bal (since we only have an m1_bal field and no m1c_bal, it will remain as it is)
I don't want to hardcode this step because there might be more such columns, e.g. "c_bal", "m2_bal", "m2c_bal", etc.
My final data should look something like below
ORDER var1 a1_bal a1c_bal b1_bal b1c_bal m1_bal a1_final_bal b1_final_bal m1_final_bal
0 A 2 1 10 1 11 15 11 12 15
1 A 3 2 22 2 22 2 24 24 2
2 B 1 3 36 33 3 35 39 36 35
3 B 5 4 41 4 4 4 45 8 4
You could try something like this. I am not sure if it's exactly what you are looking for, but I think it should work.
dfforgroup = df.set_index(['ORDER', 'var1'])  # moves the non-balance columns into a MultiIndex
dfforgroup.columns = dfforgroup.columns.str[:2]  # keeps the first two characters of the remaining column names
# Groups columns by their first two characters and sums them.
# Transposing first avoids groupby(axis=1), which is deprecated in recent pandas versions.
df2 = (dfforgroup.T.groupby(level=0).sum().T
       .reset_index(drop=True)
       .add_suffix('_final_bal'))
df = pd.concat([df, df2], axis=1)  # concatenates the new columns to the original df
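If slicing the first two characters feels too fragile, a variation is to derive the group name with a regular expression instead. This is only a sketch, and it assumes every companion column is named like its base column with a "c" inserted before "_bal" (a1_bal / a1c_bal); the names bal, stems, totals and result are made up.

```python
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "B", "B"], 'var1': [2, 3, 1, 5],
                   'a1_bal': [1, 2, 3, 4], 'a1c_bal': [10, 22, 36, 41],
                   'b1_bal': [1, 2, 33, 4], 'b1c_bal': [11, 22, 3, 4],
                   'm1_bal': [15, 2, 35, 4]})

# Only the balance columns take part in the sums.
bal = df.filter(regex=r"_bal$")

# Assumed naming rule: strip "_bal" and an optional "c" right before it,
# so 'a1_bal' and 'a1c_bal' both map to 'a1'.
stems = bal.columns.str.replace(r"c?_bal$", "", regex=True)

# Transpose so the columns become rows, group rows by stem, sum, transpose back.
totals = bal.T.groupby(stems.to_numpy()).sum().T.add_suffix("_final_bal")

result = pd.concat([df, totals], axis=1)
print(result)
```

A stem with only one column, such as m1, is simply carried over unchanged.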

Row subtraction in lambda pandas dataframe

I have a dataframe with multiple columns. One of the columns is the cumulative revenue. If the year has not ended yet, the revenue stays constant for the rest of the period because the upcoming daily revenue is 0.
The dataframe looks like this
Now I want to create a new column where the previous row is subtracted from each row; if the result is 0, the new column should contain 0 for that row, otherwise the row's own value. The new dataframe should look like this:
My idea was to do this with the apply lambda method. So this is the thinking:
df['2017new'] = df['2017'].apply(lambda x: 0 if row - lastrow == 0 else x)
But I do not know how to write the row - lastrow part of the code. How can I do this? Thanks in advance!
By using np.where
df2['New']=np.where(df2['2017'].diff().eq(0),0,df2['2017'])
df2
Out[190]:
2016 2017 New
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
We can shift the data and fill the values based on a condition using np.where, i.e.
df['new'] = np.where(df['2017']-df['2017'].shift(1)==0,0,df['2017'])
or with the where method of the column, i.e.
df['new'] = df['2017'].where(df['2017']-df['2017'].shift(1)!=0,0)
2016 2017 new
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0

Excel: Average values where column values match

What I am attempting to accomplish is this: where Report ID matches, I need to calculate the average of Value, and then fill the rows with matching Report IDs with the average for that array of Value.
The data essentially looks like this:
Report ID | Report Instance | Value
11111 1 20
11112 1 50
11113 1 40
11113 2 30
11113 3 20
11114 1 40
11115 1 20
11116 1 30
11116 2 40
11117 1 20
The end goal should look like this:
Report ID | Report Instance | Value | Average
11111 1 20 20
11112 1 50 50
11113 1 40 30
11113 2 30 30
11113 3 20 30
11114 1 40 40
11115 1 20 20
11116 1 30 35
11116 2 40 35
11117 1 20 20
I have tried using average(if()), index(match()), vlookup(match()) and similar combinations of functions, but I haven't had much luck in getting my final output. I'm relatively new to using arrays in Excel, and I don't have a strong grasp of how they're evaluated just yet, so any help is much appreciated.
Keep it simple :-)
Why not use =SUMIF(...)/COUNTIF(...)?
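For readers doing the same thing in pandas rather than Excel, the sum-divided-by-count idea corresponds to a per-group mean broadcast back onto every row. A minimal sketch, assuming the sample data from the question loaded into a DataFrame named df (a made-up name):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Report ID": [11111, 11112, 11113, 11113, 11113,
                  11114, 11115, 11116, 11116, 11117],
    "Report Instance": [1, 1, 1, 2, 3, 1, 1, 1, 2, 1],
    "Value": [20, 50, 40, 30, 20, 40, 20, 30, 40, 20],
})

# transform("mean") computes the mean per Report ID and repeats it
# on every row of that group, like SUMIF/COUNTIF filled down a column.
df["Average"] = df.groupby("Report ID")["Value"].transform("mean")
print(df)
```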

Table transformation in Excel

I have a spreadsheet with data in the following format:
CarID Day DistanceTraveled
Ford1 1 10
Ford1 2 12
Nissan1 1 13
Ford1 3 41
Nissan1 2 20
Nissan1 3 10
...
And so on. There are a few hundred records in this format, for a few dozen cars.
I have to transform it into the following format:
Day Ford1 Nissan1
1 10 13
2 12 20
3 41 10
Is there a fast, automatic way to achieve this in Excel?
Just for the sake of an answer:
