Iterate over rows in a data frame, create a new column, then add more columns based on the new column - python-3.x

I have a data frame as below:
Date Quantity
2019-04-25 100
2019-04-26 148
2019-04-27 124
The output that I need is to take the quantity difference between two consecutive dates, average it over 24 hours, and create 23 columns, each adding the hourly difference to the previous column, such as below:
Date Quantity Hour-1 Hour-2 ....Hour-23
2019-04-25 100 102 104 .... 146
2019-04-26 148 147 146 .... 123
2019-04-27 124
I'm trying to iterate with a loop, but it's not working. My code is as below:
for i in df.index:
    diff = (df.get_value(i+1, 'Quantity') - df.get_value(i, 'Quantity')) / 24
    for j in range(24):
        df[i, [1+j]] = df.[i, [j]] * (1 + diff)
I did some research but I have not found how to create columns like above iteratively. I hope you could help me. Thank you in advance.

IIUC, using resample and interpolate, then pivoting the output:
s = df.set_index('Date').resample('1H').interpolate()
s = pd.pivot_table(s, index=s.index.date,
                   columns=s.groupby(s.index.date).cumcount(),
                   values=['Quantity'], aggfunc='mean')
s.columns = s.columns.droplevel(0)
s
Out[93]:
0 1 2 3 ... 20 21 22 23
2019-04-25 100.0 102.0 104.0 106.0 ... 140.0 142.0 144.0 146.0
2019-04-26 148.0 147.0 146.0 145.0 ... 128.0 127.0 126.0 125.0
2019-04-27 124.0 NaN NaN NaN ... NaN NaN NaN NaN
[3 rows x 24 columns]
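A self-contained version of this answer (assuming the sample frame above, with Date parsed as datetime):

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2019-04-25', '2019-04-26', '2019-04-27']),
                   'Quantity': [100, 148, 124]})

# hourly resample leaves NaN between the daily readings; interpolate fills them linearly
s = df.set_index('Date').resample('1H').interpolate()

# pivot: one row per calendar day, one column per hour-of-day (0-23)
out = pd.pivot_table(s, index=s.index.date,
                     columns=s.groupby(s.index.date).cumcount(),
                     values='Quantity', aggfunc='mean')
print(out.shape)  # (3, 24)
```

Passing values as a plain string keeps the columns flat, so no droplevel is needed in this variant.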

If I have understood the question correctly.
for loop approach:
list_of_values = []
for i, row in df.iterrows():
    if i < len(df) - 1:
        qty = row['Quantity']
        qty_2 = df.at[i+1, 'Quantity']
        diff = (qty_2 - qty) / 24
        list_of_values.append(diff)
    else:
        list_of_values.append(0)
df['diff'] = list_of_values
Output:
Date Quantity diff
2019-04-25 100 2
2019-04-26 148 -1
2019-04-27 124 0
Now create the columns required.
e.g.
df['Hour-1'] = df['Quantity'] + df['diff']
df['Hour-2'] = df['Quantity'] + 2*df['diff']
...and so on, up to df['Hour-23'].
There are other approaches that will work far better.
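Rather than writing 23 assignments by hand, the column creation can be looped; a self-contained sketch (computing the hourly step vectorized instead of with iterrows):

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2019-04-25', '2019-04-26', '2019-04-27'],
                   'Quantity': [100, 148, 124]})

# hourly step toward the next day's quantity (0 for the last row)
df['diff'] = df['Quantity'].diff(-1).mul(-1).div(24).fillna(0)

# Hour-1 .. Hour-23, each one hourly step further from the daily reading
for h in range(1, 24):
    df[f'Hour-{h}'] = df['Quantity'] + h * df['diff']
```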

Related

reshape dataframe time series

I have a dataframe of weather data in a certain shape, and I want to transform it, but I am struggling with it.
My dataframe looks like this:
city    temp_day1  temp_day2  temp_day3  ...  hum_day1  hum_day2  hum_day4  ...  condition
city_1  12         13         20              44        44.5      44            good
city_1  12         13         20              44        44.5      44            bad
city_2  14         04         33              44        44.5      44            good
I want to transform it to:
      city_1                                  city_2 ...
day   temperature  humidity  condition       temperature  humidity  condition ...
1     12           44        good
2     13           44.5      bad
3     20           NaN
4     NaN          44
Some days don't have temperature or humidity values.
Thanks for your help.
Use wide_to_long with DataFrame.unstack, then finally DataFrame.swaplevel and DataFrame.sort_index:
df1 = (pd.wide_to_long(df,
                       stubnames=['temp', 'hum'],
                       i='city',
                       j='day',
                       sep='_',
                       suffix=r'\w+')
         .unstack(0)
         .swaplevel(1, 0, axis=1)
         .sort_index(axis=1))
print(df1)
city city_1
hum temp
day
day1 44.0 12.0
day2 44.5 13.0
day3 NaN 20.0
day4 44.0 NaN
Alternative solution:
df1 = df.set_index('city')
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack([0,1]).unstack([0,1])
If need extract numbers from index:
df1 = (pd.wide_to_long(df,
                       stubnames=['temp', 'hum'],
                       i='city',
                       j='day',
                       sep='_',
                       suffix=r'\w+')
         .unstack(0)
         .swaplevel(1, 0, axis=1)
         .sort_index(axis=1))
df1.index = df1.index.str.extract(r'(\d+)', expand=False)
print(df1)
city city_1
hum temp
day
1 44.0 12.0
2 44.5 13.0
3 NaN 20.0
4 44.0 NaN
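A runnable reproduction of the steps above (assuming a single-city frame with the stub columns shown; the condition column is omitted for brevity):

```python
import pandas as pd

df = pd.DataFrame({'city': ['city_1'],
                   'temp_day1': [12], 'temp_day2': [13], 'temp_day3': [20],
                   'hum_day1': [44.0], 'hum_day2': [44.5], 'hum_day4': [44.0]})

# wide_to_long melts temp_*/hum_* into a (city, day) index;
# missing stub/suffix combinations (temp_day4, hum_day3) become NaN
df1 = (pd.wide_to_long(df, stubnames=['temp', 'hum'],
                       i='city', j='day', sep='_', suffix=r'\w+')
         .unstack(0)
         .swaplevel(1, 0, axis=1)
         .sort_index(axis=1))

# strip the 'day' prefix, keeping only the number
df1.index = df1.index.str.extract(r'(\d+)', expand=False)
```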
EDIT:
Solution with real data:
df1 = df.set_index(['condition', 'ACTIVE', 'mode', 'apply', 'spy', 'month'], append=True)
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack([0,1]).unstack([0,-2])
If need remove unnecessary levels in MultiIndex:
df1 = df1.reset_index(level=['condition', 'ACTIVE', 'mode', 'apply', 'spy', 'month'], drop=True)
You can use the pandas transpose method like this: df.T
This transposes the dataframe (a single column becomes a single row). If you create multiple columns, you can slice it with indexing and assign each slice to independent columns.

Fill in missing values in DataFrame Column which is incrementing by 10

Say some values in the 'Counts' column are missing. These numbers are meant to increase by 10 with each row, so '35' and '55' need to be put in place. I want to fill in these missing values.
Counts
0 25
1 NaN
2 45
3 NaN
4 65
So my output should be :
Counts
0 25
1 35
2 45
3 55
4 65
Thanks,
We can use interpolate:
df = df.interpolate()
Counts
0 25.0
1 35.0
2 45.0
3 55.0
4 65.0
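A minimal sketch of this answer, casting back to int afterwards (assuming the column should stay integral):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Counts': [25, np.nan, 45, np.nan, 65]})

# linear interpolation fills the evenly spaced gaps, then restore int dtype
df['Counts'] = df['Counts'].interpolate().astype(int)
print(df['Counts'].tolist())  # [25, 35, 45, 55, 65]
```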
Since you know the pattern, you can simply recreate it:
start = df.iloc[0]['Counts']   # first row
end = df.iloc[-1]['Counts']    # last row
df['Counts'] = np.where(df['Counts'].notnull(), df['Counts'],
                        np.arange(start, end + 1, 10))

Python - Create copies of rows based on column value and increase date by number of iterations

I have a dataframe in Python:
md
Out[94]:
Key_ID ronDt multidays
0 Actuals-788-8AA-0001 2017-01-01 1.0
11 Actuals-788-8AA-0012 2017-01-09 1.0
20 Actuals-788-8AA-0021 2017-01-16 1.0
33 Actuals-788-8AA-0034 2017-01-25 1.0
36 Actuals-788-8AA-0037 2017-01-28 1.0
... ... ...
55239 Actuals-789-8LY-0504 2020-02-12 1.0
55255 Actuals-788-T11-0001 2018-08-23 8.0
55257 Actuals-788-T11-0003 2018-09-01 543.0
55258 Actuals-788-T15-0001 2019-02-20 368.0
55259 Actuals-788-T15-0002 2020-02-24 2.0
I want to create an additional record for every multiday and increase the date (ronDt) by number of times that record was duplicated.
For example:
row[0] would repeat one time with the new date reading 2017-01-02.
row[55255] would be repeated 8 times, with the corresponding dates ranging from 2018-08-24 to 2018-08-31.
When I did this in VBA, I used loops, and in Alteryx I used multirow functions. What is the best way to achieve this in Python? Thanks.
Here's a way to do it in pandas:
# get the list of dates for each row
df['datecol'] = df.apply(
    lambda x: pd.date_range(start=x['ronDt'], periods=int(x['multidays']), freq='D'),
    axis=1)
# convert the lists into new rows
df = df.explode('datecol').drop(columns='ronDt')
# rename the columns
df.rename(columns={'datecol': 'ronDt'}, inplace=True)
print(df)
Key_ID multidays ronDt
0 Actuals-788-8AA-0001 1.0 2017-01-01
1 Actuals-788-8AA-0012 1.0 2017-01-09
2 Actuals-788-8AA-0021 1.0 2017-01-16
3 Actuals-788-8AA-0034 1.0 2017-01-25
4 Actuals-788-8AA-0037 1.0 2017-01-28
.. ... ... ...
8 Actuals-788-T15-0001 368.0 2020-02-20
8 Actuals-788-T15-0001 368.0 2020-02-21
8 Actuals-788-T15-0001 368.0 2020-02-22
9 Actuals-788-T15-0002 2.0 2020-02-24
9 Actuals-788-T15-0002 2.0 2020-02-25
# Get the count of duplicates for each row, corresponding to the multidays col
df = df.groupby(df.columns.tolist()).size().reset_index().rename(columns={0: 'multidays'})
# Assuming ronDt's dtype is str, convert it to a datetime object,
# then add the multidays offset to ronDt
df['ronDt_new'] = pd.to_datetime(df['ronDt']) + pd.to_timedelta(df['multidays'], unit='d')
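A self-contained sketch of the explode approach, using two sample rows from the frame above:

```python
import pandas as pd

df = pd.DataFrame({'Key_ID': ['Actuals-788-8AA-0001', 'Actuals-788-T11-0001'],
                   'ronDt': pd.to_datetime(['2017-01-01', '2018-08-23']),
                   'multidays': [1.0, 8.0]})

# one date per multiday, starting at ronDt
df['datecol'] = df.apply(
    lambda x: pd.date_range(start=x['ronDt'], periods=int(x['multidays']), freq='D'),
    axis=1)

# one row per generated date
df = (df.explode('datecol')
        .drop(columns='ronDt')
        .rename(columns={'datecol': 'ronDt'}))
print(len(df))  # 9
```

Note that explode requires pandas >= 0.25, and periods must be an int, hence the cast.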

pandas df merge avoid duplicate column names

The question: when merging two dfs that both have a column called A, the result is a df with A_x and A_y. I am wondering how to keep A from one df and discard the other, so that I don't have to rename A_x back to A after the merge.
Just filter your dataframe columns before merging.
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key': np.arange(12), 'A': np.random.randint(0, 100, 12), 'C': list('ABCD')*3})
df2 = pd.DataFrame({'Key': np.arange(12), 'A': np.random.randint(100, 1000, 12), 'C': list('ABCD')*3})
df1.merge(df2[['Key', 'A']], on='Key')
Output: (Note: C is not duplicated)
A_x C Key A_y
0 60 A 0 440
1 65 B 1 731
2 76 C 2 596
3 67 D 3 580
4 44 A 4 477
5 51 B 5 524
6 7 C 6 572
7 88 D 7 984
8 70 A 8 862
9 13 B 9 158
10 28 C 10 593
11 63 D 11 177
It depends whether you need the duplicated columns in the final merged DataFrame:
...if so, add a suffixes parameter to merge:
print(df1.merge(df2, on='Key', suffixes=('', '_')))
...if not, use @Scott Boston's solution.
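The two ideas can also be combined: tag the unwanted copy with a suffix, then drop it (column names here are illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key': np.arange(4), 'A': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Key': np.arange(4), 'A': [10, 20, 30, 40]})

# keep df1's A untouched; df2's copy gets the '_dup' suffix
merged = df1.merge(df2, on='Key', suffixes=('', '_dup'))
merged = merged.drop(columns=[c for c in merged.columns if c.endswith('_dup')])
print(list(merged.columns))  # ['Key', 'A']
```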

Pandas: Remove aggfun np.sum labels and value labels from pivot table

I have created a pandas pivot table, and I am trying to remove the 'sum' and 'LENGTH' rows from the output xlsx.
So far I have tried to remove the two rows upon exporting the pivot table to xlsx.
I have tried to read in the exported pivot table and DataFrame.drop the two rows and re-export.
I am not having much luck. Thanks all in advance!
Link to pic:
http://i.stack.imgur.com/AmjFy.png
You can use droplevel:
df.columns = df.columns.droplevel([0, 1])
print(df)
STATUS X Y Z
CODE
A 13.0 6 20
B NaN 472 472
C NaN 105 105
D 13.0 584 598
And then maybe reset_index with rename_axis (new in pandas 0.18.0):
df = df.reset_index().rename_axis(None, axis=1)
print(df)
CODE X Y Z
0 A 13.0 6 20
1 B NaN 472 472
2 C NaN 105 105
3 D 13.0 584 598
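A runnable sketch of the whole answer, using a hypothetical CODE/STATUS/LENGTH frame modeled on the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CODE': ['A', 'A', 'B', 'B'],
                   'STATUS': ['X', 'Y', 'X', 'Y'],
                   'LENGTH': [1, 2, 3, 4]})

# list-valued aggfunc/values produce a 3-level column index:
# ('sum', 'LENGTH', <STATUS>)
pt = pd.pivot_table(df, index='CODE', columns='STATUS',
                    values=['LENGTH'], aggfunc=[np.sum])

# drop the 'sum' and 'LENGTH' levels, keep only STATUS
pt.columns = pt.columns.droplevel([0, 1])
pt = pt.reset_index().rename_axis(None, axis=1)
print(list(pt.columns))  # ['CODE', 'X', 'Y']
```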
