So, I want to calculate the time differences, and the file looks something like this:
id  message_id     send_date            status
0   5f74b996a2b7e  2020-10-01 00:00:07  sent
1   5f74b996a2b7e  2020-10-01 00:00:09  delivered
2   5f74b99e85b3c  2020-10-02 02:00:14  sent
3   5f74b99e85b3c  2020-10-02 02:01:16  delivered
4   5f74b99e85b3c  2020-10-02 08:06:49  read
5   5f74b996a2b7e  2020-10-02 15:16:32  read
6   5f9d97ff1af9e  2020-10-14 13:45:43  sent
7   5f9d97ff1af9e  2020-10-14 13:45:45  delivered
8   5f9d97ff1af9e  2020-10-14 13:50:48  read
9   5f74b9a35b6c5  2020-10-16 19:01:19  sent
10  5f74b9a35b6c5  2020-10-16 19:01:25  deleted
Inside is id, which increments; message_id, which is unique to each message; send_date, the timestamp; and status, the message status (there are 5 statuses: sent, delivered, read, failed, and deleted).
I want to calculate the time difference from when the message was sent to when it was delivered, and, if delivered, from delivered to read.
I know something like this can be handy, but I wasn't sure how to assign it uniquely to each message_id:
from datetime import datetime
s1 = '2020-10-14 13:45:45'
s2 = '2020-10-14 13:50:48' # for example
FMT = '%Y-%m-%d %H:%M:%S'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
print(tdelta)
Ref: https://stackoverflow.com/questions/3096953/how-to-calculate-the-time-interval-between-two-time-strings
The expected output would be,
  message_id     delivered_diff  read_diff        deleted_diff
0 5f74b996a2b7e  00:00:02        1 day, 15:16:23
1 5f74b99e85b3c  00:01:02        6:05:33
2 5f9d97ff1af9e  00:00:02        0:05:03
3 5f74b9a35b6c5                                   0:00:06
You can do this with pandas and datetime.
The code is commented for clarity and was written with Python 3.8.
import datetime

import pandas as pd

def time_delta(a, b):
    # calculate the timedelta between two date strings
    fmt = '%Y-%m-%d %H:%M:%S'
    return datetime.datetime.strptime(b, fmt) - datetime.datetime.strptime(a, fmt)

def calculate_diff(val, first_status, second_status):
    # check that both statuses exist for this message
    if not val['status'].str.contains(first_status).any() or not val['status'].str.contains(second_status).any():
        return ''
    a = val.loc[val['status'] == first_status, 'send_date'].values[0]   # first send_date for the first status
    b = val.loc[val['status'] == second_status, 'send_date'].values[0]  # first send_date for the second status
    return time_delta(a, b)  # calculate the delta

df = pd.read_csv('test.csv', sep=';')  # load csv file with ; as separator
grouped = df.groupby('message_id')     # group by message ids

rows = []  # collect one result dict per group, then build the frame once at the end
for message_id, values in grouped:
    rows.append({
        'message_id': message_id,
        'delivered_diff': calculate_diff(values, 'sent', 'delivered'),  # delta between sent and delivered
        'read_diff': calculate_diff(values, 'delivered', 'read'),       # delta between delivered and read
        'deleted_diff': calculate_diff(values, 'sent', 'deleted'),      # delta between sent and deleted
    })
final_df = pd.DataFrame(rows, columns=['message_id', 'delivered_diff', 'read_diff', 'deleted_diff'])

# print final result
print(final_df)
The result:
message_id delivered_diff read_diff deleted_diff
0 5f74b996a2b7e 0 days 00:00:02 1 days 15:16:23
1 5f74b99e85b3c 0 days 00:01:02 0 days 06:05:33
2 5f74b9a35b6c5 0 days 00:00:06
3 5f9d97ff1af9e 0 days 00:00:02 0 days 00:05:03
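As a design note, the same table can be produced without the explicit loop by pivoting the statuses into columns (a sketch, assuming at most one row per (message_id, status) pair, as in the sample data):

import pandas as pd

df = pd.read_csv('test.csv', sep=';', parse_dates=['send_date'])
# one row per message_id, one column per status
wide = (df.pivot(index='message_id', columns='status', values='send_date')
          .reindex(columns=['sent', 'delivered', 'read', 'deleted'])
          .apply(pd.to_datetime))  # coerce any all-missing status column to NaT
out = pd.DataFrame({'delivered_diff': wide['delivered'] - wide['sent'],
                    'read_diff': wide['read'] - wide['delivered'],
                    'deleted_diff': wide['deleted'] - wide['sent']})
print(out)

Here missing statuses come out as NaT rather than empty strings.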
import pandas as pd

final_dict = []
data = pd.read_csv('data.csv', names=['id', 'unique_id', 'time', 'status'])
data['time'] = pd.to_datetime(data['time'])
# data.info()
groupByUniqueId = data.groupby('unique_id')
for name, group in groupByUniqueId:
    for _, row in group.iterrows():
        if row['status'] == "sent":
            sent = row['time']  # remember the sent time for this message
        if row['status'] == "read":
            final_dict.append({row['unique_id']: {"read": str(row['time'] - sent)}})
        elif row['status'] == "delivered":
            final_dict.append({row['unique_id']: {"delivered": str(row['time'] - sent)}})
        elif row['status'] == "deleted":
            final_dict.append({row['unique_id']: {"deleted": str(row['time'] - sent)}})
print(final_dict)
I have a df describing transactions like
transaction start_in_s_since_epoch duration_in_s charged_energy_in_wh
1 1.457423e+09 1821.0 1732
2 1.457389e+09 35577.0 18397
3 1.457425e+09 2.0 0
[...]
I assume the charged_energy is linear through the transaction. I would like to transform it to a time series with the granularity of a day. charged_energy within a day should be summed up as well as duration.
day sum_duration_in_s sum_charged_energy_in_wh
2016-03-16 00:00 123 456
2016-03-17 00:00 456 789
2016-03-18 00:00 789 012
[...]
Any idea? I am struggling with the borders between days. This transaction with
transaction start_in_s_since_epoch duration_in_s charged_energy_in_wh
500 1620777300 600 1000
should be equally divided to
day               sum_duration_in_s  sum_charged_energy_in_wh
2021-05-11 00:00  300                500
2021-05-12 00:00  300                500
This did it for me. Slow af but works:
import pandas as pd
from datetime_truncate import truncate

day_in_s = 60 * 60 * 24
rows = []  # collect the per-day chunks, then build the frame once
for index, row in df.iterrows():
    start = row.start_in_s_since_epoch
    time = row.duration_in_s
    energy_per_s = row.charged_energy_in_wh / row.duration_in_s
    # seconds from the transaction start until the next midnight
    till_midnight_in_s = truncate(pd.to_datetime(start + day_in_s, unit='s'), 'day').timestamp() - start
    rest_in_s = time - till_midnight_in_s
    rows.append({'day': truncate(pd.to_datetime(start, unit='s'), 'day'),
                 'sum_duration_in_s': min(time, till_midnight_in_s),
                 'sum_charged_energy_in_wh': min(time, till_midnight_in_s) * energy_per_s})
    # spill the remainder into the following day(s)
    while rest_in_s > 0:
        start += day_in_s
        rows.append({'day': truncate(pd.to_datetime(start, unit='s'), 'day'),
                     'sum_duration_in_s': min(rest_in_s, day_in_s),
                     'sum_charged_energy_in_wh': min(rest_in_s, day_in_s) * energy_per_s})
        rest_in_s = rest_in_s - day_in_s
df_tmp = pd.DataFrame(rows)
df_ts = df_tmp.groupby('day').agg({'sum_charged_energy_in_wh': sum,
                                   'sum_duration_in_s': sum}).sort_index()
df_ts = df_ts.asfreq('D', fill_value=0)
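As a quick sanity check (hypothetical, reusing the loop above on a one-row frame), the boundary transaction from the question splits evenly across midnight:

df = pd.DataFrame([{'start_in_s_since_epoch': 1620777300,  # 2021-05-11 23:55:00 UTC
                    'duration_in_s': 600.0,
                    'charged_energy_in_wh': 1000}])
# running the loop above on this frame yields
# 2021-05-11: 300 s / 500 Wh and 2021-05-12: 300 s / 500 Wh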
I have many csv files that only have one row of data. I need to take data from two of the cells and put them into a master csv file ('new_gal.csv'). Initially this will only contain the headings, but no data.
#The file I am pulling from:
file_name = "N4261_pacs160.csv"
#I have the code written to separate gal_name, cat_name, and cat_num (N4261, pacs, 160)
An example of the csv is given here. I am trying to pull "flux" and "rms" from this file.
name,band,ra,dec,raerr,decerr,flux,snr,snrnoise,stn,rms,strn,fratio,fwhmxfit,fwhmyfit,flag_elong,edgeflag,flag_blend,warmat,obsid,ssomapflag,dist,angle
HPPSC160A_J121923.1+054931,red,184.846389,5.8254,0.000151,0.00015,227.036,10.797,21.028,16.507,13.754,37.448,1.074,15.2,11,0.7237,f,0,f,1342199758,f,1.445729,296.577621
I read this csv and pull the data I need:
import csv

import pandas as pd

with open(file_name, 'r') as table:
    reader = csv.reader(table, delimiter=',')
    read = iter(reader)
    next(read)  # skip the header row
    for row in read:
        fluxP = row[6]   # flux
        errP = row[10]   # rms

# Open the master csv with pandas
df = pd.read_csv('new_gal.csv')
The master csv file has format:
Galaxy Cluster Mult. Detect. LumDist z W1 W1 err W2 W2 err W3 W3 err W4 W4 err 70 70 err 100 100 err 160 160 err 250 250 err 350 350 err 500 500 err
The main problem I have is that I want to search the "Galaxy" column in 'new_gal.csv' for the galaxy name. If it is not there, I need to add a new row with the galaxy name and the flux and error measurements. When I run this multiple times, I get duplicate rows even though the append command is nested in the if statement. I only want to append a new row if the galaxy name is not already there; otherwise, it should only update the flux and error values for that galaxy.
if cat_name == 'pacs':
    if gal_name not in df["Galaxy"]:
        df = df.append({"Galaxy": gal_name}, ignore_index=True)
        if cat_num == "70":
            df.loc[df.Galaxy == gal_name, ["70"]] = fluxP
            df.loc[df.Galaxy == gal_name, ["70 err"]] = errP
        elif cat_num == "100":
            df.loc[df.Galaxy == gal_name, ["100"]] = fluxP
            df.loc[df.Galaxy == gal_name, ["100 err"]] = errP
        elif cat_num == "160":
            df.loc[df.Galaxy == gal_name, ["160"]] = fluxP
            df.loc[df.Galaxy == gal_name, ["160 err"]] = errP
    else:
        if cat_num == "70":
            df.loc[df.Galaxy == gal_name, ["70"]] = fluxP
            df.loc[df.Galaxy == gal_name, ["70 err"]] = errP
        elif cat_num == "100":
            df.loc[df.Galaxy == gal_name, ["100"]] = fluxP
            df.loc[df.Galaxy == gal_name, ["100 err"]] = errP
        elif cat_num == "160":
            df.loc[df.Galaxy == gal_name, ["160"]] = fluxP
            df.loc[df.Galaxy == gal_name, ["160 err"]] = errP
After running the code 5 times with the same file, I have 5 identical lines in the table.
I think I've got something that'll work after tinkering with it this morning...
A couple of points: you shouldn't build a DataFrame incrementally in pandas; get the data setup done externally, then do one build. In what I have below, I'm building a big dictionary from the small csv files and then using merge to put that together with the master file.
If your .csv files aren't formatted consistently, you can either change the split character below or switch over to a csv reader, which is a bit more powerful.
You should put all of the smaller .csv files in a folder called 'orig_data' to make this work.
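As a minimal illustration of the "build once" point (a sketch with hypothetical column names, separate from the program below):

import pandas as pd

rows = []                       # collect plain dicts first...
for i in range(3):
    rows.append({'galaxy': f'G{i}', 'flux': i * 1.5})
df = pd.DataFrame(rows)         # ...then build the frame in one shot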
main prog
# galaxy compiler
import os, re
import pandas as pd

# folder location for the small .csvs, NOT the master
data_folder = 'orig_data'  # this folder should be in same directory as program

result = {}
splitter = r'(.+)_([a-zA-Z]+)([0-9]+)\.'  # regex to break up file name into 3 groups

for file in os.listdir(data_folder):
    file_data = {}
    # split up the filename and process
    galaxy, cat_name, cat_num = re.match(splitter, file).groups()
    # print(galaxy, cat_name, cat_num)
    with open(os.path.join(data_folder, file), 'r') as src:
        src.readline()  # read the header and disregard it
        data = src.readline().replace(' ', '').strip().split(',')  # you can change the split char
        flux = float(data[2])
        rms = float(data[3])
        err_tag = cat_num + ' err'
        file_data = {'cat_name': cat_name,
                     cat_num: flux,
                     err_tag: rms}
    result[galaxy] = file_data

df2 = pd.DataFrame.from_dict(result, orient='index')
df2.index.rename('galaxy', inplace=True)
# check the resulting build!
# print(df2)

# build master dataframe
master_df = pd.read_csv('master_data.csv')
# print(master_df.head())

# merge the 2 dataframes on galaxy name. See the dox on merge for other
# options and whether you want an "outer" join or other type of join...
master_df = master_df.merge(df2, how='outer', on='galaxy')

# convert boolean flags properly
conv = {'t': True, 'f': False}
master_df['flag_nova'] = master_df['flag_nova'].map(conv).astype('bool')

print(master_df)
print()
print(master_df.info())
print()
print(master_df.describe())
example data files in orig_data folder
filename: A99_dbc100.csv
band,weight,flux,rms
junk, 200.44,2e5,2e-8
filename: B250_pacs100.csv
band,weight,flux,rms
nada,2.44,19e-5, 74
...etc.
example master csv
galaxy,color,stars,flag_nova
A99,red,15,f
B250,blue,4e20,t
N1000,green,3e19,f
X99,white,12,t
Result:
galaxy color stars ... 200 err 100 100 err
0 A99 red 1.500000e+01 ... NaN 200000.00000 2.000000e-08
1 B250 blue 4.000000e+20 ... NaN 0.00019 7.400000e+01
2 N1000 green 3.000000e+19 ... 88.0 NaN NaN
3 X99 white 1.200000e+01 ... NaN NaN NaN
[4 rows x 9 columns]
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 9 columns):
galaxy 4 non-null object
color 4 non-null object
stars 4 non-null float64
flag_nova 4 non-null bool
cat_name 3 non-null object
200 1 non-null float64
200 err 1 non-null float64
100 2 non-null float64
100 err 2 non-null float64
dtypes: bool(1), float64(5), object(3)
memory usage: 292.0+ bytes
None
stars 200 200 err 100 100 err
count 4.000000e+00 1.0 1.0 2.000000 2.000000e+00
mean 1.075000e+20 1900000.0 88.0 100000.000095 3.700000e+01
std 1.955121e+20 NaN NaN 141421.356103 5.232590e+01
min 1.200000e+01 1900000.0 88.0 0.000190 2.000000e-08
25% 1.425000e+01 1900000.0 88.0 50000.000143 1.850000e+01
50% 1.500000e+19 1900000.0 88.0 100000.000095 3.700000e+01
75% 1.225000e+20 1900000.0 88.0 150000.000048 5.550000e+01
max 4.000000e+20 1900000.0 88.0 200000.000000 7.400000e+01
The input is a range of dates, for which we need to find the month start date and month end date of every month in the interval. An example is given below.
input:
start date: 2018-6-15
end date: 2019-3-20
desired output:
[
["month starting date","month ending date"],
["2018-6-15","2018-6-30"],
["2018-7-1","2018-7-31"],
["2018-8-1","2018-8-31"],
["2018-9-1","2018-9-30"],
["2018-10-1","2018-10-31"],
["2018-11-1","2018-11-30"],
["2018-12-1","2018-12-31"],
["2019-1-1","2019-1-31"],
["2019-2-1","2019-2-28"],
["2019-3-1","2019-3-20"]
]
An option using pandas: create a date_range from start to end date, extract the month numbers from that as a pandas.Series, shift it 1 element forward and 1 element backward, and compare (ne) to get boolean masks of where the months change. Now you can create a DataFrame to work with, or a list of lists if you like.
Ex:
import pandas as pd
start_date, end_date = '2018-6-15', '2019-3-20'
dtrange = pd.date_range(start=start_date, end=end_date, freq='d')
months = pd.Series(dtrange.month)
starts, ends = months.ne(months.shift(1)), months.ne(months.shift(-1))
df = pd.DataFrame({'month_starting_date': dtrange[starts].strftime('%Y-%m-%d'),
'month_ending_date': dtrange[ends].strftime('%Y-%m-%d')})
# df
# month_starting_date month_ending_date
# 0 2018-06-15 2018-06-30
# 1 2018-07-01 2018-07-31
# 2 2018-08-01 2018-08-31
# 3 2018-09-01 2018-09-30
# 4 2018-10-01 2018-10-31
# 5 2018-11-01 2018-11-30
# 6 2018-12-01 2018-12-31
# 7 2019-01-01 2019-01-31
# 8 2019-02-01 2019-02-28
# 9 2019-03-01 2019-03-20
# as a list of lists:
l = [df.columns.values.tolist()] + df.values.tolist()
# l
# [['month_starting_date', 'month_ending_date'],
# ['2018-06-15', '2018-06-30'],
# ['2018-07-01', '2018-07-31'],
# ['2018-08-01', '2018-08-31'],
# ['2018-09-01', '2018-09-30'],
# ['2018-10-01', '2018-10-31'],
# ['2018-11-01', '2018-11-30'],
# ['2018-12-01', '2018-12-31'],
# ['2019-01-01', '2019-01-31'],
# ['2019-02-01', '2019-02-28'],
# ['2019-03-01', '2019-03-20']]
Note that I use strftime when I create the DataFrame. Do this if you want the output to be of dtype string. If you want to continue to work with datetime objects (timestamps), don't apply strftime.
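For instance, to keep working with timestamps instead (a small variant of the snippet above, same names):

df2 = pd.DataFrame({'month_starting_date': dtrange[starts],
                    'month_ending_date': dtrange[ends]})
# both columns keep dtype datetime64[ns]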
This code is simple and uses only standard Python packages.
import calendar
from datetime import datetime, timedelta

def get_time_range_list(start_date, end_date):
    date_range_list = []
    while 1:
        # last day of the current month
        month_end = start_date.replace(day=calendar.monthrange(start_date.year, start_date.month)[1])
        next_month_start = month_end + timedelta(days=1)
        if next_month_start <= end_date:
            date_range_list.append((start_date, month_end))
            start_date = next_month_start
        else:
            date_range_list.append((start_date, end_date))
            return date_range_list
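A usage sketch with the dates from the question (assuming date objects as input):

from datetime import date

for month_start, month_end in get_time_range_list(date(2018, 6, 15), date(2019, 3, 20)):
    print(month_start, month_end)
# 2018-06-15 2018-06-30
# 2018-07-01 2018-07-31
# ...
# 2019-03-01 2019-03-20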
I have two date columns and I want to subtract them based on conditions: first check for blanks in the first column, then check the second column for blanks, then check whether the difference between the two dates is less than one day. If any of these conditions holds, the result should be null; otherwise, carry out the subtraction of the two columns. Something like this:
'''if [Recommendation signed] = null or [Executed Date] = null or Duration.Days([Contract Executed Date]-[Recommendation signed]) < 1 then null else Duration.Days([Contract Executed Date]-[Recommendation signed])'''
You can do that using the apply function. For example, suppose you want to store the value in a new column called day difference.
Make sure these are datetime columns (if they're not, apply the to_datetime function first).
df['Recommendation signed'] = pd.to_datetime(df['Recommendation signed']).dt.date
df['Executed Date'] = pd.to_datetime(df['Executed Date']).dt.date
df['Contract Executed Date'] = pd.to_datetime(df['Contract Executed Date']).dt.date

def subtract_columns(row):
    if pd.isnull(row['Recommendation signed']) or pd.isnull(row['Executed Date']) \
            or (row['Contract Executed Date'] - row['Recommendation signed']) < pd.Timedelta(days=1):
        return None
    else:
        return row['Contract Executed Date'] - row['Recommendation signed']

df['day difference'] = df.apply(subtract_columns, axis=1)
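For example, with a small hypothetical frame (same column names as above):

df = pd.DataFrame({
    'Recommendation signed': ['2021-01-01', None],
    'Executed Date': ['2021-01-02', '2021-01-03'],
    'Contract Executed Date': ['2021-01-05', '2021-01-06'],
})
# after the conversion and apply above:
# row 0 -> 4 days; row 1 -> None (blank 'Recommendation signed')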
Hope this helps.
Here's one way to do it. Since no data was provided I created my own generator. The solution is contained within find_duration and how it is used in df.apply(find_duration, axis=1).
from datetime import datetime, timedelta
from itertools import islice

import numpy as np
import pandas as pd

RECOMMENDATION_IS_PENDING = "RECOMMENDATION_IS_PENDING"
EXECUTION_IS_PENDING = "EXECUTION_IS_PENDING"
COMPLETED_IN_LESS_THAN_ONE_DAY = "COMPLETED_IN_LESS_THAN_ONE_DAY"
COMPLETED_IN_MORE_THAN_ONE_DAY = "COMPLETED_IN_MORE_THAN_ONE_DAY"

MIN_YEAR = 1900
MAX_YEAR = 2020
NUM_YEARS = MAX_YEAR - MIN_YEAR + 1
START_DATE = datetime(MIN_YEAR, 1, 1, 00, 00, 00)
END_DATE = START_DATE + timedelta(days=365 * NUM_YEARS)
NUM_RECORDS = 20

def random_datetime(rng, dt):
    # a uniformly random datetime between START_DATE and END_DATE
    return START_DATE + (END_DATE - START_DATE) * rng.uniform()

def less_than_one_day(rng, dt):
    hours = int(np.round(23.0 * rng.uniform()))
    return dt + timedelta(hours=hours)

def more_than_one_day(rng, dt):
    days = 1 + int(np.round(100.0 * rng.uniform()))
    return dt + timedelta(days=days)

def null_datetime(rng, dt):
    return None

class RecordGenerator:
    PROBABILITIES = {
        RECOMMENDATION_IS_PENDING: 0.1,
        EXECUTION_IS_PENDING: 0.2,
        COMPLETED_IN_LESS_THAN_ONE_DAY: 0.2,
        COMPLETED_IN_MORE_THAN_ONE_DAY: 0.5,
    }
    GENERATORS = {
        RECOMMENDATION_IS_PENDING: (null_datetime, random_datetime),
        EXECUTION_IS_PENDING: (random_datetime, null_datetime),
        COMPLETED_IN_LESS_THAN_ONE_DAY: (random_datetime, less_than_one_day),
        COMPLETED_IN_MORE_THAN_ONE_DAY: (random_datetime, more_than_one_day),
    }

    def __init__(self, seed=0):
        self.rng = np.random.RandomState(seed)

    def __iter__(self):
        while True:
            # pick a record kind according to PROBABILITIES
            res = self.rng.uniform()
            for kind, val in self.PROBABILITIES.items():
                res -= val
                if res <= 0.0:
                    break
            recommendation_signed_fn, execution_date_fn = self.GENERATORS[kind]
            recommendation_signed = recommendation_signed_fn(self.rng, None)
            execution_date = execution_date_fn(self.rng, recommendation_signed)
            yield recommendation_signed, execution_date

def find_duration(df):
    duration = df["execution_date"] - df["recommendation_signed"]
    if duration is pd.NaT or duration < pd.Timedelta(days=1):
        return None
    return duration

if __name__ == "__main__":
    records = RecordGenerator()
    recommendation_signed_dates, execution_dates = zip(*islice(records, NUM_RECORDS))
    df = pd.DataFrame.from_dict({
        "recommendation_signed": recommendation_signed_dates,
        "execution_date": execution_dates,
    })
    print(f"`recommendation_signed` is null: [{df['recommendation_signed'].isnull().sum()}]")
    print(f"`execution_date` is null: [{df['execution_date'].isnull().sum()}]")
    print(f"`completed_in_less_than_one_day`: [{((df['execution_date'] - df['recommendation_signed']) < pd.Timedelta(days=1)).sum()}]")
    print(f"`completed_in_more_than_one_day`: [{((df['execution_date'] - df['recommendation_signed']) >= pd.Timedelta(days=1)).sum()}]")
    df["completion_time"] = df.apply(find_duration, axis=1)
    print(df)
Output:
`recommendation_signed` is null: [2]
`execution_date` is null: [2]
`completed_in_less_than_one_day`: [4]
`completed_in_more_than_one_day`: [12]
recommendation_signed execution_date completion_time
0 1986-06-25 08:07:14.808395 1986-08-25 08:07:14.808395 61 days
1 1951-03-25 17:08:27.986156 1951-05-30 17:08:27.986156 66 days
2 2007-11-01 03:42:35.672304 2007-11-02 01:42:35.672304 NaT
3 1995-09-26 12:52:16.917964 1995-09-27 00:52:16.917964 NaT
4 2011-12-03 23:24:45.808880 2011-12-11 23:24:45.808880 8 days
5 NaT 1902-06-12 22:41:33.183052 NaT
6 1994-02-04 07:01:47.052493 1994-05-03 07:01:47.052493 88 days
7 1996-08-19 20:06:42.217770 1996-10-05 20:06:42.217770 47 days
8 1914-04-21 14:09:37.598524 1914-06-25 14:09:37.598524 65 days
9 2014-03-25 07:15:55.137157 NaT NaT
10 1950-02-21 13:04:11.684479 1950-03-20 13:04:11.684479 27 days
11 1955-02-27 21:06:22.090510 1955-04-26 21:06:22.090510 58 days
12 NaT 1974-09-07 20:55:17.329968 NaT
13 1974-08-07 21:21:33.578522 1974-11-10 21:21:33.578522 95 days
14 1943-06-22 15:59:39.451885 1943-08-06 15:59:39.451885 45 days
15 1907-04-14 20:35:27.269379 1907-06-21 20:35:27.269379 68 days
16 1925-06-10 13:05:57.968982 1925-06-24 13:05:57.968982 14 days
17 1943-12-25 06:52:07.566032 1943-12-25 19:52:07.566032 NaT
18 2019-07-07 12:44:00.201327 2019-07-07 14:44:00.201327 NaT
19 1919-07-05 05:38:11.678570 NaT NaT
You could try something like this:
import numpy as np
import pandas as pd
from datetime import timedelta

df['Recommendation Signed'] = pd.to_datetime(df['Recommendation Signed'], errors='coerce')
df['Contract Executed Date'] = pd.to_datetime(df['Contract Executed Date'], errors='coerce')
# the difference is a timedelta, so the missing value must be np.timedelta64('NaT')
df['date_difference'] = np.where(df['Recommendation Signed'].isnull()
                                 | df['Contract Executed Date'].isnull()
                                 | ((df['Contract Executed Date'] - df['Recommendation Signed']) < timedelta(days=1)),
                                 np.timedelta64('NaT'),
                                 df['Contract Executed Date'] - df['Recommendation Signed'])
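A quick hypothetical check of the behavior:

df = pd.DataFrame({'Recommendation Signed': ['2021-01-01', None, '2021-01-05'],
                   'Contract Executed Date': ['2021-01-04', '2021-01-06', '2021-01-05']})
# after the snippet above, date_difference is:
# 3 days, NaT (blank date), NaT (difference under one day)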
I am trying to build a function which transforms a dataframe based on certain conditions, but I am getting a Syntax Error. I am not sure what I am doing wrong. Any help will be appreciated. Thank you!
import pandas as pd
from datetime import datetime
from datetime import timedelta
df=pd.read_csv('example1.csv')
df.columns =(['dtime','kW'])
df['dtime'] = pd.to_datetime(df['dtime'])
df.head(5)
dtime kW
0 2019-08-27 23:30:00 0.016
1 2019-08-27 23:00:00 0
2 2019-08-27 22:30:00 0.016
3 2019-08-27 22:00:00 0.016
4 2019-08-27 21:30:00 0
def transdf(df):
    a = df.loc[0, 'dtime']
    b = df.loc[1, 'dtime']
    c = a - b
    minutes = c.total_seconds() / 60
    d = int(minutes)
    # d can be only 15, 30 or 60
    if d == 15:
        return df=df.set_index('dtime').asfreq('-15T', fill_value='Missing')
    elif d == 30:
        return df=df.set_index('dtime').asfreq('-30T', fill_value='Missing')
    elif d == 60:
        return df=df.set_index('dtime').asfreq('-60T', fill_value='Missing')
    else:
        return None
First, it is cleaner to have a single return statement after the if/else chain at the end of your function; inside each of the cases, just update the value of df. The SyntaxError comes from `return df = ...`: return takes an expression, and an assignment statement is not an expression, so the two cannot be combined.
def transform(df):
    a = df.loc[0, 'dtime']
    b = df.loc[1, 'dtime']
    c = a - b
    minutes = c.total_seconds() / 60
    d = int(minutes)
    # d can be only 15, 30 or 60
    if d == 15:
        df = df.set_index('dtime').asfreq('-15T', fill_value='Missing')
    elif d == 30:
        df = df.set_index('dtime').asfreq('-30T', fill_value='Missing')
    elif d == 60:
        df = df.set_index('dtime').asfreq('-60T', fill_value='Missing')
    else:
        df = None
    return df
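Usage would then look something like this (assuming the df built from example1.csv as in the question):

df = pd.read_csv('example1.csv')
df.columns = ['dtime', 'kW']
df['dtime'] = pd.to_datetime(df['dtime'])
result = transform(df)
print(result.head())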