I have some code whose running time I have timestamped, and I want to compute the average time it took to run on multiple computers, but I just can't figure out how to use the datetime module in Python.
Here is how my procedure looks:
1) I have the code which simply writes into a text file the log, where the timestamp looks like
t1=datetime.datetime.now()
...
t2=datetime.datetime.now()
stamp= t2-t1
And that stamp variable is just written in say log.txt so in the log file it looks like 0:07:23.160896 so it seems like it's %H:%M:%S.%f format.
2) Then I run a second python script which reads in the log.txt file and it reads the 0:07:23.160896 value as a string.
The problem is that I don't know how to work with this value: if I import it as a datetime, it also gets an imaginary year, month and day attached, which I don't want. I simply want to work with hours, minutes, seconds and microseconds, to add them up or take an average.
For example, I can open it in LibreOffice and add 0:07:23.160896 to 0:00:48.065130, which gives 0:08:11.226026, and then divide by 2, which gives 0:04:05.613013. I just can't do that in Python, or I don't know how.
I have tried everything, but neither datetime.datetime nor datetime.timedelta seems to allow simple multiplication and division like that. If I do y=datetime.datetime.strptime('0:07:23.160896','%H:%M:%S.%f') it gives 1900-01-01 00:07:23.160896, and I can't take y*2; it doesn't allow arithmetic operations. Plus, if I convert it into a timedelta it will also multiply the year, which is ridiculous. I simply want to add, subtract and multiply times.
Please help me find a way to do this, not just for 2 variables but ideally for an entire list of timestamps, something like average(['0:07:23.160896' , '0:00:48.065130', '0:00:14.517086',...]).
I simply want a way to calculate the average of many timestamps and output it in the same format, just as you can select a column in LibreOffice and apply the AVERAGE() function, which gives the average timestamp in that column.
As you have done, you first read the string into a datetime-object using strptime: t = datetime.datetime.strptime(single_time,'%H:%M:%S.%f')
After that, convert the time part of your datestring into a timedelta, so you can easily calculate with times: tdelta = datetime.timedelta(hours=t.hour, minutes=t.minute, seconds=t.second, microseconds=t.microsecond)
Now you can easily calculate with the timedelta objects, and at the end of the calculations convert the result back into a string with str(), e.g. str(taverage).
import datetime
times = ['0:07:23.160896', '0:00:48.065130', '0:12:22.324251']
# parse each time string, convert it to a timedelta and add it to the running sum
tsum = datetime.timedelta()
count = 0
for single_time in times:
    t = datetime.datetime.strptime(single_time,'%H:%M:%S.%f')
    tdelta = datetime.timedelta(hours=t.hour, minutes=t.minute, seconds=t.second, microseconds=t.microsecond)
    tsum = tsum + tdelta
    count = count + 1
# dividing a timedelta by an int gives a timedelta
taverage = tsum / count
average_time = str(taverage)
print(average_time)
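As a variation on the loop above, the strings can also be split by hand and fed straight into timedelta, which avoids the placeholder date and would also cope with an hour field of 24 or more. This is only a minimal sketch: the helper names parse_duration and average_duration are made up for illustration, and it assumes every entry looks like H:MM:SS or H:MM:SS.ffffff (the "N days, ..." form that str() produces for longer timedeltas is not handled).

```python
import datetime

def parse_duration(text):
    """Turn 'H:MM:SS' or 'H:MM:SS.ffffff' into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = text.strip().split(':')
    whole, _, frac = seconds.partition('.')
    # pad or cut the fractional part to exactly six digits (microseconds)
    micro = int(frac.ljust(6, '0')[:6]) if frac else 0
    return datetime.timedelta(hours=int(hours), minutes=int(minutes),
                              seconds=int(whole), microseconds=micro)

def average_duration(texts):
    """Average a non-empty list of duration strings (hypothetical helper)."""
    durations = [parse_duration(t) for t in texts]
    # sum() needs a timedelta start value, otherwise it starts from the int 0
    return sum(durations, datetime.timedelta()) / len(durations)

times = ['0:07:23.160896', '0:00:48.065130']
print(average_duration(times))  # 0:04:05.613013, matching the LibreOffice result in the question
```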
I have a date variable called today_date, as below. I need to get the first calendar day of the current month and of the next month.
In my case, today_date is 4/17/2021, so I need to create two more variables: first_day_current, which should be 4/1/2021, and first_day_next, which should be 5/1/2021.
Any suggestions are greatly appreciated.
import datetime as dt
today_date
'2021-04-17'
Getting just the first date of a month is quite simple - since it equals 1 all the time. You can even do this without needing the datetime module to simplify calculations for you, if today_date is always a string "Year-Month-Day" (or any consistent format - parse it accordingly)
today_date = '2021-04-17'
y, m, d = today_date.split('-')
first_day_current = f"{y}-{m}-01"
y, m = int(y), int(m)
first_day_next = f"{y+(m==12)}-{m%12+1:02d}-01"  # :02d keeps the month zero-padded, e.g. 05
If you want to use datetime.date(), then you'll have to convert the string to (year, month, day) ints anyway to give as arguments (or do today_date = datetime.date.today()).
Then use .replace(day=1) to get first_day_current.
datetime.timedelta can't add months (it only goes up to weeks), so you'd need a workaround or another library for this. But it's more imports and calculations to do the same thing in effect.
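Staying within the standard library is still possible with a small trick: the 1st of a month plus 31 days always lands somewhere in the following month, so snapping that result back to day 1 gives the first day of the next month. A minimal sketch, assuming today_date is already a datetime.date; the helper name first_days is made up for illustration.

```python
import datetime

def first_days(today):
    """Return (first day of this month, first day of next month) (hypothetical helper)."""
    first_current = today.replace(day=1)
    # day 1 + 31 days is always inside the next month, whatever the month length
    first_next = (first_current + datetime.timedelta(days=31)).replace(day=1)
    return first_current, first_next

today_date = datetime.date(2021, 4, 17)
first_day_current, first_day_next = first_days(today_date)
print(first_day_current, first_day_next)  # 2021-04-01 2021-05-01
```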
I found out pd.offsets could accomplish this task as below -
import datetime as dt
import pandas as pd
today_date = pd.Timestamp('2021-04-17')  # stand-in for the variable created in the program; it must be a date/Timestamp, not a str
first_day_current = today_date.replace(day=1) # this will be 2021-04-01
next_month = first_day_current + pd.offsets.MonthBegin(n=1)
first_day_next = next_month.strftime('%Y-%m-%d') # this will be 2021-05-01
This one has been bothering me for awhile. I have all the pieces (I think) that work individually to create the output I'm looking for (calculate a profit and loss for a stock), but when put together they return nothing.
The dataframe itself is pretty self-explanatory so I haven't included an example. Basically the series includes Stock Symbol, Opening Time, Opening Price, Closing Time, Closing Price, and whether or not it was a long or short position.
Here's my code to calculate the P-L for a long position:
import pandas as pd
from yahoo_fin import stock_info as si
from datetime import datetime, timedelta, date
import time
def create_df3():
    return pd.read_excel('Base_Sheet.xlsx', sheet_name="Closed_Pos", header=0)
def update_price(sym):
    return si.get_live_price(sym)
long_pl_calc = ((df3['Close_Price']) / (df3['Entry_Price'])) - 1
close_long_pl = df3['P-L'].isnull and (df3['Long_Short'] == 'Long')
for row in df3.iterrows():
    if close_long_pl is True:
        return df3['P-L'].apply(long_pl_calc)
If I print long_pl_calc or close_long_pl, I get exactly what I expect. However, when I iterate through the series to return the calculation, I still end up with a 'NaN' value (but not an error).
Any help would be appreciated! I already know the solution I came to is terrible, but I've also tried at least a dozen other iterations with no success either.
Create a column df3['Long'] with 1 for the rows where you are long and 0 for the rest. Then, to get your long P&L (you could do the same for the short, but don't forget to take the opposite sign of the return), you can do:
df3['P&L Long'] = ((df3['Close_Price'] / df3['Entry_Price']) - 1) * df3['Long']
Then your df3['P-L'] will be:
df3['P-L'] = df3['P&L Long'] + df3['P&L Short']
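The same idea can be written without helper columns by flipping the sign of the return for short rows and filling only the empty P-L cells. This is a sketch on made-up sample data; the column names follow the question, and treating the short P&L as the negated return follows the suggestion above rather than any confirmed convention.

```python
import pandas as pd

# made-up sample rows; only the column names come from the question
df3 = pd.DataFrame({
    'Entry_Price': [100.0, 50.0, 20.0],
    'Close_Price': [110.0, 45.0, 25.0],
    'Long_Short': ['Long', 'Short', 'Long'],
    'P-L': [float('nan'), float('nan'), 0.5],
})

# return of each position, computed for the whole column at once
ret = df3['Close_Price'] / df3['Entry_Price'] - 1
# keep the return for long rows, use the opposite sign for everything else
signed = ret.where(df3['Long_Short'] == 'Long', -ret)
# only fill rows whose P-L is still empty; existing values are left alone
df3['P-L'] = df3['P-L'].fillna(signed)
print(df3)
```

Note that df3['P-L'].isnull in the question is missing its parentheses, and that combining two Series needs & rather than and; working on whole columns as above sidesteps both, and no iterrows() loop is needed.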
I am trying to figure out how to pass a date inputted at a prompt by the user to pandas to search by date. I have both the search and the input prompt working separately but not together. I will show you what I mean. And maybe someone can tell me how to properly pass the date to pandas for the search.
This is how I successfully use pandas to extract rows in an excel sheet if any cell in column emr_first_access_date is greater than or equal to '2019-09-08'
I do this successfully with the following code:
import pandas as pd
HISorigFile = "C:\\folder\\inputfile1.xlsx"
#opens excel worksheet
df = pd.read_excel(HISorigFile, sheet_name='Non Live', skiprows=8)
#locates the columns I want to write to file including date column emr_first_access_date if greater than or equal to '2019-09-08'
data = df.loc[df['emr_first_access_date'] >= '2019-09-08', ['site_name','subs_num','emr_id', 'emr_first_access_date']]
#sorts the data
datasort = data.sort_values("emr_first_access_date",ascending=False)
#this creates the file (data already sorted) in panda with date and time.
datasort.to_excel(r'C:\folder\sitesTestedInLastWeek.xlsx', index=False, header=True)
However, the date above is hardcoded of course. So, I need the user running this script to input the date. I created a very basic working input prompt with the following:
import datetime
#prompts for input date
TestedDateBegin = input('Enter beginning date to search for sites tested in YYYY-MM-DD format')
year, month, day = map(int, TestedDateBegin.split('-'))
date1 = datetime.date(year, month, day)
Obviously I want to pass TestedDateBegin to pandas, changing the pertinent code line:
data = df.loc[df['emr_first_access_date'] >= '2019-09-08', ['site_name','subs_num','emr_id', 'emr_first_access_date']]
to something like:
data = df.loc[df[b]['emr_first_access_date'] >= 'TestedDateBegin', ['site_name','subs_num','emr_id', 'emr_first_access_date']]
Obviously this doesn't work. But how do I proceed? I am very new to programming, so I am not always clear on how to proceed. Does the date entered in TestedDateBegin need to be added to a return? Or should it be put in a single-item list? What is the right approach? Thx!
This is resolved.
I had to remove the single quotes around TestedDateBegin as python, of course, interpreted that as a string and not a variable. Living and learning. :-)
data = df.loc[df['emr_first_access_date'] >= TestedDateBegin, ['site_name','subs_num','emr_id', 'emr_first_access_date']]
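A slightly more defensive variant is to turn the typed text into a pandas Timestamp before comparing, so that a malformed date fails loudly instead of silently filtering on a bad string. This is a sketch on a small made-up frame; the column names come from the question, the name cutoff is introduced here, and the hardcoded string stands in for the value returned by input().

```python
import pandas as pd

# made-up rows standing in for the Excel sheet
df = pd.DataFrame({
    'site_name': ['A', 'B', 'C'],
    'emr_first_access_date': pd.to_datetime(['2019-09-01', '2019-09-08', '2019-09-15']),
})

TestedDateBegin = '2019-09-08'  # stands in for input(...)
# raises ValueError if the text is not a valid YYYY-MM-DD date
cutoff = pd.to_datetime(TestedDateBegin, format='%Y-%m-%d')

# compare the datetime column against the Timestamp, not against a quoted name
data = df.loc[df['emr_first_access_date'] >= cutoff, ['site_name', 'emr_first_access_date']]
print(data)
```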
Is there a readily available command in Python's datetime to understand a discrete time range given as HH:MM-HH:MM or HH:MM:ss-HH:MM:ss (e.g. 07:30-12:45)? Such a range would be entered like that in a single cell of a CSV file that the script would access.
Or might specifying just the start time and then a timedelta value be a better idea?
You can just use split() to separate the two time values, then parse each as a datetime.datetime type and then calculate the timedelta.
Example:
from datetime import datetime
time_string = "07:30-12:45"
separate_times = time_string.split("-")
parsed_times = [datetime.strptime(t, "%H:%M") for t in separate_times]
difference = parsed_times[1] - parsed_times[0]
Calling difference.total_seconds() will return the total seconds between the two times and if you aren't interested in the direction of the difference between the times, you can use abs(difference.total_seconds()).
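Since the question mentions both HH:MM and HH:MM:ss, the split-and-parse approach can pick the format by counting colons. A minimal sketch; the function name parse_time_range is made up, and it assumes both ends fall on the same day and that the cell contains exactly one hyphen.

```python
from datetime import datetime

def parse_time_range(text):
    """Parse 'HH:MM-HH:MM' or 'HH:MM:SS-HH:MM:SS' (hypothetical helper).

    Returns (start time, end time, timedelta between them).
    """
    start_text, end_text = text.strip().split('-')
    parsed = []
    for part in (start_text, end_text):
        part = part.strip()
        # two colons means seconds are present
        fmt = '%H:%M:%S' if part.count(':') == 2 else '%H:%M'
        parsed.append(datetime.strptime(part, fmt))
    start, end = parsed
    return start.time(), end.time(), end - start

start, end, length = parse_time_range('07:30-12:45')
print(start, end, length)  # 07:30:00 12:45:00 5:15:00
```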
This is some code a friend of mine helped me with, in order to get files from different measurement systems, with different timestamps and layouts, into one .csv file.
You enter the time period or, as in the case below, one day, and the code looks for these timestamps in different files and folders, adjusts the timestamps (different time zones etc.) and puts everything into one .csv file that is easy to plot. Now I need to rewrite that stuff for different layouts. I managed to get everything working, but now I don't want to enter every single day manually into the code :-( because I'd need to enter it 3 times in a row: to get the data for one day into one file, dateFrom and dateTo need to be the same, and in the writecsv... section you'd have to enter the date again.
Here's the code:
from importer import cloudIndices, weatherall, writecsv, averagesolar
from datetime import datetime
from datetime import timedelta
dateFrom = datetime.strptime("2010-06-21", '%Y-%m-%d')
dateTo = datetime.strptime("2010-06-21", '%Y-%m-%d')
....
code
code
....
writecsv.writefile("data_20100621", header, ciData)
What can I change here so that I get an automatic loop over all dates between, e.g., 2010-06-21 and 2011-06-21?
P.S. If I entered 2010-06-21 in dateFrom and 2011-06-21 in dateTo, I'd get a huge .csv file with all the data in it. I thought that would be a great idea, but it's not really good for plotting, so I ended up manually entering day after day, which isn't bad if you do it on a regular basis for 2 or 3 days, but now a lot of dates have shown up and I need to run the code over them :-(
Generally speaking, you should be using datetime.datetime and datetime.timedelta. Here is an example of how:
from datetime import datetime
from datetime import timedelta
# advance 5 days at a time
delta = timedelta(days=5)
start = datetime(year=1970, month=1, day=1)
end = datetime(year=1970, month=2, day=13)
print("Starting from: %s" % str(start))
while start < end:
    print("advanced to: %s" % str(start))
    start += delta
print("Finished at: %s" % str(start))
This little snippet creates a start and end time and a delta to advance using the tools python provides. You can modify it to fit your needs or apply it in your logic.
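Applied to the question, the loop could step one day at a time and rebuild the three date-dependent values on each pass. This is only a sketch: the generator name daily_dates is made up, the per-day processing is left as a comment because the importer module is not shown, and a short three-day range is used to keep the output small.

```python
from datetime import datetime, timedelta

def daily_dates(date_from, date_to):
    """Yield every day from date_from to date_to, both inclusive (hypothetical helper)."""
    current = date_from
    while current <= date_to:
        yield current
        current += timedelta(days=1)

first = datetime.strptime("2010-06-21", '%Y-%m-%d')
last = datetime.strptime("2010-06-23", '%Y-%m-%d')

for day in daily_dates(first, last):
    # same day for both bounds, so each output file holds exactly one day
    dateFrom = dateTo = day
    filename = "data_" + day.strftime('%Y%m%d')
    # ... the existing per-day processing would go here, ending with
    # something like writecsv.writefile(filename, header, ciData)
    print(filename)
```

Changing last to 2011-06-21 would cover the full year from the question, one file per day.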