Convert h:m:s to minutes format - python-3.x

I have the following data. The idea is to multiply all the data.
However, the minute column is in h:m:s format, so whenever I try to multiply I get an error.
Moreover, I need to convert the h:m:s values to minutes before I multiply.
I tried the following to convert the column to minutes:
time1 = df['time']
time2 = time1.hour * 60 + time1.minute + time1.second

Create timedeltas by to_timedelta, convert to seconds by Series.dt.total_seconds and divide by 60:
df['Minutes'] = pd.to_timedelta(df['(MIN)']).dt.total_seconds().div(60)
If the input values are Python time objects, convert them to strings first:
df['Minutes'] = pd.to_timedelta(df['(MIN)'].astype(str)).dt.total_seconds().div(60)
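For reference, the arithmetic behind the conversion can be sketched in plain Python. Note that the seconds have to be divided by 60, a step the attempt in the question leaves out. The helper name hms_to_minutes is hypothetical, and the input is assumed to be an "h:m:s" string:

```python
def hms_to_minutes(hms: str) -> float:
    """Convert an 'h:m:s' string to minutes (hypothetical helper)."""
    hours, minutes, seconds = (float(part) for part in hms.split(":"))
    # Hours scale up by 60, seconds scale down by 60.
    return hours * 60 + minutes + seconds / 60


print(hms_to_minutes("1:30:00"))  # 90.0
print(hms_to_minutes("0:45:30"))  # 45.5
```

The resulting floats can then be multiplied with the other numeric columns.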

Related

Convert Days, Hours, Minutes to Seconds (Dd HH:MM:SS) in Excel

How can I convert a cell in Excel to seconds?
Example:
2d 07:51:00
Expected Output:
201060
If one has TEXTSPLIT:
=SUM(--SUBSTITUTE(TEXTSPLIT(A1," "),"d",""))*86400
Else:
=(SUBSTITUTE(LEFT(A1,FIND(" ",A1)-1),"d","")+MID(A1,FIND(" ",A1)+1,LEN(A1)))*86400
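As a cross-check of the expected output, the same arithmetic can be sketched in Python. This is not part of the Excel solution; the function name dhms_to_seconds is hypothetical, and the input is assumed to always have the form "Dd HH:MM:SS":

```python
def dhms_to_seconds(text: str) -> int:
    """Convert a 'Dd HH:MM:SS' string such as '2d 07:51:00' to seconds."""
    day_part, time_part = text.split(" ")
    days = int(day_part.rstrip("d"))  # '2d' -> 2
    hours, minutes, seconds = (int(p) for p in time_part.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


print(dhms_to_seconds("2d 07:51:00"))  # 201060
```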

Non-standard Julian day time stamp

I have a timestamp in a non-standard format; it's a concatenation of a number of elements. I'd like to convert at least the last part of the string into hours/minutes/seconds/decimal seconds so I can calculate the time gap between two timestamps (typically of the order of 2-5 seconds).
I have looked at this link (How to convert Julian date to standard date?), but it assumes a 'proper' Julian time.
My time stamp looks like this
1380643373
It is set up as ddd hh mm ss.s
This timestamp represents the 138th day, 06:43:37.3
Is there a datetime method of working with this, or do I need to strip out the various parts (hh, mm, ss.s) and concatenate them in some way? As I am only interested in the seconds, if I can just extract them I could deal with that by adding 60 if the second timestamp is smaller than the first, i.e. when the event passes over a minute boundary.
If you're only interested in seconds, you can do:
timestamp = 1380643373
seconds = (timestamp % 1000) / 10 # Gives 37.3
timestamp % 1000 gives you the last three digits of timestamp. Then you divide that by 10 to get seconds.
If it's a string, you can take the last three characters by slicing it.
timestamp = "1380643373"
seconds = int(timestamp[-3:]) / 10 # Gives 37.3
It's pretty easy to convert the timestamp to a datetime using the divmod() function repeatedly:
import datetime
base_date = datetime.datetime(2000, 1, 1, 0, 0, 0) # Midnight on Jan 1 2000
timestamp = 1380643373
timestamp, seconds = divmod(timestamp, 1000) # Gives 1380643, 373
seconds = seconds / 10 # Gives 37.3
timestamp, minutes = divmod(timestamp, 100) # Gives 13806, 43
days, hours = divmod(timestamp, 100) # Gives 138, 6
tdelta = datetime.timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds) # Gives datetime.timedelta(days=138, seconds=24217, microseconds=300000)
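new_date = base_date + tdelta # If day 1 means Jan 1 (the usual day-of-year convention), subtract datetime.timedelta(days=1) to get the calendar date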
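Since the goal in the question is the gap between two timestamps, the divmod() steps can be wrapped in a helper and the resulting timedeltas subtracted, which also handles readings that cross a minute boundary. This is a minimal sketch: the name parse_stamp and both sample readings are made up for illustration, and the last digit is assumed to be tenths of a second:

```python
import datetime


def parse_stamp(stamp: int) -> datetime.timedelta:
    """Split a dddhhmmsss integer (last digit = tenths of a second)."""
    rest, tenths = divmod(stamp, 1000)  # e.g. 373 -> 37.3 seconds
    rest, minutes = divmod(rest, 100)
    days, hours = divmod(rest, 100)
    # Integer milliseconds avoid floating-point rounding in the seconds.
    return datetime.timedelta(
        days=days, hours=hours, minutes=minutes, milliseconds=tenths * 100
    )


first = parse_stamp(1380643588)   # day 138, 06:43:58.8
second = parse_stamp(1380644013)  # day 138, 06:44:01.3
gap = (second - first).total_seconds()
print(gap)  # 2.5
```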

How to convert timestamp into milliseconds in python

I am trying to write a basic script that can read in a timestamp as a string and convert it into milliseconds. The timestamps I am working with are in minute:second.millisecond format.
from datetime import datetime
timestamp_start = '54:12.123'
MSM = '%M:%S.%f'
zero = '00:00.000'
start_sec = (datetime.strptime(timestamp_start, MSM) - datetime.strptime(zero, MSM)).total_seconds()
start_ms = start_sec * 1000
print(start_ms)
This may be a roundabout approach, but I am first using datetime.strptime to get a datetime object, then subtracting the zero time to get a timedelta object, getting the total seconds of the timedelta object, and finally multiplying by 1000 to convert to milliseconds.
The above code works fine, except for any timestamps over an hour.
The issue I am running into is that the timestamps do not have an hour counter. For example, 1 hour, 5 minutes, and 30 seconds comes in as 65:30.000. datetime.strptime cannot recognize this format, as it only allows the minutes to be between 0 and 59.
How can I convert these timestamps into a format recognizable by datetime? Should I first get the timestamp into hour:minute:second:millisecond format? Keep in mind the end goal is to convert these timestamps into milliseconds. If there is a better approach, any suggestions are more than welcome!
'54:12.123' isn't really a timestamp, but elapsed time, and there's no built-in method in Python that can deal with elapsed time with a format string like a timestamp format.
Since the format string in question is simply minutes and seconds separated by a colon, and seconds and milliseconds separated by a period, you can easily parse it with the str.split method:
def convert(msf):
    minutes, seconds = msf.split(':')
    seconds, milliseconds = seconds.split('.')
    minutes, seconds, milliseconds = map(int, (minutes, seconds, milliseconds))
    return (minutes * 60 + seconds) * 1000 + milliseconds
so that convert('54:12.123') returns:
3252123
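One caveat: converting the fractional part with int assumes it always has exactly three digits, so '65:30.5' would count as 5 ms rather than 500 ms. A hedged variant that pads the fraction is sketched below; the name to_milliseconds is hypothetical, and fractions longer than three digits are simply truncated:

```python
def to_milliseconds(stamp: str) -> int:
    """Convert 'minutes:seconds.fraction' elapsed time to whole milliseconds."""
    minutes, _, rest = stamp.partition(":")
    seconds, _, fraction = rest.partition(".")
    # Pad or truncate the fraction to exactly three digits: '5' -> '500'.
    millis = int((fraction + "000")[:3])
    return (int(minutes) * 60 + int(seconds)) * 1000 + millis


print(to_milliseconds("54:12.123"))  # 3252123
print(to_milliseconds("65:30.000"))  # 3930000
print(to_milliseconds("65:30.5"))    # 3930500
```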

Summing time fields over 24 hours in Power Query

I have a Power Query in Excel linked to another file. This file has a time column. I understand that M language will not sum above 24 hours automatically without some work, as it uses a datetime reference; hence if I import a time of 25 hours it wraps back around to 1 hour...
In the 3rd column along in my image below, using the second row as a reference, this is actually supposed to read 47:47:38. How can I get the instances where the value is above 24 hours to show the true hours?
I have tried using duration.hours(#hours()), but this does not work either for some reason.
The same data from the source Excel file is also shown below.
Power Query doesn't have custom formats for how it displays data. If you have it read your data as a Duration instead of a DateTime it will display as [d].hh.mm.ss format, but still not with the total hours. Ultimately though this doesn't really matter because even when your data is formatted to display total hours in Excel, it's really being stored internally as days+hours+minutes+seconds. So how it displays in Power Query doesn't matter, as you can just use the hour formatting wherever you output the data to.
Now if you need to use the hours in a calculation with something that isn't another Duration, you can extract the whole hours by doing
Duration.Days([Your Hours]) * 24 + Duration.Hours([Your Hours])
Or, now that I look at it, there is also a Duration.TotalHours function that gives you the hours with the minutes and seconds included as a fractional part:
Duration.TotalHours([Your Hours])
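The difference between the two expressions (whole hours versus fractional total hours) can be illustrated with a Python timedelta. This is only an analogue of the M functions, not Power Query code, and the sample value is the 47:47:38 from the question:

```python
from datetime import timedelta

duration = timedelta(days=1, hours=23, minutes=47, seconds=38)  # 47:47:38

# Analogue of Duration.Days(...) * 24 + Duration.Hours(...): whole hours only.
whole_hours = duration.days * 24 + duration.seconds // 3600

# Analogue of Duration.TotalHours(...): minutes and seconds become a fraction.
total_hours = duration.total_seconds() / 3600

print(whole_hours)            # 47
print(round(total_hours, 4))  # 47.7939
```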
Power BI doesn't handle this case very gracefully. A solution could be to convert the duration to a number to make it additive (so you can perform calculations and aggregations) and when you need to visualize it, to convert it to the desired format (HH:MM:SS).
Duration and Time are often confused. When such Excel files are read, the type of the column usually is DateTime, and date 1899-12-31 is added to the "time" part. You can change the data type of the column to be Decimal Number, but the "zero point" in Excel unfortunately is one day off (1899-12-30), so you need to subtract 1 from the result to get the actual "number of days" of the duration (i.e. 0.25 means 06:00:00).
So you must perform some conversion of the data. I would make a new column in the model to get the duration in the lowest granularity that I need (seconds in your example). In Power Query Editor add a custom column to calculate the duration in seconds (where Column1 is the name of the original duration column):
Duration in seconds = Duration.TotalSeconds([Column1] - #datetime(1899, 12, 31, 0, 0, 0))
Make sure the data type of this column is Whole Number (change it if necessary). Here 9144 seconds are calculated as 2 * 3600 + 32 * 60 + 24, or 02:32:24. Now you can calculate a sum on this column to get the total duration in seconds, for example. But when you visualize this column, don't do it directly; instead, make a measure to convert the data to the desired format. It could be made like this:
Measure Duration =
VAR duration_in_seconds = SUM(Sheet1[Duration in seconds])
VAR hours = ROUNDDOWN ( duration_in_seconds / 3600; 0 )
VAR minutes = ROUNDDOWN ( MOD ( duration_in_seconds; 3600 ) / 60; 0 )
VAR seconds = INT ( MOD ( duration_in_seconds; 60 ) )
RETURN hours & ":" & FORMAT(minutes; "00") & ":" & FORMAT(seconds; "00")
The duration_in_seconds variable holds the total duration in seconds of the data in the current context. From it we calculate hours, minutes and seconds and construct a string to represent the duration in the desired format. FORMAT is used to make sure there is a leading zero in case minutes or seconds are less than 10.
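The same hours/minutes/seconds split can be checked outside Power BI with a short Python sketch. The function name format_duration is hypothetical, and the input is assumed to be a non-negative whole number of seconds:

```python
def format_duration(total_seconds: int) -> str:
    """Format seconds as H:MM:SS, letting hours run past 24."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Zero-pad minutes and seconds, as FORMAT(...; "00") does in the measure.
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(format_duration(9144))    # 2:32:24
print(format_duration(172058))  # 47:47:38
```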
Here is how all three columns look when visualized:
Hope this helps!

Need to read excel dates as decimals without automatically converting to date time

I am reading in an Excel sheet that has a column 'Time (hr)' with times in hours, minutes, seconds, formatted like this: 64:45:00
I need to convert this to 64.75 hours.
When I read this in with read_excel, it automatically converts it to 1900-01-02 16:45.
I have tried using the dtype, converters, and date_parse options in the read_excel function but always get an error:
data = xl.parse(header = [0], dtype = {'Time (hr)': np.float64})
TypeError: float() argument must be a string or a number, not 'datetime.datetime'
EDIT:
I found out that some of the values in the Time (hr) column are less than 24 hours and are therefore read in as a time only. For example, 10:45:00 is just read in as a time, so when I tried the solution I got this error:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.datetime'
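You can try creating a dataframe first from the Excel file using test_df = xl.parse(name)
and then converting the datetime column to a number, for example hours since the Excel epoch: (test_df['Time (hr)'] - pd.Timestamp('1899-12-31')).dt.total_seconds() / 3600. Note that test_df['Time (hr)'].dt.strftime("%Y-%m-%d %H:%M").astype(int) raises a ValueError, because the formatted date string is not an integer literal.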
Here's what my test file dates.xlsx looks like:
Read it in and parse the dates as usual:
df = pd.read_excel('dates.xlsx', parse_dates=['Time (hr)'])
Time (hr)
0 1900-01-02 16:45:00
1 1900-01-02 07:10:00
2 1900-01-05 15:59:01
Excel's day one is 1-Jan-1900, so zero is:
import datetime as dt
epoch = dt.datetime(1899, 12, 31)
Subtract the epoch to get a timedelta and then convert to total seconds:
df['seconds'] = (df['Time (hr)'] - epoch).dt.total_seconds()
Time (hr) seconds
0 1900-01-02 16:45:00 233100.0
1 1900-01-02 07:10:00 198600.0
2 1900-01-05 15:59:01 489541.0
Make column for total hours:
df['hours'] = df.seconds / 3600
Time (hr) seconds hours
0 1900-01-02 16:45:00 233100.0 64.750000
1 1900-01-02 07:10:00 198600.0 55.166667
2 1900-01-05 15:59:01 489541.0 135.983611
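The EDIT in the question notes that values under 24 hours arrive as datetime.time objects, which cannot be subtracted from the epoch. One possible way to handle the mixed column is a per-cell helper, sketched below with the standard library only. The name to_hours is hypothetical, and the epoch is the one used in the answer above:

```python
import datetime as dt

EXCEL_ZERO = dt.datetime(1899, 12, 31)  # same epoch as in the answer above


def to_hours(value) -> float:
    """Return elapsed hours for a cell read as datetime.datetime or datetime.time."""
    if isinstance(value, dt.datetime):
        # Durations of 24 hours or more arrive as dates counted from the epoch.
        return (value - EXCEL_ZERO).total_seconds() / 3600
    if isinstance(value, dt.time):
        # Durations under 24 hours arrive as a plain time of day.
        return value.hour + value.minute / 60 + value.second / 3600
    raise TypeError(f"unexpected cell type: {type(value).__name__}")


print(to_hours(dt.datetime(1900, 1, 2, 16, 45)))  # 64.75
print(to_hours(dt.time(10, 45)))                  # 10.75
```

Assuming the column holds such a mix of Python objects, it could be applied with df['Time (hr)'].apply(to_hours).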

Resources