Convert a string of 'hours' and 'mins' into minutes - python-3.x

I have a column in my dataframe df:
Time
2 hours 3 mins
5 hours 10 mins
1 hours 40 mins
10 mins
4 hours
6 hours 0 mins
I want to create a new column 'Minutes' in df that converts this column to minutes:
Minutes
123
310
100
10
240
360
Is there a python function to do this?
What I have tried is:
df['Minutes'] = pd.eval(
    df['Time'].replace(['hours?', 'mins'], ['*60+', ''], regex=True))

There is an ugly bug here: pd.eval fails once the Series grows beyond roughly 100 rows, so after stripping the trailing + the expression is evaluated per row by calling pd.eval inside Series.apply to avoid it:
df['Minutes'] = (df['Time'].replace(['hours?', 'mins'], ['*60+', ''], regex=True)
                           .str.strip('+')
                           .apply(pd.eval))
print (df)
              Time  Minutes
0   2 hours 3 mins      123
1  5 hours 10 mins      310
2  1 hours 40 mins      100
3          10 mins       10
4          4 hours      240
5   6 hours 0 mins      360
# verify for 120 rows
df = pd.concat([df] * 20, ignore_index=True)
df['Minutes1'] = pd.eval(
    df['Time'].replace(['hours?', 'mins'], ['*60+', ''], regex=True).str.strip('+'))
print (df)
ValueError: unknown type object
Another solution with Series.str.extract and Series.add:
# extract the numbers as Series (expand=False) so the arithmetic works element-wise
h = df['Time'].str.extract(r'(\d+)\s+hours', expand=False).astype(float).mul(60)
m = df['Time'].str.extract(r'(\d+)\s+mins', expand=False).astype(float)
# missing components become NaN, so fill with 0 when adding
df['Minutes'] = h.add(m, fill_value=0).astype(int)
print (df)
              Time  Minutes
0   2 hours 3 mins      123
1  5 hours 10 mins      310
2  1 hours 40 mins      100
3          10 mins       10
4          4 hours      240
5   6 hours 0 mins      360
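If you prefer to skip the regex arithmetic, here is a hedged sketch: let pandas parse the strings as timedeltas. This assumes pd.to_timedelta accepts these unit names once 'mins' is normalized to 'min':
# normalize 'mins' to 'min', which the timedelta string parser understands
td = pd.to_timedelta(df['Time'].str.replace('mins', 'min'))
df['Minutes'] = (td.dt.total_seconds() // 60).astype(int)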

jezrael's answer is excellent, but I spent quite some time working on this, so I figured I'd post it.
You can use a regex to capture 'hours' and 'mins' from your column, then apply the arithmetic to convert hours to minutes and assign the result to a new column:
r = "(?:(\d+) hours ?)?(?:(\d+) mins)?"
hours = df.Time.str.extract(r)[0].astype(float).fillna(0) * 60
minutes = df.Time.str.extract(r)[1].astype(float).fillna(0)
df['minutes'] = hours + minutes
print(df)
              Time  minutes
0   2 hours 3 mins    123.0
1  5 hours 10 mins    310.0
2  1 hours 40 mins    100.0
3          10 mins     10.0
4          4 hours    240.0
5   6 hours 0 mins    360.0
I enjoy using https://regexr.com/ to test my regexes.
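As a small refinement (my sketch, not part of the original answer), the two extract calls can be collapsed into one by using named groups:
parts = df.Time.str.extract(r"(?:(?P<h>\d+) hours ?)?(?:(?P<m>\d+) mins)?")
df['minutes'] = parts['h'].astype(float).fillna(0) * 60 + parts['m'].astype(float).fillna(0)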

Related

Grouping data based on month-year in pandas and then dropping all entries except the latest one - Python

Below is my example dataframe
          Date Indicator  Value
0   2000-01-30         A     30
1   2000-01-31         A     40
2   2000-03-30         C     50
3   2000-02-27         B     60
4   2000-02-28         B     70
5   2000-03-31         C     90
6   2000-03-28         C    100
7   2001-01-30         A     30
8   2001-01-31         A     40
9   2001-03-30         C     50
10  2001-02-27         B     60
11  2001-02-28         B     70
12  2001-03-31         C     90
13  2001-03-28         C    100
Desired Output
Date Indicator Value
2000-01-31 A 40
2000-02-28 B 70
2000-03-31 C 90
2001-01-31 A 40
2001-02-28 B 70
2001-03-31 C 90
I want to write code that groups the data by month-year and then keeps only the entry with the latest date in each month-year, dropping the rest. The data runs until the year 2020.
I was only able to fetch the count by month-year; I could not put together code that groups the data by month-year and indicator and gives the correct results.
Use Series.dt.to_period for month periods, aggregate the index of the maximal date per group with DataFrameGroupBy.idxmax, and then pass the result to DataFrame.loc:
df['Date'] = pd.to_datetime(df['Date'])
print (df['Date'].dt.to_period('m'))
0 2000-01
1 2000-01
2 2000-03
3 2000-02
4 2000-02
5 2000-03
6 2000-03
7 2001-01
8 2001-01
9 2001-03
10 2001-02
11 2001-02
12 2001-03
13 2001-03
Name: Date, dtype: period[M]
df = df.loc[df.groupby(df['Date'].dt.to_period('m'))['Date'].idxmax()]
print (df)
          Date Indicator  Value
1   2000-01-31         A     40
4   2000-02-28         B     70
5   2000-03-31         C     90
8   2001-01-31         A     40
11  2001-02-28         B     70
12  2001-03-31         C     90
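An alternative sketch (my addition, not part of the original answer): sort by date and keep the last row per month period via drop_duplicates, using a temporary helper column:
df['per'] = df['Date'].dt.to_period('m')
df = df.sort_values('Date').drop_duplicates('per', keep='last').drop(columns='per')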

How to convert multi-indexed datetime index into integer?

I have a multi-indexed dataframe, the result of a groupby (by 'id' and 'date').
                x  y
id  date
abc 3/1/1994  100  7
    9/1/1994   90  8
    3/1/1995   80  9
bka 5/1/1993   50  8
    7/1/1993   40  9
I'd like to convert those dates into something integer-like, such as:
             x  y
id  date
abc day 0  100  7
    day 1   90  8
    day 2   80  9
bka day 0   50  8
    day 1   40  9
I thought it would be simple, but I couldn't get there easily. Is there a simple way to do this?
Try this: number the rows within each id with groupby(level=0).cumcount(), append the result as a new index level, then drop the original date level:
s = 'day ' + df.groupby(level=0).cumcount().astype(str)
df1 = df.set_index([s], append=True).droplevel(1)
             x  y
id
abc day 0  100  7
    day 1   90  8
    day 2   80  9
bka day 0   50  8
    day 1   40  9
You can calculate the new level and create a new index:
lvl1 = 'day ' + df.groupby('id').cumcount().astype('str')
df.index = pd.MultiIndex.from_tuples((x, y) for x, y in zip(df.index.get_level_values('id'), lvl1))
output:
             x  y
abc day 0  100  7
    day 1   90  8
    day 2   80  9
bka day 0   50  8
    day 1   40  9
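A slightly simpler variant (my sketch): build the index directly from arrays and preserve the level names:
df.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values('id'), lvl1], names=['id', 'date'])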

How to remove initial rows in a dataframe in Python

I have 4 dataframes with weekly sales values for a year for 4 products. Some of the initial rows are 0 because there were no sales yet; there are some other 0 values in between the weeks as well.
I want to remove those initial 0 values while keeping the in-between 0s.
For example
Week Sales(prod 1)
1 0
2 0
3 100
4 120
5 55
6 0
7 60
Week Sales(prod 2)
1 0
2 0
3 0
4 120
5 0
6 30
7 60
I want to remove rows 1 and 2 from the 1st table, and rows 1, 2 and 3 from the 2nd.
A few assumptions based on your example dataframe:
The DataFrame is created using pandas
Week always starts at 1
Only the leading weeks with 0 sales will be removed
Solution:
Python libraries required
- pandas, more_itertools
Example DataFrame (df):
Week Sales
1 0
2 0
3 0
4 120
5 0
6 30
7 60
Python Code:
import pandas as pd
import more_itertools as mit

filter_col = 'Sales'
filter_val = 0

## function which returns the index labels to be removed
def return_initial_week_index_with_zero_sales(df, filter_col, filter_val):
    # only act when the very first week has zero sales
    if df[filter_col].iloc[0] == filter_val:
        index_list = df[df[filter_col] == filter_val].index.tolist()
        # group consecutive zero-sales weeks and keep only the leading group
        index_wzs = [list(group) for group in mit.consecutive_groups(index_list)]
        return index_wzs[0]
    return []

## calling the above function and removing those index labels from the dataframe
df = df.set_index('Week')
weeks_to_be_removed = return_initial_week_index_with_zero_sales(df, filter_col, filter_val)
if weeks_to_be_removed:
    print('Initial weeks with 0 sales are {}'.format(weeks_to_be_removed))
    df = df.drop(index=weeks_to_be_removed)
else:
    print('No initial week has 0 sales')
df.reset_index(inplace=True)
Result (df):
Week Sales
4 120
5 0
6 30
7 60
I hope it helps; you can modify the function as per your requirements.
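For reference, a more compact vectorized sketch (my addition, pandas only): keep every row from the first nonzero sale onward, which preserves the in-between zeros:
# ne(0).cummax() is False until the first nonzero sale, then True for all later rows
df = df[df['Sales'].ne(0).cummax()]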

Sorting the time format in shell script

I have a file containing alert occurrence times. I want to sort them in ascending order. Can you please guide me on this?
Sample time format:
1 day, 19 hours
3 weeks
4 weeks, 1 day
2 minutes
1 month, 1 week
10 hours, 36 minutes
4 weeks, 1 day
4 weeks, 1 day
13 minutes
5 hours, 16 minutes
1 hour, 53 minutes
3 hours, 18 minutes
21 hours, 42 minutes
18 hours, 49 minutes
21 hours, 43 minutes
Maybe not super elegant, but straightforward in Python:
#!/usr/bin/env python
import operator

# 1 month = 28-31 days and 4 weeks = 28 days, so months are kept separate
time_in_seconds = {
    'week': 7*24*60*60,
    'day': 24*60*60,
    'hour': 60*60,
    'minute': 60,
    'second': 1
}

if __name__ == '__main__':
    times = []
    with open('sample_time.txt', 'r') as f:
        for line in f.read().split('\n'):
            months = 0
            seconds = 0
            try:
                for pair in line.split(', '):
                    num, denum = pair.split(' ')
                    if denum.startswith('month'):
                        months += int(num)
                    else:
                        seconds += time_in_seconds[denum.rstrip('s')] * int(num)
                times.append([months, seconds, line])
            except (ValueError, KeyError):
                # skip lines that do not match the "<number> <unit>" format
                pass
    # sort by months first, then by the remaining seconds
    sorted_times = sorted(times, key=operator.itemgetter(0, 1))
    for line in map(operator.itemgetter(2), sorted_times):
        print(line)
It assumes your file is called sample_time.txt.
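With these rules, the sample input above should print in this order (entries containing months sort last because months are compared first):
2 minutes
13 minutes
1 hour, 53 minutes
3 hours, 18 minutes
5 hours, 16 minutes
10 hours, 36 minutes
18 hours, 49 minutes
21 hours, 42 minutes
21 hours, 43 minutes
1 day, 19 hours
3 weeks
4 weeks, 1 day
4 weeks, 1 day
4 weeks, 1 day
1 month, 1 week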

Pandas assign value of one column based on another

Given the following data frame:
import pandas as pd
df = pd.DataFrame(
    {'A': [10, 20, 30, 40, 50, 60],
     'B': [1, 2, 1, 4, 5, 4]})
df
A B
0 10 1
1 20 2
2 30 1
3 40 4
4 50 5
5 60 4
I would like a new column 'C' to take the value of 'A' where the corresponding value of 'B' is less than 3, and 0 otherwise.
The desired result is as follows:
A B C
0 10 1 10
1 20 2 20
2 30 1 30
3 40 4 0
4 50 5 0
5 60 4 0
Thanks in advance!
Use np.where:
import numpy as np
df['C'] = np.where(df['B'] < 3, df['A'], 0)
>>> df
A B C
0 10 1 10
1 20 2 20
2 30 1 30
3 40 4 0
4 50 5 0
5 60 4 0
Here you can use the pandas method where directly on the column:
In [3]:
df['C'] = df['A'].where(df['B'] < 3,0)
df
Out[3]:
A B C
0 10 1 10
1 20 2 20
2 30 1 30
3 40 4 0
4 50 5 0
5 60 4 0
Timings
In [4]:
%timeit df['A'].where(df['B'] < 3,0)
%timeit np.where(df['B'] < 3, df['A'], 0)
1000 loops, best of 3: 1.4 ms per loop
1000 loops, best of 3: 407 µs per loop
np.where is faster here, but pandas where does more validation and has more options, so it depends on the use case.
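For completeness, a related sketch (my addition): Series.mask is the inverse of where, replacing values where the condition is True:
# replace A's value with 0 wherever B >= 3
df['C'] = df['A'].mask(df['B'] >= 3, 0)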
