Change all values of a column in pandas data frame - python-3.x

I have a pandas DataFrame with a column containing values like '01:00'. I want to subtract one hour from each value, so '01:00' becomes '00:00'. Can anyone help?

You can use timedeltas:
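import pandas as pd
df = pd.DataFrame({'col':['01:00', '02:00', '24:00']})
# append ':00' seconds so the strings parse as Timedelta, then subtract one hour
td = pd.to_timedelta(df['col'] + ':00') - pd.Timedelta(1, unit='h')
# rebuild 'HH:MM' from the components (slicing the astype(str) output depends on the pandas version)
c = td.dt.components
df['new'] = c['hours'].astype(str).str.zfill(2) + ':' + c['minutes'].astype(str).str.zfill(2)
print (df)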
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
A faster alternative, if all strings are whole hours from 01:00 to 24:00, is map with a lookup dictionary (any other value becomes NaN):
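# '00:00', '01:00', ..., '24:00'
L = ['{:02d}:00'.format(x) for x in range(25)]
# map each hour to the previous one: '01:00' -> '00:00', ..., '24:00' -> '23:00'
d = dict(zip(L[1:], L[:-1]))
df['new'] = df['col'].map(d)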
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00

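It seems that what you want is to subtract an hour from a time stored in your DataFrame as a string. You can do the following, where 'original_time' is the name of your column (note that strptime rejects '24:00', because %H only accepts hours 00 to 23):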
from datetime import datetime, timedelta
subtract_one_hour = lambda x: (datetime.strptime(x, '%H:%M') - timedelta(hours=1)).strftime("%H:%M")
df['minus_one_hour'] = df['original_time'].apply(subtract_one_hour)
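If the times are not always whole hours, or a value such as '00:30' should wrap around midnight, plain integer arithmetic on minutes is one option. This is a swapped-in technique rather than one of the answers above; a minimal sketch, where subtract_minutes is a hypothetical helper name and the input is assumed to be well-formed 'HH:MM' strings:

```python
import pandas as pd


def subtract_minutes(times: pd.Series, minutes: int = 60) -> pd.Series:
    """Hypothetical helper: shift 'HH:MM' strings back by `minutes`, wrapping at midnight."""
    # split 'HH:MM' into integer hour and minute columns (0 and 1)
    parts = times.str.split(':', expand=True).astype(int)
    # work in minutes since midnight; % 1440 wraps negative results into the previous day
    total = (parts[0] * 60 + parts[1] - minutes) % (24 * 60)
    hours = (total // 60).astype(str).str.zfill(2)
    mins = (total % 60).astype(str).str.zfill(2)
    return hours + ':' + mins


df = pd.DataFrame({'col': ['01:00', '02:00', '24:00', '00:30']})
df['new'] = subtract_minutes(df['col'])
print(df)
```

Here '24:00' is treated as the midnight at the end of the day, so it becomes '23:00', and '00:30' wraps to '23:30'.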

Related

Format datetime values in pandas by stripping

I have a df['timestamp'] column which has values in format: yyyy-mm-ddThh:mm:ssZ. The dtype is object.
Now I want to split the value into 3 new columns: one for the day, one for the day index (mon, tues, wed, ...) and one for the hour, like this:
Current: column=timestamp
yyyy-mm-ddThh:mm:ssZ
Desired:
New Col1|New Col2|New Col3
dd|hh|day_index
What function should I use?
Since you said the timestamp column is of type object, I assume it holds strings. Because the format is fixed, use str.slice to get the corresponding characters. To get the weekday, use dt.day_name() on a datetime64 column converted from timestamp.
import pandas as pd
data = {'timestamp': ['2019-07-01T05:23:33Z', '2019-07-03T02:12:33Z', '2019-07-23T11:05:23Z', '2019-07-12T08:15:51Z'], 'Val': [1.24,1.259, 1.27,1.298] }
df = pd.DataFrame(data)
# utc=True parses the full ISO string, including the trailing 'Z', as UTC
ds = pd.to_datetime(df['timestamp'], utc=True, errors='coerce')
df['datetime'] = ds
df['dd'] = df['timestamp'].str.slice(start=8, stop=10)
df['hh'] = df['timestamp'].str.slice(start=11, stop=13)
df['weekday'] = df['datetime'].dt.day_name()
print(df)
The output:
timestamp Val datetime dd hh weekday
0 2019-07-01T05:23:33Z 1.240 2019-07-01 05:23:33+00:00 01 05 Monday
1 2019-07-03T02:12:33Z 1.259 2019-07-03 02:12:33+00:00 03 02 Wednesday
2 2019-07-23T11:05:23Z 1.270 2019-07-23 11:05:23+00:00 23 11 Tuesday
3 2019-07-12T08:15:51Z 1.298 2019-07-12 08:15:51+00:00 12 08 Friday
Alternatively, first convert the df['timestamp'] column to datetime, then extract the year, month and day from it with the dt accessor:
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, errors='coerce')
df['Year'] = df['timestamp'].dt.year
df['Month'] = df['timestamp'].dt.month
df['Day'] = df['timestamp'].dt.day
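If the goal is just the desired dd, hh and weekday columns, the dt accessors can replace the string slicing entirely. A minimal sketch, assuming the timestamps are ISO strings ending in 'Z' (parsed here with utc=True); the column names dd, hh and weekday are illustrative. Unlike str.slice, dt.day and dt.hour return integers (1, 5) rather than zero-padded strings ('01', '05'), and dt.dayofweek gives a numeric day index (Monday=0) if that is what "day index" means:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-07-01T05:23:33Z', '2019-07-03T02:12:33Z']})
# utc=True parses the trailing 'Z' and returns timezone-aware UTC datetimes
ts = pd.to_datetime(df['timestamp'], utc=True)
df['dd'] = ts.dt.day              # day of the month as an integer
df['hh'] = ts.dt.hour             # hour as an integer
df['weekday'] = ts.dt.day_name()  # 'Monday', 'Tuesday', ...
print(df)
```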

manipulating pandas dataframe - conditional

I have a pandas dataframe that looks like this:
ID Date Event_Type
1 01/01/2019 A
1 01/01/2019 B
2 02/01/2019 A
3 02/01/2019 A
I want to be left with:
ID Date
1 01/01/2019
2 02/01/2019
3 02/01/2019
Where my condition is:
If the ID is the same AND the dates are within 2 days of each other then drop one of the rows.
If however the dates are more than 2 days apart then keep both rows.
How do I do this?
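I believe you first need to convert the values to datetimes with to_datetime, then take the per-ID differences with groupby(...).diff(). The first row of each group has no previous value, so its difference is missing (isnull()); keep those rows plus any row whose difference from the previous row is greater than the 2-day threshold: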
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
ID Date Event_Type
0 1 2019-01-01 A
2 2 2019-01-02 A
3 3 2019-01-02 A
Checking the solution with other data:
print (df)
ID Date Event_Type
0 1 01/01/2019 A
1 1 04/01/2019 B <-difference 3 days
2 2 02/01/2019 A
3 3 02/01/2019 A
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
ID Date Event_Type
0 1 2019-01-01 A
1 1 2019-01-04 B
2 2 2019-01-02 A
3 3 2019-01-02 A
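A self-contained sketch of the same filter, wrapped in a hypothetical helper drop_close_events and run on both sample datasets from this question. It sorts by ID and Date first because diff() compares each row with the one directly before it in its group. For the same reason it compares against the previous row, not the previous kept row, so a chain of events each 2 days apart keeps only its first row:

```python
import pandas as pd


def drop_close_events(df: pd.DataFrame, days: int = 2) -> pd.DataFrame:
    """Hypothetical helper: drop rows within `days` of the previous row for the same ID."""
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'], format='%d/%m/%Y')
    # diff() compares consecutive rows, so sort by ID and Date first
    out = out.sort_values(['ID', 'Date'])
    s = out.groupby('ID')['Date'].diff()
    # keep the first row per ID (diff is NaT) and rows more than `days` after the previous row
    return out[s.isna() | (s > pd.Timedelta(days=days))]


same_day = pd.DataFrame({'ID': [1, 1, 2, 3],
                         'Date': ['01/01/2019', '01/01/2019', '02/01/2019', '02/01/2019'],
                         'Event_Type': ['A', 'B', 'A', 'A']})
three_days = same_day.assign(Date=['01/01/2019', '04/01/2019', '02/01/2019', '02/01/2019'])

kept_same_day = drop_close_events(same_day)
kept_three_days = drop_close_events(three_days)
print(kept_same_day)
print(kept_three_days)
```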

day of Year values starting from a particular date

I have a dataframe with a date column. The duration is 365 days starting from 02/11/2017 and ending at 01/11/2018.
Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018
I want to add an adjacent column called Day_Of_Year as follows:
Date Day_Of_Year
02/11/2017 1
03/11/2017 2
05/11/2017 4
.
.
01/11/2018 365
I apologize if this is a very basic question, but unfortunately I haven't been able to get started with it.
I could use the day of year from datetime, but that returns 1 for 1 January, 2 for 2 January and so on, counting from 1 January irrespective of my start date, so that wouldn't work for me.
First convert the column with to_datetime, then subtract the start date, convert the difference to days and add 1:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(pd.Timestamp('2017-11-02')).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
Or subtract the first value of the column:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(df['Date'].iat[0]).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
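A self-contained sketch of the subtraction approach, with an extra date after 31 December (01/01/2018, added here for illustration) to show that the count continues across the year boundary without special handling:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['02/11/2017', '03/11/2017', '05/11/2017', '01/01/2018', '01/11/2018']})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# days elapsed since the first date, counted from 1; crossing 1 January needs no special case
df['Day_Of_Year'] = (df['Date'] - df['Date'].iloc[0]).dt.days + 1
print(df)
```

Because the count comes from the actual difference between timestamps, a leap day inside the range is counted automatically.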
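Using strftime with '%j' (day of the year). Note that this counts from 0 and does not wrap at the year boundary, so dates after 31 December give negative values: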
s=pd.to_datetime(df.Date,dayfirst=True).dt.strftime('%j').astype(int)
s-s.iloc[0]
Out[750]:
0 0
1 1
2 3
Name: Date, dtype: int32
#df['new']=s-s.iloc[0]
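pandas has Series.dt.dayofyear. So convert your column to the right format with pd.to_datetime and then apply Series.dt.dayofyear. Lastly, use some integer-division arithmetic to express everything relative to your original date: values that come out at 0 or below (dates after 31 December) get 365 added, which assumes a 365-day year: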
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['day of year'] = df['Date'].dt.dayofyear - df['Date'].dt.dayofyear[0] + 1
df['day of year'] = df['day of year'] + 365*((365 - df['day of year']) // 365)
Output
Date day of year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
But I'm doing essentially the same as Jezrael in more lines of code, so my vote goes to her/him

Multiple columns to datetime as an index without losing other column

I have a dataframe that looks like this (except much longer). I want to convert to a datetime index.
YYYY MM D value
679 1900 1 1 46.42
1355 1900 2 1 137.14
1213 1900 3 1 104.25
1380 1900 4 1 149.39
1336 1900 5 1 130.33
When I use this
df = pd.to_datetime((df.YYYY*10000+df.MM*100+df.D).apply(str),format='%Y%m%d')
I retrieve a datetime index but I lose the value column.
What I want in the end is -
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
How can I do this?
Thank you for your time in advance!
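You can use pandas to_datetime to convert this. Pass it only the date columns, renamed to year, month and day (calling df.astype(str) on the whole frame would also turn value into strings):
parts = df[['YYYY','MM','D']].rename(columns={'YYYY':'year','MM':'month','D':'day'})
df.index = pd.DatetimeIndex(pd.to_datetime(parts))
df.drop(['YYYY','MM','D'],axis=1,inplace=True)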
Out:
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
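The expression in the question already builds the correct dates; the value column is lost only because the result replaces df. A minimal sketch that assigns the same expression to df.index instead, using the first three rows of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'YYYY': [1900, 1900, 1900], 'MM': [1, 2, 3], 'D': [1, 1, 1],
                   'value': [46.42, 137.14, 104.25]}, index=[679, 1355, 1213])
# same expression as in the question: 1900*10000 + 1*100 + 1 -> '19000101'
df.index = pd.to_datetime((df.YYYY * 10000 + df.MM * 100 + df.D).astype(str), format='%Y%m%d')
# drop the date parts, keeping only value (still float64)
df = df.drop(columns=['YYYY', 'MM', 'D'])
print(df)
```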

Parse dates and create time series from .csv

I am using a simple csv file which contains data on calorie intake. It has 4 columns: cal, day, month, year. It looks like this:
cal month year day
3668.4333 1 2002 10
3652.2498 1 2002 11
3647.8662 1 2002 12
3646.6843 1 2002 13
...
3661.9414 2 2003 14
# data types
cal float64
month int64
year int64
day int64
I am trying to do some simple time series analysis, so I would like to parse month, year and day into a single column. I tried the following using pandas:
import pandas as pd
from pandas import Series, DataFrame, Panel
data = pd.read_csv('time_series_calories.csv', header=0, pars_dates=['day', 'month', 'year']], date_parser=True, infer_datetime_format=True)
My questions are: (1) how do I parse the data, and (2) how do I define the data type of the new column? I know there are quite a few similar questions and answers (see e.g. here, here and here), but I haven't been able to make it work so far.
You can use the parse_dates parameter of read_csv, passing the column names to combine as a list inside a list:
import pandas as pd
import numpy as np
import io
temp=u"""cal,month,year,day
3668.4333,1,2002,10
3652.2498,1,2002,11
3647.8662,1,2002,12
3646.6843,1,2002,13
3661.9414,2,2003,14"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), parse_dates=[['year','month','day']])
print (df)
year_month_day cal
0 2002-01-10 3668.4333
1 2002-01-11 3652.2498
2 2002-01-12 3647.8662
3 2002-01-13 3646.6843
4 2003-02-14 3661.9414
print (df.dtypes)
year_month_day datetime64[ns]
cal float64
dtype: object
Then you can rename the column:
df.rename(columns={'year_month_day':'date'}, inplace=True)
print (df)
date cal
0 2002-01-10 3668.4333
1 2002-01-11 3652.2498
2 2002-01-12 3647.8662
3 2002-01-13 3646.6843
4 2003-02-14 3661.9414
Or, better, pass parse_dates a dictionary that maps the new column name to the columns to combine:
df = pd.read_csv(io.StringIO(temp), parse_dates={'dates': ['year','month','day']})
print (df)
dates cal
0 2002-01-10 3668.4333
1 2002-01-11 3652.2498
2 2002-01-12 3647.8662
3 2002-01-13 3646.6843
4 2003-02-14 3661.9414
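In recent pandas releases (2.2 and later), combining several columns through parse_dates, as a nested list or a dict, is deprecated. A sketch of the alternative: read the columns as plain integers, then assemble them with pd.to_datetime, which accepts a DataFrame whose columns are named year, month and day. It uses a shortened copy of the sample data:

```python
import io

import pandas as pd

temp = """cal,month,year,day
3668.4333,1,2002,10
3652.2498,1,2002,11
3661.9414,2,2003,14"""
df = pd.read_csv(io.StringIO(temp))
# assemble one datetime64 column from the year/month/day integer columns
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
df = df.drop(columns=['year', 'month', 'day'])[['date', 'cal']]
print(df)
print(df.dtypes)
```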
