Pandas - Can't cast string to datetime - string

I have a dataframe that stores information extracted from text files; this information gives me details about my execution jobs.
I store all of it in a dataframe called "df_tmp". That dataframe has a column "end_date" where I want to store the end date taken from the last line of my file, but if there is no value I want to store the current_time instead.
Imagine that the information from my file is on the following variable:
string_from_my_file = 'Execution time at 2019/10/14 08:06:44'
What I need is:
If the last line of my file does not contain a date, I want to store the current_time.
For that I am trying with this code:
now = dt.datetime.now()
current_time = now.strftime('%H:%M:%S')
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else pd.to_datetime(re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})", str(df_tmp['string_from_my_file']))[0]))
However, it gives me the following error:
builtins.AttributeError: 'str' object has no attribute 'strftime'
What am I doing wrong?
Thanks

Try this:
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else pd.to_datetime(re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})", str(df_tmp['string_from_my_file']))[0]))
In the part lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S'), x has to be converted to a datetime before strftime() can be applied.
Probable reason for your error:
Even if the end_date column is of type datetime, you are filling it with values of type str (current_time is the output of strftime). This changes the data type of the values in the end_date column, so calling x.strftime on them fails.
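An alternative that avoids mixing str and datetime values is to keep the column as datetimes until the very end. The snippet below is only a rough sketch under assumptions: the two-row df_tmp, the variable names pattern, extracted, parsed and end_date_text are made up for illustration, and it assumes the source lines live in a column named string_from_my_file.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample data: one line with a timestamp, one without.
df_tmp = pd.DataFrame({
    "string_from_my_file": [
        "Execution time at 2019/10/14 08:06:44",
        "no timestamp on this line",
    ],
})

# Pull the timestamp text out of each line; rows without a match become NaN.
pattern = r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
extracted = df_tmp["string_from_my_file"].str.extract(pattern, expand=False)

# Parse to real datetimes; missing values become NaT.
parsed = pd.to_datetime(extracted, format="%Y/%m/%d %H:%M:%S", errors="coerce")

# Fill the gaps with the current time as a datetime, not as a string.
now = dt.datetime.now().replace(microsecond=0)
df_tmp["end_date"] = parsed.fillna(pd.Timestamp(now))

# Format as text only when a string representation is actually needed.
end_date_text = df_tmp["end_date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(end_date_text)
```

Because end_date stays a datetime column, the .dt accessor works on every row and no per-row lambda is needed.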

Related

try convert string to date row per row in pandas or similar

I need to join dataframes with dates in the format '%Y%m%d'. Some data is wrong or missing, and when I convert the column in pandas with:
try: df['data'] = pd.to_datetime(df['data'], format='%Y%m%d')
except: pass
If a single row is wrong, the whole column fails to convert. I would like to skip only the rows with errors and leave them unconverted.
I could solve this by looping with datetime, but is there a better solution for this with pandas?
Pass errors = 'coerce' to pd.to_datetime to convert the values with wrong date format to NaT. Then you can use Series.fillna to fill those NaT with the input values.
df['data'] = (
    pd.to_datetime(df['data'], format='%Y%m%d', errors='coerce')
    .fillna(df['data'])
)
From the docs
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If 'raise', then invalid parsing will raise an exception.
If 'coerce', then invalid parsing will be set as NaT.
If 'ignore', then invalid parsing will return the input.
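To see what errors='coerce' does row by row, here is a small sketch. The sample values and the names parsed and bad_rows are invented for the illustration; only pd.to_datetime with format and errors comes from the answer above.

```python
import pandas as pd

# Hypothetical column: two valid dates, one garbage value, one missing value.
df = pd.DataFrame({"data": ["20200131", "not a date", None, "20191231"]})

# Invalid or missing entries become NaT instead of raising for the whole column.
parsed = pd.to_datetime(df["data"], format="%Y%m%d", errors="coerce")

# Rows that had a value but could not be parsed, useful for inspection.
bad_rows = df.loc[parsed.isna() & df["data"].notna(), "data"]

print(parsed)
print(bad_rows)
```

Keeping the parsed result in its own variable leaves the column as a proper datetime dtype; filling the NaT values back with the original strings, as in the answer, turns it into an object column again.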

Dataframe with datetime64 dtype insert into PostgreSQL timestamp column

I am taking a dataframe and inserting it into a PostgreSQL table.
One column in the dataframe is a datetime64 dtype. The column type in PostgreSQL is 'timestamp without time zone.' To prepare the dataframe to insert, I am using to_records:
listdf = df.to_records(index=False).tolist()
When I pass the output of to_records to psycopg2's cur.executemany(), I get an error saying that I am trying to insert a bigint into a timestamp without time zone column.
So I tried to add a dict of column_dtypes to the to_records. But that doesn't work. The below gives the error: "ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas"
DictofDTypes = dict.fromkeys(SQLdfColHAedings,'float')
DictofDTypes['Date_Time'] = 'datetime64'
listdf = df.to_records(index=False,column_dtypes=DictofDTypes).tolist()
I have also tried the types str, int, and float in the three lines above. None of them worked.
How do I convert the column properly so that it can be inserted into a SQL timestamp column?
I removed the dtype definitions from to_records.
And before calling to_records, I converted the datetime column to str with:
df['Date_Time'] = df['Date_Time'].apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
The sql insert command then worked.
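For reference, the fix can be sketched end to end without a database. The two-row dataframe, the value column and the names out and rows are made up for the example; in the real setup the rows list would be what gets passed to cur.executemany().

```python
import pandas as pd

# Hypothetical frame with a datetime64 column, as in the question.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2020-01-01 10:00:00", "2020-01-02 11:30:00"]),
    "value": [1.5, 2.5],
})

# Convert the datetime column to text before building the records,
# so the tuples carry strings that a timestamp column can accept.
out = df.copy()
out["Date_Time"] = out["Date_Time"].dt.strftime("%Y-%m-%d %H:%M:%S")

rows = out.to_records(index=False).tolist()
print(rows)
```

Using .dt.strftime does the same as the apply with a lambda, just vectorized over the column.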

AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'

I'm trying to do forecasting in Python 3.x, so I wrote the following code:
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
But I'm getting this error message:
AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'
Can you please help me resolve the issue?
You need to make sure that your pandas Series object ts_log has a DatetimeIndex with an inferred frequency.
For example:
ts_log.index
>>> DatetimeIndex(['2014-01-01', ... '2017-12-31'],
dtype='datetime64[ns]', name='Date', length=1461, freq='D')
Notice the attribute freq='D': it means pandas infers that the Series is indexed daily (D = daily).
To achieve this, I assume your data has a column called 'Date'. Here's the code to do it:
# Convert your daily column from just string to DateTime (skip if already done)
ts_log['Date'] = pd.to_datetime(ts_log['Date'])
# Set the column 'Date' as index (skip if already done)
ts_log = ts_log.set_index('Date')
# Specify datetime frequency
ts_log = ts_log.asfreq('D')
For frequency other than Daily, refer here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
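The steps above can be checked on a tiny example before running the decomposition. This is just a sketch with invented data; the names raw and ts and the value column are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical daily data that starts out with a default RangeIndex.
raw = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
print(type(raw.index).__name__)  # the index type that triggers the error

# Parse the dates, move them into the index, and declare a daily frequency.
raw["Date"] = pd.to_datetime(raw["Date"])
ts = raw.set_index("Date")["value"].asfreq("D")

print(ts.index.freqstr)        # frequency set explicitly by asfreq
print(ts.index.inferred_freq)  # frequency pandas infers from the timestamps
```

Note that asfreq inserts NaN rows for any missing days, so gaps in the data may need to be filled before decomposing.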
For statsmodels==0.10.1, where ts_log is not a dataframe or is a dataframe without a datetime index, use the following (newer statsmodels releases name this argument period instead of freq):
decomposition = seasonal_decompose(ts_log, freq=1)

Set days since first occurrence based on multiple columns

I have a pandas dataset with this structure:
Date datetime64[ns]
Events int64
Location object
Day float64
I've used the following code to get the date of the first occurrence for location "A":
start_date = df[df['Location'] == 'A'][df.Events != 0].iat[0,0]
I now want to update all of the records after the start_date with the number of days since the start_date, where Day = df.Date - start_date.
I tried this code:
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).days
However, that code returns an error:
AttributeError: 'Series' object has no attribute 'days'
The problem seems to be that the code recognizes df.Date as an object instead of a datetime. Does anyone have an idea what is causing this problem?
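You need to add the .dt accessor: subtracting start_date from the Date column gives a Series of timedeltas, and the day counts of a Series are reached through .dt.days.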
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).dt.days
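Here is a small self-contained sketch of the whole flow. The sample rows and the names mask and in_a are invented for the illustration, and a single combined boolean mask is used to find the start date instead of the chained indexing from the question.

```python
import pandas as pd

# Hypothetical data following the column layout from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01"]),
    "Events": [0, 5, 7, 2],
    "Location": ["A", "A", "A", "B"],
    "Day": [float("nan")] * 4,
})

# First date on which location "A" has a non-zero event count.
mask = (df["Location"] == "A") & (df["Events"] != 0)
start_date = df.loc[mask, "Date"].iloc[0]

# The subtraction yields a timedelta Series; .dt.days extracts whole days.
in_a = df["Location"] == "A"
df.loc[in_a, "Day"] = (df.loc[in_a, "Date"] - start_date).dt.days

print(df)
```

Rows before the start date get negative day counts here; they can be filtered out with an extra condition on Date if only later records should be updated.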

Parse date from multiple columns in pandas using parse_dates

I have a text file in which month, day and year are in different columns. I want to combine them into one column and convert it to a date format. I am trying to use the parse_dates option in pandas read_table, but it is not working and gives me the error "file structure not yet supported".
dateparse = lambda x: pd.datetime.strptime(x, '%m-%d-%y')
date = pd.read_table("date.txt", sep = ' ', parse_dates = {'date':['month', 'day','year']}, date_parser=dateparse)
My data looks like this:
Data
Remove the date_parser argument and it'll work just fine:
date = pd.read_table('date.txt', sep=' ', parse_dates={'date': ['month', 'day','year']})
Read the data as a pandas DataFrame and create a new column with combined date
df = pd.read_csv('date.txt', sep = ' ')
df['date'] = pd.to_datetime(df[['month','day','year']])
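The read-then-combine approach can be tried without a file on disk. In this sketch the file contents are simulated with io.StringIO and the sample rows and the value column are made up; pd.to_datetime assembles the date from columns named year, month and day.

```python
import io

import pandas as pd

# Hypothetical stand-in for date.txt: space-separated month, day, year columns.
text = "month day year value\n1 15 2019 10\n12 31 2020 20\n"

df = pd.read_csv(io.StringIO(text), sep=" ")

# to_datetime builds one datetime per row from the year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

print(df)
```

The column order passed to to_datetime does not matter, since the columns are matched by name.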
Parsing custom dates from multiple columns during the pandas read_ step is also possible:
from datetime import datetime
date_parser = lambda x, y, z: datetime.strptime(f"{x}.{y}.{z}", "%m.%d.%Y")
date = pd.read_table('date.txt', sep=' ', parse_dates={'date': ['month', 'day','year']}, date_parser=date_parser)
