I have a dataframe that stores information from text files; this information gives me details about my execution jobs.
I store all of this information in a dataframe called "df_tmp". That dataframe has a column "end_date" where I want to store the end date taken from the last line of the file, but if the dataframe has no value there I want to store the current_time instead.
Imagine that the information from my file is on the following variable:
string_from_my_file = 'Execution time at 2019/10/14 08:06:44'
What I need is:
If the last line of my manual file doesn't have any date, I want to store the current_time.
For that I am trying this code:
import datetime as dt
import re

import pandas as pd

now = dt.datetime.now()
current_time = now.strftime('%H:%M:%S')
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S') if not pd.isnull(x) else pd.to_datetime(re.search("([0-9]{4}\/[0-9]{2}\/[0-9]{2}\ [0-9]{2}\:[0-9]{2}\:[0-9]{2})", str(df_tmp['string_from_my_file']))[0]))
However, it gives me the following error:
builtins.AttributeError: 'str' object has no attribute 'strftime'
What am I doing wrong?
Thanks
Try this:
df_tmp['end_date'] = df_tmp['end_date'].fillna(current_time).apply(
    lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S')
    if not pd.isnull(x)
    else pd.to_datetime(re.search(r"([0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})", str(df_tmp['string_from_my_file']))[0])
)
In this part, lambda x: pd.to_datetime(x).strftime('%Y-%m-%d %H:%M:%S'), x needs to be converted to datetime before strftime() can be applied.
Probable reason for your error:
Even if the end_date column is of datetime dtype, you are filling it with str values (current_time is a string), which changes the dtype of the end_date column to object, and plain strings have no strftime method.
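A minimal sketch of that dtype change, using hypothetical data (a two-row end_date column with one missing value):

import datetime as dt
import pandas as pd

# Hypothetical frame: one real datetime and one missing value
df_tmp = pd.DataFrame({'end_date': [pd.Timestamp('2019-10-14 08:06:44'), pd.NaT]})
print(df_tmp['end_date'].dtype)   # datetime64[ns]

# Filling the gap with a str upcasts the column to object dtype
# (pandas of that era did this silently; newer versions warn),
# so a later bare .strftime fails on the string elements
current_time = dt.datetime.now().strftime('%H:%M:%S')
filled = df_tmp['end_date'].fillna(current_time)
print(filled.dtype)               # object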
I need to join dataframes with dates in the format '%Y%m%d'. Some of the data is wrong or missing, and I convert it with pandas like this:
try: df['data'] = pd.to_datetime(df['data'], format='%Y%m%d')
except: pass
If even one row is wrong, the whole column fails to convert. I would like it to skip only the rows with errors, leaving them unconverted.
I could solve this by looping with datetime, but my question is: is there a better solution for this with pandas?
Pass errors='coerce' to pd.to_datetime to convert the values with the wrong date format to NaT. Then you can use Series.fillna to fill those NaT with the original input values.
df['data'] = (
pd.to_datetime(df['data'], format='%Y%m%d', errors='coerce')
.fillna(df['data'])
)
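For example, with a hypothetical column containing one malformed value:

import pandas as pd

# '20190232' is not a valid date, so it becomes NaT and is then
# filled back with the original string
df = pd.DataFrame({'data': ['20191014', '20190232', '20191101']})
df['data'] = (
    pd.to_datetime(df['data'], format='%Y%m%d', errors='coerce')
    .fillna(df['data'])
)
print(df['data'].tolist())
# [Timestamp('2019-10-14 00:00:00'), '20190232', Timestamp('2019-11-01 00:00:00')]

Note that the resulting column has object dtype, since it mixes Timestamps with unconverted strings.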
From the docs
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If 'raise', then invalid parsing will raise an exception.
If 'coerce', then invalid parsing will be set as NaT.
If 'ignore', then invalid parsing will return the input.
I am taking a dataframe and inserting it into a Postgresql table.
One column in the dataframe has datetime64 dtype; the column type in PostgreSQL is 'timestamp without time zone'. To prepare the dataframe for insertion, I am using to_records:
listdf = df.to_records(index=False).tolist()
When I run to_records and then psycopg2's cur.executemany(), it raises an error saying I am trying to insert bigint into timestamp without time zone.
So I tried passing a dict of column_dtypes to to_records, but that doesn't work either. The code below gives the error: "ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas"
DictofDTypes = dict.fromkeys(SQLdfColHeadings, 'float')
DictofDTypes['Date_Time'] = 'datetime64'
listdf = df.to_records(index=False,column_dtypes=DictofDTypes).tolist()
I have also tried the types str, int, and float; none of them worked in the three lines above.
How do I convert the column properly so that it can be inserted into a timestamp SQL column?
I removed the column_dtypes definition from to_records.
Before calling to_records, I converted the datetime column to str with:
df['Date_Time'] = df['Date_Time'].apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
The SQL insert command then worked.
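A sketch of the full flow under those assumptions (the table, column names, and connection string are hypothetical; .dt.strftime is equivalent to the apply above):

import pandas as pd
import psycopg2

# Hypothetical frame with a datetime64 column
df = pd.DataFrame({'Date_Time': pd.to_datetime(['2019-10-14 08:06:44']),
                   'value': [1.5]})

# Render the datetimes as strings so psycopg2 sends text that PostgreSQL
# casts to timestamp, instead of raw int64 nanoseconds
df['Date_Time'] = df['Date_Time'].dt.strftime('%Y-%m-%d %H:%M:%S')
listdf = df.to_records(index=False).tolist()

conn = psycopg2.connect('dbname=mydb')  # hypothetical connection
with conn, conn.cursor() as cur:
    cur.executemany('INSERT INTO my_table (date_time, value) VALUES (%s, %s)',
                    listdf)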
I'm trying to do forecasting in Python 3.x, so I wrote the following code:
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
But I'm getting this error message:
AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'
Can you please help me resolve the issue?
You need to make sure that your pandas Series object ts_log has a DatetimeIndex with an inferred frequency.
For example:
ts_log.index
>>> DatetimeIndex(['2014-01-01', ... '2017-12-31'],
dtype='datetime64[ns]', name='Date', length=1461, freq='D')
Notice the attribute freq='D': it means pandas has inferred that the Series is indexed daily (D = daily).
To achieve this, I assume your data has a column called 'Date'. Here's the code to do it:
# Convert your daily column from just string to DateTime (skip if already done)
ts_log['Date'] = pd.to_datetime(ts_log['Date'])
# Set the column 'Date' as index (skip if already done)
ts_log = ts_log.set_index('Date')
# Specify datetime frequency
ts_log = ts_log.asfreq('D')
For frequency other than Daily, refer here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
For statsmodels==0.10.1, where ts_log is not a dataframe, or is a dataframe without a datetime index, use the following:
decomposition = seasonal_decompose(ts_log, freq=1)
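Note that in more recent statsmodels releases the freq argument of seasonal_decompose was renamed to period, so the equivalent call there would be (version-dependent; check the docs for your installed version):

decomposition = seasonal_decompose(ts_log, period=1)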
I have a pandas dataset with this structure:
Date datetime64[ns]
Events int64
Location object
Day float64
I've used the following code to get the date of the first occurrence for location "A":
start_date = df[df['Location'] == 'A'][df.Events != 0].iat[0,0]
I now want to update all of the records after the start_date with the number of days since the start_date, where Day = df.Date - start_date.
I tried this code:
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).days
However, that code returns an error:
AttributeError: 'Series' object has no attribute 'days'
The problem seems to be that the code treats df.Date as an object instead of a datetime. Does anyone have any ideas on what is causing this problem?
Try this; you need to add the .dt accessor:
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).dt.days
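A minimal sketch of why the accessor is needed, with hypothetical dates: subtracting a Timestamp from a datetime column yields a timedelta64 Series, and its integer day counts live under .dt.

import pandas as pd

dates = pd.Series(pd.to_datetime(['2020-01-01', '2020-01-05', '2020-01-10']))
start = pd.Timestamp('2020-01-01')

delta = dates - start           # timedelta64[ns] Series
print(delta.dt.days.tolist())   # [0, 4, 9]
# delta.days raises AttributeError: .days exists on a single Timedelta,
# but on a Series it must go through the .dt accessor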
I have a text file in which month, day, and year are in different columns. I want to combine them into one column and convert it to a date format. I am trying to use the parse_dates option of pandas read_table, but it is not working and gives me the error: file structure not yet supported.
dateparse = lambda x: pd.datetime.strptime(x, '%m-%d-%y')
date = pd.read_table("date.txt", sep = ' ', parse_dates = {'date':['month', 'day','year']}, date_parser=dateparse)
My data looks like this:
[sample data image: separate month, day, and year columns]
Remove the date_parser argument and it'll work just fine:
date = pd.read_table('date.txt', sep=' ', parse_dates={'date': ['month', 'day','year']})
Read the data as a pandas DataFrame and create a new column with the combined date:
df = pd.read_csv('date.txt', sep = ' ')
df['date'] = pd.to_datetime(df[['month','day','year']])
Parsing custom dates from multiple columns during the pandas read_* step is possible:
from datetime import datetime

date_parser = lambda x, y, z: datetime.strptime(f"{x}.{y}.{z}", "%m.%d.%Y")
date = pd.read_table('date.txt', sep=' ', parse_dates={'date': ['month', 'day', 'year']}, date_parser=date_parser)
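Note: on pandas 2.0 and later, the date_parser argument is deprecated (in favor of date_format), so on recent versions the to_datetime approach above is the safer route.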