Recently I've used pd.to_datetime to convert date-time series data into the format I need for a matplotlib plot. I hadn't had any issues until now: for an unknown reason, when I use an axvline (Matplotlib), a date-time error is thrown. However, it is only thrown when my code is executed inside a function, not when it is run in the console. After trying for the past two days, I've opted to ask this on SO for some guidance.
The expected output is:
The dotted red line indicates a single failure date.
My function follows the logic below:
for i in result:
    Asset_Num = i.strip(".csv")
    dataframe = pd.read_csv(i)
    df2 = dataframe[(dataframe['Asset_Number']==Asset_Num)]
    # Convert the Date/Time column into a list of Timestamp values
    fdl = pd.to_datetime(df2['Date/Time'], format="%Y-%m-%d %H:%M:%S").to_list()
    ax = dataframe.plot(linewidth=2)
    ax.axvline(fdl, color='r', linestyle="--")
The error that is then thrown as I run this function is:
But I am confused, as both the x-axis and the fdl variable are in the same date-time format, and I have checked this numerous times.
What am I doing wrong?
I've attached some sample data as per our minimum criteria guidelines if you'd like to try to recreate this. (http://www.sharecsv.com/s/292b419dc674302ac5b6a96a2da0e06e/SampleData.csv)
Thank you.
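One likely cause, offered as a sketch rather than a confirmed diagnosis: ax.axvline expects a single x position, while fdl is a list of timestamps. Drawing one line per timestamp avoids passing the whole list. The sample data and the helper name mark_failure_dates are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def mark_failure_dates(ax, dates):
    """Draw one dashed red vertical line per date (hypothetical helper)."""
    return [ax.axvline(d, color="r", linestyle="--") for d in dates]


# Made-up sample data standing in for the CSV contents.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 1.5, 3.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)
fdl = pd.to_datetime(
    pd.Series(["2020-01-02 00:00:00", "2020-01-03 12:00:00"]),
    format="%Y-%m-%d %H:%M:%S",
).to_list()

fig, ax = plt.subplots()
ax.plot(df.index, df["value"], linewidth=2)
lines = mark_failure_dates(ax, fdl)  # axvline is called once per timestamp
```

If there is only ever one failure date, ax.axvline(fdl[0], color='r', linestyle="--") follows the same idea.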
Related
I had a dataframe like the one in image-1 (input dataframe), in which I wanted to rename the rows/indices by dates (dtype='datetime64[ns]') in YYYY-MM-DD format.
So, I used the index renaming option shown in image-2 below, where each row gets the last date of every 6th month, incrementing until the end. It did rename the rows but ended up turning all data values into NaNs. I tried the transpose of the dataframe and got the same result.
I then tried a few other things, shown in image-3, which were all unfruitful; mostly I got the error TypeError: 'DatetimeIndex' object is not callable.
As the final solution, I ended up creating a dataframe of all dates (image-4), then merging the two dataframes by columns (image-5), and then setting the very first column as the row names (image-6).
The dates have a weird format when converted to a list, and I am wondering why that is (image-7). How do we get exactly the year-month-date? I tried different combinations without success. strftime is the way to go here, but how?
The reason I went for the strftime approach is that I wanted to output a list of dates in a sensible YYYY-MM-DD format and then use a call like pd.rename(index=list_dates) to replace the default 0 1 2 with dates as the new index names.
So, I have a solution, but is it an economical one, or are there better solutions available?
This is an attempt to share my solution for those who can use it and learn new solutions from wizards here.
BRgrds,
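A minimal sketch of the strftime step and of relabeling the rows, assuming made-up data and hypothetical half-year end dates. It assigns the dates to df.index directly. NaNs can appear when an operation aligns on the new labels (as reindex does) instead of relabeling, though whether that is what happened in image-2 is only a guess.

```python
import pandas as pd

# Made-up data with the default 0, 1, 2 index.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Hypothetical half-year end dates, one per row.
dates = pd.to_datetime(["2020-06-30", "2020-12-31", "2021-06-30"])

# DatetimeIndex.strftime gives plain 'YYYY-MM-DD' strings.
list_dates = dates.strftime("%Y-%m-%d").tolist()

# Assigning to .index relabels the rows without touching the values.
df.index = dates
```

Use df.index = list_dates instead if the index should hold strings rather than datetimes.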
Problem
I'm trying to accurately represent a date from Google Sheets in a DataFrame. I know that the "base" dates in Google Sheets are integers counting days from 1/1/1900. Testing this is straightforward: I have a Sheet with the date 5/2/2019. Using the Python API, I download this Sheet with the parameter valueRenderOption='UNFORMATTED_VALUE' to ensure I'm getting raw values, and do a simple conversion to a DataFrame. The value shows up as 43587, and if I put that back into a Sheet and set the format to date, it appears as 5/2/2019. Sanity check complete.
The problem arises when I try to convert that date in the DataFrame to an actual datetime: it shows up as offset by two days, and I'm not sure why.
Attempts
In a DataFrame df, with datetime column timestamp, I do the following:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='d', origin='1900-01-01')
and I get a date of 2019-05-04, which is two days later than I would expect. I searched for this on SO and found a similar issue, but the accepted answer actually contains the exact same problem (albeit no mention of it): a two day offset.
This can be "solved" by setting the origin two days back, to 1899-12-30, though that feels like a cover-up rather than a fix for the underlying issue (and could perhaps lead to further date inconsistencies down the road as more time passes?).
Here's code for a toy DataFrame so that you don't have to type it out, if you want to experiment:
import pandas as pd
df = pd.DataFrame([{'timestamp': 43587}])
Question
I imagine this is on the Pandas side of things, but I'm not sure. Some internal conversion that happens differently than how they do it at Google? Does anyone have an idea of what's at play here, and if setting the origin date two days earlier is actually a solution?
I have been banging my head against this as well, and think that I finally figured it out. While for the Date() function, Sheets uses 1900-1-1 as the base, for the date format and for the TO_DATE() function, the origin date is 1899-12-30.
You can see this in Sheets by either
entering 0 in a cell, and then formatting to a date → 12/30/1899
entering =TO_DATE(0), which will result in 12/30/1899
One origin story for this odd choice is here in a very old MSDN forum. I have no idea of its veracity.
At any rate, that explains the two-day discrepancy and then the solution becomes
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='d', origin='1899-12-30')
which worked for me.
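A quick check of the two origins, using the serial value from the question. Both conversions only add a number of days to the origin, so the gap between them is a constant two days and does not grow over time.

```python
import pandas as pd

serial = 43587  # raw value read from the Sheet

# Same serial number, two different origins.
with_1900 = pd.to_datetime(serial, unit="D", origin="1900-01-01")
with_1899 = pd.to_datetime(serial, unit="D", origin="1899-12-30")

# The difference is exactly the gap between the two origins.
offset_days = (with_1900 - with_1899).days
```

Here with_1899 is 2019-05-02, matching what Sheets displays, and with_1900 is 2019-05-04.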
Inside my notebook I am reading data from a sqlite database using pandas.
The data is stored in consecutive order, meaning there is an entry for each day (no gaps). This is what a single entry looks like in the database:
Now when I try to plot this to a barplot (sns = seaborn) I get some strange gaps between data which seems to be grouped somehow:
data['timestamp'] = data['timestamp'].dt.date
sns.barplot(x='timestamp', y='steps', data=data, ci=None)
I have been using the same datetime format for other plots and it worked fine, so I rule that out to be the cause.
Please help me understand why those gaps occur in my plot. I would have expected the plot would look something like this (please ignore the colors):
I hope that someone can help me here. I'm pretty new to Python and I got stuck with a for loop that creates a couple of time shifts for my datetime Series. Once I have iterated over the shifts and want to access the columns by name to calculate the percentage change, I get a KeyError.
Here is what my code looks like:
i=1
x=50
for i in range(x):
    df_data_1['visits_lag_',i] = df_data_1['visits'].shift(i)
The output looks like the following:
df.dtypes
Now, if I want to calculate with or access one of the newly created columns, I receive a KeyError message:
df_data_1['percent_change_test'] = (df_data_1['visits']/df_data_1['(visits_lag_, 1)'])*100
It says:
Can anyone please help me see what I'm doing wrong?
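I think the problem is related to how you call the newly created column. Writing df_data_1['visits_lag_', i] creates a column whose label is the tuple ('visits_lag_', i), not a string.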
Instead of:
df_data_1["(visits_lag_, 1)"]
Try to do:
df_data_1[("visits_lag_", 1)]
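A small sketch of both options, on made-up data: reading the column back with the tuple key, or building plain string names with an f-string so the columns can be accessed by name. The name visits_lag_1 is just one possible naming choice.

```python
import pandas as pd

# Made-up data standing in for the real visits column.
df_data_1 = pd.DataFrame({"visits": [100, 110, 121]})

# A comma inside the brackets builds a tuple key: ("visits_lag_", 1).
df_data_1["visits_lag_", 1] = df_data_1["visits"].shift(1)
tuple_col = df_data_1[("visits_lag_", 1)]  # read back with the same tuple

# Alternative: build a plain string name such as "visits_lag_1".
for i in range(1, 3):
    df_data_1[f"visits_lag_{i}"] = df_data_1["visits"].shift(i)

df_data_1["percent_change_test"] = (
    df_data_1["visits"] / df_data_1["visits_lag_1"]
) * 100
```

The first row of percent_change_test is NaN because shift(1) has no previous value to compare against.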
This is my first time working with the Pandas library, so if this is a silly mistake I apologize.
For a Kaggle Competition, I'm creating a new column, Date_Diff, in a data frame based off of the difference between two columns. Both columns are initially strings, I convert them to datetime data types and then the resultant is a timedelta after I perform subtraction on them. Following that, I convert the type to a float by getting the days of the timedelta object.
All of the features of the dataset are thrown into a matrix in a function. I keep getting the following error when running that function:
ValueError: negative dimensions are not allowed
When assigning Date_Diff to the data frame, how can I keep the index in place so we can keep the number of rows (if this isn't the right terminology please let me know)?
I thought this post would answer my question, but unfortunately it didn't.
I've been looking up how to do this for two days now and still haven't found exactly what I'm looking for.
Converting to float and assigning the column:
for i in range(len(cur_date)):
    date_diff.append(abs(cur_date[i] - iss_date[i]))
    date_diff[i] = float(date_diff[i].days)
# The first three column assignments work fine, the Date_Diff does not
raw_data_set = raw_data_set.assign(Date_Diff = pd.Series(date_diff).values)
After tracing through the program, it looks like the matrix is getting negative dimensions because one of the parameters used to build the matrix is set to the total number of rows in the feature matrix - 1. The feature matrix isn't getting any rows, so this is returning 0 - 1, for a value of -1.
The function where the matrix is being created was in the starter code for the competition so I don't think it's wrong. I also added three columns to the data frame before Date_Diff and I had no issues so I really think it's with how I'm assigning Date_Diff.
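For comparison, a sketch of computing Date_Diff without a positional loop, so the result stays aligned with the data frame's index. This does not diagnose the negative-dimensions error; the column names cur_date and iss_date and the sample values are assumptions for illustration.

```python
import pandas as pd

# Made-up data; the column names cur_date and iss_date are assumptions.
raw_data_set = pd.DataFrame(
    {
        "cur_date": ["2020-01-10", "2020-02-01", "2020-03-15"],
        "iss_date": ["2020-01-01", "2020-02-11", "2020-03-15"],
    },
    index=[10, 20, 30],  # non-default index to show it is preserved
)

cur_date = pd.to_datetime(raw_data_set["cur_date"])
iss_date = pd.to_datetime(raw_data_set["iss_date"])

# Subtracting two datetime Series gives a timedelta Series on the same index;
# .dt.days extracts whole days, then abs() and astype(float) match the loop.
raw_data_set["Date_Diff"] = (cur_date - iss_date).dt.days.abs().astype(float)
```

Because the new column is a Series sharing the frame's index, the row count and row labels are unchanged after the assignment.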