Google Sheets date to Pandas datetime unexpected offset - python-3.x

Problem
I'm trying to accurately represent a date from Google Sheets in a DataFrame. I know that dates in Google Sheets are stored as integers counting days from a base date of 1/1/1900. Testing this is straightforward: I have a Sheet with the date 5/2/2019. Using the Python API, I download this Sheet with the parameter valueRenderOption='UNFORMATTED_VALUE' to ensure I'm getting raw values, and do a simple conversion to a DataFrame. The value shows up as 43587, and if I put that back into a Sheet and set the format to date, it appears as 5/2/2019. Sanity check complete.
The problem arises when I try to convert that date in the DataFrame to an actual datetime: it shows up as offset by two days, and I'm not sure why.
Attempts
In a DataFrame df, with datetime column timestamp, I do the following:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='d', origin='1900-01-01')
and I get a date of 2019-05-04, which is two days later than I would expect. I searched for this on SO and found a similar issue, but the accepted answer actually contains the exact same problem (albeit with no mention of it): a two-day offset.
This can be "solved" by setting the origin two days back, to 1899-12-30, though that feels almost like a cover-up rather than a fix for the underlying issue (and could perhaps lead to further date inconsistencies down the road as more time passes?).
Here's code for a toy DataFrame so that you don't have to type it out, if you want to experiment:
import pandas as pd
df = pd.DataFrame([{'timestamp': 43587}])
Question
I imagine this is on the Pandas side of things, but I'm not sure. Some internal conversion that happens differently than how they do it at Google? Does anyone have an idea of what's at play here, and if setting the origin date two days earlier is actually a solution?

I have been banging my head against this as well, and think that I finally figured it out. While for the Date() function, Sheets uses 1900-1-1 as the base, for the date format and for the TO_DATE() function, the origin date is 1899-12-30.
You can see this in Sheets by either
entering 0 in a cell, and then formatting to a date → 12/30/1899
entering =TO_DATE(0), which will result in 12/30/1899
One origin story for this odd choice is here in a very old MSDN forum. I have no idea of its veracity.
At any rate, that explains the two-day discrepancy and then the solution becomes
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='d', origin='1899-12-30')
which worked for me.
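As a quick arithmetic check of the above, here is a minimal sketch using only the standard library (the helper name serial_to_date is made up for illustration). It treats 1899-12-30 as serial 0, as the TO_DATE(0) test suggests, and shows that the gap to the 1900-01-01 origin is a constant two days, so it should not drift as time passes.

```python
from datetime import date, timedelta

SHEETS_EPOCH = date(1899, 12, 30)  # serial 0, per the TO_DATE(0) check above
NAIVE_EPOCH = date(1900, 1, 1)     # the origin used in the question


def serial_to_date(serial, epoch=SHEETS_EPOCH):
    """Convert a whole-day serial number to a calendar date (hypothetical helper)."""
    return epoch + timedelta(days=serial)


print(serial_to_date(43587))               # 2019-05-02
print(serial_to_date(43587, NAIVE_EPOCH))  # 2019-05-04

# The offset between the two origins is fixed, whatever the serial number is.
for serial in (1, 43587, 100000):
    gap = serial_to_date(serial, NAIVE_EPOCH) - serial_to_date(serial)
    print(serial, gap.days)                # always 2
```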

Related

How to properly format dates in Google Sheets or Microsoft Excel

I have a spreadsheet I need to make in Google Sheets. The source of some of the data is exported to an Excel sheet. The data arrives in a dd/mm/yyyy format and I need to display it in a MON d format (e.g. Sep 5).
The problem is that both Excel and Sheets look at the date that arrives and think it is mm/dd/yyyy.
For example, 02/08/2022 is believed to be February 8 even though it should be Aug 2. The problem then arises that neither of these platforms ends up knowing how to convert this to Aug 2, and I end up having to do this manually.
Does anyone know how to get around this?
I have tried adjusting the format of the date, as well as using DateValue to convert (this fails since it understands the date as mm/dd/yyyy even when it is dd/mm/yyyy).
Any leads would be appreciated!
Thanks!
In Google Sheets, choose File > Settings > Locale and select a locale that uses the dd/mm/yyyy date format, before importing the data. You can then format the date column the way you prefer.
In Google Sheets:
=TEXT(REGEXREPLACE(A1&""; "(\d+)\/(\d+)\/(\d+)"; "$2/$1/$3"); "mmm d")
Try the following and format the result to your liking
=INDEX(IF(ISNUMBER(U2:U5),U2:U5,
IF(U2:U5=DATEVALUE("1899-12-30"),,
(MID(U2:U5,4,3)&LEFT(U2:U5,3)&RIGHT(U2:U5,4))*1)))
(Do adjust the formula according to your ranges and locale)
Functions used:
INDEX
IF
DATEVALUE
ISNUMBER
TRUNC
MID
LEFT
RIGHT
Well, for a formulaic solution, if the date is in A1, then the following places the correct date in B1:
=DATEVALUE(TEXT(A1,"DD/MM/YYYY"))
The TEXT function makes a string that will be the same form as your imported string out of the date produced during import. DATEVALUE then gives the proper date you desired.
The trick is in the TEXT step in which you reverse month and day in the string for DATEVALUE.
Naturally, instead of a helper column, it could just be wrapped around any reference to a date from column A, though one would have to remember to do so for all the years the spreadsheet is in use.
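The same round trip can be sketched in Python to see why it works. This is only an illustration of the idea, assuming the import read a dd/mm/yyyy string as mm/dd/yyyy; the function name swap_day_month is made up.

```python
from datetime import date, datetime


def swap_day_month(misread: date) -> date:
    """Undo a dd/mm vs mm/dd mix-up (hypothetical helper).

    Mirrors DATEVALUE(TEXT(A1,"DD/MM/YYYY")): write the misread date out
    day-first, then parse that text month-first.
    """
    text = misread.strftime("%d/%m/%Y")                 # Feb 8 -> "08/02/2022"
    return datetime.strptime(text, "%m/%d/%Y").date()   # read back as Aug 2


# "02/08/2022" was meant as 2 Aug but was imported as Feb 8.
print(swap_day_month(date(2022, 2, 8)))  # 2022-08-02
```

This only applies to values that were actually misread as dates; their day is always 12 or less, so the month-first parse cannot fail on them.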
If you are importing, not just opening a .CSV file via File|Open and going from there, you have an opportunity to solve all your problems. You use the Ribbon menuing system's Data menu, select the very leftmost thing, Get Data and from the (no arguing THIS isn't a menu) menu that drops down, Legacy Wizards, then finally From Text (Legacy) which will open the old Excel Import Wizard. (You may notice this is very like the Data|Text to Columns Ribbon menu choice and that is because that choice is the old wizard minus the steps at the start that go looking to another file for the data because it knows, by law, that it has to already be in the spreadsheet... in other words, it looks the same because it IS the same.)
Then make selections for the first couple dialogs it presents you to get to the dialog in which you tell it to import columns as whatever: general (let Excel decide), text, date, and do not import. Choose Date and make the selection of DMY to import them properly as you desire them to be so you are never presented with the problem at all.
As you might guess, you can use the abbreviated wizard via the "Text to Columns" feature to do the same thing after import when you see they are reversed. Since it is a single column of data, the result will overwrite the original simplifying your work.
Why does this happen at all? Well, the "locale" folks have the idea. When Excel imports numbers that are in a form it recognizes could be a date, it looks to the operating system settings for the selected ways dates are understood. So if your operating system believes a date should be displayed "Month Day, Year" and Excel has a set of data it thinks fits that mold, it will convert them all using it. So you get those Feb 8's rather than Aug 2's.
Interestingly, it does two other things of note:
It looks at 8, count 'em, 8 rows of data to decide the data fits the pattern. Even with 1,000,000 rows to import, it looks at... 8.
Then it does them ALL as if God himself wrote the "8"... and dates like 25/03/2022 get imported as text not a real date, because they (oh, obviously) can't be dates... "25" can't be a month!
It IS possible to change settings (DEEP settings) to make Excel consider X number of rows in a data set before deciding such things. I found them here, on the internet, once upon a time, though I shouldn't like trying to find them again. It will consider up to a million rows in such an import, but... that'd make it pret-ty slow. And that's a million rows for EACH data column. I won't even say that "adds up" - I'll point out it "multiplies up."
Another technique is to add some number of starting rows to force the desired pattern onto the import. I've heard it works in TIME column imports so it ought to in DATE column imports but I've not verified such.
My bet is you will find the use of the "Text to Columns" feature of most use if you can use a hands-on approach - it does require literal action on your part, but is a fast operation. If you will see others using the spreadsheet though... well, you need a formulaic solution or a VBA one (macro with button for them to have some fun clicking as their reward for doing what they were trained to do instead of complaining to the boss you make bad spreadsheets). For a formulaic solution, the above formula is simple.
Last thought though: there's no error-checking and error-overcoming in it. So a date like "25/03/2022" in the data that imported as literal text is a problem. For handling the latter, an up-to-date approach could be:
=IF(TYPE(A1)=1,DATEVALUE(TEXT(A1,"dd/mm/yyyy")),DATE(INDEX(TEXTSPLIT(A1,"/"),1,3),INDEX(TEXTSPLIT(A1,"/"),1,2),INDEX(TEXTSPLIT(A1,"/"),1,1)))
in which the DATE(etc. portion handles finding text of the "25/03/2022" kind. Lots of less up-to-date ways to split the text Excel would have placed in the column, but since demonstrating what to do if it existed was the point, I took the easy way out. (Tried for a simple version but it wouldn't take INDEX(TEXTSPLIT(A1,"/"),1,{3,2,1}) from me for the input parameters to DATE.) TYPE will give a 1 if Excel imported a datum as a date (number), and a 2 if brought in as text. If empty or strange strings could exist, you'll need to deal with what those present you as well.
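The two branches of that formula can be sketched in Python as follows. This is an illustration only, under the same assumption as above (dd/mm/yyyy source data read with mm/dd/yyyy settings); fix_imported is a made-up name, and empty or malformed strings are deliberately not handled.

```python
from datetime import date


def fix_imported(value):
    """Return the intended date for one imported cell (hypothetical helper).

    - a date object stands for a cell that was misread as mm/dd, so swap
      day and month (the TYPE(A1)=1 branch);
    - a string such as "25/03/2022" stands for a cell left as text, so
      split it and build the date day-first (the TEXTSPLIT branch).
    """
    if isinstance(value, date):
        return date(value.year, value.day, value.month)
    day, month, year = (int(part) for part in value.split("/"))
    return date(year, month, day)


print(fix_imported(date(2022, 2, 8)))  # 2022-08-02
print(fix_imported("25/03/2022"))      # 2022-03-25
```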

Problem and a Solution for pd.DataFrame values changing to NaN while changing index/row default names

I had a dataframe like the one in image-1 (input dataframe) whose rows/indices I want to rename to dates (dtype='datetime64[ns]') in YYYY-MM-DD format.
So, I used the index renaming option shown in image-2 below, with the last date of every 6th month for each row, incrementing until the end. It did rename the rows but ended up turning all data values into NaNs. I tried transposing the dataframe, with the same result.
I then tried a few other things, shown in image-3, which were all unfruitful; mostly I got the error TypeError: 'DatetimeIndex' object is not callable
As the final solution, I ended up creating a dataframe for all dates (image-4), then merging the two dataframes by columns (image-5), and then setting the very first column as the row names (image-6).
The dates have a weird format when converted to a list, and I wonder why (image-7). How do we get exactly the year-month-date? I tried different combinations without fruitful results. strftime is the way to go here, but how?
Why I took this strftime approach: I was thinking of outputting a list of dates in a sensible YYYY-MM-DD format and then using a function like pd.rename(index=list_dates) to replace the default 0 1 2 with dates as the new index names.
So, I have a solution but is it an economic solution or are there good solutions available?
This is an attempt to share my solution for those who can use it and learn new solutions from wizards here.
BRgrds,
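A possible shorter route, as a minimal sketch. The images are not available here, so this assumes the NaNs came from an operation that aligns on the existing labels (reindex is used as the example) rather than one that replaces them; the column names, values and dates below are made up.

```python
import pandas as pd

# Toy frame with the default 0, 1, 2 index (made-up data).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
dates = pd.to_datetime(["2020-06-30", "2020-12-31", "2021-06-30"])

# reindex looks up the new labels among the old ones; none of the dates
# exist in the 0, 1, 2 index, so every value comes back as NaN.
aligned = df.reindex(dates)
print(aligned.isna().all().all())

# Assigning to .index replaces the labels and leaves the values alone.
renamed = df.copy()
renamed.index = dates
print(renamed)

# strftime on the DatetimeIndex gives plain YYYY-MM-DD strings.
labels = renamed.index.strftime("%Y-%m-%d").tolist()
print(labels)
```

The 'DatetimeIndex' object is not callable error usually means an index was followed by parentheses, as in dates(...), instead of being indexed or passed as an argument.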

Concatenated DATEVALUE() and TIMEVALUE() only returns string in Google Sheets

Good morning everyone, currently losing my marbles over this and hoping there's a workaround. I concatenated two values to jerry-rig a timestamp, and converted a string into a number conditionally:
TIMEVALUE(IF(H2="Breakfast","08:00:00",0))+DATEVALUE(C2)
I then formatted this as a timestamp in Google Sheets, and used IMPORTRANGE() to use it in another sheet. BUT, when I plugged the values into MAX(), it always returns DATEVALUE(0), "12/30/1899 00:00:00". I wanted to sort and get the most recent date, but these values refuse to sort. Some notes:
Everything is in datevalue format;
I've tried a workaround using SORT(), which didn't work (it just returned the unsorted range);
Currently using this as a workaround: RIGHT(TEXTJOIN(",",TRUE,FILTER('Sheet1'!C1:DU1,'Sheet1'!C2:DU2=1))), but this is causing issues because I can't use SORT().
I am using a lot of helper sheets and imported ranges (I did read that this can cause problems, but it's not clear on if those are fixable issues.)
This formula does return the values I'm asking for; I just want it to sort, which I know I can't do with string values.
I'm probably missing something huge. Please help!
Sorry—here's a link to a dummy version of the Google Sheet: https://docs.google.com/spreadsheets/d/1NmdEgnDU0fHTeRWpagrr1niORnlfwZO8uYCkSYrfFLA/edit?usp=sharing
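For reference, the sum in the formula above is a whole-day serial plus a fraction of a day, and that number only sorts correctly while it stays numeric. A minimal Python sketch of the arithmetic, assuming serial 0 is 1899-12-30 as in the DATEVALUE(0) output quoted above (to_serial and the sample timestamps are made up):

```python
from datetime import datetime

SHEETS_EPOCH = datetime(1899, 12, 30)  # what serial 0 displays as


def to_serial(dt):
    """Days since the epoch, with the time of day as a fraction (hypothetical helper)."""
    return (dt - SHEETS_EPOCH).total_seconds() / 86400


breakfast = to_serial(datetime(2019, 5, 2, 8, 0))   # date part + 8/24
later = to_serial(datetime(2019, 10, 1, 8, 0))

# As numbers, the most recent timestamp wins.
newest_number = max(breakfast, later)

# As text, comparison is character by character, so "5/..." beats "10/...".
newest_text = max("5/2/2019 8:00:00", "10/1/2019 8:00:00")

print(breakfast, later, newest_number == later, newest_text)
```

If the imported cells turn out to be text rather than numbers, that would be consistent with MAX() falling back to 0 and SORT() not ordering them chronologically, but that is a guess that needs checking against the sheet.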

Problems changing date format of date as string

I am currently trying to convert a yyyymmdd type of date to a ddmmyyyy format.
I've tried using DATE function and something like this:
=DATE(LEFT(A3;4);MID(A3;5;3);RIGHT(A3;7))
Original date for this function is 20120401 but the formula returns: 16.12.2104.
I've tried using functions YEAR, MONTH and DAY instead of LEFT, MID and RIGHT but that's not the right way.
I also tried using DATEVALUE but since Excel probably doesn't recognize yyyymmdd as a date, it gives me a #VALUE! error.
I've found a couple of solutions for SQL but it isn't my strong side yet - and this should be achievable easily in Excel.
Is there a way to do this in Excel?
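Applying =DATE(LEFT(A3;4);MID(A3;5;3);RIGHT(A3;7)) to 20120401 in A3 effectively reassembles the component characters as 2012, 040 and 0120401: MID(A3;5;3) takes three characters for the month and RIGHT(A3;7) takes seven for the day, so both pick up extra digits.
DATE then treats these as a month of 40 and a day of 120401 and rolls them over into a date far in the future.
The solution was simply to exclude the extra digits with: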
=DATE(LEFT(A3;4);MID(A3;5;2);RIGHT(A3;2))
but the locale might have been a factor and when dealing with date serial numbers and/or what appears to be a date but is a string various other issues might have been involved.
Consider:
=DATE(LEFT(A1,4),MID(A1,5,2),RIGHT(A1,2))
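The slicing in both formulas can be reproduced in Python to see exactly which characters each one picks up. This is just an illustration of the string handling, not of how Excel's DATE rolls oversized values over.

```python
from datetime import datetime

raw = "20120401"

# Slices equivalent to LEFT(A3;4), MID(A3;5;3), RIGHT(A3;7): too many digits.
print(raw[:4], raw[4:7], raw[-7:])   # 2012 040 0120401

# Slices equivalent to LEFT(A3;4), MID(A3;5;2), RIGHT(A3;2).
print(raw[:4], raw[4:6], raw[-2:])   # 2012 04 01

# Parsing the whole string in one step and writing it back as ddmmyyyy.
parsed = datetime.strptime(raw, "%Y%m%d").date()
print(parsed.strftime("%d%m%Y"))     # 01042012
```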

Any solution to the Today Calculated Column problem in SharePoint?

I would like to be able to use today's date in a calculated column in a SharePoint list to, for example, determine whether a task is overdue. There is a well-documented trick that involves creating a dummy column named "Today," using it in a formula, and then deleting it, thereby "tricking" SharePoint into using the Today function.
The problem is that this method does not work reliably -- the calculation is not dynamic; it is only made when the item is saved, and therefore the Today "column" effectively becomes the Modified Date. (This is probably why SharePoint won't let you use the Today function in a straight-forward way.)
Has anyone found a solution that works? I know I can use javascript to get the actual date on the client side and display colors, flags, whatever, but I am looking for a "server side" solution.
For reference, the Today column trick and its problems are described fairly well at these two posts and associated comments:
http://blogs.msdn.com/cjohnson/archive/2006/03/16/552314.aspx and http://pathtosharepoint.wordpress.com/2008/08/14/calculated-columns-the-useless-today-trick/
There simply isn't a workaround for this. Because the values for the list are stored in the database and returned "as is" to other features such as the search crawler, a dynamic field cannot be created.
It is possible to create a custom field that will display the value using today's date in its calculation.
In addition to Christophe's (PathToSharePoint) article, this also covers the Today trick and why it doesn't work:
The Truth about using Today in calculated columns
There are a number of fudges, probably the best one is Dessie's console app (mentioned above by MNM)
Dynamically updating a SharePoint calculated column containing a Today reference
It's good but it's not perfect; for example, you may have to worry about different time zones.
Before going down this route you should ask yourself if you really, really need to do this. For example:
If you want a countdown (days overdue/days left to complete a task) then you can use SPD and an XSLT Data View web part
If you want a view to show overdue items or items created in the last X days etc. then you can use [Today] in a view's filter
If you create a Today column it needs to be updated. You can do that with either a timer job or by placing a jquery script on a page that is hit by the user. The script could call SPServices.SPUpdateMultipleListItems to do the update. Pass a CAML clause so that you only update the list items where the Today value needs to be updated, e.g. once per day.
My advice is to create your own field that does this calculation for you and then reference it in your SharePoint list. Not a simple implementation, but it would work.
I have been looking for a solution as well, still no luck. The Today column trick has the limitation of not being dynamic.
I do have one suggestion, though: why don't we create a timer job that will update a certain column with the current date every day at 12 AM? I know some of you might think it is overhead. Just my suggestion :D!!
I came up with a very rough but working solution to this problem without having to do any coding. I'll explain both how I made the today column and how I worked that into an overdue column, because that column was a pain to figure out as well.
First, I made a column named "today" (gasp!). Next I made a column named "Days Overdue". I then opened up SharePoint Designer and created a new workflow. I set it to run every time an item is edited/updated (keep in mind I turned off versioning for this list, otherwise I would have had to resort to coding to avoid a bunch of useless data building up on our server). I set the actions to simply store the modified date in a workflow variable, then change the value of the today column to that variable. Although the modified column is a date/time and my today column is just a date, it transfers just fine. I then set the workflow to pause for 2 hours. You can set this to whatever amount of time you want, obviously; it will just change the latest possible time for your today column to update, i.e. 2 AM in my case.
On to the days overdue column. This is the code for that guy:
=IF([Due Date]>Today,"None",IF([Date Closed]=0,Today-[Due Date],IF([Due Date]>[Date Closed],"None",IF(Today>=[Date Closed],[Date Closed]-[Due Date],IF([Due Date]<Today,Today-[Due Date])))))
This shows the days overdue as a number of days, or if it's not overdue, it shows "None". You can use either a number format or a string format, but NOT A DATE FORMAT. Well, I hope this helps anyone who is running into this problem and doesn't want to have to delve into coding.
EDIT: I forgot to say that in the code above for the days overdue column, I put in that if today is past the date closed, it uses the date closed minus the due date instead of today minus the due date, to ensure that the calculation doesn't keep running after an item has been closed. You probably would have noticed that in the code, but I felt I should point it out just in case.
EDIT 2: The code I had in before my 2nd edit for my calculated column didn't calculate the days overdue properly after an issue had been marked "closed." I put in the updated code. The last part of the code doesn't make sense, as it is the same logic as the beginning, but it worked so I didn't want to take any chances! :)
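To make the branch order of that nested IF easier to follow, here is a rough Python transcription. It is only a sketch: days_overdue is a made-up name, an empty [Date Closed] is represented by None, and the final branch returns None where the formula has no "else" value.

```python
from datetime import date


def days_overdue(due, today, closed=None):
    """Transcription of the nested IF above (hypothetical helper)."""
    if due > today:
        return "None"                  # not due yet
    if closed is None:                 # [Date Closed]=0 in the formula
        return (today - due).days
    if due > closed:
        return "None"                  # closed before it was due
    if today >= closed:
        return (closed - due).days     # stop counting once closed
    if due < today:
        return (today - due).days
    return None                        # the formula gives no value here


print(days_overdue(date(2024, 1, 1), date(2024, 1, 10)))                     # 9
print(days_overdue(date(2024, 1, 1), date(2024, 1, 10), date(2024, 1, 5)))   # 4
print(days_overdue(date(2024, 2, 1), date(2024, 1, 10)))                     # None (as text)
```

Written this way, the last two branches are only reached when the closed date lies after today, which fits the remark in EDIT 2 that the tail of the formula looks redundant.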
Peace.
I've used the following and had no problems.
Field Name: Overdue
Field Type: Calculated
Data Type Returned: Yes/No
Formula:
=AND([Due Date]<NOW(),Status<>"Completed",[Due Date]<>"")
Here is a workaround:
Create a date column called Today.
Use this column in your calculated formula (ignore the fact that the formula returns a wrong value).
After you are done with the formula, delete the Today column from your list.
For some reason it works this way! Now SharePoint treats the Today in your formula as today's date.
Note: If you decide you want to change the formula, you have to create the Today column again. Otherwise, it wouldn't recognize Today as a valid column.
I tried @Farzad's approach and it seems to be working perfectly. I wanted to do a custom count of Days Elapsed, so I added a calculated column. Previously I was using the difference between the Created Date and Modified Date columns, which only changed whenever a user updated the post, much to my dismay.
I now have a formula which works as I would want to and uses the Today column, and here it is for anyone who would like to use it. I also have a Status column on the basis of which a base of On Hold is used, and the remaining formula are based on the date difference of Today - Created.
=IF(Status="On Hold","On Hold",IF(AND(Today=Created,(DATEDIF(Created,Today,"D")=0)),"New",IF(AND(Today<>Created,(DATEDIF(Created,Today,"D")=0)),"New (updated)",IF(DATEDIF(Created,Today,"d")>3,"Need Update Immediately",IF(DATEDIF(Created,Today,"d")=1,"One day old",IF(DATEDIF(Created,Today,"d")=2,"Two days old",""))))))
Basically it's just a bunch of nested IF conditions which get me labels on the basis of which I can add a group to my view and filter out data if needed. Hope this helps anyone looking for an answer!
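For readers who find the nested IFs hard to scan, the same labelling can be sketched in Python. The function name age_label is made up, and DATEDIF(...,"D") is approximated here by the difference between calendar dates, which is an assumption about how it treats the time of day.

```python
from datetime import datetime


def age_label(status, created, today):
    """Rough equivalent of the nested IF above (hypothetical helper)."""
    if status == "On Hold":
        return "On Hold"
    days = (today.date() - created.date()).days   # stand-in for DATEDIF "D"
    if days == 0:
        return "New" if today == created else "New (updated)"
    if days > 3:
        return "Need Update Immediately"
    if days == 1:
        return "One day old"
    if days == 2:
        return "Two days old"
    return ""   # exactly three days old gets no label, as in the formula


created = datetime(2024, 1, 1, 9, 0)
print(age_label("Open", created, datetime(2024, 1, 1, 9, 0)))    # New
print(age_label("Open", created, datetime(2024, 1, 1, 17, 0)))   # New (updated)
print(age_label("Open", created, datetime(2024, 1, 5, 9, 0)))    # Need Update Immediately
```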
