What to do if some datetime values 'don't match format'? - python-3.x

I am working on a big-data problem involving a large number of CSV files. The second column contains datetimes, and I just want to read the date from each row.
I used
dt1=list1[1][1]
dt_obj1=datetime.datetime.strptime(dt1, '%Y-%m-%d %H:%M:%S')
and after this
first_date=dt_obj1.date()
and it worked well.
The problem is that there are a few entries (just 10 out of over a million) that contain only a date instead of a datetime, so they don't match the format.
Do you have any ideas how I can read just the date from these entries (or ignore them)?

You can use the dateutil library. The advantage of using this library is that you don't have to worry about the format: its parser automatically detects the format of your data.
from dateutil.parser import parse
dt_1 = parse("Sat Oct 11 17:13:46 UTC 2003")

You can always use try/except to control how you read the values. Suppose you have all possible formats in a formats list; then, given the raw string dt1 from the question, you can do
dt = None
for fmt in formats:
    try:
        dt = datetime.datetime.strptime(dt1, fmt)  # dt1 is the raw string to parse
        break
    except ValueError:
        pass  # this format didn't match; try the next one
This ensures you only break out of the loop once a format matches; otherwise the remaining formats keep being tried.
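For the asker's data, where a handful of rows contain a bare date, a two-entry format list covers both cases. Here is a minimal sketch (the datetime format comes from the question; the bare-date format and the sample values are assumptions):
import datetime

formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d']  # full datetime first, bare date second

def parse_entry(s):
    """Return the date portion of s, trying each known format in turn."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    return None  # no format matched; the caller can log or skip this row

print(parse_entry('2016-12-15 09:30:00'))  # 2016-12-15
print(parse_entry('2016-12-15'))           # 2016-12-15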
Otherwise you can use the external dateutil library's parser.parse function, which can parse most common datetime strings without being told the format:
from dateutil import parser
print(parser.parse("1990-01-21 14:12:11"))
print(parser.parse("1990-01-21"))
#1990-01-21 14:12:11
#1990-01-21 00:00:00
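Since the asker only needs the date, calling .date() on the parsed result works for both kinds of entry:
first_date = parser.parse(dt1).date()  # handles '2016-12-15 09:30:00' and '2016-12-15' alike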

Related

Why are my dictionary datetime values writing into the csv file differently than how they print?

As my title states, I am trying to take the values from my dictionaries and write them into a .csv file. They are datetime values stored in ISO format, YYYY-MM-DD. However, after the first 5 or so lines in the file they go from YYYY-MM-DD format to MM/DD/YYYY format. For reference, the data these dictionaries were gathered from has inconsistent formats (but that is handled). When I simply print the values, they all print in YYYY-MM-DD format.
Here is a generic version of my code:
file = open(sys.argv[2], 'w')
for key in date_dict:
    print(date_dict[key])
    file.write('{}\n'.format(date_dict[key]))
How do I fix this so it is all in YYYY-MM-DD format?
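One way to guarantee a uniform YYYY-MM-DD representation is to format each value explicitly before writing. A minimal sketch, assuming the dictionary values are datetime.date or datetime.datetime objects (date_dict and the sys.argv[2] filename come from the question; the sample entries are made up):
import sys
import datetime

# stand-in for the asker's dictionary of date values
date_dict = {'a': datetime.date(2020, 1, 5),
             'b': datetime.datetime(2020, 11, 23, 14, 30)}

with open(sys.argv[2], 'w') as file:
    for key in date_dict:
        value = date_dict[key]
        if isinstance(value, datetime.datetime):
            value = value.date()  # drop any time component
        file.write('{}\n'.format(value.isoformat()))  # isoformat() is always YYYY-MM-DD
If the values are actually strings in mixed formats, they need to be parsed first. Also note that opening the resulting .csv in a spreadsheet program can make the dates display differently from the text actually stored in the file.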

How to extract the data from pandas dataframe between two dates?

Initial "ImportDate" datatype Initial Pandas Dataframe interested in "ImportDate
Problem statement -
I want to extract the data where "ImportDate" runs up to "1-1-2019", e.g. from start_date to 1-1-2019. I tried converting the column from "object" to "datetime64[ns]" and wrote the code as
df[df['ImportDate'].between(4/26/2018, 1/1/2019)]
But it resulted in an error while extracting the data:
"'>=' not supported between instances of 'str' and 'float'"
Can anyone help me deal with my problem statement?
My guess is that your inputs to the between function are not dates. You should try converting them:
df[df['ImportDate'].between(pd.to_datetime("4/26/2018"), pd.to_datetime("1/1/2019"))]
Or directly create date objects: datetime.date(2019, 1, 1) (do not forget to import datetime).
As stated, it would be easier to check if you can provide a piece of data.
Is the column you say is a datetime really a datetime? From the error you posted, it looks like it is not. Please check once more with df.dtypes. If it is not a datetime object, convert it to datetime with, for example, df['ImportDate'] = pd.to_datetime(df['ImportDate'], format='%d/%m/%y') (you will have to tweak the parameters to suit your data). Then you can do df[df['ImportDate'].between(start_date, end_date)]
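Putting both suggestions together, a minimal sketch (the column name comes from the question; the sample rows and the %d/%m/%Y format are assumptions):
import pandas as pd

df = pd.DataFrame({'ImportDate': ['26/04/2018', '15/07/2018', '01/01/2019', '03/02/2019']})

# convert the object (string) column to datetime64[ns] first
df['ImportDate'] = pd.to_datetime(df['ImportDate'], format='%d/%m/%Y')

# .between() now compares real timestamps instead of strings and floats
mask = df['ImportDate'].between(pd.to_datetime('2018-04-26'), pd.to_datetime('2019-01-01'))
print(df[mask])  # keeps the first three rows; 03/02/2019 falls outside the range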

datetime instead of str in read_excel with pandas

I have a dataset saved in an xls file.
In this dataset there are 4 columns that represent dates, in the format dd/mm/yyyy.
My problem is that when I read it in Python using pandas and the read_excel function, all the columns are read as strings except one, which is read as datetime64[ns], even if I specify dtype={'column': str}. Why?
Dates in Excel are frequently stored as numbers, which allows you to do things like subtract them, even though they may be displayed as human-readable dates like dd/mm/yyyy. Pandas handily takes those numbers and interprets them as dates, which lets you work with them more flexibly.
To turn them into strings, you can use the converters argument of pd.read_excel like so:
df = pd.read_excel(filename, converters={'name_of_date_column': lambda dt: dt.strftime('%d/%m/%Y')})
The strftime method lets you format dates however you like. Specifying a converter for your column lets you apply the function to the data as you read it in.
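Alternatively, you can let pandas keep the column as datetime64 and convert it to strings after reading. A sketch, assuming the column really does parse as datetimes:
df = pd.read_excel(filename)
df['name_of_date_column'] = df['name_of_date_column'].dt.strftime('%d/%m/%Y')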

Converting timestamp to date in excel and SAS import

I am trying to extract the date from a date timestamp in Excel. I currently have a data file with a mixture of date formats, including date-only values and date timestamps. This is causing me problems because when I import the data into SAS, it cannot read both date-only values and date timestamps in the same column.
I have tried in Excel converting the timestamp to a date using the following formula:
=DATEVALUE(DAY(E32) & "/" & MONTH(E32) & "/" & YEAR(E32))
This works in Excel and converts the dates so that they are all formatted the same, which gets around the timestamp issue. However, when I import the data into SAS, I get null values whenever the day is greater than 12, i.e. it is reading the date as mm/dd/yyyy. For example:
Excel Date    SAS Import Date
09/12/2016    09/12/2016
15/12/2016    #VALUE!
I tried reformatting this in Excel using the following to see if it would get around the issue:
=DATEVALUE(MONTH(E32) & "/" & DAY(E32) & "/" & YEAR(E32))
But then I get the same #VALUE! error, this time in Excel itself.
Can anyone suggest a formula to use in Excel that will get around this issue, or offer advice on importing the data into SAS?
It sounds like your Excel data is in DMY format, but SAS is using MDY. You can check SAS by running the following code:
proc options option=datestyle;
run;
If it is MDY, then change it (and if you're in the UK, ask your SAS admin to change the default setting):
options datestyle=DMY;
You can also check the locale value, which in the UK will be EN_GB. This value determines the datestyle value used when working with dates.
proc options option=locale;
run;
If you ask SAS to import from an XLSX file, it should be able to tell that the column contains dates, independent of which display format is attached to the cells. Make sure that all of the cells in a single column hold the same type of data and use the same display format.
CSV files are not Excel files, so there is no place to put a formula or any metadata about what type of data each column contains. If you use PROC IMPORT to read the CSV file, SAS has to guess at what type of data each column contains. If you are saving an Excel file as a CSV file for later reading into SAS or other software, format your date columns as yyyy/mm/dd in Excel to prevent the confusion that different defaults for month and day order can cause. Nobody uses YDM order.
Since a CSV file is just a text file, if you want complete control over how SAS reads the date strings, write the data step to read it yourself. You could run PROC IMPORT, recall the code it generates, and modify it to read your data. Or you could read the string into a character variable and then write your own statements to convert it, using, say, the INPUT() function.
If the column has some date values and some datetime values, you could try using the ANYDTDTE informat to pull out just the date part. That informat should properly handle 15/10/2016 even if your LOCALE setting is for the US or another location where dates are normally represented in MDY order rather than DMY order.
If your dates are consistently in DMY order, use the DDMMYY informat instead, to prevent the LOCALE setting from causing PROC IMPORT or the ANYDTDTE informat to convert 12/10/2016 to December 10th instead of October 12th. But if your text file actually has some rows with dates in month-first order and others in day-first order, you will need some extra information to tell December 10th apart from October 12th.

Type conversion failure in Access 2013

When importing data from a text (CSV) file into MS Access, I get a "Type conversion failure" error for one field. The field holds data in the date format "yyyy-mm-dd hh:nn:ss", and Access simply refuses to recognise it, placing #Num! or simply blank data there. The CSV file is huge, with 8 million rows, and cannot be opened in Excel to edit the date format. I am facing no problems with any other fields. Is there any way to avoid this error?
Use the Advanced... button at the field specification step of the import and try these settings:
[screenshot of the Import Specification dialog; it does not show this exact date format, but illustrates where the settings below are set]
Date Order should be YMD, because in your dates the year comes first, followed by the month and the day.
The date delimiter for your csv will be a dash (-), while the time delimiter should be the default colon (:). Make sure the Four Digit Years checkbox is checked, and I would also check the Leading Zeros in Dates checkbox, since your months and days are in mm and dd format respectively (i.e. they begin with 0 when single-digit).
If there are problematic dates in your csv, that is another problem that won't be easy to tackle. You may have to correct the dates manually in the csv before importing, or import the date as text and then create a new column that converts the text dates to date fields (and fix any problematic dates there).
There may be nothing wrong with the date format; some records may be empty or have invalid entries.
Or you may have missed specifying the separators and format for the date field during the import.
If you still have no luck, link the file and specify Text for the field. Then create a select query that uses the linked file as its source and use CDate to convert the text dates to true date values.
When done, change the query to an append or make-table query to import your data.
