How to handle date of type 10-January-2023 in python and pandas
I need to convert a date from the format
10 January 2023 to the format 10-01-2023.
I have tried pd.datetime() but was not able to get the desired output.
Assuming you have a date string '10 January 2023', first convert it to a datetime object:
dt = '10 January 2023'
dt_py = datetime.strptime(dt, '%d %B %Y')
Then convert it back to the string format you want:
dt = dt_py.strftime('%d-%m-%Y')
So putting it all together:
from datetime import datetime
dt = '10 January 2023'
dt = datetime.strptime(dt, '%d %B %Y').strftime('%d-%m-%Y')
If the date strings are in a dataframe column, you can apply a lambda function that does the above:
df['Datetime'] = df['DateTime'].apply(lambda x: datetime.strptime(x, '%d %B %Y').strftime('%d-%m-%Y'))
You can also parse the column with pd.to_datetime using the format='%d %B %Y' argument. The result is a datetime column, which you can convert back to the string format you want with .dt.strftime('%d-%m-%Y'), so no lambda function is needed.
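As a minimal sketch of the to_datetime route, assuming a dataframe with a string column named DateTime (the sample rows and the Formatted column name are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the 'DateTime' column name comes from the answer above.
df = pd.DataFrame({"DateTime": ["10 January 2023", "05 March 2021"]})

# Parse the strings into real datetimes using the known input format.
parsed = pd.to_datetime(df["DateTime"], format="%d %B %Y")

# Render the datetimes back to strings in the target format.
# 'Formatted' is an illustrative column name, not one from the question.
df["Formatted"] = parsed.dt.strftime("%d-%m-%Y")

print(df["Formatted"].tolist())
```

Keeping the parsed datetime column around is usually worthwhile if you need to sort or filter by date later, since the dd-mm-yyyy strings do not sort chronologically.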
Related
string DateAndTime = Cells[1].Text; // Output is 3/18/2020 3:00:18 PM
DateTime DT = DateTime.ParseExact(DateAndTime, "dd/MM/yyyy HH:mm:ss", CultureInfo.CurrentCulture);
Error: string was not recognized as a valid datetime
Current string is this 3/18/2020 3:00:18 PM
I want to convert and parse it to DateTime as 18/03/2020 15:00:18
ParseExact does exactly that, parses the string using the exact specification you provide. And, per your specification, "18" isn't a valid month. It sounds like you want to swap the month and day identifiers (and only use M instead of MM for the month, and use h for the single-digit 12-hour clock, and add tt for the AM/PM specification):
DateTime DT = DateTime.ParseExact(DateAndTime, "M/dd/yyyy h:mm:ss tt", CultureInfo.InvariantCulture);
Once it's parsed as a DateTime you can output the value in any format you like. For example:
DT.ToString("dd/MM/yyyy HH:mm:ss")
But your input format very much is not "dd/MM/yyyy HH:mm:ss". For parsing you need to match the input format, not the intended downstream format.
DateTime DT = DateTime.Parse(DateAndTime, new CultureInfo("en-US"));
I have a mixed format column in a dataframe from a pd.read_csv(). There is a lot of information out there about datetime handling but I didn't find anything for this specific problem:
Two datetime types:
Custom dd/mm/yyyy hh:mm that shows up in excel as such: 10/03/2018 07:18
General that shows up in Excel as such: 8/13/2018 2:28:34 PM
I used :
df.Last_Updated = pd.to_datetime(df['Last_Updated'])
df = df.sort_values('Last_Updated').drop_duplicates(['Name'], keep='last')
But I get a mixed bunch where the custom format comes back as another datetime type:
yyyy-mm-dd hh:mm:ss and shows up in my Excel export as 2017-11-22 19:54:35
Upon checking, it changes the dd/mm/yyyy hh:mm format (02/09/2018 17:55:44) to yyyy-mm-dd hh:mm:ss (2018-02-09 17:55:44), and since I have to perform an exclusion of the type 'older than', it causes errors; in this particular case, a computer that had its last connection in September shows up as having it in February.
Does anyone know a way to unify the datetime format?
Date format:
from notepad:
X = "10/2/2018 10:07:31 PM"
Y = "8/13/2018 2:28:34 PM"
from CSV (and by opening the .txt via Excel):
X = 10/02/2018 22:07 PM
Y = 8/13/2018 2:28:34 PM
after datetime applied in code:
X = 02/10/2018 22:07:31
Y = 13/08/2018 14:28:34
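One possible way to unify such a column is to try a short list of explicit formats per value instead of letting the parser guess. The sketch below assumes that values carrying AM/PM are month-first and all others are day-first, which is only a reading of the samples above; the function name parse_mixed and the format list are illustrative and need adjusting to the real data.

```python
from datetime import datetime

# Assumed formats, tried in order. Check them against the real column first.
FORMATS = (
    "%m/%d/%Y %I:%M:%S %p",  # e.g. 8/13/2018 2:28:34 PM (month first, 12-hour)
    "%d/%m/%Y %H:%M:%S",     # e.g. 02/09/2018 17:55:44 (day first, with seconds)
    "%d/%m/%Y %H:%M",        # e.g. 10/03/2018 07:18 (day first, no seconds)
)

def parse_mixed(text):
    """Return a datetime for the first format that matches, else raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")

print(parse_mixed("8/13/2018 2:28:34 PM"))
print(parse_mixed("02/09/2018 17:55:44"))
```

On a dataframe this could be applied with df['Last_Updated'].apply(parse_mixed) before sorting, so that every value ends up as the same datetime type.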
I have some time series data from an API request, and when I do some data wrangling the error below pops up. The data wrangling is just some simple Pandas series math (not shown).
TypeError: unsupported operand type(s) for -: 'str' and 'str'
But when I save the data to a CSV:
elecMeter_df.to_csv('C:\\Python Scripts\\elecMeter_df.csv', sep=',', header=True, index=True, na_rep='N/A')
And then parse the dates with read_csv:
elecMeter_dfCSV = pd.read_csv('C:\\Python Scripts\\elecMeter_df.csv', index_col='Date', parse_dates=True)
I do not get the original error described above. Why is that? Am I getting the error because the time stamp is a string and I need to convert it into an integer format?
When I get the error, the index is in this format:
print(elecMeter_df.index)
But when I read the CSV file and parse the date column, there is no error in the data wrangling, and the index is in this format (no Chicago time zone reference):
print(elecMeter_df.index)
Any help or tips about time stamps and why this error happens would be greatly appreciated. Ultimately I am trying to avoid the read/write CSV round trip, but if it's the only way to avoid errors I'll just stick with that!
Not sure what code you are running to generate that error. However the time stamp probably needs to be converted from a string to a date time. Try using pd.to_datetime, additionally you can specify the format (list of options and meanings are provided below). The example I used for the format is year-month-day hour-minutes.
pd.to_datetime(df['column'], format='%Y-%m-%d %H:%M')
%a Locale’s abbreviated weekday name.
%A Locale’s full weekday name.
%b Locale’s abbreviated month name.
%B Locale’s full month name.
%c Locale’s appropriate date and time representation.
%d Day of the month as a decimal number [01,31].
%f Microsecond as a decimal number [0,999999], zero-padded on the left
%H Hour (24-hour clock) as a decimal number [00,23].
%I Hour (12-hour clock) as a decimal number [01,12].
%j Day of the year as a decimal number [001,366].
%m Month as a decimal number [01,12].
%M Minute as a decimal number [00,59].
%p Locale’s equivalent of either AM or PM.
%S Second as a decimal number [00,61].
%U Week number of the year (Sunday as the first day of the week)
%w Weekday as a decimal number [0(Sunday),6].
%W Week number of the year (Monday as the first day of the week)
%x Locale’s appropriate date representation.
%X Locale’s appropriate time representation.
%y Year without century as a decimal number [00,99].
%Y Year with century as a decimal number.
%z UTC offset in the form +HHMM or -HHMM.
%Z Time zone name (empty string if the object is naive).
%% A literal '%' character.
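The same format codes work with the standard library's datetime.strptime. The sketch below reproduces the error from the question with two time-stamp strings and then shows the subtraction working once they are parsed; the sample values and variable names are made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # year-month-day hour:minute, as in the example format above

# Hypothetical time stamps that are still plain strings.
start_text = "2019-03-01 08:15"
end_text = "2019-03-01 09:45"

# Subtracting strings fails with the TypeError from the question.
try:
    end_text - start_text
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# After parsing, the subtraction is well defined and yields a timedelta.
elapsed = datetime.strptime(end_text, FMT) - datetime.strptime(start_text, FMT)
print(elapsed)
```

This is also the likely reason the CSV round trip hides the problem: read_csv with parse_dates=True converts the index to datetimes on the way back in.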
I have a column called construction_year that holds the year as an integer. I want to convert it to dd-mm-yyyy format in Python. I have tried datetime and pandas to_datetime, and converting to a time stamp and extracting the format, but in vain.
Ex: I have year like 2013(int) I would like to convert it as 01-01-2013 in python 3.x.
Into a string
If you want to convert it into a string, you can simply use:
convert_string = '01-01-{}'.format
and then use it like:
>>> convert_string(2013)
'01-01-2013'
Into a datetime
If you want to convert it to a datetime object, you can simply use:
from datetime import date
from functools import partial
convert_to_date = partial(date,month=1,day=1)
Now convert_to_date is a function that converts a numerical year into a date object:
>>> convert_to_date(2013)
datetime.date(2013, 1, 1)
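If both the date object and the dd-mm-yyyy text are needed, the two approaches can be combined. This is a small sketch; the helper name year_to_ddmmyyyy is made up for illustration.

```python
from datetime import date

def year_to_ddmmyyyy(year):
    """Format the first day of the given year as dd-mm-yyyy."""
    # Build a real date first, then render it in the requested format.
    return date(year, 1, 1).strftime("%d-%m-%Y")

print(year_to_ddmmyyyy(2013))
```

For a pandas column, the same helper could be passed to df['construction_year'].apply(year_to_ddmmyyyy).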
I have a file with about 200 lines of dates. Each of the dates is in YYYYMMDD format. How can I separate out each month's data so I can get the averages for each month?
This is the best I've been able to figure out:
Dates = line.split()
Year = Dates[0][0:4]
Month = Dates[0][4:6]
Date = Dates[0][6:8]
Assuming your file looks similar to this:
20131001 20131005 20130101 20130202
20130109 20130702 20130503 20130701
20130712 20130401 20131101 20131123
Here is what I would do to get a list of all the months in the file:
with open('dates.txt') as f:
    lines = f.readlines()
months = [date[4:6] for line in lines for date in line.split()]
print(months)
To deal with the dates as actual datetime objects, use the datetime.strptime method to convert the date strings to datetime objects.
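A minimal sketch of that idea, using the sample dates above as an inline string instead of reading dates.txt, and grouping the parsed dates by month number. The names sample_text and by_month are illustrative. Computing averages would additionally need the values that go with each date, which the question does not show.

```python
from collections import defaultdict
from datetime import datetime

# Sample content in the same layout as the assumed file above.
sample_text = """20131001 20131005 20130101 20130202
20130109 20130702 20130503 20130701
20130712 20130401 20131101 20131123"""

# Map month number -> list of date objects that fall in that month.
by_month = defaultdict(list)
for token in sample_text.split():
    parsed = datetime.strptime(token, "%Y%m%d")  # YYYYMMDD
    by_month[parsed.month].append(parsed.date())

for month in sorted(by_month):
    print(month, len(by_month[month]))
```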