I'm importing a CSV file which contains a datetime column. After importing the CSV, my data frame contains a Date column whose type is pandas.Series, and I need another column that will contain the weekday:
import pandas as pd
from datetime import datetime
data = pd.read_csv("C:/Users/HP/Desktop/Fichiers/Proj/CONSOMMATION_1h.csv")
print(data.head())
All the data look okay, but when I do the following:
data['WDay'] = pd.to_datetime(data['Date'])
print(type(data['WDay']))
# the output is
<class 'pandas.core.series.Series'>
The data is not converted to datetime, so I can't get the weekday.
The problem is that you need Series.dt.weekday, i.e. weekday accessed through the .dt accessor:
data['WDay'] = data['WDay'].dt.weekday
Without .dt, weekday is used on a DatetimeIndex (not your case) - DatetimeIndex.weekday:
data['WDay'] = data.index.weekday
Use data.dtypes to check the types of the columns: type(data['WDay']) will always report a pandas Series, while the dtype shows whether the conversion to datetime actually worked.
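For reference, a minimal end-to-end sketch of that advice (the CSV is replaced by an inline frame so the snippet is self-contained; the values are illustrative):
import pandas as pd

# Stand-in for the CSV contents
data = pd.DataFrame({'Date': ['2021-03-01 00:00', '2021-03-02 01:00']})

data['Date'] = pd.to_datetime(data['Date'])   # column dtype becomes datetime64[ns]
data['WDay'] = data['Date'].dt.weekday        # 0 = Monday ... 6 = Sunday

print(data.dtypes)          # shows datetime64[ns] for Date and an integer dtype for WDay
print(type(data['WDay']))   # still a Series -- type() reports the container, not the dtype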
I have a pandas data frame that contains Month and Year values in yyyy-mm format. I am using pd.to_sql to set the data types and send it to a .db file.
I keep getting error:
sqlalchemy.exc.StatementError: (builtins.TypeError) SQLite Date type only accepts Python date objects as input.
Is there a way to set a 'Date' data type for the 'MonthYear' (yyyy-mm) column? Or should it be set as a VARCHAR? I tried changing it to different pandas datetime data types, but none of them seem to work.
I don't have any issues with 'full_date', it assigns it properly. Data type for 'full_date' is datetime64[ns] in pandas.
MonthYear full_date
2015-03 2012-03-11
2015-04 2013-08-19
2010-12 2012-06-29
2012-01 2018-01-01
df.to_sql('MY_TABLE', con=some_connection,
dtype={'MonthYear':sqlalchemy.types.Date(),
'full_date':sqlalchemy.types.Date()})
My opinion is that you shouldn't unnecessarily store the extra column in your database when you can derive it from the 'full_date' column.
One issue you'll run into is that SQLite doesn't have a DATE type. So, you need to parse the dates upon extraction with your query. Full example:
import datetime as dt
import numpy as np
import pandas as pd
import sqlite3
# I'm using datetime64[ns] because that's what you say you have
df = pd.DataFrame({'full_date': [np.datetime64('2012-03-11')]})
con = sqlite3.connect(":memory:")
df.to_sql("MY_TABLE", con, index=False)
new_df = pd.read_sql_query("SELECT * FROM MY_TABLE;", con,
parse_dates={'full_date':'%Y-%m-%d'})
Result:
In [111]: new_df['YearMonth'] = new_df['full_date'].dt.strftime('%Y-%m')
In [112]: new_df
Out[112]:
full_date YearMonth
0 2012-03-11 2012-03
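If you do want to keep the MonthYear column in the table, the VARCHAR route raised in the question is the straightforward fallback: declare it as a string type and only format it back into dates when you read it. A minimal sketch of that idea, assuming a SQLAlchemy engine (the in-memory database is just for illustration):
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')   # illustrative in-memory SQLite database

df = pd.DataFrame({'MonthYear': ['2015-03'],
                   'full_date': [pd.Timestamp('2012-03-11')]})

# Store MonthYear as TEXT (VARCHAR) and keep full_date as a DATE
df.to_sql('MY_TABLE', con=engine, index=False,
          dtype={'MonthYear': sqlalchemy.types.String(),
                 'full_date': sqlalchemy.types.Date()})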
I am new to Python and am exploring how to get data into Excel with it; I found the pandas library for this.
I need to get the rates from an HTML table on a website and then dump them into an Excel file.
I have used the following code:
import pandas as pd
from datetime import datetime
import lxml as lx
import openpyxl as oxl
url = "https://www.example.com"
tables = pd.read_html(url)
table = tables[0]
table.to_excel('output.xlsx')
The dates are in dd mmm yyyy format in the 'Effective Date' column
I would like to convert them to the dd/mm/yyyy format
I used the following code to convert the column:
table['Effective Date'] = pd.to_datetime(table['Effective Date'],
    infer_datetime_format=False, format='%d/%m/%Y', errors='ignore')
but it fails to convert the dates in the column. Could someone point me in the right direction, please?
Here is the complete code
import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl
url = "https://www.rba.gov.au/statistics/cash-rate/"
tables = pd.read_html(url)
table = tables[0]
table['Effective Date'] = pd.to_datetime(table['Effective Date'],
infer_datetime_format=False, format='%d/%m/%Y', errors='ignore')
table.to_excel('rates.xlsx')
You need to use pd.ExcelWriter to create a writer object so that you can change to a Date format WITHIN Excel; however, this problem has a few different aspects to it:
1. You have non-date values in your date column, including "Legend:", "Cash rate decreased", "Cash Rate increased", and "Cash rate unchanged".
2. As mentioned in the comments, you must pass format='%d %b %Y' to pd.to_datetime(), as that is the date format you are converting FROM.
3. You must pass errors='coerce' in order to return NaT for the values that don't match that format.
4. For the pd.to_datetime() line of code, you must add .dt.date at the end, because the writer object created later uses a date_format parameter rather than a datetime_format parameter. However, you could also leave out .dt.date and set the datetime_format parameter instead.
5. Then do table = table.dropna() to drop any rows containing NaT.
6. Pandas does not change the date format WITHIN Excel. If you want to do that, create a writer object and pass date_format (the code below uses the xlsxwriter engine). Note that you CANNOT simply do pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.strftime('%m/%d/%y') or .dt.strftime('%d/%m/%y'), because that produces a "General" format in Excel.
7. The output is ugly if you do not widen the columns, so I've included code for that as well. Please note that I am on a USA locale, so passing d/m/yyyy creates a "Custom" format in Excel.
NOTE: In my code, I have to pass m/d/yyyy in order for a "Date" format to appear in Excel. You can simply change it to date_format='d/m/yyyy', since my computer uses a different locale (USA) than yours, and Excel uses the locale to decide its "Date" format.
import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl
url = "https://www.rba.gov.au/statistics/cash-rate/"
tables = pd.read_html(url)
table = tables[0]
table['Effective Date'] = pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.date
table = table.dropna()
writer = pd.ExcelWriter("rates.xlsx",
engine='xlsxwriter',
date_format='m/d/yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
table.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:E', 20)
# Close the Pandas Excel writer and output the Excel file.
writer.save()  # on newer pandas versions, use writer.close() instead
Unable to convert a DataFrame column to datetime format.
import pandas as pd
from datetime import datetime
Holidays = pd.DataFrame({'Date':['2016-01-01','2016-01-06','2016-02-09','2016-02-10','2016-03-20'], 'Expenditure':[907.2,907.3,904.8,914.6,917.3]})
Holidays['Date'] = pd.to_datetime(Holidays['Date'])
type(Holidays['Date'])
Output: pandas.core.series.Series
Also tried
Holidays['Date'] = Holidays['Date'].astype('datetime64[ns]')
type(Holidays['Date'])
But same output
Output: pandas.core.series.Series
I think you are getting a bit mixed up. The dtype of Holidays['Date'] is datetime64[ns]; type() will always report a pandas Series for a DataFrame column, no matter what it holds.
Here's how I am checking.
from datetime import datetime
import pandas as pd
Holidays = pd.DataFrame({'Date':['2016-01-01','2016-01-06','2016-02-09','2016-02-10','2016-03-20'], 'Expenditure':[907.2,907.3,904.8,914.6,917.3]})
print ('Before converting : ' , Holidays['Date'].dtypes)
Holidays['Date'] = pd.to_datetime(Holidays['Date'])
print ('After converting : ' ,Holidays['Date'].dtypes)
The output is:
Before converting : object
After converting : datetime64[ns]
Thought I would also share some additional information for you around types and dtypes; see the linked documentation on types and dtypes for more.
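To spell the distinction out: type() reports the container, which is always a pandas Series for a DataFrame column, while .dtype (or .dtypes) reports what the column actually holds, and that is what pd.to_datetime() changes. A small check using the same frame:
import pandas as pd

Holidays = pd.DataFrame({'Date': ['2016-01-01', '2016-01-06']})
Holidays['Date'] = pd.to_datetime(Holidays['Date'])

print(type(Holidays['Date']))   # <class 'pandas.core.series.Series'> -- the container
print(Holidays['Date'].dtype)   # datetime64[ns] -- the element type, which is what changed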
I am downloading data from FXCM with fxcmpy. In the index column I would like to have only the time, without the date. How can this be done?
This is the code:
import fxcmpy
import pandas as pd
import matplotlib.pyplot as plt
con = fxcmpy.fxcmpy(config_file='fxcm.cfg', server='demo')
# To check if the connection is established
if con.is_connected:
    print('Connection is established')
else:
    print('Error in connecting to the server')
data = con.get_candles('USD/JPY', period='m5', number=500)
con.close()
Assuming that your index is already a DatetimeIndex, simply choose the time part from the index:
data.index = data.index.time
If it is not (say, it is a string), convert it to DatetimeIndex first:
data.index = pd.DatetimeIndex(data.index)
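A small self-contained illustration of that approach (the column name and index values here are made up for the example, standing in for the candle data):
import pandas as pd

data = pd.DataFrame({'bidclose': [108.51, 108.53]},
                    index=['2019-11-21 13:10:00', '2019-11-21 13:15:00'])

data.index = pd.DatetimeIndex(data.index)   # convert the string index to a DatetimeIndex
data.index = data.index.time                # keep only the time part
print(data)                                 # index now shows 13:10:00 and 13:15:00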
You have to make sure your df['Index'].dtype is the pandas datetime dtype, dtype('<M8[ns]'). Then you can use the following format to extract the time (refer to this answer):
df['Index'].dt.strftime('%H:%M:%S')
One way is converting the object to datetime and then extracting the time from it:
from datetime import datetime as dt
date="2019-11-21 13:10:00"
fmt="%Y-%m-%d %H:%M:%S"
print(dt.strptime(date,fmt).time())
output
13:10:00
I am reading in some Excel data that contains datetime values stored as '8/13/2019 4:51:00 AM' and formatted as '4:51:00 AM' in Excel. I would like to have a data frame that converts the values to timestamps formatted as '4:51 AM', i.e. '%H:%M %p'.
I have tried using datetime strptime, but I don't believe I have been using it correctly. None of my attempts have worked, so I have left them out of the code below. The two columns I would like to convert are 'In Punch' and 'Out Punch'.
import pandas as pd
import pymssql
import numpy as np
import xlrd
import os
from datetime import datetime as dt
rpt = xlrd.open_workbook('OpenReport.xls', logfile=open(os.devnull,'w'))
rpt = pd.read_excel(rpt, skiprows=7)[['ID','Employee','Date/Time','In Punch','Out Punch',
'In Punch Comment','Out Punch Comment', 'Totaled Amount']]
rpt
Any suggestions will be greatly appreciated. Thanks
EDIT:
Working with the following modifications now.
rpt['In Punch'] = pd.to_datetime(rpt['In Punch']).dt.strftime('%I:%M %p')
rpt['Out Punch'] = pd.to_datetime(rpt['Out Punch']).dt.strftime('%I:%M %p')
Try working with datetime inside pandas. The question "Convert Pandas Column to DateTime" has some good suggestions that could help you out.
rpt['In Punch'] = pd.to_datetime(rpt['In Punch'])
Then you can do all sorts of lovely tweaks to a datetime. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
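Putting those pieces together, a minimal sketch of the conversion the EDIT above settled on, with made-up punch strings standing in for the Excel data:
import pandas as pd

rpt = pd.DataFrame({'In Punch': ['8/13/2019 4:51:00 AM'],
                    'Out Punch': ['8/13/2019 1:02:00 PM']})

# Parse the raw strings, then format as zero-padded hour:minute with AM/PM
rpt['In Punch'] = pd.to_datetime(rpt['In Punch']).dt.strftime('%I:%M %p')
rpt['Out Punch'] = pd.to_datetime(rpt['Out Punch']).dt.strftime('%I:%M %p')
print(rpt)   # In Punch becomes 04:51 AM, Out Punch becomes 01:02 PM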