Converting datetime.datetime into Timestamp in pandas - python-3.x

I have a column in pandas which contains datetime.datetime objects. For instance, the rows have the following format:
datetime.datetime(2017, 12, 31, 0, 0)
I want to convert this to a Timestamp so that I get:
Timestamp('2017-12-31 00:00:00')
as output. How does one do this?

Try: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_timestamp.html
So maybe: df['datetime'].to_timestamp()
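Note that DataFrame.to_timestamp applies to period data, so it may not fit here; for a column of plain datetime.datetime objects, pd.to_datetime is the usual route. A minimal sketch, with a made-up 'datetime' column:

import datetime
import pandas as pd

df = pd.DataFrame({'datetime': [datetime.datetime(2017, 12, 31, 0, 0)]})

# pd.to_datetime converts the column to datetime64; the individual
# elements come back as pandas Timestamps.
df['datetime'] = pd.to_datetime(df['datetime'])
print(repr(df['datetime'].iloc[0]))
# Timestamp('2017-12-31 00:00:00')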

Related

I have a column date of the form '20041230'. I want to convert this column to the form 2004-12-30

I have a column named "Date" which has values of the form '20041230'.
How do I convert this to 2004-12-30 in pandas?
I tried applying pd.to_datetime to the column, but I am getting garbage values attached to the date.
A safe method, if you want strings, would be:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')
For a datetime dtype, use normalize:
df['Date'] = pd.to_datetime(df['Date']).dt.normalize()
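A quick check of the two variants on a made-up frame:

import pandas as pd

df = pd.DataFrame({'Date': ['20041230', '20050101']})

# String result, e.g. '2004-12-30'
as_strings = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')

# datetime64 result, normalized to midnight
as_datetimes = pd.to_datetime(df['Date']).dt.normalize()

print(as_strings.iloc[0])    # 2004-12-30
print(as_datetimes.iloc[0])  # 2004-12-30 00:00:00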

How to convert excel date to numeric value using Python

How do I convert the Excel date format to a number in Python? I'm importing a number of Excel files into a pandas DataFrame in a loop, and some values are formatted incorrectly in Excel. For example, the number column is imported as a date, and I'm trying to convert this date value back into a numeric value.
Original               New
1912-04-26 00:00:00    4500
How do I convert the date value in original to the numeric value in new? I know this code can convert numeric to date, but is there any similar function that does the opposite?
df.loc[0, 'Date'] = xlrd.xldate_as_datetime(df.loc[0, 'Date'], 0)
I tried to specify the data type when I read in the files and also tried to simply change the data type of the column to 'float' but both didn't work.
Thank you.
I found that the number is the count of days since Excel's epoch of 1900-01-00; the base date below is 1900-01-01 minus two days, which covers the zero-based epoch and Excel's spurious 1900 leap day.
The following code calculates how many days passed from that epoch until the given date.
import pandas as pd
from datetime import datetime, timedelta

df = pd.DataFrame(
    {
        'date': ['1912-04-26 00:00:00'],
    }
)
print(df)
#                   date
# 0  1912-04-26 00:00:00

def date_to_int(given_date):
    given_date = datetime.strptime(given_date, '%Y-%m-%d %H:%M:%S')
    # Excel's day 1 is 1900-01-01; subtracting two days (effective epoch
    # 1899-12-30) also absorbs Excel's nonexistent 1900-02-29.
    base_date = datetime(1900, 1, 1) - timedelta(days=2)
    delta = given_date - base_date
    return delta.days

df['date'] = df['date'].apply(date_to_int)
print(df)
#    date
# 0  4500
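A vectorized sketch of the same idea, using the 1899-12-30 base discussed above instead of applying a function row by row:

import pandas as pd

df = pd.DataFrame({'date': ['1912-04-26 00:00:00']})

# Whole days elapsed since Excel's effective epoch (1899-12-30).
df['date'] = (pd.to_datetime(df['date']) - pd.Timestamp('1899-12-30')).dt.days
print(df)
#    date
# 0  4500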

How to convert all float columns in a dataframe except the first column?

I have searched but not found exactly what I need. I have a dataframe which has 50 columns. The first one is a date dtype, the rest are float dtypes.
Now I want to convert ONLY the float columns into integers but NOT the date column.
Can someone guide me, please?
When I slice the df like this, df_sub1 = df_sub.iloc[:, 1:].apply(np.int64), and then concat with the date column afterwards, it crashes my laptop, so that did not work. I hope there is a better way.
Well, assuming that date is your first column:
import pandas as pd
cols = df.columns
df[cols[1:]] = df[cols[1:]].apply(pd.to_numeric, errors='coerce')
You can do it like this:

import numpy as np

new_df = df.drop('nameoffirstcolumn', axis=1)
new_df = new_df.apply(np.int64)

then you can do something like:

final_df = pd.concat([df['nameoffirstcolumn'], new_df], axis=1)
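Putting the pieces together on a toy frame (the column names here are invented), with an explicit cast so the non-date columns really end up as integers:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-01', '2021-01-02']),
    'a': [1.0, 2.0],
    'b': [3.5, 4.5],
})

cols = df.columns
# Coerce everything after the first column to numeric, then cast to int64
# (the cast assumes the coercion produced no NaNs).
converted = df[cols[1:]].apply(pd.to_numeric, errors='coerce').astype(np.int64)
df = pd.concat([df[[cols[0]]], converted], axis=1)

print(df.dtypes)
# date    datetime64[ns]
# a                int64
# b                int64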

Convert a field content to Pandas DataFrame

I have a pandas dataframe in which the field '_source' contains a JSON structure. I'd like to convert this field into another dataframe with the corresponding columns.
The type of this field is Series:
type(df['_source'])
pandas.core.series.Series
What is the best way to convert this field ('_source') into a pandas DataFrame?
You can use these lines of code to convert '_source' into the corresponding columns:

import json

subdf = df['_source'].apply(json.loads)
pd.DataFrame(subdf.tolist())
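A self-contained sketch of that answer, assuming '_source' holds JSON strings (if it already holds dicts, skip the json.loads step):

import json
import pandas as pd

df = pd.DataFrame({'_source': ['{"a": 1, "b": "x"}', '{"a": 2, "b": "y"}']})

# Parse each JSON string into a dict, then expand the dicts into columns.
subdf = df['_source'].apply(json.loads)
expanded = pd.DataFrame(subdf.tolist())
print(expanded)
#    a  b
# 0  1  x
# 1  2  y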

Convert a Spark dataframe column from string to date

I have a Spark dataframe I built from a SQL context.
I truncated a datetime field using DATE_FORMAT(time, 'Y/M/d HH:00:00') AS time_hourly.
Now the column type is string. How can I convert a string DataFrame column to a datetime type?
You can use trunc(date, format) so that you don't lose the date datatype.
There is a to_date function to convert a string to a date.
Assuming that df is your dataframe and the column to be cast is time_hourly, you can try the following:
from pyspark.sql.types import DateType
df.select(df.time_hourly.cast(DateType()).alias('datetime'))
For more info please see:
1) the documentation of "cast()"
https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html
2) the documentation of data-types
https://spark.apache.org/docs/1.6.2/api/python/_modules/pyspark/sql/types.html
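A self-contained sketch on a modern Spark (2.2+; the one-row DataFrame here is invented). Since the question's strings are slash-separated, to_date with an explicit pattern is likely safer than a plain cast, which expects a format Spark already recognizes such as 'yyyy-MM-dd':

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName('string_to_date_sketch').getOrCreate()
df = spark.createDataFrame([('2017/12/31 14:00:00',)], ['time_hourly'])

# Parse the string with an explicit pattern, producing a DateType column.
parsed = df.select(F.to_date(F.col('time_hourly'), 'yyyy/M/d HH:mm:ss').alias('datetime'))
parsed.show()
# +----------+
# |  datetime|
# +----------+
# |2017-12-31|
# +----------+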
