How to handle dates in cx_oracle using python? - python-3.x

I'm trying to access an Oracle table using the cx_Oracle module and convert it to a dataframe. Everything is fine except that a couple of date columns have values like "01-JAN-01", which Python reads as datetime.datetime(1, 1, 1, 0, 0), so after creating the dataframe they show as 0001-01-01 00:00:00. I am expecting 2001-01-01 00:00:00. Please help me with this. Thanks in advance.

You have a couple of choices. You could
* Retrieve it from the Oracle database with [read_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html), using TO_CHAR in the query to return the date in a format that pandas parses unambiguously (see the sketch below), or
* Retrieve it from the database as a string (as above) and then convert it into a date on the pandas side.
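For example, here is a minimal sketch of the first option. The table name, column names, and connection string are placeholders, and it assumes a reachable Oracle instance:

import cx_Oracle
import pandas as pd

# Connection details are placeholders.
conn = cx_Oracle.connect("user/password@host:1521/service_name")

# Let Oracle render the date as text in an unambiguous format.
query = """
    SELECT id,
           TO_CHAR(created_dt, 'YYYY-MM-DD HH24:MI:SS') AS created_dt
    FROM my_table
"""

df = pd.read_sql(query, con=conn)

# Convert the string column back into a proper pandas datetime.
df["created_dt"] = pd.to_datetime(df["created_dt"], format="%Y-%m-%d %H:%M:%S")
print(df.dtypes)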

Related

Eland loading pandas dataframe to elasticsearch changes date

Greetings Stackoverflowers,
I have been using eland to insert a pandas dataframe as an Elasticsearch document. The code used to make this happen is shown below and is strongly based on the one in the url:
import eland as ed

def save_to_elastic(data_df, elastic_engine, index_name, type_overrides_dict, chunk_size):
    """
    es_type_overrides={
        "fechaRegistro": "date",
        "fechaIncidente": "date"
    }
    """
    df = ed.pandas_to_eland(
        pd_df=data_df,
        es_client=elastic_engine,
        # Where the data will live in Elasticsearch
        es_dest_index=index_name,
        # Type overrides for certain columns, the default is keyword
        # name has been set to free text and year to a date field.
        es_type_overrides=type_overrides_dict,
        # If the index already exists replace it
        es_if_exists="replace",
        # Wait for data to be indexed before returning
        es_refresh=True,
        chunksize=chunk_size
    )
I have used it to insert the pandas dataframe into Elasticsearch as follows:
from snippets.elastic_utils import save_to_elastic, conect2elastic

es = conect2elastic(user='falconiel')
save_to_elastic(data_df=siaf_consumados_elk,
                type_overrides_dict={'fechaRegistro': "date",
                                     'fechaIncidente': "date"},
                elastic_engine=es,
                index_name='siaf_18032021_pc',
                chunk_size=1000)
Everything works fine, but once the document is in Elasticsearch, 26 dates have been inserted wrongly. All my data starts on January 1, 2015, yet Elasticsearch shows some documents dated December 31, 2014. I haven't been able to find an explanation for this. Why were some rows of the pandas dataframe, whose date fields are correct (from 2015-01-01 onwards), changed during loading to the last day of December of the previous year? I would appreciate any help or insight to correct this behavior.
My datetime columns in the pandas dataframe are typed as datetime. I have tried the following conversions before calling the function that saves the dataframe to Elasticsearch, but they have not solved the problem so far:
siaf_consumados_elk.fechaRegistro = pd.to_datetime(siaf_consumados_elk.fechaRegistro).dt.tz_localize(None)
siaf_consumados_elk.fechaRegistro = pd.to_datetime(siaf_consumados_elk.fechaRegistro, utc=True)
In fact the problem is UTC. I checked some of the rows in the pandas dataframe and they were shifted back by almost one day. For instance, one record registered on 2021-01-02 (GMT-5) appeared as 2021-01-01. The solution was to apply the corresponding time zone before calling the function that saves the dataframe as an Elasticsearch document/index. So, following the good observation given by Mark Walkom, this is what I used before calling the function:
siaf_consumados_elk.fechaRegistro = siaf_consumados_elk.fechaRegistro.dt.tz_localize(tz='America/Guayaquil')
siaf_consumados_elk.fechaIncidente = siaf_consumados_elk.fechaIncidente.dt.tz_localize(tz='America/Guayaquil')
A list with the corresponding time zones can be found at: python time zones
This allowed the timestamps to be indexed correctly.
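To see where the almost-one-day shift comes from, here is a small, self-contained illustration (the sample timestamp is made up):

import pandas as pd

# A naive wall-clock timestamp recorded in Ecuador (UTC-5), just after midnight on Jan 2.
naive = pd.Series(pd.to_datetime(["2021-01-02 00:30:00"]))

# If the naive value is stored as if it were UTC and later displayed in UTC-5,
# it shows up as the previous day:
print(naive.dt.tz_localize("UTC").dt.tz_convert("America/Guayaquil"))
# 2021-01-01 19:30:00-05:00  -> looks like January 1

# Localizing to the real time zone first keeps the local date intact once the
# value is converted to UTC for indexing:
print(naive.dt.tz_localize("America/Guayaquil").dt.tz_convert("UTC"))
# 2021-01-02 05:30:00+00:00  -> still January 2 in local time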

How to read a string column in the format ‘15Aug21:12:45:24’ as a timestamp in Teradata?

I have a character column in a Teradata table with a format like this: ‘15AUG21:06:38:03’. I need to convert this column into a timestamp so that I can use it in an ORDER BY clause. I am using Teradata SQL Assistant to read the data.
Use TO_TIMESTAMP:
SELECT TO_TIMESTAMP ('15AUG21:06:38:03', 'DDMONYY:HH24:MI:SS');
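If the raw strings end up in Python instead, the same format can be parsed on the pandas side. A minimal sketch (the column values are made up):

import pandas as pd

# Hypothetical column of Teradata-style strings such as '15AUG21:06:38:03'.
s = pd.Series(["15AUG21:06:38:03", "01JAN22:23:59:59"])

# %d = day, %b = abbreviated month name (matched case-insensitively), %y = two-digit year.
timestamps = pd.to_datetime(s, format="%d%b%y:%H:%M:%S")
print(timestamps)
# 0   2021-08-15 06:38:03
# 1   2022-01-01 23:59:59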

Date conversion in pyspark or sparksql

I currently have a field with the date format below:
3/2/2021 18:48
I need to convert it to 2021-03-02. I tried taking a substring and converting it to a date format, but it is not producing the desired output. Any suggestions would be helpful.
Below is an option if you are using Spark SQL:
from_unixtime(unix_timestamp('3/2/2021 18:48', 'M/d/yyyy'), 'yyyy-MM-dd')
The same functions are available in the DataFrame API as well:
https://spark.apache.org/docs/2.4.0/api/sql/index.html#from_unixtime
https://spark.apache.org/docs/2.4.0/api/sql/index.html#unix_timestamp
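For instance, a minimal PySpark sketch of the same conversion with the DataFrame API (the DataFrame, the column name event_dt, and the sample value are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("date-conversion").getOrCreate()
df = spark.createDataFrame([("3/2/2021 18:48",)], ["event_dt"])

converted = df.withColumn(
    "event_date",
    # Including the time part in the pattern avoids parse failures on newer Spark versions.
    F.from_unixtime(F.unix_timestamp("event_dt", "M/d/yyyy H:mm"), "yyyy-MM-dd"),
)
converted.show()  # event_date -> 2021-03-02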

How to load csv with datetime into postgresql using copy_from

I am trying to find a way to bulk load a CSV file into PostgreSQL. However, the data has a datetime column in the format "YYYY-MM-DD HH24:MI:SS". I couldn't find any documentation on how to bulk load a date column using the psycopg2 package in Python 3.x. Can I get some help on this? Thanks in advance.
I am able to load the data using the code below:
cur.copy_from(dataIterator, 'cmodm.patient_visit', sep=chr(31), size=8192, null='')
conn.commit()
However, only the date part got loaded into the table; the time part was set to midnight:
2017-04-13 00:00:00 2017-04-13 00:00:00 2017-04-12 00:00:00
After discussing with @Belayer, it was concluded that copy_from takes timestamp input values in the format 'YYYY-MM-DD HH:MI:SS'. If the source has some other format, it needs to be converted to that format before being fed into copy_from.
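A minimal sketch of that conversion step (the rows, table name, column layout, and connection string are placeholders, not the actual cmodm.patient_visit schema):

import io
from datetime import datetime

import psycopg2

rows = [
    (1, datetime(2017, 4, 13, 9, 30, 15)),
    (2, datetime(2017, 4, 12, 18, 5, 2)),
]

buf = io.StringIO()
for visit_id, visit_ts in rows:
    # Format the full timestamp explicitly so the time part is not lost.
    buf.write(f"{visit_id}{chr(31)}{visit_ts.strftime('%Y-%m-%d %H:%M:%S')}\n")
buf.seek(0)

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection details
cur = conn.cursor()
cur.copy_from(buf, "patient_visit", sep=chr(31), null="")
conn.commit()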

Error while writing data from python to redshift - Invalid date format - length must be 10 or more

I have a dataframe in Python where the date columns are of the datetime64[ns] data type. Now I am trying to write this dataframe to Redshift and I am getting the following stl_load_errors:
Invalid date format - length must be 10 or more
All my dates are in the 2016-10-21 format and thus have a length of 10. Moreover, I have ensured that no row has a malformed value like 2016-1-8, which has only 8 characters. So the error does not make sense.
Has anyone faced a similar error while writing data to Redshift? Any explanation?
Note:
Here's some context. I am running the Python script from EC2. The script writes the data in JSON format to an S3 bucket, and then this JSON is loaded into an empty Redshift table. The Redshift table defines the date columns with the 'date' type. I know there's another way using boto3/COPY, but for now I am stuck with this method.
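One way to make sure the serialized values really are 10-character dates is to convert the datetime64[ns] columns to plain 'YYYY-MM-DD' strings before writing the JSON that goes to S3. A small sketch of that serialization step (the column name, sample values, and file name are made up):

import pandas as pd

# Hypothetical frame with a datetime64[ns] column destined for a Redshift 'date' column.
df = pd.DataFrame({"event_date": pd.to_datetime(["2016-10-21", "2016-01-08"])})

# Serialize the date explicitly as a 10-character 'YYYY-MM-DD' string before writing JSON.
df["event_date"] = df["event_date"].dt.strftime("%Y-%m-%d")
df.to_json("events.json", orient="records", lines=True)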
