BigQuery load parquet file - python-3.x

I am trying to load a Parquet file into a BigQuery table.
However, the date column, which is in yyyy-mm-dd format, is recognized as a STRING, and the following error occurs:
ERROR - Failed to execute task: 400 Provided Schema does not match Table my_prj.my_dataset.my_table. Field _P1 has changed type from DATE to STRING.
field_P1: 2022-10-05
Is there any way to solve it?

The first solution that comes to mind is to load the data with the pandas library in Python. That way you can convert the string column to a proper date type and load the data directly into BigQuery.
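A minimal sketch of that approach, assuming a local file named data.parquet and the table and column names from the question (my_prj.my_dataset.my_table, _P1); the file name and load settings are placeholders:

import pandas as pd
from google.cloud import bigquery

# Read the Parquet file and convert the yyyy-mm-dd string column to real dates.
df = pd.read_parquet("data.parquet")
df["_P1"] = pd.to_datetime(df["_P1"], format="%Y-%m-%d").dt.date

# Load the DataFrame into the existing table, declaring _P1 as DATE explicitly.
client = bigquery.Client(project="my_prj")
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("_P1", "DATE")],
    write_disposition="WRITE_APPEND",
)
client.load_table_from_dataframe(
    df, "my_prj.my_dataset.my_table", job_config=job_config
).result()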

Related

Automatic change in data type when reopening CSV file

After changing the data type of a column from General or Number to Text and saving as a CSV file (the column has numbers only), the data type changes back to General automatically when you reopen the file.
How do I stop it from changing automatically? I need the change kept in the CSV file for uploading to BigQuery.
Thanks.
I tried VBA, data transformation in Excel, the TEXT function, putting ' in front of the number, and the Text to Columns option.
CSV has no data types, but you can load into BigQuery with an explicit schema.
For example, using bq load with an inline schema string:
bq load --source_format=CSV mydataset.mytable ./myfile.csv schema:STRING,string:FLOAT
Or with a schema file:
bq load --source_format=CSV mydataset.mytable ./myfile.csv schema.json
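If you prefer to run the load from Python instead of the bq CLI, a rough equivalent with the google-cloud-bigquery client looks like this (the table, file, and column names are the same placeholders as in the commands above):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # drop this if the CSV has no header row
    schema=[
        bigquery.SchemaField("schema", "STRING"),
        bigquery.SchemaField("string", "FLOAT"),
    ],
)

# The destination uses the client's default project; fully qualify it if needed.
with open("./myfile.csv", "rb") as f:
    client.load_table_from_file(f, "mydataset.mytable", job_config=job_config).result()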

Getting ORC file format error "It doesn't match the specified format" in Hive

I am trying to write data to a Hive table in ORC format and I am getting the following error.
Job failed with message [pyspark.sql.utils.AnalysisException: The format of the existing table dbname.tablename is `HiveFileFormat`. It doesn't match the specified format `OrcDataSourceV2`.].
The following is the Hive write statement.
df.coalesce(1).write.partitionBy("date").mode('append').format('orc').saveAsTable(
    '{dbname}.tablename'.format(dbname=destination_database))
I am not sure what is causing the issue, but it throws the error above. What is the purpose of format() after the table name?
Thanks for your help; I really appreciate it.
Thanks, Bab
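The question above has no answer in this thread, but a commonly suggested workaround (an assumption, not taken from the original post) is to append with insertInto, which writes into the table as it was created and avoids the saveAsTable provider check. A rough PySpark sketch with placeholder names:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("orc-append")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition.mode", "nonstrict")  # needed for dynamic partition appends
    .getOrCreate()
)

df = spark.table("source_db.source_table")  # placeholder source DataFrame

# insertInto matches columns by position and keeps the existing table's
# HiveFileFormat/ORC layout, so no format is re-declared on write.
df.coalesce(1).write.mode("append").insertInto("dbname.tablename")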

Big Query is not able to convert String to Timestamp

I have a BigQuery table where one of the columns (publishTs) is a TIMESTAMP. I am trying to upload a Parquet file into the same table using the GCP UI BQ upload option, with the same column name (publishTs) but a String datatype (e.g. "2021-08-24T16:06:21.122Z"), and BQ is complaining with the following error:
I am generating the Parquet file using Apache Spark. I tried searching on the internet but could not find the answer.
Try to generate this column as INT64 - link
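A rough PySpark illustration of that idea (paths and names are placeholders; only the publishTs column comes from the question): parse the ISO-8601 string into a Spark timestamp before writing, so Parquet stores it as an INT64-backed TIMESTAMP that BigQuery can load into the TIMESTAMP column instead of a STRING:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("publishts-fix").getOrCreate()
df = spark.read.parquet("gs://my-bucket/input/")  # placeholder input path

# Parse strings like 2021-08-24T16:06:21.122Z; adjust the pattern if the
# actual values differ.
df = df.withColumn(
    "publishTs",
    F.to_timestamp("publishTs", "yyyy-MM-dd'T'HH:mm:ss.SSSX"),
)

df.write.mode("overwrite").parquet("gs://my-bucket/output/")  # placeholder output path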

Azure DF - when extracting a datetime from a database into a CSV it sometimes gets interpreted as a datetime2

When running an Azure Data Factory copy from CSV to a Synapse table we get intermittent Truncate errors. The destination table schema (in Synapse) is a mirror of the schema we originally extracted the data from.
What we find happening is that the original extract misinterprets a datetime as a datetime2 and renders the relevant field as: 2019-10-07 11:22:31.4400000
When we run the copy from Azure Data Lake Storage Gen2 to the mirrored Synapse Table this schema has the field as a datetime.
The copy function attempts a conversion from string (being CSV and all) into datetime (as that is the same as the originating table) but fails. (Error: Conversion failed when converting date and/or time from character string.)
Interestingly this issue is intermittent - the original datetime field is sometimes correctly rendered into the CSV as: 2019-10-07 11:22:31.440 (go figure).
We have limited desire to refactor all our SQL Db Schemas into datetime2 data types (for obvious reasons).
Anyone know if we are missing something here?
Try changing the mapping of the source column to Datetime:
1. Specify the date format "yyyy-MM-dd"
2. Run the pipeline
Alternatively:
1. Change the mapping of the date format to string
2. Use the stored procedure approach to insert/copy the data

Reading XLSX data in SpringBatch

I have an xlsx file that has to be read, and the date field needs to be written to a MySQL DateTime column.
The date in the Excel file is in the format "2018-08-06 16:32:58".
But when I read it using PoiItemReader and then convert it in a custom RowMapper, I get the exception below:
java.text.ParseException: Unparseable date: "1533553378000"
at java.text.DateFormat.parse(DateFormat.java:366)
at org.springframework.batch.item.excel.RowMapperImpl.mapRow(RowMapperImpl.java:63)
I feel that this is because PoiItemReader is not reading the date field correctly. Please note that I have tried converting it into a java.sql.Date using SimpleDateFormat.
Code: https://github.com/vishwabhat19/TimeWorkedData.git
Should I be using XSSFWorkbook instead? And if I do, how would I push this into a reader? My project is a Spring Batch project and it needs an InputReader object.
Thank you in advance.
