Parquet file generated using PolyBase does not contain column names - Azure

I have used the following query to create an external table in SQL Server 2016 with PolyBase.
CREATE EXTERNAL TABLE dbo.SampleExternal (
    DateId INT NULL,
    CalendarQuarter TINYINT NULL,
    FiscalQuarter TINYINT NULL)
WITH (LOCATION='/SampleExternal.parquet',
    DATA_SOURCE=AzureStorage,
    FILE_FORMAT=ParquetFile);
I inserted the data into the external table from a local table, and the Parquet file was successfully generated in the Azure container. But while reading the Parquet file, the column names are shown as col-0, col-1. Is there any way to add the original column names to the Parquet file, as given in the external table?

This seems to be 'as designed' in PolyBase. The consuming application has to map these numbered column names to meaningful column names. If the producing application is different from the consuming application, the two should agree on the column mapping.
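For example, if the consumer is another PolyBase external table over the same Parquet file, the binding appears to be positional (the question's own external table reads the file back under its declared names), so a reader-side table declared with the same column order recovers meaningful names regardless of the col-0, col-1 names stored in the file. A minimal sketch, reusing the data source and file format from the question:

-- Reader-side external table over the PolyBase-written file; columns are
-- bound by position, so these names stand in for col-0, col-1, col-2.
CREATE EXTERNAL TABLE dbo.SampleExternalReader (
    DateId INT NULL,
    CalendarQuarter TINYINT NULL,
    FiscalQuarter TINYINT NULL)
WITH (LOCATION='/SampleExternal.parquet',
    DATA_SOURCE=AzureStorage,   -- same external data source as the writer
    FILE_FORMAT=ParquetFile);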

Related

Hive Create Table Reading CSV from S3 data spill

I'm trying to create a Hive table from an external location on S3, backed by a CSV file.
CREATE EXTERNAL TABLE coder_bob_schema.my_table (column data type)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://mybucket/path/file.CSV'
The resultant table has data from fields n-x spilling over into field n, which leads me to believe Hive doesn't like the CSV. However, I downloaded the CSV from S3 and it opens and looks okay in Excel. Is there a workaround, like using a different delimiter?
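A frequent cause of this kind of spill-over is quoted fields containing embedded commas, which the plain delimited format does not understand; OpenCSVSerde does. A sketch of that workaround, with placeholder column names, assuming the file uses double-quote quoting (note that Hive expects LOCATION to point at a folder, not a single file):

-- Placeholder columns; OpenCSVSerde reads every column as STRING.
CREATE EXTERNAL TABLE coder_bob_schema.my_table (
    col_a STRING,
    col_b STRING,
    col_c STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION 's3://mybucket/path/';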

Creating a multiple CSV files from a Single Table in Azure Data Factory

I have a single SQL table named Employee. The Employee table has the following columns: 1. ID, 2. Designation, 3. Salary, 4. Joining Date, etc.
Now let's say there are 200 records in this Employee table, with 10 different designations of employee. I want to create CSV files from this table using Azure Data Factory, filtered by designation, i.e. 10 CSV files (because there are 10 different designations) should be generated and stored in the storage account in Azure.
Kindly guide me in this.
You would need to follow the flow below:
1. Lookup activity
   Query: Select distinct designations from the table
2. ForEach activity
   Input: @activity('LookUp').output.value
   a) Copy activity
      i) Source: dynamic query Select * from t1 where designation = @item().designation
This should generate separate files for each Designation as needed
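Written out as plain SQL, the two queries involved look roughly like this (the dbo.Employee table name and the 'Manager' value are illustrative; at runtime the second query is built from @item().designation inside the ForEach):

-- Lookup activity query: one row per distinct designation.
SELECT DISTINCT Designation
FROM dbo.Employee;

-- Copy activity source query for one ForEach iteration, e.g. when
-- @item().designation resolves to 'Manager'.
SELECT *
FROM dbo.Employee
WHERE Designation = 'Manager';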
For more details:
How to export from sql server table to multiple csv files in Azure Data Factory
Another way: use a Data Flow. Set Key as the partition type and choose Designation as the partition column at the Source. Then choose Pattern as the File name option and use employee[n].csv as the file name.
Details:
1. Set Key as the partition type and choose Designation as the partition column at the Source.
2. Choose Pattern as the File name option and use employee[n].csv as the file name.

Ingesting a CSV file through PolyBase without knowing the sequence of columns

I am trying to ingest a few CSV files from Azure Data Lake into Azure Synapse using PolyBase.
There is a fixed set of columns in each CSV file and the column names are given on the first line. However, the columns can appear in a different order from file to file.
In PolyBase, I need to declare an external table, for which I need to know the exact sequence of columns at design time, and hence I cannot create the external table. Are there other ways to ingest the CSV files?
I don't believe you can do this directly with PolyBase because, as you noted, the CREATE EXTERNAL TABLE statement requires the column declarations. At runtime, the CSV data is then mapped to those columns by position.
You could accomplish this easily with Azure Data Factory and Data Flow (which uses PolyBase under the covers to move the data to Synapse) by allowing the Data Flow to generate the table. This works because the table is generated after the data has been read, rather than before, as with an EXTERNAL table.
For the sink Data Set, create it with parameterized table name [and optionally schema]:
In the Sink activity, specify "Recreate table":
Pass the desired table name to the sink Data Set from the Pipeline:
Be aware that all string-based columns will be defined as VARCHAR(MAX).
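For reference, this is roughly what the plain PolyBase route looks like and why it breaks when the column order changes: FIRST_ROW = 2 only skips the header line, and the file's fields are bound to the declared columns by position rather than by name. Object and column names below are hypothetical, and an external data source named MyDataLake is assumed to already exist:

-- Hypothetical file format: the header row is skipped, never parsed.
CREATE EXTERNAL FILE FORMAT CsvWithHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)
);

-- Hypothetical external table; columns bind to the file ordinally.
CREATE EXTERNAL TABLE dbo.StagingSales (
    SaleId   INT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = CsvWithHeader
);
-- A file whose header reads SaleDate,SaleId,Amount would load values into the
-- wrong columns (or fail conversion), because the mapping is purely ordinal.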

Need to insert csv source data to Azure SQL Database

I have source data in CSV. I created a SQL table to insert the CSV data into. My SQL table has a primary key column and a foreign key column in it. I cannot skip these two columns while mapping in Data Factory. How do I overcome this and insert the data?
Please refer to the rules in Schema mapping in copy activity. An error occurs in these conditions:
- Source data store query result does not have a column name that is specified in the input dataset "structure" section.
- Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
- Either fewer columns or more columns in the "structure" of the sink dataset than specified in the mapping.
- Duplicate mapping.
So, if your CSV file does not cover all the columns in the SQL database, the copy activity can't work.
You could consider creating a staging table in the SQL database that matches your CSV file, then using a stored procedure to fill the actual table. Please refer to the detailed steps in this case to implement your requirement: Azure Data Factory mapping 2 columns in one column
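A minimal sketch of that staging pattern, with hypothetical table, column, and procedure names (the target Employee table is assumed to have an IDENTITY primary key and a DepartmentId foreign key resolved during the load; the copy activity writes only into the staging table):

-- Staging table mirrors the CSV columns exactly, with no PK/FK columns.
CREATE TABLE dbo.Employee_Staging (
    Name        VARCHAR(100),
    Designation VARCHAR(50),
    Salary      DECIMAL(18, 2)
);
GO
-- Stored procedure invoked after the copy to fill the real table.
CREATE PROCEDURE dbo.LoadEmployeeFromStaging
AS
BEGIN
    INSERT INTO dbo.Employee (Name, Designation, Salary, DepartmentId)
    SELECT s.Name,
           s.Designation,
           s.Salary,
           d.DepartmentId            -- foreign key resolved from a lookup table
    FROM dbo.Employee_Staging AS s
    JOIN dbo.Department AS d
        ON d.Designation = s.Designation;

    TRUNCATE TABLE dbo.Employee_Staging;
END;
GO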

How to map table from databricks to Azure Data Warehouse using external table?

I am trying to create an external table in Azure SQL Data Warehouse from a table in Azure Databricks. I fail to convert a few column types, such as date and decimal.
Example of my table structure in Databricks:
Table schema in Azure Data Warehouse:
CREATE EXTERNAL TABLE table.NAME (
aBooleanFlag BIT NULL
,bIntID int NULL
,cStringColumn VARCHAR(50)
,dDateColumns DATETIME null
,eMoneyAmount DECIMAL(13,3) null
)
WITH(DATA_SOURCE=[DS_DTS_LAKE], LOCATION=N'//Folder/Table/', FILE_FORMAT=[ParquetFileFormat], REJECT_TYPE=VALUE, REJECT_VALUE=0) ;
GO
What have I tried so far:
I've arranged the columns in alphabetical order, as I have noticed the external table sometimes does not map columns correctly.
I converted all columns to string and the table was created successfully.
I tried casting to different column data types such as DoubleType() or float without any luck. Instead, I get the error message below:
Error message:
Msg 106000, Level 16, State 1, Line 38
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException:
I'm considering creating the external table again with all columns being string and then creating a view on top of it with the proper conversions.
Please advise how to properly map the data types, or whether the view option is feasible.
I've found a solution just by accident. I've changed the conversion to the following:
from DecimalType(13,3) to DecimalType(24,10)
from DateType to TimestampType
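The view-based fallback mentioned in the question is also workable when the Databricks table is written out with all columns as strings: keep the external table all VARCHAR and convert on read. A minimal sketch, with hypothetical object names and only a few of the columns:

-- External table declares every column as a string.
CREATE EXTERNAL TABLE stage.TableRaw (
    aBooleanFlag VARCHAR(10) NULL
    ,dDateColumns VARCHAR(30) NULL
    ,eMoneyAmount VARCHAR(40) NULL
)
WITH(DATA_SOURCE=[DS_DTS_LAKE], LOCATION=N'//Folder/Table/', FILE_FORMAT=[ParquetFileFormat], REJECT_TYPE=VALUE, REJECT_VALUE=0);
GO
-- View applies the intended types on read.
CREATE VIEW dbo.TableTyped
AS
SELECT CAST(aBooleanFlag AS BIT)            AS aBooleanFlag,
       CAST(dDateColumns AS DATETIME)       AS dDateColumns,
       CAST(eMoneyAmount AS DECIMAL(13, 3)) AS eMoneyAmount
FROM stage.TableRaw;
GO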
