Azure Synapse - Select Insert

This is my first time working with Azure Synapse, and it seems SELECT ... INTO is not working. Is there a workaround where I just use a SELECT statement and dump the result into a temporary table?
Here is the error prompted:
The query references an object that is not supported in distributed processing mode.
And this is my query:
SELECT *
INTO #Temp1
FROM [dbo].[TblSample]
This is the Azure Synapse endpoint we are currently using:
ondemand-sql.azuresynapse.net

In Synapse On-Demand, the use of Temporary Tables is limited. In your case I am assuming that dbo.TblSample is an External Table which is possibly why you are facing this restriction.
Instead of using a Temp Table, can you either just JOIN the TblSample directly or use a CTE if you are SELECTing specific rows and columns?
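For illustration, a minimal sketch of the CTE approach; the column names and the filter are placeholders, not from the original question:
-- Stage the filtered rows in a CTE instead of a #temp table
-- (Col1/Col2 and the WHERE clause are placeholders).
WITH FilteredSample AS
(
    SELECT Col1, Col2
    FROM [dbo].[TblSample]
    WHERE Col1 IS NOT NULL
)
SELECT *
FROM FilteredSample;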

Related

Is there a way to calculate the number of rows by table, schema and catalog in Databricks SQL (Spark SQL)?

I need to create a dashboard inside Databricks that summarizes the current number of rows per table in the workspace.
Is there a way to write a SQL query that calculates the number of rows by table, schema, and catalog? The expected result would be:
Catalog           | Schema      | Table           | Rows
example_catalog_1 | Finance     | table_example_1 | 1567000
example_catalog_1 | Finance     | table_example_2 | 67000
example_catalog_2 | Procurement | table_example_1 | 45324888
example_catalog_2 | Procurement | table_example_2 | 89765987
example_catalog_2 | Procurement | table_example_3 | 145000
Currently, I am working in a pure SQL workflow, so I would like to understand whether it's possible to do this with SQL alone, because as far as I know the dashboards in Databricks do not accept PySpark code.
I was looking for a way to do that. I know that it's possible to list the tables in the workspace using system.information_schema.tables, but how can I use it to count the total rows for each table listed there?
I checked that in SQL Server this is possible via the sys schema, dynamic queries, or a BEGIN...END block. I couldn't find a way to do that in Databricks.
I strongly doubt you can run that kind of query in the Databricks dashboard. The link shared by #Sharma is more about how to get the record count using a DataFrame, not how to hook that into a Databricks dashboard.
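That said, the system catalog mentioned in the question can at least enumerate the tables; the row counts themselves are not stored there, so each table would still need its own COUNT(*). A rough sketch, assuming the standard Unity Catalog information_schema columns:
-- List the tables visible in the workspace (managed and external only);
-- a separate COUNT(*) per table is still needed to get the row counts.
SELECT table_catalog, table_schema, table_name
FROM system.information_schema.tables
WHERE table_type IN ('MANAGED', 'EXTERNAL');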

How to copy data from ADF to Synapse with a special character ("Ã") in one of the fields without errors or record rejection

I have a scenario where I copy data from an Azure storage account (CSV, pipe-delimited source file) to Azure Synapse using the ADF copy activity. However, the pipeline is failing because three of the records have the special character "Ã" in one of the character fields. I tried different encodings (UTF-8, UTF-16, and Windows-1252), but none of them resolved the issue. I have also tried the COPY INTO statement directly within Azure Synapse and get the same error. I am able to insert those three records manually using an INSERT INTO statement.
Is there a better way to handle this without manual inserts, for example by pre-converting that character before the copy, or through any available settings in ADF?
Please re-check the format settings for the source csv dataset as given in this Microsoft Documentation.
I reproduced this in my environment, and I am able to copy the csv data with special characters into Synapse with default UTF-8 encoding.
This is my source csv with special characters:
I have created a table named mytable in Synapse.
CREATE TABLE mytable (
    firstname VARCHAR(32),
    lastname VARCHAR(32)
)
WITH
(
    DISTRIBUTION = HASH (firstname),
    CLUSTERED COLUMNSTORE INDEX
)
GO
In the source, give the format settings as per the above documentation.
Here I have used copy command to copy. If you want to create table automatically, you can enable it in the sink.
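If you go the COPY INTO route directly in Synapse (which the question also tried), a minimal sketch for a pipe-delimited UTF-8 file is below; the storage URL is a placeholder and the authentication clause will depend on your setup:
-- Load the pipe-delimited CSV into the Synapse table with UTF-8 encoding
-- (the storage account, container and file name are placeholders;
--  FIRSTROW = 2 assumes the file has a header row).
COPY INTO mytable (firstname, lastname)
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/source.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '|',
    ENCODING = 'UTF8',
    FIRSTROW = 2
);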
Copied data in Synapse table:

Delta Load on BSEG table into Azure using SAP table connector

We are using an SAP ABAP on Oracle environment. I'm trying to implement change data capture (CDC) for the SAP BSEG table in Azure Data Factory using the SAP table connector. In the SAP table connector, I don't see an option to pass any join conditions. Based on which fields can we capture CDC on the BSEG table?
BSEG is a cluster table.
It dates back to R2 days on Mainframes.
See SE11 BSEG --> Menu option Database Object --> Database utility.
Run Check.
It will most likely say NOT ON DATABASE.
If you want to access the data via views, see one of the numerous index tables:
BSxx, description "Accounting: Secondary Index for xxxxx".
These so-called index tables are separate tables that behave like indexes on BSEG, but they aren't true indexes, as cluster tables cannot have indexes.
The index tables are real tables you can access with joins/views.
The document number can be used to read BSEG later, should that still be necessary.
You may find FI_DOCUMENT_READ and BKPF useful too.
In theory the Index tables should be enough.
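For illustration, a rough SQL sketch of how an index table ties back to the document header via the standard document key; BSID (customer open items) is used here only as an example, and the field names follow the standard SAP data model rather than anything specific to this question:
-- Join the secondary index table BSID back to the header table BKPF
-- on company code, document number and fiscal year; the document key
-- can then be used to read BSEG if that is still needed.
SELECT h.BUKRS, h.BELNR, h.GJAHR, i.KUNNR, i.DMBTR
FROM BKPF AS h
JOIN BSID AS i
  ON  i.BUKRS = h.BUKRS
  AND i.BELNR = h.BELNR
  AND i.GJAHR = h.GJAHR;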
From the SAP Table connector help:
Currently SAP Table connector only supports one single table with the default function module. To get the joined data of multiple tables, you can leverage the customRfcReadTableFunctionModule property in the SAP Table connector following steps below
...
So no, table joins are not supported by default; you need to write a custom function module with the predefined interface in the SAP backend. The interface to implement is described in the help.
If you use Azure Data Factory to Azure Data Explorer, big tables like BSEG can be handled with a workaround.
Although BSEG is a cluster of tables in SAP, from the SAP connector's point of view it is a table with rows and columns which can be partitioned.
Here is an example for MSEG which is similar.
MSEG_Partitioned
Kind Regards
Gauchet

Delta tables in Databricks and into Power BI

I am connecting to a delta table in Azure gen 2 data lake by mounting in Databricks and creating a table ('using delta'). I am then connecting to this in Power BI using the Databricks connector.
Firstly, I am unclear as to the relationship between the data lake and the Spark table in Databricks. Is it correct that the Spark table retrieves the latest snapshot from the data lake (delta lake) every time it is itself queried? Is it also the case that it is not possible to effect changes in the data lake via operations on the Spark table?
Secondly, what is the best way to reduce the columns in the Spark table (ideally before it is read into Power BI)? I have tried creating the Spark table with a specified subset of columns but get a "cannot change schema" error. Instead I can create another Spark table that selects from the first Spark table, but this seems pretty inefficient and (I think) will need to be recreated frequently in line with the refresh schedule of the Power BI report. I don't know if it's possible to have a Spark Delta table that references another Spark Delta table so that the former is also always the latest snapshot when queried?
As you can tell, my understanding of this is limited (as is the documentation!) but any pointers very much appreciated.
Thanks in advance and for reading!
A table in Spark is just metadata that specifies where the data is located. So when you're reading the table, Spark under the hood just looks up in the metastore where the data is stored, what the schema is, etc., and accesses that data. Changes made on ADLS will also be reflected in the table. It's also possible to modify the table from these tools, but that depends on what access rights are available to the Spark cluster that processes the data - you can set permissions either at the ADLS level or using table access control.
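For reference, a minimal sketch of registering a Delta table over an ADLS location (the abfss path is a placeholder); the definition only records metadata, and every query reads the current state of the Delta files:
-- Register a table over an existing Delta folder in ADLS Gen2;
-- no data is copied, and queries always see the latest snapshot.
CREATE TABLE IF NOT EXISTS mytable
USING DELTA
LOCATION 'abfss://<container>@<storageaccount>.dfs.core.windows.net/path/to/delta';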
For the second part - you just need to create a view over the original table that selects only a limited set of columns. The data is not copied, and the latest updates to the original table will always be available for querying. Something like:
CREATE OR REPLACE VIEW myview
AS SELECT col1, col2 FROM mytable
P.S. If you're only accessing the data via Power BI or other BI tools, you may want to look into Databricks SQL (once it is in public preview), which is heavily optimized for BI use cases.

Azure SQL External Table alternatives

Azure external tables between two Azure SQL databases on the same server don't perform well. This is known. I've been able to improve performance by defining a view from which the external table is defined. This works if the view can limit the data set returned, but this partial solution isn't enough.
I'd love a way to move, at least nightly, all the data that has been inserted or updated across the full set of tables from the one database (dbo schema) to the second database (pushing into the altdbo schema). I think Azure Data Factory will let me do this, but I haven't figured out how. Any thoughts / guidance? The copy option doesn't copy over table schemas or updates.
Data Factory Mapping Data Flow can help you achieve that.
Use the Alter Row transformation and select an update method in the sink:
This can help you copy the newly inserted or updated data to the other Azure SQL database based on the key column.
Alter Row: Use the Alter Row transformation to set insert, delete, update, and upsert policies on rows.
Update method: Determines what operations are allowed on your database destination. The default is to only allow inserts. To update, upsert, or delete rows, an alter-row transformation is required to tag rows for those actions. For updates, upserts and deletes, a key column or columns must be set to determine which row to alter.
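Conceptually, the upsert the data flow performs on the sink, keyed on the key column, amounts to the T-SQL below. This is only an illustration with placeholder table and column names, not something the data flow generates for you to run:
-- What an upsert keyed on Id boils down to on the destination
-- (altdbo.TargetTable, dbo.SourceTable, Id, Col1, Col2 are placeholders).
MERGE altdbo.TargetTable AS tgt
USING dbo.SourceTable AS src
    ON tgt.Id = src.Id
WHEN MATCHED THEN
    UPDATE SET tgt.Col1 = src.Col1, tgt.Col2 = src.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2) VALUES (src.Id, src.Col1, src.Col2);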
Hope this helps.
