We use node-mssql and we're trying to send an array of data to a stored procedure.
Unlike the TVP approach, which seems a bit complex, we found the bulk method, which looks very interesting; however, all the examples we found create a new table instead of pushing the data to a stored procedure.
Is there a way to use it to get the bulk results in a stored procedure?
Our SQL Server version is 2012. Really appreciate any help in advance.
If table-valued parameters seem too complex to pass from node.js, you can alternatively use a staging table: bulk load the data into the staging table and process it inside the stored procedure.
Follow the steps below:
Bulk load the data into a staging table
Process the staging table inside the stored procedure
Clean up the staging table inside the stored procedure once you are done with the processing.
Related
I want to write the data from a PySpark DataFrame to external databases, say an Azure MySQL database. So far, I have managed to do this using .write.jdbc(),
spark_df.write.jdbc(url=mysql_url, table=mysql_table, mode="append", properties={"user":mysql_user, "password": mysql_password, "driver": "com.mysql.cj.jdbc.Driver" })
Here, if I am not mistaken, the only options available for mode are append and overwrite; however, I want more control over how the data is written. For example, I want to be able to perform update and delete operations.
How can I do this? Is it possible to, say, write SQL queries to write data to the external database? If so, please give me an example.
First, I suggest you use the dedicated Azure SQL connector: https://learn.microsoft.com/en-us/azure/azure-sql/database/spark-connector.
Then I recommend you use bulk mode, as row-by-row mode is slow and can incur unexpected charges if you have Log Analytics turned on.
Lastly, for any kind of data transformation, you should use an ELT pattern:
Load raw data into an empty staging table
Run SQL code, or even better a stored procedure, which performs the required logic (for example, merging into a final table) and any other DML; a minimal sketch of this pattern follows below.
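As a rough illustration of that ELT flow from Databricks, here is a minimal sketch; the server, database, table name, credentials, and the dbo.usp_MergeStaging procedure are all placeholders, and it assumes the Apache Spark connector for SQL Server plus pyodbc and the ODBC Driver 17 for SQL Server are installed on the cluster:

import pyodbc

jdbc_url = "jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdb"  # placeholder

# 1. Bulk load the raw DataFrame into an empty staging table.
#    "com.microsoft.sqlserver.jdbc.spark" is the Apache Spark connector for SQL Server;
#    plain .format("jdbc") also works, just more slowly.
(spark_df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("overwrite")
    .option("truncate", "true")          # keep the staging table, just empty it
    .option("url", jdbc_url)
    .option("dbtable", "dbo.staging_table")
    .option("user", "your_user")
    .option("password", "your_password")
    .save())

# 2. Run the transformation logic server-side, e.g. a stored procedure that
#    merges staging rows into the final table and performs updates/deletes.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net,1433;"
    "DATABASE=yourdb;UID=your_user;PWD=your_password"
)
with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute("EXEC dbo.usp_MergeStaging")

This keeps the heavy write in bulk mode and leaves the update/delete logic where it is cheapest to run: inside the database.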
I'm working with two environments in Azure: Databricks and SQL Database. I have a function that generates a DataFrame which is used to overwrite a table stored in the SQL Database. I'm having problems because df.write.jdbc(mode='overwrite') only drops the table and, I'm guessing, my user doesn't have the right permissions to create it again (I've already seen that I need DML and DDL permissions for that). In short, my function only drops the table without recreating it.
We discussed what the problem could be and concluded that maybe the best thing I can do is truncate the table and re-add the new data there. I'm trying to find out how to truncate the table; I tried these two approaches but can't find more information about it:
df.write.jdbc()
&
spark.read.jdbc()
Can you help me with this? The overwrite doesn't work (maybe I don't have adequate permissions) and I can't figure out how to truncate that table using JDBC.
It's in the Spark documentation: you need to add the truncate option when writing:
df.write.mode("overwrite").option("truncate", "true")....save()
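Spelled out a bit more, the full write might look like the sketch below; the URL, table name, and credentials are placeholders you would replace with your own:

jdbc_url = "jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdb"  # placeholder

# Overwrite by truncating instead of dropping, so the table definition,
# indexes, and permissions on the SQL Database side stay in place.
(df.write
    .mode("overwrite")
    .option("truncate", "true")
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.target_table")
    .option("user", "your_user")
    .option("password", "your_password")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .save())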
Also, if you have a lot of data, then maybe it's better to use Microsoft's Spark connector for SQL Server; it has some performance optimizations that should allow you to write faster.
You can create a stored procedure for truncating or dropping the table in SQL Server and call that stored procedure from Databricks over an ODBC connection.
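For example, a minimal sketch of that approach from a Databricks notebook, assuming pyodbc and the ODBC Driver 17 for SQL Server are available on the cluster, and where dbo.usp_TruncateTarget is a hypothetical procedure that simply runs TRUNCATE TABLE on the target:

import pyodbc

# Placeholders -- replace with your own server, database, and credentials.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net,1433;"
    "DATABASE=yourdb;UID=your_user;PWD=your_password"
)

# Call the stored procedure that truncates the target table.
with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute("EXEC dbo.usp_TruncateTarget")

# Then append the new rows with the regular JDBC writer, e.g.
# df.write.mode("append").format("jdbc").option(...).save()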
I pulled data from SharePoint into a SQL database through an SSIS package, and I need to schedule it to run every 10 minutes. Everything is good, except that every time I run the package we get duplicate records. I need to pull only new and updated items into SQL. I have applied the composite primary key option at the destination, but it's not working.
Please help me.
Without knowing much about the details of what you are doing, two things come to mind.
Put constraints on your database so that duplicates aren't allowed. Better a foreign key violation or constraint error than duplicate data arriving.
If you are utilizing an Execute SQL Task, try using a MERGE statement.
I am not sure if I understand the idea of reference tables correctly. In my mind, it is a table that contains the same data in every shard. Am I wrong? I am asking because I have no idea how I should insert data into the reference table to make the data appear in every shard. Or maybe it is impossible? Can anyone clarify this issue?
Yes, the idea of a reference table is that the same data is contained on every shard. If you have a small number of shards and data changes are rare, you can open multiple connections in your application and apply the changes to all of the databases simultaneously. Or you can write a management script that periodically iterates across all shards to update the reference data or bulk-insert a fresh image of the rows.
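As a rough sketch of such a management script (the shard connection strings, the dbo.CountryCodes reference table, and the rows are all placeholders; in practice you might enumerate the shards from your shard map instead of hard-coding them):

import pyodbc

# Placeholder shard connection strings.
SHARD_CONN_STRINGS = [
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=shard1.database.windows.net;DATABASE=sharddb;UID=user;PWD=secret",
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=shard2.database.windows.net;DATABASE=sharddb;UID=user;PWD=secret",
]

# Hypothetical reference rows that must end up identical on every shard.
ROWS = [("US", "United States"), ("DE", "Germany")]

for conn_str in SHARD_CONN_STRINGS:
    with pyodbc.connect(conn_str) as conn:
        cur = conn.cursor()
        for code, name in ROWS:
            # Upsert so the script can be re-run safely against each shard.
            cur.execute("UPDATE dbo.CountryCodes SET Name = ? WHERE Code = ?", name, code)
            if cur.rowcount == 0:
                cur.execute("INSERT INTO dbo.CountryCodes (Code, Name) VALUES (?, ?)", code, name)
        conn.commit()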
There is also a new feature previewing in Azure SQL Database called Elastic DB Jobs, which allows you to define a SQL script for operations that you want to run on all shards, and then runs the script asynchronously with eventual-completion guarantees. You can potentially use this to update reference tables. Details on the feature are here.
I am using a big stored procedure that uses many linked server queries. If I run this stored procedure manually it runs fine, but if I call it from an exe using multi-threading, it raises "Cannot get the data of the row from the OLE DB provider "SQLNCLI11" for linked server "linkedserver1"." and "Row handle referred to a deleted row or a row marked for deletion." on each execution. The performance of the stored procedure is also very slow compared to the same stored procedure without linked server queries. Please give me some tips to improve the performance of the stored procedure and to fix the issue mentioned above.
Thanks
If you are querying over linked servers, you will see a decrease in performance. Could it be that the concurrent executions are affecting the same results, therefore giving you the exceptions? If so, you might be looking at dirty reads. Is that acceptable for your result set?
From the looks of it, you will have to call the procedures sequentially rather than in parallel. What you can do is cache the data on one server and sync the updates, etc., in batches.