How to truncate Dynamics 365 entities with Data Factory (and copy to an Azure Data Lake)?

I am currently using a Data Factory to copy entities from Dynamics 365 in bulk to an Azure Data Lake. The entities are saved as CSV files in the Data Lake every 24 hours.
Instead of bulk copying everything, I would like to restrict each run to new data only and append it to the files that already exist in the data lake.
I think this is a common operation for SQL databases, but can this be done between Dynamics 365 and a Data Lake?

You could add a filter to your queries to get those records that have been modified within the last 24 hours.
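For illustration, here is a minimal sketch of that kind of filter, assuming you query the Dynamics 365 Web API directly; the org URL, token and entity set below are placeholders. In a Data Factory copy activity the equivalent is a query on the Dynamics source with a condition on the modifiedon attribute.

    # Sketch: pull only records modified in the last 24 hours from the
    # Dynamics 365 Web API. Org URL, token and entity set are placeholders.
    from datetime import datetime, timedelta, timezone
    import requests

    org_url = "https://yourorg.crm.dynamics.com"       # hypothetical org
    token = "<bearer token from Azure AD>"              # obtained via OAuth
    since = (datetime.now(timezone.utc) - timedelta(hours=24)).strftime("%Y-%m-%dT%H:%M:%SZ")

    resp = requests.get(
        f"{org_url}/api/data/v9.1/accounts",            # example entity set
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        params={"$filter": f"modifiedon ge {since}"},
    )
    resp.raise_for_status()
    new_or_changed = resp.json()["value"]               # records changed in the last day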
Additionally, you can set up Dynamics to replicate its data to an external SQL database.
Replicate data to Azure SQL Database

Azure Data Lake Storage Gen2 as a sink type only supports three copy behaviors (PreserveHierarchy, FlattenHierarchy and MergeFiles).
I tried all three copy behaviors, and none of them will append to files that already exist in the data lake. If you point the copy at an existing file, that file is overwritten when the copy activity completes.
For more details, you can reference: Azure Data Lake Storage Gen2 as a sink type.
So appending cannot be done between Dynamics 365 and a Data Lake with Azure Data Factory alone.
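If you do need true append semantics, you would have to write outside the copy activity, for example with the Data Lake storage SDK. A rough sketch, assuming the azure-storage-file-datalake package; the account, file system and file names are placeholders:

    # Sketch: append rows to an existing ADLS Gen2 file with the storage SDK,
    # since the ADF copy behaviors can only overwrite or merge whole files.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://youraccount.dfs.core.windows.net",  # placeholder account
        credential="<account key or AAD credential>",
    )
    file_client = service.get_file_system_client("exports").get_file_client("accounts.csv")

    new_rows = b"id,name,modifiedon\n123,Contoso,2024-01-01T00:00:00Z\n"
    offset = file_client.get_file_properties().size     # current end of the file
    file_client.append_data(new_rows, offset=offset, length=len(new_rows))
    file_client.flush_data(offset + len(new_rows))       # commit the appended bytes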
Thanks to James Wood for providing a good solution. Combining my answer with his, the problem can be solved.
Hope this helps.

Related

Using Azure Data Factory to migrate Salesforce data to Dynamics 365

I'm looking for some advice around using Azure Data Factory to migrate data from Salesforce to Dynamics 365.
My research has discovered plenty of articles about moving Salesforce data to sinks such as Azure Data Lake or Blob storage, and also articles that describe moving data from Azure Data Lake or Blob storage into D365.
I haven't found any examples where the source is salesforce and the sink is D365.
Is it possible to do it this way or do I need to copy the SF data to an intermediate sink such as Azure Data Lake or blob storage and then use that as the source of a copy/dataflow to then send to D365?
I will need to perform transformations on the SF data before storing it in D365.
Thanks
I would recommend adding ADLS Gen2 as a stage between Salesforce and D365.
I am afraid that a direct copy with D365 as the sink cannot be done.
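To sketch what the staged pattern looks like end to end: the URLs, tokens and field names below are placeholders, and in Data Factory each step would read from and write to the ADLS Gen2 stage rather than hold data in memory.

    # Sketch of the staged pattern: pull from Salesforce, transform, then
    # push to Dynamics 365. URLs, tokens and field names are placeholders.
    import requests

    sf_base = "https://yourinstance.my.salesforce.com"   # hypothetical Salesforce org
    sf_token = "<salesforce bearer token>"
    d365_base = "https://yourorg.crm.dynamics.com"       # hypothetical Dynamics org
    d365_token = "<dynamics bearer token>"

    # 1. Extract from Salesforce (in ADF this is the Salesforce source).
    accounts = requests.get(
        f"{sf_base}/services/data/v57.0/query",
        headers={"Authorization": f"Bearer {sf_token}"},
        params={"q": "SELECT Name, Phone FROM Account"},
    ).json()["records"]

    # 2. Transform (in ADF this is a data flow over the staged files).
    rows = [{"name": a["Name"], "telephone1": a.get("Phone")} for a in accounts]

    # 3. Load into D365 (in ADF this is the Dynamics sink).
    for row in rows:
        requests.post(
            f"{d365_base}/api/data/v9.1/accounts",
            headers={"Authorization": f"Bearer {d365_token}",
                     "Content-Type": "application/json"},
            json=row,
        ).raise_for_status()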

When to use Data Factory (copy) over direct pull in SQL synapse

I am just going through some Microsoft documentation and doing hands-on work for data engineering related things.
I have a couple of queries about a scenario: "copy CSV file(s) from Blob storage to Synapse Analytics (stage table(s))":
I read that we can do a direct data pull into Synapse by creating external tables. (https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/load-data-wideworldimportersdw)
If the above is possible, then in what cases do we use the Azure Data Factory Copy or data flow method?
While working with Azure Data Factory, is it a good idea to use PolyBase, given that it will use Blob storage again for staging in this scenario (i.e. I am already copying the file from Blob and would be using Blob again for staging)?
I searched for answers to my queries but haven't found any satisfactory answer yet.
If you're just straight loading data from CSV into DW, use Copy. Polybase is recommended, but not always needed for small files.
If you need to transform that data or perform updates, then use data flows.
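For reference, a rough sketch of the external-table "direct pull" route mentioned in the question, run here through pyodbc against a dedicated SQL pool; the server, credentials, storage account and table names are placeholders, and a private storage account would also need a database scoped credential:

    # Sketch: PolyBase-style load of a CSV from Blob storage into a Synapse
    # staging table via an external table plus CTAS. All names are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=yourworkspace.sql.azuresynapse.net;DATABASE=yourpool;"
        "UID=sqladmin;PWD=<password>",
        autocommit=True,
    )
    cur = conn.cursor()

    cur.execute("""
        CREATE EXTERNAL DATA SOURCE BlobStage
        WITH (TYPE = HADOOP,
              LOCATION = 'wasbs://staging@youraccount.blob.core.windows.net')""")
    cur.execute("""
        CREATE EXTERNAL FILE FORMAT CsvFormat
        WITH (FORMAT_TYPE = DELIMITEDTEXT,
              FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2))""")
    cur.execute("""
        CREATE EXTERNAL TABLE dbo.SalesStage_ext (SaleId INT, Amount DECIMAL(18,2))
        WITH (LOCATION = '/sales/', DATA_SOURCE = BlobStage, FILE_FORMAT = CsvFormat)""")
    # CTAS materialises the external data into a distributed staging table.
    cur.execute("""
        CREATE TABLE dbo.SalesStage
        WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
        AS SELECT * FROM dbo.SalesStage_ext""")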

How to decide between Azure Data Lake vs Azure SQL vs Azure Data Lake Analytics vs Azure SQL VM?

I am new to Azure and hence trying to understand what services to use when and how.
At the moment, I have one Excel file with a couple of tabs that require some transformation to create one additional tab inside the source file itself (say tab "x"). The final tab "x" is then used to create one final Excel file that is shared with various teams.
At present, everything is done manually.
This needs to change, and producing the Excel file shared with the teams has to be automated. The source is the Excel file with its various tabs (excluding tab "x"), the reporting tool will be SSRS, and the Excel data will be stored in the cloud.
Keeping this scenario in mind, what is the best way to store the Excel data in the cloud? The data will be loaded on a monthly basis. I am confused as to whether to store it in Azure SQL, Azure Data Lake Gen2, Azure Data Lake Analytics, or an Azure SQL VM.
Every month, data can be fetched from the Excel file and loaded into Azure using Azure Data Factory. But I am not sure what the best way to store the data in the cloud is, considering that some ETL process is needed to generate data in a format similar to tab "x".
I think you can consider using Azure SQL Database.
Azure SQL Database and SQL Server both support importing data from Excel (or CSV) files. For more details and limits, please see: Import data from Excel to SQL Server or Azure SQL Database.
If your data is stored in Azure SQL Database, you can also use Excel to get the data back out of it:
Connect Excel to a single database in Azure SQL Database and import data and create tables and charts based on values in the database. In this tutorial you will set up the connection between Excel and a database table, save the file that stores data and the connection information for Excel, and then create a pivot chart from the database values.
Reference: Import data from Excel to SQL Server or Azure SQL Database.
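If you want to automate that monthly import rather than run the wizard by hand, here is a small sketch with pandas and SQLAlchemy; the server, database, credentials and sheet/table names are placeholders:

    # Sketch: load one tab of the monthly Excel file into an Azure SQL table.
    import pandas as pd
    from sqlalchemy import create_engine

    df = pd.read_excel("monthly_report.xlsx", sheet_name="x")   # the derived tab "x"

    engine = create_engine(
        "mssql+pyodbc://sqladmin:<password>@yourserver.database.windows.net/yourdb"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
    df.to_sql("MonthlyReport", engine, if_exists="append", index=False)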
I think you don't need to store these Excel files in Azure Data Lake. Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob storage; it is still just storage.
The more Azure resources you use, the more you will need to pay.
If your Excel file is stored on your local computer, you can use Azure Data Factory with a self-hosted integration runtime to access these local files.
Please reference: Copy data to or from a file system by using Azure Data Factory.
Hope this helps.
Your storage requirements are very minimal, so I would select Data Lake to store your documents. The alternative is Blob Storage, but I always prefer Data Lake because it works with Azure Active Directory.
In your scenario, drop it in the ADL, and use the ADL as the source in Azure Data Factory.
Edit:
Honestly, your original post is a little confusing. You have a RAW Excel document, you do some transformations on that RAW document to generate an Excel source document. This source document holds the final dataset that the dev team will use to build out SSRS reports. You need to make this dataset available to the teams so that they can connect to it to build the reports? My suggestion is to keep it simple: drop the final source dataset, in Excel format, into blob or data lake storage and then ask the dev guys to pick it up from that location. If you go the route of designing and maintaining a data pipeline (Blob > Data Factory > SQL, or CSV, TSV), then you are introducing unnecessary complications.

How to use Data Factory to ingest all Dynamics 365 entities to a Data Lake?

I'm currently using a Data Factory (V2) to copy a few entities from Dynamics 365 to an Azure Data Lake (Gen1).
So far I've just been creating each sink dataset individually as they become relevant. But there are hundreds of potential entities to copy and setting that up with my current process will be ridiculously time consuming.
Is it possible (or is there a better way) to copy all the entities to a data lake?
Maybe the Dynamics 365 Data Export Service is helpful in your case. It allows you to easily export Dynamics 365 tables to Azure SQL. I have never done it myself, but it seems very easy to set up. It may then be easier to move the data from an Azure SQL DB into the Data Lake.
https://learn.microsoft.com/en-us/dynamics365/customer-engagement/admin/replicate-data-microsoft-azure-sql-database
https://community.dynamics.com/365/b/dynamicspeople/archive/2017/05/29/dynamics-365-data-export-service-with-azure-sql-database
Microsoft have just announced an export to data lake feature that may help with your scenario. It's currently in preview.
https://powerapps.microsoft.com/en-us/blog/exporting-cds-data-to-azure-data-lake-preview/
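Whichever feature you pick, the way to avoid hundreds of hand-built datasets inside Data Factory itself is typically a single parameterised pipeline with a ForEach activity over a list of entity names. The same loop, sketched against the Dynamics 365 Web API (the org URL and token are placeholders, and paging via @odata.nextLink is omitted for brevity):

    # Sketch: enumerate every entity set exposed by the Dynamics 365 Web API
    # and dump each one to a CSV file. Org URL and token are placeholders.
    import csv
    import requests

    org = "https://yourorg.crm.dynamics.com/api/data/v9.1"
    headers = {"Authorization": "Bearer <token>", "Accept": "application/json"}

    # The OData service document lists all entity sets (accounts, contacts, ...).
    entity_sets = requests.get(f"{org}/", headers=headers).json()["value"]

    for entity in entity_sets:
        name = entity["name"]
        records = requests.get(f"{org}/{name}", headers=headers).json()["value"]
        if not records:
            continue
        fieldnames = sorted({key for record in records for key in record})
        with open(f"{name}.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)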

Are we able to use SnappyData to update a record in Azure Data Lake? Or is Azure Data Lake append only?

I am currently working on Azure Data Lake with SnappyData integration. My question about SnappyData: are we able to update data written from SnappyData to Azure Data Lake storage, or can we only append to Azure Data Lake storage? I searched the forum but could not find a proper solution; if anyone knows the answer, please share it. Thank you.
Azure Data Lake Store, much like HDFS, is an append only store. You can append to a file or replace it altogether. There is no way to update an existing file.
I've achieved MERGE-style behaviour in U-SQL by using an Azure Data Lake table as the middle ground between input and output. Check out my blog post with the code showing how I did it with a series of joins.
https://www.purplefrogsystems.com/paul/2016/12/writing-a-u-sql-merge-statement/
This will give you append behaviour in your output.
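The underlying pattern in that post is a join between the existing data and the incoming delta, with the delta winning on the key. Here is the same idea sketched in pandas (the file names and key column are placeholders, not the U-SQL from the post):

    # Sketch of the merge-via-join pattern: rebuild the output from the
    # existing file plus the incoming delta, letting the delta win per key.
    import pandas as pd

    existing = pd.read_csv("accounts_current.csv")       # placeholder file names
    delta = pd.read_csv("accounts_delta.csv")

    merged = (
        pd.concat([existing, delta])
          .drop_duplicates(subset="accountid", keep="last")   # delta rows win
    )
    merged.to_csv("accounts_new.csv", index=False)        # replace, don't append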