How to perform event-based data ingestion using Azure Data Lake Storage Gen2 and Azure Data Factory V2?

Recently we came across a scenario where our source and sink locations are both of ADLS Gen2 type. We have an interesting use case wherein we have to push data from source to sink with the help of ADF V2. That said, it's not just a normal copy activity we are expecting; we need to perform this activity on an event basis.
While going through the ADLS Gen2 documentation, we found that ADLS Gen2 does not yet support Azure Event Grid, which is why our ADF event-based triggers could be configured but never fired.
Can anyone suggest how to tackle this situation? Since Azure Event Grid is not supported at this point in time, we don't believe we can achieve this with Azure Event Hubs and their integration with ADF either.
Thanks.

From my repro, event-based triggers are currently supported only on v2 storage accounts.
Data Factory is now integrated with Azure Event Grid, which lets you trigger pipelines on an event.
Note: This integration supports only general-purpose version 2 storage accounts.
Azure Event Grid doesn't receive events from Azure Data Lake Gen2 accounts because those accounts don't yet generate them.
For more details, refer to “Known issues with Azure Data Lake Storage Gen2”.
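For reference, here is a minimal sketch of such an event trigger defined through the azure-mgmt-datafactory Python SDK; the resource names and the CopyPipeline pipeline are placeholders, and the scope must point at a general-purpose v2 account:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        BlobEventsTrigger, PipelineReference,
        TriggerPipelineReference, TriggerResource,
    )

    client = DataFactoryManagementClient(
        DefaultAzureCredential(), "<subscription-id>")

    # Fire the pipeline whenever a blob is created under /input/blobs/
    # in the (general-purpose v2) storage account named in `scope`.
    trigger = BlobEventsTrigger(
        scope=("/subscriptions/<subscription-id>/resourceGroups/<rg>"
               "/providers/Microsoft.Storage/storageAccounts/<v2-account>"),
        events=["Microsoft.Storage.BlobCreated"],
        blob_path_begins_with="/input/blobs/",
        pipelines=[TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="CopyPipeline"))],
    )
    client.triggers.create_or_update(
        "<rg>", "<data-factory>", "TriggerOnBlobCreated",
        TriggerResource(properties=trigger))

Remember that a trigger created this way still has to be started before it fires.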

Related

Using Azure Data Factory to migrate Salesforce data to Dynamics 365

I'm looking for some advice around using Azure Data Factory to migrate data from Salesforce to Dynamics 365.
My research has turned up plenty of articles about moving Salesforce data to sinks such as Azure Data Lake or Blob Storage, and also articles that describe moving data from Azure Data Lake or Blob Storage into D365.
I haven't found any examples where the source is salesforce and the sink is D365.
Is it possible to do it this way, or do I need to copy the SF data to an intermediate sink such as Azure Data Lake or Blob Storage and then use that as the source of a copy/dataflow to send it to D365?
I will need to perform transformations on the SF data before storing it in D365.
Thanks
I would recommend adding ADLS Gen2 as a stage between Salesforce and D365; I am not sure a direct sink to D365 can be done.
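A minimal sketch of that staged pattern, assuming the azure-mgmt-datafactory Python SDK and hypothetical dataset names (in practice you would put your transformations, e.g. a mapping data flow, between the two hops):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        ActivityDependency, CopyActivity, DatasetReference, DynamicsSink,
        ParquetSink, ParquetSource, PipelineResource, SalesforceSource,
    )

    client = DataFactoryManagementClient(
        DefaultAzureCredential(), "<subscription-id>")

    # Hop 1: land the Salesforce extract in ADLS Gen2 as Parquet.
    stage = CopyActivity(
        name="SalesforceToADLS",
        inputs=[DatasetReference(reference_name="SalesforceAccounts")],
        outputs=[DatasetReference(reference_name="StagedAccountsParquet")],
        source=SalesforceSource(),
        sink=ParquetSink(),
    )

    # Hop 2: load the staged file into D365, upserting on the entity key.
    load = CopyActivity(
        name="StageToD365",
        depends_on=[ActivityDependency(activity="SalesforceToADLS",
                                       dependency_conditions=["Succeeded"])],
        inputs=[DatasetReference(reference_name="StagedAccountsParquet")],
        outputs=[DatasetReference(reference_name="D365Accounts")],
        source=ParquetSource(),
        sink=DynamicsSink(write_behavior="Upsert"),
    )

    client.pipelines.create_or_update(
        "<resource-group>", "<factory>", "SalesforceToD365",
        PipelineResource(activities=[stage, load]))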

Migrate data from Azure data lake in one subscription to another

I have been looking for options to migrate data present in my ADLS in one subscription to ADLS in another subscription within Azure. I tried ADF for this purpose and it worked fine.
But the copy speed in ADF is too slow; it copies at 10-15 KB/sec. Is there some way to increase the copy speed while using ADF?
Yes, there is a way to migrate data in Azure Data Lake between different subscriptions: Data Factory.
Data Factory supports both Data Lake Gen1 and Gen2 as connectors. Please refer to these tutorials:
Copy data to or from Azure Data Lake Storage Gen1 using Azure Data Factory.
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory.
You can create the source and sink datasets in different subscriptions through linked services.
This option may cost you some money, though. You could also refer to the AzCopy tutorial: Copy blobs between Azure storage accounts by using AzCopy.
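If raw speed matters, AzCopy performs a service-to-service copy, so the data never flows through your machine or an integration runtime. A hedged sketch driving AzCopy v10 from Python, with placeholder account names and SAS tokens:

    import subprocess

    src = "https://<srcaccount>.blob.core.windows.net/<container>?<src-sas>"
    dst = "https://<dstaccount>.blob.core.windows.net/<container>?<dst-sas>"

    # 'azcopy copy --recursive' copies the whole container tree
    # server-to-server between the two subscriptions' accounts.
    subprocess.run(["azcopy", "copy", src, dst, "--recursive"], check=True)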
Here is another blog, How To Copy Files From One Azure Storage Account To Another:
In this post, the blogger outlines how to copy data from one Azure Storage Account in one subscription to another Storage Account in another subscription.
These may be what you're looking for.
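On the 10-15 KB/sec complaint specifically: the ADF copy activity exposes throughput settings you can raise. A minimal sketch, assuming the azure-mgmt-datafactory SDK and hypothetical dataset names:

    from azure.mgmt.datafactory.models import (
        AzureDataLakeStoreSink, AzureDataLakeStoreSource,
        CopyActivity, DatasetReference,
    )

    copy = CopyActivity(
        name="CrossSubscriptionCopy",
        inputs=[DatasetReference(reference_name="SourceAdlsDataset")],
        outputs=[DatasetReference(reference_name="SinkAdlsDataset")],
        source=AzureDataLakeStoreSource(recursive=True),
        sink=AzureDataLakeStoreSink(),
        data_integration_units=32,  # scale up the compute behind the copy
        parallel_copies=16,         # copy more files concurrently
    )

Throughput that low usually points at many tiny files rather than a bandwidth limit, in which case parallel_copies tends to matter more than DIUs.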

Is there any possibility to transfer data from Azure Data Lake Gen2 to Azure Event Hub by using Azure Data Factory?

Is there any possibility to transfer data from Azure Data Lake Gen2 to Azure Event Hub by using Azure Data Factory? Are there any alternative ways to preserve the same folder structure in Event Hub once the data is transferred from Data Lake?
Azure Data Factory supports Azure Data Lake Gen2 but doesn't support Azure Event Hub at this time.
Please see Azure Data Factory connector overview.
Hope this helps.
There is no direct connection to Event Hub, but you can use this service to see which direct I/O endpoints are available, and use the I/O tree to see how you can connect multiple services.
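Since ADF cannot write to Event Hubs, one workaround is a small bridge script: read each file from ADLS Gen2 and send it as an event, carrying the folder path as a message property (Event Hubs has no folder concept of its own). A sketch assuming the azure-storage-file-datalake and azure-eventhub packages, with placeholder account, container, namespace, and hub names:

    from azure.eventhub import EventData, EventHubProducerClient
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    cred = DefaultAzureCredential()
    lake = DataLakeServiceClient(
        "https://<account>.dfs.core.windows.net", credential=cred)
    fs = lake.get_file_system_client("<container>")

    producer = EventHubProducerClient(
        fully_qualified_namespace="<namespace>.servicebus.windows.net",
        eventhub_name="<hub>",
        credential=cred)

    with producer:
        for path in fs.get_paths(recursive=True):
            if path.is_directory:
                continue
            data = fs.get_file_client(path.name).download_file().readall()
            event = EventData(data)
            # Preserve the lake's folder structure as event metadata.
            event.properties = {"source_path": path.name}
            producer.send_batch([event])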

How to trigger a pipeline in Azure Data Factory v2 or an Azure Databricks notebook by a new file in Azure Data Lake Store Gen1

I am using an Azure Data Lake Store Gen1 for storing JSON files. Based on these files, I have notebooks in Azure Databricks for processing them. Now I want to trigger such an Azure Databricks notebook when a new file is created in Azure Data Lake Store Gen1. I couldn't find any trigger which could do this. Do you know of any way?
Currently, this is not yet implemented/supported by Microsoft, but I believe it is on their roadmap.
You can do this in two ways:
Azure Functions (through Event Grid)
Logic Apps
Option #1
Microsoft is currently working on #1.
You can track the issue here.
As per this:
This feature is not a high priority for us right now, but I will note that the announcement for Azure Event Grid listed Data Lake as one of the integrations they are building. Once you can subscribe to Data Lake updates through Event Grid, running an Azure Function would be trivial (see here for some info).
You can add your vote to support the Event Grid provider for Data Lake.
Option #2
This is also not yet implemented, but you can upvote it here to support this feature.
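To make Option #1 concrete: once Event Grid can deliver the file event, the function itself is small. A hedged sketch of a Python Azure Function with an Event Grid trigger binding that starts a Databricks job wrapping the notebook; the job id, workspace URL, and token come from hypothetical app settings:

    import json
    import os
    import urllib.request

    import azure.functions as func


    def main(event: func.EventGridEvent) -> None:
        # For storage events, the payload carries the new file's URL.
        file_url = event.get_json().get("url")

        # Kick off the Databricks job (Jobs API run-now), handing the
        # file path to the notebook as a parameter.
        body = json.dumps({
            "job_id": int(os.environ["DATABRICKS_JOB_ID"]),
            "notebook_params": {"input_file": file_url},
        }).encode()
        req = urllib.request.Request(
            os.environ["DATABRICKS_HOST"] + "/api/2.1/jobs/run-now",
            data=body,
            headers={
                "Authorization": "Bearer " + os.environ["DATABRICKS_TOKEN"],
                "Content-Type": "application/json",
            },
        )
        urllib.request.urlopen(req)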

Connect Azure Event Hubs with Data Lake Store

What is the best way to send data from Event Hubs to Data Lake Store?
I am assuming you want to ingest data from EventHubs to Data Lake Store on a regular basis. Like Nava said, you can use Azure Stream Analytics to get data from EventHub into Azure Storage Blobs. Thereafter you can use Azure Data Factory (ADF) to copy data on a scheduled basis from Blobs to Azure Data Lake Store. More details on using ADF are available here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-datalake-connector/. Hope this helps.
==
March 17, 2016 update.
Support for Azure Data Lake Store as an output for Azure Stream Analytics is now available. https://blogs.msdn.microsoft.com/streamanalytics/2016/03/14/integration-with-azure-data-lake-store/ . This will be the best option for your scenario.
Sachin Sheth
Program Manager, Azure Data Lake
In addition to Nava's reply: you can query data in a Windows Azure Blob Storage container with ADLA/U-SQL as well. Or you can use the Blob Store to ADL Storage copy service (see https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-copy-data-azure-storage-blob/).
One way would be to write a process that reads messages via the Event Hub API and writes them into a Data Lake Store using the Data Lake SDK.
Another alternative would be to use Stream Analytics to get data from Event Hub into a blob, and Azure Automation to run a PowerShell script that would read the data from the blob and write it into a Data Lake Store.
Not taking credit for this, but sharing with the community:
It is also possible to archive the events (look into properties\archive); this leaves an Avro blob. Then, using the AvroExtractor, you can convert the records into JSON as described in Anthony's blog:
http://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/
One of the ways would be to connect your Event Hub to Data Lake using the Event Hub Capture functionality (Data Lake and Blob Storage are currently supported). Event Hub would write to Data Lake at every N-minute interval or once a data-size threshold is reached. This is used to optimize storage write operations, as they are expensive at high scale.
The data is stored in Avro format, so if you want to query it using U-SQL you'd have to use an extractor class. Uri gave a good reference to it: https://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/.
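If you would rather crack the capture files open outside of U-SQL: each capture blob is an Avro container whose "Body" field holds the raw event payload. A minimal sketch, assuming the fastavro package and a local copy of one capture file:

    import fastavro

    # Event Hubs Capture writes Avro records with fields such as
    # SequenceNumber, EnqueuedTimeUtc, Properties, and Body.
    with open("capture-window.avro", "rb") as f:
        for record in fastavro.reader(f):
            payload = record["Body"]           # raw event bytes
            print(payload.decode("utf-8"))     # assuming UTF-8 JSON events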
