Invoking a Java method from an Azure Data Factory pipeline

I'm working on a project where I need to ingest EBCDIC files into Azure Blob Storage and transform them using Azure PaaS services.
The high-level design is as follows:
Ingest the file into Azure using AzCopy.
Do the transformation using ADF.
Before doing the transformation I need to convert the EBCDIC file to ASCII, and the best solution is JRecord, which is written in Java.
I created a custom solution to convert my files using JRecord and deployed it as an app in Azure.
Can someone help me understand how to invoke this app from an activity in an Azure Data Factory pipeline?
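One common pattern is to wrap the converter in an HTTP-triggered Azure Function (Java is a supported runtime) and call it from the pipeline with the Azure Function activity or a Web activity. Below is a minimal sketch, assuming a hypothetical EbcdicConverter class that wraps the existing JRecord logic and a blobPath parameter supplied by the pipeline; both names are illustrative, not part of the original post.

```java
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.HttpMethod;
import com.microsoft.azure.functions.HttpRequestMessage;
import com.microsoft.azure.functions.HttpResponseMessage;
import com.microsoft.azure.functions.HttpStatus;
import com.microsoft.azure.functions.annotation.AuthorizationLevel;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.HttpTrigger;

import java.util.Optional;

public class ConvertEbcdicFunction {

    // HTTP-triggered function that an ADF Azure Function activity (or Web activity) can call.
    @FunctionName("convertEbcdic")
    public HttpResponseMessage run(
            @HttpTrigger(name = "req",
                         methods = {HttpMethod.POST},
                         authLevel = AuthorizationLevel.FUNCTION)
            HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {

        // Blob path passed from the pipeline, e.g. ?blobPath=input/file.ebc (illustrative parameter name)
        String blobPath = request.getQueryParameters().get("blobPath");
        if (blobPath == null || blobPath.isEmpty()) {
            return request.createResponseBuilder(HttpStatus.BAD_REQUEST)
                          .body("Missing 'blobPath' query parameter")
                          .build();
        }

        context.getLogger().info("Converting EBCDIC blob: " + blobPath);

        // Delegate to the existing JRecord-based conversion code (hypothetical class).
        // String asciiBlobPath = EbcdicConverter.convertToAscii(blobPath);

        return request.createResponseBuilder(HttpStatus.OK)
                      .body("Converted: " + blobPath)
                      .build();
    }
}
```

In the pipeline, the Azure Function activity (or a Web activity pointing at the function URL) can pass the blob path from a pipeline parameter, and the downstream transformation activities can be chained after it.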

Related

Azure Data Factory: how to convert CSV to PDF?

Is it possible to convert CSV files into PDF using Azure Data Factory or any other Azure technology?
Converting CSV (or any other file type) to PDF is not supported in Azure Data Factory.
You can use Azure Logic Apps or Azure Functions and Microsoft Graph to convert a file to PDF.
Note: you can call an API or execute Azure Functions from an Azure Data Factory pipeline.
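As a rough illustration of the Graph route, here is a minimal Java sketch of the download-as-PDF call that an Azure Function (or any caller) could make. The drive id, item id, and access token are placeholders, and the CSV is assumed to already sit in a drive Graph can reach (OneDrive or SharePoint).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class CsvToPdf {
    public static void main(String[] args) throws IOException, InterruptedException {
        String driveId = "<drive-id>";       // placeholder
        String itemId = "<item-id>";         // placeholder: driveItem id of the CSV file
        String accessToken = "<token>";      // placeholder: bearer token from Azure AD

        // Graph can return a PDF rendition of a supported file via ?format=pdf
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://graph.microsoft.com/v1.0/drives/" + driveId
                        + "/items/" + itemId + "/content?format=pdf"))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();

        // Follow the redirect Graph issues to the pre-authenticated download URL
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        HttpResponse<Path> response = client.send(request,
                HttpResponse.BodyHandlers.ofFile(Path.of("output.pdf")));
        System.out.println("HTTP " + response.statusCode() + " -> " + response.body());
    }
}
```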

Microsoft SharePoint and Snowflake integrations and automations

I'm trying to integrate a SharePoint site with Snowflake: read the data from SharePoint files and upload it to Snowflake.
We are not looking to use a separate integration tool.
One option I can think of is loading these files into an external stage object (Blob Storage in the case of Azure, or an S3 bucket in the case of AWS) and then using Snowpipe to continuously load these files into Snowflake. Here is a link you can refer to for more about Snowpipe:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe.html
The other option I can think of is writing .NET code in SharePoint to read the data files and using the Snowflake connector/driver to connect to Snowflake and load the data into it.
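As a rough illustration of the second option (the answer mentions .NET, but the same idea works from Java), here is a sketch using the Snowflake JDBC driver to run a COPY INTO from a hypothetical Azure external stage; the account, credentials, warehouse, stage, and table names are all placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class SnowflakeLoad {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "<user>");            // placeholder credentials
        props.put("password", "<password>");
        props.put("warehouse", "MY_WH");        // placeholder warehouse
        props.put("db", "MY_DB");               // placeholder database
        props.put("schema", "PUBLIC");

        // Snowflake JDBC URL; <account> is a placeholder account identifier
        String url = "jdbc:snowflake://<account>.snowflakecomputing.com/";

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {
            // Load files that were landed in an external stage (e.g. Azure Blob) from SharePoint
            stmt.execute(
                "COPY INTO sharepoint_data "          // placeholder target table
                + "FROM @my_azure_stage "             // placeholder external stage
                + "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)");
        }
    }
}
```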

Upload to an Azure Storage container from a file server using Azure Databricks

I want to upload binary files from a Windows file system to Azure Blob Storage. I achieved it with Azure Data Factory using the steps below:
Installed a self-hosted integration runtime on the file server
Created a File System linked service in ADF
Created a binary dataset with the above linked service
Used a Copy Data activity in an ADF pipeline, with the binary dataset as source and Azure Blob as sink
Post upload, I am performing some ETL activities, so my ADF pipeline has two components:
Copy Data
Databricks Notebook
I am wondering if I could move the Copy Data fragment to Databricks?
Can we upload binary files from Windows FileSystem to Azure blob using Azure Databricks?
I think it is possible, but you may have to make network changes; see:
https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network
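If the file server is reachable from the Databricks cluster after those network changes, a JVM job or notebook could copy the files with the Azure Blob Storage SDK. A minimal sketch, assuming the azure-storage-blob v12 library is attached to the cluster and the source directory is a network path the cluster can read (connection string, container, and paths are placeholders):

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

import java.io.File;

public class UploadBinaries {
    public static void main(String[] args) {
        // Placeholders: storage connection string, container, and a source directory
        // that the Databricks cluster can reach over the network.
        String connectionString = "<storage-connection-string>";
        String containerName = "ingest";
        File sourceDir = new File("/mnt/fileserver/binaries");

        BlobServiceClient service = new BlobServiceClientBuilder()
                .connectionString(connectionString)
                .buildClient();
        BlobContainerClient container = service.getBlobContainerClient(containerName);

        File[] files = sourceDir.listFiles();
        if (files == null) {
            throw new IllegalStateException("Source directory not reachable: " + sourceDir);
        }
        for (File f : files) {
            if (f.isFile()) {
                BlobClient blob = container.getBlobClient(f.getName());
                // Upload each binary file, overwriting any existing blob of the same name
                blob.uploadFromFile(f.getAbsolutePath(), true);
            }
        }
    }
}
```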

Generating and storing JSON files from the run-time parameters passed to an Azure Data Factory v2 pipeline?

Can we create a file (preferably JSON) and store it in one of the supported storage sinks (like Blob, Azure Data Lake Store, etc.) using the parameters that are passed to an Azure Data Factory v2 pipeline at run time? I suppose it can be done via Azure Batch, but that seems like overkill for such a trivial task. Is there a better way to do it?
Here are all the transformation activities ADFv2 currently supports; I'm afraid there isn't a direct way to create a file in ADFv2. You could leverage a Custom activity to achieve this by running your customized code logic on an Azure Batch pool of virtual machines. Hope it helps a little.
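As a rough sketch of what the Custom activity's code could do, here is a small Java program that takes the run-time values as command-line arguments in the activity's command (extendedProperties is another way to surface them), builds a JSON document, and writes it to Blob Storage. The connection string, container, and field names are placeholders.

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class WriteRunParameters {
    public static void main(String[] args) {
        // Expecting the pipeline to pass values on the command line, e.g.:
        //   java WriteRunParameters <runId> <sourceSystem> <loadDate>
        String runId = args[0];
        String sourceSystem = args[1];
        String loadDate = args[2];

        // Build a small JSON payload from the run-time parameters (field names are illustrative)
        String json = String.format(
                "{\"runId\":\"%s\",\"sourceSystem\":\"%s\",\"loadDate\":\"%s\"}",
                runId, sourceSystem, loadDate);
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);

        // Placeholders: storage connection string and target container/blob name
        BlobClient blob = new BlobServiceClientBuilder()
                .connectionString("<storage-connection-string>")
                .buildClient()
                .getBlobContainerClient("pipeline-metadata")
                .getBlobClient("run-" + runId + ".json");

        // Overwrite if a file for this run already exists
        blob.upload(new ByteArrayInputStream(bytes), bytes.length, true);
    }
}
```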

Logic Apps - Get Blob Content Using Path

I have an event-driven Logic App (blob event) which reads a block blob using the path and uploads the content to Azure Data Lake. I noticed the Logic App is failing with 413 (RequestEntityTooLarge) when reading a large file (~6 GB). I understand that Logic Apps has a limitation of 1024 MB - https://learn.microsoft.com/en-us/connectors/azureblob/ - but is there any workaround to handle this type of situation? The alternative solution I am working on is moving this step to an Azure Function and getting the content from the blob there. Thanks for your suggestions!
If you want to use an Azure Function, I would suggest you have a look at this article:
Copy data from Azure Storage Blobs to Data Lake Store
There is a standalone version of the AdlCopy tool that you can deploy to your Azure Function.
So your Logic App will call this function, which will run a command to copy the file from Blob Storage to your Data Lake Store. I would suggest you use a PowerShell function.
Another option would be to use Azure Data Factory to copy file to Data Lake:
Copy data to or from Azure Data Lake Store by using Azure Data Factory
You can create a job that copies the file from Blob Storage:
Copy data to or from Azure Blob storage by using Azure Data Factory
There is a connector to trigger a Data Factory run from a Logic App, so you may not need an Azure Function, but it seems there are still some limitations:
Trigger Azure Data Factory Pipeline from Logic App w/ Parameter
You should consider using the Azure Files connector: https://learn.microsoft.com/en-us/connectors/azurefile/
It is currently in preview; the advantage it has over Blob is that it doesn't have a size limit. The link above includes more information about it.
For the benefit of others who might be looking for a solution of this sort:
I ended up creating an Azure Function in C#, as my design dynamically parses the blob name and creates the ADL folder structure based on it. I used chunked memory streaming for reading the blob and writing it to ADL, with multithreading to address the Azure Functions timeout of 10 minutes.
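For anyone wanting a concrete starting point, here is a hedged Java sketch of the same chunked-copy idea (the answer above used C#), assuming the azure-storage-blob and azure-storage-file-datalake v12 SDKs and an ADLS Gen2 destination. Endpoints, credentials, paths, and the chunk size are placeholders, and the multithreading mentioned above is omitted for brevity.

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.file.datalake.DataLakeFileClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedBlobToAdlsCopy {

    private static final int CHUNK_SIZE = 4 * 1024 * 1024; // 4 MB per append (illustrative)

    public static void main(String[] args) throws IOException {
        // Placeholders: source blob and destination Data Lake file
        BlobClient source = new BlobServiceClientBuilder()
                .connectionString("<blob-connection-string>")
                .buildClient()
                .getBlobContainerClient("incoming")
                .getBlobClient("large-file.bin");

        DataLakeFileClient target = new DataLakeServiceClientBuilder()
                .connectionString("<adls-connection-string>")
                .buildClient()
                .getFileSystemClient("curated")
                .getFileClient("landing/large-file.bin");
        target.create(true); // overwrite if the file already exists

        long offset = 0;
        byte[] buffer = new byte[CHUNK_SIZE];
        try (InputStream in = source.openInputStream()) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Append each chunk at the current offset, then advance
                target.append(new ByteArrayInputStream(buffer, 0, read), offset, read);
                offset += read;
            }
        }
        // Commit all appended data
        target.flush(offset, true);
    }
}
```

Reading and writing in fixed-size chunks keeps memory use flat regardless of blob size, which is the point of the chunked-streaming approach described above.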
