How to transfer CSV files from Google Cloud Storage to Azure Data Lake Store - azure

I'd like to have our daily CSV log files transferred from GCS to Azure Data Lake Store, but I can't really figure out the easiest way to do it.
Is there a built-in solution for that?
Can I do that with Data Factory?
I'd rather avoid running a scheduled VM that does this through the APIs. The idea comes from the GCS->(Dataflow->)BigQuery solution.
Thanks for any ideas!

Yes, you can move data from Google Cloud Storage to Azure Data Lake Store using Azure Data Factory by developing a custom copy activity. However, in that activity you will be using APIs to transfer the data. See the details in this article.
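For illustration, here is a minimal Python sketch of the API-based transfer that such a custom activity (or any small scheduled job) would perform, using the google-cloud-storage and azure-datalake-store SDKs. The bucket, store, path and credential values are placeholders, not anything prescribed by the article.

# Sketch only: copy one CSV object from Google Cloud Storage to Azure Data Lake Store (Gen1).
# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a GCP service account key and that
# an Azure AD service principal has access to the Data Lake Store account.
from google.cloud import storage
from azure.datalake.store import core, lib

GCS_BUCKET = "my-log-bucket"      # placeholder
ADLS_STORE = "mydatalakestore"    # placeholder (account name only, no domain suffix)

def copy_csv(blob_name: str) -> None:
    # Download the object from Google Cloud Storage into memory.
    gcs = storage.Client()
    data = gcs.bucket(GCS_BUCKET).blob(blob_name).download_as_bytes()

    # Authenticate to Azure Data Lake Store with a service principal and write the bytes.
    token = lib.auth(tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>")
    adl = core.AzureDLFileSystem(token, store_name=ADLS_STORE)
    with adl.open(f"/logs/{blob_name}", "wb") as f:
        f.write(data)

copy_csv("2018-01-01.csv")

Holding the whole object in memory is fine for daily CSV logs; for very large files you would stream in chunks instead.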

Related

Any good alternative to AzCopy for data movement?

I have hundreds of TB of data to move from S3 to Blob storage. Is there a good alternative to AzCopy? AzCopy uses a lot of bandwidth and has high CPU usage, so I don't want to use it. These issues still occur in AzCopy v10 even after applying the recommended parameters. Can someone help me in this regard? I did some research on it but could not find an alternative.
I agree with #S RATH.
For moving big data, Data Factory is the best alternative to AzCopy. It has better copy performance:
Data Factory supports Amazon S3 and Blob Storage as connectors.
With the Copy activity, you can use Amazon S3 as the source dataset and Blob Storage as the sink dataset.
See these tutorials:
Copy data from Amazon Simple Storage Service by using Azure Data Factory: This article outlines how to copy data from Amazon Simple Storage Service (Amazon S3). To learn about Azure Data Factory, read the introductory article.
Copy and transform data in Azure Blob storage by using Azure Data Factory: This article outlines how to use the Copy activity in Azure Data Factory to copy data from and to Azure Blob storage. It also describes how to use the Data Flow activity to transform data in Azure Blob storage. To learn about Azure Data Factory, read the introductory article.
Data Factory also provides many ways to improve copy performance; see the Copy activity performance and scalability guide.
I think it will save you a lot of time, and as we know, time is money.
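If you want to kick off such a Data Factory copy pipeline on your own schedule, a minimal Python sketch with the azure-mgmt-datafactory SDK could look like the following. The resource group, factory and pipeline names are placeholders, and the S3-to-Blob copy pipeline itself is assumed to have already been authored in the ADF UI.

import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "my-rg"             # placeholder
FACTORY_NAME = "my-data-factory"     # placeholder
PIPELINE_NAME = "CopyS3ToBlob"       # placeholder: an existing S3 -> Blob copy pipeline

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Start the pipeline run.
run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME)

# Poll until the copy finishes.
while True:
    status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    if status.status not in ("Queued", "InProgress"):
        print("Pipeline finished with status:", status.status)
        break
    time.sleep(30)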

Azure Lake to Lake transfer of files

My company has two Azure environments. The first one was a temporary environment and is being re-purposed / decommissioned / I'm not sure. All I know is I need to get files from a Data Lake in one environment to a Data Lake in the other. I've looked at AdlCopy and AzCopy and neither seems like it will do what I need done. Has anyone encountered this before, and if so, what did you use to solve it?
Consider Azure Data Factory; it can help you transfer files or data from one Azure Data Lake to another.
You can reference Copy data to or from Azure Data Lake Storage Gen2 using Azure Data Factory.
This article outlines how to use Copy Activity in Azure Data Factory to copy data to and from Data Lake Storage Gen2. It builds on the Copy Activity overview article that presents a general overview of Copy Activity.
For example, you can learn from this tutorial: Quickstart: Use the Copy Data tool to copy data.
In this quickstart, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from a folder in Azure Blob storage to another folder.
Hope this helps.
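If a full Data Factory setup feels heavy for a one-off migration, a small script against the azure-storage-file-datalake SDK can also do a file-by-file copy between the two lakes. This is a sketch for Gen2 accounts only (a Gen1 store would need the azure-datalake-store package instead), and it assumes one service principal has access to both environments; account URLs, file system and path names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Sketch only: copy a single file from a Data Lake Storage Gen2 account in one
# environment to a Gen2 account in another. All names are placeholders.
cred = DefaultAzureCredential()
source = DataLakeServiceClient("https://sourcelake.dfs.core.windows.net", credential=cred)
target = DataLakeServiceClient("https://targetlake.dfs.core.windows.net", credential=cred)

def copy_file(file_system: str, path: str) -> None:
    # Read the file from the source lake.
    src = source.get_file_system_client(file_system).get_file_client(path)
    data = src.download_file().readall()

    # Write it to the same path in the target lake.
    dst = target.get_file_system_client(file_system).get_file_client(path)
    dst.upload_data(data, overwrite=True)

copy_file("raw", "logs/2020/01/01/events.csv")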

Logic Apps - Get Blob Content Using Path

I have an event-driven Logic App (blob event) which reads a block blob using the path and uploads the content to Azure Data Lake. I noticed the Logic App is failing with 413 (RequestEntityTooLarge) when reading a large file (~6 GB). I understand that Logic Apps has a limitation of 1024 MB - https://learn.microsoft.com/en-us/connectors/azureblob/ - but is there any workaround to handle this type of situation? The alternative solution I am working on is moving this step to an Azure Function and getting the content from the blob there. Thanks for your suggestions!
If you want to use an Azure Function, I would suggest you have a look at this article:
Copy data from Azure Storage Blobs to Data Lake Store
There is a standalone version of the AdlCopy tool that you can deploy to your Azure function.
So your Logic App will call this function, which will run a command to copy the file from Blob storage to your Data Lake Store. I would suggest using a PowerShell function.
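As a rough illustration of that approach (sketched in Python rather than PowerShell), the function essentially shells out to the standalone AdlCopy executable. It assumes AdlCopy.exe has been deployed alongside the function code on a Windows plan; the paths, account names and key are placeholders.

import subprocess

def copy_blob_to_lake(container: str, blob_path: str) -> None:
    # Sketch only: run the standalone AdlCopy tool to copy one blob into Data Lake Store.
    source_url = f"https://mystorageaccount.blob.core.windows.net/{container}/{blob_path}"
    dest_url = "swebhdfs://mydatalakestore.azuredatalakestore.net/landing/"
    subprocess.run(
        [
            r"D:\home\site\wwwroot\tools\AdlCopy.exe",  # wherever the exe was deployed
            "/source", source_url,
            "/dest", dest_url,
            "/sourcekey", "<storage-account-key>",
        ],
        check=True,  # raise if AdlCopy exits with a non-zero code
    )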
Another option would be to use Azure Data Factory to copy the file to Data Lake:
Copy data to or from Azure Data Lake Store by using Azure Data Factory
You can create a job that copies the file from Blob storage:
Copy data to or from Azure Blob storage by using Azure Data Factory
There is a connector to trigger a Data Factory run from a Logic App, so you may not need an Azure Function, but it seems there are still some limitations:
Trigger Azure Data Factory Pipeline from Logic App w/ Parameter
You should consider using the Azure Files connector: https://learn.microsoft.com/en-us/connectors/azurefile/
It is currently in preview; the advantage it has over Blob is that it doesn't have a size limit. The above link includes more information about it.
For the benefit of others who might be looking for a solution of this sort:
I ended up creating an Azure Function in C#, as my design dynamically parses the blob name and creates the ADL folder structure based on it. I used chunked memory streaming for reading the blob and writing it to ADL, with multithreading to address the Azure Functions timeout of 10 minutes.
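For reference, here is the same chunked-streaming idea sketched in Python rather than C#: read the blob with a streaming downloader and append each chunk to a Data Lake Storage Gen2 file, so the full 6 GB never sits in memory. Account, container and path names are placeholders, and a Gen1 store would use the azure-datalake-store package instead.

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient

# Sketch only: stream a large block blob into a Data Lake Storage Gen2 file in chunks.
cred = DefaultAzureCredential()
blobs = BlobServiceClient("https://mystorageaccount.blob.core.windows.net", credential=cred)
lake = DataLakeServiceClient("https://mydatalake.dfs.core.windows.net", credential=cred)

def stream_blob_to_lake(container: str, blob_name: str) -> None:
    blob_client = blobs.get_container_client(container).get_blob_client(blob_name)
    file_client = lake.get_file_system_client("landing").get_file_client(blob_name)
    file_client.create_file()  # start an empty file in the lake

    offset = 0
    # download_blob() returns a streaming downloader; iterate it chunk by chunk.
    for chunk in blob_client.download_blob().chunks():
        file_client.append_data(chunk, offset=offset)
        offset += len(chunk)

    file_client.flush_data(offset)  # commit the appended data so the file becomes visible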

Export Azure Application Insights log files to Azure Data Lake storage

I am a beginner with the Azure portal. I have configured Azure Application Insights on the front end (Angular 2) and the back end (ASP.NET Core).
I can track my application logs through Application Insights and also export them to an .xls sheet (http://dailydotnettips.com/2015/12/04/exporting-application-insights-data-to-excel-its-just-a-single-click/), but I need to store all my log files in Azure Data Lake storage for backup and tracking purposes,
so that I can debug issues in my application when they occur. I found https://learn.microsoft.com/en-us/azure/application-insights/app-insights-code-sample-export-sql-stream-analytics and "Can I download data collected by Azure Application Insights (events list)?", which describe continuous export to SQL and Blob storage, but I don't want an extra storage resource in Azure just to hold my data.
So is there any way to connect Application Insights to Azure Data Lake through a connector or plugin? If there is, could you please share a link?
Thank you.
Automatic
If you export the events to Azure Blob storage you can do multiple things (a scripted sketch follows this list):
Use Azure Data Factory to copy the data from Blob storage to Azure Data Lake
Use AdlCopy to copy the data from Blob storage to Azure Data Lake
Write a U-SQL job to copy the data to Azure Data Lake
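As a scripted variant of the first two options, a small Python job can walk the blobs that continuous export produces and push them into the Data Lake Store. This is only a sketch: the container, export prefix, store name and credentials are placeholders.

from azure.storage.blob import BlobServiceClient
from azure.datalake.store import core, lib

# Sketch only: copy Application Insights continuous-export blobs into Data Lake Store (Gen1).
blobs = BlobServiceClient.from_connection_string("<blob-connection-string>")
container = blobs.get_container_client("appinsights-export")

token = lib.auth(tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>")
adl = core.AzureDLFileSystem(token, store_name="mydatalakestore")

# Continuous export writes one folder per event type and date; copy everything under a prefix.
for blob in container.list_blobs(name_starts_with="myapp_key/Requests/"):
    data = container.get_blob_client(blob.name).download_blob().readall()
    with adl.open(f"/appinsights/{blob.name}", "wb") as f:
        f.write(data)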
Manual
To manually place exported Application Insights data (in .xls format) in the lake, you can use the portal to upload the file to Azure Data Lake.
If you need more control over the exported data, you can use Application Insights Analytics to create a query based on the available data and export it to an .xls file.
Of course, you can also create a small app to export the .xls file to Azure Data Lake if you do not want to upload it through the portal. You can use the API for that (see the sketch below).
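If you do go the small-app route, the upload itself is only a few lines with the azure-datalake-store SDK (Gen1); the store name, credentials and paths below are placeholders.

from azure.datalake.store import core, lib, multithread

# Sketch only: upload a locally exported .xls file to Azure Data Lake Store (Gen1).
token = lib.auth(tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>")
adl = core.AzureDLFileSystem(token, store_name="mydatalakestore")

# Upload the exported workbook into a backup folder in the lake.
multithread.ADLUploader(adl,
                        lpath="exported-events.xls",
                        rpath="/appinsights/backup/exported-events.xls",
                        overwrite=True)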

Bulk upload Excel to SQL Azure daily

I have a requirement to bulk upload data from an Excel file to an Azure SQL table on a daily basis. I did some research and found that we could create a VM, install full SQL Server, and use an SSIS package to do this.
Is there any other reliable way to go about this? The Excel file may contain up to 10,000 rows.
I have also read that we could upload the file to Blob storage and read it from there, but found it's not a very robust approach.
Can anyone suggest whether this is a feasible approach:
Place the Excel file in an Azure Website, accessed via FTP
An Azure timer job using SQL bulk copy code to update the SQL table
Any help would be highly appreciated!
You could use Azure Data Factory - check out the documentation here. Place your files in Azure Data Lake and ADF will process them.
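If you do end up prototyping the timer-job / bulk-copy route from the question instead, the core of it is only a few lines in Python with pandas and pyodbc. This is a minimal sketch: the table name, columns and connection string are placeholders, and 10,000 rows fits comfortably in a single batched insert.

import pandas as pd
import pyodbc

# Sketch only: load the daily Excel file and bulk-insert it into an Azure SQL table.
# Table/column names and the connection string are placeholders.
df = pd.read_excel("daily-upload.xlsx")  # needs the openpyxl engine installed

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Uid=myuser;Pwd=<password>;Encrypt=yes;"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # batch the parameterized inserts instead of row-by-row round trips

cursor.executemany(
    "INSERT INTO dbo.DailyData (Col1, Col2, Col3) VALUES (?, ?, ?)",
    df[["Col1", "Col2", "Col3"]].values.tolist(),
)
conn.commit()
conn.close()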
