I am trying to learn how to use Azure Data Factory to copy data (a collection of CSV files in a folder structure) from an Azure file share to a Cosmos DB instance.
In Azure Data Factory I'm creating a "Copy data" activity and trying to set my file share as the source, using the following host:
mystorageaccount.file.core.windows.net\\mystoragefilesharename
When trying to test the connection, I get the following error:
[{"code":9059,"message":"File path 'E:\\approot\\mscissstorage.file.core.windows.net\\mystoragefilesharename' is not supported. Check the configuration to make sure the path is valid."}]
Should I move the data to another storage type, like a blob, or am I not entering the correct host URL?
You'll need to specify the host as "\\\\myserver\\share" (with escaped backslashes) if you create the pipeline with JSON directly, or set the host as "\\myserver\share" if you're using the UI to set up the pipeline.
Here is more info:
https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system#sample-linked-service-and-dataset-definitions
I believe that when you created the file linked service, you may have chosen the public IR. If you choose the public IR, local paths (e.g. C:\xxx, D:\xxx) are not allowed, because the machine that runs your job is managed by us and does not contain any customer data. Please use a self-hosted IR to copy your local files.
Based on the link posted by Nicolas Zhang: https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system#sample-linked-service-and-dataset-definitions and the examples provided therein, I was able to solve it and successfully create the copy activity. I had two errors (I'm configuring via the Data Factory UI and not directly in JSON):
In the host path, the correct one should be: \\mystorageaccount.file.core.windows.net\mystoragefilesharename\myfolderpath
The username and password must be the ones corresponding to the storage account, not the actual user account I was erroneously using.
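For anyone scripting this instead of using the UI, here is a rough sketch of the same linked service (assuming the azure-mgmt-datafactory and azure-identity Python packages; the subscription, resource group, factory name and access key below are placeholders):

# Hedged sketch: create a File System (FileServer) linked service pointing at an
# Azure file share over SMB with the azure-mgmt-datafactory SDK. Subscription,
# resource group, factory name, account name and access key are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    FileServerLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Host is the UNC path to the share (optionally with a folder); the credentials
# are those of the storage account (commonly AZURE\<account name> plus the
# account access key), not a personal user account.
file_share_ls = FileServerLinkedService(
    host="\\\\mystorageaccount.file.core.windows.net\\mystoragefilesharename\\myfolderpath",
    user_id="AZURE\\mystorageaccount",
    password=SecureString(value="<storage-account-access-key>"),
)

adf_client.linked_services.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "AzureFileShareLinkedService",
    LinkedServiceResource(properties=file_share_ls),
)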
1. I have an Azure storage account.
2. Every morning, some files get pushed into a container, under a client-specific folder structure.
3. I have a function app which processes and converts these files and calls an external service to work on the processed files.
4. I also have a file share, which is mounted on a VM.
5. The external service, after processing the files (#3), generates the resulting success/failure files inside this file share (#4).
Now the ask is:
Create a simple dashboard which will monitor the storage account (and in effect the container and the file shares). It should capture and show basic information, and should look like the table structure below (with 3 simple variations of data):
FileName|ReceivedDateTime|NumberOfRecords
Original_file.csv|20221011 5:21 AM|10
Original_file_Success.csv|20221011 5:31 AM|9
Original_file_Failure.csv|20221011 5:32 AM|1
Here, the first record is captured from the container, and the second and third are both generated in the file share.
Also, whenever a new failure file is generated (i.e. Original_file_Failure), it should send an email with a predefined template, adding the file name, to a predefined recipient list.
Any guidance on the Azure service to use?
I have looked at Azure Monitor, workbooks and other options, but I feel that would be overkill for such a simple requirement.
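To make the ask concrete, the information gathering itself is small; a rough sketch of collecting the three columns (assuming Python with the azure-storage-blob and azure-storage-fileshare SDKs; the connection string, container, share and folder names are placeholders):

# Hedged sketch: build FileName | ReceivedDateTime | NumberOfRecords rows from
# the original files in the blob container and the success/failure files in the
# file share. Connection string, container, share and folder are placeholders.
from azure.storage.blob import ContainerClient
from azure.storage.fileshare import ShareClient

CONN_STR = "<storage-connection-string>"
rows = []

# Original files pushed into the container (client-specific folder structure).
container = ContainerClient.from_connection_string(CONN_STR, "mycontainer")
for blob in container.list_blobs(name_starts_with="client-a/"):
    text = container.download_blob(blob.name).readall().decode("utf-8")
    records = max(text.count("\n") - 1, 0)  # rough count, minus the header row
    rows.append((blob.name, blob.last_modified, records))

# Success/failure files written by the external service into the file share.
share = ShareClient.from_connection_string(CONN_STR, "myfileshare")
directory = share.get_directory_client("client-a")
for item in directory.list_directories_and_files():
    if item["is_directory"]:
        continue
    file_client = directory.get_file_client(item["name"])
    text = file_client.download_file().readall().decode("utf-8")
    records = max(text.count("\n") - 1, 0)
    received = file_client.get_file_properties().last_modified
    rows.append((item["name"], received, records))

print("FileName|ReceivedDateTime|NumberOfRecords")
for name, received, count in rows:
    print(f"{name}|{received:%Y%m%d %I:%M %p}|{count}")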
Thanks in advance.
I am going through Matillion Academy (Building a Data Warehouse). There is a slide deck to follow online and I am running my own instance of Matillion to recreate the building of the warehouse.
My Matillion is on Azure, as is my Snowflake database.
The training is AWS-based, but gives information about the adjustments needed for Azure or GS.
One of the steps shows how to load data from blob storage. It is S3-based.
For Azure, different components need to be used (as the S3 ones don't exist there), and data needs to be loaded from Azure storage instead of S3 storage.
It also explains that for Snowflake on Azure yet another component needs to be used.
I have created a Stage in Snowflake:
CREATE STAGE "onlinemtlntrainingazure_flights"
URL='azure://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights'
The stage shows in Snowflake (as an external stage) and in Matillion (when using 'manage stages' on the database). The code is taken from the JSON file I imported to create the job that does this (see the first step below).
I have created the target table in my database. It is accessible and visible in Matillion IDE.
The adjusted component I am to use is 'Azure Blob Storage Load'.
According to the documentation, I will need:
For Snowflake on Azure:
Create a Stage in Snowflake:
You should create a Stage in Snowflake which will point to the
public data we provide. Please find below the .json file containing
the job that will help you to do this. Don't forget to change the SQL
script to point to your own schema.
After Creating the Stage in Snowflake:
You should use the 'Create Table' and the 'Azure Blob Storage Load'
components individually, as the 'Azure Blob Load Generator' won't let
you select the Stage previously created. We have attached the
Create Table metadata below to save you some time.
'Azure Blob Storage Load' Settings:
Stage: onlinemtlntrainingazure_flights
Pattern: training_azure_flights_2016.gz
Target Table: training_flights
Record Delimiter: 0x0a
Skip Header: 1
The source data on Azure is located here:
Azure Blob Container (with flights data)
https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2016.gz
https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2017.gz
https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2018.gz
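For reference, those component settings map roughly onto a plain Snowflake COPY INTO from the external stage; a hedged sketch of the equivalent for the first file (using snowflake-connector-python; the connection details are placeholders):

# Hedged sketch: the COPY INTO that the 'Azure Blob Storage Load' settings above
# roughly correspond to, run via snowflake-connector-python. All connection
# parameters are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account-identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

copy_sql = """
    COPY INTO training_flights
    FROM @onlinemtlntrainingazure_flights
    PATTERN = '.*training_azure_flights_2016[.]gz'
    FILE_FORMAT = (
        TYPE = CSV
        SKIP_HEADER = 1
        RECORD_DELIMITER = '0x0a'
        COMPRESSION = GZIP
    )
"""

with conn.cursor() as cur:
    cur.execute(copy_sql)
    print(cur.fetchall())  # per-file load status

conn.close()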
Unfortunately, when using these settings on the 'Azure Blob Storage Load' component, it complains:
The stage does not appear in the list, and manually inputting the stage name yields an error (unrecognised option). Prefixing the stage name with my schema (and even the database) does not help.
The Azure storage location property does not accept the https://... URI to the data files. When I replace 'https' with 'azure', or remove the part after the last '/', it complains with 'Unable to find an account with name: [onlinemtlntrainingazure]'.
Using [Custom] for the Stage property removes the error message, but when running the component it comes back again with 'Unable to find account'.
Any thoughts?
Edit: I found a workaround by using the Data Transfer component, which first copies the files from the public https location to my own Azure blob location, and then I process them further from there. But I would like to know how to do it as suggested in the training, and why it currently fails.
The example files are in a storage account that your Azure Blob Storage Load Generator can not read from. But instead of using a Snowflake Stage, you might find it easier to just copy the files into a storage account that you do own, and then use the Azure Blob Storage Load Generator on the copied files.
In a Matillion ETL instance on Azure, you can access files over https and copy them into your own storage account using a Data Transfer component.
You already have the https:// source URLs for the three files, so:
Set the source type to HTTPS (no username or password is needed)
Add the source URL
Set the target type to Azure Blob Storage
In the example I used two variables, with defaults set to my storage account and container name
Repeat for all three files
After running the Data Transfer three times, you will then be able to proceed with the Azure Blob Storage Load Generator, reading from your own copies of the files.
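If you would rather script that copy than run the Data Transfer component three times by hand, a rough equivalent (a sketch assuming Python with the requests and azure-storage-blob packages; the connection string and target container name are placeholders) looks like this:

# Hedged sketch of the copy the Data Transfer component performs: download each
# public file over HTTPS and upload it into your own container. The connection
# string and target container name are placeholders.
import requests
from azure.storage.blob import BlobServiceClient

SOURCE_URLS = [
    "https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2016.gz",
    "https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2017.gz",
    "https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2018.gz",
]

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
container = service.get_container_client("my-training-flights")

for url in SOURCE_URLS:
    name = url.rsplit("/", 1)[-1]
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    # Overwrite if the blob already exists, e.g. when re-running the copy.
    container.upload_blob(name=name, data=response.content, overwrite=True)
    print(f"copied {name} ({len(response.content)} bytes)")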
My package is very simple. It is loading data from a CSV file that I have stored in an Azure storage container, and inserting that data into an Azure SQL database. The issue stems from the connection to my Azure storage container. Here is an image of the output:
Making this even more odd, while the data flow task is failing, the individual components within the data flow task all indicate success:
Setting up the package, it seems that the connection to the container is fine (after all, it was able to extract all the column names from the desired file and map them to their destination). Here is an image showing the connection is fine:
So the issue is only realized upon execution.
I will also note that I found this post, which describes the exact same issue that I am now experiencing. As the top response there instructed, I added the new registry keys, but no cigar.
Any thoughts would be helpful.
First, make sure your blob can be accessed publicly:
And if you don't have a requirement to restrict networking, please make sure of the following:
Then set the container access level:
And make sure the container name is correct.
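If you prefer checking those settings from code rather than the portal, here is a small sketch (assuming the azure-storage-blob Python SDK; the connection string and container name are placeholders, and only set public access if anonymous reads are acceptable for you):

# Hedged sketch: inspect and, if needed, set the container's public access level
# with azure-storage-blob. Connection string and container name are placeholders.
from azure.storage.blob import ContainerClient, PublicAccess

container = ContainerClient.from_connection_string(
    "<storage-connection-string>", "<container-name>"
)

policy = container.get_container_access_policy()
print("current public access:", policy.get("public_access"))  # None, 'blob' or 'container'

# Allow anonymous read access to the blobs (but not listing the container).
container.set_container_access_policy(signed_identifiers={}, public_access=PublicAccess.Blob)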
I am trying to load a flat file to blob storage using ADF V2. I have installed the self-hosted integration runtime for this. The integration runtime on my local machine shows that it is successfully connected to the cloud service, as in the screenshot below. However, while creating the linked service to the on-prem file, some credentials are required. I am not sure what username or password should be entered. I have tried both on-prem and Azure passwords (wanted to try). Please see the screenshot.
Could you please guide me on how the connection can be made to a local flat file in my case?
Thanks
- Akshay
Note: You can choose a file while creating a File System source in ADF.
You may follow these steps to select the text file when creating a File System source:
First create a linked service as follows:
Host: C:\AzureLearn\
Create a copy activity and select Source as follows:
Click on Source => New
Select New Dataset => Select File => File System and continue
Select Format => Choose DelimitedText and continue
Select the previously created File System linked service and click on Browse.
Choose a file or folder.
Here you can find the files located under the folder selected earlier when creating the File System linked service.
Hope this helps.
The linked service connections to Blob storage or Azure SQL Server are being blocked by my organisation's firewall. It won't let my self-hosted integration runtime connect my resources to the public cloud.
I followed the same steps on my personal machine and everything went smoothly. I will get the firewall restrictions sorted and update this thread with more information.
I want to create the following folder structure on Azure:
mycontainer
-images
--2007
---img001.jpg
---img002.jpg
Now, one way is to use a Put Blob request and upload img001.jpg, specifying the whole path as
PUT "mycontainer/images/2007/img001.jpg"
But I want to first create the folders images and 2007 and then in a different request upload the blob img001.jpg.
Right now, when I tried doing this using a Put Blob request:
StringToSign:
PUT
x-ms-blob-type:BlockBlob
x-ms-date:Tue, 07 Feb 2017 23:35:12 GMT
x-ms-version:2016-05-31
/account/mycontainer/images/
HTTP URL
sun.net.www.protocol.http.HttpURLConnection:http://account.blob.core.windows.net/mycontainer/images/
It is creating a folder, but it's not empty. By default, it's creating an empty blob file without a name.
Now, a lot of people say we can't create an empty folder. But then how come we can create one using the Azure portal? The browser must be sending some type of REST request to create the folder.
I think it has something to do with the Content-Type, i.e. x-ms-blob-content-type, which should be specified in order to tell Azure that it's a folder, not a blob.
But I am confused.
I want to first create the folders images and 2007 and then in a different request upload the blob img001.jpg
I agree with Brendan Green: currently, Azure Blob storage just enables us to create a virtual directory structure by naming blobs with path information in their names.
I think it has something to do with the Content-Type, i.e. x-ms-blob-content-type, which should be specified in order to tell Azure that it's a folder, not a blob. But I am confused.
You could check the description of the request headers that can be set for the Put Blob operation, and you will find that it does not support creating an empty folder via any request header.
Besides, as Gaurav Mantri said, if you really want to create an empty folder structure without content, you could try Azure File storage, which also lets us access it through a REST API. The Create Directory operation can be used to create a new directory under the specified share or parent directory.
PUT https://myaccount.file.core.windows.net/myshare/myparentdirectorypath/mydirectory?restype=directory
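For example, here is a minimal sketch of the same Create Directory call through the azure-storage-file-share Python SDK (the connection string, share and directory names are placeholders):

# Hedged sketch: create an empty directory tree in an Azure file share with the
# azure-storage-file-share SDK, which wraps the Create Directory REST operation.
# Connection string, share and directory names are placeholders.
from azure.storage.fileshare import ShareClient

share = ShareClient.from_connection_string("<storage-connection-string>", "myshare")

# Each level must exist before its child is created.
share.create_directory("images")
share.get_directory_client("images").create_subdirectory("2007")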
This is not possible - the folder structure is virtual only.
See Get started with Azure Blob storage using .NET. You can only create a container, and everything else held in that container is a blob.
Excerpt:
As shown above, you can name blobs with path information in their
names. This creates a virtual directory structure that you can
organize and traverse as you would a traditional file system. Note
that the directory structure is virtual only - the only resources
available in Blob storage are containers and blobs.
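As a concrete illustration, uploading a blob whose name contains the path is all it takes for the 'folders' to appear; here is a small sketch with the azure-storage-blob Python SDK (the connection string, container and file names are placeholders):

# Hedged sketch: there is no separate "create folder" call in Blob storage;
# uploading a blob named with the full path creates the virtual directories.
# Connection string, container and file names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("mycontainer")

with open("img001.jpg", "rb") as data:
    container.upload_blob(name="images/2007/img001.jpg", data=data)

# The portal and storage explorers then show 'images' and '2007' as folders, but
# deleting the last blob underneath them makes the folders disappear as well.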