"Azure Blob Source 400 Bad Request" when using Azure Blob Source in SSIS to pull a file from Azure Storage container - azure

My package is very simple: it loads data from a CSV file stored in an Azure Storage container and inserts that data into an Azure SQL database. The issue stems from the connection to my Azure Storage container. Here is an image of the output:
Making this even more odd, while the data flow task is failing:
The individual components within the data flow task all indicate success:
When setting up the package, the connection to the container seems fine (after all, it was able to extract all the column names from the desired file and map them to their destinations). Here is an image showing the connection is fine:
So the issue only surfaces at execution time.
I will also note that I found this post describing the exact same issue I'm now experiencing. As the top response there instructed, I added the new registry keys, but no cigar.
Any thoughts would be helpful.

First, make sure your blob can be accessed publicly:
And if you don't have a requirement to restrict networking, make sure of the following:
Then set the container access level:
And make sure the container name is correct.
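If you want to double-check the container's access level outside SSIS before rerunning the package, a quick anonymous read is a simple test. Below is a minimal Python sketch using the azure-storage-blob package; the account, container, and file names are placeholders, and it only succeeds if the public access level described above is actually in effect.

# Minimal sketch: verify anonymous (public) access to the container and blob.
# Requires: pip install azure-storage-blob
# "mystorageaccount", "mycontainer", and "myfile.csv" are placeholder names.
from azure.storage.blob import ContainerClient

# No credential is passed, so every call relies on the container's public access level.
container = ContainerClient.from_container_url(
    "https://mystorageaccount.blob.core.windows.net/mycontainer"
)

# Listing blobs requires "Container"-level public access.
for blob in container.list_blobs():
    print(blob.name)

# Reading a single blob only requires "Blob"-level public access.
data = container.download_blob("myfile.csv").readall()
print(len(data), "bytes downloaded")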

Related

Azure Stream Analytics job path pattern for ApplicationInsights data in storage container

My goal is to import telemetry data from an ApplicationInsights resource into a SQL Azure database.
To do so, I enabled allLogs and AllMetrics in the Diagnostic settings of the ApplicationInsights instance, and set the Destination details to "Archive to a storage account".
This works fine, and I can see data being saved to containers beneath the specified storage account, as expected. For example, page views are successfully written to the insights-logs-apppageviews container.
My understanding is that I can use StreamAnalytics job(s) from here to import these JSON files into SQL Azure by specifying a container as input, and a SQL Azure table as output.
The problems I encounter from here are twofold:
I don't know what to use for the "Path pattern" on the input resource: there are available tokens to use for {date} and {time}, but the actual observed container path is slightly different, and I'm not sure how to account for this discrepancy. For example, the {date} token expects the YYYY/MM/DD format, but that part of the observed path is of the format /y={YYYY}/m={MM}/d={DD}. If I try to use the {date} token, nothing is found. As far as I can tell, there doesn't appear to be any way to customize this.
For proof-of-concept purposes, I resorted to using a hard-coded container path with some data in it. With this, I was able to set the output to a SQL Azure table, check that there were no schema errors between input and output, and after starting the job, I do see the first batch of data loaded into the table. However, if I perform additional actions to generate more telemetry, I see the JSON files updating in the storage containers, yet no additional data is written to the table. No errors appear in the Activity Log of the job to explain why the updated telemetry data is not being picked up.
What settings do I need to use in order for the job to run continuously/dynamically and update the database as expected?

Load data from public Azure blob in Matillion

I am going through Matillion Academy (Building a Data Warehouse). There is a slide deck to follow online and I am running my own instance of Matillion to recreate the building of the warehouse.
My Matillion is on Azure, as is my Snowflake database.
The training is AWS-based, but gives information about the adjustments needed for Azure or GS.
One of the steps shows how to load data from blob storage; the training version is S3-based.
For Azure, different components need to be used (as the S3 ones don't exist there), and data needs to be loaded from Azure Storage instead of S3.
It also explains that for Snowflake on Azure yet another component needs to be used.
I have created a Stage in Snowflake:
CREATE STAGE "onlinemtlntrainingazure_flights"
URL='azure://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights'
The stage shows in Snowflake (external stage) and in Matillion (when using 'manage stages' on the database). The code is taken from the json file I imported to create the job to do this (see first step below).
I have created the target table in my database. It is accessible and visible in Matillion IDE.
The adjusted component I am to use is 'Azure Blob Storage Load'.
According to the documentation, I will need:
For Snowflake on Azure:
Create a Stage in Snowflake:
You should create a Stage in Snowflake which will point to the public data we provide. Please find below the .json file containing the job that will help you to do this. Don't forget to change the SQL script to point to your own schema.
After Creating the Stage in Snowflake:
You should use the 'Create Table' and the 'Azure Blob Storage Load' components individually, as the 'Azure Blob Load Generator' won't let you select the Stage previously created. We have attached below the Create Table metadata to save you some time.
'Azure Blob Storage Load' Settings:
Stage: onlinemtlntrainingazure_flights
Pattern: training_azure_flights_2016.gz
Target Table: training_flights
Record Delimiter: 0x0a
Skip Header: 1
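For context, these settings map closely onto a Snowflake COPY INTO statement run against the external stage. The sketch below, using snowflake-connector-python with placeholder connection details, shows a roughly equivalent manual load for one of the files; it is an illustration of the idea, not the exact SQL Matillion generates.

# Rough, hypothetical equivalent of the 'Azure Blob Storage Load' settings above,
# run directly through snowflake-connector-python. All connection values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_warehouse",
    database="my_database",
    schema="my_schema",
)

copy_sql = """
COPY INTO training_flights
FROM @onlinemtlntrainingazure_flights
PATTERN = '.*training_azure_flights_2016[.]gz'
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 RECORD_DELIMITER = '0x0a' COMPRESSION = GZIP)
"""

try:
    conn.cursor().execute(copy_sql)
finally:
    conn.close()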
The source data on Azure is located here:
Azure Blob Container (with flights data)
https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2016.gz
https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2017.gz
https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2018.gz
Unfortunately, when using these settings on the 'Azure Blob Storage Load' component, it complains:
The stage does not appear in the list, and manually entering the stage name yields an error (unrecognised option). Prefixing the stage name with my schema (and even database) does not help.
The Azure storage location property does not accept the https://... URI to the data files. When I replace 'https' with 'azure', or remove the part after the last '/', it complains with 'Unable to find an account with name: [onlinemtlntrainingazure]'.
Using [Custom] for the stage property removes the error message, but when running the component it comes back again with the 'Unable to find account' error.
Any thoughts?
Edit: I found a workaround by using the Data Transfer Object, which first copies the files from the public https location to my own Azure blob location and then I process it further from there. But I would like to know how to do it as suggested in the training, and why it now fails.
The example files are in a storage account that your Azure Blob Storage Load Generator cannot read from. But instead of using a Snowflake Stage, you might find it easier to just copy the files into a storage account that you do own, and then use the Azure Blob Storage Load Generator on the copied files.
In a Matillion ETL instance on Azure, you can access files over https and copy them into your own storage account using a Data Transfer component.
You already have the https:// source URLs for the three files, so:
Set the source type to HTTPS (no username or password is needed)
Add the source URL
Set the target type to Azure Blob Storage
In the example I used two variables, with defaults set to my storage account and container name
Repeat for all three files
After running the Data Transfer three times, you will then be able to proceed with the Azure Blob Storage Load Generator, reading from your own copies of the files.
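If you would rather script that copy step outside Matillion, the same idea takes only a few lines of Python with requests and azure-storage-blob. This is a sketch under the assumption that you have a connection string for your own storage account; the connection string and container name below are placeholders.

# Copy the three public .gz files over HTTPS into your own blob container.
# The connection string and container name are placeholders.
import requests
from azure.storage.blob import BlobServiceClient

SOURCE_URLS = [
    "https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2016.gz",
    "https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2017.gz",
    "https://onlinemtlntrainingazure.blob.core.windows.net/online-mtln-training-azure-flights/training_azure_flights_2018.gz",
]

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
container = service.get_container_client("my-flights-container")

for url in SOURCE_URLS:
    name = url.rsplit("/", 1)[-1]
    resp = requests.get(url)
    resp.raise_for_status()
    # Upload the raw .gz bytes unchanged; overwrite if the blob already exists.
    container.upload_blob(name=name, data=resp.content, overwrite=True)
    print("copied", name)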

Azure Data Explorer oneclick Ingest from blob container (UI)

I'm trying to configure and use the Azure Data Explorer OneClick Ingest from blob container (continuous ingestion).
Whatever I try, the URL is never accepted; I always end up with this error:
Invalid URL. Either the URL leads to a blob instead of a container, or the permissions are incorrect. If you just grant permission, please wait couple of minutes and try again.
The URL I'm using follows this pattern:
https://mystorageaccount.blob.core.windows.net/mycontainer?sp=rl&st=2022-04-26T22:01:42Z&se=2032-04-27T06:01:42Z&spr=https&sv=2020-08-04&sr=c&sig=Z4Mlh7s5%2Fm1890kdfzlkYLSIHHDdGJmTSyYXVYsHdn01o%3D
I'm probably missing something, either in the URL syntax or the SAS generation.
Has anyone successfully used it? Any idea what could be wrong?
Thanks
I finally found out what the issue was.
Probably due to the security in place on my storage account, I had to create a managed private endpoint in the Azure Data Explorer Networking panel, pointing to my storage resource (and then approve that endpoint in the storage account's Networking settings):
https://learn.microsoft.com/en-us/azure/data-explorer/security-network-managed-private-endpoint-create
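As an aside on the SAS half of the question: a container-scoped SAS with read and list permissions, like the sp=rl URL shown above, can also be generated programmatically. Here is a minimal sketch with azure-storage-blob, using placeholder account, key, and container names; in this case, though, it was the networking change that resolved the error.

# Generate a container-level SAS with Read + List permissions (placeholder names/key).
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

sas = generate_container_sas(
    account_name="mystorageaccount",
    container_name="mycontainer",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)

# Note: no blob name in the path -- the URL must point at the container itself.
print(f"https://mystorageaccount.blob.core.windows.net/mycontainer?{sas}")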

Azure form recognizer app invalid resource name

I'm trying to deploy an instance of the Form Recognizer app in Azure. For that I'm following the instructions in the documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/deploy-label-tool
I have created the Docker instance and the connection, but the step to create the app is failing.
These are the parameters I'm using:
Display Name: Test-form
Source Connection: <previously created connection>
Folder Path: None
Form Recognizer Service Uri: https://XXX-test.cognitiveservices.azure.com/
API Key: XXXXX
Description: None
And this is the error I'm getting:
I had the same error. It turned out to be due to incorrect SAS URI formatting because I generated and copied the SAS token via the Storage Accounts interface. It's much easier to get the correct format for the SAS URI if you generate it through the Storage Explorer (currently in Preview) as opposed to through the Storage Accounts.
If you read the documentation carefully, it gives you a step-by-step guide:
"To retrieve the SAS URL, open the Microsoft Azure Storage Explorer, right-click your container, and select Get shared access signature. Set the expiry time to some time after you'll have used the service. Make sure the Read, Write, Delete, and List permissions are checked, and click Create. Then copy the value in the URL section. It should have the form: https://.blob.core.windows.net/?"
Form Recognizer Documentation
The error messages point to a configuration issue with the AzureBlobStorageTemplate Thing. Most likely the containerName field for the Blob Storage Thing is empty or contains invalid characters.
Ensure the containerName is a valid Azure storage container name.
Check https://learn.microsoft.com/en-us/rest/api/storageservices/Naming-and-Referencing-Containers--Blobs--and-Metadata for more information.
A container name must be a valid DNS name.
The Connector loads and caches all configuration settings during startup. Any changes that you make to the configuration when troubleshooting are ignored until the Connector is restarted.
When creating the container connection, you must add the container into the SAS URI, such as:
https://<storage-account>.blob.core.windows.net/<Enter-My-Container-Here>?<SAS Key>
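Since the container-name rules come up repeatedly here, the small sketch below spells them out in code: 3 to 63 characters, lowercase letters, digits, and hyphens only, starting and ending with a letter or digit, with no consecutive hyphens. The helper function is purely illustrative and not part of any SDK.

# Hypothetical helper that checks a name against Azure's container-naming rules.
import re

def is_valid_container_name(name: str) -> bool:
    if not 3 <= len(name) <= 63:
        return False
    if "--" in name:  # hyphens must not be consecutive
        return False
    # lowercase letters, digits, and hyphens; must start and end with a letter or digit
    return re.fullmatch(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", name) is not None

print(is_valid_container_name("my-container"))   # True
print(is_valid_container_name("My_Container"))   # False (uppercase and underscore)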
You can also directly use the open-source labeling tool; please see the section further down in the doc:
The OCR Form Labeling Tool is also available as an open-source project on GitHub. The tool is a web application built using React + Redux, and is written in TypeScript. To learn more or contribute, see OCR Form Labeling Tool.

Copying files in fileshare with Azure Data Factory configuration problem

I am trying to learn to use Azure Data Factory to copy data (a collection of CSV files in a folder structure) from an Azure file share to a Cosmos DB instance.
In Azure Data Factory I'm creating a "Copy data" activity and trying to set my file share as the source using the following host:
mystorageaccount.file.core.windows.net\\mystoragefilesharename
When trying to test the connection, I get the following error:
[{"code":9059,"message":"File path 'E:\\approot\\mscissstorage.file.core.windows.net\\mystoragefilesharename' is not supported. Check the configuration to make sure the path is valid."}]
Should I move the data to another storage type, like a blob, or am I not entering the correct host URL?
You'll need to specify the host in the JSON file like this: "\\myserver\share" if you create the pipeline with JSON directly, or set the host URL like this: "\myserver\share" if you're using the UI to set up the pipeline.
Here is more info:
https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system#sample-linked-service-and-dataset-definitions
I believe when you created the file linked service, you might have chosen the public IR. If you choose the public IR, a local path (e.g. C:\xxx, D:\xxx) is not allowed, because the machine that runs your job is managed by us and does not contain any customer data. Please use a self-hosted IR to copy your local files.
Based on the link posted by Nicolas Zhang: https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system#sample-linked-service-and-dataset-definitions and the examples provided therein, I was able to solve it and successfully create the copy activity. I had two errors (I'm configuring via the Data Factory UI and not the JSON directly):
In the host path, the correct one should be: \\mystorageaccount.file.core.windows.net\mystoragefilesharename\myfolderpath
The username and password must be the ones corresponding to the storage account, not to the actual user's account, which I was erroneously using.
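To double-check that second point, you can verify outside Data Factory that the storage account name and key actually reach the share. Below is a minimal sketch with the azure-storage-file-share package; the account, key, share, and folder names are placeholders.

# List the folder the Copy activity should read from, using the storage account key.
from azure.storage.fileshare import ShareClient

share = ShareClient(
    account_url="https://mystorageaccount.file.core.windows.net",
    share_name="mystoragefilesharename",
    credential="<storage-account-key>",  # the account key, not a user password
)

for item in share.get_directory_client("myfolderpath").list_directories_and_files():
    print(item["name"])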
