Matillion: Delete files from Azure Blob Storage Container and Windows Fileshare

I have a use case where I am transferring XML files from a Windows fileshare to Azure Blob Storage and then loading the data into Snowflake tables. I am using Matillion to achieve this.
The Windows fileshare receives a zipped XML file which contains .xml and .xml.chk files. I am using Matillion's Azure Blob Storage component to copy the .xml files into a Snowflake table, and have set Purge = True to delete them afterwards.
I need help deleting the leftover .xml.chk files from the Blob Storage container. Also, once the data loading is complete, I would like to delete the zipped files from the Windows fileshare.
Thanks,
Shivroopa

You can delete the files from Blob Storage using the Matillion Python Script component (Orchestration -> Scripting -> Python Script).
Here is an example of Python code to delete blob items and containers:
Delete Blob Example
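For instance, here is a minimal sketch along those lines using the azure-storage-blob SDK; the connection string, container name, and the .xml.chk suffix are placeholders for your environment:

```python
from azure.storage.blob import ContainerClient

# Placeholder connection string; in Matillion you would typically pass
# this in through an environment variable rather than hard-coding it.
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

container = ContainerClient.from_connection_string(conn_str, container_name="mycontainer")

# Delete every leftover .xml.chk file in the container.
for blob in container.list_blobs():
    if blob.name.endswith(".xml.chk"):
        container.delete_blob(blob.name)
        print(f"Deleted {blob.name}")
```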
I don't see a way to delete files on the Windows machine from Matillion other than creating an API endpoint on the fileshare and calling that API from Matillion.
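If you go that route, the call from Matillion's Python Script component could be as simple as the sketch below; the URL, port, and payload are entirely hypothetical and depend on the endpoint you build:

```python
import requests

# Hypothetical cleanup endpoint hosted on the fileshare machine.
resp = requests.post(
    "http://fileshare-host:8080/cleanup",
    json={"pattern": "*.zip"},
    timeout=30,
)
resp.raise_for_status()
```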

Related

Transfer failed while trying to upload file of size 300MB in Azure Storage Explorer under my Container

My transfer failed while trying to upload a file of size 300 MB in Azure Storage Explorer, but when I created a new folder under my Container, I was able to upload successfully.
I would like to understand why it worked when creating a folder but did not work when I tried to upload directly to my Container.
Blob containers are the structures used to store blobs. An individual blob container can hold anywhere from zero to an infinite number of individual blobs. By default, all blobs stored in a container share the same level of sharing, either private or public.
I tried uploading 300 MB files in Azure Storage Explorer to blob containers that already had a few files in them, both in a standard account (hot and cool tiers) and in Data Lake Storage, and the uploads succeeded.
I first uploaded a few files to a container named "standardmode", then selected files of size 300 MB to upload into the same container, and the files uploaded successfully.
I then tried the same for Data Lake Storage and successfully uploaded files of more than 300 MB in one go.
In your case, it may be some temporary issue.
Or you may check the version of the services: each block in a block blob can be a different size, up to the maximum size permitted for the service version in use.
Reference:
Understanding-Block-Blobs--Append-Blobs--and-Page-Blobs
Or you may try uploading the files using the AzCopy command.
References for using the AzCopy command:
storage-use-azcopy-blobs-upload
Getting started with AzCopy
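If you want to script that upload, here is a minimal sketch that shells out to AzCopy from Python; it assumes azcopy is installed and on PATH, and the local path and SAS URL are placeholders:

```python
import subprocess

# Upload a large local file to a container via AzCopy.
subprocess.run(
    [
        "azcopy", "copy",
        r"C:\data\bigfile.bin",
        "https://<account>.blob.core.windows.net/<container>?<SAS-token>",
    ],
    check=True,
)
```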

Azure blob storage - duplicate file issue

I'm facing a weird issue. I have one Azure Blob storage container which consists of 2 folders, let's say folder A and folder B. If I copy files using WinSCP to folder A only, the file is copied into folder A as well as folder B, and vice versa.
I see this issue only when using WinSCP with Blob storage, not with AzCopy or Azure Storage Explorer. I also don't see this issue with Azure File storage using WinSCP.
Any help appreciated.

Not able to download Page blob type file from Azure blob storage using SSIS

In an SSIS job, I added a step to download files from Azure Blob Storage: I added an Azure Blob Download Task to the control flow and entered the storage configuration (the connection test succeeded).
But it downloads only Block blob type files; it is not able to download Page blob type files.
How can we download Page blob type files using SSIS? If anyone has another approach, that is also fine for me.
Using a Foreach Loop container and the Flexible File Task component, we can download Page blob files.
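Since the asker was open to other approaches, here is a minimal alternative sketch with the azure-storage-blob Python SDK, which downloads any blob type, including Page blobs; the connection string and blob names are placeholders:

```python
from azure.storage.blob import BlobClient

# Page blobs (e.g. VHDs) download the same way as Block blobs
# through this client; placeholder connection details.
blob = BlobClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
    container_name="mycontainer",
    blob_name="disk.vhd",
)

with open("disk.vhd", "wb") as f:
    blob.download_blob().readinto(f)
```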

Downloading parquet files from Azure Blob Storage. File and folder with same names

I have created parquet files on Azure Blob Storage, and now I want to download them. The problem is it keeps failing. I think it's because there is a file and a folder with the same name. Why is that? Do I just need the folder, since the file is only 0 B?
The error I get says the download failed because it had already downloaded the 0 B file.
As mentioned in the comments, instead of downloading the actual file, you might have downloaded the zero-byte Block blob, which is Azure's implementation for providing filesystem-like access when blob storage is used as a filesystem (Azure HDInsight clusters have their HDFS backed by Azure Blob Storage). Object stores like Azure Blob Storage and AWS S3 have this sort of custom implementation to provide (or simulate) a seamless filesystem-like experience.
In short, don't download the placeholder Block blob; download the actual files.
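Here is a minimal sketch (azure-storage-blob SDK, placeholder names) that skips the zero-byte placeholder and downloads only the real parquet part files:

```python
from azure.storage.blob import ContainerClient

conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
container = ContainerClient.from_connection_string(conn_str, container_name="mycontainer")

# Blobs under a "folder" share the folder name as a prefix.
for blob in container.list_blobs(name_starts_with="output.parquet/"):
    if blob.size == 0:
        continue  # zero-byte directory placeholder, not real data
    local_name = blob.name.replace("/", "_")
    with open(local_name, "wb") as f:
        container.download_blob(blob.name).readinto(f)
```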

Azure: Is there a way to cache/reuse files downloaded from Azure blob storage?

I have a file upload/download service that uploads files to Blob storage. I have another service (a job service) that needs to download files from file service (using the blob storage URLs) and process those files. The files are read-only (they are not going to change during their lifetime). In many cases, the same file can be used in different jobs. I am trying to figure out if there is a way to download a file once and all the instances of my job service use that downloaded file. So can I store the downloaded file in some shared location and access it from all the instances of my service? Does it even make sense to do it this way? Would the cost of fetching the file from blob be the same as reading it from a shared location (if that is even possible)?
Azure also provides File storage. Azure File storage lets you mount the storage as a drive and access its contents directly.
But for this you need to download the file once and then upload it to File storage.
Then you can mount that share on any virtual machine instance or local drive.
That is an alternate way to achieve your goal.
Check this:
http://www.ntweekly.com/?p=10034
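If you script that one-time transfer, a server-side copy avoids pulling the bytes through your own machine at all; here is a hedged sketch with the azure-storage-file-share SDK, where the share name, file path, and SAS URL are placeholders:

```python
from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
    share_name="jobcache",
    file_path="shared/input.dat",
)

# Server-side copy from the blob URL: the data never transits this host,
# and every job instance can then mount the share and read the file.
file_client.start_copy_from_url(
    "https://<account>.blob.core.windows.net/<container>/input.dat?<SAS>"
)
```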
