Output file in Azure Automation script

I'm adapting a PowerShell script I have at work for use in Azure Automation. It outputs 3 different CSV files. I'm trying to avoid creating a DB and sending the information there, since that would require changing the script too much, and it's quite complex.
Does anyone know if there's a way to just send the 3 files to some kind of folder in Azure? Or maybe another solution that wouldn't require messing too much with the script?
Sorry if it is a dumb question, I'm not very familiar with Azure yet.

Probably the easiest option is to keep writing the files as you do now, and then, once each file is written, have your PowerShell code upload it to Blob storage using Set-AzureStorageBlobContent. See https://savilltech.com/2018/03/25/writing-to-files-with-azure-automation/ for an example.
You can read more about using PowerShell to upload to Blob storage, including all the steps needed to create the storage account and container, at https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-powershell.

Related

Uploading large .bak file to Azure Blob through Powershell

So I am trying to create a PowerShell script that will upload a large (> 4 GB) .bak file to Azure Blob Storage, but currently it hangs. The script works with the small files I have been using to test.
Originally the issue was the requirement to specify a Content-Length (I imagine due to the file's size), so I now calculate the size of the .bak file (it varies slightly each week) and pass it as a request header.
I am a total PowerShell newbie and also very new to Azure Blob Storage. (NOTE: I am trying to do this purely in PowerShell, without relying on other tools such as AzCopy.)
Below is my script
Powershell Script
Any help would be greatly appreciated..
There are a few things to check. Since the file is big, are you sure it isn't uploading? Have you checked network activity in the Performance tab of Task Manager? AzCopy also seems like a good option that you can use from within PowerShell, but if it's not an option in your case, why not use the native Az module for PowerShell?
I suggest using the Set-AzStorageBlobContent cmdlet to see if it helps. You can find examples in the Microsoft docs.

Use Azure Data Factory to copy files and place a csv of files copied

I am trying to implement the following flow in an Azure Data Factory pipeline:
1. Copy files from an SFTP server to a local folder.
2. Create a comma-separated file in the local folder with the list of files and their sizes.
The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.
The files are being copied, but in the output of this step, I don't see any file information.
I also don't see an option to create a file using data from a previous step.
Maybe I'm using the wrong technology?
One of the reasons I'm using Azure Data Factory, is because of the integration runtime, which allows us to have a single fixed IP to connect to the external SFTP. (easier firewall configuration)
Is there a way to implement step 2?
Thanks for any insight!
There is no built-in feature to achieve this.
You need to combine ADF with another service. I suggest first using an Azure Function to check the files and then doing the copy.
The structure should be like this:
You can get the sizes of the files and save them to a CSV file.
Get the sizes of the files (Python):
How to fetch sizes of all SFTP files in a directory through Paramiko
Use pandas to save the results as CSV (Python):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Writing a pandas DataFrame to CSV file
Simple HTTP trigger for an Azure Function (Python):
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python
(Put the processing logic in the body of the Azure Function. You can do almost anything there, apart from graphical interfaces and a few unsupported things, and you can choose the language you are most familiar with. In short, ADF has no feature that does what you want.)
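As a rough illustration of the "get sizes, then write a CSV" part, here is a minimal sketch, not a definitive implementation. It assumes the `sftp` object behaves like Paramiko's `SFTPClient` (whose `listdir_attr()` returns entries with `filename`, `st_size` and `st_mode`), and it swaps in the standard-library `csv` module instead of pandas so the example has no extra dependencies. The function names, column names, host, credentials and paths are all hypothetical.

```python
import csv
import io
import stat


def list_remote_files(sftp, remote_dir):
    """Return sorted (filename, size_in_bytes) pairs for files in remote_dir.

    `sftp` is assumed to behave like paramiko.SFTPClient: listdir_attr()
    yields objects with filename, st_size and st_mode attributes.
    """
    rows = []
    for attr in sftp.listdir_attr(remote_dir):
        # Skip sub-directories; only files are reported.
        if attr.st_mode is not None and stat.S_ISDIR(attr.st_mode):
            continue
        rows.append((attr.filename, attr.st_size))
    return sorted(rows)


def listing_to_csv(rows):
    """Render (filename, size) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file_name", "size_bytes"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical usage inside the function body (placeholder host and paths):
# import paramiko
# client = paramiko.SSHClient()
# client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# client.connect("sftp.example.com", username="user", password="secret")
# sftp = client.open_sftp()
# csv_text = listing_to_csv(list_remote_files(sftp, "/outbound"))
```

Keeping the listing and the CSV rendering separate from the connection code makes the logic easy to try locally with a stand-in object before wiring it into an HTTP-triggered function.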

How to uncompress rar files using Azure DataFactory

We have a new client. While starting the project, we gave them blob storage where they can leave files, so that we could later automate processing of the information.
The idea is to use Azure Data Factory, but we have found no way of dealing with .rar files, and even .zip files, being created on Windows, are giving us trouble. Since it is the client who supplies the .rar format, we want to make absolutely sure there is no way to process it before asking them to change it, or before deploying Databricks or a similar service just to transform the files.
Is there any way to get a .rar file from blob storage, uncompress it, and then process it?
I have been looking at posts like this and the related official documentation, and the closest we have come is using ZipDeflate, but it does not seem to meet our requirement.
Thanks in advance!
Data Factory only supports the GZip, Deflate, BZip2, and ZipDeflate compression types.
For unsupported file types and compression formats, Data Factory provides some workarounds:
You can use the extensibility features of Azure Data Factory to transform files that aren't supported. Two options include Azure Functions and custom tasks by using Azure Batch.
You can see a sample that uses an Azure function to extract the contents of a tar file. For more information, see Azure Functions activity.
You can also build this functionality using a custom dotnet activity. Further information is available here.
So next, you may need to figure out how to use an Azure Function to extract the contents of a .rar file.
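To give an idea of what the extraction logic inside such a function could look like, here is a minimal Python sketch. It handles ZIP archives only, because that is what the standard library supports; for .rar you would need a third-party package such as `rarfile` (which relies on an external unrar tool), and that part is an assumption not shown here. The function name is hypothetical, and downloading the blob and uploading the extracted files are left out.

```python
import io
import zipfile


def extract_archive(archive_bytes):
    """Return {member_name: content_bytes} for every file in a ZIP archive.

    The archive is read entirely in memory, which suits small files; large
    archives would need to be streamed to disk instead.
    """
    extracted = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            # Directory entries carry no data, so skip them.
            if info.is_dir():
                continue
            extracted[info.filename] = archive.read(info)
    return extracted
```

The function would receive the blob content as bytes, call `extract_archive`, and write each returned entry back to storage for Data Factory to pick up.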
You can use Logic Apps.
You can use a Webhook activity that calls a runbook.
Both are easier than using a custom activity.

azcopy list function gives a different count (almost double) of objects than Storage Explorer

I am uploading files with AzCopy (one by one, as and when they are provided) to Azure Data Lake Storage Gen2, and I keep track of them with Storage Explorer and an individual log for each file.
There have been 6253 file uploads. Storage Explorer shows the same number, and it matches the number of per-file upload logs.
But when I use azcopy list, it gives me 11254.
This makes it difficult to script and automate.
Is there a logical explanation for this?
There is no access issue; in fact, the same AzCopy is successfully copying the files.
I have tried re-downloading, if that makes sense.
This is a known bug, scheduled for fixing in our next release: https://github.com/Azure/azure-storage-azcopy/issues/692

Azure - Process Message Files in real time

I am working on the Azure platform and use Python 3.x for data integration (ETL) activities with Azure Data Factory v2. I have a requirement to parse message files in .txt format in real time, as and when they are downloaded from blob storage to a Windows virtual machine under the path D:/MessageFiles/.
I wrote a Python script to parse the message files, which are fixed-width. It parses all the files in the directory and generates the output. Once the files are successfully parsed, they are moved to an archive directory. The script runs well on local disk in ad-hoc mode whenever I need it.
Now I would like to make this script run continuously in Azure, so that it watches the directory D:/MessageFiles/ for incoming message files and processes them as soon as new files appear in the path.
Can someone please let me know how to do this? Should I use a Stream Analytics application to achieve this?
Note: I don't want to use a timer in the Python script. Instead, I am looking for an option in Azure so that the Python logic is used only for file parsing.
