I am working on the Azure platform and use Python 3.x for data integration (ETL) activities with Azure Data Factory v2. I have a requirement to parse message files in .txt format in real time, as and when they are downloaded from Blob Storage to a Windows virtual machine under the path D:/MessageFiles/.
I wrote a Python script to parse the message files: they are fixed-width files, and the script parses all the files in the directory and generates the output. Once a file is successfully parsed, it is moved to an archive directory. This script runs well on the local disk in ad-hoc mode whenever I need it.
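For context, here is a trimmed-down sketch of the parsing logic; the column layout, paths, and output step are illustrative placeholders, not the real spec:

import os
import shutil

SOURCE_DIR = "D:/MessageFiles/"
ARCHIVE_DIR = "D:/MessageFiles/Archive/"

# Illustrative fixed-width layout: (field name, width in characters)
LAYOUT = [("record_id", 10), ("message_type", 5), ("payload", 60)]

def parse_line(line):
    # Slice one line into named fields according to the fixed-width layout
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields

for filename in os.listdir(SOURCE_DIR):
    if not filename.endswith(".txt"):
        continue
    path = os.path.join(SOURCE_DIR, filename)
    with open(path) as f:
        records = [parse_line(line) for line in f]
    # ... generate the output from `records` here ...
    # Move the successfully parsed file to the archive directory
    shutil.move(path, os.path.join(ARCHIVE_DIR, filename))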
Now, I would like to make this script run continuously in Azure so that it watches for incoming message files in the directory D:/MessageFiles/ at all times and performs the processing as and when it sees new files in the path.
Can someone please let me know how to do this? Should I use a Stream Analytics application to achieve this?
Note: I don't want to use a timer option in the Python script. Instead, I am looking for an option in Azure so that the Python logic is used only for file parsing.
So I am trying to create a PowerShell script which will upload a large (> 4 GB) .bak file to Azure Blob Storage, but currently it is hanging. The script works with the small files I have been using for testing.
Originally the issue I was having was the requirement to have a Content-Length specified (I imagine due to the file's size), so I now calculate the file size of the .bak file (as it varies slightly each week) and pass this through as a request header.
I am a total PowerShell newbie, as well as being very new to Azure Blob Storage. (Note: I am trying to do this purely in PowerShell, without relying on other tools such as AzCopy.)
Below is my script
Powershell Script
Any help would be greatly appreciated.
There are a few things to check. Since the file is big, are you sure it isn't still uploading? Have you checked network activity in the Performance tab of Task Manager? AzCopy seems like a good option too, and you can use it from within PowerShell, but if it's not an option in your case, then why not use the native Az module for PowerShell?
I suggest using the Set-AzStorageBlobContent cmdlet to see if it helps. You can find examples in the Microsoft docs.
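For instance, a minimal sketch with the Az module (the account name, key, container, and file path are placeholders):

# Build a storage context from the account name and key (both placeholders)
$ctx = New-AzStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey $key
# Upload the local .bak file as a block blob; -Force overwrites an existing blob of the same name
Set-AzStorageBlobContent -File "D:\Backups\weekly.bak" -Container "backups" -Blob "weekly.bak" -Context $ctx -Force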
I am trying to implement the following flow in an Azure Data Factory pipeline:
Copy files from an SFTP server to a local folder.
Create a comma-separated file in the local folder with the list of files and their sizes.
The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.
The files are being copied, but in the output of this step, I don't see any file information.
I also don't see an option to create a file using data from a previous step.
Maybe I'm using the wrong technology?
One of the reasons I'm using Azure Data Factory is the integration runtime, which allows us to have a single fixed IP to connect to the external SFTP server (easier firewall configuration).
Is there a way to implement step 2?
Thanks for any insight!
There is no built-in feature to achieve this.
You need to use ADF together with another service; I suggest first using an Azure Function to check the files and then doing the copy.
You can get the size of the files and save them to a CSV file:
Get the size of the files (Python):
How to fetch sizes of all SFTP files in a directory through Paramiko
And use pandas to save the file list as CSV (Python):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Writing a pandas DataFrame to CSV file
Simple HTTP trigger for an Azure Function (Python):
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python
(Put the processing logic in the body of the Azure Function. Basically, you can do anything you want in the body of the function except for graphical interfaces and a few unsupported things. You can choose a language you are familiar with; in short, there is no built-in feature in ADF that satisfies your idea.)
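A minimal sketch of such a function body (the SFTP host, credentials, directory, and output path are placeholders, and the function.json binding configuration is omitted):

import azure.functions as func
import paramiko
import pandas as pd

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Connect to the SFTP server (host and credentials are placeholders)
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="user", password="password")
    sftp = paramiko.SFTPClient.from_transport(transport)

    # listdir_attr returns stat-like entries, so each file's size comes back in one call
    rows = [{"name": attr.filename, "size": attr.st_size}
            for attr in sftp.listdir_attr("/upload")]
    sftp.close()
    transport.close()

    # Save the file list as a comma-separated file with pandas
    pd.DataFrame(rows).to_csv("/tmp/filelist.csv", index=False)
    return func.HttpResponse(f"Wrote {len(rows)} entries", status_code=200)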
I'm adapting a PowerShell script I have at work for use in Azure Automation, which outputs 3 different CSV files. I'm trying to avoid having to create a DB and send the information there, since that would require changing the script too much, and it's quite complex.
Does anyone know if there's a way to just send the 3 files to some kind of folder in Azure? Or maybe another solution that wouldn't require messing too much with the script?
Sorry if it is a dumb question, I'm not very familiar with Azure yet.
Probably the easiest option is to continue writing the files as you are now; then, after a file is written, have your PowerShell code upload it to Blob storage using Set-AzureStorageBlobContent. See https://savilltech.com/2018/03/25/writing-to-files-with-azure-automation/ for an example.
You can read more about using Powershell to upload to blob storage, including all the steps you need to create the storage account and container, at https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-powershell.
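A rough sketch of the upload step inside the runbook (account name, key, container, and file names are placeholders; in a real runbook the key would come from an Automation variable or credential):

# Build a storage context (classic Azure.Storage module, matching Set-AzureStorageBlobContent)
$ctx = New-AzureStorageContext -StorageAccountName "myaccount" -StorageAccountKey $key
# Upload each CSV the script produced from the sandbox temp folder
foreach ($name in "report1.csv", "report2.csv", "report3.csv") {
    Set-AzureStorageBlobContent -File (Join-Path $env:TEMP $name) -Container "reports" -Context $ctx -Force
}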
I have a Ruby on Rails application that needs to find a home in an Azure Worker Role.
I currently automate the deployment of the application with a batch file: it takes the Apache and Ruby installers, runs them, and then drops the RoR app in the appropriate directory. After the batch script finishes, Apache is serving the application on port 80.
I'm new to Azure and trying to figure out how to do this.
From my understanding, I have two options here: OnStart with the installation files in Blob Storage, or a startup script. I'm not sure how to do the latter, but I have located the OnStart method within the WorkerRole.vb file in the new Azure project I just created.
My question: Is it recommended to use OnStart to deploy the application (using the batch script)? If so, how would I go about integrating the script into the project? And - how do I get started with storing and referencing the files in blob storage?
I know these are super high-level questions. Any input or suggested reading would be super helpful. I have tried to google / search for relevant resources but haven't been able to find much. Thank you for your time!
When you are inside the OnStart() function, it is better to do role-configuration things, i.e. IP binding, etc. However, if you want to install a runtime, download an application ZIP, or configure role-specific settings, it is best to use a startup task. Please visit my blog post Windows Azure: Startup task or OnStart(), which to choose? to learn more about it.
Now in your case it is best to use a startup task. What you can do is as below:
Package your RoR app as a ZIP and place it in Windows Azure Blob Storage
Create a command batch file (a sketch follows this list) which will:
2.1 Download the ZIP
2.2 Unzip the ZIP content to a specific location
2.3 Update the status back to Azure Blob Storage (optional)
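A rough sketch of such a batch file; download.vbs and upload.vbs are hypothetical helper scripts standing in for whatever blob transfer utility you use (the CodePlex sample mentioned below ships one), and the URLs are placeholders:

REM Download the application ZIP from blob storage (download.vbs is a hypothetical helper)
cscript /NoLogo util\download.vbs https://myaccount.blob.core.windows.net/deploy/ror-app.zip ror-app.zip
REM Unzip the content to a specific location
cscript /NoLogo util\unzip.vbs ror-app.zip "%CD%\ror"
REM Optionally write a status marker back to blob storage (upload.vbs is hypothetical too)
cscript /NoLogo util\upload.vbs status-done.txt https://myaccount.blob.core.windows.net/deploy/status.txt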
In your OnStart() function you just need to configure RoR
The code will look as below if you have a TCP endpoint named "RORWeb80" set to use port 80:
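// Bind a TCP listener to the IP address and port Azure assigned to the "RORWeb80" endpoint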
TcpListener RoRPortListener = new TcpListener(RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["RORWeb80"].IPEndpoint);
RoRPortListener.Start();
I have written a sample app for a Tomcat/Java-based worker role which does exactly the same. So what you can do is just replace the Tomcat ZIP file with the RoR ZIP and reuse the code exactly.
As long as you don't need admin-level access (e.g. modifying the registry, installing MSIs, etc.) you can do your setup from OnStart(), including launching your script. Just include the startup script with your project (don't forget to set Copy Local to true).
Same goes with startup script: you call your cmd file, which then executes the sequence for you. And if you give it elevated permissions, you can run installers, modify registry settings, install custom perf counters, whatever.
In either case: you can keep your Apache ZIP, Ruby installers, etc. in blob storage and, at startup, download them to local storage. This saves you from bundling everything within the deployment, which gives you a few advantages (being able to update Ruby/Apache without redeploying, reduced package size, etc.).
There's a sample app on CodePlex that demonstrates the basics of setting up Tomcat via a startup script. For one more example, you can look at the scripts installed via the Eclipse Windows Azure plugin for Java. These scripts are quite similar. The key is to have some way of downloading files from blob storage and then unzipping them. The CodePlex project I referred to points to a sample app that does simple blob downloading. The Eclipse packaging provides similar functionality in a .vbs app. Here's a snippet of one of my scripts from an Eclipse-based project:
REM Names of the Tomcat folder inside the ZIP and the WAR to deploy
SET SERVER_DIR_NAME=apache-tomcat-7.0.25
SET WAR_NAME=myapp.war
REM Recreate a stable path for this role instance as a symlink to the approot
rd "\%ROLENAME%"
mklink /D "\%ROLENAME%" "%ROLEROOT%\approot"
cd /d "\%ROLENAME%"
REM Unzip the JRE and Tomcat archives that were downloaded from blob storage
cscript /NoLogo util\unzip.vbs jre7.zip "%CD%"
cscript /NoLogo util\unzip.vbs tomcat7.zip "%CD%"
REM Drop the application WAR into Tomcat's webapps folder
copy %WAR_NAME% "%SERVER_DIR_NAME%\webapps\%WAR_NAME%"
cd "%SERVER_DIR_NAME%\bin"
REM Point JAVA_HOME at the unzipped JRE and start Tomcat
set JAVA_HOME=\%ROLENAME%\jre7
set PATH=%PATH%;%JAVA_HOME%\bin
cmd /c startup.bat
The CodePlex project has a similar-looking script.
Don't forget: you'll need to set up an Input Endpoint for your role (part of the role properties).
To get blobs into blob storage, there are both free tools (like ClumsyLeaf CloudXplorer) and paid tools (such as Cerebrata's Cloud Storage Studio).
To download blobs to local storage, you can either write a few lines of .NET code (from OnStart) or just use the utility pointed to in the CodePlex project.
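If you go the .NET route, here is a minimal sketch using the era-appropriate Windows Azure storage client; the configuration-setting name, local resource name, and container/blob names are placeholders:

using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.Storage;

// Read the connection string from role configuration and locate the deployment container
var account = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
var container = account.CreateCloudBlobClient().GetContainerReference("deploy");

// Download each setup archive into a local resource folder before the startup sequence runs
var localPath = RoleEnvironment.GetLocalResource("Setup").RootPath;
foreach (var name in new[] { "jre7.zip", "tomcat7.zip" })
{
    var blob = container.GetBlockBlobReference(name);
    blob.DownloadToFile(Path.Combine(localPath, name), FileMode.Create);
}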
I have built a startup task for an Azure application containing an EXE file (running periodically at some time interval), and now I would like to make it auto-update every week, as I have asked before here.
However, even if I put logic to replace that file in the EXE (startup task), the new file does not take effect. I have concluded that a new startup task takes effect only if we upgrade/recreate the Azure project with the new file. (Correct me if I have understood something wrong.)
So is there any way to make my logic work by rebooting the instance (from the EXE/startup task)?
I think it will still pick up the original file (added to the startup task at the time of upgrading/creating the application) instead of the new file!
Is it possible at all?
This is a very unreliable solution. If an Azure instance crashes or is taken down for updates, you will get a new instance started from the original service package. All the state of the modified instance will be lost.
A much more reliable way would be to have the volatile executable stored somewhere like Azure Blob storage. You upload a new version to the blob storage and the role somehow sees that (either by polling the storage or by some user-invoked operation - doesn't matter), downloads the new version and replaces the existing version with the new one.
This way if your role crashes it will reliably fetch the newest version from the persistent storage on startup.
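A rough sketch of that poll-and-replace loop in .NET (the connection string, container/blob names, staging path, and interval are all placeholders):

using System;
using System.IO;
using System.Threading;
using Microsoft.WindowsAzure.Storage;

// Placeholder connection string; point this at the storage account holding the executable
var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
var blob = account.CreateCloudBlobClient()
    .GetContainerReference("updates").GetBlockBlobReference("worker.exe");
DateTimeOffset? lastSeen = null;

while (true)
{
    // Fetch the blob's metadata to see whether a newer version has been uploaded
    blob.FetchAttributes();
    if (lastSeen == null || blob.Properties.LastModified > lastSeen)
    {
        // Download to a staging file, then stop the old process and swap the file in
        blob.DownloadToFile(@"D:\app\worker.exe.new", FileMode.Create);
        // ... stop old process, replace worker.exe, restart ...
        lastSeen = blob.Properties.LastModified;
    }
    Thread.Sleep(TimeSpan.FromMinutes(30)); // polling interval is arbitrary
}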
After I studied your problem, I can propose a very simple solution, as below, which I have done before for a Tomcat/Java sample:
Prepare your EXE to reboot the VM along with your original code (a sketch follows this list):
In your EXE, create a method that looks for a specific XML file in Azure Storage at a certain interval; also add retry logic for accessing the XML
Parse the XML for a specific value, and if that value is set, reboot the machine
Package your EXE in ZIP format and place it in your Azure Storage
Be sure to place the XML in the cloud and set the reboot = false value
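A minimal sketch of that XML check (the XML URL and element name are placeholders; the blob must be publicly readable or fetched via a SAS URL):

using System;
using System.Diagnostics;
using System.Threading;
using System.Xml.Linq;

// Look for the control XML in blob storage, with simple retry logic for transient failures
for (int attempt = 0; attempt < 3; attempt++)
{
    try
    {
        var xml = XDocument.Load("https://myaccount.blob.core.windows.net/config/update.xml");
        if ((string)xml.Root.Element("reboot") == "true")
        {
            // Reboot the VM; after restart the startup task downloads the fresh EXE
            Process.Start("shutdown", "/r /t 0");
        }
        break;
    }
    catch (Exception)
    {
        Thread.Sleep(TimeSpan.FromSeconds(10)); // back off briefly before retrying
    }
}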
What to do in the startup task:
Create a startup task that downloads the ZIP containing your EXE from Azure Storage
After the download, unzip the file and place the EXE in a specific folder
Launch the EXE
What to do when you want to update the EXE:
Update your EXE, package it into a ZIP, and place it at the same location in Azure Storage with the same name
Update your XML to enable the reboot
How the update will occur:
The EXE will look for the XML at a certain interval, as designed
Once it sees that the reboot value is set, it will reboot the VM
After the reboot, the startup task will be launched, and your new EXE will be downloaded to the Azure VM and updated. Be sure that the download and update are done in the same folder.
Take a look at the startup task in the sample below, which uses a similar method:
http://tomcatazure.codeplex.com/