Azure Data Factory v2 - SSIS lift and shift

I'm in the process of evaluating the possibilities of lifting and shifting my SSIS packages to ADFv2, but without testing I'm finding it hard to see whether all SSIS functionality is supported.
For example, my package unzips files, modifies the contents of the files (Script Task) and saves the new versions in a different directory, loads the modified files into a DB, updates data, etc.
What I'm not sure about is unzipping the files (I don't want to transfer unzipped files from on-prem) and modifying files with a Script Task. I believe these would have to be moved out of SSIS and created as ADF activities, leaving only the file load, data updates, etc. in my SSIS package, probably with the files stored in Blob storage?
Or can all of this still be done directly in SSIS?
Thanks

What you currently do using SSIS on premises, you could also do using SSIS in ADF. For example, you could install additional (un)zip programs using custom setup and utilize the %TEMP% folder/current working directory (".") of your SSIS IR to modify files, see
https://learn.microsoft.com/en-us/azure/data-factory/how-to-configure-azure-ssis-ir-custom-setup
https://learn.microsoft.com/en-us/sql/integration-services/lift-shift/ssis-azure-files-file-shares?view=sql-server-2017

Related

Move data from Sharepoint through a Logic App

We are using a Logic App to move data from a SharePoint folder to Azure Blob Storage.
We were using the SharePoint trigger "When a file is created or modified in a folder". Unfortunately, this trigger has been deprecated and no longer works (i.e., when a file is created or modified, no further action runs after the trigger).
No files are moved anymore. The trigger does not execute the Logic App even though a file is created or modified in the SharePoint source folder. I have been through the various other SharePoint triggers, but they do not seem to fit our use case. We cannot create a Logic App for each file. We are not using SharePoint lists but classic folders. We could use several triggers, each pointing directly at an existing file, but as we have many files to move in the same folder, we would have to create many Logic Apps, and that is not how we want to do it. Moreover, new files may be created in the future.
What could we do to keep the same architecture for moving data from SharePoint to Blob Storage using the non-deprecated Logic App triggers?
Thank you in advance,
Alexis
You can use the "When a file is created or modified (properties only)" trigger to get the properties of the file that was created or updated. Then use "Get file content" with the properties from the previous step. Finally, create a blob from the output of the previous steps.

Use Azure Data Factory to copy files and place a csv of files copied

I am trying to implement the following flow in an Azure Data Factory pipeline:
1. Copy files from an SFTP server to a local folder.
2. Create a comma-separated file in the local folder with the list of files and their sizes.
The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.
The files are being copied, but in the output of this step, I don't see any file information.
I also don't see an option to create a file using data from a previous step.
Maybe I'm using the wrong technology?
One of the reasons I'm using Azure Data Factory is the integration runtime, which gives us a single fixed IP for connecting to the external SFTP server (easier firewall configuration).
Is there a way to implement step 2?
Thanks for any insight!
There is no built-in feature in ADF to achieve this.
You need to combine ADF with another service. I suggest first using an Azure Function to check the files and then doing the copy.
You can get the sizes of the files and save them to the CSV file.
Get the sizes of the files (Python):
How to fetch sizes of all SFTP files in a directory through Paramiko
Use pandas to save the results as CSV (Python):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Writing a pandas DataFrame to CSV file
Simple HTTP trigger for an Azure Function (Python):
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python
(Put the processing logic in the body of the Azure Function. You can do almost anything there, apart from a graphical interface and a few unsupported things, and you can choose whichever language you are familiar with. In short, there is no feature in ADF that does what you describe.)
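As a rough illustration of the listing step, here is a minimal sketch of what the function body might do. It assumes password authentication and uses Paramiko's listdir_attr for the sizes; it writes the CSV with the standard csv module rather than pandas to keep dependencies small. The function names, column names, and connection parameters are all hypothetical, not something from the question.

```python
import csv
import io
import stat


def write_file_listing(entries, out_stream):
    """Write one CSV row per entry: file name and size in bytes.

    `entries` is any iterable of objects with `filename` and `st_size`
    attributes, such as the SFTPAttributes objects Paramiko returns.
    """
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerow(["filename", "size_bytes"])  # hypothetical column names
    for entry in sorted(entries, key=lambda e: e.filename):
        writer.writerow([entry.filename, entry.st_size])


def list_sftp_files(host, username, password, remote_dir, port=22):
    """Return attributes of the regular files in `remote_dir` (assumed setup)."""
    import paramiko  # third-party; imported here so the CSV helper has no dependency

    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        attrs = sftp.listdir_attr(remote_dir)
        # Skip directories and other non-regular entries.
        return [a for a in attrs
                if a.st_mode is not None and stat.S_ISREG(a.st_mode)]
    finally:
        transport.close()
```

A caller would pass the result of list_sftp_files to write_file_listing together with an open file (or an io.StringIO buffer) and then place the CSV next to the copied files.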

Build a pipeline in azure data factory to load Excel files, format content, transform in csv and send to azure sql DB

I'm new to the Azure environment and have been watching tutorials and reading documentation, but I'm trying to figure out how to set up a flow for the process described below. The starting point is a set of reports in .xlsx format produced monthly by the Marketing department; the requirement is to bring them into Azure SQL DB so that the data can be stored and analysed. So far I have managed to put those files (previously converted to .csv by hand) in Blob storage and build an ADF pipeline that copies each file into a table in the SQL DB.
The problem is that, as far as I understand, ADF cannot handle .xlsx files directly, and I'm wondering how to set up an automated procedure that converts the files from .xlsx to .csv and saves them to Blob storage. I was thinking about adding a Python script or Databricks notebook to the pipeline to do the conversion, but I'm not sure this is the best solution. Any hints or references to existing tutorials or resources would be much appreciated.
I found a tutorial which uses Logic Apps to do the conversion.
Datanovice indirectly suggested using a Custom activity to run either a C# or Python application to do the conversion for you.
The least expensive solution would be to do the conversion before uploading to blob, like Datanovice said.
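For the convert-before-upload route, a minimal Python sketch might look like the following. It assumes pandas with the openpyxl engine is installed and that each report has its data on the first sheet; the function names and the output naming scheme are hypothetical.

```python
from pathlib import Path


def csv_path_for(xlsx_path, out_dir):
    """Return the CSV path that corresponds to a given .xlsx report."""
    return Path(out_dir) / (Path(xlsx_path).stem + ".csv")


def convert_xlsx_to_csv(xlsx_path, out_dir, sheet_name=0):
    """Read one sheet of an Excel report and save it as CSV.

    Assumes the data is on the first sheet unless `sheet_name` says otherwise.
    """
    import pandas as pd  # third-party; reading .xlsx also needs openpyxl

    target = csv_path_for(xlsx_path, out_dir)
    frame = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    frame.to_csv(target, index=False)  # index=False keeps the row index out of the file
    return target
```

The same function could run locally before the upload or inside a Custom activity; any formatting of the content would go between the read and the write.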

Transfer data into sharepoint using ssis

Is it possible to transfer data from SSIS to SharePoint and place the CSV in a SharePoint list?
I have tried automating this with SSIS, but for some reason the package will not run when I execute it under a scheduled task.
If I create a scheduled task that runs dtexec (with the package path), it will not run under the scheduled task, but it will run if I use a .bat file containing the same command. I am using credentials that have access to the SharePoint site. It seems that there is just no way to automate placing CSV files onto SharePoint.
http://social.msdn.microsoft.com/Forums/en/sharepoint2010programming/thread/905fd9fb-ae70-4335-9628-d28d040f0bdc
http://social.msdn.microsoft.com/Forums/en/sharepointdevelopment/thread/d59bbc46-27b4-468e-9ed6-70435200bef2
Although I haven't had the need to use it in a production environment yet, I'm sure this custom component will suit your needs :)
http://ssisctc.codeplex.com/wikipage?title=SharePoint List Destination&referringTitle=Home

Will Autoupdate Startup task work in azure application?

I have built a startup task for an Azure application that contains an exe file (which runs periodically at a fixed interval), and now I would like to make it auto-update every week, as I asked before here.
However, even if I add logic so that the exe (the startup task) replaces that file, the new file will not take effect. I have concluded that a new startup task takes effect only if we upgrade or re-create the Azure project with the new file. (Correct me if I have misunderstood something.)
So is there any way to make my logic work by rebooting the instance (from the exe/startup task)?
I think it will still pick up the original file (the one added to the startup task when the application was upgraded or created) instead of the new file!
Is it possible at all?
This is a very unreliable solution. If an Azure instance crashes or is taken down for updates you will have a new instance started from the original service package. All the state of the modified instance will be lost.
A much more reliable way would be to have the volatile executable stored somewhere like Azure Blob storage. You upload a new version to the blob storage and the role somehow sees that (either by polling the storage or by some user-invoked operation - doesn't matter), downloads the new version and replaces the existing version with the new one.
This way if your role crashes it will reliably fetch the newest version from the persistent storage on startup.
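The polling variant can be sketched roughly as follows, in Python for brevity. The two fetch callables are hypothetical stand-ins for whatever reads the version marker and the file content from persistent storage (with Azure Blob storage the marker could, for example, be the blob's ETag); nothing here is a specific Azure API.

```python
import os
import tempfile


def update_if_changed(local_path, marker_path, fetch_version, fetch_bytes):
    """Download a new copy of the file only when the remote version differs.

    `fetch_version` returns a string identifying the stored version and
    `fetch_bytes` returns the file content; both are assumed callables.
    Returns True if the local file was replaced.
    """
    remote_version = fetch_version()
    local_version = None
    if os.path.exists(marker_path):
        with open(marker_path, encoding="utf-8") as f:
            local_version = f.read().strip()
    if remote_version == local_version and os.path.exists(local_path):
        return False  # already up to date

    data = fetch_bytes()
    directory = os.path.dirname(os.path.abspath(local_path))
    # Write to a temporary file first so a failed download never leaves
    # a half-written executable in place.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, local_path)
    with open(marker_path, "w", encoding="utf-8") as f:
        f.write(remote_version)
    return True
```

Calling this once at startup and then on a timer covers both cases from the answer: a fresh instance fetches the newest version, and a running instance picks up later uploads. A running executable usually has to be stopped before its file can be replaced.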
After studying your problem, I can propose a very simple solution, which I have used before for a Tomcat/Java sample:
Prepare your EXE to reboot the VM, alongside your original code:
In your EXE, create a method that looks for a specific XML file on Azure Storage at a certain interval, and add retry logic for accessing the XML
Parse the XML for a specific value, and if that value is set, reboot the machine
Package your EXE in ZIP format and place it in your Azure Storage
Be sure to place the XML in the cloud and set the value reboot = false
What to do in the startup task:
Create a startup task that downloads the ZIP containing your EXE from Azure Storage
After the download, unzip the file and place the EXE in a specific folder
Launch the EXE
What to do when you want to update the EXE:
Update your EXE, package it into a ZIP, and place it at the same location in Azure Storage with the same name
Update your XML to enable the reboot
How the update will occur:
The EXE will look for the XML after a certain interval, as designed
Once it sees that reboot is set, it will reboot the VM
After the reboot, the startup task will be launched, and your new EXE will be downloaded to the Azure VM and replace the old one. Be sure that the download and the update use the same folder.
Take a look at the startup task in the sample below, which uses a similar method:
http://tomcatazure.codeplex.com/
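The XML check with retry logic described in the steps above could be sketched like this, in Python for brevity. The XML layout (a reboot element holding true or false) and the function names are assumptions for illustration; the answer does not specify a schema, and the actual reboot call is left out.

```python
import time
import xml.etree.ElementTree as ET


def reboot_requested(xml_text):
    """Return True if the control file asks for a reboot.

    Assumes a hypothetical layout like <config><reboot>true</reboot></config>.
    """
    root = ET.fromstring(xml_text)
    node = root.find("reboot")
    if node is None or node.text is None:
        return False
    return node.text.strip().lower() == "true"


def fetch_with_retry(fetch, attempts=3, delay_seconds=0.0):
    """Call `fetch` until it succeeds or `attempts` tries have failed.

    `fetch` is an assumed callable that downloads the XML text and raises
    OSError on a transient failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error
```

The periodic loop would call fetch_with_retry, pass the result to reboot_requested, and trigger the reboot only when it returns True. The flag should be set back to false after the update, otherwise the instance would reboot on every check.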

Resources