Programmatically move or copy a Google Drive file to a shared Windows folder/UNC - excel

I have a Google Apps Script that builds a new CSV file any time someone makes an edit to one of our shared Google Sheets (via a trigger). The file gets saved off to a dedicated Shared Drive folder (which is first cleaned out by trashing all existing files before writing the updated one). This part works splendidly.
I need to take that CSV and consume it in SSIS so I can datestamp it and load it into an MSSQL table for historical tracking purposes, but aside from paying for some third-party apps (e.g. CData, COZYROC), I can't find a way to do this. Eventually this package will be deployed on our SQL Agent to run on a daily schedule, so it will be attached to a service account that wouldn't have any sort of access to the Google Shared Drive. If I can get that CSV over to one of the shared folders on our SQL Server, I will be golden...but that is what I am struggling with.
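For what it's worth, one route I've been sketching (not something I have working yet) is to leave Apps Script out of it and pull the file down from the SQL side instead, using a Google service account that has been added to the Shared Drive. The folder ID, key file and UNC path below are placeholders:

```python
# Sketch only: grab the newest CSV from the Shared Drive folder and write it
# to the UNC share the SSIS package reads from. Assumes google-api-python-client
# and google-auth, plus a service account granted read access to the Shared Drive.
# FOLDER_ID, the key file name and the destination path are hypothetical.
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
FOLDER_ID = "0ABcDeFgHiJkLmNoPqRsTuVwXyZ"            # Shared Drive folder holding the CSV
DEST = r"\\sqlserver01\etl_drop\sheet_export.csv"    # UNC share the SQL box can see

creds = service_account.Credentials.from_service_account_file("drive-svc.json", scopes=SCOPES)
drive = build("drive", "v3", credentials=creds)

# Newest CSV in the folder (the Apps Script only ever leaves one behind anyway).
resp = drive.files().list(
    q=f"'{FOLDER_ID}' in parents and mimeType='text/csv' and trashed=false",
    orderBy="modifiedTime desc",
    pageSize=1,
    fields="files(id, name)",
    supportsAllDrives=True,
    includeItemsFromAllDrives=True,
).execute()

files = resp.get("files", [])
if files:
    request = drive.files().get_media(fileId=files[0]["id"], supportsAllDrives=True)
    with open(DEST, "wb") as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            _, done = downloader.next_chunk()
```

Run by Windows Task Scheduler on the SQL box, that would keep the hand-off inside infrastructure the service account already reaches.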
If something via Apps Script isn't possible, can someone direct me as to how I might be able to programmatically get an Excel spreadsheet to open, refresh its dataset, then save itself and close? I can get the data I need into Excel from the Google Sheet directly using Power Query, but I need it to refresh itself in an unattended process on a daily schedule.
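In case it helps frame that ask: the unattended refresh I have in mind would be along these lines, driving Excel over COM with pywin32 (the workbook path is a placeholder, and I'm aware unattended Office automation under a service account is officially unsupported and tends to be fragile):

```python
# Sketch only: open the workbook, refresh its Power Query/data connections,
# save and quit. Assumes Excel is installed on the machine running the task
# and the pywin32 package is available; the path is hypothetical.
import win32com.client

WORKBOOK = r"\\sqlserver01\etl_drop\sheet_export.xlsx"  # hypothetical path

excel = win32com.client.DispatchEx("Excel.Application")
excel.Visible = False
excel.DisplayAlerts = False
try:
    wb = excel.Workbooks.Open(WORKBOOK)
    wb.RefreshAll()
    # Block until any background/asynchronous query refreshes finish.
    excel.CalculateUntilAsyncQueriesDone()
    wb.Save()
    wb.Close(False)  # already saved; don't prompt or save again
finally:
    excel.Quit()
```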

I found that CData actually has a tool called Sync that got us what we needed out of this. There is a limited free version of the tool (which they claim is "free forever") that runs as a local service. On a set schedule, it can query all sorts of sources, including Google Sheets, and write out to various destinations.
The free version limits which sources and destinations you can use (though there are quite a few), and it only allows 2 connection definitions. That said, you can define multiple source files, just not multiple source types (e.g. I can define 20 different Google Sheets to use in 20 different jobs, but can only use Google Sheets as my source).
I have it set up to read my shared Google Sheet and output the CSV to our server's share. An SSIS project reads the local CSV, processes it as needed, and then writes to our SQL Server. It seems to work pretty well if you don't mind having an additional service running and don't need a range of different sources and destinations.
Link to their Sync landing page: https://www.cdata.com/sync/
Use the Buy Now button and load up the free version in your cart, then proceed to check out. They will email you a key and a link to get the download from.

Related

Response to be Captured at Multiple Steps

I am currently testing functionality built on SAP SRM with Fiori as its frontend, and I need some support.
Testing Tool - MF LoadRunner
Protocol - SAP-Web
Test Case
After the initial login to the system, we need to create an invoice, which generates an Excel file. We need to download it and alter a few document-specific values.
After the data is modified, the Excel file needs to be uploaded again, which automatically triggers the save action.
At this point, we need to measure the time taken to upload the Excel file and the time taken to save the document.
Following this, I need to trigger the next step, a calculation of a few values, which is an asynchronous process whose completion is only signalled by a notification on the application's Fiori screen.
After the calculation is done, I need to perform another step, whose completion is likewise only signalled by a notification.
We have tried the Ajax TruClient protocol as well, but it didn't work out.
I am able to download the Excel file and save it locally, but I am not able to edit the data; no alteration of the data from the script is taking place.
Also, can someone suggest any other method to capture the time taken at multiple steps?
Think about this architecturally. You have TruClient, which is a browser; it is not a generic functional automation solution that can exercise anything that is not a browser. You need to operate two apps, Excel and a browser, so you need to move up the stack.
Don't time the work done in Excel. This is 100% client-based.
You have three options if you absolutely, under every circumstance, must run Excel.
GUI Virtual User. This is a full-blown automation client. In the Micro Focus universe, this is QuickTest Pro. It can automate your browser. It can automate your Excel. It can automate your PowerPoint. It can even automate Solitaire. You will need a full OS instance to run this.
Citrix Virtual User. If you head down this route you have options: Micro Focus, Tricentis, Login VSI, and perhaps a handful of smaller players.
Remote Desktop Virtual User. Similar to Citrix, but with fewer players; Micro Focus is the dominant solution.
If you drop the absolute, non-negotiable requirement to run Excel, then you can have a pre-prepared, modified Excel file for upload that represents the "changed" document and run this purely as an HTTP virtual user. TruClient is not going to operate your Windows common dialog box (again, it is not a browser) to allow you to pick a file.
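If the "changed" document does need a few run-specific values rather than being completely static, you can still prepare it outside of LoadRunner before the upload step. A minimal sketch with openpyxl, where the file name and cell addresses are invented for illustration:

```python
# Sketch only: stamp run-specific values into the downloaded invoice workbook
# so the HTTP virtual user can upload it as the "changed" document.
# Assumes an .xlsx file and the openpyxl package; path and cells are hypothetical.
from datetime import date

from openpyxl import load_workbook

wb = load_workbook("invoice_template.xlsx")
ws = wb.active

ws["B2"] = "INV-2024-000123"         # document number for this iteration
ws["B3"] = date.today().isoformat()  # document date
ws["D7"] = 1499.00                   # value under test

wb.save("invoice_modified.xlsx")     # file handed to the upload request
```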

Syncing incremental data with ADF

In Synapse I've set up 3 different pipelines. They all gather data from different sources (SQL, REST and CSV) and sink it into the same SQL database.
Currently they all run during the night, but I already know the question of running them more frequently is coming. I want to prevent my pipelines from churning through all the sources when nothing has changed in them.
Therefore I would like to store the last successful sync run of each pipeline (or pipeline activity). Before the others start, I want to create a new pipeline, a 4th one, which checks whether anything has changed in the sources. If so, it triggers the execution of one, two or all three of the pipelines.
I still see some complications in doing this, so I'm not fully convinced about how to approach it. All help and thoughts are welcome; does anyone have experience with this?
This is (at least in part) the subject of the following Microsoft tutorial:
Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal
You're on the correct path - the crux of the issue is creating and persisting "watermarks" for each source from which you can determine if there have been any changes. The approach you use may be different for different source types. In the above tutorial, they create a stored procedure that can store and retrieve a "last run date", and use this to intelligently query tables for only rows modified after this last run date. Of course this requires the cooperation of the data source to take note of when data is inserted or modified.
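Outside of ADF, the tutorial's pattern boils down to: read the watermark, pull only the delta, then advance the watermark once the load succeeds. A rough sketch of that loop, where the connection string, table and column names are all invented for illustration:

```python
# Sketch only: watermark-based incremental extract against a SQL source.
# Assumes pyodbc, a source table with a LastModified column and a small
# Watermark table; every name here is a hypothetical placeholder.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=srv;Database=db;Trusted_Connection=yes;"
)
cur = conn.cursor()

# 1. Read the last successful watermark for this source.
cur.execute("SELECT WatermarkValue FROM dbo.Watermark WHERE SourceName = ?", "orders")
last_run = cur.fetchone()[0]

# 2. Pull only rows changed since then.
cur.execute("SELECT * FROM dbo.Orders WHERE LastModified > ?", last_run)
changed_rows = cur.fetchall()
# ... sink changed_rows to the destination here ...

# 3. Advance the watermark only after the load succeeded, using the data's own
#    timestamps so clock skew can't skip rows.
if changed_rows:
    new_mark = max(row.LastModified for row in changed_rows)
    cur.execute(
        "UPDATE dbo.Watermark SET WatermarkValue = ? WHERE SourceName = ?",
        new_mark,
        "orders",
    )
    conn.commit()
```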
If you have a source that cannot be intelligently queried in part (e.g. a CSV file), you still have options, such as using the Get Metadata activity to query the lastModified property of a source file (or even its contentMD5 if using Blob or ADLS Gen2) and comparing this to a value saved during your last run (you would have to pick a place to store this, e.g. an operational DB, Azure Table or a small blob file) to determine whether it needs to be reprocessed.
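For reference, the check the Get Metadata activity is doing there looks roughly like this when done against Blob storage directly; the connection string, container, blob name and previously saved values are placeholders:

```python
# Sketch only: decide whether a CSV blob has changed since the last run by
# comparing its lastModified / contentMD5 properties with previously saved
# values. Assumes the azure-storage-blob package; all names are hypothetical.
from datetime import datetime, timezone

from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="landing",
    blob_name="source/export.csv",
)

props = blob.get_blob_properties()
current_modified = props.last_modified              # datetime of the last write
current_md5 = props.content_settings.content_md5    # may be None for some upload paths

# Values saved during the previous run (in practice these would come from an
# operational DB, Azure Table or a small state blob); inlined to keep this self-contained.
last_seen_modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
last_seen_md5 = b""

needs_reprocessing = (
    current_modified > last_seen_modified
    or (current_md5 is not None and bytes(current_md5) != last_seen_md5)
)
print("reprocess" if needs_reprocessing else "skip")
```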
If you want to go crazy, you can look into streaming patterns (which might require dabbling in HDInsight or getting your hands dirty with Azure Event Hubs to trigger ADF) to move from a scheduled trigger to automatic ingestion as new data appears at the sources.

Azure Storage - File Share - Move 16m files in nested folders

Posting here as Server Fault doesn't seem to have the detailed Azure knowledge.
I have an Azure storage account with a file share. The file share is connected to an Azure VM through a mapped drive. An FTP server on the VM accepts a stream of files and stores them in the file share directly.
There are no other connections. Only I have Azure admin access; a limited number of support people have access to the VM.
Last week, for unknown reasons, 16 million files nested in many sub-folders (by origin and date) moved instantly into an unrelated subfolder, 3 levels deep.
I'm baffled as to how this can happen. There is a clear, instant cut-off when the files moved.
As a result, I'm seeing increased costs on LRS, I assume because Azure Storage is internally replicating the change at my expense.
I have attempted to copy the files back using a VM and AzCopy. This process crashed midway through, leaving me with a half-completed copy operation. The failed attempt took days, which makes me confident it wasn't the support guys dragging a folder by accident.
Questions:
Is it possible to just instantly move so many files (and how)?
Is there a solid way I can move the files back, taking into account the half-copied files? I mean an Azure back-end operation rather than writing an app / PowerShell / AzCopy.
Is there a cost-efficient way of doing this (I'm on the Transaction Optimised tier)?
Do I have a case here to get Microsoft to do something? We didn't move them... I assume something internal messed up.
Thanks
A tool that supports server-side copy (like AzCopy) can move the files quickly because only the metadata is updated. If you want to investigate the root cause, I recommend opening a support case: your best bet is to file a ticket with the Azure support team, who can guide you on this matter on a best-effort basis.
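For the "move them back" part, the same server-side mechanics are available from the SDK if you end up scripting it per file. A sketch with azure-storage-file-share, where the connection string, share and paths are placeholders, the destination directories are assumed to already exist, and a cross-account source would additionally need a SAS on its URL:

```python
# Sketch only: server-side copy of one misplaced file back to its original
# path, then delete the misplaced copy to turn it into a move. Assumes the
# azure-storage-file-share package; all names are hypothetical, and within the
# same storage account the account key in the connection string authorizes the copy.
from azure.storage.fileshare import ShareFileClient

CONN = "<storage-connection-string>"
SHARE = "ftp-landing"

src = ShareFileClient.from_connection_string(
    CONN, share_name=SHARE, file_path="wrong/place/2023-01-05/file0001.txt"
)
dst = ShareFileClient.from_connection_string(
    CONN, share_name=SHARE, file_path="origin-a/2023-01-05/file0001.txt"
)

dst.start_copy_from_url(src.url)    # server-side: no data flows through the client

props = dst.get_file_properties()
if props.copy.status == "success":  # small same-account copies usually complete synchronously
    src.delete_file()
```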

The best way to read and process email attachments in Azure?

We have a few third-party companies sending us emails with CSV/Excel data files attached. I want to build a pipeline (preferably in ADF) to get the attachments, load the raw files (attachments) to blob storage, process/transform them, and finally load the processed files to another directory in the blob container.
To get the attachments, I think I can use the instructions (using a Logic App) in this link. Then, trigger an ADF pipeline using a storage event trigger, get the file, process it and do the rest.
However, first, I'm not sure how reliable storage triggers are.
Second, although it seems OK, this approach makes it difficult to monitor the runs and make sure things are working properly. For example, if the Logic App fails to read/load the attachments for any reason, you can't pick that up in ADF, as nothing has been written to the blob to trigger the pipeline.
Anyway, is this approach good, or are there better ways to do this?
Thanks
If you are able to save the attachments into a blob container or similar, you can schedule an ADF pipeline that imports every file in the container every minute or every 5 minutes or so.
Do the files have the same data structure every time? (That makes things much easier.)
It is most common to schedule imports in ADF, rather than triggering on external events.
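Stripped of ADF, each scheduled run is just a sweep of the landing container: pick up what's new, write the processed output elsewhere, and remove the raw file so the next run doesn't see it again. A rough sketch with azure-storage-blob, where the container, prefixes and "transform" are invented for illustration:

```python
# Sketch only: one scheduled run over the landing container. Picks up blobs
# under raw/, writes a processed copy under processed/, then deletes the raw
# blob so runs stay idempotent. Assumes azure-storage-blob; names are hypothetical.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-connection-string>", container_name="attachments"
)

for blob in container.list_blobs(name_starts_with="raw/"):
    raw = container.get_blob_client(blob.name)
    data = raw.download_blob().readall()

    transformed = data.upper()  # stand-in for the real transform/validation

    processed_name = blob.name.replace("raw/", "processed/", 1)
    container.get_blob_client(processed_name).upload_blob(transformed, overwrite=True)

    raw.delete_blob()
```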

Automated processing of large text file(s)

The scenario is as follows: a large text file is put somewhere. At a certain time of day (or manually, or after x number of files), a virtual machine with BizTalk installed should start automatically to process these files. Then the files should be put in some output location and the VM should be shut down. I don't know how long processing these files takes.
What is the best way to build such a solution? Preferably, it should be reusable for similar scenarios in the future.
I was thinking of Logic Apps for the workflow, blob storage or FTP for input/output of the files, and an API App for starting/shutting down the VM. Can Azure Functions be used in some way?
EDIT:
I also asked the question elsewhere, see link.
https://social.msdn.microsoft.com/Forums/en-US/19a69fe7-8e61-4b94-a3e7-b21c4c925195/automated-processing-of-large-text-files?forum=azurelogicapps
Just create an Azure Automation runbook with a schedule. Have that runbook check for specific files in a storage account; if they exist, start the VM and wait until the files are gone (meaning BizTalk processed them, deleted them and put the output where it belongs), at which point the runbook stops the VM.
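A minimal sketch of such a runbook, written as an Azure Automation Python runbook; the subscription, resource group, VM and container names are placeholders, and the Automation account's managed identity is assumed to have rights on both the VM and the storage account:

```python
# Sketch only: if input files exist, start the BizTalk VM, wait until the
# files are gone, then deallocate the VM. Assumes azure-identity,
# azure-mgmt-compute and azure-storage-blob; all names are hypothetical.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.storage.blob import ContainerClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-biztalk"
VM_NAME = "vm-biztalk01"
CONTAINER_URL = "https://mystorageacct.blob.core.windows.net/input"

cred = DefaultAzureCredential()
compute = ComputeManagementClient(cred, SUBSCRIPTION_ID)
container = ContainerClient.from_container_url(CONTAINER_URL, credential=cred)

def files_waiting() -> bool:
    return any(True for _ in container.list_blobs())

if files_waiting():
    compute.virtual_machines.begin_start(RESOURCE_GROUP, VM_NAME).result()

    # Poll until BizTalk has consumed (and removed) everything in the container.
    while files_waiting():
        time.sleep(300)

    # Deallocate rather than just power off, so compute charges stop.
    compute.virtual_machines.begin_deallocate(RESOURCE_GROUP, VM_NAME).result()
```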
