Retrieving new files from SharePoint to blob - Azure

I am currently working for a client that has a SharePoint list in which a new subfolder is created every month. Every day a new file is added to that month's subfolder.
I have already designed a Logic App that copies all files to an Azure Storage Account; the problem is that I only need the most recent file. This is what the Logic App looks like:
[Logic App screenshot]
I tried to compare the SharePoint list with yesterday's blob storage list so that each new day the new file is recognized, but that didn't work out as I hoped.
Is there any way to retrieve only the most recently added files from a SharePoint list?

You can use a Filter array action together with a Compose connector and compare LastModified against the time window you want, e.g. the last 24 or 12 hours.
Here is a screenshot of the flow for your reference.
That way you can view all the blobs that fall within the 24-hour timeframe.
Updated Answer
If the source is SharePoint, you can add the Get files SharePoint connector and then add a Filter array action.
Here is the screenshot for your reference
In the image, I was working with a list containing four months of values and was able to retrieve them.
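As a rough illustration of the same "last 24 hours" comparison in plain code, here is a C# sketch; the file names and timestamps below are made-up placeholders, not output of the SharePoint connector.

using System;
using System.Collections.Generic;
using System.Linq;

class RecentFilesFilter
{
    record FileEntry(string Name, DateTimeOffset LastModified);

    static void Main()
    {
        // Placeholder data standing in for the Get files / list output.
        var files = new List<FileEntry>
        {
            new("2022-05-01.xlsx", DateTimeOffset.UtcNow.AddDays(-3)),
            new("2022-05-04.xlsx", DateTimeOffset.UtcNow.AddHours(-2)),
        };

        // Same comparison the Filter array step makes: LastModified within the last 24 hours.
        DateTimeOffset cutoff = DateTimeOffset.UtcNow.AddHours(-24);
        foreach (var file in files.Where(f => f.LastModified >= cutoff))
            Console.WriteLine($"{file.Name} modified {file.LastModified:u}");
    }
}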

Related

How to retrieve latest file from SharePoint to blob using logic app?

I get one new file in my SharePoint every day. The files are stored in "Shared Document/Data".
Below is my Logic App flow. I used Get files, selected Document, and included nested items, because the new data arrives under the "data folder" inside the Shared Documents library.
I used Filter array to keep files whose last-modified time is within the last 5 minutes, so that I get only the latest file.
I am facing two issues:
1. Get files returns all the files from SharePoint, and the Filter array is not working.
2. I have used Create blob incorrectly.
Can anyone advise me on how to do this?
Follow this workaround.
You can use the trigger and connector below.
SharePoint (trigger) - When a file is created or modified in a folder
Use this trigger to select the exact directory so that only the recently modified or created file is fetched.
Azure Blob Storage (connector) - Create blob (V2)
Use this connector to create the blob.
Result
The modified file is fetched and added to a blob.
Refer here for more information
Updated Answer
Here is the list of available directories in SharePoint that is shown in the SharePoint trigger. You can select whichever one fits your requirement.
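Purely as an illustration of what the Create blob (V2) step does with the triggering file, here is a rough C# sketch using the Azure.Storage.Blobs SDK; the connection string, container name, and file values are placeholders, not output of the SharePoint trigger.

using System.IO;
using System.Text;
using Azure.Storage.Blobs;

class CreateBlobSketch
{
    static void Main()
    {
        var container = new BlobContainerClient("<storage-connection-string>", "my-container");
        container.CreateIfNotExists();

        // Placeholders for the file name and content provided by the SharePoint trigger.
        string fileName = "DailyReport.xlsx";
        byte[] fileContent = Encoding.UTF8.GetBytes("file content");

        // Equivalent of Create blob (V2): write the file content to a blob with the same name.
        using var stream = new MemoryStream(fileContent);
        container.GetBlobClient(fileName).Upload(stream, overwrite: true);
    }
}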

I want to transfer files as well as folders, keeping the same structure, from blob storage to a SharePoint document library

The files in blob storage are updated every day, so I want an incremental method that transfers only new files and creates the folder in SharePoint if it does not exist. For example: mycontainer/folder/20210101/test.csv, mycontainer/folder/20210102/test.csv; there may be a single CSV file or multiple files. I have created a workflow in a Logic App, but I got stuck. I am attaching a screenshot of my workflow.
Image screenshot:
Here is the overall flow
This is how I achieved your requirement
I first built a folder to which files are added on a daily basis, and then used a Compose connector to derive a date string from the blob's 'LastModified' value.
Here is the Compose Connector Expression that I used [Compose]
substring(join(split(triggerBody()?['LastModified'],'-'),''),0,8)
Later I created another folder named with that value and added the file to it. Then I used another Compose connector to get the path.
Here is the Path Compose Connector [Compose2]
substring(body('Create_blob_(V2)')?['Path'],0,lastIndexOf(body('Create_blob_(V2)')?['Path'],'/'))
Lastly, I used a SharePoint connector to create a folder from the above path, and in the next step created a file mirroring the blob structure.
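To make the two Compose expressions easier to follow, here is a rough C# equivalent; the LastModified value and blob path are made-up samples, not actual trigger output.

using System;

class ComposeExpressionsSketch
{
    static void Main()
    {
        // [Compose]: remove the dashes from LastModified and keep the first 8 characters -> yyyyMMdd
        string lastModified = "2021-01-02T07:30:00Z";   // placeholder trigger value
        string folderName = string.Join("", lastModified.Split('-')).Substring(0, 8);
        Console.WriteLine(folderName);                   // 20210102

        // [Compose2]: take the blob path up to (but not including) the last '/'
        string blobPath = "mycontainer/folder/20210102/test.csv";   // placeholder Create blob (V2) path
        string folderPath = blobPath.Substring(0, blobPath.LastIndexOf('/'));
        Console.WriteLine(folderPath);                   // mycontainer/folder/20210102
    }
}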
Here are the screenshots from my storage account and SharePoint
Storage Account
Sharepoint
Update
When I added a Delete blob connector at the end, using the blob as the 'List of files path', I noticed that the blob was removed after the copy. As a result, this may meet the criteria.

Data Factory to SharePoint list

I've set up a connection from our Data Factory in Azure to a SharePoint site so I can pull some of the lists on the site into blob storage and then process them into our warehouse. This all works fine and I can see the data I want. However, I don't want to pull all the columns contained in the list I'm after. Looking at the connection, I can specify a query, but anything I put in here has no effect on the data that comes back. Is there a way to specify the columns from a SharePoint list through the copy activity into blob storage?
You need to use a $select query like the one below:
$select=Title,Number,OrderDate in the Query text field of the Azure Data Factory Source.
You can use the Preview data button to validate the results. Please refer to the documentation on using custom OData query options.
I have tried this and it works fine for me (see screenshot below).
Thanks
Saurabh

Azure Cognitive Search - Index and Deletes

I set up a demo instance of Azure Search with the web front-end app.
One thing I have noticed is that even after I remove a document from Blob storage and the indexer runs again, the deleted document and its contents are still stored in the index. How can I remove the document’s contents from the index without deleting and recreating the index?
Here is the link to my GitHub repository for the template for this environment… https://github.com/jcbendernh/Azure-Search-Ignite-2018-Demo
Any insight that you can provide is extremely appreciated.
In order to get a document to be removed from your index by the indexer when it is no longer in the data source, you need to define a data deletion detection policy in your indexer.
There are two different approaches:
1. By defining a column in your data source that marks which documents are supposed to be deleted (SoftDeleteColumnDeletionDetectionPolicy)
2. Or by using the new native soft delete support in blob storage (NativeBlobSoftDeleteDeletionDetectionPolicy)
Both of these approaches are documented at https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage#incremental-indexing-and-deletion-detection
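Here is a rough C# sketch (not part of the original answer) of attaching one of these policies to a blob data source with the Azure.Search.Documents SDK; the service endpoint, admin key, and container name are placeholders.

using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

class DeletionDetectionSketch
{
    static void Main()
    {
        var indexerClient = new SearchIndexerClient(
            new Uri("https://<your-search-service>.search.windows.net"),
            new AzureKeyCredential("<admin-key>"));

        var dataSource = new SearchIndexerDataSourceConnection(
            "blob-datasource",
            SearchIndexerDataSourceType.AzureBlob,
            "<storage-connection-string>",
            new SearchIndexerDataContainer("my-container"))
        {
            // Approach 2: rely on native blob soft delete.
            DataDeletionDetectionPolicy = new NativeBlobSoftDeleteDeletionDetectionPolicy()

            // Approach 1 would instead use a marker field, e.g.:
            // DataDeletionDetectionPolicy = new SoftDeleteColumnDeletionDetectionPolicy
            // {
            //     SoftDeleteColumnName = "IsDeleted",
            //     SoftDeleteMarkerValue = "true"
            // }
        };

        // On its next run, the indexer removes index documents for blobs deleted from the source.
        indexerClient.CreateOrUpdateDataSourceConnection(dataSource);
    }
}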
Thanks,
-Luis Cabrera (Azure Search PM)

Azure Data Factory Only Retrieve New Blob files from Blob Storage

I am currently copying blob files from Azure Blob storage to an Azure SQL Database. The pipeline is scheduled to run every 15 minutes, but each time it runs it imports all blob files again. I would like to configure it so that it only imports files that have newly arrived in the blob storage. One thing to note is that the files do not have a date-time stamp. All files are present in a single blob container, and new files are added to the same container. Do you know how to configure this?
I'd preface this answer by saying that a change in your approach may be warranted...
Given what you've described, you're fairly limited on options. One approach is to have your scheduled job maintain knowledge of what it has already stored in the SQL DB. You then loop over all the items in the container and check whether each one has been processed yet.
The container has a ListBlobs method that would work for this. Reference: https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/
foreach (var item in container.ListBlobs(null, true))
{
    // Check if it has already been processed or not
}
Note that the number of blobs in the container may be an issue with this approach. If it is too large consider creating a new container per hour/day/week/etc to hold the blobs, assuming you can control this.
Please use CloudBlobContainer.ListBlobs(null, true, BlobListingDetails.Metadata) and check CloudBlob.Properties.LastModified for each listed blob.
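A rough sketch of that suggestion with the classic Microsoft.WindowsAzure.Storage SDK; the connection string, container name, and the "last run" cutoff are placeholders.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class NewBlobLister
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        CloudBlobContainer container = account.CreateCloudBlobClient()
                                              .GetContainerReference("my-container");

        // Only consider blobs modified since the previous 15-minute run.
        DateTimeOffset lastRun = DateTimeOffset.UtcNow.AddMinutes(-15);

        foreach (var item in container.ListBlobs(null, true, BlobListingDetails.Metadata))
        {
            if (item is CloudBlob blob &&
                blob.Properties.LastModified.HasValue &&
                blob.Properties.LastModified.Value > lastRun)
            {
                Console.WriteLine($"New or updated blob: {blob.Name}");
            }
        }
    }
}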
Instead of a copy activity, I would use a custom DotNet activity within Azure Data Factory and use the Blob Storage API (some of the answers here have described the use of this API) and Azure SQL API to perform your copy of only the new files.
However, with time, your blob location will have a lot of files, so expect your job to take longer and longer (at some point longer than 15 minutes) as it iterates through every file each time.
Can you explain your scenario further? Is there a reason you want to add data to the SQL tables every 15 minutes? Can you increase that to copy data every hour? Also, how is this data getting into Blob Storage? Is another Azure service putting it there or is it an external application? If it is another service, consider moving it straight into Azure SQL and cut out the Blob Storage.
Another suggestion would be to create folders for the 15 minute intervals like hhmm. So, for example, a sample folder would be called '0515'. You could even have a parent folder for the year, month and day. This way you can insert the data into these folders in Blob Storage. Data Factory is capable of reading date and time folders and identifying new files that come into the date/time folders.
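As a small illustration of that naming scheme (the exact format is your choice, not something Data Factory mandates), here is how such a 15-minute folder path could be computed in C#.

using System;

class PartitionPathSketch
{
    static void Main()
    {
        DateTime now = DateTime.UtcNow;

        // Round down to the current 15-minute slice, e.g. 05:17 becomes "0515".
        int minuteSlice = now.Minute / 15 * 15;
        string path = $"{now:yyyy/MM/dd}/{now.Hour:D2}{minuteSlice:D2}";

        Console.WriteLine(path);   // e.g. 2016/07/21/0515
    }
}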
I hope this helps! If you can provide some more information about your problem, I'd be happy to help you further.
