I am using an Azure Logic App to upload existing data from OneDrive to Azure File Storage.
OneDrive contains more than 300 directories and more than 10,000 files.
I tried the OneDrive "List files in folder" connector to list all files and directories so that I could filter out the files from the result, but the connector returns only 20 entries.
I could not get all of the entries. I searched quite a lot but could not find any resources.
Azure Logic Apps has a nextLink option for getting data from subsequent pages, but I couldn't find proper documentation on how to use it.
Does anybody know how to retrieve paginated data in an Azure Logic App?
We recently worked on a Logic App that gets paged data from Azure Activity Logs, where responses are also paged by default. We used an 'Until' loop in Azure Logic Apps that runs until nextLink is undefined.
The following is what the condition in the Until loop looks like. (Get_Logs is our Azure Monitor API connector; you can replace it with your connector that gets the file list from OneDrive.)
@equals(coalesce(body('Get_Logs')?.nextLink, 'undefined'), 'undefined')
Hope this helps!!
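The Until-loop idea above can be sketched outside Logic Apps as a plain loop that keeps requesting pages until the response carries no nextLink. This is only an illustration in Python: fetch_page and the fake pages are hypothetical stand-ins for the connector call, and the "value"/"nextLink" field names are assumed from the answer.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item, following nextLink until the response has none."""
    next_link: Optional[str] = None
    while True:
        page = fetch_page(next_link)
        yield from page.get("value", [])
        # Equivalent of the coalesce(...) check: a missing nextLink ends the loop.
        next_link = page.get("nextLink")
        if not next_link:
            break


# Hypothetical stand-in for the connector: three pages, the last without nextLink.
_FAKE_PAGES = {
    None: {"value": [{"name": "a.txt"}, {"name": "b.txt"}], "nextLink": "page2"},
    "page2": {"value": [{"name": "c.txt"}], "nextLink": "page3"},
    "page3": {"value": [{"name": "d.txt"}]},
}


def fake_fetch(next_link: Optional[str]) -> dict:
    return _FAKE_PAGES[next_link]


all_names = [item["name"] for item in iterate_pages(fake_fetch)]
print(all_names)
```

The first call passes None so the stand-in can return the first page; a real connector would simply be called without a link on the first iteration.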
Method 1:
1. Create a variable of type string.
2. Use the Until connector.
3. If there are no further records, "nextLink" will be undefined.
4. Detect this using coalesce, since it is not supported by default.
5. Add it to the variable.
Method 2:
1. Use the Inline Code connector, which lets you write code in JavaScript.
I need to get the name of a file when it is uploaded to blob storage using Logic Apps. I'm new to Logic Apps and this seems like it should be easy, but I'm not having any luck.
To try to find the file name, I'm sending what's available to me in an email. I will eventually use the file name as part of an HTTP POST to another service.
The Logic App is triggered as it should be when I upload, but I do not get any data in my email for the items I chose. I am not uploading to a subfolder. I've looked at the code views and searched other posts but have not found a solution. Any help is most appreciated.
Thanks
Instead of using the built-in "When a blob is added or modified in Azure Storage" connector, try using "When a blob is added or modified (properties only) (V2)" and add "List of Files Display Name" to get the file name.
[Screenshots not included: the overall Logic App flow, and the resulting email in Outlook]
I have a working query that analyzes my app data.
Currently it analyzes the last two weeks of data using ago(14d).
Now I want to use a value containing the release date of the app's current version. Since I haven't found a way to add a new table to the existing database that contains the log data in Azure analytics, I created a new database in Azure and entered my data there.
Now I just don't know whether I can access that database at all from within the web query interface of Azure Log Analytics, or whether I have to use some other tool for that.
I hope that somebody can help me with this.
As always with Azure, there is a lot to read, but nothing concrete for my issue (or at least I haven't found it yet).
And yes, I know how to insert the data into the query with a let, but since I want to use the same data in different queries, I would prefer an external location that can be accessed from all of the queries.
Thanks in advance.
Maverick
You cannot access a database directly. You are better off using a CSV/JSON file in blob storage. In the following example I uploaded a txt file with CSV data like this:
2a6c024f-9093-434c-b3b1-000821a15b1a,"Customer 1"
28a658a8-5466-45ea-862c-003b20507dd4,"Customer 2"
c46fb949-d807-4eea-8de4-005dd4beb39a,"Customer 3"
e05b67ee-ff83-4805-b004-0064449f196c,"Customer 4"
Then I can reference this data from log analytics / application insights in a query like this using the externaldata operator:
let customers = externaldata(id:string, companyName:string) [
    h@"https://xxx.blob.core.windows.net/myblob.txt?sv=2019-10-10&st=2020-09-29T11%3A39%3A22Z&se=2050-09-30T11%3A39%3A00Z&sr=b&sp=r&sig=xxx"
] with(format="csv");
requests
| extend CompanyId = tostring(customDimensions.CustomerId)
| join kind=leftouter
(
    customers
)
on $left.CompanyId == $right.id
The URL https://xxx.blob.core.windows.net/myblob.txt?sv=2019-10-10&st=2020-09-29T11%3A39%3A22Z&se=2050-09-30T11%3A39%3A00Z&sr=b&sp=r&sig=xxx includes a SAS token. To create it, open Microsoft Azure Storage Explorer, select the blob, then right-click it and choose Get Shared Access Signature. In the popup, create a SAS and then copy the URI.
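To make the effect of the leftouter join concrete, here is a small Python sketch of the same lookup done in memory: the CSV is parsed into an id-to-name mapping and each request row keeps its data whether or not a customer matches. The function names and the sample request rows are made up for illustration, and the sketch assumes each id appears only once in the CSV (a real Kusto join would emit one row per match).

```python
import csv
import io

# Two rows in the same shape as the uploaded txt file.
CUSTOMERS_CSV = (
    '2a6c024f-9093-434c-b3b1-000821a15b1a,"Customer 1"\n'
    '28a658a8-5466-45ea-862c-003b20507dd4,"Customer 2"\n'
)


def load_customers(text: str) -> dict:
    """Parse id,companyName rows into a lookup keyed by id."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def left_outer_join(requests: list, customers: dict) -> list:
    """Keep every request; companyName is None when no customer matches."""
    joined = []
    for request in requests:
        company_id = str(request.get("CustomerId", ""))
        joined.append(
            {**request, "CompanyId": company_id, "companyName": customers.get(company_id)}
        )
    return joined


# Hypothetical request rows: one with a known customer id, one without.
sample_requests = [
    {"name": "GET /orders", "CustomerId": "2a6c024f-9093-434c-b3b1-000821a15b1a"},
    {"name": "GET /health", "CustomerId": "unknown-id"},
]

joined_rows = left_outer_join(sample_requests, load_customers(CUSTOMERS_CSV))
for row in joined_rows:
    print(row["name"], "->", row["companyName"])
```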
I know Log Analytics uses Azure Data Explorer in the back end, and Azure Data Explorer has a feature for using external tables within queries, but I am not sure whether Log Analytics supports external tables.
External Tables in Azure Data Explorer
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/externaltables#:~:text=An%20external%20table%20is%20a,and%20managed%20outside%20the%20cluster.
I am fairly new to Azure and I have a task at hand: use any Azure service (or a group of Azure services integrated together) to download a million files in parallel from a third-party REST API endpoint, which returns one file at a time, into Blob Storage using Azure Data Factory.
WHAT I RESEARCHED :
From what I researched, my task has three requirements in a nutshell:
Parallel runs in the millions: for this I deduced that Azure Batch would be a good option, as it lets you run a large number of tasks in parallel on VMs (it uses that concept for graphics rendering or machine learning tasks).
Save the response from the REST API to Blob Storage: I found that Azure Data Factory can handle this ETL-style operation with a source/sink model, where I could set the REST API as the source and the blob as the target.
WHAT I HAVE TRIED:
Here are some things to note:
I added the REST API and Blob as linked services.
The API endpoint takes in a query string param named : fileName
I am passing the whole URL with the query string
The Rest API is protected by Bearer Token, which I am trying to pass using additional headers.
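For reference, the request described in the notes above (a fileName query string parameter plus a Bearer token header) can be sketched in Python as follows. The base URL, file name, and token are placeholders, not values from the question, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_file_request(base_url: str, file_name: str, token: str) -> Request:
    """Build a GET request for one file; nothing is sent over the network."""
    query = urlencode({"fileName": file_name})  # URL-encodes spaces and symbols
    return Request(
        f"{base_url}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values for illustration only.
request = build_file_request("https://api.example.com/files", "report 1.csv", "<token>")
print(request.full_url)
print(request.get_header("Authorization"))
```

Encoding the file name separately, rather than concatenating it into the URL by hand, avoids malformed URLs when a name contains spaces or reserved characters.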
THE MAIN PROBLEM:
I get an error message when publishing the pipeline saying that the model is not appropriate. It is just that one line, and it gives no insight into what is wrong.
OTHER QUERIES:
Is it possible to pass query string values dynamically from a SQL table, so that each file name is picked from a single-column row returned by a stored procedure or inline query?
Is it possible to make this pipeline run in parallel using Azure Batch somehow? How can we integrate this process?
Is it possible to achieve the million parallel downloads without Data Factory, using just Batch?
It is hard to help with your main problem; you need to provide more examples of your code.
In relation to your other queries:
You can use a "Lookup activity" to fetch a list of files from a database (with either sproc or inline query). The next step would be a ForEach activity that iterates over the array and copies the file from the REST endpoint to the storage account. You can adjust the parallelism on the ForEach activity to match your requirement but around 20 concurrent executions is what you normally see.
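The Lookup-then-ForEach pattern amounts to iterating over a list of file names with a cap on how many copies run at once. A rough Python analogue, assuming a hypothetical copy_one function in place of the Copy activity and using the figure of 20 from the answer as the default cap:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def copy_all(
    file_names: Iterable[str],
    copy_one: Callable[[str], str],
    max_parallel: int = 20,
) -> List[str]:
    """Run copy_one for every file name, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the input names.
        return list(pool.map(copy_one, file_names))


def fake_copy(file_name: str) -> str:
    """Hypothetical stand-in for 'download from REST, write to blob'."""
    return f"copied:{file_name}"


# The list would come from the Lookup step; here it is hard-coded.
copied = copy_all([f"file{i}.csv" for i in range(5)], fake_copy)
print(copied)
```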
Using Azure Batch just to download a file seems a bit overkill, as it should be a fairly quick operation. If you want to see an example of an Azure Batch job written in C#, I can recommend this example => https://github.com/Azure-Samples/batch-dotnet-quickstart/blob/master/BatchDotnetQuickstart. In terms of parallelism, I think you will manage to achieve a higher degree on Azure Batch than on Azure Data Factory.
If you need to actually download 1M files in parallel, I don't think you have any other option than Azure Batch to get close to such numbers. But you must have a pretty beefy API if it can handle 1M requests within a second or two.
I have some Excel files stored in SharePoint Online. I want to copy files stored in SharePoint folders to Azure Blob storage.
To achieve this, I am creating a new pipeline in Azure Data factory using Azure Portal. What are possible ways to copy files from SharePoint to Azure blob store using Azure Data Factory pipelines?
I have looked at all linked services types in Azure data factory pipeline but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
We can create a linked service of type 'File system' by providing the directory URL as 'Host' value. To authenticate the user, provide username and password/AKV details.
Note: Use Self-hosted IR
You can use a Logic App to fetch data from SharePoint and load it into Azure Blob storage, and then use Azure Data Factory to fetch the data from the blob. You can even set an event trigger so that the pipeline runs automatically whenever a file arrives in the blob container.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an Automated cloud flow trigger whenever a new file is dropped in a SharePoint
Use any mentioned trigger as per your requirement and fill in the SharePoint details
Add an action to create a blob and fill in the details as per your use case
This way you copy everything from SharePoint to the blob without using ADF at all.
My previous answer was true at the time, but in the last few years Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy a file from SharePoint Online by using a Web activity to authenticate and grab an access token from SPO, then passing it to a subsequent Copy activity that copies the data with the HTTP connector as the source.
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
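As a sketch of that URL transformation, the helper below turns a plain file URL into a request URL of the form used by the SharePoint REST endpoint GetFileByServerRelativeUrl. The function name is made up, the site path is passed in explicitly because it cannot be reliably inferred from the file URL, and paths containing single quotes or other characters that need escaping are not handled.

```python
from urllib.parse import urlsplit


def file_content_url(file_url: str, site_path: str) -> str:
    """Build the .../_api/web/GetFileByServerRelativeUrl('...')/$value URL."""
    parts = urlsplit(file_url)
    server_relative_path = parts.path  # e.g. /sites/site1/libraryname/.../myfile.CSV
    return (
        f"{parts.scheme}://{parts.netloc}{site_path}"
        f"/_api/web/GetFileByServerRelativeUrl('{server_relative_path}')/$value"
    )


content_url = file_content_url(
    "https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV",
    site_path="/sites/site1",
)
print(content_url)
```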
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.
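One way to guard against the token expiring mid-run is to cache it and request a fresh one shortly before the hour is up. The sketch below shows that idea in Python; TokenCache and fetch_token are hypothetical names, the one-hour lifetime comes from the answer, and the five-minute safety margin is an arbitrary assumption.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse a token until it is close to expiry, then fetch a new one."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        lifetime_seconds: float = 3600,  # token validity stated in the answer
        margin_seconds: float = 300,     # assumed safety margin before expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token


# Hypothetical stand-in for the authentication call.
_issued = []


def fake_fetch_token() -> str:
    _issued.append(1)
    return f"token-{len(_issued)}"


cache = TokenCache(fake_fetch_token)
print(cache.get(), cache.get())  # the second call reuses the cached token
```

Calling cache.get() before each file copy keeps a long sequential run from reusing a token that has already expired.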
I have a dataset of 442k JSON documents in a single ~2.13 GB file in Azure Data Lake Store.
I uploaded it to a collection in CosmosDB via an Azure Data Factory pipeline. The pipeline completed successfully.
But when I went to CosmosDB in the Azure Portal, I noticed that the collection size is only 1.5 GB. I tried to run SELECT COUNT(c.id) FROM c for this collection, but it returns only 19k. I've also seen complaints that this count function is not reliable.
If I open the collection preview, the first ~10 records match my expectations (ids and content are the same as in the ADLS file).
Is there a way to quickly get the real record count? Or some other way to be sure that nothing was lost during import?
According to this article:
When using the Azure portal's Query Explorer, note that aggregation queries may return the partially aggregated results over a query page. The SDKs produce a single cumulative value across all pages.
In order to perform aggregation queries using code, you need .NET SDK 1.12.0, .NET Core SDK 1.1.0, or Java SDK 1.9.5 or above.
So I suggest you first try using the Azure DocumentDB SDK to get the count value.
For more details about how to use it, you could refer to this article.
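The difference between a per-page partial aggregate and the cumulative value can be illustrated with a small Python sketch that adds up the partial counts from every page. fetch_count_page is a hypothetical stand-in, not a Cosmos DB SDK call, and the per-page numbers are made up so that they add up to the 442k documents mentioned in the question.

```python
from typing import Callable, Optional, Tuple


def cumulative_count(
    fetch_count_page: Callable[[Optional[str]], Tuple[int, Optional[str]]]
) -> int:
    """Sum the partial counts of all pages until no continuation is returned."""
    total = 0
    continuation: Optional[str] = None
    while True:
        partial, continuation = fetch_count_page(continuation)
        total += partial
        if continuation is None:
            return total


# Made-up pages: (partial count, continuation token for the next page).
_FAKE_COUNT_PAGES = {
    None: (19_000, "p2"),
    "p2": (200_000, "p3"),
    "p3": (223_000, None),
}


def fake_fetch_count_page(continuation: Optional[str]) -> Tuple[int, Optional[str]]:
    return _FAKE_COUNT_PAGES[continuation]


total_documents = cumulative_count(fake_fetch_count_page)
print(total_documents)
```

Reading only the first page here would report 19,000, which is the kind of partial result the quoted passage warns about.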