So I'm writing an app in Node.js which analyses sentiment and extracts key phrases using Microsoft Cognitive services. To do these two separate things, I use two nested POST calls, one starting when the other finishes. However, this seems inefficient. The documentation mentions that, to examine sentiment for example, the URL 'must contain /sentiment', not that it must be the only endpoint for that one call. Perhaps I could do both in one call and save myself Azure budget and execution time?
I have tried /sentiment/keyPhrases, that returns 404. /sentiment/../keyPhrases just does keyPhrases. I cannot think of another way to combine the two endpoints.
Turns out I was mislead by my understanding of the documentation. Endpoints must be called separately, and the answers can be combined in Node afterwards using response = response1.concat(response2);.
Related
My needs are following:
- Need to fetch data from a 3rd party API into SQL azure.
The API's will be queried everyday for incremental data and may require pagination as by default any API response will give only Top N records.
The API also needs an auth token to work, which is the first call before we start downloading data from endpoints.
Due to last two reasons, I've opted for Function App which will be triggered daily rather than data factory which can query web APIs.
Is there a better way to do this?
Also I am thinking of pushing all JSON into Blob store and then parsing data from the JSON into SQL Azure. Any recommendations?
How long does it take to call all of the pages? If it is under ten minutes, then my recommendation would be to build an Azure Function that queries the API and inserts the json data directly into a SQL database.
Azure Function
Azure functions are very cost effective. The first million execution are free. If it takes longer than ten, then have a look at durable functions. For handling pagination, we have plenty of examples. Your exact solution will depend on the API you are calling and the language you are using. Here is an example in C# using HttpClient. Here is one for Python using Requests. For both, the pattern is similar. Get the total number of pages from the API, set a variable to that value, and loop over the pages; Getting and saving your data in each iteration. If the API won't provide the max number of pages, then loop until you get an error. Protip: Make sure specify an upper bound for those loops. Also, if your API is flakey or has intermittent failures, consider using a graceful retry pattern such as exponential backoff.
Azure SQL Json Indexed Calculated Columns
You mentioned storing your data as json files into a storage container. Are you sure you need that? If so, then you could create an external table link between the storage container and the database. That has the advantage of not having the data take up any space in the database. However, if the json will fit in the database, I would highly recommend dropping that json right into the SQL database and leveraging indexed calculated columns to make querying the json extremely quick.
Using this pairing should provide incredible performance per penny value! Let us know what you end up using.
Maybe you can create a time task by SQL server Agent.
SQL server Agent--new job--Steps--new step:
In the Command, put in your Import JSON documents from Azure Blob Storage sql statemanets for example.
Schedules--new schedule:
Set Execution time.
But I think Azure function is better for you to do this.Azure Functions is a solution for easily running small pieces of code, or "functions," in the cloud. You can write just the code you need for the problem at hand, without worrying about a whole application or the infrastructure to run it. Functions can make development even more productive, and you can use your development language of choice, such as C#, F#, Node.js, Java, or PHP.
It is more intuitive and efficient.
Hope this helps.
If you could set the default top N values in your api, then you could use web activity in azure data factory to call your rest api to get the response data.Then configure the response data as input of copy activity(#activity('ActivityName').output) and the sql database as output. Please see this thread :Use output from Web Activity call as variable.
The web activity support authentication properties for your access token.
Also I am thinking of pushing all JSON into Blob store and then
parsing data from the JSON into SQL Azure. Any recommendations?
Well,if you could dump the data into blob storage,then azure stream analytics is the perfect choice for you.
You could run the daily job to select or parse the json data with asa sql ,then dump the data into sql database.Please see this official sample.
One thing to consider for scale would be to parallelize both the query and the processing. If there is no ordering requirement, or if processing all records would take longer than the 10 minute function timeout. Or if you want to do some tweaking/transformation of the data in-flight, or if you have different destinations for different types of data. Or if you want to be insulated from a failure - e.g., your function fails halfway through processing and you don't want to re-query the API. Or you get data a different way and want to start processing at a specific step in the process (rather than running from the entry point). All sorts of reasons.
I'll caveat here to say that the best degree of parallelism vs complexity is largely up to your comfort level and requirements. The example below is somewhat of an 'extreme' example of decomposing the process into discrete steps and using a function for each one; in some cases it may not make sense to split specific steps and combine them into a single one. Durable Functions also help make orchestration of this potentially easier.
A timer-driven function that queries the API to understand the depth of pages required, or queues up additional pages to a second function that actually makes the paged API call
That function then queries the API, and writes to a scratch area (like Blob) or drops each row into a queue to be written/processed (e.g., something like a storage queue, since they're cheap and fast, or a Service Bus queue if multiple parties are interested (e.g., pub/sub)
If writing to scratch blob, a blob-triggered function reads the blob and queues up individual writes to a queue (e.g., a storage queue, since a storage queue would be cheap and fast for something like this)
Another queue-triggered function actually handles writing the individual rows to the next system in line, SQL or whatever.
You'll get some parallelization out of that, plus the ability to start from any step in the process, with a correctly-formatted message. If your processors encounter bad data, things like poison queues/dead letter queues would help with exception cases, so instead of your entire process dying, you can manually remediate the bad data.
I need to set up an application in Azure and make communicate 2 functions (one written in C# and one written in JavaScript).
The C# fragment consists in analyzing a XML feed, get the data and save in objects then finally send them to the other JavaScript function by parameter.
I did read that we could establish communication between both functions using HTTP calls but is it possible to do it with parameters ?
If not, would have any suggestions in order to achieve something like this properly? I'm getting started with Azure and i don't have enough visibility to know what is recommened in such a situation
Thank you for your advices
Yes, this is absolutely possible. How you do this is up to you. If you look at the default HTTP trigger templates, you can see that they take parameters (for example, as query string parameters). You can find more examples in the HTTP and webhook recipes documentation.
You can use other trigger types for cross-function communication as well. Take a look at this documentation for related best practices: https://learn.microsoft.com/en-us/azure/azure-functions/functions-best-practices#cross-function-communication
As it says in the documentation for the Microsoft Bot Framework, they have different types of data. One of them is the dialogData, privateConversationData, conversationData and userData.
By default, it seems the userData is/should be prepared to handle the persistency across nodes, however the dialogData should be used for temporary data.
As it says here: https://learn.microsoft.com/en-us/bot-framework/nodejs/bot-builder-nodejs-dialog-waterfall
If the bot is distributed across multiple compute nodes, each step of
the waterfall could be processed by a different node, therefore it's
important to store bot data in the appropriate data bag
So, basically, if I have two nodes, how/why should I used dialogData at all, as I cannot guarantee it will be kept across nodes? It seems that if you have more than one node, you should just use userData.
I've asked the docs team to remove the last portion of the sentence: "therefore it's important to store bot data in the appropriate data bag". It is misleading. The Bot Builder is restful and stateless. Each of the dialogData, privateConversationData, conversationData and userData are stored in the State Service: so any "compute node" will be able to retrieve the data from any of these objects.
Please note: the default Connector State Service is intended only for prototyping, and should not be used with production bots. Please use the Azure Extensions or implement a custom state client.
This blog post might also be helpful: Saving State data with BotBuilder-Azure in Node.js
I want to use the spotify api to create a webapp. Without going into too much detail about the project, I want to clear up whether it would be against the terms and conditions or not.
After reading the terms and conditions, i read this line under things NOT to do: "aggregate Metadata to create data bases, or any other compilations of Metadata".
I don't plan to do any automated requests, for example, hammering the service with different queries to build a database... I'm just wondering whether I can store results from users who have performed searches via my application to the api, so that I can build content from my database on other parts of the application.
Thanks
I'm not a lawyer, so you'll need to have a lawyer confirm this (contracts, including ToS contracts, are important), but the general gist is that if you cache the results of user-generated requests to create features then you're ok. If you start caching stuff not generated by a user, you're in muddy water.
Good:
Other users who searched for "Madonna" in MyAwesomeApp also searched for "Backstreet Boys"!
Bad:
Here's a list of all the blue cover arts on Spotify: [list]
To generate the first example you can cache and work with searches explicitly done by users of your application. The second would require scraping all of the coverart in the service, which isn't allowed.
Trying to sell a move to ServiceStack from traditional ASP.Net /SOAP web services with the management team.
I am struggling with a some RPC'ish issues. Requirement is that I support SOAP (even backhandedly) in the hope of selling my service consumers on REST.
Take for example a service called "ReplaceItem" which basically requires:
Close out item number
Replacement item number
Store Number
Bunch of other replacement item data
Should I create a ReplacementItem DTO? It seems to be if I have a number of these type of functions I am just going to have tons of DTOs instead of tons of RPC methods. Plus what is the "id" in this case and what REST method would I be using?
I get that REST/SS gives me basic CRUD functionality for domain level structures like Items/Customers/etc, but how do I handle non-CRUD methods in SS.
I am also having issues with multiple parameters making up the primary key for a certain service. Almost all Inventory tables are structured by Item Number AND Store Number. I'd rather not dump the creation of some composite string on the service client. How do I handle this?
Thanks.
ServiceStack promotes a SOA-like message-based design that is optimal and provides many natural benefits for remote services.
My initial thoughts would look something like
POST {CloseItemNumber} /item/1/close
POST {ItemNumber} /item/1?replace=true
POST {ItemNumber} /item/1
POST {ItemNumber} /item/1 i.e. same DTO/service different values.
Where ItemNumber and CloseItemNumber are separate Request DTOs and services.
Designing Service APIs
I prefer to structure my services around 'resources/nouns' and design my service APIs as actions that apply operations to them.
If the operation requires more information than storing the Resource DTO I would create a separate service with the additional metadata.
i.e. Here's how I would convert Amazons 'RPC' service to be more REST-ful:
https://ec2.amazonaws.com/?Action=AttachVolume
&VolumeId=vol-4d826724
&InstanceId=i-6058a509
&Device=/dev/sdh
&AUTHPARAMS
Into how I prefer to write it:
POST https://ec2.amazonaws.com/volumes/vol-4d826724/attach
FormData: InstanceId=i-6058a509&Device=/dev/sdh&AUTHPARAMS
Which would still use an explicit AttachVolume Request DTO.
Another example I use to showcase the different between WCF RPC and ServiceStack's coarse-grained message-based approach is in: https://gist.github.com/1386381
Difference between an RPC-chatty and message-based API:
This is a typical API that WCF encourages:
public interface IService
{
Customer GetCustomerById(int id);
Customer[] GetCustomerByIds(int[] id);
Customer GetCustomerByUserName(string userName);
Customer[] GetCustomerByUserNames(string[] userNames);
Customer GetCustomerByEmail(string email);
Customer[] GetCustomerByEmails(string[] emails);
}
This is an equivalent message-based API we encourage in ServiceStack:
public class Customers {
int[] Ids;
string[] UserNames;
string[] Emails;
}
public class CustomersResponse {
Customer[] Results;
}
Note: If you want your same services to support a both SOAP and a REST-based API, you will need to structure your services slightly differently to overcome SOAP's limitation of tunnelling all operations through HTTP POST.
Problem i still have when deciding to switch from 1 chatty RPC api to a REST api is that instead of having several functions easy to maintain, i find myself with either 2 solutions :
multiplying DTOs and services that makes internal code for the services being chatty and complex
or
putting into a single route (OnGet) all the code managing the service but this way i have to parse the different parameters to 'discover' which parameters have been requested (to simplify instead of having multiple simple functions with pre-defined parameters i now have only one function that has to determine which parameters are meaningful ... but that is VERY hard to maintain - code is more complex to me now).
In the proposed solution to solve GetCustomerById, GetCustomersByEmails etc. the point to me is that instead of having simple queries, we now have to dynamically construct the query based on the filled parameters - that can make the code tricky and hard to maintain - having to manage possible combinations of multiple parameters - some combinations not being possible too.
Feeling little bit sad about that as i REALLY don't like WCF at all.
Mixing WCF and REST seems summum of the complexity - the worst for me (complexity of defining a REST api + complexity of WCF).
Are my feelings shared or did i miss something ?