How to correct paging by using top and skip in Azure Search?

According to the Total hits and Page Counts documentation, if I want to implement paging, I should combine skip and top in my query.
The following is my POST query:
{
"count": true ,
"search": "apple",
"searchFields": "content",
"select": "itemID, content",
"searchMode": "all",
"queryType":"full",
"skip": 100,
"top":100
}
I expected it to return results 100 through 200, with a response something like this:
{
"@odata.context": "https://example.search.windows.net/indexes('example-index-')/$metadata#docs(*)",
"@odata.count": 1611,
"@search.nextPageParameters": {
"count": true,
"search": "apple",
"searchFields": "content",
"select": "itemID, content",
"searchMode": "all",
"queryType": "full",
"skip": 200
},
But it just returns the top 100 results, without the offset of 100 applied:
"@odata.context": "https://example.search.windows.net/indexes('example-index')/$metadata#docs(*)",
"@odata.count": 1611,
Is there anything I should set up?

The short answer of how to implement paging (from this article): Set top to your desired page size, then increment skip by that page size on every request. Here is an example of how that would look with the REST API using GET:
GET /indexes/onlineCatalog/docs?search=*&$top=15&$skip=0
GET /indexes/onlineCatalog/docs?search=*&$top=15&$skip=15
GET /indexes/onlineCatalog/docs?search=*&$top=15&$skip=30
The REST API documentation also covers how to implement paging, as well as the true purpose of @odata.nextLink and @search.nextPageParameters (also covered in this related question).
Quoting from the API reference:
Continuation of Partial Search Responses
Sometimes Azure Search can't return all the requested results in a single Search response. This can happen for different reasons, such as when the query requests too many documents by not specifying $top or specifying a value for $top that is too large. In such cases, Azure Search will include the @odata.nextLink annotation in the response body, and also @search.nextPageParameters if it was a POST request. You can use the values of these annotations to formulate another Search request to get the next part of the search response. This is called a continuation of the original Search request, and the annotations are generally called continuation tokens. See the example in Response below for details on the syntax of these annotations and where they appear in the response body.
The reasons why Azure Search might return continuation tokens are implementation-specific and subject to change. Robust clients should always be ready to handle cases where fewer documents than expected are returned and a continuation token is included to continue retrieving documents. Also note that you must use the same HTTP method as the original request in order to continue. For example, if you sent a GET request, any continuation requests you send must also use GET (and likewise for POST).
Note
The purpose of @odata.nextLink and @search.nextPageParameters is to protect the service from queries that request too many results, not to provide a general mechanism for paging. If you want to page through results, use $top and $skip together. For example, if you want pages of size 10, your first request should have $top=10 and $skip=0, the second request should have $top=10 and $skip=10, the third request should have $top=10 and $skip=20, and so on.
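As a rough illustration of the top/skip scheme quoted above, here is a minimal Python sketch that builds the POST body for a given page. The helper name `page_request` and the zero-based page numbering are assumptions made for this example; they are not part of any Azure SDK.

```python
def page_request(query: dict, page: int, page_size: int) -> dict:
    """Return a copy of `query` with top/skip set for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    body = dict(query)                # copy so the caller's query is not mutated
    body["top"] = page_size           # the page size stays constant
    body["skip"] = page * page_size   # the offset grows by one page per request
    return body


# Query fields taken from the question; page 1 is the second page.
base = {"count": True, "search": "apple", "searchFields": "content"}
second_page = page_request(base, page=1, page_size=100)
print(second_page["skip"], second_page["top"])
```

Each returned dictionary would be sent as the JSON body of one search request, so the client, not the service, decides where each page starts.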

The following example uses the GET method without top.
GET Method
https://example.search.windows.net/indexes('example-content-index-zh')/docs?api-version=2017-11-11&search=2018&$count=true&$skip=150&select=itemID
The Result:
{
"@odata.context": "https://example.search.windows.net/indexes('example-index-zh')/$metadata#docs(*)",
"@odata.count": 133063,
"value": [ //skip value ],
"@odata.nextLink": "https://example.search.windows.net/indexes('example-content-index-zh')/docs?api-version=2017-11-11&search=2018&$count=true&$skip=150"
}
Whether I use POST or GET, once I add $top=N (for any number N), @odata.nextLink no longer appears in the result.
In the end I figured out that although @odata.nextLink and @search.nextPageParameters are not returned, paging does work; I just have to keep track of the offsets myself.
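Keeping track of the offsets can be as simple as deriving them from the total count the service reports when "count": true is set. The sketch below is one way to do that in Python; `skip_offsets` is a hypothetical helper, and 1611 is the count from the question.

```python
def skip_offsets(total_count: int, page_size: int):
    """Yield the skip value for every page needed to cover total_count hits."""
    if page_size <= 0:
        raise ValueError("page_size must be > 0")
    # range() stops before total_count, so the last page may be a partial one.
    for skip in range(0, total_count, page_size):
        yield skip


offsets = list(skip_offsets(1611, 100))
print(len(offsets), offsets[-1])
```

With 1611 hits and a page size of 100 this gives 17 requests, the last one starting at skip=1600 and returning the remaining 11 documents.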

Related

Youtube Data API Search Returning Repeating Items

(Duplicate of this question since I don't have enough rep to add a comment).
Essentially when using Search and using the page token to get more results, the results in the following pages tend to have results from the previous pages. The more pages you go through, the more and more repeating videos appear.
You can test this directly via the documentation which allows you to perform calls from there. Do a search query for anything, keep track of the video IDs in the results, wait a few seconds, and then do another query with the next page token, and repeat. It sometimes takes around 5 or so pages before a duplicate shows up. The same issue happens if you search for related videos instead of a query.
Is this intended behavior? I cannot seem to locate anything in the documentation mentioning this. I may be wrong, but I feel like this issue only started happening this month because I did not notice this behavior in an application I was working on around a month ago.

The YouTube API returns the response in a paginated manner. This means that if you use the search functionality, your search results will be spread across different pages, where each page has a different page token. The maxResults parameter determines the number of results on each page (default 5, maximum 50). To tackle this problem and get new results with each call, pass the nextPageToken to your next API call.
For example, if your first API call looks like this:
GET https://youtube.googleapis.com/youtube/v3/search?part=snippet&maxResults=10&q=cricket&key=[YOUR_API_KEY]
Your API response would look like:
{
"kind": "youtube#searchListResponse",
"etag": "uN1c33JfiFaPBemlxN5kH8lSaHw",
"nextPageToken": "CAoQAA",
"regionCode": "IN",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 10
},
To get the next 10 results of those 1000000, add pageToken=<nextPageToken> to your query, something like this:
GET https://youtube.googleapis.com/youtube/v3/search?part=snippet&maxResults=10&pageToken=CAoQAA&q=cricket&key=[YOUR_API_KEY]
And voilà:
{
"kind": "youtube#searchListResponse",
"etag": "NeaA5DLyr3YIaKdX5ZxETA3GfhY",
"nextPageToken": "CBQQAA",
"prevPageToken": "CAoQAQ",
"regionCode": "IN",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 10
}
You get the next 10 results!
The first page does not have a page token, so the first API call needs to be made without the pageToken parameter.
Refer to the official documentation for more details:
https://developers.google.com/youtube/v3/docs/search/list
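Since the question is about repeated items across pages, a client can follow nextPageToken and drop IDs it has already seen. The Python sketch below shows that idea only; `fetch_page` is a hypothetical callable standing in for the real HTTP request, and it assumes each item carries id.videoId, as video search results do.

```python
from typing import Callable, List, Optional


def collect_video_ids(fetch_page: Callable[[Optional[str]], dict],
                      max_pages: int = 10) -> List[str]:
    """Follow nextPageToken and return unique video IDs in first-seen order."""
    seen = set()
    ordered = []
    token = None  # the first call is made without a pageToken
    for _ in range(max_pages):
        page = fetch_page(token)
        for item in page.get("items", []):
            video_id = item.get("id", {}).get("videoId")
            if video_id and video_id not in seen:
                seen.add(video_id)
                ordered.append(video_id)
        token = page.get("nextPageToken")
        if not token:  # no token means this was the last page
            break
    return ordered


# Fake two-page response in which "b" is repeated on the second page.
fake_pages = {
    None: {"nextPageToken": "CAoQAA",
           "items": [{"id": {"videoId": "a"}}, {"id": {"videoId": "b"}}]},
    "CAoQAA": {"items": [{"id": {"videoId": "b"}}, {"id": {"videoId": "c"}}]},
}
print(collect_video_ids(lambda token: fake_pages[token]))
```

This does not stop the API from returning repeats; it only keeps them out of the collected list.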

Dynamic REST calls in Azure Synapse Pipeline

I am making a call to a REST API with Azure Synapse and the return dataset looks something like this:
{
"links": [
{
"rel": "next",
"href": "[myRESTendpoint]?limit=1000&offset=1000"
},
{
"rel": "last",
"href": "[myRESTendpoint]?limit=1000&offset=60000"
},
{
"rel": "self",
"href": "[myRESTendpoint]"
}
],
"count": 1000,
"hasMore": true,
"items": [
{
"links": [],
"closedate": "6/16/2014",
"id": "16917",
"number": "62000",
"status": "H",
"tranid": "0062000"
},...
],
"offset": 0,
"totalResults": 60316
}
I am familiar with using a Synapse pipeline to make a REST call to a single endpoint that returns all of the data in one call, but this particular REST endpoint has a hard limit of 1000 records per call. It does, however, return a property named "hasMore".
Is there a way to recursively make rest calls in a Synapse pipeline until the "hasMore" property equals false?
The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there.

I tried the same scenario in Azure Data Factory, which seems a more appropriate and easier way to reach the stated goal: "The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there".
Because you have to call the endpoint repeatedly to fetch 1000 records at a time, you can use the REST connector's pagination support if the response headers or response body contain the URL of the next page.
If the next-page link or query parameter is not included in the response headers or body, you are less likely to be able to use that functionality.
Alternatively, you can use loop logic around the Copy activity:
Create two parameters in the REST connector.
Fill in the parameters in the REST connector's relative URL.
Use a Set Variable activity to increase the value of a variable on each pass through the loop, so that the URL for the Copy activity is set dynamically on every cycle. To loop or iterate, you can use the Until activity.
Alternative:
In my experience, the REST connector's built-in pagination is quite rigid, so I usually put the activity inside a loop (for example a ForEach loop) to have more control.
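The control flow of such an Until-style loop can be pictured outside the pipeline in a few lines of Python. This is only a sketch of the logic, not pipeline JSON: `fetch_page` is a hypothetical stand-in for the Copy activity's REST call, and `fake_endpoint` imitates the limit/offset/hasMore behaviour shown in the question.

```python
def fetch_all(fetch_page, limit=1000, max_calls=1000):
    """Call fetch_page(limit, offset) until the response reports hasMore as false."""
    items = []
    offset = 0
    for _ in range(max_calls):           # safety cap so the loop cannot run forever
        page = fetch_page(limit, offset)
        items.extend(page["items"])
        if not page.get("hasMore"):      # the Until condition
            break
        offset += limit                  # the Set Variable step
    return items


def fake_endpoint(limit, offset, total=2500):
    """Imitate an endpoint that returns at most `limit` records per call."""
    chunk = list(range(offset, min(offset + limit, total)))
    return {"items": chunk, "hasMore": offset + limit < total, "offset": offset}


print(len(fetch_all(fake_endpoint)))
```

The cap on the number of calls is a design choice for the sketch; in a pipeline the Until activity's timeout plays a similar role.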

For those following the thread, I used IpsitaDash-MT's suggestion using the ForEach loop. In the case of this API, when a call is made I get a property returned at the end of the call named "totalResults". Here are the steps I used to achieve what I was looking to do:
Make a dummy call to the API to get the "totalResults" parameter. This is just a call to return the number of results I am looking to get. In the case of this API, the body of the request is a SQL statement, so when the dummy request is made I am only asking for the ID's of the results I am looking to get.
SQL statement example
I then take the property "totalResults" from that request and set a dynamic value in the "Items" of the ForEach loop like this:
@range(0,add(div(sub(int(activity('Get Pages Customers').output.totalResults),mod(int(activity('Get Pages Customers').output.totalResults),1000)),1000),1))
NOTE: The API only allows pages of 1000 results, so I do some math to get a range of page numbers. I also have to add 1 to the final result to include the last page.
ForEach Loop Settings
In the API I have two parameters that can be passed "limit" and "offset". Since I want all of the data there is no reason to have limit set to anything other than 1000 (the max allowable number). The offset parameter can be set to any number less than or equal to "totalResults" - "limit" and greater than or equal to 0. So I use the range established in step 2 and multiply it out by 1000 to set the offset parameter in the URL.
Setting the offset parameter in the copy data activity
Dynamic value of the Relative URL in the REST connector
NOTE: I found it better to sink the data as JSON into ADLS2 first rather than into a dedicated SQL pool due to the Lookup feature.
Since Synapse does not allow nested ForEach loops, I run the data through a data flow to format the data and check for duplicates and updates.
When the data flow is completed it kicks off a lookup activity to get the data that was just processed and pass it into a new pipeline to use another ForEach loop to get the child data for each ID of parent data.
Data Flow and Lookup for child data pipeline
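To sanity-check the page arithmetic used for the ForEach items, the same calculation can be reproduced in Python. The function name `page_offsets` is made up for this sketch; 60316 is the totalResults value from the question.

```python
def page_offsets(total_results: int, limit: int = 1000) -> list:
    """Mirror range(0, add(div(sub(total, mod(total, limit)), limit), 1)), times limit."""
    page_count = (total_results - total_results % limit) // limit + 1
    # Each page index is multiplied by the limit to get the offset parameter.
    return [page * limit for page in range(page_count)]


offsets = page_offsets(60316)
print(len(offsets), offsets[0], offsets[-1])
```

For 60316 results this yields 61 pages with a final offset of 60000, which matches the "last" link in the sample response. When totalResults is an exact multiple of 1000, the formula requests one extra, empty page; that is harmless, but ceiling division would avoid it.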

Google Places API returns 0 results after deploying on Google Cloud

I have developed an application that is powered by the Google Places API. The problem is that the places load when running locally but not after deploying it on Google Cloud. I am using a default keyword to fetch the desired results, but surprisingly it does not work after it's deployed. I tried changing the keyword, but it still returns 0 results. Please have a look at my code below:
await axios
.get("https://maps.googleapis.com/maps/api/place/nearbysearch/json", {
params: {
key: process.env.GOOGLE_PLACE_API_KEY,
location: req.body.ll,
radius: 20000,
keyword: "popular destinations near me",
},
})
and the response I get
{ html_attributions: [], results: [], status: 'ZERO_RESULTS' }
A Postman request with the same parameters works without any issue,
but when I send the same request with raw JSON data in the body, I get an error:
{
"key": "my key",
"location": "my location",
"radius": "20000",
"keyword": "popular destinations near me"
}
{
"error_message": "You must use an API key to authenticate each request to Google Maps Platform APIs. For additional information, please refer to http://g.co/dev/maps-no-account",
"html_attributions": [],
"results": [],
"status": "REQUEST_DENIED"
}
The same request sent using postman returns 20+ results. I have no clue what could possibly be wrong with the above request. Any help is appreciated, thanks.

A Places API Nearby Search request has three required parameters: key, location, and radius. There are also optional parameters you can pass:
keyword (not keywords),
language,
minprice and maxprice,
name (the name field is no longer restricted to place names. Values in this field are combined with values in the keyword field and passed as part of the same search string. We recommend using only the keyword parameter for all search terms.),
opennow, rankby, type, pagetoken.
So you cannot assign more than one keyword to your request.
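As a small illustration of the parameter list above, the Python sketch below builds a Nearby Search URL and refuses to do so when one of the three required parameters is missing. It assumes the parameters travel in the query string, as in the axios call in the question; the helper name, the coordinates, and the keyword value are placeholders.

```python
from urllib.parse import urlencode

NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
REQUIRED = ("key", "location", "radius")


def nearby_search_url(**params) -> str:
    """Build a Nearby Search URL after checking the required parameters."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    # urlencode percent-encodes values such as the comma in the location.
    return NEARBY_URL + "?" + urlencode(params)


url = nearby_search_url(
    key="YOUR_API_KEY",          # placeholder, not a real key
    location="48.8584,2.2945",   # placeholder "lat,lng" string
    radius=20000,
    keyword="museum",            # a single search string
)
print(url)
```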

Custom update actions in RESTful Services

My API requires many different types of updates that can be performed by different types of roles. A ticket can have its data updated, a ticket can be approved (which includes some information), a ticket can be rejected, a ticket can be archived (a state that makes a ticket unable to be updated), etc.
I've recently started working as a backend developer and I really do not know what is the most correct approach to this situation but I've two ideas in mind:
A single update endpoint (e.g. /api/tickets/:id) that accepts an action field with the type of update to be performed;
Multiple custom action endpoints (e.g. /api/tickets/:id/validate, /api/tickets/:id/accept, etc.)
Which one of those is the best approach to the situation when it comes to the REST architecture? If both are incorrect when it comes to REST, what would be the most correct approach? I couldn't really find any post on SO that answered my question so I decided to create my own. Thank you!

REST stands for Representational State Transfer, which means that the client and the server affect each other’s state by exchanging representations of resources.
A client might GET a representation of a ticket like this:
GET /api/tickets/123 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
"id": 123,
"state": "new",
"archived": false,
"comments": []
}
Now the client can PUT a new representation (replacing it wholesale) or PATCH specific parts of the existing one:
PATCH /api/tickets/123 HTTP/1.1
Content-Type: application/json-patch+json
[
{"op": "replace", "path": "/state", "value": "approved"},
{"op": "add", "path": "/comments/-", "value": {
"text": "Looks good to me!"
}}
]
HTTP/1.1 200 OK
Content-Type: application/json
{
"id": 123,
"state": "approved",
"archived": false,
"comments": [
{"author": "Thomas", "text": "Looks good to me!"}
]
}
Note how the update is expressed in terms of what is to be changed in the JSON representation, using the standard JSON Patch format. The client could use the same approach to update any resource represented in JSON. This contributes to a uniform interface, decoupling the client from the specific server.
Of course, the server need not support arbitrary updates:
PATCH /api/tickets/123 HTTP/1.1
Content-Type: application/json-patch+json
[
{"op": "replace", "path": "/id", "value": 456}
]
HTTP/1.1 422 Unprocessable Entity
Content-Type: text/plain
Cannot replace /id.
Still, this might require more complexity on the server than dedicated operations like “approve”, “reject”, etc. You might want to use a library to implement JSON Patch. If you find that you need many custom actions which are hard to express in terms of a representation, an RPC architecture might suit you better than REST.
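To give a feel for what the server has to do with such a request, here is a deliberately tiny Python sketch that handles only the "replace" and "add" operations. It is not a complete RFC 6902 implementation (no "remove", "move", "copy", or "test", and no ~0/~1 escaping), and `apply_patch` is a name invented for this example; a real service would more likely use a library. Note that "/comments/-" is the JSON Pointer form that appends to an array.

```python
import copy


def apply_patch(document: dict, operations: list) -> dict:
    """Apply a small subset of JSON Patch ('replace' and 'add') to a copy."""
    result = copy.deepcopy(document)
    for op in operations:
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for key in parents:  # walk down to the parent container
            target = target[int(key)] if isinstance(target, list) else target[key]
        if op["op"] == "replace":
            if isinstance(target, list):
                target[int(last)] = op["value"]
            elif last in target:
                target[last] = op["value"]
            else:
                raise KeyError("cannot replace missing member " + op["path"])
        elif op["op"] == "add":
            if isinstance(target, list):
                index = len(target) if last == "-" else int(last)
                target.insert(index, op["value"])  # "-" appends to the array
            else:
                target[last] = op["value"]
        else:
            raise ValueError("unsupported op: " + op["op"])
    return result


ticket = {"id": 123, "state": "new", "archived": False, "comments": []}
patched = apply_patch(ticket, [
    {"op": "replace", "path": "/state", "value": "approved"},
    {"op": "add", "path": "/comments/-", "value": {"text": "Looks good to me!"}},
])
print(patched["state"], patched["comments"])
```

Rules such as "the id cannot be replaced" would sit in front of this function, which is where the extra server-side complexity mentioned above comes from.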

REST is all about resources. And the state of the resources should be manipulated using representations (such as JSON, XML, you name it) on the top of stateless communication between client and server.
Since URIs are meant to identify the resources, it makes sense to use nouns instead of verbs in the URI. And when designing a REST API over the HTTP protocol, HTTP methods should be used to indicate the action intended to be performed over the resource.
Performing partial updates
You could use the PATCH HTTP verb to perform partial updates to your resource. The PATCH request payload must contain a set of changes to be applied to the resource.
There are a couple of formats that can be used to describe the set of changes to be applied to the resource:
JSON Merge Patch
It's defined in RFC 7396 and can be applied as described below:
If the provided merge patch contains members that do not appear within the target, those members are added. If the target does contain the member, the value is replaced. Null values in the merge patch are given special meaning to indicate the removal of existing values in the target.
So a request to modify the status of a ticket could be like:
PATCH /api/tickets/1 HTTP/1.1
Host: example.org
Content-Type: application/merge-patch+json
{
"status": "valid"
}
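The merge rules quoted above are short enough to sketch directly. The Python function below follows the algorithm described in RFC 7396; the ticket fields other than "status" are invented for the example.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the patched value."""
    if not isinstance(patch, dict):
        return patch                       # a non-object patch replaces the target
    result = dict(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)         # null removes the member
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


ticket = {"id": 1, "status": "new", "assignee": "alice"}
print(merge_patch(ticket, {"status": "valid", "assignee": None}))
```

One consequence of the null rule is that a merge patch cannot set a member to null, which is one reason to pick the operation-based format when that matters.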
JSON Patch
It's defined in RFC 6902 and expresses a sequence of operations to be applied to a target JSON document. A request to modify the status of a ticket could be like:
PATCH /api/tickets/1 HTTP/1.1
Host: example.org
Content-Type: application/json-patch+json
[
{
"op": "replace",
"path": "/status",
"value": "valid"
}
]
The path is a JSON Pointer value, as described in RFC 6901.

Try to either
Deal with a single object -> api/v1/tickets/1
Deal with a list of objects -> api/v1/tickets/.
Then try to capture all actions as CRUD actions.
Create object(s) -> HTTP POST
Retrieve object(s) -> HTTP GET
Update object(s) -> HTTP PATCH
Delete object(s) -> HTTP DELETE
And also:
Save object(s) entirely -> HTTP PUT
When you are changing statuses and these are just attributes on a ticket, I would send a PATCH request. For instance, if I need to change the status of ticket #1 to "rejected", I would send something like PATCH api/v1/tickets/1 with a payload like:
{
"status": "rejected"
}
REST has a lot of best practices but not everything is set in stone. Maybe this tutorial: https://restfulapi.net can help you?

Really it all comes down to a matter of taste. It is common in the industry to have the static parameters in the URL (e.g. /tickets/update, /users/add, /admin/accounts) and the variable parameters in the query (e.g. IDs, messages, names). This allows for a fixed number of endpoints.
I see you're using Node.js, so you're probably using Express, and in Express getting the URL parameters and the body parameters is equally easy:
// supposing you're using a JSON-based API with body-parser for JSON parsing
app.get('/some/path/:someParam', (req, res) => {
console.log(req.params.someParam);
console.log(req.body.someOtherParam);
res.send();
});

should I pass query parameters in the url or in the payload?

Suppose I have a REST endpoint like this :
http://server/users/query
And I have parameters in my query : age, city, country
I want to do a GET request with those parameters.
Is it better to pass the parameters in the URL, or to put something like this in the payload of my GET request?
"query": {
"age": "something",
"city": "something",
"country": "something"
}

As I understand it, you have a collection of users and you want to get a representation of it. You should consider query parameters to filter your collection, as follows:
http://[host]/api/users?age=something&city=something&country=something
And avoid GET requests with a payload. See this quote from RFC 7231:
A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.
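On the client side, filtering through query parameters might look like the Python sketch below. The helper name `users_url`, the base URL, and the filter values are assumptions for illustration; filters left as None are simply not sent.

```python
from urllib.parse import urlencode


def users_url(base: str, **filters) -> str:
    """Append only the filters that were actually supplied as query parameters."""
    supplied = {name: value for name, value in filters.items() if value is not None}
    return base + "?" + urlencode(supplied) if supplied else base


url = users_url("http://server/api/users", age=30, city="Lisbon", country=None)
print(url)
```

Because the filters are part of the URL, the request stays a plain GET without a body and remains cacheable and bookmarkable.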

From MDN: GET requests (typically) don't have bodies so use the query parameters or the path.
If you are making requests to a server, you should instead read the documentation of its API.
