Shaping API endpoints and response - node.js

Hello Stackoverflow,
I'm writing API's for quite a bit of time right now and now it came to work with one of these bigger api's. Started wondering how to shape this API, as many times I've seen on a bigger platforms that one big entity (for example product page in shop) is loaded separately (We can see that item body loaded, but comments are still fetching etc.).
Usually what I've done was attaching comments as a relation in SQL query, so my frontend queried single API Endpoint like:
http://api.example.com/items/:id
And it returned all necessary data like seller info, photos etc.
Logically seller info and photos are small pieces of data (Item can only have 1 seller and no more than 10 photos for example), but number of comments might be way larger collection with relationship (comment author).
Does it make sense to separate one endpoint into 2 independent endpoints like:
http://api.example.com/items/:id
http://api.example.com/items/:id/comments
What are downsides of this approach? Is it common practice? Or maybe I misunderstood some concept?
One downside might be 2 request performed, but on the other hand, first endpoint should return data faster (as it's lighter than fetching n of comments), so page might be displayed faster and display spinner for comments section. This way I'll be able to paginate comments too.
Are there any improvements that might be included in this separation of endpoints? Or maybe I'm totally wrong and it should be done totally different way?

I think it is a good approach if:
The number of comments of one item can be large, because with this approach you could paginate it easier.
If you are going to need to access to the comments of one item without needing rest of item information
I think any of the previous conditions justify this decition, and yes, it is common approach.

Related

REST API-server dilemma

I plan to build a server that will use a REST API on NODE.JS with Meteor.
What are the differences between these two methods of writing an API:
1.http://meteorpedia.com/read/REST_API
example: someSrver.com/post/:_id
someSrver.com/post?id=_id
Thanks
I think there's no really difference at least you are handling request correctly.
I think the second style suit better if you have to pass server a lot of parameters for example a service that provide high res image, where you can specify tile size, coordinates and other things like that.
If you use api as an interface to database usually first style is used.
I've develop a rest interface using this library: https://atmospherejs.com/nimble/restivus , it's very easy to use and it use first style.
So, you should read up on REST principles and API design if you want to really understand the why's.
But generally, the rule of thumb is that a URL should represent a resource. The paths generally represent a given "thing" and the querystring represents some kind of filter on that "thing".
So if you "post" object is logically its own entity (e.g.a blog post), then it'd have its own unique url such that a GET to www.example.com/posts/:id would return the one specific blog entry you're talking about.
GET /posts would map to a list of all posts, for example, and GET /posts?tagged=cheetahs would get you a list of all posts filtered to return just those with the tag of 'cheetahs' assigned to them.
This is all rules of thumb and standards. The implementation really doesn't matter and most servers don't care; but there is value in following the standards as they tend to be more maintainable, elegant, and help you not have to make a million design decisions. If you ever want others to integrate with you, it makes it easier for them to know what to expect as well.
According to the URI standard, the query is for non-hierarchical filters while, the path is for hierarchical ones.
I would use query if we are talking about a filtered collection, so the result will be a representation of a collection, for example json []. On the other hand if we are talking about an item, then I would use the path and the json would be an object/item {}. But this is only my own style, you can use which one you prefer. (The URI structure has only routing purposes if you use REST with HATEOAS. I assume you don't.)
In REST the URL/URI is the address of an item or a collection of items. So, to get all addresses for customer 2 you could do this:
/api/customer/2/addresses
If however you wanted just those addresses with a postcode you could go:
/api/customer/2/addresses?withPostcode=1
In this case, the first URL/URI represents a thing/things whereas the second has a modifier, restriction or filter applied to it.
Therefore someSrver.com/post/:_id means get me the post which is known by that ID (though ideally it would be someSrver.com/posts/:_id - note the plural). Whereas the second one (someSrver.com/post?id=_id) implies that everything to the left of the question mark has already narrowed down your thing/things and they now need filtering by an ID property (in this case) on the thing.
It's a subtle distinction in many cases, but I'd sum it up as the first applying a selector/location and the second applying a selector/location with a filter.
Although I didn't implement REST API server in node yet, I want to share with you few important points when you design your server:
Try to use flat paths for the controllers, nested paths are causing confusion.
Avoid custom HTTP methods such as PUT, HEAD and PATCH, not all firewalls like them.
Use applicative error codes with HTTP 200 OK and use the HTTP protocol error codes only for HTTP errors.
See more: http://restafar.com/create-new-rest-server/

How can I get all the exercises for a topic (e.g., math) and all its subtopics from the khanacademy api?

Khan Academy's API Explorer has an exercises section that mentions filtering by tags, but the url with math tag applied returns nothing.
The generic exercise objects don't contain the topic they're in. My guess is that there's an id to join on somewhere in the topictree/exercises json objects, but I don't know an efficient way to find it.
Here are the raw exercises json and raw topictree json (note, the second one is huge, and contains many topics other than math).
I don't think there is a nice way to return exercises from just a subtree of the topictree (e.g. just math). Tags are a different concept, and there isn't a tag common to everything in math. Probably your best bet is to load the full topictree with just Exercises (and Topics) and work from there:
http://www.khanacademy.org/api/v1/topictree?kind=Exercise
If you need to reference this structure repeatedly, it probably makes sense to download and filter it ahead of time, and maybe re-fetch it from time to time to account for changes to Khan Academy content. But it depends on your exact use case.
Generally, any content item can be referenced by content_id (sometimes just called id) or by slug, but unfortunately, the naming and usage aren't consistent everywhere.
You can use the following to get all the exercises:
http://www.khanacademy.org/api/v1/exercises
http://www.khanacademy.org/api/v1/topictree?kind=Exercise
I'm not sure what's the difference between these two - I don't use them.
I prefer to fetch the data for the individual topic nodes as follows:
http://www.khanacademy.org/api/v1/topic/%s
http://www.khanacademy.org/api/v1/topic/%s/exercises
http://www.khanacademy.org/api/v1/topic/%s/videos
where %s is the "node_slug" property for each topic. The root of the tree is just "root". The first one will give you the topic details and a list of sub-items in the "child_data" array. Use the "id" properties of each sub-topic in this array to look up its details in the "children" array having "internal_id" equal to "id". There you get the "node_slug" to for the next API call for that sub-topic. The "child_data" array has all the sub-items in the order that they appear on the website when you're working with the missions.
I cache these responses so that I don't have to download everything every time.

Spotify Metadata API: Search By Artist

The original plan was to write this as a blog post, entitled "Inefficiencies in the Spotify Metadata API : Or, How the Jackson 5 killed my Browser", but changed my mind at the last minute as I have a habit of missing the obvious in documentation,perhaps an undocumented feature might exist which I have missed, or someone else has solved the issue - so hence this question has a certain blog-post tone about it!
I am developing a small web-app, mostly for a small group of people, which will allow anyone to update a Spotify playlist. As not everyone has Spotify (though I don't know why!), the page will update a database with songs, as App running in Spotify on my laptop polls the database for updates, then using the Spotify Apps API, the playlist is updated, and anyone subscribing to the playlist gets the update. This is ok, though I would like to use push rather than poll, but that's a topic for another day.
I searched around for a Javascript library to use with the Spotify Metadata API, and found one (https://github.com/palmerj3/SpotifyJS) though it was basically a wrapper and still required you to parse the JSON yourself. Thinking I could go one better and put some basic parsing in for the most common fields (title, artist, album, Spotify URI) I started working on my own library/JQuery plugin.
Search by track is not a problem, it's a single call to the spotify metadata API, the results are easily parsed, matching the returned artist with the requested artist (if present) makes for an easy search by title/artist.
Search by Artist (obtain a list of all songs by a particular artist) though, appears to be a pain-in-the-**! As best as I can tell from the docs, this is the process.
Search for the artist: this will return a list of artists which match the query
For each artist, lookup their albums: this will return a list albums
Lookup each album and retrieve a track list
Compare the artist for each track with the search artist, if it matches output
The first step will return a small list of artist matches, Foo Fighters has 2, Silverchair 1, and The Jackson 5 has 4. This small list turns into a larger number of album matches - from memory Foo Fighters returned 112, which then turns into even larger number of track lists. From a Javascript/JQuery perspective, this leads to daisy-chained AJAX request, for each step, and at each step massive numbers of, nearly concurrent GET requests against the Spotify servers.
The initial version I wrote cheated and used synchronous AJAX, and worked ok, as each request must complete before the next would start, though, this would lock the browser up for some time, and removed the possibility of using feedback to the user that the system was running. I then switched to asynchronous requests and all hell broke loose! You immediately hit issues with rate limiting on the Spotify end, which returns resoponses with 502 bad gateway (not listed in the spotify docs as a status by the way), or 503 - both of which JQuery interpreted as status code 0 - which was interesting, requiring debugging in Firebug. I throttled the requests down on the client side, I found that 1 every second was about right, to avoid rate limiting and ensure I got a response containing data each time, however, this then causes massive lock ups in the browser as it had upwards of 30 or 40 GET requests in parallel, all returning pretty much at the same time (though some requests responded after 15+ seconds!) and then parsing all the JSON responses.
I looked into alleviating the load by using a server-side approach, though this has downsides as well:
1. you don't avoid the basic issue in that the API can not handle the task in an efficient manner
2. for a busy site, the bandwidth usage will be against the server, which will also present a single IP, for multiple users you will soon hit the rate limit due to parallel users
The server side does offer caching though which could be beneficial, to this end I found a PHP Library - metatune (https://github.com/mikaelbr/metatune) advertised as the "The ultimate PHP Wrapper for the Spotify Metadata API", but unfortunately only offers the same basic lookup/search as the Spotify Metadata API - i.e.: no listing of all songs by an artist.
Thus, I have now disabled searching by artist, until I find a suitable solution.
Assuming I have not missed anything, it seems, to me at least, to not be an efficient API design, as it encourages you to make large numbers of requests to the Spotify servers, which is not good for me as a client, and not good for Spotify as a server. I can't help but think that if there was a request such as:
ws.spotify.com/search/1/artist.json?q=foo+fighters&extras=tracks
then the issues discussed here would be alleviated, a single request would cover what requires 3 sets of multiple requests currently; rate limiting wouldn't be as big an issue; the overheads to process the data on the client are greatly reduced; the overheads for Spotify to handle would be reduced and the entire service would be more efficient. The fact that the request would return a very large data set is not an issue, as the API already splits data into "pages".
So, my questions to the crowd:
1. Have I missed something obvious in the documentation, or is there a secret request?
2. In the absence of an API request, does anyone have a suggestion on how to make my system more efficient?
3. Has anyone solved this issue before?
Thanks for reading! Took a long time to get to the questions, but I felt it necessary to provide as much reasoning to find the best solution, and also, it illustrates the deficiency in the API, which I hope someone from Spotify will notice!
Finally as an aside, projects like this make me feel like we've swapped Flash for Javascript but the performance is still as bad! Anyone else feel the same?
Cheers!
sockThief
Unless I'm missing something, does this do what you want?
http://ws.spotify.com/search/1/track.json?q=artist:foo+fighters
The artist: prefix tells the search service to only match on artist. You can read more about the advanced search syntax (which also works in the client) here.

Magento: Attribute with thousands of values/options

I'm creating a Book store in Magento and am having trouble figuring out the best way to handle the Authors of a Book (which would be the product).
What I currently have is an Attribute called "authors" which is multi-select and a thousand [test] values. It's still manageable but does get a little slow when editing a product. Also, when adding an option/value to the authors attribute itself, a huge list is rendered in the HTML making this an inefficient solution.
Is there another approach I should take?
Is it possible to create an Author object (entity type?) which is associated to a product through a join table? If yes, can someone give me an explanation about how that is done or point me to some good documentation?
If I'd take the Author object approach, could that still be used in the layered navigation?
How would I show the list of all books for a single author?
Thanks in advance!
PS: I am aware of extensions like Improved Navigation but AFAIK it adds something like attributes to attributes themselves which is not what I'm looking for.
For Googlers: The same would apply for Artists of a music site or manufacturers.
If you create an author entity type, you'll just increase your work trying to add it to layered navigation, and I don't see a reason why it would be faster.
Your approach seems the best fit to the problem, given the way Magento is set up. How are you going to display 1,000 (which presumably pales in comparison to the actual list) authors in layered navigation?
Depending on the requirements, you could go the route of denormalizing the field and accepting text for it. That would still allow you to display it, search based on it, etc, but would eliminate the need to render every possible artist to manipulate the list. You could add a little code around selecting the proper artist (basically add an AJAX autocomplete to the backend field) to minimize typos as well.
Alternatively, you could write a simple utility to add a new artist to the system without some of the overhead of Magento's loading the list. To be honest, though, it seems that the lag that this has the potential to create on the frontend will probably outweigh the backend trouble.
Hope that helps!
Thanks,
Joe

standard way of encoding pagination info on a restful url get?

I think my question would be better explained with a couple of examples...
GET http://myservice/myresource/?name=xxx&country=xxxx&_page=3&_page_len=10&_order=name asc
that is, on the one hand I have conditions ( name=xxx&country=xxxx ) and on the other hand I have parameters affecting the query ( _page=3&_page_len=10&_order=name asc )
now, I though about using some special prefix ( "_" in thes case ) to avoid collisions between conditions and parameters ( what if my resourse has an "order" property? )
is there some standard way to handle these situations?
--
I found this example (just to pick one)
http://www.peej.co.uk/articles/restfully-delicious.html
GET http://del.icio.us/api/peej/bookmarks/?tag=mytag&dt=2009-05-30&start=1&end=2
but in this case condition fields are already defined (there is no start nor end property)
I'm looking for some general solution...
--
edit, a more detailed example to clarify
Each item is completely indepent from one another... let's say that my resources are customers, and that (luckily) I have a couple millions of them in my db.
so the url could be something like
http://myservice/customers/?country=argentina,last_operation=2009-01-01..2010-01-01
It should give me all the customers from argentina that bought anything in the last year
Now I'd like to use this service to build a browse page, or to fill a combo with ajax, for example, so the idea was to add some metada to control what info should I get
to build the browse page I would add
http://...,_page=1,_page_len=10,_order=state,name
and to fill an autosuggest combo with ajax
http://...,_page=1,_page_len=100,_order=state,name,name=what_ever_type_the_user*
to fill the combo with the first 100 customers matching what the user typed...
my question was if there was some standard (written or not) way of encoding this kind of stuff in a restfull url manner...
While there is no standard, Web API Design (by Apigee) is a great book of advice when creating Web APIs. I treat it as a sort of standard, and follow its recommendations whenever I can.
Under "Pagination and partial response" they suggest (page 17):
Use limit and offset
We recommend limit and offset. It is more common, well understood in leading databases, and easy for developers.
/dogs?limit=25&offset=50
There's no standard or convention which defines a way to do this, but using underscores (one or two) to denote meta-info isn't a bad idea. This is what's used to specify member variables by convention in some languages.
Note:
I started writing this as a comment to my previous answer. Then I was going to add it as an edit, but I think that it belongs as a separate answer instead. This is a completely different approach and a separate answer in its own right since it is a different approach.
The more that I have been thinking about this, I think that you really have two different resources that you have to deal with:
A page of resources
Each resource that is collected into the page
I may have missed something (could be... I've been guilty of misinterpretation). Since a page is a resource in its own right, the paging meta-information is really an attribute of the resource so placing it in the URL isn't necessarily the wrong approach. If you consider what can be cached downstream for a page and/or referred to as a resource in the future, the resource is defined by the paging attributes and the query parameters so they should both be in the URL. To continue with my entirely too lengthy response, the page resource would be something like:
http://.../myresource/page-10/3?name=xxx&country=yyy&order=name&orderby=asc
I think that this gets to the core of your original question. If the page itself is a resource, then the URI should describe the page so something like page-10 is my way of saying "a page of 10 items" and the next portion of the page is the page number. The query portion contains the filter.
The other resource names each item that the page contains. How the items are identified should be controlled by what the resources are. I think that a key question is whether the result resources stand on their own or not. How you represent the item resources differs based on this concept.
If the item representations are only appropriate when in the context of the page, then it might be appropriate to include the representation inline. If you do this, then identify them individually and make sure that you can retrieve them using either URI fragment syntax or an additional path element. It seems that the following URLs should result in the fifth item on the third page of ten items:
http://.../myresource/page-10/3?...#5
http://.../myresource/page-10/3/5?...
The largest factor in deciding between these two is how strongly coupled the individual item is with the page. The fragment syntax is considerably more binding than the path element IMHO.
Now, if the item resources are free-standing and the page is simply the result of a query (which I think is likely the case here), then the page resource should be an ordered list of URLs for each item resource. The item resource should be independent of the page resource in this case. You might want to use a URI that is based on the identifying attribute of the item itself. So you might end up with something like:
http://.../myresource/item/42
http://.../myresource/item/307E8599-AD9B-4B32-8612-F8EAF754DFDB
The key deciding factor is whether the items are freestanding resources or not. If they are not, then they are derived from the page URI. If they are freestanding, then they should have their are defined by their own resources and should be included in the page resource as links instead.
I know that the RESTful folk tend to dislike the usage of HTTP headers, but has anyone actually looked into using the HTTP ranges to solve pagination. I wrote a ISAPI extension a few years back that included pagination information along with other non-property information in the URI and I never really like the feel of it. I was thinking about doing something like:
GET http://...?name=xxx&country=xxxx&_orderby=name&_order=asc HTTP/1.1
Range: pageditems=20-29
...
This puts the result set parameters (e.g., _orderby and _order) in the URI and the selection as a Range header. I have a feeling that most HTTP implementations would screw this up though especially since support for non-byte ranges is a MAY in RFC2616. I started thinking more seriously about this after doing a bunch of work with RTSP. The Range header in RTSP is a nice example of extending ranges to handle time as well as bytes.
I guess another way of handling this is to make a separate request for each item on the page as an individual resource in its own right. If your representation allows for this, then you might want to consider it. It is more likely that intermediate caching would work very well with this approach. So your resources would be defined as:
myresource/name=xxx;country=xxx/orderby=name;order=asc/20/
myresource/name=xxx;country=xxx/orderby=name;order=asc/21/
myresource/name=xxx;country=xxx/orderby=name;order=asc/22/
myresource/name=xxx;country=xxx/orderby=name;order=asc/23/
myresource/name=xxx;country=xxx/orderby=name;order=asc/24/
I'm not sure if anyone has tried something like this or not. This would make URIs constructible which is always a useful property IMHO. The bonus to this approach is that the individual responses could be cached and the server is free to optimize handling of collecting pages of items and what not in the most efficient way. The basic idea is to have the client specify the query in the URI and the index of them item that it wants to retrieve. No need to push the idea of a "page" into the resource or even to make it visible. The client can iteratively retrieve objects until it's page is full or it receives a 404.
There is a downside of course... the HTTP server and infrastructure has to support pipelining or the cost of creation/destruction of connections might kill the idea outright.

Resources