Downsides of an API which neglects HTTP method and path - node.js

I'm wondering what the downsides would be for a production server whose API is totally ignorant of the HTTP request path: for example, an API whose behavior is fully determined by query parameters, or even by the HTTP body alone.
let server = require('http').createServer(async (req, res) => {
  let { headers, method, path, query, body } = await parseRequest(req);
  // `headers` is an Object representing headers
  // `method` is 'get', 'post', etc.
  // `path` could look like /api/v2/people
  // `query` could look like { filter: 'age>17&age<35', page: 7 }
  // `body` could be some (potentially large) http body

  // MOST apis would use all these values to determine a response...
  // let response = determineResponse(headers, method, path, query, body);

  // But THIS api completely ignores everything except for `query` and `body`
  let response = determineResponse(query, body);
  doSendResponse(res, response); // Sets response headers, etc., sends response
});
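For reference, parseRequest might be implemented roughly like this (assuming JSON bodies):
async function parseRequest(req) {
  // derive path and query from the raw request URL
  const url = new URL(req.url, `http://${req.headers.host}`);
  const query = Object.fromEntries(url.searchParams);

  // collect the (potentially large) body
  const chunks = [];
  for await (const chunk of req) chunks.push(chunk);
  const raw = Buffer.concat(chunks).toString('utf8');
  const body = raw ? JSON.parse(raw) : undefined; // assumes a JSON body

  return { headers: req.headers, method: req.method.toLowerCase(), path: url.pathname, query, body };
}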
The above server's API is quite strange: it completely ignores the path, method, and headers. Most APIs primarily consider method and path, and look like this...
method  path               description
GET     /api               Metadata about api
GET     /api/v1            Old version of api
GET     /api/v2            Current api
GET     /api/v2/people     Make "people" db queries
POST    /api/v2/people     Insert a new person into db
GET     /api/v2/vehicles   Make "vehicle" db queries
POST    /api/v2/vehicles   Insert a new vehicle into db
...
This API only considers url query, and looks very different:
url query                                  description
<empty>                                    Metadata about api
apiVersion=1                               Old version of api
apiVersion=2                               Current api
apiVersion=2&table=people&action=query     Make "people" db queries
apiVersion=2&table=people&action=insert    Add new people to db
...
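For illustration, determineResponse for this kind of API might dispatch roughly like this (handler and db function names are hypothetical):
async function determineResponse(query, body) {
  const { apiVersion, table, action } = query;

  if (!apiVersion) return apiMetadata();                      // <empty> -> metadata about api
  if (apiVersion === '1') return legacyResponse(query, body); // old version of api

  if (table === 'people' && action === 'query')  return db.queryPeople(query);
  if (table === 'people' && action === 'insert') return db.insertPerson(body);

  return { error: 'Unknown table/action combination' };
}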
Implementing this kind of API, and ensuring clients use the correct API schema, is not necessarily an issue. What I am wondering about is what other issues could arise for my app from writing an API with this kind of schema.
Would this be detrimental for SEO?
Would this be detrimental to performance? (caching?)
Are there additional issues that occur when an api is ignorant of method and url path?

That's indeed very unusual, but it's basically how an RPC web API would work.
There would not be any SEO issue as far as I know.
Performance/caching should be the same, as the full "path" is composed of the same parameters in the end.
It would, however, be complicated to use with anything that expects conventional routing (the Express router, fancy HTTP clients, etc.).
The only fundamental difference I see is that browsers treat POST requests as special (they are never issued just by following a link), whereas your API would expose creation/deletion of data through a plain link (e.g. an <img src> pointing at an action=insert URL would perform the write). That's more or less dangerous depending on your scenario.
My advice would be: don't do that, stick to standards unless you have a very good reason not to.

Related

Router handler returns an array of objects but client doesn't get them as JSON, though response has 200 status

I am implementing an Express.js project with TypeScript.
I have defined an enum and an interface:
export enum ProductType {
  FOOD = 'food',
  CLOTH = 'cloth',
  TOOL = 'tool'
}
export interface MyProduct {
  type: ProductType;
  info: {
    price: number;
    date: Date;
  };
}
One of my router handlers needs to return an array of MyProduct to the client. I tried this:
const productArr: MyProduct[] = // call another service returns an array of MyProduct
app.get('/products', (req, res) => {
  res.status(200).send({ products: productArr });
});
I tested this endpoint with Postman; it responds with status 200 but returns a default HTML page instead of the array of objects as JSON.
What am I missing? Is it because Express.js can't automatically serialize the enum and interface to a JSON object?
P.S. I have set up the JSON parser, so it is not about that; other endpoints work fine with JSON responses:
const app = express();
app.use(express.json());
...
As mentioned in the comments, your code should work. I'll list some steps which can be used to try to find the problem.
Show debug info
Set DEBUG=* in your environment. DEBUG is an environment variable which controls logging for many Node modules. You'll be able to see the flow of a request through Express. If there is too much info, you can limit the output like so: DEBUG=*,-babel,-babel:*,-nodemon,-nodemon:*,-router:layer,-follow-redirects,-send (use a comma-separated list and put a - in front of any module you'd like to exclude)
This should help you trace the life of a request through the various routers and routes. You're now in a position to...
Check for another route that is short-circuiting the request
The fact that you're seeing an HTML page when the Express route is sending an object might indicate that your request is matching a different route. Look for catch-all routes such as non-middleware app.use() or wildcard routes which appear ABOVE your route.
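For example (hypothetical, but a common setup for serving a single-page app):
const path = require('path');

// a catch-all registered ABOVE your route matches GET /products first
// and returns an HTML page with status 200
app.get('*', (req, res) => {
  res.sendFile(path.join(__dirname, 'build', 'index.html'));
});

// never reached
app.get('/products', (req, res) => {
  res.send({ products: productArr });
});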
Other suggestions
Don't explicitly set the status
Adding .status(200) is extra code and unnecessary; Express already defaults to 200.
Use res.json()
Use .json() instead of .send(). It will always add the Content-Type: application/json header, whereas .send() will not when it cannot determine the content type (e.g. .send(null) or .send('hello') will not set the Content-Type header to application/json, which may confuse clients).
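A small illustration of the difference:
app.get('/products', (req, res) => {
  // res.send('hello') would go out as text/html, and res.send(null) sets no Content-Type,
  // while res.json() always sends Content-Type: application/json
  res.json({ products: productArr });
});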
Since the full response headers and server environment aren't shown, I'll assume you are running behind an AWS service with a reverse proxy. A few possibilities to look into:
If the router handler returns an array of objects but the client doesn't get them as JSON despite a 200 status, a reverse proxy in front of your backend may be serving default content with status code 200 for routes it doesn't know about. In that scenario you need to whitelist the new route on the reverse proxy; if you are using AWS Amplify for API rewrites and redirects, whitelist this route in your Amplify settings, or it will keep serving the default content as it does in the current scenario.
If the issue still persists:
Make sure you have a proper CORS configuration on your server.
Make sure productArr is actually an array returned by the service; if the value comes from an async call it might be an unresolved promise. Proper test cases will help here, or set DEBUG=* in your environment for debugging and verify the service returns the value you expect (see the sketch after this list).
Check for another route that is short-circuiting the request: The fact that you're seeing an HTML page when the Express route is sending an object might indicate that your request is matching a different route. Look for catch-all routes such as non-middleware app.use() or wildcard routes that appear above your route.
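A sketch of the second point, making sure the value is awaited (productService.getProducts() is a hypothetical async service call):
app.get('/products', async (req, res) => {
  const productArr: MyProduct[] = await productService.getProducts();
  res.json({ products: productArr });
});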

How can I invalidate Google Cloud CDN cache from my express server?

Is there a way to invalidate / clear cached content on Cloud CDN from my express server?
For example, if I'm generating server-rendered content to make it readily available and I update a specific route on my website, like editing a blog post, I need to do the following:
export const editBlogPostHandler = (req, res, next) => {
  // 1. UPDATE BLOGPOST WITH SLUG some-blogpost-slug ON DB
  // 2. INVALIDATE /some-blogpost-slug ROUTE ON CLOUD CDN CACHE
  //    THIS IS NECESSARY FOR NEW REQUESTS TO GET FRESH DATA RATHER THAN A STALE DATA RESPONSE
};
How can I do that from my express server?
From Cloud CDN - Invalidating Cached Content:
You can invalidate cached content from Cloud CDN through these methods:
Using the console.
Using the gcloud SDK, e.g. gcloud compute url-maps invalidate-cdn-cache URL_MAP_NAME --path "/some-blogpost-slug"
There is an API endpoint for that:
https://cloud.google.com/compute/docs/reference/rest/v1/urlMaps/invalidateCache
POST https://compute.googleapis.com/compute/v1/projects/{project}/global/urlMaps/{resourceId}/invalidateCache
As a complement to Alexandre's accepted answer, here are more details on how to use this endpoint:
POST https://compute.googleapis.com/compute/v1/projects/{project}/global/urlMaps/{resourceId}/invalidateCache
In order to get the resourceId, you can call the endpoint below to get a list of urlMaps resources and their associated ids.
GET https://compute.googleapis.com/compute/v1/projects/{project}/global/urlMaps
Once you've got the resourceId, you also need to specify the path of the file/folder that you wish to invalidate in the request body (wildcard paths also work):
{ "path": "/folder/file.mp4" }
In the response body, you will find the id of the compute operation - If you want to check this operation progress, you can query it using the Compute Operation Global Get method.
In addition, and in order to avoid running the same request several times, it is advised to pass a unique requestId parameter in the form of a UUID (as specified in RFC 4122).
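Putting it together from Node, a minimal sketch (it assumes the google-auth-library package, and PROJECT / URL_MAP stand in for your real project id and urlMap name):
const { GoogleAuth } = require('google-auth-library');
const { randomUUID } = require('crypto');

async function invalidateCdnPath(path) {
  const auth = new GoogleAuth({ scopes: ['https://www.googleapis.com/auth/compute'] });
  const client = await auth.getClient();
  const url = `https://compute.googleapis.com/compute/v1/projects/${PROJECT}/global/urlMaps/${URL_MAP}/invalidateCache`;
  // requestId makes retries of the same invalidation idempotent
  const res = await client.request({
    url,
    method: 'POST',
    params: { requestId: randomUUID() },
    data: { path },
  });
  return res.data; // contains the id of the compute operation
}
editBlogPostHandler could then call invalidateCdnPath('/some-blogpost-slug') right after the DB update.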

At what point are request and response objects populated in express app

I'm always coding backend APIs and I don't really get how Express does its bidding with my code. I know what the request and response objects offer, I just don't understand how they come to be.
This simplified code for instance:
exports.getBlurts = function() {
  return function(req, res) {
    // build query…
    qry.exec(function(err, results) {
      res.json(results);
    });
  };
};
Then I’d call in one of my routes:
app.get('/getblurts/', middleware.requireUser, routes.api.blurtapi.getBlurts());
I get that the function is called upon the route request. It's very abstract to me, though, and I don't understand the when, where, or how as it pertains to the req/res params being injected.
For instance. I use a CMS that modifies the request object by adding a user property, which is then available globally on all requests made whether ajax or otherwise, making it easy at all times to determine if a user is logged in.
Are the req and res objects just pre-cooked by Express but left open for you to modify to your needs? When are they actually 'built'?
At its heart, Express uses Node's built-in http module and passes the Express application as the callback to http.createServer. The request and response objects are populated at that point, i.e. created by Node itself for every incoming connection. See the Node.js documentation for more details on the http module and what req/res are.
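A minimal sketch of that relationship (this is essentially what app.listen() does under the hood):
const http = require('http');
const express = require('express');

const app = express();                  // `app` is itself a (req, res) handler function
const server = http.createServer(app);  // node creates req/res for every connection and calls `app`
server.listen(3000);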
You might want to check out express' source code which shows how the express application is passed as a callback to http.createServer.
https://github.com/expressjs/express/blob/master/lib/request.js and https://github.com/expressjs/express/blob/master/lib/response.js show how node's request/response are extended by express specific functions.

Express param length limit for GET

I have a nodejs app where I am using express for REST APIs.
One of the APIs accepts a SQL query, runs it on a DB and returns the JSON response. Everything was working fine until I tested the API with a long SQL query.
Upon debugging, I noticed that the SQL query is trimmed automatically.
Is there a limit on the length of the param that can be passed in the GET URL?
This is what my api looks like
app.get('/v1/runsql/:query', (req, res) => {
  let result = runQuery.executeQuery(req.params.query);
  // ..... execute some more code here
})
Node enforces a limit not on the URL itself but on the overall request headers (including the URI): request headers plus URI cannot be more than 80 kb.
Also, it's an incredibly bad idea to expose an API that allows arbitrary SQL queries regardless of whether they're on the URL or not. Most applications spend a lot of effort trying to prevent arbitrary SQL from querying records that shouldn't be exposed, dropping tables, etc. Intentionally exposing an endpoint like this just feels like you're asking for trouble.
The HTTP protocol does not limit the length of the URL, but browsers and servers (Node or otherwise) do. If you really need to send something that long, use a POST method instead of GET.
The HTTP spec also says the server may return code 414 if the URL length is over its limit.
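For example, a sketch of the same endpoint accepting the query in a POST body instead (assuming the express.json() middleware is enabled):
app.post('/v1/runsql', (req, res) => {
  // the SQL statement now travels in the request body, so URL length limits no longer apply
  let result = runQuery.executeQuery(req.body.query);
  // ..... execute some more code here
});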

Cloudfront cache with GraphQL?

At my company we're using GraphQL for production apps, but only for private resources.
For now our public APIs are REST APIs with a CloudFront service for caching. We want to turn them into GraphQL APIs, but the question is: how do we handle caching properly with GraphQL?
We thought about using a GET GraphQL endpoint and caching on the query string, but we are a bit afraid of the size of the requested URL (as we support IE9+ and sell to schools that sometimes have really dumb proxies and firewalls).
So we would like to use a POST GraphQL endpoint, but... CloudFront cannot cache a request based on its body.
Does anyone have an idea / best practice to share?
Thanks
The two best options today are:
Use a specialized caching solution, like FastQL.io
Use persisted queries with GET, where some queries are saved on your server and accessed by name via GET
*Full disclosure: I started FastQL after running into these issues without a good solution.
I am not sure if it has a specific name, but I've seen a pattern in the wild where the graphQL queries themselves are hosted on the backend with a specific id.
It's much less flexible as it requires pre-defined queries baked in.
The client would just send arguments/params and ID of said pre-defined query to use and that would be your cache key. Similar to how HTTP caching would work with an authenticated request to /my-profile with CloudFront serving different responses based on auth token in headers.
How the client sends it depends on your backend's implementation of GraphQL.
You could pass it either as a whitelisted header or as a query string.
So if the backend has defined a query that looks like
(Using pseudo code)
const MyQuery = gql`
  query HeroNameAndFriends($episode: Int) {
    hero(episode: $episode) {
      name
      friends {
        name
      }
    }
  }
`
Then your request would be to something like api.app.com/graphQL/MyQuery?episode=3.
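Server-side, the pre-defined query pattern could look roughly like this (a sketch using Express and graphql-js; the queries map and schema are assumptions):
const { graphql } = require('graphql');

// pre-defined query documents, keyed by the name the client sends
const queries = {
  MyQuery: `
    query HeroNameAndFriends($episode: Int) {
      hero(episode: $episode) { name friends { name } }
    }
  `,
};

app.get('/graphQL/:queryName', async (req, res) => {
  const source = queries[req.params.queryName];
  if (!source) return res.status(404).json({ error: 'unknown query' });
  // `schema` is your executable GraphQLSchema (assumed to exist);
  // note that query-string values are strings, so numeric variables may need coercion
  const result = await graphql({ schema, source, variableValues: req.query });
  res.json(result);
});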
That being said, have you actually measured that your queries wouldn't fit in a GET request? I'd say go with GET requests if CDN Caching is what you need and use the approach mentioned above for the requests that don't fit the limits.
Edit: Seems it has a name: Automatic Persisted Queries. https://www.apollographql.com/docs/apollo-server/performance/apq/
Another alternative, if you want to stay with POST requests, is to use Lambda@Edge on your CloudFront distribution together with DynamoDB tables to store your cache, similar to how Cloudflare Workers do it:
async function handleRequest(event) {
  let cache = caches.default
  let response = await cache.match(event.request)
  if (!response) {
    response = await fetch(event.request)
    if (response.ok) {
      event.waitUntil(cache.put(event.request, response.clone()))
    }
  }
  return response
}
Some reading material on that
https://aws.amazon.com/blogs/networking-and-content-delivery/lambdaedge-design-best-practices/
https://aws.amazon.com/blogs/networking-and-content-delivery/leveraging-external-data-in-lambdaedge/
An option I've explored on paper but not yet implemented is to use Lambda@Edge in request trigger mode to transform a client POST into a GET, which can then result in a cache hit.
This way clients can still use POST to send GQL requests, and you're working with a small number of controlled services within AWS when trying to work out the max URL length for the converted GET request (and these limits are generally quite high).
There will still be a length limit, but once you have 16kB+ GQL requests, it's probably time to take the other suggestion of using predefined queries on server and just reference them by name.
It does have the disadvantage that request trigger Lambdas run on every request, even a cache hit, so will generate some cost, although the lambda itself should be very fast/simple.
