Cloudfront cache with GraphQL? - node.js

At my company we're using GraphQL for production apps, but only for private resources.
For now our public APIs are REST APIs with a CloudFront distribution in front for caching. We want to convert them to GraphQL APIs, but the question is: how do we handle caching properly with GraphQL?
We thought of using a GET GraphQL endpoint and caching on the query string, but we are a bit afraid of the size of the requested URL (we support IE9+ and sell to schools that sometimes have really dumb proxies and firewalls).
So we would like to use a POST GraphQL endpoint, but CloudFront cannot cache a request based on its body.
Does anyone have an idea or best practice to share?
Thanks

The two best options today are:
Use a specialized caching solution, like FastQL.io
Use persisted queries with GET, where some queries are saved on your server and accessed by name via GET
*Full disclosure: I started FastQL after running into these issues without a good solution.

I am not sure if it has a specific name, but I've seen a pattern in the wild where the GraphQL queries themselves are hosted on the backend, each with a specific ID.
It's much less flexible, as it requires predefined queries to be baked in.
The client just sends the arguments/params and the ID of the predefined query to use, and that becomes your cache key. This is similar to how HTTP caching works for an authenticated request to /my-profile, with CloudFront serving different responses based on the auth token in the headers.
How the client sends it depends on your backend's implementation of GraphQL.
You could pass it either as a whitelisted header or in the query string.
So if the backend has defined a query that looks like
(Using pseudo code)
const MyQuery = gql`
  query HeroNameAndFriends($episode: Int) {
    hero(episode: $episode) {
      name
      friends {
        name
      }
    }
  }
`
Then your request would be to something like api.app.com/graphQL/MyQuery?episode=3.
That being said, have you actually measured that your queries wouldn't fit in a GET request? I'd say go with GET requests if CDN Caching is what you need and use the approach mentioned above for the requests that don't fit the limits.
Edit: Seems it has a name: Automatic Persisted Queries. https://www.apollographql.com/docs/apollo-server/performance/apq/
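To make the cache-key idea concrete, here is a minimal sketch of how a client might build the GET URL for a predefined query. The helper name persistedQueryUrl and the URL layout are assumptions for illustration (they mirror the api.app.com/graphQL/MyQuery?episode=3 example above), not part of any particular GraphQL library. Sorting the variable names means the same query with the same variables always yields the same URL, and therefore the same CDN cache entry.

```javascript
// Hypothetical helper: build a cacheable GET URL for a predefined (persisted) query.
function persistedQueryUrl(baseUrl, queryId, variables = {}) {
  const params = new URLSearchParams();
  // Sort variable names so { a, b } and { b, a } produce the same URL (same cache key).
  for (const name of Object.keys(variables).sort()) {
    params.append(name, String(variables[name]));
  }
  const qs = params.toString();
  return `${baseUrl}/${encodeURIComponent(queryId)}${qs ? `?${qs}` : ''}`;
}

console.log(persistedQueryUrl('https://api.app.com/graphQL', 'MyQuery', { episode: 3 }));
// prints "https://api.app.com/graphQL/MyQuery?episode=3"
```

For this to help, the CDN has to be configured to include the query string in its cache key.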
Another alternative that lets you stay with POST requests is to use Lambda@Edge on your CloudFront distribution, with DynamoDB tables storing your cached responses, similar to how Cloudflare Workers do it:
async function handleRequest(event) {
  let cache = caches.default
  let response = await cache.match(event.request)
  if (!response) {
    response = await fetch(event.request)
    if (response.ok) {
      event.waitUntil(cache.put(event.request, response.clone()))
    }
  }
  return response
}
Some reading material on that
https://aws.amazon.com/blogs/networking-and-content-delivery/lambdaedge-design-best-practices/
https://aws.amazon.com/blogs/networking-and-content-delivery/leveraging-external-data-in-lambdaedge/
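Whatever store ends up holding the cached responses, a POST-based cache needs a key derived from the request body. The following is only a sketch of one way to do that in Node, assuming the body is the usual JSON object with query and variables fields; the function names cacheKeyForBody and canonicalize are made up for this example. It hashes a canonical form of the body so that equivalent requests map to the same key.

```javascript
const crypto = require('node:crypto');

// Recursively sort object keys so equivalent JSON values serialize identically.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    return Object.keys(value).sort().reduce((out, key) => {
      out[key] = canonicalize(value[key]);
      return out;
    }, {});
  }
  return value;
}

// Hypothetical helper: derive a cache key (hex SHA-256) from a raw GraphQL POST body.
function cacheKeyForBody(rawBody) {
  const { query, variables = {} } = JSON.parse(rawBody);
  const canonical = JSON.stringify({
    query: query.trim(),
    variables: canonicalize(variables),
  });
  return crypto.createHash('sha256').update(canonical).digest('hex');
}

const k1 = cacheKeyForBody('{"query":"{ hero { name } }","variables":{"a":1,"b":2}}');
const k2 = cacheKeyForBody('{"query":"{ hero { name } }","variables":{"b":2,"a":1}}');
console.log(k1 === k2);
```

The key could then serve as the lookup key in whichever table or cache holds the stored responses.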

An option I've explored on paper but not yet implemented is to use Lambda@Edge in request-trigger mode to transform a client POST into a GET, which can then result in a cache hit.
This way clients can still use POST to send GQL requests, and you're working with a small number of controlled services within AWS when trying to work out the max URL length for the converted GET request (and these limits are generally quite high).
There will still be a length limit, but once you have 16kB+ GQL requests, it's probably time to take the other suggestion of using predefined queries on the server and just referencing them by name.
It does have the disadvantage that request-trigger Lambdas run on every request, even on a cache hit, so it will generate some cost, although the Lambda itself should be very fast and simple.

Related

How can I invalidate Google Cloud CDN cache from my express server?

Is there a way to invalidate / clear cached content on Cloud CDN from my express server?
For example, if I'm generating server rendered content to make it readily available and I update a specific route from my website, like editing a blogPost, for example. I need to do the following:
export const editBlogPostHandler = (req, res, next) => {
  // 1. UPDATE BLOGPOST WITH SLUG some-blogpost-slug ON DB
  // 2. INVALIDATE /some-blogpost-slug ROUTE ON CLOUD CDN CACHE
  //    THIS IS NECESSARY FOR NEW REQUESTS TO GET FRESH DATA RATHER THAN A STALE DATA RESPONSE
};
How can I do that from my express server?
From Cloud CDN - Invalidating Cached Content:
You can invalidate cached content from Cloud CDN through these methods:
Using the console:
Using gcloud SDK:
There is an API endpoint for that:
https://cloud.google.com/compute/docs/reference/rest/v1/urlMaps/invalidateCache
POST https://compute.googleapis.com/compute/v1/projects/{project}/global/urlMaps/{resourceId}/invalidateCache
As a complement to Alexandre's accepted answer, here are more details on how to use this endpoint:
POST https://compute.googleapis.com/compute/v1/projects/{project}/global/urlMaps/{resourceId}/invalidateCache
In order to get the resourceId, you can call the endpoint mentioned here in order to get a list of urlMaps resources and their associated ids.
GET https://compute.googleapis.com/compute/v1/projects/{project}/global/urlMaps
Once you've got the resourceId, you also need to specify the path of the file/folder that you wish to invalidate in the request body (wildcard paths also work):
{ "path": "/folder/file.mp4" }
In the response body, you will find the id of the compute operation. If you want to check the operation's progress, you can query it using the Compute Operation Global Get method.
In addition, to avoid running the same request several times, it is advised to pass a unique requestId parameter in the form of a UUID (as specified in RFC 4122).
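Putting those pieces together, the request could be assembled roughly as follows. This is a sketch, not an official client: buildInvalidationRequest is a hypothetical helper, and the project and URL map names are placeholders. It only builds the request. Actually sending it would also require an Authorization header carrying an OAuth 2.0 access token, which is left out here.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: assemble the invalidateCache call described above.
function buildInvalidationRequest(project, urlMap, path) {
  // Unique requestId (UUID) so a retried request is not executed twice.
  const requestId = crypto.randomUUID();
  const url =
    `https://compute.googleapis.com/compute/v1/projects/${encodeURIComponent(project)}` +
    `/global/urlMaps/${encodeURIComponent(urlMap)}/invalidateCache?requestId=${requestId}`;
  return {
    method: 'POST',
    url,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ path }), // e.g. { "path": "/some-blogpost-slug" }
  };
}

// Placeholder project and URL map names, for illustration only.
const request = buildInvalidationRequest('my-project', 'my-url-map', '/some-blogpost-slug');
console.log(request.method, request.url);
console.log(request.body);
```

In the express handler from the question, this would run right after the database update, with the resulting request sent by whatever HTTP client the server already uses.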

Downsides of an API which neglects http method and path

I'm wondering what the downsides would be for a production server whose api is totally ignorant of the HTTP request path. For example, an api which is fully determined by query parameters, or even fully determined by the http body.
let server = require('http').createServer(async (req, res) => {
  // `parseRequest` must read from the incoming request (`req`), not the response
  let { headers, method, path, query, body } = await parseRequest(req);
  // `headers` is an Object representing headers
  // `method` is 'get', 'post', etc.
  // `path` could look like /api/v2/people
  // `query` could look like { filter: 'age>17&age<35', page: 7 }
  // `body` could be some (potentially large) http body

  // MOST apis would use all these values to determine a response...
  // let response = determineResponse(headers, method, path, query, body);

  // But THIS api completely ignores everything except for `query` and `body`
  let response = determineResponse(query, body);
  doSendResponse(res, response); // Sets response headers, etc, sends response
});
The above server's API is quite strange. It completely ignores the path, method, and headers. Most APIs primarily consider method and path, and look like this...
method  path               description
GET     /api               Metadata about api
GET     /api/v1            Old version of api
GET     /api/v2            Current api
GET     /api/v2/people     Make "people" db queries
POST    /api/v2/people     Insert a new person into db
GET     /api/v2/vehicles   Make "vehicle" db queries
POST    /api/v2/vehicles   Insert a new vehicle into db
...
This API only considers url query, and looks very different:
url query                                 description
<empty>                                   Metadata about api
apiVersion=1                              Old version of api
apiVersion=2                              Current api
apiVersion=2&table=people&action=query    Make "people" db queries
apiVersion=2&table=people&action=insert   Add new people to db
...
Implementing this kind of api, and ensuring clients use the correct api schema is not necessarily an issue. I am instead wondering about what other issues could arise for my app, due to writing an api with this kind of schema.
Would this be detrimental for SEO?
Would this be detrimental to performance? (caching?)
Are there additional issues that occur when an api is ignorant of method and url path?
That's indeed very unusual, but it's basically how an RPC web API would work.
There would not be any SEO issue as far as I know.
Performance/caching should be the same, as the full "path" is composed of the same parameters in the end.
It would, however, be complicated to use with anything that doesn't expect it (express router, fancy http clients, etc.).
The only fundamental difference I see is that browsers treat POST requests as special (e.g. following a link never issues one), whereas your API would expose deletion/creation of data through a plain link. That's more or less dangerous depending on your scenario.
My advice would be: don't do that, stick to standards unless you have a very good reason not to.

Express param length limit for GET

I have a nodejs app where I am using express for REST APIs.
One of the APIs accepts a SQL query, runs it on a DB and returns the JSON response. Everything was working fine until I tested the API with a long SQL query.
Upon debugging, I noticed that the SQL query is truncated automatically.
Is there a limit on the length of a param that can be passed in the GET URL?
This is what my api looks like
app.get('/v1/runsql/:query', (req, res) => {
  let result = runQuery.executeQuery(req.params.query);
  // ..... execute some more code here
})
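Node enforces a limit not on the URL itself, but on the overall size of the request headers, which includes the URI. In older Node versions, headers plus URI could not exceed 80 KB; newer versions default to a much smaller limit (16 KB in current releases), which can be changed with the --max-http-header-size option.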
Also, it's an incredibly bad idea to expose an API that allows arbitrary SQL queries regardless of whether they're on the URL or not. Most applications spend a lot of effort trying to prevent arbitrary SQL from querying records that shouldn't be exposed, dropping tables, etc. Intentionally exposing an endpoint like this just feels like you're asking for trouble.
The HTTP protocol does not limit the length of a URL, but browsers and servers (Node or otherwise) do. If you really want to implement this, you could use the POST method instead of GET.
The HTTP spec also says that a server may respond with status 414 (URI Too Long) if the URL exceeds the length it is willing to handle.

Prevent users from accessing subdomain in Express

I am very new to web development and Node.js / Express. I have an Express server. In my client I send a GET request for some data in the DB. The GET request is handled by
app.get( '/pathname', controller.getsomedata );
The problem is that the user can now type the URL domainname.com/pathname into the browser and be directed to that path. If they add certain queries, e.g. domainname.com/pathname?query, they are able to retrieve data from the DB (this is supposed to happen), but I would prefer that GET requests to /pathname only come from within the client code, not from the user entering the URL in the browser.
Is there a better way to do my GET request? Or is there a way to restrict users from accessing /pathname?
I apologize for the newbie question, but I don't know how to word it well enough to do a Google search for the solution. Thank you!
It's impossible to do that. If your client-side code is able to access something, a malicious user can access it as well.
You can mitigate the issue by using a custom HTTP header or something like that, but it's better to validate all data on the server side.
Using the whole client request as a DB query may cause security issues, so be sure to validate the query parameters before using them as DB query conditions.
If you want to query the DB freely from HTTP query parameters, you should put authentication/authorization in front of the route.
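app.get( '/pathname', function(req, res, next) {
  // confirmThisRequestIsFromMe stands in for your own authentication/authorization check
  if (confirmThisRequestIsFromMe(req)) {
    next();
  } else {
    res.sendStatus(401); // res.send(401) is deprecated in Express 4
  }
}, controller.getsomedata );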
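The advice about validating query parameters can be sketched like this. Everything here is illustrative: the ALLOWED_FILTERS whitelist, its name and age fields, and the buildConditions helper are assumptions, not part of Express or of the original code. The idea is that only known parameters, each checked and normalized, ever reach the DB layer, and anything else in the query string is ignored.

```javascript
// Hypothetical whitelist: parameter name -> validator that returns a clean value or undefined.
const ALLOWED_FILTERS = {
  name: (v) => (typeof v === 'string' && v.length > 0 && v.length <= 100 ? v : undefined),
  age: (v) => (typeof v === 'string' && /^\d{1,3}$/.test(v) ? Number(v) : undefined),
};

// Build DB query conditions from req.query, keeping only whitelisted, valid parameters.
function buildConditions(query) {
  const conditions = {};
  for (const [key, validate] of Object.entries(ALLOWED_FILTERS)) {
    if (!Object.prototype.hasOwnProperty.call(query, key)) continue;
    const value = validate(query[key]);
    if (value === undefined) throw new Error(`Invalid value for "${key}"`);
    conditions[key] = value;
  }
  return conditions;
}

// Unknown parameters such as `$where` never make it into the conditions.
console.log(buildConditions({ name: 'Ann', age: '30', $where: 'sleep(1000)' }));
```

A route handler would call buildConditions(req.query) inside a try/catch and answer with a 400 status when it throws.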

Add data to couchdb with jsonp response

Is there a way to add data to a CouchDB instance that runs on another domain and get back a response indicating whether the operation was successful or not? I know CouchDB supports JSONP callbacks, but can I add data with this approach?
No, you cannot currently do this. CouchDB's REST API requires a POST or PUT request in order to insert data, but JSONP only supports GET requests. So you can retrieve data from CouchDB across domains, but updates/inserts/deletes won't work.
You can use client-side javascript to make a form to do the POST, direct the output to an iframe, and use cross-window iframe messaging to get the result.
Of course, someone has already made a nice javascript library to do this. Get the code here:
https://github.com/benvinegar/couchdb-xd
Follow the instructions to push it as an additional database on your CouchDB server. Then, on any site, including one not in the 'your-couch-server' domain, you can do the following (just try it in the JavaScript console):
jQuery.getScript(
  "http://YOUR-COUCH-SERVER/couchdb-xd/_design/couchdb-xd/couchdb.js",
  function() {
    Couch.init(
      function() {
        var s = new Couch.Server('http://YOUR-COUCH-SERVER/');
        var d = new Couch.Database(s, 'YOURDB');
        d.put(
          "stackoverflow-test 1",
          { foo: 111, bar: 222 },
          function(resp) {
            console.log(resp);
          }
        );
      }
    )
  }
);
The above presumes jQuery is already loaded on the page. If not, you'll need to add it however you're currently interacting with the other page.
The library only works on modern browsers with window.postMessage() support, though a small patch may eventually allow older browsers to use it via src/hash communication.
