Express param length limit for GET - node.js

I have a Node.js app that uses Express for its REST APIs.
One of the APIs accepts a SQL query, runs it against a DB, and returns the JSON response. Everything worked fine until I tested the API with a long SQL query.
While debugging, I noticed that the SQL query is truncated automatically.
Is there a limit on the length of a param that can be passed in a GET URL?
This is what my API looks like:
app.get('/v1/runsql/:query', (req, res) => {
    let result = runQuery.executeQuery(req.params.query);
    // ... execute some more code here
});

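Node enforces a limit not on the URL itself but on the overall request headers (including the URI). Older Node versions capped headers plus URI at 80 KB; newer releases default to a much smaller limit (8 KB, later raised to 16 KB), which can be changed with the --max-http-header-size CLI option.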
Also, it's an incredibly bad idea to expose an API that allows arbitrary SQL queries regardless of whether they're on the URL or not. Most applications spend a lot of effort trying to prevent arbitrary SQL from querying records that shouldn't be exposed, dropping tables, etc. Intentionally exposing an endpoint like this just feels like you're asking for trouble.

The HTTP protocol does not limit the length of a URL, but browsers and servers (Node or any other) do. If you really want to implement this, you can use a POST request instead of a GET.
The HTTP spec also says that the server may return status code 414 (URI Too Long) if the URL exceeds the length it is willing to handle.
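As a rough illustration of the GET-versus-POST advice, a client could keep short queries in the URL and fall back to a POST body once the encoded URL gets long. The POST route `/v1/runsql`, the `query` body field, and the 2000-byte threshold are assumptions made for this sketch; none of them comes from HTTP, Node, or the original API.

```javascript
// Sketch: decide how to send a query of arbitrary length.
// MAX_URL_BYTES is an assumed, conservative threshold, not a value from any spec.
const MAX_URL_BYTES = 2000;

function buildRunSqlRequest(sql, maxUrlBytes = MAX_URL_BYTES) {
  const path = '/v1/runsql/' + encodeURIComponent(sql);
  if (Buffer.byteLength(path, 'utf8') <= maxUrlBytes) {
    // Short enough: keep the original GET route.
    return { method: 'GET', path, headers: {}, body: null };
  }
  // Too long for a URL: move the query into a JSON body (hypothetical POST route).
  const body = JSON.stringify({ query: sql });
  return {
    method: 'POST',
    path: '/v1/runsql',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body, 'utf8'),
    },
    body,
  };
}
```

The returned object can be handed to any HTTP client. The server would need a matching POST route that reads the query from the parsed body, and the security concerns about accepting arbitrary SQL apply either way.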

Related

Downsides of an API which neglects http method and path

I'm wondering what the downsides would be for a production server whose api is totally ignorant of the HTTP request path. For example, an api which is fully determined by query parameters, or even fully determined by the http body.
let server = require('http').createServer(async (req, res) => {
    // Parse the incoming request (`req`), not the response object
    let { headers, method, path, query, body } = await parseRequest(req);
    // `headers` is an Object representing headers
    // `method` is 'get', 'post', etc.
    // `path` could look like /api/v2/people
    // `query` could look like { filter: 'age>17&age<35', page: 7 }
    // `body` could be some (potentially large) http body
    // MOST apis would use all these values to determine a response...
    // let response = determineResponse(headers, method, path, query, body);
    // But THIS api completely ignores everything except for `query` and `body`
    let response = determineResponse(query, body);
    doSendResponse(res, response); // Sets response headers, etc, sends response
});
The above server's API is quite strange. It completely ignores the path, method, and headers. Most APIs primarily consider method and path, and look like this...
method  path              description
GET     /api              Metadata about api
GET     /api/v1           Old version of api
GET     /api/v2           Current api
GET     /api/v2/people    Make "people" db queries
POST    /api/v2/people    Insert a new person into db
GET     /api/v2/vehicles  Make "vehicle" db queries
POST    /api/v2/vehicles  Insert a new vehicle into db
...
This API only considers url query, and looks very different:
url query                                description
<empty>                                  Metadata about api
apiVersion=1                             Old version of api
apiVersion=2                             Current api
apiVersion=2&table=people&action=query   Make "people" db queries
apiVersion=2&table=people&action=insert  Add new people to db
...
Implementing this kind of api, and ensuring clients use the correct api schema is not necessarily an issue. I am instead wondering about what other issues could arise for my app, due to writing an api with this kind of schema.
Would this be detrimental for SEO?
Would this be detrimental to performance? (caching?)
Are there additional issues that occur when an api is ignorant of method and url path?
That's indeed very unusual, but it's basically how an RPC web API works.
There would not be any SEO issue as far as I know.
Performance/caching should be the same, as the full "path" is composed of the same parameters in the end.
It would, however, be awkward to use with anything that doesn't expect it (Express router, fancy HTTP clients, etc.).
The only fundamental difference I see is that browsers treat POST requests as special (e.g. they are never triggered just by following a link), whereas your API would expose deletion/creation of data through a plain link. That's more or less dangerous depending on your scenario.
My advice would be: don't do that, stick to standards unless you have a very good reason not to.
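To make the point about links concrete, here is a minimal sketch of an RPC-style dispatcher that still refuses data-changing actions unless they arrive via POST. The action names, the `SAFE_ACTIONS` set, and the shape of the return value are hypothetical; the original `determineResponse` is not shown in the question.

```javascript
// Sketch: keep the query-driven API, but honor HTTP method semantics for writes.
// Action names here are assumptions based on the query table in the question.
const SAFE_ACTIONS = new Set(['metadata', 'query']);

function checkRpcCall(method, query) {
  const action = query.action || 'metadata'; // empty query means "metadata about api"
  const verb = String(method).toUpperCase();
  if (SAFE_ACTIONS.has(action)) {
    return { status: 200, action };
  }
  // Mutating actions (insert, delete, ...) must not be reachable via a plain link.
  if (verb !== 'POST') {
    return { status: 405, action, allow: 'POST' };
  }
  return { status: 200, action };
}
```

A server could call this before `determineResponse(query, body)` and send a 405 with an `Allow: POST` header when the check fails.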

node.js Azure Web App returns 400 on long urls

I'm hosting a Node.js application in an Azure Web App. This works well, except that the server always returns HTTP 400 if the request is too long (i.e., a long URL or many headers).
It seems that the error is returned by the Kestrel gateway without reaching my application, and this happens if the request exceeds 2581 bytes. The same problem does not occur when running locally. This is a GET request, and it makes no difference whether the URL is long or the headers are long.
My application simply returns the current time:
// Module dependencies
let http = require('http');
http.createServer(function (request, response) {
    console.log('request ', request.url);
    response.write("Request served at " + new Date().toISOString());
    response.end();
}).listen(80);
If I request GET /anything the response is as expected. However if I do GET /{any_very_long_path} (or include a header with a long value) it fails.
Why would Azure be limiting the request length like this? The same issue does not happen when hosting an ASP.NET application.
If I request GET /anything the response is as expected. However if I do GET /{any_very_long_path} (or include a header with a long value) it fails.
I believe the problem is related to this issue on GitHub, which mentions:
On our website, we have pretty big cookies in some scenarios (it might
not be good indeed) and we have been facing many 400 http errors since
yesterday.
This is because there is an 8 KB hard-coded limit (HTTP_MAX_HEADER_SIZE); anything bigger returns error code 400...
Nodejs Documentation : http.maxHeaderSize
Read-only property specifying the maximum allowed size of HTTP headers
in bytes. Defaults to 8KB. Configurable using the
--max-http-header-size CLI option.
Solution:
You could try using the --max-http-header-size CLI option, as the docs mention.
Remember that if there is a reverse proxy in between (such as NGINX), you will also have to increase its max header size...
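A minimal sketch of raising the limit from code rather than from the command line, assuming a Node version whose `http.createServer` accepts the `maxHeaderSize` option (added in Node 13.3). The 32 KiB value and the `createTimeServer` name are arbitrary choices for illustration. This only changes what Node itself accepts; it cannot help if a gateway in front of the app rejects the request first.

```javascript
const http = require('http');

// Sketch: raise the header limit for one server instead of the whole process.
// 32 KiB is an example value, not a recommendation.
function createTimeServer(maxHeaderBytes = 32 * 1024) {
  return http.createServer({ maxHeaderSize: maxHeaderBytes }, (request, response) => {
    response.end('Request served at ' + new Date().toISOString());
  });
}

// Usage (not started here): createTimeServer().listen(80);
// Process-wide alternative: node --max-http-header-size=32768 server.js
```

The current process-wide default can be read from `http.maxHeaderSize`, which is handy for confirming that a CLI flag actually took effect in the hosted environment.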

Postman Requests Receive a HTTP 401 Status Code

I am creating a Node.js REST API, using the Express module, that redirects HTTP GET and PUT requests to another server. However, when running test queries in Postman, I always get HTTP 401 Unauthorized responses. Yet when I try the same query in the Chrome browser, I get a successful response (HTTP 302). I read through some documentation on the HTTP request/response cycle and authorization. The server I am redirecting to uses HTTP Basic authentication. In my code I redirect the API call to my application server using the res.redirect(server) method. In my Postman request I set the username/password in the Authorization tab. I know this gets encoded using base64, but I am guessing it isn't being passed along on the redirect when done through Postman.
The following code snippets show what I've created thus far.
This is the Express route I created for GET requests
app.get('/companyrecords/:name', function(req, res) {
    var credentials = Buffer.from("username:password").toString('base64');
    console.log(req);
    var requestURL = helperFunctions.createURL(req);
    res.redirect(requestURL);
});
I define a function called createURL inside a file called helperFunctions. The purpose of this function is to set up the URL to which requests will be directed. Here is the code for that function.
module.exports.createURL = function (requestURL) {
    var pathname = requestURL._parsedUrl.pathname;
    var tablename = pathname.split("/")[1];
    // The route declares the parameter as :name, so read params.name
    var filter = `?&filter=name='${requestURL.params.name}'`;
    var fullPath = BASE_URL + tablename.concat('/') + filter;
    console.log(fullPath);
    return fullPath;
}
Where BASE_URL is a constant defined in the following form:
http://hostname:port/path/to/resource/
Is this something I need to change in my code to support redirects through Postman, or is there a setting in Postman that I need to change so that my queries can execute successfully?
Unfortunately you can't tell Postman not to do what was arguably the correct thing.
Effectively clients should be removing authorisation headers on a redirect. This is to prevent a man-in-the-middle from sticking a 302 in and collecting all your usernames and passwords on their own server. However, as you've noticed, a lot of clients do not behave perfectly (and have since maintained this behaviour for legacy reasons).
As discussed here, however, you do have some options:
Allow a secondary way of authorising using a query string: res.redirect(302, 'http://appServer:5001/?auth=auth'). This is not great, however, because query strings are often logged without redaction.
Act as a proxy and pipe the authenticated request yourself: http.request(authedRequest).on('response', (response) => response.pipe(res)).end()
Respond with a 200 and the link for your client to then follow.
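The proxy option could be sketched roughly as follows. This assumes the upstream server speaks plain HTTP and that credentials are available to the Express app; `buildAuthedRequest` and `proxyGet` are hypothetical helper names, and the username, password, and target URL are placeholders.

```javascript
const http = require('http');

// Build request options that carry Basic credentials to the upstream server.
// Plain HTTP is assumed; an https target would need the https module instead.
function buildAuthedRequest(targetUrl, username, password) {
  const url = new URL(targetUrl);
  const token = Buffer.from(`${username}:${password}`).toString('base64');
  return {
    hostname: url.hostname,
    port: url.port || 80,
    path: url.pathname + url.search,
    method: 'GET',
    headers: { Authorization: `Basic ${token}` },
  };
}

// Forward the upstream status, headers, and body to the Express response.
function proxyGet(options, res) {
  const upstream = http.request(options, (response) => {
    res.writeHead(response.statusCode, response.headers);
    response.pipe(res);
  });
  upstream.on('error', () => {
    // Upstream unreachable: report a gateway error instead of hanging.
    if (!res.headersSent) res.writeHead(502);
    res.end();
  });
  upstream.end(); // without end() the request is never sent
}
```

Inside the route, `res.redirect(requestURL)` would be replaced by something like `proxyGet(buildAuthedRequest(requestURL, user, pass), res)`, so the client never sees a redirect and never has to resend its credentials.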

Cloudfront cache with GraphQL?

At my company we're using GraphQL for production apps, but only for private resources.
For now our public APIs are REST APIs with a CloudFront service for caching. We want to convert them to GraphQL APIs, but the question is: how do we handle caching properly with GraphQL?
We thought of using a GET GraphQL endpoint and caching on the query string, but we are a bit afraid of the size of the requested URL (we support IE9+ and sell to schools that sometimes have really dumb proxies and firewalls).
So we would like to use a POST GraphQL endpoint, but CloudFront cannot cache a request based on its body.
Does anyone have an idea or best practice to share?
Thanks
The two best options today are:
Use a specialized caching solution, like FastQL.io
Use persisted queries with GET, where some queries are saved on your server and accessed by name via GET
*Full disclosure: I started FastQL after running into these issues without a good solution.
I am not sure if it has a specific name, but I've seen a pattern in the wild where the GraphQL queries themselves are hosted on the backend, each with a specific ID.
It's much less flexible, as it requires pre-defined queries to be baked in.
The client would just send the arguments/params and the ID of the pre-defined query to use, and that would be your cache key. This is similar to how HTTP caching would work with an authenticated request to /my-profile, with CloudFront serving different responses based on the auth token in the headers.
How the client sends it depends on your backend's implementation of GraphQL.
You could pass it either as a whitelisted header or as a query string.
So if the backend has defined a query that looks like
(Using pseudo code)
const MyQuery = gql`
    query HeroNameAndFriends($episode: Int) {
        hero(episode: $episode) {
            name
            friends {
                name
            }
        }
    }
`
Then your request would be to something like api.app.com/graphQL/MyQuery?episode=3.
That being said, have you actually measured that your queries wouldn't fit in a GET request? I'd say go with GET requests if CDN Caching is what you need and use the approach mentioned above for the requests that don't fit the limits.
Edit: Seems it has a name: Automatic Persisted Queries. https://www.apollographql.com/docs/apollo-server/performance/apq/
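A minimal sketch of the server side of that pattern: the client sends only a query name plus variables, and the server looks the full query up in a registry. The registry contents, the URL layout, and the `resolvePersistedQuery` name are assumptions for illustration. Note that this is the named, pre-defined variant described above, not Apollo's hash-based Automatic Persisted Queries protocol.

```javascript
// Sketch: resolve a short, cacheable GET URL into a full GraphQL query.
// The registry below is hypothetical; a real app would load its own queries.
const PERSISTED_QUERIES = {
  MyQuery: `
    query HeroNameAndFriends($episode: Int) {
      hero(episode: $episode) { name friends { name } }
    }`,
};

function resolvePersistedQuery(requestUrl) {
  const url = new URL(requestUrl, 'http://localhost'); // base is only needed for parsing
  const name = url.pathname.split('/').filter(Boolean).pop();
  if (!Object.prototype.hasOwnProperty.call(PERSISTED_QUERIES, name)) {
    return null; // unknown query name: the caller should answer 404
  }
  const variables = {};
  for (const [key, value] of url.searchParams) {
    // Query-string values arrive as strings; convert integer-looking ones.
    variables[key] = /^-?\d+$/.test(value) ? Number(value) : value;
  }
  return { query: PERSISTED_QUERIES[name], variables };
}
```

Because the whole request is described by the path and query string, a CDN that caches on the query string can use the URL as its cache key without any extra configuration for request bodies.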
Another alternative that keeps POST requests is to use Lambda@Edge on your CloudFront distribution, with DynamoDB tables storing your cached responses, similar to how Cloudflare Workers do it. The Workers version looks like this:
async function handleRequest(event) {
    let cache = caches.default
    let response = await cache.match(event.request)
    if (!response) {
        response = await fetch(event.request)
        if (response.ok) {
            event.waitUntil(cache.put(event.request, response.clone()))
        }
    }
    return response
}
Some reading material on that:
https://aws.amazon.com/blogs/networking-and-content-delivery/lambdaedge-design-best-practices/
https://aws.amazon.com/blogs/networking-and-content-delivery/leveraging-external-data-in-lambdaedge/
An option I've explored on paper but not yet implemented is to use Lambda@Edge in request trigger mode to transform a client POST to a GET, which can then result in a cache hit.
This way clients can still use POST to send GQL requests, and you're working with a small number of controlled services within AWS when trying to work out the max URL length for the converted GET request (and these limits are generally quite high).
There will still be a length limit, but once you have 16 kB+ GQL requests, it's probably time to take the other suggestion of using predefined queries on the server and just referencing them by name.
It does have the disadvantage that request trigger Lambdas run on every request, even on a cache hit, so they will generate some cost, although the Lambda itself should be very fast and simple.
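The body-to-URL part of that idea can be sketched independently of any AWS wiring. The function below turns a GraphQL POST body into the equivalent GET path using the conventional `query`, `variables`, and `operationName` parameters, and gives up when the result exceeds a length budget. The function name and the 16 KiB default (taken from the figure above) are assumptions, and whether an edge function is actually allowed to rewrite the request method should be checked against the AWS documentation.

```javascript
// Sketch: convert a GraphQL POST body into an equivalent GET path.
// MAX_URL_LENGTH mirrors the 16 kB figure mentioned above; treat it as an assumption.
const MAX_URL_LENGTH = 16 * 1024;

function graphqlPostToGetPath(endpoint, jsonBody, maxLength = MAX_URL_LENGTH) {
  const { query, variables, operationName } = JSON.parse(jsonBody);
  const params = new URLSearchParams();
  params.set('query', query);
  if (variables) params.set('variables', JSON.stringify(variables));
  if (operationName) params.set('operationName', operationName);
  const path = `${endpoint}?${params.toString()}`;
  // null signals "too long": fall back to POST or to a predefined query.
  return path.length <= maxLength ? path : null;
}
```

Two requests with the same query and variables produce the same path, which is what lets the converted GET be served from cache.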

Prevent users from accessing subdomain in Express

I am very new to web development and Node.js / Express. I have an Express server. In my client I send a GET request for some data in the DB. The GET request is handled by
app.get( '/pathname', controller.getsomedata );
The problem is that the user can now type the URL domainname.com/pathname into the browser and get directed to that pathname. If they add certain queries (domainname.com/pathname?query), they are able to retrieve data from the DB. This is supposed to happen, but I would prefer that GET requests to /pathname only occur from within the client code, not when the user enters the URL in the browser.
Is there a better way to do my GET request? Or is there a way to restrict users from accessing /pathname.
I apologize for the newbie question, but I don't know how to word it well enough to do a google search for the solution. Thank you!
It's impossible to do that. If your client-side code is able to access something, a malicious user can access it as well.
You can mitigate the issue by using a custom HTTP header or something like that, but it's better to validate all data on the server side.
Passing the whole client request through as a DB query can cause security issues, so be sure to validate the query parameters and use them only as DB query conditions.
If you want to query the DB freely from HTTP query parameters, you should put authentication/authorization in front of the route.
app.get( '/pathname', function(req, res, next) {
    if (confirmThisRequestIsFromMe(req)) {
        next();
    } else {
        // res.send(401) is deprecated in Express 4; sendStatus sets the code and ends the response
        res.sendStatus(401);
    }
}, controller.getsomedata );
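For the validation advice, a minimal sketch of turning untrusted query parameters into a parameterized condition list could look like this. The column whitelist, the `?` placeholder style, and the `buildWhereClause` name are hypothetical; the actual columns and placeholder syntax depend on the DB and driver in use.

```javascript
// Sketch: accept only known filter fields and never splice values into SQL text.
// ALLOWED_FILTERS is a made-up whitelist for illustration.
const ALLOWED_FILTERS = new Set(['name', 'age', 'city']);

function buildWhereClause(query) {
  const conditions = [];
  const values = [];
  for (const [field, value] of Object.entries(query)) {
    if (!ALLOWED_FILTERS.has(field)) continue; // drop unknown fields
    if (typeof value !== 'string') continue;   // repeated params can arrive as arrays
    conditions.push(`${field} = ?`);
    values.push(value);
  }
  return {
    sql: conditions.length ? 'WHERE ' + conditions.join(' AND ') : '',
    values,
  };
}
```

A controller could call `buildWhereClause(req.query)` and pass `sql` and `values` to the driver's parameterized query call, so user input only ever reaches the DB as bound values.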

Resources