NodeJS, bypass URL length limitation

It seems NodeJS only allows URLs of max. size 80KB.
I need to pass longer URLs to an internal application. Is it possible to bypass that limitation without recompiling NodeJS (which is not an option in my setup)?

You cannot change the limit without modifying the http_parser.h file.
You will need a better way of sending the data to your application, whether in the request body or as a file upload within the request body. Without more information it is difficult to propose a specific solution to your problem.
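For example, here is a minimal Node.js sketch of moving the oversized data out of the URL and into a POST body (the host, path and payload are made-up placeholders, not from the question):

const http = require('http');

// Stand-in for the data that no longer fits in the URL.
const veryLongData = 'x'.repeat(100000);
const payload = JSON.stringify({ query: veryLongData });

const req = http.request({
  host: 'internal-app.example',   // hypothetical internal host
  path: '/search',                // hypothetical endpoint
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(payload),
  },
}, (res) => {
  res.pipe(process.stdout);       // handle the response as needed
});

req.write(payload);
req.end();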

Related

Best practice for sending query parameters in a GET request?

I am writing a backend for my application that will accept query parameters from the front end and then query my DB based on these parameters. This sounds to me like it should be a GET request, but since I'm passing a lot of params, some of them optional, I think it would be easiest to do a POST request and send the search params in a request body. I know I can convert my params to a query string and append it to my GET request, but there has to be a better way, because I will be passing different data types and will end up having to parse the params on the backend anyway if I do it this way.
This depends heavily on the context, but I would prefer using a GET request in your scenario.
What Request Method should I use
According to the widely accepted convention, one uses:
GET to read existing data
POST to create something new
More details can be found here: https://www.restapitutorial.com/lessons/httpmethods.html
How do I pass the parameters
Regarding the way to pass parameters, it is a less obvious thing. Unless there's something sensitive in the request parameters, it is perfectly fine to send them as part of URL.
Parameters may be either part of path:
myapi/customers/123
or a query string:
myapi?customer=123
Both options are feasible, and I'd say the choice depends heavily on the application domain model. One popular rule of thumb (illustrated in the sketch after this list) is:
use "parameters as a part of a path" for mandatory parameters
use "parameters as a query string" for optional parameters.
I'd recommend using POST in the case where there are a lot of parameters/options. There are a few reasons why I think it's better than GET:
Your url will be cleaner looking
You hide internal structure from the user (it's still visible if they use the Developer Tools of the browser though)
People can't as easily change the options to adjust your query. Having them in the URL makes it simple to just modify and reload with other values; it's more work to do this as a POST.
However, if it's of any use that the URL you end up with can be bookmarked or shared, then you'd want all parameters encoded as part of the query, so using GET would be best in that case.
Another answer stated that POST should be used for creating something new, but I disagree. That might apply to PUT, but it's perfectly fine to use POST to allow more complex structures to be passed even when retrieving existing data.
For example, with POST you can send a JSON body object that has nested structure. This can be very handy and would be difficult to explode into a traditional GET query. You also have to worry about URL-encoding your data then decoding it when receiving it, which is a hassle.
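As a rough client-side illustration of the difference (the endpoint and field names are invented for the example):

// POST: a nested filter object travels as a JSON body, no URL-encoding needed.
fetch('/api/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    filters: { price: { min: 10, max: 50 }, tags: ['new', 'sale'] },
    sort: { field: 'price', direction: 'asc' },
  }),
}).then((res) => res.json());

// GET: the same data has to be flattened and URL-encoded into a query string.
const qs = new URLSearchParams({
  priceMin: '10',
  priceMax: '50',
  tags: 'new,sale',
  sortField: 'price',
  sortDirection: 'asc',
});
fetch('/api/search?' + qs.toString()).then((res) => res.json());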
For simple frontend-to-backend communication you don't really need REST to start with, as it targets cases where the server is accessed by a plethora of clients not under your control, or where a client has to access plenty of different servers and should work with all of them. REST is worth aiming for if you see benefit in a server that can evolve freely in the future without fear of breaking clients, as they will adapt to changes quite easily. Such strong properties, however, come at a price in terms of development overhead and careful design. Don't get me wrong: you can still aim for a REST architecture, but for such a simple application-to-backend scenario it sounds like overkill.
In a REST architecture a server usually tells clients how it wants to receive input data. Think of HTML forms, where the method and enctype attributes specify which HTTP method to use and which representation format to convert the input to. Which HTTP method to use depends on the actual use case. If a server constantly receives the same request for the same input parameters and calculating the result may be costly, then caching the response once and serving further requests from that cache can take a lot of unnecessary computation off the server. The BBC, for example, claims that the cache is the single most important technology in keeping sites scalable and fast. I once read that they cache most articles for only a minute, but this is sufficient to spare them from regenerating the same content thousands of times over, freeing up resources for other requests or tasks. It is no accident that caching is one of the few constraints REST defines.
HTTP by default allows caches to store response representations for requested URIs (including any query, path or matrix parameters) if they were requested via safe operations, such as HEAD or GET requests. Any unsafe operation invoked, however, will lead to cache invalidation and therefore the removal of any stored representations for that target URI. Hence, any follow-up requests for that URI will reach the server so that a response can be processed for the requesting client.
Unfortunately, caching isn't the only factor to consider when deciding between GET and POST; the representation format the client is currently processing also influences the decision. Think of a client processing the previous HTML response received from a server. The HTML response contains a form that teaches the client which fields the server expects as input, as well as the choices the client can make for certain input parameters. HTML is a perfect example of a media type that restricts which HTTP methods are available (GET as the default method and POST are supported) and which are not (all of the other HTTP methods). Other representation formats might only support POST (e.g. while application/soap+xml would allow either GET or POST, at least in SOAP 1.2, I have never seen GET requests used in practice, so everything is exchanged with POST).
A further point that may prevent you from using GET requests is the de facto limitation on URI length that most HTTP implementations have. If you exceed this limitation, some HTTP frameworks might not be able to process the exchanged message. Looking at the Web, however, one can find a little workaround for such a limitation. In most Web shops the checkout area is split into different pages, where each page consists of a form that collects some input, like address information or bank and payment data, and the pages as a whole act as a kind of wizard guiding the user through the payment process. Such a wizard style could be implemented in this case as well: parts of the request are sent via POST to a dedicated endpoint that takes care of collecting the data, and on the final "page" of the wizard the server asks for a final confirmation on the collected data and uses that resource as the GET target. This way the response remains cacheable even though the input data exceeded the typical URL limitation imposed by some HTTP frameworks.
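A possible sketch of that wizard-style workaround using Express (the route names, the in-memory store and the cache lifetime are all assumptions, not something prescribed above):

const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json());

const searches = new Map(); // collected input, keyed by a generated id

// Start a new "wizard": the server hands out an id for the collected data.
app.post('/searches', (req, res) => {
  const id = crypto.randomUUID();
  searches.set(id, req.body || {});
  res.status(201).location(`/searches/${id}`).json({ id });
});

// Each wizard page POSTs its part of the input; the server merges it.
app.post('/searches/:id/steps', (req, res) => {
  const current = searches.get(req.params.id) || {};
  searches.set(req.params.id, { ...current, ...req.body });
  res.sendStatus(204);
});

// The final confirmation is a plain GET on a short URI, so the response stays cacheable.
app.get('/searches/:id/result', (req, res) => {
  const params = searches.get(req.params.id);
  if (!params) return res.sendStatus(404);
  res.set('Cache-Control', 'public, max-age=60');
  res.json({ params, result: 'computed from params' });
});

app.listen(3000);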
While the arguments listed by Always Learning aren't wrong, I wouldn't rely on them from a security standpoint. They may filter out people with little knowledge, but they won't stop knowledgeable users (and there are plenty out there) from modifying the request before sending it to your server. So simply recommending POST as a way of making user edits harder feels odd to me.
So, in summary, I'd base the decision whether to use POST or GET for sending data to the server mainly on whether the response should be cacheable (because it is requested often) or not. In cases where the URI might get so large that certain HTTP frameworks fail to process the request, you are basically forced to use POST anyway, unless you can split the actual request into multiple smaller requests that act as a wizard for data collection until a final confirmation request triggers the actual final HTTP call.

Simple server-side response to wget request with password

My organization maintains a front-end server with back-end compute nodes. Is the following possible (or a good way to do this)?
user sends a password-protected wget request supplying an input parameter to a PHP or Python script, e.g. wget http://address/script.php/arg1&value/
PHP script launches a job on the backend, which prepares a data file by doing some computation dependent on the input parameters
download of the data file begins for the user
I understand the question is somewhat vague, but we are unsure how to most quickly implement the above feature. We are anticipating receiving at most 1-2 requests per day, and not at the same time.
Yes, it is possible.
It is a 'good way of doing it' on the condition that the script simply produces output data and makes no changes to the server (or the backends), other than some log entries, cache, etc. (stuff that has no visible effect on operation).
You should use https:// and not http:// if this is visible on the public network (to keep the passwords and returned data safe).
You can include the input parameters in any shape and form you want in the URL, but unless you have a special reason not to use an HTTP query, this form is probably best:
wget "https://address/script.php?arg1name=arg1value&arg2name=value2..."
How to implement it quickly depends on what your server setup is. If it has PHP, that's quick and easy enough. Using plain old CGI (with a Python or a shell script) is also an option; nearly all HTTP servers support it.

How to change response header (cache) in CouchDB?

Do you know how to change the response header in CouchDB? Now it has Cache-control: must-revalidate; and I want to change it to no-cache.
I do not see any way to configure CouchDB's cache header behavior in its configuration documentation for general (built-in) API calls. Since this is not a typical need, lack of configuration for this does not surprise me.
Likewise, last I tried even show and list functions (which do give custom developer-provided functions some control over headers) do not really leave the cache headers under developer control either.
However, if you are hosting your CouchDB instance behind a reverse proxy like nginx, you could probably override the headers at that level. Another option would be to add the usual "cache busting" hack of adding a random query parameter in the code accessing your server. This is sometimes necessary in the case of broken client cache implementations but is not typical.
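As a rough illustration of that cache-busting hack in the client code (the URL is a placeholder):

// Append a throwaway query parameter so each request looks unique to caches.
fetch('https://couch.example.com/mydb/mydoc?nocache=' + Date.now())
  .then((res) => res.json());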
But taking a step back: why do you want to make responses no-cache instead of must-revalidate? I could see perhaps occasionally wanting to override in the other direction, letting clients cache documents for a little while without having to revalidate. Not letting clients cache at all seems a little curious to me, since the built-in CouchDB behavior using revalidated Etags should not yield any incorrect data unless the client is broken.

Varnish / VCL gurus: How to pass request body using Varnish fetch?

I'm afraid I am fairly new to Varnish, but I have a problem which I cannot find a solution to anywhere (yet): Varnish is set up to cache GET requests. We have some requests which have so many parameters that we decided to pass them in the body of the request. This works fine when we bypass Varnish, but when we go through Varnish (for caching), the request is passed on without the body, so the service behind Varnish fails.
I know we could use POST, but we want to GET data. I also know that Varnish CAN pass the request body on if we use pass mode, but as far as I can see, requests made in pass mode aren't cached. I've already put a hash into the URL so that, when things work, we will actually get the correct data from the cache (as far as the URL goes, the calls would otherwise all look the same).
The problem now is "just" how to rewrite vcl_fetch to pass on the request body to the webserver? Any hints and tips welcome!
Thanks in advance
Jon
I don't think you can, and even if you could, it would be very dangerous: Varnish won't store the request body in its cache or hash table, so it won't be able to see any difference between two requests with the same URI but different bodies.
I haven't heard of a VCL variable for reading the request body, but if one exists, you could pass it to req.hash to differentiate requests.
Anyway, a request body should only be used with POST or PUT... and POST/PUT requests should not be cached.
A request body is meant to send data to the server; a cache is used to get data...
I don't know the details, but I think there's a design issue in your process...
I am not sure I got your question right, but if you are trying to interact with the request body in some way, this is not possible with VCL. There is no VCL variable or subroutine for it.
You can find the list of variables available in VCL here (or in man vcl):
https://github.com/varnish/Varnish-Cache/blob/master/lib/libvcl/generate.py#L105
I agree with Gauthier, you seem to have a design issue in your system.
'Hope that helps.

Anyone know of a secure way to give Browser access to a specific view result set on a couchdb database

I am using CouchDB for my Data Layer in a Rails 3 application using CouchRest::Model hosted on Heroku.
I am requesting a List of Documents and returning them as JSON to my Browser and using jQuery Templates to represent that data.
Is there a way I could build the request on the server side, and return the request that would need to be called from the browser WITHOUT opening a huge security hole i.e. giving the browser access to the whole database?
Ideally it would be a one-off token granting access to a specific query, where the token would be generated on the server side, and CouchDB would take the token, make sure it matches what the query should be, and give access to the results.
One way that comes to mind would be to generate a token document and use a show function (http://guide.couchdb.org/draft/show.html) to return the view results for that token document, though I am not sure if that is possible.
Another would be to put a token on the document itself and use a list function (http://guide.couchdb.org/draft/transforming.html).
Other than that, any ideas?
Thanks in Advance
Is there a way I could build the request on the server side, and return the request that would need to be called from the browser WITHOUT opening a huge security hole i.e. giving the browser access to the whole database?
Yes. One method is to create a Rack app and mount it inside your Rails app. You can have it receive requests from users' browsers at "/couch" and forward them to your "real" CouchDB URL, returning Couch's JSON response as-is or modifying it however you need.
You may also be able to use Couch's rewrite and virtual host features to control what Couch URLs the general public is able to reach. This probably will necessitate the use of list or show functions. http://blog.couchone.com/post/1602827844/of-rewrites-and-virtual-hosting-an-introduction
Ideally it would be a one off token access to a specific query, Where the token would be generated on the server side, and CouchDB would take the token, and make sure it matches what the query should be, and give access to the results.
You might use cookies for this since list and show functions can set and get cookie values on requests.
But you could also include a hash value as part of each request. Heroku's add-on API has a good example of how this works. https://addons.heroku.com/provider/resources/technical/build/sso
Notice that the API calls are invalid outside of a certain window of time, which may be exactly what you need.
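A rough Node.js sketch of that hash-plus-time-window idea (the secret, field names and window length are assumptions, not something CouchDB or Heroku prescribe):

const crypto = require('crypto');
const SECRET = process.env.TOKEN_SECRET || 'change-me'; // known only to your server code

// Server side: sign the exact query the browser is allowed to run, plus a timestamp.
function signQuery(query) {
  const timestamp = Date.now();
  const token = crypto.createHmac('sha256', SECRET).update(`${query}|${timestamp}`).digest('hex');
  return { query, timestamp, token };
}

// Gatekeeper side: only forward the request to CouchDB if the token matches and is recent.
function verify({ query, timestamp, token }, maxAgeMs = 5 * 60 * 1000) {
  if (Date.now() - timestamp > maxAgeMs) return false; // outside the allowed window
  const expected = crypto.createHmac('sha256', SECRET).update(`${query}|${timestamp}`).digest('hex');
  if (typeof token !== 'string' || token.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(token));
}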
I'm not sure I precisely understand your needs, but I hope I have been able to give you some helpful ideas.
