Apache httpd request/response logging - Linux

Is there a way to log full request and response content with Apache - the full GET URI, the POST data, and the response body? I have a bunch of games that communicate with the client side over HTTP; they use different variables and output all sorts of things.
I'd like to push all this content into a database for further processing so I can report game play step by step. I cannot modify the server-side game files themselves to log this data; there are too many of them (thousands).
It's not much data, up to 512 bytes or 1 KB each for the request and the response.
I can't set up Varnish or Squid to do it. I have lots of back-end servers and can't add yet another layer; there's already a load of stuff happening before the app servers (load balancing, firewall, whatnot).
TIA

mod_dumpio will get it into a file.
http://httpd.apache.org/docs/2.2/mod/mod_dumpio.html
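A minimal 2.2 configuration sketch (the module path varies by distro, and LogLevel debug makes the ErrorLog very noisy):
LoadModule dumpio_module modules/mod_dumpio.so
LogLevel debug
DumpIOInput On
DumpIOOutput On
DumpIOLogLevel debug
The dumped request and response bodies land in the ErrorLog, so you'd still need a script to parse that log into your database.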

ForEach Bulk Post Requests are failing

I have a script where I'm taking a large dataset and calling a remote API via request-promise, using a POST method. If I make these requests individually, they work just fine. However, if I loop through a sample set of 200 records using forEach and async/await, only about 6-15 of the requests come back with a status of 200; the others return a 500 error.
I've worked with the owner of the API, and their logs only show the 200 requests. So I don't think node is actually sending out the ones that come back as 500.
Has anyone run into this, and/or does anyone know how I can get around it?
To my knowledge, there's no code in node.js that automatically generates a 500 HTTP response for you. Those 500 responses are apparently coming from the target server's network. You could look at a network trace on your server machine to see for sure.
If they are not in the target server's logs, then they're probably coming from some defense mechanism deployed in front of that server to stop misuse or overuse (such as rate limiting from one source) and/or to protect its ability to respond to a meaningful number of requests (proxy, firewall, load balancer, etc...). It could even be part of a configuration in the hosting facility.
You will likely need to find out how many simultaneous requests the target server will accept without error and then modify your code to never send more than that number at once. They could also be measuring requests/sec, so it might not only be an in-flight count but also the rate at which requests are sent.
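One way to do that is to send the records in limited-size batches instead of firing them all at once from forEach. A minimal sketch using request-promise (the endpoint URL and record shape are made up; tune MAX_CONCURRENT to whatever the target server tolerates):
const rp = require('request-promise');

const MAX_CONCURRENT = 5; // assumption: adjust to what the target server allows

async function postAll(records) {
  const results = [];
  // process the records in batches of MAX_CONCURRENT instead of all at once
  for (let i = 0; i < records.length; i += MAX_CONCURRENT) {
    const batch = records.slice(i, i + MAX_CONCURRENT).map(record =>
      rp({
        method: 'POST',
        uri: 'https://api.example.com/records', // hypothetical endpoint
        body: record,
        json: true
      })
    );
    results.push(...await Promise.all(batch));
  }
  return results;
}
If the server is measuring requests/sec rather than in-flight count, you'd add a small delay between batches as well.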

Simple message passing Nodejs server accepting only 4 requests at a time

We have a simple express node server deployed on Windows Server 2012 that receives GET requests with just 3 parameters. It does some minor processing on these parameters and has a very simple in-memory node-cache for caching some of these parameter combinations. It interfaces with an external license server to fetch a license for the requesting user and sets it in the cookie. After that, it interfaces with some workers via a load balancer (running with zmq) to download some large files (in chunks, which it unzips, extracts, and writes to some directories) and display them to the user. On deploying these files, some other calls to the workers are initiated as well.
The node server does not talk to any database or disk. It simply waits for responses from the load balancer running on some other machines (these are long operations, typically taking 2-3 minutes to send a response). So, essentially, the computation and database interactions happen on other machines. The node server is only a simple message passing/handshaking server that waits for responses in event handlers, initiates other requests, and renders the response.
We are not using the 'cluster' module or nginx at the moment. With a bare-bones node server, is it possible to accept and process at least 16 requests simultaneously? Pages such as this one (http://adrianmejia.com/blog/2016/03/23/how-to-scale-a-nodejs-app-based-on-number-of-users/) mention that a simple node server can handle only 2-9 requests at a time. But even with our bare-bones implementation, no more than 4 requests are accepted at a time.
Is using the cluster module or nginx necessary even in this case? How can we scale this application for a few hundred users to begin with?
An Express server can handle many more than 9 requests at a time, especially if it isn't talking to a database.
The article you're referring to assumes some database access on each request and serving static assets via node itself rather than a CDN, all of it taking place on a single CPU with 1 GB of RAM - a database and web server running together on a single core with minimal RAM.
There really are no hard numbers on this sort of thing; you build it and see how it performs. If it doesn't perform well enough, put a reverse proxy in front of it like nginx or haproxy to do load balancing.
However, based on your problem, if you really are running into a bottleneck where only 4 connections are possible at a time, it sounds like you're keeping those connections open way too long and blocking others. Better to have those long-running processes kicked off by node, close the connections, then have those servers call back somehow when they're done.
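As a sketch of that pattern with Express (the job store is in-memory and startWorker is a stand-in for handing work to your zmq workers):
const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json());

const jobs = {}; // in-memory job store; a real deployment would persist this

function startWorker(jobId, params) {
  // stand-in for handing the request to the zmq workers;
  // here we just simulate a long operation finishing later
  setTimeout(() => { jobs[jobId] = { status: 'done' }; }, 5000);
}

// kick off the long-running work, then respond immediately
app.get('/download', (req, res) => {
  const jobId = crypto.randomBytes(8).toString('hex');
  jobs[jobId] = { status: 'pending' };
  startWorker(jobId, req.query);
  res.status(202).json({ jobId }); // connection closes right away
});

// workers call back here when they're done
app.post('/jobs/:id/complete', (req, res) => {
  jobs[req.params.id] = { status: 'done', result: req.body };
  res.sendStatus(204);
});

// clients poll for the result (or you could push it over a websocket)
app.get('/jobs/:id', (req, res) => {
  res.json(jobs[req.params.id] || { status: 'unknown' });
});

app.listen(3000);
With this, each HTTP connection is held for milliseconds instead of 2-3 minutes, so the 4-connection ceiling stops mattering.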

Caching and Replay Proxy Server in Deployment

I have a logging server that receives data from some stateless clients on a single network (inaccessible from the outside world). I'd like to make sure all logs are eventually received by the server, even if the internet connection goes down.
To do this, the easiest solution would be to set up a proxy server and have the client log to both the logging server and the proxy server. The proxy server then tries to forward the log to the logging server, and if that fails it caches the request for later.
Notes:
All requests are idempotent.
The clients are stateless (logs cannot be cached on the clients)
All parts of the system, except the intermediate "internet" step, are configurable.
The proxy server does not need to read or modify the data.
The logging server response is not used by the client.
I cannot make significant changes to the client or logging server (Cassandra would be great for this application, though).
My questions: is there any off the shelf software that can serve as the proxy? If not, anything to think about when writing this? Are there any concerns with this scheme?
Your proxy looks like a simple persistent queue. All you have to do is add/configure a connector to the logging server.
But even without a queue, the whole process looks like 2 db queries and 2 REST calls - you will probably waste more time comparing different products than writing it on your own.
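If you do write it on your own, a minimal store-and-forward sketch in node (Express plus axios; the upstream URL, queue file, and retry interval are made up, and a real version would want locking and size limits):
const express = require('express');
const axios = require('axios');
const fs = require('fs');

const LOG_SERVER_URL = 'http://logging-server.internal/logs'; // hypothetical upstream
const QUEUE_FILE = 'queue.jsonl'; // one JSON log entry per line

const app = express();
app.use(express.json());

// persist the entry before acknowledging, so nothing is lost if the internet is down
app.post('/logs', (req, res) => {
  fs.appendFileSync(QUEUE_FILE, JSON.stringify(req.body) + '\n');
  res.sendStatus(202);
});

// periodically try to drain the queue; an entry is dropped only once the
// logging server accepts it (safe to re-send because requests are idempotent)
setInterval(async () => {
  if (!fs.existsSync(QUEUE_FILE)) return;
  const lines = fs.readFileSync(QUEUE_FILE, 'utf8').split('\n').filter(Boolean);
  const remaining = [];
  for (const line of lines) {
    try {
      await axios.post(LOG_SERVER_URL, JSON.parse(line));
    } catch (err) {
      remaining.push(line); // upstream still unreachable; keep for the next pass
    }
  }
  fs.writeFileSync(QUEUE_FILE, remaining.map(l => l + '\n').join(''));
}, 10000);

app.listen(8080);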

Response body missing characters

I've seen this issue happen on multiple machines, using different languages and server-side environments. It seems to always be IIS, but it may be more widespread.
On slower connections, characters are occasionally missing from the response body. It happens somewhere between 25% and 50% of the time but only on certain pages, and only on a slow connection such as VPN. A refresh usually fixes the issue.
The current application in question is .NET 4 with SQL Server.
Example:
<script>
document.write('Something');
</script>
is being received by the client as
<scrit>
document.write('Something');
</script>
This causes the JavaScript inside the tag to be printed to the page instead of executing.
Does anyone know why this occurs? Is it specific to IIS?
Speaking generally, the problem you describe would require corruption at the HTTP layer or above, since TCP/IP has checksums, packet lengths, sequence numbers, and re-transmissions to avoid this sort of issue.
That leaves:
The application generating the data
Any intermediate filters between the application and the server
The HTTP server returning the data
Any intermediary HTTP proxies, transparent or otherwise
The HTTP client requesting the data
The user-agent interpreting the data
You can diagnose further based on a network capture performed at the server edge and at the client edge (an example capture command follows these steps).
Examine the request made by the client at the client edge to verify that the client is making a request for the entire document and is not relying on cache (no Range or If-* headers).
If the data is correct when it leaves the server (pay particular attention to the Content-Length header and verify it is a 200 response), neither the server nor the application is at fault.
If the data is correct as received by the client, you can rule out intermediary proxies.
If there is still an issue, it is a user-agent issue.
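For the captures themselves, something like this with tcpdump at each edge is enough (the interface name is an assumption; open the two .pcap files in Wireshark and compare the response bodies):
tcpdump -i eth0 -s 0 -w server-edge.pcap tcp port 80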
If I had to psychically debug such a problem, I would look first at the application to ensure it is generating the correct document, then assume some interloper is modifying the data in transit (some HTTP proxy for WAN acceleration, aggressive caching, virus scanning, etc...). I might also assume some browser plugin or ad blocker is modifying the response after it is received.
I would not, however, assume it is the HTTP server without very strong evidence. If corruption is detected on the client but not the server, I might disable TCP offload and look for an updated NIC driver.

Setting up a secure back-end NodeJS server for multiple front-end domains

I've been doing a lot of research recently on creating a backend for all the websites that I run and a few days ago I leased a VPS running Debian.
Long-term, I'd like to use it as the back-end for some web applications. However, these client-side JavaScript apps run on completely different domains than the VPS domain. I was thinking about running the various back-end applications on the VPS as daemons. For example, daemon 1 is a Python app, daemons 2 and 3 are Node.js, etc. I have no idea how many of these I might eventually create.
Currently, I only have a single NodeJS app running on the VPS. I want to implement two methods on it listening over some arbitrary port, port 4000 for example:
/GetSomeData (GET request) - takes some params and serves back some JSON
/AddSomeData (POST request) - takes some params and adds to a back-end MySQL db
These methods should only be usable from one specific domain (call it DomainA), which is different from the VPS domain.
Now, one issue that I feel I'm going to hit my head against is CORS policy. It sounds like I need to include a response header of Access-Control-Allow-Origin: DomainA. The problem is that in the future I may want to add another acceptable requester domain, for example DomainB. What would I do then? Would I need to validate the incoming request.connection.remoteAddress, and if it matched DomainA/DomainB, write the corresponding Access-Control-Allow-Origin?
About 5 minutes before posting this question, I came across this from the W3C site:
Resources that wish to enable themselves to be shared with multiple Origins but do not respond uniformly with "*" must in practice generate the Access-Control-Allow-Origin header dynamically in response to every request they wish to allow. As a consequence, authors of such resources should send a Vary: Origin HTTP header or provide other appropriate control directives to prevent caching of such responses, which may be inaccurate if re-used across-origins.
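From what I understand, that dynamic generation would look something like this (a sketch assuming Express, which I mention below; the whitelist entries are placeholders):
const express = require('express');
const app = express();

// hypothetical whitelist of allowed requester domains
const allowedOrigins = ['https://domain-a.example', 'https://domain-b.example'];

app.use((req, res, next) => {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.indexOf(origin) !== -1) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    // per the W3C note above: stop caches from reusing this response cross-origin
    res.setHeader('Vary', 'Origin');
  }
  next();
});

app.get('/GetSomeData', (req, res) => res.json({ some: 'data' }));
app.listen(4000);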
Even if I do this, I'm a little worried about security. By design, anyone on my DomainA website can use the web app; you don't have to be a registered user. I'm concerned about attackers spoofing their IP address to look like DomainA. It seems like it wouldn't matter for the GetSomeData request, since my NodeJS would then send the data back to DomainA rather than the attacker. However, what would happen if an attacker ran a script to POST to AddSomeData a thousand times? I don't want my SQL table being filled up by malicious requests.
On another note, I've been reading about nginx and virtual hosts and how you can use them to establish different routes depending on the incoming domain, but I don't BELIEVE that I need these things; perhaps I'm mistaken, though.
Once again, I don't want to use the VPS as a web-site server; the Node.js listener is going to be returning some collection of JSON, which is why I'm not making use of port 80. In fact, the primary use of the VPS is to do some heavy manipulation of data (perhaps involving the local MySQL db) and then return a collection of JSON that any number of front-end client browser apps can use.
I've also read some recommendations about making use of NodeJS Restify or ExpressJS. Do I need these for what I'm trying to do?
