Advice: Tracking HTTP requests with CloudFlare and Ghost - node.js

I have a very interesting requirement that I am not too sure of the answer. I am turning to Stack Overflow in the hope that someone is able to share their experiences and propose a solution.
Setup
I have a front-facing website that is powered by Ghost running in a standard MEAN stack environment, and all traffic is handled via CloudFlare.
Problem
I have recently become aware that the CloudFlare dashboard shows a large number of requests that do not appear in my Google Analytics. I am aware that some people may have JS disabled, but we are talking about an orders-of-magnitude difference between the two. I would very much like to know why.
Hypothesis
I suspect that person(s) are port scanning, or attempting to find vulnerabilities in my platform. Or it could be a simple case of links going astray. Either way, I am not sure.
Solutions
This is the part I am not sure about. What would be the best approach to record and retain HTTP requests? One option I have considered is to use Morgan to stream requests into a .log file and review them at a later date. However, I wonder if there is a more elegant solution.
I welcome any thoughts you may have.
Thanks

Google Analytics is a fair bit more conservative than Cloudflare. One reason, as you mentioned, is that Cloudflare is able to access raw HTTP logs, instead of having to use JavaScript to identify page views. As Cloudflare only counts HTTP requests, port scanning would not be recorded as a hit.
However, even with bots accounted for, Cloudflare may still record requests which Google Analytics can't, for example AJAX content requests. Because the Google Analytics beacon runs only once, when the page is loaded, Google Analytics records a single page view, whereas Cloudflare sees the page load and the AJAX call as 2 HTTP requests in its raw logs.
For details, please see the following blog post, which explains how Google Analytics and Cloudflare Analytics can differ: Understanding Analytics: When Is a Page View Not a Page View?
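To see for yourself what these extra requests are, the logging idea from the question can be sketched with Node's built-in modules alone. This is only a minimal sketch: the function names and the access.log path are assumptions made for illustration, and CF-Connecting-IP is the header Cloudflare adds with the visitor's address.

```javascript
const fs = require('fs');
const http = require('http');

// Build one log line per request. CF-Connecting-IP carries the visitor's
// address when the request arrives through Cloudflare; otherwise fall back
// to the address of the socket peer.
function formatLogLine(req, now = new Date()) {
  const ip =
    req.headers['cf-connecting-ip'] ||
    (req.socket && req.socket.remoteAddress) ||
    '-';
  const agent = req.headers['user-agent'] || '-';
  const referer = req.headers['referer'] || '-';
  return `${now.toISOString()} ${ip} ${req.method} ${req.url} "${referer}" "${agent}"`;
}

// Hypothetical wiring: append every request to a log file, then respond.
// Call createLoggingServer().listen(3000) to try it out.
function createLoggingServer(logPath = 'access.log') {
  const log = fs.createWriteStream(logPath, { flags: 'a' });
  return http.createServer((req, res) => {
    log.write(formatLogLine(req) + '\n');
    res.end('ok');
  });
}
```

The user agent and referer columns are usually enough to tell crawlers and scanners apart from real visitors. Morgan can produce similar lines with less code by passing a write stream as its stream option.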

Related

Deploying my front end and detecting client location by IP address - which AWS service should handle this? Confused by my options

I'm still new to AWS and just following the documentation and asking questions here when I get stuck. Please excuse me if this question sounds really noobish.
So far, I've deployed the following:
EB to deploy my REST API
RDS to deploy my psql database
Lambda functions to handle things like authentication & sending JWTs, uploading images to S3, etc.
My basic back end is deployed: just the bare bones so far, with no caching set up yet (I have only just started learning about Redis).
I'm still developing my front end, and have not even thought about how I will be deploying it yet (probably another deployment on EB, since I am using universal react). I am just developing it locally but using my production env variables now so I am hitting my deployed API, etc.
One of the MAJOR things I have no idea how to do is detecting incoming requests from the client side to get the client's location by IP address. This is so that I can return the INITIAL results in your general location, just like yelp, foursquare, etc. do when you go to their sites.
For now, I am just building a web app on desktop so I just want to worry about getting the IP address to get the general area of the user. My use case is something similar to other sites you might have used which provides an INITIAL result set for things in your area (think foursquare or yelp).
Here are my questions:
What would be a good way to do this? I'm thinking of handling this in my front end react universal deployment since it will be a node server with rendered page caching. Is this a terrible idea? It would work something like
(1) request from client comes in
(2) get IP from request and lookup the IP location using some service (still not sure what I'm going to use, have found a few plus a nodejs library called node-geoip). Preferably, I can get the zip code since I am trying to save having to do so many queries by unique locations in my database, and instead return results in the zip code and the front end will show an initial map with the initial results in that zip code.
(3) return to client the rendered page with those location params if it exists, otherwise create it, send it, and cache it.
Is the above a really dumb idea? Maybe you have already done something like this, and could share your wisdom :)
Is there an AWS service which can already handle something like this for me? Perhaps there's some functionality which can already do this.
Thanks.
AGAIN - I apologize if this is long winded. I don't know anyone in real life who can help me and I feel alone :(. I appreciate the help you guys can provide.
There are two parts to this:
Getting the user's IP address. You mentioned you're using 'EB' - I presume you mean AWS ELB (Elastic Load Balancer)? If so, then you need to read the X-Forwarded-For HTTP header in your app code, since otherwise what you'll really detect is the ELB's IP address. X-Forwarded-For contains the user's real IP - or rather, the IP of the end-connection being made (there's no telling if this is really a VPN, proxy or something else, but it's as far as you can get with an IP).
Querying an IP DB that can turn the address into a location object. There are tons of libraries for this. Assuming you're using Node, you can use node-geoip as you mentioned. Or you can just search 'geoip service' on Google and find managed services, like Telize on Mashape. If you don't want to manage the DB lookup yourself or keep the thing up to date, then a managed service would help.
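The two parts above might look roughly like this in Node. It is a sketch under a few assumptions: the names clientIp and locate are made up for illustration, and the lookup function is injected so that either a local database (for example the lookup function of the geoip-lite package) or a managed service can be plugged in.

```javascript
// Pick the client address from X-Forwarded-For, falling back to the socket
// peer. The header is a comma-separated list; the leftmost entry is the
// original client as reported by the first proxy. Clients can forge this
// value, so treat it as a hint rather than proof.
function clientIp(req) {
  const forwarded = req.headers['x-forwarded-for'];
  if (typeof forwarded === 'string' && forwarded.trim() !== '') {
    return forwarded.split(',')[0].trim();
  }
  return req.socket ? req.socket.remoteAddress : undefined;
}

// `lookup` is a hypothetical function that maps an IP string to a location
// object, or to null when the address is unknown.
function locate(req, lookup) {
  const ip = clientIp(req);
  const geo = ip ? lookup(ip) : null;
  return { ip, geo: geo || null };
}
```

Keeping the lookup behind a plain function makes it easy to swap providers later, or to cache results per IP.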
In either case, it's likely that you'll be doing asynchronous look-ups. In that case, you might want to use async/await to get the user's full location object before injecting it into your React props and ultimately rendering an HTML string that's sent down to the client.
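A minimal sketch of that flow, assuming two hypothetical functions supplied by your app: lookupLocation (returns a Promise for a location object) and renderPage (turns props into an HTML string, for example by wrapping React's server renderer).

```javascript
// Wait for the location before rendering. Both callbacks are hypothetical
// placeholders for whatever lookup and render functions the app really uses.
async function renderForRequest(ip, lookupLocation, renderPage) {
  let location = null;
  try {
    location = await lookupLocation(ip);
  } catch (err) {
    // A failed lookup should not break the page; render without a location.
    location = null;
  }
  return renderPage({ location });
}
```

Catching the lookup error means that an outage of the geo service only costs the user the location-aware results, not the whole page.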
You could also use a library like redial to decorate your components with data requirements, and return a Promise you can await on to know when you're okay to render.
Since you probably want to enable client routing too (i.e. where the user can click on a route in their browser, and the server isn't touched at all), you will probably need some way to retrieve the IP address/results based on that IP even when the server isn't involved in the initial render.
For that, you could write a REST service that retrieves the results. Or write a GraphQL back-end that gets the data. It doesn't matter how you write it, since the server will have access to the X-Forwarded-For header and can use that to retrieve the results and send back location-aware data.
FYI, I'm writing a React starter kit (called ReactNow) that uses rxjs for handling async streams. It's not ready yet, but it might help you figure out the code layout that would offer a balanced mix between rendering on the server, and writing universal code that requires some heavy lifting from the server.

Hide sensitive HTTP Server Response in Nodejs

I've been hacking my way through the internet for the past half hour looking for an answer to this issue. I want to build an online system for school testing running on nodejs. This app's front end would request questions "and corresponding answers" from the backend, and this information would be delivered to the front end. The whole purpose of this is that the app should calculate test scores instantly and display them.
Now, in the browsers network tab, we can see server responses, and if I were to build an app that submits both questions and answers, any student could just peep at the answers in the dev console and get perfect scores.
One way would've been to deliver the questions alone, send the answers back to the server for scoring, and then return the score, but that doesn't feel "real-time".
REQUEST INFORMATION IS VERY OK
NEED TO REMOVE RESPONSE FROM BEING DISPLAYED IN BROWSERS DEV CONSOLE
So, how can I safely transport this information to the front end, but hide it from showing in the dev console headers response zone in a browser? Or any ideas on how I can implement this real-time concept without losing out on security.
Thanks.
You can do something like serializing it to binary and, when they send the answer, deserializing it back and checking the answer. That way, even if they look at the network tab, they will only see binary they could not understand.
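Keep in mind that anything the browser receives, binary or not, can in principle be decoded by a determined student. The alternative mentioned in the question, sending only the questions and scoring on the server, can be sketched as follows. The question bank and function names are hypothetical and for illustration only.

```javascript
// Hypothetical in-memory question bank; the answer key never leaves the server.
const QUESTIONS = [
  { id: 'q1', text: '2 + 2 = ?', choices: ['3', '4', '5'], answer: '4' },
  { id: 'q2', text: 'Capital of France?', choices: ['Paris', 'Rome'], answer: 'Paris' },
];

// What the browser receives: the questions with the `answer` field removed.
function publicQuestions(questions = QUESTIONS) {
  return questions.map(({ answer, ...rest }) => rest);
}

// Score a submission of the form { q1: '4', q2: 'Rome' } on the server.
function scoreSubmission(submitted, questions = QUESTIONS) {
  const correct = questions.filter((q) => submitted[q.id] === q.answer).length;
  return { correct, total: questions.length };
}
```

A single small request per test (or per question) usually completes fast enough to feel instant to the student, while the answer key stays out of the network tab.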

My server keeps getting socketio requests every 2 seconds, but Google Analytics shows no one is there.

I've removed all socketio code, but either someone hasn't reloaded the page for days or something else is going on. For some reason the server is getting bombarded with socketio requests, which are failing because I removed all the code on both client and server. However, they are still coming. What can I do? Block the IPs?
I can't change the web domain name, which is a given. I can't think of any options. The requests are coming from about 6 different IPs. They would have been legit requests some weeks ago, but not now.
Are you worried that handling these requests will impede your server's performance? The only legitimate reason I can think of is that someone's browser cache hasn't been cleared properly since the update, assuming you enabled caching on your express server.
If your intention is to improve performance, I suggest handling that path early in the Express middleware chain so that the server can end the request as quickly as possible and minimize the load on the server.
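A sketch of such an early handler, assuming the clients still poll the default Socket.IO path /socket.io/ and that the app uses Express-style middleware. The function name and the 410 status code are choices made for this example.

```javascript
// Express-style middleware: answer leftover Socket.IO polling requests
// immediately, before any other middleware or route handler runs.
function rejectStaleSocketIo(req, res, next) {
  if (req.url.startsWith('/socket.io/')) {
    res.statusCode = 410; // Gone: the endpoint was removed on purpose
    res.end();
    return;
  }
  next();
}

// Usage, assuming an Express app object named `app`:
//   app.use(rejectStaleSocketIo); // register before all other middleware
```

Because the handler does no parsing, session lookup or logging, each stale request costs the server very little.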
If you want people to become aware that their requests are invalid, you could route the path to a JavaScript file that redirects the current page to another document. On that document, have directions that instruct the user to clear their browser cache in order to properly update their client.
Hope that helps.

Google app engine bot attack?

I have an application on Google App Engine that only runs cron jobs and uses a backend, so there are no incoming requests from any client. I noticed that a request from a user named 'niki-bot' was received, and I'm quite surprised, as my app URL does not appear anywhere; it is only used by the admin account, which sends the cron requests. Fortunately I had set up security on my crons, so this user got a 403 Forbidden response, but I'm still wondering how this could happen. Have any of you experienced something similar?
You were likely running the 'Awesome Screenshot' plugin in your browser, or similar software which leaks all your browsing history to an upstream service - that upstream service appears to return with a niki-bot crawler to scrape or do something with those 'impossible to otherwise find' URLs.
Read more about it here: https://mig5.net/content/awesome-screenshot-and-niki-bot
As I think you are aware, backends are addressable from the outside world; it's only the public/private status and the security level applied to the endpoints that determine whether the calls are successful.
Regarding how a bot would have gotten your App ID, I suppose they could just be trying random ones to see if there is anything they can exploit.
Were the requests for standard admin endpoints? I get many random requests for the PHP files below, and my app isn't even on PHP. People just trying to attack known systems (this is on my front-end module):
/mysqladmin/scripts/setup.php
/myadmin/scripts/setup.php
/MyAdmin/scripts/setup.php
/pma/scripts/setup.php
/phpMyAdmin/scripts/setup.php
/phpmyadmin/scripts/setup.php
/db/scripts/setup.php
/dbadmin/scripts/setup.php
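If you want to spot or reject such probes early, a simple path check may be enough. This is only a sketch: the pattern covers just the paths listed above, and the function name is made up for illustration.

```javascript
// Matches the phpMyAdmin-style setup probes listed above, case-insensitively.
const PROBE_PATTERN =
  /^\/(mysqladmin|myadmin|pma|phpmyadmin|db|dbadmin)\/scripts\/setup\.php$/i;

// Returns true when the request path looks like one of the known probes.
function isProbePath(path) {
  return PROBE_PATTERN.test(path);
}
```

A request handler could answer matching paths with a 403 or 404 right away, or the check could be used to filter these hits out of the logs.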

Distributing Twitter Widget Among Application Users

Hey guys, is there any way to circumvent the Twitter rate limit by using a Twitter widget and embedding it in the end user's browser? In other words, would using the Twitter Search widget as part of the user's browser session (while they are using my app) mean that their calls to Twitter are made through their IP address (and not the IP address of my app)? I would do this to avoid getting the IP of my app blacklisted. Is that fine, or would that violate Twitter's terms of use?
I would use the Twitter search widget. Would using Twitter stream be a better idea?
Depending on your implementation, you may want to consider the Streaming API for this purpose. It's probably considered more "kosher". You can query for a particular set of phrases and open what's called a firehose, and Twitter will push updates to your application. It's not really bound by the usual rate limits, although there is a rate limit system in place here too. For my particular use case, this didn't work and I had to do what you described in your question. But if you want to use the Twitter Streaming API with PHP, I would highly recommend looking at the 140 Twitter Server framework first. It will make the Streaming API a lot easier to implement from the get-go.
This is fine, and it is the solution I'm using. Use jQuery or something similar for the Ajax calls and send the response to the server for processing. The load is then carried by the IP of each user of your application, so if a user is spamming Twitter with requests, they would get blacklisted, not your application.
