AWS CloudFront - check distribution's traffic by API before disabling

I have a list of AWS CloudFront distributions, and some of them use the default certificate.
I want to disable these distributions, but first I want to make sure they are not in use.
I would like to accomplish this with the API/SDK. How can I do that?
Is there a standard way to verify that a distribution is not being accessed at all?
I was thinking of somehow using the CloudFront logging capability, but the logs contain a lot of irrelevant data.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/logging.html
Is there a simple way to do it? Say, a query to AWS that tells me how many requests were made to a specific distribution in a given time interval.

If you have logging enabled, you can query the logs with Athena by host or distribution ID to find out how many requests were made to a particular distribution.
https://docs.aws.amazon.com/athena/latest/ug/cloudfront-logs.html
Or you can look at the Monitoring section for CloudFront in the console, which shows the number of requests per distribution; those numbers come from CloudWatch metrics, so they can also be queried through the API.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/monitoring-using-cloudwatch.html
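For the API route specifically: CloudFront publishes per-distribution CloudWatch metrics (in us-east-1, under the AWS/CloudFront namespace), so you can sum the Requests metric over a time window. A minimal sketch with the AWS SDK for JavaScript v3; the distribution ID below is a placeholder:
import { CloudWatchClient, GetMetricStatisticsCommand } from "@aws-sdk/client-cloudwatch";

// CloudFront metrics live in us-east-1 regardless of where viewers are.
const cloudwatch = new CloudWatchClient({ region: "us-east-1" });
const distributionId = "E1EXAMPLE12345"; // placeholder distribution ID

async function requestCount(start: Date, end: Date): Promise<number> {
  const result = await cloudwatch.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/CloudFront",
      MetricName: "Requests",
      Dimensions: [
        { Name: "DistributionId", Value: distributionId },
        { Name: "Region", Value: "Global" },
      ],
      StartTime: start,
      EndTime: end,
      Period: 86400, // one datapoint per day
      Statistics: ["Sum"],
    })
  );
  // No datapoints at all means CloudWatch recorded no requests in the window.
  return (result.Datapoints ?? []).reduce((total, dp) => total + (dp.Sum ?? 0), 0);
}

// Example: total requests over the last 30 days.
const end = new Date();
const start = new Date(end.getTime() - 30 * 24 * 60 * 60 * 1000);
requestCount(start, end).then((n) => console.log(`Requests: ${n}`));
If the sum is zero over a long enough window, the distribution is a reasonable candidate for disabling.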

Related

How to serve a Node.js service for worldwide customers, and fast?

I have a local VPS in my country that hosts and serves my Node.js REST API.
However, I will soon need to open it up to other countries.
That means remote clients will be calling my services.
Since they are far away, the connection will probably be slow.
How can I avoid this? Maybe I need more servers located in their countries too, but then how can the data be shared over one DB?
I am not looking for a full tutorial on how to do this (though that would be nice to have); I am looking for information about the methodology.
What do you recommend: keep buying servers in remote countries and share their data between them somehow, or use a cloud service like Firebase? How do cloud services work in the first place?
Without going into too much detail on each item, here are some key points that I think you should focus your learning on to solve this problem.
For data storage, look into Firestore (not the JSON Realtime Database), as Firestore is globally scalable.
For your REST endpoints I would use Google Cloud Functions, but without knowing the nature of your application it's hard to say whether that is suitable. The key to reaching global scale is having cacheable endpoints, so that you are leveraging Google's global CDN, which is much faster than hitting the origin server. Note: the Firebase Cloud Functions infrastructure WILL face cold-start issues, which may or may not be a problem for you.
Cache invalidation is a little lacking, so you can use longer max-age cache settings combined with cache busting and/or the stale-while-revalidate header to help with this.
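As a rough illustration of the caching point (assuming an HTTP Cloud Function on the Node.js runtime; the handler name and the exact max-age values are just examples, not a recommendation):
import type { Request, Response } from "express";

// HTTP Cloud Functions receive Express-style request/response objects.
export function api(req: Request, res: Response): void {
  // Let the CDN cache the response for 5 minutes, and serve a stale copy
  // for up to 60 extra seconds while it revalidates in the background.
  res.set("Cache-Control", "public, max-age=300, s-maxage=300, stale-while-revalidate=60");
  res.json({ message: "hello from a cacheable endpoint" });
}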
There is some great info here https://www.youtube.com/watch?v=dbV-293m1dQ that covers some of what I have mentioned in more detail.

Express-rate-limit vs NGINX in a node server

I'm currently using the express-rate-limit module to block multiple requests from the same IP or logged-in user account in my Node server, and this is working pretty well against DoS attacks. The server is for a small local business and requires only one instance, as it doesn't have too many users and its computing requirements aren't too intensive.
I've been reading a lot about NGINX lately, and many people recommend using it with Node servers, but I can't see the major advantages of using it in this kind of application.
How would NGINX be better for my application? What can it do that npm modules can't, in terms of security, for a single-server application?
Well, I am not an NGINX expert, but I currently use NGINX in production on my EC2 instance. When it comes to rate limiting, there are a couple of options available with Express:
You can use Redis as a store, get the IP address of each incoming request, and check how many hits it currently has before deciding to serve it. This could be a middleware that runs on all routes.
You could use a library like express-rate-limit or rate-limiter-flexible, which will handle the Redis part for you. (A rough sketch of the first approach follows.)
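Here is a minimal sketch of the first option, a hand-rolled Express middleware using Redis INCR/EXPIRE (the window, limit and key scheme are made up for illustration):
import express from "express";
import { createClient } from "redis"; // node-redis v4

const app = express();
const redis = createClient(); // assumes Redis on localhost:6379
await redis.connect();        // top-level await (ESM module)

const WINDOW_SECONDS = 60;
const MAX_HITS = 100;

// Count hits per client IP in a fixed window and reject requests over the limit.
app.use(async (req, res, next) => {
  const key = `ratelimit:${req.ip}`;
  const hits = await redis.incr(key);
  if (hits === 1) {
    await redis.expire(key, WINDOW_SECONDS); // start the window on the first hit
  }
  if (hits > MAX_HITS) {
    res.status(429).send("Too many requests");
    return;
  }
  next();
});

app.get("/", (_req, res) => res.send("ok"));
app.listen(3000);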
Now, when you take NGINX, it is a web server whose strongest point, to be precise, is not rate limiting. It still supports rate limiting, though, if you modify the configuration. HERE is an insight into NGINX rate limiting.
Another option you haven't considered is HAProxy, a load balancer that is often considered superior for tasks such as rate limiting. You can read about it HERE.
Let's talk about the second part of your question.
Rate limiting inside an application is a bad idea. It does not belong in the application as such; it is not part of the business logic. Also, it does not work well in clustered mode (more than one core running Express at the same time) unless you tweak it to support clustering.
Rate limiting via NGINX configuration needs just a couple of extra lines, as shown in the link I posted earlier. If you suddenly want to add an extra route or exempt some route from rate limiting, NGINX can do that easily.
If you want to exempt your CloudFront addresses or CDN server addresses from being rate limited, you can add a whitelist of IPs to the NGINX conf so that they are exempted. Doing this in the application would be a real pain, as you would have to commit, redeploy, etc. THIS answer covers how to exempt addresses.

Securing Elasticsearch

I am completely new to Elasticsearch, but I like it very much. The only thing I can't find and can't get done is securing Elasticsearch for production systems. I have read a lot about using NGINX as a proxy in front of Elasticsearch, but I have never used NGINX and never worked with proxies.
Is this the typical way to secure Elasticsearch in production systems?
If so, are there any tutorials or nice reads that could help me implement this? I really would like to use Elasticsearch in our production system instead of Solr and Tomcat.
There's an article about securing Elasticsearch which covers quite a few points to be aware of here: http://www.found.no/foundation/elasticsearch-security/ (Full disclosure: I wrote it and work for Found)
There are also some things you should know here: http://www.found.no/foundation/elasticsearch-in-production/
To summarize the summary:
At the moment, Elasticsearch does not consider security to be its job. Elasticsearch has no concept of a user. Essentially, anyone that can send arbitrary requests to your cluster is a “super user”.
Disable dynamic scripts. They are dangerous.
Understand that sometimes tricky configuration is required to limit access to indexes.
Consider the performance implications of multiple tenants; a weakness or a bad query in one can bring down an entire cluster!
Proxying ES traffic through nginx with, say, basic auth enabled is one way of handling this (but use HTTPS to protect the credentials). Even without basic auth in your proxy rules, you might, for instance, restrict access to various endpoints to specific users or from specific IP addresses.
What we do in one of our environments is use Docker. Docker containers are only accessible to the outside world and/or to other Docker containers if you explicitly define them as such. By default, they are isolated.
In our docker-compose setup, we have the following containers defined:
nginx - Handles all web requests, serves up static files and proxies API queries to a container named 'middleware'
middleware - A Java server that handles and authenticates all API requests. It interacts with the following three containers, each of which is exposed only to middleware:
redis
mongodb
elasticsearch
The net effect of this arrangement is that access to Elasticsearch can only go through the middleware piece, which ensures authentication, roles and permissions are correctly handled before any queries are sent through.
A full Docker environment is more work to set up than a simple NGINX proxy, but the end result is something that is more flexible, scalable and secure.
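The middleware in that setup is a Java server, but purely to illustrate the pattern (authenticate first, then forward a narrow set of queries to an Elasticsearch container that is not otherwise exposed), a minimal Node/TypeScript sketch might look like the following; the route, header name and environment variable are made up, and Node 18+ is assumed for the built-in fetch:
import express from "express";

const app = express();
app.use(express.json());

// Reachable only on the internal Docker network, never published to the host.
const ES_URL = "http://elasticsearch:9200";

// Naive check for illustration; a real middleware would verify sessions/roles.
function isAuthorized(req: express.Request): boolean {
  return req.header("x-api-key") === process.env.API_KEY;
}

// Expose one narrow, authenticated search endpoint instead of raw Elasticsearch.
app.post("/search/:index", async (req, res) => {
  if (!isAuthorized(req)) {
    res.status(401).send("unauthorized");
    return;
  }
  const esResponse = await fetch(`${ES_URL}/${req.params.index}/_search`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req.body), // pass the query body through as-is
  });
  res.status(esResponse.status).json(await esResponse.json());
});

app.listen(8080);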
Here's a very important addition to the info presented in answers above. I would have added it as a comment, but don't yet have the reputation to do so.
While this thread is old(ish), people like me still end up here via Google.
Main point: this link is referenced in Alex Brasetvik's post:
https://www.elastic.co/blog/found-elasticsearch-security
He has since updated it with this passage:
Update April 7, 2015: Elastic has released Shield, a product which provides comprehensive security for Elasticsearch, including encrypted communications, role-based access control, AD/LDAP integration and Auditing. The following article was authored before Shield was available.
You can find a wealth of information about Shield here.
A key point to note is that this requires Elasticsearch version 1.5 or newer.
I also had the same question, and I found a plugin provided by the Elasticsearch team: Shield. It is available only in a limited form for evaluation; for production use you need to buy a license. Please find the attached link for your perusal.
https://www.elastic.co/guide/en/shield/current/index.html

Fetching data from Cerner's EMR / EHR

I don't have much experience in the medical domain.
We are evaluating a requirement from our client, who uses the Cerner EMR system.
As per the requirement, we need to expose the Cerner EMR, or fetch some EMR / EHR data, and display it in a SharePoint 2013 portal.
To meet this requirement, what kind of integration options does Cerner propose? Are there any APIs or web services exposed that can be used to build custom solutions for this?
As far as I know, Cerner does expose EMR / EHR information in HL7 format, but I don't have any idea how to access it.
I have also asked Cerner about this and am awaiting replies from their end.
If anybody who has worked on a similar kind of job can throw some light on this and provide me with some insights, that would be appreciated.
You will need to request an interface between your organization and the facility with the EMR. An interface in the Health Care IT world is not the same as a GUI. It is the mechanism (program/tool) that transfers HL7 data between one entity and the other. There will probably be a cost to have an interface set up. However, that is the traditional way Cerner communicates with third parties. HIPAA laws will require that this connection be very secure.
You might also see if the facility with the EMR has an existing interface that produces the info you are after. You may be able to share that data or have a flat file generated from that interface that you could get access to. Because of HIPAA regulations, your client may be reluctant to share information in that manner.
I would suggest you start with your client's interface/integration team. They would be the ones that manage the information into and out of Cerner. They could also shed some light on how they prefer to see things done.
Good Luck
There are two ways of achieving this that I know of. One is direct connectivity to Cerner's Oracle database. This seems unlikely to be possible, as Cerner doesn't allow other vendors to have direct access to its database.
The other way is to use Cerner's mPage web services. We have achieved this using mPage web services. The client needs to host the web services on an IBM WAS or some other container; we used WAS as that was readily available to us. Once the hosting is done, you will get a URL, and using that you can execute any CCL program, which will return the data in JSON/XML format. The mPage web service uses basic HTTP authentication.
Now, the CCL has to be written in a way that returns the data you require.
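I have no Cerner environment to verify against, so purely as an illustration of the client side (calling a basic-auth-protected endpoint and parsing a JSON response; the URL, program name and credentials below are entirely made up):
// Node 18+ (built-in fetch). Everything below is hypothetical.
const url = "https://emr.example-hospital.com/mpage-ws/ccl/MY_CCL_PROGRAM";
const credentials = Buffer.from("username:password").toString("base64");

async function fetchEmrData(): Promise<unknown> {
  const response = await fetch(url, {
    headers: { Authorization: `Basic ${credentials}` }, // basic HTTP authentication
  });
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status}`);
  }
  return response.json(); // the CCL program determines what this contains
}

fetchEmrData().then((data) => console.log(data));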
We have a successful setup and have been working on this since 2014. For more details, you can also try the uCERN portal.
Thanks,
Navin

Automatically sync two Amazon S3 buckets, besides s3cmd?

Is there another automated way of syncing two Amazon S3 buckets besides using s3cmd? Maybe Amazon has this as an option? The environment is Linux, and every day I would like to sync new & deleted files to another bucket. I hate the thought of keeping all my eggs in one basket.
You could use the standard AWS CLI to do the sync.
You just have to do something like:
aws s3 sync s3://bucket1/folder1 s3://bucket2/folder2
(Add the --delete flag if you also want deletions in the source to be propagated to the destination.)
http://aws.amazon.com/cli/
S3 buckets != baskets
From their site:
Data Durability and Reliability
Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 Region. To help ensure durability, Amazon S3 PUT and COPY operations synchronously store your data across multiple facilities before returning SUCCESS. Once stored, Amazon S3 maintains the durability of your objects by quickly detecting and repairing any lost redundancy. Amazon S3 also regularly verifies the integrity of data stored using checksums. If corruption is detected, it is repaired using redundant data. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data.
Amazon S3’s standard storage is:
Backed with the Amazon S3 Service Level Agreement.
Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.
Designed to sustain the concurrent loss of data in two facilities.
Amazon S3 provides further protection via Versioning. You can use Versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. This allows you to easily recover from both unintended user actions and application failures. By default, requests will retrieve the most recently written version. Older versions of an object can be retrieved by specifying a version in the request. Storage rates apply for every version stored.
That's very reliable.
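If you want that extra protection, versioning is enabled per bucket; here is a small sketch with the AWS SDK for JavaScript v3 (the bucket name is a placeholder):
import { S3Client, PutBucketVersioningCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Keep every overwritten or deleted object as a previous version.
// (top-level await; ESM module)
await s3.send(
  new PutBucketVersioningCommand({
    Bucket: "my-bucket", // placeholder bucket name
    VersioningConfiguration: { Status: "Enabled" },
  })
);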
I'm looking for something similar and there are a few options:
Commercial applications like s3RSync. Additionally, CloudBerry for S3 provides PowerShell extensions for Windows that you can use for scripting, but I know you're using *nix.
AWS API + (favorite language) + cron (hear me out). It would take a decently savvy person with no experience with AWS's libraries a short time to build something to copy and compare files (using the ETag feature of the S3 keys): just provide a source/target bucket and credentials, iterate through the keys, and issue the native "Copy" command in AWS. I used Java. If you use Python and cron, you could make short work of a useful tool.
I'm still looking for something already built that's open source or free, but #2 is really not a terribly difficult task; a rough sketch follows.
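For what it's worth, here is a rough sketch of option 2 with the AWS SDK for JavaScript v3 (bucket names are placeholders; it ignores deletions, and note that ETags are not a reliable equality check for multipart or KMS-encrypted uploads):
import { S3Client, ListObjectsV2Command, HeadObjectCommand, CopyObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
const SOURCE = "source-bucket"; // placeholder
const TARGET = "target-bucket"; // placeholder

async function syncBuckets(): Promise<void> {
  let token: string | undefined;
  do {
    // List the source bucket page by page (1,000 keys per page).
    const page = await s3.send(new ListObjectsV2Command({ Bucket: SOURCE, ContinuationToken: token }));
    for (const obj of page.Contents ?? []) {
      if (!obj.Key) continue;
      // Fetch the target's ETag, if the key exists there at all.
      const targetEtag = await s3
        .send(new HeadObjectCommand({ Bucket: TARGET, Key: obj.Key }))
        .then((r) => r.ETag)
        .catch(() => undefined);
      if (targetEtag !== obj.ETag) {
        // Server-side copy; keys with special characters may need URL-encoding in CopySource.
        await s3.send(new CopyObjectCommand({ Bucket: TARGET, Key: obj.Key, CopySource: `${SOURCE}/${obj.Key}` }));
      }
    }
    token = page.NextContinuationToken;
  } while (token);
}

syncBuckets().catch(console.error);
Run it from cron (or a scheduled Lambda) and you have the daily sync the question asks for.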
EDIT: I came back to this post and realized that nowadays Attunity CloudBeam is also a commercial solution used by many folks.
It is now possible to replicate objects between buckets in two different regions via the AWS Console.
The official announcement on the AWS blog explains the feature.
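Replication can also be set up programmatically. A hedged sketch with the AWS SDK for JavaScript v3, using the older Prefix-style rule schema (check the current API reference for the Filter-based form); the role ARN and bucket names are placeholders, and both buckets must already have versioning enabled:
import { S3Client, PutBucketReplicationCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// (top-level await; ESM module)
await s3.send(
  new PutBucketReplicationCommand({
    Bucket: "source-bucket", // placeholder
    ReplicationConfiguration: {
      // IAM role that S3 assumes to replicate objects (placeholder ARN).
      Role: "arn:aws:iam::123456789012:role/s3-replication-role",
      Rules: [
        {
          Status: "Enabled",
          Prefix: "", // replicate every object
          Destination: { Bucket: "arn:aws:s3:::target-bucket" }, // placeholder
        },
      ],
    },
  })
);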
