I am completely new to Elasticsearch, but I like it very much. The only thing I can't figure out or get done is how to secure Elasticsearch for production systems. I have read a lot about using nginx as a proxy in front of Elasticsearch, but I have never used nginx and have never worked with proxies.
Is this the typical way to secure elasticsearch in production systems?
If so, are there any tutorials or good reads that could help me implement it? I would really like to use Elasticsearch in our production system instead of Solr and Tomcat.
There's an article about securing Elasticsearch which covers quite a few points to be aware of here: http://www.found.no/foundation/elasticsearch-security/ (Full disclosure: I wrote it and work for Found)
There are also some things you should know here: http://www.found.no/foundation/elasticsearch-in-production/
To summarize the summary:
At the moment, Elasticsearch does not consider security to be its job. Elasticsearch has no concept of a user. Essentially, anyone that can send arbitrary requests to your cluster is a “super user”.
Disable dynamic scripts; they are dangerous (see the config snippet below).
Understand that sometimes tricky configuration is required to limit access to specific indexes.
Consider the performance implications of multiple tenants: a weakness or a bad query in one tenant can bring down the entire cluster!
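For reference, a minimal sketch of the corresponding elasticsearch.yml settings (1.x era; the exact setting names have changed between versions, so verify them against the documentation for the version you run):

# elasticsearch.yml - illustrative settings only
script.disable_dynamic: true    # turn off dynamic scripting (the 1.0/1.1 setting name)
network.host: 127.0.0.1         # bind to localhost only and let a proxy face the network
http.port: 9200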
Proxying ES traffic through nginx with, say, basic auth enabled is one way of handling this (but use HTTPS to protect the credentials). Even without basic auth in your proxy rules, you might, for instance, restrict access to various endpoints to specific users or from specific IP addresses.
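As a rough, non-production sketch of that approach - assuming Elasticsearch listens on 127.0.0.1:9200 and the credentials live in an htpasswd file - the nginx side might look something like this:

# /etc/nginx/conf.d/elasticsearch.conf - illustrative sketch only
server {
    listen 443 ssl;
    server_name es.example.com;                  # placeholder hostname

    ssl_certificate     /etc/nginx/ssl/es.crt;   # placeholder certificate paths
    ssl_certificate_key /etc/nginx/ssl/es.key;

    auth_basic           "Elasticsearch";
    auth_basic_user_file /etc/nginx/.htpasswd;   # created with the htpasswd tool

    location / {
        proxy_pass http://127.0.0.1:9200;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}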
What we do in one of our environments is use Docker. Docker containers are only accessible to the outside world and/or to other Docker containers if you explicitly expose them; by default, they are isolated.
In our docker-compose setup, we have the following containers defined:
nginx - Handles all web requests, serves up static files and proxies API queries to a container named 'middleware'
middleware - A Java server that handles and authenticates all API requests. It interacts with the following three containers, each of which is exposed only to middleware:
redis
mongodb
elasticsearch
The net effect of this arrangement is that access to Elasticsearch is only possible through the middleware piece, which ensures that authentication, roles and permissions are handled correctly before any query is sent through.
A full Docker environment is more work to set up than a simple nginx proxy, but the end result is more flexible, scalable and secure.
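For illustration only - this is not our actual file - a stripped-down docker-compose.yml for this kind of layout might look roughly like the following. Note that only nginx publishes ports to the host, so the data stores are unreachable from outside the Docker network:

# docker-compose.yml - illustrative sketch; images, tags and build paths are placeholders
version: "2"
services:
  nginx:
    image: nginx
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - middleware
  middleware:
    build: ./middleware        # the Java API server; handles all authentication
    depends_on:
      - redis
      - mongodb
      - elasticsearch
  redis:
    image: redis
  mongodb:
    image: mongo
  elasticsearch:
    image: elasticsearch:1.7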
Here's a very important addition to the information in the answers above. I would have added it as a comment, but I don't yet have the reputation to do so.
While this thread is old(ish), people like me still end up here via Google.
Main point: this link is referenced in Alex Brasetvik's post:
https://www.elastic.co/blog/found-elasticsearch-security
He has since updated it with this passage:
Update April 7, 2015: Elastic has released Shield, a product which provides comprehensive security for Elasticsearch, including encrypted communications, role-based access control, AD/LDAP integration and Auditing. The following article was authored before Shield was available.
You can find a wealth of information about Shield in Elastic's Shield documentation.
A key point to note is that Shield requires Elasticsearch version 1.5 or newer.
I also had the same question, but I found a plugin provided by the Elasticsearch team: Shield. It is a limited version; for production use you need to buy a license. Please find the attached link for your perusal.
https://www.elastic.co/guide/en/shield/current/index.html
I want to create and host 4-5 websites using the same database. The only difference between the sites will be:
branding (colours and header)
data will be filtered per website (through sql query) and
Each site will be on a separate domain (but can be hosted on same server)
My first thought was to use an API/REST model and provision five front-ends on their own sub-domains. But since the sites can be hosted on the same server (I'm assuming one hosting account that allows multiple sub-domains), I think I could simply connect all the sites to the same database via a connection string, avoiding the complexity of REST.
Is this possible, and would I run into database conflicts doing this?
If I later wanted to add a mobile app client, would I need to build out a REST interface anyway?
Thanks
The right thing to do here depends a lot on your specific use case, expected load, preferred backend/edge technology, future plans, etc.
Site domains and servers -
The main point here is that you can host your domains/subdomains on the same or different servers. You simply need to update the DNS to point to the correct IP (update the subdomain's A record).
Note: If these sites are all public-facing, then I highly recommend using an edge/proxy server (Nginx or Apache Web Server), and even considering a load balancer, depending on the expected number of visitors.
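As a small illustration of the DNS part mentioned above (zone-file syntax; names and IPs are placeholders):

; each site's A record points at whichever server hosts it
siteone.example.com.    3600  IN  A  203.0.113.10
sitetwo.example.com.    3600  IN  A  203.0.113.10   ; same server as siteone
sitethree.example.com.  3600  IN  A  198.51.100.7   ; a different server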
Decoupled architecture is almost always preferred -
I would definitely have an API/REST layer to abstract the database from the sites. This ensures that you establish a contract through which any clients can interact with the backend, including your mobile application. You also don't have to duplicate DB-specific code across the various clients. What if you decided to change your schema? Or even your database solution? Then all clients will be broken and your customers would be unhappy. As a guiding principle, think: if I change any one thing in my architecture, how many other things will need to change as a result? In terms of scalability, this architecture will also allow you to easily spin up more instances of whatever it is you need (databases, REST service, etc) should the need arise.
How do I build and deploy a REST API? -
Re: #2, to set up a simple custom REST service running on Node.js (and express), this is a good tutorial. The example also walks through setting up and integrating with an in-memory MongoDB database.
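Purely as a sketch of what such a service could look like (express plus the MongoDB driver; the database name, collection, 'site' query parameter and port are assumptions for illustration, not taken from the linked tutorial):

// npm install express mongodb   (mongodb driver 2.x style callbacks)
var express = require('express');
var MongoClient = require('mongodb').MongoClient;

var app = express();

MongoClient.connect('mongodb://localhost:27017/mysites', function (err, db) {
  if (err) throw err;

  // every site talks to the same API and passes its own site identifier
  app.get('/api/products', function (req, res) {
    db.collection('products')
      .find({ site: req.query.site })              // filter data per website
      .toArray(function (err, docs) {
        if (err) return res.status(500).send(err.message);
        res.json(docs);
      });
  });

  app.listen(3000);
});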
Database collisions? -
If you follow the above steps, this should be a moot point. Node.js/express and the databases expose ways to configure connection pools if the defaults do not suffice. Again, this will depend on your needs - how many concurrent users you expect.
I don't have much experience in the medical domain.
We are evaluating a requirement from a client who is using the Cerner EMR system.
As per the requirement, we need to fetch some EMR/EHR data from Cerner and display it in a SharePoint 2013 portal.
What kind of integration options does Cerner propose to meet this requirement? Are there any APIs or web services exposed that can be used to build custom solutions?
As far as I know, Cerner exposes EMR/EHR information in HL7 format, but I have no idea how to access it.
I have also asked Cerner about this and am awaiting a reply from their end.
If anybody has worked on a similar kind of job, please shed some light and share some insights.
You will need to request an interface between your organization and the facility with the EMR. An interface in the Health Care IT world is not the same as a GUI; it is the mechanism (program/tool) that transfers HL7 data between one entity and the other. There will probably be a cost to have an interface set up. However, that is the traditional way Cerner communicates with third parties. HIPAA laws will require that this connection be very secure.
You might also see if the facility with the EMR has an existing interface that produces the info you are after. You may be able to share that data or have a flat file generated from that interface that you could get access to. Because of HIPAA regulations, your client may be reluctant to share information in that manner.
I would suggest you start with your client's interface/integration team. They would be the ones that manage the information into and out of Cerner. They could also shed some light on how they prefer to see things done.
Good Luck
There are two ways of achieving this, as far as I know. One is direct connectivity to Cerner's Oracle database. This seems unlikely to be possible, as Cerner doesn't allow other vendors direct access to its database.
The other way is to use Cerner's mPage Web Services, which is what we have used. The client needs to host the web services on an IBM WAS (WebSphere Application Server) or some other container; we used WAS because it was readily available to us. Once the hosting is done, you will get a URL, and using that you can execute any CCL program, which will return the data in JSON/XML format. The mPage web service uses basic HTTP authentication.
The CCL then has to be written in a way that returns the data you require.
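As a rough illustration only: once you have that URL, calling it from any client is an ordinary HTTP request with basic auth. The host, path and credentials below are placeholders - the real values come from your WAS deployment and your CCL program:

// Node.js sketch of calling the hosted web service; every value here is a placeholder
var https = require('https');

https.get({
  hostname: 'was.hospital.example',      // hypothetical WAS host
  path: '/mpage-ws/my-ccl-program',      // hypothetical path to the CCL program
  auth: 'serviceuser:secret',            // basic HTTP authentication
  headers: { Accept: 'application/json' }
}, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    console.log(JSON.parse(body));       // the CCL program's output as JSON
  });
}).on('error', console.error);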
We have a successful setup and have been working on this since 2014. For more details you can also try uCERN portal.
Thanks,
Navin
I've seen the module named 'standalone' in the ModSecurity package, but I'm not sure how to use it after building and installing it.
Are there any good resources for getting started?
It does not appear to be possible, based on what the ModSecurity website says about its modes of operation:
Reverse proxies are effectively HTTP routers, designed
to stand between web servers and their clients. When you install a
dedicated Apache reverse proxy and add ModSecurity to it, you get a
"proper" network web application firewall, which you can use to
protect any number of web servers on the same network. Many security
practitioners prefer having a separate security layer. With it you get
complete isolation from the systems you are protecting. On the
performance front, a standalone ModSecurity will have resources
dedicated to it, which means that you will be able to do more (i.e.,
have more complex rules). The main disadvantage of this approach is
the new point of failure, which will need to be addressed with a
high-availability setup of two or more reverse proxies.
They consider it "standalone" by creating a dedicated host that is used only for proxying to internal hosts.
That works, but it's technically not standalone.
I also filed a bug, and it was confirmed by Felipe Zimmerle:
Standalone is a wrapper to Apache internals that allows ModSecurity to be executed. That wrapper still demand Apache pieces. It is true that you can extend your application using the Standalone version although, you will need some Apache pieces
As you have noted, ModSecurity is an add-on to an existing web server - originally an Apache module (hence the name), but now also available for Nginx and IIS.
You can run it in embedded mode (i.e. as part of your main web server), or in reverse proxy mode (which is basically the same, except that you set up a separate web server, run ModSecurity on that, and direct all traffic through it).
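A rough sketch of the reverse proxy flavour with Apache (module paths, rule locations and the backend address are placeholders):

# httpd.conf fragment - a dedicated reverse proxy with ModSecurity, illustrative only
LoadModule proxy_module       modules/mod_proxy.so
LoadModule proxy_http_module  modules/mod_proxy_http.so
LoadModule security2_module   modules/mod_security2.so

<IfModule security2_module>
    SecRuleEngine On
    Include /etc/modsecurity/crs/*.conf      # e.g. the OWASP Core Rule Set
</IfModule>

<VirtualHost *:80>
    ServerName www.example.com
    ProxyPreserveHost On
    ProxyPass        / http://10.0.0.5:8080/
    ProxyPassReverse / http://10.0.0.5:8080/
</VirtualHost>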
To be perfectly honest, I've never found much point in the reverse proxy method. I guess it does mean you could use it with non-supported web servers (i.e. if you are not using Apache, Nginx or IIS), and it would reduce the load on your main web server, but other than that it seems like an extra step and extra infrastructure for no real gain. Some people might also prefer to do the ModSecurity checks in front of several web servers, but I would argue that if you have several web servers, it is likely for performance and resiliency reasons, so why not spread ModSecurity to that level too rather than creating a single point of failure that might become a bottleneck in front of them? The only other reason would be to apply session-level rules (e.g. if people are changing session IDs) across sessions that might be spread between different web servers, but I've never been convinced that those rules are that great anyway.
When I build ModSecurity I get a mod_security2.so library but no separate standalone file(s), so I presume you're just seeing this from hunting through the source (I do see a "standalone" folder there)? I'd say that just because there is a "standalone" folder in the source is no guarantee that it can run as a completely separate, standalone piece.
I'd also question why you would want to run this as a standalone app even if you could. Web servers have a lot of functionality in them, and depending on ModSecurity alone - which was written for web security, not for web security plus all the other things a web server does (being fast, understanding the HTTP protocol, gzipping and ungzipping, etc.) - needlessly stretches what ModSecurity would need to handle. So why not let a web server take care of all that and let ModSecurity do what it's good at?
If you are using ModSecurity then I guess you have web apps (presumably with a web server), so why not use it through that?
Finally, is there any problem with installing this through Apache (or Nginx or IIS)? It's free software that's well supported and easy to set up.
I guess ultimately I don't understand the reason for your question. Is there a particular problem you are trying to solve, or is this more just curiosity?
Is there an easy way to manage offline data in a web app and synchronize with a server when there is a connection? I have been looking at Meteor, CouchDB and the like, but I'm still not sure what the least painful way would be.
I could of course implement it myself with sockets or something similar, but if something is already made for the purpose, I don't see a reason to do it again.
I'm planning to work with Node as the server.
Thanks
You're talking about two things: 1) how to store/persist data if/when offline (the storage mechanism), and 2) how to synchronize with a server when online (the communication mechanism). The answer to 1 is some kind of local storage, and there are several ways of doing that (localStorage, WebSQL, filesystem APIs, etc.) depending on your platform. The answer to 2 really depends on how urgent your synchronization needs are, but in general you can use HTTP itself with periodic (long-) polling, websockets and the like.
On top of both storage and communication mechanisms there are numerous libraries that make the job simpler, like Meteor (communication) and CouchDB (storage), but also many many more. There are even libraries that take care of the actual synchronization mechanism (with possible conflict resolution as well), but this very much depends on your actual application.
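To make those two pieces concrete, here is a bare-bones browser-side sketch: localStorage for 1) and plain HTTP for 2). The '/api/sync' endpoint and the queue format are assumptions for the example, and conflict resolution is left out entirely:

// queue changes locally while offline, flush them to the server when connectivity returns
function queueChange(change) {
  var queue = JSON.parse(localStorage.getItem('pendingChanges') || '[]');
  queue.push(change);
  localStorage.setItem('pendingChanges', JSON.stringify(queue));
}

function flushQueue() {
  var queue = JSON.parse(localStorage.getItem('pendingChanges') || '[]');
  if (!navigator.onLine || queue.length === 0) return;

  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/api/sync');                        // hypothetical sync endpoint
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.onload = function () {
    if (xhr.status === 200) localStorage.removeItem('pendingChanges');
  };
  xhr.send(JSON.stringify(queue));
}

window.addEventListener('online', flushQueue);   // sync when the connection comes back
setInterval(flushQueue, 30000);                  // and periodically as a fallback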
Updated: This framework looks promising, but I haven't tested it myself:
http://blog.nateps.com/announcing-racer-experimental-realtime-model
You might want to look at cloud services as well. These are best if you are developing a new application as they push you more to a serverless model, and of course you have to be happy using a service.
Simperium is an interesting cloud service - the only one I can find today that does syncing (unlike Firebase and Spire.io, which are similar in other respects). For iOS it includes offline storage, while for JavaScript clients you'd need to cover the local storage yourself using HTML5 features. Backbone.js seems to have some support for this, and Simperium can integrate with Backbone, using a similar API style.
For non-cloud services, Derbyjs is an open source project that includes Racer, a data synchronization library (mentioned by the earlier answer) - both are under rapid development and not yet complete, but look interesting if your timescales allow, and they don't require a cloud service. There is a useful comparison of Derbyjs to Meteor - although it's written by the Derbyjs developers, it's not too biased.
I also looked at CouchDB, which has some interesting built-in replication features, but I didn't like its use of indexes that are updated lazily when a query needs them (or by a batch process), and I wasn't happy with exposing the server DB directly to clients to enable replication/sync. Generally I think it's best to decouple the client side local storage from the server side DB, and of course for a web app it would be hard to use CouchDB on the client.
Are there any examples or conventions out there of how to use node.js to host multiple web apps?
I'm already aware that node itself can be used to build a server, but I'm curious as to whether there have been implementations where you aren't necessarily running it all the time. Strictly for the reason that perhaps there are multiple sites being hosted, each with their own copy of a framework, static files and custom functionality.
Or maybe you do run one instance of node and code a multi-site architecture to ensure one bad site doesn't take the server down in some way?
Virtual hosts, ensuring that one site can't crash the others... these are all things that have been considered on other platforms, but I have had some difficulty finding the equivalent for node! :)
I am already aware of connect, express and other middleware; however, they don't cover what I'm asking here.
If you're worried about runtime isolation, each "site" should run its own node process. Then use a proxy like node-http-proxy that will do host-header based routing. Another great node based option is bouncy, but you don't necessarily need to use node to do the host based routing. You could just as well use haproxy, nginx, etc.
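A minimal sketch of that host-header routing with the http-proxy module (hostnames and ports are placeholders; each backend would be its own node process):

// npm install http-proxy - route by Host header to per-site node processes
var http = require('http');
var httpProxy = require('http-proxy');

var proxy = httpProxy.createProxyServer({});

var routes = {
  'siteone.example.com': 'http://127.0.0.1:3001',
  'sitetwo.example.com': 'http://127.0.0.1:3002'
};

http.createServer(function (req, res) {
  var host = (req.headers.host || '').split(':')[0];   // strip any port suffix
  var target = routes[host];
  if (!target) {
    res.writeHead(404);
    return res.end('Unknown host');
  }
  proxy.web(req, res, { target: target }, function () {
    res.writeHead(502);                                // the backend for this host is down
    res.end('Bad gateway');
  });
}).listen(80);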
The baseline RAM overhead of each node process is very small (~10mb - 15mb). Also, if you do HTTP based routing you can spread your sites easily across machines, user home directories, etc.
If you want to handle the site/host registration programmatically, I would use seaport and then communicate the hostname and host + port details back to the proxy so that the routing table can be dynamic. This would also make it fairly easy to scale a site across multiple node processes.
Good luck!