The default Accumulo monitor lets you check the status of servers and scans. As far as I know, it neither offers a way to see the actual data nor allows you to perform actions such as insertions, updates, or deletions.
For many databases there are web interfaces which offer this, such as Adminer or phpMyAdmin. Is there any project which offers something like that for Accumulo?
If you enable the Accumulo monitor to run with SSL (see http://accumulo.apache.org/1.7/accumulo_user_manual.html), you can access the Accumulo shell through your web browser.
We plan to use Cassandra 3.x and we want to allow our customers to connect to Cassandra directly for exporting the data into their data warehouses.
They will connect remotely via ODBC.
Is there any way to prevent customers from executing huge or bad SELECT statements that would put a high load on all the nodes? In our replication strategy we use an extra data center to which only customers can connect, so the live system will not be affected, but we also want to run some workers on this shadow system. The most important thing is that a connected remote client must not have any noticeable impact on other remote connections or on our local worker jobs. There is already a materialized view, and I want to force customers to fetch data by primary key only (i.e. disallow the use of ALLOW FILTERING). It would also be great if one could limit the number of rows returned (e.g. to 1 million) to prevent a pull of all the data.
Is there a best practice for this use case?
I know of BlackRock's video on multi-tenant strategy in C*, which advises using a tenant_id in the schema. That is what we're doing already, but how can I ensure security/isolation for tenants/customers connecting via ODBC? Or do I have to write an API of my own that handles security?
I would recommend exposing access via an API rather than via ODBC; at the very least you would have greater control over what is executed, and you could enforce the tenant_id and other checks, such as limits (a sketch of this kind of enforcement follows below). You could try to utilize Cassandra's CQL parser to decompose each query and put all the required clauses back in.
Theoretically, you could utilize Apache Calcite, for example. It has a JDBC driver implementation that could be used, plus there is an existing Cassandra adapter that you could modify to accomplish your task (mapping authentication onto tenant_ids, etc.), but this would be quite a lot of work.
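To make that concrete, here is a minimal sketch of such API-side enforcement in Python, assuming a hypothetical exports.customer_data table partitioned by tenant_id (all names are illustrative, not from the question):

    from cassandra.cluster import Cluster

    # Connect to the customer-facing (shadow) data center; in this design
    # the API layer is the only client allowed to reach Cassandra directly.
    cluster = Cluster(["cassandra-shadow-dc-node1"])  # hypothetical contact point
    session = cluster.connect("exports")

    # tenant_id and the row cap are bound by the API, never taken from the
    # customer's query text, so ALLOW FILTERING and full-table pulls are
    # impossible by construction.
    stmt = session.prepare(
        "SELECT * FROM customer_data WHERE tenant_id = ? LIMIT ?"
    )

    def export_rows(authenticated_tenant_id, max_rows=1_000_000):
        # The tenant id comes from the authenticated session, which gives
        # the per-tenant isolation that a raw ODBC connection cannot.
        return session.execute(stmt, (authenticated_tenant_id, max_rows))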
I'm looking at setting up synchronisation between a device database and a server database using CouchDB.
The interface I'm looking at using is LoopBack's built-in synchronisation (soon to become loopback-component-sync). What I would like to know is whether we can use document-level replication in this case; for example, we may only want to replicate the data for a specific user in the database.
I'm new to this functionality and would appreciate some insight as to how this might work with loopback as the replication host/client for sync between databases.
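For reference, CouchDB itself supports filtered replication, which is the usual mechanism for per-user sync. A minimal sketch against the HTTP _replicate endpoint of a CouchDB 2.x server (the database names, the owner field, and the credentials are all hypothetical):

    import requests

    # Trigger a one-off replication that copies only the documents whose
    # "owner" field matches a given user.
    payload = {
        "source": "http://localhost:5984/app-data",
        "target": "http://localhost:5984/user-42-replica",
        "selector": {"owner": "user-42"},
    }
    resp = requests.post(
        "http://localhost:5984/_replicate",
        json=payload,
        auth=("admin", "admin-password"),
    )
    resp.raise_for_status()
    print(resp.json())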
I am completely new to Elasticsearch, but I like it very much. The only thing I can't find and can't get done is securing Elasticsearch for production systems. I have read a lot about using nginx as a proxy in front of Elasticsearch, but I have never used nginx and have never worked with proxies.
Is this the typical way to secure elasticsearch in production systems?
If so, are there any tutorials or good reads that could help me implement this? I really would like to use Elasticsearch in our production system instead of Solr and Tomcat.
There's an article about securing Elasticsearch that covers quite a few points to be aware of: http://www.found.no/foundation/elasticsearch-security/ (Full disclosure: I wrote it and work for Found.)
There are also some things here you should know: http://www.found.no/foundation/elasticsearch-in-production/
To summarize the summary:
At the moment, Elasticsearch does not consider security to be its job. Elasticsearch has no concept of a user. Essentially, anyone that can send arbitrary requests to your cluster is a “super user”.
Disable dynamic scripts. They are dangerous.
Understand that sometimes tricky configuration is required to limit access to indexes.
Consider the performance implications of multiple tenants; a weakness or a bad query in one can bring down an entire cluster!
Proxying ES traffic through nginx with, say, basic auth enabled is one way of handling this (but use HTTPS to protect the credentials). Even without basic auth in your proxy rules, you might, for instance, restrict access to various endpoints to specific users or from specific IP addresses.
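As a client-side illustration, a query sent through such a proxy might look like this (the host and credentials are hypothetical):

    import requests

    # Query Elasticsearch through the authenticating nginx proxy rather than
    # exposing port 9200 directly; HTTPS keeps the basic-auth credentials safe.
    resp = requests.get(
        "https://search.example.com/logs/_search",
        params={"q": "status:500"},
        auth=("readonly_user", "s3cret"),
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json()["hits"])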
What we do in one of our environments is use Docker. Docker containers are only accessible to the outside world and/or to other Docker containers if you explicitly define them as such; by default, they are isolated.
In our docker-compose setup, we have the following containers defined:
nginx - Handles all web requests, serves up static files and proxies API queries to a container named 'middleware'
middleware - A Java server that handles and authenticates all API requests. It interacts with the following three containers, each of which is exposed only to middleware:
redis
mongodb
elasticsearch
The net effect of this arrangement is that access to elasticsearch can only happen through the middleware piece, which ensures that authentication, roles, and permissions are handled correctly before any query is sent through.
A full Docker environment is more work to set up than a simple nginx proxy, but the end result is something that is more flexible, scalable, and secure.
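As a rough illustration of that middleware pattern (the real service described above is a Java server; this is just a minimal Python sketch, and every name in it is hypothetical):

    from flask import Flask, abort, jsonify, request
    import requests

    app = Flask(__name__)

    def index_for_token(token):
        # Placeholder lookup: map an API token to the index its owner may
        # search, or None if the token is unknown.
        return {"token-abc": "tenant-abc"}.get(token)

    @app.route("/search", methods=["POST"])
    def search():
        index = index_for_token(request.headers.get("X-Api-Token", ""))
        if index is None:
            abort(401)
        # "elasticsearch" resolves to the container name on the Docker
        # network; the container itself is never published to the outside.
        resp = requests.post(
            f"http://elasticsearch:9200/{index}/_search",
            json=request.get_json(force=True),
            timeout=10,
        )
        return jsonify(resp.json()), resp.status_code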
Here's a very important addition to the info presented in answers above. I would have added it as a comment, but don't yet have the reputation to do so.
While this thread is old(ish), people like me still end up here via Google.
Main point: this link is referenced in Alex Brasetvik's post:
https://www.elastic.co/blog/found-elasticsearch-security
He has since updated it with this passage:
Update April 7, 2015: Elastic has released Shield, a product which provides comprehensive security for Elasticsearch, including encrypted communications, role-based access control, AD/LDAP integration and Auditing. The following article was authored before Shield was available.
You can find a wealth of information about Shield here: https://www.elastic.co/guide/en/shield/current/index.html
A very key point to note is that Shield requires Elasticsearch 1.5 or newer.
Yes, I also had the same question, and I found one plugin provided by the Elasticsearch team, i.e. Shield. It is only available as a limited version; for production use you need to buy a license. Please find the attached link for your perusal:
https://www.elastic.co/guide/en/shield/current/index.html
I don't think I understand how CouchDB works. My impression is that everything runs on the client side, so wouldn't that mean it is useless for storing user data, since anyone could write a simple script to access that information? This doesn't make sense to me; do I have it all wrong?
Aside from map-reduce and update operations, everything in CouchDB does run on the client. In this context, "client" means a client connecting to the database server, which will usually be an application or script running on your web server. That's the case for other database systems too: to connect to a MySQL database from a PHP script, you need to use a MySQL client library.
One special thing about CouchDB is that, instead of using its own transfer protocol (as systems like MySQL do), it uses HTTP, which is implemented in almost every available language out there. This makes developing a CouchDB client extremely easy.
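Because the interface is plain HTTP, fetching a document needs nothing more than an HTTP library (the database and document names below are hypothetical):

    import requests

    # Read a single document straight over HTTP; creates, updates, and
    # queries are all HTTP requests of the same shape.
    doc = requests.get("http://localhost:5984/mydb/some-doc-id").json()
    print(doc)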
The other special thing about CouchDB is that its security model does allow you to let end users connect directly to the database. In such a situation, you would write a JavaScript application that runs entirely in the users' browsers and queries the database through AJAX. The server would then authenticate the user and grant access only to those databases that the user is allowed to access, in either read-only or read-write mode. This does require a bit of server-side scaffolding, to register new users and create a brand-new database for each of them.
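A minimal sketch of that scaffolding, using CouchDB's HTTP API (the admin credentials and naming scheme are illustrative):

    import requests

    COUCH = "http://localhost:5984"
    ADMIN = ("admin", "admin-password")  # hypothetical admin credentials

    def register_user(name, password):
        # Create the user document in the _users database...
        requests.put(
            f"{COUCH}/_users/org.couchdb.user:{name}",
            json={"name": name, "password": password,
                  "roles": [], "type": "user"},
            auth=ADMIN,
        ).raise_for_status()
        # ...create a brand-new database for that user...
        db = f"userdb-{name}"
        requests.put(f"{COUCH}/{db}", auth=ADMIN).raise_for_status()
        # ...and restrict the database's members to that user alone.
        requests.put(
            f"{COUCH}/{db}/_security",
            json={"admins": {"names": [], "roles": []},
                  "members": {"names": [name], "roles": []}},
            auth=ADMIN,
        ).raise_for_status()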
But you don't have to. My company uses CouchDB as a general-purpose persistent store that is completely invisible to the internet; only our web server is allowed to access it.
There's a really good book on CouchDB here: http://guide.couchdb.org/
I am developing a VoIP platform that would allow users to make hundreds of calls concurrently using my service.
Asterisk stores all call detail records in the CDR table. I would like to know the best place to keep this table for the best possible architecture of my system.
Should I keep it on the Asterisk box and run a cron job to sync it with the database server, or should I have the Asterisk box write directly to a remote database server, logging all the data there?
I feel that both architectures have their own pros and cons. I would appreciate the help of experts in suggesting the best path for long-term scalability and sustainability.
The best architecture would be to use distributed nodes (servers), i.e. the PBX, web server, and DB server on different nodes. The PBX will populate your CDR table (which must live on the DB server) after every call, and you can fetch these records from your web server for reporting and billing purposes.
Using cron to sync the DB table is not recommended, as it will eat up system resources and bandwidth (the cron job consumes resources on every run, and syncing with the DB uses bandwidth).
So, using the architecture defined above, you save the system resources that would otherwise be spent running cron.
Secondly, if you place the CDR table on the same node as the PBX, you save the resources the cron job would use, but for reporting and billing you still have to fetch data from this node, so you don't save bandwidth. This scheme also has a major drawback: right now you are talking about 100 concurrent calls, but what if you had 1,000 or more?
In that case you would definitely have to use PBX clustering, and you would need a centralized DB server that your PBX clusters sync to.
So, in all respects, the suggested architecture would suit your needs.
Since the question states that you only need hundreds of concurrent calls, you can use a single node for the DB and web server, with the PBX on another node.
Using a separate database server to store your CDRs is the correct option for anything but a hobby Asterisk implementation. Asterisk makes it easy to select a destination database for your CDRs and supports a myriad of database options: MySQL, PostgreSQL, MSSQL, etc. The Asterisk CDR implementation only uses a single table, so the integration between it and your database server is actually very simple.
One thing to be VERY aware of is that if your database server, or the connection between it and your Asterisk server, has problems, it WILL impact your call processing. If there's a problem, Asterisk will block while it keeps trying to connect to the database to write the CDRs, and while it's doing that it won't process any other calls. Arguably this is desired behaviour, as CDRs are critical for billing, and not being able to log them means calls could potentially end up being free. As a backup, you can also set up CDR logging to a .csv file on the Asterisk server as a belt-and-braces approach.
I think that if you can connect directly from Asterisk to the database, then you should do so. I have seen it on some Asterisk installations (including one quite big call center) and it worked well.
The other option, which I use where there is no direct connection from Asterisk to the database but there is an HTTPS connection to another service, or where the billing table structure is not compatible with Asterisk's tables, is to use CSV CDR files. Such a file is sent every few minutes to the CRM system; I use cron and a little Python script. This way I can easily adapt to the CSV format used by the CRM billing system.
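A sketch of such a cron-driven script, assuming Asterisk's default Master.csv location and a hypothetical CRM endpoint:

    import csv
    import requests

    CDR_FILE = "/var/log/asterisk/cdr-csv/Master.csv"  # default csv CDR path
    CRM_URL = "https://crm.example.com/api/cdr"        # hypothetical endpoint

    # The csv CDR backend writes rows without a header; these are the
    # standard columns in their default order (uniqueid/userfield omitted).
    FIELDS = [
        "accountcode", "src", "dst", "dcontext", "clid", "channel",
        "dstchannel", "lastapp", "lastdata", "start", "answer", "end",
        "duration", "billsec", "disposition", "amaflags",
    ]

    def ship_cdrs():
        with open(CDR_FILE, newline="") as f:
            for row in csv.reader(f):
                record = dict(zip(FIELDS, row))
                # Adapt the field names/format here to whatever the CRM
                # billing system expects.
                requests.post(CRM_URL, json=record, timeout=10).raise_for_status()

    if __name__ == "__main__":
        ship_cdrs()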