What options do I have, other than JDBC, for getting data from Hive to a user interface through the Presto query engine?
UI <--> Presto <--> Hive
The best interface for UI programming is the Presto REST interface. At Facebook we use this REST interface directly in PHP, Python and R for everything from graphical dashboards to statistical analysis. We are working on formal documentation for the REST interface, but for now the best documentation is here:
https://gist.github.com/electrum/7710544
BTW, the current JDBC driver is just a thin wrapper around the Presto REST interface and is really just a prototype. We are working on improving the driver for an internal project at FB, so expect it to become much better over the next few months.
If you are a Python user, there is a decent library, PyHive, from Dropbox. The PrestoDB site also lists a collection of other Presto clients.
However, all of them are wrappers around the Presto REST API that expose a higher-level API.
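To make the REST flow the answers above describe more concrete, here is a minimal Python sketch using only the standard library. It assumes the usual client protocol: POST the SQL text to /v1/statement, then keep fetching the nextUri from each response until there is none. The server address, user name, catalog and schema are placeholders, and the transport parameter is a hypothetical hook added here so the loop can be exercised without a running cluster. Treat it as a sketch, not a replacement for a maintained client.

```python
import json
import urllib.request


def http_json(method, url, body=None, headers=None):
    """Send one HTTP request and decode the JSON response body."""
    request = urllib.request.Request(
        url, data=body, headers=headers or {}, method=method
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_query(sql, server="http://localhost:8080", user="ui",
              catalog="hive", schema="default", transport=http_json):
    """Submit a query and collect all result rows.

    server, user, catalog and schema are placeholder values; transport is
    a hypothetical hook so a fake can be injected in tests.
    """
    headers = {
        "X-Presto-User": user,
        "X-Presto-Catalog": catalog,
        "X-Presto-Schema": schema,
    }
    # The SQL text itself is the POST body.
    page = transport("POST", server + "/v1/statement",
                     sql.encode("utf-8"), headers)
    columns, rows = None, []
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        if columns is None and "columns" in page:
            columns = [column["name"] for column in page["columns"]]
        # Not every page carries data; early pages often only report state.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:
            # No nextUri means the query has finished.
            return columns, rows
        page = transport("GET", next_uri, None, headers)
```

A UI backend would call something like run_query("SELECT * FROM my_table LIMIT 10") and render the returned column names and rows; my_table is just an example name.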
Related
I'm trying to use Apache Flink's Table concept in one of my projects to combine data from multiple sources in real time. Unfortunately, all of my team members are Node.js developers, so I'm looking for possible ways to connect to Flink from Node.js and query it. Flink's documentation for the SQL Client mentions that:
The SQL Client aims to provide an easy way of writing, debugging, and submitting table programs to a Flink cluster without a single line of Java or Scala code. The SQL Client CLI allows for retrieving and visualizing real-time results from the running distributed application on the command line.
Based on this, is there any way to connect to Flink's SQL Client from Node.js? Is there a driver already available for this, like the Node.js drivers for MySQL or MSSQL? Otherwise, what are the possible ways of achieving this?
Any idea or clarity on achieving this would be greatly helpful and much appreciated.
There's currently not much that you can do. The SQL Client runs on local machines and connects to the cluster there. I think what will help you is the introduction of the Flink SQL Gateway, which is expected to be released with Flink 1.16. You can read more about that on https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Gateway
Another alternative is to check out some of the products on the market that offer a Flink SQL editor; maybe that is a useful path for your colleagues.
For example:
https://www.ververica.com/apache-flink-sql-on-ververica-platform
https://docs.cloudera.com/csa/1.7.0/ssb-overview/topics/csa-ssb-intro.html
Note that this is not exactly what you asked for, but could be an option to enable your team.
I have tried both the native API and the fluent API for DataStax Graph in Java.
I found the fluent API more readable, since it resembles Java's OOP style.
The native API is less readable in Java, since strings are basically being appended to build the entire Gremlin script. On the plus side, a single call is made to execute the entire Gremlin script.
I wanted to know which API is best if I need to add a large number of edges and vertices in one transaction, and what performance issues can occur in either case.
Going forward I would recommend using the Fluent API over the String-based API. While we still support the string-based API in the DataStax drivers, most of our work and improvements will be using the fluent API.
The primary benefit of the Fluent API is that you can use the Apache TinkerPop library directly to form Traversals; it doesn't need to go through the Groovy scripting engine (as the String-based API does).
In terms of loading multiple vertices/edges in one transaction, you can do that with Apache TinkerPop, and it will be much more effective than the String-based API because none of it needs to be evaluated through the gremlin-groovy engine. Also, any future work around batching will likely be done in the Fluent API (via Apache TinkerPop); see JAVA-1311 for more details.
If this question is inappropriate for stackoverflow please feel free to remove this question.
Typically when we connect to relational databases from applications (e.g. Java or .Net), a JDBC or ODBC driver is used.
What driver does a Node.js application use to connect to databases? It appears that there isn't a standard way (similar to JDBC or ODBC).
I observed that each vendor provides a driver for Node.js.
Here are a couple that I found for MS SQL Server and Oracle:
Node.js Driver for SQL Server
Oracle Database driver for Node.js
There are other Node.js libraries, some of which use ODBC under the covers. Is there a standard in this space (similar to JDBC for Java)?
Though this question is specifically about RDBMSs, it is applicable to NoSQL databases too.
Note: I primarily use Java/JEE (and its ecosystem) for my solutions.
There is no standard database access API for Node.js. However, if you like the JDBC APIs, you can use a JDBC driver in your Node application with Avatar.js. This works fine with Oracle's UCP and JDBC thin driver (and maybe other drivers). This technique requires a thread pool to turn the blocking JDBC calls into non-blocking calls. At JavaOne 2016, new asynchronous database APIs for Java were presented, which will hopefully be part of JDK 10; when that happens, these APIs should fit quite nicely within Node.js.
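The thread-pool technique mentioned above is not specific to Avatar.js or JDBC. As a rough illustration only, here is the same idea in Python: a blocking database call is handed to a worker thread so the event loop stays free. sqlite3 stands in for a blocking JDBC driver here, and the function names are made up for this sketch.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def blocking_query(db_path, sql):
    """A blocking database call, standing in for a blocking JDBC call."""
    connection = sqlite3.connect(db_path)
    try:
        return connection.execute(sql).fetchall()
    finally:
        connection.close()


async def query_async(pool, db_path, sql):
    """Run the blocking call on a pool thread and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_query, db_path, sql)


async def main():
    # Both queries run on worker threads; the event loop is not blocked
    # while it waits for them.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(
            query_async(pool, ":memory:", "SELECT 1 + 1"),
            query_async(pool, ":memory:", "SELECT 'ok'"),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The pool size caps how many blocking calls can be in flight at once, so it has to be chosen with the database connection limit in mind.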
I would like to work with Cassandra from javascript web app using REST API.
The REST API should support the basic commands for working with the DB: create table, and select/add/update/remove items. It would be perfect to have something similar to the OData protocol.
P.S. I'm looking for a library or component. Java is preferred.
The Staash solution looks perfect for the task: https://github.com/Netflix/staash
You can use the DataStax drivers. I used one via Scala, but you can use Java. Note that a Session object is long-lived and should not be created and discarded per request/response, but it's up to you.
Ref.: rules when using DataStax drivers
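To illustrate the "long-lived Session" point, here is a small Python sketch of the create-once, reuse-everywhere pattern. The answer above is about the Java/Scala driver; this uses the DataStax Python driver (cassandra-driver) as a stand-in and assumes it is installed and that a node is reachable at 127.0.0.1. The users table and the helper names are hypothetical.

```python
import threading

_lock = threading.Lock()
_session = None


def default_factory():
    """Open a real session; assumes cassandra-driver is installed and a
    node is listening on 127.0.0.1 (both are assumptions of this sketch)."""
    from cassandra.cluster import Cluster
    return Cluster(["127.0.0.1"]).connect()


def get_session(factory=default_factory):
    """Return the shared session, creating it on first use only."""
    global _session
    with _lock:
        if _session is None:
            _session = factory()
        return _session


def handle_request(user_id, factory=default_factory):
    """Per-request handler: reuses the shared session, never opens a new one."""
    session = get_session(factory)
    # "users" is a hypothetical table used only for illustration.
    return session.execute("SELECT name FROM users WHERE id = %s", (user_id,))
```

Each REST request then calls handle_request, and the cost of connecting to the cluster is paid once per process rather than once per request.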
There is no "best" language for REST APIs; it depends on what you're comfortable using. Virtually all languages will be able to do this reasonably well, depending on your skill level.
The obvious choice is probably Java, because Cassandra is written in Java, the Java driver from DataStax is well supported, and it's probably pretty easy to find a Spring REST framework to do what you want. Second after that would be Python: again, good driver support, and REST frameworks such as Django or Flask + Potion. The Ruby driver isn't bad, and there are lots of Ruby REST APIs out there, too.
Is there an advantage to using one or the other of these sets of classes to execute statements in a .NET application? As a .NET developer, I find that using CqlConnection and CqlCommand is very similar to what is done for other databases (like SQL Server). I read on some websites that Cluster and Session are a better way to go.
The documentation in DataStax does not describe the differences or any suggestions of which to use under what circumstances.
Thanks
Use the cluster and session objects in the DataStax driver
DataStax drivers provide critical functionality for enterprise Cassandra apps, including configurable load balancing policies, automatic failover, retry policies, and tunability. These features are exposed via the cluster and session objects.
Notice that CqlConnection and CqlCommand are not even mentioned in the DataStax documentation. This is because they are used under the hood by the driver.
You can certainly use these to connect and read/write to Cassandra, but you will be missing out on the features I mentioned.
Pro Tip: Check the code comments here to see the functionality of the Cluster object. DataStax drivers are Open Source so feel free to go code diving!