How to connect to Flink SQL Client from NodeJS? - node.js

I'm trying to use Apache Flink's Table concept in one of my projects to combine data from multiple sources in real time. Unfortunately, all of my team members are Node.js developers, so I'm looking for possible ways to connect to Flink from Node.js and query it. In Flink's documentation for the SQL Client, it's mentioned that:
The SQL Client aims to provide an easy way of writing, debugging, and submitting table programs to a Flink cluster without a single line of Java or Scala code. The SQL Client CLI allows for retrieving and visualizing real-time results from the running distributed application on the command line.
Based on this, is there any way to connect to Flink's SQL Client from Node.js? Is there any driver already available for this, like the Node.js drivers for MySQL or MSSQL? Otherwise, what are the possible ways of achieving this?
Any idea or clarity on achieving this would be greatly helpful and much appreciated.

There's currently not much that you can do. The SQL Client runs on a local machine and connects to the cluster from there. I think what will help you is the introduction of the Flink SQL Gateway, which is expected to be released with Flink 1.16. You can read more about that at https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Gateway
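Once the SQL Gateway lands, its REST endpoints should be callable from plain Node.js with no driver at all. The sketch below follows the endpoint shapes described in FLIP-91; the port, paths, and payload fields are assumptions based on that proposal, not a released API:

```javascript
// Sketch: talking to the (not-yet-released) Flink SQL Gateway REST API
// from Node.js. Endpoint paths and payload shapes are assumptions per FLIP-91.
const GATEWAY = 'http://localhost:8083'; // assumed gateway address

// Pure helpers that describe the REST calls; easy to unit-test without a cluster.
function openSessionRequest() {
  return { method: 'POST', url: `${GATEWAY}/v1/sessions`, body: {} };
}

function submitStatementRequest(sessionHandle, statement) {
  return {
    method: 'POST',
    url: `${GATEWAY}/v1/sessions/${sessionHandle}/statements`,
    body: { statement },
  };
}

async function runQuery(statement) {
  // Node 18+ ships a global fetch; earlier versions need a fetch polyfill.
  const open = openSessionRequest();
  const { sessionHandle } = await (await fetch(open.url, {
    method: open.method,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(open.body),
  })).json();

  const submit = submitStatementRequest(sessionHandle, statement);
  const { operationHandle } = await (await fetch(submit.url, {
    method: submit.method,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(submit.body),
  })).json();

  // Result rows would then be polled from
  // /v1/sessions/{sessionHandle}/operations/{operationHandle}/result/{token}
  return { sessionHandle, operationHandle };
}
```

Treat this strictly as a sketch of the shape of the interaction; the actual paths should be checked against the gateway's documentation once it ships.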

Another alternative is to check out some of the products on the market that offer a Flink SQL editor; maybe that is a useful path for your colleagues.
For example:
https://www.ververica.com/apache-flink-sql-on-ververica-platform
https://docs.cloudera.com/csa/1.7.0/ssb-overview/topics/csa-ssb-intro.html
Note that this is not exactly what you asked for, but could be an option to enable your team.

Related

Is there a simple Jmeter performance test case for Cassandra

We are creating a JMeter performance benchmark for our Cassandra installation,
for which we have been referring to the default Cassandra plugin mentioned on the site.
This plugin does not take any Cassandra server connection parameters for the "put", and there is not much help available on how to use it either.
Can someone help me with this plugin, if anyone knows how to configure the Cassandra connection?
Hence we switched to an article on testing Cassandra with Groovy. (Link here)
That article calls for adding multiple JARs; some are bundled, and we cannot find the exact JARs:
snappy-java-1.0.5
netty-transport-4.0.33.Final
netty-handler-4.0.33.Final
netty-common-4.0.33.Final
netty-codec-4.0.33.Final
netty-buffer-4.0.33.Final
metrics-core-3.1.2
lz4-1.2.0
HdrHistogram-2.1.4
guava-16.0.1
Can someone help me with a simpler test to perform on Cassandra?
For correct performance testing of Cassandra it's better to use specialized tools, like NoSQLBench, which was developed specifically for that task. Generic tools won't give you real performance numbers. Please read the NoSQLBench documentation on how to correctly test Cassandra, taking into account things like compaction, repairs, etc.
Have you tried reading the documentation, which mentions the CassandraProperties configuration element where you can define your server connection parameters?
If you want full control and don't want to be limited to what others have implemented, you can consider following the instructions from the Cassandra Load Testing with Groovy article.

nodejs bigtable copy rows using filters like prefix

Is it possible with Bigtable/nodejs-bigtable to do something similar to createReadStream, but without first retrieving the rows just to write them back again? I'm looking for a way to do this on the server, like an INSERT INTO ... SELECT FROM in SQL.
Cloud Bigtable does not offer any direct way to run application code on its servers.
Cloud Bigtable's general recommendation for high-volume jobs is to run them on Dataflow with the Cloud Bigtable HBase connector (although that requires Java code).
That said, the specific implementation very much depends on your objectives. Any additional information about your use case would help.
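For smaller copies, the client-side read-then-write approach the question was trying to avoid is still the only pure-Node.js option. A minimal sketch with @google-cloud/bigtable, where the instance and table names are placeholders and the row-to-entry conversion keeps only the latest cell per column:

```javascript
// Sketch: client-side "INSERT INTO ... SELECT" for Bigtable using
// @google-cloud/bigtable. Instance/table names below are placeholders.

// Pure helper: convert a row read from the stream into an insert entry.
// Cells come back newest-first, so index 0 is the latest value per column;
// adjust if you need the full cell history.
function rowToEntry(row) {
  const data = {};
  for (const [family, columns] of Object.entries(row.data)) {
    data[family] = {};
    for (const [qualifier, cells] of Object.entries(columns)) {
      data[family][qualifier] = cells[0].value;
    }
  }
  return { key: row.id, data };
}

async function copyByPrefix(prefix) {
  // Required lazily so rowToEntry can be tested without the package installed.
  const { Bigtable } = require('@google-cloud/bigtable');
  const bigtable = new Bigtable();
  const instance = bigtable.instance('my-instance'); // placeholder
  const source = instance.table('source-table');     // placeholder
  const dest = instance.table('dest-table');         // placeholder

  const batch = [];
  for await (const row of source.createReadStream({ prefix })) {
    batch.push(rowToEntry(row));
    if (batch.length >= 100) {
      await dest.insert(batch.splice(0)); // flush in batches of 100
    }
  }
  if (batch.length) await dest.insert(batch);
}
```

This still pulls every row through the client, so for large tables the Dataflow route above remains the better fit.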

Connecting/Accessing Hive data through Spark Thrift server on Power BI

I am rather new to data connectivity across multiple platforms. My requirement here is simple: I need to be able to access the Spark Thrift server via Power BI. Can anyone guide me through the required steps?
I've had to integrate quite a few big data & analytics tools, and have a good amount of experience with Spark.
Typically I look for it on the tableau documentation
https://onlinehelp.tableau.com/current/pro/desktop/en-us/examples_sparksql.html
or the tool's docs
https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-november-feature-summary/#spark
but I'm partial to these docs
https://github.com/oracle/learning-library/blob/master/workshops/journey2-new-data-lake/files/18.1.4/pdf/Connecting%20DVD3%20and%20Spark.pdf
You'll need to make sure you've got the Spark Thrift server up and listening on an open port. Then you'll need different information depending on the type of connection you're using (JDBC, ODBC, ...).
This is assuming you've got a preview version of DirectQuery:
https://learn.microsoft.com/en-us/power-bi/desktop-directquery-data-sources

stubbed cassandra for data storage

I need an embedded Cassandra for my project, and I was wondering if I can use Stubbed Cassandra for data storage, because I need a system that simulates CQL requests and responses.
Thanks everyone.
You can't use it as a real datastore; use real Cassandra as a real Cassandra datastore. Check out ccm, which is probably more what you're looking for.
There are wrappers for it in dtests (Python), and the Java driver uses it for testing and has a Java wrapper.
I don't really have any experience with SCassandra, but I have worked on several projects using Apache Cassandra, and there are some use cases, like multi-datacenter infrastructure, that I don't think SCassandra can simulate. If you plan to do simple tests, that's fine, but advanced use cases really need to be tested against a real Cassandra distribution.
As others have mentioned, you will need the real Cassandra for data storage. However, if you want to test CQL requests/responses then you can use this library:
Cassandra-Spy
It runs an actual embedded Cassandra and also can simulate failures for inserts/selects. This helps you test your app's behaviour in failure cases. I wrote the library to address this specific use case.

Best way to benchmark Cassandra and Hbase for performance?

What's the best way to benchmark Cassandra and Hbase for performance?
I'm working on an application where reads (80%) and writes (20%) go through a web application. Users can also perform CRUD (Create, Read, Update, Delete) operations on the data. Our data is all structured, coming from an RDBMS. I have heard about YCSB (Yahoo! Cloud Serving Benchmark).
Had anyone done benchmark on Cassandra vs Hbase for a similar usecase like above?
I will assume that your Cassandra is sitting behind a web app.
If so (as you mentioned CRUD), just benchmark your CRUD endpoints for writes (the Create) and reads via ApacheBench or Siege under load (i.e. concurrent calls, etc.).
Update
If you want to purely test if your configuration of Cassandra is correct for raw power:
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
but if you want to test the application as a whole, ApacheBench and Siege will test your app.
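To make the "benchmark the endpoints under concurrent load" suggestion concrete, here is a minimal Node.js sketch of what tools like ApacheBench and Siege do: fire concurrent requests at one endpoint and summarize latencies. The URL and concurrency values are placeholders, and a dedicated tool should still be used for real numbers:

```javascript
// Minimal load-test sketch: N concurrent workers hitting one CRUD endpoint.
// For real benchmarking, prefer ab, Siege, or cassandra-stress.

// Pure helper: p-th percentile (nearest-rank) of a list of latencies in ms.
function percentile(latencies, p) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

async function benchmark(url, totalRequests, concurrency) {
  const latencies = [];
  let issued = 0;
  async function worker() {
    while (issued < totalRequests) {
      issued++;
      const start = Date.now();
      await fetch(url); // Node 18+ global fetch
      latencies.push(Date.now() - start);
    }
  }
  // All workers run in parallel, approximating `ab -c <concurrency>`.
  await Promise.all(Array.from({ length: concurrency }, worker));
  return {
    count: latencies.length,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
  };
}
```

Usage would be something like `benchmark('http://localhost:3000/items', 1000, 20)`; comparing p50 against p95 under rising concurrency shows where the app (and the Cassandra/HBase behind it) starts to degrade.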
Most databases provide some tool for performance testing. In my opinion, the best way to get an unbiased view is to use a third-party tool like https://github.com/brianfrankcooper/YCSB which supports testing many different ACID and NoSQL databases.