Connecting/Accessing Hive data through Spark Thrift server on Power BI


I am rather new to data connectivity across multiple platforms. My requirement is simple: I need to be able to access a Spark Thrift server from Power BI. Can anyone guide me through the required steps?

I've had to integrate quite a few big data and analytics tools, and I have a good amount of experience with Spark.
Typically I look for connection details in the Tableau documentation:
https://onlinehelp.tableau.com/current/pro/desktop/en-us/examples_sparksql.html
or in the tool's own docs:
https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-november-feature-summary/#spark
but I'm partial to these docs:
https://github.com/oracle/learning-library/blob/master/workshops/journey2-new-data-lake/files/18.1.4/pdf/Connecting%20DVD3%20and%20Spark.pdf
You'll need to make sure the Spark Thrift server is up and listening on an open port. The information you need after that depends on the type of connection you're using (JDBC, ODBC, ...).
This assumes you have a preview version of DirectQuery:
https://learn.microsoft.com/en-us/power-bi/desktop-directquery-data-sources
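Before configuring Power BI, it can help to confirm that the Thrift server's port is actually reachable from the machine running Power BI. The following is a minimal sketch using only the Python standard library; the function name `is_port_open` is made up for illustration, and the host and port in the comment are assumptions (10000 is a common default for HiveServer2-style Thrift endpoints, but your cluster may use a different one).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example call (hypothetical host and port, adjust to your cluster):
# is_port_open("spark-thrift.example.internal", 10000)
```

A `True` result only shows that something accepts TCP connections on that port. It does not verify authentication or that the listener is really the Thrift server.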

Related

How to connect to Flink SQL Client from NodeJS?

I'm trying to use Apache Flink's Table concept in one of my projects to combine data from multiple sources in real-time. Unfortunately, all of my team members are Node.JS developers. So, I'm looking for possible ways to connect to Flink from NodeJS and query from it. In Flink's documentation for SQL Client, it's mentioned that
The SQL Client aims to provide an easy way of writing, debugging, and submitting table programs to a Flink cluster without a single line of Java or Scala code. The SQL Client CLI allows for retrieving and visualizing real-time results from the running distributed application on the command line.
Based on this, is there any way to connect to Flink's SQL Client from NodeJS? Is there a driver already available for this, like the Node.JS drivers for MySQL or MSSQL? If not, what are the possible ways of achieving this?
Any idea or clarity on achieving this would be greatly helpful and much appreciated.

There's currently not much that you can do. The SQL Client runs on local machines and connects to the cluster there. I think what will help you is the introduction of the Flink SQL Gateway, which is expected to be released with Flink 1.16. You can read more about that on https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Gateway

Another alternative is to check out some of the products on the market that offer a Flink SQL editor; that may be a useful path for your colleagues.
For example:
https://www.ververica.com/apache-flink-sql-on-ververica-platform
https://docs.cloudera.com/csa/1.7.0/ssb-overview/topics/csa-ssb-intro.html
Note that this is not exactly what you asked for, but could be an option to enable your team.

COPY INTO vs Spark Connector fastest SQL Server to Snowflake data load

I am loading data from SQL Server on prem into Snowflake. I am able to use the COPY INTO command (https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html) OR the Spark connector (https://docs.snowflake.com/en/user-guide/spark-connector-overview.html#internal-data-transfer) via Azure Data Factory. I cannot tell which one is more performant (in dev they run in similar amounts of time) or which is "better" technically. Snowflake seems to push the COPY INTO process, but that requires an extra storage step since the source is SQL Server.
Does anyone know which technology is more suited to a daily ELT process and why?

As you've found out for yourself, neither solution is more performant than the other in your particular circumstances.
Neither solution is technically better (how would you define "better" anyway?). They are just different ways of achieving the same result.

Visualisation of Web Data through API Calls

I am using a web server service from which I can get real-time data via REST API calls. Now I want to collect the data, store it somehow, and then visualise it in a nice way (basically, produce graphs). My approach would be to store the data in a database and then use Power BI's built-in "Get Data" feature with "SQL Server Database" as the source. I have no idea whether this is the correct approach. Can anyone advise?

Hello and welcome to Stack Overflow!
I agree with Andrey's comment above. But if you want to know about all the Data sources that PowerBI supports connecting to, please check the following resources:
Data sources in Power BI Desktop
Power BI Data Source Prerequisites
Connect to a web page from Power BI Desktop
Real-time streaming in Power BI
Additionally, you may also go through Microsoft Power BI Guided Learning to understand the next steps for visualization.
Hope this helps!

Another approach is to build a custom data connector for that API in Power BI.
That allows you to fetch the data inside Power BI, and build the visuals. You can store it in excel files or sql (you can use python scripts for this) and you can schedule refreshes on the service.
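As a rough illustration of the "store it with a Python script" idea, the sketch below writes records already fetched from the API into a database table. It uses SQLite from the standard library purely as a stand-in for SQL Server, and the function name `store_readings`, the table name `readings`, and the `timestamp` field are all hypothetical.

```python
import json
import sqlite3
from typing import Iterable, Mapping


def store_readings(conn: sqlite3.Connection, readings: Iterable[Mapping]) -> int:
    """Insert API records into a table and return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "fetched_at TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    # Keep the raw record as JSON so no fields are lost; "timestamp" is an
    # assumed field name, not something the API is known to return.
    rows = [(r["timestamp"], json.dumps(r)) for r in readings]
    conn.executemany(
        "INSERT INTO readings (fetched_at, payload) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

A scheduled job could call this after each API poll. Power BI would then read from the database rather than from the API directly.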

How do you make a connection to Power BI through an application's API

I have an Azure SQL Database and have made a direct connection from Power BI to it. The problem is that to successfully import the data, I had to give direct access to the data through the database firewall which I cannot allow.
Is there a way to use my application's API as the data source for Power BI rather than SQL?

You cannot do that.
Most of the tools that work on representing/caching/plotting data work with industry-standard adapters (SQL, Mongo, Hadoop, etc.). There are a variety of reasons for that.
Some simpler tools might exist where you can push data for representation, but that kills the power of things like Power BI, Periscope or ChartIO.
Now, why not grant PowerBI access to your database?

One option I would suggest is that you could make a small piece of code that gets the necessary data (either through your API or directly from DB) and pushes it to Power BI through their REST API.
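A minimal sketch of that push approach follows. It only builds the HTTP request and does not send it. It assumes the target is a Power BI push dataset and that the "post rows" endpoint has the shape shown; the function name `build_push_request` and the dataset ID, table name, and token are placeholders, and acquiring the Azure AD access token is out of scope here.

```python
import json
import urllib.request

# Assumed base URL of the Power BI REST API for the signed-in user's workspace.
API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_push_request(dataset_id, table_name, rows, access_token):
    """Build (but do not send) a POST request that adds rows to a push dataset."""
    url = f"{API_ROOT}/datasets/{dataset_id}/tables/{table_name}/rows"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Sending would look like this (placeholder values, not executed here):
# req = build_push_request("<dataset-id>", "Sales", [{"amount": 10}], "<token>")
# urllib.request.urlopen(req)
```

With this design the database firewall stays closed to Power BI: only your own code talks to the database or the application API, and it makes outbound calls to Power BI.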

You can query an API from Power BI. Please see my answer to a similar question.
If you can, I would recommend using OData, as Power BI plays well with it.
https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-tutorial-analyzing-sales-data-from-excel-and-an-odata-feed/

SQL Azure Profiling

I read on the MS site that SQL Azure does not support SQL Profiler. What are people using to profile queries running on this platform?

I haven't got too far playing around with SQL Azure as yet, but from what I understand there isn't anything you can use at the moment.
From MS (probably the article you read):
Because SQL Azure performs the physical administration, any statements and options that attempt to directly manipulate physical resources will be blocked, such as Resource Governor, file group references, and some physical server DDL statements. It is also not possible to set server options and SQL trace flags or use the SQL Server Profiler or the Database Tuning Advisor utilities.
If there were to be an alternative, I'd imagine it would require the ability to set trace flags, which you can't do, hence I don't think there is an option at the moment.
Solution? I can only suggest you have a local development copy of the db so you can run profiler locally on it. I know that won't help with "live" issues/debugging/monitoring but it depends on what you need it for.
Edit:
Quote from MSDN forum:
Q: Is SQL Profiler supported in SQL Azure?
A: We do not support SQL Profiler in v1 of SQL Azure.
Now, you could interpret that as a hint that Profiler will be supported in future versions. I think it will be a big requirement to get a lot of people on board, using SQL Azure seriously.

Update as of 9/17/2015:
Microsoft just announced a new feature called Index Advisor:
How does Index Advisor work? Index Advisor continuously monitors your database workload, performs the analysis and recommends new indexes that can further improve the DB performance.
Recommendations are always kept up-to-date: As the DB workload and schema evolves, Index Advisor will monitor the changes and adjust the recommendations accordingly. Each recommendation comes with the estimated impact to DB workload performance: You can use this information to prioritize the most impactful recommendations first. In addition, Index Advisor provides a very easy and powerful way of creating the recommended indexes.
Creating new indexes only takes a couple of clicks. Index Advisor measures the impact of newly created indexes and provides a report on index impact to users. You can get started with Index Advisor and improve your database performance with the following simple steps. It literally takes five minutes to get accustomed with Index Advisor's simple and intuitive user interface. Let's get started!
Original Answer:
SQL Azure now has some native profiling. See http://blogs.msdn.com/b/benko/archive/2012/05/19/cloudtip-14-how-do-i-get-sql-profiler-info-from-sql-azure.aspx for details.

Microsoft's stated position is that SQL Server Profiler is deprecated. As much as this is a bad idea, that's what they have said:
SQL Profile is already deprecated in SQL Server, and that's part of the reason that it doesn't make sense to bring to SQL DB.
What this means is you are going back 20+ years in database performance monitoring and everyone is going to have to write their own perf monitoring scripts instead of having a standard factory delivered tool that's on every server you will go to. It's tantamount to deprecating "sp_help" and making every DBA write their own. Hope you know all your DMVs inside and out... INNER JOIN, OUTER JOIN, and CROSS APPLY syntax really well.
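As an example of the kind of hand-written DMV script this answer refers to, the sketch below ranks cached queries by total CPU time using `sys.dm_exec_query_stats` joined to `sys.dm_exec_sql_text` with `CROSS APPLY`. It is an illustration rather than a Profiler replacement. The function `top_queries` is made up for this example and assumes the caller supplies a DB-API style cursor from whatever SQL Server driver is in use, with permission to read these views.

```python
# T-SQL that lists the ten cached statements with the highest total CPU time.
TOP_QUERIES_SQL = """
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time,
    qs.total_elapsed_time,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
"""


def top_queries(cursor):
    """Run the DMV query on a caller-supplied DB-API cursor and return all rows."""
    cursor.execute(TOP_QUERIES_SQL)
    return cursor.fetchall()
```

These statistics cover only plans still in the plan cache, so they give a rough picture of the workload rather than a full trace.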

Update as of 2017/04/14:
Microsoft's Scott Guthrie today announced a lot of new features for SQL Azure (the offering is called SQL Azure Managed Instance and is currently in preview), which are expected to become available in the coming months. They are:
1. SQL Agent
2. SQL Profiler
3. SQL CLR
4. Service Broker
5. Log shipping, transactional replication
6. Native backup/restore
7. Additional DMVs and XEvents
8. Cross-database querying
References:
https://youtu.be/0uT46lpjeQE?t=1415

I have tried today a new tool suggested by Microsoft that is called Azure Data Studio.
In this tool you can download an extension called Profiler and it seems to be working just as expected.

You can use the Query Store feature. See here for more details: http://azure.microsoft.com/blog/2015/06/08/query-store-a-flight-data-recorder-for-your-database/

The closest thing to SQL Profiler that I found working in Azure SQL is SQL Workload Profiler.
Note, however, that it is a beta version of a tool created by a single person, and it is not very convenient to use.

SQL Azure offers the following features to tune performance, profile queries in its own way, identify long-running queries and much more:
Intelligent Performance
Performance overview
Performance recommendations
Query Performance Insight
Automatic tuning
