SQL query for Data Services Server 3.1.0 - apache-poi

I am working on WSO2 DSS 3.1.0 and inserting data as an array into SQL. This is the SQL query:
INSERT INTO memployeecount (CompanyCode,NoofEmployees) VALUES
('SPS', 1000),
('SPS', 2000),
('SPS', 3000),
('SFS', 500),
('SFS', 600),
('SFS', 700);
It works fine, but how can I write the same query for Data Services Server? Can someone guide me?

WSO2 DSS supports batch requests, which take an array of parameter sets as input. What you need to do is create a data service with batch requests enabled [1], containing a simple insert query.
Once it is deployed, a batch-enabled operation is generated through which you can insert multiple records in one call. For more information, try out the Batch Processing sample shipped with DSS [2]. A sketch of such a data service descriptor is shown after the references below.
[1]http://docs.wso2.org/display/DSS310/Creating+Using+Various+Data+Sources
[2]http://docs.wso2.org/display/DSS310/Batch+Processing+Sample
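As an illustration only (the datasource properties, service name and operation name below are placeholders, not taken from the question), a batch-enabled data service descriptor (.dbs) for the insert above could look roughly like this:
<data name="EmployeeCountService" enableBatchRequests="true">
   <config id="default">
      <!-- placeholder connection details; point this at your own datasource -->
      <property name="driverClassName">com.mysql.jdbc.Driver</property>
      <property name="url">jdbc:mysql://localhost:3306/company</property>
      <property name="username">user</property>
      <property name="password">password</property>
   </config>
   <query id="insertEmployeeCountQuery" useConfig="default">
      <sql>INSERT INTO memployeecount (CompanyCode, NoofEmployees) VALUES (?, ?)</sql>
      <param name="CompanyCode" sqlType="STRING"/>
      <param name="NoofEmployees" sqlType="INTEGER"/>
   </query>
   <operation name="insertEmployeeCount">
      <call-query href="insertEmployeeCountQuery">
         <with-param name="CompanyCode" query-param="CompanyCode"/>
         <with-param name="NoofEmployees" query-param="NoofEmployees"/>
      </call-query>
   </operation>
</data>
With enableBatchRequests="true", DSS additionally generates a batch version of the operation (named with a _batch_req suffix) that accepts an array of (CompanyCode, NoofEmployees) records, which is the equivalent of the multi-row INSERT shown in the question.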

Related

How to get SQL Profiler working with Azure SQLDb?

We have a Power BI dataset in the Power BI service. The sources of this dataset are some Azure SQL Database tables, in a PaaS (Platform as a Service) setup. The daily refresh of this Power BI dataset takes long. SQL Profiler would be the tool to check the events that are happening, but we can't get it working in this PaaS environment.
We tried SQL Server, DAX Studio and Azure Data Studio.
So how can I trace query execution and capture events in a PaaS environment?
Really hope someone has the answer.
Regards, Ron
SQL Server Profiler and SQL Trace are deprecated. For Azure SQL Database you should use Extended Events to capture the queries.
With Extended Events you create a session, define which events to capture in that session, and for each event specify which fields to retrieve. You can also define filters on these fields (e.g. capture events in one specific database only). The last step when creating a session is to define where to store the data: in a file, a ring buffer and so on. In your case, the sql_batch_starting event with the sql_text field, captured to a ring buffer, should be enough (capturing to a file would require setting up Azure Storage).
You can create the event session with a script or with a wizard in SQL Server Management Studio. The script could be something like this:
CREATE EVENT SESSION [Capture queries] ON DATABASE
ADD EVENT sqlserver.sql_batch_starting(
ACTION(sqlserver.sql_text))
ADD TARGET package0.ring_buffer
GO
where [Capture queries] is the name of the session. If you create the session with the wizard, you have the option to start it automatically after it is created, but if you use the script, you must start it manually, like this:
ALTER EVENT SESSION [Capture queries] ON DATABASE STATE = START
It is very important to stop the session, when it is not needed anymore, because it has impact on the performance. You can stop a session with the following script:
ALTER EVENT SESSION [Capture queries] ON DATABASE STATE = STOP
And eventually drop it when it is no longer needed:
DROP EVENT SESSION [Capture queries] ON DATABASE
In SQL Server Management Studio, you can see the result by right-clicking the session's ring_buffer target and selecting View Target Data..., which shows an XML document you can click to open.
Or you can use a query, like this:
select
    se.name as session_name,
    ev.event_name,
    ac.action_name,
    st.target_name,
    se.session_source,
    st.target_data,
    CAST(st.target_data AS XML) as target_data_XML
from sys.dm_xe_database_session_event_actions ac
    INNER JOIN sys.dm_xe_database_session_events ev
        on ev.event_name = ac.event_name
        and cast(ev.event_session_address AS BINARY(8)) = cast(ac.event_session_address AS BINARY(8))
    INNER JOIN sys.dm_xe_database_session_object_columns oc
        on cast(oc.event_session_address AS BINARY(8)) = cast(ac.event_session_address AS BINARY(8))
    INNER JOIN sys.dm_xe_database_session_targets st
        on cast(st.event_session_address AS BINARY(8)) = cast(ac.event_session_address AS BINARY(8))
    INNER JOIN sys.dm_xe_database_sessions se
        on cast(ac.event_session_address AS BINARY(8)) = cast(se.address AS BINARY(8))
The last column contains an XML document like the one above, in which you can see the captured statements.
Of course, it is possible to use XQuery to transform the returned XML into a tabular result, but in your case that is not needed; just look for the queries in the XML itself. A rough sketch of such a query is shown below.
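For completeness, here is a sketch only (not part of the original answer; the column types are assumptions) of how the ring buffer XML of the [Capture queries] session could be shredded into rows:
-- Sketch: turn the ring_buffer target of [Capture queries] into one row per captured batch.
;WITH target_xml AS (
    SELECT CAST(st.target_data AS XML) AS target_data
    FROM sys.dm_xe_database_session_targets st
    INNER JOIN sys.dm_xe_database_sessions se
        ON CAST(st.event_session_address AS BINARY(8)) = CAST(se.address AS BINARY(8))
    WHERE se.name = 'Capture queries'
)
SELECT
    ev.value('@timestamp', 'datetime2') AS event_time,
    ev.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text
FROM target_xml
CROSS APPLY target_data.nodes('/RingBufferTarget/event') AS x(ev);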

What does "avoid multiple Kudu clients per cluster" mean?

I am looking at Kudu's documentation.
Below is a partial description of kudu-spark.
https://kudu.apache.org/docs/developing.html#_avoid_multiple_kudu_clients_per_cluster
Avoid multiple Kudu clients per cluster.
One common Kudu-Spark coding error is instantiating extra KuduClient objects. In kudu-spark, a KuduClient is owned by the KuduContext. Spark application code should not create another KuduClient connecting to the same cluster. Instead, application code should use the KuduContext to access a KuduClient using KuduContext#syncClient.
To diagnose multiple KuduClient instances in a Spark job, look for signs in the logs of the master being overloaded by many GetTableLocations or GetTabletLocations requests coming from different clients, usually around the same time. This symptom is especially likely in Spark Streaming code, where creating a KuduClient per task will result in periodic waves of master requests from new clients.
Does this mean that I can only run one kudu-spark task at a time?
If I have a Spark Streaming program that is always writing data to Kudu, how can I connect to Kudu from other Spark programs?
In a non-Spark program you use a KuduClient to access Kudu. In a Spark application you use a KuduContext, which already owns such a client for that Kudu cluster.
A simple Java program needs a KuduClient created through the Java API (typically with a Maven build):
KuduClient kuduClient = new KuduClient.KuduClientBuilder("kudu-master-hostname").build();
See http://harshj.com/writing-a-simple-kudu-java-api-program/
A Spark/Scala program, many of which can run at the same time against the same cluster, uses the Spark-Kudu integration instead. The snippet below is borrowed from the official guide, as it has been quite some time since I looked at this.
import org.apache.kudu.client._
import org.apache.kudu.spark.kudu.KuduContext
import collection.JavaConverters._
// Read a table from Kudu
val df = spark.read
  .options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "kudu_table"))
  .format("kudu").load
// Query using the Spark API...
df.select("id").filter("id >= 5").show()
// ...or register a temporary table and use SQL
df.registerTempTable("kudu_table")
val filteredDF = spark.sql("select id from kudu_table where id >= 5")
filteredDF.show()
// Use KuduContext to create, delete, or write to Kudu tables
val kuduContext = new KuduContext("kudu.master:7051", spark.sparkContext)
// Create a new Kudu table from a dataframe schema
// NB: No rows from the dataframe are inserted into the table
kuduContext.createTable("test_table", df.schema, Seq("key"),
  new CreateTableOptions()
    .setNumReplicas(1)
    .addHashPartitions(List("key").asJava, 3))
// Insert data
kuduContext.insertRows(df, "test_table")
See https://kudu.apache.org/docs/developing.html
A clearer statement of "avoid multiple Kudu clients per cluster" would be "avoid multiple Kudu clients per Spark application".
Instead, application code should use the KuduContext to access a KuduClient via KuduContext#syncClient.
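As a rough sketch (not from the original answers; the master address and table name are placeholders), reusing the client owned by the KuduContext instead of building a second KuduClient could look like this:
// Sketch: borrow the KuduContext's client rather than creating a second
// KuduClient against the same masters.
import org.apache.kudu.spark.kudu.KuduContext
val kuduContext = new KuduContext("kudu.master:7051", spark.sparkContext)
// Avoid: new KuduClient.KuduClientBuilder("kudu.master:7051").build()
// Prefer: the client the KuduContext already holds.
val client = kuduContext.syncClient
if (client.tableExists("test_table")) {
  println(client.openTable("test_table").getSchema)
}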

Query my temporary tables outside my java app

I have created a Java application that starts Spark (local[*]) and uses it to read a CSV file as a Dataset<Row> and to create a temporary view with createOrReplaceTempView.
At this point I am able to use SQL to query the view inside my application.
What I would like to do, for development and debugging purposes, is to execute queries interactively from outside my application.
Any hints?
Thanks in advance
You can use Spark's developer API HiveThriftServer2, whose entry point looks like this (excerpt):
@DeveloperApi
def startWithContext(sqlContext: SQLContext): Unit = {
  val server = new HiveThriftServer2(sqlContext)
  // ...
}
The only thing you need to do in your application is to get the SQLContext and use it as follows:
HiveThriftServer2.startWithContext(sqlContext)
This starts the Hive Thrift server (by default on port 10000), and you can use a SQL client, e.g. beeline, to access and query the data in your temp tables.
You will also need to set --conf spark.sql.hive.thriftServer.singleSession=true, which allows other connections to see your temp tables. By default it is set to false, so each connection has its own session and they don't see each other's temp tables.
"spark.sql.hive.thriftServer.singleSession" - When set to true, the Hive Thrift server runs in single-session mode. All the JDBC/ODBC connections share the temporary views, function registries, SQL configuration and the current database.
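Putting the pieces together, a minimal sketch (the CSV path and view name are placeholders, not from the question; it assumes the app is submitted with the singleSession flag above):
// Sketch: expose the application's temp view over JDBC via the Thrift server.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("interactive-debug")
  .enableHiveSupport()
  .getOrCreate()
val df = spark.read.option("header", "true").csv("/path/to/data.csv")
df.createOrReplaceTempView("my_view")
// Starts the Hive Thrift server (port 10000 by default) on this app's context.
HiveThriftServer2.startWithContext(spark.sqlContext)
// From another terminal:
//   beeline -u jdbc:hive2://localhost:10000
//   select * from my_view limit 10;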

Accessing Spark RDDs from a web browser via thrift server - java

We have processed our data using Spark 1.2.1 with Java and stored it in Hive tables. We want to access this data as RDDs from a web browser.
I read the documentation and understood the steps to do the task.
I am unable to find a way to interact with Spark SQL RDDs via the Thrift server. The examples I found have the line below in the code, and I cannot find the class for it in the Spark 1.2.1 Java API docs.
HiveThriftServer2.startWithContext
On GitHub I saw Scala examples using import org.apache.spark.sql.hive.thriftserver, but I don't see this in the Java API docs. Not sure if I am missing something.
Did anybody have luck accessing Spark SQL RDDs from a browser via Thrift? Can you post a code snippet? We are using Java.
I've got most of this working. Let's dissect each part of it (references at the bottom of the post):
HiveThriftServer2.startWithContext is defined in Scala. I was never able to access it from Java or from Python using Py4j, and I am no JVM expert, but I ended up switching to Scala. This may have something to do with the annotation @DeveloperApi. This is how I imported it in Scala in Spark 1.6.1:
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
For anyone reading this and not using Hive: a plain Spark SQLContext won't do, you need a HiveContext. Note also that the HiveContext constructor requires a (Scala) SparkContext, not a JavaSparkContext, hence the conversion below.
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.hive.HiveContext
var hiveContext = new HiveContext(JavaSparkContext.toSparkContext(sc))
Now start the thrift server
HiveThriftServer2.startWithContext(hiveContext)
// Yay
Next, we need to make our RDDs available as SQL tables. First, we have to convert them into Spark SQL DataFrames:
val someDF = hiveContext.createDataFrame(someRDD)
Then, we need to turn them into Spark SQL tables. You do this by persisting them to Hive, or making the RDD available as a temporary table.
Persist to Hive:
// Deprecated since Spark 1.4, to be removed in Spark 2.0:
someDF.saveAsTable("someTable")
// Up-to-date at time of writing
someDF.write.saveAsTable("someTable")
Or, use a temporary table:
// Use the Data Frame as a Temporary Table
// Introduced in Spark 1.3.0
someDF.registerTempTable("someTable")
Note - temporary tables are isolated to an SQL session.
Spark's hive thrift server is multi-session by default
in version 1.6 (one session per connection). Therefore,
for clients to access temporary tables you've registered,
you'll need to set the option spark.sql.hive.thriftServer.singleSession to true
You can test this by querying the tables in beeline, a command line utility for interacting with the hive thrift server. It ships with Spark.
Finally, you need a way of accessing the hive thrift server from the browser. Thanks to its awesome developers, it has an HTTP mode, so if you want to build a web app, you can use the thrift protocol over AJAX requests from the browser. A simpler strategy might be to create an IPython notebook, and use pyhive to connect to the thrift server.
Data Frame Reference:
https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrame.html
singleSession option pull request:
https://mail-archives.apache.org/mod_mbox/spark-commits/201511.mbox/%3Cc2bd1313f7ca4e618ec89badbd8f9f31#git.apache.org%3E
HTTP mode and beeline howto:
https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
Pyhive:
https://github.com/dropbox/PyHive
HiveThriftServer2 startWithContext definition:
https://github.com/apache/spark/blob/6b1a6180e7bd45b0a0ec47de9f7c7956543f4dfa/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L56-73
Thrift is a JDBC/ODBC server.
You can connect to it via JDBC/ODBC connections and access content through the HiveDriver.
You cannot get RDDs back from it, because a HiveContext is not available there.
What you referred to is an experimental feature that is not available for Java.
As a workaround, you could re-parse the results and build the structures you need for your client.
For example:
import java.sql.*;
public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String hiveConnectionString = "jdbc:hive2://YourHiveServer:Port";
    private static String tableName = "SOME_TABLE";
    public static void main(String[] args) throws Exception {
        Class.forName(driverName); // register the Hive JDBC driver
        Connection con = DriverManager.getConnection(hiveConnectionString, "user", "pwd");
        Statement stmt = con.createStatement();
        ResultSet res = stmt.executeQuery("select * from " + tableName);
        parseResultsToObjects(res); // map the rows into your client's own structures
    }
    private static void parseResultsToObjects(ResultSet res) { /* left to the client */ }
}

Clustered Index Entity Framework

When I create the Azure database with the Entity Framework Model First pattern, the creation works. But when I then want to save to the database I get the following error:
"Tables without a clustered index are not supported in this version of SQL Server. Please create a clustered index and try again."
I updated Entity Framework to version 6.1.2 but still get the same error. Do you have any idea?
