How to define local connection to Spark Thrift in Power BI

How to define local connection to Spark Thrift in Power BI - apache-spark

I am trying to configure the local connection to the Spark Thrift in Power BI. I am able to connect using Spark ODBC (localhost:10000 with mechanism User Name and Thrift transport SASL). But I would like to use Spark connector as it supports Direct Query.
I couldn't find how to define the connection string. Tried several things like localhost:10000/default/;transportMode=http;ssl=true;user=... but always get the error
ERROR TThreadPoolServer:297 - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status 80
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status 80
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
Any hints would be appreciated!

Solved. As written here https://community.powerbi.com/t5/Desktop/Connect-Power-BI-to-Hadoop-Direct-query-HDFS-vs-Spark-vs-custom/td-p/374625
it just doesn't work in Power BI from Microsoft Store. It works in the app from the website.

Related

Caused by: javax.mail.AuthenticationFailedException: failed to connect, no password specified? Spring Integration Imap adapter with Oauth2

I have tried below code.
#Bean
public IntegrationFlow mailListener() {
return IntegrationFlows.from(Mail.imapInboundAdapter("imap://[emailId]:[accesstoken]#imap.gmail.com/INBOX")
.shouldMarkMessagesAsRead(true).javaMailProperties(p -> p
.put("mail.imaps.ssl.enable", "true")
.put("mail.imaps.sasl.enable", "true")
.put("mail.imaps.sasl.mechanisms", "XOAUTH2")
.put("mail.imaps.auth.login.disable", "true")
.put("mail.imaps.auth.plain.disable", "true")
.put("mail.debug.auth", "true")
).shouldDeleteMessages(false).get(),
e -> e.poller(Pollers.fixedRate(5000).maxMessagesPerPoll(1)))
.<Message>handle((payload, header) -> logMail(payload))
.get();
}
I have referred Is Spring Integration Mail IMAP support OAuth2?
I get below exception
15:46:31.870 ERROR [task-scheduler-1][org.springframework.integration.handler.LoggingHandler] org.springframework.messaging.MessagingException: failure occurred while polling for mail; nested exception is javax.mail.AuthenticationFailedException: failed to connect, no password specified?
at org.springframework.integration.mail.MailReceivingMessageSource.doReceive(MailReceivingMessageSource.java:74)
at org.springframework.integration.endpoint.AbstractMessageSource.receive(AbstractMessageSource.java:184)
at org.springframework.integration.endpoint.SourcePollingChannelAdapter.receiveMessage(SourcePollingChannelAdapter.java:212)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:407)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.pollForMessage(AbstractPollingEndpoint.java:376)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$null$3(AbstractPollingEndpoint.java:323)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.lambda$execute$0(ErrorHandlingTaskExecutor.java:57)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.execute(ErrorHandlingTaskExecutor.java:55)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$createPoller$4(AbstractPollingEndpoint.java:320)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

First of all you don't show the whole stack trace. I believe there is some cause by for that MailReceivingMessageSource.doReceive(). Secondly it is not clear what is your accesstoken for this configuration since we can get it only at runtime when we call an OAuth REST endpoint for such a token. And only after that you can configure an IntegraitonFlow dynamically for already retrieved token. And finally there should be some DEBUG logs for Java Mail according your "mail.debug.auth", "true" config. Plus the mentioned SO question is marked as solved. So, you indeed do something what doesn't fit to the required model or is fully not related to the subject.

Try to check gmail settings if you have enabled the allow of conection for less secure applications. You should put it off to connect to gmail from outside

Kafka Sink Connector for Cassandra failed

I'm creating Kafka Sink Conector for Cassandra via Lenses. My configuration is:
connector.class=com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector
connect.cassandra.key.space=space1
connect.cassandra.contact.points=cassandra1
tasks.max=1
topics=demo-1206-enriched-clicks-v0.1
connect.cassandra.port=9042
connect.cassandra.kcql=INSERT INTO space1.CLicks_Test SELECT ClicksId from demo-1206-enriched-clicks-v0.1
name=test_cassandra
but, I'm getting this error:
org.apache.kafka.common.config.ConfigException: Mandatory `topics` configuration contains topics not set in connect.cassandra.kcql: Set(demo-1206-enriched-clicks-v0.1)
at com.datamountaineer.streamreactor.connect.config.Helpers$.checkInputTopics(Helpers.scala:107)
at com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector.start(CassandraSinkConnector.scala:65)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:100)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:125)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:182)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:210)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:872)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.processConnectorConfigUpdates(DistributedHerder.java:324)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:296)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:199)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Any ideas why?

Jelena, as discussed already the problem is with Kafka Connect framework not being able to parse .1 in topics=demo-1206-enriched-clicks-v0.1. A bug should be reported to Kafka GitHub.
#Alex what you see there is Not KSQL. The configuration SQL we use for all our connectors is KCQL(Kafka connect query language).
Furthermore we have our equivalent SQL for Apache Kafka.called LSQL: http://www.landoop.com/docs/lenses/

This was discussed at https://launchpass.com/datamountaineers and KCQL and the Connector was verified that properly supports the . character in the KCQL configuration. The root cause, was identified to be an issue upstream in Kafka Connect framework

Spark on Amazon EMR: "Timeout waiting for connection from pool"

I'm running a Spark job on a small three server Amazon EMR 5 (Spark 2.0) cluster. My job runs for an hour or so, fails with the error below. I can manually restart and it works, processes more data, and eventually fails again.
My Spark code is fairly simple and is not using any Amazon or S3 APIs directly. My Spark code passes S3 text string paths to Spark and Spark uses S3 internally.
My Spark program just does the following in a loop: Load data from S3 -> Process -> Write data to different location on S3.
My first suspicion is that some internal Amazon or Spark code is not properly disposing of connections and the connection pool becomes exhausted.
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:618)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1015)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:991)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:212)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy44.retrieveMetadata(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:780)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1428)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:313)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:85)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.$Proxy45.getConnection(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:423)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
... 41 more

I encountered this issue with a very trivial program on EMR (read data from S3, filter, write to S3).
I could solve it by using the S3A file system implementation and setting fs.s3a.connection.maximum to 100 to have a bigger connection pool.
(default is 15; see Hadoop-AWS module: Integration with Amazon Web Services for more config properties)
This is how I set the configuration:
// in Scala
val hc = sc.hadoopConfiguration
// in Python (not tested)
hc = sc._jsc.hadoopConfiguration()
// setting the config is the same for both languages
hc.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hc.setInt("fs.s3a.connection.maximum", 100)
To make it work, the S3 URIs passed to Spark have to start with s3a://...

This issue may also be resolved while remaining on EMRFS by setting fs.s3.maxConnections to something larger than the default 500 in emrfs-site config
https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/
If using Java SDK to spin up an EMR cluster you can set this using the withConfigurations method (which is much easier than doing it manually by modifying files). See also https://stackoverflow.com/a/52595058/1586965
You can check this has been set correctly by using the Configurations tab in EMR, e.g.

All the TCP connection b/w DataStax driver to the Cassandra Remain in Active close state . i.e TIME_WAIT state.

The setup:
Web server
Apache Tomcat
RestFull web services
Using DataStax java driver 2.0
Database
-2-node Cassandra 2.0.7.31 cluster
-replicas=1
Problem
After sending set of 1500 request more than three times. I got error at the tomcat log
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.181.13.239 ([/10.181.13.239] Unexpected exception triggered))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
at com.jpmc.es.rtm.storage.impl.EventExtract.main(EventExtract.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.181.13.239 ([/10.181.13.239] Unexpected exception triggered))
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:98)
at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:165)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
Observation
After this state of tomcat. All the further request attaining the same fate. That is drivers are not able to send my insert request to cassandra.
After executing net stat command i find that the all the TCP connection b/w web server and the Cassandra are in TIMED_WAIT state.
What could be the reason ? why Datastax driver is not able to take back the connection back to the pool? or why does the Cassandra is engaging all the connection form its client.
Thanks in Advance

The connection was increasing Due to calling creating multiple session for each request. Now it is working Fine.
builder = new Cluster.Builder().
addContactPoints("192.168.114.42");
builder.withPoolingOptions(new PoolingOptions().setCoreConnectionsPerHost(
HostDistance.LOCAL, new PoolingOptions().getMaxConnectionsPerHost(HostDistance.LOCAL)));
cluster = builder
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
.withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
.build();
session = cluster.connect("demodb");
Now Driver is maintain 17-26 number of connection irrespective of number of transaction.

JBoss - configuring server.xml Connector

I've configured HTTP Connector in server.xml adding some ssl features. I tryied to set my keyAlias to which is the name of the alias for certain certificate (not the private key of the keystore). Then, when I start JBoss I get something like:
[2012-04-12 17:01:37,236 ERROR [org.apache.coyote.http11.Http11Protocol] Error
initializing endpoint
java.io.IOException: Alias name <somealias> do not indetify a key entry
I'm new to ssl configuration and web security core concepts as well. Thanks for your patience.
Edit: complete stacktrace follows:
at org.apache.tomcat.util.net.jsse.JSSESocketFactory.getKeyManagers(JSSESocketFactory.java:412)
at org.apache.tomcat.util.net.jsse.JSSESocketFactory.init(JSSESocketFactory.java:378)
at org.apache.tomcat.util.net.jsse.JSSESocketFactory.createSocket(JSSESocketFactory.java:135)
at org.apache.tomcat.util.net.JIoEndpoint.init(JIoEndpoint.java:497)
at org.apache.tomcat.util.net.JIoEndpoint.start(JIoEndpoint.java:514)
at org.apache.coyote.http11.Http11Protocol.start(Http11Protocol.java:203)
at org.apache.catalina.connector.Connector.start(Connector.java:1146)
at org.jboss.web.tomcat.service.JBossWeb.startConnectors(JBossWeb.java:601)
at org.jboss.web.tomcat.service.JBossWeb.handleNotification(JBossWeb.java:638)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.mx.notification.NotificationListenerProxy.invoke(NotificationListenerProxy.java:153)
at $Proxy46.handleNotification(Unknown Source)
at org.jboss.mx.util.JBossNotificationBroadcasterSupport.handleNotification(JBossNotificationBroadcasterSupport.java:127)
at org.jboss.mx.util.JBossNotificationBroadcasterSupport.sendNotification(JBossNotificationBroadcasterSupport.java:108)
at org.jboss.system.server.ServerImpl.sendNotification(ServerImpl.java:916)
at org.jboss.system.server.ServerImpl.doStart(ServerImpl.java:497)
at org.jboss.system.server.ServerImpl.start(ServerImpl.java:362)
at org.jboss.Main.boot(Main.java:200)
at org.jboss.Main$1.run(Main.java:508)
at java.lang.Thread.run(Thread.java:662)

It looks like you are not importing your keys properties. I'd recommend you review your steps against these two documents
http://docs.jboss.org/jbossweb/3.0.x/ssl-howto.html
A shorter version is here
http://www.agentbob.info/agentbob/79-AB.html

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to define local connection to Spark Thrift in Power BI - apache-spark

Solved. As written here https://community.powerbi.com/t5/Desktop/Connect-Power-BI-to-Hadoop-Direct-query-HDFS-vs-Spark-vs-custom/td-p/374625 it just doesn't work in Power BI from Microsoft Store. It works in the app from the website.

Related

Caused by: javax.mail.AuthenticationFailedException: failed to connect, no password specified? Spring Integration Imap adapter with Oauth2

Kafka Sink Connector for Cassandra failed

Spark on Amazon EMR: "Timeout waiting for connection from pool"

All the TCP connection b/w DataStax driver to the Cassandra Remain in Active close state . i.e TIME_WAIT state.

JBoss - configuring server.xml Connector

Categories

Resources