Spark is unable to connect to MySQL when deployed on GKE

I am deploying a batch Spark job on Kubernetes on GKE.
The job tries to fetch some data from MySQL (Google Cloud SQL) but fails with a communications link failure.
I tried connecting to MySQL manually by installing the mysql client inside the pod, and it connected fine.
Is there anything additional I need to configure?
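For context, the data is read with Spark's standard JDBC data source. Below is a minimal sketch of that kind of read in Java; the host, database, table and credentials are placeholders, not taken from the real job:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MySqlReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("mysql-read-sketch").getOrCreate();

        // Placeholder host/database/table/credentials; from GKE the Cloud SQL instance is
        // typically reached via its private IP (or through the Cloud SQL Auth proxy).
        Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://10.0.0.5:3306/mydb")
                .option("driver", "com.mysql.cj.jdbc.Driver")
                .option("dbtable", "some_table")
                .option("user", "spark")
                .option("password", "secret")
                .load();

        df.show();
        spark.stop();
    }
}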
Exception:
Exception in thread "main" com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:590)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:57)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:1606)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:633)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:347)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:219)
at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)

The issue was actually with the firewall rules in GCP. It is working fine now.

Related

On Kubernetes my Spark worker pod is trying to access the Thrift pod by name

Okay. Where to start? I am deploying a set of Spark applications to a Kubernetes cluster. I have one Spark Master, 2 Spark Workers, MariaDB, a Hive Metastore (which uses MariaDB - it's not a full Hive install, just the Metastore), and a Spark Thrift Server (which talks to the Hive Metastore and implements the Hive API).
This setup is working pretty well for everything except the Thrift Server job (start-thriftserver.sh in the Spark sbin directory on the thrift server pod). By working well I mean that from outside my cluster I can create Spark jobs, submit them to the master, and then use the Web UI to see my test app run to completion utilizing both workers.
Now the problem. When you launch start-thriftserver.sh it submits a job to the cluster with itself as the driver (I believe - which is correct behavior). When I look at the related Spark job via the Web UI I see it has workers that repeatedly get launched and then exit shortly thereafter. When I look at the workers' stderr logs I see that every worker launches and tries to connect back to the thrift server pod at spark.driver.port. This is correct behavior, I believe. The gotcha is that the connection fails with an unknown host exception: the worker uses the raw Kubernetes pod name of the thrift server pod (not a service name, and with no IP in the name) and reports that it can't find the thrift server that initiated the connection. Kubernetes DNS registers service names, and pod names only in the form prefixed with their private IP. In other words, the raw pod name (without an IP) is never registered with DNS. That is not how Kubernetes works.
So my question: I am struggling to figure out why the Spark worker pod is using a raw pod name to try to find the thrift server. It seems it should never do this, and it should be impossible to ever satisfy that request. I have wondered if there is some Spark config setting that would tell the workers that the (Thrift) driver they need to be looking for is actually spark-thriftserver.my-namespace.svc, but despite much searching I can't find anything.
There are so many settings that go into a cluster like this that I don't want to barrage you with info. One thing that might clarify my setup: the following string is dumped at the top of the log of a worker that fails. Notice the raw pod name of the thrift server in driver-url. If anyone has any clue what steps to take to fix this, please let me know. I'll edit this post and share settings etc. as people request them. Thanks for helping.
Spark Executor Command: "/usr/lib/jvm/java-1.8-openjdk/jre/bin/java" "-cp" "/spark/conf/:/spark/jars/*" "-Xmx512M" "-Dspark.master.port=7077" "-Dspark.history.ui.port=18081" "-Dspark.ui.port=4040" "-Dspark.driver.port=41617" "-Dspark.blockManager.port=41618" "-Dspark.master.rest.port=6066" "-Dspark.master.ui.port=8080" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler#spark-thriftserver-6bbb54768b-j8hz8:41617" "--executor-id" "12" "--hostname" "172.17.0.6" "--cores" "1" "--app-id" "app-20220408001035-0000" "--worker-url" "spark://Worker#172.17.0.6:37369"

ConfiguredGraphFactory.open() on JanusGraph returned Cassandra DriverTimeoutException

I am new to JanusGraph. We have JanusGraph set up with Cassandra as the backend.
We are using ConfiguredGraphFactory to dynamically create graphs at runtime, but when trying to open a created graph using ConfiguredGraphFactory.open("graphName") we get the error below:
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S
at com.datastax.oss.driver.api.core.DriverTimeoutException.copy(DriverTimeoutException.java:34)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
We are using a single Cassandra node, not a cluster. When we are not using ConfiguredGraphFactory we are able to connect to Cassandra, so it is not a network or wrong-port issue.
Any help would be appreciated.
JanusGraph uses the Java driver to connect to Cassandra. This error comes from the driver and indicates that the nodes didn't respond:
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S
The DriverTimeoutException is different from a read or write timeout. It is thrown when a request from the driver times out because it didn't get a response from the Cassandra nodes within 2 seconds (PT2S).
You'll need to check if there's a network route between the JanusGraph server and the Cassandra nodes. One thing to check for is that a firewall is not blocking access to the CQL client port on the C* nodes (default is 9042). Cheers!
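If the route and port look fine, one way to rule out the application side is to connect with the DataStax Java driver directly and raise the request timeout above the 2-second default. A minimal sketch, assuming driver 4.x, a placeholder contact point, and the datacenter name datacenter1:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

import java.net.InetSocketAddress;
import java.time.Duration;

public class CassandraConnectivityCheck {
    public static void main(String[] args) {
        // Raise the request timeout from the default PT2S to 10 seconds.
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(10))
                .build();

        // Placeholder host and datacenter name; 9042 is the default CQL client port.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("cassandra-host", 9042))
                .withLocalDatacenter("datacenter1")
                .withConfigLoader(loader)
                .build()) {
            String version = session.execute("SELECT release_version FROM system.local")
                    .one().getString("release_version");
            System.out.println("Connected, Cassandra version: " + version);
        }
    }
}

If this simple check also times out, the problem is between the JanusGraph host and Cassandra rather than in JanusGraph itself.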

Apache Pulsar gets timeouts in an unpredictable way

I installed Apache Pulsar standalone. Pulsar sometimes gets timeouts. It's not related to high throughput nor to a particular topic (see the following log). pulsar-admin brokers healthcheck also returns either OK or a timeout. How can I investigate this?
10:46:46.365 [pulsar-ordered-OrderedExecutor-7-0] WARN org.apache.pulsar.broker.service.BrokerService - Got exception when reading persistence policy for persistent://nnx/agent_ns/action_up-53da8177-b4b9-4b92-8f75-efe94dc2309d: null
java.util.concurrent.TimeoutException: null
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_232]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_232]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.5.0.jar:2.5.0]
at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$32(BrokerService.java:922) ~[org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0]
at org.apache.bookkeeper.mledger.util.SafeRun$2.safeRun(SafeRun.java:49) [org.apache.pulsar-managed-ledger-2.5.0.jar:2.5.0]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
I am glad you were able to resolve the issue by adding more cores. The issue was a connection timeout while trying to access some topic metadata that is stored inside of ZooKeeper, as indicated by the following line in the stack trace:
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.5.0.jar:2.5.0]
Increasing the cores must have freed up enough threads to allow the ZK node to respond to this request.
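For anyone who wants to probe the broker from code rather than the pulsar-admin CLI, the same healthcheck mentioned in the question can be invoked through the Java admin client. A minimal sketch, assuming the standalone broker's admin REST endpoint is on the default http://localhost:8080:

import org.apache.pulsar.client.admin.PulsarAdmin;

public class BrokerHealthProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the standalone broker exposes its admin endpoint on the default port 8080.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Throws PulsarAdminException if the broker fails to respond in time.
            admin.brokers().healthcheck();
            System.out.println("Broker healthcheck OK");
        }
    }
}

Running this in a loop can help correlate the intermittent timeouts with broker or ZooKeeper load.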
Check the connection to the server; it looks like a connection issue. If you are using a TLS certificate file path, check that you have the right certificate.
The problem is that there aren't a lot of solutions on the internet for Apache Pulsar, but following the Apache Pulsar documentation might help, and there are also the Apache Pulsar GitHub repository and sample projects.

Connection to MirthDB in Azure

I am running Mirth 3.7.1 on a VM within Azure. The Mirth database is on a SQL Server managed instance within the same Azure subscription. I have several channels which consume ADT/ORM messages that seem to be working as expected; however, I also have a File Reader channel which reads PDF files from disk and sends them as MDM messages. This channel is intermittently erroring (see stack traces below) in what appears to be its connection to the Mirth DB. I am assuming this is because it is attempting to save out the larger file data as it moves through the steps in the channel, since the ADT/ORM channels are not having the same issue. We had this same channel running in a traditional environment and did not see this problem. Any thoughts on how to resolve this issue?
Also, I have alerts configured to send email when an error occurs. I am receiving these when the error is within the channel, but I am not being notified of these internal Mirth errors. Is there any way that I can be notified?
Mike
com.mirth.connect.donkey.server.channel.ChannelException: com.mirth.connect.donkey.server.data.DonkeyDaoException: java.sql.SQLException: I/O Error: Connection reset
at com.mirth.connect.donkey.server.channel.Channel.dispatchRawMessage(Channel.java:1213)
at com.mirth.connect.donkey.server.channel.SourceConnector.dispatchRawMessage(SourceConnector.java:192)
at com.mirth.connect.donkey.server.channel.SourceConnector.dispatchRawMessage(SourceConnector.java:170)
at com.mirth.connect.connectors.file.FileReceiver.processFile(FileReceiver.java:354)
at com.mirth.connect.connectors.file.FileReceiver.processFiles(FileReceiver.java:247)
at com.mirth.connect.connectors.file.FileReceiver.poll(FileReceiver.java:203)
at com.mirth.connect.donkey.server.channel.PollConnectorJob.execute(PollConnectorJob.java:49)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: com.mirth.connect.donkey.server.data.DonkeyDaoException: java.sql.SQLException: I/O Error: Connection reset
at com.mirth.connect.donkey.server.data.jdbc.JdbcDao.insertContent(JdbcDao.java:274)
at com.mirth.connect.donkey.server.data.jdbc.JdbcDao.insertMessageContent(JdbcDao.java:193)
at com.mirth.connect.donkey.server.data.buffered.BufferedDao.executeTasks(BufferedDao.java:110)
at com.mirth.connect.donkey.server.data.buffered.BufferedDao.commit(BufferedDao.java:85)
at com.mirth.connect.donkey.server.data.buffered.BufferedDao.commit(BufferedDao.java:72)
at com.mirth.connect.donkey.server.channel.Channel.dispatchRawMessage(Channel.java:1185)
... 8 more
Caused by: java.sql.SQLException: I/O Error: Connection reset
at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1093)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQL(JtdsStatement.java:563)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeUpdate(JtdsPreparedStatement.java:727)
at com.mirth.connect.donkey.server.data.jdbc.JdbcDao.insertContent(JdbcDao.java:271)
... 13 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readFully(Unknown Source)
at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:99)
at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127)
at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086)
... 16 more
Azure Monitor has capabilities to monitor Azure VM health, but only a limited set of perf counters is included for Windows VMs and Linux VMs.
However, using the Azure Monitor for VMs (preview) Map feature to understand application components, you can create application maps that allow you to monitor specific aspects of an application environment and trigger alerts. For example, you can set up a map for failed connections, covering both processes and connections.

How to connect Tableau Desktop to Spark SQL 2.0 through Spark Thrift Server?

I am trying to connect to Spark SQL (Spark 2.0.0) from Tableau Desktop 10.1.1 on OS X. I have the Simba Spark ODBC driver already installed, and the Spark Thrift Server is up and running. I am able to use beeline to connect and verify the Thrift Server.
However, when I configure Tableau using the Spark SQL connector, it does not connect. After some time, the query times out. When I checked the Thrift Server logs, I see the following message.
16/11/17 17:01:26 ERROR TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status -128
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status -128
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
I tried with Spark 1.6.1 and the outcome is the same. Does anyone have Tableau working with a similar setup? If so, what am I missing here?
When connecting to Spark SQL, choose "Username" authentication instead of "No Authentication". You can leave the username blank.
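For reference, the same point can be exercised outside Tableau with a plain Hive JDBC connection to the Thrift Server. A minimal sketch, assuming the default port 10000 and a hypothetical hostname; passing a username makes the client perform the SASL handshake the server expects, and per the answer above the value itself can even be blank:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerSmokeTest {
    public static void main(String[] args) throws Exception {
        // Hypothetical host; 10000 is the default Spark Thrift Server / HiveServer2 port.
        String url = "jdbc:hive2://thrift-server-host:10000/default";

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // The username triggers the SASL-based handshake; its value is not checked here.
        try (Connection conn = DriverManager.getConnection(url, "tableau", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}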
