After running "nodetool repair" command cassandra node gone down and did not start again.
INFO [main] 2016-10-19 12:44:50,244 ColumnFamilyStore.java:405 - Initializing system_schema.aggregates
INFO [main] 2016-10-19 12:44:50,247 ColumnFamilyStore.java:405 - Initializing system_schema.indexes
INFO [main] 2016-10-19 12:44:50,248 ViewManager.java:139 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
Cassandra version 3.7
Turned on the node and it's fine. It took too long to start (more than 30 minutes).
INFO [main] 2016-10-19 15:32:48,348 ColumnFamilyStore.java:405 - Initializing system_schema.indexes
INFO [main] 2016-10-19 15:32:48,354 ViewManager.java:139 - Not submitting build tasks for views in keyspace system_schema as storage service is not initialized
INFO [main] 2016-10-19 16:07:36,529 ColumnFamilyStore.java:405 - Initializing system_distributed.parent_repair_history
INFO [main] 2016-10-19 16:07:36,546 ColumnFamilyStore.java:405 - Initializing system_distributed.repair_history
Now I'm trying to figure out why it is so slow.
Related
I am trying to deploy a Flink job in Kubernetes cluster (Azure AKS). The Job Cluster is getting aborted just after starting but Task manager is running fine.
The docker image is created successfully without any exception. I am able to run the docker image as well as able to SSH to docker image.
I have followed steps mentioned in the below link:
https://github.com/apache/flink/tree/release-1.9/flink-container/kubernetes
While creating image I have provided Job jar and it has been copied on "/opt/artifacts" inside the image. But still not getting why getting below exception in Job Cluster pod log.
Caused by: org.apache.flink.util.FlinkException: Failed to find job JAR on class path. Please provide the job class name explicitly.
I am new in Kubernetes, Could you please give me some clue to debug this issue.
Please find below complete logs:
A. flink-job-cluster Pod Log
develk#ACIDLAELKV01:~/cntx_eng$ kubectl logs flink-job-cluster-kszwf
Starting the job-cluster
Starting standalonejob as a console application on host flink-job-cluster-kszwf.
2019-12-12 10:37:17,170 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-12-12 10:37:17,172 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneJobClusterEntryPoint (Version: 1.8.0, Rev:4caec0d, Date:03.04.2019 # 13:25:54 PDT)
2019-12-12 10:37:17,172 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: flink
2019-12-12 10:37:17,173 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-12-12 10:37:17,173 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - IcedTea - 1.8/25.212-b04
2019-12-12 10:37:17,173 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 989 MiBytes
2019-12-12 10:37:17,173 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
2019-12-12 10:37:17,174 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-12-12 10:37:17,174 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-12-12 10:37:17,174 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2019-12-12 10:37:17,174 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2019-12-12 10:37:17,174 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink-1.8.0/conf/log4j-console.properties
2019-12-12 10:37:17,175 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink-1.8.0/conf/logback-console.xml
2019-12-12 10:37:17,175 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2019-12-12 10:37:17,175 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2019-12-12 10:37:17,175 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink-1.8.0/conf
2019-12-12 10:37:17,175 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.rpc.address=flink-job-cluster
2019-12-12 10:37:17,175 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dparallelism.default=1
2019-12-12 10:37:17,176 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dblob.server.port=6124
2019-12-12 10:37:17,176 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dqueryable-state.server.ports=6125
2019-12-12 10:37:17,176 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink-1.8.0/lib/log4j-1.2.17.jar:/opt/flink-1.8.0/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.8.0/lib/flink-dist_2.11-1.8.0.jar:::
2019-12-12 10:37:17,176 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-12-12 10:37:17,178 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-12-12 10:37:17,306 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2019-12-12 10:37:17,306 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-12-12 10:37:17,307 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-12-12 10:37:17,307 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-12-12 10:37:17,307 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-12-12 10:37:17,307 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-12-12 10:37:17,336 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneJobClusterEntryPoint.
2019-12-12 10:37:17,336 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2019-12-12 10:37:17,343 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-12-12 10:37:17,352 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install security context.
2019-12-12 10:37:17,362 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-12-12 10:37:17,381 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-12-12 10:37:17,382 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2019-12-12 10:37:17,638 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at flink-job-cluster:6123
2019-12-12 10:37:18,163 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-12-12 10:37:18,237 INFO akka.remote.Remoting - Starting remoting
2019-12-12 10:37:18,366 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink#flink-job-cluster:6123]
2019-12-12 10:37:18,375 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink#flink-job-cluster:6123
2019-12-12 10:37:18,398 INFO org.apache.flink.configuration.Configuration - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2019-12-12 10:37:18,407 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-63338044-67c1-4872-a3d9-c94563b3a7c3
2019-12-12 10:37:18,412 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2019-12-12 10:37:18,428 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2019-12-12 10:37:18,430 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at flink-job-cluster:0
2019-12-12 10:37:18,464 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-12-12 10:37:18,472 INFO akka.remote.Remoting - Starting remoting
2019-12-12 10:37:18,480 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics#flink-job-cluster:33529]
2019-12-12 10:37:18,482 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink-metrics#flink-job-cluster:33529
2019-12-12 10:37:18,490 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /tmp/blobStore-ba64dcdb-5095-41fc-9c98-0f1528d95c40
2019-12-12 10:37:18,514 INFO org.apache.flink.configuration.Configuration - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2019-12-12 10:37:18,515 WARN org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Upload directory /tmp/flink-web-f6be0c2d-5099-4bd6-bc72-a0ae1fc6448e/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-12-12 10:37:18,516 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Created directory /tmp/flink-web-f6be0c2d-5099-4bd6-bc72-a0ae1fc6448e/flink-web-upload for file uploads.
2019-12-12 10:37:18,603 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Starting rest endpoint.
2019-12-12 10:37:18,872 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2019-12-12 10:37:18,872 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (fallback keys: [{key=jobmanager.web.log.path, isDeprecated=true}])'.
2019-12-12 10:37:19,115 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Rest endpoint listening at flink-job-cluster:8081
2019-12-12 10:37:19,116 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - http://flink-job-cluster:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-12-12 10:37:19,116 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Web frontend listening at http://flink-job-cluster:8081.
2019-12-12 10:37:19,239 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-12-12 10:37:19,262 INFO org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever - Scanning class path for job JAR
2019-12-12 10:37:19,270 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Shutting down rest endpoint.
2019-12-12 10:37:19,295 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Removing cache directory /tmp/flink-web-f6be0c2d-5099-4bd6-bc72-a0ae1fc6448e/flink-web-ui
2019-12-12 10:37:19,299 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - http://flink-job-cluster:8081 lost leadership
2019-12-12 10:37:19,299 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Shut down complete.
2019-12-12 10:37:19,302 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting StandaloneJobClusterEntryPoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Failed to find job JAR on class path. Please provide the job class name explicitly.
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:131)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:114)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
... 6 more
Caused by: java.util.NoSuchElementException: No JAR with manifest attribute for entry class
at org.apache.flink.container.entrypoint.JarManifestParser.findOnlyEntryClass(JarManifestParser.java:80)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.scanClassPathForJobJar(ClassPathJobGraphRetriever.java:137)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:129)
... 11 more
.
2019-12-12 10:37:19,305 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:6124
2019-12-12 10:37:19,305 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2019-12-12 10:37:19,315 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2019-12-12 10:37:19,320 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2019-12-12 10:37:19,321 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2019-12-12 10:37:19,323 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-12 10:37:19,325 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-12 10:37:19,354 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2019-12-12 10:37:19,356 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2019-12-12 10:37:19,378 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2019-12-12 10:37:19,382 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint StandaloneJobClusterEntryPoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneJobClusterEntryPoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
... 2 more
Caused by: org.apache.flink.util.FlinkException: Failed to find job JAR on class path. Please provide the job class name explicitly.
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:131)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:114)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
... 6 more
Caused by: java.util.NoSuchElementException: No JAR with manifest attribute for entry class
at org.apache.flink.container.entrypoint.JarManifestParser.findOnlyEntryClass(JarManifestParser.java:80)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.scanClassPathForJobJar(ClassPathJobGraphRetriever.java:137)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:129)
... 11 more
develk#ACIDLAELKV01:~/cntx_eng$
Now, I have added Job class name as in argument section of "job-cluster-job.yaml.template" file.
Like below:
args: ["job-cluster",
"--job-classname", "com.flink.wordCountSimple",
"-Djobmanager.rpc.address=flink-job-cluster",
But after that I am getting below exception:
Caused by: org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
Please see below detail log.
2019-12-13 19:08:34,323 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Shut down complete.
2019-12-13 19:08:34,329 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting StandaloneJobClusterEntryPoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:119)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
... 6 more
Caused by: java.lang.ClassNotFoundException: com.flink.wordCountSimple
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:116)
... 10 more
.
2019-12-13 19:08:34,337 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:6124
2019-12-13 19:08:34,338 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2019-12-13 19:08:34,364 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2019-12-13 19:08:34,368 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2019-12-13 19:08:34,372 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-13 19:08:34,392 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2019-12-13 19:08:34,392 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-13 19:08:34,406 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2019-12-13 19:08:34,410 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2019-12-13 19:08:34,434 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2019-12-13 19:08:34,443 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint StandaloneJobClusterEntryPoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneJobClusterEntryPoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
... 2 more
Caused by: org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:119)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
... 6 more
Caused by: java.lang.ClassNotFoundException: com.flink.wordCountSimple
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:116)
... 10 more
There's a complete, working example of creating and running a flink job cluster on kubernetes in https://github.com/alpinegizmo/flink-containers-example. Maybe that will help. See also https://www.youtube.com/watch?v=ceZtUDgh2TE.
version: "2.1"
services:
jobmanager:
build:
context: ./
args:
JAR_FILE: flink-event-tracker-bundled-1.6.0.jar
image: test/flink-event-tracker
expose:
- "6123"
ports:
- "8081:8081"
- "6123:6123"
command: job-cluster --job-classname com.company.test.flink.pipelines.KafkaPipelineConsumer -Djobmanager.rpc.address=jobmanager --runner=FlinkRunner --streaming=true --checkpointingInterval=30000
environment:
- JOB_MANAGER_RPC_ADDRESS=jobmanager
- JOB_MANAGER=jobmanager
volumes:
- data-volume:/docker/volumes
taskmanager:
image: test/flink-event-tracker
expose:
- "6121"
- "6122"
depends_on:
- jobmanager
command: task-manager -Djobmanager.rpc.address=jobmanager
links:
- "jobmanager:jobmanager"
environment:
- JOB_MANAGER_RPC_ADDRESS=jobmanager
- JOB_MANAGER=jobmanager
volumes:
- data-volume:/docker/volumes
volumes:
data-volume:
driver: local
driver_opts:
o: bind
type: none
device: /Users/home/Development/docker/volumes/flink
Docker file
FROM flink:1.9
ARG JAR_FILE=""
ENV APP_OPTS ""
ENV JAVA_OPTS ""
ENV JOB_MANAGER=""
# Build arg allows passing the version at runtime
ARG VERSION=unset-version
COPY flink-conf.yml $FLINK_HOME/conf/flink-conf.yaml
COPY target/$JAR_FILE $FLINK_HOME/lib/event-tracker.jar
COPY docker-cluster-entrypoint.sh /docker-cluster-entrypoint.sh
RUN apt-get update && apt-get install procps -y && apt-get install curl -y
RUN echo "root:root" | chpasswd
RUN chmod 777 /docker-cluster-entrypoint.sh
RUN chmod 777 $FLINK_HOME/lib/event-tracker.jar
ENTRYPOINT [ "bash","/docker-cluster-entrypoint.sh" ]
docker-cluster-entrypoint.sh
FLINK_HOME=${FLINK_HOME:-"/opt/flink/bin"}
JOB_CLUSTER="job-cluster"
TASK_MANAGER="task-manager"
CMD="$1"
shift;
if [ "${CMD}" = "--help" -o "${CMD}" = "-h" ]; then
echo "Usage: $(basename $0) (${JOB_CLUSTER}|${TASK_MANAGER})"
exit 0
elif [ "${CMD}" = "${JOB_CLUSTER}" -o "${CMD}" = "${TASK_MANAGER}" ]; then
echo "Starting the ${CMD}"
if [ "${CMD}" = "${TASK_MANAGER}" ]; then
exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$#"
else
exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$#"
fi
fi
How to run:-
mvn clean install
docker-compose -f docker-compose.local.yml up --scale taskmanager=2 > exceptionlog.log
docker-compose -f docker-compose.local.yml build
this is the entire conf that runs your docker. but if you want to run in kube, just convert the docker-compose file to its corresponding kube files...remaining can stay same.. may be do a helm that way kube maintenance is better.
Note:- we are using apache beam to code the job
nodetool status -- shows some nodes as down and also application is not able to reach the cassandra nodes. When I connect to cassandra nodes and check the logs there are some errors in the logs (system.log and debug.log).
Didn't understand how to check if when did Cassandra got started / re-started and got shutdown. Is there any way to check this from logs ? if so which log and how ?
In the logs folder of your Cassandra installation you should find system.log file. The default location for the log files is /var/log/cassandra.
When Cassandra starts the output looks like this:
INFO [main] 2018-08-18 19:10:27,162 YamlConfigurationLoader.java:89 - Configuration location: file:/home/20171127/.ccm/3113/node1/conf/cassandra.yaml
INFO [main] 2018-08-18 19:10:27,696 Config.java:495 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator;INFO [main] 2018-08-18 19:10:27,698 DatabaseDescriptor.java:367 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2018-08-18 19:10:27,699 DatabaseDescriptor.java:425 - Global memtable on-heap threshold is enabled at 123MB
INFO [main] 2018-08-18 19:10:27,700 DatabaseDescriptor.java:429 - Global memtable off-heap threshold is enabled at 123MB
WARN [main] 2018-08-18 19:10:27,974 DatabaseDescriptor.java:550 - Only 30.922GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
INFO [main] 2018-08-18 19:10:28,025 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 2000.
INFO [main] 2018-08-18 19:10:28,026 DatabaseDescriptor.java:729 - Back-pressure is disabled with strategy org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}.
INFO [main] 2018-08-18 19:10:28,265 JMXServerUtils.java:246 - Configured JMX server at: service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7100/jmxrmi
INFO [main] 2018-08-18 19:10:28,282 CassandraDaemon.java:473 - Hostname: 2VVg5
INFO [main] 2018-08-18 19:10:28,284 CassandraDaemon.java:480 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_181
INFO [main] 2018-08-18 19:10:28,285 CassandraDaemon.java:481 - Heap size: 495.000MiB/495.000MiB
INFO [main] 2018-08-18 19:10:28,287 CassandraDaemon.java:486 - Code Cache Non-heap memory: init = 2555904(2496K) used = 4178816(4080K) committed = 4194304(4096K) max = 251658240(245760K)
INFO [main] 2018-08-18 19:10:28,289 CassandraDaemon.java:486 - Metaspace Non-heap memory: init = 0(0K) used = 18530200(18095K) committed = 19005440(18560K) max = -1(-1K)
INFO [main] 2018-08-18 19:10:28,289 CassandraDaemon.java:486 - Compressed Class Space Non-heap memory: init = 0(0K) used = 2092504(2043K) committed = 2228224(2176K) max = 1073741824(1048576K)
INFO [main] 2018-08-18 19:10:28,290 CassandraDaemon.java:486 - Par Eden Space Heap memory: init = 41943040(40960K) used = 41943040(40960K) committed = 41943040(40960K) max = 41943040(40960K)
INFO [main] 2018-08-18 19:10:28,291 CassandraDaemon.java:486 - Par Survivor Space Heap memory: init = 5242880(5120K) used = 5242880(5120K) committed = 5242880(5120K) max = 5242880(5120K)
INFO [main] 2018-08-18 19:10:28,293 CassandraDaemon.java:486 - CMS Old Gen Heap memory: init = 471859200(460800K) used = 669336(653K) committed = 471859200(460800K) max = 471859200(460800K)
When Cassandra shuts down properly, the log entries look like this:
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:23,661 Gossiper.java:1559 - Announcing shutdown
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:23,662 StorageService.java:2289 - Node /127.0.0.1 state jump to shutdown
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:25,664 MessagingService.java:992 - Waiting for messaging service to quiesce
INFO [ACCEPT-/127.0.0.1] 2018-08-18 19:28:25,665 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO [StorageServiceShutdownHook] 2018-08-18 19:28:25,729 HintsService.java:220 - Paused hints dispatch
Depending on how Cassandra stopped is possible to not have any log entries where you could see that Cassandra stopped.
Other interesting files that you could look into are debug.log and gc.log.
I think I might of stumbled across a bug and wanted to get other people's input. I am running a pyspark application using Spark 2.2.0 in standalone mode. I am doing a somewhat heavy transformation in python inside a flatMap and the driver keeps killing the workers.
Here is what am I seeing:
The master after 60s of not seeing any heartbeat message from the workers it prints out this message to the log:
Removing worker [worker name] because we got no heartbeat in 60
seconds
Removing worker [worker name] on [IP]:[port]
Telling app of
lost executor: [executor number]
I then see in the driver log the following message:
Lost executor [executor number] on [executor IP]: worker lost
The worker then terminates and I see this message in its log:
Driver commanded a shutdown
I have looked at the Spark source code and from what I can tell, as long as the executor is alive it should send a heartbeat message back as it is using a ThreadUtils.newDaemonSingleThreadScheduledExecutor to do this.
One other thing that I noticed while I was running top on one of the workers, is that the executor JVM seems to be suspended throughout this process. There are as many python processes as I specified in the SPARK_WORKER_CORES env variable and each is consuming close to 100% of the CPU.
Anyone have any thoughts on this?
I was facing this same issue, increasing interval worked.
Excerpt from Logs start-all.sh logs
INFO Utils: Successfully started service 'sparkMaster' on port 7077.
INFO Master: Starting Spark master at spark://master:7077
INFO Master: Running Spark version 3.0.1
INFO Utils: Successfully started service 'MasterUI' on port 8080.
INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://master:8080
INFO Master: I have been elected leader! New state: ALIVE
INFO Master: Registering worker slave01:41191 with 16 cores, 15.7 GiB RAM
INFO Master: Registering worker slave02:37853 with 16 cores, 15.7 GiB RAM
WARN Master: Removing worker-20210618205117-slave01-41191 because we got no heartbeat in 60 seconds
INFO Master: Removing worker worker-20210618205117-slave01-41191 on slave01:41191
INFO Master: Telling app of lost worker: worker-20210618205117-slave01-41191
WARN Master: Removing worker-20210618204723-slave02-37853 because we got no heartbeat in 60 seconds
INFO Master: Removing worker worker-20210618204723-slave02-37853 on slave02:37853
INFO Master: Telling app of lost worker: worker-20210618204723-slave02-37853
WARN Master: Got heartbeat from unregistered worker worker-20210618205117-slave01-41191. This worker was never registered, so ignoring the heartbeat.
WARN Master: Got heartbeat from unregistered worker worker-20210618204723-slave02-37853. This worker was never registered, so ignoring the heartbeat.
Solution: add following configs to $SPARK_HOME/conf/spark-defaults.conf
spark.network.timeout 50000
spark.executor.heartbeatInterval 5000
spark.worker.timeout 5000
Jobserver 0.7.0 it have 4Gb ram available and 10Gb for the context, the system have 3 more free Gb. The context was running for a while and at the time when receive a request fails without any error. The request is the same like other ones that have processed while it was up, is not a special one. The following log corresponds to the jobserver log and as you can see, the last successfully job was finished at 03:08:23,341 and when receive the next one then the driver command a shutdown.
[2017-05-16 03:08:23,340] INFO output.FileOutputCommitter [] [] - Saved output of task 'attempt_201705160308_0321_m_000199_0' to file:/value_iq/spark-warehouse/spark_cube_users_v/tenant_id=7/_temporary/0/task_201705160308_0321_m_000199
[2017-05-16 03:08:23,340] INFO pred.SparkHadoopMapRedUtil [] [] - attempt_201705160308_0321_m_000199_0: Committed
[2017-05-16 03:08:23,341] INFO he.spark.executor.Executor [] [] - Finished task 199.0 in stage 321.0 (TID 49474). 2738 bytes result sent to driver
[2017-05-16 03:39:02,195] INFO arseGrainedExecutorBackend [] [] - Driver commanded a shutdown
[2017-05-16 03:39:02,239] INFO storage.memory.MemoryStore [] [] - MemoryStore cleared
[2017-05-16 03:39:02,254] INFO spark.storage.BlockManager [] [] - BlockManager stopped
[2017-05-16 03:39:02,363] ERROR arseGrainedExecutorBackend [] [] - RECEIVED SIGNAL TERM
[2017-05-16 03:39:02,404] INFO k.util.ShutdownHookManager [] [] - Shutdown hook called
[2017-05-16 03:39:02,412] INFO k.util.ShutdownHookManager [] [] - Deleting directory /tmp/spark-556033e2-c456-49d6-a43c-ef2cd3494b71/executor-b3ceaf84-e66a-45ed-acfe-1052ab1de2f8/spark-87671e4f-54da-47d7-a077-eb5f75d07e39
The Spark Worker server just log the following:
17/05/15 19:25:54 INFO ExternalShuffleBlockResolver: Registered executor AppExecId{appId=app-20170515192550-0004, execId=0} with ExecutorShuffleInfo{localDirs=[/tmp/spark-556033e2-c456-49d6-a43c-ef2cd3494b71/executor-b3ceaf84-e66a-45ed-acfe-1052ab1de2f8/blockmgr-eca888c0-4e63-421c-9e61-d959ee45f8e9], subDirsPerLocalDir=64, shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
17/05/16 03:39:02 INFO Worker: Asked to kill executor app-20170515192550-0004/0
17/05/16 03:39:02 INFO ExecutorRunner: Runner thread for executor app-20170515192550-0004/0 interrupted
17/05/16 03:39:02 INFO ExecutorRunner: Killing process!
17/05/16 03:39:02 INFO Worker: Executor app-20170515192550-0004/0 finished with state KILLED exitStatus 0
17/05/16 03:39:02 INFO Worker: Cleaning up local directories for application app-20170515192550-0004
17/05/16 03:39:07 INFO ExternalShuffleBlockResolver: Application app-20170515192550-0004 removed, cleanupLocalDirs = true
17/05/16 03:39:07 INFO ExternalShuffleBlockResolver: Cleaning up executor AppExecId{appId=app-20170515192550-0004, execId=0}'s 1 local dirs
And the Master log:
17/05/16 03:39:02 INFO Master: Received unregister request from application app-20170515192550-0004
17/05/16 03:39:02 INFO Master: Removing app app-20170515192550-0004
17/05/16 03:39:02 INFO Master: 157.97.107.150:33928 got disassociated, removing it.
17/05/16 03:39:02 INFO Master: 157.97.107.150:55444 got disassociated, removing it.
17/05/16 03:39:02 WARN Master: Got status update for unknown executor app-20170515192550-0004/0
Before receiving this request spark wasn't executing any other job, the context was using 5,3G/10G and the driver 1,3G/4G.
What meas "Driver commanded a shutdown"?
There is any log property that can be changed to see more details on the logs?
How can a simple request just break the context?
This question already has answers here:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
(11 answers)
Closed 2 years ago.
I'm trying using Spark SQL to read a table from Hive metastore but Spark gives an error about table not found. I'm afraid that Spark SQL creates a whole new empty metastore.
I submit the spark task through this command:
spark-submit --class etl.EIServerSpark --driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*' --driver-java-options '-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*' --jars $HIVE_CLASSPATH --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/yarn-site.xml --master yarn-client /root/etl.jar
This is the error:
2015-06-30 17:50:51,563 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Copying /etc/hive/conf/hive-site.xml to /tmp/spark-568de027-8b66-40fa-97a4-2ec50614f486/hive-site.xml
2015-06-30 17:50:51,568 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Added file file:/etc/hive/conf/hive-site.xml at http://10.136.149.126:43349/files/hive-site.xml with timestamp 1435683051561
2015-06-30 17:50:51,568 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Copying /etc/hadoop/conf/yarn-site.xml to /tmp/spark-568de027-8b66-40fa-97a4-2ec50614f486/yarn-site.xml
2015-06-30 17:50:51,570 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Added file file:/etc/hadoop/conf/yarn-site.xml at http://10.136.149.126:43349/files/yarn-site.xml with timestamp 1435683051568
2015-06-30 17:50:51,637 INFO [sparkDriver-akka.actor.default-dispatcher-5] util.AkkaUtils (Logging.scala:logInfo(59)) - Connecting to HeartbeatReceiver: akka.tcp://sparkDriver#gateway.edp.hadoop:52818/user/HeartbeatReceiver
2015-06-30 17:50:51,756 INFO [main] netty.NettyBlockTransferService (Logging.scala:logInfo(59)) - Server created on 40198
2015-06-30 17:50:51,757 INFO [main] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Trying to register BlockManager
2015-06-30 17:50:51,759 INFO [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerMasterActor (Logging.scala:logInfo(59)) - Registering block manager localhost:40198 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 40198)
2015-06-30 17:50:51,761 INFO [main] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Registered BlockManager
2015-06-30 17:50:52,840 INFO [main] parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: SELECT id, name FROM eiserver.eismpt
2015-06-30 17:50:53,141 INFO [main] parse.ParseDriver (ParseDriver.java:parse(206)) - Parse Completed
2015-06-30 17:50:54,041 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(502)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-06-30 17:50:54,064 INFO [main] metastore.ObjectStore (ObjectStore.java:initialize(247)) - ObjectStore, initialize called
2015-06-30 17:50:54,227 WARN [main] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hive/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/jars/datanucleus-rdbms-3.2.9.jar."
2015-06-30 17:50:54,268 WARN [main] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/jars/datanucleus-api-jdo-3.2.6.jar."
2015-06-30 17:50:54,274 WARN [main] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hive/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/jars/datanucleus-core-3.2.10.jar."
2015-06-30 17:50:54,314 INFO [main] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will be ignored
2015-06-30 17:50:54,315 INFO [main] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2015-06-30 17:50:56,109 INFO [main] metastore.ObjectStore (ObjectStore.java:getPMF(318)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-06-30 17:50:56,170 INFO [main] metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(110)) - MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "#" (64), after : "".
2015-06-30 17:50:57,315 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-06-30 17:50:57,316 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-06-30 17:50:57,688 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-06-30 17:50:57,688 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-06-30 17:50:57,842 INFO [main] DataNucleus.Query (Log4JLogger.java:info(77)) - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery#0" since the connection used is closing
2015-06-30 17:50:57,844 INFO [main] metastore.ObjectStore (ObjectStore.java:setConf(230)) - Initialized ObjectStore
2015-06-30 17:50:58,113 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added admin role in metastore
2015-06-30 17:50:58,115 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(569)) - Added public role in metastore
2015-06-30 17:50:58,198 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(597)) - No user is added in admin role, since config is empty
2015-06-30 17:50:58,376 INFO [main] session.SessionState (SessionState.java:start(383)) - No Tez session required at this point. hive.execution.engine=mr.
2015-06-30 17:50:58,525 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: get_table : db=eiserver tbl=eismpt
2015-06-30 17:50:58,525 INFO [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=get_table : db=eiserver tbl=eismpt
2015-06-30 17:50:58,567 ERROR [main] metadata.Hive (Hive.java:getTable(1003)) - NoSuchObjectException(message:eiserver.eismpt table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1569)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
How can I configure spark sql to access hive metastore deployed on a postgres? I'm using CDH 5.3.2.
Thank you
Configure Spark to use the Hive metastore thriftserver:
Edit $SPARK_HOME/conf/hive-site.xml to remove the direct connection information and to add this property:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value> /*make sure to replace with your hive-metastore service's thrift url*/
<description>URI for client to contact metastore server</description>
</property>
</configuration>
If hive-site.xml is not there in $SPARK_HOME/conf then, to connect to hive metastore you need to copy the hive-site.xml file into spark/conf directory. So run the following command after logging in as root user,
cp /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf/
Create Hive Context
At a scala> REPL prompt type the following:
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
Create Hive Table
hiveContext.sql("CREATE TABLE IF NOT EXISTS TestTable (key INT, value STRING)")
Show Hive Tables
scala> hiveContext.hql("SHOW TABLES").collect().foreach(println)
Test out the configuration(Optional)
Stop the Spark SQL thriftserver with cd $SPARK_HOME; sbin/stop-thriftserver.sh
Start the Hive metastore thriftserver with cd;./start-thriftserver.sh
Check the logs at $HIVE_HOME/logs/metastore.out for any errors.
The Spark SQL thriftserver won't start until it can make a successful connection to
this server, so it must be running.
Start the Spark SQL thriftserver
with cd $SPARK_HOME; sbin/start-thriftserver.sh
Check the log file that are indicated in the returned line.
You should see lines like this:
16/12/29 20:22:19 INFO metastore: Trying to connect to metastore with URI thrift://localhost:9083
16/12/29 20:22:19 INFO metastore: Connected to metastore.
Run $SPARK_HOME/bin/beeline -u 'jdbc:hive2://localhost:10000/' and try out the !tables command to make sure that you are able to list the metadata.
The doc says to put spark.sql.hive.metastore.sharedPrefixes = org.postgresql in the configuration file, did you try this ?
Make sure the $HIVE_HOME/conf/hive-site.xml configuration which is pointing to complete path of metastore.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/hive/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
Place the hive-site.xml file in $SPARK_HOME/conf to point SparkR to the same metastore as Hive.
Hope this solves your issue.