Spark and Mesos: No credentials provided - apache-spark

I want to run a Spark example on my Mesos master node, but it gives me this problem. It just stops here without showing any results or exceptions.
I0908 18:38:01.636055 9044 sched.cpp:226] Version: 1.0.1
I0908 18:38:01.636485 28512 sched.cpp:330] New master detected at master#124.216.0.14:5050
I0908 18:38:01.636642 28512 sched.cpp:341] No credentials provided. Attempting to register without authentication

There's not enough info there to debug, unfortunately.
New master detected at master#124.216.0.14:5050
This is a normal debug message printed during Spark startup or after a Mesos master re-election. Spark discovered a Mesos master it didn't know about and acted accordingly.
No credentials provided. Attempting to register without authentication
This line is just a control-flow debug message, a red herring that looks like an error but is really harmless in most cases.
It indicates that Mesos authentication is being skipped when registering as a framework. Optional framework/service authentication was added for compatibility with the new Mesosphere Enterprise DC/OS, which allows strict, permissive, or disabled security modes when using the Mesos API. If you're using open-source DC/OS or just plain Mesos, it's probably running in disabled security mode, because it doesn't have the authorization or service-account infrastructure.
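The message only matters if your Mesos master actually enforces framework authentication. In that case Spark can pass a principal and secret through its Mesos configuration properties; below is a minimal sketch, where the master URL matches the log above and the principal/secret values are placeholders that would have to match whatever credentials the Mesos master has been configured to accept.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: registering with a Mesos master using framework credentials.
// The principal/secret values are placeholders and only needed if the master
// actually requires framework authentication.
object MesosAuthExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mesos-auth-example")
      .master("mesos://124.216.0.14:5050")             // master from the log above
      .config("spark.mesos.principal", "my-principal") // hypothetical principal
      .config("spark.mesos.secret", "my-secret")       // hypothetical secret
      .getOrCreate()

    // Trivial job to confirm the framework registered and can run tasks.
    println(spark.sparkContext.parallelize(1 to 10).sum())
    spark.stop()
  }
}
```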

Related

Why shouldn't local mode in Spark be used for production?

The official documentation and all sorts of books and articles repeat the recommendation that Spark in local mode should not be used for production purposes. Why not? Why is it a bad idea to run a Spark application on one machine for production purposes? Is it simply because Spark is designed for distributed computing and if you only have one machine there are much easier ways to proceed?
Local mode in Apache Spark is intended for development and testing purposes, and should not be used in production because:
- Scalability: Local mode only uses a single machine, so it cannot handle large data sets or the processing needs of a production environment.
- Resource Management: Spark’s standalone cluster manager or a cluster manager like YARN, Mesos, or Kubernetes provides more advanced resource management capabilities for production environments compared to local mode.
- Fault Tolerance: Local mode does not have the ability to recover from failures, while a cluster manager can provide fault tolerance by automatically restarting failed tasks on other nodes.
- Security: Spark’s cluster manager provides built-in security features such as authentication and authorization, which are not present in local mode.
Therefore, it is recommended to use a cluster manager for production environments to ensure scalability, resource management, fault tolerance, and security.
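To make the contrast concrete, here is a minimal sketch (the master URLs and app names are illustrative) of how the same code is pointed at local mode for development versus handed to a cluster manager for production:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch contrasting a dev/test local-mode session with a production
// session that delegates scheduling to a cluster manager. Names are placeholders.
object MasterModes {
  // Development/testing: everything runs inside this single JVM.
  def devSession(): SparkSession =
    SparkSession.builder()
      .appName("dev-job")
      .master("local[*]") // all cores of one machine, no fault tolerance
      .getOrCreate()

  // Production: leave the master unset in code and pass it at submit time, e.g.
  //   spark-submit --master yarn ...   or   --master spark://host:7077 ...
  // so the cluster manager handles resources, retries, and security.
  def prodSession(): SparkSession =
    SparkSession.builder()
      .appName("prod-job")
      .getOrCreate()
}
```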
I have the same question. I am certainly not an authority on the subject, but since no one has answered this question, I'll try to list the reasons I've encountered while using Spark local mode in Java. So far:
Spark calls System.exit() on certain occasions, such as an out-of-memory error or when the local dir does not have write permissions, so if such a call is triggered, the entire JVM shuts down (including your own application from within which Spark runs; see e.g. https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala#L45 and https://github.com/apache/spark/blob/b22946ed8b5f41648a32a4e0c4c40226141a06a0/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala#L63). Moreover, it circumvents your own shutdown hook, so there is no way to gracefully handle such system exits in your own application. On a cluster it is usually fine if a certain machine restarts, but if all Spark components are contained in a single JVM, the assumption that the entire JVM can be shut down upon a Spark failure does not (always) hold.
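As a rough illustration of the point above (the answer talks about Java, but the same applies to any JVM application; the object and job below are made up for this sketch), here is a host application that embeds local-mode Spark in its own JVM:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative host application embedding local-mode Spark in its own JVM.
// If Spark's internal error handling calls System.exit() (e.g. on OOM or an
// unwritable local dir, per the links above), this whole process terminates,
// not just the Spark parts.
object EmbeddedSparkHost {
  def main(args: Array[String]): Unit = {
    // Application-level cleanup we would like to run on shutdown.
    sys.addShutdownHook {
      println("host application shutting down")
    }

    val spark = SparkSession.builder()
      .appName("embedded-local-spark")
      .master("local[*]")
      .getOrCreate()

    // A failure inside this job that triggers Spark's exit path takes the whole
    // host JVM with it; running Spark in a separate process (spark-submit)
    // isolates the host from that.
    val total = spark.sparkContext.parallelize(1 to 1000).map(_ * 2).sum()
    println(s"result: $total")
    spark.stop()
  }
}
```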

Activating the cluster in Apache Ignite

I am using Apache Ignite 2.8.0. When persistence is enabled, my server becomes inactive. When I try to activate the cluster (only one server) with control.bat --activate, it asks for a username and password, but when I activate it from code with ignite.cluster().active(true); it doesn't ask.
Why doesn't it ask for a username and password when I activate the cluster from code?
You can only do that from a node that's already part of the topology (obviously) and has thus passed security checks.
Apache Ignite currently only has thin-client authentication. If you're looking for server-to-server authentication, use SSL or check the GridGain security plugin.
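For illustration, here is a minimal sketch of activating from code (the configuration file name is a placeholder): the node started in this process is itself part of the topology, which is why no credential prompt appears, unlike the external control.bat utility.

```scala
import org.apache.ignite.{Ignite, Ignition}

// Minimal sketch (config path is a placeholder): a node started from this
// process is already part of the topology and has passed its join-time checks,
// so activating from code does not prompt for credentials the way the
// external control.bat/control.sh utility does.
object ActivateCluster {
  def main(args: Array[String]): Unit = {
    val ignite: Ignite = Ignition.start("ignite-config.xml") // hypothetical config file

    // Activate the persistence-enabled cluster from inside the topology.
    ignite.cluster().active(true)

    println(s"cluster active: ${ignite.cluster().active()}")
    ignite.close()
  }
}
```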

Determine where spark program is failing?

Is there any way to debug a Spark application that is running in cluster mode? I have a program that has been running successfully for a while, which processes a couple hundred GB at a time. Recently some data caused the run to fail due to executors being disconnected. From what I have read, this is likely a memory issue. I'm trying to determine what function/action is causing the memory issue to trigger. I am using Spark on an EMR cluster (which uses YARN); what would be the best way to debug this issue?
For cluster mode you can go to the YARN Resource Manager UI and select the Tracking UI for your specific running application (which points to the Spark driver running on the Application Master within the YARN Node Manager) to open the Spark UI, which is the core developer interface for debugging Spark apps.
For client mode you can also go to the YARN RM UI as mentioned above, as well as hit the Spark UI directly at http://[driverHostname]:4040, where driverHostname is the master node in EMR and 4040 is the default port (this can be changed).
Additionally, you can access submitted and completed Spark apps via the Spark History Server at its default address, http://master-public-dns-name:18080/.
These are the essential resources, with the Spark UI being the main toolkit for your request.
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-webui.html
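One related note: the Spark History Server only shows applications that wrote event logs. EMR typically enables this by default, but if you need to set it explicitly, a minimal sketch looks like this (the log directory below is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: event logging is what feeds the Spark History Server, so a
// failed run can still be inspected after the fact. EMR usually enables this
// by default; the directory below is a placeholder.
object EventLogExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("debuggable-job")
      .config("spark.eventLog.enabled", "true")
      .config("spark.eventLog.dir", "hdfs:///var/log/spark/apps") // placeholder path
      .getOrCreate()

    // ... job logic; stages, tasks, and executor failures show up in the
    // Spark UI while running and in the History Server afterwards.
    spark.stop()
  }
}
```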

How to use authentication on Spark Thrift server on Spark standalone cluster

I have a standalone Spark cluster on Kubernetes, and I want to use it to load some temp views in memory and expose them via JDBC using the Spark Thrift server.
I already got it working with no security by submitting a Spark job (PySpark in my case) and starting the Thrift server in that same job so I can access the temp views.
Since I'll need to expose some sensitive data, I want to apply at least an authentication mechanism.
I've been reading a lot and I see basically 2 methods to do so:
PAM - which is not advised for production, since some critical files need to have permissions granted to users besides root.
Kerberos - which appears to be the most appropriate one for this situation.
My question is:
- For a standalone Spark cluster (running on K8s), is Kerberos the best approach? If not, which one?
- If Kerberos is the best one, it's really hard to find guidance or a step-by-step on how to set up Kerberos to work with the Spark Thrift server, especially in my case where I'm not using any specific distribution (MapR, Hortonworks, etc.).
Appreciate your help
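For reference, a minimal Scala sketch of the unsecured setup described in the question (the asker uses PySpark, but the underlying API is the same; the data path and port are placeholders, and no authentication is configured here):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Minimal sketch of the unsecured setup described above: load temp views in a
// long-running job and expose them over JDBC by starting the Thrift server
// inside the same job. Paths/ports are placeholders; no authentication yet.
object ThriftServerJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("thrift-views")
      .enableHiveSupport()
      .config("hive.server2.thrift.port", "10000")
      .getOrCreate()

    spark.read.parquet("s3a://some-bucket/data") // placeholder source
      .createOrReplaceTempView("my_view")

    // Expose the session's temp views over JDBC (requires the
    // spark-hive-thriftserver module on the classpath).
    HiveThriftServer2.startWithContext(spark.sqlContext)

    // Keep the driver (and the temp views) alive.
    Thread.sleep(Long.MaxValue)
  }
}
```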

OpsCenter with HTTPS kills session when clicked on Spark Console

I have a DataStax Enterprise cluster running on 2 AWS nodes. DSE is installed in enterprise mode and one of the nodes is configured in Analytics mode.
Everything was working normally until I followed the steps outlined here to enable HTTPS for OpsCenter: http://docs.datastax.com/en/opscenter/5.0/opsc/configure/opscEnablingAuth.html
OpsCenter authentication is now working fine. However, if I click the Spark Console hyperlink of the Analytics node, the raw text of the Spark job details shows, but the page's CSS and images are gone; looking at Chrome's developer tools, it looks like I'm getting access denied on those resources. Also, as soon as I click the link and the Spark Console popup opens, the OpsCenter tab kills my session and logs me out. I was able to observe the same behavior in Chrome and IE.
Instance: m3.large
AMI: DataStax Auto-Clustering AMI 2.6.3-1404-hvm - ami-8b392cbb
I've reproduced this issue using OpsCenter 5.2 and DSE 4.7. I've created a ticket in our internal tracking system to address this issue; the reference number for that ticket is OPSC-6606.
Thanks for bringing this issue to our attention!
