How to implement spark.ui.filter - apache-spark

I have a Spark cluster set up on 2 CentOS machines. I want to secure the web UI of my cluster (master node). I have made a BasicAuthenticationFilter servlet. I am unable to understand:
How should I use spark.ui.filters to secure my web UI?
Where should I place the servlet/jar file?
Kindly help.

I also needed to handle this security problem to prevent unauthorized access to the Spark standalone UI. I eventually solved it after some searching on the web; the procedure is:
Code and compile a Java filter that implements standard HTTP basic authentication. I referred to this blog: http://lambda.fortytools.com/post/26977061125/servlet-filter-for-http-basic-auth (a sketch of such a filter follows after these steps).
Package the filter class as a jar file and put it in $SPARK_HOME/jars/.
Add config lines to $SPARK_HOME/conf/spark-defaults.conf:
spark.ui.filters xxx.BasicAuthFilter # the full class name of the filter
spark.xxx.BasicAuthFilter.params user=foo,password=cool,realm=some # spark.<full filter class name>.params
The username and password are what you must provide to access the Spark UI; the value of "realm" does not matter, whatever you type.
Restart all master and worker processes and test that it works.
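A minimal sketch of such a filter, loosely following the blog above; it assumes Spark passes the user/password/realm entries from the .params line to the filter as servlet init parameters, and the package must be adjusted so the full class name matches what you put in spark.ui.filters:

    import java.io.IOException;
    import java.util.Base64;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class BasicAuthFilter implements Filter {

        private String user;
        private String password;
        private String realm;

        @Override
        public void init(FilterConfig config) throws ServletException {
            // The key=value pairs from spark.<filter class>.params arrive here
            // as servlet init parameters (an assumption to verify on your version).
            user = config.getInitParameter("user");
            password = config.getInitParameter("password");
            realm = config.getInitParameter("realm");
        }

        @Override
        public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) resp;

            String header = request.getHeader("Authorization");
            if (header != null && header.startsWith("Basic ")) {
                // The header carries "Basic base64(username:password)".
                String decoded = new String(Base64.getDecoder().decode(header.substring(6)));
                String[] credentials = decoded.split(":", 2);
                if (credentials.length == 2
                        && credentials[0].equals(user)
                        && credentials[1].equals(password)) {
                    chain.doFilter(req, resp);   // authenticated, let the request through
                    return;
                }
            }
            // Missing or wrong credentials: challenge the browser.
            response.setHeader("WWW-Authenticate", "Basic realm=\"" + realm + "\"");
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED);
        }

        @Override
        public void destroy() {
        }
    }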

Place the jar file on all the nodes in the folder /opt/spark/conf/. Then, in a terminal:
Navigate to the directory /usr/local/share/jupyter/kernels/pyspark/
Edit the file kernel.json
Add the following arguments to PYSPARK_SUBMIT_ARGS: --jars /opt/spark/conf/filterauth.jar --conf spark.ui.filters=authenticate.MyFilter
Here, filterauth.jar is the jar file you created and authenticate.MyFilter represents <package name>.<class name>. A rough example of the resulting kernel.json is shown below.
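For reference, the relevant part of kernel.json might then look roughly like this; the argv line and the trailing pyspark-shell token are assumptions based on a typical PySpark kernel spec, while the --jars/--conf values come from the steps above:

    {
      "display_name": "PySpark",
      "language": "python",
      "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
      "env": {
        "PYSPARK_SUBMIT_ARGS": "--jars /opt/spark/conf/filterauth.jar --conf spark.ui.filters=authenticate.MyFilter pyspark-shell"
      }
    }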
Hope this answers your query. :)

Related

How to get the application id into a particular file after spark-submit in cluster deploy mode

I want to get the application id into a local text file when I deploy my application in cluster mode.
For this I edited the log4j.properties file and configured it for the client, but it is not working.
I also followed this blog: https://largecats.github.io/blog/2020/09/21/spark-cluster-mode-collect-log/ but did not get a satisfactory result.
I also followed spark-submit in cluster deploy mode get application id to console, but that only shows the application id on the console.
So please, can anyone help me? I have been stuck on this for a week without finding a proper solution.
You should set a tag on your Spark app when submitting it and later query YARN based on the tag value:
--conf spark.yarn.tags=tag-name
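For example, something along these lines (the ResourceManager host/port, the class and jar names, the tag value and the jq extraction are all illustrative; applicationTags is a query parameter of the ResourceManager REST API):

    # tag the application at submit time
    spark-submit --master yarn --deploy-mode cluster \
        --conf spark.yarn.tags=my-job-tag \
        --class com.example.Main app.jar

    # then ask the ResourceManager which application(s) carry that tag
    curl "http://<rm-host>:8088/ws/v1/cluster/apps?applicationTags=my-job-tag" \
        | jq -r '.apps.app[0].id' > application_id.txt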

Configuring Hortonworks Data Platform Sandbox 2.6.5 from the command line

I am building a demo/training environment for one of our products, which works with Hive & Spark. I am using HDP 2.6.5, and if I configure the Hive settings I need (primarily these: ACID settings) through the Ambari GUI, it works fine. But I want to automate this, and setting these in hive-site.xml is not working (I have found many copies of this file, so it could simply be that I am using the wrong one?).
How can I make from the command line the same changes that I make in Dashboard -> Hive -> Configs?
Where are these changes stored? I am sure I have missed something obvious in the docs, but I can't find it.
Thanks!
@Leigh K You should check out the Ambari REST API to make changes to Hive. I did not find a quick link to the official documentation, but I was able to find this post that goes into detail (using Pig):
https://markobigdata.com/2018/07/22/adding-service-to-hdp-using-rest-api/2/
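As a rough sketch of what that looks like in practice, Ambari also ships a small wrapper around the REST API (configs.sh in many Ambari versions, configs.py in newer ones) that can read and write a config type such as hive-site from the command line; the host, cluster name, credentials and the property below are placeholders:

    # read the current hive-site configuration
    /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
        get <ambari-host> <cluster-name> hive-site

    # set one of the ACID-related properties
    /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
        set <ambari-host> <cluster-name> hive-site hive.support.concurrency true

Afterwards, restart the affected services; Ambari flags them as needing a restart, and the changes show up under Dashboard -> Hive -> Configs just like GUI edits do.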

Keep all jars in a folder running on JVMs as services

On a remote server I have a folder which contains several jars. Each of them represents an application which opens a port on which a web application is served.
It might look like this:
jars/
    app1.jar
    app2.jar
    app3.jar
I'm now trying to find a solution that ensures all of them are constantly running in separate JVMs. Whenever one jar is replaced by a newly uploaded jar, this should be handled immediately.
At the moment I can achieve part of this manually by setting up a service for each of them, something like:
my-servers-service-setup-tool app1 java -jar app1.jar 12345
(the last parameter is the port)
In case app1.jar is now overwritten by an upload: how could the service react to that? Either restart itself or, if necessary, set up a new service on the same port.
If a new jar arrives in the folder, I'd set up a new service for it as described above. Might there be a more declarative approach to this? I mean, some other automation that would detect the arrival of a new jar and set up a new service.
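For illustration, the kind of automation described in the last paragraph could be sketched with Java's WatchService; the folder path, the jar-to-service-name mapping and the systemctl restart call are placeholders for whatever your server's service tool actually is:

    import java.io.IOException;
    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardWatchEventKinds;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;

    public class JarWatcher {
        public static void main(String[] args) throws IOException, InterruptedException {
            Path dir = Paths.get("/srv/jars");                     // the folder with app1.jar, app2.jar, ...
            WatchService watcher = FileSystems.getDefault().newWatchService();
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);

            while (true) {
                WatchKey key = watcher.take();                     // blocks until something changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path changed = (Path) event.context();
                    String name = changed.getFileName().toString();
                    if (name.endsWith(".jar")) {
                        String service = name.replace(".jar", ""); // e.g. app1.jar -> service "app1"
                        // Restart (or create) the matching service; replace with your own tool.
                        new ProcessBuilder("systemctl", "restart", service)
                                .inheritIO()
                                .start();
                    }
                }
                key.reset();
            }
        }
    }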

What is the proper way of running a Spark application on YARN using Oozie (with Hue)?

I have written an application in Scala that uses Spark.
The application consists of two modules - the App module which contains classes with different logic, and the Env module which contains environment and system initialization code, as well as utility functions.
The entry point is located in Env; after initialization, it instantiates a class from App (according to the arguments, using Class.forName) and the logic is executed.
The modules are exported into 2 different JARs (namely, env.jar and app.jar).
When I run the application locally, it runs fine. The next step is to deploy the application to my servers. I use Cloudera's CDH 5.4.
I used Hue to create a new Oozie workflow with a Spark task with the following parameters:
Spark Master: yarn
Mode: cluster
App name: myApp
Jars/py files: lib/env.jar,lib/app.jar
Main class: env.Main (in Env module)
Arguments: app.AggBlock1Task
I then placed the 2 JARs inside the lib folder in the workflow's folder (/user/hue/oozie/workspaces/hue-oozie-1439807802.48).
When I run the workflow, it throws a FileNotFoundException and the application does not execute:
java.io.FileNotFoundException: File file:/cloudera/yarn/nm/usercache/danny/appcache/application_1439823995861_0029/container_1439823995861_0029_01_000001/lib/app.jar,lib/env.jar does not exist
However, when I leave the Spark master and mode parameters empty, it all works properly, but when I check spark.master programmatically it is set to local[*] and not yarn. Also, when observing the logs, I encountered this under Oozie Spark action configuration:
--master
null
--name
myApp
--class
env.Main
--verbose
lib/env.jar,lib/app.jar
app.AggBlock1Task
I assume I'm not doing it right - not setting the Spark master and mode parameters and running the application with spark.master set to local[*]. As far as I understand, creating a SparkConf object within the application should set the spark.master property to whatever I specify in Oozie (in this case yarn), but it just doesn't work when I do that.
Is there something I'm doing wrong or missing?
Any help will be much appreciated!
I managed to solve the problem by putting the two JARs in the user directory /user/danny/app/ and specifying the Jar/py files parameter as ${nameNode}/user/danny/app/env.jar. Running it caused a ClassNotFoundException to be thrown, even though the JAR was located in the same folder in HDFS. To work around that, I had to go to the settings and add the following to the options list: --jars ${nameNode}/user/danny/app/app.jar. This way the App module is referenced as well and the application runs successfully.
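For reference, the Spark action that Hue generates in workflow.xml then looks roughly like the following sketch; the action name and the ok/error transitions are placeholders:

    <action name="spark-agg">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>myApp</name>
            <class>env.Main</class>
            <jar>${nameNode}/user/danny/app/env.jar</jar>
            <spark-opts>--jars ${nameNode}/user/danny/app/app.jar</spark-opts>
            <arg>app.AggBlock1Task</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>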

Cassandra Nagios plugins

I am trying to monitor Cassandra using the Nagios Cassandra plugins by following this link:
http://assets.nagios.com/downloads/nagiosxi/docs/Monitoring_Apache_Cassandra_Databases_with_Nagios_XI.pdf
I do not see the Core Config Manager, as I am using Nagios 3.3.1. How do we configure Cassandra-specific checks using Nagios Core 3.3.1? Can anyone who has done it point me to a good resource, please?
Thank you in advance.
The Nagios Core version 3 manual can be found here: http://library.nagios.com/library/products/nagioscore/manuals/
Instead of performing steps 5 and 6 in the pdf you attached, you will have to add the server, the command, and the service check to the Nagios .cfg files manually.
Create the host object for the server using this page: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#host
Create a command object for the new Cassandra check, using this: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#command
Create the service object, the details are found here: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#service
Remember that the items in 'red' on those pages are required in the object entries.
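As a rough sketch, the three object definitions could look like this in one of your .cfg files; the host name, address and the check_cassandra command line are placeholders for whichever plugin the PDF has you install, and the "use" templates come from the Nagios sample configuration:

    define host {
        use             linux-server
        host_name       cassandra01
        alias           Cassandra node
        address         192.168.1.50                                 ; placeholder address
    }

    define command {
        command_name    check_cassandra
        command_line    $USER1$/check_cassandra.sh -H $HOSTADDRESS$  ; plugin path is an assumption
    }

    define service {
        use                     generic-service
        host_name               cassandra01
        service_description     Cassandra Status
        check_command           check_cassandra
    }

Remember to reference the new .cfg file (or its directory) from nagios.cfg via cfg_file/cfg_dir and to verify the configuration with nagios -v <path to nagios.cfg> before restarting Nagios.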
